This is the full developer documentation for Autonoma.
# Introduction
> Learn how to integrate Autonoma's agentic end-to-end testing platform into your application.
Autonoma is an agentic end-to-end testing platform. Users create and run automated tests for web, iOS, and Android applications using natural language. The system executes tests on real devices and emulators, with AI models handling element selection, assertions, self-healing, and agentic decision-making.
## Using these docs with AI
These docs are available as plain text for LLMs. Pass the link below to your coding agent (Claude Code, Cursor, Copilot, etc.) so it can read the full documentation in context:
```plaintext
https://docs.agent.autonoma.app/llms.txt
```
The file links to every page individually, so the model can fetch only what it needs. A single [complete file](/llms-full.txt) with all pages is also available.
## Getting started
There are two paths depending on where you are:
Analyze and test your app
Claude Code plugin that reads your codebase, finds where bugs hide, and generates a full E2E test suite. Install once, run `/autonoma-test-planner:generate-tests`.
[Start the Test Planner →](/test-planner/)
Set up your backend
Connect your app so tests always start with clean, isolated data. One endpoint, automatic teardown.
[Read the guide →](/guides/environment-factory/)
## How Autonoma runs tests
Before each test run:
1. Autonoma calls your endpoint with `action: "up"` and a scenario name
2. Your endpoint creates isolated test data and returns auth credentials
3. Autonoma uses those credentials to log in and run the test
4. After the test, Autonoma calls `action: "down"` with signed refs
5. Your endpoint verifies the signature and deletes only the data it created
The signed refs mechanism ensures teardown can never delete data it didn’t create - even if someone gains access to your endpoint.
## Framework examples
Next.js
Complete working example with Next.js + Prisma + PostgreSQL.
[See the example →](/examples/nextjs/)
React + Vite
Static SPA with a separate Express backend.
[See the example →](/examples/react-vite/)
Elixir/Phoenix
Phoenix + Ecto + JOSE implementation.
[See the example →](/examples/elixir-phoenix/)
TanStack Start
TanStack Start + Drizzle ORM implementation.
[See the example →](/examples/tanstack-start/)
## Contributing
Want to run Autonoma locally or contribute to the project? Start here:
Development Setup
Clone the repo, install dependencies, configure environment variables, and get the platform running locally.
[Get started →](/development/setup/)
Architecture Overview
Understand the monorepo structure, how apps and packages connect, and the key design decisions behind the platform.
[Read the overview →](/development/architecture/)
Code Conventions
ESM-only, strict TypeScript, logging patterns, and all the rules that keep the codebase consistent.
[See conventions →](/development/conventions/)
Architecture Deep Dives
Detailed documentation on the execution agent core and AI primitives package.
[Explore →](/architecture/execution-agent/)
# Step 2: Generate Knowledge Base
> Analyze your codebase to produce AUTONOMA.md after the SDK Integration step has finished.
The knowledge base generator runs **after SDK Integration**. By the time this step starts, the plugin has already verified a working SDK endpoint and can use that fact later when it plans scenarios.
This step analyzes your codebase and produces a user-perspective guide to every important page, flow, and interaction in your application. It also writes navigation skill files that later steps can reuse.
## Prerequisites
* Step 1 SDK Integration must have completed successfully.
* Your application codebase must be available in the workspace.
## What this produces
* `autonoma/AUTONOMA.md`
* `autonoma/skills/*.md`
* `autonoma/features.json`
## What to review
The most important output is the **core flows** table. Core flows are the workflows that receive the heaviest test coverage later in the pipeline.
When reviewing:
* check that the product areas are named the way your team names them
* confirm the true core flows are marked as core
* make sure obvious high-value flows were not missed
If the core flows are wrong, the rest of the suite will be prioritized incorrectly.
# Step 3: Generate Scenarios
> Design the standard, empty, and large test data environments from the knowledge base and the SDK discover artifact.
The scenario generator runs **after SDK Integration and Knowledge Base generation**. It does not expect you to provide the SDK endpoint URL or shared request-signing secret manually anymore. Instead, it consumes the verified endpoint and `discover` artifact produced earlier in the pipeline.
## Prerequisites
* `autonoma/AUTONOMA.md` and `autonoma/skills/` from [Step 2](/test-planner/step-1-knowledge-base/)
* a verified SDK integration from Step 1
* `autonoma/discover.json` captured from that integration
If the SDK endpoint is unavailable, this step stops; fix the Step 1 SDK integration or dev server issue before re-running the pipeline.
## What this produces
* `autonoma/discover.json`
* `autonoma/scenarios.md`
The scenarios file describes three named environments:
* `standard` for realistic day-to-day coverage
* `empty` for onboarding and zero-state flows
* `large` for pagination and high-volume behavior
## What to review
Scenarios are a contract between planning and execution. Review:
* whether the important entities and relationships are represented
* whether fixed names and counts are realistic enough for meaningful assertions
* whether variable fields are marked only where runtime variation is actually required
* whether the three scenarios cover your core flows without over-segmenting the data model
Later stages use these values directly. Wrong names, missing relationships, or unnecessary variable fields will make the generated tests and validation results worse.
# Step 4: Generate E2E Tests
> Generate an E2E test suite as markdown files after the knowledge base and scenarios are ready.
The E2E test generation step consumes the knowledge base and scenarios produced earlier in the pipeline and turns them into natural-language markdown test cases.
## Prerequisites
* `autonoma/AUTONOMA.md` and `autonoma/skills/` from [Step 2](/test-planner/step-1-knowledge-base/)
* `autonoma/scenarios.md` from [Step 3](/test-planner/step-2-scenarios/)
## What this produces
* `autonoma/qa-tests/INDEX.md`
* `autonoma/qa-tests/**/*`
The generated suite is organized by flow and priority. Core flows should receive the deepest coverage, while supporting areas still get enough tests to catch regressions.
## What to review
You do not need to read every generated test. Review a representative sample:
* journey tests that cross multiple important flows
* critical tests for the product’s highest-value behaviors
* tests that reference scenario data with variable placeholders
Good tests reference actual UI text and visible outcomes. Vague steps or generic assertions usually mean the suite needs another pass.
# Step 5: Scenario Validation
> Validate the planned scenarios against the live SDK endpoint and persist executable recipe artifacts.
The final plugin stage is now **Scenario Validation**, not SDK implementation. By the time this step runs, the plugin has already integrated the SDK in Step 1 and verified that a live endpoint exists.
This step takes the planned scenarios and checks that they work against the live endpoint by exercising the current SDK contract.
## Prerequisites
* `autonoma/discover.json`
* `autonoma/scenarios.md`
* a live SDK endpoint produced by Step 1
If the endpoint is unreachable at this stage, the pipeline should fail with guidance to fix the SDK integration or dev server rather than attempting to re-implement the SDK.
## What this produces
* `autonoma/scenario-recipes.json`
The generated recipe file is the validated handoff between planning and execution. It is what later Autonoma flows use to create and tear down scenario data reliably.
## What this validates
* `discover` still works against the live endpoint
* `up` can create the planned scenario data
* `down` can clean it up again
* the scenario recipes conform to the current SDK contract
## What to review
Review the validation summary for:
* any scenario that fails lifecycle validation
* schema or relationship mismatches between the plan and the live endpoint
* missing recipe coverage for `standard`, `empty`, or `large`
This stage should validate and upload recipes only. It should not be the point where the plugin starts rewriting your SDK integration.
# Environment Factory Guide
> How to implement the Autonoma Environment Factory in your application — a single POST endpoint for creating and destroying isolated test environments.
> **Note:**
>
> This guide teaches you how to implement the Autonoma Environment Factory in your application, regardless of language or framework. For a complete working example in Next.js + Prisma, see the [Next.js Implementation](/examples/nextjs/).
## The Big Picture
Before Autonoma runs an E2E test, it needs two things:
1. **Data** — a user account, some test records, whatever the test scenario requires
2. **Authentication** — a way to log in as that user (cookies, headers, or credentials)
After the test finishes, everything gets cleaned up so the next test starts fresh.
Your job is to implement **one endpoint** that handles three actions:
| Action | When it’s called | What you do |
| ------------ | ---------------------- | ------------------------------------------------------------------------- |
| **discover** | When Autonoma connects | Return a list of available scenarios (e.g., “empty”, “standard”, “large”) |
| **up** | Before each test run | Create data, generate auth credentials, return everything |
| **down** | After each test run | Verify the request is legitimate, then delete the data you created |
That’s it. One endpoint, three actions, and Autonoma handles the rest.
### Why “scenarios”?
Different tests need different data. A test for “empty state messaging” needs an org with zero data. A test for “pagination in the runs table” needs hundreds of runs. Instead of one giant seed, you define named **scenarios** — each one creates exactly the data its tests need.
## How the Protocol Works
All communication is a single **POST** request with a JSON body. The `action` field tells your endpoint what to do.
### Discover
Autonoma asks: “What scenarios do you support?”
**Request fields:**
| Field | Type | Description |
| -------- | ------------ | ------------------------------ |
| `action` | `"discover"` | Always the string `"discover"` |
**Response fields:**
| Field | Type | Description |
| ---------------------------- | ------ | --------------------------------------------------------------------------------- |
| `environments` | array | List of available scenarios |
| `environments[].name` | string | Scenario identifier (e.g., `"empty"`, `"standard"`, `"large"`) |
| `environments[].description` | string | Human-readable description. Autonoma’s AI reads this to choose the right scenario |
| `environments[].fingerprint` | string | 16-character hex hash of the scenario’s data structure |
**Example:**
```plaintext
→ POST /your-endpoint
{ "action": "discover" }
← 200 OK
{
"environments": [
{
"name": "empty",
"description": "Brand-new organization with minimal data. Suitable for testing empty states, onboarding flows, and first-time user experiences.",
"fingerprint": "f0e1d2c3b4a59687"
},
{
"name": "standard",
"description": "Full dataset, suitable for core workflows.",
"fingerprint": "a1b2c3d4e5f67890"
},
{
"name": "large",
"description": "High-volume dataset for pagination, filtering, and performance behavior.",
"fingerprint": "45aef220d52320cd"
}
]
}
```
### Up
Autonoma says: “Create the `standard` scenario for test run `run-abc123`.”
**Request fields:**
| Field | Type | Description |
| ------------- | ------ | ---------------------------------------------------------------------------- |
| `action` | `"up"` | Always the string `"up"` |
| `environment` | string | The scenario name (must match one returned by `discover`) |
| `testRunId`   | string | Unique identifier for this test run. Use it to make emails and org names unique |
**Response fields:**
| Field | Type | Description |
| ------------------ | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `auth` | object | Credentials Autonoma uses to act as the test user |
| `auth.cookies` | array | Session cookies to inject. Each has `name`, `value`, `httpOnly`, `sameSite`, `path` |
| `refs` | object | IDs of everything you created. These come back verbatim in `down` |
| `refsToken` | string | A signed (JWT or equivalent) copy of `refs` |
| `metadata` | object | Extra info for Autonoma’s AI agent (email, role, org name, etc.) |
| `expiresInSeconds` | number | Optional. How long (in seconds) before Autonoma considers this instance expired. Defaults to 7200 (2 hours). Override if your scenario data has a shorter or longer natural lifetime. |
**Example:**
```json
// → POST /your-endpoint
{
"action": "up",
"environment": "standard",
"testRunId": "run-abc123"
}
// ← 200 OK
{
"auth": {
"cookies": [
{
"name": "session",
"value": "eyJ...",
"httpOnly": true,
"sameSite": "lax",
"path": "/"
}
]
},
"refs": {
"organizationId": "org_xyz",
"userId": "usr_abc",
"productIds": ["prod_1", "prod_2"]
},
"refsToken": "eyJhbGciOiJIUzI1NiIs...",
"metadata": {
"email": "test-user@example.com",
"scenario": "standard"
},
"expiresInSeconds": 7200
}
```
### Down
Autonoma says: “I’m done with test run `run-abc123`. Here are the refs you gave me — delete everything.”
**Request fields:**
| Field | Type | Description |
| ----------- | -------- | ---------------------------------------- |
| `action` | `"down"` | Always the string `"down"` |
| `testRunId` | string | Same test run ID from the `up` call |
| `refs` | object | The exact `refs` object returned by `up` |
| `refsToken` | string | The exact `refsToken` returned by `up` |
**Response fields:**
| Field | Type | Description |
| ----- | ------- | ---------------------------- |
| `ok` | boolean | `true` if teardown completed |
**Example:**
```json
// → POST /your-endpoint
{
"action": "down",
"testRunId": "run-abc123",
"refs": {
"organizationId": "org_xyz",
"userId": "usr_abc",
"productIds": ["prod_1", "prod_2"]
},
"refsToken": "eyJhbGciOiJIUzI1NiIs..."
}
// ← 200 OK
{ "ok": true }
```
Before deleting anything, you **must** verify the `refsToken` and confirm it matches the `refs` in the request body. This prevents anyone from crafting a fake `down` request to delete arbitrary data.
## Security Model
Three layers of security protect your endpoint, all built on a **single shared secret**.
### The Signing Secret
Your implementation requires one secret: `AUTONOMA_SIGNING_SECRET`. It is used for both HMAC request verification and for signing the teardown token (`refsToken`).
| Secret | Env Variable | Purpose |
| ------------------ | ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Signing secret** | `AUTONOMA_SIGNING_SECRET` | HMAC-SHA256 signature of every request, and signing/verifying the `refsToken` for teardown. You paste this value into the Autonoma dashboard when connecting your app. |
**Generate with `openssl`:**
```bash
openssl rand -hex 32
```
This produces a 64-character hex string (256 bits of entropy).
### Layer 1: Environment Gating
Your endpoint should **not exist in production** unless explicitly enabled. The simplest approach: return 404 when `NODE_ENV=production` (or your framework’s equivalent) unless you’ve set a specific override flag.
This is the first line of defense. Even if someone discovers the URL, it doesn’t respond in production.
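As a sketch, the gate can be a single predicate checked before any other work. The `AUTONOMA_FACTORY_ENABLED` flag name below is illustrative; use whatever override flag your deployment defines:

```typescript
// Environment gate sketch. Flag name AUTONOMA_FACTORY_ENABLED is a
// hypothetical example, not a required convention.
function isFactoryEnabled(env: Record<string, string | undefined>): boolean {
  // Open by default outside production...
  if (env.NODE_ENV !== "production") return true;
  // ...but in production, require an explicit opt-in so the endpoint
  // can never leak by accident.
  return env.AUTONOMA_FACTORY_ENABLED === "true";
}

// In your handler, check this before doing any other work, e.g.:
//   if (!isFactoryEnabled(process.env)) return new Response(null, { status: 404 });
```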
### Layer 2: Request Signing (HMAC-SHA256) - uses `AUTONOMA_SIGNING_SECRET`
Every request from Autonoma includes a signature header:
```plaintext
x-signature: <hmac-sha256-hex-of-raw-body>
```
The signature is an HMAC-SHA256 of the raw request body, using the **signing secret** that both you and Autonoma know. Your endpoint must:
1. Read the raw request body (before JSON parsing)
2. Compute HMAC-SHA256 of that body using `AUTONOMA_SIGNING_SECRET`
3. Compare your result with the `x-signature` header
4. Reject if they don’t match (return 401)
This guarantees every request actually came from Autonoma.
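Under Node, the four steps above can be sketched with `node:crypto`. This is a sketch, not an official SDK; it assumes the header carries a lowercase-hex digest, as produced by `openssl dgst -sha256 -hmac`:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify the x-signature header against the RAW request body string
// (computed before JSON parsing).
function verifySignature(rawBody: string, signature: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected, "utf8");
  const b = Buffer.from(signature, "utf8");
  // timingSafeEqual throws on length mismatch, so check length first.
  if (a.length !== b.length) return false;
  // Constant-time compare avoids leaking the expected digest via timing.
  return timingSafeEqual(a, b);
}
```

On mismatch, return 401 without revealing which byte differed.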
### Layer 3: Signed Refs (for `down` only) - uses `AUTONOMA_SIGNING_SECRET`
When `up` creates data, it signs the `refs` map into a token (`refsToken`) using `AUTONOMA_SIGNING_SECRET`. Autonoma stores this token and sends it back when calling `down`. When `down` receives the token:
1. Verify the token signature and expiry (24h) using `AUTONOMA_SIGNING_SECRET`
2. Decode the refs from inside the token
3. Compare them with the refs in the request body
4. Only proceed if they match exactly
This guarantees that `down` can only delete data that `up` actually created. Autonoma never sees the token internals - it just holds onto the opaque string and passes it back.
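The guide allows "JWT or equivalent"; the following sketch hand-rolls an HMAC-signed payload to show the properties that matter. A production implementation would typically use a JWT library instead:

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Sign the refs map plus a 24h expiry into an opaque token.
function signRefs(refs: Record<string, unknown>, secret: string): string {
  const payload = Buffer.from(
    JSON.stringify({ refs, exp: Date.now() + 24 * 60 * 60 * 1000 })
  ).toString("base64url");
  const sig = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${sig}`;
}

// Verify signature + expiry, then require the decoded refs to match the
// refs sent in the request body exactly.
function verifyRefs(token: string, requestRefs: Record<string, unknown>, secret: string): boolean {
  const [payload, sig] = token.split(".");
  if (!payload || !sig) return false;
  const expected = createHmac("sha256", secret).update(payload).digest("base64url");
  const a = Buffer.from(sig), b = Buffer.from(expected);
  if (a.length !== b.length || !timingSafeEqual(a, b)) return false;
  const { refs, exp } = JSON.parse(Buffer.from(payload, "base64url").toString());
  if (Date.now() > exp) return false; // replayed old token
  // Simplification: JSON.stringify comparison is key-order sensitive;
  // a real implementation should deep-compare.
  return JSON.stringify(refs) === JSON.stringify(requestRefs);
}
```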
### Error Responses
Use consistent error codes so Autonoma can handle failures gracefully:
| Situation | HTTP Status | Error Code |
| ------------------------------------ | ----------- | --------------------- |
| Unknown action | 400 | `UNKNOWN_ACTION` |
| Unknown scenario name | 400 | `UNKNOWN_ENVIRONMENT` |
| `up` fails during creation | 500 | `UP_FAILED` |
| `down` fails during deletion | 500 | `DOWN_FAILED` |
| Invalid, expired, or mismatched refs | 403 | `INVALID_REFS_TOKEN` |
| Missing or invalid HMAC signature | 401 | *(no code needed)* |
Response shape:
```json
{ "error": "Human-readable description", "code": "ERROR_CODE" }
```
## Implementing the Actions
Implementing Discover
This is the simplest action. It returns your list of scenarios with their metadata.
**What to return for each scenario:**
| Field | Type | Description |
| ------------- | ------ | --------------------------------------------------------------------------------- |
| `name` | string | Identifier (e.g., `"standard"`, `"empty"`) |
| `description` | string | Human-readable description. Autonoma’s AI reads this to choose the right scenario |
| `fingerprint` | string | A 16-character hex hash of the scenario’s data structure |
```plaintext
function handleDiscover():
scenarios = getAllRegisteredScenarios()
return {
environments: scenarios.map(s => ({
name: s.name,
description: s.description,
fingerprint: s.computeFingerprint()
}))
}
```
Implementing Up
This is where the real work happens. `up` receives a scenario name and a test run ID, and creates all the data.
**Step by step:**
1. **Find the scenario** by name. Return 400 `UNKNOWN_ENVIRONMENT` if not found.
2. **Call the scenario’s `up` function**, which creates all database records and collects their IDs into a `refs` map.
3. **Sign the refs** into a JWT token (the `refsToken`).
4. **Create auth credentials** — whatever your app needs to log in as the test user.
5. **Return everything**: auth, refs, refsToken, metadata.
**Important design decisions:**
* **Every `up` creates a NEW isolated dataset.** Use the `testRunId` to make names/emails unique (e.g., `test-user-run-abc123@example.com`). This allows parallel test runs without collisions.
* **Collect ALL created IDs into `refs`.** You’ll need them for teardown.
* **Handle creation order carefully.** Parent records must be created before children.
* **Return `expiresInSeconds` if your data has a natural TTL.** Autonoma defaults to 2 hours. If your scenario creates time-sensitive records (e.g., OTP codes, pending invitations), override this to match their lifetime.
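The overall shape of an `up` handler might look like this. All names here (the scenario registry, the credential values, the ID formats) are illustrative placeholders, with your real database calls standing where the string construction is:

```typescript
// Sketch of an `up` handler. Hypothetical in-memory scenario registry;
// real scenarios would create database records here.
type Refs = Record<string, unknown>;

function handleUp(environment: string, testRunId: string) {
  const scenarios: Record<string, (runId: string) => Refs> = {
    standard: (runId) => {
      // Create parents before children; use testRunId for uniqueness
      // so parallel runs never collide.
      const orgId = `org-${runId}`;
      const userId = `usr-${runId}`;
      const email = `test-user-${runId}@example.com`;
      return { organizationId: orgId, userId, email };
    },
  };
  const scenario = scenarios[environment];
  if (!scenario) {
    return { status: 400, body: { error: `Unknown environment: ${environment}`, code: "UNKNOWN_ENVIRONMENT" } };
  }
  const refs = scenario(testRunId); // collect ALL created IDs
  return {
    status: 200,
    body: {
      auth: { credentials: { email: refs.email, password: "test-password" } },
      refs,
      refsToken: "<sign refs here>", // placeholder: sign refs per the security model
      metadata: { scenario: environment },
      expiresInSeconds: 7200,
    },
  };
}
```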
Implementing Down
`down` receives the refs map and the signed token, verifies them, and deletes everything.
**Step by step:**
1. **Verify the `refsToken`** — decode the JWT, check it hasn’t expired (24h max), extract the refs.
2. **Compare decoded refs with request refs** — they must match exactly. If someone sends a valid token but swaps the refs in the request body, reject with 403.
3. **Determine which scenario** was used (from the refs structure, or store the scenario name in refs).
4. **Call the scenario’s `down` function**, which deletes all records.
5. **Return** `{ ok: true }`.
> **Why verify before deleting?**
>
> Without verification, anyone who can reach your endpoint could send:
>
> ```json
> { "action": "down", "refs": { "orgId": "PRODUCTION_ORG_ID" } }
> ```
>
> …and delete your production data. The signed token makes this impossible.
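The steps above can be sketched as a handler that takes your token verifier and scenario teardown as inputs. Both helpers (`verifyRefsToken`, `deleteAll`) are hypothetical stand-ins for your own implementations:

```typescript
// Sketch of a `down` handler. verifyRefsToken returns the decoded refs
// on success (signature valid, not expired) or null on any failure.
function handleDown(
  requestRefs: Record<string, unknown>,
  refsToken: string,
  verifyRefsToken: (token: string) => Record<string, unknown> | null,
  deleteAll: (refs: Record<string, unknown>) => void,
) {
  const decoded = verifyRefsToken(refsToken); // checks signature + 24h expiry
  if (decoded === null) {
    return { status: 403, body: { error: "Invalid refs token", code: "INVALID_REFS_TOKEN" } };
  }
  // Exact-match check: a valid token with swapped request refs is rejected.
  // (Simplification: JSON.stringify comparison is key-order sensitive.)
  if (JSON.stringify(decoded) !== JSON.stringify(requestRefs)) {
    return { status: 403, body: { error: "Refs mismatch", code: "INVALID_REFS_TOKEN" } };
  }
  deleteAll(decoded); // delete only what `up` created, children before parents
  return { status: 200, body: { ok: true } };
}
```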
## Scenario Fingerprinting
Each scenario has a **fingerprint** — a hash of its structural definition. It serves two purposes: **drift detection** and **validation**.
### The problem it solves
You add a new field to your `users` table, but forget to update the scenario’s `up` function to populate it. Now your tests are running against incomplete data. The fingerprint catches this.
### How Autonoma uses it
Autonoma stores the fingerprint from your last successful run. Before each new test run, it calls `discover` and compares fingerprints. If they differ, Autonoma knows the scenario data has changed and can re-analyze accordingly.
### How to build it
1. Define a **descriptor** object that mirrors the structure of what your `up` creates
2. JSON-serialize it and hash with SHA-256
3. Take the first 16 hex characters
```plaintext
descriptor = {
users: 4,
products: { count: 10, statuses: { active: 8, draft: 2 } },
orders: 5
}
fingerprint = sha256(JSON.stringify(descriptor)).substring(0, 16)
```
**The key property:** The fingerprint is computed from the same constants your `up` function reads. When you add a product, the descriptor’s count changes, and the fingerprint changes automatically.
> **Tip:**
>
> The fingerprint must be deterministic. Don’t include timestamps, random values, or anything that changes between requests. Call `discover` twice — the fingerprints must be identical.
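The three steps above are a few lines in Node; the descriptor contents are illustrative:

```typescript
import { createHash } from "node:crypto";

// Hash a deterministic descriptor of the scenario's structure and keep
// the first 16 hex characters. The descriptor must contain no
// timestamps or random values, or the fingerprint will drift.
function computeFingerprint(descriptor: unknown): string {
  const json = JSON.stringify(descriptor);
  return createHash("sha256").update(json).digest("hex").slice(0, 16);
}

// Example descriptor mirroring what a hypothetical `up` creates:
const fingerprint = computeFingerprint({
  users: 4,
  products: { count: 10, statuses: { active: 8, draft: 2 } },
  orders: 5,
});
```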
## Signed Refs — How Teardown Stays Safe
This is the most important security concept. Here’s the full flow:
```plaintext
┌── up ──────────────────────────────────────────────────────────────────┐
│                                                                        │
│   1. Create org, users, products...                                    │
│   2. Collect IDs: refs = { orgId, userIds, ... }                       │
│   3. Sign: refsToken = JWT.sign({ refs }, AUTONOMA_SIGNING_SECRET)     │
│   4. Return both refs AND refsToken                                    │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
                │
                │  (Autonoma stores refsToken, runs tests)
                │
                ▼
┌── down ────────────────────────────────────────────────────────────────┐
│                                                                        │
│   1. Receive refs AND refsToken                                        │
│   2. Verify: decoded = JWT.verify(refsToken, AUTONOMA_SIGNING_SECRET)  │
│   3. Compare: decoded.refs === request.refs?                           │
│        NO  → 403 INVALID_REFS_TOKEN                                    │
│        YES → proceed to delete                                         │
│   4. Delete everything in refs                                         │
│                                                                        │
└────────────────────────────────────────────────────────────────────────┘
```
Autonoma treats `refsToken` as an opaque string - stores it after `up` and sends it back in `down`.
**What this prevents:**
| Attack | Why it fails |
| ------------------------------------------------- | --------------------------------- |
| Attacker sends fake refs with made-up IDs | No valid token → rejected |
| Attacker sends a valid token but changes the refs | Refs don’t match token → rejected |
| Attacker replays a token from a week ago | Token expired (24h) → rejected |
No server-side state needed. The token itself is the proof.
## Authentication Strategies
The `auth` object in your `up` response tells Autonoma how to log in as the test user.
Option A: Session Cookies (most common)
If your app uses cookie-based sessions, generate a session during `up` and return the cookies:
```json
{
"auth": {
"cookies": [
{
"name": "session-token",
"value": "abc123",
"httpOnly": true,
"sameSite": "lax",
"path": "/"
}
]
}
}
```
Works with: NextAuth, custom JWT cookies, session stores, etc.
Option B: Bearer Token / Headers
If your app uses API tokens or bearer auth:
```json
{
"auth": {
"headers": {
"Authorization": "Bearer eyJ..."
}
}
}
```
Works with: Auth0, custom API keys, OAuth tokens, etc.
Option C: Username + Password
If your app has a login page and you want Autonoma to log in through it:
```json
{
"auth": {
"credentials": {
"email": "test-user@example.com",
"password": "TestP@ssw0rd123!"
}
}
}
```
Options A and B can be used together. For web applications, cookies or headers are preferred because Autonoma can inject them directly without navigating a login page.
> **Mobile apps: use credentials only:**
>
> For **iOS and Android** applications, cookies and bearer tokens/headers (Options A and B) are **not supported**. Autonoma cannot inject cookies or HTTP headers into native mobile apps - there is no browser context to set them in.
>
> Instead, use **Option C (Username + Password)** and return credentials that Autonoma’s agent can use to log in through your app’s login screen. The agent will navigate the login flow just like a real user would.
## Writing Your Teardown Function
Teardown is where most bugs hide. Key rules:
Rule 1: Delete in reverse creation order
If `up` creates: org → users → products → orders, then `down` must delete: orders → products → users → org. Foreign key constraints enforce this.
Rule 2: Don't rely on ORM cascade behavior
ORMs have inconsistent cascade defaults. Explicit deletion in reverse order is always safer.
Rule 3: Handle circular foreign keys
If your schema has tables that reference each other, you can’t delete either table first.
**Solution:** Use raw SQL in a transaction to temporarily drop the FK constraint:
```sql
BEGIN;
ALTER TABLE components DROP CONSTRAINT components_default_version_id_fkey;
DELETE FROM component_versions WHERE org_id = $1;
DELETE FROM components WHERE org_id = $1;
ALTER TABLE components ADD CONSTRAINT components_default_version_id_fkey
FOREIGN KEY (default_version_id) REFERENCES component_versions(id);
COMMIT;
```
Rule 4: Handle nested/self-referential records
If a table references itself (e.g., folders with parent folders), delete children before parents:
```sql
DELETE FROM folders WHERE org_id = $1 AND parent_id IS NOT NULL;
DELETE FROM folders WHERE org_id = $1;
```
## Testing Your Implementation
Write integration tests that cover the full lifecycle.
Happy Path Tests
| Test | What it verifies |
| ---------------------------- | ------------------------------------------------------------ |
| `discover` returns scenarios | Correct names, descriptions, 16-char fingerprints |
| Fingerprints are stable | Calling `discover` twice returns identical fingerprints |
| `up` creates data | Query your database after `up` — verify entity counts |
| `down` deletes data | Query your database after `down` — verify everything is gone |
| Full round-trip | `up` → verify data exists → `down` → verify data is gone |
Security Tests
| Test | What it verifies |
| ----------------- | -------------------------------------------------------- |
| Tampered token | Send a random string as `refsToken` → expect 403 |
| Mismatched refs | Send a valid token but change the refs body → expect 403 |
| Expired token | Create a token with past expiry → expect 403 |
| Missing signature | Send a request without `x-signature` → expect 401 |
| Invalid signature | Send a request with a wrong signature → expect 401 |
Error Handling Tests
| Test | What it verifies |
| ------------------- | ----------------------------------------------------------- |
| Unknown action | `{ action: "explode" }` → expect 400 |
| Unknown environment | `{ action: "up", environment: "nonexistent" }` → expect 400 |
| Malformed body | Send non-JSON → expect 400 |
## Quick Test Script
The fastest way to verify your implementation is the one-liner test script. It runs the full `discover` -> `up` -> `down` lifecycle against your endpoint and validates every response.
```bash
curl -fsSL https://docs.agent.autonoma.app/test-scenario.sh | bash -s -- \
--url https://your-app.com/api/autonoma \
--secret your-signing-secret \
--scenario standard
```
The `--secret` flag takes your `AUTONOMA_SIGNING_SECRET` value (the HMAC signing secret shared with Autonoma).
**What it does:**
1. Calls `discover` and verifies your scenario is listed with the correct response shape
2. Calls `up` for the specified scenario and validates it returns `auth`, `refs`, and `refsToken`
3. Calls `down` with the refs from `up` and verifies teardown succeeds
**Options:**
| Flag | Description |
| ------------------- | -------------------------------------------------------------------- |
| `--url URL` | **(required)** Your Environment Factory endpoint URL |
| `--secret SECRET` | **(required)** Your `AUTONOMA_SIGNING_SECRET` value |
| `--scenario NAME` | **(required)** Scenario to test (e.g., `standard`, `empty`, `large`) |
| `--keep-up` | Skip teardown - leaves data in place so you can inspect it |
| `--skip-discover` | Go straight to `up`/`down` without calling `discover` first |
| `--test-run-id ID` | Use a fixed test run ID instead of generating one |
| `--timeout SECONDS` | Per-request timeout (default: 30) |
Run with `--help` for full usage:
```bash
curl -fsSL https://docs.agent.autonoma.app/test-scenario.sh | bash -s -- --help
```
**Requirements:** `curl`, `openssl`, `python3` (all pre-installed on macOS and most Linux distributions).
## Manual Testing with curl
curl commands for discover, up, and down
Set your signing secret first:
```bash
export SECRET="your-signing-secret" # AUTONOMA_SIGNING_SECRET value
export BASE_URL="https://your-app.example.com"
```
**Discover:**
```bash
BODY='{"action":"discover"}'
SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | sed 's/.*= //')
curl -s -X POST "$BASE_URL/api/autonoma" \
-H "Content-Type: application/json" \
-H "x-signature: $SIG" \
-d "$BODY" | jq .
```
**Up:**
```bash
BODY='{"action":"up","environment":"standard","testRunId":"manual-test-001"}'
SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | sed 's/.*= //')
UP=$(curl -s -X POST "$BASE_URL/api/autonoma" \
-H "Content-Type: application/json" \
-H "x-signature: $SIG" \
-d "$BODY")
echo "$UP" | jq .
# Save for down
REFS=$(echo "$UP" | jq -c '.refs')
TOKEN=$(echo "$UP" | jq -r '.refsToken')
```
**Down:**
```bash
BODY=$(jq -n -c --argjson refs "$REFS" --arg token "$TOKEN" \
'{action:"down", testRunId:"manual-test-001", refs:$refs, refsToken:$token}')
SIG=$(echo -n "$BODY" | openssl dgst -sha256 -hmac "$SECRET" | sed 's/.*= //')
curl -s -X POST "$BASE_URL/api/autonoma" \
-H "Content-Type: application/json" \
-H "x-signature: $SIG" \
-d "$BODY" | jq .
```
> **Note:**
>
> On some systems, `openssl dgst` outputs a prefix like `SHA2-256(stdin)= `. The `sed 's/.*= //'` strips everything before the actual digest. Do **not** use `awk '{print $2}'` — it doesn’t work reliably across all OpenSSL versions.
## Deployment Checklist
Before sharing your endpoint URL with Autonoma:
* [ ] **Production guard works** — endpoint returns 404 in production (unless explicitly overridden)
* [ ] **Signing secret configured** — `AUTONOMA_SIGNING_SECRET` is set in your environment and matches the value in the Autonoma dashboard
* [ ] **`discover` returns correct data** — scenario names, descriptions, and fingerprints
* [ ] **`up` creates all entities** — spot-check counts in your database
* [ ] **Auth works** — use the returned cookies/headers to navigate your app
* [ ] **`down` deletes all entities** — no orphaned records left behind
* [ ] **`down` rejects bad tokens** — tampered, expired, and mismatched refs return 403
* [ ] **Response times acceptable** — `up` < 30s, `down` < 10s
* [ ] **Integration tests pass**
* [ ] **Test script passes** — `curl -fsSL https://docs.agent.autonoma.app/test-scenario.sh | bash -s -- --url $URL --secret $AUTONOMA_SIGNING_SECRET --scenario standard` succeeds for all scenarios
## Troubleshooting
| Problem | Cause | Fix |
| ------------------------------------ | ------------------------------------ | -------------------------------------------------------------------------------------- |
| `up` fails with FK violation | Creating child before parent | Check your creation order — parents first |
| `down` fails with FK violation | Deleting parent before child | Check your deletion order — children first |
| `down` fails on circular FK | Two tables reference each other | Drop the constraint temporarily in a transaction |
| Signature verification fails locally | Secret not set or wrong value | Check `AUTONOMA_SIGNING_SECRET` matches between your server and the Autonoma dashboard |
| Fingerprint changes between calls | Non-deterministic data in descriptor | Remove timestamps, random values from descriptor |
| `openssl dgst` output looks wrong | Different OpenSSL versions | Use `sed 's/.*= //'` instead of `awk '{print $2}'` |
| Token expired immediately | Clock skew or wrong expiry | Check server time, ensure JWT expiry is `24h` not `-24h` |
| Parallel tests collide | Same email/name used across runs | Use `testRunId` in all unique fields |
# Next.js Implementation
> Complete working example of the Autonoma Environment Factory with Next.js + Prisma + PostgreSQL.
> **Prerequisites:**
>
> Read the [Environment Factory Guide](/guides/environment-factory/) first for concepts. This doc is the code.
>
> **Stack:** Next.js 16 (App Router), Prisma ORM, PostgreSQL, JWT-based bypass auth, Vitest for integration tests.
## File Structure
Each file has one clear responsibility:
```plaintext
src/
├── app/api/autonoma/
│ └── route.ts ← The POST endpoint (action routing, auth, security)
│
└── lib/private/autonoma/
├── types.ts ← Zod schemas for request validation + response types
├── ScenarioBuilder.ts ← Abstract base class (fingerprinting, metadata)
├── RefsTokenFactory.ts ← Signs and verifies refs with JWT
├── ScenarioRegistry.ts ← Imports and registers all scenarios
├── teardownOrganization.ts ← Shared teardown: deletes all org data in FK order
│
└── scenarios/
├── EmptyScenario.ts ← Minimal scenario: org + user only
└── StandardScenario.ts ← Full scenario: apps, tests, runs, tags, etc.
test/integration/
└── autonoma-scenario.test.ts ← Integration tests for the full lifecycle
```
## Step 1: Define the Types
This file defines the request/response contract. We use Zod to validate incoming requests as a discriminated union on `action`.
**File: `src/lib/private/autonoma/types.ts`**
```typescript
import { z } from "zod"
export const DiscoverBodySchema = z.object({ action: z.literal("discover") })
export const UpBodySchema = z.object({
action: z.literal("up"),
environment: z.string(),
testRunId: z.string(),
})
export const DownBodySchema = z.object({
action: z.literal("down"),
testRunId: z.string(),
refs: z.record(z.string(), z.unknown()),
refsToken: z.string(),
})
export const AutonomaBodySchema = z.discriminatedUnion("action", [
DiscoverBodySchema,
UpBodySchema,
DownBodySchema,
])
export type AutonomaBody = z.infer<typeof AutonomaBodySchema>
export type ErrorCode =
| "UNKNOWN_ACTION"
| "UNKNOWN_ENVIRONMENT"
| "UP_FAILED"
| "DOWN_FAILED"
| "INVALID_REFS_TOKEN"
export type ScenarioRefs = Record<string, unknown>
export type ScenarioUpResult = {
organizationId: string
userId: string
userEmail: string
refs: ScenarioRefs
}
export type UpResponse = {
auth: {
cookies: Array<{
name: string
value: string
httpOnly: boolean
sameSite: "lax"
path: string
}>
}
refs: ScenarioRefs
refsToken: string
metadata: Record<string, unknown>
}
export type DownResponse = {
ok: boolean
}
export type DiscoverEnvironment = {
name: string
description: string
fingerprint: string
}
export type DiscoverResponse = {
environments: DiscoverEnvironment[]
}
export type ErrorResponse = {
error: string
code: ErrorCode
}
```
The discriminated union means `AutonomaBodySchema.parse(body)` returns a narrowed type. When you `switch (body.action)`, TypeScript knows exactly which fields exist on each branch.
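As a quick illustration, here is a plain-TS mirror of the parsed union (independent of Zod itself) showing how the `switch` narrows each branch to its own fields:

```typescript
// Plain-TS mirror of the parsed union. On each case branch, TypeScript
// narrows `body` so only that variant's fields are accessible.
type Body =
  | { action: "discover" }
  | { action: "up"; environment: string; testRunId: string }
  | { action: "down"; testRunId: string; refs: Record<string, unknown>; refsToken: string }

function describe(body: Body): string {
  switch (body.action) {
    case "discover":
      return "discover"
    case "up":
      // body.environment exists only on this branch
      return `up:${body.environment}`
    case "down":
      // body.refsToken exists only on this branch
      return `down:${body.refsToken}`
  }
}

console.log(describe({ action: "up", environment: "standard", testRunId: "t-1" }))
```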
## Step 2: Build the Scenario Base Class
Every scenario extends this abstract class. It provides a contract (`up`/`down`) and automatic fingerprinting.
**File: `src/lib/private/autonoma/ScenarioBuilder.ts`**
```typescript
import crypto from "crypto"
import type { ScenarioRefs, ScenarioUpResult } from "./types"
export abstract class ScenarioBuilder {
abstract readonly name: string
abstract readonly description: string
protected abstract readonly descriptor: Record<string, unknown>
abstract up(testRunId: string): Promise<ScenarioUpResult>
abstract down(refs: ScenarioRefs): Promise<void>
fingerprint(): string {
const json = JSON.stringify(this.descriptor)
const hash = crypto.createHash("sha256").update(json).digest("hex")
return hash.substring(0, 16)
}
meta() {
return {
name: this.name,
description: this.description,
fingerprint: this.fingerprint(),
}
}
}
```
Each scenario defines a `descriptor` — a plain object that mirrors the shape of the data it creates. The fingerprint is the SHA-256 hash of `JSON.stringify(descriptor)`, truncated to 16 hex characters.
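A standalone sketch of that logic, using only Node's `crypto`, shows the two properties that matter: the fingerprint is stable for identical descriptors and changes as soon as any value does. Note that `JSON.stringify` is sensitive to property order, so write descriptor literals in a fixed key order:

```typescript
import crypto from "node:crypto"

// Standalone sketch of the fingerprint logic: deterministic for the
// same descriptor, different as soon as any value changes.
function fingerprint(descriptor: Record<string, unknown>): string {
  const json = JSON.stringify(descriptor)
  return crypto.createHash("sha256").update(json).digest("hex").substring(0, 16)
}

const a = fingerprint({ users: 1, tests: 12 })
const b = fingerprint({ users: 1, tests: 12 })
const c = fingerprint({ users: 1, tests: 13 })

console.log(a === b) // same input, same fingerprint
console.log(a === c) // changed count, different fingerprint
```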
## Step 3: Build the Refs Token Factory
This module handles JWT signing/verification for refs. Three functions, no state.
**File: `src/lib/private/autonoma/RefsTokenFactory.ts`**
```typescript
import { sign, verify, TokenExpiredError } from "jsonwebtoken"
import type { ScenarioRefs } from "./types"
const JWT_ALGORITHM = "HS256" as const
const REFS_TOKEN_EXPIRY = "24h"
function getSecret(): string {
const secret = process.env.AUTONOMA_JWT_SECRET
if (secret == null) throw new Error("AUTONOMA_JWT_SECRET is not configured")
return secret
}
export function signRefs(refs: ScenarioRefs): string {
return sign({ refs }, getSecret(), {
algorithm: JWT_ALGORITHM,
expiresIn: REFS_TOKEN_EXPIRY,
})
}
export function verifyRefs(
token: string,
): { refs: ScenarioRefs } | { error: string } {
try {
const decoded = verify(token, getSecret(), {
algorithms: [JWT_ALGORITHM],
}) as { refs: ScenarioRefs }
return { refs: decoded.refs }
} catch (error) {
if (error instanceof TokenExpiredError) {
return { error: "Refs token expired (older than 24h)" }
}
return { error: "Invalid refs token" }
}
}
export function refsMatch(
tokenRefs: ScenarioRefs,
requestRefs: ScenarioRefs,
): boolean {
return JSON.stringify(tokenRefs) === JSON.stringify(requestRefs)
}
```
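One caveat: `refsMatch` compares `JSON.stringify` output, which is key-order sensitive, so `{ a: 1, b: 2 }` and `{ b: 2, a: 1 }` would not match. If the refs object might come back with keys in a different order, a sorted canonicalization makes the comparison order-independent. This is a sketch for flat refs objects, not part of the module above:

```typescript
// Canonicalize a flat refs object by sorting its keys before
// serializing, so key order no longer affects equality.
function canonical(refs: Record<string, unknown>): string {
  const sorted = Object.fromEntries(
    Object.entries(refs).sort(([x], [y]) => x.localeCompare(y)),
  )
  return JSON.stringify(sorted)
}

function refsMatchCanonical(
  left: Record<string, unknown>,
  right: Record<string, unknown>,
): boolean {
  return canonical(left) === canonical(right)
}

console.log(refsMatchCanonical({ orgId: "o1", userId: "u1" }, { userId: "u1", orgId: "o1" })) // true
```

Nested objects would need recursive sorting; the refs in this guide are flat ID maps, so the simple version suffices.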
## Step 4: Write Your First Scenario (Empty)
The simplest scenario — creates an org, a user, and nothing else. Used for testing empty states, onboarding flows, and first-time user experiences.
**File: `src/lib/private/autonoma/scenarios/EmptyScenario.ts`**
```typescript
import { db } from "@repo/database"
import { ScenarioBuilder } from "../ScenarioBuilder"
import { teardownOrganization } from "../teardownOrganization"
import type { ScenarioRefs, ScenarioUpResult } from "../types"
export class EmptyScenario extends ScenarioBuilder {
readonly name = "empty"
readonly description =
"An organization with no data. Used for testing empty states, " +
"first-time user experience, and onboarding flows."
protected readonly descriptor = {
org: { hasQuota: true },
users: 1,
applications: 0,
tests: 0,
folders: 1,
tags: 0,
runs: 0,
}
async up(testRunId: string): Promise<ScenarioUpResult> {
// 1. Create the organization
const org = await db.organization.create({
data: {
name: `Autonoma QA Empty [${testRunId}]`,
orgConfiguration: {
regenOn: true,
smartClick: true,
architecture: { web: true, android: false, ios: false },
},
},
})
// 2. Create the quota
await db.organizationQuota.create({
data: {
organizationId: org.id,
active: true,
unlimited: true,
runsLeft: 0,
purchasedRuns: 0,
},
})
// 3. Create the root folder
const rootFolder = await db.folder.create({
data: {
name: "Root",
organizationID: org.id,
},
})
// 4. Create the test user
const user = await db.user.create({
data: {
name: "QA Empty",
lastName: "Admin",
email: `qa-empty-${testRunId}@autonoma.dev`,
organizationID: org.id,
isOnboarded: true,
},
})
return {
organizationId: org.id,
userId: user.id,
userEmail: user.email,
refs: {
organizationId: org.id,
userId: user.id,
folderId: rootFolder.id,
},
}
}
async down(refs: ScenarioRefs): Promise<void> {
await teardownOrganization(refs.organizationId as string)
}
}
```
Notice:
* The `testRunId` is baked into the org name and user email for uniqueness.
* The `refs` contain exactly the IDs of what was created.
* The `down` function delegates to `teardownOrganization` — a shared function that handles FK-order deletion.
## Step 5: Write the Full Scenario (Standard)
The standard scenario creates a representative dataset: 3 applications, 12 tests, 58 runs, folders, tags, components, variables, and more.
**The key pattern:** All data is defined as module-level constants. The `up` function iterates these constants to create records. The `descriptor` summarizes the same constants. They can’t drift from each other.
**File: `src/lib/private/autonoma/scenarios/StandardScenario.ts`**
The data constants define the scenario (truncated for brevity — see the full source for all constants):
```typescript
const APPLICATIONS = [
{ name: "My Web App", type: "web" as const, versions: ["v1.0", "v1.1", "v2.0"] },
{ name: "Android Shopping", type: "android" as const, versions: ["v1.0", "v1.1"] },
{ name: "iOS Banking", type: "ios" as const, versions: ["v1.0"] },
] as const
const FOLDERS = [
{ name: "Smoke Tests", parentIdx: null },
{ name: "Regression", parentIdx: null },
{ name: "Deep Regression", parentIdx: 1 }, // subfolder of Regression
{ name: "Mobile Tests", parentIdx: null },
] as const
const TAGS = [
{ name: "critical", description: "Critical path tests", color: "#ef4444" },
{ name: "web", description: "Web application tests", color: "#3b82f6" },
{ name: "mobile", description: "Mobile application tests", color: "#22c55e" },
{ name: "ios", description: "iOS-specific tests", color: "#a855f7" },
] as const
```
The class itself — descriptor and `up`/`down` methods:
```typescript
export class StandardScenario extends ScenarioBuilder {
readonly name = "standard"
readonly description =
"The default scenario for most tests. Contains 3 applications, " +
"12 tests, 58 runs, 4 folders, 4 tags, 2 components, and more."
protected readonly descriptor = {
org: { hasQuota: true },
users: 1,
applications: APPLICATIONS.map((a) => ({
type: a.type,
versionCount: a.versions.length,
})),
folders: FOLDERS.length,
tags: TAGS.map((t) => t.name),
tests: {
count: TESTS.length,
byType: {
final: TESTS.filter((t) => t.type === "final").length,
draft: TESTS.filter((t) => t.type === "draft").length,
},
},
runs: {
count: RUNS.length,
byStatus: countBy(RUNS, (r) => r.status),
bySources: countBy(RUNS, (r) => r.source),
},
variables: VARIABLES.length,
apiKeys: 1,
schedules: 1,
webhooks: 1,
scripts: 1,
}
async up(testRunId: string): Promise<ScenarioUpResult> {
const org = await createOrganization(testRunId)
const user = await createUser(testRunId, org.id)
const rootFolder = await createRootFolder(org.id, user.id)
const folderIds = await createFolders(org.id, rootFolder.id)
const { applicationIds, versionMap } = await createApplications(
testRunId, org.id,
)
const tagIds = await createTags(org.id)
// ... create tests, components, runs, variables, etc.
return {
organizationId: org.id,
userId: user.id,
userEmail: user.email,
refs: {
organizationId: org.id, userId: user.id,
rootFolderId: rootFolder.id, folderIds,
applicationIds, versionMap, tagIds,
// ... all other IDs
},
}
}
async down(refs: ScenarioRefs): Promise<void> {
await teardownOrganization(refs.organizationId as string)
}
}
```
The descriptor reads from the same constants that `up` iterates. When you add a test, `TESTS.length` changes, the descriptor changes, the fingerprint changes, and Autonoma knows to re-analyze.
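The `countBy` helper used in the descriptor is not shown in the truncated source; a minimal version consistent with how it is called above might look like:

```typescript
// Minimal countBy sketch: groups items by a key function and counts
// each group, matching the shape used in the descriptor above.
function countBy<T>(
  items: readonly T[],
  keyFn: (item: T) => string,
): Record<string, number> {
  const counts: Record<string, number> = {}
  for (const item of items) {
    const key = keyFn(item)
    counts[key] = (counts[key] ?? 0) + 1
  }
  return counts
}

const sampleRuns = [
  { status: "passed" },
  { status: "passed" },
  { status: "failed" },
]
console.log(countBy(sampleRuns, (r) => r.status)) // { passed: 2, failed: 1 }
```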
**Handling circular foreign keys:**
```typescript
async function createComponents(organizationId, userId, applicationIds, folderIds, versionMap) {
for (const comp of COMPONENTS) {
const componentId = createId()
const versionId = createId()
await db.$transaction(async (tx) => {
await tx.$executeRaw`ALTER TABLE "component" DISABLE TRIGGER ALL`
await tx.$executeRaw`ALTER TABLE "component_version" DISABLE TRIGGER ALL`
await tx.component.create({
data: {
id: componentId,
name: comp.name,
defaultVersionId: versionId,
// ...
},
})
await tx.componentVersion.create({
data: {
id: versionId,
componentId,
// ...
},
})
await tx.$executeRaw`ALTER TABLE "component" ENABLE TRIGGER ALL`
await tx.$executeRaw`ALTER TABLE "component_version" ENABLE TRIGGER ALL`
})
}
}
```
## Step 6: Write the Teardown Function
Deletes **all** records for an organization in reverse FK order. Both scenarios share it.
**File: `src/lib/private/autonoma/teardownOrganization.ts`**
```typescript
import { db } from "@repo/database"
export async function teardownOrganization(organizationId: string): Promise<void> {
// 1. Webhook notification rules
await db.webhookNotificationRule.deleteMany({
where: { webhook: { organizationID: organizationId } },
})
// 2. Webhooks
await db.webhook.deleteMany({ where: { organizationID: organizationId } })
// 3. Run schedules
await db.runSchedule.deleteMany({
where: {
OR: [
{ folder: { organizationID: organizationId } },
{ testGroup: { organizationID: organizationId } },
{ tag: { organizationID: organizationId } },
],
},
})
// 4. Run metadata
await db.runMetadata.deleteMany({
where: { run: { organizationID: organizationId } },
})
// 5-6. Steps, then Runs
await db.step.deleteMany({
where: { run: { organizationID: organizationId } },
})
await db.run.deleteMany({ where: { organizationID: organizationId } })
// 7. Run groups (orphaned)
await db.runGroup.deleteMany({ where: { runs: { none: {} } } })
// 8-10. Tag assignments, component assignments, test steps
await db.tagInTestGroup.deleteMany({ where: { organizationID: organizationId } })
await db.componentVersionAssignment.deleteMany({
where: { component: { organizationId } },
})
await db.testStep.deleteMany({
where: { stepSequence: { organizationId } },
})
// 11. Components + ComponentVersions (CIRCULAR FK — drop constraint)
await db.$transaction([
db.$executeRaw`ALTER TABLE "component" DROP CONSTRAINT IF EXISTS "component_default_version_id_fkey"`,
db.$executeRaw`DELETE FROM "component_version" WHERE "organization_id" = ${organizationId}`,
db.$executeRaw`DELETE FROM "component" WHERE "organization_id" = ${organizationId}`,
db.$executeRaw`ALTER TABLE "component" ADD CONSTRAINT "component_default_version_id_fkey" FOREIGN KEY ("default_version_id") REFERENCES "component_version"("id") ON DELETE RESTRICT ON UPDATE CASCADE`,
])
// 12-20. Step sequences, tests, test groups, tags, app versions, apps, variables, scripts, API keys
await db.stepSequence.deleteMany({ where: { organizationId } })
await db.test.deleteMany({ where: { organizationID: organizationId } })
await db.testGroup.deleteMany({ where: { organizationID: organizationId } })
await db.tag.deleteMany({ where: { organizationID: organizationId } })
await db.applicationVersion.deleteMany({ where: { organizationID: organizationId } })
await db.application.deleteMany({ where: { organizationId } })
await db.variable.deleteMany({ where: { organizationID: organizationId } })
await db.script.deleteMany({ where: { organizationID: organizationId } })
await db.apiKey.deleteMany({ where: { organizationId } })
// 21. Folders — children first
await db.folder.deleteMany({
where: { organizationID: organizationId, parentID: { not: null } },
})
await db.folder.deleteMany({ where: { organizationID: organizationId } })
// 22-24. Users, quota, organization
await db.user.deleteMany({ where: { organizationID: organizationId } })
await db.organizationQuota.deleteMany({ where: { organizationId } })
await db.organization.delete({ where: { id: organizationId } })
}
```
## Step 7: Register Scenarios
**File: `src/lib/private/autonoma/ScenarioRegistry.ts`**
```typescript
import type { ScenarioBuilder } from "./ScenarioBuilder"
import { EmptyScenario } from "./scenarios/EmptyScenario"
import { StandardScenario } from "./scenarios/StandardScenario"
const scenarios: ScenarioBuilder[] = [
new StandardScenario(),
new EmptyScenario(),
]
export function findScenario(name: string): ScenarioBuilder | undefined {
return scenarios.find((s) => s.name === name)
}
export function getAllScenarios(): ScenarioBuilder[] {
return scenarios
}
```
## Step 8: Build the Route Handler
The endpoint Autonoma calls. Handles security, parsing, action routing, and auth cookie generation.
**File: `src/app/api/autonoma/route.ts`**
```typescript
import { NextRequest, NextResponse } from "next/server"
import { sign } from "jsonwebtoken"
import crypto from "crypto"
import { AutonomaBodySchema, type ErrorCode } from "@/lib/private/autonoma/types"
import { findScenario, getAllScenarios } from "@/lib/private/autonoma/ScenarioRegistry"
import { signRefs, verifyRefs, refsMatch } from "@/lib/private/autonoma/RefsTokenFactory"
const JWT_ALGORITHM = "HS256" as const
export async function POST(request: NextRequest) {
// Layer 1: Environment gating
if (
process.env.NODE_ENV === "production" &&
process.env.AUTONOMA_FACTORY_ENABLED !== "true"
) {
return new NextResponse(null, { status: 404 })
}
// Layer 2: HMAC signature verification (uses AUTONOMA_SIGNING_SECRET)
const rawBody = await request.text()
const signature = request.headers.get("x-signature")
if (signature == null) {
return errorResponse("Missing signature", "UNKNOWN_ACTION", 401)
}
if (!verifySignature(rawBody, signature)) {
return errorResponse("Invalid signature", "UNKNOWN_ACTION", 401)
}
// Parse and validate
let body
try {
body = AutonomaBodySchema.parse(JSON.parse(rawBody))
} catch {
return errorResponse("Invalid request body", "UNKNOWN_ACTION", 400)
}
// Route to action handler
switch (body.action) {
case "discover":
return handleDiscover()
case "up":
return handleUp(body.environment, body.testRunId)
case "down":
return handleDown(body.refs, body.refsToken)
}
}
function handleDiscover() {
const environments = getAllScenarios().map((s) => s.meta())
return NextResponse.json({ environments })
}
async function handleUp(environment: string, testRunId: string) {
const scenario = findScenario(environment)
if (scenario == null) {
return errorResponse(
`Unknown environment: ${environment}`,
"UNKNOWN_ENVIRONMENT",
400,
)
}
try {
const result = await scenario.up(testRunId)
const refsToken = signRefs(result.refs)
const cookies = createBypassCookies(
result.userEmail,
result.organizationId,
)
return NextResponse.json({
auth: { cookies },
refs: result.refs,
refsToken,
metadata: {
organizationId: result.organizationId,
email: result.userEmail,
scenario: environment,
},
})
} catch (error) {
console.error("[Autonoma] up failed", { environment, testRunId, error })
return errorResponse("Failed to create environment", "UP_FAILED", 500)
}
}
async function handleDown(
refs: Record<string, unknown>,
refsToken: string,
) {
// Layer 3: Verify signed refs
const verification = verifyRefs(refsToken)
if ("error" in verification) {
return errorResponse(verification.error, "INVALID_REFS_TOKEN", 403)
}
if (!refsMatch(verification.refs, refs)) {
return errorResponse("Refs do not match token", "INVALID_REFS_TOKEN", 403)
}
const scenarioName = findScenarioForRefs(refs)
const scenario = scenarioName != null ? findScenario(scenarioName) : null
if (scenario == null) {
return errorResponse(
"Could not determine scenario for refs",
"DOWN_FAILED",
400,
)
}
try {
await scenario.down(refs)
return NextResponse.json({ ok: true })
} catch (error) {
console.error("[Autonoma] down failed", { error })
return errorResponse("Teardown failed", "DOWN_FAILED", 500)
}
}
function verifySignature(rawBody: string, signature: string): boolean {
const secret = process.env.AUTONOMA_SIGNING_SECRET
if (secret == null) return false
const expected = crypto.createHmac("sha256", secret).update(rawBody).digest("hex")
const expectedBuf = Buffer.from(expected)
const signatureBuf = Buffer.from(signature)
// timingSafeEqual throws if the buffers differ in length; reject early instead
if (expectedBuf.length !== signatureBuf.length) return false
return crypto.timingSafeEqual(expectedBuf, signatureBuf)
}
function errorResponse(message: string, code: ErrorCode, status: number) {
return NextResponse.json({ error: message, code }, { status })
}
function createBypassCookies(email: string, organizationId: string) {
const secret = process.env.AUTONOMA_JWT_SECRET
if (secret == null) throw new Error("AUTONOMA_JWT_SECRET is not configured")
const accessToken = sign(
{ email, externalOrganizationId: organizationId },
secret,
{ algorithm: JWT_ALGORITHM, expiresIn: "1h" },
)
const refreshToken = sign(
{ email, externalOrganizationId: organizationId, type: "refresh" },
secret,
{ algorithm: JWT_ALGORITHM, expiresIn: "7d" },
)
return [
{
name: "autonoma-bypass-token",
value: accessToken,
httpOnly: true,
sameSite: "lax" as const,
path: "/",
},
{
name: "autonoma-bypass-refresh-token",
value: refreshToken,
httpOnly: true,
sameSite: "lax" as const,
path: "/",
},
]
}
function findScenarioForRefs(
refs: Record<string, unknown>,
): string | null {
if (refs.organizationId == null) return null
const allScenarios = getAllScenarios()
for (const scenario of allScenarios) {
if (canHandleRefs(scenario.name, refs)) return scenario.name
}
return allScenarios[0]?.name ?? null
}
function canHandleRefs(
scenarioName: string,
refs: Record<string, unknown>,
): boolean {
if (scenarioName === "empty") {
return refs.applicationIds == null && refs.testIds == null
}
if (scenarioName === "standard") {
return refs.applicationIds != null
}
return false
}
```
## Step 9: Write Integration Tests
The tests verify the full lifecycle end-to-end against a real database.
**File: `test/integration/autonoma-scenario.test.ts`**
```typescript
import { beforeAll, afterEach, describe, expect, it } from "vitest"
import { POST } from "@/app/api/autonoma/route"
import { NextRequest } from "next/server"
import { db } from "@repo/database"
import crypto from "crypto"
import { sign } from "jsonwebtoken"
const SHARED_SECRET = "test-shared-secret"
const INTERNAL_SECRET = "test-internal-secret"
function signBody(body: string): string {
return crypto
.createHmac("sha256", SHARED_SECRET)
.update(body)
.digest("hex")
}
function createRequest(body: object): NextRequest {
const bodyStr = JSON.stringify(body)
const signature = signBody(bodyStr)
return new NextRequest("http://localhost:3000/api/autonoma", {
method: "POST",
headers: {
"Content-Type": "application/json",
"x-signature": signature,
},
body: bodyStr,
})
}
beforeAll(() => {
process.env.AUTONOMA_SIGNING_SECRET = SHARED_SECRET
process.env.AUTONOMA_JWT_SECRET = INTERNAL_SECRET
})
describe("POST /api/autonoma", () => {
describe("discover", () => {
it("returns all available scenarios with fingerprints", async () => {
const request = createRequest({ action: "discover" })
const response = await POST(request)
expect(response.status).toBe(200)
const data = await response.json()
expect(data.environments).toHaveLength(2)
const standard = data.environments.find(
(e: any) => e.name === "standard",
)
expect(standard).toBeDefined()
expect(standard.fingerprint).toHaveLength(16)
const empty = data.environments.find(
(e: any) => e.name === "empty",
)
expect(empty).toBeDefined()
expect(empty.fingerprint).toHaveLength(16)
})
it("returns stable fingerprints across calls", async () => {
const response1 = await POST(
createRequest({ action: "discover" }),
)
const data1 = await response1.json()
const response2 = await POST(
createRequest({ action: "discover" }),
)
const data2 = await response2.json()
expect(data1.environments[0].fingerprint).toBe(
data2.environments[0].fingerprint,
)
})
})
describe("up + down — empty scenario", () => {
it("creates org and user, then tears down cleanly", async () => {
const upResponse = await POST(
createRequest({
action: "up",
environment: "empty",
testRunId: "test-empty-001",
}),
)
expect(upResponse.status).toBe(200)
const upData = await upResponse.json()
// Verify response shape
expect(upData.refs.organizationId).toBeDefined()
expect(upData.refs.userId).toBeDefined()
expect(upData.refsToken).toBeDefined()
expect(upData.auth.cookies).toHaveLength(2)
// Verify data exists
const org = await db.organization.findUnique({
where: { id: upData.refs.organizationId },
})
expect(org).not.toBeNull()
// Tear down
const downResponse = await POST(
createRequest({
action: "down",
testRunId: "test-empty-001",
refs: upData.refs,
refsToken: upData.refsToken,
}),
)
expect(downResponse.status).toBe(200)
expect((await downResponse.json()).ok).toBe(true)
// Verify cleanup
const orgAfter = await db.organization.findUnique({
where: { id: upData.refs.organizationId },
})
expect(orgAfter).toBeNull()
})
})
describe("down security", () => {
it("rejects tampered refs token", async () => {
const response = await POST(
createRequest({
action: "down",
testRunId: "test-tampered",
refs: { organizationId: "some-production-id" },
refsToken: "tampered.token.value",
}),
)
expect(response.status).toBe(403)
expect((await response.json()).code).toBe("INVALID_REFS_TOKEN")
})
it("rejects expired refs token", async () => {
const expiredToken = sign(
{ refs: { organizationId: "test" } },
INTERNAL_SECRET,
{ algorithm: "HS256", expiresIn: "-1h" },
)
const response = await POST(
createRequest({
action: "down",
testRunId: "test-expired",
refs: { organizationId: "test" },
refsToken: expiredToken,
}),
)
expect(response.status).toBe(403)
})
})
describe("signature verification", () => {
it("rejects request without signature", async () => {
const body = JSON.stringify({ action: "discover" })
const request = new NextRequest(
"http://localhost:3000/api/autonoma",
{
method: "POST",
headers: { "Content-Type": "application/json" },
body,
},
)
const response = await POST(request)
expect(response.status).toBe(401)
})
})
describe("error handling", () => {
it("returns 400 for unknown environment", async () => {
const response = await POST(
createRequest({
action: "up",
environment: "nonexistent",
testRunId: "test-bad",
}),
)
expect(response.status).toBe(400)
expect((await response.json()).code).toBe("UNKNOWN_ENVIRONMENT")
})
})
})
```
## Data Reference Tables
Quick reference for the standard scenario’s data.
### Applications (3)
| Name | Platform | Versions |
| ---------------- | -------- | ---------------- |
| My Web App | Web | v1.0, v1.1, v2.0 |
| Android Shopping | Android | v1.0, v1.1 |
| iOS Banking | iOS | v1.0 |
### Tests (12)
| Name | App | Status | Folder | Tags |
| ------------------- | --------------------- | --------- | ------------ | ---------------- |
| Login Flow | My Web App v1.0 | Published | Smoke Tests | critical, web |
| Checkout Flow | My Web App v1.1 | Published | Regression | web |
| Search Products | My Web App v2.0 | Published | Smoke Tests | web |
| User Profile Update | My Web App v1.0 | Published | Regression | web |
| Add to Cart | My Web App v1.1 | Published | Regression | critical, web |
| Mobile Login | Android Shopping v1.0 | Published | Smoke Tests | critical, mobile |
| Browse Categories | Android Shopping v1.1 | Published | Mobile Tests | mobile |
| iOS Onboarding | iOS Banking v1.0 | Published | Mobile Tests | mobile, ios |
| Password Reset | My Web App v1.0 | Published | *(none)* | web |
| Signup Flow | My Web App v2.0 | Published | *(none)* | web |
| Draft Test Alpha | My Web App v1.0 | Draft | *(none)* | *(none)* |
| Draft Test Beta | Android Shopping v1.0 | Draft | *(none)* | *(none)* |
### Runs (58)
* **Statuses:** \~29 passed, \~15 failed, \~5 running, \~5 pending, \~4 stopped
* **Sources:** \~35 manual, \~12 scheduled, \~6 API, \~3 CI-CD, \~2 Vercel
* **Date range:** Spread over 60 days
### Other Entities
| Entity | Count | Details |
| ---------- | --------- | ------------------------------------------------------------ |
| Folders | 4 (+root) | Smoke Tests, Regression, Deep Regression (sub), Mobile Tests |
| Tags | 4 | critical, web, mobile, ios |
| Components | 2 | Login Component (3 steps), Navigation Component (2 steps) |
| Variables | 3 | BASE\_URL, TEST\_EMAIL, TEST\_PASSWORD |
| API Keys | 1 | Hashed with bcrypt |
| Schedules | 1 | Daily at 2AM, runs Smoke Tests folder |
| Webhooks | 1 | Slack notifications on failure |
| Scripts | 1 | Pre-run curl script |
# React + Vite Implementation
> Autonoma Environment Factory with React + Vite + Express backend + Prisma + PostgreSQL.
> **Prerequisites:**
>
> Read the [Environment Factory Guide](/guides/environment-factory/) first for concepts. This doc is the code.
>
> **Stack:** React 19 + Vite (frontend), Express.js (backend API), Prisma ORM, PostgreSQL, JWT-based bypass auth, Vitest for integration tests.
## Architecture Overview
Unlike Next.js (which has built-in API routes), a React + Vite app is a static SPA. You need a **separate backend** to host the Autonoma endpoint. This example uses Express.js, but the pattern works with any Node.js server (Fastify, Hono, Koa, etc.).
```plaintext
project/
├── client/ ← React + Vite SPA
│ ├── src/
│ └── vite.config.ts
├── server/ ← Express.js backend
│ ├── src/
│ │ ├── index.ts ← Express app entry
│ │ ├── routes/
│ │ │ └── autonoma.ts ← The POST endpoint
│ │ └── lib/autonoma/
│ │ ├── types.ts
│ │ ├── ScenarioBuilder.ts
│ │ ├── RefsTokenFactory.ts
│ │ ├── ScenarioRegistry.ts
│ │ ├── teardown.ts
│ │ └── scenarios/
│ │ ├── EmptyScenario.ts
│ │ └── StandardScenario.ts
│ └── test/
│ └── autonoma.test.ts
├── prisma/
│ └── schema.prisma
└── package.json
```
## Step 1: Express Route Handler
The endpoint is an Express route instead of a Next.js API route. The core logic is identical — only the HTTP layer differs.
**File: `server/src/routes/autonoma.ts`**
```typescript
import { Router, type Request, type Response } from "express"
import { AutonomaBodySchema, type ErrorCode } from "../lib/autonoma/types"
import { findScenario, getAllScenarios } from "../lib/autonoma/ScenarioRegistry"
import { signRefs, verifyRefs, refsMatch } from "../lib/autonoma/RefsTokenFactory"
import { verifySignature } from "../lib/autonoma/signature"
import { createBypassToken } from "../lib/autonoma/auth"
const router = Router()
router.post("/api/autonoma", async (req: Request, res: Response) => {
// Layer 1: Environment gating
if (
process.env.NODE_ENV === "production" &&
process.env.AUTONOMA_FACTORY_ENABLED !== "true"
) {
return res.status(404).end()
}
// Layer 2: HMAC signature verification
// express.raw() delivers the body as a Buffer, so stringify it before hashing and parsing
const rawBody = (req.body as Buffer).toString("utf8")
const signature = req.headers["x-signature"] as string | undefined
if (signature == null) {
return res.status(401).json({ error: "Missing signature" })
}
if (!verifySignature(rawBody, signature)) {
return res.status(401).json({ error: "Invalid signature" })
}
// Parse and validate (JSON.parse throws on malformed bodies)
let body
try {
body = AutonomaBodySchema.parse(JSON.parse(rawBody))
} catch {
return errorResponse(res, "Invalid request body", "UNKNOWN_ACTION", 400)
}
switch (body.action) {
case "discover":
return handleDiscover(res)
case "up":
return handleUp(res, body.environment, body.testRunId)
case "down":
return handleDown(res, body.refs, body.refsToken)
}
})
function handleDiscover(res: Response) {
const environments = getAllScenarios().map((s) => s.meta())
return res.json({ environments })
}
async function handleUp(res: Response, environment: string, testRunId: string) {
const scenario = findScenario(environment)
if (scenario == null) {
return errorResponse(res, `Unknown environment: ${environment}`, "UNKNOWN_ENVIRONMENT", 400)
}
try {
const result = await scenario.up(testRunId)
const refsToken = signRefs(result.refs)
const accessToken = createBypassToken(result.userEmail, result.organizationId)
return res.json({
auth: {
headers: {
Authorization: `Bearer ${accessToken}`,
},
},
refs: result.refs,
refsToken,
metadata: {
organizationId: result.organizationId,
email: result.userEmail,
scenario: environment,
},
})
} catch (error) {
console.error("[Autonoma] up failed", { environment, testRunId, error })
return errorResponse(res, "Failed to create environment", "UP_FAILED", 500)
}
}
async function handleDown(
res: Response,
refs: Record<string, unknown>,
refsToken: string,
) {
const verification = verifyRefs(refsToken)
if ("error" in verification) {
return errorResponse(res, verification.error, "INVALID_REFS_TOKEN", 403)
}
if (!refsMatch(verification.refs, refs)) {
return errorResponse(res, "Refs do not match token", "INVALID_REFS_TOKEN", 403)
}
const scenario = findScenario("standard") ?? findScenario("empty")
if (scenario == null) {
return errorResponse(res, "No scenario found", "DOWN_FAILED", 400)
}
try {
await scenario.down(refs)
return res.json({ ok: true })
} catch (error) {
console.error("[Autonoma] down failed", { error })
return errorResponse(res, "Teardown failed", "DOWN_FAILED", 500)
}
}
function errorResponse(res: Response, message: string, code: ErrorCode, status: number) {
return res.status(status).json({ error: message, code })
}
export { router as autonomaRouter }
```
## Step 2: Raw Body Middleware
The `express.json()` middleware parses the body into an object before your handler runs, but HMAC verification needs the **raw** bytes. Mount `express.raw()` on the Autonoma route ahead of the JSON middleware so the raw body is preserved.
**File: `server/src/index.ts`**
```typescript
import express from "express"
import { autonomaRouter } from "./routes/autonoma"
const app = express()
// Preserve raw body for signature verification on the autonoma route
app.use("/api/autonoma", express.raw({ type: "application/json" }))
// JSON parsing for all other routes
app.use(express.json())
app.use(autonomaRouter)
app.listen(4000, () => {
console.log("Server running on port 4000")
})
```
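The router above calls `verifySignature`, which isn't in this example's shared-files list. Assuming the same HMAC-SHA256 scheme and `AUTONOMA_SIGNING_SECRET` variable used by the other examples, a minimal sketch looks like this:

```typescript
// server/src/lib/autonoma/signature.ts — sketch, assuming HMAC-SHA256 over the raw body
import { createHmac, timingSafeEqual } from "node:crypto"

export function verifySignature(rawBody: Buffer | string, signature: string): boolean {
  const secret = process.env.AUTONOMA_SIGNING_SECRET
  if (secret == null) throw new Error("AUTONOMA_SIGNING_SECRET is not configured")
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex")
  const a = Buffer.from(expected)
  const b = Buffer.from(signature)
  // Constant-time comparison; timingSafeEqual throws on length mismatch, so check first
  return a.length === b.length && timingSafeEqual(a, b)
}
```

Note that it accepts a `Buffer`, since `express.raw()` hands the route a `Buffer` rather than a string.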
## Step 3: Bearer Token Auth (instead of cookies)
Since your React SPA stores tokens in memory or localStorage (not httpOnly cookies), return a bearer token instead:
**File: `server/src/lib/autonoma/auth.ts`**
```typescript
import { sign } from "jsonwebtoken"
export function createBypassToken(email: string, organizationId: string): string {
const secret = process.env.AUTONOMA_JWT_SECRET
if (secret == null) throw new Error("AUTONOMA_JWT_SECRET is not configured")
return sign(
{ email, organizationId, bypass: true },
secret,
{ algorithm: "HS256", expiresIn: "1h" },
)
}
```
The `auth` response uses `headers` instead of `cookies`:
```json
{
"auth": {
"headers": {
"Authorization": "Bearer eyJ..."
}
}
}
```
Autonoma injects this header into every request during the test run.
## Step 4: Vite Proxy Configuration
During development, proxy the Autonoma endpoint from Vite’s dev server to Express:
**File: `client/vite.config.ts`**
```typescript
import { defineConfig } from "vite"
import react from "@vitejs/plugin-react"
export default defineConfig({
plugins: [react()],
server: {
proxy: {
"/api": {
target: "http://localhost:4000",
changeOrigin: true,
},
},
},
})
```
## Key Differences from Next.js
| Aspect | Next.js | React + Vite + Express |
| --------------------- | -------------------------------------- | ----------------------------------------- |
| **Endpoint location** | `app/api/autonoma/route.ts` | `server/src/routes/autonoma.ts` |
| **HTTP framework** | Next.js `NextRequest` / `NextResponse` | Express `req` / `res` |
| **Auth strategy** | httpOnly cookies (session-based) | Bearer token in `Authorization` header |
| **Raw body access** | `request.text()` (built-in) | `express.raw()` middleware |
| **Dev server** | Single `next dev` | Two processes: `vite dev` + `node server` |
| **Deployment** | Single Vercel/Docker deploy | Separate static hosting + API server |
## Shared Files
The following files are **identical** to the Next.js example:
* `types.ts` — Zod schemas and TypeScript types
* `ScenarioBuilder.ts` — Abstract base class with fingerprinting
* `RefsTokenFactory.ts` — JWT signing/verification for refs
* `ScenarioRegistry.ts` — Scenario registration
* `scenarios/EmptyScenario.ts` — Empty scenario
* `scenarios/StandardScenario.ts` — Standard scenario
* `teardown.ts` — Organization teardown in FK order
See the [Next.js Implementation](/examples/nextjs/) for the full source of these files.
# Elixir/Phoenix Implementation
> Autonoma Environment Factory with Elixir, Phoenix Framework, and Ecto + PostgreSQL.
> **Prerequisites:**
>
> Read the [Environment Factory Guide](/guides/environment-factory/) first for concepts. This doc is the code.
>
> **Stack:** Elixir 1.17+, Phoenix 1.7+ (JSON API), Ecto ORM, PostgreSQL, JOSE for JWT, ExUnit for integration tests.
## File Structure
```plaintext
lib/
├── my_app_web/
│ ├── controllers/
│ │ └── autonoma_controller.ex ← The POST endpoint
│ └── router.ex ← Route registration
│
└── my_app/
└── autonoma/
├── scenario.ex ← Scenario behaviour (like an interface)
├── scenario_registry.ex ← Finds scenarios by name
├── refs_token.ex ← Signs and verifies refs with JOSE/JWT
├── signature.ex ← HMAC-SHA256 signature verification
├── teardown.ex ← Shared teardown: deletes org data in FK order
└── scenarios/
├── empty_scenario.ex ← Minimal scenario: org + user only
└── standard_scenario.ex ← Full scenario with all test data
test/integration/
└── autonoma_test.exs ← Integration tests
```
## Step 1: Define the Scenario Behaviour
Elixir uses behaviours (similar to interfaces/abstract classes). Every scenario implements this contract.
**File: `lib/my_app/autonoma/scenario.ex`**
```elixir
defmodule MyApp.Autonoma.Scenario do
@type refs :: %{String.t() => any()}
@type up_result :: %{
organization_id: String.t(),
user_id: String.t(),
user_email: String.t(),
refs: refs()
}
@type meta :: %{
name: String.t(),
description: String.t(),
fingerprint: String.t()
}
@callback name() :: String.t()
@callback description() :: String.t()
@callback descriptor() :: map()
@callback up(test_run_id :: String.t()) :: {:ok, up_result()} | {:error, term()}
@callback down(refs :: refs()) :: :ok | {:error, term()}
@doc "Compute a 16-char hex fingerprint from the descriptor."
def fingerprint(module) do
module.descriptor()
|> Jason.encode!()
|> then(&:crypto.hash(:sha256, &1))
|> Base.encode16(case: :lower)
|> String.slice(0, 16)
end
@doc "Return the scenario metadata for discover."
def meta(module) do
%{
name: module.name(),
description: module.description(),
fingerprint: fingerprint(module)
}
end
end
```
## Step 2: Build the Refs Token Module
Uses the JOSE library for JWT signing/verification.
**File: `lib/my_app/autonoma/refs_token.ex`**
```elixir
defmodule MyApp.Autonoma.RefsToken do
@algorithm "HS256"
@expiry_seconds 86_400 # 24 hours
defp secret do
Application.fetch_env!(:my_app, :autonoma_jwt_secret)
end
@doc "Sign refs into a JWT token."
def sign_refs(refs) do
signer = JOSE.JWS.from_map(%{"alg" => @algorithm})
jwk = JOSE.JWK.from_oct(secret())
payload = %{
"refs" => refs,
"exp" => System.system_time(:second) + @expiry_seconds
}
{_, token} = JOSE.JWT.sign(jwk, signer, payload) |> JOSE.JWS.compact()
token
end
@doc "Verify a refs token. Returns {:ok, refs} or {:error, reason}."
def verify_refs(token) do
jwk = JOSE.JWK.from_oct(secret())
case JOSE.JWT.verify_strict(jwk, [@algorithm], token) do
{true, %JOSE.JWT{fields: %{"refs" => refs, "exp" => exp}}, _} ->
if exp > System.system_time(:second) do
{:ok, refs}
else
{:error, "Token expired"}
end
_ ->
{:error, "Invalid token"}
end
end
@doc "Check if two refs maps match."
def refs_match?(token_refs, request_refs) do
Jason.encode!(token_refs) == Jason.encode!(request_refs)
end
end
```
## Step 3: HMAC Signature Verification
**File: `lib/my_app/autonoma/signature.ex`**
```elixir
defmodule MyApp.Autonoma.Signature do
@doc "Verify the x-signature header against the raw request body."
def verify(raw_body, signature) do
secret = Application.fetch_env!(:my_app, :autonoma_signing_secret)
expected = :crypto.mac(:hmac, :sha256, secret, raw_body) |> Base.encode16(case: :lower)
Plug.Crypto.secure_compare(expected, signature)
end
end
```
## Step 4: Empty Scenario
**File: `lib/my_app/autonoma/scenarios/empty_scenario.ex`**
```elixir
defmodule MyApp.Autonoma.Scenarios.EmptyScenario do
@behaviour MyApp.Autonoma.Scenario
alias MyApp.Repo
alias MyApp.Accounts.{Organization, User}
alias MyApp.Autonoma.Teardown
@impl true
def name, do: "empty"
@impl true
def description do
"An organization with no data. Used for testing empty states and onboarding flows."
end
@impl true
def descriptor do
%{org: %{has_quota: true}, users: 1, applications: 0, tests: 0}
end
@impl true
def up(test_run_id) do
Repo.transaction(fn ->
{:ok, org} =
%Organization{}
|> Organization.changeset(%{name: "Autonoma QA Empty [#{test_run_id}]"})
|> Repo.insert()
{:ok, user} =
%User{}
|> User.changeset(%{
name: "QA Empty",
email: "qa-empty-#{test_run_id}@autonoma.dev",
organization_id: org.id
})
|> Repo.insert()
%{
organization_id: org.id,
user_id: user.id,
user_email: user.email,
refs: %{
"organizationId" => org.id,
"userId" => user.id
}
}
end)
end
@impl true
def down(refs) do
Teardown.teardown_organization(refs["organizationId"])
end
end
```
## Step 5: The Controller
**File: `lib/my_app_web/controllers/autonoma_controller.ex`**
```elixir
defmodule MyAppWeb.AutonomaController do
use MyAppWeb, :controller
alias MyApp.Autonoma.{Scenario, ScenarioRegistry, RefsToken, Signature}
plug :verify_environment
plug :verify_signature
def handle(conn, _params) do
raw_body = conn.assigns[:raw_body]
body = Jason.decode!(raw_body)
case body["action"] do
"discover" -> handle_discover(conn)
"up" -> handle_up(conn, body["environment"], body["testRunId"])
"down" -> handle_down(conn, body["refs"], body["refsToken"])
_ -> error_response(conn, "Unknown action", "UNKNOWN_ACTION", 400)
end
end
defp handle_discover(conn) do
environments =
ScenarioRegistry.all()
|> Enum.map(&Scenario.meta/1)
json(conn, %{environments: environments})
end
defp handle_up(conn, environment, test_run_id) do
case ScenarioRegistry.find(environment) do
nil ->
error_response(conn, "Unknown environment: #{environment}", "UNKNOWN_ENVIRONMENT", 400)
module ->
case module.up(test_run_id) do
{:ok, result} ->
refs_token = RefsToken.sign_refs(result.refs)
bypass_token = MyApp.Auth.create_bypass_token(result.user_email, result.organization_id)
json(conn, %{
auth: %{headers: %{"Authorization" => "Bearer #{bypass_token}"}},
refs: result.refs,
refsToken: refs_token,
metadata: %{
organizationId: result.organization_id,
email: result.user_email,
scenario: environment
}
})
{:error, reason} ->
error_response(conn, "Up failed: #{inspect(reason)}", "UP_FAILED", 500)
end
end
end
defp handle_down(conn, refs, refs_token) do
with {:ok, token_refs} <- RefsToken.verify_refs(refs_token),
true <- RefsToken.refs_match?(token_refs, refs) do
case ScenarioRegistry.find_for_refs(refs) do
nil ->
error_response(conn, "No scenario for refs", "DOWN_FAILED", 400)
module ->
case module.down(refs) do
:ok -> json(conn, %{ok: true})
{:error, reason} -> error_response(conn, "Down failed: #{inspect(reason)}", "DOWN_FAILED", 500)
end
end
else
{:error, reason} -> error_response(conn, reason, "INVALID_REFS_TOKEN", 403)
false -> error_response(conn, "Refs do not match token", "INVALID_REFS_TOKEN", 403)
end
end
# Plugs
defp verify_environment(conn, _opts) do
if Application.get_env(:my_app, :env) == :prod &&
Application.get_env(:my_app, :autonoma_factory_enabled) != true do
conn |> send_resp(404, "") |> halt()
else
conn
end
end
defp verify_signature(conn, _opts) do
raw_body = read_raw_body(conn)
signature = get_req_header(conn, "x-signature") |> List.first()
if signature != nil && Signature.verify(raw_body, signature) do
assign(conn, :raw_body, raw_body)
else
conn
|> put_status(401)
|> json(%{error: "Invalid or missing signature"})
|> halt()
end
end
defp read_raw_body(conn) do
{:ok, body, _conn} = Plug.Conn.read_body(conn)
body
end
defp error_response(conn, message, code, status) do
conn
|> put_status(status)
|> json(%{error: message, code: code})
end
end
```
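One Phoenix-specific caveat: by the time a controller plug runs, `Plug.Parsers` has usually consumed the request body, so `Plug.Conn.read_body/1` returns an empty string. The standard workaround is the `body_reader` option on `Plug.Parsers`, which caches the raw bytes while parsing. A sketch (the module name is illustrative):

```elixir
# lib/my_app_web/cache_body_reader.ex
defmodule MyAppWeb.CacheBodyReader do
  @doc "Reads the body and stashes the raw bytes so signature checks can use them."
  def read_body(conn, opts) do
    {:ok, body, conn} = Plug.Conn.read_body(conn, opts)
    {:ok, body, Plug.Conn.assign(conn, :raw_body, body)}
  end
end

# lib/my_app_web/endpoint.ex — point Plug.Parsers at the reader:
#
#   plug Plug.Parsers,
#     parsers: [:urlencoded, :multipart, :json],
#     body_reader: {MyAppWeb.CacheBodyReader, :read_body},
#     json_decoder: Jason
```

With this in place, the `verify_signature` plug can rely on `conn.assigns[:raw_body]` being populated even though the parsed params are also available.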
## Step 6: Route Registration
**File: `lib/my_app_web/router.ex`** (add to your existing router)
```elixir
scope "/api", MyAppWeb do
pipe_through :api
post "/autonoma", AutonomaController, :handle
end
```
## Step 7: Teardown with Ecto
Ecto favors explicit deletes over ORM-level cascades. Delete children first, in reverse FK order:
**File: `lib/my_app/autonoma/teardown.ex`**
```elixir
defmodule MyApp.Autonoma.Teardown do
alias MyApp.Repo
import Ecto.Query
def teardown_organization(organization_id) do
Repo.transaction(fn ->
# Delete children first, parents last
delete_all(MyApp.Runs.Step, :organization_id, organization_id)
delete_all(MyApp.Runs.Run, :organization_id, organization_id)
delete_all(MyApp.Tests.Test, :organization_id, organization_id)
delete_all(MyApp.Tags.Tag, :organization_id, organization_id)
delete_all(MyApp.Apps.ApplicationVersion, :organization_id, organization_id)
delete_all(MyApp.Apps.Application, :organization_id, organization_id)
# Folders — children first
from(f in MyApp.Folders.Folder,
where: f.organization_id == ^organization_id and not is_nil(f.parent_id)
) |> Repo.delete_all()
from(f in MyApp.Folders.Folder,
where: f.organization_id == ^organization_id
) |> Repo.delete_all()
# Users, then organization
delete_all(MyApp.Accounts.User, :organization_id, organization_id)
Repo.get!(MyApp.Accounts.Organization, organization_id) |> Repo.delete!()
end)
:ok
end
defp delete_all(schema, field, organization_id) do
from(s in schema, where: field(s, ^field) == ^organization_id)
|> Repo.delete_all()
end
end
```
## Key Differences from Node.js
| Aspect | Node.js (Express/Next.js) | Elixir (Phoenix) |
| --------------------- | ------------------------------- | ------------------------------------ |
| **Abstract class** | `class extends ScenarioBuilder` | `@behaviour Scenario` + callbacks |
| **JWT library** | `jsonwebtoken` | `jose` (Erlang JOSE) |
| **HMAC** | `crypto.createHmac()` | `:crypto.mac(:hmac, ...)` |
| **ORM** | Prisma (query builder) | Ecto (composable queries) |
| **Error handling** | try/catch + error responses | `with` chains + pattern matching |
| **Transactions** | `db.$transaction()` | `Repo.transaction()` |
| **Raw body access** | Middleware (`express.raw()`) | `Plug.Conn.read_body()` in plug |
| **Environment guard** | `process.env.NODE_ENV` | `Application.get_env(:my_app, :env)` |
## Dependencies
Add to `mix.exs`:
```elixir
defp deps do
[
{:jose, "~> 1.11"}, # JWT signing/verification
{:jason, "~> 1.4"}, # JSON encoding
# ... your existing deps
]
end
```
## Configuration
**File: `config/dev.exs`**
```elixir
config :my_app,
autonoma_signing_secret: System.get_env("AUTONOMA_SIGNING_SECRET"),
autonoma_jwt_secret: System.get_env("AUTONOMA_JWT_SECRET"),
autonoma_factory_enabled: true
```
**File: `config/prod.exs`**
```elixir
config :my_app,
autonoma_factory_enabled: false # Override with env var if needed
```
# TanStack Start Implementation
> Autonoma Environment Factory with TanStack Start, Vinxi server functions, and Drizzle ORM + PostgreSQL.
> **Prerequisites:**
>
> Read the [Environment Factory Guide](/guides/environment-factory/) first for concepts. This doc is the code.
>
> **Stack:** TanStack Start (full-stack React), Vinxi server functions, Drizzle ORM, PostgreSQL, JWT-based auth, Vitest for integration tests.
## Architecture Overview
TanStack Start provides full-stack capabilities with server functions (via Vinxi). The Autonoma endpoint is a server-side API route — similar to Next.js but using TanStack’s routing primitives.
```plaintext
app/
├── routes/
│ └── api/
│ └── autonoma.ts ← The POST endpoint (API route)
├── lib/
│ └── autonoma/
│ ├── types.ts ← Zod schemas + TypeScript types
│ ├── scenario-builder.ts ← Abstract base class
│ ├── refs-token.ts ← JWT signing/verification
│ ├── signature.ts ← HMAC-SHA256 verification
│ ├── scenario-registry.ts ← Scenario registration
│ ├── teardown.ts ← Shared teardown logic
│ └── scenarios/
│ ├── empty.ts ← Minimal scenario
│ └── standard.ts ← Full data scenario
├── db/
│ ├── schema.ts ← Drizzle schema definitions
│ └── index.ts ← Database connection
└── test/
└── autonoma.test.ts
```
## Step 1: API Route Handler
TanStack Start uses file-based API routes. The handler receives a standard `Request` object and returns a `Response`.
**File: `app/routes/api/autonoma.ts`**
```typescript
import { json } from "@tanstack/react-start"
import { createAPIFileRoute } from "@tanstack/react-start/api"
import { AutonomaBodySchema, type ErrorCode } from "~/lib/autonoma/types"
import { findScenario, getAllScenarios } from "~/lib/autonoma/scenario-registry"
import { signRefs, verifyRefs, refsMatch } from "~/lib/autonoma/refs-token"
import { verifySignature } from "~/lib/autonoma/signature"
import { createBypassToken } from "~/lib/autonoma/auth"
export const APIRoute = createAPIFileRoute("/api/autonoma")({
POST: async ({ request }) => {
// Layer 1: Environment gating
if (
process.env.NODE_ENV === "production" &&
process.env.AUTONOMA_FACTORY_ENABLED !== "true"
) {
return new Response(null, { status: 404 })
}
// Layer 2: HMAC signature verification
const rawBody = await request.text()
const signature = request.headers.get("x-signature")
if (signature == null) {
return json({ error: "Missing signature" }, { status: 401 })
}
if (!verifySignature(rawBody, signature)) {
return json({ error: "Invalid signature" }, { status: 401 })
}
// Parse and validate
const parsed = AutonomaBodySchema.safeParse(JSON.parse(rawBody))
if (!parsed.success) {
return json({ error: "Invalid request body", code: "UNKNOWN_ACTION" }, { status: 400 })
}
const body = parsed.data
switch (body.action) {
case "discover":
return handleDiscover()
case "up":
return handleUp(body.environment, body.testRunId)
case "down":
return handleDown(body.refs, body.refsToken)
}
},
})
function handleDiscover() {
const environments = getAllScenarios().map((s) => s.meta())
return json({ environments })
}
async function handleUp(environment: string, testRunId: string) {
const scenario = findScenario(environment)
if (scenario == null) {
return errorResponse(`Unknown environment: ${environment}`, "UNKNOWN_ENVIRONMENT", 400)
}
try {
const result = await scenario.up(testRunId)
const refsToken = signRefs(result.refs)
const accessToken = createBypassToken(result.userEmail, result.organizationId)
return json({
auth: {
cookies: [
{
name: "autonoma-bypass-token",
value: accessToken,
httpOnly: true,
sameSite: "lax" as const,
path: "/",
},
],
},
refs: result.refs,
refsToken,
metadata: {
organizationId: result.organizationId,
email: result.userEmail,
scenario: environment,
},
})
} catch (error) {
console.error("[Autonoma] up failed", { environment, testRunId, error })
return errorResponse("Failed to create environment", "UP_FAILED", 500)
}
}
async function handleDown(refs: Record<string, unknown>, refsToken: string) {
const verification = verifyRefs(refsToken)
if ("error" in verification) {
return errorResponse(verification.error, "INVALID_REFS_TOKEN", 403)
}
if (!refsMatch(verification.refs, refs)) {
return errorResponse("Refs do not match token", "INVALID_REFS_TOKEN", 403)
}
const scenario = findScenario("standard") ?? findScenario("empty")
if (scenario == null) {
return errorResponse("No scenario found", "DOWN_FAILED", 400)
}
try {
await scenario.down(refs)
return json({ ok: true })
} catch (error) {
console.error("[Autonoma] down failed", { error })
return errorResponse("Teardown failed", "DOWN_FAILED", 500)
}
}
function errorResponse(message: string, code: ErrorCode, status: number) {
return json({ error: message, code }, { status })
}
```
## Step 2: Drizzle Schema & Teardown
TanStack Start commonly pairs with Drizzle ORM. Here’s the teardown using Drizzle:
**File: `app/lib/autonoma/teardown.ts`**
```typescript
import { eq, and, isNotNull } from "drizzle-orm"
import { db } from "~/db"
import {
runs, steps, tests, tags, applications, applicationVersions,
folders, users, organizations, organizationQuotas,
} from "~/db/schema"
export async function teardownOrganization(organizationId: string): Promise<void> {
// Delete in reverse FK order
await db.delete(steps).where(
eq(steps.organizationId, organizationId),
)
await db.delete(runs).where(
eq(runs.organizationId, organizationId),
)
await db.delete(tests).where(
eq(tests.organizationId, organizationId),
)
await db.delete(tags).where(
eq(tags.organizationId, organizationId),
)
await db.delete(applicationVersions).where(
eq(applicationVersions.organizationId, organizationId),
)
await db.delete(applications).where(
eq(applications.organizationId, organizationId),
)
// Folders — children first
await db.delete(folders).where(
and(
eq(folders.organizationId, organizationId),
isNotNull(folders.parentId),
),
)
await db.delete(folders).where(
eq(folders.organizationId, organizationId),
)
// Users, quota, organization
await db.delete(users).where(
eq(users.organizationId, organizationId),
)
await db.delete(organizationQuotas).where(
eq(organizationQuotas.organizationId, organizationId),
)
await db.delete(organizations).where(
eq(organizations.id, organizationId),
)
}
```
## Step 3: Empty Scenario with Drizzle
**File: `app/lib/autonoma/scenarios/empty.ts`**
```typescript
import { db } from "~/db"
import { organizations, users, folders } from "~/db/schema"
import { ScenarioBuilder } from "../scenario-builder"
import { teardownOrganization } from "../teardown"
import type { ScenarioRefs, ScenarioUpResult } from "../types"
export class EmptyScenario extends ScenarioBuilder {
readonly name = "empty"
readonly description =
"An organization with no data. Used for testing empty states and onboarding flows."
protected readonly descriptor = {
org: { hasQuota: true },
users: 1,
applications: 0,
tests: 0,
}
async up(testRunId: string): Promise<ScenarioUpResult> {
const [org] = await db.insert(organizations).values({
name: `Autonoma QA Empty [${testRunId}]`,
}).returning()
const [user] = await db.insert(users).values({
name: "QA Empty",
email: `qa-empty-${testRunId}@autonoma.dev`,
organizationId: org.id,
}).returning()
const [rootFolder] = await db.insert(folders).values({
name: "Root",
organizationId: org.id,
}).returning()
return {
organizationId: org.id,
userId: user.id,
userEmail: user.email,
refs: {
organizationId: org.id,
userId: user.id,
folderId: rootFolder.id,
},
}
}
async down(refs: ScenarioRefs): Promise<void> {
await teardownOrganization(refs.organizationId as string)
}
}
```
## Key Differences from Next.js
| Aspect | Next.js | TanStack Start |
| ------------------------ | ------------------------------- | ------------------------------------------- |
| **API route definition** | `export async function POST()` | `createAPIFileRoute().POST()` |
| **Response helper** | `NextResponse.json()` | `json()` from `@tanstack/react-start` |
| **Request object** | `NextRequest` | Standard `Request` (Web API) |
| **ORM** | Prisma | Drizzle (SQL-like, lighter) |
| **Insert syntax** | `db.user.create({ data })` | `db.insert(users).values(data).returning()` |
| **Delete syntax** | `db.user.deleteMany({ where })` | `db.delete(users).where(eq(...))` |
| **File routing** | `app/api/autonoma/route.ts` | `app/routes/api/autonoma.ts` |
| **Build tool** | Next.js + Webpack/Turbopack | Vinxi + Vite |
## Shared Files
These files are **identical** to the Next.js example (only the ORM calls differ):
* `types.ts` — Zod schemas and TypeScript types
* `scenario-builder.ts` — Abstract base class with fingerprinting
* `refs-token.ts` — JWT signing/verification for refs
* `signature.ts` — HMAC-SHA256 verification
* `scenario-registry.ts` — Scenario registration
See the [Next.js Implementation](/examples/nextjs/) for the full source of these shared files.
# Development Setup
> How to get Autonoma AI running locally - from prerequisites through a working dev environment.
## Prerequisites
You need three things installed before starting:
| Tool | Version | How to get it |
| --------------------------------- | ------- | --------------------------------------------------------------- |
| [Node.js](https://nodejs.org/) | >= 24 | Use [nvm](https://github.com/nvm-sh/nvm) or download directly |
| [pnpm](https://pnpm.io/) | 10.x | Run `corepack enable` - the version is pinned in `package.json` |
| [Docker](https://www.docker.com/) | Latest | Docker Desktop or Docker Engine |
**Optional tools** (only needed if you’re working on specific engines):
* [Playwright](https://playwright.dev/) - for `engine-web` development
* [Appium](https://appium.io/) - for `engine-mobile` development
## Clone and install
```bash
git clone https://github.com/autonoma-ai/autonoma.git
cd autonoma
pnpm install
```
`pnpm install` handles the entire monorepo - all apps and packages get their dependencies in one pass.
## Start infrastructure
PostgreSQL and Redis run via Docker Compose:
```bash
docker compose up -d
```
This starts:
* **PostgreSQL 18** on `localhost:5432` (user: `postgres`, password: `postgres`)
* **Redis** on `localhost:6379`
Verify they’re running:
```bash
docker compose ps
```
Both containers should show `running` status.
## Environment variables
Copy the example file and fill in the required values:
```bash
cp .env.example .env
```
### Minimum required variables
| Variable | Description | Where to get it |
| ---------------------- | ---------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `DATABASE_URL` | PostgreSQL connection string | Use `postgresql://postgres:postgres@localhost:5432/autonoma` for the Docker Compose setup |
| `REDIS_URL` | Redis connection string | Use `redis://localhost:6379` for the Docker Compose setup |
| `BETTER_AUTH_SECRET` | Session signing secret | Generate any random string: `openssl rand -hex 32` |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | Create OAuth credentials in the [Google Cloud Console](https://console.cloud.google.com/apis/credentials). Set the authorized redirect URI to `http://localhost:4000/api/auth/callback/google` |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | Same Google Cloud Console OAuth credentials page |
| `GEMINI_API_KEY` | Google Gemini API key | Get one from [Google AI Studio](https://aistudio.google.com/apikey) |
### How environment variables work in the codebase
The project uses `createEnv` from `@t3-oss/env-core` for environment variable validation. Each app has an `env.ts` file that defines its required variables with Zod schemas. Variables are validated at startup - if something is missing, you get a clear error message telling you exactly what to add.
You should never read `process.env` directly in application code. Instead, import from the app’s `env.ts` file.
See `.env.example` for the full list of variables grouped by service. Most optional variables have sensible defaults or are only needed for specific features (S3 storage, Sentry, PostHog, etc.).
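As an illustration, an app-level `env.ts` in this pattern looks roughly like the following. The exact variable list is per-app; treat this as a sketch of the `createEnv` pattern, not the real file:

```typescript
// apps/api/env.ts — hypothetical sketch, assuming @t3-oss/env-core and zod
import { createEnv } from "@t3-oss/env-core"
import { z } from "zod"

export const env = createEnv({
  server: {
    DATABASE_URL: z.string().url(),
    REDIS_URL: z.string().url(),
    BETTER_AUTH_SECRET: z.string().min(1),
    GEMINI_API_KEY: z.string().min(1),
  },
  runtimeEnv: process.env,
  emptyStringAsUndefined: true,
})
```

Elsewhere in the app, `import { env } from "./env"` gives typed, validated access; a missing variable fails fast at startup instead of surfacing as a runtime `undefined`.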
## Database setup
Generate the Prisma client and run migrations:
```bash
pnpm db:generate
pnpm db:migrate
```
`db:generate` creates the TypeScript client from the Prisma schema. `db:migrate` applies all migrations to create the database tables.
You need to re-run `db:generate` whenever the Prisma schema changes (after pulling new changes or editing the schema yourself).
## Start development servers
```bash
pnpm dev
```
This starts both servers concurrently:
* **UI** at `http://localhost:3000` (Vite + React)
* **API** at `http://localhost:4000` (Hono + tRPC)
To run them individually:
```bash
pnpm api # API only (port 4000)
pnpm ui # UI only (port 3000)
```
## Verify everything works
1. Open `http://localhost:3000` in your browser
2. You should see the login page
3. Sign in with Google OAuth
4. If you see the dashboard, everything is working
Run the full check suite to make sure nothing is broken:
```bash
pnpm typecheck # TypeScript type checking
pnpm lint # ESLint
pnpm test # Vitest
pnpm build # Full build
```
## Other useful commands
| Command | Description |
| ------------------ | ------------------------------------------------ |
| `pnpm dev` | Start API + UI in development mode |
| `pnpm build` | Build all packages and apps |
| `pnpm typecheck` | Run TypeScript type checking across all packages |
| `pnpm lint` | Lint all packages |
| `pnpm test` | Run tests across all packages |
| `pnpm format` | Format code with Biome |
| `pnpm check` | Lint and format with Biome |
| `pnpm db:generate` | Generate Prisma client from schema |
| `pnpm db:migrate` | Run database migrations |
| `pnpm docs` | Start the documentation site (port 4321) |
## Troubleshooting
### `pnpm install` fails
Make sure you’re using pnpm 10.x. Run `corepack enable` to let Node manage the pnpm version, then try again.
### Database connection refused
Check that Docker Compose is running: `docker compose ps`. If PostgreSQL isn’t up, check logs with `docker compose logs postgres`.
### Prisma generate fails
This usually means dependencies aren’t installed. Run `pnpm install` first, then `pnpm db:generate`.
### Port already in use
Another process is using port 3000 or 4000. Find and kill it:
```bash
lsof -i :3000 # or :4000
kill <PID>
```
### Google OAuth redirect error
Make sure your Google Cloud OAuth credentials have `http://localhost:4000/api/auth/callback/google` as an authorized redirect URI.
### "Missing environment variable" error on startup
The app validates all required environment variables at startup using `createEnv`. Check the error message for which variable is missing, then add it to your `.env` file.
### TypeScript errors after pulling changes
Run `pnpm db:generate` first (the Prisma client may have changed), then `pnpm build` to rebuild all packages. TypeScript errors in the UI or API often come from stale package builds.
# Architecture Overview
> High-level architecture of Autonoma AI - how the monorepo is organized, how data flows, and why each technology was chosen.
## How Autonoma works
Autonoma is an agentic E2E testing platform. Users describe tests in natural language, and an AI agent executes them on real browsers and devices. The core loop is:
1. User writes a test instruction (“Log in, go to settings, verify the avatar is visible”)
2. The execution agent takes a screenshot of the current screen
3. An LLM decides which action to perform (click, type, scroll, assert)
4. Platform drivers execute the action (Playwright for web, Appium for mobile)
5. The agent records the step and repeats until the test is done
Everything else - the API, the UI, the jobs - exists to support this loop.
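In TypeScript pseudocode, that loop looks roughly like this. All names here are illustrative, not the real `packages/engine` API:

```typescript
// Illustrative sketch of the screenshot -> decide -> execute loop
interface Action {
  kind: "click" | "type" | "scroll" | "assert" | "done"
  target?: string
}

interface Step {
  action: Action
}

interface Driver {
  screenshot(): Promise<string>
  execute(action: Action): Promise<void>
}

interface Agent {
  decide(instruction: string, screenshot: string, history: Step[]): Promise<Action>
}

export async function runTest(instruction: string, driver: Driver, agent: Agent): Promise<Step[]> {
  const steps: Step[] = []
  for (;;) {
    // The agent sees the current screen plus everything it has done so far
    const screenshot = await driver.screenshot()
    const action = await agent.decide(instruction, screenshot, steps)
    if (action.kind === "done") break
    await driver.execute(action)
    steps.push({ action })
  }
  return steps
}
```

The `Driver` interface is what Playwright (web) and Appium (mobile) implementations slot into, which is why the agent core in `packages/engine` can stay platform-agnostic.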
## Monorepo structure
The codebase is split into **apps** (deployable services) and **packages** (shared libraries). Each package has exactly one concern.
```plaintext
apps/
api/ Hono + tRPC API server
ui/ Vite + React 19 SPA
engine-web/ Playwright web test execution
engine-mobile/ Appium mobile test execution
docs/ Astro Starlight documentation site
jobs/ Background jobs (multiple sub-services)
packages/
ai/ AI primitives - models, vision, point detection
analytics/ PostHog server-side event tracking
billing/ Subscription and billing logic
blacklight/ Shared UI component library
db/ Prisma schema + generated client
diffs/ Test diff computation
emulator/ Mobile emulator management
engine/ Platform-agnostic execution agent core
errors/ Custom error hierarchy
image/ Image processing utilities
integration-test/ Test harness with Testcontainers
k8s/ Kubernetes helpers
logger/ Sentry-based structured logging
review/ Post-execution AI review
scenario/ Environment Factory scenario logic
storage/ S3 file storage
test-updates/ Test suite update logic
types/ Shared Zod schemas and TypeScript types
utils/ Shared utilities
workflow/ Temporal workflow definitions
```
### Why apps vs packages?
**Apps** are independently deployable. Each one becomes its own Docker image and runs as its own process. The API, UI, and each engine are separate images - they never share a runtime.
**Packages** are shared code. They’re consumed by apps at build time via pnpm workspaces. A package like `@autonoma/ai` is used by both `engine-web` and `engine-mobile`, but it never runs on its own.
## How the apps connect
```plaintext
Browser
|
| HTTP (port 3000)
v
UI (Vite + React SPA)
|
| tRPC (port 4000)
v
API (Hono + tRPC)
|
|--- Prisma ---> PostgreSQL
|--- Redis ----> Device locks, caching
|
| (dispatches jobs)
v
Engine Web / Engine Mobile
|
| Execution Agent (packages/engine)
|--- Playwright (web) or Appium (mobile)
|--- AI models (packages/ai)
v
Test results, recordings, artifacts
```
**UI to API**: The React SPA communicates with the API exclusively through tRPC. Types flow end-to-end - the frontend never manually defines API response types. Zod schemas in `packages/types` are the single source of truth for both sides.
**API to Database**: The API uses Prisma as its ORM. The schema lives in `packages/db` and is shared across all backend services.
**API to Engines**: When a test run starts, the API dispatches it to the appropriate engine (web or mobile). Engines execute tests independently and report results back.
**Engines to AI**: During execution, engines call into `packages/ai` for element detection, visual assertions, and agent decision-making. AI calls go to external providers (Google Gemini, Groq, OpenRouter).
## Tech stack
| Layer | Technology | Why |
| -------------- | ------------------------------------------ | ---------------------------------------------------------------------------------------------------- |
| Runtime | Node.js 24, ESM-only | Latest LTS with native ESM. No CommonJS compatibility issues |
| Monorepo | pnpm workspaces + Turborepo | pnpm for fast, disk-efficient installs. Turborepo for cached, parallel builds |
| Language | TypeScript (strictest) | Full type safety with `noUncheckedIndexedAccess`, `exactOptionalPropertyTypes`, and all strict flags |
| API | Hono + tRPC | Hono is fast and lightweight. tRPC gives end-to-end type safety without code generation |
| Frontend | React 19 + Vite + TanStack Router | Vite for fast dev builds. TanStack Router for type-safe routing with built-in data loading |
| Database | PostgreSQL + Prisma | PostgreSQL for reliability. Prisma for type-safe queries and migration management |
| Cache/Locking | Redis | Distributed device locking and caching across engine instances |
| AI | Gemini, Groq, OpenRouter via Vercel AI SDK | Multiple providers for different tasks. Vercel AI SDK unifies the interface |
| Web testing | Playwright | Most reliable browser automation library. Supports all major browsers |
| Mobile testing | Appium | Industry standard for iOS and Android automation on real devices |
| UI components | Radix UI + Tailwind CSS v4 + CVA | Accessible primitives (Radix), utility-first styling (Tailwind), type-safe variants (CVA) |
| Observability | Sentry | Error tracking, performance monitoring, and structured logging in one tool |
| Analytics | PostHog | Product analytics with server-side event tracking |
| Deployment | Kubernetes + Temporal | K8s for orchestration. Temporal for workflow-based test execution pipelines |
## The execution flow
This is the most important flow in the system - how a test goes from natural language to executed results.
### 1. Test creation
The user writes a test as a natural language instruction, optionally with a URL and configuration. The API stores it in PostgreSQL.
### 2. Test dispatch
When a test run starts, the API dispatches it to the appropriate engine based on the application type (web or mobile). For mobile, Redis-based device locking ensures exclusive access to physical devices.
### 3. Execution agent loop
The execution agent (`packages/engine`) runs a loop powered by the Vercel AI SDK:
```plaintext
Screenshot -> LLM decides action -> Execute command -> Record step -> Repeat
```
The agent has access to these commands:
| Command | What it does |
| ---------- | --------------------------------------------------------------------------------------- |
| **click** | Uses vision AI to locate an element from a natural language description, then clicks it |
| **type** | Locates an element, clicks it, then types text |
| **scroll** | Scrolls up or down |
| **assert** | Checks visual conditions against the current screenshot |
| **wait** | Pauses for a specified duration (for loading states) |
The LLM (currently Gemini) sees the screenshot, the test instruction, and the steps taken so far, then decides which command to call next. When it determines the test is complete (or has failed), it calls `execution-finished`.
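A stripped-down sketch of that loop (all names here are illustrative; the real agent in `packages/engine` drives actual drivers and calls the LLM through the Vercel AI SDK):

```ts
// Illustrative only - a minimal decide/execute/record loop.
type AgentAction =
  | { command: "click" | "type" | "scroll" | "assert" | "wait"; target: string }
  | { command: "execution-finished"; success: boolean };

interface Step {
  action: AgentAction;
  screenshot: string;
}

// Stand-in for the LLM call: sees the instruction and prior steps and
// returns the next command. The real system also sends screenshots.
function decideNextAction(instruction: string, steps: Step[]): AgentAction {
  if (steps.length >= 2) return { command: "execution-finished", success: true };
  return { command: "click", target: `element for step ${steps.length + 1}` };
}

function runAgentLoop(instruction: string): Step[] {
  const steps: Step[] = [];
  for (;;) {
    const screenshot = `screenshot-${steps.length}`; // capture current screen
    const action = decideNextAction(instruction, steps); // LLM decides
    if (action.command === "execution-finished") return steps; // done
    steps.push({ action, screenshot }); // record the step, then repeat
  }
}

console.log(runAgentLoop("log in and open settings").length); // → 2
```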
### 4. AI-powered element detection
Instead of CSS selectors or XPaths, the agent uses vision models to find UI elements. The `PointDetector` takes a screenshot and a natural language description (“the blue Submit button”) and returns pixel coordinates. This is what makes tests resilient to UI changes - the AI adapts to visual changes automatically.
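The contract is roughly the following (stubbed here; the actual `PointDetector` in `packages/ai` calls a vision model, and its real signature may differ):

```ts
interface Point {
  x: number;
  y: number;
}

// Stub: the real detector sends the screenshot and description to a
// vision model and parses pixel coordinates out of the response.
class PointDetector {
  async detect(screenshot: Uint8Array, description: string): Promise<Point> {
    void screenshot;
    void description;
    return { x: 420, y: 120 }; // fixed answer for illustration
  }
}

const detector = new PointDetector();
const point = await detector.detect(new Uint8Array(), "the blue Submit button");
// The agent then clicks at (point.x, point.y) through the platform's MouseDriver.
```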
### 5. Results and artifacts
Every test run produces:
* Step-by-step execution log with before/after screenshots
* Video recording of the entire session
* AI conversation log (what the model “thought” at each step)
* Success/failure status with reasoning
These artifacts are stored in S3 and accessible through the UI.
## Key design decisions
### ESM-only
Every `package.json` has `"type": "module"`. No CommonJS anywhere. This eliminates an entire class of import/export bugs and aligns with the direction of the Node.js ecosystem.
### Strictest TypeScript
All strict flags enabled, including `noUncheckedIndexedAccess` (array/object access returns `T | undefined`) and `exactOptionalPropertyTypes`. This catches real bugs at compile time. It’s more work upfront, but prevents entire categories of runtime errors.
### Constructor injection
All dependencies are passed through constructors. No DI framework, no decorators, no magic. You can read any class and immediately see what it depends on.
### Separate Docker images
Each engine (web, mobile) and each job type gets its own Docker image. This keeps images small and deployment independent. A change to the web engine doesn’t require redeploying the mobile engine.
### Platform-agnostic agent core
All execution logic lives in `packages/engine`. Platform-specific apps (`engine-web`, `engine-mobile`) only implement driver interfaces (`ScreenDriver`, `MouseDriver`, etc.). The same agent loop, command system, and AI integration works for both Playwright and Appium.
## Deployment model
The platform runs on Kubernetes:
* **API** and **UI** are standard deployments with horizontal scaling
* **Engines** run on device-hosting machines (physical or virtual). Web engines need browsers, mobile engines need connected devices or emulators
* **Jobs** run as Temporal workflows - triggered on demand via Temporal workers
* **Redis** handles distributed device locking across engine instances
* **PostgreSQL** is the single source of truth for all state
# Package Guide
> What each package and app does, what it exports, and when you would modify it.
## Packages
Every package in `packages/` is a shared library consumed by one or more apps. Each has exactly one concern.
### ai
AI primitives used by the execution agent. Contains the model registry (manages LLM instances and providers), visual AI (screenshot analysis, assertion checking, element selection), point detection (locating UI elements from natural language descriptions), object detection (bounding box generation), and structured output generation.
**Key exports:** `ModelRegistry`, `PointDetector`, `ObjectDetector`, `VisualConditionChecker`, `AssertChecker`, `ObjectGenerator`, `AssertionSplitter`
**When to modify:** Adding a new AI model or provider, changing how elements are detected, adjusting assertion logic, or adding a new visual AI capability.
### analytics
PostHog server-side event tracking. Wraps `posthog-node` with Sentry trace linking. No-ops when not initialized, so it’s safe to import in dev and test environments.
**Key exports:** `analytics` (singleton)
**When to modify:** Adding new server-side analytics events, changing event properties, or adjusting the PostHog integration.
### billing
Subscription and billing logic. Handles plan management, usage tracking, and payment integration.
**Key exports:** Billing service classes and plan definitions
**When to modify:** Changing pricing plans, adding billing features, or integrating new payment providers.
### blacklight
Shared UI component library built on Radix UI + Tailwind CSS v4 + CVA. This is where all reusable frontend components live - buttons, cards, inputs, dialogs, tables, and more. Follows shadcn/ui patterns.
**Key exports:** `Button`, `Card`, `Input`, `Dialog`, `Table`, `Select`, `cn()`, and many more components
**When to modify:** Adding new UI components, updating component styles, or changing the design system. The path alias `@/*` maps to `packages/blacklight/src/*` inside the package.
### db
Prisma schema and generated client for PostgreSQL. This is the single source of truth for the database structure.
**Key exports:** `PrismaClient`, generated types for all models
**When to modify:** Adding or changing database tables, columns, relations, or indexes. After editing the schema, run `pnpm db:generate` and `pnpm db:migrate`.
### diffs
Test diff computation. Computes differences between test suite versions for change tracking and review.
**Key exports:** Diff computation functions
**When to modify:** Changing how test diffs are calculated or displayed.
### emulator
Mobile emulator management. Handles lifecycle management of iOS simulators and Android emulators.
**Key exports:** Emulator management classes
**When to modify:** Adding support for new device types, changing emulator configuration, or adjusting lifecycle management.
### engine
The core of test execution. This is a platform-agnostic AI agent that web and mobile engines extend. Contains the execution agent loop, command system (click, type, scroll, assert), driver interfaces, runner orchestration, and artifact management.
Everything is parameterized with generics (`TSpec` for command specs, `TContext` for driver context), so the same agent core works for both Playwright and Appium.
**Key exports:** `ExecutionAgent`, `ExecutionAgentRunner`, `AgentCommand`, `CommandRegistry`, driver interfaces (`ScreenDriver`, `MouseDriver`, `KeyboardDriver`, `NavigationDriver`, `ApplicationDriver`)
**When to modify:** Adding new commands to the agent, changing the execution loop, adjusting the system prompt, or modifying how steps are recorded.
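A simplified sketch of that parameterization (the real interfaces in `packages/engine` differ in detail):

```ts
// A command spec describes the command; TContext is whatever the platform provides.
interface CommandSpec {
  name: string;
}

abstract class AgentCommand<TSpec extends CommandSpec, TContext> {
  constructor(public readonly spec: TSpec) {}
  abstract execute(ctx: TContext, args: Record<string, unknown>): Promise<string>;
}

// A platform-specific context, e.g. backed by Playwright on the web.
interface WebContext {
  click(x: number, y: number): Promise<void>;
}

class ClickCommand extends AgentCommand<{ name: "click" }, WebContext> {
  async execute(ctx: WebContext, args: Record<string, unknown>): Promise<string> {
    await ctx.click(Number(args.x), Number(args.y));
    return `clicked at ${args.x},${args.y}`;
  }
}

const clicks: Array<[number, number]> = [];
const ctx: WebContext = {
  async click(x, y) {
    clicks.push([x, y]);
  },
};
const message = await new ClickCommand({ name: "click" }).execute(ctx, { x: 10, y: 20 });
console.log(message); // → clicked at 10,20
```

Because the command core only knows `TContext`, the same `ClickCommand` logic could be reused with an Appium-backed context.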
### errors
Custom error hierarchy for the project. All errors extend `AutonomaError` with specific subclasses for different failure types.
**Key exports:** `AutonomaError`, `TestError`, `DriverError`, `PreconditionError`, `VerificationError`, `ThirdPartyError`
**When to modify:** Adding new error types or changing how errors are categorized.
### image
Image processing utilities. Handles screenshot manipulation, resizing, and format conversion used throughout the execution pipeline.
**Key exports:** Image processing functions
**When to modify:** Changing how screenshots are processed, adding new image operations, or adjusting compression settings.
### integration-test
Test harness using Testcontainers. Provides `IntegrationHarness` and `integrationTestSuite` for writing integration tests that use real PostgreSQL and Redis containers.
**Key exports:** `IntegrationHarness`, `integrationTestSuite`
**When to modify:** Changing the test harness setup, adding new test utilities, or supporting new infrastructure in tests.
### k8s
Kubernetes helpers. Utilities for interacting with the K8s API, managing pods, and reading cluster state.
**Key exports:** Kubernetes client wrappers and helpers
**When to modify:** Changing how the platform interacts with Kubernetes, or adding new K8s operations.
### logger
Sentry-based structured logging. Provides a logger that integrates with Sentry for error tracking, performance monitoring, and structured context.
**Key exports:** `logger` (root logger), `Logger` type
**When to modify:** Changing the logging format, adjusting Sentry integration, or adding new logging capabilities.
### review
Post-execution AI review. Analyzes test execution recordings and results to validate whether tests passed correctly.
**Key exports:** Review service classes
**When to modify:** Changing how test results are reviewed, adjusting AI review prompts, or adding new review criteria.
### scenario
Environment Factory scenario logic. Handles test scenario definitions, data seeding, and teardown for isolated test environments.
**Key exports:** Scenario classes and types
**When to modify:** Adding new test scenarios, changing how test data is seeded, or adjusting the Environment Factory protocol.
### storage
S3 file storage. Handles uploading and downloading artifacts (screenshots, videos, test results) to S3-compatible storage.
**Key exports:** Storage service classes
**When to modify:** Changing storage providers, adjusting upload/download logic, or adding new artifact types.
### test-updates
Test suite update logic. Handles applying changes to test suites - adding, removing, and modifying test cases.
**Key exports:** Test update service classes
**When to modify:** Changing how test suites are modified, or adding new update operations.
### types
Shared Zod schemas and TypeScript types. This is the contract layer between the API and frontend. Schemas defined here are used for tRPC input validation and frontend type inference.
**Key exports:** Zod schemas for all API inputs/outputs, TypeScript types, constants
**When to modify:** Adding new API endpoints, changing request/response shapes, or adding shared constants.
### utils
Shared utilities that don’t fit into a more specific package.
**Key exports:** Various utility functions
**When to modify:** Adding general-purpose utilities used across multiple packages.
### workflow
Temporal workflow definitions and client. Orchestrates test execution pipelines using Temporal workflows and activities.
**Key exports:** Workflow builder classes
**When to modify:** Changing how test execution is orchestrated, adjusting workflow templates, or adding new pipeline steps.
## Apps
### api
The backend server. Built with Hono (HTTP framework) and tRPC (type-safe API layer). Routers are thin - they wire tRPC procedures to controller files in `controllers//`. One file per procedure.
**When to modify:** Adding new API endpoints, changing business logic, or adjusting authentication.
### ui
The frontend SPA. Built with React 19, Vite, and TanStack Router. Compiled to static files - no SSR. Uses `@autonoma/blacklight` for all UI components.
**When to modify:** Adding new pages, changing the UI, or adjusting frontend behavior.
### engine-web
Playwright-based web test execution. Implements the driver interfaces from `packages/engine` using Playwright’s API. Handles browser lifecycle, screenshot capture, network idle detection, and video recording.
**When to modify:** Changing web-specific test execution behavior, adjusting Playwright configuration, or fixing browser-related issues.
### engine-mobile
Appium-based mobile test execution for iOS and Android. Implements the same driver interfaces using Appium/WebDriver. Uses `@autonoma/device-lock` for Redis-based device allocation.
**When to modify:** Changing mobile-specific test execution behavior, adjusting Appium configuration, or adding support for new device types.
### docs
This documentation site. Built with Astro Starlight and deployed to S3 + CloudFront.
**When to modify:** Adding or updating documentation pages.
### jobs
Background job services, each deployed as a separate Docker image:
| Job | Purpose |
| ------------------------------- | ------------------------------------------------- |
| **generation-reviewer** | AI review of generated test cases |
| **replay-reviewer** | Post-test AI validation using video recordings |
| **run-completion-notification** | Slack/email notifications when test runs complete |
| **generation-assigner** | Assigns generated tests to appropriate scenarios |
| **scenario** | Environment Factory scenario execution |
| **diffs** | Computes test suite diffs |
## Dependency graph
The general dependency flow (simplified):
```plaintext
apps (api, ui, engines, jobs)
|
+-- packages/types (shared schemas - used by almost everything)
+-- packages/db (database - used by api, jobs)
+-- packages/engine (execution core - used by engines)
+-- packages/ai (AI primitives - used by engine, jobs)
+-- packages/try (error handling - used by everything)
+-- packages/logger (logging - used by everything)
+-- packages/errors (error types - used by engine, api)
+-- packages/storage (S3 - used by api, engines, jobs)
+-- packages/blacklight (UI components - used by ui only)
+-- packages/analytics (PostHog - used by api)
+-- packages/workflow (Temporal workflows - used by api, workers)
```
Key relationships:
* `packages/engine` depends on `packages/ai` for all AI operations
* `packages/ai` is self-contained - it only depends on `try`, `logger`, and `image`
* `packages/types` is a leaf dependency - it depends on nothing else in the monorepo
* `packages/try` is a leaf dependency - used everywhere, depends on nothing
* Both `engine-web` and `engine-mobile` depend on `packages/engine` but never on each other
# Code Conventions
> The rules of the Autonoma AI codebase - TypeScript patterns, error handling, logging, testing, and style guidelines.
## ESM-only
Every `package.json` has `"type": "module"`. No CommonJS anywhere in the codebase.
**Never use `.js` extensions in imports.** TypeScript and the bundler resolve modules automatically.
```ts
// Good
import { foo } from "./foo";
import { bar } from "@autonoma/types";
// Bad
import { foo } from "./foo.js";
```
## TypeScript strictness
All strict flags are enabled. Every package extends `tsconfig.base.json`, which includes:
* `strict: true` (enables all strict checks)
* `noUncheckedIndexedAccess` - array and object index access returns `T | undefined`
* `exactOptionalPropertyTypes` - optional properties can’t be assigned `undefined` explicitly unless typed that way
* `verbatimModuleSyntax` - enforces explicit `type` imports
In practice, this means:
* You must check array access results before using them
* You must narrow types before passing them to functions that expect non-nullable values
* You must use `import type { ... }` for type-only imports
## Classes vs functions
**Needs state or dependencies?** Use a class with constructor injection.
**Pure logic with no state?** Use a function file.
In practice, almost everything is a class because most logic needs a logger, a database client, or some other dependency.
## Dependency injection
Plain constructor injection. No DI framework, no decorators.
```ts
class StepExecutor {
private readonly logger: Logger;
constructor(
private readonly engine: Engine,
private readonly db: PrismaClient,
) {
this.logger = logger.child({ name: this.constructor.name });
}
}
```
You can read any class constructor and immediately see all its dependencies. No magic, no hidden state.
## One export per file
A file exports exactly one thing - a class, a function, or a type. The exported item tells the story top-to-bottom. Private helpers follow in call order.
This keeps files focused and makes imports predictable.
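A sketch of the layout (names are illustrative):

```ts
// step-executor.ts - the single export comes first and tells the story.
export class StepExecutor {
  run(input: string): string {
    return normalize(input);
  }
}

// Private helpers (not exported) follow in the order they are called.
function normalize(input: string): string {
  return input.trim().toLowerCase();
}
```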
## Custom error hierarchy
```plaintext
AutonomaError (base)
  TestError          - test execution failures
  DriverError        - Appium/Playwright driver failures
  PreconditionError  - setup/precondition failures
  VerificationError  - assertion failures
  ThirdPartyError    - external service failures
```
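Defining and catching by category looks roughly like this (the constructor shapes here are assumptions):

```ts
class AutonomaError extends Error {
  constructor(message: string) {
    super(message);
    this.name = this.constructor.name; // subclass name shows up in logs
  }
}

class VerificationError extends AutonomaError {}

try {
  throw new VerificationError("expected the dashboard to be visible");
} catch (err) {
  // One instanceof check catches every error in the hierarchy.
  if (err instanceof AutonomaError) {
    console.log(`${err.name}: ${err.message}`);
  } else {
    throw err;
  }
}
```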
## Prefer undefined over null
Always use `undefined` as the absence-of-value sentinel. Use optional properties (`?`) instead of `| null` types. Never initialize to `null`.
```ts
// Good
private timeout?: number
// Bad
private timeout: number | null = null
```
This applies everywhere: class properties, function parameters, return types, object shapes.
## Nullish checks
Always `??`, never `||`. Always `!= null` / `== null`, never truthy/falsy checks.
```ts
// Good
const timeout = config.timeout ?? 3000;
if (element != null) { /* ... */ }
// Bad - truthy/falsy has unexpected behavior with 0, "", false
const timeout = config.timeout || 3000; // 0 becomes 3000!
if (element) { /* ... */ }
```
The `!= null` check covers both `null` and `undefined`, which is exactly what you want.
## Early returns
Always prefer early returns to reduce nesting. If a function has deeply nested `if` blocks, extract the inner logic into a separate function with guard clauses.
```ts
// Good
function processOrder(order: Order): Result {
if (order.status === "cancelled") throw new OrderCancelledError();
if (order.items.length === 0) throw new EmptyOrderError();
return calculateTotal(order);
}
// Bad - deeply nested
function processOrder(order: Order): Result {
if (order.status !== "cancelled") {
if (order.items.length > 0) {
return calculateTotal(order);
}
}
// ...
}
```
## No complex destructuring or spread
If constructing an object requires multiple `...` spreads or ternary-based spreads, build the object explicitly instead.
```ts
// Good
const permissions = isAdmin ? allPermissions : readOnly;
return {
name: baseConfig.name,
timeout: baseConfig.timeout,
permissions,
retries: overrides.retries ?? baseConfig.retries,
};
// Bad
return {
...baseConfig,
...((isAdmin) ? { permissions: allPermissions } : { permissions: readOnly }),
...overrides,
};
```
## Extract complex conditions
If a condition isn’t immediately obvious, extract it into a descriptively named variable.
```ts
// Good
const isTrialExpired = subscription.status === "trial" && subscription.endsAt < now;
const hasNoPaymentMethod = user.paymentMethods.length === 0;
if (isTrialExpired && hasNoPaymentMethod) { /* ... */ }
// Bad - what does this check?
if (subscription.status === "trial" && subscription.endsAt < now && user.paymentMethods.length === 0) { /* ... */ }
```
## Avoid let + conditional assignment
Instead of using `let` and assigning in `if/else` blocks, extract a function with early returns.
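For example (names are illustrative):

```ts
type RunStatus = "passed" | "failed" | "pending";

// Good - a small function with early returns, no mutable binding
function statusLabel(status: RunStatus): string {
  if (status === "passed") return "Passed";
  if (status === "failed") return "Failed";
  return "Pending";
}

// Bad - let plus branch-by-branch assignment
function statusLabelWithLet(status: RunStatus): string {
  let label: string;
  if (status === "passed") {
    label = "Passed";
  } else if (status === "failed") {
    label = "Failed";
  } else {
    label = "Pending";
  }
  return label;
}
```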
## Logging with Sentry
Every class and every function file must have logging. When in doubt, add a log. Overlogging is always better than underlogging.
### What to log
* Service startup and configuration
* Incoming requests and their resolution (success/failure)
* External API calls (start, success, failure)
* State transitions (agent steps, job status changes)
* Resource acquisition/release (device locks, browser sessions)
* Every public method entry with relevant parameters
* Every method exit with relevant results
Use structured context (Sentry breadcrumbs, tags, extra data) so logs are searchable. Never log sensitive data (credentials, tokens).
### Class logger pattern
Every class gets a `private readonly logger` instance, created in the constructor as a child of the root logger with the class name and identifying context.
```ts
import { type Logger, logger } from "@autonoma/logger";
export class TestSuiteUpdater {
private readonly logger: Logger;
constructor(private readonly snapshotId: string) {
this.logger = logger.child({ name: this.constructor.name, snapshotId });
}
public async apply(change: TestSuiteChange) {
this.logger.info("Applying test suite change", { type: change.constructor.name });
// ... do work ...
this.logger.info("Finished applying change");
}
}
```
### Function logger pattern - called from classes
If a reusable function is called from a class method, accept a `Logger` parameter to preserve the logging context chain.
```ts
import type { Logger } from "@autonoma/logger";
export function computeChanges(branchId: string, logger: Logger) {
logger.info("Computing changes", { branchId });
// ... do work ...
logger.info("Changes computed", { count: changes.length });
return changes;
}
```
### Function logger pattern - standalone files
If a file exports independently useful functions (not called from a single class), import the root logger and create a child per function.
```ts
import { logger as rootLogger } from "@autonoma/logger";
export function syncDevices(deviceIds: string[]) {
const logger = rootLogger.child({ name: "syncDevices" });
logger.info("Syncing devices", { count: deviceIds.length });
// ... do work ...
logger.info("Devices synced");
}
```
## Testing
### Philosophy
* **Vitest** for all tests
* **Prefer integration tests** over unit tests. Test the real thing, not mocks
* **Never mock the database.** Use Testcontainers with a real PostgreSQL container
* Only test what makes sense - don’t test trivial getters
### Setup
Test files go in `test/` directories that mirror the `src/` structure. File naming: `*.test.ts`.
For integration tests that need a database, use the `@autonoma/integration-test` package:
```ts
import { integrationTestSuite } from "@autonoma/integration-test";
integrationTestSuite("MyService", (harness) => {
it("should create a record", async () => {
const db = harness.db;
// ... test with a real database
});
});
```
The harness spins up a real PostgreSQL container via Testcontainers, runs migrations, and gives you a fresh database for each test suite.
### Running tests
```bash
pnpm test # run all tests
pnpm test --filter=ai # run tests in a specific package
```
## Database transactions
Wrap sequential database queries in a Prisma `$transaction` when they must be consistent. If a service method reads then writes (or writes to multiple tables), use `$transaction`:
```ts
async createGeneration(userId: string, orgId: string, appId: string) {
return await this.db.$transaction(async (tx) => {
const app = await tx.application.findFirst({
where: { id: appId, organizationId: orgId },
});
if (app == null) throw new Error("Application not found");
const generation = await tx.applicationGeneration.create({
data: { /* ... */ },
});
await tx.onboardingState.upsert({
where: { applicationId: appId },
/* ... */
});
return { id: generation.id };
});
}
```
Pass `tx` to all queries inside the transaction - not the original `db` client.
## Adding dependencies
**Always check `pnpm-workspace.yaml` first.** The catalog section defines pinned versions for shared dependencies. When adding a dependency:
1. Check if it already exists in the `catalog:` section
2. If it does, use `"catalog:"` as the version in `package.json`
3. If it doesn’t, consider whether it should be added to the catalog (used by multiple packages) or pinned locally
```jsonc
// Good - uses catalog version
"dependencies": {
"zod": "catalog:"
}
// Bad - hardcodes a version when a catalog entry exists
"dependencies": {
"zod": "^3.23.0"
}
```
## Environment variables
Never read `process.env` directly. Define all environment variables in a dedicated `env.ts` file using `createEnv` from `@t3-oss/env-core` with Zod schemas:
```ts
import { createEnv } from "@t3-oss/env-core";
import { z } from "zod";
export const env = createEnv({
server: {
DATABASE_URL: z.string().url(),
REDIS_URL: z.string().url(),
BETTER_AUTH_SECRET: z.string().min(1),
},
runtimeEnv: process.env,
});
```
This gives you type safety, runtime validation, and a single source of truth for all required variables. Pass validated env values as function parameters rather than reading `process.env` in library code.
# Common Workflows
> Step-by-step guides for common development tasks - adding routes, pages, commands, models, tests, and more.
This page covers the most common development tasks you will perform in the Autonoma monorepo. Each workflow is a step-by-step guide with file paths and code patterns.
## Adding a New tRPC Route
Types flow through tRPC from API to frontend. Never manually define API response types on the frontend.
**1. Define Zod schemas** in `packages/types/src/schemas/`:
`packages/types/src/schemas/my-feature.ts`
```ts
import z from "zod";
export const myFeatureInput = z.object({
name: z.string(),
organizationId: z.string(),
});
export const myFeatureOutput = z.object({
id: z.string(),
createdAt: z.date(),
});
```
**2. Create a controller** in `apps/api/src/controllers//.ts`. Controllers hold all business logic:
`apps/api/src/controllers/myFeature/create.ts`
```ts
import type { PrismaClient } from "@autonoma/db";
import type { z } from "zod";
import type { myFeatureInput } from "@autonoma/types";
export async function createMyFeature(
db: PrismaClient,
input: z.infer<typeof myFeatureInput>,
) {
return db.myFeature.create({
data: { name: input.name, organizationId: input.organizationId },
});
}
```
**3. Create or update the router** in `apps/api/src/routers/`. Routers are thin wiring - they delegate to controllers:
`apps/api/src/routers/my-feature.ts`
```ts
import { router, protectedProcedure } from "../trpc";
import { myFeatureInput } from "@autonoma/types";
import { createMyFeature } from "../controllers/myFeature/create";
export const myFeatureRouter = router({
create: protectedProcedure
.input(myFeatureInput)
.mutation(async ({ ctx, input }) => {
return createMyFeature(ctx.db, input);
}),
});
```
**4. Add to `appRouter`** in `apps/api/src/router.ts` (if this is a new router):
```ts
export const appRouter = router({
// ...existing routers
myFeature: myFeatureRouter,
});
```
**5. Use on the frontend.** For queries, use `useSuspenseQuery` with `queryOptions`:
```ts
const { data } = useSuspenseQuery(
trpc.myFeature.list.queryOptions({ organizationId }),
);
```
For mutations, use `useAPIMutation` with `mutationOptions`:
```ts
const createMutation = useAPIMutation(
trpc.myFeature.create.mutationOptions(),
);
```
## Adding a New Page
TanStack Router with file-based routing makes this straightforward.
**1. Create a route file** in `apps/ui/src/routes/`:
`apps/ui/src/routes/my-feature.tsx`
```ts
import { createFileRoute } from "@tanstack/react-router";
export const Route = createFileRoute("/my-feature")({
component: MyFeaturePage,
});
function MyFeaturePage() {
return <div>My Feature</div>;
}
```
**2. That’s it.** The TanStack Router plugin auto-generates the route tree. The page is immediately accessible at `/my-feature`.
For pages that need data, add a `loader`:
```ts
export const Route = createFileRoute("/my-feature")({
loader: ({ context }) => {
context.queryClient.ensureQueryData(
trpc.myFeature.list.queryOptions(),
);
},
component: MyFeaturePage,
});
```
## Database Schema Changes
**1. Edit the schema** at `packages/db/prisma/schema.prisma`.
**2. Create a migration:**
```bash
pnpm db:migrate
```
This generates a migration file and applies it to your local database.
**3. Regenerate the Prisma client:**
```bash
pnpm db:generate
```
**4. Run typecheck** to catch any type errors from the schema change:
```bash
pnpm typecheck
```
If multiple queries in a service method need to be consistent (read-then-write, or writes to multiple tables), wrap them in a Prisma `$transaction`:
```ts
return await this.db.$transaction(async (tx) => {
const existing = await tx.myTable.findFirst({ where: { id } });
if (existing == null) throw new Error("Not found");
return tx.myTable.update({ where: { id }, data: { ... } });
});
```
## Adding a New Command to the Execution Agent
See the [Execution Agent](/architecture/execution-agent/#adding-a-new-command) page for a detailed walkthrough. The short version:
**1. Define the spec** with a `CommandSpec` interface and Zod schema in `packages/engine/src/commands/commands//.def.ts`.
**2. Implement the command** by extending `Command` in `packages/engine/src/commands/commands//.command.ts`.
**3. Create the tool wrapper** by extending `CommandTool` in `packages/engine/src/execution-agent/agent/tools/commands/.tool.ts`.
**4. Add the spec** to the union type in `packages/engine/src/commands/command-defs.ts`.
**5. Register the tool** in the `ExecutionAgentFactory` subclass for the relevant platform(s).
**6. Write tests** in `packages/engine/src/commands/commands//.test.ts`. Use the test utilities in `packages/engine/src/commands/test-utils/` for fake drivers and model registries.
## Adding a New AI Model
See the [AI Package](/architecture/ai-package/#adding-a-new-model) page for full details. The short version:
**1. Add the model entry** to `MODEL_ENTRIES` in `packages/ai/src/registry/model-entries.ts`:
```ts
MY_MODEL: {
createModel: () => googleProvider.getModel("my-model-id"),
pricing: simpleCostFunction({
inputCostPerM: 0.5,
outputCostPerM: 1.5,
}),
},
```
**2. Add a provider** in `packages/ai/src/registry/providers.ts` if the model uses a new provider. Add the API key to `packages/ai/src/env.ts` using `createEnv`.
**3. Use it** via `registry.getModel({ model: "MY_MODEL", tag: "my-use-case" })`.
## Running and Writing Tests
Vitest is used everywhere. Every package has it installed.
### Running Tests
```bash
# Run all tests across the monorepo
pnpm test
# Run tests for a specific package
pnpm --filter @autonoma/engine test
# Run a specific test file
pnpm --filter @autonoma/ai test -- src/visual/assert-checker.test.ts
# Run in watch mode
pnpm --filter @autonoma/engine test -- --watch
```
### Writing Tests
**Prefer integration tests over unit tests.** Only test what provides value - don’t test trivial getters.
Test files go in `test/` directories or alongside source files as `*.test.ts`.
**Never mock the database.** For tests that need a database, use Testcontainers with a real PostgreSQL container via the `@autonoma/integration-test` package:
```ts
import { expect, it } from "vitest";
import { integrationTestSuite } from "@autonoma/integration-test";

integrationTestSuite("MyService", ({ getDb }) => {
  it("creates a record", async () => {
    const db = getDb();
    const result = await myService.create(db, { name: "test" });
    expect(result.name).toBe("test");
  });
});
```
For command tests, use the fake drivers in `packages/engine/src/commands/test-utils/`:
```ts
import { FakeScreenDriver } from "../test-utils/fake-screen.driver";
import { FakeMouseDriver } from "../test-utils/fake-mouse.driver";
```
## Working with the UI Component Library
All frontend components come from `@autonoma/blacklight`, built on Radix UI + Tailwind CSS v4 + CVA.
### Using Components
```tsx
import { Button, Card, Input, cn } from "@autonoma/blacklight";
function MyComponent() {
  return (
    <Card className={cn("p-4")}>
      <Input placeholder="Project name" />
      <Button>Save</Button>
    </Card>
  );
}
```
### Icons
Use Lucide React for all icons:
```tsx
import { Plus, Settings } from "lucide-react";
```
### Custom Variants
Use CVA (class-variance-authority) for component variants:
```tsx
import { cva } from "class-variance-authority";

const badgeVariants = cva("rounded-full px-2 py-0.5 text-xs font-medium", {
  variants: {
    status: {
      active: "bg-green-100 text-green-800",
      inactive: "bg-gray-100 text-gray-800",
    },
  },
});
```
## Adding Environment Variables
Never read `process.env` directly. Always use `createEnv` from `@t3-oss/env-core`.
**1. Define the variable** in a dedicated `env.ts` file for the package or app:
packages/my-package/src/env.ts
```ts
import { createEnv } from "@t3-oss/env-core";
import z from "zod";

export const env = createEnv({
  server: {
    MY_API_KEY: z.string().min(1),
    MY_TIMEOUT: z.coerce.number().default(5000),
  },
  emptyStringAsUndefined: true,
  runtimeEnv: process.env,
});
```
**2. Use the validated env** in your code:
```ts
import { env } from "./env";
const client = new MyClient({ apiKey: env.MY_API_KEY });
```
**3. For library code**, prefer passing values as function parameters rather than reading env directly. This keeps the library testable and reusable:
```ts
// Good - library accepts config
// Good - library accepts config
export class MyService {
  constructor(private readonly apiKey: string) {}
}

// App wires it up with env
const service = new MyService(env.MY_API_KEY);
```
**4. Check the catalog** in `pnpm-workspace.yaml` before adding `@t3-oss/env-core` as a dependency. If it is already in the catalog, use `"@t3-oss/env-core": "catalog:"` in your `package.json`.
## Adding Dependencies
Before adding any dependency, check `pnpm-workspace.yaml` for the catalog:
```bash
# Check if the package exists in the catalog
grep "my-package" pnpm-workspace.yaml
```
If the package is in the catalog, use `catalog:` as the version:
```json
{
  "dependencies": {
    "zod": "catalog:"
  }
}
```
If it is not in the catalog but will be shared across multiple packages, consider adding it there first.
Then install:
```bash
pnpm install
```
## Building and Type Checking
```bash
# Build everything (Turborepo handles dependency order)
pnpm build
# Type check all packages
pnpm typecheck
# Lint all packages
pnpm lint
# Run dev servers (web on 3000, API on 4000)
pnpm dev
```
All packages are ESM-only. Never use `.js` extensions in imports - TypeScript resolves modules automatically.
# Environment Variables
> Complete reference for every environment variable used across the Autonoma AI monorepo - API server, frontend, AI services, database, storage, logging, billing, and infrastructure.
## Quick Start - Minimum for Local Development
To get the API and UI running locally, you need a surprisingly small set of variables. Copy `.env.example` to `.env` at the repo root and fill in these essentials:
```bash
# Database
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/autonoma
# Redis
REDIS_URL=redis://localhost:6379
# API server
API_PORT=4000
SCENARIO_ENCRYPTION_KEY=any-string-at-least-1-char
# Google OAuth (create credentials at console.cloud.google.com)
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret
# AI model keys (needed for test execution)
GEMINI_API_KEY=your-gemini-key
GROQ_KEY=your-groq-key
OPENROUTER_API_KEY=your-openrouter-key
# S3-compatible storage (can use MinIO locally)
S3_BUCKET=autonoma-local
S3_REGION=us-east-1
S3_ACCESS_KEY_ID=minioadmin
S3_SECRET_ACCESS_KEY=minioadmin
```
Everything else has sensible defaults or is optional for local development. The sections below cover every variable in detail.
## How Environment Variables Work in This Project
Every app and package defines its environment variables in a dedicated `env.ts` file using [`createEnv` from `@t3-oss/env-core`](https://env.t3.gg/). This gives you:
* **Zod validation at startup** - the process crashes immediately if a required variable is missing or malformed, rather than failing mysteriously at runtime.
* **Type safety** - `env.DATABASE_URL` is typed as `string`, not `string | undefined`. No more `process.env.DATABASE_URL!` casts.
* **Composability** - packages export their `env` object, and apps extend them. For example, the API server’s `env.ts` extends the database, storage, logger, and billing envs, inheriting all their variables.
You should **never read `process.env` directly** in application code. Always import from the nearest `env.ts`:
```ts
// Good
import { env } from "./env";
const port = env.API_PORT;
// Bad - bypasses validation
const port = process.env.API_PORT;
```
The `emptyStringAsUndefined: true` option is enabled everywhere, so setting a variable to an empty string is treated the same as not setting it at all.
For boolean variables, the codebase uses `z.stringbool()` which accepts `"true"`, `"false"`, `"1"`, `"0"`, `"yes"`, and `"no"`.
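Putting the two conventions together, a boolean flag might be declared like this (`FEATURE_FLAG` is a made-up variable name for illustration):

```typescript
import { createEnv } from "@t3-oss/env-core";
import z from "zod";

export const env = createEnv({
  server: {
    // z.stringbool() accepts "true"/"false", "1"/"0", "yes"/"no" in .env
    FEATURE_FLAG: z.stringbool().default(false),
  },
  // Setting FEATURE_FLAG="" behaves the same as not setting it
  emptyStringAsUndefined: true,
  runtimeEnv: process.env,
});
```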
***
## Core API Server
**Source:** `apps/api/src/env.ts`
The API server extends the database, storage, logger, and billing environments, so all variables from those sections apply here too.
| Variable | Required | Default | Description |
| ------------------------- | -------- | -------------------------- | -------------------------------------------------------------------------------------------------------- |
| `API_PORT` | Yes | - | Port the API server listens on. Typically `4000`. |
| `INTERNAL_DOMAIN` | No | `autonoma.app` | Internal domain used for routing and service discovery. |
| `ALLOWED_ORIGINS` | No | `http://localhost:3000` | Comma-separated list of CORS origins. Must include the frontend URL. |
| `SCENARIO_ENCRYPTION_KEY` | Yes | - | Key used to encrypt scenario data. Any non-empty string works for local dev. |
| `GOOGLE_CLIENT_ID` | Yes | - | OAuth 2.0 client ID from Google Cloud Console. Required for user authentication. |
| `GOOGLE_CLIENT_SECRET` | Yes | - | OAuth 2.0 client secret from Google Cloud Console. |
| `AGENT_VERSION` | No | `latest` | Version tag for the execution agent. Used when dispatching engine jobs. |
| `POSTHOG_KEY` | No | - | PostHog project API key for server-side analytics. Omit to disable analytics. |
| `POSTHOG_HOST` | No | `https://us.i.posthog.com` | PostHog ingestion endpoint. Override for self-hosted PostHog instances. |
| `GEMINI_API_KEY` | Yes | - | Google Gemini API key. Used by the API for AI features like test generation. |
| `REDIS_URL` | Yes | - | Redis connection string (e.g., `redis://localhost:6379`). Used for device locking, caching, and pub/sub. |
| `TESTING` | No | `false` | Set to `true` in test environments. Prevents importing certain modules. Not for general use. |
| `ENGINE_BILLING_SECRET` | No | - | Shared secret for authenticating billing calls from the engine. |
***
## Frontend (UI)
**Source:** `apps/ui/src/env.ts`
The frontend uses Vite’s `import.meta.env` and requires the `VITE_` prefix for all variables.
| Variable | Required | Default | Description |
| ---------------------- | -------- | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `VITE_API_URL` | No | `http://localhost:4000` | URL of the API server. The frontend makes all tRPC calls to this address. |
| `VITE_INTERNAL_DOMAIN` | No | `autonoma.app` | Internal domain, used for UI routing logic. |
| `VITE_TEMPORAL_URL` | No | - | URL of the Temporal UI. When set, enables links to workflow runs in the dashboard. |
| `VITE_SENTRY_DSN` | No | - | Sentry DSN for frontend error tracking. Omit to disable Sentry in the browser. |
| `VITE_SENTRY_URL` | No | - | Sentry organization URL. Used for linking to Sentry issues from the UI. |
| `VITE_POSTHOG_KEY` | No | - | PostHog project API key for frontend analytics. Omit to disable analytics. PostHog events are proxied through the API server at `/ingest` to bypass ad blockers. |
***
## Database
**Source:** `packages/db/src/env.ts`
| Variable | Required | Default | Description |
| -------------- | -------- | ------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| `DATABASE_URL` | Yes | - | PostgreSQL connection string. Format: `postgresql://user:password@host:port/database`. Used by Prisma for all database operations. |
> **Note:**
>
> For local development, a typical value is `postgresql://postgres:postgres@localhost:5432/autonoma`. Make sure PostgreSQL is running and the database exists before starting the API.
***
## AI Services
**Source:** `packages/ai/src/env.ts`
These keys are required by the execution engines (web and mobile) and any service that runs AI inference. The API server only needs `GEMINI_API_KEY` directly - the other keys are consumed by the engine apps.
| Variable | Required | Default | Description |
| -------------------- | -------- | ------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `GEMINI_API_KEY` | Yes | - | Google Gemini API key. Used for the primary model (Gemini 3 Flash/Pro), point detection, object detection, and visual condition checking. |
| `GROQ_KEY` | Yes | - | Groq API key. Used for fast inference with open-source models (e.g., GPT-OSS-120B). |
| `OPENROUTER_API_KEY` | Yes | - | OpenRouter API key. Provides access to Ministral-8B and serves as a fallback provider for open-source models. |
> **Note:**
>
> Validation is skipped when running in Vitest (`VITEST` env var is set), so you do not need these keys to run unit tests.
***
## Storage (S3)
**Source:** `packages/storage/src/env.ts`
Used for storing screenshots, video recordings, test artifacts, and other binary assets.
| Variable | Required | Default | Description |
| ---------------------- | -------- | ------- | ------------------------------------------------------------------ |
| `S3_BUCKET` | Yes | - | S3 bucket name for storing artifacts. |
| `S3_REGION` | Yes | - | AWS region of the S3 bucket (e.g., `us-east-1`). |
| `S3_ACCESS_KEY_ID` | Yes | - | AWS access key ID (or MinIO equivalent) for S3 authentication. |
| `S3_SECRET_ACCESS_KEY` | Yes | - | AWS secret access key (or MinIO equivalent) for S3 authentication. |
> **Local development with MinIO:**
>
> You can run [MinIO](https://min.io/) locally as an S3-compatible object store. The default credentials are `minioadmin`/`minioadmin`. Point `S3_REGION` to any valid region string (e.g., `us-east-1`) and create a bucket matching your `S3_BUCKET` value.
***
## Logging and Observability
**Source:** `packages/logger/src/env.ts`
| Variable | Required | Default | Description |
| ---------------- | -------- | ------------- | ---------------------------------------------------------------------------------------------------------- |
| `NODE_ENV` | No | `development` | Node environment. Accepts `development`, `production`, or `test`. Affects log formatting and behavior. |
| `SENTRY_DSN` | No | - | Sentry DSN for backend error tracking and performance monitoring. Omit to disable Sentry. |
| `SENTRY_ENV` | No | `production` | Sentry environment tag (e.g., `staging`, `production`). |
| `SENTRY_RELEASE` | No | `unknown` | Sentry release identifier. Typically set to the git SHA or version tag in CI. |
| `DEBUG` | No | - | Debug filter string. When set, enables verbose debug logging for matching namespaces (e.g., `autonoma:*`). |
***
## Billing (Stripe)
**Source:** `packages/billing/src/env.ts`
Billing is entirely optional. When `STRIPE_ENABLED` is `false` (the default), all billing features are disabled and no other Stripe variables are needed.
| Variable | Required | Default | Description |
| ------------------------------ | -------- | ----------------------- | -------------------------------------------------------------------------------------------------------------- |
| `STRIPE_ENABLED` | No | `false` | Master switch for billing. Set to `true` to enable Stripe integration. |
| `STRIPE_SECRET_KEY` | No | - | Stripe secret API key. Required when `STRIPE_ENABLED` is `true`. |
| `STRIPE_WEBHOOK_SECRET` | No | - | Stripe webhook signing secret for verifying incoming webhook events. Required when `STRIPE_ENABLED` is `true`. |
| `STRIPE_SUBSCRIPTION_PRICE_ID` | No | - | Stripe Price ID for the subscription plan. Required when `STRIPE_ENABLED` is `true`. |
| `STRIPE_TOPUP_PRICE_ID` | No | - | Stripe Price ID for credit top-up purchases. Required when `STRIPE_ENABLED` is `true`. |
| `BILLING_GRACE_PERIOD_DAYS` | No | `3` | Number of days after a subscription lapses before access is revoked. |
| `APP_URL` | No | `http://localhost:3000` | Frontend application URL. Used in Stripe checkout redirect URLs and billing emails. |
***
## Kubernetes and Workflows
**Source:** `packages/k8s/src/env.ts` and `packages/workflow/src/env.ts`
These variables are only needed in production or when running engine jobs on Kubernetes. Not required for local development.
| Variable | Required | Default | Description |
| ----------- | ------------ | ------- | ---------------------------------------------------------------------- |
| `NAMESPACE` | Yes (in K8s) | - | Kubernetes namespace where jobs are deployed. Used by `@autonoma/k8s`. |
The workflow package also reads:
| Variable | Required | Default | Description |
| -------------- | -------- | ------- | ----------------------------------------------------------------------------------------------- |
| `DATABASE_URL` | Yes | - | PostgreSQL connection string. The workflow package needs direct DB access for job coordination. |
| `SENTRY_ENV` | No | - | Sentry environment tag for workflow jobs. |
***
## Engine - Web (Playwright)
**Source:** `apps/engine-web/src/platform/env.ts` and `apps/engine-web/src/execution-agent/env.ts`
The web engine extends the AI, database, logger, and storage environments. All variables from those sections apply.
| Variable | Required | Default | Description |
| -------------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------- |
| `REMOTE_BROWSER_URL` | No | - | WebSocket URL of a remote browser instance (e.g., Browserless or Playwright remote). When omitted, launches a local Chromium browser. |
| `HEADLESS` | No | - | Set to any value to run Playwright in headless mode. When omitted, the browser window is visible (useful for local debugging). |
***
## Engine - Mobile (Appium)
**Source:** `apps/engine-mobile/src/platform/env.ts`
The mobile engine extends the AI, database, logger, and storage environments. All variables from those sections apply.
| Variable | Required | Default | Description |
| -------------------------- | -------- | ------- | --------------------------------------------------------------------------------------------------------------- |
| `APPIUM_HOST` | No | - | Hostname of the Appium server. |
| `APPIUM_PORT` | No | - | Port of the Appium server. |
| `APPIUM_MJPEG_PORT` | No | - | Port for the Appium MJPEG video stream. Used for live frame capture during test execution. |
| `APPIUM_SYSTEM_PORT` | No | - | System port used by Appium’s UiAutomator2 (Android) or WebDriverAgent (iOS). |
| `APPIUM_SKIP_INSTALLATION` | No | `true` | When `true`, skips reinstalling the app before each test. Speeds up repeated runs on the same device. |
| `DEVICE_NAME` | No | - | Name of the target device or emulator (e.g., `iPhone 15 Pro`, `Pixel 7`). |
| `IOS_PLATFORM_VERSION` | No | - | iOS version to target (e.g., `17.2`). Required for iOS testing. |
| `ANDROID_DAEMON_HOSTS` | No | - | Comma-separated list of Android daemon host addresses for distributed device access. |
| `IOS_DAEMON_HOSTS` | No | - | Comma-separated list of iOS daemon host addresses for distributed device access. |
| `SKIP_DEVICE_DATE_UPDATE` | No | `false` | When `true`, skips updating the device date/time before tests. Useful when the device clock is already correct. |
***
## Jobs
### Execution Agent Runner
**Source:** `packages/engine/src/execution-agent/runner/env.ts`
| Variable | Required | Default | Description |
| -------------- | -------- | ------- | ------------------------------------------------------------------------------------------------------------------------ |
| `ARTIFACT_DIR` | No | - | Local directory for saving test artifacts (screenshots, videos, step logs). Used by the local runner during development. |
### Run Completion Notification
**Source:** `apps/jobs/run-completion-notification/src/env.ts`
| Variable | Required | Default | Description |
| ----------------------- | -------- | ------- | ------------------------------------------------------- |
| `DATABASE_URL` | Yes | - | PostgreSQL connection string. |
| `API_URL` | No | - | API server URL for callbacks. |
| `ENGINE_BILLING_SECRET` | No | - | Shared secret for authenticating billing-related calls. |
| `STRIPE_ENABLED` | No | `false` | Whether to process billing events on run completion. |
### Diffs
**Source:** `apps/jobs/diffs/src/env.ts`
| Variable | Required | Default | Description |
| --------------------------- | -------- | -------- | ----------------------------------------------- |
| `BRANCH_ID` | Yes | - | Branch identifier for computing diffs. |
| `GEMINI_API_KEY` | Yes | - | Gemini API key for AI-powered diff analysis. |
| `GITHUB_APP_ID` | Yes | - | GitHub App ID for repository access. |
| `GITHUB_APP_PRIVATE_KEY` | Yes | - | GitHub App private key (PEM format). |
| `GITHUB_APP_WEBHOOK_SECRET` | Yes | - | GitHub App webhook secret for verifying events. |
| `AGENT_VERSION` | No | `latest` | Version tag for the diff agent. |
### Generation Assigner
**Source:** `apps/jobs/generation-assigner/src/env.ts`
| Variable | Required | Default | Description |
| --------------- | -------- | ------- | ----------------------------------------------------------------------------- |
| `AUTO_ACTIVATE` | No | - | When set, automatically activates generated test cases without manual review. |
### Review Jobs (Generation Reviewer, Replay Reviewer)
**Source:** `packages/review/src/env.ts`
Both the generation reviewer and replay reviewer jobs re-export from `@autonoma/review/env`, which extends the AI, logger, and storage environments. No additional variables beyond those from the AI, logger, and storage sections.
***
## GitHub App
These variables appear in `.env.example` and are used by the API server and the diffs job for GitHub integration features (repository connections, PR-triggered test runs).
| Variable | Required | Default | Description |
| --------------------------- | -------- | ------- | ---------------------------------------------------------------------------- |
| `GITHUB_APP_ID` | No | - | GitHub App ID. Required for GitHub integration features. |
| `GITHUB_APP_PRIVATE_KEY` | No | - | GitHub App private key in PEM format. |
| `GITHUB_APP_WEBHOOK_SECRET` | No | - | Secret for verifying GitHub webhook payloads. |
| `GITHUB_APP_SLUG` | No | - | GitHub App slug (URL-friendly name). Used for generating installation links. |
***
## Authentication
These variables are referenced in `.env.example` for the Better Auth integration used by the API server.
| Variable | Required | Default | Description |
| -------------------- | -------- | ------- | -------------------------------------------------------------------------------------------------- |
| `BETTER_AUTH_SECRET` | Yes | - | Secret key for Better Auth session signing. Generate with `openssl rand -hex 32`. |
| `BETTER_AUTH_URL` | Yes | - | Base URL of the API server (e.g., `http://localhost:4000`). Used by Better Auth for callback URLs. |
***
## Tips for Local Development
**What you can skip entirely:**
* **Billing** - Leave `STRIPE_ENABLED=false` (the default). No Stripe keys needed.
* **Analytics** - Omit `POSTHOG_KEY` and `VITE_POSTHOG_KEY`. Analytics calls become no-ops.
* **Sentry** - Omit `SENTRY_DSN` and `VITE_SENTRY_DSN`. Error tracking is disabled gracefully.
* **Kubernetes** - Omit `NAMESPACE`. Only needed when deploying to K8s.
* **GitHub App** - Omit all `GITHUB_APP_*` variables unless you are working on GitHub integration.
* **Temporal** - Omit `VITE_TEMPORAL_URL`. The UI hides workflow links when this is unset.
**What uses defaults that just work:**
* `ALLOWED_ORIGINS` defaults to `http://localhost:3000` - correct for local dev.
* `VITE_API_URL` defaults to `http://localhost:4000` - correct for local dev.
* `APP_URL` defaults to `http://localhost:3000` - correct for local dev.
* `NODE_ENV` defaults to `development`.
* `AGENT_VERSION` defaults to `latest`.
**What you must provide:**
* `DATABASE_URL` - there is no default. You need a running PostgreSQL instance.
* `REDIS_URL` - there is no default. You need a running Redis instance.
* `GOOGLE_CLIENT_ID` and `GOOGLE_CLIENT_SECRET` - required for authentication. Create OAuth credentials in the [Google Cloud Console](https://console.cloud.google.com/apis/credentials).
* `SCENARIO_ENCRYPTION_KEY` - any non-empty string works locally.
* `BETTER_AUTH_SECRET` - generate one with `openssl rand -hex 32`.
* `BETTER_AUTH_URL` - set to `http://localhost:4000`.
* AI keys (`GEMINI_API_KEY`, `GROQ_KEY`, `OPENROUTER_API_KEY`) - required if you are running test execution. Not needed if you are only working on the UI or API without triggering test runs.
* S3 credentials - required for artifact storage. Use MinIO locally.
# Execution Agent
> Deep dive into the core test execution engine - a platform-agnostic AI agent that powers web and mobile test execution through natural language.
The execution agent is the brain of Autonoma’s test execution. It is a **generic, platform-agnostic AI agent** that takes a natural language test instruction, interacts with a live application through screenshots and commands, and produces a structured test result with recorded steps.
Web (`engine-web`) and mobile (`engine-mobile`) engines both extend this shared core. Everything is parameterized with `TSpec` (command spec) and `TContext` (driver context), so the same agent logic works across Playwright and Appium without code duplication.
## The Agent Loop
Every test execution follows the same cycle:
```plaintext
┌─────────────────────────────────────────────────────┐
│ 1. Screenshot - capture current screen state │
│ 2. Inject context - screenshot + instruction + │
│ steps-so-far + memory into a user message │
│ 3. LLM decides - model picks a tool/command │
│ (or calls execution-finished) │
│ 4. Command executes - the chosen command runs │
│ against platform drivers │
│ 5. Record step - save before/after metadata, │
│ execution output, and screenshots │
│ 6. Wait planning - asynchronously generate a wait │
│ condition for replay │
│ 7. Loop or stop - continue until execution-finished │
│ is called or maxSteps is reached │
└─────────────────────────────────────────────────────┘
```
The agent wraps the Vercel AI SDK’s `ToolLoopAgent`. Before each step, it captures a screenshot and injects it alongside the test instruction, all previous steps, and any stored memory variables. The LLM then decides which command to call next.
**Loop detection:** If the model’s reasoning mentions “loop”, “stuck”, “no progress”, or “repeating” in a `success: false` finish, the result is flagged as a loop.
**Success validation:** Even if the model calls `execution-finished` with `success: true`, the agent verifies that at least one command step was executed and at least one `assert` step exists. If either check fails, the result is overridden to `success: false`.
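The validation described above amounts to a small post-run guard. A minimal sketch, with illustrative type names rather than the engine's real `StepData` and result shapes:

```typescript
// Illustrative shapes - the real step and result types live in the engine.
interface RecordedStep {
  interaction: string; // e.g. "click", "type", "assert"
}

interface FinishResult {
  success: boolean;
  reasoning: string;
}

// Override a claimed success unless the run executed at least one command
// step and recorded at least one assert step.
function validateSuccess(result: FinishResult, steps: RecordedStep[]): FinishResult {
  const executedCommands = steps.length > 0;
  const hasAssertion = steps.some((step) => step.interaction === "assert");
  if (result.success && (!executedCommands || !hasAssertion)) {
    return { success: false, reasoning: "No command or assert step recorded" };
  }
  return result;
}
```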
## Directory Structure
```plaintext
packages/engine/src/
├── commands/ # Command abstraction system
│ ├── command-spec.ts # CommandSpec type definition
│ ├── command.ts # Abstract Command base class
│ ├── command-defs.ts # Union of all command specs
│ ├── step.ts # StepData type
│ └── commands/ # Built-in command implementations
│ ├── click/ # AI-powered element clicking
│ ├── type/ # Find element + type text
│ ├── scroll/ # Scroll with condition checking
│ ├── assert/ # Visual assertion checking
│ ├── hover/ # Hover over elements (web only)
│ ├── drag/ # Drag from one element to another
│ ├── read/ # Extract text from screen into memory
│ ├── refresh/ # Refresh the current page
│ ├── save-clipboard/ # Save clipboard content to memory
│ └── wait-until/ # Wait for visual condition (not LLM-exposed)
├── execution-agent/ # Core AI agent loop
│ ├── agent/
│ │ ├── execution-agent.ts # Main agent class
│ │ ├── execution-agent-factory.ts # Abstract factory for building agents
│ │ ├── execution-result.ts # Result types
│ │ ├── test-case.ts # TestCase interface
│ │ ├── system-prompt.ts # Agent system prompt
│ │ ├── memory/ # Variable memory store
│ │ ├── components/
│ │ │ └── wait-planner.ts # Generates wait conditions between steps
│ │ └── tools/ # LLM tools
│ │ ├── command-tool.ts # Wraps Command as an AI SDK tool
│ │ ├── execution-finished-tool.ts
│ │ ├── ask-user-tool.ts
│ │ ├── wait-tool.ts
│ │ └── skill-resolver-tool.ts
│ ├── runner/
│ │ ├── execution-agent-runner.ts # Main runner - ties installer + factory + recording
│ │ ├── artifacts.ts # Writes screenshots, steps, video to disk
│ │ └── events.ts # Event hooks (beforeStep, afterStep, frame)
│ └── local-dev/
│ ├── local-runner.ts # Local dev runner (loads markdown test files)
│ └── load-test-case.ts # Parses markdown frontmatter into test cases
└── platform/ # Platform driver interfaces
├── context/
│ ├── base-context.ts # BaseCommandContext (screen + application drivers)
│ ├── installer.ts # Abstract Installer
│ ├── image-stream.ts # Live frame streaming interface
│ └── video-recorder.ts # Abstract VideoRecorder with state machine
└── drivers/
├── screen.driver.ts # screenshot(), getResolution()
├── mouse.driver.ts # click(), hover(), drag(), scroll()
├── keyboard.driver.ts # type(), press(), selectAll(), clear()
├── application.driver.ts # waitUntilStable()
├── navigation.driver.ts # navigate(), getCurrentUrl(), refresh()
└── clipboard.driver.ts # read()
```
## CommandSpec - The Command Type System
Every command is defined by a `CommandSpec`:
```ts
interface CommandSpec {
  interaction: string; // command name (e.g., "click")
  params: object; // what gets stored for replay
  output: BaseOutput; // what the command returns (always includes `outcome: string`)
}
```
The `Command` abstract base class is what all commands extend:
```ts
abstract class Command<TSpec extends CommandSpec, TContext extends BaseCommandContext> {
  abstract readonly interaction: TSpec["interaction"];
  abstract readonly paramsSchema: z.ZodSchema<TSpec["params"]>;
  abstract execute(params: CommandParams<TSpec>, context: TContext): Promise<TSpec["output"]>;
}
```
The `CommandTool` class wraps a `Command` to make it compatible with the AI SDK. It adds:
* An `inputSchema()` that defines what the LLM provides (may differ from `paramsSchema`)
* A `description()` shown to the AI model
* An `extractParams()` method that converts LLM input into command parameters
This separation means the LLM can provide a natural language description (“the blue submit button”) while the stored params contain the resolved coordinates and structured data needed for replay.
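A minimal sketch of that split for a click-like command, assuming a point detector that maps descriptions to coordinates (all names here are illustrative, not the engine's actual API):

```typescript
// What the LLM supplies vs. what gets stored for replay (illustrative shapes).
interface ClickInput {
  description: string; // natural language, e.g. "the blue submit button"
}

interface ClickParams {
  description: string;
  x: number; // resolved pixel coordinates, stored for replay
  y: number;
}

type PointDetector = (description: string) => Promise<{ x: number; y: number }>;

// extractParams-style conversion: resolve the description once,
// then store the structured result alongside it.
async function toClickParams(input: ClickInput, detect: PointDetector): Promise<ClickParams> {
  const point = await detect(input.description);
  return { description: input.description, x: point.x, y: point.y };
}
```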
## Built-in Commands
| Command | Exposed to LLM | Params | What it does |
| ------------------ | -------------- | ----------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **click** | Yes | `{ description, options }` | Takes a natural-language element description, uses `PointDetector` AI to locate pixel coordinates, calls `mouse.click(x, y)` |
| **type** | Yes | `{ description, text, overwrite }` | Uses `PointDetector` to find the input element, clicks it, then types the text. Supports overwrite mode to replace existing content |
| **assert** | Yes | `{ instruction }` | Takes an instruction (can contain multiple assertions). Uses `AssertionSplitter` to decompose, takes one screenshot, runs `AssertChecker` on all assertions in parallel |
| **scroll** | Yes | `{ elementDescription?, direction, condition, maxScrolls }` | Scrolls up or down on a specific element or the page, checking a visual condition after each scroll |
| **hover** | Yes | `{ description }` | Hovers over an element identified by natural language description (web only) |
| **drag** | Yes | `{ startDescription, endDescription }` | Drags from one element to another, both identified by natural language |
| **read** | Yes | `{ description, variableName }` | Extracts text from the screen and stores it in the agent’s memory under `variableName` for use in later steps via `{{variableName}}` syntax |
| **refresh** | Yes | (none) | Refreshes the current page |
| **save-clipboard** | Yes | `{ variableName }` | Reads clipboard content and stores it in memory under `variableName` |
| **wait-until** | No | `{ condition, timeout }` | Polls a visual condition every second up to timeout using `VisualConditionChecker`. Auto-generated by `WaitPlanner`, not callable by the LLM |
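The `wait-until` behavior - poll a visual condition once per second until it passes or the timeout elapses - can be sketched as follows (the helper name and signature are assumptions, not the command's real implementation):

```typescript
// Poll `check` every `intervalMs` until it returns true or `timeoutMs` elapses.
async function pollUntil(
  check: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 1000,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  // Always evaluate the condition at least once, even with a zero timeout.
  do {
    if (await check()) return true;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  } while (Date.now() < deadline);
  return false;
}
```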
## LLM Tools (Non-Command)
These tools are available to the model but are not recorded as test steps:
| Tool | Purpose |
| ---------------------- | ------------------------------------------------------------------------------------------------------------------------ |
| **wait** | Sleeps for N seconds. Useful for loading screens or animations |
| **ask-user** | Sends questions to a human via WebSocket. Pauses execution until answered. Only available in frontend-connected sessions |
| **execution-finished** | Called by the model to end the test. Takes `{ success, reasoning }` |
| **resolve-skill** | Resolves a reusable sub-flow from a skills directory. Only available when skills config is provided |
## Driver Interfaces
Platform-specific apps (`engine-web`, `engine-mobile`) implement these interfaces:
### ScreenDriver
```ts
interface ScreenDriver {
  getResolution(): Promise<Resolution>;
  screenshot(): Promise<Buffer>;
}
```
### MouseDriver
```ts
interface MouseDriver<TClickOptions = ClickOptions> {
  click(x: number, y: number, options?: TClickOptions): Promise<void>;
  hover?(x: number, y: number): Promise<void>;
  drag(startX: number, startY: number, endX: number, endY: number): Promise<void>;
  scroll(args: ScrollArgs): Promise<void>;
}
```
### KeyboardDriver
```ts
interface KeyboardDriver {
  selectAll(): Promise<void>;
  clear(): Promise<void>;
  type(text: string, options?: TypeOptions): Promise<void>;
  press(key: string): Promise<void>;
}
```
### ApplicationDriver
```ts
interface ApplicationDriver {
  waitUntilStable(): Promise<void>;
}
```
### NavigationDriver
```ts
interface NavigationDriver {
  navigate(url: string): Promise<void>;
  getCurrentUrl(): Promise<string>;
  refresh(): Promise<void>;
}
```
### ClipboardDriver
```ts
interface ClipboardDriver {
  read(): Promise<string>;
}
```
The `BaseCommandContext` requires only `screen` and `application` drivers. Each platform extends this with additional drivers as needed.
## Memory System
The agent maintains a `MemoryStore` - a key-value store that persists across steps within a single execution. Commands like `read` and `save-clipboard` write values into memory, and any subsequent command can reference stored values using `{{variableName}}` template syntax.
When a command executes, the agent resolves `{{variableName}}` templates in the parameters before passing them to the command. The unresolved params are stored for replay (keeping the template references), while the resolved values are used for actual execution.
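A minimal sketch of that resolution step, assuming a simple `{{name}}` pattern (the real `MemoryStore` API may differ):

```typescript
// Replace {{variableName}} references with values from memory,
// leaving unknown variables untouched so failures are visible.
function resolveTemplates(value: string, memory: Map<string, string>): string {
  return value.replace(/\{\{(\w+)\}\}/g, (match, name: string) => memory.get(name) ?? match);
}

const memory = new Map([["orderId", "A-1042"]]);
resolveTemplates("Open order {{orderId}}", memory); // → "Open order A-1042"
```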
## Adding a New Command
1. **Define the spec.** Create a `CommandSpec` type for the command’s interaction, params, and output:
packages/engine/src/commands/commands/my-command/my-command.def.ts
```ts
import z from "zod";
export interface MyCommandSpec {
  interaction: "my-command";
  params: { target: string; value: number };
  output: { outcome: string; success: boolean };
}

export const myCommandParamsSchema = z.object({
  target: z.string().describe("Description for the LLM"),
  value: z.number().describe("A numeric value"),
});
```
2. **Implement the command.** Create a class extending `Command`:
packages/engine/src/commands/commands/my-command/my-command.command.ts
```ts
import { Command } from "../../command";
import { type MyCommandSpec, myCommandParamsSchema } from "./my-command.def";
export class MyCommand extends Command<MyCommandSpec> {
  readonly interaction = "my-command" as const;
  readonly paramsSchema = myCommandParamsSchema;

  async execute(params, context) {
    // Use context drivers to perform the action
    return { outcome: "Did the thing", success: true };
  }
}
```
3. **Create the tool wrapper.** Create a `CommandTool` subclass that defines how the LLM interacts with the command:
packages/engine/src/execution-agent/agent/tools/commands/my-command.tool.ts
```ts
import { CommandTool } from "../command-tool";
import {
  myCommandParamsSchema,
  type MyCommandSpec,
} from "../../../../commands/commands/my-command/my-command.def";

export class MyCommandTool extends CommandTool<MyCommandSpec> {
  protected inputSchema() { return myCommandParamsSchema; }

  description() { return "Description shown to the AI model"; }

  protected async extractParams(input, context) { return input; }
}
```
4. **Register it.** Add the tool to the command tools array in your `ExecutionAgentFactory` subclass.
5. **Add the spec to the union type** in `packages/engine/src/commands/command-defs.ts` so TypeScript knows about it.
## Extending for a New Platform
1. **Implement all driver interfaces** using your platform’s SDK. At minimum you need `ScreenDriver` and `ApplicationDriver` (the `BaseCommandContext`). Add `MouseDriver`, `KeyboardDriver`, `NavigationDriver`, and `ClipboardDriver` as needed.
2. **Create an `Installer` subclass** that builds the context. The installer receives application data (URL, device config, etc.) and returns the context with all drivers, plus an `ImageStream` and `VideoRecorder`:
```ts
class MyPlatformInstaller extends Installer {
  async install(appData: MyAppData) {
    // Launch browser/device, create driver instances
    return { context, imageStream, videoRecorder };
  }
}
```
3. **Create an `ExecutionAgentFactory` subclass** that builds the agent with platform-specific command tools:
```ts
class MyPlatformAgentFactory extends ExecutionAgentFactory {
  async buildAgent(params) {
    return new ExecutionAgent({
      model: this.model,
      systemPrompt: this.systemPrompt,
      maxSteps: 50,
      commandTools: [new ClickTool(...), new TypeTool(...), ...],
      // ...rest of config
      ...params,
    });
  }
}
```
4. **Create a runner entry point** that wires the installer, factory, and event handlers together using `ExecutionAgentRunner`.
## The Runner and Artifacts
`ExecutionAgentRunner` orchestrates a full test run:
1. Calls `Installer.install()` to build the platform context (browser/device + drivers)
2. Registers a frame handler for live streaming
3. Builds the `ExecutionAgent` via the factory
4. Wraps `agent.generate()` in `VideoRecorder.withRecording()`
5. Returns `{ result, videoPath }`
`LocalRunner` extends this for local development - it loads test cases from markdown files and saves artifacts to disk:
```plaintext
artifacts/{timestamp}-{testName}/
├── screenshots/step-0-before.jpeg, step-0-after.jpeg, ...
├── steps.json # Array of step execution outputs
├── conversation.json # Sanitized AI turn log
├── instruction.txt # The test prompt
└── video.{ext} # Recording
```
## Result Types
**`GeneratedStep`** - one step of execution:
* `executionOutput` - the command’s step data (interaction + params) and result
* `waitCondition` - an optional wait condition for replay
* `beforeMetadata` / `afterMetadata` - screenshots and other metadata from before/after the step
**`ExecutionResult`** - the full test result:
* `generatedSteps` - all steps
* `memory` - final state of extracted variables
* `success` - whether the test passed
* `finishReason` - `"success"`, `"max_steps"`, or `"error"`
* `reasoning` - the model’s explanation for finishing
* `conversation` - the full AI message history
**`LeanExecutionResult`** - a network-safe version that strips large image buffers from step metadata.
## Test Cases as Markdown
Test files use [gray-matter](https://github.com/jonschlinkert/gray-matter) frontmatter for parameters, with the body containing the natural language prompt:
```markdown
---
url: https://example.com
---
Navigate to the login page, enter "user@test.com" and "password123",
click Sign In, and assert the dashboard is visible.
```
The `loadTestCase` function parses the frontmatter against a Zod schema and extracts the prompt from the body. It also walks up the directory tree looking for an `autonoma/skills/` directory to auto-load skill definitions.
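Conceptually, the parsing looks like the sketch below. The real implementation uses gray-matter and a Zod schema; this dependency-free stand-in (`parseTestCase` is an illustrative name) only shows the shape of the result:

```ts
// Simplified, dependency-free sketch of what loadTestCase does conceptually.
// The real implementation parses frontmatter with gray-matter and validates it
// against a Zod schema; `parseTestCase` here is an illustrative stand-in.
function parseTestCase(raw: string): { params: Record<string, string>; prompt: string } {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) return { params: {}, prompt: raw.trim() };

  const params: Record<string, string> = {};
  for (const line of match[1].split("\n")) {
    const colon = line.indexOf(":");
    if (colon > 0) params[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { params, prompt: match[2].trim() };
}

const testCase = parseTestCase("---\nurl: https://example.com\n---\nSign in and assert the dashboard is visible.");
// testCase.params.url === "https://example.com"
```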
# AI Package
> Deep dive into the AI primitives that power test execution - model registry, visual checkers, point detection, object detection, and structured output generation.
The `@autonoma/ai` package provides every AI primitive used by the execution agent. It handles model management, visual analysis, element location, structured output generation, and evaluation benchmarking. No AI logic should be duplicated in platform apps - everything lives here.
## Directory Structure
```plaintext
packages/ai/src/
├── index.ts # Package re-exports
├── env.ts # Environment variables (API keys)
├── registry/ # Model registry and configuration
│ ├── model-registry.ts # Core ModelRegistry class
│ ├── model-entries.ts # Model definitions and pricing
│ ├── providers.ts # LLM provider singletons
│ ├── options.ts # ModelOptions, reasoning effort levels
│ ├── costs.ts # Cost calculation functions
│ ├── cost-collector.ts # Aggregated cost tracking
│ ├── usage.ts # Token usage tracking
│ └── monitoring.ts # Logging middleware and telemetry
├── visual/ # Visual AI primitives
│ ├── visual-condition-checker.ts # Check if a condition is met on a screenshot
│ ├── assert-checker.ts # Validate test assertions
│ ├── visual-chooser.ts # Pick which UI element matches an instruction
│ └── text-extractor.ts # Extract text from screenshots
├── text/
│ └── assertion-splitter.ts # Split compound assertions into atomic ones
├── object/ # Structured output generation
│ ├── object-generator.ts # Core structured JSON generator
│ ├── retry.ts # Retry with exponential backoff
│ ├── user-messages.ts # Build multimodal messages (text + images + video)
│ └── video/
│ ├── video-processor.ts # Upload videos to Google GenAI Files API
│ └── video-input.ts # Video input types and model support
└── freestyle/ # Point and object detection
├── resolution-fallback.ts # Coordinate resolution management
├── point/
│ ├── point-detector.ts # Abstract PointDetector base
│ ├── gemini-computer-use-point-detector.ts
│ └── object-point-detector.ts # Adapter: ObjectDetector -> PointDetector
└── object/
├── object-detector.ts # Abstract ObjectDetector base
└── gemini-object-detector.ts # Gemini-based bounding box detection
```
## Model Registry
`ModelRegistry` manages all LLM instances with middleware for cost tracking and monitoring. It wraps the Vercel AI SDK’s language models with usage tracking and provider-specific configuration.
### How It Works
The registry is constructed with a map of model entries. Each entry knows how to create its model instance and how to calculate costs:
```ts
const registry = new ModelRegistry({
  models: MODEL_ENTRIES,
  defaultSettings: { temperature: 0 },
  monitoring: { onGenerate: (result) => { /* log it */ } },
});
```
When you request a model, the registry wraps it with middleware for usage tracking, monitoring, and default settings:
```ts
const model = registry.getModel({
  model: "GEMINI_3_FLASH_PREVIEW",
  tag: "assert-checker",
  reasoning: "low",
});
```
The `tag` field identifies the use case (e.g., “assert-checker”, “click-detector”) for monitoring and cost attribution. The `reasoning` field sets the thinking effort level.
### Current Models
| Key | Model ID | Provider |
| ------------------------ | ----------------------------- | ---------- |
| `GEMINI_3_FLASH_PREVIEW` | `gemini-3-flash-preview` | Google |
| `MINISTRAL_8B` | `mistralai/ministral-8b-2512` | OpenRouter |
| `GPT_OSS_120B` | `openai/gpt-oss-120b` | Groq |
An alternative `OPENROUTER_MODEL_ENTRIES` set routes all models through OpenRouter, including a Gemini variant (`google/gemini-3-flash-preview`) and a Llama variant (`meta-llama/llama-4-maverick`) in place of Ministral.
### Providers
Three LLM provider singletons are available, each lazily initialized with their respective API key:
| Provider | SDK | Env Variable |
| -------------------- | ----------------------------- | -------------------- |
| `googleProvider` | `@ai-sdk/google` | `GEMINI_API_KEY` |
| `groqProvider` | `@ai-sdk/groq` | `GROQ_KEY` |
| `openRouterProvider` | `@openrouter/ai-sdk-provider` | `OPENROUTER_API_KEY` |
The `LLMProvider` class wraps each provider as a singleton - the underlying SDK instance is created on first use.
### Reasoning Effort
The `ModelReasoningEffort` type supports four levels:
| Level | Groq | Google |
| ---------- | --------------------------- | ------------------------- |
| `"none"` | `reasoningEffort: "none"` | Thinking disabled |
| `"low"` | `reasoningEffort: "low"` | `thinkingLevel: "low"` |
| `"medium"` | `reasoningEffort: "medium"` | `thinkingLevel: "medium"` |
| `"high"` | `reasoningEffort: "high"` | `thinkingLevel: "high"` |
Reasoning effort is translated to provider-specific options in `buildSettings()`, so callers never need to think about which provider they are targeting.
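A sketch of that translation (the option shapes below are assumptions based on the table above, not the actual `buildSettings()` output):

```ts
// Hypothetical mapping from the unified effort level to provider-specific
// options. The Google "none" representation here is an assumption; the real
// buildSettings() output may differ.
type ModelReasoningEffort = "none" | "low" | "medium" | "high";

function reasoningOptions(provider: "groq" | "google", effort: ModelReasoningEffort) {
  if (provider === "groq") {
    // Groq accepts the effort level directly.
    return { reasoningEffort: effort };
  }
  // Google: "none" disables thinking; other levels map to thinkingLevel.
  return effort === "none" ? { thinking: false } : { thinkingLevel: effort };
}
```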
### Extra Context
The registry supports dynamic context that can be attached during execution:
```ts
registry.addContext({ testRunId: "run-123", stepIndex: 3 });
// Later...
registry.resetContext();
```
This context is passed to monitoring callbacks, making it possible to trace costs and usage back to specific test runs and steps.
## Visual AI Primitives
### VisualConditionChecker
The base class for checking whether a condition is met on a screenshot. It extends `ObjectGenerator` with a predefined schema:
```ts
const checker = new VisualConditionChecker({ model });
const result = await checker.checkCondition(
  "The login form is visible with email and password fields",
  screenshot,
);
// result: { metCondition: true, reason: "The form is visible with both fields" }
```
Returns `{ metCondition: boolean, reason: string }`.
### AssertChecker
Extends `VisualConditionChecker` with a specialized system prompt for test assertions. It handles both positive assertions (“validate there’s a title that says Hello”) and negative assertions (“assert there’s no download button”):
```ts
const checker = new AssertChecker(model);
const result = await checker.checkCondition(
  "The submit button is disabled",
  screenshot,
);
```
Used by the `assert` command to validate each individual assertion against a screenshot.
### VisualChooser
Picks which UI element from a set of options matches a user instruction. It draws numbered bounding boxes on the screenshot and asks the model to choose:
```ts
const chooser = new VisualChooser({ model });
const result = await chooser.chooseOption({
  options: [
    { boundingBox: { x: 10, y: 20, width: 100, height: 30 }, description: "Submit" },
    { boundingBox: { x: 10, y: 60, width: 100, height: 30 }, description: "Cancel" },
  ],
  instruction: "Click the submit button",
  screenshot,
});
// result: { reasoning: "Option 1 is the submit button", option: { ... } }
```
Throws `NoValidOptionFoundError` if no option matches, or `InvalidIndexError` if the model returns an out-of-bounds index.
### AssertionSplitter
Splits a compound assertion instruction into individual atomic assertions that can be checked independently:
```ts
const splitter = new AssertionSplitter(model);
const result = await splitter.splitAssertions(
  "validate that the title is visible, the subtitle as well but the button is not",
);
// result.assertions: [
//   "validate that the title is visible",
//   "validate that the subtitle is visible",
//   "validate that the button is not visible"
// ]
```
Importantly, the splitter ensures each split assertion contains enough context to stand alone. It repairs incomplete fragments (e.g., “the subtitle as well” becomes “validate that the subtitle is visible”).
## Point Detection
Point detectors locate where to interact on screen, given a natural language description. They are used by the `click`, `type`, `hover`, and `drag` commands.
### Abstract Base
All point detectors extend `PointDetector`:
```ts
abstract class PointDetector {
  protected abstract detectPointForResolution(
    screenshot: Screenshot,
    prompt: string,
    resolution: ScreenResolution,
  ): Promise<Point>;

  async detectPoint(
    screenshot: Screenshot,
    prompt: string,
    targetResolution?: ScreenResolution,
  ): Promise<Point>;
}
```
The public `detectPoint` method handles resolution fallback automatically - if no target resolution is provided, it defaults to the device resolution (if configured) or the image resolution.
### GeminiComputerUsePointDetector
Uses Google’s Gemini computer-use API with a `click_at` tool. The model returns coordinates in a normalized 0-1000 space, which are then scaled to actual pixel coordinates based on the target resolution.
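The scaling from the normalized space to pixels is a linear map; the sketch below illustrates it (the helper name is ours, not the detector's API):

```ts
// Sketch of the linear scale from the 0-1000 normalized space to pixels.
// `scaleNormalizedPoint` is an illustrative helper, not the detector's API.
interface ScreenResolution { width: number; height: number; }

function scaleNormalizedPoint(
  normX: number,
  normY: number,
  target: ScreenResolution,
): { x: number; y: number } {
  return {
    x: Math.round((normX / 1000) * target.width),
    y: Math.round((normY / 1000) * target.height),
  };
}

// A model output of (500, 500) is the center of a 1920x1080 screen:
scaleNormalizedPoint(500, 500, { width: 1920, height: 1080 }); // { x: 960, y: 540 }
```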
### ObjectPointDetector
An adapter that converts an `ObjectDetector` into a `PointDetector`. It detects the bounding box of an element and returns the center point. Useful when you have an object detector but need point-level precision.
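The core of that adaptation is just taking the box center; a sketch (names illustrative):

```ts
// Sketch: convert a detected bounding box into the point a MouseDriver can click.
// `centerOf` is an illustrative name, not the adapter's actual API.
interface BoundingBox { x: number; y: number; width: number; height: number; }

function centerOf(box: BoundingBox): { x: number; y: number } {
  return { x: box.x + box.width / 2, y: box.y + box.height / 2 };
}

centerOf({ x: 10, y: 20, width: 100, height: 30 }); // { x: 60, y: 35 }
```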
## Object Detection
### ObjectDetector (Abstract Base)
Detects objects in an image and returns bounding boxes:
```ts
abstract class ObjectDetector {
  async detectObjects(
    screenshot: Screenshot,
    prompt: string,
    targetResolution?: ScreenResolution,
  ): Promise<DetectedObject[]>;
}
```
Each `DetectedObject` contains a `boundingBox` and an optional `label`.
### GeminiObjectDetector
Uses Gemini’s structured output to return bounding boxes as normalized 0-1000 coordinates. Useful for detecting multiple UI elements at once.
## ObjectGenerator
The core structured output engine used by almost every AI primitive in the package. It wraps the AI SDK’s `generateText` with:
* **Zod schema validation** for structured JSON output
* **Automatic retry** with exponential backoff (default: 5 retries, 100ms initial delay, 2x backoff factor)
* **Multimodal input** via `ObjectGenerationParams` - supports text, images, and video
* **Null byte stripping** from responses for PostgreSQL compatibility
* **Tool support** for agentic generation workflows (stops after 5 tool steps)
```ts
const generator = new ObjectGenerator({
  model,
  systemPrompt: "You are a UI analysis expert.",
  schema: z.object({
    elements: z.array(z.object({
      label: z.string(),
      visible: z.boolean(),
    })),
  }),
});

const result = await generator.generate({
  userPrompt: "List all visible buttons",
  images: [screenshot],
});
```
Video input is supported for models that handle it (checked via `modelSupportsVideo`). Videos are uploaded through the Google GenAI Files API via `VideoProcessor`.
If generation fails after all retries, an `ObjectGenerationFailedError` is thrown wrapping the original error.
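The retry behavior described above can be sketched as follows; the real helper in `object/retry.ts` may differ in naming and error handling:

```ts
// Illustrative sketch of retry with exponential backoff (5 retries, 100 ms
// initial delay, 2x factor, matching the defaults described above). The real
// helper in object/retry.ts may differ in detail.
async function withRetry<T>(
  fn: () => Promise<T>,
  retries = 5,
  initialDelayMs = 100,
  factor = 2,
): Promise<T> {
  let delay = initialDelayMs;
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === retries) break;
      // Wait, then multiply the delay: 100 ms, 200 ms, 400 ms, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= factor;
    }
  }
  throw lastError;
}
```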
## Adding a New Model
1. **Add the model entry** to `packages/ai/src/registry/model-entries.ts`:
```ts
export const MODEL_ENTRIES = {
  // ...existing entries
  MY_NEW_MODEL: {
    createModel: () => googleProvider.getModel("my-new-model-id"),
    pricing: simpleCostFunction({
      inputCostPerM: 0.5,
      outputCostPerM: 1.5,
    }),
  },
} as const;
```
2. **Choose the right cost function.** Use `simpleCostFunction` for models without cache pricing, or `inputCacheCostFunction` for models that support input caching (adds a `cachedInputCostPerM` field).
3. **Add a provider** if needed. If the model uses a provider not yet configured, add a new `LLMProvider` singleton in `providers.ts` and add the corresponding API key to `env.ts`.
4. **Use the model** by referencing its key when calling `registry.getModel()`:
```ts
const model = registry.getModel({
  model: "MY_NEW_MODEL",
  tag: "my-use-case",
  reasoning: "medium",
});
```
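The cost functions from step 2 reduce to per-million-token arithmetic. A sketch under assumed shapes (`simpleCost`, `Usage`, and `Rates` are illustrative, not the registry's actual types):

```ts
// Hypothetical per-million-token cost math; the registry's actual cost
// functions and usage types may differ in shape.
interface Usage { inputTokens: number; outputTokens: number; }
interface Rates { inputCostPerM: number; outputCostPerM: number; }

function simpleCost(usage: Usage, rates: Rates): number {
  return (
    (usage.inputTokens / 1_000_000) * rates.inputCostPerM +
    (usage.outputTokens / 1_000_000) * rates.outputCostPerM
  );
}

// 1M input tokens at $0.50/M plus 500k output tokens at $1.50/M:
simpleCost({ inputTokens: 1_000_000, outputTokens: 500_000 }, { inputCostPerM: 0.5, outputCostPerM: 1.5 });
// 1.25 (dollars)
```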
## Adding a New Visual AI Primitive
Most visual primitives follow the same pattern: extend `ObjectGenerator` with a specialized schema and system prompt.
1. **Define the output schema** with Zod:
```ts
const myPrimitiveSchema = z.object({
  elements: z.array(z.object({
    name: z.string(),
    confidence: z.number(),
  })),
});

type MyPrimitiveResult = z.infer<typeof myPrimitiveSchema>;
```
2. **Create the class** extending `ObjectGenerator`:
```ts
export class MyPrimitive extends ObjectGenerator {
  constructor(model: LanguageModel) {
    super({
      model,
      systemPrompt: "Your specialized system prompt here.",
      schema: myPrimitiveSchema,
    });
  }

  async analyze(screenshot: Screenshot, instruction: string): Promise<MyPrimitiveResult> {
    return this.generate({ images: [screenshot], userPrompt: instruction });
  }
}
```
3. **Export it** from the package index.
For point or object detection, extend `PointDetector` or `ObjectDetector` instead and implement the `detectPointForResolution` or `detectObjectsForResolution` method.
## Evaluation Framework
The `evals/` directory contains a Vitest-integrated framework for benchmarking AI accuracy:
* **`Evaluation`** - base class that defines test cases and runs them against models
* **`ModelEvaluation`** - tracks token usage and cost per model across an evaluation run
* **Three eval types:**
* `assert-condition/` - measures assertion checking accuracy
* `freestyle-click/` - measures point detection accuracy
* `wait-for-instruction/` - measures wait condition generation accuracy
Results are saved as JSON with pass rates and per-case breakdowns, making it easy to compare models and track accuracy over time.
# E2E Test Planner
> A Claude Code plugin that analyzes your codebase, integrates the Autonoma SDK, and produces a complete E2E test suite.
The Test Planner is a **Claude Code plugin** that takes a web application codebase and runs a **five-step pipeline**:
1. integrate the Autonoma SDK into your project
2. generate a knowledge base
3. design test data scenarios
4. generate E2E tests
5. validate the scenarios against the live endpoint
The final output is a test suite plus validated scenario recipes that Autonoma can execute against your Environment Factory endpoint.
## Installation
Install the plugin from the marketplace:
```bash
/plugin marketplace add Autonoma-AI/test-planner-plugin
```
Then register it with Claude Code:
```bash
/plugin install autonoma-test-planner@autonoma
```
Then run the plugin:
```bash
/autonoma-test-planner:generate-tests
```
OpenAI Codex support is coming soon.
OpenCode support is coming soon.
## Before you start
The plugin still runs primarily against your **frontend codebase**, but it now starts by wiring the SDK integration itself instead of expecting you to provide a ready-made endpoint up front.
Before you start, make sure you have:
* access to the frontend codebase
* access to the backend codebase if it lives separately from the frontend
* these environment variables in the Claude Code session:
* `AUTONOMA_API_KEY`
* `AUTONOMA_PROJECT_ID`
* `AUTONOMA_API_URL`
Optional:
* `AUTONOMA_AUTO_ADVANCE=true` to keep the plugin moving automatically after Steps 1-4
* `AUTONOMA_REQUIRE_CONFIRMATION=false` as a temporary compatibility alias for older launcher flows
You do **not** need to pre-supply the SDK endpoint URL or the shared request-signing secret anymore. The plugin creates the branch, installs the SDK, starts a local dev server, verifies the endpoint, and writes the discovered endpoint URL into the pipeline artifacts.
If your stack is unsupported, the plugin stops in Step 1 and tells you how to contact Autonoma.
## The five steps
### Step 1 - SDK Integration
The plugin detects your stack, installs the SDK from the appropriate package manager, wires the endpoint into your app, starts a dev server, and verifies `discover`, `up`, and `down`.
**Consumes:** Your repo plus the synced plugin SDK matrix. **Produces:** working SDK integration + `autonoma/.sdk-endpoint`
### Step 2 - Generate a knowledge base
The agent analyzes your codebase and produces `AUTONOMA.md`: a user-perspective guide to your pages, flows, and interactions. It also generates `skills/` files the test runner can use for navigation.
**Consumes:** Your codebase. **Produces:** `autonoma/AUTONOMA.md` + `autonoma/skills/*.md`
### Step 3 - Generate test data scenarios
The agent uses the SDK `discover` artifact captured during Step 1, plus the knowledge base, to design the `standard`, `empty`, and `large` scenarios.
**Consumes:** Knowledge base + verified SDK endpoint/discover output. **Produces:** `autonoma/discover.json` + `autonoma/scenarios.md`
### Step 4 - Generate E2E tests
The agent generates an E2E test suite as natural-language markdown files organized by flow and priority.
**Consumes:** Knowledge base + scenarios. **Produces:** `autonoma/qa-tests/INDEX.md` + `autonoma/qa-tests/**/*`
### Step 5 - Scenario Validation
The plugin validates the planned scenarios against the live endpoint from Step 1, smoke-tests the lifecycle, and uploads executable recipe artifacts.
**Consumes:** `discover.json` + `scenarios.md` + live SDK endpoint. **Produces:** `autonoma/scenario-recipes.json`
## How the steps connect
* **Step 1** makes the rest of the pipeline possible by producing a verified SDK endpoint.
* **Step 2** gives the planner the user-facing map of the product.
* **Step 3** turns the verified schema and product context into concrete scenario plans.
* **Step 4** turns those plans into executable E2E test coverage.
* **Step 5** turns the scenario plan into validated runtime recipes.
## Validation model
The plugin uses **deterministic validators**, not LLM-only checks:
* PostToolUse hooks validate file writes immediately.
* Step-level validators verify required formats and cross-file consistency.
* Recipe preflight checks verify that generated scenarios work against the live endpoint.
If validation fails, the pipeline stops and the agent must fix the issue before continuing.
## Review checkpoints
The canonical launch mode is `AUTONOMA_AUTO_ADVANCE=true`, which keeps the plugin moving after Steps 1-4. If you are still using the older confirmation flag, `AUTONOMA_REQUIRE_CONFIRMATION=false` behaves the same way. When a review checkpoint appears, focus on:
| After step | What to review | Why it matters |
| ---------- | ----------------------------------------------------- | -------------------------------------------------------------------------------------- |
| Step 1 | Detected stack and SDK integration result | Confirms the plugin wired the correct framework/ORM path before planning builds on it. |
| Step 2 | Core flows identified | Determines how the test budget is distributed. |
| Step 3 | Scenario entities, relationships, and variable fields | These become the data contract for tests and validation. |
| Step 4 | Journey and critical test samples | Confirms assertions and coverage quality before validation. |
| Step 5 | Scenario validation results | Confirms the endpoint can execute the planned data lifecycle. |
# Step 1: SDK Integration
> Detect the project stack, integrate the Autonoma SDK, start a dev server, and verify the endpoint before planning begins.
The first stage of the Test Planner is now **SDK Integration**. Instead of asking you to pre-configure the SDK endpoint URL and shared request-signing secret, the plugin handles the integration itself.
## What this step does
* detects the project stack
* installs the SDK from the appropriate package manager
* wires the SDK endpoint into the application
* ensures the required local secrets exist
* starts or reuses a dev server
* verifies `discover`, `up`, and `down`
* writes `autonoma/.sdk-endpoint` for later stages
* writes `autonoma/.sdk-integration.json` so the orchestrator can prove Step 1 completed cleanly
## Prerequisites
* your repository is open in Claude Code
* the Claude session has:
* `AUTONOMA_API_KEY`
* `AUTONOMA_PROJECT_ID`
* `AUTONOMA_API_URL`
## Supported and unsupported stacks
If the plugin finds a supported stack, it continues automatically into the rest of the pipeline.
If the stack is unsupported, the pipeline stops here and gives you a contact path to Autonoma instead of trying to guess the integration.
## What to review
The canonical launch mode is `AUTONOMA_AUTO_ADVANCE=true`, which continues directly to Step 2 after validation. If you are still using the older confirmation flag, `AUTONOMA_REQUIRE_CONFIRMATION=false` is treated as the same auto-advance behavior.
If the plugin surfaces a review checkpoint here, focus on:
* whether it detected the right framework and ORM
* whether the endpoint path looks correct for your project
* whether the dev server and smoke tests succeeded
* whether the repo changes are isolated to SDK integration work