# E2E Testing Facility

Agor's end-to-end testing uses **WebDriverIO + tauri-driver** to drive the real Tauri application through WebKit2GTK's inspector protocol. The facility has three pillars:

1. **Test Fixtures**: isolated fake environments with dummy projects
2. **Test Mode**: app-level env vars that disable watchers and redirect data/config paths
3. **LLM Judge**: Claude-powered semantic assertions for evaluating agent behavior

## Quick Start

```bash
# Run all tests (vitest + cargo + E2E)
npm run test:all:e2e

# Run E2E only (requires pre-built debug binary)
SKIP_BUILD=1 npm run test:e2e

# Build debug binary separately (faster iteration)
cargo tauri build --debug --no-bundle

# Run with LLM judge via CLI (default, auto-detected)
npm run test:e2e

# Force LLM judge to use API instead of CLI
LLM_JUDGE_BACKEND=api ANTHROPIC_API_KEY=sk-... npm run test:e2e
```

## Prerequisites

| Dependency | Purpose | Install |
|-----------|---------|---------|
| Rust + Cargo | Build Tauri backend | [rustup.rs](https://rustup.rs) |
| Node.js 20+ | Frontend + test runner | `mise install node` |
| tauri-driver | WebDriver bridge to WebKit2GTK | `cargo install tauri-driver` |
| X11 display | WebKit2GTK needs a display | Real X, or `xvfb-run` in CI |
| Claude CLI | LLM judge (optional) | [claude.ai/download](https://claude.ai/download) |

## Architecture

```
+-----------------------------------------------------+
| WebDriverIO (mocha runner)                          |
| specs/*.test.ts                                     |
| +- browser.execute() -> DOM queries + assertions    |
| +- assertWithJudge() -> LLM semantic evaluation     |
+-----------------------------------------------------+
| tauri-driver (port 4444)                            |
| WebDriver protocol <-> WebKit2GTK inspector         |
+-----------------------------------------------------+
| Agor debug binary                                   |
| AGOR_TEST=1 (disables watchers, wake scheduler)     |
| AGOR_TEST_DATA_DIR -> isolated SQLite DBs           |
| AGOR_TEST_CONFIG_DIR -> test groups.json            |
+-----------------------------------------------------+
```

## Pillar 1: Test Fixtures (`fixtures.ts`)

The fixture generator creates isolated temporary environments so tests never touch real user data. Each fixture includes:

- **Temp root dir** under `/tmp/agor-e2e-{timestamp}/`
- **Data dir**: empty, SQLite databases created at runtime
- **Config dir**: contains a generated `groups.json` with test projects
- **Project dir**: a real git repo with `README.md` and `hello.py` (for agent testing)

### Single-Project Fixture

```typescript
import { createTestFixture, destroyTestFixture } from '../fixtures';

const fixture = createTestFixture('my-test');
// fixture.rootDir    -> /tmp/my-test-1710234567890/
// fixture.dataDir    -> /tmp/my-test-1710234567890/data/
// fixture.configDir  -> /tmp/my-test-1710234567890/config/
// fixture.projectDir -> /tmp/my-test-1710234567890/test-project/
// fixture.env        -> { AGOR_TEST: '1', AGOR_TEST_DATA_DIR: '...', ... }

destroyTestFixture(fixture);
```

### Multi-Project Fixture

```typescript
import { createMultiProjectFixture } from '../fixtures';

const fixture = createMultiProjectFixture(3); // 3 separate git repos
```

### Fixture Environment Variables

| Variable | Effect |
|----------|--------|
| `AGOR_TEST=1` | Disables file watchers, wake scheduler, enables `is_test_mode` |
| `AGOR_TEST_DATA_DIR` | Redirects `sessions.db` and `btmsg.db` storage |
| `AGOR_TEST_CONFIG_DIR` | Redirects `groups.json` config loading |

## Pillar 2: Test Mode

When `AGOR_TEST=1` is set:

- **Rust backend**: `watcher.rs` and `fs_watcher.rs` skip file watchers
- **Frontend**: `is_test_mode` Tauri command returns true, wake scheduler disabled via `disableWakeScheduler()`
- **Data isolation**: `AGOR_TEST_DATA_DIR` / `AGOR_TEST_CONFIG_DIR` override default paths

The WebDriverIO config (`wdio.conf.js`) passes these env vars via `tauri:options.env` in capabilities.

## Pillar 3: LLM Judge (`llm-judge.ts`)

The LLM judge enables semantic assertions: evaluating whether agent output "looks right" rather than exact string matching.

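
To make "semantic assertion" concrete: the judge's free-text reply can be reduced to a structured verdict, and the test then asserts on that verdict rather than on the raw string. The sketch below is hypothetical and is not taken from `llm-judge.ts`. The helper names `parseVerdict` and `verdictPasses`, the 0.7 threshold, and the assumption that `confidence` lies in `[0, 1]` are all illustrative. Only the `{ pass, reasoning, confidence }` shape comes from this document's API section.

```typescript
// Hypothetical sketch; the real llm-judge.ts may parse replies differently.
interface Verdict {
  pass: boolean;
  reasoning: string;
  confidence: number; // assumed here to lie in [0, 1]
}

// Extract the span from the first '{' to the last '}' of a model reply
// and validate that it carries the three verdict fields.
function parseVerdict(raw: string): Verdict {
  const start = raw.indexOf('{');
  const end = raw.lastIndexOf('}');
  if (start === -1 || end <= start) {
    throw new Error('no JSON object found in judge reply');
  }
  const { pass, reasoning, confidence } = JSON.parse(
    raw.slice(start, end + 1),
  ) as Record<string, unknown>;
  if (
    typeof pass !== 'boolean' ||
    typeof reasoning !== 'string' ||
    typeof confidence !== 'number'
  ) {
    throw new Error('judge reply is missing pass/reasoning/confidence');
  }
  return { pass, reasoning, confidence };
}

// Count a pass only when the judge is reasonably sure of it.
// The 0.7 default is an arbitrary illustrative threshold.
function verdictPasses(verdict: Verdict, minConfidence = 0.7): boolean {
  return verdict.pass && verdict.confidence >= minConfidence;
}
```

Gating on confidence as well as `pass` is a design choice rather than a requirement: it trades a few false failures for protection against a hesitant judge turning a test green.
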
### Dual Backend

| Backend | How it works | Requires |
|---------|-------------|----------|
| `cli` (default) | Spawns `claude` CLI with `--output-format text` | Claude CLI installed |
| `api` | Raw `fetch` to `https://api.anthropic.com/v1/messages` | `ANTHROPIC_API_KEY` env var |

**Auto-detection order**: CLI first -> API fallback -> skip test.

### API

```typescript
import { isJudgeAvailable, judge, assertWithJudge } from '../llm-judge';

if (!isJudgeAvailable()) {
  this.skip();
  return;
}

const verdict = await judge(
  'The output should contain a file listing with at least one filename',
  actualOutput,
  'Agent was asked to list files in a directory containing README.md',
);
// verdict: { pass: boolean, reasoning: string, confidence: number }
```

## Test Spec Files

| File | Phase | Tests | Focus |
|------|-------|-------|-------|
| `agor.test.ts` | Smoke | ~50 | Basic UI rendering, CSS class selectors |
| `phase-a-structure.test.ts` | A | 12 | Structural integrity + settings (Scenarios 1-2) |
| `phase-a-agent.test.ts` | A | 15 | Agent pane + prompt submission (Scenarios 3+7) |
| `phase-a-navigation.test.ts` | A | 15 | Terminal tabs + palette + focus (Scenarios 4-6) |
| `phase-b.test.ts` | B | ~15 | Multi-project grid, LLM-judged agent responses |
| `phase-c.test.ts` | C | 27 | Hardening features (palette, search, notifications, keyboard, settings, health, metrics, context, files) |

## Test Results Tracking (`results-db.ts`)

A lightweight JSON store for tracking test runs and individual step results. Writes to `test-results/results.json`.

## CI Integration (`.github/workflows/e2e.yml`)

1. **Unit tests**: `npm run test` (vitest)
2. **Cargo tests**: `cargo test` (with `env -u AGOR_TEST` to prevent env leakage)
3. **E2E tests**: `xvfb-run npm run test:e2e` (virtual framebuffer for headless WebKit2GTK)

LLM-judged tests are gated on the `ANTHROPIC_API_KEY` secret, so they skip gracefully in forks.

## Writing New Tests

1. Pick the appropriate spec file (or create a new phase file)
2. Use `data-testid` selectors where possible
3. For DOM queries, use `browser.execute()` to run JS in the app context
4. For semantic assertions, use `assertWithJudge()` with clear criteria

### WebDriverIO Config (`wdio.conf.js`)

- **Single session**: `maxInstances: 1`, because tauri-driver can't handle parallel sessions
- **Lifecycle**: `onPrepare` builds debug binary, `beforeSession` spawns tauri-driver with TCP readiness probe
- **Timeouts**: 60s per test, 10s waitfor, 30s connection retry
- **Skip build**: Set `SKIP_BUILD=1` to reuse existing binary

## Troubleshooting

| Problem | Solution |
|---------|----------|
| "Callback was not called before unload" | Stale binary: rebuild with `cargo tauri build --debug --no-bundle` |
| Tests hang on startup | Kill stale `tauri-driver` processes: `pkill -f tauri-driver` |
| All tests skip LLM judge | Install Claude CLI or set `ANTHROPIC_API_KEY` |
| SIGUSR2 / exit code 144 | Stale tauri-driver on port 4444: kill and retry |
| `AGOR_TEST` leaking to cargo | Run cargo tests with `env -u AGOR_TEST cargo test` |
| No display available | Use `xvfb-run` or ensure X11/Wayland display is set |
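
The "All tests skip LLM judge" row follows directly from the auto-detection order in the Dual Backend section (CLI first, then API, then skip). As a rough aid for debugging, that order can be sketched as a pure function. This is a hypothetical illustration, not the code in `llm-judge.ts`. The names `selectJudgeBackend` and `JudgeEnvironment` are invented, `LLM_JUDGE_BACKEND=cli` as a forcing value is assumed by analogy with the documented `api` value, and returning `null` when a forced backend is unavailable is also an assumption.

```typescript
// Hypothetical sketch of the documented detection order; not the real implementation.
type JudgeBackend = 'cli' | 'api';

interface JudgeEnvironment {
  // Whether a `claude` binary was found; how that probe is done is not shown here.
  cliInstalled: boolean;
  // Typically process.env, passed in so the function stays pure and testable.
  env: Record<string, string | undefined>;
}

// Returns the backend to use, or null when the caller should skip the test.
function selectJudgeBackend({ cliInstalled, env }: JudgeEnvironment): JudgeBackend | null {
  const hasApiKey = Boolean(env.ANTHROPIC_API_KEY);
  const forced = env.LLM_JUDGE_BACKEND;

  // An explicit override wins; assumed to skip if its requirement is unmet.
  if (forced === 'api') return hasApiKey ? 'api' : null;
  if (forced === 'cli') return cliInstalled ? 'cli' : null;

  // Auto-detection: CLI first -> API fallback -> skip.
  if (cliInstalled) return 'cli';
  if (hasApiKey) return 'api';
  return null;
}
```

Read against this sketch, a run in which every judged test skips corresponds to the final `return null`: no CLI was found and no API key was set.
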