refactor(e2e): extract infrastructure into tests/e2e/infra/ module

- Move fixtures.ts, llm-judge.ts, results-db.ts to tests/e2e/infra/
- Deduplicate wdio.conf.js: use createTestFixture() instead of inline copy
- Replace __dirname paths with projectRoot-anchored paths
- Create test-mode-constants.ts (typed env var names, flag registry)
- Create scripts/preflight-check.sh (validates tauri-driver, display, Claude CLI)
- Create scripts/check-test-flags.sh (CI lint for AGOR_TEST flag drift)
- Rewrite tests/e2e/README.md with full documentation
- Update spec imports for moved infra files
2026-03-18 03:06:57 +01:00 · e76bc341f2
commit e76bc341f2
parent 538a31f85c
10 changed files with 235 additions and 191 deletions
--- a/tests/e2e/README.md
+++ b/tests/e2e/README.md
@@ -1,143 +1,88 @@
-# E2E Tests (WebDriver)
+# E2E Testing Module

-Tauri apps use the WebDriver protocol for E2E testing (not Playwright directly).
-The app runs inside WebKit2GTK on Linux, so tests interact with the real WebView.
+Browser automation tests for Agent Orchestrator using WebDriverIO + tauri-driver.

-## Prerequisites
-
-- Rust toolchain (for building the Tauri app)
-- Display server (X11 or Wayland) — headless Xvfb works for CI
-- `tauri-driver` installed: `cargo install tauri-driver`
-- `webkit2gtk-driver` system package: `sudo apt install webkit2gtk-driver`
-- npm devDeps already in package.json (`@wdio/cli`, `@wdio/local-runner`, `@wdio/mocha-framework`, `@wdio/spec-reporter`)
-
-## Running
+## Quick Start

 ```bash
-# From v2/ directory — builds debug binary automatically, spawns tauri-driver
-npm run test:e2e
+# Preflight check (validates dependencies)
+./scripts/preflight-check.sh

-# Skip rebuild (use existing binary)
+# Build debug binary + run E2E
+npm run test:all:e2e
+
+# Run E2E only (skip build)
 SKIP_BUILD=1 npm run test:e2e

-# With test isolation (custom data/config dirs)
-AGOR_TEST_DATA_DIR=/tmp/bt-test/data AGOR_TEST_CONFIG_DIR=/tmp/bt-test/config npm run test:e2e
+# Headless (CI)
+xvfb-run --auto-servernum npm run test:e2e
 ```

-The `wdio.conf.js` handles:
-1. Building the debug binary (`cargo tauri build --debug --no-bundle`) in `onPrepare`
-2. Spawning `tauri-driver` before each session (TCP readiness probe, 10s deadline)
-3. Killing `tauri-driver` after each session
-4. Passing `AGOR_TEST=1` env var to the app for test mode isolation
+## System Dependencies

-## Test Mode (`AGOR_TEST=1`)
+| Tool | Required | Install |
+|------|----------|---------|
+| tauri-driver | Yes | `cargo install tauri-driver` |
+| Debug binary | Yes | `cargo tauri build --debug --no-bundle` |
+| X11/Wayland | Yes (Linux) | Use `xvfb-run` in CI |
+| Claude CLI | Optional | LLM-judged tests skip if absent |
+| ANTHROPIC_API_KEY | Optional | Alternative to Claude CLI for LLM judge |

-When `AGOR_TEST=1` is set:
-- File watchers (watcher.rs, fs_watcher.rs) are disabled to avoid inotify noise
-- Wake scheduler is disabled (no auto-wake timers)
-- Data/config directories can be overridden via `AGOR_TEST_DATA_DIR` / `AGOR_TEST_CONFIG_DIR`
-
-## CI setup (headless)
-
-```bash
-# Install virtual framebuffer + WebKit driver
-sudo apt install xvfb webkit2gtk-driver
-
-# Run with Xvfb wrapper
-xvfb-run npm run test:e2e
-```
-
-## Writing tests
-
-Tests use WebdriverIO with Mocha. Specs go in `specs/`:
-
-```typescript
-import { browser, expect } from '@wdio/globals';
-
-describe('BTerminal', () => {
-  it('should show the status bar', async () => {
-    const statusBar = await browser.$('[data-testid="status-bar"]');
-    await expect(statusBar).toBeDisplayed();
-  });
-});
-```
-
-### Stable selectors
-
-Prefer `data-testid` attributes over CSS class selectors:
-
-| Element | Selector |
-|---------|----------|
-| Status bar | `[data-testid="status-bar"]` |
-| Sidebar rail | `[data-testid="sidebar-rail"]` |
-| Settings button | `[data-testid="settings-btn"]` |
-| Project box | `[data-testid="project-box"]` |
-| Project ID | `[data-project-id="..."]` |
-| Project tabs | `[data-testid="project-tabs"]` |
-| Agent session | `[data-testid="agent-session"]` |
-| Agent pane | `[data-testid="agent-pane"]` |
-| Agent status | `[data-agent-status="idle\|running\|..."]` |
-| Agent messages | `[data-testid="agent-messages"]` |
-| Agent prompt | `[data-testid="agent-prompt"]` |
-| Agent submit | `[data-testid="agent-submit"]` |
-| Agent stop | `[data-testid="agent-stop"]` |
-| Terminal tabs | `[data-testid="terminal-tabs"]` |
-| Add tab button | `[data-testid="tab-add"]` |
-| Terminal toggle | `[data-testid="terminal-toggle"]` |
-| Command palette | `[data-testid="command-palette"]` |
-| Palette input | `[data-testid="palette-input"]` |
-
-### Key constraints
-
-- `maxInstances: 1` — Tauri doesn't support parallel WebDriver sessions
-- Mocha timeout is 60s — the app needs time to initialize
-- Tests interact with the real WebKit2GTK WebView, not a browser
-- Use `browser.execute()` for JS clicks when WebDriver clicks don't trigger Svelte handlers
-- Agent tests (Scenario 7) require a real Claude CLI install + API key — they skip gracefully if unavailable
-
-## Test infrastructure
-
-### Fixtures (`fixtures.ts`)
-
-Creates isolated test environments with temp data/config dirs and git repos:
-
-```typescript
-import { createTestFixture, destroyTestFixture } from '../fixtures';
-
-const fixture = createTestFixture('my-test');
-// fixture.dataDir, fixture.configDir, fixture.projectDir, fixture.env
-destroyTestFixture(fixture);
-```
-
-### Results DB (`results-db.ts`)
-
-JSON-based test results store for tracking runs and steps:
-
-```typescript
-import { ResultsDb } from '../results-db';
-
-const db = new ResultsDb();
-db.startRun('run-001', 'v2-mission-control', 'abc123');
-db.recordStep({ run_id: 'run-001', scenario_name: 'Smoke', step_name: 'renders', status: 'passed', ... });
-db.finishRun('run-001', 'passed', 5000);
-```
-
-## File structure
+## Directory Structure

 ```
 tests/e2e/
-├── README.md                         # This file
-├── wdio.conf.js                      # WebdriverIO config with tauri-driver lifecycle
-├── tsconfig.json                     # TypeScript config for test specs
-├── fixtures.ts                       # Test fixture generator (isolated environments)
-├── results-db.ts                     # JSON test results store
-└── specs/
-    ├── agor.test.ts             # Smoke tests (CSS class selectors, 50+ tests)
-    └── agent-scenarios.test.ts       # Phase A scenarios (data-testid selectors, 22 tests)
+├── wdio.conf.js          # WebDriverIO config + tauri-driver lifecycle
+├── tsconfig.json          # TypeScript config for specs
+├── README.md              # This file
+├── infra/                 # Test infrastructure (not specs)
+│   ├── fixtures.ts        # Test fixture generator (isolated temp dirs)
+│   ├── llm-judge.ts       # LLM-based assertion engine (Claude CLI / API)
+│   ├── results-db.ts      # JSON test results store
+│   └── test-mode-constants.ts  # Typed env var names for test mode
+└── specs/                 # Test specifications
+    ├── agor.test.ts       # Smoke + UI tests (50+ tests)
+    ├── agent-scenarios.test.ts  # Phase A: agent interaction (22 tests)
+    ├── phase-b.test.ts    # Phase B: multi-project + LLM judge
+    └── phase-c.test.ts    # Phase C: hardening features (11 scenarios)
 ```

-## References
+## Test Mode Environment Variables

-- Tauri WebDriver docs: https://v2.tauri.app/develop/tests/webdriver/
-- WebdriverIO docs: https://webdriver.io/
-- tauri-driver: https://crates.io/crates/tauri-driver
+| Variable | Purpose | Read By |
+|----------|---------|---------|
+| `AGOR_TEST=1` | Enable test isolation | config.rs, misc.rs, lib.rs, watcher.rs, fs_watcher.rs, telemetry.rs, App.svelte |
+| `AGOR_TEST_DATA_DIR` | Override data dir | config.rs |
+| `AGOR_TEST_CONFIG_DIR` | Override config dir | config.rs |
+
+**Effects when AGOR_TEST=1:**
+- File watchers disabled (watcher.rs, fs_watcher.rs)
+- OTLP telemetry export disabled (telemetry.rs)
+- CLI tool installation skipped (lib.rs)
+- Wake scheduler disabled (App.svelte)
+- Test env vars forwarded to sidecar processes (lib.rs)
+
+## Test Phases
+
+| Phase | File | Tests | Type |
+|-------|------|-------|------|
+| Smoke | agor.test.ts | 50+ | Deterministic (CSS/DOM assertions) |
+| A | agent-scenarios.test.ts | 22 | Deterministic (data-testid selectors) |
+| B | phase-b.test.ts | 6+ | LLM-judged (multi-project, agent quality) |
+| C | phase-c.test.ts | 11 scenarios | Mixed (deterministic + LLM-judged) |
+
+## Adding a New Spec
+
+1. Create `tests/e2e/specs/my-feature.test.ts`
+2. Import from `@wdio/globals` for `browser` and `expect`
+3. Use `data-testid` selectors (preferred) or CSS classes
+4. Add to `wdio.conf.js` specs array
+5. For LLM assertions: `import { assertWithJudge } from '../infra/llm-judge'`
+6. Run `./scripts/check-test-flags.sh` if you added new AGOR_TEST references
+
+## CI Workflow
+
+See `.github/workflows/e2e.yml` — 3 jobs:
+1. **unit-tests**: vitest frontend
+2. **cargo-tests**: Rust backend
+3. **e2e-tests**: WebDriverIO (xvfb-run, Phase A+B+C, LLM tests gated on secret)
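
Note on `test-mode-constants.ts`: the commit message describes it as "typed env var names, flag registry", but its contents are not part of this diff. The following is only a minimal sketch of what such a module might look like, built from the "Test Mode Environment Variables" table in the new README. The names `TEST_ENV`, `TestEnvName`, `TEST_FLAG_READERS`, and `buildTestEnv` are hypothetical and are not taken from the repository.

```typescript
// Hypothetical sketch; the real tests/e2e/infra/test-mode-constants.ts is not shown in this diff.

/** Env var names as listed in the README table of this commit. */
const TEST_ENV = {
  TEST: 'AGOR_TEST',
  DATA_DIR: 'AGOR_TEST_DATA_DIR',
  CONFIG_DIR: 'AGOR_TEST_CONFIG_DIR',
} as const;

/** Union of the three variable names, so typos fail at compile time. */
type TestEnvName = (typeof TEST_ENV)[keyof typeof TEST_ENV];

/** Flag registry: which source files read each variable (copied from the README table). */
const TEST_FLAG_READERS: Record<TestEnvName, readonly string[]> = {
  AGOR_TEST: [
    'config.rs', 'misc.rs', 'lib.rs', 'watcher.rs',
    'fs_watcher.rs', 'telemetry.rs', 'App.svelte',
  ],
  AGOR_TEST_DATA_DIR: ['config.rs'],
  AGOR_TEST_CONFIG_DIR: ['config.rs'],
};

/** Assumed helper: build the env block a test runner could hand to the app. */
function buildTestEnv(dataDir: string, configDir: string): Record<TestEnvName, string> {
  return {
    AGOR_TEST: '1',
    AGOR_TEST_DATA_DIR: dataDir,
    AGOR_TEST_CONFIG_DIR: configDir,
  };
}

// Usage: the paths mirror the example in the old README.
const env = buildTestEnv('/tmp/bt-test/data', '/tmp/bt-test/config');
console.log(env[TEST_ENV.DATA_DIR]);
```

In a real module these declarations would presumably be exported. Keying the registry by `TestEnvName` means a newly added variable cannot be left out of the reader list without a type error, which is one way the "flag drift" check mentioned in the commit message could be supported on the TypeScript side.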