E2E Testing Module
Browser automation tests for Agent Orchestrator using WebDriverIO + tauri-driver.
Quick Start
# Preflight check (validates dependencies)
./scripts/preflight-check.sh
# Build debug binary + run E2E
npm run test:all:e2e
# Run E2E only (skip build)
SKIP_BUILD=1 npm run test:e2e
# Headless (CI)
xvfb-run --auto-servernum npm run test:e2e
System Dependencies
| Tool | Required | Install |
|---|---|---|
| tauri-driver | Yes | cargo install tauri-driver |
| Debug binary | Yes | cargo tauri build --debug --no-bundle |
| X11/Wayland | Yes (Linux) | Use xvfb-run in CI |
| Claude CLI | Optional | LLM-judged tests skip if absent |
| ANTHROPIC_API_KEY | Optional | Alternative to Claude CLI for LLM judge |
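The two optional rows above imply a fallback order for the LLM judge. A minimal sketch of that decision follows, assuming the CLI is preferred over the API key when both are present (the table does not state a priority). `JudgeBackend`, `JudgeEnvironment`, and `pickJudgeBackend` are hypothetical names and are not taken from infra/llm-judge.ts.

```typescript
// Hypothetical helper: these names are illustrative, not the project's API.
type JudgeBackend = "cli" | "api" | "skip";

interface JudgeEnvironment {
  claudeCliAvailable: boolean; // e.g. the result of probing PATH for the Claude CLI
  anthropicApiKey?: string;    // value of ANTHROPIC_API_KEY, if set
}

// Assumed priority: Claude CLI first, API key second, otherwise skip the
// LLM-judged tests instead of failing them.
function pickJudgeBackend(env: JudgeEnvironment): JudgeBackend {
  if (env.claudeCliAvailable) return "cli";
  if (env.anthropicApiKey !== undefined && env.anthropicApiKey.trim() !== "") {
    return "api";
  }
  return "skip";
}
```

A spec could call something like this once in a setup hook and skip the LLM-judged cases when the result is "skip", so a machine without either credential still runs the deterministic tests.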
Directory Structure
tests/e2e/
├── wdio.conf.js # WebDriverIO config + tauri-driver lifecycle
├── tsconfig.json # TypeScript config for specs
├── README.md # This file
├── infra/ # Test infrastructure (not specs)
│ ├── fixtures.ts # Test fixture generator (isolated temp dirs)
│ ├── llm-judge.ts # LLM-based assertion engine (Claude CLI / API)
│ ├── results-db.ts # JSON test results store
│ └── test-mode-constants.ts # Typed env var names for test mode
└── specs/ # Test specifications
    ├── agor.test.ts # Smoke + UI tests (50+ tests)
    ├── agent-scenarios.test.ts # Phase A: agent interaction (22 tests)
    ├── phase-b.test.ts # Phase B: multi-project + LLM judge
    └── phase-c.test.ts # Phase C: hardening features (11 scenarios)
Test Mode Environment Variables
| Variable | Purpose | Read By |
|---|---|---|
| AGOR_TEST=1 | Enable test isolation | config.rs, misc.rs, lib.rs, watcher.rs, fs_watcher.rs, telemetry.rs, App.svelte |
| AGOR_TEST_DATA_DIR | Override data dir | config.rs |
| AGOR_TEST_CONFIG_DIR | Override config dir | config.rs |
Effects when AGOR_TEST=1:
- File watchers disabled (watcher.rs, fs_watcher.rs)
- OTLP telemetry export disabled (telemetry.rs)
- CLI tool installation skipped (lib.rs)
- Wake scheduler disabled (App.svelte)
- Test env vars forwarded to sidecar processes (lib.rs)
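The flag check and the forwarding of test variables to sidecar processes can be sketched in TypeScript as below. This is an illustration only: the real logic lives in lib.rs and the other files listed above, and `TEST_ENV`, `isTestMode`, and `testEnvForSidecar` are hypothetical names that may not match infra/test-mode-constants.ts.

```typescript
// Hypothetical sketch; only the three variable names come from the table above.
const TEST_ENV = {
  flag: "AGOR_TEST",
  dataDir: "AGOR_TEST_DATA_DIR",
  configDir: "AGOR_TEST_CONFIG_DIR",
} as const;

type Env = Record<string, string | undefined>;

// Test mode is treated as on only when AGOR_TEST is exactly "1".
function isTestMode(env: Env): boolean {
  return env[TEST_ENV.flag] === "1";
}

// Collect only the test-mode variables that are set, so a sidecar process
// inherits the same isolation as the parent. Returns {} outside test mode.
function testEnvForSidecar(env: Env): Record<string, string> {
  const forwarded: Record<string, string> = {};
  if (!isTestMode(env)) return forwarded;
  const names = [TEST_ENV.flag, TEST_ENV.dataDir, TEST_ENV.configDir];
  for (const name of names) {
    const value = env[name];
    if (value !== undefined) forwarded[name] = value;
  }
  return forwarded;
}
```

Keeping the variable names in one typed constant means a renamed variable fails at compile time in the specs instead of silently disabling isolation.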
Test Phases
| Phase | File | Tests | Type |
|---|---|---|---|
| Smoke | agor.test.ts | 50+ | Deterministic (CSS/DOM assertions) |
| A | agent-scenarios.test.ts | 22 | Deterministic (data-testid selectors) |
| B | phase-b.test.ts | 6+ | LLM-judged (multi-project, agent quality) |
| C | phase-c.test.ts | 11 scenarios | Mixed (deterministic + LLM-judged) |
Adding a New Spec
- Create tests/e2e/specs/my-feature.test.ts
- Import browser and expect from @wdio/globals
- Use data-testid selectors (preferred) or CSS classes
- Add to wdio.conf.js specs array
- For LLM assertions: import { assertWithJudge } from '../infra/llm-judge'
- Run ./scripts/check-test-flags.sh if you added new AGOR_TEST references
CI Workflow
See .github/workflows/e2e.yml — 3 jobs:
- unit-tests: vitest frontend
- cargo-tests: Rust backend
- e2e-tests: WebDriverIO (xvfb-run, Phase A+B+C, LLM tests gated on secret)