refactor(e2e): extract infrastructure into tests/e2e/infra/ module

- Move fixtures.ts, llm-judge.ts, results-db.ts to tests/e2e/infra/
- Deduplicate wdio.conf.js: use createTestFixture() instead of inline copy
- Replace __dirname paths with projectRoot-anchored paths
- Create test-mode-constants.ts (typed env var names, flag registry)
- Create scripts/preflight-check.sh (validates tauri-driver, display, Claude CLI)
- Create scripts/check-test-flags.sh (CI lint for AGOR_TEST flag drift)
- Rewrite tests/e2e/README.md with full documentation
- Update spec imports for moved infra files
Hibryda 2026-03-18 03:06:57 +01:00
parent 538a31f85c
commit e76bc341f2
10 changed files with 235 additions and 191 deletions
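The commit message says `__dirname` paths were replaced with projectRoot-anchored paths, but the diff below does not show how `projectRoot` is derived. Here is a minimal TypeScript sketch. It assumes the root is the nearest ancestor directory that contains a `package.json`. `findProjectRoot` and `fromRoot` are hypothetical names, not taken from the repository.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical helper (assumption: the project root is the nearest ancestor
// containing package.json). Returns null if the filesystem root is reached first.
function findProjectRoot(startDir: string): string | null {
  let dir = path.resolve(startDir);
  for (;;) {
    if (fs.existsSync(path.join(dir, 'package.json'))) return dir;
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root
    dir = parent;
  }
}

// Hypothetical helper: build a path relative to the project root instead of
// relative to the directory of whichever file happens to call it.
function fromRoot(root: string, ...segments: string[]): string {
  return path.join(root, ...segments);
}
```

With this approach, `fromRoot(root, 'tests', 'e2e', 'infra', 'fixtures.ts')` resolves to the same location no matter which file builds it. That property matters when files move between directories, as they do in this commit.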


````diff
@@ -1,143 +1,88 @@
-# E2E Tests (WebDriver)
+# E2E Testing Module
-Tauri apps use the WebDriver protocol for E2E testing (not Playwright directly).
-The app runs inside WebKit2GTK on Linux, so tests interact with the real WebView.
+Browser automation tests for Agent Orchestrator using WebDriverIO + tauri-driver.
-## Prerequisites
-- Rust toolchain (for building the Tauri app)
-- Display server (X11 or Wayland) — headless Xvfb works for CI
-- `tauri-driver` installed: `cargo install tauri-driver`
-- `webkit2gtk-driver` system package: `sudo apt install webkit2gtk-driver`
-- npm devDeps already in package.json (`@wdio/cli`, `@wdio/local-runner`, `@wdio/mocha-framework`, `@wdio/spec-reporter`)
-## Running
+## Quick Start
 ```bash
-# From v2/ directory — builds debug binary automatically, spawns tauri-driver
-npm run test:e2e
+# Preflight check (validates dependencies)
+./scripts/preflight-check.sh
-# Skip rebuild (use existing binary)
+# Build debug binary + run E2E
+npm run test:all:e2e
+# Run E2E only (skip build)
 SKIP_BUILD=1 npm run test:e2e
-# With test isolation (custom data/config dirs)
-AGOR_TEST_DATA_DIR=/tmp/bt-test/data AGOR_TEST_CONFIG_DIR=/tmp/bt-test/config npm run test:e2e
+# Headless (CI)
+xvfb-run --auto-servernum npm run test:e2e
 ```
````
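The new Quick Start runs `./scripts/preflight-check.sh` first. The script itself is not part of this diff. According to the commit message, it validates tauri-driver, the display, and the Claude CLI. The following TypeScript sketch mirrors those three checks under these assumptions:

- the Claude CLI executable is named `claude`;
- a display is detected through `DISPLAY` or `WAYLAND_DISPLAY`;
- `onPath` and `preflight` are hypothetical helpers.

It illustrates the idea and is not a port of the actual script.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

interface Check {
  name: string;
  ok: boolean;
  required: boolean; // failing optional checks only disables some tests
}

// True if an executable file named `bin` exists in one of the PATH entries.
function onPath(bin: string, pathVar: string): boolean {
  return pathVar
    .split(path.delimiter)
    .filter((dir) => dir.length > 0)
    .some((dir) => {
      try {
        fs.accessSync(path.join(dir, bin), fs.constants.X_OK);
        return true;
      } catch {
        return false;
      }
    });
}

function preflight(env: NodeJS.ProcessEnv = process.env): Check[] {
  const pathVar = env.PATH ?? '';
  return [
    { name: 'tauri-driver', ok: onPath('tauri-driver', pathVar), required: true },
    // Linux needs an X11 or Wayland display; xvfb-run provides DISPLAY in CI.
    { name: 'display', ok: Boolean(env.DISPLAY || env.WAYLAND_DISPLAY), required: true },
    // Assumed CLI name 'claude'; LLM-judged tests skip when neither is available.
    {
      name: 'claude-cli-or-api-key',
      ok: onPath('claude', pathVar) || Boolean(env.ANTHROPIC_API_KEY),
      required: false,
    },
  ];
}
```

A caller would exit non-zero when any entry with `required: true` has `ok: false`. Failing optional entries would only be reported.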
````diff
-The `wdio.conf.js` handles:
-1. Building the debug binary (`cargo tauri build --debug --no-bundle`) in `onPrepare`
-2. Spawning `tauri-driver` before each session (TCP readiness probe, 10s deadline)
-3. Killing `tauri-driver` after each session
-4. Passing `AGOR_TEST=1` env var to the app for test mode isolation
+## System Dependencies
-## Test Mode (`AGOR_TEST=1`)
+| Tool | Required | Install |
+|------|----------|---------|
+| tauri-driver | Yes | `cargo install tauri-driver` |
+| Debug binary | Yes | `cargo tauri build --debug --no-bundle` |
+| X11/Wayland | Yes (Linux) | Use `xvfb-run` in CI |
+| Claude CLI | Optional | LLM-judged tests skip if absent |
+| ANTHROPIC_API_KEY | Optional | Alternative to Claude CLI for LLM judge |
-When `AGOR_TEST=1` is set:
-- File watchers (watcher.rs, fs_watcher.rs) are disabled to avoid inotify noise
-- Wake scheduler is disabled (no auto-wake timers)
-- Data/config directories can be overridden via `AGOR_TEST_DATA_DIR` / `AGOR_TEST_CONFIG_DIR`
-## CI setup (headless)
-```bash
-# Install virtual framebuffer + WebKit driver
-sudo apt install xvfb webkit2gtk-driver
-# Run with Xvfb wrapper
-xvfb-run npm run test:e2e
-```
````
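The removed `wdio.conf.js` notes describe spawning `tauri-driver` before each session with a TCP readiness probe and a 10 s deadline. Below is a minimal sketch of such a probe using Node's `net` module. The host, the polling interval, the per-attempt timeout and the helper names `tryConnect` and `waitForPort` are all assumptions. The port tauri-driver listens on (4444 in Tauri's WebDriverIO example) should be taken from the actual config.

```typescript
import * as net from 'node:net';

// One connection attempt: resolves true if the port accepts a TCP connection.
function tryConnect(port: number, host: string, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ port, host });
    const finish = (ok: boolean) => {
      socket.destroy();
      resolve(ok); // later calls are no-ops once the promise has settled
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}

// Poll until the port is reachable or the deadline passes (10 s by default,
// matching the deadline described in the README).
async function waitForPort(
  port: number,
  { host = '127.0.0.1', deadlineMs = 10_000, intervalMs = 200 } = {},
): Promise<boolean> {
  const deadline = Date.now() + deadlineMs;
  while (Date.now() < deadline) {
    if (await tryConnect(port, host, intervalMs * 5)) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false;
}
```

Probing the socket, rather than sleeping a fixed time, lets a session start as soon as the driver is ready. The deadline still bounds how long a missing driver can stall the run.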
````diff
-## Writing tests
-Tests use WebdriverIO with Mocha. Specs go in `specs/`:
-```typescript
-import { browser, expect } from '@wdio/globals';
-describe('BTerminal', () => {
-  it('should show the status bar', async () => {
-    const statusBar = await browser.$('[data-testid="status-bar"]');
-    await expect(statusBar).toBeDisplayed();
-  });
-});
-```
-### Stable selectors
-Prefer `data-testid` attributes over CSS class selectors:
-| Element | Selector |
-|---------|----------|
-| Status bar | `[data-testid="status-bar"]` |
-| Sidebar rail | `[data-testid="sidebar-rail"]` |
-| Settings button | `[data-testid="settings-btn"]` |
-| Project box | `[data-testid="project-box"]` |
-| Project ID | `[data-project-id="..."]` |
-| Project tabs | `[data-testid="project-tabs"]` |
-| Agent session | `[data-testid="agent-session"]` |
-| Agent pane | `[data-testid="agent-pane"]` |
-| Agent status | `[data-agent-status="idle\|running\|..."]` |
-| Agent messages | `[data-testid="agent-messages"]` |
-| Agent prompt | `[data-testid="agent-prompt"]` |
-| Agent submit | `[data-testid="agent-submit"]` |
-| Agent stop | `[data-testid="agent-stop"]` |
-| Terminal tabs | `[data-testid="terminal-tabs"]` |
-| Add tab button | `[data-testid="tab-add"]` |
-| Terminal toggle | `[data-testid="terminal-toggle"]` |
-| Command palette | `[data-testid="command-palette"]` |
-| Palette input | `[data-testid="palette-input"]` |
-### Key constraints
-- `maxInstances: 1` — Tauri doesn't support parallel WebDriver sessions
-- Mocha timeout is 60s — the app needs time to initialize
-- Tests interact with the real WebKit2GTK WebView, not a browser
-- Use `browser.execute()` for JS clicks when WebDriver clicks don't trigger Svelte handlers
-- Agent tests (Scenario 7) require a real Claude CLI install + API key — they skip gracefully if unavailable
````
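The selector table removed in this commit pairs UI elements with `data-testid` values, and the new README still prefers `data-testid` selectors. A small typed helper lets the compiler reject misspelled IDs. The helper is hypothetical and not part of the repository; the ID list is copied from that table.

```typescript
// data-testid values copied from the selector table in the old README.
const TEST_IDS = [
  'status-bar', 'sidebar-rail', 'settings-btn', 'project-box', 'project-tabs',
  'agent-session', 'agent-pane', 'agent-messages', 'agent-prompt',
  'agent-submit', 'agent-stop', 'terminal-tabs', 'tab-add',
  'terminal-toggle', 'command-palette', 'palette-input',
] as const;

type TestId = (typeof TEST_IDS)[number];

// Build a CSS attribute selector; a typo such as 'status-bra' fails to compile.
function byTestId(id: TestId): string {
  return `[data-testid="${id}"]`;
}

// The table also uses data-project-id and data-agent-status attributes.
// Both helpers assume the value contains no double quotes.
function byProjectId(projectId: string): string {
  return `[data-project-id="${projectId}"]`;
}

function byAgentStatus(status: string): string {
  return `[data-agent-status="${status}"]`;
}
```

In a spec this would be used as `await browser.$(byTestId('status-bar'))`.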
````diff
-## Test infrastructure
-### Fixtures (`fixtures.ts`)
-Creates isolated test environments with temp data/config dirs and git repos:
-```typescript
-import { createTestFixture, destroyTestFixture } from '../fixtures';
-const fixture = createTestFixture('my-test');
-// fixture.dataDir, fixture.configDir, fixture.projectDir, fixture.env
-destroyTestFixture(fixture);
-```
````
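The fixture API shown above exposes `dataDir`, `configDir`, `projectDir` and `env`; after this commit the file lives at `tests/e2e/infra/fixtures.ts`. Below is a minimal sketch of how such a fixture could be built with Node's filesystem APIs, under these assumptions:

- `env` carries the three test-mode variables documented in this README.
- A hypothetical `rootDir` field is added so cleanup can remove everything at once.
- The git repository initialization that the real `fixtures.ts` performs is skipped.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

interface TestFixture {
  rootDir: string; // hypothetical: parent of all fixture dirs, used for cleanup
  dataDir: string;
  configDir: string;
  projectDir: string;
  env: Record<string, string>;
}

// Sketch only: the real fixtures.ts also creates a git repo in projectDir,
// which is omitted here.
function createTestFixture(name: string): TestFixture {
  const rootDir = fs.mkdtempSync(path.join(os.tmpdir(), `agor-e2e-${name}-`));
  const dataDir = path.join(rootDir, 'data');
  const configDir = path.join(rootDir, 'config');
  const projectDir = path.join(rootDir, 'project');
  for (const dir of [dataDir, configDir, projectDir]) {
    fs.mkdirSync(dir, { recursive: true });
  }
  return {
    rootDir,
    dataDir,
    configDir,
    projectDir,
    // Point the app at the isolated dirs via the documented test-mode variables.
    env: { AGOR_TEST: '1', AGOR_TEST_DATA_DIR: dataDir, AGOR_TEST_CONFIG_DIR: configDir },
  };
}

function destroyTestFixture(fixture: TestFixture): void {
  fs.rmSync(fixture.rootDir, { recursive: true, force: true });
}
```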
````diff
-### Results DB (`results-db.ts`)
-JSON-based test results store for tracking runs and steps:
-```typescript
-import { ResultsDb } from '../results-db';
-const db = new ResultsDb();
-db.startRun('run-001', 'v2-mission-control', 'abc123');
-db.recordStep({ run_id: 'run-001', scenario_name: 'Smoke', step_name: 'renders', status: 'passed', ... });
-db.finishRun('run-001', 'passed', 5000);
-```
````
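The `ResultsDb` calls shown above suggest a small run/step store. Below is a minimal in-memory sketch that can optionally persist to a JSON file. Several details are assumptions:

- `startRun`'s second and third arguments are taken to be a project label and a git SHA.
- Runs start with a `'running'` status.
- `finishRun`'s duration is in milliseconds.
- The `filePath` constructor parameter and the `getRun` method are hypothetical.
- Fields elided with `...` in the README are not modeled.

```typescript
import * as fs from 'node:fs';

interface StepRecord {
  run_id: string;
  scenario_name: string;
  step_name: string;
  status: string;
}

interface RunRecord {
  run_id: string;
  project: string; // assumed meaning of startRun's 2nd argument
  git_sha: string; // assumed meaning of startRun's 3rd argument
  status: string;
  duration_ms?: number; // assumed unit of finishRun's 3rd argument
  steps: StepRecord[];
}

class ResultsDb {
  private readonly runs = new Map<string, RunRecord>();

  // filePath is hypothetical; without it the store stays in memory.
  constructor(private readonly filePath?: string) {}

  startRun(runId: string, project: string, gitSha: string): void {
    this.runs.set(runId, { run_id: runId, project, git_sha: gitSha, status: 'running', steps: [] });
    this.save();
  }

  recordStep(step: StepRecord): void {
    this.requireRun(step.run_id).steps.push(step);
    this.save();
  }

  finishRun(runId: string, status: string, durationMs: number): void {
    const run = this.requireRun(runId);
    run.status = status;
    run.duration_ms = durationMs;
    this.save();
  }

  getRun(runId: string): RunRecord | undefined {
    return this.runs.get(runId);
  }

  private requireRun(runId: string): RunRecord {
    const run = this.runs.get(runId);
    if (!run) throw new Error(`unknown run: ${runId}`);
    return run;
  }

  // Rewrite the whole store as pretty-printed JSON after every change.
  private save(): void {
    if (this.filePath) {
      fs.writeFileSync(this.filePath, JSON.stringify(Array.from(this.runs.values()), null, 2));
    }
  }
}
```

Rewriting the whole file on each change keeps the sketch simple. It is acceptable only while a run holds a modest number of steps.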
````diff
-## File structure
+## Directory Structure
 ```
 tests/e2e/
-├── README.md # This file
-├── wdio.conf.js # WebdriverIO config with tauri-driver lifecycle
-├── tsconfig.json # TypeScript config for test specs
-├── fixtures.ts # Test fixture generator (isolated environments)
-├── results-db.ts # JSON test results store
-└── specs/
-    ├── agor.test.ts # Smoke tests (CSS class selectors, 50+ tests)
-    └── agent-scenarios.test.ts # Phase A scenarios (data-testid selectors, 22 tests)
+├── wdio.conf.js # WebDriverIO config + tauri-driver lifecycle
+├── tsconfig.json # TypeScript config for specs
+├── README.md # This file
+├── infra/ # Test infrastructure (not specs)
+│   ├── fixtures.ts # Test fixture generator (isolated temp dirs)
+│   ├── llm-judge.ts # LLM-based assertion engine (Claude CLI / API)
+│   ├── results-db.ts # JSON test results store
+│   └── test-mode-constants.ts # Typed env var names for test mode
+└── specs/ # Test specifications
+    ├── agor.test.ts # Smoke + UI tests (50+ tests)
+    ├── agent-scenarios.test.ts # Phase A: agent interaction (22 tests)
+    ├── phase-b.test.ts # Phase B: multi-project + LLM judge
+    └── phase-c.test.ts # Phase C: hardening features (11 scenarios)
 ```
-## References
+## Test Mode Environment Variables
-- Tauri WebDriver docs: https://v2.tauri.app/develop/tests/webdriver/
-- WebdriverIO docs: https://webdriver.io/
-- tauri-driver: https://crates.io/crates/tauri-driver
+| Variable | Purpose | Read By |
+|----------|---------|---------|
+| `AGOR_TEST=1` | Enable test isolation | config.rs, misc.rs, lib.rs, watcher.rs, fs_watcher.rs, telemetry.rs, App.svelte |
+| `AGOR_TEST_DATA_DIR` | Override data dir | config.rs |
+| `AGOR_TEST_CONFIG_DIR` | Override config dir | config.rs |
+**Effects when AGOR_TEST=1:**
+- File watchers disabled (watcher.rs, fs_watcher.rs)
+- OTLP telemetry export disabled (telemetry.rs)
+- CLI tool installation skipped (lib.rs)
+- Wake scheduler disabled (App.svelte)
+- Test env vars forwarded to sidecar processes (lib.rs)
````
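The commit adds `infra/test-mode-constants.ts` with typed env var names and a flag registry, but the diff does not show its contents. Below is a sketch of what such a module could look like, using only the three variable names from the table above. The `TEST_ENV`, `KNOWN_TEST_FLAGS` and `testModeEnv` names and the object layout are hypothetical.

```typescript
// Hypothetical layout; only the three variable names come from the README.
const TEST_ENV = {
  TEST_MODE: 'AGOR_TEST',
  DATA_DIR: 'AGOR_TEST_DATA_DIR',
  CONFIG_DIR: 'AGOR_TEST_CONFIG_DIR',
} as const;

type TestEnvVar = (typeof TEST_ENV)[keyof typeof TEST_ENV];

// Flag registry: the set of names a drift check can compare source code against.
const KNOWN_TEST_FLAGS: ReadonlySet<string> = new Set<string>([
  TEST_ENV.TEST_MODE,
  TEST_ENV.DATA_DIR,
  TEST_ENV.CONFIG_DIR,
]);

// Env overrides passed to the app; the directory overrides are optional.
function testModeEnv(dataDir?: string, configDir?: string): Partial<Record<TestEnvVar, string>> {
  const env: Partial<Record<TestEnvVar, string>> = { [TEST_ENV.TEST_MODE]: '1' };
  if (dataDir) env[TEST_ENV.DATA_DIR] = dataDir;
  if (configDir) env[TEST_ENV.CONFIG_DIR] = configDir;
  return env;
}
```

Keeping the names in one typed object means a misspelled variable in TypeScript test code becomes a compile error rather than a silently ignored env var.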
````diff
+## Test Phases
+| Phase | File | Tests | Type |
+|-------|------|-------|------|
+| Smoke | agor.test.ts | 50+ | Deterministic (CSS/DOM assertions) |
+| A | agent-scenarios.test.ts | 22 | Deterministic (data-testid selectors) |
+| B | phase-b.test.ts | 6+ | LLM-judged (multi-project, agent quality) |
+| C | phase-c.test.ts | 11 scenarios | Mixed (deterministic + LLM-judged) |
+## Adding a New Spec
+1. Create `tests/e2e/specs/my-feature.test.ts`
+2. Import from `@wdio/globals` for `browser` and `expect`
+3. Use `data-testid` selectors (preferred) or CSS classes
+4. Add to `wdio.conf.js` specs array
+5. For LLM assertions: `import { assertWithJudge } from '../infra/llm-judge'`
+6. Run `./scripts/check-test-flags.sh` if you added new AGOR_TEST references
+## CI Workflow
+See `.github/workflows/e2e.yml` — 3 jobs:
+1. **unit-tests**: vitest frontend
+2. **cargo-tests**: Rust backend
+3. **e2e-tests**: WebDriverIO (xvfb-run, Phase A+B+C, LLM tests gated on secret)
````
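Step 6 of the new "Adding a New Spec" list points to `./scripts/check-test-flags.sh`, which the commit message describes as a CI lint for AGOR_TEST flag drift. The script is not included in this diff. The following TypeScript function sketches the likely idea, assuming "drift" means an `AGOR_TEST*` name that is referenced in source files but missing from the flag registry. `findUnregisteredFlags` is a hypothetical name.

```typescript
// Return each AGOR_TEST* name that appears in the sources but not in the
// registry, mapped to the files that mention it.
function findUnregisteredFlags(
  sources: Record<string, string>, // file path -> file contents
  registry: ReadonlySet<string>,
): Record<string, string[]> {
  const pattern = /\bAGOR_TEST(?:_[A-Z0-9]+)*\b/g;
  const missing: Record<string, string[]> = {};
  for (const file of Object.keys(sources)) {
    for (const name of sources[file].match(pattern) ?? []) {
      if (registry.has(name)) continue;
      const files = missing[name] ?? (missing[name] = []);
      if (files.indexOf(file) === -1) files.push(file);
    }
  }
  return missing;
}
```

In CI, the `sources` map would presumably be filled from the Rust and Svelte files listed in the environment variable table, and a non-empty result would fail the job.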