refactor(e2e): extract infrastructure into tests/e2e/infra/ module
- Move fixtures.ts, llm-judge.ts, results-db.ts to tests/e2e/infra/
- Deduplicate wdio.conf.js: use createTestFixture() instead of inline copy
- Replace __dirname paths with projectRoot-anchored paths
- Create test-mode-constants.ts (typed env var names, flag registry)
- Create scripts/preflight-check.sh (validates tauri-driver, display, Claude CLI)
- Create scripts/check-test-flags.sh (CI lint for AGOR_TEST flag drift)
- Rewrite tests/e2e/README.md with full documentation
- Update spec imports for moved infra files
parent
538a31f85c
commit
e76bc341f2
10 changed files with 235 additions and 191 deletions
|
|
@@ -1,143 +1,88 @@
|
|||
# E2E Tests (WebDriver)
|
||||
# E2E Testing Module
|
||||
|
||||
Tauri apps use the WebDriver protocol for E2E testing (not Playwright directly).
|
||||
The app runs inside WebKit2GTK on Linux, so tests interact with the real WebView.
|
||||
Browser automation tests for Agent Orchestrator using WebDriverIO + tauri-driver.
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Rust toolchain (for building the Tauri app)
|
||||
- Display server (X11 or Wayland) — headless Xvfb works for CI
|
||||
- `tauri-driver` installed: `cargo install tauri-driver`
|
||||
- `webkit2gtk-driver` system package: `sudo apt install webkit2gtk-driver`
|
||||
- npm devDeps already in package.json (`@wdio/cli`, `@wdio/local-runner`, `@wdio/mocha-framework`, `@wdio/spec-reporter`)
|
||||
|
||||
## Running
|
||||
## Quick Start
|
||||
|
||||
```bash
|
||||
# From v2/ directory — builds debug binary automatically, spawns tauri-driver
|
||||
npm run test:e2e
|
||||
# Preflight check (validates dependencies)
|
||||
./scripts/preflight-check.sh
|
||||
|
||||
# Skip rebuild (use existing binary)
|
||||
# Build debug binary + run E2E
|
||||
npm run test:all:e2e
|
||||
|
||||
# Run E2E only (skip build)
|
||||
SKIP_BUILD=1 npm run test:e2e
|
||||
|
||||
# With test isolation (custom data/config dirs)
|
||||
AGOR_TEST_DATA_DIR=/tmp/bt-test/data AGOR_TEST_CONFIG_DIR=/tmp/bt-test/config npm run test:e2e
|
||||
# Headless (CI)
|
||||
xvfb-run --auto-servernum npm run test:e2e
|
||||
```
|
||||
|
||||
The `wdio.conf.js` handles:
|
||||
1. Building the debug binary (`cargo tauri build --debug --no-bundle`) in `onPrepare`
|
||||
2. Spawning `tauri-driver` before each session (TCP readiness probe, 10s deadline)
|
||||
3. Killing `tauri-driver` after each session
|
||||
4. Passing `AGOR_TEST=1` env var to the app for test mode isolation
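The TCP readiness probe in step 2 could look roughly like the sketch below. It is not the code from `wdio.conf.js`: the helper name `waitForPort`, its parameters, and the retry interval are assumptions for illustration. Only the 10s deadline comes from the list above.

```typescript
import * as net from 'node:net';

// Hypothetical helper: resolves once something accepts TCP connections on
// host:port, and rejects once the deadline has passed without a connection.
function waitForPort(
  port: number,
  host = '127.0.0.1',
  deadlineMs = 10_000, // 10s deadline, as described above
  retryMs = 100, // assumed retry interval
): Promise<void> {
  const deadline = Date.now() + deadlineMs;
  return new Promise<void>((resolve, reject) => {
    const attempt = (): void => {
      const socket = net.createConnection({ port, host });
      socket.once('connect', () => {
        // The driver is listening; the probe connection is no longer needed.
        socket.destroy();
        resolve();
      });
      socket.once('error', () => {
        socket.destroy();
        if (Date.now() >= deadline) {
          reject(new Error(`port ${port} not ready after ${deadlineMs} ms`));
        } else {
          setTimeout(attempt, retryMs);
        }
      });
    };
    attempt();
  });
}
```

A session hook would then `await waitForPort(port)` right after spawning `tauri-driver`, with `port` set to whatever port the driver was started on.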
|
||||
## System Dependencies
|
||||
|
||||
## Test Mode (`AGOR_TEST=1`)
|
||||
| Tool | Required | Install |
|
||||
|------|----------|---------|
|
||||
| tauri-driver | Yes | `cargo install tauri-driver` |
|
||||
| Debug binary | Yes | `cargo tauri build --debug --no-bundle` |
|
||||
| X11/Wayland | Yes (Linux) | Use `xvfb-run` in CI |
|
||||
| Claude CLI | Optional | LLM-judged tests skip if absent |
|
||||
| ANTHROPIC_API_KEY | Optional | Alternative to Claude CLI for LLM judge |
|
||||
|
||||
When `AGOR_TEST=1` is set:
|
||||
- File watchers (watcher.rs, fs_watcher.rs) are disabled to avoid inotify noise
|
||||
- Wake scheduler is disabled (no auto-wake timers)
|
||||
- Data/config directories can be overridden via `AGOR_TEST_DATA_DIR` / `AGOR_TEST_CONFIG_DIR`
|
||||
|
||||
## CI setup (headless)
|
||||
|
||||
```bash
|
||||
# Install virtual framebuffer + WebKit driver
|
||||
sudo apt install xvfb webkit2gtk-driver
|
||||
|
||||
# Run with Xvfb wrapper
|
||||
xvfb-run npm run test:e2e
|
||||
```
|
||||
|
||||
## Writing tests
|
||||
|
||||
Tests use WebdriverIO with Mocha. Specs go in `specs/`:
|
||||
|
||||
```typescript
|
||||
import { browser, expect } from '@wdio/globals';
|
||||
|
||||
describe('BTerminal', () => {
|
||||
it('should show the status bar', async () => {
|
||||
const statusBar = await browser.$('[data-testid="status-bar"]');
|
||||
await expect(statusBar).toBeDisplayed();
|
||||
});
|
||||
});
|
||||
```
|
||||
|
||||
### Stable selectors
|
||||
|
||||
Prefer `data-testid` attributes over CSS class selectors:
|
||||
|
||||
| Element | Selector |
|
||||
|---------|----------|
|
||||
| Status bar | `[data-testid="status-bar"]` |
|
||||
| Sidebar rail | `[data-testid="sidebar-rail"]` |
|
||||
| Settings button | `[data-testid="settings-btn"]` |
|
||||
| Project box | `[data-testid="project-box"]` |
|
||||
| Project ID | `[data-project-id="..."]` |
|
||||
| Project tabs | `[data-testid="project-tabs"]` |
|
||||
| Agent session | `[data-testid="agent-session"]` |
|
||||
| Agent pane | `[data-testid="agent-pane"]` |
|
||||
| Agent status | `[data-agent-status="idle\|running\|..."]` |
|
||||
| Agent messages | `[data-testid="agent-messages"]` |
|
||||
| Agent prompt | `[data-testid="agent-prompt"]` |
|
||||
| Agent submit | `[data-testid="agent-submit"]` |
|
||||
| Agent stop | `[data-testid="agent-stop"]` |
|
||||
| Terminal tabs | `[data-testid="terminal-tabs"]` |
|
||||
| Add tab button | `[data-testid="tab-add"]` |
|
||||
| Terminal toggle | `[data-testid="terminal-toggle"]` |
|
||||
| Command palette | `[data-testid="command-palette"]` |
|
||||
| Palette input | `[data-testid="palette-input"]` |
|
||||
|
||||
### Key constraints
|
||||
|
||||
- `maxInstances: 1` — Tauri doesn't support parallel WebDriver sessions
|
||||
- Mocha timeout is 60s — the app needs time to initialize
|
||||
- Tests interact with the real WebKit2GTK WebView, not a browser
|
||||
- Use `browser.execute()` for JS clicks when WebDriver clicks don't trigger Svelte handlers
|
||||
- Agent tests (Scenario 7) require a real Claude CLI install + API key — they skip gracefully if unavailable
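The `browser.execute()` workaround from the list above can be wrapped in a small helper. This is a minimal sketch: `jsClick` and the `ScriptRunner` interface are hypothetical names, and the interface is a narrowed stand-in for the WebdriverIO `browser` object so the helper can be exercised without a live session.

```typescript
// Minimal slice of the WebdriverIO `browser` object that this helper needs.
interface ScriptRunner {
  execute(script: string, ...args: unknown[]): Promise<unknown>;
}

// Hypothetical helper: click an element from inside the WebView instead of
// through a WebDriver click. Returns false when no element matches.
async function jsClick(runner: ScriptRunner, selector: string): Promise<boolean> {
  const clicked = await runner.execute(
    // String scripts receive their arguments through `arguments`.
    'const el = document.querySelector(arguments[0]);' +
      'if (!el) { return false; }' +
      'el.click();' +
      'return true;',
    selector,
  );
  return clicked === true;
}
```

In a spec this would be called as `await jsClick(browser, '[data-testid="agent-submit"]')`, assuming the real `browser` object satisfies the narrowed interface.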
|
||||
|
||||
## Test infrastructure
|
||||
|
||||
### Fixtures (`fixtures.ts`)
|
||||
|
||||
Creates isolated test environments with temp data/config dirs and git repos:
|
||||
|
||||
```typescript
|
||||
import { createTestFixture, destroyTestFixture } from '../fixtures';
|
||||
|
||||
const fixture = createTestFixture('my-test');
|
||||
// fixture.dataDir, fixture.configDir, fixture.projectDir, fixture.env
|
||||
destroyTestFixture(fixture);
|
||||
```
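For orientation, an isolated environment of this kind can be built with a few `node:fs` calls. The sketch below is not the contents of `fixtures.ts`: the names `createTempFixture`, `destroyTempFixture`, and `rootDir` are made up for illustration, and the git repo setup mentioned above is left out.

```typescript
import * as fs from 'node:fs';
import * as os from 'node:os';
import * as path from 'node:path';

// Shape mirrors the fields listed above; `rootDir` is an assumed extra field.
interface TempFixture {
  rootDir: string;
  dataDir: string;
  configDir: string;
  projectDir: string;
  env: Record<string, string>;
}

// Hypothetical stand-in for createTestFixture(): one fresh temp root per test.
function createTempFixture(name: string): TempFixture {
  const rootDir = fs.mkdtempSync(path.join(os.tmpdir(), `${name}-`));
  const dataDir = path.join(rootDir, 'data');
  const configDir = path.join(rootDir, 'config');
  const projectDir = path.join(rootDir, 'project');
  for (const dir of [dataDir, configDir, projectDir]) {
    fs.mkdirSync(dir, { recursive: true });
  }
  return {
    rootDir,
    dataDir,
    configDir,
    projectDir,
    // Env vars the app reads in test mode.
    env: {
      AGOR_TEST: '1',
      AGOR_TEST_DATA_DIR: dataDir,
      AGOR_TEST_CONFIG_DIR: configDir,
    },
  };
}

// Hypothetical stand-in for destroyTestFixture(): remove the whole temp root.
function destroyTempFixture(fixture: TempFixture): void {
  fs.rmSync(fixture.rootDir, { recursive: true, force: true });
}
```

Keeping every directory under a single temp root means cleanup is one recursive delete, and nothing leaks into the real data or config directories.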
|
||||
|
||||
### Results DB (`results-db.ts`)
|
||||
|
||||
JSON-based test results store for tracking runs and steps:
|
||||
|
||||
```typescript
|
||||
import { ResultsDb } from '../results-db';
|
||||
|
||||
const db = new ResultsDb();
|
||||
db.startRun('run-001', 'v2-mission-control', 'abc123');
|
||||
db.recordStep({ run_id: 'run-001', scenario_name: 'Smoke', step_name: 'renders', status: 'passed', ... });
|
||||
db.finishRun('run-001', 'passed', 5000);
|
||||
```
|
||||
|
||||
## File structure
|
||||
## Directory Structure
|
||||
|
||||
```
|
||||
tests/e2e/
|
||||
├── README.md # This file
|
||||
├── wdio.conf.js # WebdriverIO config with tauri-driver lifecycle
|
||||
├── tsconfig.json # TypeScript config for test specs
|
||||
├── fixtures.ts # Test fixture generator (isolated environments)
|
||||
├── results-db.ts # JSON test results store
|
||||
└── specs/
|
||||
├── agor.test.ts # Smoke tests (CSS class selectors, 50+ tests)
|
||||
└── agent-scenarios.test.ts # Phase A scenarios (data-testid selectors, 22 tests)
|
||||
├── wdio.conf.js # WebDriverIO config + tauri-driver lifecycle
|
||||
├── tsconfig.json # TypeScript config for specs
|
||||
├── README.md # This file
|
||||
├── infra/ # Test infrastructure (not specs)
|
||||
│ ├── fixtures.ts # Test fixture generator (isolated temp dirs)
|
||||
│ ├── llm-judge.ts # LLM-based assertion engine (Claude CLI / API)
|
||||
│ ├── results-db.ts # JSON test results store
|
||||
│ └── test-mode-constants.ts # Typed env var names for test mode
|
||||
└── specs/ # Test specifications
|
||||
├── agor.test.ts # Smoke + UI tests (50+ tests)
|
||||
├── agent-scenarios.test.ts # Phase A: agent interaction (22 tests)
|
||||
├── phase-b.test.ts # Phase B: multi-project + LLM judge
|
||||
└── phase-c.test.ts # Phase C: hardening features (11 scenarios)
|
||||
```
|
||||
|
||||
## References
|
||||
## Test Mode Environment Variables
|
||||
|
||||
- Tauri WebDriver docs: https://v2.tauri.app/develop/tests/webdriver/
|
||||
- WebdriverIO docs: https://webdriver.io/
|
||||
- tauri-driver: https://crates.io/crates/tauri-driver
|
||||
| Variable | Purpose | Read By |
|
||||
|----------|---------|---------|
|
||||
| `AGOR_TEST=1` | Enable test isolation | config.rs, misc.rs, lib.rs, watcher.rs, fs_watcher.rs, telemetry.rs, App.svelte |
|
||||
| `AGOR_TEST_DATA_DIR` | Override data dir | config.rs |
|
||||
| `AGOR_TEST_CONFIG_DIR` | Override config dir | config.rs |
|
||||
|
||||
**Effects when AGOR_TEST=1:**
|
||||
- File watchers disabled (watcher.rs, fs_watcher.rs)
|
||||
- OTLP telemetry export disabled (telemetry.rs)
|
||||
- CLI tool installation skipped (lib.rs)
|
||||
- Wake scheduler disabled (App.svelte)
|
||||
- Test env vars forwarded to sidecar processes (lib.rs)
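A typed registry for these variables might look like the sketch below. The three variable names come from the table above; `TEST_ENV`, `TestEnvVar`, `buildTestEnv`, and `isTestMode` are hypothetical names and may not match what `test-mode-constants.ts` actually exports.

```typescript
// Env var names from the table above, collected in one place.
const TEST_ENV = {
  enabled: 'AGOR_TEST',
  dataDir: 'AGOR_TEST_DATA_DIR',
  configDir: 'AGOR_TEST_CONFIG_DIR',
} as const;

// Union of the registered names: 'AGOR_TEST' | 'AGOR_TEST_DATA_DIR' | ...
type TestEnvVar = (typeof TEST_ENV)[keyof typeof TEST_ENV];

// Builds the env block passed to the app. The return type makes the compiler
// complain if a registered variable is left out.
function buildTestEnv(dataDir: string, configDir: string): Record<TestEnvVar, string> {
  return {
    [TEST_ENV.enabled]: '1',
    [TEST_ENV.dataDir]: dataDir,
    [TEST_ENV.configDir]: configDir,
  };
}

// True when the given environment has test mode switched on.
function isTestMode(env: Record<string, string | undefined>): boolean {
  return env[TEST_ENV.enabled] === '1';
}
```

Referring to names through one registry rather than string literals gives a lint script a single list to compare against when checking for flag drift.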
|
||||
|
||||
## Test Phases
|
||||
|
||||
| Phase | File | Tests | Type |
|
||||
|-------|------|-------|------|
|
||||
| Smoke | agor.test.ts | 50+ | Deterministic (CSS/DOM assertions) |
|
||||
| A | agent-scenarios.test.ts | 22 | Deterministic (data-testid selectors) |
|
||||
| B | phase-b.test.ts | 6+ | LLM-judged (multi-project, agent quality) |
|
||||
| C | phase-c.test.ts | 11 scenarios | Mixed (deterministic + LLM-judged) |
|
||||
|
||||
## Adding a New Spec
|
||||
|
||||
1. Create `tests/e2e/specs/my-feature.test.ts`
|
||||
2. Import from `@wdio/globals` for `browser` and `expect`
|
||||
3. Use `data-testid` selectors (preferred) or CSS classes
|
||||
4. Add to `wdio.conf.js` specs array
|
||||
5. For LLM assertions: `import { assertWithJudge } from '../infra/llm-judge'`
|
||||
6. Run `./scripts/check-test-flags.sh` if you added new AGOR_TEST references
|
||||
|
||||
## CI Workflow
|
||||
|
||||
See `.github/workflows/e2e.yml` — 3 jobs:
|
||||
1. **unit-tests**: vitest frontend
|
||||
2. **cargo-tests**: Rust backend
|
||||
3. **e2e-tests**: WebDriverIO (xvfb-run, Phase A+B+C, LLM tests gated on secret)
|
||||
|
|
|
|||