From c4c673a4b0bc091046817cde403a1c67a9728f1d Mon Sep 17 00:00:00 2001 From: Hibryda Date: Thu, 12 Mar 2026 02:52:14 +0100 Subject: [PATCH] docs: update meta files for E2E testing engine Phase A --- .claude/CLAUDE.md | 3 +- CHANGELOG.md | 10 +++++ CLAUDE.md | 5 +++ TODO.md | 2 +- docs/v3-progress.md | 48 +++++++++++++++++++++++ v2/tests/e2e/README.md | 86 ++++++++++++++++++++++++++++++++++++++---- 6 files changed, 145 insertions(+), 9 deletions(-) diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 75812d3..5aa261b 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -5,7 +5,7 @@ - v1 is a single-file Python app (`bterminal.py`). Changes are localized. - v2 docs are in `docs/`. Architecture decisions are in `docs/task_plan.md`. - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery. -- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 388 vitest + 76 cargo tests. +- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 345 vitest + 68 cargo tests + 22 E2E scenarios (Phase A). - v3 docs: `docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`. - Consult Memora (tag: `bterminal`) before making architectural changes. @@ -82,6 +82,7 @@ - v3 workspace store (`workspace.svelte.ts`) replaces layout store for v3. Groups loaded from `~/.config/bterminal/groups.json` via `groups-bridge.ts`. State: groups, activeGroupId, activeTab, focusedProjectId. Derived: activeGroup, activeProjects. - v3 groups backend (`groups.rs`): load_groups(), save_groups(), default_groups(). Tauri commands: groups_load, groups_save. - Telemetry (`telemetry.rs`): tracing + optional OTLP export to Tempo. `BTERMINAL_OTLP_ENDPOINT` env var controls (absent = console-only). TelemetryGuard in AppState with Drop-based shutdown. Frontend events route through `frontend_log` Tauri command → Rust tracing (no browser OTEL SDK — WebKit2GTK incompatible). `telemetry-bridge.ts` provides `tel.info/warn/error()` convenience API. Docker stack at `docker/tempo/` (Grafana port 9715). +- E2E test mode (`BTERMINAL_TEST=1`): watcher.rs and fs_watcher.rs skip file watchers, wake-scheduler disabled via `disableWakeScheduler()`, `is_test_mode` Tauri command bridges to frontend. Data/config dirs overridable via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. E2E uses WebDriverIO + tauri-driver, single session, TCP readiness probe. 7 data-testid-based scenarios in `agent-scenarios.test.ts`. Test fixtures in `fixtures.ts` create isolated temp environments. Results tracked via JSON store in `results-db.ts`. - v3 SQLite additions: agent_messages table (per-project message persistence), project_agent_state table (sdkSessionId, cost, status per project), sessions.project_id column. - v3 App.svelte: VSCode-style sidebar layout. Horizontal: left icon rail (GlobalTabBar, 2.75rem, single Settings gear icon) + expandable drawer panel (Settings only, content-driven width, max 50%) + main workspace (ProjectGrid always visible) + StatusBar. Sidebar has Settings only — Sessions/Docs/Context are project-specific (in ProjectBox tabs). Keyboard: Ctrl+B (toggle sidebar), Ctrl+, (settings), Escape (close). - v3 component tree: App -> GlobalTabBar (settings icon) + sidebar-panel? (SettingsTab) + workspace (ProjectGrid) + StatusBar. See `docs/v3-task_plan.md` for full tree. diff --git a/CHANGELOG.md b/CHANGELOG.md index 19f133f..af9d4e2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **E2E test mode infrastructure** — `BTERMINAL_TEST=1` env var disables file watchers (watcher.rs, fs_watcher.rs), wake scheduler, and allows data/config dir overrides via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. New `is_test_mode` Tauri command bridges test state to frontend +- **E2E data-testid attributes** — Stable test selectors on 7 key Svelte components: AgentPane (agent-pane, data-agent-status, agent-messages, agent-stop, agent-prompt, agent-submit), ProjectBox (project-box, data-project-id, project-tabs, terminal-toggle), StatusBar, AgentSession, GlobalTabBar, CommandPalette, TerminalTabs +- **E2E Phase A scenarios** — 7 human-authored test scenarios (22 tests) in `agent-scenarios.test.ts`: app structural integrity, settings panel, agent pane initial state, terminal tab management, command palette, project focus/tab switching, agent prompt submission (graceful Claude CLI skip) +- **E2E test fixtures** — `tests/e2e/fixtures.ts`: creates isolated temp environments with data/config dirs, git repos, and groups.json. `createTestFixture()`, `createMultiProjectFixture()`, `destroyTestFixture()` +- **E2E results store** — `tests/e2e/results-db.ts`: JSON-based test run/step tracking (pivoted from better-sqlite3 due to Node 25 native compile failure) + +### Changed +- **WebDriverIO config** — TCP readiness probe replaces blind 2s sleep for tauri-driver startup (200ms interval, 10s deadline). Added BTERMINAL_TEST=1 passthrough in capabilities + ### Security - `claude_read_skill` path traversal: added `canonicalize()` + `starts_with()` validation to prevent reading arbitrary files via crafted skill paths (commands/claude.rs) - **Sidecar env allowlist hardening** — added `ANTHROPIC_*` to Rust-level `strip_provider_env_var()` as defense-in-depth (Claude CLI uses credentials file, not env for auth). Dual-layer stripping documented: Rust layer (first checkpoint) + JS runner layer (per-provider) diff --git a/CLAUDE.md b/CLAUDE.md index 709a8ec..6a9d8b9 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -87,6 +87,11 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth | `v2/src/lib/adapters/telemetry-bridge.ts` | Frontend telemetry bridge (routes events to Rust tracing via IPC) | | `v2/src/lib/utils/agent-prompts.ts` | Agent prompt generator (generateAgentPrompt: identity, env, team, btmsg/bttask docs, workflow) | | `docker/tempo/` | Docker compose: Tempo + Grafana for trace visualization (port 9715) | +| `v2/tests/e2e/wdio.conf.js` | WebDriverIO config (tauri-driver lifecycle, TCP probe, test env vars) | +| `v2/tests/e2e/fixtures.ts` | E2E test fixture generator (isolated temp dirs, git repos, groups.json) | +| `v2/tests/e2e/results-db.ts` | JSON test results store (run/step tracking, no native deps) | +| `v2/tests/e2e/specs/bterminal.test.ts` | E2E smoke tests (CSS class selectors, 50+ tests) | +| `v2/tests/e2e/specs/agent-scenarios.test.ts` | Phase A E2E scenarios (data-testid selectors, 7 scenarios, 22 tests) | | `v2/src/lib/stores/machines.svelte.ts` | Remote machine state store (Svelte 5 runes) | | `v2/src/lib/utils/attention-scorer.ts` | Pure attention scoring function (extracted from health store, 14 tests) | | `v2/src/lib/utils/wake-scorer.ts` | Pure wake signal evaluation (6 signals, 24 tests) | diff --git a/TODO.md b/TODO.md index ceab514..f3e303c 100644 --- a/TODO.md +++ b/TODO.md @@ -3,7 +3,7 @@ ## Active ### v2/v3 Remaining -- [ ] **E2E testing — expand coverage** -- 48 tests passing across 8 describe blocks (WebdriverIO v9.24 + tauri-driver, single spec file, ~23s). Add tests for agent sessions, terminal interaction. +- [ ] **E2E testing — Phase B+** -- Phase A complete: 72 tests across 2 spec files (smoke + 7 agent scenarios). Next: LLM-judged assertions, multi-project scenarios, CI integration (xvfb-run). - [ ] **Multi-machine real-world testing** -- Test bterminal-relay with 2 machines. - [ ] **Multi-machine TLS/certificate pinning** -- TLS support for bterminal-relay + certificate pinning in RemoteManager. - [ ] **Agent Teams real-world testing** -- Env var whitelist fix done. 3 test sessions ran ($1.10, $0.69, $1.70) but model didn't spawn subagents — needs complex multi-part prompts to trigger delegation. Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. diff --git a/docs/v3-progress.md b/docs/v3-progress.md index 3b138a6..dff5491 100644 --- a/docs/v3-progress.md +++ b/docs/v3-progress.md @@ -923,3 +923,51 @@ Reviewed and integrated Dexter's multi-agent orchestration branch (dexter_change - Vitest: 327 passed (was 286, +41) - Cargo src-tauri: 64 passed (was 49, +15) - Cargo bterminal-core: 8 passed (was 0, +8) + +### E2E Testing Engine — Phase A (2026-03-12) + +#### Test Mode Infrastructure +- [x] Rust: watcher.rs — test mode bypass (skip file watching, return content directly) +- [x] Rust: fs_watcher.rs — test mode bypass (skip inotify watchers) +- [x] Rust: commands/misc.rs — added `is_test_mode` Tauri command +- [x] Rust: lib.rs — registered is_test_mode in invoke handler +- [x] Frontend: wake-scheduler.svelte.ts — `disableWakeScheduler()` export + early return +- [x] Frontend: App.svelte — test mode detection via IPC, disables wake scheduler + +#### E2E Test Anchors (data-testid attributes) +- [x] AgentPane: data-testid="agent-pane", data-agent-status, agent-messages, agent-stop, agent-prompt, agent-submit +- [x] ProjectBox: data-testid="project-box", data-project-id, project-tabs, terminal-toggle +- [x] StatusBar: data-testid="status-bar" +- [x] AgentSession: data-testid="agent-session" +- [x] GlobalTabBar: data-testid="sidebar-rail", settings-btn +- [x] CommandPalette: data-testid="command-palette", palette-input +- [x] TerminalTabs: data-testid="terminal-tabs", tab-add + +#### WebDriverIO Config Improvements +- [x] TCP readiness probe replaces blind 2s sleep for tauri-driver startup +- [x] BTERMINAL_TEST=1 env var passed to Tauri app via capabilities +- [x] Optional BTERMINAL_TEST_DATA_DIR / BTERMINAL_TEST_CONFIG_DIR passthrough + +#### Test Infrastructure Files +- [x] `v2/tests/e2e/fixtures.ts` — isolated test fixture generator (temp dirs, git repos, groups.json) +- [x] `v2/tests/e2e/results-db.ts` — JSON-based test results store (no native deps) +- [x] `v2/tests/e2e/specs/agent-scenarios.test.ts` — 7 Phase A scenarios (22 test cases) + +#### Phase A Scenarios (7 scenarios, 22 tests) +1. **App Structural Integrity** — verifies all data-testid anchors render correctly +2. **Settings Panel (data-testid)** — open/close settings via stable selector +3. **Agent Pane Initial State** — idle status, prompt textarea, empty messages +4. **Terminal Tab Management** — add/close tabs via data-testid, empty state +5. **Command Palette (data-testid)** — open, focus, filter, close +6. **Project Focus & Tab Switching** — focus, tab persistence, agent status preservation +7. **Agent Prompt Submission** — textarea input, submit button state, graceful Claude CLI skip + +#### Verification +- [x] cargo test: 68 passed, 0 failed +- [x] vitest: 345 passed across 18 files, 0 failed +- [x] svelte-check: 0 project errors (2 pre-existing esrap node_modules) + +#### Test Counts +- Vitest: 345 passed (was 327, +18 — new wake-scorer + metrics tests from prior session) +- Cargo src-tauri: 68 passed (was 64, +4) +- E2E scenarios: 22 new test cases across 7 scenarios diff --git a/v2/tests/e2e/README.md b/v2/tests/e2e/README.md index 4955997..5e33708 100644 --- a/v2/tests/e2e/README.md +++ b/v2/tests/e2e/README.md @@ -16,12 +16,26 @@ The app runs inside WebKit2GTK on Linux, so tests interact with the real WebView ```bash # From v2/ directory — builds debug binary automatically, spawns tauri-driver npm run test:e2e + +# Skip rebuild (use existing binary) +SKIP_BUILD=1 npm run test:e2e + +# With test isolation (custom data/config dirs) +BTERMINAL_TEST_DATA_DIR=/tmp/bt-test/data BTERMINAL_TEST_CONFIG_DIR=/tmp/bt-test/config npm run test:e2e ``` The `wdio.conf.js` handles: 1. Building the debug binary (`cargo tauri build --debug --no-bundle`) in `onPrepare` -2. Spawning `tauri-driver` before each session +2. Spawning `tauri-driver` before each session (TCP readiness probe, 10s deadline) 3. Killing `tauri-driver` after each session +4. Passing `BTERMINAL_TEST=1` env var to the app for test mode isolation + +## Test Mode (`BTERMINAL_TEST=1`) + +When `BTERMINAL_TEST=1` is set: +- File watchers (watcher.rs, fs_watcher.rs) are disabled to avoid inotify noise +- Wake scheduler is disabled (no auto-wake timers) +- Data/config directories can be overridden via `BTERMINAL_TEST_DATA_DIR` / `BTERMINAL_TEST_CONFIG_DIR` ## CI setup (headless) @@ -42,26 +56,84 @@ import { browser, expect } from '@wdio/globals'; describe('BTerminal', () => { it('should show the status bar', async () => { - const statusBar = await browser.$('.status-bar'); + const statusBar = await browser.$('[data-testid="status-bar"]'); await expect(statusBar).toBeDisplayed(); }); }); ``` -Key constraints: +### Stable selectors + +Prefer `data-testid` attributes over CSS class selectors: + +| Element | Selector | +|---------|----------| +| Status bar | `[data-testid="status-bar"]` | +| Sidebar rail | `[data-testid="sidebar-rail"]` | +| Settings button | `[data-testid="settings-btn"]` | +| Project box | `[data-testid="project-box"]` | +| Project ID | `[data-project-id="..."]` | +| Project tabs | `[data-testid="project-tabs"]` | +| Agent session | `[data-testid="agent-session"]` | +| Agent pane | `[data-testid="agent-pane"]` | +| Agent status | `[data-agent-status="idle\|running\|..."]` | +| Agent messages | `[data-testid="agent-messages"]` | +| Agent prompt | `[data-testid="agent-prompt"]` | +| Agent submit | `[data-testid="agent-submit"]` | +| Agent stop | `[data-testid="agent-stop"]` | +| Terminal tabs | `[data-testid="terminal-tabs"]` | +| Add tab button | `[data-testid="tab-add"]` | +| Terminal toggle | `[data-testid="terminal-toggle"]` | +| Command palette | `[data-testid="command-palette"]` | +| Palette input | `[data-testid="palette-input"]` | + +### Key constraints + - `maxInstances: 1` — Tauri doesn't support parallel WebDriver sessions - Mocha timeout is 60s — the app needs time to initialize - Tests interact with the real WebKit2GTK WebView, not a browser +- Use `browser.execute()` for JS clicks when WebDriver clicks don't trigger Svelte handlers +- Agent tests (Scenario 7) require a real Claude CLI install + API key — they skip gracefully if unavailable + +## Test infrastructure + +### Fixtures (`fixtures.ts`) + +Creates isolated test environments with temp data/config dirs and git repos: + +```typescript +import { createTestFixture, destroyTestFixture } from '../fixtures'; + +const fixture = createTestFixture('my-test'); +// fixture.dataDir, fixture.configDir, fixture.projectDir, fixture.env +destroyTestFixture(fixture); +``` + +### Results DB (`results-db.ts`) + +JSON-based test results store for tracking runs and steps: + +```typescript +import { ResultsDb } from '../results-db'; + +const db = new ResultsDb(); +db.startRun('run-001', 'v2-mission-control', 'abc123'); +db.recordStep({ run_id: 'run-001', scenario_name: 'Smoke', step_name: 'renders', status: 'passed', ... }); +db.finishRun('run-001', 'passed', 5000); +``` ## File structure ``` tests/e2e/ -├── README.md # This file -├── wdio.conf.js # WebdriverIO config with tauri-driver lifecycle -├── tsconfig.json # TypeScript config for test specs +├── README.md # This file +├── wdio.conf.js # WebdriverIO config with tauri-driver lifecycle +├── tsconfig.json # TypeScript config for test specs +├── fixtures.ts # Test fixture generator (isolated environments) +├── results-db.ts # JSON test results store └── specs/ - └── smoke.test.ts # Basic smoke tests (app renders, sidebar toggle) + ├── bterminal.test.ts # Smoke tests (CSS class selectors, 50+ tests) + └── agent-scenarios.test.ts # Phase A scenarios (data-testid selectors, 22 tests) ``` ## References