From c4c673a4b0bc091046817cde403a1c67a9728f1d Mon Sep 17 00:00:00 2001
From: Hibryda <hibryda@protonmail.com>
Date: Thu, 12 Mar 2026 02:52:14 +0100
Subject: [PATCH] docs: update meta files for E2E testing engine Phase A

---
 .claude/CLAUDE.md      |  3 +-
 CHANGELOG.md           | 10 +++++
 CLAUDE.md              |  5 +++
 TODO.md                |  2 +-
 docs/v3-progress.md    | 48 +++++++++++++++++++++++
 v2/tests/e2e/README.md | 86 ++++++++++++++++++++++++++++++++++++++----
 6 files changed, 145 insertions(+), 9 deletions(-)
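
Note on the TCP readiness probe mentioned in the CHANGELOG and README hunks below (200ms interval, 10s deadline): the following is only a minimal sketch of how such a probe could look, not the code in `wdio.conf.js`. The helper names `tryConnect` and `waitForPort` are hypothetical, and the port used in the usage note (4444) is an assumption about where tauri-driver listens.

```typescript
import * as net from 'node:net';

// Hypothetical helper: attempt one TCP connection and report whether it succeeded.
function tryConnect(port: number, host: string, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.createConnection({ port, host });
    const done = (ok: boolean): void => {
      socket.destroy(); // always release the socket, whatever the outcome
      resolve(ok);
    };
    socket.setTimeout(timeoutMs, () => done(false));
    socket.once('connect', () => done(true));
    socket.once('error', () => done(false));
  });
}

// Hypothetical helper: poll until the port accepts connections or the deadline passes.
// Defaults mirror the 200ms interval and 10s deadline stated in the changelog entry.
async function waitForPort(
  port: number,
  host = '127.0.0.1',
  intervalMs = 200,
  deadlineMs = 10_000,
): Promise<boolean> {
  const deadline = Date.now() + deadlineMs;
  while (Date.now() < deadline) {
    if (await tryConnect(port, host, intervalMs)) return true;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return false;
}
```

A config hook could call `await waitForPort(4444)` after spawning the driver and fail the run when it resolves to `false`. Compared with a fixed sleep, this returns as soon as the driver is listening and still bounds the wait when it never starts.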

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 75812d3..5aa261b 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -5,7 +5,7 @@
 - v1 is a single-file Python app (`bterminal.py`). Changes are localized.
 - v2 docs are in `docs/`. Architecture decisions are in `docs/task_plan.md`.
 - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
-- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 388 vitest + 76 cargo tests.
+- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 345 vitest + 68 cargo tests + 22 E2E scenarios (Phase A).
 - v3 docs: `docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`.
 - Consult Memora (tag: `bterminal`) before making architectural changes.
 
@@ -82,6 +82,7 @@
 - v3 workspace store (`workspace.svelte.ts`) replaces layout store for v3. Groups loaded from `~/.config/bterminal/groups.json` via `groups-bridge.ts`. State: groups, activeGroupId, activeTab, focusedProjectId. Derived: activeGroup, activeProjects.
 - v3 groups backend (`groups.rs`): load_groups(), save_groups(), default_groups(). Tauri commands: groups_load, groups_save.
 - Telemetry (`telemetry.rs`): tracing + optional OTLP export to Tempo. `BTERMINAL_OTLP_ENDPOINT` env var controls (absent = console-only). TelemetryGuard in AppState with Drop-based shutdown. Frontend events route through `frontend_log` Tauri command → Rust tracing (no browser OTEL SDK — WebKit2GTK incompatible). `telemetry-bridge.ts` provides `tel.info/warn/error()` convenience API. Docker stack at `docker/tempo/` (Grafana port 9715).
+- E2E test mode (`BTERMINAL_TEST=1`): watcher.rs and fs_watcher.rs skip file watchers, wake-scheduler disabled via `disableWakeScheduler()`, `is_test_mode` Tauri command bridges to frontend. Data/config dirs overridable via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. E2E uses WebdriverIO + tauri-driver, single session, TCP readiness probe. 7 data-testid-based scenarios (22 tests) in `agent-scenarios.test.ts`. Test fixtures in `fixtures.ts` create isolated temp environments. Results tracked via JSON store in `results-db.ts`.
 - v3 SQLite additions: agent_messages table (per-project message persistence), project_agent_state table (sdkSessionId, cost, status per project), sessions.project_id column.
 - v3 App.svelte: VSCode-style sidebar layout. Horizontal: left icon rail (GlobalTabBar, 2.75rem, single Settings gear icon) + expandable drawer panel (Settings only, content-driven width, max 50%) + main workspace (ProjectGrid always visible) + StatusBar. Sidebar has Settings only — Sessions/Docs/Context are project-specific (in ProjectBox tabs). Keyboard: Ctrl+B (toggle sidebar), Ctrl+, (settings), Escape (close).
 - v3 component tree: App -> GlobalTabBar (settings icon) + sidebar-panel? (SettingsTab) + workspace (ProjectGrid) + StatusBar. See `docs/v3-task_plan.md` for full tree.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 19f133f..af9d4e2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+### Added
+- **E2E test mode infrastructure** — `BTERMINAL_TEST=1` env var disables file watchers (watcher.rs, fs_watcher.rs), wake scheduler, and allows data/config dir overrides via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. New `is_test_mode` Tauri command bridges test state to frontend
+- **E2E data-testid attributes** — Stable test selectors on 7 key Svelte components: AgentPane (agent-pane, data-agent-status, agent-messages, agent-stop, agent-prompt, agent-submit), ProjectBox (project-box, data-project-id, project-tabs, terminal-toggle), StatusBar, AgentSession, GlobalTabBar, CommandPalette, TerminalTabs
+- **E2E Phase A scenarios** — 7 human-authored test scenarios (22 tests) in `agent-scenarios.test.ts`: app structural integrity, settings panel, agent pane initial state, terminal tab management, command palette, project focus/tab switching, agent prompt submission (graceful Claude CLI skip)
+- **E2E test fixtures** — `tests/e2e/fixtures.ts`: creates isolated temp environments with data/config dirs, git repos, and groups.json. `createTestFixture()`, `createMultiProjectFixture()`, `destroyTestFixture()`
+- **E2E results store** — `tests/e2e/results-db.ts`: JSON-based test run/step tracking (pivoted from better-sqlite3 due to Node 25 native compile failure)
+
+### Changed
+- **WebdriverIO config** — TCP readiness probe replaces blind 2s sleep for tauri-driver startup (200ms interval, 10s deadline). Added `BTERMINAL_TEST=1` passthrough in capabilities
+
 ### Security
 - `claude_read_skill` path traversal: added `canonicalize()` + `starts_with()` validation to prevent reading arbitrary files via crafted skill paths (commands/claude.rs)
 - **Sidecar env allowlist hardening** — added `ANTHROPIC_*` to Rust-level `strip_provider_env_var()` as defense-in-depth (Claude CLI uses credentials file, not env for auth). Dual-layer stripping documented: Rust layer (first checkpoint) + JS runner layer (per-provider)
diff --git a/CLAUDE.md b/CLAUDE.md
index 709a8ec..6a9d8b9 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -87,6 +87,11 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth
 | `v2/src/lib/adapters/telemetry-bridge.ts` | Frontend telemetry bridge (routes events to Rust tracing via IPC) |
 | `v2/src/lib/utils/agent-prompts.ts` | Agent prompt generator (generateAgentPrompt: identity, env, team, btmsg/bttask docs, workflow) |
 | `docker/tempo/` | Docker compose: Tempo + Grafana for trace visualization (port 9715) |
+| `v2/tests/e2e/wdio.conf.js` | WebdriverIO config (tauri-driver lifecycle, TCP probe, test env vars) |
+| `v2/tests/e2e/fixtures.ts` | E2E test fixture generator (isolated temp dirs, git repos, groups.json) |
+| `v2/tests/e2e/results-db.ts` | JSON test results store (run/step tracking, no native deps) |
+| `v2/tests/e2e/specs/bterminal.test.ts` | E2E smoke tests (CSS class selectors, 50+ tests) |
+| `v2/tests/e2e/specs/agent-scenarios.test.ts` | Phase A E2E scenarios (data-testid selectors, 7 scenarios, 22 tests) |
 | `v2/src/lib/stores/machines.svelte.ts` | Remote machine state store (Svelte 5 runes) |
 | `v2/src/lib/utils/attention-scorer.ts` | Pure attention scoring function (extracted from health store, 14 tests) |
 | `v2/src/lib/utils/wake-scorer.ts` | Pure wake signal evaluation (6 signals, 24 tests) |
diff --git a/TODO.md b/TODO.md
index ceab514..f3e303c 100644
--- a/TODO.md
+++ b/TODO.md
@@ -3,7 +3,7 @@
 ## Active
 
 ### v2/v3 Remaining
-- [ ] **E2E testing — expand coverage** -- 48 tests passing across 8 describe blocks (WebdriverIO v9.24 + tauri-driver, single spec file, ~23s). Add tests for agent sessions, terminal interaction.
+- [ ] **E2E testing — Phase B+** -- Phase A complete: 72 tests across 2 spec files (smoke + 7 agent scenarios). Next: LLM-judged assertions, multi-project scenarios, CI integration (xvfb-run).
 - [ ] **Multi-machine real-world testing** -- Test bterminal-relay with 2 machines.
 - [ ] **Multi-machine TLS/certificate pinning** -- TLS support for bterminal-relay + certificate pinning in RemoteManager.
 - [ ] **Agent Teams real-world testing** -- Env var whitelist fix done. 3 test sessions ran ($1.10, $0.69, $1.70) but model didn't spawn subagents — needs complex multi-part prompts to trigger delegation. Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.
diff --git a/docs/v3-progress.md b/docs/v3-progress.md
index 3b138a6..dff5491 100644
--- a/docs/v3-progress.md
+++ b/docs/v3-progress.md
@@ -923,3 +923,51 @@ Reviewed and integrated Dexter's multi-agent orchestration branch (dexter_change
 - Vitest: 327 passed (was 286, +41)
 - Cargo src-tauri: 64 passed (was 49, +15)
 - Cargo bterminal-core: 8 passed (was 0, +8)
+
+### E2E Testing Engine — Phase A (2026-03-12)
+
+#### Test Mode Infrastructure
+- [x] Rust: watcher.rs — test mode bypass (skip file watching, return content directly)
+- [x] Rust: fs_watcher.rs — test mode bypass (skip inotify watchers)
+- [x] Rust: commands/misc.rs — added `is_test_mode` Tauri command
+- [x] Rust: lib.rs — registered is_test_mode in invoke handler
+- [x] Frontend: wake-scheduler.svelte.ts — `disableWakeScheduler()` export + early return
+- [x] Frontend: App.svelte — test mode detection via IPC, disables wake scheduler
+
+#### E2E Test Anchors (data-testid attributes)
+- [x] AgentPane: data-testid="agent-pane", data-agent-status, agent-messages, agent-stop, agent-prompt, agent-submit
+- [x] ProjectBox: data-testid="project-box", data-project-id, project-tabs, terminal-toggle
+- [x] StatusBar: data-testid="status-bar"
+- [x] AgentSession: data-testid="agent-session"
+- [x] GlobalTabBar: data-testid="sidebar-rail", settings-btn
+- [x] CommandPalette: data-testid="command-palette", palette-input
+- [x] TerminalTabs: data-testid="terminal-tabs", tab-add
+
+#### WebdriverIO Config Improvements
+- [x] TCP readiness probe replaces blind 2s sleep for tauri-driver startup
+- [x] BTERMINAL_TEST=1 env var passed to Tauri app via capabilities
+- [x] Optional BTERMINAL_TEST_DATA_DIR / BTERMINAL_TEST_CONFIG_DIR passthrough
+
+#### Test Infrastructure Files
+- [x] `v2/tests/e2e/fixtures.ts` — isolated test fixture generator (temp dirs, git repos, groups.json)
+- [x] `v2/tests/e2e/results-db.ts` — JSON-based test results store (no native deps)
+- [x] `v2/tests/e2e/specs/agent-scenarios.test.ts` — 7 Phase A scenarios (22 test cases)
+
+#### Phase A Scenarios (7 scenarios, 22 tests)
+1. **App Structural Integrity** — verifies all data-testid anchors render correctly
+2. **Settings Panel (data-testid)** — open/close settings via stable selector
+3. **Agent Pane Initial State** — idle status, prompt textarea, empty messages
+4. **Terminal Tab Management** — add/close tabs via data-testid, empty state
+5. **Command Palette (data-testid)** — open, focus, filter, close
+6. **Project Focus & Tab Switching** — focus, tab persistence, agent status preservation
+7. **Agent Prompt Submission** — textarea input, submit button state, graceful Claude CLI skip
+
+#### Verification
+- [x] cargo test: 68 passed, 0 failed
+- [x] vitest: 345 passed across 18 files, 0 failed
+- [x] svelte-check: 0 project errors (2 pre-existing errors in esrap under node_modules)
+
+#### Test Counts
+- Vitest: 345 passed (was 327, +18 — new wake-scorer + metrics tests from prior session)
+- Cargo src-tauri: 68 passed (was 64, +4)
+- E2E scenarios: 22 new test cases across 7 scenarios
diff --git a/v2/tests/e2e/README.md b/v2/tests/e2e/README.md
index 4955997..5e33708 100644
--- a/v2/tests/e2e/README.md
+++ b/v2/tests/e2e/README.md
@@ -16,12 +16,26 @@ The app runs inside WebKit2GTK on Linux, so tests interact with the real WebView
 ```bash
 # From v2/ directory — builds debug binary automatically, spawns tauri-driver
 npm run test:e2e
+
+# Skip rebuild (use existing binary)
+SKIP_BUILD=1 npm run test:e2e
+
+# With test isolation (custom data/config dirs)
+BTERMINAL_TEST_DATA_DIR=/tmp/bt-test/data BTERMINAL_TEST_CONFIG_DIR=/tmp/bt-test/config npm run test:e2e
 ```
 
 The `wdio.conf.js` handles:
 1. Building the debug binary (`cargo tauri build --debug --no-bundle`) in `onPrepare`
-2. Spawning `tauri-driver` before each session
+2. Spawning `tauri-driver` before each session (TCP readiness probe, 10s deadline)
 3. Killing `tauri-driver` after each session
+4. Passing `BTERMINAL_TEST=1` env var to the app for test mode isolation
+
+## Test Mode (`BTERMINAL_TEST=1`)
+
+When `BTERMINAL_TEST=1` is set:
+- File watchers (watcher.rs, fs_watcher.rs) are disabled to avoid inotify noise
+- Wake scheduler is disabled (no auto-wake timers)
+- Data/config directories can be overridden via `BTERMINAL_TEST_DATA_DIR` / `BTERMINAL_TEST_CONFIG_DIR`
 
 ## CI setup (headless)
 
@@ -42,26 +56,84 @@ import { browser, expect } from '@wdio/globals';
 
 describe('BTerminal', () => {
   it('should show the status bar', async () => {
-    const statusBar = await browser.$('.status-bar');
+    const statusBar = await browser.$('[data-testid="status-bar"]');
     await expect(statusBar).toBeDisplayed();
   });
 });
 ```
 
-Key constraints:
+### Stable selectors
+
+Prefer `data-testid` attributes over CSS class selectors:
+
+| Element | Selector |
+|---------|----------|
+| Status bar | `[data-testid="status-bar"]` |
+| Sidebar rail | `[data-testid="sidebar-rail"]` |
+| Settings button | `[data-testid="settings-btn"]` |
+| Project box | `[data-testid="project-box"]` |
+| Project ID | `[data-project-id="..."]` |
+| Project tabs | `[data-testid="project-tabs"]` |
+| Agent session | `[data-testid="agent-session"]` |
+| Agent pane | `[data-testid="agent-pane"]` |
+| Agent status | `[data-agent-status="idle\|running\|..."]` |
+| Agent messages | `[data-testid="agent-messages"]` |
+| Agent prompt | `[data-testid="agent-prompt"]` |
+| Agent submit | `[data-testid="agent-submit"]` |
+| Agent stop | `[data-testid="agent-stop"]` |
+| Terminal tabs | `[data-testid="terminal-tabs"]` |
+| Add tab button | `[data-testid="tab-add"]` |
+| Terminal toggle | `[data-testid="terminal-toggle"]` |
+| Command palette | `[data-testid="command-palette"]` |
+| Palette input | `[data-testid="palette-input"]` |
+
+### Key constraints
+
 - `maxInstances: 1` — Tauri doesn't support parallel WebDriver sessions
 - Mocha timeout is 60s — the app needs time to initialize
 - Tests interact with the real WebKit2GTK WebView, not a browser
+- Use `browser.execute()` for JS clicks when WebDriver clicks don't trigger Svelte handlers
+- Agent tests (Scenario 7) require a real Claude CLI install + API key — they skip gracefully if unavailable
+
+## Test infrastructure
+
+### Fixtures (`fixtures.ts`)
+
+Creates isolated test environments with temp data/config dirs and git repos:
+
+```typescript
+import { createTestFixture, destroyTestFixture } from '../fixtures';
+
+const fixture = createTestFixture('my-test');
+// fixture.dataDir, fixture.configDir, fixture.projectDir, fixture.env
+destroyTestFixture(fixture);
+```
+
+### Results DB (`results-db.ts`)
+
+JSON-based test results store for tracking runs and steps:
+
+```typescript
+import { ResultsDb } from '../results-db';
+
+const db = new ResultsDb();
+db.startRun('run-001', 'v2-mission-control', 'abc123');
+db.recordStep({ run_id: 'run-001', scenario_name: 'Smoke', step_name: 'renders', status: 'passed', /* ...remaining fields elided */ });
+db.finishRun('run-001', 'passed', 5000);
+```
 
 ## File structure
 
 ```
 tests/e2e/
-├── README.md          # This file
-├── wdio.conf.js       # WebdriverIO config with tauri-driver lifecycle
-├── tsconfig.json      # TypeScript config for test specs
+├── README.md                         # This file
+├── wdio.conf.js                      # WebdriverIO config with tauri-driver lifecycle
+├── tsconfig.json                     # TypeScript config for test specs
+├── fixtures.ts                       # Test fixture generator (isolated environments)
+├── results-db.ts                     # JSON test results store
 └── specs/
-    └── smoke.test.ts  # Basic smoke tests (app renders, sidebar toggle)
+    ├── bterminal.test.ts             # Smoke tests (CSS class selectors, 50+ tests)
+    └── agent-scenarios.test.ts       # Phase A scenarios (data-testid selectors, 22 tests)
 ```
 
 ## References