docs: update meta files for E2E testing engine Phase A

This commit is contained in:
Hibryda 2026-03-12 02:52:14 +01:00
parent c6c38b91c6
commit c4c673a4b0
6 changed files with 145 additions and 9 deletions

View file

@ -5,7 +5,7 @@
- v1 is a single-file Python app (`bterminal.py`). Changes are localized.
- v2 docs are in `docs/`. Architecture decisions are in `docs/task_plan.md`.
- v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 388 vitest + 76 cargo tests.
- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 345 vitest + 68 cargo tests + 22 E2E scenarios (Phase A).
- v3 docs: `docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`.
- Consult Memora (tag: `bterminal`) before making architectural changes.
@ -82,6 +82,7 @@
- v3 workspace store (`workspace.svelte.ts`) replaces layout store for v3. Groups loaded from `~/.config/bterminal/groups.json` via `groups-bridge.ts`. State: groups, activeGroupId, activeTab, focusedProjectId. Derived: activeGroup, activeProjects.
- v3 groups backend (`groups.rs`): load_groups(), save_groups(), default_groups(). Tauri commands: groups_load, groups_save.
- Telemetry (`telemetry.rs`): tracing + optional OTLP export to Tempo. `BTERMINAL_OTLP_ENDPOINT` env var controls (absent = console-only). TelemetryGuard in AppState with Drop-based shutdown. Frontend events route through `frontend_log` Tauri command → Rust tracing (no browser OTEL SDK — WebKit2GTK incompatible). `telemetry-bridge.ts` provides `tel.info/warn/error()` convenience API. Docker stack at `docker/tempo/` (Grafana port 9715).
- E2E test mode (`BTERMINAL_TEST=1`): watcher.rs and fs_watcher.rs skip file watchers, wake-scheduler disabled via `disableWakeScheduler()`, `is_test_mode` Tauri command bridges to frontend. Data/config dirs overridable via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. E2E uses WebDriverIO + tauri-driver, single session, TCP readiness probe. 7 data-testid-based scenarios in `agent-scenarios.test.ts`. Test fixtures in `fixtures.ts` create isolated temp environments. Results tracked via JSON store in `results-db.ts`.
- v3 SQLite additions: agent_messages table (per-project message persistence), project_agent_state table (sdkSessionId, cost, status per project), sessions.project_id column.
- v3 App.svelte: VSCode-style sidebar layout. Horizontal: left icon rail (GlobalTabBar, 2.75rem, single Settings gear icon) + expandable drawer panel (Settings only, content-driven width, max 50%) + main workspace (ProjectGrid always visible) + StatusBar. Sidebar has Settings only — Sessions/Docs/Context are project-specific (in ProjectBox tabs). Keyboard: Ctrl+B (toggle sidebar), Ctrl+, (settings), Escape (close).
- v3 component tree: App -> GlobalTabBar (settings icon) + sidebar-panel? (SettingsTab) + workspace (ProjectGrid) + StatusBar. See `docs/v3-task_plan.md` for full tree.

View file

@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
- **E2E test mode infrastructure**`BTERMINAL_TEST=1` env var disables file watchers (watcher.rs, fs_watcher.rs), wake scheduler, and allows data/config dir overrides via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. New `is_test_mode` Tauri command bridges test state to frontend
- **E2E data-testid attributes** — Stable test selectors on 7 key Svelte components: AgentPane (agent-pane, data-agent-status, agent-messages, agent-stop, agent-prompt, agent-submit), ProjectBox (project-box, data-project-id, project-tabs, terminal-toggle), StatusBar, AgentSession, GlobalTabBar, CommandPalette, TerminalTabs
- **E2E Phase A scenarios** — 7 human-authored test scenarios (22 tests) in `agent-scenarios.test.ts`: app structural integrity, settings panel, agent pane initial state, terminal tab management, command palette, project focus/tab switching, agent prompt submission (graceful Claude CLI skip)
- **E2E test fixtures**`tests/e2e/fixtures.ts`: creates isolated temp environments with data/config dirs, git repos, and groups.json. `createTestFixture()`, `createMultiProjectFixture()`, `destroyTestFixture()`
- **E2E results store**`tests/e2e/results-db.ts`: JSON-based test run/step tracking (pivoted from better-sqlite3 due to Node 25 native compile failure)
### Changed
- **WebDriverIO config** — TCP readiness probe replaces blind 2s sleep for tauri-driver startup (200ms interval, 10s deadline). Added BTERMINAL_TEST=1 passthrough in capabilities
### Security
- `claude_read_skill` path traversal: added `canonicalize()` + `starts_with()` validation to prevent reading arbitrary files via crafted skill paths (commands/claude.rs)
- **Sidecar env allowlist hardening** — added `ANTHROPIC_*` to Rust-level `strip_provider_env_var()` as defense-in-depth (Claude CLI uses credentials file, not env for auth). Dual-layer stripping documented: Rust layer (first checkpoint) + JS runner layer (per-provider)

View file

@ -87,6 +87,11 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth
| `v2/src/lib/adapters/telemetry-bridge.ts` | Frontend telemetry bridge (routes events to Rust tracing via IPC) |
| `v2/src/lib/utils/agent-prompts.ts` | Agent prompt generator (generateAgentPrompt: identity, env, team, btmsg/bttask docs, workflow) |
| `docker/tempo/` | Docker compose: Tempo + Grafana for trace visualization (port 9715) |
| `v2/tests/e2e/wdio.conf.js` | WebDriverIO config (tauri-driver lifecycle, TCP probe, test env vars) |
| `v2/tests/e2e/fixtures.ts` | E2E test fixture generator (isolated temp dirs, git repos, groups.json) |
| `v2/tests/e2e/results-db.ts` | JSON test results store (run/step tracking, no native deps) |
| `v2/tests/e2e/specs/bterminal.test.ts` | E2E smoke tests (CSS class selectors, 50+ tests) |
| `v2/tests/e2e/specs/agent-scenarios.test.ts` | Phase A E2E scenarios (data-testid selectors, 7 scenarios, 22 tests) |
| `v2/src/lib/stores/machines.svelte.ts` | Remote machine state store (Svelte 5 runes) |
| `v2/src/lib/utils/attention-scorer.ts` | Pure attention scoring function (extracted from health store, 14 tests) |
| `v2/src/lib/utils/wake-scorer.ts` | Pure wake signal evaluation (6 signals, 24 tests) |

View file

@ -3,7 +3,7 @@
## Active
### v2/v3 Remaining
- [ ] **E2E testing — expand coverage** -- 48 tests passing across 8 describe blocks (WebdriverIO v9.24 + tauri-driver, single spec file, ~23s). Add tests for agent sessions, terminal interaction.
- [ ] **E2E testing — Phase B+** -- Phase A complete: 72 tests across 2 spec files (smoke + 7 agent scenarios). Next: LLM-judged assertions, multi-project scenarios, CI integration (xvfb-run).
- [ ] **Multi-machine real-world testing** -- Test bterminal-relay with 2 machines.
- [ ] **Multi-machine TLS/certificate pinning** -- TLS support for bterminal-relay + certificate pinning in RemoteManager.
- [ ] **Agent Teams real-world testing** -- Env var whitelist fix done. 3 test sessions ran ($1.10, $0.69, $1.70) but model didn't spawn subagents — needs complex multi-part prompts to trigger delegation. Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.

View file

@ -923,3 +923,51 @@ Reviewed and integrated Dexter's multi-agent orchestration branch (dexter_change
- Vitest: 327 passed (was 286, +41)
- Cargo src-tauri: 64 passed (was 49, +15)
- Cargo bterminal-core: 8 passed (was 0, +8)
### E2E Testing Engine — Phase A (2026-03-12)
#### Test Mode Infrastructure
- [x] Rust: watcher.rs — test mode bypass (skip file watching, return content directly)
- [x] Rust: fs_watcher.rs — test mode bypass (skip inotify watchers)
- [x] Rust: commands/misc.rs — added `is_test_mode` Tauri command
- [x] Rust: lib.rs — registered is_test_mode in invoke handler
- [x] Frontend: wake-scheduler.svelte.ts — `disableWakeScheduler()` export + early return
- [x] Frontend: App.svelte — test mode detection via IPC, disables wake scheduler
#### E2E Test Anchors (data-testid attributes)
- [x] AgentPane: data-testid="agent-pane", data-agent-status, agent-messages, agent-stop, agent-prompt, agent-submit
- [x] ProjectBox: data-testid="project-box", data-project-id, project-tabs, terminal-toggle
- [x] StatusBar: data-testid="status-bar"
- [x] AgentSession: data-testid="agent-session"
- [x] GlobalTabBar: data-testid="sidebar-rail", settings-btn
- [x] CommandPalette: data-testid="command-palette", palette-input
- [x] TerminalTabs: data-testid="terminal-tabs", tab-add
#### WebDriverIO Config Improvements
- [x] TCP readiness probe replaces blind 2s sleep for tauri-driver startup
- [x] BTERMINAL_TEST=1 env var passed to Tauri app via capabilities
- [x] Optional BTERMINAL_TEST_DATA_DIR / BTERMINAL_TEST_CONFIG_DIR passthrough
#### Test Infrastructure Files
- [x] `v2/tests/e2e/fixtures.ts` — isolated test fixture generator (temp dirs, git repos, groups.json)
- [x] `v2/tests/e2e/results-db.ts` — JSON-based test results store (no native deps)
- [x] `v2/tests/e2e/specs/agent-scenarios.test.ts` — 7 Phase A scenarios (22 test cases)
#### Phase A Scenarios (7 scenarios, 22 tests)
1. **App Structural Integrity** — verifies all data-testid anchors render correctly
2. **Settings Panel (data-testid)** — open/close settings via stable selector
3. **Agent Pane Initial State** — idle status, prompt textarea, empty messages
4. **Terminal Tab Management** — add/close tabs via data-testid, empty state
5. **Command Palette (data-testid)** — open, focus, filter, close
6. **Project Focus & Tab Switching** — focus, tab persistence, agent status preservation
7. **Agent Prompt Submission** — textarea input, submit button state, graceful Claude CLI skip
#### Verification
- [x] cargo test: 68 passed, 0 failed
- [x] vitest: 345 passed across 18 files, 0 failed
- [x] svelte-check: 0 project errors (2 pre-existing esrap node_modules)
#### Test Counts
- Vitest: 345 passed (was 327, +18 — new wake-scorer + metrics tests from prior session)
- Cargo src-tauri: 68 passed (was 64, +4)
- E2E scenarios: 22 new test cases across 7 scenarios

View file

@ -16,12 +16,26 @@ The app runs inside WebKit2GTK on Linux, so tests interact with the real WebView
```bash
# From v2/ directory — builds debug binary automatically, spawns tauri-driver
npm run test:e2e
# Skip rebuild (use existing binary)
SKIP_BUILD=1 npm run test:e2e
# With test isolation (custom data/config dirs)
BTERMINAL_TEST_DATA_DIR=/tmp/bt-test/data BTERMINAL_TEST_CONFIG_DIR=/tmp/bt-test/config npm run test:e2e
```
The `wdio.conf.js` handles:
1. Building the debug binary (`cargo tauri build --debug --no-bundle`) in `onPrepare`
2. Spawning `tauri-driver` before each session
2. Spawning `tauri-driver` before each session (TCP readiness probe, 10s deadline)
3. Killing `tauri-driver` after each session
4. Passing `BTERMINAL_TEST=1` env var to the app for test mode isolation
## Test Mode (`BTERMINAL_TEST=1`)
When `BTERMINAL_TEST=1` is set:
- File watchers (watcher.rs, fs_watcher.rs) are disabled to avoid inotify noise
- Wake scheduler is disabled (no auto-wake timers)
- Data/config directories can be overridden via `BTERMINAL_TEST_DATA_DIR` / `BTERMINAL_TEST_CONFIG_DIR`
## CI setup (headless)
@ -42,26 +56,84 @@ import { browser, expect } from '@wdio/globals';
describe('BTerminal', () => {
it('should show the status bar', async () => {
const statusBar = await browser.$('.status-bar');
const statusBar = await browser.$('[data-testid="status-bar"]');
await expect(statusBar).toBeDisplayed();
});
});
```
Key constraints:
### Stable selectors
Prefer `data-testid` attributes over CSS class selectors:
| Element | Selector |
|---------|----------|
| Status bar | `[data-testid="status-bar"]` |
| Sidebar rail | `[data-testid="sidebar-rail"]` |
| Settings button | `[data-testid="settings-btn"]` |
| Project box | `[data-testid="project-box"]` |
| Project ID | `[data-project-id="..."]` |
| Project tabs | `[data-testid="project-tabs"]` |
| Agent session | `[data-testid="agent-session"]` |
| Agent pane | `[data-testid="agent-pane"]` |
| Agent status | `[data-agent-status="idle\|running\|..."]` |
| Agent messages | `[data-testid="agent-messages"]` |
| Agent prompt | `[data-testid="agent-prompt"]` |
| Agent submit | `[data-testid="agent-submit"]` |
| Agent stop | `[data-testid="agent-stop"]` |
| Terminal tabs | `[data-testid="terminal-tabs"]` |
| Add tab button | `[data-testid="tab-add"]` |
| Terminal toggle | `[data-testid="terminal-toggle"]` |
| Command palette | `[data-testid="command-palette"]` |
| Palette input | `[data-testid="palette-input"]` |
### Key constraints
- `maxInstances: 1` — Tauri doesn't support parallel WebDriver sessions
- Mocha timeout is 60s — the app needs time to initialize
- Tests interact with the real WebKit2GTK WebView, not a browser
- Use `browser.execute()` for JS clicks when WebDriver clicks don't trigger Svelte handlers
- Agent tests (Scenario 7) require a real Claude CLI install + API key — they skip gracefully if unavailable
## Test infrastructure
### Fixtures (`fixtures.ts`)
Creates isolated test environments with temp data/config dirs and git repos:
```typescript
import { createTestFixture, destroyTestFixture } from '../fixtures';
const fixture = createTestFixture('my-test');
// fixture.dataDir, fixture.configDir, fixture.projectDir, fixture.env
destroyTestFixture(fixture);
```
### Results DB (`results-db.ts`)
JSON-based test results store for tracking runs and steps:
```typescript
import { ResultsDb } from '../results-db';
const db = new ResultsDb();
db.startRun('run-001', 'v2-mission-control', 'abc123');
db.recordStep({ run_id: 'run-001', scenario_name: 'Smoke', step_name: 'renders', status: 'passed', ... });
db.finishRun('run-001', 'passed', 5000);
```
## File structure
```
tests/e2e/
├── README.md # This file
├── wdio.conf.js # WebdriverIO config with tauri-driver lifecycle
├── tsconfig.json # TypeScript config for test specs
├── README.md # This file
├── wdio.conf.js # WebdriverIO config with tauri-driver lifecycle
├── tsconfig.json # TypeScript config for test specs
├── fixtures.ts # Test fixture generator (isolated environments)
├── results-db.ts # JSON test results store
└── specs/
└── smoke.test.ts # Basic smoke tests (app renders, sidebar toggle)
├── bterminal.test.ts # Smoke tests (CSS class selectors, 50+ tests)
└── agent-scenarios.test.ts # Phase A scenarios (data-testid selectors, 22 tests)
```
## References