docs: update meta files for E2E test fixes

Update test counts (82 E2E passing), add CHANGELOG entries for
27 fixed failures and AgentPane template fix, update TODO.md.
This commit is contained in:
Hibryda 2026-03-12 03:50:13 +01:00
parent 9ce7c35325
commit 2e29ba5d9a
3 changed files with 6 additions and 2 deletions

View file

@ -5,7 +5,7 @@
- v1 is a single-file Python app (`bterminal.py`). Changes are localized.
- v2 docs are in `docs/`. Architecture decisions are in `docs/task_plan.md`.
- v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 388 vitest + 68 cargo tests + 22 Phase A E2E + 6 Phase B E2E (multi-project + LLM-judged).
- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 388 vitest + 68 cargo + 82 E2E (smoke + Phase A + Phase B LLM-judged).
- v3 docs: `docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`.
- Consult Memora (tag: `bterminal`) before making architectural changes.

View file

@ -17,6 +17,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **LLM judge helper**`tests/e2e/llm-judge.ts`: raw fetch to Anthropic API (claude-haiku-4-5), structured verdicts (pass/fail + reasoning + confidence), `assertWithJudge()` with configurable min confidence threshold, graceful skip when `ANTHROPIC_API_KEY` absent
- **E2E CI workflow**`.github/workflows/e2e.yml`: 3 jobs (vitest, cargo, e2e), xvfb-run for headless WebKit2GTK, path-filtered triggers on v2 source changes, LLM-judged tests gated on `ANTHROPIC_API_KEY` secret availability
### Fixed
- **E2E test suite — 27 failures fixed** across 3 spec files: bterminal.test.ts (22 — stale v2 CSS selectors, v3 tab order/count, JS-dispatched KeyboardEvent for Ctrl+K, idempotent palette open/close, backdrop click close, scrollIntoView for below-fold settings, scoped theme dropdown selectors), agent-scenarios.test.ts (3 — JS click for settings button, programmatic focus check, graceful 40s agent timeout with skip), phase-b.test.ts (2 — waitUntil for project box render, conditional null handling for burn-rate/cost elements). 82 E2E passing, 0 failing, 4 skipped
- **AgentPane.svelte missing closing `>`** — div tag with data-testid attributes was missing closing angle bracket, causing template parse issues
### Changed
- **WebDriverIO config** — TCP readiness probe replaces blind 2s sleep for tauri-driver startup (200ms interval, 10s deadline). Added BTERMINAL_TEST=1 passthrough in capabilities

View file

@ -3,7 +3,7 @@
## Active
### v2/v3 Remaining
- [x] **E2E testing — Phase B+** -- Phase B complete: LLM judge helper (llm-judge.ts, raw Anthropic API fetch, claude-haiku-4-5), 6 multi-project scenarios (phase-b.test.ts: grid rendering, independent tabs, status bar, LLM-judged agent responses + code generation, context tab), CI workflow (e2e.yml: 3 jobs, xvfb-run, path-filtered, LLM tests gated on secret). 388 vitest + 68 cargo + 22 Phase A + 6 Phase B E2E. | Done: 2026-03-12
- [x] **E2E testing — Phase B+ & test fixes** -- Phase B: LLM judge (llm-judge.ts, claude-haiku-4-5), 6 multi-project scenarios, CI workflow (3 jobs). Test fixes: 27 failures across 3 spec files (stale v2 CSS selectors, WebKit2GTK keyboard/focus quirks, conditional render timing). Shared idempotent helpers, JS-dispatched KeyboardEvent, backdrop click close, programmatic focus checks. 388 vitest + 68 cargo + 82 E2E (0 fail, 4 skip). | Done: 2026-03-12
- [ ] **Multi-machine real-world testing** -- Test bterminal-relay with 2 machines.
- [ ] **Multi-machine TLS/certificate pinning** -- TLS support for bterminal-relay + certificate pinning in RemoteManager.
- [ ] **Agent Teams real-world testing** -- Env var whitelist fix done. 3 test sessions ran ($1.10, $0.69, $1.70) but model didn't spawn subagents — needs complex multi-part prompts to trigger delegation. Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.