From 2e29ba5d9a256dfe6249d04bc71c8952bc07739a Mon Sep 17 00:00:00 2001
From: Hibryda <hibryda@protonmail.com>
Date: Thu, 12 Mar 2026 03:50:13 +0100
Subject: [PATCH] docs: update meta files for E2E test fixes

Update test counts (82 E2E passing), add CHANGELOG entries for the
27 fixed failures and the AgentPane template fix, and update TODO.md.
---
 .claude/CLAUDE.md | 2 +-
 CHANGELOG.md      | 4 ++++
 TODO.md           | 2 +-
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md
index 5f45b9e..1bac83f 100644
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -5,7 +5,7 @@
 - v1 is a single-file Python app (`bterminal.py`). Changes are localized.
 - v2 docs are in `docs/`. Architecture decisions are in `docs/task_plan.md`.
 - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
-- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 388 vitest + 68 cargo tests + 22 Phase A E2E + 6 Phase B E2E (multi-project + LLM-judged).
+- v3 Mission Control (All Phases 1-10 Complete + S-1 Phase 1/1.5/2/3 + S-2 Session Anchors + Provider Adapter Pattern + Provider Runners + Memora Adapter + SOLID Phase 3 + Multi-Agent Orchestration): project groups, workspace store, 15 Workspace components, session continuity, workspace teardown, file overlap conflict detection, inotify-based external write detection, multi-provider adapter pattern (3 phases + Codex/Ollama runners), worktree isolation, session anchors, Memora adapter (read-only SQLite), SOLID refactoring (agent-dispatcher split → 4 utils, session.rs split → 7 sub-modules, branded types), multi-agent orchestration (btmsg inter-agent messaging, bttask kanban task board, agent prompt generator, BTMSG_AGENT_ID env passthrough, periodic re-injection, role-specific tabs: Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks), dead v2 component cleanup, dashboard metrics panel (MetricsPanel.svelte — live health + task counts + SVG sparkline history), auto-wake Manager scheduler (3 strategies: persistent/on-demand/smart, 6 signal types, configurable threshold), reviewer agent role (workflow prompt, #review-queue/#review-log auto-channels, reviewQueueDepth attention scoring 10pts/task cap 50, Tasks tab). 388 vitest + 68 cargo + 82 E2E (smoke + Phase A + Phase B LLM-judged).
 - v3 docs: `docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`.
 - Consult Memora (tag: `bterminal`) before making architectural changes.
 
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4605a60..3c6bc5b 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -17,6 +17,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **LLM judge helper** — `tests/e2e/llm-judge.ts`: raw fetch to Anthropic API (claude-haiku-4-5), structured verdicts (pass/fail + reasoning + confidence), `assertWithJudge()` with configurable min confidence threshold, graceful skip when `ANTHROPIC_API_KEY` absent
 - **E2E CI workflow** — `.github/workflows/e2e.yml`: 3 jobs (vitest, cargo, e2e), xvfb-run for headless WebKit2GTK, path-filtered triggers on v2 source changes, LLM-judged tests gated on `ANTHROPIC_API_KEY` secret availability
 
+### Fixed
+- **E2E test suite — 27 failures fixed** across 3 spec files: bterminal.test.ts (22 — stale v2 CSS selectors, v3 tab order/count, JS-dispatched KeyboardEvent for Ctrl+K, idempotent palette open/close, backdrop click close, scrollIntoView for below-fold settings, scoped theme dropdown selectors), agent-scenarios.test.ts (3 — JS click for settings button, programmatic focus check, graceful 40s agent timeout with skip), phase-b.test.ts (2 — waitUntil for project box render, conditional null handling for burn-rate/cost elements). 82 E2E passing, 0 failing, 4 skipped
+- **AgentPane.svelte missing closing `>`** — a div tag with data-testid attributes was missing its closing angle bracket, causing template parse issues
+
 ### Changed
 - **WebDriverIO config** — TCP readiness probe replaces blind 2s sleep for tauri-driver startup (200ms interval, 10s deadline). Added BTERMINAL_TEST=1 passthrough in capabilities
 
diff --git a/TODO.md b/TODO.md
index 4a324b1..3455d6e 100644
--- a/TODO.md
+++ b/TODO.md
@@ -3,7 +3,7 @@
 ## Active
 
 ### v2/v3 Remaining
-- [x] **E2E testing — Phase B+** -- Phase B complete: LLM judge helper (llm-judge.ts, raw Anthropic API fetch, claude-haiku-4-5), 6 multi-project scenarios (phase-b.test.ts: grid rendering, independent tabs, status bar, LLM-judged agent responses + code generation, context tab), CI workflow (e2e.yml: 3 jobs, xvfb-run, path-filtered, LLM tests gated on secret). 388 vitest + 68 cargo + 22 Phase A + 6 Phase B E2E. | Done: 2026-03-12
+- [x] **E2E testing — Phase B+ & test fixes** -- Phase B: LLM judge (llm-judge.ts, claude-haiku-4-5), 6 multi-project scenarios, CI workflow (3 jobs). Test fixes: 27 failures across 3 spec files (stale v2 CSS selectors, WebKit2GTK keyboard/focus quirks, conditional render timing). Shared idempotent helpers, JS-dispatched KeyboardEvent, backdrop click close, programmatic focus checks. 388 vitest + 68 cargo + 82 E2E (0 fail, 4 skip). | Done: 2026-03-12
 - [ ] **Multi-machine real-world testing** -- Test bterminal-relay with 2 machines.
 - [ ] **Multi-machine TLS/certificate pinning** -- TLS support for bterminal-relay + certificate pinning in RemoteManager.
 - [ ] **Agent Teams real-world testing** -- Env var whitelist fix done. 3 test sessions ran ($1.10, $0.69, $1.70) but model didn't spawn subagents — needs complex multi-part prompts to trigger delegation. Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1.