docs: update all references for restructured docs layout

Update CLAUDE.md, .claude/CLAUDE.md, README.md, CHANGELOG.md to reference new paths: decisions.md, progress/, release-notes.md, unified findings.md. Fix branch name reference (dexter_changes -> hib_changes). Rewrite TODO.md with grouped categories (Multi-Machine, Multi-Agent, Security, Reliability).
Date: 2026-03-14 02:51:22 +01:00
commit 949d90887d
parent a89e2b9f69
5 changed files with 44 additions and 41 deletions
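This commit is only complete if nothing still points at a retired path or the old branch name. Below is a minimal Python sketch of such a check. It assumes the retired names are exactly the doc paths on this diff's removed lines plus `dexter_changes`. `find_stale_references`, `scan`, and the constants are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Assumption: retired doc paths, taken from the "-" lines of this diff.
RETIRED_PATHS = (
    "docs/task_plan.md",
    "docs/progress.md",
    "docs/progress-archive.md",
    "docs/v3-task_plan.md",
    "docs/v3-findings.md",
    "docs/v3-progress.md",
    "docs/v3-release-notes.md",
)
# Assumption: the old branch name that this commit replaces with `hib_changes`.
RETIRED_BRANCH = "dexter_changes"

# Files touched by this commit (from the diff's file headers).
UPDATED_FILES = ("CLAUDE.md", ".claude/CLAUDE.md", "README.md", "CHANGELOG.md", "TODO.md")


def find_stale_references(text: str) -> list[tuple[int, str]]:
    """Return (line_number, token) pairs for each retired path or branch name found."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for token in (*RETIRED_PATHS, RETIRED_BRANCH):
            # Plain substring match: "docs/progress.md" does not match "docs/progress/".
            if token in line:
                hits.append((lineno, token))
    return hits


def scan(root: Path, names: tuple[str, ...] = UPDATED_FILES) -> list[tuple[str, int, str]]:
    """Scan the updated files under `root`; files that do not exist are skipped."""
    results: list[tuple[str, int, str]] = []
    for name in names:
        path = root / name
        if path.is_file():
            for lineno, token in find_stale_references(path.read_text(encoding="utf-8")):
                results.append((name, lineno, token))
    return results


if __name__ == "__main__":
    for name, lineno, token in scan(Path(".")):
        print(f"{name}:{lineno}: stale reference {token!r}")
```

Run it from the repository root. Historical entries, such as older CHANGELOG items, may legitimately mention a retired name. Treat hits as candidates to review, not automatic errors.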
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@@ -6,28 +6,26 @@
 - v2 docs are in `docs/`. Architecture in `docs/architecture.md`.
 - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
 - v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35). 444 vitest + 151 cargo + 109 E2E.
-- v3 docs: `docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`.
 - Consult Memora (tag: `bterminal`) before making architectural changes.

 ## Documentation References

 - System architecture: [docs/architecture.md](../docs/architecture.md)
+- Architecture decisions: [docs/decisions.md](../docs/decisions.md)
 - Sidecar architecture: [docs/sidecar.md](../docs/sidecar.md)
 - Multi-agent orchestration: [docs/orchestration.md](../docs/orchestration.md)
 - Production hardening: [docs/production.md](../docs/production.md)
-- v3 design decisions: [docs/v3-task_plan.md](../docs/v3-task_plan.md)
-- v3 findings: [docs/v3-findings.md](../docs/v3-findings.md)
 - Implementation phases: [docs/phases.md](../docs/phases.md)
 - Research findings: [docs/findings.md](../docs/findings.md)
-- Progress log: [docs/progress.md](../docs/progress.md)
+- Progress logs: [docs/progress/](../docs/progress/)

 ## Rules

 - Do not modify v1 code (`bterminal.py`) unless explicitly asked — it is production-stable.
-- v2/v3 work goes on the `dexter_changes` branch (repo: agent-orchestrator), not master.
-- Architecture decisions must reference `docs/v3-task_plan.md` Decisions Log.
-- When adding new decisions, append to the Decisions Log table with date.
-- Update `docs/progress.md` after each significant work session.
+- v2/v3 work goes on the `hib_changes` branch (repo: agent-orchestrator), not master.
+- Architecture decisions must reference `docs/decisions.md`.
+- When adding new decisions, append to the appropriate category table with date.
+- Update `docs/progress/` after each significant work session.

 ## Key Technical Constraints

@@ -86,7 +84,7 @@
 - E2E test mode (`BTERMINAL_TEST=1`): watcher.rs and fs_watcher.rs skip file watchers, wake-scheduler disabled via `disableWakeScheduler()`, `is_test_mode` Tauri command bridges to frontend. Data/config dirs overridable via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. E2E uses WebDriverIO + tauri-driver, single session, TCP readiness probe. Phase A: 7 data-testid-based scenarios in `agent-scenarios.test.ts` (deterministic assertions). Phase B: 6 scenarios in `phase-b.test.ts` (multi-project grid, independent tab switching, status bar fleet state, LLM-judged agent responses/code generation, context tab verification). LLM judge (`llm-judge.ts`): raw fetch to Anthropic API using claude-haiku-4-5, structured verdict (pass/fail + reasoning + confidence), `assertWithJudge()` with configurable threshold, skips when `ANTHROPIC_API_KEY` absent. CI workflow (`.github/workflows/e2e.yml`): unit + cargo + e2e jobs, xvfb-run, path-filtered triggers, LLM tests gated on secret. Test fixtures in `fixtures.ts` create isolated temp environments. Results tracked via JSON store in `results-db.ts`.
 - v3 SQLite additions: agent_messages table (per-project message persistence), project_agent_state table (sdkSessionId, cost, status per project), sessions.project_id column.
 - v3 App.svelte: VSCode-style sidebar layout. Horizontal: left icon rail (GlobalTabBar, 2.75rem, single Settings gear icon) + expandable drawer panel (Settings only, content-driven width, max 50%) + main workspace (ProjectGrid always visible) + StatusBar. Sidebar has Settings only — Sessions/Docs/Context are project-specific (in ProjectBox tabs). Keyboard: Ctrl+B (toggle sidebar), Ctrl+, (settings), Escape (close).
-- v3 component tree: App -> GlobalTabBar (settings icon) + sidebar-panel? (SettingsTab) + workspace (ProjectGrid) + StatusBar. See `docs/v3-task_plan.md` for full tree.
+- v3 component tree: App -> GlobalTabBar (settings icon) + sidebar-panel? (SettingsTab) + workspace (ProjectGrid) + StatusBar. See `docs/architecture.md` for full tree.
 - MarkdownPane reactively watches filePath changes via $effect (not onMount-only). Uses sans-serif font (Inter, system-ui), all --ctp-* theme vars. Styled blockquotes with translucent backgrounds, table row hover, link hover underlines. Inner `.markdown-pane-scroll` wrapper with `container-type: inline-size` for responsive padding via `--bterminal-pane-padding-inline`.
 - AgentPane UI (redesigned 2026-03-09): sans-serif root font (`system-ui, -apple-system, sans-serif`), monospace only on code/tool names. Tool calls paired with results in collapsible `<details>` groups via `$derived.by` toolResultMap (cache-guarded by tool_result count). Hook messages collapsed into compact `<details>` with gear icon. Context window meter inline in status strip. Cost bar minimal (no background, subtle border-top). Session summary with translucent surface background. Two-phase scroll anchoring (`$effect.pre` + `$effect`). Tool-aware output truncation (Bash 500 lines, Read/Write 50, Glob/Grep 20, default 30). Colors softened via `color-mix()`. Inner `.agent-pane-scroll` wrapper with `container-type: inline-size` for responsive padding via shared `--bterminal-pane-padding-inline` variable.
 - ProjectBox uses CSS `style:display` (flex/none) instead of `{#if}` for tab content panes — keeps AgentSession mounted across tab switches (prevents session ID reset and message loss). Terminal section also uses `style:display`. Grid rows: auto auto 1fr auto.
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -306,7 +306,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - v3 App.svelte full rewrite: GlobalTabBar + tab content area + StatusBar (no sidebar, no TilingGrid)
 - 24 new vitest tests for workspace store, 7 new cargo tests for groups (total: 138 vitest + 36 cargo)
 - v3 adversarial architecture review: 3 agents (Architect, Devil's Advocate, UX+Performance Specialist), 12 issues identified and resolved
-- v3 Mission Control redesign planning: architecture docs (`docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`), codebase reuse analysis
+- v3 Mission Control redesign planning: architecture docs (`docs/architecture.md`, `docs/decisions.md`, `docs/findings.md`), codebase reuse analysis
 - Claude profile/account switching: `claude_list_profiles()` reads `~/.config/switcher/profiles/` directories with `profile.toml` metadata (email, subscription_type, display_name); profile selector dropdown in AgentPane toolbar when multiple profiles available; selected profile's `config_dir` passed as `CLAUDE_CONFIG_DIR` env override to SDK
 - Skill discovery and autocomplete: `claude_list_skills()` reads `~/.claude/skills/` (directories with `SKILL.md` or standalone `.md` files); type `/` in agent prompt textarea to trigger autocomplete menu with arrow key navigation, Tab/Enter selection, Escape dismiss; `expandSkillPrompt()` reads skill content and injects as prompt
 - New frontend adapter `claude-bridge.ts`: `ClaudeProfile` and `ClaudeSkill` interfaces, `listProfiles()`, `listSkills()`, `readSkill()` IPC wrappers
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -21,15 +21,13 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth
 | `install.sh` | v1 system installer |
 | `install-v2.sh` | v2 build-from-source installer (Node.js 20+, Rust 1.77+, system libs) |
 | `.github/workflows/release.yml` | CI: builds .deb + AppImage on v* tags, uploads to GitHub Releases |
-| `docs/task_plan.md` | v2 architecture decisions and strategies |
+| `docs/architecture.md` | End-to-end system architecture, data model, layout system |
+| `docs/decisions.md` | Architecture decisions log with rationale and dates |
 | `docs/phases.md` | v2 implementation phases (1-7 + multi-machine A-D) |
-| `docs/findings.md` | v2 research findings |
-| `docs/progress.md` | Session progress log (recent) |
-| `docs/progress-archive.md` | Archived progress log (2026-03-05 to 2026-03-06 early) |
+| `docs/findings.md` | All research findings (v2 + v3 combined) |
+| `docs/progress/` | Session progress logs (v2, v3, archive) |
 | `docs/multi-machine.md` | Multi-machine architecture (implemented, Phases A-D) |
-| `docs/v3-task_plan.md` | v3 Mission Control redesign: architecture decisions and strategies |
-| `docs/v3-findings.md` | v3 research findings and codebase reuse analysis |
-| `docs/v3-progress.md` | v3 session progress log |
+| `docs/release-notes.md` | v3.0 release notes |
 | `docs/e2e-testing.md` | E2E testing facility: fixtures, test mode, LLM judge, spec phases, CI |
 | `v2/Cargo.toml` | Cargo workspace root (members: src-tauri, bterminal-core, bterminal-relay) |
 | `v2/bterminal-core/` | Shared crate: EventSink trait, PtyManager, SidecarManager |
--- a/README.md
+++ b/README.md
@@ -130,9 +130,9 @@ cd v2 && cargo build --release -p bterminal-relay

 | Document | Description |
 |----------|-------------|
-| [docs/v3-task_plan.md](docs/v3-task_plan.md) | Architecture decisions and strategies |
-| [docs/v3-progress.md](docs/v3-progress.md) | Session progress log |
-| [docs/v3-release-notes.md](docs/v3-release-notes.md) | v3.0 release notes |
+| [docs/decisions.md](docs/decisions.md) | Architecture decisions log |
+| [docs/progress/](docs/progress/) | Session progress logs (v2, v3, archive) |
+| [docs/release-notes.md](docs/release-notes.md) | v3.0 release notes |
 | [docs/e2e-testing.md](docs/e2e-testing.md) | E2E testing facility documentation |
 | [docs/multi-machine.md](docs/multi-machine.md) | Multi-machine relay architecture |

--- a/TODO.md
+++ b/TODO.md
@@ -1,26 +1,33 @@
-# BTerminal -- TODO
+# Agent Orchestrator — TODO

-## Active
+## Multi-Machine (v3.1)

-### v3.1 Remaining
-- [ ] **Multi-machine real-world testing** -- TLS added to relay. Needs real 2-machine test. Multi-machine UI not surfaced in v3, code exists in bridges/stores only.
-- [ ] **Certificate pinning** -- TLS encryption done (v3.0). Pin cert hash in RemoteManager for v3.1.
-- [ ] **Agent Teams real-world testing** -- Subagent delegation prompt fix done + env var injection. Needs real multi-agent session to verify Manager spawns child agents.
-- [ ] **Plugin sandbox migration** -- `new Function()` has inherent escape vectors (prototype walking, arguments.callee.constructor). Consider Web Worker isolation for v3.2.
-- [ ] **Soak test** -- Run 4-hour soak with 6+ agents across 3+ projects. Monitor memory, SQLite WAL size, xterm.js instances.
+- [ ] **Real-world relay testing** — TLS added, code complete in bridges/stores. Needs 2-machine test to verify relay + RemoteManager end-to-end. Multi-machine UI not yet surfaced in v3 ProjectBox.
+- [ ] **Certificate pinning** — TLS encryption works. Pin relay cert hash in RemoteManager to prevent MITM. Planned for v3.1.
+
+## Multi-Agent (v3.1)
+
+- [ ] **Agent Teams real-world testing** — Subagent delegation prompt + `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env injection done. Needs real multi-agent session to verify Manager spawns child agents via SDK teams.
+
+## Security (v3.2)
+
+- [ ] **Plugin sandbox migration** — Current `new Function()` sandbox has escape vectors (prototype walking, `arguments.callee.constructor`). Migrate to Web Worker isolation for true process-level sandboxing.
+
+## Reliability
+
+- [ ] **Soak test** — Run 4-hour soak with 6+ agents across 3+ projects. Monitor: memory growth, SQLite WAL size, xterm.js instance count, sidecar supervisor restarts.

 ## Completed

-- [x] **E2E fixture + judge hardening** -- Fixed fixture env propagation (process.env injection, tauri:options.env unreliable), LLM judge CLI context isolation (--setting-sources user, cwd /tmp, --system-prompt), mocha timeout 180s. Confirmed fixture fakes project list. Agent tests CI-only (nested Claude limitation). | Done: 2026-03-12
-- [x] **LLM judge refactor + E2E docs** -- Refactored llm-judge.ts to dual-mode (CLI first, API fallback), env-configurable via LLM_JUDGE_BACKEND. Wrote comprehensive docs/e2e-testing.md covering fixtures, test mode, LLM judge, all spec phases, CI, troubleshooting. 444 vitest + 151 cargo + 109 E2E. | Done: 2026-03-12
-- [x] **v3 Hardening Sprint** -- Fixed subagent delegation (prompt + env var), added TLS to relay, WAL checkpoint (5min), Landlock logging, plugin sandbox tests (35), gitignore fix. Phase C E2E tests (27 new, 3 pre-existing fixes). 444 vitest + 151 cargo + 109 E2E. | Done: 2026-03-12
-- [x] **v3 Production Readiness — ALL tribunal items** -- Implemented all 13 features from tribunal assessment: sidecar supervisor, notifications, secrets, keyboard UX, agent health, search, plugins, sandbox, error classifier, audit log, team agent orchestration, optimistic locking, usage meter. 409 vitest + 109 cargo. | Done: 2026-03-12
-- [x] **Unified test runner + testing gate rule** -- Created v2/scripts/test-all.sh (vitest + cargo + optional E2E), added npm scripts (test:all, test:all:e2e, test:cargo), added .claude/rules/20-testing-gate.md requiring full suite after major changes. | Done: 2026-03-12
-- [x] **E2E testing — Phase B+ & test fixes** -- Phase B: LLM judge (llm-judge.ts, claude-haiku-4-5), 6 multi-project scenarios, CI workflow (3 jobs). Test fixes: 27 failures across 3 spec files. 388 vitest + 68 cargo + 82 E2E (0 fail, 4 skip). | Done: 2026-03-12
-- [x] **Reviewer agent role** -- Tier 1 specialist with role='reviewer'. Reviewer workflow in agent-prompts.ts (8-step process). #review-queue/#review-log auto-channels. reviewQueueDepth in attention scoring (10pts/task, cap 50). 388 vitest + 76 cargo. | Done: 2026-03-12
-- [x] **Auto-wake Manager** -- wake-scheduler.svelte.ts + wake-scorer.ts (24 tests). 3 strategies: persistent/on-demand/smart. 6 signals. Settings UI. 381 vitest + 72 cargo. | Done: 2026-03-12
-- [x] **Dashboard metrics panel** -- MetricsPanel.svelte: live health + task board summary + SVG sparkline history. 25 tests. 357 vitest + 72 cargo. | Done: 2026-03-12
-- [x] **Brand Dexter's new types (SOLID Phase 3b)** -- GroupId + AgentId branded types. Applied to ~40 sites. 332 vitest + 72 cargo. | Done: 2026-03-11
-- [x] **Regression tests + sidecar env security** -- 49 new tests. Added ANTHROPIC_* to Rust env strip. 327 vitest + 72 cargo. | Done: 2026-03-11
-- [x] **Integrate dexter_changes + fix 5 critical bugs** -- Fixed: btmsg.rs column index, btmsg-bridge camelCase, GroupAgentsPanel stopPropagation, ArchitectureTab PlantUML, TestingTab Tauri 2.x. | Done: 2026-03-11
-- [x] **SOLID Phase 3 — Primitive obsession** -- Branded types SessionId/ProjectId. Applied to ~130 sites. 293 vitest + 49 cargo. | Done: 2026-03-11
+- [x] E2E fixture + judge hardening | Done: 2026-03-12
+- [x] LLM judge refactor + E2E docs | Done: 2026-03-12
+- [x] v3 Hardening Sprint (TLS, WAL, Landlock, plugin tests, Phase C E2E) | Done: 2026-03-12
+- [x] v3 Production Readiness — all 13 tribunal items | Done: 2026-03-12
+- [x] Unified test runner + testing gate rule | Done: 2026-03-12
+- [x] E2E Phase B + 27 test fixes | Done: 2026-03-12
+- [x] Reviewer agent role | Done: 2026-03-12
+- [x] Auto-wake Manager scheduler | Done: 2026-03-12
+- [x] Dashboard metrics panel | Done: 2026-03-12
+- [x] Branded types (GroupId, AgentId, SessionId, ProjectId) | Done: 2026-03-11
+- [x] Regression tests + sidecar env security | Done: 2026-03-11
+- [x] Integration fix (btmsg column, camelCase, PlantUML, Tauri 2.x assets) | Done: 2026-03-11
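The certificate-pinning TODO comes down to one check: hash the certificate the relay presents and compare the result against a stored pin. Below is a minimal Python sketch. It assumes a SHA-256 pin over the DER-encoded certificate and a self-signed relay certificate, so the pin replaces CA validation. The function names, hash choice, and pin format are illustrative assumptions. The actual RemoteManager implementation may differ.

```python
import hashlib
import hmac
import socket
import ssl


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(der_cert).hexdigest()


def matches_pin(der_cert: bytes, pinned_hex: str) -> bool:
    """Compare the certificate's fingerprint against the pin.

    Assumed pin format: hex, with or without colons, in either case.
    """
    normalized = pinned_hex.replace(":", "").lower()
    return hmac.compare_digest(cert_fingerprint(der_cert), normalized)


def fetch_peer_cert(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Return the relay's certificate in DER form.

    Assumes a self-signed relay certificate: CA validation is disabled here,
    and matches_pin() is what authenticates the peer.
    """
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be cleared before setting CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            return tls.getpeercert(binary_form=True)
```

Under this sketch, a connection is trusted only if `matches_pin(fetch_peer_cert(host, port), pin)` is true. Pinning the whole certificate means the pin must change whenever the relay certificate is regenerated. Pinning the public key (SPKI) survives re-issuance with the same key, at the cost of parsing the certificate.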