docs: expand README index and v3-findings with deep research content

README.md: from 42-line index to rich documentation hub with project overview, reading order, and key directory listing. v3-findings.md: from 63 lines to comprehensive research findings covering adversarial review details, provider coupling analysis, codebase reuse, session anchor design, multi-agent design rationale, theme evolution, and performance measurements.
2026-03-14 02:33:59 +01:00 · 2026-03-14 02:33:59 +01:00 · 9a295c224c
commit 9a295c224c
parent de8dd04f4b
2 changed files with 314 additions and 73 deletions
--- a/docs/v3-findings.md
+++ b/docs/v3-findings.md
@@ -1,62 +1,251 @@
 # BTerminal v3 — Research Findings

-## Adversarial Review Results (2026-03-07)
+## 1. Adversarial Architecture Review (2026-03-07)
+
+Three specialized agents reviewed the v3 Mission Control architecture before implementation began. This adversarial process caught 12 issues (4 critical) that would have required expensive rework if discovered later.

 ### Agent: Architect (Advocate)
- Proposed full component tree, data model, 10-phase plan
- JSON config at `~/.config/bterminal/groups.json`
- Single shared sidecar (multiplexed sessions)
- ClaudeSession + TeamAgentsPanel split from AgentPane
- SQLite tables: agent_messages, project_agent_state
- MVP at Phase 5
+
+The Architect proposed the core design:
+
+- **Project Groups** as the primary organizational unit (replacing free-form panes)
+- **JSON config** (`groups.json`) for human-editable group/project definitions, SQLite for runtime state
+- **Single shared sidecar** with per-project isolation via `cwd`, `claude_config_dir`, and `session_id`
+- **Component split:** AgentPane → AgentSession + TeamAgentsPanel (subagents shown inline, not as separate panes)
+- **New SQLite tables:** `agent_messages` (per-project message persistence), `project_agent_state` (sdkSessionId, cost, status)
+- **MVP boundary at Phase 5** (5 phases for core, 5 for polish)
+- **10-phase implementation plan** covering data model, shell, session integration, terminals, team panel, continuity, palette, docs, settings, cleanup

 ### Agent: Devil's Advocate
- Found 12 issues, 4 critical:
-  1. xterm.js 4-instance ceiling (hard OOM wall)
-  2. Single sidecar SPOF
-  3. Layout store has no workspace concept
-  4. 384px per project unusable on 1920px
- Recommended: fix workspace concept, xterm budget, UI density, persistence before anything else
- Proposed suspend/resume ring buffer for terminals
- Proposed per-project sidecar pool (max 3) — deferred to v3.1
+
+The Devil's Advocate found 12 issues across the Architect's proposal:
+
+| # | Issue | Severity | Why It Matters |
+|---|-------|----------|----------------|
+| 1 | xterm.js 4-instance ceiling | **Critical** | WebKit2GTK OOMs at ~5 xterm instances. With 5 projects × 1 terminal each, we hit the wall immediately. |
+| 2 | Single sidecar = SPOF | **Critical** | One sidecar crash kills all 5 project agents simultaneously. No isolation between projects. |
+| 3 | Layout store has no workspace concept | **Critical** | The v2 layout store (pane-based) cannot represent project groups. Needs a full rewrite, not incremental modification. |
+| 4 | 384px per project unusable on 1920px | **Critical** | 5 projects on a 1920px screen means 384px per project — too narrow for code or agent output. Must adapt to viewport. |
+| 5 | Session identity collision | Major | Without persisting `sdkSessionId`, resuming the wrong session corrupts agent state. Per-project CLAUDE_CONFIG_DIR isolation is also needed. |
+| 6 | JSON config + SQLite = split-brain | Major | Two sources of truth (JSON for config, SQLite for state) can diverge. Must clearly separate what lives where. |
+| 7 | Agent dispatcher has no project scoping | Major | The singleton dispatcher routes all messages globally. Adding projectId to sessions and cleanup on workspace switch is essential. |
+| 8 | Markdown discovery is undefined | Minor | No specification for which markdown files appear in the Docs tab. Needs a priority list and depth limit. |
+| 9 | Keyboard shortcut conflicts | Major | Three input layers (terminal, workspace, app) can conflict. Needs a shortcut manager with explicit precedence. |
+| 10 | Remote machine support orphaned | Major | v2's remote machine UI doesn't map to the project model. Must elevate to project level. |
+| 11 | No graceful degradation for broken projects | Major | If a project's CWD doesn't exist or git is broken, the whole group could fail. Need per-project health states. |
+| 12 | Flat event stream wastes CPU for hidden projects | Minor | Messages for inactive workspace projects still process through adapters. Should buffer and flush on activation. |
+
+**Resolutions:** All 12 issues were assigned a resolution before implementation began. Critical items (#1-4) were addressed in the architecture itself. Major and minor items were either implemented in the MVP phases or explicitly deferred to v3.1 with documented rationale. See [v3-task_plan.md](v3-task_plan.md) for the full resolution table.

 ### Agent: UX + Performance Specialist
- Wireframes for 5120px (5 projects) and 1920px (3 projects)
- Adaptive project count: `Math.floor(width / 520)`
- xterm budget: lazy-init + scrollback serialization
- RAF batching for 5 concurrent streams
- <100ms workspace switch via serialize/unmount/remount
- Memory budget: ~225MB total (within WebKit2GTK limits)
- Team panel: inline >2560px, overlay <2560px
- Command palette: Ctrl+K, floating overlay, fuzzy search

-## Codebase Reuse Analysis
+The UX specialist provided concrete wireframes and performance budgets:

-### Survives (with modifications)
- TerminalPane.svelte — add suspend/resume lifecycle
- MarkdownPane.svelte — unchanged
- AgentTree.svelte — reused inside ClaudeSession
- ContextPane.svelte — extracted to workspace tab
- StatusBar.svelte — modified for per-project costs
- ToastContainer.svelte — unchanged
- agents.svelte.ts — add projectId field
- theme.svelte.ts — unchanged
- notifications.svelte.ts — unchanged
- All adapters (agent-bridge, pty-bridge, claude-bridge, sdk-messages, session-bridge, ctx-bridge, ssh-bridge)
- All Rust backend (sidecar, pty, session, ctx, watcher)
- highlight.ts, agent-tree.ts utils
+- **Adaptive layout:** `Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520)))` — 5 projects at 5120px, 3 at 1920px, 1 with scroll at <1600px
+- **xterm.js budget:** 4 active instances max. Suspended terminals serialize scrollback to text, destroy the xterm instance, recreate on focus. PTY stays alive. Suspend/resume cycle < 50ms.
+- **Memory budget:** ~225MB total (4 xterm @ 20MB + Tauri + SQLite + 5 agent stores). Well within WebKit2GTK limits.
+- **Workspace switch performance:** Serialize all xterm scrollbacks, unmount ProjectGrid children, mount new group. Target: <100ms perceived latency (frees ~80MB).
+- **Team panel:** Inline at >2560px viewport (240px wide), overlay at <2560px. Collapsed when no subagents.
+- **Command palette:** Ctrl+K, floating overlay, fuzzy search across commands + groups + projects. 18+ commands across 6 categories.
+- **RAF batching:** For 5 concurrent agent streams, batch DOM updates into requestAnimationFrame frames to avoid layout thrashing.
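
The adaptive layout formula above can be checked with a small sketch. This is a Python transcription for illustration only, not the project's code; the function name `visible_project_count` and the `min_box_width` parameter are assumptions, while the 520px divisor comes from the formula itself.

```python
def visible_project_count(project_count: int, container_width: int,
                          min_box_width: int = 520) -> int:
    """Python mirror of
    Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520))).
    """
    # How many boxes of at least min_box_width fit side by side (never fewer than 1).
    fits_by_width = max(1, container_width // min_box_width)
    # Never show more boxes than there are projects in the group.
    return min(project_count, fits_by_width)


if __name__ == "__main__":
    for width in (5120, 1920, 800):
        print(width, visible_project_count(5, width))
```

With five projects in a group this yields 5 boxes at 5120px and 3 at 1920px, matching the wireframes. Once the container is narrower than two box widths, it falls back to a single box.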
+
+---
+
+## 2. Provider Adapter Coupling Analysis (2026-03-11)
+
+Before implementing multi-provider support, a systematic coupling analysis mapped every Claude-specific dependency in the codebase. 13+ files were examined and classified into 4 severity levels.
+
+### Coupling Severity Map
+
+**CRITICAL — hardcoded SDK, must abstract:**
+- `sidecar/agent-runner.ts` — imports Claude Agent SDK, calls `query()`, hardcoded `findClaudeCli()`. Must become `claude-runner.ts` with other providers getting separate runners.
+- `bterminal-core/src/sidecar.rs` — `AgentQueryOptions` struct had no `provider` field. `SidecarCommand` hardcoded `agent-runner.mjs` path. Must add provider-based runner selection.
+- `src/lib/adapters/sdk-messages.ts` — `parseMessage()` assumes Claude SDK JSON format. Must become `claude-messages.ts` with per-provider parsers.
+
+**HIGH — TS mirror types, provider-specific commands:**
+- `src/lib/adapters/agent-bridge.ts` — `AgentQueryOptions` interface mirrors Rust struct with no provider field.
+- `src-tauri/src/lib.rs` — `claude_list_profiles`, `claude_list_skills` are Claude-specific commands (kept as-is, gated by capability).
+- `src/lib/adapters/claude-bridge.ts` — provider-specific adapter (kept, genericized via provider-bridge.ts).
+
+**MEDIUM — provider-aware routing:**
+- `src/lib/agent-dispatcher.ts` — calls `parseMessage()` (Claude-specific), subagent tool names hardcoded.
+- `src/lib/components/Agent/AgentPane.svelte` — profile selector, skill autocomplete assume Claude.
+- `ClaudeSession.svelte` — name says "Claude" but logic is mostly generic.
+
+**LOW — already generic:**
+- `agents.svelte.ts` — `AgentMessage` type has no Claude-specific logic.
+- `health.svelte.ts`, `conflicts.svelte.ts` — provider-agnostic health and conflict tracking.
+- `bterminal-relay/` — forwards `AgentQueryOptions` as-is.
+
+### Key Insights from Analysis
+
+1. **Sidecar is the natural abstraction boundary.** Each provider needs its own runner because SDKs are incompatible. The Rust sidecar manager selects which runner to spawn based on the `provider` field.
+
+2. **Message format is the main divergence point.** Claude SDK emits structured JSON (assistant/user/result), Codex uses ThreadEvents, Ollama uses OpenAI-compatible streaming. Per-provider message adapters normalize everything to `AgentMessage`.
+
+3. **Capability flags eliminate provider switches.** Instead of `if (provider === 'claude') showProfiles()`, the UI checks `capabilities.hasProfiles`. Adding a new provider only requires registering its capabilities — zero UI code changes.
+
+4. **Environment variable stripping is provider-specific.** Claude needs `CLAUDE*` vars stripped (nesting detection). Codex needs `CODEX*` stripped. Ollama needs nothing stripped. Extracted to `strip_provider_env_var()` function.
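
Insights 3 and 4 can be sketched together as a provider registry. The snippet is a hypothetical Python illustration, not the project's TypeScript or Rust code: `ProviderSpec`, `PROVIDERS`, `visible_controls`, `strip_provider_env`, the `has_skills` flag, and the capability values per provider are all assumptions. Only the `hasProfiles` idea and the `CLAUDE*` / `CODEX*` prefixes come from the analysis above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderSpec:
    has_profiles: bool            # mirrors the capabilities.hasProfiles idea
    has_skills: bool              # assumed second capability flag
    env_prefixes: tuple           # env var prefixes to strip before spawning


# Hypothetical registry; the capability values are illustrative.
PROVIDERS = {
    "claude": ProviderSpec(True, True, ("CLAUDE",)),
    "codex": ProviderSpec(False, False, ("CODEX",)),
    "ollama": ProviderSpec(False, False, ()),
}


def visible_controls(provider: str) -> list:
    """Decide which UI controls to show by capability, not by provider name."""
    caps = PROVIDERS[provider]
    controls = ["prompt"]
    if caps.has_profiles:
        controls.append("profile_selector")
    if caps.has_skills:
        controls.append("skill_autocomplete")
    return controls


def strip_provider_env(provider: str, env: dict) -> dict:
    """Return a copy of env without the provider's own variables."""
    prefixes = PROVIDERS[provider].env_prefixes
    # str.startswith accepts a tuple; an empty tuple never matches.
    return {k: v for k, v in env.items() if not k.startswith(prefixes)}
```

Under this shape, adding a provider means adding one registry entry; neither function needs a new branch.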
+
+---
+
+## 3. Codebase Reuse Analysis (v2 → v3)
+
+The v3 redesign reused significant portions of the v2 codebase. This analysis determined what could survive, what needed replacement, and what could be dropped entirely.
+
+### Survived (with modifications)
+
+| Component/Module | Modifications |
+|-----------------|---------------|
+| TerminalPane.svelte | Added suspend/resume lifecycle for xterm budget |
+| MarkdownPane.svelte | Unchanged |
+| AgentTree.svelte | Reused inside AgentSession |
+| StatusBar.svelte | Rewritten for workspace store (group name, fleet status, attention queue) |
+| ToastContainer.svelte | Unchanged |
+| agents.svelte.ts | Added projectId field to AgentSession |
+| theme.svelte.ts | Unchanged |
+| notifications.svelte.ts | Unchanged |
+| All adapters | Minor updates for provider routing |
+| All Rust backend | Added new modules (btmsg, bttask, search, secrets, plugins) |
+| highlight.ts, agent-tree.ts | Unchanged |

 ### Replaced
- layout.svelte.ts → workspace.svelte.ts
- TilingGrid.svelte → ProjectGrid.svelte
- PaneContainer.svelte → ProjectBox.svelte
- SessionList.svelte → ProjectHeader + command palette
- SettingsDialog.svelte → SettingsTab.svelte
- AgentPane.svelte → ClaudeSession.svelte + TeamAgentsPanel.svelte
- App.svelte → full rewrite
+
+| v2 Component | v3 Replacement | Reason |
+|-------------|---------------|--------|
+| layout.svelte.ts | workspace.svelte.ts | Pane-based model → project-group model |
+| TilingGrid.svelte | ProjectGrid.svelte | Free-form grid → fixed project boxes |
+| PaneContainer.svelte | ProjectBox.svelte | Generic pane → per-project container with 11 tabs |
+| SessionList.svelte | ProjectHeader + CommandPalette | Sidebar session list → inline headers + Ctrl+K |
+| SettingsDialog.svelte | SettingsTab.svelte | Modal dialog → sidebar drawer tab |
+| AgentPane.svelte | AgentSession + TeamAgentsPanel | Monolithic → split for team support |
+| App.svelte | Full rewrite | Tab bar → VSCode-style sidebar layout |

 ### Dropped (v3.0)
- Detached pane mode (doesn't fit workspace model)
- Drag-resize splitters (project boxes have fixed internal layout)
- Layout presets (1-col, 2-col, etc.) — replaced by adaptive project count
- Remote machine integration (deferred to v3.1, elevated to project level)
+
+| Feature | Reason |
+|---------|--------|
+| Detached pane mode | Doesn't fit workspace model (projects are grouped, not independent) |
+| Drag-resize splitters | Project boxes have fixed internal layout |
+| Layout presets (1-col, 2-col, etc.) | Replaced by adaptive project count from viewport |
+| Remote machine UI integration | Deferred to v3.1 (elevated to project level) |
+
+---
+
+## 4. Session Anchor Design Analysis (2026-03-12)
+
+Session anchors were designed to solve context loss during Claude's automatic context compaction. Research into compaction behavior informed the design.
+
+### Problem
+
+When Claude's context window fills up, the SDK automatically compacts older turns. This compaction is lossy — important early decisions, architecture context, and debugging breakthroughs can be permanently lost.
+
+### Compaction Behavior (Observed)
+
+- Compaction triggers when context exceeds ~80% of model limit
+- The SDK emits a compaction event that the sidecar can observe
+- Compacted turns are summarized, losing granular detail
+- Multiple compaction rounds can occur in long sessions
+
+### Design Decisions
+
+1. **Auto-anchor on first compaction** — The system automatically captures the first 3 turns when compaction is first detected. This preserves the session's initial context (usually the task definition and first architecture decisions).
+
+2. **Observation masking** — Tool outputs (Read results, Bash output) are compacted in anchors, but reasoning text is preserved in full. This dramatically reduces anchor token cost while keeping the important reasoning.
+
+3. **Budget system** — Fixed budget scales (2K/6K/12K/20K tokens) are used instead of percentage-based budgets. Users understand "6,000 tokens" more intuitively than "15% of context."
+
+4. **Re-injection via system prompt** — Promoted anchors are serialized and injected as the `system_prompt` field. This is the simplest integration point with the SDK and doesn't require modifying the conversation history.
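
Decisions 1 to 3 can be read as one small pipeline: take the first turns, mask tool output, and stop at the budget. The sketch below is a hypothetical Python illustration under stated assumptions: the `Part` type, the scale names in `BUDGET_SCALES`, the 4-characters-per-token estimate, and truncation as a stand-in for compaction are not from the source. Only the 3-turn window and the 2K/6K/12K/20K values are.

```python
from dataclasses import dataclass

# Token values from the design; the scale names are assumed.
BUDGET_SCALES = {"small": 2_000, "medium": 6_000, "large": 12_000, "full": 20_000}


@dataclass
class Part:
    kind: str  # "reasoning" or "tool_output" (assumed labels)
    text: str


def estimate_tokens(text: str) -> int:
    # Rough assumption: about 4 characters per token.
    return max(1, len(text) // 4)


def mask_observations(turn: list, max_chars: int = 80) -> list:
    """Shorten tool outputs; keep reasoning text in full."""
    masked = []
    for part in turn:
        if part.kind == "tool_output" and len(part.text) > max_chars:
            masked.append(Part("tool_output", part.text[:max_chars] + " [truncated]"))
        else:
            masked.append(part)
    return masked


def build_anchor(turns: list, budget_tokens: int, anchor_turns: int = 3) -> list:
    """Capture up to the first `anchor_turns` turns that fit in the budget."""
    anchor, used = [], 0
    for turn in turns[:anchor_turns]:
        masked = mask_observations(turn)
        cost = sum(estimate_tokens(p.text) for p in masked)
        if used + cost > budget_tokens:
            break  # stop rather than exceed the fixed budget
        anchor.append(masked)
        used += cost
    return anchor
```

Masking before counting is the point of the ordering: a turn dominated by a long tool result costs only its shortened form against the budget.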
+
+---
+
+## 5. Multi-Agent Orchestration Design (2026-03-11)
+
+Research into multi-agent coordination patterns informed the btmsg/bttask design.
+
+### Evaluated Approaches
+
+| Approach | Pros | Cons | Decision |
+|----------|------|------|----------|
+| Claude Agent Teams (native) | Zero custom code, SDK-managed | Experimental, session resume broken, no custom roles | Supported but not primary |
+| Message bus (Redis/NATS) | Proven, scalable | Runtime dependency, deployment complexity | Rejected |
+| Shared SQLite + CLI tools | Zero deps, agents use shell commands | Polling-based, no real-time push | **Selected** |
+| MCP server for agent comm | Standard protocol | Overhead per message, complex setup | Rejected |
+
+### Why SQLite + CLI
+
+Agents run Claude Code sessions that have full shell access. Python CLI tools (`btmsg`, `bttask`) that read and write SQLite are the lowest-friction integration:
+
+- Agents can use it with zero configuration (just `btmsg send architect "review this"`)
+- No runtime services to manage (no Redis, no MCP server)
+- WAL mode handles concurrent access from multiple agent processes
+- The same database is readable by the Rust backend for UI display
+- Polling every 5s is acceptable for coordination, because agents don't need millisecond latency
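
A minimal sketch of the shared-SQLite pattern, using only Python's standard `sqlite3` module. The `messages` schema and the `open_db`, `send`, and `poll_unread` helpers are hypothetical and are not the real btmsg schema or commands. The WAL mode and 5s busy timeout follow the settings described in this document.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open the shared database with WAL and a 5s busy timeout."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers do not block the writer
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5s on a locked database
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "sender TEXT NOT NULL, recipient TEXT NOT NULL, "
        "body TEXT NOT NULL, is_read INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def send(conn: sqlite3.Connection, sender: str, recipient: str, body: str) -> int:
    """Insert one message and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )
    return cur.lastrowid


def poll_unread(conn: sqlite3.Connection, recipient: str) -> list:
    """Fetch unread messages for a recipient and mark them read."""
    with conn:
        rows = conn.execute(
            "SELECT id, sender, body FROM messages "
            "WHERE recipient = ? AND is_read = 0 ORDER BY id",
            (recipient,),
        ).fetchall()
        conn.executemany(
            "UPDATE messages SET is_read = 1 WHERE id = ?",
            [(row[0],) for row in rows],
        )
    return [(row[1], row[2]) for row in rows]
```

An agent-side loop would call `poll_unread` on a timer (every 5s in the design above). Each process opens its own connection to the same file, which is the access pattern WAL mode is meant for.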
+
+### Role Hierarchy
+
+The 4 Tier 1 roles were chosen based on common development workflows:
+
+- **Manager** — coordinates work, like a tech lead assigning tasks in a sprint
+- **Architect** — designs solutions, like a senior engineer doing design reviews
+- **Tester** — runs tests, like a QA engineer monitoring test suites
+- **Reviewer** — reviews code, like a reviewer processing a PR queue
+
+Each role has unique tabs (Task board for Manager, PlantUML for Architect, Selenium for Tester, Review queue for Reviewer) and unique bttask permissions (Manager has full CRUD, others are read-only with comments).
+
+---
+
+## 6. Theme System Evolution (2026-03-07)
+
+### Original: 4 Catppuccin Flavors
+
+v2 launched with 4 Catppuccin flavors (Mocha, Macchiato, Frappé, Latte). All colors mapped to 26 `--ctp-*` CSS custom properties.
+
+### Extension: 7 Editor Themes
+
+Added VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark. Each theme maps to the same 26 `--ctp-*` variables — zero component changes needed. The `CatppuccinFlavor` type was generalized to `ThemeId` union type. Deprecated wrapper functions maintain backward compatibility.
+
+### Extension: 6 Deep Dark Themes
+
+Added Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper (warm dark), Midnight (pure OLED black). Same 26-variable mapping.
+
+### Key Design Decision
+
+By mapping all 17 themes to the same CSS custom property names, no component ever needs to know which theme is active. This makes adding new themes a pure data operation — define 26 color values and add to `THEME_LIST`. The `ThemeMeta` type includes group metadata for the custom themed dropdown in SettingsTab.
+
+---
+
+## 7. Performance Findings
+
+### xterm.js Canvas Performance
+
+WebKit2GTK lacks WebGL, so xterm.js falls back to Canvas 2D rendering. Testing showed:
+- **Latency:** ~20-30ms per keystroke (acceptable for AI output, not ideal for vim)
+- **Memory:** ~20MB per active instance
+- **OOM threshold:** ~5 simultaneous instances causes WebKit2GTK to crash
+- **Mitigation:** 4-instance budget with suspend/resume for inactive terminals
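
The 4-instance budget can be modelled as a least-recently-focused eviction policy. The sketch below is a hypothetical Python model of the bookkeeping only: `TerminalBudget` and its methods are assumed names, and the real implementation serializes xterm.js scrollback in the frontend. What it illustrates is the rule that focusing a fifth terminal suspends the oldest one, while output for suspended terminals keeps accumulating because the PTY stays alive.

```python
from collections import OrderedDict


class TerminalBudget:
    def __init__(self, max_active: int = 4):
        self.max_active = max_active
        self.active = OrderedDict()  # term_id -> list of scrollback lines (live instance)
        self.suspended = {}          # term_id -> serialized scrollback text

    def focus(self, term_id: str) -> None:
        """Make a terminal active, suspending the least recently focused if needed."""
        if term_id in self.active:
            self.active.move_to_end(term_id)
            return
        saved = self.suspended.pop(term_id, "")
        lines = saved.split("\n") if saved else []
        if len(self.active) >= self.max_active:
            victim, scrollback = self.active.popitem(last=False)
            # Serialize to plain text; the live instance is dropped.
            self.suspended[victim] = "\n".join(scrollback)
        self.active[term_id] = lines

    def write(self, term_id: str, line: str) -> None:
        """Record output; suspended terminals append to their serialized text."""
        if term_id in self.active:
            self.active[term_id].append(line)
        else:
            prev = self.suspended.get(term_id, "")
            self.suspended[term_id] = prev + "\n" + line if prev else line
```

Keeping the eviction order in an `OrderedDict` keeps the example short; the policy itself (which terminal to suspend) is an assumption, since the findings only fix the budget at four.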
+
+### Tauri IPC Latency
+
+- **Linux:** ~5ms for typical payloads (serialization-free IPC in Tauri 2.x)
+- **Terminal keystroke echo:** 5ms IPC + xterm.js render ≈ 10-15ms total
+- **Agent message forwarding:** Negligible — agent output arrives at human-readable speed
+
+### SQLite WAL Concurrent Access
+
+Both sessions.db and btmsg.db are accessed concurrently by:
+- Rust backend (Tauri commands)
+- Python CLI tools (btmsg, bttask from agent shells)
+- Frontend reads via IPC
+
+WAL mode with a 5s busy_timeout handles this reliably. A checkpoint every 5 minutes keeps the WAL file from growing without bound.
+
+### Workspace Switch Latency
+
+Measured during v3 development:
+- Serialize 4 xterm scrollbacks: ~30ms
+- Destroy 4 xterm instances: ~10ms
+- Unmount ProjectGrid children: ~5ms
+- Mount new group's ProjectGrid: ~20ms
+- Create new xterm instances: ~35ms
+- **Total perceived:** ~100ms (acceptable)