agent-orchestrator/docs/v3-findings.md
Hibryda 9a295c224c docs: expand README index and v3-findings with deep research content
README.md: from 42-line index to rich documentation hub with project
overview, reading order, and key directory listing.
v3-findings.md: from 63 lines to comprehensive research findings covering
adversarial review details, provider coupling analysis, codebase reuse,
session anchor design, multi-agent design rationale, theme evolution,
and performance measurements.
2026-03-14 02:33:59 +01:00

15 KiB
Raw Blame History

BTerminal v3 — Research Findings

1. Adversarial Architecture Review (2026-03-07)

Three specialized agents reviewed the v3 Mission Control architecture before implementation began. This adversarial process caught 12 issues (4 critical) that would have required expensive rework if discovered later.

Agent: Architect (Advocate)

The Architect proposed the core design:

  • Project Groups as the primary organizational unit (replacing free-form panes)
  • JSON config (groups.json) for human-editable group/project definitions, SQLite for runtime state
  • Single shared sidecar with per-project isolation via cwd, claude_config_dir, and session_id
  • Component split: AgentPane → AgentSession + TeamAgentsPanel (subagents shown inline, not as separate panes)
  • New SQLite tables: agent_messages (per-project message persistence), project_agent_state (sdkSessionId, cost, status)
  • MVP boundary at Phase 5 (5 phases for core, 5 for polish)
  • 10-phase implementation plan covering data model, shell, session integration, terminals, team panel, continuity, palette, docs, settings, cleanup

Agent: Devil's Advocate

The Devil's Advocate found 12 issues across the Architect's proposal:

# Issue Severity Why It Matters
1 xterm.js 4-instance ceiling Critical WebKit2GTK OOMs at ~5 xterm instances. With 5 projects × 1 terminal each, we hit the wall immediately.
2 Single sidecar = SPOF Critical One sidecar crash kills all 5 project agents simultaneously. No isolation between projects.
3 Layout store has no workspace concept Critical The v2 layout store (pane-based) cannot represent project groups. Needs a full rewrite, not incremental modification.
4 384px per project unusable on 1920px Critical 5 projects on a 1920px screen means 384px per project — too narrow for code or agent output. Must adapt to viewport.
5 Session identity collision Major Without persisting sdkSessionId, resuming the wrong session corrupts agent state. Per-project CLAUDE_CONFIG_DIR isolation is also needed.
6 JSON config + SQLite = split-brain Major Two sources of truth (JSON for config, SQLite for state) can diverge. Must clearly separate what lives where.
7 Agent dispatcher has no project scoping Major The singleton dispatcher routes all messages globally. Adding projectId to sessions and cleanup on workspace switch is essential.
8 Markdown discovery is undefined Minor No specification for which markdown files appear in the Docs tab. Needs a priority list and depth limit.
9 Keyboard shortcut conflicts Major Three input layers (terminal, workspace, app) can conflict. Needs a shortcut manager with explicit precedence.
10 Remote machine support orphaned Major v2's remote machine UI doesn't map to the project model. Must elevate to project level.
11 No graceful degradation for broken projects Major If a project's CWD doesn't exist or git is broken, the whole group could fail. Need per-project health states.
12 Flat event stream wastes CPU for hidden projects Minor Messages for inactive workspace projects still process through adapters. Should buffer and flush on activation.

Resolutions: All 12 issues were resolved before implementation. Critical items (#1-4) were addressed in the architecture. Major items were either implemented in MVP phases or explicitly deferred to v3.1 with documented rationale. See v3-task_plan.md for the full resolution table.

Agent: UX + Performance Specialist

The UX specialist provided concrete wireframes and performance budgets:

  • Adaptive layout: Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520))) — 5 projects at 5120px, 3 at 1920px, 1 with scroll at <1600px
  • xterm.js budget: 4 active instances max. Suspended terminals serialize scrollback to text, destroy the xterm instance, recreate on focus. PTY stays alive. Suspend/resume cycle < 50ms.
  • Memory budget: ~225MB total (4 xterm @ 20MB + Tauri + SQLite + 5 agent stores). Well within WebKit2GTK limits.
  • Workspace switch performance: Serialize all xterm scrollbacks, unmount ProjectGrid children, mount new group. Target: <100ms perceived latency (frees ~80MB).
  • Team panel: Inline at >2560px viewport (240px wide), overlay at <2560px. Collapsed when no subagents.
  • Command palette: Ctrl+K, floating overlay, fuzzy search across commands + groups + projects. 18+ commands across 6 categories.
  • RAF batching: For 5 concurrent agent streams, batch DOM updates into requestAnimationFrame frames to avoid layout thrashing.

2. Provider Adapter Coupling Analysis (2026-03-11)

Before implementing multi-provider support, a systematic coupling analysis mapped every Claude-specific dependency in the codebase. 13+ files were examined and classified into 4 severity levels.

Coupling Severity Map

CRITICAL — hardcoded SDK, must abstract:

  • sidecar/agent-runner.ts — imports Claude Agent SDK, calls query(), hardcoded findClaudeCli(). Must become claude-runner.ts with other providers getting separate runners.
  • bterminal-core/src/sidecar.rsAgentQueryOptions struct had no provider field. SidecarCommand hardcoded agent-runner.mjs path. Must add provider-based runner selection.
  • src/lib/adapters/sdk-messages.tsparseMessage() assumes Claude SDK JSON format. Must become claude-messages.ts with per-provider parsers.

HIGH — TS mirror types, provider-specific commands:

  • src/lib/adapters/agent-bridge.tsAgentQueryOptions interface mirrors Rust struct with no provider field.
  • src-tauri/src/lib.rsclaude_list_profiles, claude_list_skills are Claude-specific commands (kept as-is, gated by capability).
  • src/lib/adapters/claude-bridge.ts — provider-specific adapter (kept, genericized via provider-bridge.ts).

MEDIUM — provider-aware routing:

  • src/lib/agent-dispatcher.ts — calls parseMessage() (Claude-specific), subagent tool names hardcoded.
  • src/lib/components/Agent/AgentPane.svelte — profile selector, skill autocomplete assume Claude.
  • ClaudeSession.svelte — name says "Claude" but logic is mostly generic.

LOW — already generic:

  • agents.svelte.tsAgentMessage type has no Claude-specific logic.
  • health.svelte.ts, conflicts.svelte.ts — provider-agnostic health and conflict tracking.
  • bterminal-relay/ — forwards AgentQueryOptions as-is.

Key Insights from Analysis

  1. Sidecar is the natural abstraction boundary. Each provider needs its own runner because SDKs are incompatible. The Rust sidecar manager selects which runner to spawn based on the provider field.

  2. Message format is the main divergence point. Claude SDK emits structured JSON (assistant/user/result), Codex uses ThreadEvents, Ollama uses OpenAI-compatible streaming. Per-provider message adapters normalize everything to AgentMessage.

  3. Capability flags eliminate provider switches. Instead of if (provider === 'claude') showProfiles(), the UI checks capabilities.hasProfiles. Adding a new provider only requires registering its capabilities — zero UI code changes.

  4. Environment variable stripping is provider-specific. Claude needs CLAUDE* vars stripped (nesting detection). Codex needs CODEX* stripped. Ollama needs nothing stripped. Extracted to strip_provider_env_var() function.


3. Codebase Reuse Analysis (v2 → v3)

The v3 redesign reused significant portions of the v2 codebase. This analysis determined what could survive, what needed replacement, and what could be dropped entirely.

Survived (with modifications)

Component/Module Modifications
TerminalPane.svelte Added suspend/resume lifecycle for xterm budget
MarkdownPane.svelte Unchanged
AgentTree.svelte Reused inside AgentSession
StatusBar.svelte Rewritten for workspace store (group name, fleet status, attention queue)
ToastContainer.svelte Unchanged
agents.svelte.ts Added projectId field to AgentSession
theme.svelte.ts Unchanged
notifications.svelte.ts Unchanged
All adapters Minor updates for provider routing
All Rust backend Added new modules (btmsg, bttask, search, secrets, plugins)
highlight.ts, agent-tree.ts Unchanged

Replaced

v2 Component v3 Replacement Reason
layout.svelte.ts workspace.svelte.ts Pane-based model → project-group model
TilingGrid.svelte ProjectGrid.svelte Free-form grid → fixed project boxes
PaneContainer.svelte ProjectBox.svelte Generic pane → per-project container with 11 tabs
SessionList.svelte ProjectHeader + CommandPalette Sidebar session list → inline headers + Ctrl+K
SettingsDialog.svelte SettingsTab.svelte Modal dialog → sidebar drawer tab
AgentPane.svelte AgentSession + TeamAgentsPanel Monolithic → split for team support
App.svelte Full rewrite Tab bar → VSCode-style sidebar layout

Dropped (v3.0)

Feature Reason
Detached pane mode Doesn't fit workspace model (projects are grouped, not independent)
Drag-resize splitters Project boxes have fixed internal layout
Layout presets (1-col, 2-col, etc.) Replaced by adaptive project count from viewport
Remote machine UI integration Deferred to v3.1 (elevated to project level)

4. Session Anchor Design Analysis (2026-03-12)

Session anchors were designed to solve context loss during Claude's automatic context compaction. Research into compaction behavior informed the design.

Problem

When Claude's context window fills up, the SDK automatically compacts older turns. This compaction is lossy — important early decisions, architecture context, and debugging breakthroughs can be permanently lost.

Compaction Behavior (Observed)

  • Compaction triggers when context exceeds ~80% of model limit
  • The SDK emits a compaction event that the sidecar can observe
  • Compacted turns are summarized, losing granular detail
  • Multiple compaction rounds can occur in long sessions

Design Decisions

  1. Auto-anchor on first compaction — The system automatically captures the first 3 turns when compaction is first detected. This preserves the session's initial context (usually the task definition and first architecture decisions).

  2. Observation masking — Tool outputs (Read results, Bash output) are compacted in anchors, but reasoning text is preserved in full. This dramatically reduces anchor token cost while keeping the important reasoning.

  3. Budget system — Fixed budget scales (2K/6K/12K/20K tokens) instead of percentage-based. Users understand "6,000 tokens" more intuitively than "15% of context."

  4. Re-injection via system prompt — Promoted anchors are serialized and injected as the system_prompt field. This is the simplest integration point with the SDK and doesn't require modifying the conversation history.


5. Multi-Agent Orchestration Design (2026-03-11)

Research into multi-agent coordination patterns informed the btmsg/bttask design.

Evaluated Approaches

Approach Pros Cons Decision
Claude Agent Teams (native) Zero custom code, SDK-managed Experimental, session resume broken, no custom roles Supported but not primary
Message bus (Redis/NATS) Proven, scalable Runtime dependency, deployment complexity Rejected
Shared SQLite + CLI tools Zero deps, agents use shell commands Polling-based, no real-time push Selected
MCP server for agent comm Standard protocol Overhead per message, complex setup Rejected

Why SQLite + CLI

Agents run Claude Code sessions that have full shell access. A Python CLI tool (btmsg, bttask) that reads/writes SQLite is the lowest-friction integration:

  • Agents can use it with zero configuration (just btmsg send architect "review this")
  • No runtime services to manage (no Redis, no MCP server)
  • WAL mode handles concurrent access from multiple agent processes
  • The same database is readable by the Rust backend for UI display
  • Polling-based (5s) is acceptable for coordination — agents don't need millisecond latency

Role Hierarchy

The 4 Tier 1 roles were chosen based on common development workflows:

  • Manager — coordinates work, like a tech lead assigning tasks in a sprint
  • Architect — designs solutions, like a senior engineer doing design reviews
  • Tester — runs tests, like a QA engineer monitoring test suites
  • Reviewer — reviews code, like a reviewer processing a PR queue

Each role has unique tabs (Task board for Manager, PlantUML for Architect, Selenium for Tester, Review queue for Reviewer) and unique bttask permissions (Manager has full CRUD, others are read-only with comments).


6. Theme System Evolution (2026-03-07)

Original: 4 Catppuccin Flavors

v2 launched with 4 Catppuccin flavors (Mocha, Macchiato, Frappé, Latte). All colors mapped to 26 --ctp-* CSS custom properties.

Extension: 7 Editor Themes

Added VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark. Each theme maps to the same 26 --ctp-* variables — zero component changes needed. The CatppuccinFlavor type was generalized to ThemeId union type. Deprecated wrapper functions maintain backward compatibility.

Extension: 6 Deep Dark Themes

Added Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper (warm dark), Midnight (pure OLED black). Same 26-variable mapping.

Key Design Decision

By mapping all 17 themes to the same CSS custom property names, no component ever needs to know which theme is active. This makes adding new themes a pure data operation — define 26 color values and add to THEME_LIST. The ThemeMeta type includes group metadata for the custom themed dropdown in SettingsTab.


7. Performance Findings

xterm.js Canvas Performance

WebKit2GTK lacks WebGL, so xterm.js falls back to Canvas 2D rendering. Testing showed:

  • Latency: ~20-30ms per keystroke (acceptable for AI output, not ideal for vim)
  • Memory: ~20MB per active instance
  • OOM threshold: ~5 simultaneous instances causes WebKit2GTK to crash
  • Mitigation: 4-instance budget with suspend/resume for inactive terminals

Tauri IPC Latency

  • Linux: ~5ms for typical payloads (serialization-free IPC in Tauri 2.x)
  • Terminal keystroke echo: 5ms IPC + xterm.js render ≈ 10-15ms total
  • Agent message forwarding: Negligible — agent output arrives at human-readable speed

SQLite WAL Concurrent Access

Both sessions.db and btmsg.db are accessed concurrently by:

  • Rust backend (Tauri commands)
  • Python CLI tools (btmsg, bttask from agent shells)
  • Frontend reads via IPC

WAL mode with 5s busy_timeout handles this reliably. The 5-minute checkpoint prevents WAL file growth.

Workspace Switch Latency

Measured during v3 development:

  • Serialize 4 xterm scrollbacks: ~30ms
  • Destroy 4 xterm instances: ~10ms
  • Unmount ProjectGrid children: ~5ms
  • Mount new group's ProjectGrid: ~20ms
  • Create new xterm instances: ~35ms
  • Total perceived: ~100ms (acceptable)