Hibryda 9a295c224c docs: expand README index and v3-findings with deep research content

README.md: from 42-line index to rich documentation hub with project
overview, reading order, and key directory listing.
v3-findings.md: from 63 lines to comprehensive research findings covering
adversarial review details, provider coupling analysis, codebase reuse,
session anchor design, multi-agent design rationale, theme evolution,
and performance measurements.

2026-03-14 02:33:59 +01:00

15 KiB

Raw Blame History

BTerminal v3 — Research Findings

1. Adversarial Architecture Review (2026-03-07)

Three specialized agents reviewed the v3 Mission Control architecture before implementation began. This adversarial process caught 12 issues (4 critical) that would have required expensive rework if discovered later.

Agent: Architect (Advocate)

The Architect proposed the core design:

Project Groups as the primary organizational unit (replacing free-form panes)
JSON config (groups.json) for human-editable group/project definitions, SQLite for runtime state
Single shared sidecar with per-project isolation via cwd, claude_config_dir, and session_id
Component split: AgentPane → AgentSession + TeamAgentsPanel (subagents shown inline, not as separate panes)
New SQLite tables: agent_messages (per-project message persistence), project_agent_state (sdkSessionId, cost, status)
MVP boundary at Phase 5 (5 phases for core, 5 for polish)
10-phase implementation plan covering data model, shell, session integration, terminals, team panel, continuity, palette, docs, settings, cleanup

Agent: Devil's Advocate

The Devil's Advocate found 12 issues across the Architect's proposal:

#	Issue	Severity	Why It Matters
1	xterm.js 4-instance ceiling	Critical	WebKit2GTK OOMs at ~5 xterm instances. With 5 projects × 1 terminal each, we hit the wall immediately.
2	Single sidecar = SPOF	Critical	One sidecar crash kills all 5 project agents simultaneously. No isolation between projects.
3	Layout store has no workspace concept	Critical	The v2 layout store (pane-based) cannot represent project groups. Needs a full rewrite, not incremental modification.
4	384px per project unusable on 1920px	Critical	5 projects on a 1920px screen means 384px per project — too narrow for code or agent output. Must adapt to viewport.
5	Session identity collision	Major	Without persisting `sdkSessionId`, resuming the wrong session corrupts agent state. Per-project CLAUDE_CONFIG_DIR isolation is also needed.
6	JSON config + SQLite = split-brain	Major	Two sources of truth (JSON for config, SQLite for state) can diverge. Must clearly separate what lives where.
7	Agent dispatcher has no project scoping	Major	The singleton dispatcher routes all messages globally. Adding projectId to sessions and cleanup on workspace switch is essential.
8	Markdown discovery is undefined	Minor	No specification for which markdown files appear in the Docs tab. Needs a priority list and depth limit.
9	Keyboard shortcut conflicts	Major	Three input layers (terminal, workspace, app) can conflict. Needs a shortcut manager with explicit precedence.
10	Remote machine support orphaned	Major	v2's remote machine UI doesn't map to the project model. Must elevate to project level.
11	No graceful degradation for broken projects	Major	If a project's CWD doesn't exist or git is broken, the whole group could fail. Need per-project health states.
12	Flat event stream wastes CPU for hidden projects	Minor	Messages for inactive workspace projects still process through adapters. Should buffer and flush on activation.

Resolutions: All 12 issues were resolved before implementation. Critical items (#1-4) were addressed in the architecture. Major items were either implemented in MVP phases or explicitly deferred to v3.1 with documented rationale. See v3-task_plan.md for the full resolution table.

Agent: UX + Performance Specialist

The UX specialist provided concrete wireframes and performance budgets:

Adaptive layout: Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520))) — 5 projects at 5120px, 3 at 1920px, 1 with scroll at <1600px
xterm.js budget: 4 active instances max. Suspended terminals serialize scrollback to text, destroy the xterm instance, recreate on focus. PTY stays alive. Suspend/resume cycle < 50ms.
Memory budget: ~225MB total (4 xterm @ 20MB + Tauri + SQLite + 5 agent stores). Well within WebKit2GTK limits.
Workspace switch performance: Serialize all xterm scrollbacks, unmount ProjectGrid children, mount new group. Target: <100ms perceived latency (frees ~80MB).
Team panel: Inline at >2560px viewport (240px wide), overlay at <2560px. Collapsed when no subagents.
Command palette: Ctrl+K, floating overlay, fuzzy search across commands + groups + projects. 18+ commands across 6 categories.
RAF batching: For 5 concurrent agent streams, batch DOM updates into requestAnimationFrame frames to avoid layout thrashing.

2. Provider Adapter Coupling Analysis (2026-03-11)

Before implementing multi-provider support, a systematic coupling analysis mapped every Claude-specific dependency in the codebase. 13+ files were examined and classified into 4 severity levels.

Coupling Severity Map

CRITICAL — hardcoded SDK, must abstract:

sidecar/agent-runner.ts — imports Claude Agent SDK, calls query(), hardcoded findClaudeCli(). Must become claude-runner.ts with other providers getting separate runners.
bterminal-core/src/sidecar.rs — AgentQueryOptions struct had no provider field. SidecarCommand hardcoded agent-runner.mjs path. Must add provider-based runner selection.
src/lib/adapters/sdk-messages.ts — parseMessage() assumes Claude SDK JSON format. Must become claude-messages.ts with per-provider parsers.

HIGH — TS mirror types, provider-specific commands:

src/lib/adapters/agent-bridge.ts — AgentQueryOptions interface mirrors Rust struct with no provider field.
src-tauri/src/lib.rs — claude_list_profiles, claude_list_skills are Claude-specific commands (kept as-is, gated by capability).
src/lib/adapters/claude-bridge.ts — provider-specific adapter (kept, genericized via provider-bridge.ts).

MEDIUM — provider-aware routing:

src/lib/agent-dispatcher.ts — calls parseMessage() (Claude-specific), subagent tool names hardcoded.
src/lib/components/Agent/AgentPane.svelte — profile selector, skill autocomplete assume Claude.
ClaudeSession.svelte — name says "Claude" but logic is mostly generic.

LOW — already generic:

agents.svelte.ts — AgentMessage type has no Claude-specific logic.
health.svelte.ts, conflicts.svelte.ts — provider-agnostic health and conflict tracking.
bterminal-relay/ — forwards AgentQueryOptions as-is.

Key Insights from Analysis

Sidecar is the natural abstraction boundary. Each provider needs its own runner because SDKs are incompatible. The Rust sidecar manager selects which runner to spawn based on the provider field.
Message format is the main divergence point. Claude SDK emits structured JSON (assistant/user/result), Codex uses ThreadEvents, Ollama uses OpenAI-compatible streaming. Per-provider message adapters normalize everything to AgentMessage.
Capability flags eliminate provider switches. Instead of if (provider === 'claude') showProfiles(), the UI checks capabilities.hasProfiles. Adding a new provider only requires registering its capabilities — zero UI code changes.
Environment variable stripping is provider-specific. Claude needs CLAUDE* vars stripped (nesting detection). Codex needs CODEX* stripped. Ollama needs nothing stripped. Extracted to strip_provider_env_var() function.

3. Codebase Reuse Analysis (v2 → v3)

The v3 redesign reused significant portions of the v2 codebase. This analysis determined what could survive, what needed replacement, and what could be dropped entirely.

Survived (with modifications)

Component/Module	Modifications
TerminalPane.svelte	Added suspend/resume lifecycle for xterm budget
MarkdownPane.svelte	Unchanged
AgentTree.svelte	Reused inside AgentSession
StatusBar.svelte	Rewritten for workspace store (group name, fleet status, attention queue)
ToastContainer.svelte	Unchanged
agents.svelte.ts	Added projectId field to AgentSession
theme.svelte.ts	Unchanged
notifications.svelte.ts	Unchanged
All adapters	Minor updates for provider routing
All Rust backend	Added new modules (btmsg, bttask, search, secrets, plugins)
highlight.ts, agent-tree.ts	Unchanged

Replaced

v2 Component	v3 Replacement	Reason
layout.svelte.ts	workspace.svelte.ts	Pane-based model → project-group model
TilingGrid.svelte	ProjectGrid.svelte	Free-form grid → fixed project boxes
PaneContainer.svelte	ProjectBox.svelte	Generic pane → per-project container with 11 tabs
SessionList.svelte	ProjectHeader + CommandPalette	Sidebar session list → inline headers + Ctrl+K
SettingsDialog.svelte	SettingsTab.svelte	Modal dialog → sidebar drawer tab
AgentPane.svelte	AgentSession + TeamAgentsPanel	Monolithic → split for team support
App.svelte	Full rewrite	Tab bar → VSCode-style sidebar layout

Dropped (v3.0)

Feature	Reason
Detached pane mode	Doesn't fit workspace model (projects are grouped, not independent)
Drag-resize splitters	Project boxes have fixed internal layout
Layout presets (1-col, 2-col, etc.)	Replaced by adaptive project count from viewport
Remote machine UI integration	Deferred to v3.1 (elevated to project level)

4. Session Anchor Design Analysis (2026-03-12)

Session anchors were designed to solve context loss during Claude's automatic context compaction. Research into compaction behavior informed the design.

Problem

When Claude's context window fills up, the SDK automatically compacts older turns. This compaction is lossy — important early decisions, architecture context, and debugging breakthroughs can be permanently lost.

Compaction Behavior (Observed)

Compaction triggers when context exceeds ~80% of model limit
The SDK emits a compaction event that the sidecar can observe
Compacted turns are summarized, losing granular detail
Multiple compaction rounds can occur in long sessions

Design Decisions

Auto-anchor on first compaction — The system automatically captures the first 3 turns when compaction is first detected. This preserves the session's initial context (usually the task definition and first architecture decisions).
Observation masking — Tool outputs (Read results, Bash output) are compacted in anchors, but reasoning text is preserved in full. This dramatically reduces anchor token cost while keeping the important reasoning.
Budget system — Fixed budget scales (2K/6K/12K/20K tokens) instead of percentage-based. Users understand "6,000 tokens" more intuitively than "15% of context."
Re-injection via system prompt — Promoted anchors are serialized and injected as the system_prompt field. This is the simplest integration point with the SDK and doesn't require modifying the conversation history.

5. Multi-Agent Orchestration Design (2026-03-11)

Research into multi-agent coordination patterns informed the btmsg/bttask design.

Evaluated Approaches

Approach	Pros	Cons	Decision
Claude Agent Teams (native)	Zero custom code, SDK-managed	Experimental, session resume broken, no custom roles	Supported but not primary
Message bus (Redis/NATS)	Proven, scalable	Runtime dependency, deployment complexity	Rejected
Shared SQLite + CLI tools	Zero deps, agents use shell commands	Polling-based, no real-time push	Selected
MCP server for agent comm	Standard protocol	Overhead per message, complex setup	Rejected

Why SQLite + CLI

Agents run Claude Code sessions that have full shell access. A Python CLI tool (btmsg, bttask) that reads/writes SQLite is the lowest-friction integration:

Agents can use it with zero configuration (just btmsg send architect "review this")
No runtime services to manage (no Redis, no MCP server)
WAL mode handles concurrent access from multiple agent processes
The same database is readable by the Rust backend for UI display
Polling-based (5s) is acceptable for coordination — agents don't need millisecond latency

Role Hierarchy

The 4 Tier 1 roles were chosen based on common development workflows:

Manager — coordinates work, like a tech lead assigning tasks in a sprint
Architect — designs solutions, like a senior engineer doing design reviews
Tester — runs tests, like a QA engineer monitoring test suites
Reviewer — reviews code, like a reviewer processing a PR queue

Each role has unique tabs (Task board for Manager, PlantUML for Architect, Selenium for Tester, Review queue for Reviewer) and unique bttask permissions (Manager has full CRUD, others are read-only with comments).

6. Theme System Evolution (2026-03-07)

Original: 4 Catppuccin Flavors

v2 launched with 4 Catppuccin flavors (Mocha, Macchiato, Frappé, Latte). All colors mapped to 26 --ctp-* CSS custom properties.

Extension: 7 Editor Themes

Added VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark. Each theme maps to the same 26 --ctp-* variables — zero component changes needed. The CatppuccinFlavor type was generalized to ThemeId union type. Deprecated wrapper functions maintain backward compatibility.

Extension: 6 Deep Dark Themes

Added Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper (warm dark), Midnight (pure OLED black). Same 26-variable mapping.

Key Design Decision

By mapping all 17 themes to the same CSS custom property names, no component ever needs to know which theme is active. This makes adding new themes a pure data operation — define 26 color values and add to THEME_LIST. The ThemeMeta type includes group metadata for the custom themed dropdown in SettingsTab.

7. Performance Findings

xterm.js Canvas Performance

WebKit2GTK lacks WebGL, so xterm.js falls back to Canvas 2D rendering. Testing showed:

Latency: ~20-30ms per keystroke (acceptable for AI output, not ideal for vim)
Memory: ~20MB per active instance
OOM threshold: ~5 simultaneous instances causes WebKit2GTK to crash
Mitigation: 4-instance budget with suspend/resume for inactive terminals

Tauri IPC Latency

Linux: ~5ms for typical payloads (serialization-free IPC in Tauri 2.x)
Terminal keystroke echo: 5ms IPC + xterm.js render ≈ 10-15ms total
Agent message forwarding: Negligible — agent output arrives at human-readable speed

SQLite WAL Concurrent Access

Both sessions.db and btmsg.db are accessed concurrently by:

Rust backend (Tauri commands)
Python CLI tools (btmsg, bttask from agent shells)
Frontend reads via IPC

WAL mode with 5s busy_timeout handles this reliably. The 5-minute checkpoint prevents WAL file growth.

Workspace Switch Latency

Measured during v3 development:

Serialize 4 xterm scrollbacks: ~30ms
Destroy 4 xterm instances: ~10ms
Unmount ProjectGrid children: ~5ms
Mount new group's ProjectGrid: ~20ms
Create new xterm instances: ~35ms
Total perceived: ~100ms (acceptable)

15 KiB Raw Blame History Unescape Escape