Hibryda 356660f17d docs: reconcile hib_changes onto flat structure

Bring over comprehensive documentation, CLI tools, and project
scaffolding from the archived v2/ branch onto the rebuilt flat
main. All v2/ path references updated to match flat layout.

- docs/: architecture, decisions, phases, progress, findings, etc.
- docker/tempo: telemetry stack (Grafana + Tempo)
- CLAUDE.md, .claude/CLAUDE.md: comprehensive project guides
- CHANGELOG.md, TODO.md, README.md: project meta
- consult, ctx: CLI tools
- .gitignore: merged entries from both branches

2026-03-16 03:34:04 +01:00

17 KiB


Research Findings

This document captures research conducted during v2 and v3 development — technology evaluations, architecture reviews, performance measurements, and design analysis. Each finding informed implementation decisions recorded in decisions.md.

1. Claude Agent SDK (v2 Research, 2026-03-05)

Source: https://platform.claude.com/docs/en/agent-sdk/overview

The Claude Agent SDK (formerly Claude Code SDK, renamed Sept 2025) provides structured streaming, subagent detection, hooks, and telemetry — everything needed for a rich agent UI without terminal emulation.

Streaming API

import { query } from "@anthropic-ai/claude-agent-sdk";

for await (const message of query({
  prompt: "Fix the bug",
  options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
  console.log(message);  // structured, typed, parseable
}

Subagent Detection

Messages from subagents include parent_tool_use_id:

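// `msg` is a single message yielded by the query() loop shown under Streaming API.
for (const block of msg.message?.content ?? []) {
  // A "Task" tool_use block marks the point where the parent agent spawns a subagent.
  if (block.type === "tool_use" && block.name === "Task") {
    console.log(`Subagent invoked: ${block.input.subagent_type}`);
  }
}
// Messages emitted from inside a subagent carry parent_tool_use_id.
if (msg.parent_tool_use_id) {
  console.log("Running inside subagent");
}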

Session Management

session_id captured from init message
Resume with options: { resume: sessionId }
Subagent transcripts persist independently
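
The capture-and-resume flow above can be sketched as follows. This is a minimal illustration under stated assumptions: SdkMessageLike is a local stand-in that models only the fields used here (the init message is assumed to arrive as type "system" with subtype "init"), and captureSessionId and resumeOptions are hypothetical helper names, not SDK exports.

```typescript
// Local stand-in for an SDK message; only the fields needed here are modeled (assumption).
interface SdkMessageLike {
  type: string;
  subtype?: string;
  session_id?: string;
}

// Return the session_id from the first init message, if one was seen.
function captureSessionId(messages: SdkMessageLike[]): string | undefined {
  for (const m of messages) {
    if (m.type === "system" && m.subtype === "init" && m.session_id) {
      return m.session_id;
    }
  }
  return undefined;
}

// Build the options fragment used to resume a session, per the notes above.
function resumeOptions(sessionId: string): { resume: string } {
  return { resume: sessionId };
}
```

Persisting the captured id alongside the project record is what would let a later query() call pass options: { resume: sessionId } for the right session.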

Hooks

PreToolUse, PostToolUse, Stop, SessionStart, SessionEnd, UserPromptSubmit

Telemetry

Every SDKResultMessage contains: total_cost_usd, duration_ms, per-model modelUsage breakdowns.
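
As a rough sketch of how these fields could feed a cost display, the helper below sums total_cost_usd and duration_ms across finished sessions. ResultLike is a local stand-in carrying only the two fields named above, and summarizeResults is a hypothetical name; the per-model modelUsage breakdown is left out because its exact shape is not described here.

```typescript
// Local stand-in for the result message; only two telemetry fields are modeled (assumption).
interface ResultLike {
  total_cost_usd: number;
  duration_ms: number;
}

interface FleetTotals {
  costUsd: number;
  durationMs: number;
  sessions: number;
}

// Sum cost and wall-clock duration over a set of finished sessions.
function summarizeResults(results: ResultLike[]): FleetTotals {
  let costUsd = 0;
  let durationMs = 0;
  for (const r of results) {
    costUsd += r.total_cost_usd;
    durationMs += r.duration_ms;
  }
  return { costUsd, durationMs, sessions: results.length };
}
```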

Key Insight

The SDK gives structured data — we render it as rich UI (markdown, diff views, file cards, agent trees) instead of raw terminal text. Terminal emulation (xterm.js) is only needed for SSH, local shell, and legacy CLI sessions.

2. Tauri + xterm.js Integration (v2 Research, 2026-03-05)

Existing Projects

tauri-terminal — basic Tauri + xterm.js + portable-pty
Terminon — Tauri v2 + React + xterm.js, SSH profiles, split panes
tauri-plugin-pty — PTY plugin for Tauri 2, xterm.js bridge

Integration Pattern

Frontend (xterm.js) <-> Tauri IPC <-> Rust PTY (portable-pty) <-> Shell/SSH/Claude

pty.onData() -> term.write() (output)
term.onData() -> pty.write() (input)
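
The two-way wiring above can be sketched independently of any concrete terminal or PTY library. DataEndpoint is a hypothetical interface introduced for illustration; real xterm.js and the Tauri PTY bindings expose similar but not identical shapes (for example, xterm.js returns a disposable object from onData rather than a plain function).

```typescript
type Listener = (data: string) => void;

// Hypothetical common shape for both sides of the bridge (assumption).
interface DataEndpoint {
  write(data: string): void;
  onData(listener: Listener): () => void; // returns an unsubscribe function
}

// Connect a terminal and a PTY in both directions; returns a teardown function.
function bridge(term: DataEndpoint, pty: DataEndpoint): () => void {
  const stopOutput = pty.onData((data) => term.write(data)); // shell output -> screen
  const stopInput = term.onData((data) => pty.write(data)); // keystrokes -> shell
  return () => {
    stopOutput();
    stopInput();
  };
}
```

Returning a teardown function keeps listener cleanup in one place, which matters once terminals are created and destroyed frequently.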

3. Terminal Performance Benchmarks (v2 Research, 2026-03-05)

Native Terminal Latency

Terminal	Latency	Notes
xterm (native)	~10ms	Gold standard
Alacritty	~12ms	GPU-rendered Rust
Kitty	~13ms	GPU-rendered
VTE (GNOME Terminal)	~50ms	GTK3/4, with spikes above 50ms
Hyper (Electron+xterm.js)	~40ms	Web-based worst case

Memory

Alacritty: ~30MB, WezTerm: ~45MB, xterm native: ~5MB

Verdict

xterm.js in Tauri: ~20-30ms latency, ~20MB per instance. That is fine for streaming AI output, where keystroke latency matters far less than in an interactive editor such as vim. The VTE widget used in the v1 GTK3 build is actually slower, at ~50ms.

4. Zellij Architecture (v2 Inspiration, 2026-03-05)

Zellij uses WASM plugins for extensibility: message passing at WASM boundary, permission model, event types for rendering/input/lifecycle, KDL layout files.

Relevance: We don't need WASM plugins — our "plugins" are different pane types. But the layout concept (JSON layout definitions) is worth borrowing for saved layouts.

5. Ultrawide Design Patterns (v2 Research, 2026-03-05)

Key Insight: 5120px width / ~600px per pane = ~8 panes max, ~4-5 comfortable.

Layout Philosophy:

Center = primary attention (1-2 main agent panes)
Left edge = navigation (sidebar, 250-300px)
Right edge = context (agent tree, file viewer, 350-450px)
Never use tabs for primary content — everything visible
Tabs only for switching saved layouts
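
The pane arithmetic behind the key insight can be written out as a small sketch. The function names are hypothetical; the 600px minimum and the sidebar widths come from the figures above.

```typescript
// How many panes of at least minPanePx fit across a given width.
function maxPanes(widthPx: number, minPanePx: number = 600): number {
  return Math.max(1, Math.floor(widthPx / minPanePx));
}

// Panes that fit in the center once the left and right edge columns are reserved.
function centerPaneCapacity(
  viewportPx: number,
  leftEdgePx: number,
  rightEdgePx: number,
  minPanePx: number = 600,
): number {
  return maxPanes(viewportPx - leftEdgePx - rightEdgePx, minPanePx);
}
```

For a 5120px viewport, maxPanes(5120) gives 8, and reserving a 300px sidebar plus a 450px context column leaves room for 7. Both sit above the 4-5 pane range described as comfortable, so the comfortable figure is a readability choice, not a hard width limit.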

6. Frontend Framework Choice (v2 Research, 2026-03-05)

Why Svelte 5

Fine-grained reactivity — $state/$derived runes match Solid's signals model
No VDOM — critical when 4-8 panes stream data simultaneously
Small bundle — ~5KB runtime vs React's ~40KB
Larger ecosystem than Solid.js — more component libraries, better tooling

Why NOT Solid.js (initially considered)

Ecosystem too small for production use
Svelte 5 runes eliminated the ceremony gap

Why NOT React

VDOM reconciliation across 4-8 simultaneously updating panes = CPU waste
Larger bundle, state management complexity (Redux/Zustand needed)

7. Claude Code CLI Observation (v2 Research, 2026-03-05)

Three observation tiers for Claude sessions:

SDK sessions (best): Full structured streaming, subagent detection, hooks, cost tracking
CLI with stream-json (good): claude -p "prompt" --output-format stream-json — structured output but non-interactive
Interactive CLI (fallback): Tail JSONL session files at ~/.claude/projects/<encoded-dir>/<session-uuid>.jsonl + show terminal via xterm.js

JSONL Session Files

Path encoding: /home/user/project -> -home-user-project. Append-only, written immediately. Can be tail -f'd for external observation.
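
A minimal sketch of the path encoding and of incremental JSONL tailing follows. It assumes the encoding is exactly "replace every / with -", as in the example above; other characters may be treated differently in practice. encodeProjectDir, sessionFilePath, and JsonlTail are hypothetical names.

```typescript
// Encode a project directory the way the example above shows: "/" becomes "-".
function encodeProjectDir(absPath: string): string {
  return absPath.replace(/\//g, "-");
}

// Build the session file path under the given home directory.
function sessionFilePath(homeDir: string, projectDir: string, sessionUuid: string): string {
  return `${homeDir}/.claude/projects/${encodeProjectDir(projectDir)}/${sessionUuid}.jsonl`;
}

// Accumulates appended chunks and yields only complete JSON lines.
class JsonlTail {
  private buffer = "";

  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is either "" or a partial line; keep it for the next chunk.
    this.buffer = lines.pop() ?? "";
    return lines
      .filter((line) => line.trim() !== "")
      .map((line) => JSON.parse(line) as unknown);
  }
}
```

Buffering the trailing partial line matters because an append-only file can be read mid-write, so a chunk may end in the middle of a record.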

Hooks (SDK only)

SubagentStart, SubagentStop (gives agent_transcript_path), PreToolUse, PostToolUse, Stop, Notification, TeammateIdle

8. Agent Teams (v2 Research, 2026-03-05)

CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 enables full independent Claude Code instances sharing a task list and mailbox.

3-5 teammates is the practical sweet spot (linear token cost)
Display modes: in-process (Shift+Down cycles), tmux (own pane each), auto
Session resumption is broken for in-process teammates
Agent Orchestrator is the ideal frontend for Agent Teams — each teammate gets its own ProjectBox

9. Competing Approaches (v2 Research, 2026-03-05)

claude-squad (Go+tmux): Most adopted multi-agent manager
agent-deck: MCP socket pooling (~85-90% memory savings)
Git worktrees: Dominant isolation strategy for parallel Claude sessions

10. Adversarial Architecture Review (v3, 2026-03-07)

Three specialized agents reviewed the v3 Mission Control architecture before implementation. This adversarial process caught 12 issues (4 critical) that would have required expensive rework if discovered later.

Agent: Architect (Advocate)

Proposed the core design:

Project Groups as primary organizational unit (replacing free-form panes)
JSON config for human-editable definitions, SQLite for runtime state
Single shared sidecar with per-project isolation via cwd, claude_config_dir, session_id
Component split: AgentPane -> AgentSession + TeamAgentsPanel
MVP boundary at Phase 5 (5 phases core, 5 polish)

Agent: Devil's Advocate

Found 12 issues across the Architect's proposal:

#	Issue	Severity	Why It Matters
1	xterm.js 4-instance ceiling	Critical	WebKit2GTK OOMs at ~5 instances. 5 projects x 1 terminal = immediate wall.
2	Single sidecar = SPOF	Critical	One crash kills all 5 project agents. No isolation.
3	Layout store has no workspace concept	Critical	v2 pane-based store cannot represent project groups. Full rewrite needed.
4	384px per project on 1920px	Critical	5 projects on 1920px = 384px each — too narrow for code. Must adapt to viewport.
5	Session identity collision	Major	Without persisted `sdkSessionId`, resuming wrong session corrupts state.
6	JSON + SQLite = split-brain risk	Major	Two sources of truth can diverge. Must clearly separate config vs state.
7	Dispatcher has no project scoping	Major	Singleton routes all messages globally. Needs projectId and per-project cleanup.
8	Markdown discovery undefined	Minor	No spec for which .md files appear in Docs tab.
9	Keyboard shortcut conflicts	Major	Three input layers can conflict without explicit precedence.
10	Remote machine support orphaned	Major	v2 remote UI doesn't map to project model.
11	No graceful degradation	Major	Broken CWD or git could fail the whole group.
12	Flat event stream wastes CPU	Minor	Messages for hidden projects still process through adapters.

All 12 were resolved or explicitly triaged before implementation. Critical items were addressed in the architecture. Major items were either implemented in the MVP or deferred to v3.1 with a recorded rationale.

Agent: UX + Performance Specialist

Provided concrete wireframes and performance budgets:

Adaptive layout formula: 5 at 5120px, 3 at 1920px, 1 with scroll at <1600px
xterm budget: 4 active max, suspend/resume < 50ms
Memory budget: ~225MB total (4 xterm @ 20MB + Tauri + SQLite + agent stores)
Workspace switch: <100ms perceived (serialize scrollbacks + unmount/mount)
RAF batching: For 5 concurrent agent streams, batch DOM updates to avoid layout thrashing
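
One way to satisfy the three data points of the adaptive layout formula is sketched below. The 640px minimum project width is an assumption chosen so that 1920px yields exactly 3 projects; the source gives only the endpoints, so behavior between them is illustrative, and visibleProjectCount is a hypothetical name.

```typescript
const MIN_PROJECT_WIDTH_PX = 640; // assumption: 1920 / 640 = 3 projects
const MAX_VISIBLE_PROJECTS = 5; // cap from the 5120px case above
const SINGLE_COLUMN_BELOW_PX = 1600; // below this, show one project and scroll

// Number of project boxes to show side by side for a given viewport width.
function visibleProjectCount(viewportWidthPx: number): number {
  if (viewportWidthPx < SINGLE_COLUMN_BELOW_PX) return 1;
  const fit = Math.floor(viewportWidthPx / MIN_PROJECT_WIDTH_PX);
  return Math.min(MAX_VISIBLE_PROJECTS, Math.max(1, fit));
}
```

This also addresses issue 4 in the Devil's Advocate table: at 1920px the function yields 3 boxes of 640px each instead of 5 boxes of 384px.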

11. Provider Adapter Coupling Analysis (v3, 2026-03-11)

Before implementing multi-provider support, a systematic coupling analysis mapped every Claude-specific dependency. 13+ files examined and classified into 4 severity levels.

Coupling Severity Map

CRITICAL — hardcoded SDK, must abstract:

sidecar/agent-runner.ts — imports Claude Agent SDK, calls query(), hardcoded findClaudeCli(). Became claude-runner.ts with other providers getting separate runners.
bterminal-core/src/sidecar.rs — AgentQueryOptions had no provider field. SidecarCommand hardcoded runner path. Added provider-based runner selection.
src/lib/adapters/sdk-messages.ts — parseMessage() assumed Claude SDK JSON format. Became claude-messages.ts with per-provider parsers.

HIGH — TS mirror types, provider-specific commands:

agent-bridge.ts — AgentQueryOptions interface mirrored Rust with no provider field.
lib.rs — claude_list_profiles, claude_list_skills are Claude-specific (kept, gated by capability).
claude-bridge.ts — provider-specific adapter (kept, genericized via provider-bridge.ts).

MEDIUM — provider-aware routing:

agent-dispatcher.ts — called parseMessage() (Claude-specific), subagent tool names hardcoded.
AgentPane.svelte — profile selector, skill autocomplete assumed Claude.

LOW — already generic:

agents.svelte.ts, health.svelte.ts, conflicts.svelte.ts — provider-agnostic.
bterminal-relay/ — forwards AgentQueryOptions as-is.

Key Insights

Sidecar is the natural abstraction boundary. Each provider needs its own runner because SDKs are incompatible.
Message format is the main divergence point. Per-provider adapters normalize to AgentMessage.
Capability flags eliminate provider switches. UI checks capabilities.hasProfiles instead of provider === 'claude'.
Env var stripping is provider-specific. Claude strips CLAUDE*, Codex strips CODEX*, Ollama strips nothing.
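
The capability-flag and env-stripping insights can be combined in a single provider table, sketched below. The table layout, the hasSkills flag, the capability values for Codex and Ollama, and the cleanEnv helper are all assumptions for illustration; only capabilities.hasProfiles and the per-provider prefixes come from the analysis above.

```typescript
type ProviderId = "claude" | "codex" | "ollama";

interface ProviderMeta {
  capabilities: { hasProfiles: boolean; hasSkills: boolean };
  stripEnvPrefixes: string[]; // env vars starting with these are removed before spawn
}

// Illustrative values; only the prefixes and hasProfiles for Claude follow the text above.
const PROVIDERS: Record<ProviderId, ProviderMeta> = {
  claude: { capabilities: { hasProfiles: true, hasSkills: true }, stripEnvPrefixes: ["CLAUDE"] },
  codex: { capabilities: { hasProfiles: false, hasSkills: false }, stripEnvPrefixes: ["CODEX"] },
  ollama: { capabilities: { hasProfiles: false, hasSkills: false }, stripEnvPrefixes: [] },
};

// Return a copy of env without the variables the given provider must not inherit.
function cleanEnv(provider: ProviderId, env: Record<string, string>): Record<string, string> {
  const prefixes = PROVIDERS[provider].stripEnvPrefixes;
  const out: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const value = env[key];
    if (value === undefined) continue;
    if (prefixes.some((prefix) => key.startsWith(prefix))) continue;
    out[key] = value;
  }
  return out;
}
```

With a table like this, UI code would read PROVIDERS[id].capabilities.hasProfiles and never compare against a provider name directly.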

12. Codebase Reuse Analysis: v2 to v3 (2026-03-07)

Survived (with modifications)

Component/Module	Modifications
TerminalPane.svelte	Added suspend/resume lifecycle for xterm budget
MarkdownPane.svelte	Unchanged
AgentTree.svelte	Reused inside AgentSession
StatusBar.svelte	Rewritten for workspace store (group name, fleet status, attention queue)
ToastContainer.svelte	Unchanged
agents.svelte.ts	Added projectId field to AgentSession
theme.svelte.ts	Unchanged
notifications.svelte.ts	Unchanged
All adapters	Minor updates for provider routing
All Rust backend	Added new modules (btmsg, bttask, search, secrets, plugins)

Replaced

v2 Component	v3 Replacement	Reason
layout.svelte.ts	workspace.svelte.ts	Pane-based model -> project-group model
TilingGrid.svelte	ProjectGrid.svelte	Free-form grid -> fixed project boxes
PaneContainer.svelte	ProjectBox.svelte	Generic pane -> per-project container with 11 tabs
SessionList.svelte	ProjectHeader + CommandPalette	Sidebar list -> inline headers + Ctrl+K
SettingsDialog.svelte	SettingsTab.svelte	Modal dialog -> sidebar drawer tab
AgentPane.svelte	AgentSession + TeamAgentsPanel	Monolithic -> split for team support
App.svelte	Full rewrite	Tab bar -> VSCode-style sidebar layout

Dropped (v3.0)

Feature	Reason
Detached pane mode	Doesn't fit workspace model (projects are grouped)
Drag-resize splitters	Project boxes have fixed internal layout
Layout presets	Replaced by adaptive project count from viewport
Remote machine UI	Deferred to v3.1 (elevated to project level)

13. Session Anchor Design (v3, 2026-03-12)

Session anchors solve context loss during Claude's automatic context compaction.

Problem

When Claude's context window fills up (~80% of model limit), the SDK automatically compacts older turns. This is lossy — important early decisions, architecture context, and debugging breakthroughs can be permanently lost.

Design Decisions

Auto-anchor on first compaction — Automatically captures the first 3 turns when compaction is first detected. Preserves the session's initial context (task definition, first architecture decisions).
Observation masking — Tool outputs (Read results, Bash output) are compacted in anchors, but reasoning text is preserved in full. Dramatically reduces anchor token cost while keeping important reasoning.
Budget system — Fixed scales (2K/6K/12K/20K tokens) instead of percentage-based. "6,000 tokens" is more intuitive than "15% of context."
Re-injection via system prompt — Promoted anchors are serialized and injected as the system_prompt field. Simplest integration with the SDK — no conversation history modification needed.
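
The first three decisions can be sketched together. Everything below is illustrative: the Turn shape, the 200-character truncation, the four-characters-per-token estimate, and the function names are assumptions, not the project's actual anchor code. Only the "first 3 turns" rule and the fixed budget scales come from the design above.

```typescript
interface Turn {
  reasoning: string; // preserved in full
  toolOutputs: string[]; // Read results, Bash output, etc.
}

const ANCHOR_BUDGETS_TOKENS = [2000, 6000, 12000, 20000]; // fixed scales from the design
const AUTO_ANCHOR_TURNS = 3;

// Rough heuristic (assumption): about four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Observation masking: shorten tool outputs, keep reasoning untouched.
function maskObservations(turn: Turn, maxChars: number = 200): Turn {
  return {
    reasoning: turn.reasoning,
    toolOutputs: turn.toolOutputs.map((out) =>
      out.length > maxChars ? out.slice(0, maxChars) + " [truncated]" : out,
    ),
  };
}

// Take the first turns of a session, mask them, and stop when the budget is exhausted.
function buildAutoAnchor(turns: Turn[], budgetTokens: number): Turn[] {
  const anchor: Turn[] = [];
  let used = 0;
  for (const turn of turns.slice(0, AUTO_ANCHOR_TURNS)) {
    const masked = maskObservations(turn);
    const cost =
      estimateTokens(masked.reasoning) +
      masked.toolOutputs.reduce((sum, out) => sum + estimateTokens(out), 0);
    if (used + cost > budgetTokens) break;
    anchor.push(masked);
    used += cost;
  }
  return anchor;
}
```

A real implementation would use the provider's token counts in place of the character heuristic; the fixed budget simply bounds how much text is later serialized into the system prompt.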

14. Multi-Agent Orchestration Design (v3, 2026-03-11)

Evaluated Approaches

Approach	Pros	Cons	Decision
Claude Agent Teams (native)	Zero custom code, SDK-managed	Experimental, session resume broken	Supported but not primary
Message bus (Redis/NATS)	Proven, scalable	Runtime dependency, deployment complexity	Rejected
Shared SQLite + CLI tools	Zero deps, agents use shell	Polling-based, no real-time push	Selected
MCP server for agent comm	Standard protocol	Overhead per message, complex setup	Rejected

Why SQLite + CLI

Agents run Claude Code sessions with full shell access. Python CLI tools (btmsg, bttask) reading/writing SQLite is the lowest-friction integration:

Zero configuration (btmsg send architect "review this")
No runtime services (no Redis, no MCP server)
WAL mode handles concurrent access from multiple agent processes
Same database readable by Rust backend for UI display
5s polling is acceptable — agents don't need millisecond latency
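
The polling model can be sketched on the consumer side as a cursor over message ids. The actual CLI tools are Python and the schema is not shown here, so the Message fields, the FetchSince callback (standing in for a SQLite query), and the InboxPoller class are all hypothetical.

```typescript
interface Message {
  id: number; // assumed monotonically increasing row id
  to: string;
  body: string;
}

// Stand-in for "SELECT ... WHERE id > ?" against the shared database (assumption).
type FetchSince = (afterId: number) => Message[];

const POLL_INTERVAL_MS = 5000; // the 5s polling interval noted above

class InboxPoller {
  private lastSeenId = 0;
  private readonly fetchSince: FetchSince;
  private readonly onMessage: (message: Message) => void;

  constructor(fetchSince: FetchSince, onMessage: (message: Message) => void) {
    this.fetchSince = fetchSince;
    this.onMessage = onMessage;
  }

  // One polling step: deliver unseen messages and advance the cursor.
  tick(): number {
    const fresh = this.fetchSince(this.lastSeenId);
    for (const message of fresh) {
      this.onMessage(message);
      this.lastSeenId = Math.max(this.lastSeenId, message.id);
    }
    return fresh.length;
  }
}

// Usage sketch: setInterval(() => poller.tick(), POLL_INTERVAL_MS);
```

Tracking the last seen id keeps each poll cheap and avoids delivering the same message twice, which is the main cost of giving up real-time push.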

Role Hierarchy

4 Tier 1 roles based on common development workflows:

Manager — coordinates work (tech lead assigning sprint tasks). Unique: Task board tab, full bttask CRUD.
Architect — designs solutions (senior engineer doing design reviews). Unique: PlantUML tab.
Tester — runs tests (QA monitoring test suites). Unique: Selenium + Tests tabs.
Reviewer — reviews code (processing PR queue). Unique: review queue depth in attention scoring.

15. Theme System Evolution (v3, 2026-03-07)

Phase 1: 4 Catppuccin Flavors (v2)

Mocha, Macchiato, Frappe, Latte. All colors mapped to 26 --ctp-* CSS custom properties.

Phase 2: +7 Editor Themes

VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark. Same 26 variables — zero component changes. CatppuccinFlavor type generalized to ThemeId.

Phase 3: +6 Deep Dark Themes

Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper (warm dark), Midnight (pure OLED black). Same mapping.

Key Decision

All 17 themes map to the same CSS custom property names. No component ever needs to know which theme is active. Adding new themes is a pure data operation: define 26 color values and add to THEME_LIST.
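
A sketch of why adding a theme is a pure data operation: applying any theme is the same loop over a fixed key list. The key names below are the 26 standard Catppuccin palette names, which is an assumption about how the --ctp-* variables are named; StyleTarget and applyTheme are hypothetical.

```typescript
// The 26 Catppuccin palette names (assumed to match the --ctp-* variable suffixes).
const CTP_KEYS = [
  "rosewater", "flamingo", "pink", "mauve", "red", "maroon", "peach",
  "yellow", "green", "teal", "sky", "sapphire", "blue", "lavender",
  "text", "subtext1", "subtext0", "overlay2", "overlay1", "overlay0",
  "surface2", "surface1", "surface0", "base", "mantle", "crust",
] as const;

type CtpKey = (typeof CTP_KEYS)[number];
type Palette = Record<CtpKey, string>; // the compiler rejects a theme that misses a key

// Minimal slice of CSSStyleDeclaration, so the sketch runs without a DOM.
interface StyleTarget {
  setProperty(name: string, value: string): void;
}

// Write every palette color to its --ctp-* custom property.
function applyTheme(palette: Palette, target: StyleTarget): void {
  for (const key of CTP_KEYS) {
    target.setProperty(`--ctp-${key}`, palette[key]);
  }
}
```

In a browser, document.documentElement.style would satisfy StyleTarget, so components keep reading var(--ctp-base) and similar without knowing which theme is active.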

16. Performance Measurements (v3, 2026-03-11)

xterm.js Canvas Performance

WebKit2GTK lacks WebGL — xterm.js falls back to Canvas 2D:

Latency: ~20-30ms per keystroke (acceptable for AI output)
Memory: ~20MB per active instance
OOM threshold: ~5 simultaneous instances causes WebKit2GTK crash
Mitigation: 4-instance budget with suspend/resume
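
The 4-instance budget can be sketched as a least-recently-used set: activating a fifth terminal suspends the one that has gone longest without use. TerminalBudget is a hypothetical name, and the actual suspend step (serializing scrollback and disposing the xterm instance) is left to the caller.

```typescript
class TerminalBudget {
  private active: string[] = []; // least recently used first
  readonly suspended = new Set<string>();
  readonly maxActive: number;

  constructor(maxActive: number = 4) {
    this.maxActive = maxActive;
  }

  // Mark a terminal as in use; returns the id that must be suspended, if any.
  activate(id: string): string | undefined {
    this.suspended.delete(id);
    this.active = this.active.filter((other) => other !== id);
    this.active.push(id);
    if (this.active.length > this.maxActive) {
      const evicted = this.active.shift();
      if (evicted !== undefined) this.suspended.add(evicted);
      return evicted;
    }
    return undefined;
  }

  activeIds(): string[] {
    return [...this.active];
  }
}
```

Keeping the budget one below the observed ~5-instance crash threshold leaves headroom, at the cost of a resume delay when a suspended terminal is brought back.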

Tauri IPC Latency

Linux: ~5ms for typical payloads
Terminal keystroke echo: 5ms IPC + xterm render = 10-15ms total
Agent message forwarding: Negligible (messages arrive at human reading speed, so a few milliseconds of IPC overhead is not noticeable)

SQLite WAL Concurrent Access

Both databases accessed concurrently by Rust backend + Python CLIs + frontend reads via IPC. WAL mode with 5s busy_timeout handles this reliably. 5-minute checkpoint prevents WAL growth.

Workspace Switch Latency

Serialize 4 xterm scrollbacks: ~30ms
Destroy 4 xterm instances: ~10ms
Unmount ProjectGrid children: ~5ms
Mount new group: ~20ms
Create new xterm instances: ~35ms
Total perceived: ~100ms (acceptable)
