Research Findings

Research conducted during development — technology evaluations, architecture reviews, performance measurements, and design analysis. Each finding informed implementation decisions recorded in decisions.md.


1. Claude Agent SDK

Source: https://platform.claude.com/docs/en/agent-sdk/overview

The Claude Agent SDK provides structured streaming, subagent detection, hooks, and telemetry — everything needed for a rich agent UI without terminal emulation.

Key Insight: The SDK gives structured data — we render it as rich UI (markdown, diff views, file cards, agent trees) instead of raw terminal text. Terminal emulation (xterm.js) is only needed for SSH, local shell, and legacy CLI sessions.


2. Tauri + xterm.js Integration

Integration pattern: Frontend (xterm.js) <-> Tauri IPC <-> Rust PTY (portable-pty) <-> Shell/SSH/Claude

Existing projects (tauri-terminal, Terminon, tauri-plugin-pty) validated the approach.


3. Terminal Performance Benchmarks

| Terminal | Latency | Notes |
|---|---|---|
| xterm (native) | ~10ms | Gold standard |
| Alacritty | ~12ms | GPU-rendered, Rust |
| VTE (GNOME Terminal) | ~50ms | GTK3/4 |
| Hyper (Electron + xterm.js) | ~40ms | Web-based worst case |

xterm.js in Tauri: ~20-30ms latency and ~20MB per instance. That is adequate for streamed AI output, and faster than the ~50ms of the VTE widget used in the GTK3-based v1.


4. Frontend Framework Choice

Why Svelte 5: Fine-grained reactivity ($state/$derived runes), no VDOM (critical for 4-8 panes streaming simultaneously), ~5KB runtime vs React's ~40KB. Larger ecosystem than Solid.js.


5. Adversarial Architecture Review (v3)

Three specialized agents reviewed the v3 Mission Control architecture before implementation. They caught 12 issues (4 critical) that would have required expensive rework if discovered later.

Critical Issues Found

| # | Issue | Resolution |
|---|---|---|
| 1 | xterm.js 4-instance ceiling (WebKit2GTK OOM) | Budget system with suspend/resume |
| 2 | Single sidecar = SPOF | Supervisor with crash recovery, per-project pool deferred |
| 3 | Layout store has no workspace concept | Full rewrite to workspace.svelte.ts |
| 4 | 384px per project on 1920px (too narrow) | Adaptive count from viewport width |

The remaining 8 issues (Major/Minor) were also resolved before implementation.
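
The "adaptive count from viewport width" resolution for issue 4 could look roughly like the sketch below. The source only states that 384px per project on a 1920px viewport (five fixed columns) was too narrow; the 480px minimum width, the cap of five, and the function name are assumptions made for illustration.

```typescript
// Hypothetical constants: the 480px floor is an assumed minimum usable width,
// and the cap of 5 mirrors the original fixed layout (1920px / 384px).
const MIN_PROJECT_WIDTH_PX = 480;
const MAX_VISIBLE_PROJECTS = 5;

// Returns how many project boxes to show side by side.
function visibleProjectCount(viewportWidthPx: number, openProjects: number): number {
  if (viewportWidthPx <= 0 || openProjects <= 0) return 0;
  const fit = Math.floor(viewportWidthPx / MIN_PROJECT_WIDTH_PX);
  // Show at least one project, and never more than are open or than the cap.
  return Math.max(1, Math.min(fit, openProjects, MAX_VISIBLE_PROJECTS));
}
```

Under these assumed constants, a 1920px viewport shows four projects at 480px each instead of five at 384px.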


6. Provider Adapter Coupling Analysis (v3)

Before implementing multi-provider support, we mapped every Claude-specific dependency. The 13+ affected files were classified into 4 severity levels.

Key Insights

  1. Sidecar is the natural abstraction boundary. Each provider needs its own runner.
  2. Message format is the main divergence point. Per-provider adapters normalize to AgentMessage.
  3. Capability flags eliminate provider switches. UI checks capabilities.hasProfiles instead of provider === 'claude'.
  4. Env var stripping is provider-specific.
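
Insights 2 and 3 can be illustrated with a minimal sketch. Only `AgentMessage` and `capabilities.hasProfiles` are named in the source; the fields of `AgentMessage`, the `ProviderAdapter` interface, and both raw event shapes are hypothetical stand-ins, not the real wire formats of any provider.

```typescript
// Hypothetical normalized message shape shared by all providers.
interface AgentMessage {
  role: "assistant" | "tool" | "system";
  text: string;
}

interface ProviderCapabilities {
  hasProfiles: boolean;
}

// Each provider supplies one adapter that normalizes its raw events.
interface ProviderAdapter<Raw> {
  readonly id: string;
  readonly capabilities: ProviderCapabilities;
  normalize(raw: Raw): AgentMessage;
}

// Invented raw event shapes standing in for two different providers.
type RawA = { type: "text" | "tool_result"; content: string };
type RawB = { kind: string; body: { message: string } };

const adapterA: ProviderAdapter<RawA> = {
  id: "provider-a",
  capabilities: { hasProfiles: true },
  normalize: (raw) => ({
    role: raw.type === "tool_result" ? "tool" : "assistant",
    text: raw.content,
  }),
};

const adapterB: ProviderAdapter<RawB> = {
  id: "provider-b",
  capabilities: { hasProfiles: false },
  normalize: (raw) => ({
    role: raw.kind === "tool" ? "tool" : "assistant",
    text: raw.body.message,
  }),
};

// UI code branches on a capability, never on the provider's identity.
function showProfilePicker(adapter: { capabilities: ProviderCapabilities }): boolean {
  return adapter.capabilities.hasProfiles;
}
```

With this shape, adding a provider means writing one adapter and declaring its capabilities; no UI component needs a new `provider === ...` branch.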

7. Codebase Reuse Analysis: v2 to v3

Survived (with modifications)

| Component | Modifications |
|---|---|
| TerminalPane.svelte | Added suspend/resume lifecycle |
| MarkdownPane.svelte | Unchanged |
| AgentTree.svelte | Reused inside AgentSession |
| agents.svelte.ts | Added projectId field |
| theme.svelte.ts | Unchanged |
| notifications.svelte.ts | Unchanged |
| All adapters | Minor updates for provider routing |
| All Rust backend | Added new modules (btmsg, bttask, search, secrets, plugins) |

Replaced

| v2 Component | v3 Replacement | Reason |
|---|---|---|
| layout.svelte.ts | workspace.svelte.ts | Pane-based -> project-group model |
| TilingGrid.svelte | ProjectGrid.svelte | Free-form grid -> fixed project boxes |
| PaneContainer.svelte | ProjectBox.svelte | Generic pane -> 11-tab container |
| SettingsDialog.svelte | SettingsTab.svelte | Modal -> sidebar drawer |
| AgentPane.svelte | AgentSession + TeamAgentsPanel | Monolithic -> split for teams |
| App.svelte | Full rewrite | VSCode-style sidebar layout |

8. Session Anchor Design (v3)

Problem

When Claude's context window fills (~80% of model limit), the SDK automatically compacts older turns. Important early decisions and debugging breakthroughs can be permanently lost.

Design Decisions

  1. Auto-anchor on first compaction — Captures first 3 turns automatically.
  2. Observation masking — Tool outputs compacted, reasoning preserved in full.
  3. Budget system — Fixed scales (2K/6K/12K/20K tokens) instead of percentage-based.
  4. Re-injection via system prompt — Simplest SDK integration.
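
As a rough sketch of how decisions 1, 3, and 4 might fit together: the four budget sizes and the "first 3 turns" rule come from the list above, while the scale names, the `Turn` shape, the 4-characters-per-token estimate, and the prompt layout are all assumptions made for illustration.

```typescript
// Hypothetical scale names for the four fixed budgets listed above.
type BudgetScale = "small" | "medium" | "large" | "full";

const ANCHOR_BUDGET_TOKENS: Record<BudgetScale, number> = {
  small: 2000,
  medium: 6000,
  large: 12000,
  full: 20000,
};

interface Turn {
  index: number;
  text: string;
}

// Rough heuristic (about 4 characters per token); an assumption, not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// On first compaction, anchor up to the first three turns that fit the budget.
function autoAnchor(turns: Turn[], scale: BudgetScale): Turn[] {
  let remaining = ANCHOR_BUDGET_TOKENS[scale];
  const anchors: Turn[] = [];
  for (const turn of turns.slice(0, 3)) {
    const cost = estimateTokens(turn.text);
    if (cost > remaining) break; // stop at the first turn that does not fit
    anchors.push(turn);
    remaining -= cost;
  }
  return anchors;
}

// Re-inject anchors by appending them to the system prompt.
function buildSystemPrompt(base: string, anchors: Turn[]): string {
  if (anchors.length === 0) return base;
  const section = anchors.map((t) => `[turn ${t.index}] ${t.text}`).join("\n");
  return `${base}\n\n## Session anchors\n${section}`;
}
```

A fixed token budget keeps the size of the re-injected section predictable regardless of which model, and therefore which context limit, is in use.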

9. Multi-Agent Orchestration Design (v3)

| Approach | Decision |
|---|---|
| Claude Agent Teams (native) | Supported but not primary (experimental, resume broken) |
| Message bus (Redis/NATS) | Rejected (runtime dependency) |
| Shared SQLite + CLI tools | Selected (zero deps, agents use shell) |
| MCP server for agent comm | Rejected (overhead, complexity) |

Why SQLite + CLI: Agents have full shell access. Python CLI tools reading/writing SQLite is lowest friction. Zero configuration, no runtime services, WAL handles concurrency.


10. Theme System Evolution

All 17 themes (4 Catppuccin + 7 Editor + 6 Deep Dark) map to the same 26 --ctp-* CSS custom properties. No component ever needs to know which theme is active. Adding new themes is a pure data operation.
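
The "pure data" claim can be illustrated with a small sketch. The source only says the properties share the `--ctp-` prefix; the two property names shown (`--ctp-base`, `--ctp-text`), the palette values, and the `applyTheme` helper are assumptions, and only 2 of the 26 properties appear here.

```typescript
// A theme is plain data: a map from --ctp-* property names to colour values.
type Theme = Record<string, string>;

// Assumed subset of the 26 shared properties (names are illustrative).
const REQUIRED_PROPS = ["--ctp-base", "--ctp-text"] as const;

const mocha: Theme = { "--ctp-base": "#1e1e2e", "--ctp-text": "#cdd6f4" };
const latte: Theme = { "--ctp-base": "#eff1f5", "--ctp-text": "#4c4f69" };

// Minimal structural type so the sketch runs outside a browser;
// in the app the target would be document.documentElement.style.
interface StyleTarget {
  setProperty(name: string, value: string): void;
}

// Writes every required property, rejecting incomplete themes.
function applyTheme(target: StyleTarget, theme: Theme): void {
  for (const prop of REQUIRED_PROPS) {
    const value = theme[prop];
    if (value === undefined) throw new Error(`theme is missing ${prop}`);
    target.setProperty(prop, value);
  }
}
```

Components only ever read `var(--ctp-base)` and similar, so switching themes means rewriting property values and nothing else.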


11. Performance Measurements (v3)

xterm.js Canvas Performance (WebKit2GTK, no WebGL)

  • Latency: ~20-30ms per keystroke
  • Memory: ~20MB per active instance
  • OOM threshold: ~5 simultaneous instances
  • Mitigation: 4-instance budget with suspend/resume
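
One way to enforce a 4-instance budget is least-recently-used eviction, sketched below. The eviction policy, the `Suspendable` interface, and the class name are assumptions; the source specifies only the budget of four and the suspend/resume behaviour.

```typescript
// Hypothetical lifecycle hooks for a terminal pane.
interface Suspendable {
  suspend(): void; // e.g. serialize scrollback and dispose the xterm instance
  resume(): void;  // e.g. create the instance and restore scrollback
}

class TerminalBudget {
  // Map preserves insertion order, so the first key is the least recently used.
  private active = new Map<string, Suspendable>();
  private readonly maxActive: number;

  constructor(maxActive: number = 4) {
    this.maxActive = maxActive;
  }

  // Makes a terminal live, suspending the least recently used one if needed.
  activate(id: string, term: Suspendable): void {
    const existing = this.active.get(id);
    if (existing !== undefined) {
      // Already live: only refresh its recency.
      this.active.delete(id);
      this.active.set(id, existing);
      return;
    }
    if (this.active.size >= this.maxActive) {
      const oldestId = Array.from(this.active.keys())[0];
      const oldest = this.active.get(oldestId);
      if (oldest !== undefined) oldest.suspend();
      this.active.delete(oldestId);
    }
    term.resume(); // in this sketch resume() also covers the first mount
    this.active.set(id, term);
  }

  activeIds(): string[] {
    return Array.from(this.active.keys());
  }
}
```

Keeping the live count at four leaves headroom below the ~5-instance OOM threshold measured above.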

Tauri IPC Latency

  • Linux: ~5ms for typical payloads
  • Terminal keystroke echo: 10-15ms total
  • Agent message forwarding: negligible

SQLite WAL Concurrent Access

WAL mode with a 5s busy_timeout handles concurrent access reliably. A checkpoint every 5 minutes keeps the WAL file from growing without bound.

Workspace Switch Latency

  • Serialize 4 xterm scrollbacks: ~30ms
  • Destroy + unmount: ~15ms
  • Mount new group + create xterm: ~55ms
  • Total perceived: ~100ms