From de8dd04f4b54e66f6a6c18047c0cab687ef0eff7 Mon Sep 17 00:00:00 2001 From: Hibryda Date: Sat, 14 Mar 2026 02:33:59 +0100 Subject: [PATCH] docs: add architecture, sidecar, orchestration, and production guides New documentation covering end-to-end system architecture, multi-provider sidecar lifecycle, btmsg/bttask multi-agent orchestration, and production hardening features (supervisor, sandbox, search, plugins, secrets, audit). --- docs/architecture.md | 333 ++++++++++++++++++++++++++++++++++++++ docs/orchestration.md | 362 +++++++++++++++++++++++++++++++++++++++++ docs/production.md | 364 ++++++++++++++++++++++++++++++++++++++++++ docs/sidecar.md | 235 +++++++++++++++++++++++++++ 4 files changed, 1294 insertions(+) create mode 100644 docs/architecture.md create mode 100644 docs/orchestration.md create mode 100644 docs/production.md create mode 100644 docs/sidecar.md diff --git a/docs/architecture.md b/docs/architecture.md new file mode 100644 index 0000000..0c61760 --- /dev/null +++ b/docs/architecture.md @@ -0,0 +1,333 @@ +# System Architecture + +This document describes the end-to-end architecture of Agent Orchestrator — how the Rust backend, Svelte 5 frontend, and Node.js/Deno sidecar processes work together to provide a multi-project AI agent orchestration dashboard. + +--- + +## High-Level Overview + +Agent Orchestrator is a Tauri 2.x desktop application. Tauri provides a Rust backend process and a WebKit2GTK-based webview for the frontend. The application manages AI agent sessions by spawning sidecar child processes that communicate with AI provider APIs (Claude, Codex, Ollama). 
+ +``` +┌────────────────────────────────────────────────────────────────┐ +│ Agent Orchestrator (Tauri 2.x) │ +│ │ +│ ┌─────────────────┐ Tauri IPC ┌────────────────────┐ │ +│ │ WebView │ ◄─────────────► │ Rust Backend │ │ +│ │ (Svelte 5) │ invoke/listen │ │ │ +│ │ │ │ ├── PtyManager │ │ +│ │ ├── ProjectGrid │ │ ├── SidecarManager │ │ +│ │ ├── AgentPane │ │ ├── SessionDb │ │ +│ │ ├── TerminalPane │ │ ├── BtmsgDb │ │ +│ │ ├── StatusBar │ │ ├── SearchDb │ │ +│ │ └── Stores │ │ ├── SecretsManager │ │ +│ └─────────────────┘ │ ├── RemoteManager │ │ +│ │ └── FileWatchers │ │ +│ └────────────────────┘ │ +│ │ │ +└───────────────────────────────────────────┼────────────────────┘ + │ stdio NDJSON + ▼ + ┌───────────────────┐ + │ Sidecar Processes │ + │ (Deno or Node.js) │ + │ │ + │ claude-runner.mjs │ + │ codex-runner.mjs │ + │ ollama-runner.mjs │ + └───────────────────┘ +``` + +### Why Three Layers? + +1. **Rust backend** — Manages OS-level resources (PTY processes, file watchers, SQLite databases) with memory safety and low overhead. Exposes everything to the frontend via Tauri IPC commands and events. + +2. **Svelte 5 frontend** — Renders the UI with fine-grained reactivity (no VDOM). Svelte 5 runes (`$state`, `$derived`, `$effect`) provide signal-based reactivity comparable to Solid.js but with a larger ecosystem. + +3. **Sidecar processes** — The Claude Agent SDK, OpenAI Codex SDK, and Ollama API are all JavaScript/TypeScript libraries. They cannot run in Rust or in the WebKit2GTK webview (no Node.js APIs). The sidecar layer bridges this gap: Rust spawns a JS process, communicates via stdio NDJSON, and forwards structured messages to the frontend. + +--- + +## Rust Backend (`v2/src-tauri/`) + +The Rust backend is the central coordinator. It owns all OS resources and database connections. 
+ +### Cargo Workspace + +The Rust code is organized as a Cargo workspace with three members: + +``` +v2/ +├── Cargo.toml # Workspace root +├── bterminal-core/ # Shared crate +│ └── src/ +│ ├── lib.rs +│ ├── pty.rs # PtyManager (portable-pty) +│ ├── sidecar.rs # SidecarManager (multi-provider) +│ ├── supervisor.rs # SidecarSupervisor (crash recovery) +│ ├── sandbox.rs # Landlock sandbox +│ └── event.rs # EventSink trait +├── bterminal-relay/ # Remote machine relay +│ └── src/main.rs # WebSocket server + token auth +└── src-tauri/ # Tauri application + └── src/ + ├── lib.rs # AppState + setup + handler registration + ├── commands/ # 16 domain command modules + ├── btmsg.rs # Inter-agent messaging (SQLite) + ├── bttask.rs # Task board (SQLite, shared btmsg.db) + ├── search.rs # FTS5 full-text search + ├── secrets.rs # System keyring (libsecret) + ├── plugins.rs # Plugin discovery + ├── notifications.rs # Desktop notifications + ├── session/ # SessionDb (sessions, layout, settings, agents, metrics, anchors) + ├── remote.rs # RemoteManager (WebSocket client) + ├── ctx.rs # Read-only ctx database access + ├── memora.rs # Read-only Memora database access + ├── telemetry.rs # OpenTelemetry tracing + ├── groups.rs # Project groups config + ├── watcher.rs # File watcher (notify crate) + ├── fs_watcher.rs # Per-project filesystem watcher (inotify) + ├── event_sink.rs # TauriEventSink implementation + ├── pty.rs # Thin re-export from bterminal-core + └── sidecar.rs # Thin re-export from bterminal-core +``` + +### Why a Workspace? + +The `bterminal-core` crate exists so that both the Tauri application and the standalone `bterminal-relay` binary can share PtyManager and SidecarManager code. The `EventSink` trait abstracts event emission — TauriEventSink wraps Tauri's AppHandle, while the relay uses a WebSocket-based EventSink. 
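
The EventSink indirection can be illustrated with a minimal sketch — written in TypeScript for brevity (the real trait lives in Rust, in `bterminal-core/src/event.rs`); the names `CollectingSink` and `forwardSidecarLine` are illustrative, not the actual API:

```typescript
// Sketch of the EventSink abstraction: callers emit named events without
// knowing whether they cross Tauri IPC or a WebSocket.
interface EventSink {
  emit(event: string, payload: unknown): void;
}

// In the app, one implementation wraps Tauri's AppHandle and another the
// relay's WebSocket; here we collect events in memory so the sketch runs.
class CollectingSink implements EventSink {
  readonly events: Array<{ event: string; payload: unknown }> = [];
  emit(event: string, payload: unknown): void {
    this.events.push({ event, payload });
  }
}

// Core code (e.g., a sidecar manager) depends only on the interface.
function forwardSidecarLine(sink: EventSink, sessionId: string, line: string): void {
  sink.emit("sidecar-message", { sessionId, data: line });
}
```

This is the design choice that lets `bterminal-relay` reuse PtyManager and SidecarManager without linking against Tauri.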
+ +### AppState + +All backend state lives in `AppState`, initialized during Tauri setup: + +```rust +pub struct AppState { + pub pty_manager: Mutex, + pub sidecar_manager: Mutex, + pub session_db: Mutex, + pub remote_manager: Mutex, + pub telemetry: Option, +} +``` + +### SQLite Databases + +The backend manages two SQLite databases, both in WAL mode with 5-second busy timeout for concurrent access: + +| Database | Location | Purpose | +|----------|----------|---------| +| `sessions.db` | `~/.local/share/bterminal/` | Sessions, layout, settings, agent state, metrics, anchors | +| `btmsg.db` | `~/.local/share/bterminal/` | Inter-agent messages, tasks, agents registry, audit log | + +WAL checkpoints run every 5 minutes via a background tokio task to prevent unbounded WAL growth. + +All queries use **named column access** (`row.get("column_name")`) — never positional indices. Rust structs use `#[serde(rename_all = "camelCase")]` so TypeScript interfaces receive camelCase field names on the wire. 
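
The renaming serde performs is mechanical; a sketch of the equivalent transformation (illustrative only — in practice the conversion happens inside serde on the Rust side, not in app code):

```typescript
// Converts a Rust-style snake_case key to the camelCase form that
// #[serde(rename_all = "camelCase")] puts on the wire.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z0-9])/g, (_, c: string) => c.toUpperCase());
}

// A Rust field `session_id` therefore arrives as `sessionId`, so the
// matching TypeScript interface must use camelCase field names:
interface SessionRow {
  sessionId: string;
  projectId: string;
  createdAt: number;
}
```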
+ +### Command Modules + +Tauri commands are organized into 16 domain modules under `commands/`: + +| Module | Commands | Purpose | +|--------|----------|---------| +| `pty` | spawn, write, resize, kill | Terminal management | +| `agent` | query, stop, ready, restart | Agent session lifecycle | +| `session` | session CRUD, layout, settings | Session persistence | +| `persistence` | agent state, messages | Agent session continuity | +| `knowledge` | ctx, memora queries | External knowledge bases | +| `claude` | profiles, skills | Claude-specific features | +| `groups` | load, save | Project group config | +| `files` | list_directory, read/write file | File browser | +| `watcher` | start, stop | File change monitoring | +| `remote` | 12 commands | Remote machine management | +| `bttask` | list, create, update, delete, comments | Task board | +| `search` | init, search, rebuild, index | FTS5 search | +| `secrets` | store, get, delete, list, has_keyring | Secrets management | +| `plugins` | discover, read_file | Plugin discovery | +| `notifications` | send_desktop | OS notifications | +| `misc` | test_mode, frontend_log | Utilities | + +--- + +## Svelte 5 Frontend (`v2/src/`) + +The frontend uses Svelte 5 with runes for reactive state management. The UI follows a VSCode-inspired layout with a left icon rail, expandable drawer, project grid, and status bar. 
+ +### Component Hierarchy + +``` +App.svelte [Root — VSCode-style layout] +├── CommandPalette.svelte [Ctrl+K overlay, 18+ commands] +├── SearchOverlay.svelte [Ctrl+Shift+F, FTS5 Spotlight-style] +├── NotificationCenter.svelte [Bell icon + dropdown] +├── GlobalTabBar.svelte [Left icon rail, 2.75rem wide] +├── [Sidebar Panel] [Expandable drawer, max 50%] +│ └── SettingsTab.svelte [Global settings + group/project CRUD] +├── ProjectGrid.svelte [Flex + scroll-snap, adaptive count] +│ └── ProjectBox.svelte [Per-project container, 11 tab types] +│ ├── ProjectHeader.svelte [Icon + name + status + badges] +│ ├── AgentSession.svelte [Main Claude session wrapper] +│ │ ├── AgentPane.svelte [Structured message rendering] +│ │ └── TeamAgentsPanel.svelte [Tier 1 subagent cards] +│ ├── TerminalTabs.svelte [Shell/SSH/agent-preview tabs] +│ │ ├── TerminalPane.svelte [xterm.js + Canvas addon] +│ │ └── AgentPreviewPane.svelte [Read-only agent activity] +│ ├── DocsTab.svelte [Markdown file browser] +│ ├── ContextTab.svelte [LLM context visualization] +│ ├── FilesTab.svelte [Directory tree + CodeMirror editor] +│ ├── SshTab.svelte [SSH connection manager] +│ ├── MemoriesTab.svelte [Memora database viewer] +│ ├── MetricsPanel.svelte [Health + history sparklines] +│ ├── TaskBoardTab.svelte [Kanban board, Manager only] +│ ├── ArchitectureTab.svelte [PlantUML viewer, Architect only] +│ └── TestingTab.svelte [Selenium/test files, Tester only] +└── StatusBar.svelte [Agent counts, burn rate, attention queue] +``` + +### Stores (Svelte 5 Runes) + +All store files use the `.svelte.ts` extension — this is required for Svelte 5 runes (`$state`, `$derived`, `$effect`). Files with plain `.ts` extension will compile but fail at runtime with "rune_outside_svelte". 
+ +| Store | Purpose | +|-------|---------| +| `workspace.svelte.ts` | Project groups, active group, tabs, focus | +| `agents.svelte.ts` | Agent sessions, messages, cost, parent/child hierarchy | +| `health.svelte.ts` | Per-project health tracking, attention scoring, burn rate | +| `conflicts.svelte.ts` | File overlap + external write detection | +| `anchors.svelte.ts` | Session anchor management (auto/pinned/promoted) | +| `notifications.svelte.ts` | Toast + history (6 types, unread badge) | +| `plugins.svelte.ts` | Plugin command registry, event bus | +| `theme.svelte.ts` | 17 themes, font restoration | +| `machines.svelte.ts` | Remote machine state | +| `wake-scheduler.svelte.ts` | Manager auto-wake (3 strategies, per-manager timers) | + +### Adapters (IPC Bridge Layer) + +Adapters wrap Tauri `invoke()` calls and `listen()` event subscriptions. They isolate the frontend from IPC details and provide typed TypeScript interfaces. + +| Adapter | Backend Module | Purpose | +|---------|---------------|---------| +| `agent-bridge.ts` | sidecar + commands/agent | Agent query/stop/restart | +| `pty-bridge.ts` | pty + commands/pty | Terminal spawn/write/resize | +| `claude-messages.ts` | — (frontend-only) | Parse Claude SDK NDJSON → AgentMessage | +| `codex-messages.ts` | — (frontend-only) | Parse Codex ThreadEvents → AgentMessage | +| `ollama-messages.ts` | — (frontend-only) | Parse Ollama chunks → AgentMessage | +| `message-adapters.ts` | — (frontend-only) | Provider registry for message parsers | +| `provider-bridge.ts` | commands/claude | Generic provider bridge (profiles, skills) | +| `btmsg-bridge.ts` | btmsg | Inter-agent messaging | +| `bttask-bridge.ts` | bttask | Task board operations | +| `groups-bridge.ts` | groups | Group config load/save | +| `session-bridge.ts` | session | Session/layout persistence | +| `settings-bridge.ts` | session/settings | Key-value settings | +| `files-bridge.ts` | commands/files | File browser operations | +| `search-bridge.ts` | 
search | FTS5 search | +| `secrets-bridge.ts` | secrets | System keyring | +| `anchors-bridge.ts` | session/anchors | Session anchor CRUD | +| `remote-bridge.ts` | remote | Remote machine management | +| `ssh-bridge.ts` | session/ssh | SSH session CRUD | +| `ctx-bridge.ts` | ctx | Context database queries | +| `memora-bridge.ts` | memora | Memora database queries | +| `fs-watcher-bridge.ts` | fs_watcher | Filesystem change events | +| `audit-bridge.ts` | btmsg (audit_log) | Audit log queries | +| `telemetry-bridge.ts` | telemetry | Frontend → Rust tracing | +| `notifications-bridge.ts` | notifications | Desktop notification trigger | +| `plugins-bridge.ts` | plugins | Plugin discovery | + +### Agent Dispatcher + +The agent dispatcher (`agent-dispatcher.ts`, ~260 lines) is the central router between sidecar events and the agent store. When the Rust backend emits a `sidecar-message` Tauri event, the dispatcher: + +1. Looks up the provider for the session (via `sessionProviderMap`) +2. Routes the raw message through the appropriate adapter (claude-messages.ts, codex-messages.ts, or ollama-messages.ts) via `message-adapters.ts` +3. Feeds the resulting `AgentMessage[]` into the agent store +4. Handles side effects: subagent pane spawning, session persistence, auto-anchoring, worktree detection, health tracking, conflict recording + +The dispatcher delegates to four extracted utility modules: +- `utils/session-persistence.ts` — session-project maps, persistSessionForProject +- `utils/subagent-router.ts` — spawn + route subagent panes +- `utils/auto-anchoring.ts` — triggerAutoAnchor on first compaction event +- `utils/worktree-detection.ts` — detectWorktreeFromCwd pure function + +--- + +## Sidecar Layer (`v2/sidecar/`) + +See [sidecar.md](sidecar.md) for the full sidecar architecture. 
In brief: + +- Each AI provider has its own runner file (e.g., `claude-runner.ts`) compiled to an ESM bundle (`claude-runner.mjs`) by esbuild +- Rust's SidecarManager spawns the appropriate runner based on the `provider` field in AgentQueryOptions +- Communication uses stdio NDJSON — one JSON object per line, newline-delimited +- Deno is preferred (faster startup), Node.js is the fallback +- The Claude runner uses `@anthropic-ai/claude-agent-sdk` query() internally + +--- + +## Data Flow: Agent Query Lifecycle + +Here is the complete path of a user prompt through the system: + +``` +1. User types prompt in AgentPane +2. AgentPane calls agentBridge.queryAgent(options) +3. agent-bridge.ts invokes Tauri command 'agent_query' +4. Rust agent_query handler calls SidecarManager.query() +5. SidecarManager resolves provider runner (e.g., claude-runner.mjs) +6. SidecarManager writes QueryMessage as NDJSON to sidecar stdin +7. Sidecar runner calls provider SDK (e.g., Claude Agent SDK query()) +8. Provider SDK streams responses +9. Runner forwards each response as NDJSON to stdout +10. SidecarManager reads stdout line-by-line +11. SidecarManager emits Tauri event 'sidecar-message' with sessionId + data +12. Frontend agent-dispatcher.ts receives event +13. Dispatcher routes through message-adapters.ts → provider-specific parser +14. Parser converts to AgentMessage[] +15. Dispatcher feeds messages into agents.svelte.ts store +16. AgentPane reactively re-renders via $derived bindings +``` + +### Session Stop Flow + +``` +1. User clicks Stop button in AgentPane +2. AgentPane calls agentBridge.stopAgent(sessionId) +3. agent-bridge.ts invokes Tauri command 'agent_stop' +4. Rust handler calls SidecarManager.stop(sessionId) +5. SidecarManager writes StopMessage to sidecar stdin +6. Runner calls AbortController.abort() on the SDK query +7. SDK terminates the Claude subprocess +8. 
Runner emits final status message, then closes +``` + +--- + +## Configuration + +### Project Groups (`~/.config/bterminal/groups.json`) + +Human-editable JSON file defining project groups and their projects. Loaded at startup by `groups.rs`. Not hot-reloaded — changes require app restart or group switch. + +### SQLite Settings (`sessions.db` → `settings` table) + +Key-value store for user preferences: theme, fonts, shell, CWD, provider settings. Accessed via `settings-bridge.ts` → `settings_get`/`settings_set` Tauri commands. + +### Environment Variables + +| Variable | Purpose | +|----------|---------| +| `BTERMINAL_TEST` | Enables test mode (disables watchers, wake scheduler) | +| `BTERMINAL_TEST_DATA_DIR` | Redirects SQLite database storage | +| `BTERMINAL_TEST_CONFIG_DIR` | Redirects groups.json config | +| `BTERMINAL_OTLP_ENDPOINT` | Enables OpenTelemetry OTLP export | + +--- + +## Key Constraints + +1. **WebKit2GTK has no WebGL** — xterm.js must use the Canvas addon explicitly. Maximum 4 active xterm.js instances to avoid OOM. + +2. **Svelte 5 runes require `.svelte.ts`** — Store files using `$state`/`$derived` must have the `.svelte.ts` extension. The compiler silently accepts `.ts` but runes fail at runtime. + +3. **Single shared sidecar** — All agent sessions share one SidecarManager. Per-project isolation is via `cwd`, `claude_config_dir`, and `session_id` routing. Per-project sidecar pools deferred to v3.1. + +4. **SQLite WAL mode** — Both databases use WAL with 5s busy_timeout for concurrent access from Rust backend + Python CLIs (btmsg/bttask). + +5. **camelCase wire format** — Rust uses `#[serde(rename_all = "camelCase")]`. TypeScript interfaces must match. This was a source of bugs during development (see [findings.md](findings.md) for context). 
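
The stdio NDJSON framing used throughout these flows (steps 9–10 of the query lifecycle) can be sketched as follows — a simplified TypeScript rendering of logic that actually lives in Rust's SidecarManager; `createNdjsonReader` is a hypothetical name:

```typescript
// NDJSON framing: stdout arrives in arbitrary chunks, so we buffer until a
// newline completes a record, then JSON.parse one message per line.
function createNdjsonReader(onMessage: (msg: unknown) => void) {
  let buffer = "";
  return (chunk: string): void => {
    buffer += chunk;
    let idx: number;
    while ((idx = buffer.indexOf("\n")) >= 0) {
      const line = buffer.slice(0, idx).trim();
      buffer = buffer.slice(idx + 1);
      if (line.length > 0) onMessage(JSON.parse(line));
    }
  };
}
```

Partial lines are held in the buffer until the terminating newline arrives, which is why a message split across two stdout reads is still parsed exactly once.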
diff --git a/docs/orchestration.md b/docs/orchestration.md new file mode 100644 index 0000000..b2512c6 --- /dev/null +++ b/docs/orchestration.md @@ -0,0 +1,362 @@ +# Multi-Agent Orchestration + +Agent Orchestrator supports running multiple AI agents that communicate with each other, coordinate work through a shared task board, and are managed by a hierarchy of specialized roles. This document covers the inter-agent messaging system (btmsg), the task board (bttask), agent roles and system prompts, and the auto-wake scheduler. + +--- + +## Agent Roles (Tier 1 and Tier 2) + +Agents are organized into two tiers: + +### Tier 1 — Management Agents + +Defined in `groups.json` under a group's `agents[]` array. Each management agent gets a full ProjectBox in the UI (converted via `agentToProject()` in the workspace store). They have role-specific capabilities, tabs, and system prompts. + +| Role | Tabs | btmsg Permissions | bttask Permissions | Purpose | +|------|------|-------------------|-------------------|---------| +| **Manager** | Model, Tasks | Full (send, receive, create channels) | Full CRUD | Coordinates work, creates/assigns tasks, delegates to subagents | +| **Architect** | Model, Architecture | Send, receive | Read-only + comments | Designs solutions, creates PlantUML diagrams, reviews architecture | +| **Tester** | Model, Selenium, Tests | Send, receive | Read-only + comments | Runs tests, monitors screenshots, discovers test files | +| **Reviewer** | Model, Tasks | Send, receive | Read + status + comments | Reviews code, manages review queue, approves/rejects tasks | + +### Tier 2 — Project Agents + +Regular `ProjectConfig` entries in `groups.json`. Each project gets its own Claude session with optional custom context via `project.systemPrompt`. They have standard tabs (Model, Docs, Context, Files, SSH, Memory) but no role-specific tabs. 
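
The permission matrix above can be encoded as a simple lookup — a sketch only; actual enforcement lives in the Python CLIs and the Rust backend, and these type names are illustrative:

```typescript
type Tier1Role = "manager" | "architect" | "tester" | "reviewer";

interface RolePermissions {
  btmsgCreateChannels: boolean;
  bttaskCreate: boolean;
  bttaskUpdateStatus: boolean;
  bttaskComment: boolean;
}

// Derived from the Tier 1 table above: all roles can send/receive btmsg and
// comment on tasks, but only the Manager has full task CRUD, and only the
// Reviewer (besides the Manager) may update task status.
const PERMISSIONS: Record<Tier1Role, RolePermissions> = {
  manager:   { btmsgCreateChannels: true,  bttaskCreate: true,  bttaskUpdateStatus: true,  bttaskComment: true },
  architect: { btmsgCreateChannels: false, bttaskCreate: false, bttaskUpdateStatus: false, bttaskComment: true },
  tester:    { btmsgCreateChannels: false, bttaskCreate: false, bttaskUpdateStatus: false, bttaskComment: true },
  reviewer:  { btmsgCreateChannels: false, bttaskCreate: false, bttaskUpdateStatus: true,  bttaskComment: true },
};
```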
+ +### System Prompt Generation + +Tier 1 agents receive auto-generated system prompts built by `generateAgentPrompt()` in `utils/agent-prompts.ts`. The prompt has 7 sections: + +1. **Identity** — Role name, project context, team membership +2. **Environment** — Working directory, available tools, shell info +3. **Team** — List of other agents in the group with their roles +4. **btmsg documentation** — CLI usage, channel commands, message format +5. **bttask documentation** — CLI usage, task lifecycle, role-specific permissions +6. **Custom context** — Optional `project.systemPrompt` (Tier 2) or role-specific instructions +7. **Workflow** — Role-specific workflow guidelines (e.g., Manager delegates, Reviewer checks review queue) + +Tier 2 agents receive only the custom context section (if `project.systemPrompt` is set), injected as the `system_prompt` field in AgentQueryOptions. + +### BTMSG_AGENT_ID + +Tier 1 agents receive the `BTMSG_AGENT_ID` environment variable, injected via `extra_env` in AgentQueryOptions. This flows through 5 layers: TypeScript → Rust AgentQueryOptions → NDJSON → JS runner → SDK env. The CLI tools (`btmsg`, `bttask`) read this variable to identify which agent is sending messages or creating tasks. + +### Periodic Re-injection + +LLM context degrades over long sessions as important instructions scroll out of the context window. To counter this, AgentSession runs a 1-hour timer that re-sends the system prompt when the agent is idle. The mechanism: + +1. AgentSession timer fires after 60 minutes of agent inactivity +2. Sets `autoPrompt` flag, which AgentPane reads via `onautopromptconsumed` callback +3. AgentPane calls `startQuery()` with `resume=true` and the refresh prompt +4. The agent receives the role/tools reminder as a follow-up message + +--- + +## btmsg — Inter-Agent Messaging + +btmsg is a messaging system that lets agents communicate with each other. 
It consists of a Rust backend (SQLite), a Python CLI tool (for agents to use in their shell), and a Svelte frontend (CommsTab). + +### Architecture + +``` +Agent (via btmsg CLI) + │ + ├── btmsg send "message" → writes to btmsg.db + ├── btmsg read → reads from btmsg.db + ├── btmsg channel create #review-queue → creates channel + ├── btmsg channel post #review-queue "msg" → posts to channel + └── btmsg heartbeat → updates agent heartbeat + │ + ▼ +btmsg.db (SQLite, WAL mode, ~/.local/share/bterminal/btmsg.db) + │ + ├── agents table — registered agents with roles + ├── messages table — DMs and channel messages + ├── channels table — named channels (#review-queue, #review-log) + ├── contacts table — ACL (who can message whom) + ├── heartbeats table — agent liveness tracking + ├── dead_letter_queue — undeliverable messages + └── audit_log — all operations for debugging + │ + ▼ +Rust Backend (btmsg.rs, ~600 lines) + │ + ├── btmsg_list_messages, btmsg_send_message, ... + ├── 15+ Tauri commands for full CRUD + └── Shared database connection (WAL + 5s busy_timeout) + │ + ▼ +Frontend (btmsg-bridge.ts → CommsTab.svelte) + ├── Activity feed — all messages across all agents + ├── DM view — direct messages between specific agents + └── Channel view — channel messages (#review-queue, etc.) 
+``` + +### Database Schema + +The btmsg database (`btmsg.db`) stores all messaging data: + +| Table | Purpose | Key Columns | +|-------|---------|-------------| +| `agents` | Agent registry | id, name, role, project_id, status, created_at | +| `messages` | All messages | id, sender_id, recipient_id, channel_id, content, read, created_at | +| `channels` | Named channels | id, name, created_by, created_at | +| `contacts` | ACL | agent_id, contact_id (bidirectional) | +| `heartbeats` | Liveness | agent_id, last_heartbeat, status | +| `dead_letter_queue` | Failed delivery | message_id, reason, created_at | +| `audit_log` | All operations | id, event_type, agent_id, details, created_at | + +### CLI Usage (for agents) + +Agents use the `btmsg` Python CLI tool in their shell. The tool reads `BTMSG_AGENT_ID` to identify the sender: + +```bash +# Send a direct message +btmsg send architect "Please review the auth module design" + +# Read unread messages +btmsg read + +# Create a channel +btmsg channel create #architecture-decisions + +# Post to a channel +btmsg channel post #review-queue "PR #42 ready for review" + +# Send heartbeat (agents do this periodically) +btmsg heartbeat + +# List all agents +btmsg agents +``` + +### Frontend (CommsTab) + +The CommsTab component (rendered in ProjectBox for all agents) shows: + +- **Activity Feed** — chronological view of all messages across all agents +- **DMs** — direct message threads between agents +- **Channels** — named channel message streams +- Polling-based updates (5s interval) + +### Dead Letter Queue + +Messages sent to non-existent or offline agents are moved to the dead letter queue instead of being silently dropped. The Rust backend checks agent status before delivery and queues failures. The Manager agent's health dashboard shows dead letter count. + +### Audit Logging + +Every btmsg operation is logged to the `audit_log` table with event type, agent ID, and JSON details. 
Event types include: message_sent, message_read, channel_created, agent_registered, heartbeat, and prompt_injection_detected. + +--- + +## bttask — Task Board + +bttask is a kanban-style task board that agents use to coordinate work. It shares the same SQLite database as btmsg (`btmsg.db`) for deployment simplicity. + +### Architecture + +``` +Agent (via bttask CLI) + │ + ├── bttask list → list all tasks + ├── bttask create "Fix auth bug" → create task (Manager only) + ├── bttask status in_progress → update status + ├── bttask comment "Done" → add comment + └── bttask review-count → count review queue tasks + │ + ▼ +btmsg.db → tasks table + task_comments table + │ + ▼ +Rust Backend (bttask.rs, ~300 lines) + │ + ├── 7 Tauri commands: list, create, update_status, delete, add_comment, comments, review_queue_count + └── Optimistic locking via version column + │ + ▼ +Frontend (bttask-bridge.ts → TaskBoardTab.svelte) + └── Kanban board: 5 columns, 5s poll, drag-and-drop +``` + +### Task Lifecycle + +``` +┌──────────┐ assign ┌─────────────┐ complete ┌──────────┐ +│ Backlog │──────────►│ In Progress │────────────►│ Review │ +└──────────┘ └─────────────┘ └──────────┘ + │ + ┌───────────┼───────────┐ + ▼ ▼ + ┌────────┐ ┌──────────┐ + │ Done │ │ Rejected │ + └────────┘ └──────────┘ +``` + +When a task moves to the "Review" column, the system automatically posts a notification to the `#review-queue` btmsg channel. The `ensure_review_channels()` function creates `#review-queue` and `#review-log` channels idempotently on first use. + +### Optimistic Locking + +To prevent concurrent updates from corrupting task state, bttask uses optimistic locking via a `version` column: + +1. Client reads task with current version (e.g., version=3) +2. Client sends update with expected version=3 +3. Server's UPDATE query includes `WHERE version = 3` +4. If another client updated first (version=4), the WHERE clause matches 0 rows +5. 
Server returns a conflict error, client must re-read and retry + +This is critical because multiple agents may try to update the same task simultaneously. + +### Role-Based Permissions + +| Role | List | Create | Update Status | Delete | Comments | +|------|------|--------|---------------|--------|----------| +| Manager | Yes | Yes | Yes | Yes | Yes | +| Reviewer | Yes | No | Yes (review decisions) | No | Yes | +| Architect | Yes | No | No | No | Yes | +| Tester | Yes | No | No | No | Yes | +| Project (Tier 2) | Yes | No | No | No | Yes | + +Permissions are enforced in the CLI tool based on the agent's role (read from `BTMSG_AGENT_ID` → agents table lookup). + +### Review Queue Integration + +The Reviewer agent gets special treatment in the attention scoring system: + +- `reviewQueueDepth` is an input to attention scoring: 10 points per review task, capped at 50 +- Priority: between file_conflict (70) and context_high (40) +- ProjectBox polls `review_queue_count` every 10 seconds for reviewer agents +- Results feed into `setReviewQueueDepth()` in the health store + +### Frontend (TaskBoardTab.svelte) + +The kanban board renders 5 columns (Backlog, In Progress, Review, Done, Rejected) with task cards. Features: + +- 5-second polling for updates +- Click to expand task details + comments +- Manager-only create/delete buttons +- Color-coded status badges + +--- + +## Wake Scheduler + +The wake scheduler automatically re-activates idle Manager agents when attention-worthy events occur. It runs in `wake-scheduler.svelte.ts` and supports three user-selectable strategies. 
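
The decision the scheduler makes per strategy can be sketched as follows — a simplified sketch with hypothetical names; the real logic lives in `wake-scheduler.svelte.ts` and `utils/wake-scorer.ts` and may combine signals differently:

```typescript
type WakeStrategy = "persistent" | "on-demand" | "smart";
type WakeAction = { kind: "resume" } | { kind: "fresh" };

// Sketch: persistent resumes the existing session, on-demand starts a fresh
// one, and smart starts fresh only when the wake score clears the threshold.
function decideWake(strategy: WakeStrategy, score: number, threshold: number): WakeAction | null {
  if (strategy === "persistent") return { kind: "resume" };
  if (strategy === "on-demand") return { kind: "fresh" };
  return score >= threshold ? { kind: "fresh" } : null;
}
```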
+ +### Strategies + +| Strategy | Behavior | Use Case | +|----------|----------|----------| +| **Persistent** | Sends a resume prompt to the existing session | Long-running managers that should maintain context | +| **On-demand** | Starts a fresh session | Managers that work in bursts | +| **Smart** | On-demand, but only when wake score exceeds threshold | Avoids waking for minor events | + +Strategy and threshold are configurable per group agent via `GroupAgentConfig.wakeStrategy` and `GroupAgentConfig.wakeThreshold` fields, persisted in `groups.json`. + +### Wake Signals + +The wake scorer evaluates 6 signals (defined in `types/wake.ts`, scored by `utils/wake-scorer.ts`): + +| Signal | Weight | Trigger | +|--------|--------|---------| +| AttentionSpike | 1.0 | Any project's attention score exceeds threshold | +| ContextPressureCluster | 0.9 | Multiple projects have >75% context usage | +| BurnRateAnomaly | 0.8 | Cost rate deviates significantly from baseline | +| TaskQueuePressure | 0.7 | Task backlog grows beyond threshold | +| ReviewBacklog | 0.6 | Review queue has pending items | +| PeriodicFloor | 0.1 | Minimum periodic check (floor signal) | + +The pure scoring function in `wake-scorer.ts` is tested with 24 unit tests. The types are in `types/wake.ts` (WakeStrategy, WakeSignal, WakeEvaluation, WakeContext). + +### Lifecycle + +1. ProjectBox registers manager agents via `$effect` on mount +2. Wake scheduler creates per-manager timers +3. Every 5 seconds, AgentSession polls wake events +4. If score exceeds threshold (for smart strategy), triggers wake +5. On group switch, `clearWakeScheduler()` cancels all timers +6. In test mode (`BTERMINAL_TEST=1`), wake scheduler is disabled via `disableWakeScheduler()` + +--- + +## Health Monitoring & Attention Scoring + +The health store (`health.svelte.ts`) tracks per-project health with a 5-second tick timer. It provides the data that feeds the StatusBar, wake scheduler, and attention queue. 
+ +### Activity States + +| State | Meaning | Visual | +|-------|---------|--------| +| Inactive | No agent running, no recent activity | Dim dot | +| Running | Agent actively processing | Green pulse | +| Idle | Agent finished, waiting for input | Gray dot | +| Stalled | Agent hasn't produced output for >N minutes | Orange pulse | + +The stall threshold is configurable per-project via `stallThresholdMin` in ProjectConfig (default 15 min, range 5-60, step 5). + +### Attention Scoring + +Each project gets an attention score (0-100) based on its current state. The attention queue in the StatusBar shows the top 5 projects sorted by urgency: + +| Condition | Score | Priority | +|-----------|-------|----------| +| Stalled agent | 100 | Highest — agent may be stuck | +| Error state | 90 | Agent crashed or API error | +| Context >90% | 80 | Context window nearly full | +| File conflict | 70 | Two agents wrote same file | +| Review queue depth | 10/task, cap 50 | Reviewer has pending reviews | +| Context >75% | 40 | Context pressure building | + +The pure scoring function is in `utils/attention-scorer.ts` (14 tests). It takes `AttentionInput` and returns a numeric score. + +### Burn Rate + +Cost tracking uses a 5-minute exponential moving average (EMA) of cost snapshots. The StatusBar displays aggregate $/hr across all running agents. + +### File Conflict Detection + +The conflicts store (`conflicts.svelte.ts`) detects two types of conflicts: + +1. **Agent overlap** — Two agents in the same worktree write the same file (tracked via tool_call analysis in the dispatcher) +2. **External writes** — A file watched by an agent is modified externally (detected via inotify in `fs_watcher.rs`, uses 2s timing heuristic `AGENT_WRITE_GRACE_MS` to distinguish agent writes from external) + +Both types show badges in ProjectHeader (orange ⚡ for external, red ⚠ for agent overlap). 
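
The scoring rules above can be sketched as a pure function. This is illustrative — the real implementation is `utils/attention-scorer.ts`, whose exact input shape and combination rule may differ (here the dominant condition wins, with review-queue pressure contributing 10 points per task, capped at 50):

```typescript
interface AttentionInput {
  stalled: boolean;
  errored: boolean;
  contextUsage: number;     // 0..1 fraction of the context window used
  fileConflict: boolean;
  reviewQueueDepth: number; // pending review tasks
}

// Returns 0-100; each condition maps to the score from the table above,
// and the highest applicable score is reported.
function attentionScore(input: AttentionInput): number {
  const candidates = [
    input.stalled ? 100 : 0,
    input.errored ? 90 : 0,
    input.contextUsage > 0.9 ? 80 : 0,
    input.fileConflict ? 70 : 0,
    Math.min(input.reviewQueueDepth * 10, 50),
    input.contextUsage > 0.75 ? 40 : 0,
  ];
  return Math.max(...candidates);
}
```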
+ +--- + +## Session Anchors + +Session anchors preserve important conversation turns through Claude's context compaction process. Without anchors, valuable early context (architecture decisions, debugging breakthroughs) can be lost when the context window fills up. + +### Anchor Types + +| Type | Created By | Behavior | +|------|-----------|----------| +| **Auto** | System (on first compaction) | Captures first 3 turns, observation-masked (reasoning preserved, tool outputs compacted) | +| **Pinned** | User (pin button in AgentPane) | Marks specific turns as important | +| **Promoted** | User (from pinned) | Re-injectable into future sessions via system prompt | + +### Anchor Budget + +The budget controls how many tokens are spent on anchor re-injection: + +| Scale | Token Budget | Use Case | +|-------|-------------|----------| +| Small | 2,000 | Quick sessions, minimal context needed | +| Medium | 6,000 | Default, covers most scenarios | +| Large | 12,000 | Complex debugging sessions | +| Full | 20,000 | Maximum context preservation | + +Configurable per-project via slider in SettingsTab, stored as `ProjectConfig.anchorBudgetScale` in `groups.json`. + +### Re-injection Flow + +When a session resumes with promoted anchors: +1. `anchors.svelte.ts` loads promoted anchors for the project +2. `anchor-serializer.ts` serializes them (turn grouping, observation masking, token estimation) +3. `AgentPane.startQuery()` includes serialized anchors in the `system_prompt` field +4. The sidecar passes the system prompt to the SDK +5. Claude receives the anchors as context alongside the new prompt + +### Storage + +Anchors are persisted in the `session_anchors` table in `sessions.db`. The ContextTab shows an anchor section with a budget meter (derived from the configured scale) and promote/demote buttons. 
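
The budget tiers above can be sketched as a lookup plus a trimming pass — a hypothetical helper; the real serialization (turn grouping, observation masking, token estimation) lives in `anchor-serializer.ts`:

```typescript
type AnchorBudgetScale = "small" | "medium" | "large" | "full";

// Token budgets from the table above.
const BUDGET_TOKENS: Record<AnchorBudgetScale, number> = {
  small: 2_000,
  medium: 6_000,
  large: 12_000,
  full: 20_000,
};

interface SerializedAnchor {
  text: string;
  estimatedTokens: number;
}

// Includes anchors in order until one would overflow the budget, then stops,
// preserving chronological contiguity of the re-injected context.
function fitAnchorsToBudget(anchors: SerializedAnchor[], scale: AnchorBudgetScale): SerializedAnchor[] {
  const budget = BUDGET_TOKENS[scale];
  const selected: SerializedAnchor[] = [];
  let used = 0;
  for (const anchor of anchors) {
    if (used + anchor.estimatedTokens > budget) break;
    selected.push(anchor);
    used += anchor.estimatedTokens;
  }
  return selected;
}
```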
diff --git a/docs/production.md b/docs/production.md
new file mode 100644
index 0000000..f5b2a59
--- /dev/null
+++ b/docs/production.md
@@ -0,0 +1,364 @@
+# Production Hardening
+
+Agent Orchestrator includes several production-readiness features that ensure reliability, security, and observability. This document covers each subsystem in detail.
+
+---
+
+## Sidecar Supervisor (Crash Recovery)
+
+The `SidecarSupervisor` in `bterminal-core/src/supervisor.rs` automatically restarts crashed sidecar processes.
+
+### Behavior
+
+When the sidecar child process exits unexpectedly:
+
+1. The supervisor detects the exit via process monitoring
+2. Waits with exponential backoff before restarting:
+   - Attempt 1: wait 1 second
+   - Attempt 2: wait 2 seconds
+   - Attempt 3: wait 4 seconds
+   - Attempt 4: wait 8 seconds
+   - Attempt 5: wait 16 seconds (capped at 30s)
+3. After 5 failed attempts, the supervisor gives up and reports `SidecarHealth::Failed`
+
+### Health States
+
+```rust
+pub enum SidecarHealth {
+    Healthy,
+    Restarting { attempt: u32, next_retry: Duration },
+    Failed { attempts: u32, last_error: String },
+}
+```
+
+The frontend can query health state and offer a manual restart button when auto-recovery fails. 17 unit tests cover all recovery scenarios including edge cases like rapid successive crashes.
+
+---
+
+## Landlock Sandbox
+
+Landlock is a Linux kernel (6.2+) security module that restricts filesystem access for processes. Agent Orchestrator uses it to sandbox sidecar processes, limiting what files they can read and write.
+
+### Configuration
+
+```rust
+pub struct SandboxConfig {
+    pub read_write_paths: Vec<PathBuf>, // Full access (project dir, temp)
+    pub read_only_paths: Vec<PathBuf>,  // Read-only (system libs, SDK)
+}
+```
+
+The sandbox is applied via `pre_exec()` on the child process command, before the sidecar starts executing. 
+ +### Path Rules + +| Path | Access | Reason | +|------|--------|--------| +| Project CWD | Read/Write | Agent needs to read and modify project files | +| `/tmp` | Read/Write | Temporary files during operation | +| `~/.local/share/bterminal/` | Read/Write | SQLite databases (btmsg, sessions) | +| System library paths | Read-only | Node.js/Deno runtime dependencies | +| `~/.claude/` or config dir | Read-only | Claude configuration and credentials | + +### Graceful Fallback + +If the kernel doesn't support Landlock (< 6.2) or the kernel module isn't loaded, the sandbox silently degrades — the sidecar runs without filesystem restrictions. This is logged as a warning but doesn't prevent operation. + +--- + +## FTS5 Full-Text Search + +The search system uses SQLite's FTS5 extension for full-text search across three data types. Accessed via a Spotlight-style overlay (Ctrl+Shift+F). + +### Architecture + +``` +SearchOverlay.svelte (Ctrl+Shift+F) + │ + └── search-bridge.ts → Tauri commands + │ + └── search.rs → SearchDb (separate FTS5 tables) + │ + ├── search_messages — agent session messages + ├── search_tasks — bttask task content + └── search_btmsg — btmsg inter-agent messages +``` + +### Virtual Tables + +The `SearchDb` struct in `search.rs` manages three FTS5 virtual tables: + +| Table | Source | Indexed Columns | +|-------|--------|----------------| +| `search_messages` | Agent session messages | content, session_id, project_id | +| `search_tasks` | bttask tasks | title, description, assignee, status | +| `search_btmsg` | btmsg messages | content, sender, recipient, channel | + +### Operations + +| Tauri Command | Purpose | +|---------------|---------| +| `search_init` | Creates FTS5 virtual tables if not exist | +| `search_all` | Queries all 3 tables, returns ranked results | +| `search_rebuild` | Drops and rebuilds all indices (maintenance) | +| `search_index_message` | Indexes a single new message (real-time) | + +### Frontend (SearchOverlay.svelte) + +- 
Triggered by Ctrl+Shift+F +- Spotlight-style floating overlay centered on screen +- 300ms debounce on input to avoid excessive queries +- Results grouped by type (Messages, Tasks, Communications) +- Click result to navigate to source (focus project, switch tab) + +--- + +## Plugin System + +The plugin system allows extending Agent Orchestrator with custom commands and event handlers. Plugins are sandboxed JavaScript executing in a restricted environment. + +### Plugin Discovery + +Plugins live in `~/.config/bterminal/plugins/`. Each plugin is a directory containing a `plugin.json` manifest: + +```json +{ + "name": "my-plugin", + "version": "1.0.0", + "description": "A custom plugin", + "main": "index.js", + "permissions": ["notifications", "settings"] +} +``` + +The Rust `plugins.rs` module scans for `plugin.json` files with path-traversal protection (rejects `..` in paths). + +### Sandboxed Runtime (plugin-host.ts) + +Plugins execute via `new Function()` in a restricted scope: + +**Shadowed globals (13):** +`fetch`, `XMLHttpRequest`, `WebSocket`, `Worker`, `eval`, `Function`, `importScripts`, `require`, `process`, `globalThis`, `window`, `document`, `localStorage` + +**Provided API (permission-gated):** + +| API | Permission | Purpose | +|-----|-----------|---------| +| `bt.notify(msg)` | `notifications` | Show toast notification | +| `bt.getSetting(key)` | `settings` | Read app setting | +| `bt.setSetting(key, val)` | `settings` | Write app setting | +| `bt.registerCommand(name, fn)` | — (always allowed) | Add command to palette | +| `bt.on(event, fn)` | — (always allowed) | Subscribe to app events | + +The API object is frozen (`Object.freeze`) to prevent tampering. Strict mode is enforced. 
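
The shadowing trick can be shown in a few lines. This is a simplified sketch of the idea, not the actual `plugin-host.ts` code — shadowed names become unbound parameters, so inside the plugin body they resolve to `undefined` rather than the real globals:

```typescript
// Illustrative subset of the shadowed globals. (`eval` and `Function` are
// reserved as parameter names in strict mode, so the real host must shadow
// them differently, e.g. via const bindings in the wrapper source.)
const SHADOWED = ["fetch", "XMLHttpRequest", "WebSocket", "process", "window", "document", "localStorage", "require"];

// Compile plugin source with the shadowed names as unfilled parameters.
function runPluginSandboxed(source: string, api: Record<string, unknown>): unknown {
  const frozenApi = Object.freeze({ ...api }); // tamper-proof API surface
  const fn = new Function("bt", ...SHADOWED, `"use strict";\n${source}`);
  return fn(frozenApi); // no arguments supplied for the shadowed parameters
}
```

A plugin evaluating `typeof fetch` under this scheme sees `"undefined"`, while `bt` exposes only the permission-gated API.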
+
+### Plugin Store (`plugins.svelte.ts`)
+
+The store manages plugin lifecycle:
+- `loadAllPlugins()` — discover, validate permissions, execute in sandbox
+- `unloadAllPlugins()` — cleanup event listeners, remove commands
+- Command registry integrates with CommandPalette
+- Event bus distributes app events to subscribed plugins
+
+### Security Notes
+
+The `new Function()` sandbox is best-effort — it is not a security boundary. A determined attacker could escape it. Landlock provides the actual filesystem restriction. The plugin sandbox primarily prevents accidental damage from buggy plugins.
+
+35 tests cover the plugin system including permission validation, sandbox escape attempts, and lifecycle management.
+
+---
+
+## Secrets Management
+
+Secrets (API keys, tokens) are stored in the system keyring rather than in plaintext files or SQLite.
+
+### Backend (`secrets.rs`)
+
+Uses the `keyring` crate with the `linux-native` feature (libsecret/DBUS):
+
+```rust
+pub struct SecretsManager;
+
+impl SecretsManager {
+    pub fn store(key: &str, value: &str) -> Result<()>;
+    pub fn get(key: &str) -> Result<Option<String>>;
+    pub fn delete(key: &str) -> Result<()>;
+    pub fn list() -> Result<Vec<String>>;
+    pub fn has_keyring() -> bool;
+}
+```
+
+Metadata (key names, last modified timestamps) is stored in SQLite settings. The actual secret values never touch disk — they live only in the system keyring (gnome-keyring, KWallet, or equivalent).
+
+### Frontend (`secrets-bridge.ts`)
+
+| Function | Purpose |
+|----------|---------|
+| `storeSecret(key, value)` | Store a secret in keyring |
+| `getSecret(key)` | Retrieve a secret |
+| `deleteSecret(key)` | Remove a secret |
+| `listSecrets()` | List all secret metadata |
+| `hasKeyring()` | Check if system keyring is available |
+
+### No Fallback
+
+If no keyring daemon is available (no DBUS session, no gnome-keyring), secret operations fail with a clear error message. 
There is no plaintext fallback — this is intentional to prevent accidental credential leakage. + +--- + +## Notifications + +Agent Orchestrator has two notification systems: in-app toasts and OS-level desktop notifications. + +### In-App Toasts (`notifications.svelte.ts`) + +- 6 notification types: `success`, `error`, `warning`, `info`, `agent_complete`, `agent_error` +- Maximum 5 visible toasts, 4-second auto-dismiss +- Toast history (up to 100 entries) with unread badge in NotificationCenter +- Agent dispatcher emits toasts on: agent completion, agent error, sidecar crash + +### Desktop Notifications (`notifications.rs`) + +Uses `notify-rust` crate for native Linux notifications. Graceful fallback if notification daemon is unavailable (e.g., no D-Bus session). + +Frontend triggers via `sendDesktopNotification()` in `notifications-bridge.ts`. Used for events that should be visible even when the app is not focused. + +### Notification Center (`NotificationCenter.svelte`) + +Bell icon in the top-right with unread badge. Dropdown panel shows notification history with timestamps, type icons, and clear/mark-read actions. + +--- + +## Agent Health Monitoring + +### Heartbeats + +Tier 1 agents send periodic heartbeats via `btmsg heartbeat` CLI command. The heartbeats table tracks last heartbeat timestamp and status per agent. + +### Stale Detection + +The health store detects stalled agents via the `stallThresholdMin` setting (default 15 minutes). If an agent hasn't produced output within the threshold, its activity state transitions to `Stalled` and the attention score jumps to 100 (highest priority). + +### Dead Letter Queue + +Messages sent to agents that are offline or have crashed are moved to the dead letter queue in `btmsg.db`. This prevents silent message loss and allows debugging delivery failures. 
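
The transition to `Stalled` can be sketched as a pure function of the last-output timestamp. Field names here are illustrative assumptions, not the health store's actual API:

```typescript
type ActivityState = "inactive" | "running" | "idle" | "stalled";

function classifyActivity(opts: {
  agentRunning: boolean;
  lastOutputMs: number | null; // timestamp of last agent output, null if none yet
  nowMs: number;
  stallThresholdMin?: number;  // ProjectConfig.stallThresholdMin, default 15
}): ActivityState {
  const { agentRunning, lastOutputMs, nowMs, stallThresholdMin = 15 } = opts;
  if (!agentRunning) return lastOutputMs === null ? "inactive" : "idle";
  const silentMs = nowMs - (lastOutputMs ?? nowMs);
  return silentMs > stallThresholdMin * 60_000 ? "stalled" : "running";
}
```
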
+ +### Audit Logging + +All significant events are logged to the `audit_log` table: + +| Event Type | Logged When | +|-----------|-------------| +| `message_sent` | Agent sends a btmsg message | +| `message_read` | Agent reads messages | +| `channel_created` | New btmsg channel created | +| `agent_registered` | Agent registers with btmsg | +| `heartbeat` | Agent sends heartbeat | +| `task_created` | New bttask task | +| `task_status_changed` | Task status update | +| `wake_event` | Wake scheduler triggers | +| `prompt_injection_detected` | Suspicious content in agent messages | + +The AuditLogTab component in the workspace UI displays audit entries with filtering by event type and agent, with 5-second auto-refresh and max 200 entries. + +--- + +## Error Classification + +The error classifier (`utils/error-classifier.ts`) categorizes API errors into 6 types with appropriate retry behavior: + +| Type | Examples | Retry? | User Message | +|------|----------|--------|--------------| +| `rate_limit` | HTTP 429, "rate limit exceeded" | Yes (with backoff) | "Rate limited — retrying in Xs" | +| `auth` | HTTP 401/403, "invalid API key" | No | "Authentication failed — check API key" | +| `quota` | "quota exceeded", "billing" | No | "Usage quota exceeded" | +| `overloaded` | HTTP 529, "overloaded" | Yes (longer backoff) | "Service overloaded — retrying" | +| `network` | ECONNREFUSED, timeout, DNS failure | Yes | "Network error — check connection" | +| `unknown` | Anything else | No | "Unexpected error" | + +20 unit tests cover classification accuracy across various error message formats. + +--- + +## WAL Checkpoint + +Both SQLite databases (`sessions.db` and `btmsg.db`) use WAL (Write-Ahead Logging) mode for concurrent read/write access. Without periodic checkpoints, the WAL file grows unboundedly. + +A background tokio task runs `PRAGMA wal_checkpoint(TRUNCATE)` every 5 minutes on both databases. This moves WAL data into the main database file and resets the WAL. 
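
The error classification rules above can be sketched as a chain of checks from most to least specific. This mirrors the table but is not the exact `utils/error-classifier.ts` code:

```typescript
type ErrorClass = "rate_limit" | "auth" | "quota" | "overloaded" | "network" | "unknown";

interface Classified {
  type: ErrorClass;
  retryable: boolean;
}

// Classify an API error from its message (and HTTP status, when available).
function classifyError(message: string, status?: number): Classified {
  const m = message.toLowerCase();
  if (status === 429 || m.includes("rate limit")) return { type: "rate_limit", retryable: true };
  if (status === 401 || status === 403 || m.includes("invalid api key")) return { type: "auth", retryable: false };
  if (m.includes("quota") || m.includes("billing")) return { type: "quota", retryable: false };
  if (status === 529 || m.includes("overloaded")) return { type: "overloaded", retryable: true };
  if (m.includes("econnrefused") || m.includes("timeout") || m.includes("dns")) return { type: "network", retryable: true };
  return { type: "unknown", retryable: false };
}
```

The `retryable` flag drives whether the dispatcher schedules a backoff retry or surfaces the error to the user immediately.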
+
+---
+
+## TLS Relay Support
+
+The `bterminal-relay` binary supports TLS for encrypted WebSocket connections:
+
+```bash
+bterminal-relay \
+  --port 9750 \
+  --token <token> \
+  --tls-cert /path/to/cert.pem \
+  --tls-key /path/to/key.pem
+```
+
+Without `--tls-cert`/`--tls-key`, the relay only accepts connections with the `--insecure` flag (plain WebSocket). In production, TLS is mandatory — the relay rejects `ws://` connections unless `--insecure` is explicitly set.
+
+Certificate pinning (comparing relay certificate fingerprints) is planned for v3.1.
+
+---
+
+## OpenTelemetry Observability
+
+The Rust backend supports optional OTLP trace export via the `BTERMINAL_OTLP_ENDPOINT` environment variable.
+
+### Backend (`telemetry.rs`)
+
+- `TelemetryGuard` initializes tracing + OTLP export pipeline
+- Uses `tracing` + `tracing-subscriber` + `opentelemetry` 0.28 + `tracing-opentelemetry` 0.29
+- OTLP/HTTP export to configured endpoint
+- `Drop`-based shutdown ensures spans are flushed
+
+### Frontend (`telemetry-bridge.ts`)
+
+The frontend cannot use the browser OTEL SDK (WebKit2GTK incompatible). 
Instead, it routes events through a `frontend_log` Tauri command that pipes into Rust's tracing system: + +```typescript +tel.info('agent-started', { sessionId, provider }); +tel.warn('context-pressure', { projectId, usage: 0.85 }); +tel.error('sidecar-crash', { error: msg }); +``` + +### Docker Stack + +A pre-configured Tempo + Grafana stack lives in `docker/tempo/`: + +```bash +cd docker/tempo && docker compose up -d +# Grafana at http://localhost:9715 +# Set BTERMINAL_OTLP_ENDPOINT=http://localhost:4318 to enable export +``` + +--- + +## Session Metrics + +Per-project historical session data is stored in the `session_metrics` table: + +| Column | Type | Purpose | +|--------|------|---------| +| `project_id` | TEXT | Which project | +| `session_id` | TEXT | Agent session ID | +| `start_time` | INTEGER | Session start timestamp | +| `end_time` | INTEGER | Session end timestamp | +| `peak_tokens` | INTEGER | Maximum context tokens used | +| `turn_count` | INTEGER | Total conversation turns | +| `tool_call_count` | INTEGER | Total tool calls made | +| `cost_usd` | REAL | Total cost in USD | +| `model` | TEXT | Model used | +| `status` | TEXT | Final status (success/error/stopped) | +| `error_message` | TEXT | Error details if failed | + +100-row retention per project (oldest pruned on insert). Metrics are persisted on agent completion via the agent dispatcher. + +The MetricsPanel component displays this data as: +- **Live view** — fleet aggregates, project health grid, task board summary, attention queue +- **History view** — SVG sparklines for cost/tokens/turns/tools/duration, stats row, session table diff --git a/docs/sidecar.md b/docs/sidecar.md new file mode 100644 index 0000000..162ad37 --- /dev/null +++ b/docs/sidecar.md @@ -0,0 +1,235 @@ +# Sidecar Architecture + +The sidecar is the bridge between Agent Orchestrator's Rust backend and AI provider APIs. 
Because the Claude Agent SDK, OpenAI Codex SDK, and Ollama API are JavaScript/TypeScript libraries, they cannot run inside Rust or WebKit2GTK's webview. Instead, the Rust backend spawns child processes (sidecars) that handle AI interactions and communicate back via stdio NDJSON. + +--- + +## Overview + +``` +Rust Backend (SidecarManager) + │ + ├── Spawns child process (Deno preferred, Node.js fallback) + ├── Writes QueryMessage to stdin (NDJSON) + ├── Reads response lines from stdout (NDJSON) + ├── Emits Tauri events for each message + └── Manages lifecycle (start, stop, crash recovery) + │ + ▼ +Sidecar Process (one of): + ├── claude-runner.mjs → @anthropic-ai/claude-agent-sdk + ├── codex-runner.mjs → @openai/codex-sdk + └── ollama-runner.mjs → native fetch to localhost:11434 +``` + +--- + +## Provider Runners + +Each provider has its own runner file in `v2/sidecar/`, compiled to a standalone ESM bundle in `v2/sidecar/dist/` by esbuild. The runners are self-contained — all dependencies (including SDKs) are bundled into the `.mjs` file. + +### Claude Runner (`claude-runner.ts` → `claude-runner.mjs`) + +The primary runner. Uses `@anthropic-ai/claude-agent-sdk` query() function. + +**Startup sequence:** +1. Reads NDJSON messages from stdin in a loop +2. On `query` message: resolves Claude CLI path via `findClaudeCli()` +3. Calls SDK `query()` with options: prompt, cwd, permissionMode, model, settingSources, systemPrompt, additionalDirectories, worktreeName, pathToClaudeCodeExecutable +4. Streams SDK messages as NDJSON to stdout +5. On `stop` message: calls AbortController.abort() + +**Claude CLI detection (`findClaudeCli()`):** +Checks paths in order: `~/.local/bin/claude` → `~/.claude/local/claude` → `/usr/local/bin/claude` → `/usr/bin/claude` → `which claude`. If none found, emits `agent_error` immediately. The path is resolved once at sidecar startup and reused for all sessions. 
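
Step 1 of the startup sequence — reading NDJSON from stdin — boils down to parsing one JSON object per line and writing responses the same way. An illustrative sketch (helper names are assumptions, not the runner's actual code):

```typescript
// Parse a single NDJSON line; returns null for blank or malformed lines.
function parseLine(line: string): { type: string; session_id?: string } | null {
  const trimmed = line.trim();
  if (trimmed === "") return null;
  try {
    return JSON.parse(trimmed);
  } catch {
    return null; // malformed input is skipped, not fatal
  }
}

// Serialize one outbound message as a single NDJSON line.
function toNdjsonLine(msg: object): string {
  return JSON.stringify(msg) + "\n";
}
```

A runner would typically wire `parseLine` to a `readline` loop over `process.stdin` and `toNdjsonLine` to `process.stdout.write`.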
+
+**Session resume:** Passes `resume: sessionId` to the SDK when a resume session ID is provided. The SDK handles transcript loading internally.
+
+**Multi-account support:** When `claudeConfigDir` is provided (from profile selection), it is set as `CLAUDE_CONFIG_DIR` in the SDK's env option. This points the Claude CLI at a different configuration directory.
+
+**Worktree isolation:** When `worktreeName` is provided, it is passed as `extraArgs: { worktree: name }` to the SDK, which translates to `--worktree <name>` on the CLI.
+
+### Codex Runner (`codex-runner.ts` → `codex-runner.mjs`)
+
+Uses `@openai/codex-sdk` via dynamic import (graceful failure if not installed).
+
+**Key differences from Claude:**
+- Authentication via `CODEX_API_KEY` environment variable
+- Sandbox mode mapping: `bypassPermissions` → `full-auto`, `default` → `suggest`
+- Session resume via thread ID (Codex's equivalent of session continuity)
+- No profile/skill support
+- ThreadEvent format differs from Claude's stream-json (parsed by `codex-messages.ts`)
+
+### Ollama Runner (`ollama-runner.ts` → `ollama-runner.mjs`)
+
+Direct HTTP to Ollama's REST API — zero external dependencies. 
+
+**Key differences:**
+- No SDK — uses native `fetch()` to `http://localhost:11434/api/chat`
+- Health check on startup (`GET /api/tags`)
+- NDJSON streaming response from Ollama's `/api/chat` endpoint
+- Supports Qwen3's `<think>` tags for reasoning display
+- Configurable: host, model, num_ctx, temperature
+- Cost is always $0 (local inference)
+- No subagent support, no profiles, no skills
+
+---
+
+## Communication Protocol
+
+### Messages from Rust to Sidecar (stdin)
+
+```typescript
+// Query — start a new agent session
+{
+  "type": "query",
+  "session_id": "uuid",
+  "prompt": "Fix the bug in auth.ts",
+  "cwd": "/home/user/project",
+  "provider": "claude",
+  "model": "claude-sonnet-4-6",
+  "permission_mode": "bypassPermissions",
+  "resume_session_id": "previous-uuid", // optional
+  "system_prompt": "You are an architect...", // optional
+  "claude_config_dir": "~/.config/switcher-claude/work/", // optional
+  "setting_sources": ["user", "project"], // optional
+  "additional_directories": ["/shared/lib"], // optional
+  "worktree_name": "session-123", // optional
+  "provider_config": { ... }, // provider-specific blob
+  "extra_env": { "BTMSG_AGENT_ID": "manager-1" } // optional
+}
+
+// Stop — abort a running session
+{
+  "type": "stop",
+  "session_id": "uuid"
+}
+```
+
+### Messages from Sidecar to Rust (stdout)
+
+The sidecar writes one JSON object per line (NDJSON). The format depends on the provider, but all messages include a `sessionId` field added by the Rust SidecarManager before forwarding as Tauri events.
+
+**Claude messages** follow the same format as the Claude CLI's `--output-format stream-json`:
+```typescript
+// System init (carries session ID, model info)
+{ "type": "system", "subtype": "init", "session_id": "...", "model": "..." }
+
+// Assistant text
+{ "type": "assistant", "message": { "content": [{ "type": "text", "text": "..." 
}] } } + +// Tool use +{ "type": "assistant", "message": { "content": [{ "type": "tool_use", "name": "Read", "input": {...} }] } } + +// Tool result +{ "type": "user", "message": { "content": [{ "type": "tool_result", "content": "..." }] } } + +// Final result +{ "type": "result", "subtype": "success", "cost_usd": 0.05, "duration_ms": 12000, ... } + +// Error +{ "type": "agent_error", "error": "Claude CLI not found" } +``` + +--- + +## Environment Variable Stripping + +When Agent Orchestrator is launched from within a Claude Code terminal session, the parent process sets `CLAUDE*` environment variables for nesting detection and sandbox configuration. If these leak to the sidecar, Claude's SDK detects nesting and either errors or behaves unexpectedly. + +The solution is **dual-layer stripping**: + +1. **Rust layer (primary):** `SidecarManager` calls `env_clear()` on the child process command, then explicitly sets only the variables needed (`PATH`, `HOME`, `USER`, etc.). This prevents any parent environment from leaking. + +2. **JavaScript layer (defense-in-depth):** Each runner also strips provider-specific variables via `strip_provider_env_var()`: + - Claude: strips all `CLAUDE*` keys (whitelists `CLAUDE_CODE_EXPERIMENTAL_*`) + - Codex: strips all `CODEX*` keys + - Ollama: strips all `OLLAMA*` keys (except `OLLAMA_HOST`) + +The `extra_env` field in AgentQueryOptions allows injecting specific variables (like `BTMSG_AGENT_ID` for Tier 1 agents) after stripping. + +--- + +## Sidecar Lifecycle + +### Startup + +The SidecarManager is initialized during Tauri app setup. It does not spawn any sidecar processes at startup — processes are spawned on-demand when the first agent query arrives. + +### Runtime Resolution + +When a query arrives, `resolve_sidecar_for_provider(provider)` finds the appropriate runner: + +1. Looks for `{provider}-runner.mjs` in the sidecar dist directory +2. Checks for Deno first (`deno` or `~/.deno/bin/deno`), then Node.js +3. 
Returns a `SidecarCommand` struct with the runtime binary and script path +4. If neither runtime is found, returns an error + +Deno is preferred because it has faster cold-start time (~50ms vs ~150ms for Node.js) and can compile to a single binary for distribution. + +### Crash Recovery (SidecarSupervisor) + +The `SidecarSupervisor` in `bterminal-core/src/supervisor.rs` provides automatic crash recovery: + +- Monitors the sidecar child process for unexpected exits +- On crash: waits with exponential backoff (1s → 2s → 4s → 8s → 16s → 30s cap) +- Maximum 5 restart attempts before giving up +- Reports health via `SidecarHealth` enum: `Healthy`, `Restarting { attempt, next_retry }`, `Failed { attempts, last_error }` +- 17 unit tests covering all recovery scenarios + +### Shutdown + +On app exit, `SidecarManager` sends stop messages to all active sessions and kills remaining child processes. The `Drop` implementation ensures cleanup even on panic. + +--- + +## Build Pipeline + +```bash +# Build all 3 runner bundles +cd v2 && npm run build:sidecar + +# Internally runs esbuild 3 times: +# sidecar/claude-runner.ts → sidecar/dist/claude-runner.mjs +# sidecar/codex-runner.ts → sidecar/dist/codex-runner.mjs +# sidecar/ollama-runner.ts → sidecar/dist/ollama-runner.mjs +``` + +Each bundle is a standalone ESM file with all dependencies included. The Claude runner bundles `@anthropic-ai/claude-agent-sdk` directly — no `node_modules` needed at runtime. The Codex runner uses dynamic import for `@openai/codex-sdk` (graceful failure if not installed). The Ollama runner has zero external dependencies. + +The built `.mjs` files are included as Tauri resources in `tauri.conf.json` and copied to the app bundle during `tauri build`. 
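
The backoff schedule reduces to a formula: delay = 2^(attempt − 1) seconds, capped at 30s, for at most 5 attempts. A TypeScript rendering of the supervisor's policy (illustrative — the real logic lives in `supervisor.rs`):

```typescript
const MAX_RESTART_ATTEMPTS = 5;
const BACKOFF_CAP_MS = 30_000;

// Returns the wait before restart attempt N (1-based), or null once the
// supervisor should give up and report Failed.
function restartBackoffMs(attempt: number): number | null {
  if (attempt < 1 || attempt > MAX_RESTART_ATTEMPTS) return null;
  return Math.min(1_000 * 2 ** (attempt - 1), BACKOFF_CAP_MS);
}
```
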
+ +--- + +## Message Adapter Layer + +On the frontend, raw sidecar messages pass through a provider-specific adapter before reaching the agent store: + +``` +Sidecar stdout → Rust SidecarManager → Tauri event + → agent-dispatcher.ts + → message-adapters.ts (registry) + → claude-messages.ts / codex-messages.ts / ollama-messages.ts + → AgentMessage[] (common type) + → agents.svelte.ts store +``` + +The `AgentMessage` type is provider-agnostic: + +```typescript +interface AgentMessage { + id: string; + type: 'text' | 'tool_call' | 'tool_result' | 'thinking' | 'init' + | 'status' | 'cost' | 'error' | 'hook'; + parentId?: string; // for subagent tracking + content: unknown; // type-specific payload + timestamp: number; +} +``` + +This means the agent store and AgentPane rendering code never need to know which provider generated a message. The adapter layer is the only code that understands provider-specific formats. + +### Test Coverage + +- `claude-messages.test.ts` — 25 tests covering all Claude message types +- `codex-messages.test.ts` — 19 tests covering all Codex ThreadEvent types +- `ollama-messages.test.ts` — 11 tests covering all Ollama chunk types
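
As a closing illustration, the registry in `message-adapters.ts` can be pictured as a provider-keyed map of functions returning the common type. A sketch under assumed names (`MessageAdapter`, `registerAdapter`, `adaptMessage`), repeating the `AgentMessage` interface from above for self-containment:

```typescript
type Provider = "claude" | "codex" | "ollama";

interface AgentMessage {
  id: string;
  type: "text" | "tool_call" | "tool_result" | "thinking" | "init"
      | "status" | "cost" | "error" | "hook";
  parentId?: string;  // for subagent tracking
  content: unknown;   // type-specific payload
  timestamp: number;
}

type MessageAdapter = (raw: unknown, sessionId: string) => AgentMessage[];

const adapters = new Map<Provider, MessageAdapter>();

function registerAdapter(provider: Provider, adapter: MessageAdapter): void {
  adapters.set(provider, adapter);
}

// Dispatcher entry point: raw sidecar payload in, provider-agnostic messages out.
function adaptMessage(provider: Provider, raw: unknown, sessionId: string): AgentMessage[] {
  const adapter = adapters.get(provider);
  if (!adapter) throw new Error(`No message adapter registered for ${provider}`);
  return adapter(raw, sessionId);
}
```

The store and rendering code call only `adaptMessage`; each provider module registers its own adapter at startup.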