chore: remove old root-level docs (content moved to subdirectories)

Hibryda 2026-03-17 04:22:47 +01:00
parent 493b436eef
commit 2cdc8dddb2
9 changed files with 0 additions and 2875 deletions


@@ -1,530 +0,0 @@
# System Architecture
This document describes the end-to-end architecture of Agent Orchestrator — how the Rust backend, Svelte 5 frontend, and Node.js/Deno sidecar processes work together to provide a multi-project AI agent orchestration dashboard.
---
## High-Level Overview
Agent Orchestrator is a Tauri 2.x desktop application. Tauri provides a Rust backend process and a WebKit2GTK-based webview for the frontend. The application manages AI agent sessions by spawning sidecar child processes that communicate with AI provider APIs (Claude, Codex, Ollama).
```
┌────────────────────────────────────────────────────────────────┐
│ Agent Orchestrator (Tauri 2.x) │
│ │
│ ┌─────────────────┐ Tauri IPC ┌────────────────────┐ │
│ │ WebView │ ◄─────────────► │ Rust Backend │ │
│ │ (Svelte 5) │ invoke/listen │ │ │
│ │ │ │ ├── PtyManager │ │
│ │ ├── ProjectGrid │ │ ├── SidecarManager │ │
│ │ ├── AgentPane │ │ ├── SessionDb │ │
│ │ ├── TerminalPane │ │ ├── BtmsgDb │ │
│ │ ├── StatusBar │ │ ├── SearchDb │ │
│ │ └── Stores │ │ ├── SecretsManager │ │
│ └─────────────────┘ │ ├── RemoteManager │ │
│ │ └── FileWatchers │ │
│ └────────────────────┘ │
│ │ │
└───────────────────────────────────────────┼────────────────────┘
│ stdio NDJSON
┌───────────────────┐
│ Sidecar Processes │
│ (Deno or Node.js) │
│ │
│ claude-runner.mjs │
│ codex-runner.mjs │
│ ollama-runner.mjs │
└───────────────────┘
```
### Why Three Layers?
1. **Rust backend** — Manages OS-level resources (PTY processes, file watchers, SQLite databases) with memory safety and low overhead. Exposes everything to the frontend via Tauri IPC commands and events.
2. **Svelte 5 frontend** — Renders the UI with fine-grained reactivity (no VDOM). Svelte 5 runes (`$state`, `$derived`, `$effect`) provide signal-based reactivity comparable to Solid.js but with a larger ecosystem.
3. **Sidecar processes** — The Claude Agent SDK, OpenAI Codex SDK, and Ollama API are all JavaScript/TypeScript libraries. They cannot run in Rust or in the WebKit2GTK webview (no Node.js APIs). The sidecar layer bridges this gap: Rust spawns a JS process, communicates via stdio NDJSON, and forwards structured messages to the frontend.
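The stdio NDJSON link can be sketched as follows — a minimal framing layer with one JSON object per line. The message shape here is a simplified assumption for illustration, not the actual protocol types:

```typescript
// Illustrative NDJSON framing for the Rust <-> sidecar stdio link.
// The SidecarMessage shape is an assumption, not the real protocol.
type SidecarMessage = { sessionId: string; type: string; payload?: unknown };

// One JSON object per line; a trailing newline terminates each frame.
function encodeNdjson(msg: SidecarMessage): string {
  return JSON.stringify(msg) + "\n";
}

// Split a stdout chunk into complete frames, keeping any partial
// trailing line as carry-over for the next chunk.
function decodeNdjson(
  chunk: string,
  carry = "",
): { messages: SidecarMessage[]; carry: string } {
  const lines = (carry + chunk).split("\n");
  const rest = lines.pop() ?? ""; // incomplete final line (or "")
  const messages = lines
    .filter((l) => l.trim().length > 0)
    .map((l) => JSON.parse(l) as SidecarMessage);
  return { messages, carry: rest };
}
```

The carry-over handling matters because stdout chunks from a child process are not line-aligned.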
---
## Rust Backend (`src-tauri/`)
The Rust backend is the central coordinator. It owns all OS resources and database connections.
### Cargo Workspace
The Rust code is organized as a Cargo workspace with three members:
```
v2/
├── Cargo.toml # Workspace root
├── agor-core/ # Shared crate
│ └── src/
│ ├── lib.rs
│ ├── pty.rs # PtyManager (portable-pty)
│ ├── sidecar.rs # SidecarManager (multi-provider)
│ ├── supervisor.rs # SidecarSupervisor (crash recovery)
│ ├── sandbox.rs # Landlock sandbox
│ └── event.rs # EventSink trait
├── agor-relay/ # Remote machine relay
│ └── src/main.rs # WebSocket server + token auth
└── src-tauri/ # Tauri application
└── src/
├── lib.rs # AppState + setup + handler registration
├── commands/ # 16 domain command modules
├── btmsg.rs # Inter-agent messaging (SQLite)
├── bttask.rs # Task board (SQLite, shared btmsg.db)
├── search.rs # FTS5 full-text search
├── secrets.rs # System keyring (libsecret)
├── plugins.rs # Plugin discovery
├── notifications.rs # Desktop notifications
├── session/ # SessionDb (sessions, layout, settings, agents, metrics, anchors)
├── remote.rs # RemoteManager (WebSocket client)
├── ctx.rs # Read-only ctx database access
├── memora.rs # Read-only Memora database access
├── telemetry.rs # OpenTelemetry tracing
├── groups.rs # Project groups config
├── watcher.rs # File watcher (notify crate)
├── fs_watcher.rs # Per-project filesystem watcher (inotify)
├── event_sink.rs # TauriEventSink implementation
├── pty.rs # Thin re-export from agor-core
└── sidecar.rs # Thin re-export from agor-core
```
### Why a Workspace?
The `agor-core` crate exists so that both the Tauri application and the standalone `agor-relay` binary can share PtyManager and SidecarManager code. The `EventSink` trait abstracts event emission — TauriEventSink wraps Tauri's AppHandle, while the relay uses a WebSocket-based EventSink.
### AppState
All backend state lives in `AppState`, initialized during Tauri setup:
```rust
pub struct AppState {
pub pty_manager: Mutex<PtyManager>,
pub sidecar_manager: Mutex<SidecarManager>,
pub session_db: Mutex<SessionDb>,
pub remote_manager: Mutex<RemoteManager>,
pub telemetry: Option<TelemetryGuard>,
}
```
### SQLite Databases
The backend manages two SQLite databases, both in WAL mode with 5-second busy timeout for concurrent access:
| Database | Location | Purpose |
|----------|----------|---------|
| `sessions.db` | `~/.local/share/agor/` | Sessions, layout, settings, agent state, metrics, anchors |
| `btmsg.db` | `~/.local/share/agor/` | Inter-agent messages, tasks, agents registry, audit log |
WAL checkpoints run every 5 minutes via a background tokio task to prevent unbounded WAL growth.
All queries use **named column access** (`row.get("column_name")`) — never positional indices. Rust structs use `#[serde(rename_all = "camelCase")]` so TypeScript interfaces receive camelCase field names on the wire.
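What `rename_all = "camelCase"` does on the wire can be illustrated with a small key-mapping sketch — the converter below is a hypothetical helper, not part of the codebase (serde performs this rename in Rust at serialization time):

```typescript
// Illustration of the serde(rename_all = "camelCase") wire convention:
// a Rust field `input_tokens` arrives in TypeScript as `inputTokens`.
function snakeToCamel(key: string): string {
  return key.replace(/_([a-z])/g, (_, c: string) => c.toUpperCase());
}

// Apply the rename to every key of a row object.
function camelizeKeys(row: Record<string, unknown>): Record<string, unknown> {
  return Object.fromEntries(
    Object.entries(row).map(([k, v]) => [snakeToCamel(k), v]),
  );
}
```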
### Command Modules
Tauri commands are organized into 16 domain modules under `commands/`:
| Module | Commands | Purpose |
|--------|----------|---------|
| `pty` | spawn, write, resize, kill | Terminal management |
| `agent` | query, stop, ready, restart | Agent session lifecycle |
| `session` | session CRUD, layout, settings | Session persistence |
| `persistence` | agent state, messages | Agent session continuity |
| `knowledge` | ctx, memora queries | External knowledge bases |
| `claude` | profiles, skills | Claude-specific features |
| `groups` | load, save | Project group config |
| `files` | list_directory, read/write file | File browser |
| `watcher` | start, stop | File change monitoring |
| `remote` | 12 commands | Remote machine management |
| `bttask` | list, create, update, delete, comments | Task board |
| `search` | init, search, rebuild, index | FTS5 search |
| `secrets` | store, get, delete, list, has_keyring | Secrets management |
| `plugins` | discover, read_file | Plugin discovery |
| `notifications` | send_desktop | OS notifications |
| `misc` | test_mode, frontend_log | Utilities |
---
## Svelte 5 Frontend (`src/`)
The frontend uses Svelte 5 with runes for reactive state management. The UI follows a VSCode-inspired layout with a left icon rail, expandable drawer, project grid, and status bar.
### Component Hierarchy
```
App.svelte [Root — VSCode-style layout]
├── CommandPalette.svelte [Ctrl+K overlay, 18+ commands]
├── SearchOverlay.svelte [Ctrl+Shift+F, FTS5 Spotlight-style]
├── NotificationCenter.svelte [Bell icon + dropdown]
├── GlobalTabBar.svelte [Left icon rail, 2.75rem wide]
├── [Sidebar Panel] [Expandable drawer, max 50%]
│ └── SettingsTab.svelte [Global settings + group/project CRUD]
├── ProjectGrid.svelte [Flex + scroll-snap, adaptive count]
│ └── ProjectBox.svelte [Per-project container, 11 tab types]
│ ├── ProjectHeader.svelte [Icon + name + status + badges]
│ ├── AgentSession.svelte [Main Claude session wrapper]
│ │ ├── AgentPane.svelte [Structured message rendering]
│ │ └── TeamAgentsPanel.svelte [Tier 1 subagent cards]
│ ├── TerminalTabs.svelte [Shell/SSH/agent-preview tabs]
│ │ ├── TerminalPane.svelte [xterm.js + Canvas addon]
│ │ └── AgentPreviewPane.svelte [Read-only agent activity]
│ ├── DocsTab.svelte [Markdown file browser]
│ ├── ContextTab.svelte [LLM context visualization]
│ ├── FilesTab.svelte [Directory tree + CodeMirror editor]
│ ├── SshTab.svelte [SSH connection manager]
│ ├── MemoriesTab.svelte [Memora database viewer]
│ ├── MetricsPanel.svelte [Health + history sparklines]
│ ├── TaskBoardTab.svelte [Kanban board, Manager only]
│ ├── ArchitectureTab.svelte [PlantUML viewer, Architect only]
│ └── TestingTab.svelte [Selenium/test files, Tester only]
└── StatusBar.svelte [Agent counts, burn rate, attention queue]
```
### Stores (Svelte 5 Runes)
All store files use the `.svelte.ts` extension — this is required for Svelte 5 runes (`$state`, `$derived`, `$effect`). Files with plain `.ts` extension will compile but fail at runtime with "rune_outside_svelte".
| Store | Purpose |
|-------|---------|
| `workspace.svelte.ts` | Project groups, active group, tabs, focus |
| `agents.svelte.ts` | Agent sessions, messages, cost, parent/child hierarchy |
| `health.svelte.ts` | Per-project health tracking, attention scoring, burn rate |
| `conflicts.svelte.ts` | File overlap + external write detection |
| `anchors.svelte.ts` | Session anchor management (auto/pinned/promoted) |
| `notifications.svelte.ts` | Toast + history (6 types, unread badge) |
| `plugins.svelte.ts` | Plugin command registry, event bus |
| `theme.svelte.ts` | 17 themes, font restoration |
| `machines.svelte.ts` | Remote machine state |
| `wake-scheduler.svelte.ts` | Manager auto-wake (3 strategies, per-manager timers) |
### Adapters (IPC Bridge Layer)
Adapters wrap Tauri `invoke()` calls and `listen()` event subscriptions. They isolate the frontend from IPC details and provide typed TypeScript interfaces.
| Adapter | Backend Module | Purpose |
|---------|---------------|---------|
| `agent-bridge.ts` | sidecar + commands/agent | Agent query/stop/restart |
| `pty-bridge.ts` | pty + commands/pty | Terminal spawn/write/resize |
| `claude-messages.ts` | — (frontend-only) | Parse Claude SDK NDJSON → AgentMessage |
| `codex-messages.ts` | — (frontend-only) | Parse Codex ThreadEvents → AgentMessage |
| `ollama-messages.ts` | — (frontend-only) | Parse Ollama chunks → AgentMessage |
| `message-adapters.ts` | — (frontend-only) | Provider registry for message parsers |
| `provider-bridge.ts` | commands/claude | Generic provider bridge (profiles, skills) |
| `btmsg-bridge.ts` | btmsg | Inter-agent messaging |
| `bttask-bridge.ts` | bttask | Task board operations |
| `groups-bridge.ts` | groups | Group config load/save |
| `session-bridge.ts` | session | Session/layout persistence |
| `settings-bridge.ts` | session/settings | Key-value settings |
| `files-bridge.ts` | commands/files | File browser operations |
| `search-bridge.ts` | search | FTS5 search |
| `secrets-bridge.ts` | secrets | System keyring |
| `anchors-bridge.ts` | session/anchors | Session anchor CRUD |
| `remote-bridge.ts` | remote | Remote machine management |
| `ssh-bridge.ts` | session/ssh | SSH session CRUD |
| `ctx-bridge.ts` | ctx | Context database queries |
| `memora-bridge.ts` | memora | Memora database queries |
| `fs-watcher-bridge.ts` | fs_watcher | Filesystem change events |
| `audit-bridge.ts` | btmsg (audit_log) | Audit log queries |
| `telemetry-bridge.ts` | telemetry | Frontend → Rust tracing |
| `notifications-bridge.ts` | notifications | Desktop notification trigger |
| `plugins-bridge.ts` | plugins | Plugin discovery |
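The adapter pattern above can be sketched as a thin typed wrapper around `invoke()`. The command names match the backend modules; the option shape is a simplified assumption:

```typescript
// Minimal sketch of an adapter: wrap Tauri invoke() behind typed
// functions so components never touch IPC details directly.
type Invoke = (cmd: string, args?: Record<string, unknown>) => Promise<unknown>;

// Simplified; the real AgentQueryOptions carries more fields.
interface AgentQueryOptions {
  sessionId: string;
  prompt: string;
  provider: string;
}

function makeAgentBridge(invoke: Invoke) {
  return {
    queryAgent: (options: AgentQueryOptions) => invoke("agent_query", { options }),
    stopAgent: (sessionId: string) => invoke("agent_stop", { sessionId }),
  };
}
```

Injecting `invoke` as a parameter also makes adapters trivially testable with a recording fake.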
### Agent Dispatcher
The agent dispatcher (`agent-dispatcher.ts`, ~260 lines) is the central router between sidecar events and the agent store. When the Rust backend emits a `sidecar-message` Tauri event, the dispatcher:
1. Looks up the provider for the session (via `sessionProviderMap`)
2. Routes the raw message through the appropriate adapter (claude-messages.ts, codex-messages.ts, or ollama-messages.ts) via `message-adapters.ts`
3. Feeds the resulting `AgentMessage[]` into the agent store
4. Handles side effects: subagent pane spawning, session persistence, auto-anchoring, worktree detection, health tracking, conflict recording
The dispatcher delegates to four extracted utility modules:
- `utils/session-persistence.ts` — session-project maps, persistSessionForProject
- `utils/subagent-router.ts` — spawn + route subagent panes
- `utils/auto-anchoring.ts` — triggerAutoAnchor on first compaction event
- `utils/worktree-detection.ts` — detectWorktreeFromCwd pure function
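Steps 1–3 of the dispatch can be sketched as a provider lookup plus a parser registry. Message and parser shapes here are simplified assumptions:

```typescript
// Sketch of the dispatch core: resolve the session's provider, pick
// the matching parser, hand parsed messages to a store callback.
type AgentMessage = { role: string; text: string };
type Parser = (raw: unknown) => AgentMessage[];

const sessionProviderMap = new Map<string, string>();
const parsers = new Map<string, Parser>();

function dispatchSidecarMessage(
  sessionId: string,
  raw: unknown,
  onMessages: (sessionId: string, msgs: AgentMessage[]) => void,
): void {
  const provider = sessionProviderMap.get(sessionId);
  if (!provider) return; // unknown session: drop silently
  const parse = parsers.get(provider);
  if (!parse) return;
  const msgs = parse(raw);
  if (msgs.length > 0) onMessages(sessionId, msgs);
}
```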
---
## Sidecar Layer (`sidecar/`)
See [sidecar.md](sidecar.md) for the full sidecar architecture. In brief:
- Each AI provider has its own runner file (e.g., `claude-runner.ts`) compiled to an ESM bundle (`claude-runner.mjs`) by esbuild
- Rust's SidecarManager spawns the appropriate runner based on the `provider` field in AgentQueryOptions
- Communication uses stdio NDJSON — one JSON object per line, newline-delimited
- Deno is preferred (faster startup), Node.js is the fallback
- The Claude runner uses `@anthropic-ai/claude-agent-sdk` query() internally
---
## Data Flow: Agent Query Lifecycle
Here is the complete path of a user prompt through the system:
```
1. User types prompt in AgentPane
2. AgentPane calls agentBridge.queryAgent(options)
3. agent-bridge.ts invokes Tauri command 'agent_query'
4. Rust agent_query handler calls SidecarManager.query()
5. SidecarManager resolves provider runner (e.g., claude-runner.mjs)
6. SidecarManager writes QueryMessage as NDJSON to sidecar stdin
7. Sidecar runner calls provider SDK (e.g., Claude Agent SDK query())
8. Provider SDK streams responses
9. Runner forwards each response as NDJSON to stdout
10. SidecarManager reads stdout line-by-line
11. SidecarManager emits Tauri event 'sidecar-message' with sessionId + data
12. Frontend agent-dispatcher.ts receives event
13. Dispatcher routes through message-adapters.ts → provider-specific parser
14. Parser converts to AgentMessage[]
15. Dispatcher feeds messages into agents.svelte.ts store
16. AgentPane reactively re-renders via $derived bindings
```
### Session Stop Flow
```
1. User clicks Stop button in AgentPane
2. AgentPane calls agentBridge.stopAgent(sessionId)
3. agent-bridge.ts invokes Tauri command 'agent_stop'
4. Rust handler calls SidecarManager.stop(sessionId)
5. SidecarManager writes StopMessage to sidecar stdin
6. Runner calls AbortController.abort() on the SDK query
7. SDK terminates the Claude subprocess
8. Runner emits final status message, then closes
```
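The runner side of steps 5–6 can be sketched with the standard AbortController pattern — one controller per in-flight query, aborted on a StopMessage. Names here are illustrative, not the actual runner internals:

```typescript
// Sketch of runner-side stop handling: each session keeps an
// AbortController whose signal is passed to the SDK query.
const controllers = new Map<string, AbortController>();

function startQuery(sessionId: string): AbortSignal {
  const controller = new AbortController();
  controllers.set(sessionId, controller);
  return controller.signal; // handed to the provider SDK call
}

// Returns true if a live query was aborted.
function handleStop(sessionId: string): boolean {
  const controller = controllers.get(sessionId);
  if (!controller) return false;
  controller.abort();
  controllers.delete(sessionId);
  return true;
}
```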
---
## Configuration
### Project Groups (`~/.config/agor/groups.json`)
Human-editable JSON file defining project groups and their projects. Loaded at startup by `groups.rs`. Not hot-reloaded — changes require app restart or group switch.
### SQLite Settings (`sessions.db` `settings` table)
Key-value store for user preferences: theme, fonts, shell, CWD, provider settings. Accessed via `settings-bridge.ts` and the `settings_get`/`settings_set` Tauri commands.
### Environment Variables
| Variable | Purpose |
|----------|---------|
| `AGOR_TEST` | Enables test mode (disables watchers, wake scheduler) |
| `AGOR_TEST_DATA_DIR` | Redirects SQLite database storage |
| `AGOR_TEST_CONFIG_DIR` | Redirects groups.json config |
| `AGOR_OTLP_ENDPOINT` | Enables OpenTelemetry OTLP export |
---
## Data Model
### Project Group Config (`~/.config/agor/groups.json`)
Human-editable JSON file defining workspaces. Each group contains up to 5 projects. Loaded at startup by `groups.rs`, not hot-reloaded.
```jsonc
{
"version": 1,
"groups": [
{
"id": "work-ai",
"name": "AI Projects",
"projects": [
{
"id": "agor",
"name": "Agents Orchestrator",
"identifier": "agor",
"description": "Terminal emulator with Claude integration",
"icon": "\uf120",
"cwd": "/home/user/code/Agents Orchestrator",
"profile": "default",
"enabled": true
}
]
}
],
"activeGroupId": "work-ai"
}
```
### TypeScript Types (`src/lib/types/groups.ts`)
```typescript
export interface ProjectConfig {
id: string;
name: string;
identifier: string;
description: string;
icon: string;
cwd: string;
profile: string;
enabled: boolean;
}
export interface GroupConfig {
id: string;
name: string;
projects: ProjectConfig[]; // max 5
}
export interface GroupsFile {
version: number;
groups: GroupConfig[];
activeGroupId: string;
}
```
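A validator for this shape — enforcing the five-project limit noted above — could look like the sketch below. This is illustrative only; the real app loads and validates `groups.json` in Rust (`groups.rs`):

```typescript
// Minimal GroupsFile validator, mirroring the interfaces above.
interface ProjectConfig {
  id: string; name: string; identifier: string; description: string;
  icon: string; cwd: string; profile: string; enabled: boolean;
}
interface GroupConfig { id: string; name: string; projects: ProjectConfig[] }
interface GroupsFile { version: number; groups: GroupConfig[]; activeGroupId: string }

function validateGroupsFile(file: GroupsFile): string[] {
  const errors: string[] = [];
  if (file.version !== 1) errors.push(`unsupported version ${file.version}`);
  for (const group of file.groups) {
    if (group.projects.length > 5) errors.push(`group ${group.id} exceeds 5 projects`);
  }
  if (!file.groups.some((g) => g.id === file.activeGroupId)) {
    errors.push(`activeGroupId ${file.activeGroupId} not found`);
  }
  return errors;
}
```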
### SQLite Schema (v3 Additions)
Beyond the core `sessions` and `settings` tables, v3 added project-scoped agent persistence:
```sql
ALTER TABLE sessions ADD COLUMN project_id TEXT DEFAULT '';
CREATE TABLE IF NOT EXISTS agent_messages (
id INTEGER PRIMARY KEY AUTOINCREMENT,
session_id TEXT NOT NULL,
project_id TEXT NOT NULL,
sdk_session_id TEXT,
message_type TEXT NOT NULL,
content TEXT NOT NULL,
parent_id TEXT,
created_at INTEGER NOT NULL,
FOREIGN KEY (session_id) REFERENCES sessions(id) ON DELETE CASCADE
);
CREATE TABLE IF NOT EXISTS project_agent_state (
project_id TEXT PRIMARY KEY,
last_session_id TEXT NOT NULL,
sdk_session_id TEXT,
status TEXT NOT NULL,
cost_usd REAL DEFAULT 0,
input_tokens INTEGER DEFAULT 0,
output_tokens INTEGER DEFAULT 0,
last_prompt TEXT,
updated_at INTEGER NOT NULL
);
```
---
## Layout System
### Project Grid (Flexbox + scroll-snap)
Projects are arranged horizontally in a flex container with CSS scroll-snap for clean project-to-project scrolling:
```css
.project-grid {
display: flex;
gap: 4px;
height: 100%;
overflow-x: auto;
scroll-snap-type: x mandatory;
}
.project-box {
flex: 0 0 calc((100% - (N-1) * 4px) / N);
scroll-snap-align: start;
min-width: 480px;
}
```
N is computed from viewport width: `Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520)))`
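As a pure function (taken directly from the formula above):

```typescript
// Adaptive project count: one column per 520px of container width,
// capped by the number of projects, with a minimum of one.
function visibleProjectCount(containerWidth: number, projectCount: number): number {
  return Math.min(projectCount, Math.max(1, Math.floor(containerWidth / 520)));
}
```

This matches the breakpoint table below: a 1920px container yields 3 visible projects, 5120px yields 5.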
### Project Box Internal Layout
Each project box uses a CSS grid with 4 rows:
```
┌─ ProjectHeader (auto) ─────────────────┐
├─────────────────────┬──────────────────┤
│ AgentSession │ TeamAgentsPanel │
│ (flex: 1) │ (240px/overlay) │
├─────────────────────┴──────────────────┤
│ [Tab1] [Tab2] [+] TabBar auto │
├────────────────────────────────────────┤
│ Terminal content (xterm or scrollback) │
└────────────────────────────────────────┘
```
Team panel: inline at >2560px viewport (240px wide), overlay at <2560px. Collapsed when no subagents running.
### Responsive Breakpoints
| Viewport Width | Visible Projects | Team Panel Mode |
|---------------|-----------------|-----------------|
| 5120px+ | 5 | inline 240px |
| 3840px | 4 | inline 200px |
| 2560px | 3 | overlay |
| 1920px | 3 | overlay |
| <1600px | 1 + project tabs | overlay |
### xterm.js Budget: 4 Active Instances
WebKit2GTK OOMs at ~5 simultaneous xterm.js instances. The budget system manages this:
| State | xterm.js Instance? | Memory |
|-------|--------------------|--------|
| Active-Focused | Yes | ~20MB |
| Active-Background | Yes (if budget allows) | ~20MB |
| Suspended | No (HTML pre scrollback) | ~200KB |
| Uninitialized | No (placeholder) | 0 |
On focus: serialize least-recent xterm scrollback, destroy it, create new for focused tab, reconnect PTY. Suspend/resume cycle < 50ms.
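The budget bookkeeping amounts to an LRU set with a fixed capacity. A simplified sketch (the real implementation also serializes scrollback and reconnects the PTY on each transition):

```typescript
// Sketch of the four-instance xterm budget as an LRU eviction policy.
const BUDGET = 4;

class XtermBudget {
  private active: string[] = []; // ordered: least-recently-focused first

  // Mark a terminal as focused; returns the id of the terminal that
  // must be suspended to stay within budget, if any.
  focus(id: string): string | undefined {
    this.active = this.active.filter((t) => t !== id);
    let evicted: string | undefined;
    if (this.active.length >= BUDGET) evicted = this.active.shift();
    this.active.push(id);
    return evicted;
  }
}
```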
### Project Accent Colors
Each project slot gets a distinct Catppuccin accent color for visual distinction:
| Slot | Color | CSS Variable |
|------|-------|-------------|
| 1 | Blue | `var(--ctp-blue)` |
| 2 | Green | `var(--ctp-green)` |
| 3 | Mauve | `var(--ctp-mauve)` |
| 4 | Peach | `var(--ctp-peach)` |
| 5 | Pink | `var(--ctp-pink)` |
Applied to border tint and header accent via `var(--accent)` CSS custom property set per ProjectBox.
---
## Keyboard Shortcuts
Three-layer shortcut system prevents conflicts between terminal input, workspace navigation, and app-level commands:
| Shortcut | Action | Layer |
|----------|--------|-------|
| Ctrl+K | Command palette | App |
| Ctrl+G | Switch group (palette filtered) | App |
| Ctrl+1..5 | Focus project by index | App |
| Alt+1..4 | Switch sidebar tab + open drawer | App |
| Ctrl+B | Toggle sidebar open/closed | App |
| Ctrl+, | Toggle settings panel | App |
| Escape | Close sidebar drawer | App |
| Ctrl+Shift+F | FTS5 search overlay | App |
| Ctrl+N | New terminal in focused project | Workspace |
| Ctrl+Shift+N | New agent query | Workspace |
| Ctrl+Tab | Next terminal tab | Project |
| Ctrl+W | Close terminal tab | Project |
| Ctrl+Shift+C/V | Copy/paste in terminal | Terminal |
Terminal layer captures raw keys only when focused. App layer has highest priority.
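The priority order can be sketched as a cascade — app bindings checked first, then workspace, then the terminal only when focused. Binding tables and the key format are illustrative:

```typescript
// Sketch of layered shortcut dispatch: App > Workspace > Terminal.
type Handler = () => void;
type Layer = Map<string, Handler>;

function dispatchKey(
  key: string, // normalized, e.g. "Ctrl+K"
  app: Layer,
  workspace: Layer,
  terminalFocused: boolean,
  terminal: (key: string) => void,
): "app" | "workspace" | "terminal" | "none" {
  const appHandler = app.get(key);
  if (appHandler) { appHandler(); return "app"; }
  const wsHandler = workspace.get(key);
  if (wsHandler) { wsHandler(); return "workspace"; }
  if (terminalFocused) { terminal(key); return "terminal"; }
  return "none";
}
```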
---
## Key Constraints
1. **WebKit2GTK has no WebGL** — xterm.js must use the Canvas addon explicitly. Maximum 4 active xterm.js instances to avoid OOM.
2. **Svelte 5 runes require `.svelte.ts`** — Store files using `$state`/`$derived` must have the `.svelte.ts` extension. The compiler silently accepts `.ts` but runes fail at runtime.
3. **Single shared sidecar** — All agent sessions share one SidecarManager. Per-project isolation is via `cwd`, `claude_config_dir`, and `session_id` routing. Per-project sidecar pools deferred to v3.1.
4. **SQLite WAL mode** — Both databases use WAL with 5s busy_timeout for concurrent access from Rust backend + Python CLIs (btmsg/bttask).
5. **camelCase wire format** — Rust uses `#[serde(rename_all = "camelCase")]`. TypeScript interfaces must match. This was a source of bugs during development (see [findings.md](findings.md) for context).


@@ -1,51 +0,0 @@
# Architecture Decisions Log
This document records significant architecture decisions made during the development of Agent Orchestrator. Each entry captures the decision, its rationale, and the date it was made. Decisions are listed chronologically within each category.
---
## Data & Configuration
| Decision | Rationale | Date |
|----------|-----------|------|
| JSON for groups config, SQLite for session state | JSON is human-editable, shareable, version-controllable. SQLite for ephemeral runtime state. Load at startup only — no hot-reload, no split-brain risk. | 2026-03-07 |
| btmsg/bttask shared SQLite DB | Both CLI tools share `~/.local/share/agor/btmsg.db`. Single DB simplifies deployment — agents already have the path. Read-only for non-Manager roles via CLI permissions. | 2026-03-11 |
## Layout & UI
| Decision | Rationale | Date |
|----------|-----------|------|
| Adaptive project count from viewport width | `Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520)))` — 5 at 5120px, 3 at 1920px, scroll-snap for overflow. min-width 480px. Better than forcing 5 at all sizes. | 2026-03-07 |
| Flexbox + scroll-snap over CSS Grid | Allows horizontal scroll on narrow screens. Scroll-snap gives clean project-to-project scrolling. | 2026-03-07 |
| Team panel: inline >2560px, overlay <2560px | Adapts to available space. Collapsed when no subagents running. Saves ~240px on smaller screens. | 2026-03-07 |
| VSCode-style left sidebar (replaces top tab bar) | Vertical icon rail (2.75rem) + expandable drawer (max 50%) + always-visible workspace. Settings is a regular tab, not a special drawer. ProjectGrid always visible. Ctrl+B toggles. | 2026-03-08 |
| CSS relative units (rule 18) | rem/em for all layout CSS. Pixels only for icon sizes, borders, box shadows. Exception: `--ui-font-size`/`--term-font-size` store px for xterm.js API. | 2026-03-08 |
| Project accent colors from Catppuccin palette | Visual distinction: blue/green/mauve/peach/pink per slot 1-5. Applied to border + header tint via `var(--accent)`. | 2026-03-07 |
## Agent Architecture
| Decision | Rationale | Date |
|----------|-----------|------|
| Single shared sidecar (v3.0) | Existing multiplexed protocol handles concurrent sessions. Per-project pool deferred to v3.1 if crash isolation needed. Saves ~200MB RAM. | 2026-03-07 |
| xterm budget: 4 active, unlimited suspended | WebKit2GTK OOM at ~5 instances. Serialize scrollback to text buffer, destroy xterm, recreate on focus. PTY stays alive. Suspend/resume < 50ms. | 2026-03-07 |
| AgentPane splits into AgentSession + TeamAgentsPanel | Team agents shown inline in right panel, not as separate panes. Saves xterm/pane slots. | 2026-03-07 |
| Tier 1 agents as ProjectBoxes via `agentToProject()` | Agents render as full ProjectBoxes (not separate UI). `getAllWorkItems()` merges agents + projects. Unified rendering = less code, same capabilities. | 2026-03-11 |
| `extra_env` 5-layer passthrough for BTMSG_AGENT_ID | TS → Rust AgentQueryOptions → NDJSON → JS runner → SDK env. Minimal surface — only agent projects get env injection. | 2026-03-11 |
| Periodic system prompt re-injection (1 hour) | LLM context degrades over long sessions. 1-hour timer re-sends role/tools reminder when agent is idle. `autoPrompt`/`onautopromptconsumed` callback pattern. | 2026-03-11 |
| Role-specific tabs via conditional rendering | Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks. PERSISTED-LAZY pattern (mount on first activation). Conditional on `isAgent && agentRole`. | 2026-03-11 |
| PlantUML via plantuml.com server (~h hex encoding) | Avoids Java dependency. Hex encoding simpler than deflate+base64. Works with free tier. Trade-off: requires internet. | 2026-03-11 |
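The `~h` scheme mentioned above can be sketched in a few lines — the diagram source is UTF-8 encoded, hex-dumped, and prefixed with `~h` in the server URL (a sketch; the server path is an assumption based on PlantUML's documented URL format):

```typescript
// Sketch of PlantUML's ~h hex URL scheme: hex-encode the UTF-8 bytes
// of the diagram source instead of using deflate+base64.
function plantUmlHexUrl(
  source: string,
  server = "https://www.plantuml.com/plantuml/svg/",
): string {
  const bytes = new TextEncoder().encode(source);
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return `${server}~h${hex}`;
}
```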
## Themes & Typography
| Decision | Rationale | Date |
|----------|-----------|------|
| All 17 themes map to `--ctp-*` CSS vars | 4 Catppuccin + 7 Editor + 6 Deep Dark themes. All map to same 26 CSS custom properties — zero component changes when adding themes. Pure data operation. | 2026-03-07 |
| Typography via CSS custom properties | `--ui-font-family`/`--ui-font-size` + `--term-font-family`/`--term-font-size` in `:root`. Restored by `initTheme()` on startup. Persisted as SQLite settings. | 2026-03-07 |
## System Design
| Decision | Rationale | Date |
|----------|-----------|------|
| Keyboard shortcut layers: App > Workspace > Terminal | Prevents conflicts. Terminal captures raw keys only when focused. App layer uses Ctrl+K/G/B. | 2026-03-07 |
| Unmount/remount on group switch | Serialize xterm scrollbacks, destroy, remount new group. <100ms perceived. Frees ~80MB per switch. | 2026-03-07 |
| Remote machines deferred to v3.1 | Elevate to project level (`project.remote_machine_id`) but don't implement in MVP. Focus on local orchestration first. | 2026-03-07 |


@@ -1,282 +0,0 @@
# E2E Testing Facility
Agents Orchestrator's end-to-end testing uses **WebDriverIO + tauri-driver** to drive the real Tauri application through WebKit2GTK's inspector protocol. The facility has three pillars:
1. **Test Fixtures** — isolated fake environments with dummy projects
2. **Test Mode** — app-level env vars that disable watchers and redirect data/config paths
3. **LLM Judge** — Claude-powered semantic assertions for evaluating agent behavior
## Quick Start
```bash
# Run all tests (vitest + cargo + E2E)
npm run test:all:e2e
# Run E2E only (requires pre-built debug binary)
SKIP_BUILD=1 npm run test:e2e
# Build debug binary separately (faster iteration)
cargo tauri build --debug --no-bundle
# Run with LLM judge via CLI (default, auto-detected)
npm run test:e2e
# Force LLM judge to use API instead of CLI
LLM_JUDGE_BACKEND=api ANTHROPIC_API_KEY=sk-... npm run test:e2e
```
## Prerequisites
| Dependency | Purpose | Install |
|-----------|---------|---------|
| Rust + Cargo | Build Tauri backend | [rustup.rs](https://rustup.rs) |
| Node.js 20+ | Frontend + test runner | `mise install node` |
| tauri-driver | WebDriver bridge to WebKit2GTK | `cargo install tauri-driver` |
| X11 display | WebKit2GTK needs a display | Real X, or `xvfb-run` in CI |
| Claude CLI | LLM judge (optional) | [claude.ai/download](https://claude.ai/download) |
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│ WebDriverIO (mocha runner) │
│ specs/*.test.ts │
│ └─ browser.execute() → DOM queries + assertions │
│ └─ assertWithJudge() → LLM semantic evaluation │
├─────────────────────────────────────────────────────────┤
│ tauri-driver (port 4444) │
│ WebDriver protocol ↔ WebKit2GTK inspector │
├─────────────────────────────────────────────────────────┤
│ Agents Orchestrator debug binary │
│ AGOR_TEST=1 (disables watchers, wake scheduler) │
│ AGOR_TEST_DATA_DIR → isolated SQLite DBs │
│ AGOR_TEST_CONFIG_DIR → test groups.json │
└─────────────────────────────────────────────────────────┘
```
## Pillar 1: Test Fixtures (`fixtures.ts`)
The fixture generator creates isolated temporary environments so tests never touch real user data. Each fixture includes:
- **Temp root dir** under `/tmp/agor-e2e-{timestamp}/`
- **Data dir** — empty, SQLite databases created at runtime
- **Config dir** — contains a generated `groups.json` with test projects
- **Project dir** — a real git repo with `README.md` and `hello.py` (for agent testing)
### Single-Project Fixture
```typescript
import { createTestFixture, destroyTestFixture } from '../fixtures';
const fixture = createTestFixture('my-test');
// fixture.rootDir → /tmp/my-test-1710234567890/
// fixture.dataDir → /tmp/my-test-1710234567890/data/
// fixture.configDir → /tmp/my-test-1710234567890/config/
// fixture.projectDir → /tmp/my-test-1710234567890/test-project/
// fixture.env → { AGOR_TEST: '1', AGOR_TEST_DATA_DIR: '...', AGOR_TEST_CONFIG_DIR: '...' }
// The test project is a git repo with:
// README.md — "# Test Project\n\nA simple test project for Agents Orchestrator E2E tests."
// hello.py — "def greet(name: str) -> str:\n return f\"Hello, {name}!\""
// Both committed as "initial commit"
// groups.json contains one group "Test Group" with one project pointing at projectDir
// Cleanup when done:
destroyTestFixture(fixture);
```
### Multi-Project Fixture
```typescript
import { createMultiProjectFixture } from '../fixtures';
const fixture = createMultiProjectFixture(3); // 3 separate git repos
// Creates project-0, project-1, project-2 under fixture.rootDir
// Each is a git repo with README.md
// groups.json has one group "Multi Project Group" with all 3 projects
```
### Fixture Environment Variables
Pass `fixture.env` to the app to redirect all data/config paths:
| Variable | Effect |
|----------|--------|
| `AGOR_TEST=1` | Disables file watchers, wake scheduler, enables `is_test_mode` |
| `AGOR_TEST_DATA_DIR` | Redirects `sessions.db` and `btmsg.db` storage |
| `AGOR_TEST_CONFIG_DIR` | Redirects `groups.json` config loading |
## Pillar 2: Test Mode
When `AGOR_TEST=1` is set:
- **Rust backend**: `watcher.rs` and `fs_watcher.rs` skip file watchers
- **Frontend**: `is_test_mode` Tauri command returns true, wake scheduler disabled via `disableWakeScheduler()`
- **Data isolation**: `AGOR_TEST_DATA_DIR` / `AGOR_TEST_CONFIG_DIR` override default paths
The WebDriverIO config (`wdio.conf.js`) passes these env vars via `tauri:options.env` in capabilities.
## Pillar 3: LLM Judge (`llm-judge.ts`)
The LLM judge enables semantic assertions — evaluating whether agent output "looks right" rather than exact string matching. Useful for testing AI agent responses where exact output is non-deterministic.
### Dual Backend
The judge supports two backends, auto-detected or explicitly set:
| Backend | How it works | Requires |
|---------|-------------|----------|
| `cli` (default) | Spawns `claude` CLI with `--output-format text` | Claude CLI installed |
| `api` | Raw `fetch` to `https://api.anthropic.com/v1/messages` | `ANTHROPIC_API_KEY` env var |
**Auto-detection order**: CLI first → API fallback → skip test.
**Override**: Set `LLM_JUDGE_BACKEND=cli` or `LLM_JUDGE_BACKEND=api`.
### API
```typescript
import { isJudgeAvailable, judge, assertWithJudge } from '../llm-judge';
// Check availability (CLI or API key present)
if (!isJudgeAvailable()) {
this.skip(); // graceful skip in mocha
return;
}
// Basic judge call
const verdict = await judge(
'The output should contain a file listing with at least one filename', // criteria
actualOutput, // actual
'Agent was asked to list files in a directory containing README.md', // context (optional)
);
// verdict: { pass: boolean, reasoning: string, confidence: number }
// With confidence threshold (default 0.7)
const strictVerdict = await assertWithJudge(
'Response should describe the greet function',
agentMessages,
{ context: 'hello.py contains def greet(name)', minConfidence: 0.8 },
);
```
### How It Works
1. Builds a structured prompt with criteria, actual output, and optional context
2. Asks Claude (Haiku) to evaluate as a test assertion judge
3. Expects JSON response: `{"pass": true/false, "reasoning": "...", "confidence": 0.0-1.0}`
4. Validates and returns structured `JudgeVerdict`
The CLI backend unsets `CLAUDECODE` env var to avoid nested session errors when running inside Claude Code.
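Step 4's validation can be sketched as below — parse the model's JSON, check required fields, and clamp confidence into range. The field names match the documented verdict shape; the extraction and clamping logic is illustrative:

```typescript
// Sketch of judge response validation: extract the JSON verdict from
// the model output and range-check it before use.
interface JudgeVerdict { pass: boolean; reasoning: string; confidence: number }

function parseVerdict(raw: string): JudgeVerdict {
  // Tolerate surrounding prose by extracting the first {...} span.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) throw new Error("judge returned no JSON object");
  const v = JSON.parse(match[0]) as Partial<JudgeVerdict>;
  if (typeof v.pass !== "boolean" || typeof v.reasoning !== "string") {
    throw new Error("judge verdict missing required fields");
  }
  // Clamp confidence into [0, 1]; default to 0 when absent.
  const confidence =
    typeof v.confidence === "number" ? Math.min(1, Math.max(0, v.confidence)) : 0;
  return { pass: v.pass, reasoning: v.reasoning, confidence };
}
```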
## Test Spec Files
| File | Phase | Tests | Focus |
|------|-------|-------|-------|
| `agor.test.ts` | Smoke | ~50 | Basic UI rendering, CSS class selectors |
| `agent-scenarios.test.ts` | A | 22 | `data-testid` selectors, 7 deterministic scenarios |
| `phase-b.test.ts` | B | ~15 | Multi-project grid, LLM-judged agent responses |
| `phase-c.test.ts` | C | 27 | Hardening features (palette, search, notifications, keyboard, settings, health, metrics, context, files) |
### Phase A: Deterministic Agent Scenarios
Uses `data-testid` attributes for reliable selectors. Tests app structure, project rendering, and agent pane states without live agent interaction.
### Phase B: Multi-Project + LLM Judge
Tests multi-project grid rendering, independent tab switching, status bar fleet state. LLM-judged tests (B4, B5) send real prompts to agents and evaluate response quality — these require Claude CLI or API key and are skipped otherwise.
### Phase C: Production Hardening
Tests v3 hardening features: command palette commands (C1), search overlay (C2), notification center (C3), keyboard navigation (C4), settings panel (C5), project health indicators (C6), metrics tab (C7), context tab (C8), files tab with editor (C9), LLM-judged settings (C10), LLM-judged status bar (C11).
## Test Results Tracking (`results-db.ts`)
A lightweight JSON store for tracking test runs and individual step results:
```typescript
import { ResultsDb } from '../results-db';
const db = new ResultsDb(); // writes to test-results/results.json
db.startRun('run-001', 'v2-mission-control', 'abc123');
db.recordStep({
run_id: 'run-001',
scenario_name: 'B4',
step_name: 'should send prompt and get meaningful response',
status: 'passed',
duration_ms: 15000,
error_message: null,
screenshot_path: null,
agent_cost_usd: 0.003,
});
db.finishRun('run-001', 'passed', 45000);
```
## CI Integration (`.github/workflows/e2e.yml`)
The CI pipeline runs on push/PR with path-filtered triggers:
1. **Unit tests** — `npm run test` (vitest)
2. **Cargo tests** — `cargo test` (with `env -u AGOR_TEST` to prevent env leakage)
3. **E2E tests** — `xvfb-run npm run test:e2e` (virtual framebuffer for headless WebKit2GTK)
LLM-judged tests are gated on the `ANTHROPIC_API_KEY` secret — they skip gracefully in forks or when the secret is absent.
## Writing New Tests
### Adding a New Scenario
1. Pick the appropriate spec file (or create a new phase file)
2. Use `data-testid` selectors where possible (more stable than CSS classes)
3. For DOM queries, use `browser.execute()` to run JS in the app context
4. For semantic assertions, use `assertWithJudge()` with clear criteria
### Common Helpers
All spec files share similar helper patterns:
```typescript
// Get project IDs
const ids: string[] = await browser.execute(() => {
const boxes = document.querySelectorAll('[data-testid="project-box"]');
return Array.from(boxes).map(b => b.getAttribute('data-project-id') ?? '').filter(Boolean);
});
// Focus a project
await browser.execute((id) => {
const box = document.querySelector(`[data-project-id="${id}"]`);
const header = box?.querySelector('.project-header');
if (header) (header as HTMLElement).click();
}, projectId);
// Switch tab in a project
await browser.execute((id, idx) => {
const box = document.querySelector(`[data-project-id="${id}"]`);
const tabs = box?.querySelectorAll('[data-testid="project-tabs"] .ptab');
if (tabs && tabs[idx]) (tabs[idx] as HTMLElement).click();
}, projectId, tabIndex);
```
### WebDriverIO Config (`wdio.conf.js`)
Key settings:
- **Single session**: `maxInstances: 1` — tauri-driver can't handle parallel sessions
- **Lifecycle**: `onPrepare` builds debug binary, `beforeSession` spawns tauri-driver with TCP readiness probe, `afterSession` kills tauri-driver
- **Timeouts**: 60s per test (mocha), 10s waitfor, 30s connection retry
- **Skip build**: Set `SKIP_BUILD=1` to reuse existing binary
## Troubleshooting
| Problem | Solution |
|---------|----------|
| "Callback was not called before unload" | Stale binary — rebuild with `cargo tauri build --debug --no-bundle` |
| Tests hang on startup | Kill stale `tauri-driver` processes: `pkill -f tauri-driver` |
| All tests skip LLM judge | Install Claude CLI or set `ANTHROPIC_API_KEY` |
| SIGUSR2 / exit code 144 | Stale tauri-driver on port 4444 — kill and retry |
| `AGOR_TEST` leaking to cargo | Run cargo tests with `env -u AGOR_TEST cargo test` |
| No display available | Use `xvfb-run` or ensure X11/Wayland display is set |

# Research Findings
This document captures research conducted during v2 and v3 development — technology evaluations, architecture reviews, performance measurements, and design analysis. Each finding informed implementation decisions recorded in [decisions.md](decisions.md).
---
## 1. Claude Agent SDK (v2 Research, 2026-03-05)
**Source:** https://platform.claude.com/docs/en/agent-sdk/overview
The Claude Agent SDK (formerly Claude Code SDK, renamed Sept 2025) provides structured streaming, subagent detection, hooks, and telemetry — everything needed for a rich agent UI without terminal emulation.
### Streaming API
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Fix the bug",
options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
console.log(message); // structured, typed, parseable
}
```
### Subagent Detection
Messages from subagents include `parent_tool_use_id`:
```typescript
for (const block of msg.message?.content ?? []) {
if (block.type === "tool_use" && block.name === "Task") {
console.log(`Subagent invoked: ${block.input.subagent_type}`);
}
}
if (msg.parent_tool_use_id) {
console.log("Running inside subagent");
}
```
### Session Management
- `session_id` captured from init message
- Resume with `options: { resume: sessionId }`
- Subagent transcripts persist independently
### Hooks
`PreToolUse`, `PostToolUse`, `Stop`, `SessionStart`, `SessionEnd`, `UserPromptSubmit`
### Telemetry
Every `SDKResultMessage` contains: `total_cost_usd`, `duration_ms`, per-model `modelUsage` breakdowns.
### Key Insight
The SDK gives structured data — we render it as rich UI (markdown, diff views, file cards, agent trees) instead of raw terminal text. Terminal emulation (xterm.js) is only needed for SSH, local shell, and legacy CLI sessions.
---
## 2. Tauri + xterm.js Integration (v2 Research, 2026-03-05)
### Existing Projects
- **tauri-terminal** — basic Tauri + xterm.js + portable-pty
- **Terminon** — Tauri v2 + React + xterm.js, SSH profiles, split panes
- **tauri-plugin-pty** — PTY plugin for Tauri 2, xterm.js bridge
### Integration Pattern
```
Frontend (xterm.js) <-> Tauri IPC <-> Rust PTY (portable-pty) <-> Shell/SSH/Claude
```
- `pty.onData()` -> `term.write()` (output)
- `term.onData()` -> `pty.write()` (input)
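The two arrows above can be wired as in this sketch. To keep it self-contained, `invoke` and `listen` stand in for Tauri's IPC API and `TerminalLike` is a minimal xterm.js surface; the command and event names (`pty_write`, `pty-data`) are illustrative.

```typescript
// Minimal surface of the xterm.js Terminal used here.
interface TerminalLike {
  write(data: string): void;
  onData(cb: (data: string) => void): void;
}

type PtyListener = (event: { payload: { sessionId: string; data: string } }) => void;

function attachPty(
  term: TerminalLike,
  sessionId: string,
  invoke: (cmd: string, args: Record<string, unknown>) => void,
  listen: (event: string, cb: PtyListener) => void,
): void {
  // input path: term.onData() -> pty.write() via a Tauri command
  term.onData((data) => invoke('pty_write', { sessionId, data }));
  // output path: PTY data arrives as a Tauri event -> term.write()
  listen('pty-data', (event) => {
    if (event.payload.sessionId === sessionId) term.write(event.payload.data);
  });
}
```

Filtering on `sessionId` in the output path is what lets many terminals share one global `pty-data` event stream.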
---
## 3. Terminal Performance Benchmarks (v2 Research, 2026-03-05)
### Native Terminal Latency
| Terminal | Latency | Notes |
|----------|---------|-------|
| xterm (native) | ~10ms | Gold standard |
| Alacritty | ~12ms | GPU-rendered Rust |
| Kitty | ~13ms | GPU-rendered |
| VTE (GNOME Terminal) | ~50ms | GTK3/4, spikes above |
| Hyper (Electron+xterm.js) | ~40ms | Web-based worst case |
### Memory
- Alacritty: ~30MB, WezTerm: ~45MB, xterm native: ~5MB
### Verdict
xterm.js in Tauri: ~20-30ms latency, ~20MB per instance. For AI output (not vim), perfectly fine. The VTE we used in v1 GTK3 is actually slower at ~50ms.
---
## 4. Zellij Architecture (v2 Inspiration, 2026-03-05)
Zellij uses WASM plugins for extensibility: message passing at WASM boundary, permission model, event types for rendering/input/lifecycle, KDL layout files.
**Relevance:** We don't need WASM plugins — our "plugins" are different pane types. But the layout concept (JSON layout definitions) is worth borrowing for saved layouts.
---
## 5. Ultrawide Design Patterns (v2 Research, 2026-03-05)
**Key Insight:** 5120px width / ~600px per pane = ~8 panes max, ~4-5 comfortable.
**Layout Philosophy:**
- Center = primary attention (1-2 main agent panes)
- Left edge = navigation (sidebar, 250-300px)
- Right edge = context (agent tree, file viewer, 350-450px)
- Never use tabs for primary content — everything visible
- Tabs only for switching saved layouts
---
## 6. Frontend Framework Choice (v2 Research, 2026-03-05)
### Why Svelte 5
- **Fine-grained reactivity** — `$state`/`$derived` runes match Solid's signals model
- **No VDOM** — critical when 4-8 panes stream data simultaneously
- **Small bundle** — ~5KB runtime vs React's ~40KB
- **Larger ecosystem** than Solid.js — more component libraries, better tooling
### Why NOT Solid.js (initially considered)
- Ecosystem too small for production use
- Svelte 5 runes eliminated the ceremony gap
### Why NOT React
- VDOM reconciliation across 4-8 simultaneously updating panes = CPU waste
- Larger bundle, state management complexity (Redux/Zustand needed)
---
## 7. Claude Code CLI Observation (v2 Research, 2026-03-05)
Three observation tiers for Claude sessions:
1. **SDK sessions** (best): Full structured streaming, subagent detection, hooks, cost tracking
2. **CLI with stream-json** (good): `claude -p "prompt" --output-format stream-json` — structured output but non-interactive
3. **Interactive CLI** (fallback): Tail JSONL session files at `~/.claude/projects/<encoded-dir>/<session-uuid>.jsonl` + show terminal via xterm.js
### JSONL Session Files
Path encoding: `/home/user/project` -> `-home-user-project`. Append-only, written immediately. Can be `tail -f`'d for external observation.
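The path encoding can be sketched as below. Caveat: the real CLI may normalize additional characters; this sketch assumes only `/` maps to `-`, as in the example above.

```typescript
// Encode an absolute project path into the JSONL directory name.
// Assumption: only '/' is replaced; other characters pass through unchanged.
function encodeProjectDir(absolutePath: string): string {
  return absolutePath.replace(/\//g, '-');
}
```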
### Hooks (SDK only)
`SubagentStart`, `SubagentStop` (gives `agent_transcript_path`), `PreToolUse`, `PostToolUse`, `Stop`, `Notification`, `TeammateIdle`
---
## 8. Agent Teams (v2 Research, 2026-03-05)
`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` enables full independent Claude Code instances sharing a task list and mailbox.
- 3-5 teammates is the practical sweet spot (linear token cost)
- Display modes: in-process (Shift+Down cycles), tmux (own pane each), auto
- Session resumption is broken for in-process teammates
- Agent Orchestrator is the ideal frontend for Agent Teams — each teammate gets its own ProjectBox
---
## 9. Competing Approaches (v2 Research, 2026-03-05)
- **claude-squad** (Go+tmux): Most adopted multi-agent manager
- **agent-deck**: MCP socket pooling (~85-90% memory savings)
- **Git worktrees**: Dominant isolation strategy for parallel Claude sessions
---
## 10. Adversarial Architecture Review (v3, 2026-03-07)
Three specialized agents reviewed the v3 Mission Control architecture before implementation. This adversarial process caught 12 issues (4 critical) that would have required expensive rework if discovered later.
### Agent: Architect (Advocate)
Proposed the core design:
- **Project Groups** as primary organizational unit (replacing free-form panes)
- **JSON config** for human-editable definitions, SQLite for runtime state
- **Single shared sidecar** with per-project isolation via `cwd`, `claude_config_dir`, `session_id`
- **Component split:** AgentPane -> AgentSession + TeamAgentsPanel
- **MVP boundary at Phase 5** (5 phases core, 5 polish)
### Agent: Devil's Advocate
Found 12 issues across the Architect's proposal:
| # | Issue | Severity | Why It Matters |
|---|-------|----------|----------------|
| 1 | xterm.js 4-instance ceiling | **Critical** | WebKit2GTK OOMs at ~5 instances. 5 projects x 1 terminal = immediate wall. |
| 2 | Single sidecar = SPOF | **Critical** | One crash kills all 5 project agents. No isolation. |
| 3 | Layout store has no workspace concept | **Critical** | v2 pane-based store cannot represent project groups. Full rewrite needed. |
| 4 | 384px per project on 1920px | **Critical** | 5 projects on 1920px = 384px each — too narrow for code. Must adapt to viewport. |
| 5 | Session identity collision | Major | Without persisted `sdkSessionId`, resuming wrong session corrupts state. |
| 6 | JSON + SQLite = split-brain risk | Major | Two sources of truth can diverge. Must clearly separate config vs state. |
| 7 | Dispatcher has no project scoping | Major | Singleton routes all messages globally. Needs projectId and per-project cleanup. |
| 8 | Markdown discovery undefined | Minor | No spec for which .md files appear in Docs tab. |
| 9 | Keyboard shortcut conflicts | Major | Three input layers can conflict without explicit precedence. |
| 10 | Remote machine support orphaned | Major | v2 remote UI doesn't map to project model. |
| 11 | No graceful degradation | Major | Broken CWD or git could fail the whole group. |
| 12 | Flat event stream wastes CPU | Minor | Messages for hidden projects still process through adapters. |
All 12 resolved before implementation. Critical items addressed in architecture. Major items implemented in MVP or deferred to v3.1 with rationale.
### Agent: UX + Performance Specialist
Provided concrete wireframes and performance budgets:
- **Adaptive layout** formula: 5 at 5120px, 3 at 1920px, 1 with scroll at <1600px
- **xterm budget:** 4 active max, suspend/resume < 50ms
- **Memory budget:** ~225MB total (4 xterm @ 20MB + Tauri + SQLite + agent stores)
- **Workspace switch:** <100ms perceived (serialize scrollbacks + unmount/mount)
- **RAF batching:** For 5 concurrent agent streams, batch DOM updates to avoid layout thrashing
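The adaptive-layout formula above can be expressed as a pure function. The breakpoints (5120px, 1920px) come from the review notes; the function name is made up, and since the 1600-1920px band is not specified, this sketch treats everything below 1920px as single-project.

```typescript
// Illustrative adaptive-layout breakpoints from the UX review.
function visibleProjectCount(viewportWidth: number): number {
  if (viewportWidth >= 5120) return 5; // ultrawide: full fleet
  if (viewportWidth >= 1920) return 3; // standard desktop
  return 1; // narrow: single project with horizontal scroll
}
```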
---
## 11. Provider Adapter Coupling Analysis (v3, 2026-03-11)
Before implementing multi-provider support, a systematic coupling analysis mapped every Claude-specific dependency. 13+ files examined and classified into 4 severity levels.
### Coupling Severity Map
**CRITICAL — hardcoded SDK, must abstract:**
- `sidecar/agent-runner.ts` — imports Claude Agent SDK, calls `query()`, hardcoded `findClaudeCli()`. Became `claude-runner.ts` with other providers getting separate runners.
- `agor-core/src/sidecar.rs``AgentQueryOptions` had no `provider` field. `SidecarCommand` hardcoded runner path. Added provider-based runner selection.
- `src/lib/adapters/sdk-messages.ts``parseMessage()` assumed Claude SDK JSON format. Became `claude-messages.ts` with per-provider parsers.
**HIGH — TS mirror types, provider-specific commands:**
- `agent-bridge.ts``AgentQueryOptions` interface mirrored Rust with no provider field.
- `lib.rs``claude_list_profiles`, `claude_list_skills` are Claude-specific (kept, gated by capability).
- `claude-bridge.ts` — provider-specific adapter (kept, genericized via `provider-bridge.ts`).
**MEDIUM — provider-aware routing:**
- `agent-dispatcher.ts` — called `parseMessage()` (Claude-specific), subagent tool names hardcoded.
- `AgentPane.svelte` — profile selector, skill autocomplete assumed Claude.
**LOW — already generic:**
- `agents.svelte.ts`, `health.svelte.ts`, `conflicts.svelte.ts` — provider-agnostic.
- `agor-relay/` — forwards `AgentQueryOptions` as-is.
### Key Insights
1. **Sidecar is the natural abstraction boundary.** Each provider needs its own runner because SDKs are incompatible.
2. **Message format is the main divergence point.** Per-provider adapters normalize to `AgentMessage`.
3. **Capability flags eliminate provider switches.** UI checks `capabilities.hasProfiles` instead of `provider === 'claude'`.
4. **Env var stripping is provider-specific.** Claude strips `CLAUDE*`, Codex strips `CODEX*`, Ollama strips nothing.
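The capability-flag pattern from insight 3 might look like this. The field names (`hasProfiles`, `hasSkills`, `hasSubagents`) and per-provider values are illustrative, not the project's actual capability table.

```typescript
// Hypothetical capability flags per provider.
interface ProviderCapabilities {
  hasProfiles: boolean;
  hasSkills: boolean;
  hasSubagents: boolean;
}

const CAPABILITIES: Record<string, ProviderCapabilities> = {
  claude: { hasProfiles: true, hasSkills: true, hasSubagents: true },
  codex: { hasProfiles: false, hasSkills: false, hasSubagents: false },
  ollama: { hasProfiles: false, hasSkills: false, hasSubagents: false },
};

// UI code branches on capabilities, never on the provider id itself.
function showProfileSelector(provider: string): boolean {
  return CAPABILITIES[provider]?.hasProfiles ?? false;
}
```

Adding a fourth provider then means adding one row to the table, with no `provider === '...'` switches scattered through components.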
---
## 12. Codebase Reuse Analysis: v2 to v3 (2026-03-07)
### Survived (with modifications)
| Component/Module | Modifications |
|-----------------|---------------|
| TerminalPane.svelte | Added suspend/resume lifecycle for xterm budget |
| MarkdownPane.svelte | Unchanged |
| AgentTree.svelte | Reused inside AgentSession |
| StatusBar.svelte | Rewritten for workspace store (group name, fleet status, attention queue) |
| ToastContainer.svelte | Unchanged |
| agents.svelte.ts | Added projectId field to AgentSession |
| theme.svelte.ts | Unchanged |
| notifications.svelte.ts | Unchanged |
| All adapters | Minor updates for provider routing |
| All Rust backend | Added new modules (btmsg, bttask, search, secrets, plugins) |
### Replaced
| v2 Component | v3 Replacement | Reason |
|-------------|---------------|--------|
| layout.svelte.ts | workspace.svelte.ts | Pane-based model -> project-group model |
| TilingGrid.svelte | ProjectGrid.svelte | Free-form grid -> fixed project boxes |
| PaneContainer.svelte | ProjectBox.svelte | Generic pane -> per-project container with 11 tabs |
| SessionList.svelte | ProjectHeader + CommandPalette | Sidebar list -> inline headers + Ctrl+K |
| SettingsDialog.svelte | SettingsTab.svelte | Modal dialog -> sidebar drawer tab |
| AgentPane.svelte | AgentSession + TeamAgentsPanel | Monolithic -> split for team support |
| App.svelte | Full rewrite | Tab bar -> VSCode-style sidebar layout |
### Dropped (v3.0)
| Feature | Reason |
|---------|--------|
| Detached pane mode | Doesn't fit workspace model (projects are grouped) |
| Drag-resize splitters | Project boxes have fixed internal layout |
| Layout presets | Replaced by adaptive project count from viewport |
| Remote machine UI | Deferred to v3.1 (elevated to project level) |
---
## 13. Session Anchor Design (v3, 2026-03-12)
Session anchors solve context loss during Claude's automatic context compaction.
### Problem
When Claude's context window fills up (~80% of model limit), the SDK automatically compacts older turns. This is lossy — important early decisions, architecture context, and debugging breakthroughs can be permanently lost.
### Design Decisions
1. **Auto-anchor on first compaction** — Automatically captures the first 3 turns when compaction is first detected. Preserves the session's initial context (task definition, first architecture decisions).
2. **Observation masking** — Tool outputs (Read results, Bash output) are compacted in anchors, but reasoning text is preserved in full. Dramatically reduces anchor token cost while keeping important reasoning.
3. **Budget system** — Fixed scales (2K/6K/12K/20K tokens) instead of percentage-based. "6,000 tokens" is more intuitive than "15% of context."
4. **Re-injection via system prompt** — Promoted anchors are serialized and injected as the `system_prompt` field. Simplest integration with the SDK — no conversation history modification needed.
---
## 14. Multi-Agent Orchestration Design (v3, 2026-03-11)
### Evaluated Approaches
| Approach | Pros | Cons | Decision |
|----------|------|------|----------|
| Claude Agent Teams (native) | Zero custom code, SDK-managed | Experimental, session resume broken | Supported but not primary |
| Message bus (Redis/NATS) | Proven, scalable | Runtime dependency, deployment complexity | Rejected |
| Shared SQLite + CLI tools | Zero deps, agents use shell | Polling-based, no real-time push | **Selected** |
| MCP server for agent comm | Standard protocol | Overhead per message, complex setup | Rejected |
### Why SQLite + CLI
Agents run Claude Code sessions with full shell access, so Python CLI tools (`btmsg`, `bttask`) that read and write SQLite are the lowest-friction integration:
- Zero configuration (`btmsg send architect "review this"`)
- No runtime services (no Redis, no MCP server)
- WAL mode handles concurrent access from multiple agent processes
- Same database readable by Rust backend for UI display
- 5s polling is acceptable — agents don't need millisecond latency
### Role Hierarchy
4 Tier 1 roles based on common development workflows:
- **Manager** — coordinates work (tech lead assigning sprint tasks). Unique: Task board tab, full bttask CRUD.
- **Architect** — designs solutions (senior engineer doing design reviews). Unique: PlantUML tab.
- **Tester** — runs tests (QA monitoring test suites). Unique: Selenium + Tests tabs.
- **Reviewer** — reviews code (processing PR queue). Unique: review queue depth in attention scoring.
---
## 15. Theme System Evolution (v3, 2026-03-07)
### Phase 1: 4 Catppuccin Flavors (v2)
Mocha, Macchiato, Frappe, Latte. All colors mapped to 26 `--ctp-*` CSS custom properties.
### Phase 2: +7 Editor Themes
VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark. Same 26 variables — zero component changes. `CatppuccinFlavor` type generalized to `ThemeId`.
### Phase 3: +6 Deep Dark Themes
Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper (warm dark), Midnight (pure OLED black). Same mapping.
### Key Decision
All 17 themes map to the same CSS custom property names. No component ever needs to know which theme is active. Adding new themes is a pure data operation: define 26 color values and add to `THEME_LIST`.
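The "pure data operation" above can be sketched as follows. The `--ctp-*` names follow the convention mentioned earlier, but the theme shape and values here are illustrative (the real mapping has 26 variables per theme).

```typescript
// Illustrative theme data model: every theme is just a map of CSS variables.
type ThemeId = 'mocha' | 'nord' | 'vesper';

interface ThemeDefinition {
  id: ThemeId;
  label: string;
  colors: Record<string, string>; // '--ctp-*' variable -> hex value
}

const THEME_LIST: ThemeDefinition[] = [
  { id: 'nord', label: 'Nord', colors: { '--ctp-base': '#2e3440', '--ctp-text': '#d8dee9' } },
];

// Components read only the CSS variables, so swapping themes touches no component code.
function applyTheme(
  theme: ThemeDefinition,
  setProperty: (name: string, value: string) => void,
): void {
  for (const [name, value] of Object.entries(theme.colors)) setProperty(name, value);
}
```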
---
## 16. Performance Measurements (v3, 2026-03-11)
### xterm.js Canvas Performance
WebKit2GTK lacks WebGL — xterm.js falls back to Canvas 2D:
- **Latency:** ~20-30ms per keystroke (acceptable for AI output)
- **Memory:** ~20MB per active instance
- **OOM threshold:** ~5 simultaneous instances causes WebKit2GTK crash
- **Mitigation:** 4-instance budget with suspend/resume
### Tauri IPC Latency
- **Linux:** ~5ms for typical payloads
- **Terminal keystroke echo:** 5ms IPC + xterm render = 10-15ms total
- **Agent message forwarding:** Negligible (human-readable speed)
### SQLite WAL Concurrent Access
Both databases accessed concurrently by Rust backend + Python CLIs + frontend reads via IPC. WAL mode with 5s busy_timeout handles this reliably. 5-minute checkpoint prevents WAL growth.
### Workspace Switch Latency
- Serialize 4 xterm scrollbacks: ~30ms
- Destroy 4 xterm instances: ~10ms
- Unmount ProjectGrid children: ~5ms
- Mount new group: ~20ms
- Create new xterm instances: ~35ms
- **Total perceived: ~100ms** (acceptable)

# Multi-Machine Support — Architecture & Implementation
**Status: Implemented (Phases A-D complete, 2026-03-06)**
## Overview
Extend Agents Orchestrator to manage Claude agent sessions and terminal panes running on **remote machines** over WebSocket, while keeping the local sidecar path unchanged.
## Problem
Current architecture is local-only:
```
WebView ←→ Rust (Tauri IPC) ←→ Local Sidecar (stdio NDJSON)
←→ Local PTY (portable-pty)
```
Target state: Agents Orchestrator acts as a **mission control** that observes agents and terminals running on multiple machines (dev servers, cloud VMs, CI runners).
## Design Constraints
1. **Zero changes to local path** — local sidecar/PTY must work identically
2. **Same NDJSON protocol** — remote and local agents speak the same message format
3. **No new runtime dependencies** — use Rust's `tokio-tungstenite` (already available via Tauri)
4. **Graceful degradation** — remote machine goes offline → pane shows disconnected state, reconnects automatically
5. **Security** — all remote connections authenticated and encrypted (TLS + token)
## Architecture
### Three-Layer Model
```
┌──────────────────────────────────────────────────────────────────┐
│ Agents Orchestrator (Controller) │
│ │
│ ┌──────────┐ Tauri IPC ┌──────────────────────────────┐ │
│ │ WebView │ ←────────────→ │ Rust Backend │ │
│ │ (Svelte) │ │ │ │
│ └──────────┘ │ ├── PtyManager (local) │ │
│ │ ├── SidecarManager (local) │ │
│ │ └── RemoteManager ──────────┼──┤
│ └──────────────────────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────┘
│ │
│ (local stdio) │ (WebSocket wss://)
▼ ▼
┌───────────┐ ┌──────────────────────┐
│ Local │ │ Remote Machine │
│ Sidecar │ │ │
│ (Deno/ │ │ ┌────────────────┐ │
│ Node.js) │ │ │ agor-relay│ │
│ │ │ │ (Rust binary) │ │
└───────────┘ │ │ │ │
│ │ ├── PTY mgr │ │
│ │ ├── Sidecar mgr│ │
│ │ └── WS server │ │
│ └────────────────┘ │
└──────────────────────┘
```
### Components
#### 1. `agor-relay` — Remote Agent (Rust binary)
A standalone Rust binary that runs on each remote machine. It:
- Listens on a WebSocket port (default: 9750)
- Manages local PTYs and claude sidecar processes
- Forwards NDJSON events to the controller over WebSocket
- Receives commands (query, stop, resize, write) from the controller
**Why a Rust binary?** Reuses existing `PtyManager` and `SidecarManager` code from `src-tauri/src/`. Extracted into a shared crate.
```
agor-relay/
├── Cargo.toml # depends on agor-core
├── src/
│ └── main.rs # WebSocket server + auth
agor-core/ # shared crate (extracted from src-tauri)
├── Cargo.toml
├── src/
│ ├── pty.rs # PtyManager (from src-tauri/src/pty.rs)
│ ├── sidecar.rs # SidecarManager (from src-tauri/src/sidecar.rs)
│ └── lib.rs
```
#### 2. `RemoteManager` — Controller-Side (in Rust backend)
New module in `src-tauri/src/remote.rs`. Manages WebSocket connections to multiple relays.
```rust
pub struct RemoteMachine {
pub id: String,
pub label: String,
pub url: String, // wss://host:9750
pub token: String, // auth token
pub status: RemoteStatus, // connected | connecting | disconnected | error
}
pub enum RemoteStatus {
Connected,
Connecting,
Disconnected,
Error(String),
}
pub struct RemoteManager {
machines: Arc<Mutex<Vec<RemoteMachine>>>,
connections: Arc<Mutex<HashMap<String, WsConnection>>>,
}
```
#### 3. Frontend Adapters — Unified Interface
The frontend doesn't care whether a pane is local or remote. The bridge layer abstracts this:
```typescript
// adapters/agent-bridge.ts — extended
export async function queryAgent(options: AgentQueryOptions): Promise<void> {
if (options.remote_machine_id) {
return invoke('remote_agent_query', { machineId: options.remote_machine_id, options });
}
return invoke('agent_query', { options });
}
```
Same pattern for `pty-bridge.ts` — add optional `remote_machine_id` to all operations.
## Protocol
### WebSocket Wire Format
Same NDJSON as local sidecar, wrapped in an envelope for multiplexing:
```typescript
// Controller → Relay (commands)
interface RelayCommand {
id: string; // request correlation ID
type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
| 'agent_query' | 'agent_stop' | 'sidecar_restart'
| 'ping';
payload: Record<string, unknown>;
}
// Relay → Controller (events)
interface RelayEvent {
type: 'pty_data' | 'pty_exit' | 'pty_created'
| 'sidecar_message' | 'sidecar_exited'
| 'error' | 'pong' | 'ready';
sessionId?: string;
payload: unknown;
}
```
### Authentication
1. **Pre-shared token** — relay starts with `--token <secret>`. Controller sends token in WebSocket upgrade headers (`Authorization: Bearer <token>`).
2. **TLS required** — relay rejects non-TLS connections in production mode. Dev mode allows `ws://` with `--insecure` flag.
3. **Token rotation** — future: relay exposes endpoint to rotate token. Controller stores tokens in SQLite settings table.
### Connection Lifecycle
```
Controller Relay
│ │
│── WSS connect ─────────────────→│
│── Authorization: Bearer token ──→│
│ │
│←── { type: "ready", ...} ───────│
│ │
│── { type: "ping" } ────────────→│
│←── { type: "pong" } ────────────│ (every 15s)
│ │
│── { type: "agent_query", ... }──→│
│←── { type: "sidecar_message" }──│ (streaming)
│←── { type: "sidecar_message" }──│
│ │
│ (disconnect) │
│── reconnect (exp backoff) ─────→│ (1s, 2s, 4s, 8s, max 30s)
```
### Reconnection (Implemented)
- Controller reconnects with exponential backoff (1s, 2s, 4s, 8s, 16s, 30s cap)
- Reconnection runs as an async tokio task spawned on disconnect
- Uses `attempt_tcp_probe()`: TCP connect only (no WS upgrade), 5s timeout, default port 9750. Avoids allocating per-connection resources (PtyManager, SidecarManager) on the relay during probes.
- Emits `remote-machine-reconnecting` event (with backoff duration) and `remote-machine-reconnect-ready` when probe succeeds
- Frontend listens via `onRemoteMachineReconnecting` and `onRemoteMachineReconnectReady` in remote-bridge.ts; machines store sets status to 'reconnecting' and auto-calls `connectMachine()` on ready
- Cancels if machine is removed or manually reconnected (checks status == "disconnected" && connection == None)
- On reconnect, relay sends current state snapshot (active sessions, PTY list)
- Controller reconciles: updates pane states, re-subscribes to streams
- Active agent sessions continue on relay regardless of controller connection
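The backoff schedule above (1s, 2s, 4s, 8s, 16s, capped at 30s) reduces to a one-line pure function; the function name is illustrative.

```typescript
// Exponential backoff with a 30-second cap, matching the schedule in the notes.
function reconnectDelaySecs(attempt: number): number {
  return Math.min(30, 2 ** attempt);
}
```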
## Session Persistence Across Reconnects
Key insight: **remote agents keep running even when the controller disconnects**. The relay is autonomous — it doesn't need the controller to operate.
On reconnect:
1. Relay sends `{ type: "state_sync", activeSessions: [...], activePtys: [...] }`
2. Controller matches against known panes, updates status
3. Missed messages are NOT replayed (too complex, marginal value). Agent panes show "reconnected — some messages may be missing" notice
## Frontend Integration
### Pane Model Changes
```typescript
// stores/layout.svelte.ts
export interface Pane {
id: string;
type: 'terminal' | 'agent';
title: string;
group?: string;
remoteMachineId?: string; // NEW: undefined = local
}
```
### Sidebar — Machine Groups
Remote panes auto-group by machine label in the sidebar:
```
▾ Local
├── Terminal 1
└── Agent: fix bug
▾ devbox (192.168.1.50) ← remote machine
├── SSH session
└── Agent: deploy
▾ ci-runner (10.0.0.5) ← remote machine (disconnected)
└── Agent: test suite ⚠️
```
### Settings Panel
New "Machines" section in settings:
| Field | Type | Notes |
|-------|------|-------|
| Label | string | Human-readable name |
| URL | string | `wss://host:9750` |
| Token | password | Pre-shared auth token |
| Auto-connect | boolean | Connect on app launch |
Stored in SQLite `settings` table as JSON: `remote_machines` key.
## Implementation (All Phases Complete)
### Phase A: Extract `agor-core` crate [DONE]
- Cargo workspace at the repository root (Cargo.toml with members: src-tauri, agor-core, agor-relay)
- PtyManager and SidecarManager extracted to agor-core/
- EventSink trait (agor-core/src/event.rs) abstracts event emission
- TauriEventSink (src-tauri/src/event_sink.rs) implements EventSink for AppHandle
- src-tauri pty.rs and sidecar.rs are thin re-export wrappers
### Phase B: Build `agor-relay` binary [DONE]
- agor-relay/src/main.rs — WebSocket server (tokio-tungstenite)
- Token auth on WebSocket upgrade (Authorization: Bearer header)
- CLI: --port (default 9750), --token (required), --insecure (allow ws://)
- Routes RelayCommand to agor-core managers, forwards RelayEvent over WebSocket
- Rate limiting: 10 failed auth attempts triggers 5-minute lockout
- Per-connection isolated PtyManager + SidecarManager instances
- Command response propagation: structured responses (pty_created, pong, error) sent back via shared event channel
- send_error() helper: all command failures emit RelayEvent with commandId + error message
- PTY creation confirmation: pty_create command returns pty_created event with session ID and commandId for correlation
### Phase C: Add `RemoteManager` to controller [DONE]
- src-tauri/src/remote.rs — RemoteManager struct with WebSocket client connections
- 12 Tauri commands: remote_add_machine, remote_remove_machine, remote_connect, remote_disconnect, remote_list_machines, remote_pty_spawn/write/resize/kill, remote_agent_query/stop, remote_sidecar_restart
- Heartbeat ping every 15s
- PTY creation event: emits `remote-pty-created` Tauri event with machineId, ptyId, commandId
- Exponential backoff reconnection on disconnect (1s/2s/4s/8s/16s/30s cap) via `attempt_tcp_probe()` (TCP-only, no WS upgrade)
- Reconnection events: `remote-machine-reconnecting`, `remote-machine-reconnect-ready`
### Phase D: Frontend integration [DONE]
- src/lib/adapters/remote-bridge.ts — machine management IPC adapter
- src/lib/stores/machines.svelte.ts — remote machine state store
- Pane.remoteMachineId field in layout store
- agent-bridge.ts and pty-bridge.ts route to remote commands when remoteMachineId is set
- SettingsDialog "Remote Machines" section
- Sidebar auto-groups remote panes by machine label
### Remaining Work
- [x] Reconnection logic with exponential backoff (1s-30s cap) — implemented in remote.rs
- [x] Relay command response propagation (pty_created, pong, error events) — implemented in main.rs
- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning
## Security Considerations
| Threat | Mitigation |
|--------|-----------|
| Token interception | TLS required (reject `ws://` without `--insecure`) |
| Token brute-force | Rate limit auth attempts (5/min), lockout after 10 failures |
| Relay impersonation | Pin relay certificate fingerprint (future: mTLS) |
| Command injection | Relay validates all command payloads against schema |
| Lateral movement | Relay runs as unprivileged user, no shell access beyond PTY/sidecar |
| Data exfiltration | Agent output streams to controller only, no relay-to-relay traffic |
## Performance Considerations
| Concern | Mitigation |
|---------|-----------|
| WebSocket latency | Typical LAN: <1ms. WAN: 20-100ms. Acceptable for agent output (text, not video) |
| Bandwidth | Agent NDJSON: ~50KB/s peak. Terminal: ~200KB/s peak. Trivial even on slow links |
| Connection count | Max 10 machines initially (UI constraint, not technical) |
| Message ordering | Single WebSocket per machine = ordered delivery guaranteed |
## What This Does NOT Cover (Future)
- **Multi-controller** — multiple Agents Orchestrator instances observing the same relay (needs pub/sub)
- **Relay discovery** — automatic detection of relays on LAN (mDNS/Bonjour)
- **Agent migration** — moving a running agent from one machine to another
- **Relay-to-relay** — direct communication between remote machines
- **mTLS** — mutual TLS for enterprise environments (Phase B+ enhancement)

# Multi-Agent Orchestration
Agent Orchestrator supports running multiple AI agents that communicate with each other, coordinate work through a shared task board, and are managed by a hierarchy of specialized roles. This document covers the inter-agent messaging system (btmsg), the task board (bttask), agent roles and system prompts, and the auto-wake scheduler.
---
## Agent Roles (Tier 1 and Tier 2)
Agents are organized into two tiers:
### Tier 1 — Management Agents
Defined in `groups.json` under a group's `agents[]` array. Each management agent gets a full ProjectBox in the UI (converted via `agentToProject()` in the workspace store). They have role-specific capabilities, tabs, and system prompts.
| Role | Tabs | btmsg Permissions | bttask Permissions | Purpose |
|------|------|-------------------|-------------------|---------|
| **Manager** | Model, Tasks | Full (send, receive, create channels) | Full CRUD | Coordinates work, creates/assigns tasks, delegates to subagents |
| **Architect** | Model, Architecture | Send, receive | Read-only + comments | Designs solutions, creates PlantUML diagrams, reviews architecture |
| **Tester** | Model, Selenium, Tests | Send, receive | Read-only + comments | Runs tests, monitors screenshots, discovers test files |
| **Reviewer** | Model, Tasks | Send, receive | Read + status + comments | Reviews code, manages review queue, approves/rejects tasks |
### Tier 2 — Project Agents
Regular `ProjectConfig` entries in `groups.json`. Each project gets its own Claude session with optional custom context via `project.systemPrompt`. They have standard tabs (Model, Docs, Context, Files, SSH, Memory) but no role-specific tabs.
### System Prompt Generation
Tier 1 agents receive auto-generated system prompts built by `generateAgentPrompt()` in `utils/agent-prompts.ts`. The prompt has 7 sections:
1. **Identity** — Role name, project context, team membership
2. **Environment** — Working directory, available tools, shell info
3. **Team** — List of other agents in the group with their roles
4. **btmsg documentation** — CLI usage, channel commands, message format
5. **bttask documentation** — CLI usage, task lifecycle, role-specific permissions
6. **Custom context** — Optional `project.systemPrompt` (Tier 2) or role-specific instructions
7. **Workflow** — Role-specific workflow guidelines (e.g., Manager delegates, Reviewer checks review queue)
Tier 2 agents receive only the custom context section (if `project.systemPrompt` is set), injected as the `system_prompt` field in AgentQueryOptions.
### BTMSG_AGENT_ID
Tier 1 agents receive the `BTMSG_AGENT_ID` environment variable, injected via `extra_env` in AgentQueryOptions. This flows through 5 layers: TypeScript → Rust AgentQueryOptions → NDJSON → JS runner → SDK env. The CLI tools (`btmsg`, `bttask`) read this variable to identify which agent is sending messages or creating tasks.
### Periodic Re-injection
LLM context degrades over long sessions as important instructions scroll out of the context window. To counter this, AgentSession runs a 1-hour timer that re-sends the system prompt when the agent is idle. The mechanism:
1. AgentSession timer fires after 60 minutes of agent inactivity
2. Sets `autoPrompt` flag, which AgentPane reads via `onautopromptconsumed` callback
3. AgentPane calls `startQuery()` with `resume=true` and the refresh prompt
4. The agent receives the role/tools reminder as a follow-up message
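A minimal sketch of the idle check behind step 1, assuming a periodic timer compares the last activity timestamp against the 60-minute threshold (the field names here are illustrative; the real logic lives in AgentSession):

```typescript
// Re-inject the system prompt after 60 minutes of agent inactivity.
const REINJECT_AFTER_MS = 60 * 60 * 1000;

interface SessionLike {
  lastActivityAt: number; // ms epoch of the agent's last output
  autoPrompt: boolean;    // flag AgentPane consumes via onautopromptconsumed
}

/** Called on a periodic timer: arm autoPrompt once the agent has been idle 60 min. */
function maybeArmAutoPrompt(s: SessionLike, now: number): void {
  if (!s.autoPrompt && now - s.lastActivityAt >= REINJECT_AFTER_MS) {
    s.autoPrompt = true;
  }
}
```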
---
## btmsg — Inter-Agent Messaging
btmsg is a messaging system that lets agents communicate with each other. It consists of a Rust backend (SQLite), a Python CLI tool (for agents to use in their shell), and a Svelte frontend (CommsTab).
### Architecture
```
Agent (via btmsg CLI)
├── btmsg send <recipient> "message" → writes to btmsg.db
├── btmsg read → reads from btmsg.db
├── btmsg channel create #review-queue → creates channel
├── btmsg channel post #review-queue "msg" → posts to channel
└── btmsg heartbeat → updates agent heartbeat
btmsg.db (SQLite, WAL mode, ~/.local/share/agor/btmsg.db)
├── agents table — registered agents with roles
├── messages table — DMs and channel messages
├── channels table — named channels (#review-queue, #review-log)
├── contacts table — ACL (who can message whom)
├── heartbeats table — agent liveness tracking
├── dead_letter_queue — undeliverable messages
└── audit_log — all operations for debugging
Rust Backend (btmsg.rs, ~600 lines)
├── btmsg_list_messages, btmsg_send_message, ...
├── 15+ Tauri commands for full CRUD
└── Shared database connection (WAL + 5s busy_timeout)
Frontend (btmsg-bridge.ts → CommsTab.svelte)
├── Activity feed — all messages across all agents
├── DM view — direct messages between specific agents
└── Channel view — channel messages (#review-queue, etc.)
```
### Database Schema
The btmsg database (`btmsg.db`) stores all messaging data:
| Table | Purpose | Key Columns |
|-------|---------|-------------|
| `agents` | Agent registry | id, name, role, project_id, status, created_at |
| `messages` | All messages | id, sender_id, recipient_id, channel_id, content, read, created_at |
| `channels` | Named channels | id, name, created_by, created_at |
| `contacts` | ACL | agent_id, contact_id (bidirectional) |
| `heartbeats` | Liveness | agent_id, last_heartbeat, status |
| `dead_letter_queue` | Failed delivery | message_id, reason, created_at |
| `audit_log` | All operations | id, event_type, agent_id, details, created_at |
### CLI Usage (for agents)
Agents use the `btmsg` Python CLI tool in their shell. The tool reads `BTMSG_AGENT_ID` to identify the sender:
```bash
# Send a direct message
btmsg send architect "Please review the auth module design"
# Read unread messages
btmsg read
# Create a channel
btmsg channel create #architecture-decisions
# Post to a channel
btmsg channel post #review-queue "PR #42 ready for review"
# Send heartbeat (agents do this periodically)
btmsg heartbeat
# List all agents
btmsg agents
```
### Frontend (CommsTab)
The CommsTab component (rendered in ProjectBox for all agents) shows:
- **Activity Feed** — chronological view of all messages across all agents
- **DMs** — direct message threads between agents
- **Channels** — named channel message streams
- Polling-based updates (5s interval)
### Dead Letter Queue
Messages sent to non-existent or offline agents are moved to the dead letter queue instead of being silently dropped. The Rust backend checks agent status before delivery and queues failures. The Manager agent's health dashboard shows dead letter count.
### Audit Logging
Every btmsg operation is logged to the `audit_log` table with event type, agent ID, and JSON details. Event types include: message_sent, message_read, channel_created, agent_registered, heartbeat, and prompt_injection_detected.
---
## bttask — Task Board
bttask is a kanban-style task board that agents use to coordinate work. It shares the same SQLite database as btmsg (`btmsg.db`) for deployment simplicity.
### Architecture
```
Agent (via bttask CLI)
├── bttask list → list all tasks
├── bttask create "Fix auth bug" → create task (Manager only)
├── bttask status <id> in_progress → update status
├── bttask comment <id> "Done" → add comment
└── bttask review-count → count review queue tasks
btmsg.db → tasks table + task_comments table
Rust Backend (bttask.rs, ~300 lines)
├── 7 Tauri commands: list, create, update_status, delete, add_comment, comments, review_queue_count
└── Optimistic locking via version column
Frontend (bttask-bridge.ts → TaskBoardTab.svelte)
└── Kanban board: 5 columns, 5s poll, drag-and-drop
```
### Task Lifecycle
```
┌──────────┐  assign   ┌─────────────┐  complete   ┌──────────┐
│ Backlog  │──────────►│ In Progress │────────────►│  Review  │
└──────────┘           └─────────────┘             └────┬─────┘
                                                        │
                                        approve ┌───────┴───────┐ reject
                                                ▼               ▼
                                            ┌────────┐   ┌──────────┐
                                            │  Done  │   │ Rejected │
                                            └────────┘   └──────────┘
```
When a task moves to the "Review" column, the system automatically posts a notification to the `#review-queue` btmsg channel. The `ensure_review_channels()` function creates `#review-queue` and `#review-log` channels idempotently on first use.
### Optimistic Locking
To prevent concurrent updates from corrupting task state, bttask uses optimistic locking via a `version` column:
1. Client reads task with current version (e.g., version=3)
2. Client sends update with expected version=3
3. Server's UPDATE query includes `WHERE version = 3`
4. If another client updated first (version=4), the WHERE clause matches 0 rows
5. Server returns a conflict error, client must re-read and retry
This is critical because multiple agents may try to update the same task simultaneously.
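The version-guarded update can be modeled in a few lines. This is an illustrative in-memory sketch, not the Rust implementation; the real backend issues an `UPDATE ... WHERE version = ?` against SQLite and checks the affected-row count:

```typescript
// In-memory model of optimistic locking via a version column.
interface Task { id: number; status: string; version: number }

type UpdateResult = { ok: true; task: Task } | { ok: false; conflict: true };

function updateStatus(
  db: Map<number, Task>,
  id: number,
  expectedVersion: number,
  status: string,
): UpdateResult {
  const task = db.get(id);
  // Equivalent to `WHERE version = ?` matching 0 rows: the caller
  // must re-read the task and retry with the fresh version.
  if (!task || task.version !== expectedVersion) return { ok: false, conflict: true };
  const updated = { ...task, status, version: task.version + 1 };
  db.set(id, updated);
  return { ok: true, task: updated };
}
```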
### Role-Based Permissions
| Role | List | Create | Update Status | Delete | Comments |
|------|------|--------|---------------|--------|----------|
| Manager | Yes | Yes | Yes | Yes | Yes |
| Reviewer | Yes | No | Yes (review decisions) | No | Yes |
| Architect | Yes | No | No | No | Yes |
| Tester | Yes | No | No | No | Yes |
| Project (Tier 2) | Yes | No | No | No | Yes |
Permissions are enforced in the CLI tool based on the agent's role (read from `BTMSG_AGENT_ID` → agents table lookup).
### Review Queue Integration
The Reviewer agent gets special treatment in the attention scoring system:
- `reviewQueueDepth` is an input to attention scoring: 10 points per review task, capped at 50
- Priority: between file_conflict (70) and context_high (40)
- ProjectBox polls `review_queue_count` every 10 seconds for reviewer agents
- Results feed into `setReviewQueueDepth()` in the health store
### Frontend (TaskBoardTab.svelte)
The kanban board renders 5 columns (Backlog, In Progress, Review, Done, Rejected) with task cards. Features:
- 5-second polling for updates
- Click to expand task details + comments
- Manager-only create/delete buttons
- Color-coded status badges
---
## Wake Scheduler
The wake scheduler automatically re-activates idle Manager agents when attention-worthy events occur. It runs in `wake-scheduler.svelte.ts` and supports three user-selectable strategies.
### Strategies
| Strategy | Behavior | Use Case |
|----------|----------|----------|
| **Persistent** | Sends a resume prompt to the existing session | Long-running managers that should maintain context |
| **On-demand** | Starts a fresh session | Managers that work in bursts |
| **Smart** | On-demand, but only when wake score exceeds threshold | Avoids waking for minor events |
Strategy and threshold are configurable per group agent via `GroupAgentConfig.wakeStrategy` and `GroupAgentConfig.wakeThreshold` fields, persisted in `groups.json`.
### Wake Signals
The wake scorer evaluates 6 signals (defined in `types/wake.ts`, scored by `utils/wake-scorer.ts`):
| Signal | Weight | Trigger |
|--------|--------|---------|
| AttentionSpike | 1.0 | Any project's attention score exceeds threshold |
| ContextPressureCluster | 0.9 | Multiple projects have >75% context usage |
| BurnRateAnomaly | 0.8 | Cost rate deviates significantly from baseline |
| TaskQueuePressure | 0.7 | Task backlog grows beyond threshold |
| ReviewBacklog | 0.6 | Review queue has pending items |
| PeriodicFloor | 0.1 | Minimum periodic check (floor signal) |
The pure scoring function in `wake-scorer.ts` is tested with 24 unit tests. The types are in `types/wake.ts` (WakeStrategy, WakeSignal, WakeEvaluation, WakeContext).
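A hedged sketch of how the weights above might combine, assuming active signal weights are summed against the per-agent threshold (the actual combination logic lives in `wake-scorer.ts`):

```typescript
// Signal weights from the table above; assumes active weights are summed.
const WAKE_WEIGHTS = {
  AttentionSpike: 1.0,
  ContextPressureCluster: 0.9,
  BurnRateAnomaly: 0.8,
  TaskQueuePressure: 0.7,
  ReviewBacklog: 0.6,
  PeriodicFloor: 0.1,
} as const;

type WakeSignalName = keyof typeof WAKE_WEIGHTS;

function wakeScore(active: WakeSignalName[]): number {
  return active.reduce((sum, name) => sum + WAKE_WEIGHTS[name], 0);
}

/** Smart strategy: wake only when the combined score clears the threshold. */
function shouldWake(active: WakeSignalName[], threshold: number): boolean {
  return wakeScore(active) >= threshold;
}
```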
### Lifecycle
1. ProjectBox registers manager agents via `$effect` on mount
2. Wake scheduler creates per-manager timers
3. Every 5 seconds, AgentSession polls wake events
4. If score exceeds threshold (for smart strategy), triggers wake
5. On group switch, `clearWakeScheduler()` cancels all timers
6. In test mode (`AGOR_TEST=1`), wake scheduler is disabled via `disableWakeScheduler()`
---
## Health Monitoring & Attention Scoring
The health store (`health.svelte.ts`) tracks per-project health with a 5-second tick timer. It provides the data that feeds the StatusBar, wake scheduler, and attention queue.
### Activity States
| State | Meaning | Visual |
|-------|---------|--------|
| Inactive | No agent running, no recent activity | Dim dot |
| Running | Agent actively processing | Green pulse |
| Idle | Agent finished, waiting for input | Gray dot |
| Stalled | Agent hasn't produced output for >N minutes | Orange pulse |
The stall threshold is configurable per-project via `stallThresholdMin` in ProjectConfig (default 15 min, range 5-60, step 5).
### Attention Scoring
Each project gets an attention score (0-100) based on its current state. The attention queue in the StatusBar shows the top 5 projects sorted by urgency:
| Condition | Score | Priority |
|-----------|-------|----------|
| Stalled agent | 100 | Highest — agent may be stuck |
| Error state | 90 | Agent crashed or API error |
| Context >90% | 80 | Context window nearly full |
| File conflict | 70 | Two agents wrote same file |
| Review queue depth | 10/task, cap 50 | Reviewer has pending reviews |
| Context >75% | 40 | Context pressure building |
The pure scoring function is in `utils/attention-scorer.ts` (14 tests). It takes `AttentionInput` and returns a numeric score.
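A sketch of the scoring table, assuming the highest-matching condition determines the project's score (mirroring the priority ordering above); the `AttentionInput` field names here are illustrative, not necessarily those in `attention-scorer.ts`:

```typescript
// Attention score (0-100): the highest-scoring matching condition wins.
interface AttentionInput {
  stalled: boolean;
  errored: boolean;
  contextPct: number;       // context window usage, 0-100
  fileConflict: boolean;
  reviewQueueDepth: number; // pending review tasks
}

function attentionScore(input: AttentionInput): number {
  const candidates = [
    input.stalled ? 100 : 0,
    input.errored ? 90 : 0,
    input.contextPct > 90 ? 80 : 0,
    input.fileConflict ? 70 : 0,
    Math.min(input.reviewQueueDepth * 10, 50), // 10 per review task, cap 50
    input.contextPct > 75 ? 40 : 0,
  ];
  return Math.max(...candidates);
}
```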
### Burn Rate
Cost tracking uses a 5-minute exponential moving average (EMA) of cost snapshots. The StatusBar displays aggregate $/hr across all running agents.
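One way to realize this, assuming cost deltas arrive on the 5-second health tick and are smoothed over the 5-minute window (the helper and constants are illustrative):

```typescript
// EMA of burn rate ($/hr) from per-tick cost deltas.
const TICK_S = 5;
const WINDOW_S = 5 * 60;
const ALPHA = TICK_S / WINDOW_S; // smoothing factor, ~0.0167

function updateBurnRate(prevEmaPerHr: number, deltaCostUsd: number): number {
  // Extrapolate this tick's spend to an instantaneous $/hr rate,
  // then blend it into the running average.
  const instantPerHr = deltaCostUsd * (3600 / TICK_S);
  return prevEmaPerHr + ALPHA * (instantPerHr - prevEmaPerHr);
}
```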
### File Conflict Detection
The conflicts store (`conflicts.svelte.ts`) detects two types of conflicts:
1. **Agent overlap** — Two agents in the same worktree write the same file (tracked via tool_call analysis in the dispatcher)
2. **External writes** — A file watched by an agent is modified externally (detected via inotify in `fs_watcher.rs`, uses 2s timing heuristic `AGENT_WRITE_GRACE_MS` to distinguish agent writes from external)
Both types show badges in ProjectHeader (orange ⚡ for external, red ⚠ for agent overlap).
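The 2-second grace heuristic reduces to a timestamp comparison; a sketch, with illustrative names (the actual check lives in `fs_watcher.rs`):

```typescript
// Attribute a watcher event to the agent if it lands within the grace
// window after a known agent write; otherwise flag it as external.
const AGENT_WRITE_GRACE_MS = 2000;

function classifyWrite(
  lastAgentWriteAt: number | undefined, // ms epoch, undefined if agent never wrote
  eventAt: number,
): "agent" | "external" {
  if (
    lastAgentWriteAt !== undefined &&
    eventAt - lastAgentWriteAt <= AGENT_WRITE_GRACE_MS
  ) {
    return "agent";
  }
  return "external";
}
```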
---
## Session Anchors
Session anchors preserve important conversation turns through Claude's context compaction process. Without anchors, valuable early context (architecture decisions, debugging breakthroughs) can be lost when the context window fills up.
### Anchor Types
| Type | Created By | Behavior |
|------|-----------|----------|
| **Auto** | System (on first compaction) | Captures first 3 turns, observation-masked (reasoning preserved, tool outputs compacted) |
| **Pinned** | User (pin button in AgentPane) | Marks specific turns as important |
| **Promoted** | User (from pinned) | Re-injectable into future sessions via system prompt |
### Anchor Budget
The budget controls how many tokens are spent on anchor re-injection:
| Scale | Token Budget | Use Case |
|-------|-------------|----------|
| Small | 2,000 | Quick sessions, minimal context needed |
| Medium | 6,000 | Default, covers most scenarios |
| Large | 12,000 | Complex debugging sessions |
| Full | 20,000 | Maximum context preservation |
Configurable per-project via slider in SettingsTab, stored as `ProjectConfig.anchorBudgetScale` in `groups.json`.
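A sketch of how the scale might translate into anchor selection, assuming anchors are kept in order until their estimated tokens exhaust the budget (the map matches the table above; the greedy selection is an assumption, not the serializer's actual algorithm):

```typescript
// Token budgets per anchorBudgetScale, from the table above.
const ANCHOR_BUDGETS = { small: 2000, medium: 6000, large: 12000, full: 20000 } as const;

type AnchorBudgetScale = keyof typeof ANCHOR_BUDGETS;

/** Keep anchors in order until the token budget would be exceeded. */
function fitAnchors(anchorTokens: number[], scale: AnchorBudgetScale): number[] {
  const budget = ANCHOR_BUDGETS[scale];
  const kept: number[] = [];
  let used = 0;
  for (const t of anchorTokens) {
    if (used + t > budget) break;
    kept.push(t);
    used += t;
  }
  return kept;
}
```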
### Re-injection Flow
When a session resumes with promoted anchors:
1. `anchors.svelte.ts` loads promoted anchors for the project
2. `anchor-serializer.ts` serializes them (turn grouping, observation masking, token estimation)
3. `AgentPane.startQuery()` includes serialized anchors in the `system_prompt` field
4. The sidecar passes the system prompt to the SDK
5. Claude receives the anchors as context alongside the new prompt
### Storage
Anchors are persisted in the `session_anchors` table in `sessions.db`. The ContextTab shows an anchor section with a budget meter (derived from the configured scale) and promote/demote buttons.

# Agents Orchestrator v2 — Implementation Phases
See [architecture.md](architecture.md) for system architecture and [decisions.md](decisions.md) for design decisions.
---
## Phase 1: Project Scaffolding [status: complete] — MVP
- [x] Create feature branch `v2-mission-control`
- [x] Initialize Tauri 2.x project with Svelte 5 frontend
- [x] Project structure (see below)
- [x] Basic Tauri window with Catppuccin Mocha CSS variables
- [x] Verify Tauri builds and launches on target system
- [x] Set up dev scripts (dev, build, lint)
### File Structure
```
agents-orchestrator/
src-tauri/
src/
main.rs # Tauri app entry
pty.rs # PTY management (portable-pty, not plugin)
sidecar.rs # Sidecar lifecycle (unified .mjs bundle, Deno-first + Node.js fallback)
watcher.rs # File watcher for markdown viewer
session.rs # Session + SSH session persistence (SQLite via rusqlite)
ctx.rs # Read-only ctx context DB access
Cargo.toml
src/
App.svelte # Root layout + detached pane mode
lib/
components/
Layout/
TilingGrid.svelte # Dynamic tiling manager
PaneContainer.svelte # Individual pane wrapper
Terminal/
TerminalPane.svelte # xterm.js terminal pane (theme-aware)
Agent/
AgentPane.svelte # SDK agent structured output
AgentTree.svelte # Subagent tree visualization (SVG)
Markdown/
MarkdownPane.svelte # Live markdown file viewer (shiki highlighting)
Context/
ContextPane.svelte # ctx database viewer (projects, entries, search)
SSH/
SshDialog.svelte # SSH session create/edit modal
SshSessionList.svelte # SSH session list in sidebar
Sidebar/
SessionList.svelte # Session browser + SSH list
StatusBar/
StatusBar.svelte # Global status bar (pane counts, cost)
Notifications/
ToastContainer.svelte # Toast notification display
Settings/
SettingsDialog.svelte # Settings modal (shell, cwd, max panes, theme)
stores/
sessions.svelte.ts # Session state ($state runes)
agents.svelte.ts # Active agent tracking
layout.svelte.ts # Pane layout state
notifications.svelte.ts # Toast notification state
theme.svelte.ts # Catppuccin theme flavor state
adapters/
sdk-messages.ts # SDK message abstraction layer
pty-bridge.ts # PTY IPC wrapper
agent-bridge.ts # Agent IPC wrapper (local + remote routing)
claude-bridge.ts # Claude profiles + skills IPC wrapper
settings-bridge.ts # Settings IPC wrapper
ctx-bridge.ts # ctx database IPC wrapper
ssh-bridge.ts # SSH session IPC wrapper
remote-bridge.ts # Remote machine management IPC wrapper
session-bridge.ts # Session/layout persistence IPC wrapper
utils/
agent-tree.ts # Agent tree builder (hierarchy from messages)
highlight.ts # Shiki syntax highlighter (lazy singleton)
detach.ts # Detached pane mode (pop-out windows)
updater.ts # Tauri auto-updater utility
styles/
catppuccin.css # Theme CSS variables (Mocha defaults)
themes.ts # All 4 Catppuccin flavor definitions
app.css
sidecar/
agent-runner.ts # Sidecar source (compiled to .mjs by esbuild)
dist/
agent-runner.mjs # Bundled sidecar (runs on both Deno and Node.js)
package.json # Agent SDK dependency
package.json
svelte.config.js
vite.config.ts
tauri.conf.json
```
**Key change from v1:** Using portable-pty directly from Rust instead of tauri-plugin-pty (38-star community plugin). portable-pty is well-maintained (used by WezTerm). More work upfront, more reliable long-term.
---
## Phase 2: Terminal Pane + Layout [status: complete] — MVP
### Layout (responsive)
**32:9 (5120px) — full density:**
```
+--------+------------------------------------+--------+
|Sidebar | 2-4 panes, CSS Grid, resizable | Right |
| 260px | | 380px |
+--------+------------------------------------+--------+
```
**16:9 (1920px) — degraded but functional:**
```
+--------+-------------------------+
|Sidebar | 1-2 panes | (right panel collapses to overlay)
| 240px | |
+--------+-------------------------+
```
- [x] CSS Grid layout with sidebar + main area + optional right panel
- [x] Responsive breakpoints (ultrawide / standard / narrow)
- [x] Pane resize via drag handles (splitter overlays in TilingGrid with mouse drag, min/max 10%/90%)
- [x] Layout presets: 1-col, 2-col, 3-col, 2x2, master+stack
- [ ] Save/restore layout to SQLite (Phase 4)
- [x] Keyboard: Ctrl+1-4 focus pane, Ctrl+N new terminal
### Terminal
- [x] xterm.js with Canvas addon (explicit — no WebGL dependency)
- [x] Catppuccin Mocha theme for xterm.js
- [x] PTY spawn from Rust (portable-pty), stream to frontend via Tauri events
- [x] Terminal resize -> PTY resize (100ms debounce)
- [x] Copy/paste (Ctrl+Shift+C/V) — via attachCustomKeyEventHandler
- [x] SSH session: spawn `ssh` command in PTY (via shell args)
- [x] Local shell: spawn user's $SHELL
- [x] Claude Code CLI: spawn `claude` in PTY (via shell args)
**Milestone: After Phase 2, we have a working multi-pane terminal.** Usable as a daily driver even without agent features.
---
## Phase 3: Agent SDK Integration [status: complete] — MVP
### Backend
- [x] Node.js/Deno sidecar: uses `@anthropic-ai/claude-agent-sdk` query() function (migrated from raw CLI spawning due to piped stdio hang bug #6775)
- [x] Sidecar communication: Rust spawns Node.js, stdio NDJSON
- [x] Sidecar lifecycle: auto-start on app launch, shutdown on exit
- [x] Sidecar lifecycle: detect crash, offer restart in UI (agent_restart command + restart button)
- [x] Tauri commands: agent_query, agent_stop, agent_ready, agent_restart
### Frontend
- [x] SDK message adapter: parses stream-json into 9 typed AgentMessage types (abstraction layer)
- [x] Agent bridge: Tauri IPC adapter (invoke + event listeners)
- [x] Agent dispatcher: singleton routing sidecar events to store, crash detection
- [x] Agent store: session state, message history, cost tracking (Svelte 5 $state)
- [x] Agent pane: renders structured messages
- [x] Text -> plain text (markdown rendering deferred)
- [x] Tool calls -> collapsible cards (tool name + input)
- [x] Tool results -> collapsible cards
- [x] Thinking -> collapsible details
- [x] Init -> model badge
- [x] Cost -> USD/tokens/turns/duration summary
- [x] Errors -> highlighted error card
- [x] Subagent spawn -> auto-creates child agent pane with parent/child navigation (Phase 7)
- [x] Agent status indicator (starting/running/done/error)
- [x] Start/stop agent from UI (prompt form + stop button)
- [x] Auto-scroll with scroll-lock on user scroll-up
- [x] Session resume (follow-up prompt in AgentPane, resume_session_id passed to SDK)
- [x] Keyboard: Ctrl+Shift+N new agent
- [x] Sidebar: agent session button
**Milestone: After Phase 3, we have the core differentiator.** SDK agents run in structured panes alongside raw terminals.
---
## Phase 4: Session Management + Markdown Viewer [status: complete] — MVP
### Sessions
- [x] SQLite persistence for sessions (rusqlite with bundled feature)
- [x] Session types: terminal, agent, markdown (SSH via terminal args)
- [x] Session CRUD: save, delete, update_title, touch (last_used_at)
- [x] Session groups/folders — group_name column, setPaneGroup, grouped sidebar with collapsible headers
- [x] Remember last layout on restart (preset + pane_ids in layout_state table)
- [x] Auto-restore panes on app startup (restoreFromDb in layout store)
### Markdown Viewer
- [x] File watcher (notify crate v6) -> Tauri events -> frontend
- [x] Markdown rendering (marked.js)
- [x] Syntax highlighting (Shiki) — added in Phase 5 (highlight.ts, 13 preloaded languages)
- [x] Open from sidebar (file picker button "M")
- [x] Catppuccin-themed markdown styles (h1-h3, code, pre, tables, blockquotes)
- [x] Live reload on file change
**Milestone: After Phase 4 = MVP ship.** Full session management, structured agent panes, terminal panes, markdown viewer.
---
## Phase 5: Agent Tree + Polish [status: complete] — Post-MVP
- [x] Agent tree visualization (SVG, compact horizontal layout) — AgentTree.svelte + agent-tree.ts utility
- [x] Click tree node -> scroll to message (handleTreeNodeClick in AgentPane, scrollIntoView smooth)
- [x] Aggregate cost per subtree (subtreeCost displayed in yellow below each tree node label)
- [x] Terminal copy/paste (Ctrl+Shift+C/V via attachCustomKeyEventHandler)
- [x] Terminal theme hot-swap (onThemeChange callback registry in theme.svelte.ts, TerminalPane subscribes)
- [x] Pane drag-resize handles (splitter overlays in TilingGrid with mouse drag)
- [x] Session resume (follow-up prompt, resume_session_id to SDK)
- [x] Global status bar (terminal/agent counts, active agents pulse, token/cost totals) — StatusBar.svelte
- [x] Notification system (toast: success/error/warning/info, auto-dismiss 4s, max 5) — notifications.svelte.ts + ToastContainer.svelte
- [x] Agent dispatcher toast integration (agent complete, error, sidecar crash notifications)
- [x] Global keyboard shortcuts — Ctrl+W close focused pane, Ctrl+, open settings
- [x] Settings dialog (default shell, cwd, max panes, theme flavor) — SettingsDialog.svelte + settings-bridge.ts
- [x] Settings backend — settings table in SQLite (session.rs), Tauri commands settings_get/set/list (lib.rs)
- [x] ctx integration — read-only access to ~/.claude-context/context.db (ctx.rs, ctx-bridge.ts, ContextPane.svelte)
- [x] SSH session management — CRUD in SQLite (SshSession struct, SshDialog.svelte, SshSessionList.svelte, ssh-bridge.ts)
- [x] Catppuccin theme flavors — Latte/Frappe/Macchiato/Mocha selectable (themes.ts, theme.svelte.ts)
- [x] Detached pane mode — pop-out terminal/agent into standalone windows (detach.ts, App.svelte)
- [x] Syntax highlighting — Shiki integration for markdown + agent messages (highlight.ts, shiki dep)
---
## Phase 6: Packaging + Distribution [status: complete] — Post-MVP
- [x] install-v2.sh — build-from-source installer with dependency checks (Node.js 20+, Rust 1.77+, system libs)
- Checks: WebKit2GTK, GTK3, GLib, libayatana-appindicator, librsvg, openssl, build-essential, pkg-config, curl, wget, FUSE
- Prompts to install missing packages via apt
- Builds with `npx tauri build`, installs binary as `agents-orchestrator` in `~/.local/bin/`
- Creates desktop entry and installs SVG icon
- [x] Tauri bundle configuration — targets: `["deb", "appimage"]`, category: DeveloperTool
- .deb depends: libwebkit2gtk-4.1-0, libgtk-3-0, libayatana-appindicator3-1
- AppImage: bundleMediaFramework disabled
- [x] Icons regenerated from agor.svg — RGBA PNGs (32x32, 128x128, 128x128@2x, 512x512, .ico)
- [x] GitHub Actions release workflow (`.github/workflows/release.yml`)
- Triggered on `v*` tags, Ubuntu 22.04 runner
- Caches Rust and npm dependencies
- Builds .deb + AppImage, uploads as GitHub Release artifacts
- [x] Build verified: .deb (4.3 MB), AppImage (103 MB)
- [x] Auto-updater plugin integrated (tauri-plugin-updater Rust + @tauri-apps/plugin-updater npm + updater.ts)
- [x] Auto-update latest.json generation in CI (version, platform URL, signature from .sig file)
- [x] release.yml: TAURI_SIGNING_PRIVATE_KEY env vars passed to build step
- [x] Auto-update signing key generated, pubkey set in tauri.conf.json
- [x] TAURI_SIGNING_PRIVATE_KEY secret set in GitHub repo settings via `gh secret set`
---
## Phase 7: Agent Teams / Subagent Support [status: complete] — Post-MVP
- [x] Agent store parent/child hierarchy — parentSessionId, parentToolUseId, childSessionIds fields on AgentSession
- [x] Agent store functions — findChildByToolUseId(), getChildSessions(), parent-aware createAgentSession()
- [x] Agent dispatcher subagent detection — SUBAGENT_TOOL_NAMES Set ('Agent', 'Task', 'dispatch_agent')
- [x] Agent dispatcher message routing — parentId-bearing messages routed to child panes via toolUseToChildPane Map
- [x] Agent dispatcher pane spawning — spawnSubagentPane() creates child session + layout pane, auto-grouped under parent
- [x] AgentPane parent navigation — SUB badge + button to focus parent agent
- [x] AgentPane children bar — clickable chips per child subagent with status colors (running/done/error)
- [x] SessionList subagent icon — '↳' for subagent panes
- [x] Subagent cost aggregation — getTotalCost() recursive helper in agents.svelte.ts, total cost shown in parent pane done-bar
- [x] Dispatcher tests for subagent routing — 10 tests covering spawn, dedup, child message routing, init/cost forwarding, fallbacks (28 total dispatcher tests)
- [ ] Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
### System Requirements
- Node.js 20+ (for Agent SDK sidecar)
- Rust 1.77+ (for building from source)
- WebKit2GTK 4.1+ (Tauri runtime)
- Linux x86_64 (primary target)
---
## Multi-Machine Support (Phases A-D) [status: complete]
Architecture designed in [multi-machine.md](multi-machine.md). Implementation extends Agents Orchestrator to manage agents and terminals on remote machines over WebSocket.
### Phase A: Extract `agor-core` crate [status: complete]
- [x] Created Cargo workspace at the repository root (Cargo.toml with members)
- [x] Extracted PtyManager and SidecarManager into shared `agor-core` crate
- [x] Created EventSink trait to abstract Tauri event emission (agor-core/src/event.rs)
- [x] TauriEventSink in src-tauri/src/event_sink.rs implements EventSink
- [x] src-tauri pty.rs and sidecar.rs now thin re-exports from agor-core
### Phase B: Build `agor-relay` binary [status: complete]
- [x] WebSocket server using tokio-tungstenite with token auth
- [x] CLI flags: --port, --token, --insecure (clap)
- [x] Routes RelayCommand to PtyManager/SidecarManager, forwards RelayEvent over WebSocket
- [x] Rate limiting on auth failures (10 attempts, 5min lockout)
- [x] Per-connection isolated PTY + sidecar managers
- [x] Command response propagation: structured responses (pty_created, pong, error) via shared event channel
- [x] send_error() helper for consistent error reporting with commandId correlation
- [x] PTY creation confirmation: pty_created event with session ID and commandId
### Phase C: Add `RemoteManager` to controller [status: complete]
- [x] New remote.rs module in src-tauri — WebSocket client connections to relay instances
- [x] Machine lifecycle: add/remove/connect/disconnect
- [x] 12 new Tauri commands for remote operations
- [x] Heartbeat ping every 15s
- [x] PTY creation event: emits remote-pty-created Tauri event with machineId, ptyId, commandId
- [x] Exponential backoff reconnection on disconnect (1s/2s/4s/8s/16s/30s cap)
- [x] attempt_tcp_probe() function: TCP-only probe (5s timeout, default port 9750) — avoids allocating per-connection resources on relay during probes
- [x] Reconnection events: remote-machine-reconnecting, remote-machine-reconnect-ready
### Phase D: Frontend integration [status: complete]
- [x] remote-bridge.ts adapter for machine management + remote events
- [x] machines.svelte.ts store for remote machine state
- [x] Layout store: Pane.remoteMachineId field
- [x] agent-bridge.ts and pty-bridge.ts route to remote commands when remoteMachineId is set
- [x] SettingsDialog "Remote Machines" section (add/remove/connect/disconnect)
- [x] Sidebar auto-groups remote panes by machine label
### Remaining Work
- [x] Reconnection logic with exponential backoff — implemented in remote.rs
- [x] Relay command response propagation — implemented in agor-relay main.rs
- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning
---
## Extras: Claude Profiles & Skill Discovery [status: complete]
### Claude Profile / Account Switching
- [x] Tauri command claude_list_profiles(): reads ~/.config/switcher/profiles/ directories
- [x] Profile metadata from profile.toml (email, subscription_type, display_name)
- [x] Config dir resolution: ~/.config/switcher-claude/{name}/ or fallback ~/.claude/
- [x] Default profile fallback when no switcher profiles exist
- [x] Profile selector dropdown in AgentPane toolbar (shown when >1 profile)
- [x] Selected profile's config_dir passed as claude_config_dir → CLAUDE_CONFIG_DIR env override
### Skill Discovery & Autocomplete
- [x] Tauri command claude_list_skills(): reads ~/.claude/skills/ (dirs with SKILL.md or .md files)
- [x] Tauri command claude_read_skill(path): reads skill file content
- [x] Frontend adapter: claude-bridge.ts (ClaudeProfile, ClaudeSkill interfaces, listProfiles/listSkills/readSkill)
- [x] Skill autocomplete in AgentPane: `/` prefix triggers menu, arrow keys navigate, Tab/Enter select
- [x] expandSkillPrompt(): reads skill content, injects as prompt with optional user args
### Extended AgentQueryOptions
- [x] Rust struct (agor-core/src/sidecar.rs): setting_sources, system_prompt, model, claude_config_dir, additional_directories
- [x] Sidecar JSON passthrough (both agent-runner.ts and agent-runner-deno.ts)
- [x] SDK query() options: settingSources defaults to ['user', 'project'], systemPrompt, model, additionalDirectories
- [x] CLAUDE_CONFIG_DIR env injection for multi-account support
- [x] Frontend AgentQueryOptions interface (agent-bridge.ts) updated with new fields

# Production Hardening
Agent Orchestrator includes several production-readiness features that ensure reliability, security, and observability. This document covers each subsystem in detail.
---
## Sidecar Supervisor (Crash Recovery)
The `SidecarSupervisor` in `agor-core/src/supervisor.rs` automatically restarts crashed sidecar processes.
### Behavior
When the sidecar child process exits unexpectedly:
1. The supervisor detects the exit via process monitoring
2. Waits with exponential backoff before restarting:
- Attempt 1: wait 1 second
- Attempt 2: wait 2 seconds
- Attempt 3: wait 4 seconds
- Attempt 4: wait 8 seconds
- Attempt 5: wait 16 seconds (capped at 30s)
3. After 5 failed attempts, the supervisor gives up and reports `SidecarHealth::Failed`
### Health States
```rust
pub enum SidecarHealth {
Healthy,
Restarting { attempt: u32, next_retry: Duration },
Failed { attempts: u32, last_error: String },
}
```
The frontend can query health state and offer a manual restart button when auto-recovery fails. 17 unit tests cover all recovery scenarios including edge cases like rapid successive crashes.
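The backoff schedule can be sketched as a pure function (TypeScript for illustration; the actual implementation is the Rust supervisor):

```typescript
// Sketch of the supervisor's restart schedule. Constants mirror the doc:
// 5 attempts, delays 1/2/4/8/16s, capped at 30s.
const MAX_ATTEMPTS = 5;
const CAP_SECONDS = 30;

function backoffSeconds(attempt: number): number | null {
  // After MAX_ATTEMPTS failures the supervisor gives up (SidecarHealth::Failed).
  if (attempt > MAX_ATTEMPTS) return null;
  return Math.min(2 ** (attempt - 1), CAP_SECONDS); // 1, 2, 4, 8, 16
}
```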
---
## Landlock Sandbox
Landlock is a Linux kernel (6.2+) security module that restricts filesystem access for processes. Agent Orchestrator uses it to sandbox sidecar processes, limiting what files they can read and write.
### Configuration
```rust
pub struct SandboxConfig {
pub read_write_paths: Vec<PathBuf>, // Full access (project dir, temp)
pub read_only_paths: Vec<PathBuf>, // Read-only (system libs, SDK)
}
```
The sandbox is applied via `pre_exec()` on the child process command, before the sidecar starts executing.
### Path Rules
| Path | Access | Reason |
|------|--------|--------|
| Project CWD | Read/Write | Agent needs to read and modify project files |
| `/tmp` | Read/Write | Temporary files during operation |
| `~/.local/share/agor/` | Read/Write | SQLite databases (btmsg, sessions) |
| System library paths | Read-only | Node.js/Deno runtime dependencies |
| `~/.claude/` or config dir | Read-only | Claude configuration and credentials |
### Graceful Fallback
If the kernel doesn't support Landlock (< 6.2) or the Landlock LSM isn't enabled, the sandbox degrades gracefully: the sidecar runs without filesystem restrictions. This is logged as a warning but doesn't prevent operation.
---
## FTS5 Full-Text Search
The search system uses SQLite's FTS5 extension for full-text search across three data types. Accessed via a Spotlight-style overlay (Ctrl+Shift+F).
### Architecture
```
SearchOverlay.svelte (Ctrl+Shift+F)
└── search-bridge.ts → Tauri commands
└── search.rs → SearchDb (separate FTS5 tables)
├── search_messages — agent session messages
├── search_tasks — bttask task content
└── search_btmsg — btmsg inter-agent messages
```
### Virtual Tables
The `SearchDb` struct in `search.rs` manages three FTS5 virtual tables:
| Table | Source | Indexed Columns |
|-------|--------|----------------|
| `search_messages` | Agent session messages | content, session_id, project_id |
| `search_tasks` | bttask tasks | title, description, assignee, status |
| `search_btmsg` | btmsg messages | content, sender, recipient, channel |
### Operations
| Tauri Command | Purpose |
|---------------|---------|
| `search_init` | Creates the FTS5 virtual tables if they do not already exist |
| `search_all` | Queries all 3 tables, returns ranked results |
| `search_rebuild` | Drops and rebuilds all indices (maintenance) |
| `search_index_message` | Indexes a single new message (real-time) |
### Frontend (SearchOverlay.svelte)
- Triggered by Ctrl+Shift+F
- Spotlight-style floating overlay centered on screen
- 300ms debounce on input to avoid excessive queries
- Results grouped by type (Messages, Tasks, Communications)
- Click result to navigate to source (focus project, switch tab)
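The 300ms input debounce can be sketched with a generic helper (illustrative; the helper and its usage are assumptions, not the actual SearchOverlay code):

```typescript
// Debounce: delay calling `fn` until `ms` of quiet time has passed since
// the last invocation, so fast typing issues only one search query.
function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: T) => {
    clearTimeout(timer); // cancel the pending call, if any
    timer = setTimeout(() => fn(...args), ms);
  };
}

// e.g. const runSearch = debounce((q: string) => search(q), 300);
```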
---
## Plugin System
The plugin system allows extending Agent Orchestrator with custom commands and event handlers. Plugins are sandboxed JavaScript executing in a restricted environment.
### Plugin Discovery
Plugins live in `~/.config/agor/plugins/`. Each plugin is a directory containing a `plugin.json` manifest:
```json
{
"name": "my-plugin",
"version": "1.0.0",
"description": "A custom plugin",
"main": "index.js",
"permissions": ["notifications", "settings"]
}
```
The Rust `plugins.rs` module scans for `plugin.json` files with path-traversal protection (rejects `..` in paths).
### Sandboxed Runtime (plugin-host.ts)
Plugins execute via `new Function()` in a restricted scope:
**Shadowed globals (13):**
`fetch`, `XMLHttpRequest`, `WebSocket`, `Worker`, `eval`, `Function`, `importScripts`, `require`, `process`, `globalThis`, `window`, `document`, `localStorage`
**Provided API (permission-gated):**
| API | Permission | Purpose |
|-----|-----------|---------|
| `bt.notify(msg)` | `notifications` | Show toast notification |
| `bt.getSetting(key)` | `settings` | Read app setting |
| `bt.setSetting(key, val)` | `settings` | Write app setting |
| `bt.registerCommand(name, fn)` | — (always allowed) | Add command to palette |
| `bt.on(event, fn)` | — (always allowed) | Subscribe to app events |
The API object is frozen (`Object.freeze`) to prevent tampering. Strict mode is enforced.
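The parameter-shadowing technique can be sketched as follows — a minimal version using a subset of the shadowed globals (function and variable names here are assumptions, not the real `plugin-host.ts` code; note that `eval` and `arguments` cannot be shadowed as parameter names under strict mode and need separate handling):

```typescript
// Run plugin code with dangerous globals masked by same-named parameters.
// Plugin code that references them sees `undefined` instead of the real global.
// Best-effort only: this is not a security boundary.
function runSandboxed(code: string, api: object): unknown {
  const shadowed = ["fetch", "process", "require", "window", "document"];
  const fn = new Function("bt", ...shadowed, `"use strict";\n${code}`);
  // Only `bt` is supplied; the shadowing parameters are left undefined.
  return fn(Object.freeze(api));
}
```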
### Plugin Store (`plugins.svelte.ts`)
The store manages plugin lifecycle:
- `loadAllPlugins()` — discover, validate permissions, execute in sandbox
- `unloadAllPlugins()` — cleanup event listeners, remove commands
- Command registry integrates with CommandPalette
- Event bus distributes app events to subscribed plugins
### Security Notes
The `new Function()` sandbox is best-effort — it is not a security boundary. A determined attacker could escape it. Landlock provides the actual filesystem restriction. The plugin sandbox primarily prevents accidental damage from buggy plugins.
35 tests cover the plugin system including permission validation, sandbox escape attempts, and lifecycle management.
---
## Secrets Management
Secrets (API keys, tokens) are stored in the system keyring rather than in plaintext files or SQLite.
### Backend (`secrets.rs`)
Uses the `keyring` crate with the `linux-native` feature (libsecret/DBUS):
```rust
pub struct SecretsManager;
impl SecretsManager {
pub fn store(key: &str, value: &str) -> Result<()>;
pub fn get(key: &str) -> Result<Option<String>>;
pub fn delete(key: &str) -> Result<()>;
pub fn list() -> Result<Vec<SecretMetadata>>;
pub fn has_keyring() -> bool;
}
```
Metadata (key names, last modified timestamps) is stored in SQLite settings. The actual secret values never touch disk — they live only in the system keyring (gnome-keyring, KWallet, or equivalent).
### Frontend (`secrets-bridge.ts`)
| Function | Purpose |
|----------|---------|
| `storeSecret(key, value)` | Store a secret in keyring |
| `getSecret(key)` | Retrieve a secret |
| `deleteSecret(key)` | Remove a secret |
| `listSecrets()` | List all secret metadata |
| `hasKeyring()` | Check if system keyring is available |
### No Fallback
If no keyring daemon is available (no DBUS session, no gnome-keyring), secret operations fail with a clear error message. There is no plaintext fallback — this is intentional to prevent accidental credential leakage.
---
## Notifications
Agent Orchestrator has two notification systems: in-app toasts and OS-level desktop notifications.
### In-App Toasts (`notifications.svelte.ts`)
- 6 notification types: `success`, `error`, `warning`, `info`, `agent_complete`, `agent_error`
- Maximum 5 visible toasts, 4-second auto-dismiss
- Toast history (up to 100 entries) with unread badge in NotificationCenter
- Agent dispatcher emits toasts on: agent completion, agent error, sidecar crash
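The visibility rule (max 5 toasts, 4-second auto-dismiss) can be sketched as a pure selector (names and shapes are assumptions for illustration, not the store's real API):

```typescript
interface Toast { id: number; type: string; createdAt: number; }

const AUTO_DISMISS_MS = 4_000;

// Keep only toasts younger than the dismiss window, then the newest `max`.
// Everything else falls back to the NotificationCenter history.
function visibleToasts(all: Toast[], nowMs: number, max = 5): Toast[] {
  return all
    .filter((t) => nowMs - t.createdAt < AUTO_DISMISS_MS)
    .slice(-max);
}
```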
### Desktop Notifications (`notifications.rs`)
Uses `notify-rust` crate for native Linux notifications. Graceful fallback if notification daemon is unavailable (e.g., no D-Bus session).
Frontend triggers via `sendDesktopNotification()` in `notifications-bridge.ts`. Used for events that should be visible even when the app is not focused.
### Notification Center (`NotificationCenter.svelte`)
Bell icon in the top-right with unread badge. Dropdown panel shows notification history with timestamps, type icons, and clear/mark-read actions.
---
## Agent Health Monitoring
### Heartbeats
Tier 1 agents send periodic heartbeats via `btmsg heartbeat` CLI command. The heartbeats table tracks last heartbeat timestamp and status per agent.
### Stale Detection
The health store detects stalled agents via the `stallThresholdMin` setting (default 15 minutes). If an agent hasn't produced output within the threshold, its activity state transitions to `Stalled` and the attention score jumps to 100 (highest priority).
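The stall check itself is a simple threshold comparison, sketched here in TypeScript (function name is an assumption; the real logic lives in the health store):

```typescript
// An agent is stalled if it has produced no output within the threshold
// (stallThresholdMin setting, default 15 minutes).
function isStalled(lastOutputMs: number, nowMs: number, stallThresholdMin = 15): boolean {
  return nowMs - lastOutputMs > stallThresholdMin * 60_000;
}
```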
### Dead Letter Queue
Messages sent to agents that are offline or have crashed are moved to the dead letter queue in `btmsg.db`. This prevents silent message loss and allows debugging delivery failures.
### Audit Logging
All significant events are logged to the `audit_log` table:
| Event Type | Logged When |
|-----------|-------------|
| `message_sent` | Agent sends a btmsg message |
| `message_read` | Agent reads messages |
| `channel_created` | New btmsg channel created |
| `agent_registered` | Agent registers with btmsg |
| `heartbeat` | Agent sends heartbeat |
| `task_created` | New bttask task |
| `task_status_changed` | Task status update |
| `wake_event` | Wake scheduler triggers |
| `prompt_injection_detected` | Suspicious content in agent messages |
The AuditLogTab component in the workspace UI displays audit entries, filterable by event type and agent, with 5-second auto-refresh and a 200-entry cap.
---
## Error Classification
The error classifier (`utils/error-classifier.ts`) categorizes API errors into 6 types with appropriate retry behavior:
| Type | Examples | Retry? | User Message |
|------|----------|--------|--------------|
| `rate_limit` | HTTP 429, "rate limit exceeded" | Yes (with backoff) | "Rate limited — retrying in Xs" |
| `auth` | HTTP 401/403, "invalid API key" | No | "Authentication failed — check API key" |
| `quota` | "quota exceeded", "billing" | No | "Usage quota exceeded" |
| `overloaded` | HTTP 529, "overloaded" | Yes (longer backoff) | "Service overloaded — retrying" |
| `network` | ECONNREFUSED, timeout, DNS failure | Yes | "Network error — check connection" |
| `unknown` | Anything else | No | "Unexpected error" |
20 unit tests cover classification accuracy across various error message formats.
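The classification is substring matching against known error signatures. A minimal sketch (the pattern lists are illustrative, not the exact rules in `utils/error-classifier.ts`):

```typescript
type ErrorType = "rate_limit" | "auth" | "quota" | "overloaded" | "network" | "unknown";

// First matching category wins; anything unrecognized is "unknown" (no retry).
function classifyError(msg: string): ErrorType {
  const m = msg.toLowerCase();
  if (m.includes("429") || m.includes("rate limit")) return "rate_limit";
  if (m.includes("401") || m.includes("403") || m.includes("invalid api key")) return "auth";
  if (m.includes("quota") || m.includes("billing")) return "quota";
  if (m.includes("529") || m.includes("overloaded")) return "overloaded";
  if (m.includes("econnrefused") || m.includes("timeout") || m.includes("dns")) return "network";
  return "unknown";
}

const RETRYABLE: ReadonlySet<ErrorType> = new Set(["rate_limit", "overloaded", "network"]);
```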
---
## WAL Checkpoint
Both SQLite databases (`sessions.db` and `btmsg.db`) use WAL (Write-Ahead Logging) mode for concurrent read/write access. Without periodic checkpoints, the WAL file grows without bound.
A background tokio task runs `PRAGMA wal_checkpoint(TRUNCATE)` every 5 minutes on both databases. This moves WAL data into the main database file and resets the WAL.
---
## TLS Relay Support
The `agor-relay` binary supports TLS for encrypted WebSocket connections:
```bash
agor-relay \
--port 9750 \
--token <secret> \
--tls-cert /path/to/cert.pem \
--tls-key /path/to/key.pem
```
Without `--tls-cert`/`--tls-key`, the relay only accepts connections with the `--insecure` flag (plain WebSocket). In production, TLS is mandatory — the relay rejects `ws://` connections unless `--insecure` is explicitly set.
Certificate pinning (comparing relay certificate fingerprints) is planned for v3.1.
---
## OpenTelemetry Observability
The Rust backend supports optional OTLP trace export via the `AGOR_OTLP_ENDPOINT` environment variable.
### Backend (`telemetry.rs`)
- `TelemetryGuard` initializes tracing + OTLP export pipeline
- Uses `tracing` + `tracing-subscriber` + `opentelemetry` 0.28 + `tracing-opentelemetry` 0.29
- OTLP/HTTP export to configured endpoint
- `Drop`-based shutdown ensures spans are flushed
### Frontend (`telemetry-bridge.ts`)
The frontend cannot use the browser OTEL SDK (WebKit2GTK incompatible). Instead, it routes events through a `frontend_log` Tauri command that pipes into Rust's tracing system:
```typescript
tel.info('agent-started', { sessionId, provider });
tel.warn('context-pressure', { projectId, usage: 0.85 });
tel.error('sidecar-crash', { error: msg });
```
### Docker Stack
A pre-configured Tempo + Grafana stack lives in `docker/tempo/`:
```bash
cd docker/tempo && docker compose up -d
# Grafana at http://localhost:9715
# Set AGOR_OTLP_ENDPOINT=http://localhost:4318 to enable export
```
---
## Session Metrics
Per-project historical session data is stored in the `session_metrics` table:
| Column | Type | Purpose |
|--------|------|---------|
| `project_id` | TEXT | Which project |
| `session_id` | TEXT | Agent session ID |
| `start_time` | INTEGER | Session start timestamp |
| `end_time` | INTEGER | Session end timestamp |
| `peak_tokens` | INTEGER | Maximum context tokens used |
| `turn_count` | INTEGER | Total conversation turns |
| `tool_call_count` | INTEGER | Total tool calls made |
| `cost_usd` | REAL | Total cost in USD |
| `model` | TEXT | Model used |
| `status` | TEXT | Final status (success/error/stopped) |
| `error_message` | TEXT | Error details if failed |
100-row retention per project (oldest pruned on insert). Metrics are persisted on agent completion via the agent dispatcher.
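The prune-on-insert retention rule can be sketched as follows (an in-memory illustration; the real logic is a SQL delete in the Rust backend):

```typescript
interface SessionMetric { sessionId: string; startTime: number; }

// Append a row, then keep only the newest `cap` rows by start_time.
function insertWithRetention(
  rows: SessionMetric[],
  row: SessionMetric,
  cap = 100,
): SessionMetric[] {
  const next = [...rows, row].sort((a, b) => a.startTime - b.startTime);
  return next.slice(Math.max(0, next.length - cap)); // oldest pruned first
}
```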
The MetricsPanel component displays this data as:
- **Live view** — fleet aggregates, project health grid, task board summary, attention queue
- **History view** — SVG sparklines for cost/tokens/turns/tools/duration, stats row, session table

# Sidecar Architecture
The sidecar is the bridge between Agent Orchestrator's Rust backend and AI provider APIs. Because the Claude Agent SDK, OpenAI Codex SDK, and Ollama API are JavaScript/TypeScript libraries, they cannot run inside Rust or WebKit2GTK's webview. Instead, the Rust backend spawns child processes (sidecars) that handle AI interactions and communicate back via stdio NDJSON.
---
## Overview
```
Rust Backend (SidecarManager)
├── Spawns child process (Deno preferred, Node.js fallback)
├── Writes QueryMessage to stdin (NDJSON)
├── Reads response lines from stdout (NDJSON)
├── Emits Tauri events for each message
└── Manages lifecycle (start, stop, crash recovery)
Sidecar Process (one of):
├── claude-runner.mjs → @anthropic-ai/claude-agent-sdk
├── codex-runner.mjs → @openai/codex-sdk
└── ollama-runner.mjs → native fetch to localhost:11434
```
---
## Provider Runners
Each provider has its own runner file in `sidecar/`, compiled to a standalone ESM bundle in `sidecar/dist/` by esbuild. The runners are self-contained — all dependencies (including SDKs) are bundled into the `.mjs` file.
### Claude Runner (`claude-runner.ts``claude-runner.mjs`)
The primary runner. Uses `@anthropic-ai/claude-agent-sdk` query() function.
**Startup sequence:**
1. Reads NDJSON messages from stdin in a loop
2. On `query` message: resolves Claude CLI path via `findClaudeCli()`
3. Calls SDK `query()` with options: prompt, cwd, permissionMode, model, settingSources, systemPrompt, additionalDirectories, worktreeName, pathToClaudeCodeExecutable
4. Streams SDK messages as NDJSON to stdout
5. On `stop` message: calls AbortController.abort()
**Claude CLI detection (`findClaudeCli()`):**
Checks paths in order: `~/.local/bin/claude``~/.claude/local/claude``/usr/local/bin/claude``/usr/bin/claude``which claude`. If none found, emits `agent_error` immediately. The path is resolved once at sidecar startup and reused for all sessions.
**Session resume:** Passes `resume: sessionId` to the SDK when a resume session ID is provided. The SDK handles transcript loading internally.
**Multi-account support:** When `claudeConfigDir` is provided (from profile selection), it is set as `CLAUDE_CONFIG_DIR` in the SDK's env option. This points the Claude CLI at a different configuration directory.
**Worktree isolation:** When `worktreeName` is provided, it is passed as `extraArgs: { worktree: name }` to the SDK, which translates to `--worktree <name>` on the CLI.
### Codex Runner (`codex-runner.ts``codex-runner.mjs`)
Uses `@openai/codex-sdk` via dynamic import (graceful failure if not installed).
**Key differences from Claude:**
- Authentication via `CODEX_API_KEY` environment variable
- Sandbox mode mapping: `bypassPermissions``full-auto`, `default``suggest`
- Session resume via thread ID (Codex's equivalent of session continuity)
- No profile/skill support
- ThreadEvent format differs from Claude's stream-json (parsed by `codex-messages.ts`)
### Ollama Runner (`ollama-runner.ts``ollama-runner.mjs`)
Direct HTTP to Ollama's REST API — zero external dependencies.
**Key differences:**
- No SDK — uses native `fetch()` to `http://localhost:11434/api/chat`
- Health check on startup (`GET /api/tags`)
- NDJSON streaming response from Ollama's `/api/chat` endpoint
- Supports Qwen3's `<think>` tags for reasoning display
- Configurable: host, model, num_ctx, temperature
- Cost is always $0 (local inference)
- No subagent support, no profiles, no skills
---
## Communication Protocol
### Messages from Rust to Sidecar (stdin)
```typescript
// Query — start a new agent session
{
"type": "query",
"session_id": "uuid",
"prompt": "Fix the bug in auth.ts",
"cwd": "/home/user/project",
"provider": "claude",
"model": "claude-sonnet-4-6",
"permission_mode": "bypassPermissions",
"resume_session_id": "previous-uuid", // optional
"system_prompt": "You are an architect...", // optional
"claude_config_dir": "~/.config/switcher-claude/work/", // optional
"setting_sources": ["user", "project"], // optional
"additional_directories": ["/shared/lib"], // optional
"worktree_name": "session-123", // optional
"provider_config": { ... }, // provider-specific blob
"extra_env": { "BTMSG_AGENT_ID": "manager-1" } // optional
}
// Stop — abort a running session
{
"type": "stop",
"session_id": "uuid"
}
```
### Messages from Sidecar to Rust (stdout)
The sidecar writes one JSON object per line (NDJSON). The format depends on the provider, but all messages include a `sessionId` field added by the Rust SidecarManager before forwarding as Tauri events.
**Claude messages** follow the same format as the Claude CLI's `--output-format stream-json`:
```typescript
// System init (carries session ID, model info)
{ "type": "system", "subtype": "init", "session_id": "...", "model": "..." }
// Assistant text
{ "type": "assistant", "message": { "content": [{ "type": "text", "text": "..." }] } }
// Tool use
{ "type": "assistant", "message": { "content": [{ "type": "tool_use", "name": "Read", "input": {...} }] } }
// Tool result
{ "type": "user", "message": { "content": [{ "type": "tool_result", "content": "..." }] } }
// Final result
{ "type": "result", "subtype": "success", "cost_usd": 0.05, "duration_ms": 12000, ... }
// Error
{ "type": "agent_error", "error": "Claude CLI not found" }
```
---
## Environment Variable Stripping
When Agent Orchestrator is launched from within a Claude Code terminal session, the parent process sets `CLAUDE*` environment variables for nesting detection and sandbox configuration. If these leak to the sidecar, Claude's SDK detects nesting and either errors or behaves unexpectedly.
The solution is **dual-layer stripping**:
1. **Rust layer (primary):** `SidecarManager` calls `env_clear()` on the child process command, then explicitly sets only the variables needed (`PATH`, `HOME`, `USER`, etc.). This prevents any parent environment from leaking.
2. **JavaScript layer (defense-in-depth):** Each runner also strips provider-specific variables via `strip_provider_env_var()`:
- Claude: strips all `CLAUDE*` keys (whitelists `CLAUDE_CODE_EXPERIMENTAL_*`)
- Codex: strips all `CODEX*` keys
- Ollama: strips all `OLLAMA*` keys (except `OLLAMA_HOST`)
The `extra_env` field in AgentQueryOptions allows injecting specific variables (like `BTMSG_AGENT_ID` for Tier 1 agents) after stripping.
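For the Claude case, the JavaScript-layer stripping plus `extra_env` injection can be sketched like this (an illustrative version; the function name and exact whitelist handling are assumptions):

```typescript
// Drop all CLAUDE* keys except the experimental whitelist, then layer
// extra_env on top so injected variables (e.g. BTMSG_AGENT_ID) survive.
function buildSidecarEnv(
  parent: Record<string, string>,
  extraEnv: Record<string, string> = {},
): Record<string, string> {
  const kept = Object.fromEntries(
    Object.entries(parent).filter(
      ([k]) => !k.startsWith("CLAUDE") || k.startsWith("CLAUDE_CODE_EXPERIMENTAL_"),
    ),
  );
  return { ...kept, ...extraEnv }; // extra_env applied after stripping
}
```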
---
## Sidecar Lifecycle
### Startup
The SidecarManager is initialized during Tauri app setup. It does not spawn any sidecar processes at startup — processes are spawned on-demand when the first agent query arrives.
### Runtime Resolution
When a query arrives, `resolve_sidecar_for_provider(provider)` finds the appropriate runner:
1. Looks for `{provider}-runner.mjs` in the sidecar dist directory
2. Checks for Deno first (`deno` or `~/.deno/bin/deno`), then Node.js
3. Returns a `SidecarCommand` struct with the runtime binary and script path
4. If neither runtime is found, returns an error
Deno is preferred because it has faster cold-start time (~50ms vs ~150ms for Node.js) and can compile to a single binary for distribution.
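The resolution order can be sketched in TypeScript (the real code is Rust; `existsOnPath` is a hypothetical helper injected for testability):

```typescript
interface SidecarCommand { runtime: string; script: string; }

// Pick the runner script for a provider and the preferred runtime:
// Deno first, Node.js as fallback, error if neither is installed.
function resolveSidecar(
  provider: string,
  distDir: string,
  existsOnPath: (bin: string) => boolean,
): SidecarCommand {
  const script = `${distDir}/${provider}-runner.mjs`;
  if (existsOnPath("deno")) return { runtime: "deno", script };
  if (existsOnPath("node")) return { runtime: "node", script };
  throw new Error("No JavaScript runtime found (install Deno or Node.js)");
}
```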
### Crash Recovery (SidecarSupervisor)
The `SidecarSupervisor` in `agor-core/src/supervisor.rs` provides automatic crash recovery:
- Monitors the sidecar child process for unexpected exits
- On crash: waits with exponential backoff (1s → 2s → 4s → 8s → 16s → 30s cap)
- Maximum 5 restart attempts before giving up
- Reports health via `SidecarHealth` enum: `Healthy`, `Restarting { attempt, next_retry }`, `Failed { attempts, last_error }`
- 17 unit tests covering all recovery scenarios
### Shutdown
On app exit, `SidecarManager` sends stop messages to all active sessions and kills remaining child processes. The `Drop` implementation ensures cleanup even on panic.
---
## Build Pipeline
```bash
# Build all 3 runner bundles
npm run build:sidecar
# Internally runs esbuild 3 times:
# sidecar/claude-runner.ts → sidecar/dist/claude-runner.mjs
# sidecar/codex-runner.ts → sidecar/dist/codex-runner.mjs
# sidecar/ollama-runner.ts → sidecar/dist/ollama-runner.mjs
```
Each bundle is a standalone ESM file with all dependencies included. The Claude runner bundles `@anthropic-ai/claude-agent-sdk` directly — no `node_modules` needed at runtime. The Codex runner uses dynamic import for `@openai/codex-sdk` (graceful failure if not installed). The Ollama runner has zero external dependencies.
The built `.mjs` files are included as Tauri resources in `tauri.conf.json` and copied to the app bundle during `tauri build`.
---
## Message Adapter Layer
On the frontend, raw sidecar messages pass through a provider-specific adapter before reaching the agent store:
```
Sidecar stdout → Rust SidecarManager → Tauri event
→ agent-dispatcher.ts
→ message-adapters.ts (registry)
→ claude-messages.ts / codex-messages.ts / ollama-messages.ts
→ AgentMessage[] (common type)
→ agents.svelte.ts store
```
The `AgentMessage` type is provider-agnostic:
```typescript
interface AgentMessage {
id: string;
type: 'text' | 'tool_call' | 'tool_result' | 'thinking' | 'init'
| 'status' | 'cost' | 'error' | 'hook';
parentId?: string; // for subagent tracking
content: unknown; // type-specific payload
timestamp: number;
}
```
This means the agent store and AgentPane rendering code never need to know which provider generated a message. The adapter layer is the only code that understands provider-specific formats.
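The registry pattern behind `message-adapters.ts` can be sketched like this (shapes and stub adapters are assumptions for illustration; the real adapters live in `claude-messages.ts`, `codex-messages.ts`, and `ollama-messages.ts`):

```typescript
interface AgentMessage { id: string; type: string; content: unknown; timestamp: number; }
type Adapter = (raw: Record<string, unknown>) => AgentMessage[];

// Stub registry: one adapter per provider, keyed by provider name.
const adapters: Record<string, Adapter> = {
  claude: (raw) => [{ id: "m1", type: String(raw.type), content: raw, timestamp: Date.now() }],
};

// Dispatch a raw sidecar message to its provider's adapter;
// unknown providers are dropped rather than crashing the pipeline.
function adaptSidecarMessage(provider: string, raw: Record<string, unknown>): AgentMessage[] {
  return adapters[provider]?.(raw) ?? [];
}
```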
### Test Coverage
- `claude-messages.test.ts` — 25 tests covering all Claude message types
- `codex-messages.test.ts` — 19 tests covering all Codex ThreadEvent types
- `ollama-messages.test.ts` — 11 tests covering all Ollama chunk types