docs: complete reorganization — move remaining docs into subdirectories

2026-03-17 04:22:38 +01:00 · 493b436eef
commit 493b436eef
parent 8641f260f7
5 changed files with 669 additions and 0 deletions
--- a/docs/architecture/decisions.md
+++ b/docs/architecture/decisions.md
@@ -0,0 +1,51 @@
+# Architecture Decisions Log
+
+This document records significant architecture decisions made during the development of Agent Orchestrator (agor). Each entry captures the decision, its rationale, and the date it was made. Decisions are listed chronologically within each category.
+
+---
+
+## Data & Configuration
+
+| Decision | Rationale | Date |
+|----------|-----------|------|
+| JSON for groups config, SQLite for session state | JSON is human-editable, shareable, version-controllable. SQLite for ephemeral runtime state. Load at startup only — no hot-reload, no split-brain risk. | 2026-03-07 |
+| btmsg/bttask shared SQLite DB | Both CLI tools share `~/.local/share/agor/btmsg.db`. Single DB simplifies deployment — agents already have the path. Read-only for non-Manager roles via CLI permissions. | 2026-03-11 |
+
+## Layout & UI
+
+| Decision | Rationale | Date |
+|----------|-----------|------|
+| Adaptive project count from viewport width | `Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520)))` — 5 at 5120px, 3 at 1920px, scroll-snap for overflow. min-width 480px. Better than forcing 5 at all sizes. | 2026-03-07 |
+| Flexbox + scroll-snap over CSS Grid | Allows horizontal scroll on narrow screens. Scroll-snap gives clean project-to-project scrolling. | 2026-03-07 |
+| Team panel: inline >2560px, overlay <2560px | Adapts to available space. Collapsed when no subagents running. Saves ~240px on smaller screens. | 2026-03-07 |
+| Project accent colors from Catppuccin palette | Visual distinction: blue/green/mauve/peach/pink per slot 1-5. Applied to border + header tint via `var(--accent)`. | 2026-03-07 |
+| VSCode-style left sidebar (replaces top tab bar) | Vertical icon rail (2.75rem) + expandable drawer (max 50%) + always-visible workspace. Settings is a regular tab, not a special drawer. ProjectGrid always visible. Ctrl+B toggles. | 2026-03-08 |
+| CSS relative units (rule 18) | rem/em for all layout CSS. Pixels only for icon sizes, borders, box shadows. Exception: `--ui-font-size`/`--term-font-size` store px for xterm.js API. | 2026-03-08 |
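
Editorial note: the adaptive project count formula in the table above can be wrapped in a small function to check the quoted numbers. This is a minimal sketch; the name `visibleProjectCount` and the constant name are assumptions for illustration, only the formula itself comes from the decision log.

```typescript
// Divisor used in the documented formula (px of container width per project slot).
const SLOT_WIDTH_PX = 520;

// Hypothetical helper: how many projects to show side by side.
function visibleProjectCount(projectCount: number, containerWidth: number): number {
  const slotsThatFit = Math.max(1, Math.floor(containerWidth / SLOT_WIDTH_PX));
  return Math.min(projectCount, slotsThatFit);
}

console.log(visibleProjectCount(5, 5120)); // 5
console.log(visibleProjectCount(5, 1920)); // 3
```

The `Math.max(1, ...)` guard means a very narrow container still shows one project, with the rest reachable by scroll-snap.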
+
+## Agent Architecture
+
+| Decision | Rationale | Date |
+|----------|-----------|------|
+| Single shared sidecar (v3.0) | Existing multiplexed protocol handles concurrent sessions. Per-project pool deferred to v3.1 if crash isolation needed. Saves ~200MB RAM. | 2026-03-07 |
+| xterm budget: 4 active, unlimited suspended | WebKit2GTK OOM at ~5 instances. Serialize scrollback to text buffer, destroy xterm, recreate on focus. PTY stays alive. Suspend/resume < 50ms. | 2026-03-07 |
+| AgentPane splits into AgentSession + TeamAgentsPanel | Team agents shown inline in right panel, not as separate panes. Saves xterm/pane slots. | 2026-03-07 |
+| Tier 1 agents as ProjectBoxes via `agentToProject()` | Agents render as full ProjectBoxes (not separate UI). `getAllWorkItems()` merges agents + projects. Unified rendering = less code, same capabilities. | 2026-03-11 |
+| `extra_env` 5-layer passthrough for BTMSG_AGENT_ID | TS -> Rust AgentQueryOptions -> NDJSON -> JS runner -> SDK env. Minimal surface — only agent projects get env injection. | 2026-03-11 |
+| Periodic system prompt re-injection (1 hour) | LLM context degrades over long sessions. 1-hour timer re-sends role/tools reminder when agent is idle. `autoPrompt`/`onautopromptconsumed` callback pattern. | 2026-03-11 |
+| Role-specific tabs via conditional rendering | Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks. PERSISTED-LAZY pattern (mount on first activation). Conditional on `isAgent && agentRole`. | 2026-03-11 |
+| PlantUML via plantuml.com server (~h hex encoding) | Avoids Java dependency. Hex encoding simpler than deflate+base64. Works with free tier. Trade-off: requires internet. | 2026-03-11 |
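
Editorial note: the `~h` hex encoding mentioned in the PlantUML row amounts to hex-encoding the UTF-8 bytes of the diagram source and prefixing the result with `~h`. The sketch below assumes the public server's `/plantuml/svg/` and `/plantuml/png/` endpoints; the function name `plantUmlHexUrl` is hypothetical and not taken from the codebase.

```typescript
// Builds a PlantUML server URL using hex encoding instead of deflate + custom base64.
function plantUmlHexUrl(source: string, format: 'svg' | 'png' = 'svg'): string {
  const bytes = new TextEncoder().encode(source); // UTF-8 bytes of the diagram text
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('');
  return `https://www.plantuml.com/plantuml/${format}/~h${hex}`;
}

const url = plantUmlHexUrl('@startuml\nAlice -> Bob: hello\n@enduml');
console.log(url);
```

Hex encoding roughly doubles the payload size compared with the raw text, so very large diagrams may exceed URL length limits; that is the price of skipping the compression step.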
+
+## Themes & Typography
+
+| Decision | Rationale | Date |
+|----------|-----------|------|
+| All 17 themes map to `--ctp-*` CSS vars | 4 Catppuccin + 7 Editor + 6 Deep Dark themes. All map to same 26 CSS custom properties — zero component changes when adding themes. Pure data operation. | 2026-03-07 |
+| Typography via CSS custom properties | `--ui-font-family`/`--ui-font-size` + `--term-font-family`/`--term-font-size` in `:root`. Restored by `initTheme()` on startup. Persisted as SQLite settings. | 2026-03-07 |
+
+## System Design
+
+| Decision | Rationale | Date |
+|----------|-----------|------|
+| Keyboard shortcut layers: App > Workspace > Terminal | Prevents conflicts. Terminal captures raw keys only when focused. App layer uses Ctrl+K/G/B. | 2026-03-07 |
+| Unmount/remount on group switch | Serialize xterm scrollbacks, destroy, remount new group. ~100ms perceived. Frees ~80MB per switch. | 2026-03-07 |
+| Remote machines deferred to v3.1 | Elevate to project level (`project.remote_machine_id`) but don't implement in MVP. Focus on local orchestration first. | 2026-03-07 |
--- a/docs/architecture/findings.md
+++ b/docs/architecture/findings.md
@@ -0,0 +1,160 @@
+# Research Findings
+
+Research conducted during development — technology evaluations, architecture reviews, performance measurements, and design analysis. Each finding informed implementation decisions recorded in [decisions.md](decisions.md).
+
+---
+
+## 1. Claude Agent SDK
+
+**Source:** https://platform.claude.com/docs/en/agent-sdk/overview
+
+The Claude Agent SDK provides structured streaming, subagent detection, hooks, and telemetry — everything needed for a rich agent UI without terminal emulation.
+
+**Key Insight:** The SDK gives structured data — we render it as rich UI (markdown, diff views, file cards, agent trees) instead of raw terminal text. Terminal emulation (xterm.js) is only needed for SSH, local shell, and legacy CLI sessions.
+
+---
+
+## 2. Tauri + xterm.js Integration
+
+Integration pattern: `Frontend (xterm.js) <-> Tauri IPC <-> Rust PTY (portable-pty) <-> Shell/SSH/Claude`
+
+Existing projects (tauri-terminal, Terminon, tauri-plugin-pty) validated the approach.
+
+---
+
+## 3. Terminal Performance Benchmarks
+
+| Terminal | Latency | Notes |
+|----------|---------|-------|
+| xterm (native) | ~10ms | Gold standard |
+| Alacritty | ~12ms | GPU-rendered Rust |
+| VTE (GNOME Terminal) | ~50ms | GTK3/4 |
+| Hyper (Electron+xterm.js) | ~40ms | Web-based worst case |
+
+xterm.js in Tauri: ~20-30ms latency, ~20MB per instance. That is ample for streaming AI output, and faster than VTE in the GTK3-based v1 (~50ms).
+
+---
+
+## 4. Frontend Framework Choice
+
+**Why Svelte 5:** Fine-grained reactivity (`$state`/`$derived` runes) with no virtual DOM, which is critical when 4-8 panes stream simultaneously; a ~5KB runtime vs React's ~40KB; and a larger ecosystem than Solid.js.
+
+---
+
+## 5. Adversarial Architecture Review (v3)
+
+Three specialized agents reviewed the v3 Mission Control architecture before implementation. Caught 12 issues (4 critical) that would have required expensive rework if discovered later.
+
+### Critical Issues Found
+
+| # | Issue | Resolution |
+|---|-------|------------|
+| 1 | xterm.js 4-instance ceiling (WebKit2GTK OOM) | Budget system with suspend/resume |
+| 2 | Single sidecar = SPOF | Supervisor with crash recovery, per-project pool deferred |
+| 3 | Layout store has no workspace concept | Full rewrite to workspace.svelte.ts |
+| 4 | 384px per project on 1920px (too narrow) | Adaptive count from viewport width |
+
+The remaining 8 issues (Major/Minor) were also resolved before implementation.
+
+---
+
+## 6. Provider Adapter Coupling Analysis (v3)
+
+Before implementing multi-provider support, mapped every Claude-specific dependency. 13+ files classified into 4 severity levels.
+
+### Key Insights
+
+1. **Sidecar is the natural abstraction boundary.** Each provider needs its own runner.
+2. **Message format is the main divergence point.** Per-provider adapters normalize to `AgentMessage`.
+3. **Capability flags eliminate provider switches.** UI checks `capabilities.hasProfiles` instead of `provider === 'claude'`.
+4. **Env var stripping is provider-specific.**
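
Editorial note: insight 3 might look like the following in UI code. This is a minimal sketch: only `capabilities.hasProfiles` comes from the finding above; the `ProviderInfo` shape, the provider table, and `shouldShowProfileSelector` are illustrative assumptions.

```typescript
interface ProviderCapabilities {
  hasProfiles: boolean; // flag named in the finding above
}

interface ProviderInfo {
  id: string;
  capabilities: ProviderCapabilities;
}

// Hypothetical provider table, for illustration only.
const providers: ProviderInfo[] = [
  { id: 'claude', capabilities: { hasProfiles: true } },
  { id: 'example-provider', capabilities: { hasProfiles: false } },
];

// UI code asks "can this provider do X?" instead of "which provider is this?".
function shouldShowProfileSelector(provider: ProviderInfo): boolean {
  return provider.capabilities.hasProfiles;
}

for (const p of providers) {
  console.log(p.id, shouldShowProfileSelector(p));
}
```

Adding a provider then means adding one table entry; no component needs a new `provider === '...'` branch.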
+
+---
+
+## 7. Codebase Reuse Analysis: v2 to v3
+
+### Survived (unchanged or with modifications)
+
+| Component | Modifications |
+|-----------|---------------|
+| TerminalPane.svelte | Added suspend/resume lifecycle |
+| MarkdownPane.svelte | Unchanged |
+| AgentTree.svelte | Reused inside AgentSession |
+| agents.svelte.ts | Added projectId field |
+| theme.svelte.ts | Unchanged |
+| notifications.svelte.ts | Unchanged |
+| All adapters | Minor updates for provider routing |
+| All Rust backend | Added new modules (btmsg, bttask, search, secrets, plugins) |
+
+### Replaced
+
+| v2 Component | v3 Replacement | Reason |
+|-------------|---------------|--------|
+| layout.svelte.ts | workspace.svelte.ts | Pane-based -> project-group model |
+| TilingGrid.svelte | ProjectGrid.svelte | Free-form grid -> fixed project boxes |
+| PaneContainer.svelte | ProjectBox.svelte | Generic pane -> 11-tab container |
+| SettingsDialog.svelte | SettingsTab.svelte | Modal -> sidebar drawer |
+| AgentPane.svelte | AgentSession + TeamAgentsPanel | Monolithic -> split for teams |
+| App.svelte | Full rewrite | VSCode-style sidebar layout |
+
+---
+
+## 8. Session Anchor Design (v3)
+
+### Problem
+
+When Claude's context window fills (~80% of model limit), the SDK automatically compacts older turns. Important early decisions and debugging breakthroughs can be permanently lost.
+
+### Design Decisions
+
+1. **Auto-anchor on first compaction** — Captures first 3 turns automatically.
+2. **Observation masking** — Tool outputs compacted, reasoning preserved in full.
+3. **Budget system** — Fixed scales (2K/6K/12K/20K tokens) instead of percentage-based.
+4. **Re-injection via system prompt** — Simplest SDK integration.
+
+---
+
+## 9. Multi-Agent Orchestration Design (v3)
+
+| Approach | Decision |
+|----------|----------|
+| Claude Agent Teams (native) | Supported but not primary (experimental, resume broken) |
+| Message bus (Redis/NATS) | Rejected (runtime dependency) |
+| Shared SQLite + CLI tools | **Selected** (zero deps, agents use shell) |
+| MCP server for agent comm | Rejected (overhead, complexity) |
+
+**Why SQLite + CLI:** Agents have full shell access. Python CLI tools reading/writing SQLite is lowest friction. Zero configuration, no runtime services, WAL handles concurrency.
+
+---
+
+## 10. Theme System Evolution
+
+All 17 themes (4 Catppuccin + 7 Editor + 6 Deep Dark) map to the same 26 `--ctp-*` CSS custom properties. No component ever needs to know which theme is active. Adding new themes is a pure data operation.
+
+---
+
+## 11. Performance Measurements (v3)
+
+### xterm.js Canvas Performance (WebKit2GTK, no WebGL)
+
+- Latency: ~20-30ms per keystroke
+- Memory: ~20MB per active instance
+- OOM threshold: ~5 simultaneous instances
+- Mitigation: 4-instance budget with suspend/resume
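
Editorial note: one way to picture the 4-instance budget is a small tracker that suspends the least recently focused terminal when a fifth one is focused. The eviction policy and every name below are assumptions for illustration; the source only states the budget size and that suspended terminals are serialized and recreated on focus.

```typescript
class TerminalBudget {
  private readonly maxActive: number;
  private active: string[] = []; // pane ids, least recently focused first
  readonly suspended = new Set<string>();

  constructor(maxActive = 4) {
    this.maxActive = maxActive;
  }

  // Marks a pane as focused. Returns the id of the pane that must be suspended, if any.
  focus(paneId: string): string | undefined {
    this.suspended.delete(paneId); // resuming: the caller recreates the xterm instance
    this.active = this.active.filter((id) => id !== paneId);
    this.active.push(paneId);
    if (this.active.length <= this.maxActive) return undefined;
    const evicted = this.active.shift() as string;
    this.suspended.add(evicted); // the caller serializes scrollback and destroys the xterm
    return evicted;
  }

  activeIds(): string[] {
    return [...this.active];
  }
}
```

The PTY is untouched by this bookkeeping; only the renderer instance is created and destroyed.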
+
+### Tauri IPC Latency
+
+- Linux: ~5ms for typical payloads
+- Terminal keystroke echo: 10-15ms total
+- Agent message forwarding: negligible
+
+### SQLite WAL Concurrent Access
+
+WAL mode with a 5s `busy_timeout` handles concurrent access reliably. A checkpoint every 5 minutes keeps the WAL file from growing without bound.
+
+### Workspace Switch Latency
+
+- Serialize 4 xterm scrollbacks: ~30ms
+- Destroy + unmount: ~15ms
+- Mount new group + create xterm: ~55ms
+- **Total perceived: ~100ms**
--- a/docs/architecture/phases.md
+++ b/docs/architecture/phases.md
@@ -0,0 +1,125 @@
+# Implementation Phases
+
+See [overview.md](overview.md) for system architecture and [decisions.md](decisions.md) for design decisions.
+
+---
+
+## Phase 1: Project Scaffolding [complete]
+
+- Tauri 2.x + Svelte 5 frontend initialized
+- Catppuccin Mocha CSS variables, dev scripts
+- portable-pty (used by WezTerm) over tauri-plugin-pty for reliability
+
+---
+
+## Phase 2: Terminal Pane + Layout [complete]
+
+- CSS Grid layout with responsive breakpoints (ultrawide / standard / narrow)
+- Pane resize via drag handles, layout presets (1-col, 2-col, 3-col, 2x2, master+stack)
+- xterm.js with Canvas addon (no WebGL on WebKit2GTK), Catppuccin theme
+- PTY spawn from Rust (portable-pty), stream via Tauri events
+- Copy/paste (Ctrl+Shift+C/V), SSH via PTY shell args
+
+---
+
+## Phase 3: Agent SDK Integration [complete]
+
+- Node.js/Deno sidecar using `@anthropic-ai/claude-agent-sdk` query() function
+- Sidecar communication: Rust spawns process, stdio NDJSON
+- SDK message adapter: 9 typed AgentMessage types
+- Agent store with session state, message history, cost tracking
+- AgentPane: text, tool calls/results, thinking, init, cost, errors, subagent spawn
+- Session resume (resume_session_id to SDK)
+
+---
+
+## Phase 4: Session Management + Markdown Viewer [complete]
+
+- SQLite persistence (rusqlite), session groups with collapsible headers
+- Auto-restore layout on startup
+- Markdown viewer with Shiki highlighting and live reload via file watcher
+
+---
+
+## Phase 5: Agent Tree + Polish [complete]
+
+- SVG agent tree visualization with click-to-scroll and subtree cost
+- Terminal theme hot-swap, pane drag-resize handles
+- StatusBar with counts, notifications (toast system)
+- Settings dialog, ctx integration, SSH session management
+- 4 Catppuccin themes, detached pane mode, Shiki syntax highlighting
+
+---
+
+## Phase 6: Packaging + Distribution [complete]
+
+- install-v2.sh build-from-source installer (Node.js 20+, Rust 1.77+, system libs)
+- Tauri bundle: .deb (4.3 MB) + AppImage (103 MB)
+- GitHub Actions release workflow on `v*` tags
+- Auto-updater with signing key
+
+---
+
+## Phase 7: Agent Teams / Subagent Support [complete]
+
+- Agent store parent/child hierarchy
+- Dispatcher subagent detection and message routing
+- AgentPane parent navigation + children bar
+- Subagent cost aggregation
+- 28 dispatcher tests including 10 for subagent routing
+
+---
+
+## Multi-Machine Support (Phases A-D) [complete]
+
+Architecture in [../multi-machine/relay.md](../multi-machine/relay.md).
+
+### Phase A: Extract `agor-core` crate
+
+Cargo workspace with PtyManager, SidecarManager, EventSink trait in shared crate.
+
+### Phase B: Build `agor-relay` binary
+
+WebSocket server with token auth, rate limiting, per-connection isolation, structured command responses.
+
+### Phase C: Add `RemoteManager` to controller
+
+12 Tauri commands, heartbeat ping, exponential backoff reconnection with TCP probing.
+
+### Phase D: Frontend integration
+
+remote-bridge.ts adapter, machines.svelte.ts store, routing via Pane.remoteMachineId.
+
+### Remaining
+
+- [ ] Real-world relay testing (2 machines)
+- [ ] TLS/certificate pinning
+
+---
+
+## Extras: Claude Profiles & Skill Discovery [complete]
+
+### Claude Profile / Account Switching
+
+- Reads ~/.config/switcher/profiles/ with profile.toml metadata
+- Profile selector dropdown, config_dir passed as CLAUDE_CONFIG_DIR env override
+
+### Skill Discovery & Autocomplete
+
+- Reads ~/.claude/skills/ (dirs with SKILL.md or .md files)
+- `/` prefix triggers autocomplete menu in AgentPane
+- expandSkillPrompt() injects skill content as prompt
+
+### Extended AgentQueryOptions
+
+- setting_sources, system_prompt, model, claude_config_dir, additional_directories
+- CLAUDE_CONFIG_DIR env injection for multi-account support
+
+---
+
+## System Requirements
+
+- Node.js 20+ (for Agent SDK sidecar)
+- Rust 1.77+ (for building from source)
+- WebKit2GTK 4.1+ (Tauri runtime)
+- Linux x86_64 (primary target)
--- a/docs/contributing/testing.md
+++ b/docs/contributing/testing.md
@@ -0,0 +1,178 @@
+# E2E Testing Facility
+
+Agor's end-to-end testing uses **WebDriverIO + tauri-driver** to drive the real Tauri application through WebKit2GTK's inspector protocol. The facility has three pillars:
+
+1. **Test Fixtures** — isolated fake environments with dummy projects
+2. **Test Mode** — app-level env vars that disable watchers and redirect data/config paths
+3. **LLM Judge** — Claude-powered semantic assertions for evaluating agent behavior
+
+## Quick Start
+
+```bash
+# Run all tests (vitest + cargo + E2E)
+npm run test:all:e2e
+
+# Run E2E only (requires pre-built debug binary)
+SKIP_BUILD=1 npm run test:e2e
+
+# Build debug binary separately (faster iteration)
+cargo tauri build --debug --no-bundle
+
+# Run with LLM judge via CLI (default, auto-detected)
+npm run test:e2e
+
+# Force LLM judge to use API instead of CLI
+LLM_JUDGE_BACKEND=api ANTHROPIC_API_KEY=sk-... npm run test:e2e
+```
+
+## Prerequisites
+
+| Dependency | Purpose | Install |
+|-----------|---------|---------|
+| Rust + Cargo | Build Tauri backend | [rustup.rs](https://rustup.rs) |
+| Node.js 20+ | Frontend + test runner | `mise install node` |
+| tauri-driver | WebDriver bridge to WebKit2GTK | `cargo install tauri-driver` |
+| X11 display | WebKit2GTK needs a display | Real X, or `xvfb-run` in CI |
+| Claude CLI | LLM judge (optional) | [claude.ai/download](https://claude.ai/download) |
+
+## Architecture
+
+```
+-----------------------------------------------------+
+| WebDriverIO (mocha runner)                          |
+|   specs/*.test.ts                                   |
+|     +- browser.execute() -> DOM queries + assertions |
+|     +- assertWithJudge() -> LLM semantic evaluation |
+-----------------------------------------------------+
+| tauri-driver (port 4444)                            |
+|   WebDriver protocol <-> WebKit2GTK inspector        |
+-----------------------------------------------------+
+| Agor debug binary                                   |
+|   AGOR_TEST=1 (disables watchers, wake scheduler)   |
+|   AGOR_TEST_DATA_DIR -> isolated SQLite DBs          |
+|   AGOR_TEST_CONFIG_DIR -> test groups.json           |
+-----------------------------------------------------+
+```
+
+## Pillar 1: Test Fixtures (`fixtures.ts`)
+
+The fixture generator creates isolated temporary environments so tests never touch real user data. Each fixture includes:
+
+- **Temp root dir** under `/tmp/`, named `{name}-{timestamp}/` (for example `/tmp/agor-e2e-{timestamp}/`)
+- **Data dir** — empty, SQLite databases created at runtime
+- **Config dir** — contains a generated `groups.json` with test projects
+- **Project dir** — a real git repo with `README.md` and `hello.py` (for agent testing)
+
+### Single-Project Fixture
+
+```typescript
+import { createTestFixture, destroyTestFixture } from '../fixtures';
+
+const fixture = createTestFixture('my-test');
+// fixture.rootDir    -> /tmp/my-test-1710234567890/
+// fixture.dataDir    -> /tmp/my-test-1710234567890/data/
+// fixture.configDir  -> /tmp/my-test-1710234567890/config/
+// fixture.projectDir -> /tmp/my-test-1710234567890/test-project/
+// fixture.env        -> { AGOR_TEST: '1', AGOR_TEST_DATA_DIR: '...', ... }
+
+destroyTestFixture(fixture);
+```
+
+### Multi-Project Fixture
+
+```typescript
+import { createMultiProjectFixture } from '../fixtures';
+const fixture = createMultiProjectFixture(3); // 3 separate git repos
+```
+
+### Fixture Environment Variables
+
+| Variable | Effect |
+|----------|--------|
+| `AGOR_TEST=1` | Disables file watchers, wake scheduler, enables `is_test_mode` |
+| `AGOR_TEST_DATA_DIR` | Redirects `sessions.db` and `btmsg.db` storage |
+| `AGOR_TEST_CONFIG_DIR` | Redirects `groups.json` config loading |
+
+## Pillar 2: Test Mode
+
+When `AGOR_TEST=1` is set:
+
+- **Rust backend**: `watcher.rs` and `fs_watcher.rs` skip file watchers
+- **Frontend**: `is_test_mode` Tauri command returns true, wake scheduler disabled via `disableWakeScheduler()`
+- **Data isolation**: `AGOR_TEST_DATA_DIR` / `AGOR_TEST_CONFIG_DIR` override default paths
+
+The WebDriverIO config (`wdio.conf.js`) passes these env vars via `tauri:options.env` in capabilities.
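
Editorial note: as a rough illustration of how fixture env vars might reach the app, the sketch below builds a capabilities object. `tauri:options.application` is the usual tauri-driver option for the binary path and the `env` key follows the description above; `buildCapabilities`, the binary path, and the directory values are hypothetical.

```typescript
interface TauriCapabilities {
  'tauri:options': {
    application: string;
    env: Record<string, string>;
  };
}

// Hypothetical helper: merges a fixture's env vars into WebDriver capabilities.
function buildCapabilities(binaryPath: string, fixtureEnv: Record<string, string>): TauriCapabilities {
  return {
    'tauri:options': {
      application: binaryPath,
      env: { ...fixtureEnv }, // copy so later changes to the fixture do not leak in
    },
  };
}

const capabilities = buildCapabilities('target/debug/agor', {
  AGOR_TEST: '1',
  AGOR_TEST_DATA_DIR: '/tmp/agor-e2e-example/data',
  AGOR_TEST_CONFIG_DIR: '/tmp/agor-e2e-example/config',
});
console.log(JSON.stringify(capabilities, null, 2));
```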
+
+## Pillar 3: LLM Judge (`llm-judge.ts`)
+
+The LLM judge enables semantic assertions — evaluating whether agent output "looks right" rather than exact string matching.
+
+### Dual Backend
+
+| Backend | How it works | Requires |
+|---------|-------------|----------|
+| `cli` (default) | Spawns `claude` CLI with `--output-format text` | Claude CLI installed |
+| `api` | Raw `fetch` to `https://api.anthropic.com/v1/messages` | `ANTHROPIC_API_KEY` env var |
+
+**Auto-detection order**: CLI first -> API fallback -> skip test.
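
Editorial note: the detection order can be expressed as a pure function over what the environment offers. This is a sketch under assumptions: `pickJudgeBackend` and `JudgeEnvironment` are hypothetical names, and the handling of a forced `LLM_JUDGE_BACKEND` value is a guess at reasonable behavior, not a description of `llm-judge.ts`.

```typescript
type JudgeBackend = 'cli' | 'api';

interface JudgeEnvironment {
  forcedBackend?: string; // value of LLM_JUDGE_BACKEND, if set
  cliInstalled: boolean;  // whether a `claude` binary was found
  apiKey?: string;        // value of ANTHROPIC_API_KEY, if set
}

// Returns the backend to use, or null when the test should be skipped.
function pickJudgeBackend(env: JudgeEnvironment): JudgeBackend | null {
  if (env.forcedBackend === 'api') return env.apiKey ? 'api' : null;
  if (env.forcedBackend === 'cli') return env.cliInstalled ? 'cli' : null;
  if (env.cliInstalled) return 'cli'; // CLI first
  if (env.apiKey) return 'api';       // API fallback
  return null;                        // neither available: skip
}
```

Keeping the decision free of side effects makes it easy to unit-test without a real CLI or API key.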
+
+### API
+
+```typescript
+import { isJudgeAvailable, judge, assertWithJudge } from '../llm-judge';
+
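+if (!isJudgeAvailable()) { this.skip(); return; } // call inside a mocha `function () {}` test body so `this.skip()` exists
+
+const verdict = await judge(
+  'The output should contain a file listing with at least one filename',
+  actualOutput, // agent output text captured earlier in the test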
+  'Agent was asked to list files in a directory containing README.md',
+);
+// verdict: { pass: boolean, reasoning: string, confidence: number }
+```
+
+## Test Spec Files
+
+| File | Phase | Tests | Focus |
+|------|-------|-------|-------|
+| `agor.test.ts` | Smoke | ~50 | Basic UI rendering, CSS class selectors |
+| `agent-scenarios.test.ts` | A | 22 | `data-testid` selectors, 7 deterministic scenarios |
+| `phase-b.test.ts` | B | ~15 | Multi-project grid, LLM-judged agent responses |
+| `phase-c.test.ts` | C | 27 | Hardening features (palette, search, notifications, keyboard, settings, health, metrics, context, files) |
+
+## Test Results Tracking (`results-db.ts`)
+
+A lightweight JSON store for tracking test runs and individual step results. Writes to `test-results/results.json`.
+
+## CI Integration (`.github/workflows/e2e.yml`)
+
+1. **Unit tests** — `npm run test` (vitest)
+2. **Cargo tests** — `cargo test` (with `env -u AGOR_TEST` to prevent env leakage)
+3. **E2E tests** — `xvfb-run npm run test:e2e` (virtual framebuffer for headless WebKit2GTK)
+
+LLM-judged tests are gated on the `ANTHROPIC_API_KEY` secret — they skip gracefully in forks.
+
+## Writing New Tests
+
+1. Pick the appropriate spec file (or create a new phase file)
+2. Use `data-testid` selectors where possible
+3. For DOM queries, use `browser.execute()` to run JS in the app context
+4. For semantic assertions, use `assertWithJudge()` with clear criteria
+
+### WebDriverIO Config (`wdio.conf.js`)
+
+- **Single session**: `maxInstances: 1` — tauri-driver can't handle parallel sessions
+- **Lifecycle**: `onPrepare` builds debug binary, `beforeSession` spawns tauri-driver with TCP readiness probe
+- **Timeouts**: 60s per test, 10s `waitforTimeout`, 30s connection retry
+- **Skip build**: Set `SKIP_BUILD=1` to reuse existing binary
+
+## Troubleshooting
+
+| Problem | Solution |
+|---------|----------|
+| "Callback was not called before unload" | Stale binary — rebuild with `cargo tauri build --debug --no-bundle` |
+| Tests hang on startup | Kill stale `tauri-driver` processes: `pkill -f tauri-driver` |
+| All tests skip LLM judge | Install Claude CLI or set `ANTHROPIC_API_KEY` |
+| SIGUSR2 / exit code 144 | Stale tauri-driver on port 4444 — kill and retry |
+| `AGOR_TEST` leaking to cargo | Run cargo tests with `env -u AGOR_TEST cargo test` |
+| No display available | Use `xvfb-run` or ensure X11/Wayland display is set |
--- a/docs/multi-machine/relay.md
+++ b/docs/multi-machine/relay.md
@@ -0,0 +1,155 @@
+# Multi-Machine Support
+
+**Status: Implemented (Phases A-D complete, 2026-03-06)**
+
+## Overview
+
+Extends agor to manage Claude agent sessions and terminal panes running on **remote machines** over WebSocket, while keeping the local sidecar path unchanged.
+
+## Architecture
+
+### Three-Layer Model
+
+```
+----------------------------------------------------------------+
+|  Agent Orchestrator (Controller)                                |
+|                                                                |
+|  +----------+    Tauri IPC    +------------------------------+ |
+|  | WebView  | <------------> | Rust Backend                 | |
+|  | (Svelte) |                |                              | |
+|  +----------+                |  +-- PtyManager (local)      | |
+|                              |  +-- SidecarManager (local)  | |
+|                              |  +-- RemoteManager ----------+-+
+|                              +------------------------------+ |
+----------------------------------------------------------------+
+        |                                      |
+        | (local stdio)                        | (WebSocket wss://)
+        v                                      v
+  +-----------+                    +----------------------+
+  | Local     |                    | Remote Machine       |
+  | Sidecar   |                    |  +--------------+    |
+  | (Deno/    |                    |  | agor-relay   |    |
+  |  Node.js) |                    |  | (Rust binary)|    |
+  +-----------+                    |  |              |    |
+                                   |  | +-- PTY mgr  |    |
+                                   |  | +-- Sidecar  |    |
+                                   |  | +-- WS server|    |
+                                   |  +--------------+    |
+                                   +----------------------+
+```
+
+### Components
+
+#### 1. `agor-relay` — Remote Agent (Rust binary)
+
+A standalone Rust binary that runs on each remote machine:
+- Listens on a WebSocket port (default: 9750)
+- Manages local PTYs and sidecar processes
+- Forwards NDJSON events to the controller over WebSocket
+- Receives commands (query, stop, resize, write) from the controller
+
+Reuses `PtyManager` and `SidecarManager` from `agor-core`.
+
+#### 2. `RemoteManager` — Controller-Side
+
+Module in `src-tauri/src/remote.rs`. Manages WebSocket connections to multiple relays. 12 Tauri commands for remote operations.
+
+#### 3. Frontend Adapters — Unified Interface
+
+The frontend doesn't care whether a pane is local or remote. Bridge adapters check `remoteMachineId` and route accordingly.
+
+## Protocol
+
+### WebSocket Wire Format
+
+Same NDJSON as local sidecar, wrapped in an envelope for multiplexing:
+
+```typescript
+// Controller -> Relay (commands)
+interface RelayCommand {
+  id: string;
+  type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
+      | 'agent_query' | 'agent_stop' | 'sidecar_restart' | 'ping';
+  payload: Record<string, unknown>;
+}
+
+// Relay -> Controller (events)
+interface RelayEvent {
+  type: 'pty_data' | 'pty_exit' | 'pty_created'
+      | 'sidecar_message' | 'sidecar_exited'
+      | 'error' | 'pong' | 'ready';
+  sessionId?: string;
+  payload: unknown;
+}
+```
+
+### Authentication
+
+1. **Pre-shared token**: the relay starts with `--token <secret>`. The controller sends the token in the WebSocket upgrade headers.
+2. **TLS required**: the relay rejects non-TLS connections in production mode. Dev mode allows `ws://` with the `--insecure` flag.
+3. **Rate limiting**: 10 failed auth attempts trigger a 5-minute lockout.
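
Editorial note: the lockout rule in item 3 can be sketched as a small counter. The relay itself is a Rust binary, so this TypeScript version is only an illustration of the logic; keying by client address, resetting the counter on lockout, and all names are assumptions.

```typescript
const MAX_FAILED_ATTEMPTS = 10;
const LOCKOUT_MS = 5 * 60 * 1000; // 5 minutes

class AuthRateLimiter {
  private entries = new Map<string, { count: number; lockedUntil: number }>();

  // True while the client is inside its lockout window.
  isLocked(client: string, now: number): boolean {
    const entry = this.entries.get(client);
    return entry !== undefined && now < entry.lockedUntil;
  }

  // Records a failed attempt and starts a lockout on the 10th failure.
  recordFailure(client: string, now: number): void {
    const entry = this.entries.get(client) ?? { count: 0, lockedUntil: 0 };
    entry.count += 1;
    if (entry.count >= MAX_FAILED_ATTEMPTS) {
      entry.lockedUntil = now + LOCKOUT_MS;
      entry.count = 0;
    }
    this.entries.set(client, entry);
  }

  // A successful auth clears the client's history.
  recordSuccess(client: string): void {
    this.entries.delete(client);
  }
}
```

Passing `now` explicitly keeps the class deterministic, which makes the lockout window straightforward to test.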
+
+### Reconnection
+
+- Exponential backoff: 1s, 2s, 4s, 8s, 16s, 30s cap
+- Uses `attempt_tcp_probe()`: TCP-only, 5s timeout (avoids allocating resources on relay during probes)
+- Emits `remote-machine-reconnecting` and `remote-machine-reconnect-ready` events
+- Active agent sessions continue on relay regardless of controller connection
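
Editorial note: the backoff schedule listed above (1s, 2s, 4s, 8s, 16s, then capped at 30s) corresponds to doubling from one second with an upper bound. A minimal sketch, with the function name `reconnectDelayMs` and zero-based attempt numbering as assumptions:

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before reconnect attempt `attempt` (0-based): 1s, 2s, 4s, 8s, 16s, then 30s.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

const schedule = [0, 1, 2, 3, 4, 5, 6].map(reconnectDelayMs);
console.log(schedule.join(', '));
```

The cap matters from the sixth attempt on, where plain doubling would give 32s.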
+
+### Session Persistence Across Reconnects
+
+Remote agents keep running even when the controller disconnects. On reconnect:
+1. Relay sends state sync with active sessions and PTYs
+2. Controller reconciles and updates pane states
+3. Missed messages are NOT replayed (agent panes show "reconnected" notice)
+
+## Implementation Summary
+
+### Phase A: Extract `agor-core` crate
+
+Cargo workspace with PtyManager, SidecarManager, EventSink trait extracted to shared crate.
+
+### Phase B: Build `agor-relay` binary
+
+WebSocket server with token auth, per-connection isolated managers, structured command responses with commandId correlation.
+
+### Phase C: Add `RemoteManager` to controller
+
+12 Tauri commands, heartbeat ping every 15s, exponential backoff reconnection.
+
+### Phase D: Frontend integration
+
+`remote-bridge.ts` adapter, `machines.svelte.ts` store, `Pane.remoteMachineId` routing field.
+
+### Remaining Work
+
+- [ ] Real-world relay testing (2 machines)
+- [ ] TLS/certificate pinning
+
+## Security
+
+| Threat | Mitigation |
+|--------|-----------|
+| Token interception | TLS required |
+| Token brute-force | Rate limit + lockout |
+| Relay impersonation | Certificate pinning (future: mTLS) |
+| Command injection | Payload schema validation |
+| Lateral movement | Unprivileged user, no shell beyond PTY/sidecar |
+| Data exfiltration | Agent output streams to controller only |
+
+## Performance
+
+| Concern | Mitigation |
+|---------|-----------|
+| WebSocket latency | LAN: <1ms, WAN: 20-100ms (acceptable for text) |
+| Bandwidth | Agent NDJSON: ~50KB/s peak, Terminal: ~200KB/s peak |
+| Connection count | Max 10 machines (UI constraint) |
+| Message ordering | Single WebSocket per machine = ordered delivery |
+
+## Future (Not Covered)
+
+- Multi-controller (multiple agor instances observing same relay)
+- Relay discovery (mDNS/Bonjour)
+- Agent migration between machines
+- Relay-to-relay communication
+- mTLS for enterprise environments