# Multi-Machine Support

**Status: Implemented (Phases A-D complete, 2026-03-06)**

## Overview

Extends agor to manage Claude agent sessions and terminal panes running on **remote machines** over WebSocket, while keeping the local sidecar path unchanged.

## Architecture

### Three-Layer Model

```
+----------------------------------------------------------------+
|  Agent Orchestrator (Controller)                               |
|                                                                |
|  +----------+   Tauri IPC     +------------------------------+ |
|  | WebView  | <------------>  | Rust Backend                 | |
|  | (Svelte) |                 |                              | |
|  +----------+                 |  +-- PtyManager (local)      | |
|                               |  +-- SidecarManager (local)  | |
|                               |  +-- RemoteManager ----------+-+
|                               +------------------------------+ |
+----------------------------------------------------------------+
      |                                                          |
      | (local stdio)                                            | (WebSocket wss://)
      v                                                          v
+-----------+                                         +----------------------+
| Local     |                                         | Remote Machine       |
| Sidecar   |                                         | +--------------+     |
| (Deno/    |                                         | | agor-relay   |     |
|  Node.js) |                                         | | (Rust binary)|     |
+-----------+                                         | |              |     |
                                                      | | +-- PTY mgr  |     |
                                                      | | +-- Sidecar  |     |
                                                      | | +-- WS server|     |
                                                      | +--------------+     |
                                                      +----------------------+
```

### Components

#### 1. `agor-relay`: Remote Agent (Rust binary)

A standalone Rust binary that runs on each remote machine:

- Listens on a WebSocket port (default: 9750)
- Manages local PTYs and sidecar processes
- Forwards NDJSON events to the controller over WebSocket
- Receives commands (query, stop, resize, write) from the controller

Reuses `PtyManager` and `SidecarManager` from `agor-core`.

#### 2. `RemoteManager`: Controller-Side

Module in `src-tauri/src/remote.rs`. Manages WebSocket connections to multiple relays and exposes 12 Tauri commands for remote operations.

#### 3. Frontend Adapters: Unified Interface

The frontend doesn't care whether a pane is local or remote. Bridge adapters check `remoteMachineId` and route accordingly.

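
The routing rule for bridge adapters can be sketched as follows. This is a minimal illustration, not the code in `remote-bridge.ts`: the `PtyBackend` interface, `localBackend`, `remoteBackend`, `backendFor`, and `writeToPane` are hypothetical names, and the backends return strings instead of calling Tauri commands so the example stays self-contained. Only the `remoteMachineId` field comes from this document.

```typescript
// Hypothetical pane shape; only `remoteMachineId` is taken from this document.
interface Pane {
  id: string;
  remoteMachineId?: string; // set only for panes hosted on a relay
}

// Hypothetical backend interface. A real adapter would invoke Tauri commands;
// here each backend just returns a description of where the write would go.
interface PtyBackend {
  write(paneId: string, data: string): string;
}

const localBackend: PtyBackend = {
  write: (paneId, data) => `local:${paneId}:${data}`,
};

function remoteBackend(machineId: string): PtyBackend {
  return {
    write: (paneId, data) => `remote:${machineId}:${paneId}:${data}`,
  };
}

// The routing decision: a pane with a machine id goes to that relay,
// everything else stays on the local path.
function backendFor(pane: Pane): PtyBackend {
  return pane.remoteMachineId !== undefined
    ? remoteBackend(pane.remoteMachineId)
    : localBackend;
}

function writeToPane(pane: Pane, data: string): string {
  return backendFor(pane).write(pane.id, data);
}
```

Keeping the decision in one function such as `backendFor` means callers never branch on pane location themselves, which is what lets the rest of the frontend stay unaware of it.
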
## Protocol

### WebSocket Wire Format

Same NDJSON as the local sidecar, wrapped in an envelope for multiplexing:

```typescript
// Controller -> Relay (commands)
interface RelayCommand {
  id: string;
  type:
    | 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
    | 'agent_query' | 'agent_stop' | 'sidecar_restart' | 'ping';
  payload: Record<string, unknown>;
}

// Relay -> Controller (events)
interface RelayEvent {
  type:
    | 'pty_data' | 'pty_exit' | 'pty_created'
    | 'sidecar_message' | 'sidecar_exited'
    | 'error' | 'pong' | 'ready';
  sessionId?: string;
  payload: unknown;
}
```

### Authentication

1. **Pre-shared token**: the relay starts with `--token <token>`. The controller sends the token in the WebSocket upgrade headers.
2. **TLS required**: the relay rejects non-TLS connections in production mode. Dev mode allows `ws://` with the `--insecure` flag.
3. **Rate limiting**: 10 failed auth attempts trigger a 5-minute lockout.

### Reconnection

- Exponential backoff: 1s, 2s, 4s, 8s, 16s, then capped at 30s
- Uses `attempt_tcp_probe()`: TCP-only, 5s timeout (avoids allocating resources on the relay during probes)
- Emits `remote-machine-reconnecting` and `remote-machine-reconnect-ready` events
- Active agent sessions continue on the relay regardless of the controller connection

### Session Persistence Across Reconnects

Remote agents keep running even when the controller disconnects. On reconnect:

1. The relay sends a state sync with active sessions and PTYs
2. The controller reconciles and updates pane states
3. Missed messages are NOT replayed (agent panes show a "reconnected" notice)

## Implementation Summary

### Phase A: Extract `agor-core` crate

Cargo workspace with `PtyManager`, `SidecarManager`, and the `EventSink` trait extracted to a shared crate.

### Phase B: Build `agor-relay` binary

WebSocket server with token auth, per-connection isolated managers, and structured command responses with `commandId` correlation.

### Phase C: Add `RemoteManager` to controller

12 Tauri commands, heartbeat ping every 15s, exponential backoff reconnection.

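
The backoff schedule listed under Reconnection (1s, 2s, 4s, 8s, 16s, then a 30s cap) can be computed as in the sketch below. The controller itself is written in Rust, so this only illustrates the arithmetic; `reconnectDelayMs`, `backoffSchedule`, and both constants are hypothetical names, and the attempt counter is assumed to be zero-based.

```typescript
// Assumed constants matching the schedule in this document.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before reconnect attempt number `attempt` (zero-based):
// doubles each time and is clamped to the cap.
function reconnectDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Convenience helper: the first `attempts` delays, in order.
function backoffSchedule(attempts: number): number[] {
  return Array.from({ length: attempts }, (_, i) => reconnectDelayMs(i));
}

console.log(backoffSchedule(7));
// [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```

The sixth delay would be 32s without the clamp, which is why the documented sequence jumps from 16s straight to 30s.
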
### Phase D: Frontend integration

`remote-bridge.ts` adapter, `machines.svelte.ts` store, `Pane.remoteMachineId` routing field.

### Remaining Work

- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning

## Security

| Threat | Mitigation |
|--------|------------|
| Token interception | TLS required |
| Token brute-force | Rate limit + lockout |
| Relay impersonation | Certificate pinning (future: mTLS) |
| Command injection | Payload schema validation |
| Lateral movement | Unprivileged user, no shell beyond PTY/sidecar |
| Data exfiltration | Agent output streams to controller only |

## Performance

| Concern | Mitigation |
|---------|------------|
| WebSocket latency | LAN: <1ms, WAN: 20-100ms (acceptable for text) |
| Bandwidth | Agent NDJSON: ~50KB/s peak, Terminal: ~200KB/s peak |
| Connection count | Max 10 machines (UI constraint) |
| Message ordering | Single WebSocket per machine = ordered delivery |

## Future (Not Covered)

- Multi-controller (multiple agor instances observing same relay)
- Relay discovery (mDNS/Bonjour)
- Agent migration between machines
- Relay-to-relay communication
- mTLS for enterprise environments