Multi-Machine Support
Status: Implemented (Phases A-D complete, 2026-03-06)
Overview
Extends agor to manage Claude agent sessions and terminal panes running on remote machines over WebSocket, while keeping the local sidecar path unchanged.
Architecture
Three-Layer Model
+----------------------------------------------------------------+
|  Agent Orchestrator (Controller)                               |
|                                                                |
|  +----------+    Tauri IPC    +------------------------------+ |
|  | WebView  | <------------>  | Rust Backend                 | |
|  | (Svelte) |                 |                              | |
|  +----------+                 | +-- PtyManager (local)       | |
|                               | +-- SidecarManager (local)   | |
|                               | +-- RemoteManager -----------+-+
|                               +------------------------------+ |
+----------------------------------------------------------------+
      |                                                          |
      | (local stdio)                                            | (WebSocket wss://)
      v                                                          v
+-----------+                                         +----------------------+
| Local     |                                         | Remote Machine       |
| Sidecar   |                                         |  +----------------+  |
| (Deno/    |                                         |  | agor-relay     |  |
| Node.js)  |                                         |  | (Rust binary)  |  |
+-----------+                                         |  |                |  |
                                                      |  | +-- PTY mgr    |  |
                                                      |  | +-- Sidecar    |  |
                                                      |  | +-- WS server  |  |
                                                      |  +----------------+  |
                                                      +----------------------+
Components
1. agor-relay — Remote Agent (Rust binary)
A standalone Rust binary that runs on each remote machine:
- Listens on a WebSocket port (default: 9750)
- Manages local PTYs and sidecar processes
- Forwards NDJSON events to the controller over WebSocket
- Receives commands (query, stop, resize, write) from the controller
Reuses PtyManager and SidecarManager from agor-core.
2. RemoteManager — Controller-Side
A module in src-tauri/src/remote.rs that manages WebSocket connections to multiple relays and exposes 12 Tauri commands for remote operations.
3. Frontend Adapters — Unified Interface
The frontend doesn't care whether a pane is local or remote. Bridge adapters check remoteMachineId and route accordingly.
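The routing decision described above can be sketched as follows. This is only an illustration, not the actual adapter code: apart from remoteMachineId, every name here (PtyBackend, localBridge, remoteBridge, backendFor) is hypothetical, and the bridges return strings instead of calling Tauri so the example stays self-contained.

```typescript
// Hypothetical pane shape: only remoteMachineId comes from this document.
interface Pane {
  id: string;
  remoteMachineId?: string; // set only for panes hosted on a relay
}

// Hypothetical interface shared by the local and remote bridges.
interface PtyBackend {
  write(paneId: string, data: string): string;
}

// Stand-in for the local path (would invoke the local PTY commands).
const localBridge: PtyBackend = {
  write: (paneId, data) => `local:${paneId}:${data}`,
};

// Stand-in for the remote path (would invoke the remote commands for one machine).
function remoteBridge(machineId: string): PtyBackend {
  return {
    write: (paneId, data) => `remote:${machineId}:${paneId}:${data}`,
  };
}

// Callers never branch on local vs. remote; they ask for a backend and use it.
function backendFor(pane: Pane): PtyBackend {
  return pane.remoteMachineId === undefined
    ? localBridge
    : remoteBridge(pane.remoteMachineId);
}
```

Keeping the branch in one function means UI components call backendFor(pane).write(...) without knowing where the pane lives.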
Protocol
WebSocket Wire Format
Same NDJSON as local sidecar, wrapped in an envelope for multiplexing:
// Controller -> Relay (commands)
interface RelayCommand {
  id: string;
  type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
      | 'agent_query' | 'agent_stop' | 'sidecar_restart' | 'ping';
  payload: Record<string, unknown>;
}
// Relay -> Controller (events)
interface RelayEvent {
  type: 'pty_data' | 'pty_exit' | 'pty_created'
      | 'sidecar_message' | 'sidecar_exited'
      | 'error' | 'pong' | 'ready';
  sessionId?: string;
  payload: unknown;
}
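As a rough sketch of how a controller might decode one incoming envelope, the function below parses a text frame and checks it against the RelayEvent shape before use. It assumes one JSON envelope per WebSocket text frame, which this document does not state explicitly; parseRelayEvent and EVENT_TYPES are hypothetical names.

```typescript
// Event types copied from the RelayEvent interface in this document.
const EVENT_TYPES = [
  'pty_data', 'pty_exit', 'pty_created',
  'sidecar_message', 'sidecar_exited',
  'error', 'pong', 'ready',
] as const;
type RelayEventType = (typeof EVENT_TYPES)[number];

interface RelayEvent {
  type: RelayEventType;
  sessionId?: string;
  payload: unknown;
}

// Hypothetical helper: returns null for anything that is not a valid envelope.
function parseRelayEvent(frame: string): RelayEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(frame);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== 'object' || value === null) return null;
  const v = value as Record<string, unknown>;

  // The type field must be one of the known event names.
  const knownTypes = EVENT_TYPES as readonly string[];
  if (typeof v.type !== 'string' || knownTypes.indexOf(v.type) === -1) return null;

  // sessionId is optional, but if present it must be a string.
  if (v.sessionId !== undefined && typeof v.sessionId !== 'string') return null;

  const event: RelayEvent = { type: v.type as RelayEventType, payload: v.payload };
  if (typeof v.sessionId === 'string') event.sessionId = v.sessionId;
  return event;
}
```

Because every event carries an optional sessionId, a caller can use the parsed value to route output to the right pane while sharing a single connection per machine.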
Authentication
- Pre-shared token: the relay starts with --token <secret>. The controller sends the token in the WebSocket upgrade headers.
- TLS required: the relay rejects non-TLS connections in production mode. Dev mode allows ws:// with the --insecure flag.
- Rate limiting: 10 failed auth attempts trigger a 5-minute lockout.
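The lockout rule can be illustrated with a small counter. The relay itself is a Rust binary, so this TypeScript sketch shows only the logic, under assumptions the document does not confirm: failures are tracked per peer address, a successful auth clears the count, and the count resets once a lockout starts. AuthLimiter and its methods are hypothetical names.

```typescript
const MAX_FAILURES = 10;           // from the rule above
const LOCKOUT_MS = 5 * 60 * 1000;  // 5 minutes

interface FailureEntry {
  count: number;
  lockedUntil: number; // epoch ms; 0 means never locked
}

class AuthLimiter {
  private failures = new Map<string, FailureEntry>();

  // True while the peer is inside its lockout window.
  isLocked(peer: string, now: number): boolean {
    const entry = this.failures.get(peer);
    return entry !== undefined && now < entry.lockedUntil;
  }

  // Count a failed attempt; the 10th failure starts a lockout.
  recordFailure(peer: string, now: number): void {
    const entry = this.failures.get(peer) ?? { count: 0, lockedUntil: 0 };
    entry.count += 1;
    if (entry.count >= MAX_FAILURES) {
      entry.lockedUntil = now + LOCKOUT_MS;
      entry.count = 0; // assumption: start counting afresh after the lockout
    }
    this.failures.set(peer, entry);
  }

  // Assumption: a successful auth forgets earlier failures.
  recordSuccess(peer: string): void {
    this.failures.delete(peer);
  }
}
```

Passing the current time in as a parameter keeps the class deterministic, which makes the lockout window easy to unit test.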
Reconnection
- Exponential backoff: 1s, 2s, 4s, 8s, 16s, 30s cap
- Uses attempt_tcp_probe(): TCP-only, 5s timeout (avoids allocating resources on relay during probes)
- Emits remote-machine-reconnecting and remote-machine-reconnect-ready events
- Active agent sessions continue on relay regardless of controller connection
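The backoff schedule above corresponds to doubling from a 1s base and clamping at 30s. A minimal sketch, assuming attempts are counted from zero (backoffDelayMs is a hypothetical name, not a function from the codebase):

```typescript
const BASE_DELAY_MS = 1000;  // first retry after 1s
const MAX_DELAY_MS = 30000;  // cap at 30s

// Delay before reconnect attempt number `attempt` (0-based).
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt), MAX_DELAY_MS);
}

// First seven delays: 1s, 2s, 4s, 8s, 16s, then 30s for every later attempt.
const schedule = [0, 1, 2, 3, 4, 5, 6].map(backoffDelayMs);
```

Note that the sixth value would be 32s without the clamp, which is why the documented sequence jumps from 16s to 30s.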
Session Persistence Across Reconnects
Remote agents keep running even when the controller disconnects. On reconnect:
- Relay sends state sync with active sessions and PTYs
- Controller reconciles and updates pane states
- Missed messages are NOT replayed (agent panes show "reconnected" notice)
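One way to picture the reconcile step is as a set comparison between the panes the controller remembers and the IDs the relay reports. The StateSync shape and the three result buckets below are assumptions made for illustration; the document does not specify the sync message format.

```typescript
// Hypothetical shape of the relay's state sync message.
interface StateSync {
  sessions: string[]; // active agent session IDs
  ptys: string[];     // active PTY IDs
}

interface ReconcileResult {
  alive: string[];    // known to the controller and still running on the relay
  lost: string[];     // known to the controller but gone from the relay
  orphaned: string[]; // running on the relay but unknown to the controller
}

function reconcile(knownIds: string[], sync: StateSync): ReconcileResult {
  const remoteIds = sync.sessions.concat(sync.ptys);
  const remote = new Set(remoteIds);
  const known = new Set(knownIds);
  return {
    alive: knownIds.filter((id) => remote.has(id)),
    lost: knownIds.filter((id) => !remote.has(id)),
    orphaned: remoteIds.filter((id) => !known.has(id)),
  };
}
```

Under this sketch, alive panes would get the "reconnected" notice, lost panes would be marked as exited, and orphaned entries are left for the caller to attach or ignore.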
Implementation Summary
Phase A: Extract agor-core crate
Cargo workspace with PtyManager, SidecarManager, EventSink trait extracted to shared crate.
Phase B: Build agor-relay binary
WebSocket server with token auth, per-connection isolated managers, structured command responses with commandId correlation.
Phase C: Add RemoteManager to controller
12 Tauri commands, heartbeat ping every 15s, exponential backoff reconnection.
Phase D: Frontend integration
remote-bridge.ts adapter, machines.svelte.ts store, Pane.remoteMachineId routing field.
Remaining Work
- Real-world relay testing (2 machines)
- TLS/certificate pinning
Security
| Threat | Mitigation |
|---|---|
| Token interception | TLS required |
| Token brute-force | Rate limit + lockout |
| Relay impersonation | Certificate pinning (future: mTLS) |
| Command injection | Payload schema validation |
| Lateral movement | Unprivileged user, no shell beyond PTY/sidecar |
| Data exfiltration | Agent output streams to controller only |
Performance
| Concern | Estimate / Mitigation |
|---|---|
| WebSocket latency | LAN: <1ms, WAN: 20-100ms (acceptable for text) |
| Bandwidth | Agent NDJSON: ~50KB/s peak, Terminal: ~200KB/s peak |
| Connection count | Max 10 machines (UI constraint) |
| Message ordering | Single WebSocket per machine = ordered delivery |
Future (Not Covered)
- Multi-controller (multiple agor instances observing same relay)
- Relay discovery (mDNS/Bonjour)
- Agent migration between machines
- Relay-to-relay communication
- mTLS for enterprise environments