# Multi-Machine Support

**Status: Implemented (Phases A-D complete, 2026-03-06)**
## Overview

Multi-machine support extends agor to manage Claude agent sessions and terminal panes running on **remote machines** over WebSocket, while leaving the local sidecar path unchanged.
## Architecture

### Three-Layer Model

```
+----------------------------------------------------------------+
|                Agent Orchestrator (Controller)                 |
|                                                                |
|  +----------+   Tauri IPC    +------------------------------+  |
|  | WebView  | <------------> |         Rust Backend         |  |
|  | (Svelte) |                |                              |  |
|  +----------+                |  +-- PtyManager (local)      |  |
|                              |  +-- SidecarManager (local)  |  |
|                              |  +-- RemoteManager ----------+--+
|                              +------------------------------+  |
+----------------------------------------------------------------+
         |                                    |
         | (local stdio)                      | (WebSocket wss://)
         v                                    v
   +-----------+                   +----------------------+
   |  Local    |                   |  Remote Machine      |
   |  Sidecar  |                   |  +----------------+  |
   |  (Deno/   |                   |  |  agor-relay    |  |
   |  Node.js) |                   |  |  (Rust binary) |  |
   +-----------+                   |  |                |  |
                                   |  |  +-- PTY mgr   |  |
                                   |  |  +-- Sidecar   |  |
                                   |  |  +-- WS server |  |
                                   |  +----------------+  |
                                   +----------------------+
```
### Components

#### 1. `agor-relay` — Remote Agent (Rust binary)

A standalone Rust binary that runs on each remote machine:

- Listens on a WebSocket port (default: 9750)
- Manages local PTYs and sidecar processes
- Forwards NDJSON events to the controller over WebSocket
- Receives commands (query, stop, resize, write) from the controller

Reuses `PtyManager` and `SidecarManager` from `agor-core`.
#### 2. `RemoteManager` — Controller-Side

A module in `src-tauri/src/remote.rs` that manages WebSocket connections to multiple relays and exposes 12 Tauri commands for remote operations.
#### 3. Frontend Adapters — Unified Interface

The frontend is agnostic to whether a pane is local or remote: bridge adapters check the pane's `remoteMachineId` and route each call either to the local backend or to the corresponding relay.
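A minimal sketch of that routing check, for illustration only. The `Pane` shape is simplified, and `LocalTransport`, `RemoteTransport`, and `routePtyWrite` are hypothetical names standing in for the real bridge functions; only the `remoteMachineId` field comes from this design.

```typescript
// Simplified pane shape; only `remoteMachineId` comes from the design.
interface Pane {
  id: string;
  remoteMachineId?: string; // undefined => the pane runs locally
}

// Hypothetical stand-ins for the Tauri IPC path and the relay path.
interface LocalTransport {
  ptyWrite(paneId: string, data: string): void;
}
interface RemoteTransport {
  ptyWrite(machineId: string, paneId: string, data: string): void;
}

// Sends terminal input to whichever backend owns the pane.
function routePtyWrite(
  pane: Pane,
  data: string,
  local: LocalTransport,
  remote: RemoteTransport,
): void {
  if (pane.remoteMachineId !== undefined) {
    remote.ptyWrite(pane.remoteMachineId, pane.id, data);
  } else {
    local.ptyWrite(pane.id, data);
  }
}
```

Keeping the branch in a single adapter means pane components never need to know which transport they are using.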
## Protocol

### WebSocket Wire Format

Messages use the same NDJSON as the local sidecar, wrapped in an envelope so that multiple PTYs and agent sessions can share one connection:

```typescript
// Controller -> Relay (commands)
interface RelayCommand {
  id: string;
  type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
      | 'agent_query' | 'agent_stop' | 'sidecar_restart' | 'ping';
  payload: Record<string, unknown>;
}

// Relay -> Controller (events)
interface RelayEvent {
  type: 'pty_data' | 'pty_exit' | 'pty_created'
      | 'sidecar_message' | 'sidecar_exited'
      | 'error' | 'pong' | 'ready';
  sessionId?: string;
  payload: unknown;
}
```
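A sketch of how the controller might validate an incoming `RelayEvent` before dispatching it by `sessionId`. It assumes each WebSocket text frame carries exactly one JSON envelope; `parseRelayEvent` is a hypothetical helper, not part of the codebase.

```typescript
// Mirrors the RelayEvent envelope defined in the wire format.
type RelayEventType =
  | 'pty_data' | 'pty_exit' | 'pty_created'
  | 'sidecar_message' | 'sidecar_exited'
  | 'error' | 'pong' | 'ready';

interface RelayEvent {
  type: RelayEventType;
  sessionId?: string;
  payload: unknown;
}

const EVENT_TYPES = new Set<string>([
  'pty_data', 'pty_exit', 'pty_created',
  'sidecar_message', 'sidecar_exited',
  'error', 'pong', 'ready',
]);

// Parses one text frame, assumed to hold exactly one JSON envelope.
// Returns null for malformed JSON, a non-object, an unknown `type`,
// or a non-string `sessionId`.
function parseRelayEvent(frame: string): RelayEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(frame);
  } catch {
    return null;
  }
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return null;
  const obj = value as Record<string, unknown>;
  const type = obj.type;
  const sessionId = obj.sessionId;
  if (typeof type !== 'string' || !EVENT_TYPES.has(type)) return null;
  if (sessionId !== undefined && typeof sessionId !== 'string') return null;
  return {
    type: type as RelayEventType,
    sessionId: sessionId as string | undefined,
    payload: obj.payload,
  };
}
```

Rejecting unknown `type` values at the boundary lets a caller log and drop events from a mismatched relay version instead of mis-routing them.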
### Authentication

1. **Pre-shared token**: the relay starts with `--token <secret>`, and the controller sends the token in the WebSocket upgrade request headers.
2. **TLS required**: in production mode the relay rejects non-TLS connections; dev mode allows `ws://` with the `--insecure` flag.
3. **Rate limiting**: 10 failed auth attempts trigger a 5-minute lockout.
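The rate-limiting rule can be sketched as a small state machine, written in TypeScript to match the wire-format block even though the relay itself is Rust. Only the two constants come from this design; keying attempts by client (for example, peer IP), resetting the count on success, and forgetting failures once a lockout expires are assumptions, and `AuthRateLimiter` is a hypothetical name.

```typescript
// Policy values from the design; everything else is an illustrative assumption.
const MAX_FAILED_ATTEMPTS = 10;
const LOCKOUT_MS = 5 * 60 * 1000;

interface ClientState {
  failures: number;
  lockedUntil: number; // epoch ms; 0 means not locked
}

// Tracks failed auth attempts per client key (assumed here to be the peer IP).
class AuthRateLimiter {
  private clients = new Map<string, ClientState>();

  // Returns the client's state, forgetting it once a lockout has expired.
  private current(key: string, now: number): ClientState | undefined {
    const state = this.clients.get(key);
    if (state !== undefined && state.lockedUntil !== 0 && state.lockedUntil <= now) {
      this.clients.delete(key);
      return undefined;
    }
    return state;
  }

  // True if the client may attempt to authenticate at time `now`.
  isAllowed(key: string, now: number): boolean {
    const state = this.current(key, now);
    return state === undefined || state.lockedUntil <= now;
  }

  // Counts a failed attempt; the 10th consecutive failure starts the lockout.
  recordFailure(key: string, now: number): void {
    const state = this.current(key, now) ?? { failures: 0, lockedUntil: 0 };
    state.failures += 1;
    if (state.failures >= MAX_FAILED_ATTEMPTS && state.lockedUntil === 0) {
      state.lockedUntil = now + LOCKOUT_MS;
    }
    this.clients.set(key, state);
  }

  // A successful authentication clears the failure count.
  recordSuccess(key: string): void {
    this.clients.delete(key);
  }
}
```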
### Reconnection

- Exponential backoff: 1s, 2s, 4s, 8s, 16s, capped at 30s
- Probes the relay with `attempt_tcp_probe()`: a TCP-only check with a 5s timeout, which avoids allocating resources on the relay during probes
- Emits `remote-machine-reconnecting` and `remote-machine-reconnect-ready` events
- Active agent sessions keep running on the relay regardless of the controller connection
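The backoff schedule above is `min(1s × 2^attempt, 30s)`. A sketch of a reconnect loop built on it follows; the `probe`, `connect`, and `sleep` callbacks are hypothetical stand-ins (the probe for `attempt_tcp_probe()`), and probing before each full reconnect is an assumption about how the pieces fit together.

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

// Delay before 0-based attempt n: 1s, 2s, 4s, 8s, 16s, then capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Hypothetical reconnect loop. `probe` stands in for attempt_tcp_probe()
// (cheap TCP check); `connect` performs the full WebSocket + auth handshake.
// Resolves with the 0-based attempt number that succeeded.
async function reconnect(
  probe: () => Promise<boolean>,
  connect: () => Promise<boolean>,
  sleep: (ms: number) => Promise<void>,
): Promise<number> {
  for (let attempt = 0; ; attempt++) {
    await sleep(backoffDelayMs(attempt));
    // Skip the expensive handshake while the relay is still unreachable.
    if ((await probe()) && (await connect())) {
      return attempt;
    }
  }
}
```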
### Session Persistence Across Reconnects

Remote agents keep running even when the controller disconnects. On reconnect:

1. The relay sends a state sync listing its active sessions and PTYs.
2. The controller reconciles that list and updates pane states.
3. Missed messages are NOT replayed; agent panes show a "reconnected" notice instead.
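One way the controller could perform that reconciliation, as a sketch. The `StateSync` and `RemotePane` shapes and the `reconcile` function are hypothetical, and marking vanished sessions as `exited` is an assumption: this document only says that pane states are updated.

```typescript
// Hypothetical shapes: the real state-sync payload is not specified in this doc.
interface StateSync {
  activeSessionIds: string[];
  activePtyIds: string[];
}

type PaneStatus = 'running' | 'reconnected' | 'exited';

interface RemotePane {
  id: string;
  kind: 'agent' | 'terminal';
  remoteId: string; // agent session id or PTY id on the relay
  status: PaneStatus;
}

// Returns updated copies of one machine's panes after a state sync.
// Panes whose session/PTY survived are marked 'reconnected' (missed output is
// not replayed, so the UI shows a notice); the rest are marked 'exited'.
function reconcile(panes: readonly RemotePane[], sync: StateSync): RemotePane[] {
  const sessions = new Set(sync.activeSessionIds);
  const ptys = new Set(sync.activePtyIds);
  return panes.map((pane): RemotePane => {
    const alive =
      pane.kind === 'agent' ? sessions.has(pane.remoteId) : ptys.has(pane.remoteId);
    return { ...pane, status: alive ? 'reconnected' : 'exited' };
  });
}
```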
## Implementation Summary

### Phase A: Extract `agor-core` crate

Converted the project to a Cargo workspace and extracted `PtyManager`, `SidecarManager`, and the `EventSink` trait into the shared crate.
### Phase B: Build `agor-relay` binary

WebSocket server with token auth, isolated managers per connection, and structured command responses correlated by `commandId`.
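A minimal sketch of `commandId` correlation on the controller side. The `CommandResponse` shape, the `cmd-N` id format, and the `PendingCommands` class are hypothetical; rejecting all in-flight commands when the connection drops is likewise an assumption.

```typescript
// Hypothetical response shape: the relay echoes the command's `id` as `commandId`.
interface CommandResponse {
  commandId: string;
  ok: boolean;
  error?: string;
}

interface Waiter {
  resolve: (res: CommandResponse) => void;
  reject: (err: Error) => void;
}

// Correlates relay responses with in-flight commands by id.
class PendingCommands {
  private nextId = 0;
  private waiters = new Map<string, Waiter>();

  // Allocates a fresh id and a promise that settles when its response arrives.
  create(): { id: string; response: Promise<CommandResponse> } {
    const id = `cmd-${this.nextId++}`;
    const response = new Promise<CommandResponse>((resolve, reject) => {
      this.waiters.set(id, { resolve, reject });
    });
    return { id, response };
  }

  // Delivers a response; returns false if no command with that id is pending.
  settle(res: CommandResponse): boolean {
    const waiter = this.waiters.get(res.commandId);
    if (waiter === undefined) return false;
    this.waiters.delete(res.commandId);
    waiter.resolve(res);
    return true;
  }

  // Rejects every in-flight command, e.g. when the WebSocket drops.
  failAll(reason: string): void {
    this.waiters.forEach((waiter) => waiter.reject(new Error(reason)));
    this.waiters.clear();
  }

  get size(): number {
    return this.waiters.size;
  }
}
```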
### Phase C: Add `RemoteManager` to controller

12 Tauri commands, a heartbeat ping every 15s, and exponential-backoff reconnection.
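A sketch of the heartbeat decision logic with an injected clock so it can be tested without real timers. The 15s interval comes from this design; the 30s pong timeout (two missed intervals) and the `Heartbeat` class are assumptions for illustration.

```typescript
const PING_INTERVAL_MS = 15_000; // from the design
const PONG_TIMEOUT_MS = 30_000; // assumption: two missed intervals

type HeartbeatAction = 'ping' | 'idle' | 'dead';

// Decides, on each timer tick, whether to send a ping or declare the link dead.
class Heartbeat {
  private lastPingAt = -Infinity;
  private lastPongAt: number;

  constructor(now: number) {
    this.lastPongAt = now; // treat connection establishment as a fresh pong
  }

  tick(now: number): HeartbeatAction {
    if (now - this.lastPongAt > PONG_TIMEOUT_MS) return 'dead';
    if (now - this.lastPingAt >= PING_INTERVAL_MS) {
      this.lastPingAt = now;
      return 'ping';
    }
    return 'idle';
  }

  onPong(now: number): void {
    this.lastPongAt = now;
  }
}
```

A `'dead'` result would hand the machine over to the exponential-backoff reconnection path.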
### Phase D: Frontend integration

`remote-bridge.ts` adapter, `machines.svelte.ts` store, and the `Pane.remoteMachineId` routing field.
### Remaining Work

- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning
## Security

| Threat | Mitigation |
|--------|-----------|
| Token interception | TLS required |
| Token brute-force | Rate limit + lockout |
| Relay impersonation | Certificate pinning (planned; future: mTLS) |
| Command injection | Payload schema validation |
| Lateral movement | Relay runs as an unprivileged user; no shell beyond PTY/sidecar |
| Data exfiltration | Agent output streams to the controller only |
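A sketch of the kind of payload schema check listed above, written in TypeScript to match the wire-format block even though the relay itself is Rust. The command types come from `RelayCommand`; the `data`, `cols`, and `rows` payload fields and the `validateCommand` helper are hypothetical.

```typescript
type CommandType =
  | 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
  | 'agent_query' | 'agent_stop' | 'sidecar_restart' | 'ping';

interface RelayCommand {
  id: string;
  type: CommandType;
  payload: Record<string, unknown>;
}

const COMMAND_TYPES = new Set<string>([
  'pty_create', 'pty_write', 'pty_resize', 'pty_close',
  'agent_query', 'agent_stop', 'sidecar_restart', 'ping',
]);

function isPositiveInt(v: unknown): boolean {
  return typeof v === 'number' && Number.isInteger(v) && v > 0;
}

// Returns the command if it passes validation, otherwise an error string.
function validateCommand(raw: unknown): RelayCommand | string {
  if (typeof raw !== 'object' || raw === null || Array.isArray(raw)) return 'not an object';
  const obj = raw as Record<string, unknown>;
  const id = obj.id;
  const type = obj.type;
  const payload = obj.payload;
  if (typeof id !== 'string' || id.length === 0) return 'missing id';
  if (typeof type !== 'string' || !COMMAND_TYPES.has(type)) return 'unknown type';
  if (typeof payload !== 'object' || payload === null || Array.isArray(payload)) {
    return 'payload must be an object';
  }
  const p = payload as Record<string, unknown>;
  // Hypothetical per-type field checks; the real field names may differ.
  if (type === 'pty_write' && typeof p.data !== 'string') return 'pty_write needs string data';
  if (type === 'pty_resize' && !(isPositiveInt(p.cols) && isPositiveInt(p.rows))) {
    return 'pty_resize needs positive cols/rows';
  }
  return { id, type: type as CommandType, payload: p };
}
```

Running this check before a command reaches `PtyManager` or `SidecarManager` keeps malformed input away from process-spawning code.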
## Performance

| Concern | Mitigation |
|---------|-----------|
| WebSocket latency | LAN: <1 ms, WAN: 20-100 ms (acceptable for text) |
| Bandwidth | Agent NDJSON: ~50 KB/s peak, Terminal: ~200 KB/s peak |
| Connection count | Max 10 machines (UI constraint) |
| Message ordering | Single WebSocket per machine = ordered delivery |
## Future (Not Covered)

- Multi-controller (multiple agor instances observing the same relay)
- Relay discovery (mDNS/Bonjour)
- Agent migration between machines
- Relay-to-relay communication
- mTLS for enterprise environments