# Multi-Machine Support — Architecture Design

## Overview

Extend BTerminal to manage Claude agent sessions and terminal panes running on remote machines over WebSocket, while keeping the local sidecar path unchanged.
## Problem

The current architecture is local-only:

```
WebView ←→ Rust (Tauri IPC) ←→ Local Sidecar (stdio NDJSON)
                            ←→ Local PTY (portable-pty)
```

Target state: BTerminal acts as mission control, observing agents and terminals running on multiple machines (dev servers, cloud VMs, CI runners).
## Design Constraints

- Zero changes to local path — the local sidecar/PTY must work identically
- Same NDJSON protocol — remote and local agents speak the same message format
- No new runtime dependencies — use Rust's `tokio-tungstenite` (already available via Tauri)
- Graceful degradation — when a remote machine goes offline, its panes show a disconnected state and reconnect automatically
- Security — all remote connections are authenticated and encrypted (TLS + token)
## Architecture

### Three-Layer Model
```
┌──────────────────────────────────────────────────────────────────┐
│                      BTerminal (Controller)                      │
│                                                                  │
│  ┌──────────┐    Tauri IPC    ┌──────────────────────────────┐   │
│  │ WebView  │ ←─────────────→ │ Rust Backend                 │   │
│  │ (Svelte) │                 │                              │   │
│  └──────────┘                 │  ├── PtyManager (local)      │   │
│                               │  ├── SidecarManager (local)  │   │
│                               │  └── RemoteManager ──────────┼───┤
│                               └──────────────────────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
          │                                    │
          │ (local stdio)                      │ (WebSocket wss://)
          ▼                                    ▼
   ┌───────────┐                  ┌──────────────────────┐
   │  Local    │                  │   Remote Machine     │
   │  Sidecar  │                  │                      │
   │  (Deno/   │                  │  ┌────────────────┐  │
   │  Node.js) │                  │  │ bterminal-relay│  │
   │           │                  │  │ (Rust binary)  │  │
   └───────────┘                  │  │                │  │
                                  │  │ ├── PTY mgr    │  │
                                  │  │ ├── Sidecar mgr│  │
                                  │  │ └── WS server  │  │
                                  │  └────────────────┘  │
                                  └──────────────────────┘
```
### Components

#### 1. bterminal-relay — Remote Agent (Rust binary)

A standalone Rust binary that runs on each remote machine. It:

- Listens on a WebSocket port (default: 9750)
- Manages local PTYs and `claude` sidecar processes
- Forwards NDJSON events to the controller over WebSocket
- Receives commands (query, stop, resize, write) from the controller

Why a Rust binary? It reuses the existing `PtyManager` and `SidecarManager` code from `src-tauri/src/`, extracted into a shared crate.
```
bterminal-relay/
├── Cargo.toml            # depends on bterminal-core
└── src/
    └── main.rs           # WebSocket server + auth

bterminal-core/           # shared crate (extracted from src-tauri)
├── Cargo.toml
└── src/
    ├── pty.rs            # PtyManager (from v2/src-tauri/src/pty.rs)
    ├── sidecar.rs        # SidecarManager (from v2/src-tauri/src/sidecar.rs)
    └── lib.rs
```
#### 2. RemoteManager — Controller-Side (in Rust backend)

New module in `v2/src-tauri/src/remote.rs`. Manages WebSocket connections to multiple relays.

```rust
pub struct RemoteMachine {
    pub id: String,
    pub label: String,
    pub url: String,          // wss://host:9750
    pub token: String,        // auth token
    pub status: RemoteStatus, // connected | connecting | disconnected | error
}

pub enum RemoteStatus {
    Connected,
    Connecting,
    Disconnected,
    Error(String),
}

pub struct RemoteManager {
    machines: Arc<Mutex<Vec<RemoteMachine>>>,
    connections: Arc<Mutex<HashMap<String, WsConnection>>>,
}
```
#### 3. Frontend Adapters — Unified Interface

The frontend doesn't care whether a pane is local or remote. The bridge layer abstracts this:

```typescript
// adapters/agent-bridge.ts — extended
export async function queryAgent(options: AgentQueryOptions): Promise<void> {
  if (options.remote_machine_id) {
    return invoke('remote_agent_query', { machineId: options.remote_machine_id, options });
  }
  return invoke('agent_query', { options });
}
```

Same pattern for `pty-bridge.ts` — add an optional `remote_machine_id` to all operations.
## Protocol

### WebSocket Wire Format

Same NDJSON as the local sidecar, wrapped in an envelope for multiplexing:

```typescript
// Controller → Relay (commands)
interface RelayCommand {
  id: string; // request correlation ID
  type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
      | 'agent_query' | 'agent_stop' | 'sidecar_restart'
      | 'ping';
  payload: Record<string, unknown>;
}

// Relay → Controller (events)
interface RelayEvent {
  type: 'pty_data' | 'pty_exit'
      | 'sidecar_message' | 'sidecar_exited'
      | 'error' | 'pong' | 'ready';
  sessionId?: string;
  payload: unknown;
}
```
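For illustration, framing these envelopes on the wire follows the same one-JSON-object-per-line rule as the local sidecar's NDJSON. The `encodeCommand`/`decodeeEvents` names below are hypothetical helpers, not part of the existing codebase:

```typescript
// Sketch: NDJSON framing for relay envelopes (illustrative helpers).
interface RelayCommand {
  id: string;
  type: string;
  payload: Record<string, unknown>;
}

interface RelayEvent {
  type: string;
  sessionId?: string;
  payload: unknown;
}

// One JSON object per line, newline-terminated.
function encodeCommand(cmd: RelayCommand): string {
  return JSON.stringify(cmd) + "\n";
}

// A single WebSocket text frame may carry several newline-delimited events.
function decodeEvents(frame: string): RelayEvent[] {
  return frame
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as RelayEvent);
}
```

Because each envelope is a complete JSON object on its own line, the decoder needs no stateful parser: splitting on newlines and dropping blanks is sufficient.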
### Authentication

- Pre-shared token — the relay starts with `--token <secret>`; the controller sends the token in the WebSocket upgrade headers (`Authorization: Bearer <token>`)
- TLS required — the relay rejects non-TLS connections in production mode; dev mode allows `ws://` with the `--insecure` flag
- Token rotation — future: the relay exposes an endpoint to rotate the token; the controller stores tokens in the SQLite settings table
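As one illustration of the token check, comparing in constant time avoids leaking prefix matches through response latency. `checkAuth` and `constantTimeEqual` are hypothetical helpers sketched in TypeScript; the real relay would implement this in Rust:

```typescript
// Sketch: validating the Authorization header on WebSocket upgrade.
// Helper names are illustrative assumptions, not existing code.

// Compare in constant time so timing doesn't reveal how many
// leading characters of a guessed token were correct.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

function checkAuth(header: string | undefined, expectedToken: string): boolean {
  if (!header || !header.startsWith("Bearer ")) return false;
  return constantTimeEqual(header.slice("Bearer ".length), expectedToken);
}
```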
### Connection Lifecycle

```
Controller                            Relay
    │                                   │
    │── WSS connect ───────────────────→│
    │── Authorization: Bearer token ───→│
    │                                   │
    │←── { type: "ready", ... } ────────│
    │                                   │
    │── { type: "ping" } ──────────────→│
    │←── { type: "pong" } ──────────────│  (every 15s)
    │                                   │
    │── { type: "agent_query", ... } ──→│
    │←── { type: "sidecar_message" } ───│  (streaming)
    │←── { type: "sidecar_message" } ───│
    │                                   │
    │           (disconnect)            │
    │── reconnect (exp backoff) ───────→│  (1s, 2s, 4s, 8s, max 30s)
```
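The backoff schedule above (1s doubling up to a 30s cap) can be expressed as a pure function; `backoffMs` is an illustrative name, not existing code:

```typescript
// Sketch: exponential reconnect backoff, 1s → 2s → 4s → 8s → … capped at 30s.
function backoffMs(attempt: number): number {
  const base = 1000 * 2 ** attempt; // attempt 0 → 1s, 1 → 2s, 2 → 4s, …
  return Math.min(base, 30_000);    // never wait longer than 30s
}
```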
### Reconnection
- Controller reconnects with exponential backoff (1s → 30s cap)
- On reconnect, relay sends current state snapshot (active sessions, PTY list)
- Controller reconciles: updates pane states, re-subscribes to streams
- Active agent sessions continue on relay regardless of controller connection
### Session Persistence Across Reconnects

Key insight: remote agents keep running even when the controller disconnects. The relay is autonomous — it doesn't need the controller to operate.

On reconnect:

- The relay sends `{ type: "state_sync", activeSessions: [...], activePtys: [...] }`
- The controller matches against known panes and updates their status
- Missed messages are NOT replayed (too complex, marginal value); agent panes show a "reconnected — some messages may be missing" notice
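The reconciliation step might look like the following sketch; the `StateSync`/`PaneState` shapes and the `reconcile` helper are assumptions for illustration:

```typescript
// Sketch: matching a relay's state_sync snapshot against known panes.
interface StateSync {
  type: "state_sync";
  activeSessions: string[];
  activePtys: string[];
}

interface PaneState {
  id: string;
  sessionId: string;
  status: "running" | "stale" | "reconnected";
}

// Panes whose session survived the disconnect are marked "reconnected"
// (and get the possible-missed-messages notice); the rest become stale.
function reconcile(panes: PaneState[], sync: StateSync): PaneState[] {
  const live = new Set(sync.activeSessions);
  return panes.map((p) => ({
    ...p,
    status: live.has(p.sessionId) ? "reconnected" : "stale",
  }));
}
```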
## Frontend Integration

### Pane Model Changes

```typescript
// stores/layout.svelte.ts
export interface Pane {
  id: string;
  type: 'terminal' | 'agent';
  title: string;
  group?: string;
  remoteMachineId?: string; // NEW: undefined = local
}
```
### Sidebar — Machine Groups

Remote panes auto-group by machine label in the sidebar:

```
▾ Local
  ├── Terminal 1
  └── Agent: fix bug
▾ devbox (192.168.1.50)        ← remote machine
  ├── SSH session
  └── Agent: deploy
▾ ci-runner (10.0.0.5)         ← remote machine (disconnected)
  └── Agent: test suite ⚠️
```
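The auto-grouping rule can be sketched as a pure function over the pane list; `groupPanes` is a hypothetical helper (the real logic would live in the layout store):

```typescript
// Sketch: bucket panes by machine; panes without a remoteMachineId are "local".
interface Pane {
  id: string;
  title: string;
  remoteMachineId?: string; // undefined = local
}

function groupPanes(panes: Pane[]): Map<string, Pane[]> {
  const groups = new Map<string, Pane[]>();
  for (const pane of panes) {
    const key = pane.remoteMachineId ?? "local";
    const bucket = groups.get(key) ?? [];
    bucket.push(pane);
    groups.set(key, bucket);
  }
  return groups;
}
```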
### Settings Panel

New "Machines" section in settings:

| Field | Type | Notes |
|---|---|---|
| Label | string | Human-readable name |
| URL | string | `wss://host:9750` |
| Token | password | Pre-shared auth token |
| Auto-connect | boolean | Connect on app launch |

Stored in the SQLite settings table as JSON under the `remote_machines` key.
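For illustration, the stored value might have the shape below; `RemoteMachineConfig` and `parseMachines` are assumptions about the schema, not the actual implementation:

```typescript
// Sketch: assumed JSON shape of the `remote_machines` settings value.
interface RemoteMachineConfig {
  label: string;       // human-readable name
  url: string;         // wss://host:9750
  token: string;       // pre-shared auth token
  autoConnect: boolean; // connect on app launch
}

// Tolerant parse: anything that isn't a JSON array yields an empty list.
function parseMachines(json: string): RemoteMachineConfig[] {
  const parsed = JSON.parse(json);
  return Array.isArray(parsed) ? (parsed as RemoteMachineConfig[]) : [];
}
```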
## Implementation Plan

### Phase A: Extract bterminal-core crate

- Extract `PtyManager` and `SidecarManager` into a shared crate
- `src-tauri` depends on `bterminal-core` instead of owning the code
- Zero behavior change — purely structural refactor
- Estimate: ~2h of mechanical refactoring
### Phase B: Build bterminal-relay binary

- WebSocket server using `tokio-tungstenite`
- Token auth on upgrade
- Routes commands to `bterminal-core` managers
- Forwards events back over WebSocket
- Includes `--port`, `--token`, `--insecure` CLI flags
- Ships as a single static Rust binary (~5MB), installable via `cargo install bterminal-relay`
### Phase C: Add RemoteManager to controller

- New `remote.rs` module in `src-tauri`
- Manages WebSocket client connections
- Tauri commands: `remote_add`, `remote_remove`, `remote_connect`, `remote_disconnect`
- Forwards remote events as Tauri events (the same `sidecar-message`/`pty-data` events, tagged with machine ID)
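The event-name mapping and machine-ID tagging in Phase C might look like this sketch (helper names are illustrative, not existing code):

```typescript
// Sketch: converting a relay event into the existing Tauri event, tagged
// with the machine it came from.
interface RelayEvent { type: string; sessionId?: string; payload: unknown; }
interface TaggedEvent extends RelayEvent { machineId: string; }

// Relay event types use snake_case; the existing Tauri events use kebab-case
// (pty_data → "pty-data", sidecar_message → "sidecar-message").
function tauriEventName(type: string): string {
  return type.replace(/_/g, "-");
}

function tagEvent(machineId: string, event: RelayEvent): TaggedEvent {
  return { ...event, machineId };
}
```

Tagging (rather than inventing new event names) lets existing frontend listeners keep working: local events simply arrive without a `machineId`.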
### Phase D: Frontend integration

- Extend bridge adapters with `remoteMachineId` routing
- Add machine management UI in settings
- Add machine status indicators in sidebar
- Add reconnection banner in pane chrome
- Test with 2 machines (local + 1 remote)
## Security Considerations

| Threat | Mitigation |
|---|---|
| Token interception | TLS required (reject `ws://` without `--insecure`) |
| Token brute-force | Rate limit auth attempts (5/min), lockout after 10 failures |
| Relay impersonation | Pin relay certificate fingerprint (future: mTLS) |
| Command injection | Relay validates all command payloads against schema |
| Lateral movement | Relay runs as unprivileged user, no shell access beyond PTY/sidecar |
| Data exfiltration | Agent output streams to controller only, no relay-to-relay traffic |
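The brute-force mitigation from the table (5 attempts/min, lockout after 10 failures) could be sketched as a small in-memory limiter; `AuthLimiter` is illustrative, not part of the design:

```typescript
// Sketch: rate-limit auth attempts to 5/min, with a hard lockout after
// 10 recorded failures. Time is passed in explicitly for testability.
class AuthLimiter {
  private failures = 0;
  private windowStart = 0;
  private attemptsInWindow = 0;

  // Returns true if another auth attempt is allowed at time `nowMs`.
  allow(nowMs: number): boolean {
    if (this.failures >= 10) return false; // hard lockout
    if (nowMs - this.windowStart >= 60_000) {
      this.windowStart = nowMs;            // start a fresh 1-minute window
      this.attemptsInWindow = 0;
    }
    if (this.attemptsInWindow >= 5) return false; // 5 attempts per minute
    this.attemptsInWindow++;
    return true;
  }

  recordFailure(): void {
    this.failures++;
  }
}
```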
## Performance Considerations
| Concern | Mitigation |
|---|---|
| WebSocket latency | Typical LAN: <1ms. WAN: 20-100ms. Acceptable for agent output (text, not video) |
| Bandwidth | Agent NDJSON: ~50KB/s peak. Terminal: ~200KB/s peak. Trivial even on slow links |
| Connection count | Max 10 machines initially (UI constraint, not technical) |
| Message ordering | Single WebSocket per machine = ordered delivery guaranteed |
## What This Does NOT Cover (Future)
- Multi-controller — multiple BTerminal instances observing the same relay (needs pub/sub)
- Relay discovery — automatic detection of relays on LAN (mDNS/Bonjour)
- Agent migration — moving a running agent from one machine to another
- Relay-to-relay — direct communication between remote machines
- mTLS — mutual TLS for enterprise environments (Phase B+ enhancement)