BTerminal/docs/multi-machine.md
Hibryda 04a7a4bb94 docs: add multi-machine support architecture design
Full WebSocket architecture spec for remote agent/terminal management:
bterminal-relay binary, RemoteManager, NDJSON protocol, pre-shared
token + TLS auth, 4-phase implementation plan (A-D).
2026-03-06 18:45:56 +01:00

Multi-Machine Support — Architecture Design

Overview

Extend BTerminal to manage Claude agent sessions and terminal panes running on remote machines over WebSocket, while keeping the local sidecar path unchanged.

Problem

Current architecture is local-only:

WebView ←→ Rust (Tauri IPC) ←→ Local Sidecar (stdio NDJSON)
                              ←→ Local PTY (portable-pty)

Target state: BTerminal acts as a mission control that observes agents and terminals running on multiple machines (dev servers, cloud VMs, CI runners).

Design Constraints

  1. Zero changes to local path — local sidecar/PTY must work identically
  2. Same NDJSON protocol — remote and local agents speak the same message format
  3. No new runtime dependencies — use the tokio-tungstenite crate (already available via Tauri)
  4. Graceful degradation — when a remote machine goes offline, its panes show a disconnected state and the controller reconnects automatically
  5. Security — all remote connections authenticated and encrypted (TLS + token)

Architecture

Three-Layer Model

┌──────────────────────────────────────────────────────────────────┐
│  BTerminal (Controller)                                          │
│                                                                  │
│  ┌──────────┐    Tauri IPC    ┌──────────────────────────────┐  │
│  │ WebView  │ ←────────────→  │ Rust Backend                 │  │
│  │ (Svelte) │                 │                              │  │
│  └──────────┘                 │  ├── PtyManager (local)      │  │
│                               │  ├── SidecarManager (local)  │  │
│                               │  └── RemoteManager ──────────┼──┤
│                               └──────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
        │                                      │
        │ (local stdio)                        │ (WebSocket wss://)
        ▼                                      ▼
  ┌───────────┐                    ┌──────────────────────┐
  │ Local     │                    │ Remote Machine       │
  │ Sidecar   │                    │                      │
  │ (Deno/    │                    │  ┌────────────────┐  │
  │  Node.js) │                    │  │ bterminal-relay│  │
  │           │                    │  │ (Rust binary)  │  │
  └───────────┘                    │  │                │  │
                                   │  │ ├── PTY mgr    │  │
                                   │  │ ├── Sidecar mgr│  │
                                   │  │ └── WS server  │  │
                                   │  └────────────────┘  │
                                   └──────────────────────┘

Components

1. bterminal-relay — Remote Agent (Rust binary)

A standalone Rust binary that runs on each remote machine. It:

  • Listens on a WebSocket port (default: 9750)
  • Manages local PTYs and claude sidecar processes
  • Forwards NDJSON events to the controller over WebSocket
  • Receives commands (query, stop, resize, write) from the controller

Why a Rust binary? It reuses the existing PtyManager and SidecarManager code from src-tauri/src/, which is extracted into a shared crate (bterminal-core).

bterminal-relay/
├── Cargo.toml        # depends on bterminal-core
├── src/
│   └── main.rs       # WebSocket server + auth
│
bterminal-core/       # shared crate (extracted from src-tauri)
├── Cargo.toml
├── src/
│   ├── pty.rs        # PtyManager (from v2/src-tauri/src/pty.rs)
│   ├── sidecar.rs    # SidecarManager (from v2/src-tauri/src/sidecar.rs)
│   └── lib.rs

2. RemoteManager — Controller-Side (in Rust backend)

A new module, v2/src-tauri/src/remote.rs, manages WebSocket connections to multiple relays.

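use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// One configured relay endpoint and its current connection state.
pub struct RemoteMachine {
    pub id: String,
    pub label: String,
    pub url: String,          // wss://host:9750
    pub token: String,        // auth token
    pub status: RemoteStatus, // connected | connecting | disconnected | error
}

/// Connection state of a relay as seen by the controller.
pub enum RemoteStatus {
    Connected,
    Connecting,
    Disconnected,
    Error(String),
}

/// Owns the machine list and the live WebSocket connections
/// (presumably keyed by machine ID). `WsConnection`, the per-relay
/// WebSocket handle, is not defined in this document.
pub struct RemoteManager {
    machines: Arc<Mutex<Vec<RemoteMachine>>>,
    connections: Arc<Mutex<HashMap<String, WsConnection>>>,
}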

3. Frontend Adapters — Unified Interface

The frontend doesn't care whether a pane is local or remote. The bridge layer abstracts this:

// adapters/agent-bridge.ts — extended
export async function queryAgent(options: AgentQueryOptions): Promise<void> {
  if (options.remote_machine_id) {
    return invoke('remote_agent_query', { machineId: options.remote_machine_id, options });
  }
  return invoke('agent_query', { options });
}

Same pattern for pty-bridge.ts — add optional remote_machine_id to all operations.
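
The routing rule shared by both bridges can be written as one small pure function. This is only a sketch: `routeCommand` is a hypothetical helper, and the assumption that every remote command is the local name with a `remote_` prefix is extrapolated from `remote_agent_query`, not stated in this design.

```typescript
// Hypothetical helper: pick the Tauri command for a bridge call.
// Assumption: remote commands are named `remote_<local command>`,
// mirroring `agent_query` / `remote_agent_query` above.
interface RoutedCall {
  command: string;
  args: Record<string, unknown>;
}

function routeCommand(
  localCommand: string,
  args: Record<string, unknown>,
  remoteMachineId?: string,
): RoutedCall {
  if (remoteMachineId) {
    // Remote path: prefixed command, machine ID passed alongside the args.
    return {
      command: `remote_${localCommand}`,
      args: { machineId: remoteMachineId, ...args },
    };
  }
  // Local path: left untouched, in line with design constraint 1.
  return { command: localCommand, args };
}

// A bridge would hand the result to Tauri's invoke(command, args).
const localCall = routeCommand('agent_query', { options: {} });
const remoteCall = routeCommand('agent_query', { options: {} }, 'devbox');
console.log(localCall.command, remoteCall.command);
```

Keeping the decision in one function means the local branch stays byte-for-byte what it is today, and the remote branch can be unit-tested without a running Tauri backend.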

Protocol

WebSocket Wire Format

Same NDJSON as local sidecar, wrapped in an envelope for multiplexing:

// Controller → Relay (commands)
interface RelayCommand {
  id: string;                      // request correlation ID
  type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
      | 'agent_query' | 'agent_stop' | 'sidecar_restart'
      | 'ping';
  payload: Record<string, unknown>;
}

// Relay → Controller (events)
interface RelayEvent {
  type: 'pty_data' | 'pty_exit'
      | 'sidecar_message' | 'sidecar_exited'
      | 'state_sync'               // snapshot sent on reconnect (see Session Persistence Across Reconnects)
      | 'error' | 'pong' | 'ready';
  sessionId?: string;
  payload: unknown;
}
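
To make the framing concrete, here is a minimal sketch of encoding a command and decoding events. It assumes one JSON envelope per line, possibly several lines per WebSocket text message; the design does not pin that detail down. `encodeCommand` and `decodeEvents` are illustrative names, and the event type is loosened to `string` to keep the sketch short.

```typescript
// Trimmed copies of the envelope types so the sketch stands alone.
interface RelayCommand {
  id: string;
  type: string;
  payload: Record<string, unknown>;
}

interface RelayEvent {
  type: string;
  sessionId?: string;
  payload: unknown;
}

// Serialize one command as a single NDJSON line. JSON.stringify escapes
// newlines inside strings, so the output never spans more than one line.
function encodeCommand(cmd: RelayCommand): string {
  return JSON.stringify(cmd) + '\n';
}

// Split a text chunk into events. Blank lines are ignored; lines that are
// not JSON objects with a string `type` are skipped instead of throwing.
function decodeEvents(chunk: string): RelayEvent[] {
  const events: RelayEvent[] = [];
  for (const line of chunk.split('\n')) {
    if (line.trim() === '') continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (
        typeof parsed === 'object' && parsed !== null &&
        typeof (parsed as { type?: unknown }).type === 'string'
      ) {
        events.push(parsed as RelayEvent);
      }
    } catch {
      // Malformed line: dropped in this sketch; a real client would log it.
    }
  }
  return events;
}

const pingLine = encodeCommand({ id: 'req-1', type: 'ping', payload: {} });
const decoded = decodeEvents('{"type":"pong","payload":null}\nnot json\n');
console.log(pingLine.trim(), decoded.length);
```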

Authentication

  1. Pre-shared token — relay starts with --token <secret>. Controller sends token in WebSocket upgrade headers (Authorization: Bearer <token>).
  2. TLS required — relay rejects non-TLS connections in production mode. Dev mode allows ws:// with --insecure flag.
  3. Token rotation — future: relay exposes endpoint to rotate token. Controller stores tokens in SQLite settings table.
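
The token check on upgrade can be sketched as follows. The relay itself is planned as a Rust binary; TypeScript is used here only to illustrate the logic. `parseBearer`, `constantTimeEqual`, and `isAuthorized` are hypothetical names, and the constant-time comparison is best-effort (a JavaScript runtime gives no hard timing guarantees).

```typescript
// Extract the token from an "Authorization: Bearer <token>" header value.
// The scheme name is matched case-insensitively.
function parseBearer(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? match[1] ?? null : null;
}

// Compare two strings without returning on the first mismatch, so the time
// taken does not depend on how many leading characters were correct.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` maps that to 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}

// True only when the header carries a bearer token equal to the expected one.
function isAuthorized(header: string | undefined, expected: string): boolean {
  const token = parseBearer(header);
  return token !== null && constantTimeEqual(token, expected);
}

console.log(isAuthorized('Bearer s3cret', 's3cret'));
```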

Connection Lifecycle

Controller                          Relay
    │                                 │
    │── WSS connect ─────────────────→│
    │── Authorization: Bearer token ─→│
    │                                 │
    │←── { type: "ready", ...} ───────│
    │                                 │
    │── { type: "ping" } ────────────→│
    │←── { type: "pong" } ────────────│  (every 15s)
    │                                 │
    │── { type: "agent_query", ... }─→│
    │←── { type: "sidecar_message" }──│  (streaming)
    │←── { type: "sidecar_message" }──│
    │                                 │
    │     (disconnect)                │
    │── reconnect (exp backoff) ─────→│  (1s, 2s, 4s, 8s, max 30s)

Reconnection

  • Controller reconnects with exponential backoff (1s → 30s cap)
  • On reconnect, relay sends current state snapshot (active sessions, PTY list)
  • Controller reconciles: updates pane states, re-subscribes to streams
  • Active agent sessions continue on relay regardless of controller connection
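
The backoff schedule above (1s, 2s, 4s, 8s, capped at 30s) reduces to a one-line formula. In this sketch, `backoffDelayMs` is an illustrative name and the attempt counter is assumed to be 0-based and reset after a successful connection.

```typescript
// Delay before reconnect attempt `attempt` (0-based): the base delay doubles
// on every attempt and is clamped to the cap.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

const schedule = [0, 1, 2, 3, 4, 5, 6].map((n) => backoffDelayMs(n));
console.log(schedule); // [1000, 2000, 4000, 8000, 16000, 30000, 30000]
```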

Session Persistence Across Reconnects

Key insight: remote agents keep running even when the controller disconnects. The relay is autonomous — it doesn't need the controller to operate.

On reconnect:

  1. Relay sends { type: "state_sync", activeSessions: [...], activePtys: [...] }
  2. Controller matches against known panes, updates status
  3. Missed messages are NOT replayed (too complex, marginal value). Agent panes show a "reconnected — some messages may be missing" notice.
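
Step 2 can be sketched as a pure function. The contents of `activeSessions` and `activePtys` are not specified above, so this sketch assumes they are arrays of ID strings and that a pane's `id` equals the relay-side session or PTY ID. `reconcile`, `RemotePane`, and the status names are hypothetical.

```typescript
// Assumed shape of the state_sync message: plain arrays of IDs.
interface StateSync {
  type: 'state_sync';
  activeSessions: string[];
  activePtys: string[];
}

interface RemotePane {
  id: string; // assumed to match the relay-side session or PTY ID
  type: 'terminal' | 'agent';
}

type PaneStatus = 'reconnected' | 'ended';

// Mark each known pane as still alive on the relay or as ended while the
// controller was disconnected.
function reconcile(panes: RemotePane[], sync: StateSync): Map<string, PaneStatus> {
  const sessions = new Set(sync.activeSessions);
  const ptys = new Set(sync.activePtys);
  const result = new Map<string, PaneStatus>();
  for (const pane of panes) {
    const alive = pane.type === 'agent' ? sessions.has(pane.id) : ptys.has(pane.id);
    result.set(pane.id, alive ? 'reconnected' : 'ended');
  }
  return result;
}

const statuses = reconcile(
  [
    { id: 'a1', type: 'agent' },
    { id: 't1', type: 'terminal' },
  ],
  { type: 'state_sync', activeSessions: ['a1'], activePtys: [] },
);
console.log(statuses.get('a1'), statuses.get('t1'));
```

Panes marked `reconnected` are the ones that would show the "some messages may be missing" notice.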

Frontend Integration

Pane Model Changes

// stores/layout.svelte.ts
export interface Pane {
  id: string;
  type: 'terminal' | 'agent';
  title: string;
  group?: string;
  remoteMachineId?: string;  // NEW: undefined = local
}

Sidebar — Machine Groups

Remote panes auto-group by machine label in the sidebar:

▾ Local
  ├── Terminal 1
  └── Agent: fix bug

▾ devbox (192.168.1.50)      ← remote machine
  ├── SSH session
  └── Agent: deploy

▾ ci-runner (10.0.0.5)       ← remote machine (disconnected)
  └── Agent: test suite ⚠️
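
The grouping itself is a small fold over the pane list. A minimal sketch, assuming the sidebar receives a lookup from machine ID to label; `groupByMachine` and the `labels` parameter are illustrative names, not part of the existing stores.

```typescript
// Same shape as the Pane interface in stores/layout.svelte.ts.
interface Pane {
  id: string;
  type: 'terminal' | 'agent';
  title: string;
  group?: string;
  remoteMachineId?: string; // undefined = local
}

// Bucket panes under "Local" or their machine's label. A Map keeps
// insertion order, so groups appear in the order they are first seen.
function groupByMachine(
  panes: Pane[],
  labels: Record<string, string>,
): Map<string, Pane[]> {
  const groups = new Map<string, Pane[]>();
  for (const pane of panes) {
    const key =
      pane.remoteMachineId === undefined
        ? 'Local'
        : (labels[pane.remoteMachineId] ?? pane.remoteMachineId);
    const bucket = groups.get(key) ?? [];
    bucket.push(pane);
    groups.set(key, bucket);
  }
  return groups;
}

const sidebarGroups = groupByMachine(
  [
    { id: '1', type: 'terminal', title: 'Terminal 1' },
    { id: '2', type: 'agent', title: 'Agent: deploy', remoteMachineId: 'm1' },
    { id: '3', type: 'agent', title: 'Agent: fix bug' },
  ],
  { m1: 'devbox' },
);
console.log([...sidebarGroups.keys()]);
```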

Settings Panel

New "Machines" section in settings:

| Field        | Type     | Notes                 |
|--------------|----------|-----------------------|
| Label        | string   | Human-readable name   |
| URL          | string   | wss://host:9750       |
| Token        | password | Pre-shared auth token |
| Auto-connect | boolean  | Connect on app launch |

Stored as JSON in the SQLite settings table under the remote_machines key.
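
A sketch of loading and validating that value follows. The JSON field names (`label`, `url`, `token`, `autoConnect`) are assumptions derived from the settings fields above, and `parseMachines` is a hypothetical helper. The `allowInsecure` switch mirrors the relay's --insecure dev mode.

```typescript
// Assumed JSON shape of one entry under the remote_machines key.
interface MachineConfig {
  label: string;
  url: string;
  token: string;
  autoConnect: boolean;
}

// Parse the stored JSON and reject entries that are malformed or that use
// a non-TLS URL (unless insecure mode is explicitly allowed).
function parseMachines(json: string, allowInsecure = false): MachineConfig[] {
  const raw: unknown = JSON.parse(json);
  if (!Array.isArray(raw)) {
    throw new Error('remote_machines must be a JSON array');
  }
  return raw.map((entry: unknown, i: number) => {
    if (typeof entry !== 'object' || entry === null) {
      throw new Error(`machine ${i}: expected an object`);
    }
    const { label, url, token, autoConnect } = entry as Record<string, unknown>;
    if (typeof label !== 'string' || typeof url !== 'string' || typeof token !== 'string') {
      throw new Error(`machine ${i}: label, url and token must be strings`);
    }
    const secure = url.startsWith('wss://');
    const insecureOk = allowInsecure && url.startsWith('ws://');
    if (!secure && !insecureOk) {
      throw new Error(`machine ${i}: URL must use wss://`);
    }
    // A missing or non-boolean autoConnect is treated as false.
    return { label, url, token, autoConnect: autoConnect === true };
  });
}

const machines = parseMachines(
  '[{"label":"devbox","url":"wss://192.168.1.50:9750","token":"t0k3n","autoConnect":true}]',
);
console.log(machines[0].label, machines[0].autoConnect);
```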

Implementation Plan

Phase A: Extract bterminal-core crate

  • Extract PtyManager and SidecarManager into a shared crate
  • src-tauri depends on bterminal-core instead of owning the code
  • Zero behavior change — purely structural refactor
  • Estimate: ~2h of mechanical refactoring

Phase B: Build bterminal-relay binary

  • WebSocket server using tokio-tungstenite
  • Token auth on upgrade
  • Routes commands to bterminal-core managers
  • Forwards events back over WebSocket
  • Includes --port, --token, --insecure CLI flags
  • Ships as a single static Rust binary (~5 MB), installable with cargo install bterminal-relay

Phase C: Add RemoteManager to controller

  • New remote.rs module in src-tauri
  • Manages WebSocket client connections
  • Tauri commands: remote_add, remote_remove, remote_connect, remote_disconnect
  • Forwards remote events as Tauri events (same sidecar-message / pty-data events, tagged with machine ID)

Phase D: Frontend integration

  • Extend bridge adapters with remoteMachineId routing
  • Add machine management UI in settings
  • Add machine status indicators in sidebar
  • Add reconnection banner in pane chrome
  • Test with 2 machines (local + 1 remote)

Security Considerations

| Threat              | Mitigation                                                            |
|---------------------|-----------------------------------------------------------------------|
| Token interception  | TLS required (reject ws:// without --insecure)                        |
| Token brute-force   | Rate limit auth attempts (5/min), lockout after 10 failures           |
| Relay impersonation | Pin relay certificate fingerprint (future: mTLS)                      |
| Command injection   | Relay validates all command payloads against schema                   |
| Lateral movement    | Relay runs as unprivileged user, no shell access beyond PTY/sidecar   |
| Data exfiltration   | Agent output streams to controller only, no relay-to-relay traffic    |
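
The brute-force mitigation (5 attempts per minute, lockout after 10 failures) could look roughly like the sketch below. Several details are assumptions because the design leaves them open: attempts are counted per client key (for example, the remote IP), the window is a sliding 60 seconds, failures are counted consecutively and reset on success, and a lockout lasts until it is cleared manually. `AuthLimiter` is a hypothetical name, and TypeScript is used only for illustration since the relay is a Rust binary.

```typescript
class AuthLimiter {
  private attempts = new Map<string, number[]>(); // recent attempt timestamps per client
  private failures = new Map<string, number>();   // consecutive failures per client
  private maxPerMinute: number;
  private lockoutAfter: number;

  constructor(maxPerMinute = 5, lockoutAfter = 10) {
    this.maxPerMinute = maxPerMinute;
    this.lockoutAfter = lockoutAfter;
  }

  // True if `client` may attempt authentication at time `nowMs`.
  allow(client: string, nowMs: number): boolean {
    if ((this.failures.get(client) ?? 0) >= this.lockoutAfter) return false;
    // Keep only attempts from the last 60 seconds.
    const recent = (this.attempts.get(client) ?? []).filter((t) => nowMs - t < 60000);
    const permitted = recent.length < this.maxPerMinute;
    if (permitted) recent.push(nowMs);
    this.attempts.set(client, recent);
    return permitted;
  }

  // Count a failed attempt toward the lockout threshold.
  recordFailure(client: string): void {
    this.failures.set(client, (this.failures.get(client) ?? 0) + 1);
  }

  // A successful authentication clears the failure counter.
  recordSuccess(client: string): void {
    this.failures.delete(client);
  }
}

const limiter = new AuthLimiter();
console.log(limiter.allow('203.0.113.7', 0));
```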

Performance Considerations

| Concern           | Mitigation                                                                             |
|-------------------|----------------------------------------------------------------------------------------|
| WebSocket latency | Typical LAN: <1 ms. WAN: 20-100 ms. Acceptable for agent output (text, not video)      |
| Bandwidth         | Agent NDJSON: ~50 KB/s peak. Terminal: ~200 KB/s peak. Trivial even on slow links      |
| Connection count  | Max 10 machines initially (UI constraint, not technical)                               |
| Message ordering  | Single WebSocket per machine = ordered delivery guaranteed                             |

What This Does NOT Cover (Future)

  • Multi-controller — multiple BTerminal instances observing the same relay (needs pub/sub)
  • Relay discovery — automatic detection of relays on LAN (mDNS/Bonjour)
  • Agent migration — moving a running agent from one machine to another
  • Relay-to-relay — direct communication between remote machines
  • mTLS — mutual TLS for enterprise environments (Phase B+ enhancement)