Hibryda 04a7a4bb94 docs: add multi-machine support architecture design

Full WebSocket architecture spec for remote agent/terminal management:
bterminal-relay binary, RemoteManager, NDJSON protocol, pre-shared
token + TLS auth, 4-phase implementation plan (A-D).

2026-03-06 18:45:56 +01:00

Multi-Machine Support — Architecture Design

Overview

Extend BTerminal to manage Claude agent sessions and terminal panes running on remote machines over WebSocket, while keeping the local sidecar path unchanged.

Problem

Current architecture is local-only:

WebView ←→ Rust (Tauri IPC) ←→ Local Sidecar (stdio NDJSON)
                              ←→ Local PTY (portable-pty)

Target state: BTerminal acts as a mission control that observes agents and terminals running on multiple machines (dev servers, cloud VMs, CI runners).

Design Constraints

Zero changes to the local path: the local sidecar and PTY must work identically
Same NDJSON protocol: remote and local agents speak the same message format
No new runtime dependencies: use tokio-tungstenite, which runs on the tokio runtime Tauri already depends on
Graceful degradation: when a remote machine goes offline, its panes show a disconnected state and reconnect automatically
Security: all remote connections are authenticated and encrypted (TLS + token)

Architecture

Three-Layer Model

┌──────────────────────────────────────────────────────────────────┐
│  BTerminal (Controller)                                          │
│                                                                  │
│  ┌──────────┐    Tauri IPC    ┌──────────────────────────────┐  │
│  │ WebView  │ ←────────────→  │ Rust Backend                 │  │
│  │ (Svelte) │                 │                              │  │
│  └──────────┘                 │  ├── PtyManager (local)      │  │
│                               │  ├── SidecarManager (local)  │  │
│                               │  └── RemoteManager ──────────┼──┤
│                               └──────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
        │                                      │
        │ (local stdio)                        │ (WebSocket wss://)
        ▼                                      ▼
  ┌───────────┐                    ┌──────────────────────┐
  │ Local     │                    │ Remote Machine       │
  │ Sidecar   │                    │                      │
  │ (Deno/    │                    │  ┌────────────────┐  │
  │  Node.js) │                    │  │ bterminal-relay│  │
  │           │                    │  │ (Rust binary)  │  │
  └───────────┘                    │  │                │  │
                                   │  │ ├── PTY mgr    │  │
                                   │  │ ├── Sidecar mgr│  │
                                   │  │ └── WS server  │  │
                                   │  └────────────────┘  │
                                   └──────────────────────┘

Components

1. `bterminal-relay` — Remote Agent (Rust binary)

A standalone Rust binary that runs on each remote machine. It:

Listens on a WebSocket port (default: 9750)
Manages local PTYs and claude sidecar processes
Forwards NDJSON events to the controller over WebSocket
Receives commands (query, stop, resize, write) from the controller

Why a Rust binary? It reuses the existing PtyManager and SidecarManager code from v2/src-tauri/src/, extracted into a shared crate (bterminal-core).

bterminal-relay/
├── Cargo.toml        # depends on bterminal-core
├── src/
│   └── main.rs       # WebSocket server + auth
│
bterminal-core/       # shared crate (extracted from src-tauri)
├── Cargo.toml
├── src/
│   ├── pty.rs        # PtyManager (from v2/src-tauri/src/pty.rs)
│   ├── sidecar.rs    # SidecarManager (from v2/src-tauri/src/sidecar.rs)
│   └── lib.rs

2. `RemoteManager` — Controller-Side (in Rust backend)

New module in v2/src-tauri/src/remote.rs. Manages WebSocket connections to multiple relays.

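use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// One configured relay endpoint.
pub struct RemoteMachine {
    pub id: String,
    pub label: String,
    pub url: String,          // wss://host:9750
    pub token: String,        // pre-shared auth token
    pub status: RemoteStatus,
}

/// Connection state surfaced to the UI.
#[derive(Debug, Clone, PartialEq)]
pub enum RemoteStatus {
    Connected,
    Connecting,
    Disconnected,
    Error(String),
}

/// Owns the machine list and the open connections, looked up by a String key
/// (presumably `RemoteMachine::id`). `WsConnection` is the per-relay
/// connection handle; its definition is not part of this document.
pub struct RemoteManager {
    machines: Arc<Mutex<Vec<RemoteMachine>>>,
    connections: Arc<Mutex<HashMap<String, WsConnection>>>,
}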

3. Frontend Adapters — Unified Interface

The frontend doesn't care whether a pane is local or remote. The bridge layer abstracts this:

// adapters/agent-bridge.ts — extended
export async function queryAgent(options: AgentQueryOptions): Promise<void> {
  if (options.remote_machine_id) {
    return invoke('remote_agent_query', { machineId: options.remote_machine_id, options });
  }
  return invoke('agent_query', { options });
}

Same pattern for pty-bridge.ts — add optional remote_machine_id to all operations.
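
The routing decision can be factored into one helper that both bridges share. The following is a minimal sketch, not the project's actual bridge code: the `remote_` command prefix is an assumption generalized from `remote_agent_query` (the only remote command named in this design), `routeCommand` and `writePty` are hypothetical names, and Tauri's `invoke` is passed in as a parameter so the sketch runs outside a Tauri app.

```typescript
// Shape of Tauri's invoke, injected so the sketch is runnable standalone.
type InvokeFn = (cmd: string, args: Record<string, unknown>) => Promise<unknown>;

interface RemoteTarget {
  remote_machine_id?: string; // undefined = local
}

// Decide which command name and arguments a bridge call should use.
// Assumption: every remote command is the local name prefixed with `remote_`.
function routeCommand(
  localCommand: string,
  target: RemoteTarget,
  args: Record<string, unknown>,
): { command: string; args: Record<string, unknown> } {
  if (target.remote_machine_id) {
    return {
      command: `remote_${localCommand}`,
      args: { machineId: target.remote_machine_id, ...args },
    };
  }
  return { command: localCommand, args };
}

// Hypothetical pty-bridge operation built on the helper.
async function writePty(
  invoke: InvokeFn,
  id: string,
  data: string,
  target: RemoteTarget = {},
): Promise<void> {
  const routed = routeCommand('pty_write', target, { id, data });
  await invoke(routed.command, routed.args);
}
```

Keeping the decision in a pure function means the local path stays a plain pass-through, which matches the "zero changes to local path" constraint.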

Protocol

WebSocket Wire Format

Same NDJSON as local sidecar, wrapped in an envelope for multiplexing:

// Controller → Relay (commands)
interface RelayCommand {
  id: string;                      // request correlation ID
  type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
      | 'agent_query' | 'agent_stop' | 'sidecar_restart'
      | 'ping';
  payload: Record<string, unknown>;
}

// Relay → Controller (events)
interface RelayEvent {
  type: 'pty_data' | 'pty_exit'
      | 'sidecar_message' | 'sidecar_exited'
      | 'state_sync'
      | 'error' | 'pong' | 'ready';
  sessionId?: string;
  payload: unknown;
}
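
To illustrate how the envelope might travel over the socket, here is a sketch of encoding a command and decoding events. It assumes each WebSocket text frame carries one or more newline-delimited JSON objects; the document does not specify the framing, and `encodeCommand` / `decodeEvents` are hypothetical helper names. The event list includes `state_sync`, which is described under Session Persistence.

```typescript
type RelayEventType =
  | 'pty_data' | 'pty_exit'
  | 'sidecar_message' | 'sidecar_exited'
  | 'state_sync' | 'error' | 'pong' | 'ready';

interface RelayEventEnvelope {
  type: RelayEventType;
  sessionId?: string;
  payload: unknown;
}

interface RelayCommandEnvelope {
  id: string;
  type: string;
  payload: Record<string, unknown>;
}

const KNOWN_EVENT_TYPES: readonly string[] = [
  'pty_data', 'pty_exit', 'sidecar_message', 'sidecar_exited',
  'state_sync', 'error', 'pong', 'ready',
];

// Serialize one command as a single NDJSON line.
function encodeCommand(cmd: RelayCommandEnvelope): string {
  return JSON.stringify(cmd) + '\n';
}

// Parse a text frame that may hold several NDJSON lines.
// Malformed lines and unknown event types are skipped instead of thrown,
// so one bad line cannot stall the rest of the stream.
function decodeEvents(frame: string): RelayEventEnvelope[] {
  const events: RelayEventEnvelope[] = [];
  for (const line of frame.split('\n')) {
    if (line.trim() === '') continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue;
    }
    if (typeof parsed !== 'object' || parsed === null) continue;
    const fields = parsed as { type?: unknown; sessionId?: unknown; payload?: unknown };
    const eventType = fields.type;
    if (typeof eventType !== 'string' || KNOWN_EVENT_TYPES.indexOf(eventType) === -1) continue;
    const decoded: RelayEventEnvelope = {
      type: eventType as RelayEventType,
      payload: fields.payload,
    };
    const sessionId = fields.sessionId;
    if (typeof sessionId === 'string') decoded.sessionId = sessionId;
    events.push(decoded);
  }
  return events;
}
```

Note that `RelayEvent` carries no `id`, so a reply such as `error` cannot be matched to the command that caused it unless the relay echoes the command ID inside `payload`.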

Authentication

Pre-shared token — relay starts with --token <secret>. Controller sends token in WebSocket upgrade headers (Authorization: Bearer <token>).
TLS required — relay rejects non-TLS connections in production mode. Dev mode allows ws:// with --insecure flag.
Token rotation — future: relay exposes endpoint to rotate token. Controller stores tokens in SQLite settings table.
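
The checks described in the list above can be sketched as three small functions. This only illustrates the logic; the relay itself is planned as a Rust binary, and all function names here are hypothetical. The comparison loop avoids an early exit on the first mismatch, but JavaScript gives no hard constant-time guarantee, so treat it as best effort.

```typescript
// Extract the token from an "Authorization: Bearer <token>" header value.
function parseBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match ? (match[1] ?? null) : null;
}

// Compare two tokens without returning early on the first differing
// character, so response timing reveals less about how much matched.
function tokensEqual(expected: string, presented: string): boolean {
  let diff = expected.length ^ presented.length;
  const len = Math.max(expected.length, presented.length);
  for (let i = 0; i < len; i++) {
    const a = i < expected.length ? expected.charCodeAt(i) : 0;
    const b = i < presented.length ? presented.charCodeAt(i) : 0;
    diff |= a ^ b;
  }
  return diff === 0;
}

// Scheme check: plain ws:// is accepted only when insecure mode is enabled.
function isRelayUrlAllowed(url: string, insecure: boolean): boolean {
  if (url.startsWith('wss://')) return true;
  return insecure && url.startsWith('ws://');
}
```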

Connection Lifecycle

Controller                          Relay
    │                                 │
    │── WSS connect ─────────────────→│
    │── Authorization: Bearer token ─→│
    │                                 │
    │←── { type: "ready", ...} ───────│
    │                                 │
    │── { type: "ping" } ────────────→│
    │←── { type: "pong" } ────────────│  (every 15s)
    │                                 │
    │── { type: "agent_query", ... }─→│
    │←── { type: "sidecar_message" }──│  (streaming)
    │←── { type: "sidecar_message" }──│
    │                                 │
    │     (disconnect)                │
    │── reconnect (exp backoff) ─────→│  (1s, 2s, 4s, 8s, max 30s)
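
The ping/pong exchange in the diagram implies some rule for deciding that a link is dead. One possible rule is sketched below. The 15s interval comes from the diagram; the threshold of two missed intervals and the `HeartbeatMonitor` name are assumptions, since the design does not state when a silent connection counts as disconnected.

```typescript
const PING_INTERVAL_MS = 15_000; // from the lifecycle diagram
const MAX_MISSED_PONGS = 2;      // assumption: not specified in this design

class HeartbeatMonitor {
  private lastPongAt: number;

  constructor(connectedAt: number) {
    this.lastPongAt = connectedAt;
  }

  // Call whenever a { type: "pong" } event arrives.
  recordPong(now: number): void {
    this.lastPongAt = now;
  }

  // True once more than MAX_MISSED_PONGS ping intervals have passed
  // without a pong; the caller would then start the reconnect sequence.
  isStale(now: number): boolean {
    return now - this.lastPongAt > PING_INTERVAL_MS * MAX_MISSED_PONGS;
  }
}
```

Timestamps are passed in rather than read from the clock so the rule can be tested without timers.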

Reconnection

Controller reconnects with exponential backoff (1s → 30s cap)
On reconnect, relay sends current state snapshot (active sessions, PTY list)
Controller reconciles: updates pane states, re-subscribes to streams
Active agent sessions continue on relay regardless of controller connection
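
The backoff schedule above (1s, 2s, 4s, 8s, capped at 30s) reduces to a single function. A minimal sketch, with `reconnectDelayMs` as a hypothetical name and a 0-based attempt counter as an assumption:

```typescript
const BASE_DELAY_MS = 1_000;
const MAX_DELAY_MS = 30_000;

// Delay before reconnect attempt `attempt` (0-based):
// 1s, 2s, 4s, 8s, 16s, then capped at 30s.
function reconnectDelayMs(attempt: number): number {
  // Clamp the exponent so very large attempt counts cannot overflow.
  const exponent = Math.min(Math.max(attempt, 0), 30);
  return Math.min(BASE_DELAY_MS * Math.pow(2, exponent), MAX_DELAY_MS);
}
```

The counter would reset to 0 after a successful `ready` event. Adding random jitter is a common refinement when many clients reconnect at once, but it is not part of this design.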

Session Persistence Across Reconnects

Key insight: remote agents keep running even when the controller disconnects. The relay is autonomous — it doesn't need the controller to operate.

On reconnect:

Relay sends { type: "state_sync", payload: { activeSessions: [...], activePtys: [...] } }
Controller matches against known panes, updates status
Missed messages are NOT replayed (too complex, marginal value). Agent panes show "reconnected — some messages may be missing" notice
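
The matching step can be sketched as a pure function over the snapshot. This assumes `activeSessions` and `activePtys` are arrays of string IDs (the document only shows `[...]`) and that each pane remembers the remote session or PTY ID it is bound to; `KnownPane`, `remoteId`, and `reconcilePanes` are hypothetical names.

```typescript
interface StateSync {
  activeSessions: string[]; // agent session IDs still running on the relay
  activePtys: string[];     // PTY IDs still open on the relay
}

interface KnownPane {
  id: string;
  type: 'terminal' | 'agent';
  remoteId: string; // hypothetical: session or PTY ID this pane is bound to
}

type ReconciledStatus = 'reconnected' | 'ended';

// Match panes known to the controller against the relay's snapshot.
// Panes whose session or PTY is still listed resume (and would show the
// "some messages may be missing" notice); the rest ended while offline.
function reconcilePanes(
  panes: KnownPane[],
  sync: StateSync,
): Map<string, ReconciledStatus> {
  const sessions = new Set(sync.activeSessions);
  const ptys = new Set(sync.activePtys);
  const result = new Map<string, ReconciledStatus>();
  for (const pane of panes) {
    const alive = pane.type === 'agent'
      ? sessions.has(pane.remoteId)
      : ptys.has(pane.remoteId);
    result.set(pane.id, alive ? 'reconnected' : 'ended');
  }
  return result;
}
```

Sessions listed in the snapshot that match no known pane are ignored here; whether the controller should open new panes for them is left open by the design.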

Frontend Integration

Pane Model Changes

// stores/layout.svelte.ts
export interface Pane {
  id: string;
  type: 'terminal' | 'agent';
  title: string;
  group?: string;
  remoteMachineId?: string;  // NEW: undefined = local
}

Sidebar — Machine Groups

Remote panes auto-group by machine label in the sidebar:

▾ Local
  ├── Terminal 1
  └── Agent: fix bug

▾ devbox (192.168.1.50)      ← remote machine
  ├── SSH session
  └── Agent: deploy

▾ ci-runner (10.0.0.5)       ← remote machine (disconnected)
  └── Agent: test suite ⚠️
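
The grouping shown above could be derived from the pane list as follows. A sketch only: `SidebarPane`, `MachineInfo`, and `groupPanesByMachine` are hypothetical names, and falling back to the raw machine ID for unknown machines is an assumption.

```typescript
interface SidebarPane {
  id: string;
  title: string;
  remoteMachineId?: string; // undefined = local
}

interface MachineInfo {
  id: string;
  label: string;
}

// Group panes under "Local" or their machine's label, with "Local" first.
// Panes that reference an unknown machine fall back to the raw machine ID.
function groupPanesByMachine(
  panes: SidebarPane[],
  machines: MachineInfo[],
): Map<string, SidebarPane[]> {
  const labels = new Map<string, string>();
  machines.forEach((m) => labels.set(m.id, m.label));

  const groups = new Map<string, SidebarPane[]>();
  groups.set('Local', []);
  for (const pane of panes) {
    const key = pane.remoteMachineId === undefined
      ? 'Local'
      : (labels.get(pane.remoteMachineId) ?? pane.remoteMachineId);
    const bucket = groups.get(key) ?? [];
    bucket.push(pane);
    groups.set(key, bucket);
  }
  return groups;
}
```

Keying by label means two machines with the same label (or one labeled "Local") would merge into a single group; keying by machine ID and resolving the label only at render time avoids that.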

Settings Panel

New "Machines" section in settings:

Field	Type	Notes
Label	string	Human-readable name
URL	string	`wss://host:9750`
Token	password	Pre-shared auth token
Auto-connect	boolean	Connect on app launch

Stored as JSON in the SQLite settings table under the remote_machines key.
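
Because the value is free-form JSON, the loader should validate it before use. The sketch below assumes the stored value is an array of objects with `label`, `url`, `token`, and `autoConnect` fields; these property names are guesses based on the table above, not a confirmed schema.

```typescript
interface MachineConfig {
  label: string;
  url: string;
  token: string;
  autoConnect: boolean;
}

// Parse the JSON stored under the remote_machines key, dropping entries
// that are malformed instead of failing the whole settings load.
function parseRemoteMachines(json: string): MachineConfig[] {
  let raw: unknown;
  try {
    raw = JSON.parse(json);
  } catch {
    return [];
  }
  if (!Array.isArray(raw)) return [];

  const machines: MachineConfig[] = [];
  for (const entry of raw) {
    if (typeof entry !== 'object' || entry === null) continue;
    const fields = entry as Record<string, unknown>;
    const label = fields['label'];
    const url = fields['url'];
    const token = fields['token'];
    if (typeof label !== 'string' || label === '') continue;
    if (typeof url !== 'string' || !/^wss?:\/\//.test(url)) continue;
    if (typeof token !== 'string' || token === '') continue;
    // A missing or non-boolean autoConnect is treated as false.
    machines.push({ label, url, token, autoConnect: fields['autoConnect'] === true });
  }
  return machines;
}
```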

Implementation Plan

Phase A: Extract `bterminal-core` crate

Extract PtyManager and SidecarManager into a shared crate
src-tauri depends on bterminal-core instead of owning the code
Zero behavior change — purely structural refactor
Estimate: ~2h of mechanical refactoring

Phase B: Build `bterminal-relay` binary

WebSocket server using tokio-tungstenite
Token auth on upgrade
Routes commands to bterminal-core managers
Forwards events back over WebSocket
Includes --port, --token, --insecure CLI flags
Ships as a single static Rust binary (~5MB), installable with cargo install bterminal-relay

Phase C: Add `RemoteManager` to controller

New remote.rs module in src-tauri
Manages WebSocket client connections
Tauri commands: remote_add, remote_remove, remote_connect, remote_disconnect
Forwards remote events as Tauri events (same sidecar-message / pty-data events, tagged with machine ID)
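
Once events carry a machine ID, the frontend needs a lookup key that cannot collide when two machines happen to use the same session ID. A hedged sketch of that idea follows; the `machineId` field name, the reserved `local` prefix, and the `EventRouter` class are all assumptions.

```typescript
interface TaggedEvent {
  machineId?: string; // undefined = local (assumed tagging convention)
  sessionId: string;
  payload: unknown;
}

type PaneHandler = (payload: unknown) => void;

// Combine machine and session into one key. "local" is a reserved prefix
// in this sketch, so no remote machine may use it as an ID.
function paneRoutingKey(machineId: string | undefined, sessionId: string): string {
  return `${machineId ?? 'local'}:${sessionId}`;
}

class EventRouter {
  private handlers = new Map<string, PaneHandler>();

  register(machineId: string | undefined, sessionId: string, handler: PaneHandler): void {
    this.handlers.set(paneRoutingKey(machineId, sessionId), handler);
  }

  // Returns false when no pane is registered for the event.
  dispatch(tagged: TaggedEvent): boolean {
    const handler = this.handlers.get(paneRoutingKey(tagged.machineId, tagged.sessionId));
    if (!handler) return false;
    handler(tagged.payload);
    return true;
  }
}
```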

Phase D: Frontend integration

Extend bridge adapters with remoteMachineId routing
Add machine management UI in settings
Add machine status indicators in sidebar
Add reconnection banner in pane chrome
Test with 2 machines (local + 1 remote)

Security Considerations

Threat	Mitigation
Token interception	TLS required (reject `ws://` without `--insecure`)
Token brute-force	Rate limit auth attempts (5/min), lockout after 10 failures
Relay impersonation	Pin relay certificate fingerprint (future: mTLS)
Command injection	Relay validates all command payloads against schema
Lateral movement	Relay runs as unprivileged user, no shell access beyond PTY/sidecar
Data exfiltration	Agent output streams to controller only, no relay-to-relay traffic
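
The brute-force mitigation in the table (5 attempts per minute, lockout after 10 failures) could look like the sketch below. Several details are assumptions because the table does not specify them: one limiter instance per client address, failures counted consecutively, and a lockout that lasts until the limiter is reset. `AuthLimiter` is a hypothetical name, and the real relay would implement this in Rust.

```typescript
const MAX_ATTEMPTS_PER_WINDOW = 5; // "5/min" from the table
const WINDOW_MS = 60_000;
const LOCKOUT_AFTER_FAILURES = 10; // "lockout after 10 failures"

class AuthLimiter {
  private attempts: number[] = []; // timestamps of recent allowed attempts
  private failures = 0;            // consecutive failed attempts

  // Whether a new auth attempt may proceed at time `now` (milliseconds).
  allow(now: number): boolean {
    if (this.failures >= LOCKOUT_AFTER_FAILURES) return false;
    // Keep only attempts inside the sliding one-minute window.
    this.attempts = this.attempts.filter((t) => now - t < WINDOW_MS);
    if (this.attempts.length >= MAX_ATTEMPTS_PER_WINDOW) return false;
    this.attempts.push(now);
    return true;
  }

  // Record the outcome of an attempt that was allowed.
  recordResult(success: boolean): void {
    this.failures = success ? 0 : this.failures + 1;
  }
}
```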

Performance Considerations

Concern	Mitigation
WebSocket latency	Typical LAN: <1ms. WAN: 20-100ms. Acceptable for agent output (text, not video)
Bandwidth	Agent NDJSON: ~50KB/s peak. Terminal: ~200KB/s peak. Trivial even on slow links
Connection count	Max 10 machines initially (UI constraint, not technical)
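Message ordering	One WebSocket per machine: messages arrive in order within a connection (TCP ordering); nothing is guaranteed across a reconnect, since missed messages are not replayed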

What This Does NOT Cover (Future)

Multi-controller — multiple BTerminal instances observing the same relay (needs pub/sub)
Relay discovery — automatic detection of relays on LAN (mDNS/Bonjour)
Agent migration — moving a running agent from one machine to another
Relay-to-relay — direct communication between remote machines
mTLS — mutual TLS for enterprise environments (Phase B+ enhancement)
