docs: update docs for TCP probe refactor and frontend reconnection listeners

Replace stale attempt_ws_connect() references with attempt_tcp_probe()
across all docs. Add progress entry for reconnection hardening session.
Update CHANGELOG with new entries and probe refactor change.
Hibryda 2026-03-06 21:50:54 +01:00
parent 71100da125
commit 4c06b5f121
6 changed files with 24 additions and 7 deletions

@@ -49,7 +49,7 @@
- remote-bridge.ts adapter wraps remote machine management IPC. machines.svelte.ts store tracks remote machine state.
- Pane.remoteMachineId?: string routes operations through RemoteManager instead of local managers. Bridge adapters (pty-bridge, agent-bridge) check this field.
- bterminal-relay binary (v2/bterminal-relay/) is a standalone WebSocket server with token auth, rate limiting, and per-connection isolated managers. Commands return structured responses (pty_created, pong, error) with commandId for correlation via send_error() helper.
-- RemoteManager reconnection: exponential backoff (1s-30s cap) on disconnect, attempt_ws_connect() probe, emits remote-machine-reconnecting and remote-machine-reconnect-ready events.
+- RemoteManager reconnection: exponential backoff (1s-30s cap) on disconnect, attempt_tcp_probe() (TCP-only, no WS upgrade), emits remote-machine-reconnecting and remote-machine-reconnect-ready events. Frontend listeners in remote-bridge.ts; machines store auto-reconnects on ready.
## Memora Tags

@@ -8,7 +8,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
-- Exponential backoff reconnection in RemoteManager: on disconnect, spawns async task with 1s/2s/4s/8s/16s/30s-cap backoff, uses attempt_ws_connect() probe (5s timeout), emits remote-machine-reconnecting and remote-machine-reconnect-ready events
+- Exponential backoff reconnection in RemoteManager: on disconnect, spawns async task with 1s/2s/4s/8s/16s/30s-cap backoff, uses attempt_tcp_probe() (TCP-only, no WS upgrade, 5s timeout, default port 9750), emits remote-machine-reconnecting and remote-machine-reconnect-ready events
+- Frontend reconnection listeners: onRemoteMachineReconnecting and onRemoteMachineReconnectReady in remote-bridge.ts; machines store sets status to 'reconnecting' and auto-calls connectMachine() on ready
- Relay command response propagation: bterminal-relay now sends structured responses (pty_created, pong, error) back to client via shared event channel with commandId correlation
- send_error() helper in bterminal-relay for consistent error reporting across all command handlers
- PTY creation confirmation flow: pty_create command returns pty_created event with session ID and commandId; RemoteManager emits remote-pty-created Tauri event
@@ -48,6 +49,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- tempfile dev dependency for Rust test isolation
### Changed
+- RemoteManager reconnection probe refactored from attempt_ws_connect() (full WS handshake + auth) to attempt_tcp_probe() (TCP-only connect, no resource allocation on relay)
- bterminal-relay command handlers refactored: all error paths now use send_error() helper instead of log::error!() only; pong response sent via event channel instead of no-op
- RemoteManager disconnect handler: scoped mutex release before event emission to prevent deadlocks; spawns reconnection task
- PtyManager and SidecarManager extracted from src-tauri to bterminal-core shared crate (src-tauri now has thin re-export wrappers)

@@ -10,7 +10,7 @@
## Completed
-- [x] **Multi-machine reconnection** -- Exponential backoff reconnection (1s-30s cap) in RemoteManager, attempt_ws_connect() probe, reconnection events. | Done: 2026-03-06
+- [x] **Multi-machine reconnection** -- Exponential backoff reconnection (1s-30s cap) in RemoteManager, attempt_tcp_probe() (TCP-only), frontend reconnection listeners + auto-reconnect. | Done: 2026-03-06
- [x] **Relay command response propagation** -- Structured responses (pty_created, pong, error) with commandId correlation, send_error() helper. | Done: 2026-03-06
- [x] **Multi-machine support (Phases A-D)** -- bterminal-core crate extraction, bterminal-relay WebSocket binary, RemoteManager, frontend integration. | Done: 2026-03-06
- [x] **Agent Teams frontend support** -- Subagent pane spawning, parent/child navigation, message routing by parentId, SUBAGENT_TOOL_NAMES detection in dispatcher. | Done: 2026-03-06

@@ -185,8 +185,9 @@ Controller Relay
- Controller reconnects with exponential backoff (1s, 2s, 4s, 8s, 16s, 30s cap)
- Reconnection runs as an async tokio task spawned on disconnect
-- Uses `attempt_ws_connect()` probe: connects with auth header, immediately closes (5s timeout)
+- Uses `attempt_tcp_probe()`: TCP connect only (no WS upgrade), 5s timeout, default port 9750. Avoids allocating per-connection resources (PtyManager, SidecarManager) on the relay during probes.
- Emits `remote-machine-reconnecting` event (with backoff duration) and `remote-machine-reconnect-ready` when probe succeeds
+- Frontend listens via `onRemoteMachineReconnecting` and `onRemoteMachineReconnectReady` in remote-bridge.ts; machines store sets status to 'reconnecting' and auto-calls `connectMachine()` on ready
- Cancels if machine is removed or manually reconnected (checks status == "disconnected" && connection == None)
- On reconnect, relay sends current state snapshot (active sessions, PTY list)
- Controller reconciles: updates pane states, re-subscribes to streams
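
The TCP-only probe described in the list above can be illustrated with a minimal blocking sketch that uses only the Rust standard library. The function name `tcp_probe` and its signature are hypothetical stand-ins, not the actual `attempt_tcp_probe()` in remote.rs, which presumably runs inside the async tokio reconnection task. Only the 5s timeout and the default port 9750 are taken from the docs.

```rust
use std::net::{TcpStream, ToSocketAddrs};
use std::time::Duration;

// Values taken from the docs: default relay port and probe timeout.
const DEFAULT_RELAY_PORT: u16 = 9750;
const PROBE_TIMEOUT: Duration = Duration::from_secs(5);

/// Hypothetical sketch of a TCP-only reachability probe.
/// Returns true if a plain TCP connection can be opened within the timeout.
/// No WebSocket upgrade and no auth token are sent, so the relay never
/// allocates per-connection managers for the probe.
fn tcp_probe(host: &str, port: Option<u16>) -> bool {
    let port = port.unwrap_or(DEFAULT_RELAY_PORT);

    // Resolve the host; a resolution failure counts as "not reachable".
    let addrs = match (host, port).to_socket_addrs() {
        Ok(addrs) => addrs,
        Err(_) => return false,
    };

    // Try each resolved address. The stream is dropped right away,
    // which closes the connection as soon as reachability is confirmed.
    for addr in addrs {
        if TcpStream::connect_timeout(&addr, PROBE_TIMEOUT).is_ok() {
            return true;
        }
    }
    false
}
```

A caller would pass `None` as the port to fall back to 9750, for example `tcp_probe("relay.example", None)`, where the hostname is a placeholder.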
@@ -274,7 +275,7 @@ Stored in SQLite `settings` table as JSON: `remote_machines` key.
- 12 Tauri commands: remote_add_machine, remote_remove_machine, remote_connect, remote_disconnect, remote_list_machines, remote_pty_spawn/write/resize/kill, remote_agent_query/stop, remote_sidecar_restart
- Heartbeat ping every 15s
- PTY creation event: emits `remote-pty-created` Tauri event with machineId, ptyId, commandId
-- Exponential backoff reconnection on disconnect (1s/2s/4s/8s/16s/30s cap) via `attempt_ws_connect()` probe
+- Exponential backoff reconnection on disconnect (1s/2s/4s/8s/16s/30s cap) via `attempt_tcp_probe()` (TCP-only, no WS upgrade)
- Reconnection events: `remote-machine-reconnecting`, `remote-machine-reconnect-ready`
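
The 1s/2s/4s/8s/16s/30s-cap schedule mentioned above amounts to doubling the delay per attempt and clamping it at 30 seconds. A minimal sketch, assuming a 0-based attempt counter; the helper name `backoff_delay` is hypothetical and does not come from remote.rs:

```rust
use std::time::Duration;

/// Hypothetical helper: delay to wait before reconnection attempt `attempt`
/// (0-based). Doubles from 1s on each attempt and is capped at 30s.
fn backoff_delay(attempt: u32) -> Duration {
    const BASE_SECS: u64 = 1;
    const CAP_SECS: u64 = 30;

    // checked_shl returns None once the shift amount reaches the bit width;
    // treat that as "very large" so the cap still applies.
    let doubled = BASE_SECS.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_secs(doubled.min(CAP_SECS))
}
```

Attempts 0 through 4 yield 1, 2, 4, 8, and 16 seconds; every later attempt yields 30 seconds.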
### Phase D: Frontend integration [DONE]

@@ -282,7 +282,7 @@ Architecture designed in [multi-machine.md](multi-machine.md). Implementation ex
- [x] Heartbeat ping every 15s
- [x] PTY creation event: emits remote-pty-created Tauri event with machineId, ptyId, commandId
- [x] Exponential backoff reconnection on disconnect (1s/2s/4s/8s/16s/30s cap)
-- [x] attempt_ws_connect() probe function (5s timeout, auth header, immediate close)
+- [x] attempt_tcp_probe() function: TCP-only probe (5s timeout, default port 9750) — avoids allocating per-connection resources on relay during probes
- [x] Reconnection events: remote-machine-reconnecting, remote-machine-reconnect-ready
### Phase D: Frontend integration [status: complete]

@@ -323,7 +323,7 @@ Design: No separate sidecar process per subagent. Parent's sidecar handles all;
#### RemoteManager Reconnection
- [x] Exponential backoff reconnection in remote.rs: spawns async tokio task on disconnect
- [x] Backoff schedule: 1s, 2s, 4s, 8s, 16s, 30s (capped)
-- [x] attempt_ws_connect() probe function: connects with proper WebSocket upgrade + auth header, 5s timeout, immediate close
+- [x] attempt_tcp_probe() function: TCP-only connect probe (5s timeout, default port 9750) — avoids allocating per-connection resources on relay
- [x] Emits remote-machine-reconnecting (with backoffSecs) and remote-machine-reconnect-ready Tauri events
- [x] Cancellation: stops if machine removed (not in HashMap) or manually reconnected (status != disconnected)
- [x] Fixed scoping: disconnection cleanup uses inner block to release mutex before emitting event
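
The cancellation rule in the list above (stop when the machine is gone from the map or no longer disconnected) can be sketched as a single predicate checked before each probe. The struct, its field types, and the function name below are assumptions made for illustration; the real state in remote.rs may be shaped differently.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of a tracked remote machine.
struct RemoteMachine {
    status: String,          // e.g. "disconnected" or "connected" (assumed values)
    connection: Option<u32>, // placeholder for a live connection handle
}

/// Returns true only while the reconnection task should keep probing:
/// the machine must still exist, still be marked disconnected, and have
/// no live connection (which a manual reconnect would have set).
fn should_keep_retrying(machines: &HashMap<String, RemoteMachine>, id: &str) -> bool {
    match machines.get(id) {
        Some(m) => m.status == "disconnected" && m.connection.is_none(),
        None => false, // machine was removed: cancel
    }
}
```

Running this check while holding the lock only briefly, then releasing it before emitting events, would be in line with the scoped mutex release noted in the last item.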
@@ -331,6 +331,20 @@ Design: No separate sidecar process per subagent. Parent's sidecar handles all;
#### RemoteManager PTY Creation Confirmation
- [x] Handles pty_created event type from relay: emits remote-pty-created Tauri event with machineId, ptyId, commandId
+### Session: 2026-03-06 (continued) — Reconnection Hardening
+#### TCP Probe Refactor
+- [x] Replaced attempt_ws_connect() with attempt_tcp_probe() in remote.rs: TCP-only connect (no WS upgrade), 5s timeout, default port 9750
+- [x] Avoids allocating per-connection resources (PtyManager, SidecarManager) on the relay during reconnection probes
+- [x] Probe no longer needs auth token — only checks TCP reachability
+#### Frontend Reconnection Listeners
+- [x] Added onRemoteMachineReconnecting() listener in remote-bridge.ts: receives machineId + backoffSecs
+- [x] Added onRemoteMachineReconnectReady() listener in remote-bridge.ts: receives machineId when probe succeeds
+- [x] machines.svelte.ts: reconnecting handler sets machine status to 'reconnecting', shows toast with backoff duration
+- [x] machines.svelte.ts: reconnect-ready handler auto-calls connectMachine() to re-establish full WebSocket connection
+- [x] Updated docs/multi-machine.md to reflect TCP probe and frontend listener changes
### Next Steps
- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning for relay connections