docs: reconcile hib_changes onto flat structure

Bring over comprehensive documentation, CLI tools, and project
scaffolding from the archived v2/ branch onto the rebuilt flat
main. All v2/ path references updated to match flat layout.

- docs/: architecture, decisions, phases, progress, findings, etc.
- docker/tempo: telemetry stack (Grafana + Tempo)
- CLAUDE.md, .claude/CLAUDE.md: comprehensive project guides
- CHANGELOG.md, TODO.md, README.md: project meta
- consult, ctx: CLI tools
- .gitignore: merged entries from both branches
Hibryda 2026-03-16 03:34:04 +01:00
parent 37b2b82ae5
commit 356660f17d
29 changed files with 7673 additions and 66 deletions


@@ -2,8 +2,7 @@
## Workflow
- v1 is a single-file Python app (`bterminal.py`). Changes are localized.
-- v2 docs are in `docs/`. Architecture in `docs/architecture.md`.
+- Docs are in `docs/`. Architecture in `docs/architecture.md`.
- v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (Web Worker sandbox, permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, SPKI pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (26), SidecarManager actor pattern, per-message btmsg acknowledgment, Aider autonomous mode. 507 vitest + 110 cargo + 109 E2E.
- Consult Memora (tag: `bterminal`) before making architectural changes.
@@ -21,8 +20,7 @@
## Rules
- Do not modify v1 code (`bterminal.py`) unless explicitly asked — it is production-stable.
-- v2/v3 work goes on the `hib_changes` branch (repo: agent-orchestrator), not master.
+- Work goes on the `hib_changes` branch (repo: agent-orchestrator), not master.
- Architecture decisions must reference `docs/decisions.md`.
- When adding new decisions, append to the appropriate category table with date.
- Update `docs/progress/` after each significant work session.
@@ -71,12 +69,12 @@
- Theme system: 17 themes in 3 groups — 4 Catppuccin + 7 Editor (VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark) + 6 Deep Dark (Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper, Midnight). All map to same 26 --ctp-* CSS custom properties — zero component changes needed. ThemeId replaces CatppuccinFlavor. getCurrentTheme()/setTheme() are primary API (deprecated wrappers exist). THEME_LIST has ThemeMeta with group metadata for custom dropdown UI. Open terminals hot-swap via onThemeChange() callback registry in theme.svelte.ts. Typography uses --ui-font-family/--ui-font-size (UI elements, sans-serif fallback) and --term-font-family/--term-font-size (terminal, monospace fallback) CSS custom properties (defined in catppuccin.css). initTheme() restores all 4 font settings (ui_font_family, ui_font_size, term_font_family, term_font_size) from SQLite on startup.
- Detached pane mode: App.svelte checks URL param `?detached=1` and renders a single pane without sidebar/grid chrome. Used for pop-out windows.
- Shiki syntax highlighting uses lazy singleton pattern (avoid repeated WASM init). 13 languages preloaded. Used in MarkdownPane and AgentPane text messages.
-- Cargo workspace at v2/ level: members = [src-tauri, bterminal-core, bterminal-relay]. Cargo.lock is at workspace root (v2/), not in src-tauri/.
+- Cargo workspace at repo root: members = [src-tauri, bterminal-core, bterminal-relay]. Cargo.lock is at workspace root, not in src-tauri/.
- EventSink trait (bterminal-core/src/event.rs) abstracts event emission. PtyManager and SidecarManager are in bterminal-core, not src-tauri. src-tauri has thin re-exports.
- RemoteManager (src-tauri/src/remote.rs) manages WebSocket client connections to bterminal-relay instances. 12 Tauri commands prefixed with `remote_`.
- remote-bridge.ts adapter wraps remote machine management IPC. machines.svelte.ts store tracks remote machine state.
- Pane.remoteMachineId?: string routes operations through RemoteManager instead of local managers. Bridge adapters (pty-bridge, agent-bridge) check this field.
-- bterminal-relay binary (v2/bterminal-relay/) is a standalone WebSocket server with token auth, rate limiting, and per-connection isolated managers. Commands return structured responses (pty_created, pong, error) with commandId for correlation via send_error() helper.
+- bterminal-relay binary (bterminal-relay/) is a standalone WebSocket server with token auth, rate limiting, and per-connection isolated managers. Commands return structured responses (pty_created, pong, error) with commandId for correlation via send_error() helper.
- RemoteManager reconnection: exponential backoff (1s-30s cap) on disconnect, attempt_tcp_probe() (TCP-only, no WS upgrade), emits remote-machine-reconnecting and remote-machine-reconnect-ready events. Frontend listeners in remote-bridge.ts; machines store auto-reconnects on ready.
- v3 workspace store (`workspace.svelte.ts`) replaces layout store for v3. Groups loaded from `~/.config/bterminal/groups.json` via `groups-bridge.ts`. State: groups, activeGroupId, activeTab, focusedProjectId. Derived: activeGroup, activeProjects.
- v3 groups backend (`groups.rs`): load_groups(), save_groups(), default_groups(). Tauri commands: groups_load, groups_save.

.gitignore (15 lines changed)

@@ -26,3 +26,18 @@ sidecar/node_modules
*.njsproj
*.sln
*.sw?
# Python
__pycache__/
*.pyc
*.pyo
# Project-specific
/CLAUDE.md
/plugins/
projects/
.playwright-mcp/
.audit/
.tribunal/
.local/
debug/

CHANGELOG.md (new file, 521 lines)

@@ -0,0 +1,521 @@
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [Unreleased]
### Added
- **Plugin sandbox Web Worker migration** — replaced `new Function()` sandbox with a dedicated Web Worker per plugin. Each plugin runs on its own thread in a separate JS global scope (a Worker is not a separate OS process): no DOM, no Tauri IPC, no main-thread access. Permission-gated API proxied via postMessage with RPC pattern. 26 tests (MockWorker class in vitest)
- **seen_messages startup pruning** — `pruneSeen()` called fire-and-forget in App.svelte onMount. Removes entries older than 7 days (emergency: 3 days at 200k rows)
- **Aider autonomous mode toggle** — per-project `autonomousMode` setting ('restricted'|'autonomous') gates shell command execution in Aider sidecar. Default restricted. SettingsTab toggle
- **SPKI certificate pinning (TOFU)** — `remote_probe_spki` Tauri command + `probe_spki_hash()` extracts relay TLS certificate SPKI hash. `remote_add_pin`/`remote_remove_pin` commands. In-memory pin store in RemoteManager
- **Per-message btmsg acknowledgment** — `seen_messages` table with session-scoped tracking replaces count-based polling. `btmsg_unseen_messages`, `btmsg_mark_seen`, `btmsg_prune_seen` commands. ON DELETE CASCADE cleanup
- **Aider parser test suite** — 72 vitest tests for extracted `aider-parser.ts` (pure parsing functions). 8 realistic Aider output fixtures. Covers prompt detection, suppression, turn parsing, cost extraction, shell execution, format-drift canaries
- **Dead code wiring** — 4 orphaned Rust functions wired as Tauri commands: `btmsg_get_agent_heartbeats`, `btmsg_queue_dead_letter`, `search_index_task`, `search_index_btmsg`
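
The `seen_messages` pruning rule above (7-day retention, tightened to 3 days once the table reaches 200k rows) can be sketched in Python with the standard `sqlite3` module. This is only an illustration: the three-column schema, the `seen_at` column, the `prune_seen()` signature, and the `>=` comparison at the row limit are assumptions, and the project's real `pruneSeen()` may differ.

```python
import sqlite3
import time

DAY = 86_400  # seconds per day

def prune_seen(conn, now=None, max_age_days=7, emergency_days=3,
               emergency_rows=200_000):
    """Delete old rows; use the shorter window when the table is large."""
    now = time.time() if now is None else now
    (rows,) = conn.execute("SELECT COUNT(*) FROM seen_messages").fetchone()
    days = emergency_days if rows >= emergency_rows else max_age_days
    cutoff = now - days * DAY
    cur = conn.execute("DELETE FROM seen_messages WHERE seen_at < ?", (cutoff,))
    conn.commit()
    return cur.rowcount  # number of rows removed

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE seen_messages (session_id TEXT, message_id TEXT, seen_at REAL)"
)
now = 1_000_000_000.0
conn.executemany(
    "INSERT INTO seen_messages VALUES (?, ?, ?)",
    [
        ("s1", "m1", now - 1 * DAY),
        ("s1", "m2", now - 5 * DAY),
        ("s1", "m3", now - 8 * DAY),  # older than 7 days
    ],
)
removed = prune_seen(conn, now=now)
```

Passing `now` explicitly keeps the function deterministic in tests; a caller that runs it at startup would simply omit the argument.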
### Security
- **Plugin sandbox hardened** — `new Function()` (same-realm, escapable via prototype walking) replaced with Web Worker (separate JS context, no escape vectors). Eliminates `arguments.callee.constructor` and `({}).constructor.constructor` attack paths
### Changed
- **SidecarManager actor refactor** — replaced `Arc<Mutex<HashMap>>` with dedicated actor thread via `std::sync::mpsc` channel. Eliminates TOCTOU race conditions on session lifecycle. All mutable state owned by single thread
- **Aider parser extraction** — pure functions (`looksLikePrompt`, `parseTurnOutput`, `extractSessionCost`, etc.) extracted from `aider-runner.ts` to `aider-parser.ts` for testability. Runner imports from parser module
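
The actor refactor above can be pictured with a small Python analogue: one thread owns the session map, and every caller talks to it through a queue, so there is no shared lock to race on. This is a sketch of the pattern only. The project's implementation is Rust with `std::sync::mpsc`; the `SessionActor` class, its command names, and the per-call reply queues are hypothetical.

```python
import queue
import threading

class SessionActor:
    """Owns the session table on one thread; callers only send messages."""

    def __init__(self):
        self._inbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        sessions = {}  # mutable state is only ever touched by this thread
        while True:
            cmd, arg, reply = self._inbox.get()
            if cmd == "start":
                created = arg not in sessions
                if created:
                    sessions[arg] = "running"
                reply.put(created)
            elif cmd == "stop":
                reply.put(sessions.pop(arg, None) is not None)
            elif cmd == "list":
                reply.put(sorted(sessions))
            elif cmd == "shutdown":
                reply.put(True)
                return

    def call(self, cmd, arg=None):
        """Send one command and block until the actor answers."""
        reply = queue.Queue(maxsize=1)
        self._inbox.put((cmd, arg, reply))
        return reply.get()

actor = SessionActor()
first = actor.call("start", "s1")    # new session is created
second = actor.call("start", "s1")   # duplicate start is rejected
listed = actor.call("list")
stopped = actor.call("stop", "s1")
actor.call("shutdown")
```

Because the check (`arg not in sessions`) and the write happen inside one message handler, two concurrent `start` calls for the same id cannot both succeed, which is the time-of-check/time-of-use gap the entry refers to.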
### Fixed
- **groups.rs test failure** — `test_groups_roundtrip` missing 9 Option fields added in P1-P10 (provider, model, use_worktrees, sandbox_enabled, anchor_budget_scale, stall_threshold_min, is_agent, agent_role, system_prompt)
- **remote_probe_spki tracing skip mismatch** — `#[tracing::instrument(skip(state))]` referenced non-existent parameter name. Removed unused State parameter
### Added
- **Comprehensive documentation suite** — 4 new docs: `architecture.md` (end-to-end system architecture with component hierarchy, data flow, IPC patterns), `sidecar.md` (multi-provider runner lifecycle, env stripping, NDJSON protocol, build pipeline), `orchestration.md` (btmsg messaging, bttask kanban, agent roles, wake scheduler, session anchors, health monitoring), `production.md` (sidecar supervisor, Landlock sandbox, FTS5 search, plugin system, secrets management, notifications, audit logging, error classification, telemetry)
- **Sidecar crash recovery/supervision** — `bterminal-core/src/supervisor.rs`: SidecarSupervisor wraps SidecarManager with auto-restart, exponential backoff (1s base, 30s cap, 5 retries), SidecarHealth enum (Healthy/Degraded/Failed), 5min stability window. 17 tests
- **Notification system** — OS desktop notifications via `notify-rust` + in-app NotificationCenter.svelte (bell icon, unread badge, history max 100, 6 notification types). Agent dispatcher emits on complete/error/crash. notifications-bridge.ts adapter
- **Secrets management** — `keyring` crate with linux-native (libsecret). SecretsManager in secrets.rs: store/get/delete/list with `__bterminal_keys__` metadata tracking. SettingsTab Secrets section. secrets-bridge.ts adapter. No plaintext fallback
- **Keyboard-first UX** — Alt+1-5 project jump, Ctrl+H/L vi-nav, Ctrl+Shift+1-9 tab switch, Ctrl+J terminal toggle, Ctrl+Shift+K focus agent, Ctrl+Shift+F search overlay. `isEditing()` guard prevents conflicts. CommandPalette rewritten: 18+ commands, 6 categories, fuzzy filter, arrow nav, keyboard shortcuts overlay
- **Agent health monitoring** — heartbeats table + dead_letter_queue table in btmsg.db. 15s heartbeat polling in ProjectBox. Stale detection (5min threshold). ProjectHeader heart indicator (green/yellow/red). StatusBar health badge
- **FTS5 full-text search** — rusqlite upgraded to `bundled-full`. SearchDb with 3 FTS5 virtual tables (search_messages, search_tasks, search_btmsg). SearchOverlay.svelte: Spotlight-style Ctrl+Shift+F overlay, 300ms debounce, grouped results with FTS5 highlight snippets
- **Plugin system** — `~/.config/bterminal/plugins/` with plugin.json manifest. plugins.rs: discovery, path-traversal-safe file reading, permission validation. plugin-host.ts: sandboxed `new Function()` execution, permission-gated API (palette, btmsg:read, bttask:read, events). plugins.svelte.ts store. SettingsTab plugins section. Example hello plugin
- **Landlock sandbox** — `bterminal-core/src/sandbox.rs`: SandboxConfig with RW/RO paths, applied via `pre_exec()` in sidecar child process. Requires kernel 6.2+ (graceful fallback). Per-project toggle in SettingsTab
- **Error classification** — `error-classifier.ts`: classifyApiError() with 6 types (rate_limit, auth, quota, overloaded, network, unknown), actionable messages, retry delays. 20 tests
- **Audit log** — audit_log table in btmsg.db. AuditLogTab.svelte: Manager-only tab, filter by type+agent, 5s auto-refresh. audit-bridge.ts adapter. Events: agent_start/stop/error, task changes, wake events, prompt injection
- **Usage meter** — UsageMeter.svelte: compact inline cost/token meter with color thresholds (50/75/90%), hover tooltip. Integrated in AgentPane cost bar
- **Team agent orchestration** — install_cli_tools() copies btmsg/bttask to ~/.local/bin on startup. register_agents_from_groups() with bidirectional contacts. ensure_review_channels_for_group() creates #review-queue/#review-log per group
- **Optimistic locking for bttask** — `version` column in tasks table. `WHERE id=? AND version=?` in update_task_status(). Conflict detection in TaskBoardTab. Both Rust + Python CLI updated
- **Unified test runner** — `v2/scripts/test-all.sh` runs vitest + cargo tests with optional E2E (`--e2e` flag). npm scripts: `test:all`, `test:all:e2e`, `test:cargo`. Summary output with color-coded pass/fail
- **Testing gate rule** — `.claude/rules/20-testing-gate.md` requires running full test suite after every major change (new features, refactors touching 3+ files, store/adapter/bridge/backend changes)
- **E2E test mode infrastructure** — `BTERMINAL_TEST=1` env var disables file watchers (watcher.rs, fs_watcher.rs), wake scheduler, and allows data/config dir overrides via `BTERMINAL_TEST_DATA_DIR`/`BTERMINAL_TEST_CONFIG_DIR`. New `is_test_mode` Tauri command bridges test state to frontend
- **E2E data-testid attributes** — Stable test selectors on 7 key Svelte components: AgentPane (agent-pane, data-agent-status, agent-messages, agent-stop, agent-prompt, agent-submit), ProjectBox (project-box, data-project-id, project-tabs, terminal-toggle), StatusBar, AgentSession, GlobalTabBar, CommandPalette, TerminalTabs
- **E2E Phase A scenarios** — 7 human-authored test scenarios (22 tests) in `agent-scenarios.test.ts`: app structural integrity, settings panel, agent pane initial state, terminal tab management, command palette, project focus/tab switching, agent prompt submission (graceful Claude CLI skip)
- **E2E test fixtures** — `tests/e2e/fixtures.ts`: creates isolated temp environments with data/config dirs, git repos, and groups.json. `createTestFixture()`, `createMultiProjectFixture()`, `destroyTestFixture()`
- **E2E results store** — `tests/e2e/results-db.ts`: JSON-based test run/step tracking (pivoted from better-sqlite3 due to Node 25 native compile failure)
- **E2E Phase B scenarios** — 6 multi-project + LLM-judged test scenarios in `phase-b.test.ts`: multi-project grid rendering, independent tab switching, status bar fleet state, LLM-judged agent response quality, LLM-judged code generation, context tab verification
- **LLM judge helper** — `tests/e2e/llm-judge.ts`: dual-mode judge (CLI first, API fallback). CLI backend spawns `claude` with `--output-format text` (unsets CLAUDECODE). API backend uses raw fetch to Anthropic. Backend selectable via `LLM_JUDGE_BACKEND` env var. Structured verdicts (pass/fail + reasoning + confidence), `assertWithJudge()` with configurable min confidence threshold
- **E2E testing documentation** — `docs/e2e-testing.md`: comprehensive guide covering all 3 pillars (test fixtures, test mode, LLM judge), spec phases A-C, CI integration, WebKit2GTK pitfalls, troubleshooting
- **E2E CI workflow** — `.github/workflows/e2e.yml`: 3 jobs (vitest, cargo, e2e), xvfb-run for headless WebKit2GTK, path-filtered triggers on v2 source changes, LLM-judged tests gated on `ANTHROPIC_API_KEY` secret availability
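
The optimistic-locking entry above (`WHERE id=? AND version=?`) can be illustrated with a minimal Python sketch using `sqlite3`. The table layout and the `version = version + 1` bump on success are assumptions for the example; the source only states the `version` column and the `WHERE` clause.

```python
import sqlite3

def update_task_status(conn, task_id, new_status, expected_version):
    """Apply the update only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE tasks SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, task_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1  # zero rows touched means a version conflict

# Hypothetical minimal schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, version INTEGER)"
)
conn.execute("INSERT INTO tasks VALUES ('t1', 'todo', 1)")

ok = update_task_status(conn, "t1", "progress", expected_version=1)
stale = update_task_status(conn, "t1", "done", expected_version=1)
```

The second call still carries version 1, so it matches no row and reports a conflict instead of silently overwriting the first writer's change; a caller would typically re-read the task and retry.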
### Fixed
- **E2E fixture env propagation** — `tauri:options.env` does not reliably set process-level env vars for Rust `std::env::var()`. Added `process.env` injection at module scope in wdio.conf.js so fixture groups.json is loaded instead of real user config
- **LLM judge CLI context pollution** — Claude CLI loaded project CLAUDE.md files causing model to refuse JSON output. Fixed by running judge from `cwd: /tmp` with `--setting-sources user` and `--system-prompt` flags
- **E2E mocha timeout** — Increased global mocha timeout from 60s to 180s. Agent-running tests (B4/B5) need 120s+ for Claude CLI round-trip
- **E2E test suite — 27 failures fixed** across 3 spec files: bterminal.test.ts (22 — stale v2 CSS selectors, v3 tab order/count, JS-dispatched KeyboardEvent for Ctrl+K, idempotent palette open/close, backdrop click close, scrollIntoView for below-fold settings, scoped theme dropdown selectors), agent-scenarios.test.ts (3 — JS click for settings button, programmatic focus check, graceful 40s agent timeout with skip), phase-b.test.ts (2 — waitUntil for project box render, conditional null handling for burn-rate/cost elements). 82 E2E passing, 0 failing, 4 skipped
- **AgentPane.svelte missing closing `>`** — div tag with data-testid attributes was missing closing angle bracket, causing template parse issues
### Changed
- **WebDriverIO config** — TCP readiness probe replaces blind 2s sleep for tauri-driver startup (200ms interval, 10s deadline). Added BTERMINAL_TEST=1 passthrough in capabilities
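
A TCP readiness probe of the kind described above (poll every 200ms, give up after 10s) might look like the following Python sketch. The real probe lives in the WebDriverIO config and is JavaScript; the `wait_for_port()` name and the local listener standing in for tauri-driver are hypothetical.

```python
import socket
import time

def wait_for_port(host, port, interval=0.2, deadline=10.0):
    """Poll until a TCP connect succeeds or the deadline passes."""
    end = time.monotonic() + deadline
    while time.monotonic() < end:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)  # not listening yet; try again shortly
    return False

# Demo: a local listener stands in for the driver process.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]
ready = wait_for_port("127.0.0.1", port, deadline=2.0)
listener.close()
```

Compared with a fixed sleep, the probe returns as soon as the port accepts connections and fails explicitly when it never does.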
### Security
- `claude_read_skill` path traversal: added `canonicalize()` + `starts_with()` validation to prevent reading arbitrary files via crafted skill paths (commands/claude.rs)
- **Sidecar env allowlist hardening** — added `ANTHROPIC_*` to Rust-level `strip_provider_env_var()` as defense-in-depth (Claude CLI uses credentials file, not env for auth). Dual-layer stripping documented: Rust layer (first checkpoint) + JS runner layer (per-provider)
- **Plugin sandbox hardening** — 13 shadowed globals in `new Function()` sandbox (window, document, fetch, globalThis, self, XMLHttpRequest, WebSocket, Function, importScripts, require, process, Deno, __TAURI__, __TAURI_INTERNALS__). `this` bound to undefined via `.call()`. 35 tests covering all shadows, permissions, and lifecycle. Known escape vectors documented in JSDoc
- **WAL checkpoint** — periodic `PRAGMA wal_checkpoint(TRUNCATE)` every 5 minutes on sessions.db + btmsg.db to prevent unbounded WAL growth under sustained multi-agent load. 2 tests
- **TLS support for bterminal-relay** — optional `--tls-cert` and `--tls-key` CLI args. Server wraps TCP streams with native-tls. Client already supports `wss://` URLs. Generic handler refactor avoids code duplication
- **Landlock fallback logging** — improved warning message with kernel version requirement (6.2+) and documented 3 enforcement states
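
The `canonicalize()` + `starts_with()` check from the `claude_read_skill` entry above has a close Python analogue in `Path.resolve()` and `Path.is_relative_to()` (Python 3.9+). The sketch below assumes a hypothetical `resolve_inside()` helper and a temporary `skills` directory; it is not the project's Rust code.

```python
import tempfile
from pathlib import Path

def resolve_inside(base_dir, requested):
    """Return the resolved path if it stays under base_dir, else raise."""
    base = Path(base_dir).resolve()
    target = (base / requested).resolve()  # collapses ".." and symlinks
    if not target.is_relative_to(base):    # component-wise prefix check
        raise PermissionError(f"{requested!r} escapes {base}")
    return target

# Hypothetical skill directory, for illustration only.
skills = Path(tempfile.mkdtemp()) / "skills"
skills.mkdir()
(skills / "review.md").write_text("# review skill\n")

allowed = resolve_inside(skills, "review.md")
try:
    resolve_inside(skills, "../../etc/passwd")
    blocked = False
except PermissionError:
    blocked = True
```

Resolving before comparing matters: a plain string-prefix test on the unresolved path would accept `skills/../../etc/passwd`.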
### Fixed
- **btmsg.rs column index mismatch** — `get_agents()` used `SELECT a.*` with positional index 7 for `status`, but column 7 is actually `system_prompt`. Converted all query functions in btmsg.rs and bttask.rs from positional to named column access (`row.get("column_name")`). Added SQL aliases for JOIN columns
- **btmsg-bridge.ts camelCase mismatch** — `BtmsgAgent` and `BtmsgMessage` TypeScript interfaces used snake_case fields (`group_id`, `unread_count`, `from_agent`) but Rust `#[serde(rename_all = "camelCase")]` sends camelCase. Fixed interfaces + all consumers (CommsTab.svelte)
- **GroupAgentsPanel event propagation** — toggleAgent button click propagated to parent card click handler (`setActiveProject`). Added `e.stopPropagation()`
- **ArchitectureTab PlantUML encoding** — `rawDeflate()` was a no-op, `encode64()` did hex encoding. Collapsed into single `plantumlEncode()` using PlantUML's `~h` hex encoding
- **TestingTab Tauri 2.x asset URL** — used `asset://localhost/` (Tauri 1.x). Fixed to `convertFileSrc()` from `@tauri-apps/api/core`
- **Reconnect loop race in RemoteManager** — orphaned reconnect tasks continued running after `remove_machine()` or `disconnect()`. Added `cancelled: Arc<AtomicBool>` flag to `RemoteMachine`; set on removal/disconnect, checked each reconnect iteration. `connect()` resets flag for new connections (remote.rs)
- **Subagent delegation not triggering** — Manager system prompt had no documentation of Agent tool / delegation capability. Added "Multi-Agent Delegation" section with usage examples and guidelines. Also inject `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env var for Manager agents
- **Gitignore ignoring source code** — root `.gitignore` `plugins/` rule matched `v2/src/lib/plugins/` (source code). Narrowed to `/plugins/` and `/v2/plugins/` (runtime dirs only)
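
The column index mismatch fixed above is easy to reproduce in miniature. The Python sketch below uses `sqlite3.Row` to contrast positional and named access on a `SELECT a.*` query; the three-column `agents` table is a simplified, hypothetical stand-in for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # enables row["column"] lookups
conn.execute("CREATE TABLE agents (id TEXT, system_prompt TEXT, status TEXT)")
conn.execute(
    "INSERT INTO agents VALUES ('a1', 'You are the manager.', 'active')"
)

row = conn.execute("SELECT a.* FROM agents a").fetchone()

by_position = row[1]      # stale assumption that status is column 1
by_name = row["status"]   # unaffected by column order or added columns
```

Here `by_position` silently returns the system prompt text, while `by_name` returns the status, which is why named access (plus explicit `AS` aliases on JOIN columns) is the more robust choice.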
### Added
- **Reviewer agent role** — Tier 1 specialist with reviewer workflow in `agent-prompts.ts` (8-step process: inbox → review-queue → analyze → verdict → status update → review-log → report). Rust `bttask.rs` auto-posts to `#review-queue` btmsg channel on task→review transition via `notify_review_channel()` + `ensure_review_channels()` (idempotent). `reviewQueueDepth` in `attention-scorer.ts` (10pts/task, cap 50). `review_queue_count()` Rust function + Tauri command + `reviewQueueCount()` IPC bridge. ProjectBox: 'Tasks' tab for reviewer (reuses TaskBoardTab), 10s review queue polling → `setReviewQueueDepth()` in health store. 7 new vitest + 4 new cargo tests. 388 vitest + 76 cargo total
- **Auto-wake Manager scheduler** — `wake-scheduler.svelte.ts` + `wake-scorer.ts` with 3 user-selectable strategies: persistent (Manager stays running, resume prompt with fleet context), on-demand (fresh session per wake), smart (threshold-gated on-demand, default). 6 wake signals from tribunal S-3 hybrid: AttentionSpike(1.0), ContextPressureCluster(0.9), BurnRateAnomaly(0.8), TaskQueuePressure(0.7), ReviewBacklog(0.6), PeriodicFloor(0.1). Settings UI: strategy segmented button + threshold slider in Manager agent cards. `GroupAgentConfig` extended with `wakeStrategy` + `wakeThreshold` fields. 24 tests in wake-scorer.test.ts. 381 vitest + 72 cargo total
- **Dashboard metrics panel** — `MetricsPanel.svelte` new ProjectBox tab ('metrics', PERSISTED-LAZY, all projects). Live view: fleet aggregates (running/idle/stalled + burn rate), project health grid (status, burn rate, context %, idle, tokens, cost, turns, model, conflicts, attention), task board summary (5 kanban columns polled every 10s), cross-project attention queue. History view: 5 switchable SVG sparkline charts (cost/tokens/turns/tools/duration) with area fill, stats row (last/avg/max/min), recent sessions table. 25 tests in MetricsPanel.test.ts. 357 vitest + 72 cargo total
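
To make the numbers in the entries above concrete, here is a small Python sketch of the six wake-signal weights and the review-queue scoring (10 points per task, capped at 50). The weights and the cap come from the changelog; how `wake-scorer.ts` combines signals is not stated, so the "strongest active signal wins" rule and the function names below are assumptions.

```python
# Signal weights as listed in the changelog entry.
WAKE_SIGNALS = {
    "AttentionSpike": 1.0,
    "ContextPressureCluster": 0.9,
    "BurnRateAnomaly": 0.8,
    "TaskQueuePressure": 0.7,
    "ReviewBacklog": 0.6,
    "PeriodicFloor": 0.1,
}

def wake_score(active_signals):
    """Assumed rule: the score is the weight of the strongest active signal."""
    return max((WAKE_SIGNALS[name] for name in active_signals), default=0.0)

def should_wake(active_signals, threshold):
    """Threshold gate for the 'smart' strategy (gate form is an assumption)."""
    return wake_score(active_signals) >= threshold

def review_queue_points(depth):
    """10 attention points per queued review task, capped at 50."""
    return min(10 * depth, 50)

backlog_score = wake_score(["ReviewBacklog", "PeriodicFloor"])
idle_wakes = should_wake(["PeriodicFloor"], threshold=0.5)
```

With a threshold of 0.5, the periodic floor alone never wakes the Manager, while any of the other five signals does.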
### Changed
- **Branded types for GroupId/AgentId (SOLID Phase 3b)** — Extended `types/ids.ts` with GroupId and AgentId branded types. Applied to ~40 sites: groups.ts interfaces (ProjectConfig.id, GroupConfig.id, GroupAgentConfig.id, GroupsFile.activeGroupId), btmsg-bridge.ts (5 interfaces, 15 function params), bttask-bridge.ts (Task/TaskComment, 6 params), groups-bridge.ts (AgentMessageRecord, ProjectAgentState, SessionMetric), 3 Svelte components (GroupAgentsPanel, TaskBoardTab, SettingsTab). agentToProject() uses `as unknown as ProjectId` cast for domain crossing. 12 tests in ids.test.ts. 332 vitest + 72 cargo total
- **Branded types for SessionId/ProjectId (SOLID Phase 3)** — `types/ids.ts` with compile-time branded types (`string & { __brand }`) and factory functions. Applied to ~140 sites across 11 files: Map/Set keys in conflicts.svelte.ts (4 maps), health.svelte.ts (2 maps), session-persistence.ts (3 maps), function signatures across 6 files, boundary branding at sidecar entry in agent-dispatcher.ts, Svelte component call sites in AgentSession/ProjectBox/ProjectHeader. 293 vitest + 49 cargo total
- **agent-dispatcher.ts split (SOLID Phase 2)** — 496→260 lines. Extracted 4 modules: `utils/worktree-detection.ts` (pure function), `utils/session-persistence.ts` (session maps + persist), `utils/auto-anchoring.ts` (compaction anchor), `utils/subagent-router.ts` (spawn + route). Dispatcher is now a thin coordinator
- **session.rs split (SOLID Phase 2)** — 1008-line monolith split into 7 sub-modules under `session/` directory: mod.rs (struct + migrate), sessions.rs, layout.rs, settings.rs, ssh.rs, agents.rs, metrics.rs, anchors.rs. `pub(in crate::session)` conn visibility. 21 new cargo tests
- **lib.rs command module split** — 976-line monolith with 48 Tauri commands split into 11 domain modules under `src-tauri/src/commands/` (pty, agent, watcher, session, persistence, knowledge, claude, groups, files, remote, misc). lib.rs reduced to ~170 lines (AppState + setup + handler registration)
- **Attention scorer extraction** — `scoreAttention()` pure function extracted from inline health store code to `utils/attention-scorer.ts` with 14 tests. Priority chain: stalled > error > context critical > file conflict > context high
- **Shared type guards** — deduplicated `str()`/`num()` runtime guards from claude-messages.ts, codex-messages.ts, ollama-messages.ts into shared `utils/type-guards.ts`
- **btmsg/bttask WAL mode** — added SQLite WAL journal mode + 5s busy_timeout to both `btmsg.rs` and `bttask.rs` `open_db()` for safe concurrent access from Python CLIs + Rust backend
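
The attention priority chain named above (stalled > error > context critical > file conflict > context high) is naturally expressed as a pure function with early returns. The Python sketch below is illustrative only: the `ProjectState` fields and the 90%/75% context thresholds are hypothetical, since the changelog does not give them.

```python
from dataclasses import dataclass
from typing import Optional

CONTEXT_CRITICAL = 90.0  # hypothetical threshold, percent of context used
CONTEXT_HIGH = 75.0      # hypothetical threshold, percent of context used

@dataclass
class ProjectState:
    stalled: bool = False
    error: bool = False
    context_pct: float = 0.0
    file_conflicts: int = 0

def attention_reason(p: ProjectState) -> Optional[str]:
    """Return the highest-priority reason a project needs attention."""
    if p.stalled:
        return "stalled"
    if p.error:
        return "error"
    if p.context_pct >= CONTEXT_CRITICAL:
        return "context_critical"
    if p.file_conflicts > 0:
        return "file_conflict"
    if p.context_pct >= CONTEXT_HIGH:
        return "context_high"
    return None

reason = attention_reason(
    ProjectState(error=True, context_pct=95.0, file_conflicts=2)
)
```

Keeping the function free of store access, as the extraction entry describes, is what makes each rung of the chain testable in isolation.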
### Added
- **Regression tests for btmsg/bttask bug fixes** — 49 new tests: btmsg.rs (8, in-memory SQLite with named column access regression for status vs system_prompt), bttask.rs (7, named column access + serde camelCase), sidecar strip_provider_env_var (8, all prefix combinations), btmsg-bridge.test.ts (17, camelCase fields + IPC commands), bttask-bridge.test.ts (10, camelCase + IPC), plantuml-encode.test.ts (7, hex encoding algorithm). Total: 327 vitest + 72 cargo
- **Configurable stall threshold** — per-project range slider (5–60 min, step 5) in SettingsTab. `stallThresholdMin` in `ProjectConfig` (groups.json), `setStallThreshold()` API in health store with `stallThresholds` Map and `DEFAULT_STALL_THRESHOLD_MS` fallback. ProjectBox `$effect` syncs config → store on mount/change
- **Memora adapter** — `MemoraAdapter` (memora-bridge.ts) implements `MemoryAdapter` interface, bridging to Memora's SQLite database (`~/.local/share/memora/memories.db`) via read-only Rust backend (`memora.rs`). FTS5 text search, tag filtering via `json_each()`. 4 Tauri commands (memora_available, memora_list, memora_search, memora_get). Registered in App.svelte onMount. 16 vitest + 7 cargo tests. MemoriesTab now shows Memora memories on startup
- **Codex provider runner** — `sidecar/codex-runner.ts` wraps `@openai/codex-sdk` (dynamic import, graceful failure if not installed). Maps Codex ThreadEvents (agent_message, reasoning, command_execution, file_change, mcp_tool_call, web_search) to common AgentMessage format via `codex-messages.ts` adapter. Sandbox/approval mode mapping from BTerminal permission modes. Session resume via thread ID. `providers/codex.ts` ProviderMeta (gpt-5.4 default, hasSandbox, supportsResume). 19 adapter tests
- **Ollama provider runner** — `sidecar/ollama-runner.ts` uses direct HTTP to `localhost:11434/api/chat` with NDJSON streaming (zero external dependencies). Health check before session start. Configurable host/model/num_ctx/think via providerConfig. Supports Qwen3 extended thinking. `ollama-messages.ts` adapter maps streaming chunks to AgentMessage (text, thinking, cost with token counts). `providers/ollama.ts` ProviderMeta (qwen3:8b default, modelSelection only). 11 adapter tests
- All 3 providers registered in App.svelte onMount + message-adapters.ts. `build:sidecar` builds all 3 runners
- **S-1 Phase 3: Worktree isolation per project** — per-project `useWorktrees` toggle in SettingsTab. When enabled, agents run in git worktrees at `<repo>/.claude/worktrees/<sessionId>/` via SDK `extraArgs: { worktree: sessionId }`. CWD-based worktree detection in agent-dispatcher (`detectWorktreeFromCwd()`) matches `.claude/`, `.codex/`, `.cursor/` worktree patterns on init events. Dual detection: CWD-based (primary) + tool_call-based (subagent fallback). 8 files, +125 lines, 7 new tests. 226 vitest + 42 cargo tests
- **S-2 Session Anchors** — preserves important conversation turns through context compaction chains. Auto-anchors first 3 turns with observation masking (reasoning preserved in full per research). Manual pin button on AgentPane text messages. Three anchor types: auto (re-injectable), pinned (display-only), promoted (user-promoted, re-injectable). Re-injection via `system_prompt` field. ContextTab anchor section with budget meter bar, per-anchor promote/demote/remove actions. SQLite `session_anchors` table with 5 CRUD commands. 5 new files, 7 modified. 219 vitest + 42 cargo tests
- **Configurable anchor budget scale** — `AnchorBudgetScale` type with 4 presets: Small (2K), Medium (6K, default), Large (12K), Full (20K). Per-project 4-stop range slider in SettingsTab. `ProjectConfig.anchorBudgetScale` persisted in groups.json. ContextTab budget meter derives from project setting. agent-dispatcher resolves scale on auto-anchor
- **Agent provider adapter pattern** — full implementation (3 phases complete): core abstraction layer (provider types/registry/capabilities, message adapter registry, 4 file renames), Settings UI (collapsible per-provider config panels, per-project provider dropdown, settings persistence), sidecar routing (provider-based runner selection, env var stripping for CLAUDE*/CODEX*/OLLAMA*). 5 new files, 4 renames, 20+ modified. 6 architecture decisions (PA-1–PA-6). Docs at `docs/provider-adapter/`
- **PDF viewer** in Files tab: `PdfViewer.svelte` using pdfjs-dist (v5.5.207). Canvas-based multi-page rendering, zoom controls (0.5x–3x, 25% steps), HiDPI-aware via devicePixelRatio. Reads PDF via `convertFileSrc()` — no new Rust commands needed
- **CSV table view** in Files tab: `CsvTable.svelte` with RFC 4180 CSV parser (no external dependency). Auto-detects delimiter (comma, semicolon, tab). Sortable columns (numeric-aware), sticky header, row numbers, text truncation at 20rem
- FilesTab routing update: Binary+pdf → PdfViewer, Text+csv → CsvTable. Updated file icons (📕 PDF, 📊 CSV)
- **S-1 Phase 2: Filesystem write detection** — inotify-based real-time file change detection via `ProjectFsWatcher` (fs_watcher.rs). Watches project CWDs recursively, filters .git/node_modules/target, debounces 100ms per-file (fs_watcher.rs, lib.rs)
- External write conflict detection: timing heuristic (2s grace window) distinguishes agent writes from external edits. `EXTERNAL_SESSION_ID` sentinel, `recordExternalWrite()`, `getExternalConflictCount()`, `FileConflict.isExternal` flag (conflicts.svelte.ts)
- Separate external write badge (orange ⚡) and agent conflict badge (red ⚠) in ProjectHeader (ProjectHeader.svelte)
- `externalConflictCount` in ProjectHealth interface with attention scoring integration (health.svelte.ts)
- Frontend bridge for filesystem watcher: `fsWatchProject()`, `fsUnwatchProject()`, `onFsWriteDetected()`, `fsWatcherStatus()` (fs-watcher-bridge.ts)
- Inotify watch limit sensing: `FsWatcherStatus` reads `/proc/sys/fs/inotify/max_user_watches`, counts watched directories per project, warns at >75% usage with shell command to increase limit (fs_watcher.rs, lib.rs, ProjectBox.svelte)
- Delayed scanning toast: "Scanning project directories…" info toast shown only when inotify status check takes >300ms, auto-dismissed on completion (ProjectBox.svelte)
- `notify()` returns toast ID (was void) to enable dismissing specific toasts via `dismissNotification(id)` (notifications.svelte.ts)
- ProjectBox `$effect` starts/stops fs watcher per project CWD on mount/unmount with toast on new external conflict + inotify capacity check (ProjectBox.svelte)
- Collapsible text messages in AgentPane: model responses wrapped in `<details open>` (open by default, user-collapsible with first-line preview) (AgentPane.svelte)
- Collapsible cost summary in AgentPane: `cost.result` wrapped in `<details>` (collapsed by default, expandable with 80-char preview) (AgentPane.svelte)
- Project max aspect ratio setting: `project_max_aspect` (float 0.3–3.0, default 1.0) limits project box width via CSS `max-width: calc(100vh * var(--project-max-aspect))` (SettingsTab.svelte, ProjectGrid.svelte, App.svelte)
- No-implicit-push rule: `.claude/rules/52-no-implicit-push.md` — never push unless user explicitly asks
- `StartupWMClass=bterminal` in install-v2.sh .desktop template for GNOME auto-move extension compatibility
- MarkdownPane link navigation: relative file links open in Files tab, external URLs open in system browser via `xdg-open`, anchor links scroll in-page (MarkdownPane.svelte, ProjectFiles.svelte, lib.rs)
- `open_url` Tauri command for opening http/https URLs in system browser (lib.rs)
- Tab system overhaul: renamed Claude→Model, Files→Docs, added 3 new tabs (Files, SSH, Memory) with PERSISTED-EAGER/LAZY mount strategies (ProjectBox.svelte)
- FilesTab: VSCode-style directory tree sidebar + tabbed content viewer with shiki syntax highlighting, word wrap, image display via convertFileSrc, 10MB file size gate, collapsible/resizable sidebar, preview vs pinned tabs (FilesTab.svelte)
- CodeEditor: CodeMirror 6 editor component with Catppuccin theme (reads --ctp-* CSS vars), 15 lazy-loaded language modes, auto-close brackets, bracket matching, code folding, line numbers, search, line wrapping, Ctrl+S save binding, blur event (CodeEditor.svelte)
- FilesTab editor mode: files are now editable with dirty dot indicator on tabs, (unsaved) label in path bar, Ctrl+S save, auto-save dirty tabs on close (FilesTab.svelte)
- Rust `write_file_content` command: writes content to existing files only — safety check prevents creating new files (lib.rs)
- Save-on-blur setting: `files_save_on_blur` toggle in Settings → Defaults → Editor, auto-saves files when editor loses focus (SettingsTab.svelte, FilesTab.svelte)
- SshTab: SSH connection CRUD panel with launch-to-terminal button, reuses existing ssh-bridge.ts model (SshTab.svelte)
- MemoriesTab: pluggable knowledge explorer with MemoryAdapter interface, adapter registry, search, tag display, expandable cards (MemoriesTab.svelte, memory-adapter.ts)
- Rust `list_directory_children` command: lazy tree expansion, hidden files skipped, dirs-first alphabetical sort (lib.rs)
- Rust `read_file_content` command: FileContent tagged union (Text/Binary/TooLarge), 30+ language mappings (lib.rs)
- Frontend `files-bridge.ts` adapter: DirEntry and FileContent TypeScript types + IPC wrappers
- ContextTab: LLM context window visualization with stats bar (tokens, cost, turns, duration), segmented token meter (color-coded by message type), file references tree (extracted from tool calls), and collapsible turn breakdown — replaces old ContextPane ctx database viewer (ContextTab.svelte)
- ContextTab AST view: per-turn SVG conversation trees showing hierarchical message flow (Turn → Thinking/Response/Tool Calls → File operations), with bezier edges, color-coded nodes, token counts, and detail tooltips (ContextTab.svelte)
- ContextTab Graph view: bipartite tool→file DAG with tools on left (color-coded by type) and files on right, curved SVG edges showing which tools touched which files, count badges on both sides (ContextTab.svelte)
- Compaction event detection: `compact_boundary` SDK messages adapted to `CompactionContent` type in sdk-messages.ts, ContextTab shows yellow compaction count pill in stats bar and red boundary nodes in AST view
- Project health store: per-project activity state (running/idle/stalled), burn rate ($/hr EMA), context pressure (% of model limit), attention scoring with urgency weights (health.svelte.ts)
- Mission Control status bar: running/idle/stalled agent counts, total $/hr burn rate, "needs attention" dropdown priority queue with click-to-focus cards (StatusBar.svelte)
- ProjectHeader health indicators: color-coded status dot (green=running, orange=stalled), context pressure badge, burn rate badge (ProjectHeader.svelte)
- Session metrics SQLite table: per-project historical metrics with 100-row retention, `session_metric_save` and `session_metrics_load` Tauri commands (session.rs, lib.rs)
- Session metric persistence on agent completion: records peak tokens, turn count, tool call count, cost, model, status (agent-dispatcher.ts)
- File overlap conflict detection store: per-project tracking of Write/Edit tool file paths across agent sessions, detects when 2+ sessions write same file, SCORE_FILE_CONFLICT=70 attention signal (conflicts.svelte.ts)
- Shared tool-files utility: extractFilePaths() and extractWritePaths() extracted from ContextTab to reusable module (tool-files.ts)
- File conflict indicators: red "⚠ N conflicts" badge in ProjectHeader, conflict count in StatusBar, toast notification on new conflict, conflict cards in attention queue (ProjectHeader.svelte, StatusBar.svelte)
- Health tick auto-stop/auto-start: tick timer self-stops when no running/starting sessions, auto-restarts on recordActivity() (health.svelte.ts)
- Bash write detection in tool-files.ts: BASH_WRITE_PATTERNS regex array covering >, >>, sed -i, tee, cp, mv, chmod/chown — conflict detection now catches shell-based file writes (tool-files.ts)
- Worktree-aware conflict suppression: sessions in different git worktrees don't trigger conflicts, sessionWorktrees tracking map, setSessionWorktree() API, extractWorktreePath() detects Agent/Task isolation:"worktree" and EnterWorktree tool calls (conflicts.svelte.ts, tool-files.ts, agent-dispatcher.ts)
- Acknowledge/dismiss conflicts: acknowledgeConflicts(projectId) suppresses badge until new session writes, acknowledgedFiles state map, auto-clear on new session write to acknowledged file (conflicts.svelte.ts)
- Clickable conflict badge in ProjectHeader: red button with ✕ calls acknowledgeConflicts() on click with stopPropagation, hover darkens background (ProjectHeader.svelte)
- `useWorktrees` optional boolean field on ProjectConfig for future per-project worktree spawning setting (groups.ts)
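
The external-write timing heuristic listed above (2s grace window) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in conflicts.svelte.ts: `WriteClassifier`, `noteAgentWrite()` and `classify()` are hypothetical names, and the sketch assumes a filesystem event is attributed to an agent when an agent tool call reported the same path within the grace window on either side of the event.

```typescript
// Hypothetical sketch of a grace-window classifier; all names are illustrative.
const GRACE_WINDOW_MS = 2000; // the 2s window mentioned in the changelog entry

type WriteOrigin = 'agent' | 'external';

class WriteClassifier {
  // Last time (ms since epoch) an agent tool call reported writing each path.
  private lastAgentWrite: Record<string, number | undefined> = {};

  noteAgentWrite(path: string, atMs: number): void {
    this.lastAgentWrite[path] = atMs;
  }

  // A filesystem event counts as an agent write only if an agent reported
  // the same path within GRACE_WINDOW_MS of the event (before or after).
  classify(path: string, eventAtMs: number): WriteOrigin {
    const agentAt = this.lastAgentWrite[path];
    if (agentAt === undefined) return 'external';
    return Math.abs(eventAtMs - agentAt) <= GRACE_WINDOW_MS ? 'agent' : 'external';
  }
}
```

In this sketch, events classified as `'external'` are the ones that would feed an external-conflict counter; how the real store records them is not shown here.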
### Changed
- Anchor observation masking no longer truncates assistant reasoning text (was 500 chars) — reasoning is preserved in full per research consensus (JetBrains NeurIPS 2025, SWE-agent, OpenDev ACC); only tool outputs are compacted (anchor-serializer.ts)
- `getAnchorSettings()` now accepts optional `AnchorBudgetScale` parameter to resolve budget from per-project scale setting (anchors.svelte.ts)
- ContextTab now derives anchor budget from `anchorBudgetScale` prop via `ANCHOR_BUDGET_SCALE_MAP` instead of hardcoded `DEFAULT_ANCHOR_SETTINGS` (ContextTab.svelte)
- Renamed `sdk-messages.ts` → `claude-messages.ts`, `agent-runner.ts` → `claude-runner.ts`, `ClaudeSession.svelte` → `AgentSession.svelte` (provider-neutral naming for multi-provider support)
- `agent-dispatcher.ts` now uses `adaptMessage(provider, event)` from message-adapters.ts registry instead of directly calling `adaptSDKMessage` — enables per-provider message parsing
- Rust `AgentQueryOptions` gained `provider` (String, defaults "claude") and `provider_config` (serde_json::Value) fields with serde defaults for backward compatibility
- Rust `SidecarManager.resolve_sidecar_for_provider(provider)` looks for `{provider}-runner.mjs` instead of hardcoded `claude-runner.mjs`
- Rust `strip_provider_env_var()` strips CLAUDE*/CODEX*/OLLAMA* env vars (whitelists CLAUDE_CODE_EXPERIMENTAL_*)
- SettingsTab: added Providers section with collapsible per-provider config panels (enabled toggle, default model, capabilities display) and per-project provider dropdown
- AgentPane: capability-driven rendering via ProviderCapabilities props (hasProfiles, hasSkills, supportsResume gates)
- AgentPane UI redesign: sans-serif root font (system-ui), tool calls paired with results in collapsible `<details>` groups, hook messages collapsed into compact labels, context window usage meter in status strip, cost bar made minimal (no background), session summary with translucent background, two-phase scroll anchoring, tool-aware output truncation (Bash 500/Read 50/Glob 20 lines), colors softened via `color-mix()`, responsive margins via container queries (AgentPane.svelte)
- MarkdownPane: added inner scroll wrapper with `container-type: inline-size`, responsive padding via shared `--bterminal-pane-padding-inline` variable (MarkdownPane.svelte)
- Added `--bterminal-pane-padding-inline: clamp(0.75rem, 3.5cqi, 2rem)` shared CSS variable for responsive pane padding (catppuccin.css)
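
The env var stripping rule from the `strip_provider_env_var()` entry above (strip CLAUDE*/CODEX*/OLLAMA*, keep CLAUDE_CODE_EXPERIMENTAL_*) can be sketched as follows. The real function is Rust; this TypeScript version is only an illustration, `stripProviderEnv` and the constant names are hypothetical, and it assumes prefixes are matched case-sensitively at the start of the variable name.

```typescript
// Hypothetical sketch; function and constant names are illustrative only.
const STRIPPED_PREFIXES = /^(CLAUDE|CODEX|OLLAMA)/;
const WHITELISTED_PREFIX = /^CLAUDE_CODE_EXPERIMENTAL_/;

type Env = Record<string, string | undefined>;

// Return a copy of `env` without provider variables, keeping whitelisted flags.
function stripProviderEnv(env: Env): Env {
  const result: Env = {};
  for (const key of Object.keys(env)) {
    const isProviderVar = STRIPPED_PREFIXES.test(key);
    const isWhitelisted = WHITELISTED_PREFIX.test(key);
    if (!isProviderVar || isWhitelisted) {
      result[key] = env[key];
    }
  }
  return result;
}
```

Checking the whitelist separately from the strip prefixes keeps the exception explicit, so adding another provider prefix does not accidentally drop the experimental flags.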
### Fixed
- FilesTab invalid HTML nesting: file tab bar used `<button>` inside `<button>` which Svelte/browser rejects — changed outer element to `<div role="tab">` (FilesTab.svelte)
- FilesTab file content not rendering: after inserting a FileTab into the `$state` array, the local plain-object reference lost Svelte 5 proxy reactivity — content mutations were invisible. Fixed by looking up from the reactive array before setting content (FilesTab.svelte)
- ClaudeSession type errors: cast `last_session_id` to UUID template literal type, add missing `timestamp` field (from `created_at`) to restored AgentMessage records (ClaudeSession.svelte)
- Cost bar shows only last turn's cost instead of cumulative session total: `updateAgentCost()` changed from assignment to accumulation (`+=`) so continued sessions properly sum costs across all turns (agents.svelte.ts)
- ProjectBox tab switch destroys running agent sessions: changed `{#if activeTab}` conditional rendering to CSS `style:display` (flex/none) for all three content panes and terminal section — ClaudeSession now stays mounted across tab switches, preserving session ID, message history, and running agents (ProjectBox.svelte)
- Sidecar env var stripping now whitelists `CLAUDE_CODE_EXPERIMENTAL_*` vars (both Rust sidecar.rs and JS agent-runner.ts) — previously all `CLAUDE*` vars were stripped, blocking feature flags like agent teams from reaching the SDK (sidecar.rs, agent-runner.ts)
- E2E terminal tab tests: scoped selectors to `.tab-bar .tab-title` (was `.tab-title` which matched project tabs), used `browser.execute()` for DOM text reads to avoid stale element issues (bterminal.test.ts)
- E2E wdio.conf.js: added `wdio:enforceWebDriverClassic: true` to disable BiDi negotiation (wdio v9 injects `webSocketUrl:true` which tauri-driver rejects), removed unnecessary `browserName: 'wry'`, fixed binary path to Cargo workspace target dir (`v2/target/debug/` not `v2/src-tauri/target/debug/`)
- E2E consolidated to single spec file: Tauri creates one app session per spec file; multiple files caused "invalid session id" on 2nd+ file (wdio.conf.js, bterminal.test.ts)
- E2E WebDriver clicks on Svelte 5 components: `element.click()` doesn't reliably trigger onclick handlers inside complex components via WebKit2GTK/tauri-driver; replaced with `browser.execute()` JS-level clicks for .ptab, .dropdown-trigger, .panel-close (bterminal.test.ts)
- Removed `tauri-plugin-log` entirely: `telemetry::init()` already registers tracing-subscriber, which bridges the `log` crate; registering plugin-log afterwards panics with "attempted to set a logger after the logging system was already initialized" (lib.rs, Cargo.toml)
### Changed
- E2E tests expanded from 6 smoke tests to 48 tests across 8 describe blocks: Smoke (6), Workspace & Projects (8), Settings Panel (6), Keyboard Shortcuts (5), Command Palette (5), Terminal Tabs (7), Theme Switching (3), Settings Interaction (8) — all in single bterminal.test.ts file
- wdio.conf.js: added SKIP_BUILD env var to skip cargo tauri build when debug binary already exists
### Removed
- Ollama-specific warning toast from AgentPane when injecting anchors — replaced by generic configurable budget scale slider (AgentPane.svelte)
- Unused `notify` import from AgentPane (AgentPane.svelte)
- `tauri-plugin-log` dependency from Cargo.toml — redundant with telemetry::init() tracing-subscriber setup
- Individual E2E spec files (smoke.test.ts, keyboard.test.ts, settings.test.ts, workspace.test.ts) — consolidated into bterminal.test.ts
### Fixed
- Workspace teardown race: `switchGroup()` now awaits `waitForPendingPersistence()` before clearing agent state, preventing data loss when agents complete during group switch (agent-dispatcher.ts, workspace.svelte.ts)
- SettingsTab switchGroup click handler made async with await to properly handle the async switchGroup() flow (SettingsTab.svelte)
- Re-entrant sidecar exit handler race condition: added `restarting` guard flag preventing double-restart on rapid disconnect/reconnect (agent-dispatcher.ts)
- Memory leak: `toolUseToChildPane` and `sessionProjectMap` maps now cleared in `stopAgentDispatcher()` (agent-dispatcher.ts)
- Listener leak: 5 Tauri event listeners in machines store now tracked via `UnlistenFn[]` array with `destroyMachineListeners()` cleanup function (machines.svelte.ts)
- Fragile abort detection: replaced `errMsg.includes('aborted')` with `controller.signal.aborted` for authoritative abort state check (agent-runner.ts)
- Unhandled rejection: `handleMessage` made async with `.catch()` on `rl.on('line')` handler preventing sidecar crash on malformed input (agent-runner.ts)
- Remote machine `add_machine`/`list_machines`/`remove_machine` converted from `try_lock()` (silent failure on contention) to async `.lock().await` (remote.rs)
- `remove_machine` now aborts `WsConnection` tasks before removal, preventing resource leak (remote.rs)
- `save_agent_messages` wrapped in `unchecked_transaction()` for atomic DELETE+INSERT, preventing partial writes on crash (session.rs)
- Non-null assertion `msg.event!` replaced with safe check `if (msg.event)` in agent bridge event handler (agent-bridge.ts)
- Runtime type guards (`str()`, `num()`) replace bare `as` casts on untrusted SDK wire format in sdk-messages.ts
- ANTHROPIC_* environment variables now stripped alongside CLAUDE* in sidecar agent-runner.ts
- Frontend persistence timestamps use `Math.floor(Date.now() / 1000)` matching Rust seconds convention (agent-dispatcher.ts)
- Remote disconnect handler converted from `try_lock()` to async `.lock().await` (remote.rs)
- `save_layout` pane_ids serialization error now propagated instead of silent fallback (session.rs)
- ctx.rs Mutex::lock() returns Err instead of panicking on poisoned lock (5 occurrences)
- ctx CLI: `int()` limit argument validated with try/except (ctx)
- ctx CLI: FTS5 MATCH query wrapped in try/except for syntax errors (ctx)
- File watcher: explicit error for root-level path instead of silent fallback (watcher.rs)
- Agent bridge payload validated before cast to SidecarMessage (agent-bridge.ts)
- Profile.toml and resource_dir failures now log::warn instead of silent empty fallback (lib.rs)
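
The runtime type guards mentioned above (`str()`, `num()`) might be shaped like the sketch below. The signatures with fallback parameters are an assumption, and `readCost` with its `model` and `total_cost_usd` field names is a hypothetical example of reading an untrusted, already-parsed wire message.

```typescript
// Hypothetical shapes for the guards; the real signatures may differ.
function str(value: unknown, fallback = ''): string {
  return typeof value === 'string' ? value : fallback;
}

function num(value: unknown, fallback = 0): number {
  // Reject NaN and Infinity as well as non-numbers.
  return typeof value === 'number' && isFinite(value) ? value : fallback;
}

// Illustrative reader: every field goes through a guard instead of an `as` cast.
function readCost(raw: unknown): { model: string; costUsd: number } {
  const obj =
    typeof raw === 'object' && raw !== null ? (raw as Record<string, unknown>) : {};
  return {
    model: str(obj['model'], 'unknown'),
    costUsd: num(obj['total_cost_usd']),
  };
}
```

With guards like these, a malformed payload degrades to fallback values rather than propagating a wrongly typed field through the rest of the frontend.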
### Changed
- All ~100 px layout values converted to rem across 10 components per rule 18: AgentPane, ToastContainer, CommandPalette, SettingsTab, TeamAgentsPanel, AgentCard, StatusBar, AgentTree, TerminalPane, AgentPreviewPane (1rem = 16px base, icon/dot dimensions kept as px)
### Added
- E2E testing infrastructure: WebdriverIO v9.24 + tauri-driver setup with `wdio.conf.js` (lifecycle hooks for tauri-driver spawn/kill, debug binary build), 6 smoke tests (`smoke.test.ts`), TypeScript config, `test:e2e` npm script, 4 new devDeps (@wdio/cli, @wdio/local-runner, @wdio/mocha-framework, @wdio/spec-reporter)
- `waitForPendingPersistence()` export in agent-dispatcher.ts: counter-based fence that resolves when all in-flight `persistSessionForProject()` calls complete
- OpenTelemetry instrumentation: `telemetry.rs` module with TelemetryGuard (Drop-based shutdown), tracing + optional OTLP/HTTP export to Tempo, controlled by `BTERMINAL_OTLP_ENDPOINT` env var (absent = console-only fallback)
- `#[tracing::instrument]` on 10 key Tauri commands: pty_spawn, pty_kill, agent_query, agent_stop, agent_restart, remote_connect, remote_disconnect, remote_agent_query, remote_agent_stop, remote_pty_spawn
- `frontend_log` Tauri command: routes frontend telemetry events (level + message + context JSON) to Rust tracing layer with `source="frontend"` field
- `telemetry-bridge.ts` adapter: `tel.info/warn/error/debug/trace()` convenience wrappers for frontend → Rust tracing bridge via IPC
- Agent dispatcher telemetry: structured events for agent_started, agent_stopped, agent_error, sidecar_crashed, and agent_cost (with full metrics: costUsd, tokens, turns, duration)
- Docker Tempo + Grafana stack (`docker/tempo/`): Tempo (OTLP gRPC 4317, HTTP 4318, query 3200) + Grafana (port 9715) with auto-provisioned Tempo datasource
- 6 new Rust dependencies: tracing 0.1, tracing-subscriber 0.3, opentelemetry 0.28, opentelemetry_sdk 0.28, opentelemetry-otlp 0.28, tracing-opentelemetry 0.29
- `ctx_register_project` Tauri command and `ctxRegisterProject()` bridge function: registers a project in the ctx database via `INSERT OR IGNORE` into sessions table; opens DB read-write briefly then closes
- Agent preview terminal (`AgentPreviewPane.svelte`): read-only xterm.js terminal that subscribes to agent session messages in real-time; renders Bash commands as cyan ` command`, file operations as yellow `[Read/Write/Edit] path`, tool results (80-line truncation), text summaries, errors in red, session start/complete with cost; uses `disableStdin: true`, Canvas addon, theme hot-swap; spawned via 👁 button in TerminalTabs tab bar (appears when agent session is active); deduplicates — only one preview per session
- `TerminalTab.type` extended with `'agent-preview'` variant and `agentSessionId?: string` field in workspace store
- `ProjectBox` passes `mainSessionId` to `TerminalTabs` for agent preview tab creation
- SettingsTab project settings card redesign: each project rendered as a polished card with icon picker (Svelte state-driven emoji grid popup), inline-editable name input, CWD with left-ellipsis (`direction: rtl`), account/profile dropdown (via `listProfiles()` from claude-bridge.ts), custom toggle switch (green track/thumb), and subtle remove footer with trash icon
- Account/profile dropdown per project in SettingsTab: uses `listProfiles()` to fetch Claude profiles, displays display_name + email in dropdown, blue badge styling; falls back to static label when single profile
- ProjectHeader profile badge: account name styled as blue pill with translucent background (`color-mix(in srgb, var(--ctp-blue) 10%, transparent)`), font-weight 600, expanded max-width to 8rem
- Theme integration rule (`.claude/rules/51-theme-integration.md`): mandates all colors via `--ctp-*` CSS custom properties, never hardcode hex/rgb/hsl values
- AgentPane VSCode-style prompt: unified input always at bottom with auto-resizing textarea, send icon button (arrow SVG) inside rounded container, welcome state with chat icon when no session
- AgentPane session controls: New Session and Continue buttons shown after session completes, enabling explicit session management
- ClaudeSession `handleNewSession()`: resets sessionId for fresh agent sessions, wired via `onExit` prop to AgentPane
- ContextPane "Initialize Database" button: when ctx database doesn't exist, shows a prominent button to create `~/.claude-context/context.db` with full schema (sessions, contexts, shared, summaries + FTS5 + sync triggers) directly from the UI; replaces old "run ctx init" hint text; auto-loads data after successful init
- Project-level tab bar in ProjectBox: Claude | Files | Context tabs switch the content area between ClaudeSession, ProjectFiles, and ContextPane
- ProjectFiles.svelte: project-scoped markdown file viewer (file picker sidebar + MarkdownPane), accepts cwd/projectName props
- ProjectHeader info bar: CWD path (ellipsized from start via `direction: rtl`) + profile name displayed as read-only info alongside project icon/name
- Emoji icon picker in SettingsTab: 24 project-relevant emoji in 8-column grid popup, replaces plain text icon input
- Native directory picker for CWD fields: custom `pick_directory` Tauri command using `rfd` crate with `set_parent(&window)` for modal behavior on Linux; browse buttons added to Default CWD, existing project CWD, and Add Project path inputs in SettingsTab
- `rfd = { version = "0.16", default-features = false, features = ["gtk3"] }` direct dependency for modal file dialogs (zero extra compile — already built transitively via tauri-plugin-dialog)
- CSS relative units rule (`.claude/rules/18-relative-units.md`): enforces rem/em for layout CSS, px only for icons/borders/shadows
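
A counter-based fence like the `waitForPendingPersistence()` entry above describes can be sketched as follows. This is a minimal illustration, not the code in agent-dispatcher.ts: `createPersistenceFence`, `begin()`, `end()` and `pendingCount()` are hypothetical names, and the sketch assumes each persistence call increments a counter when it starts and decrements it when it settles.

```typescript
// Hypothetical sketch of a counter-based fence; names are illustrative only.
interface PersistenceFence {
  begin(): void;
  end(): void;
  pendingCount(): number;
  waitForPendingPersistence(): Promise<void>;
}

function createPersistenceFence(): PersistenceFence {
  let inFlight = 0;
  let waiters: Array<() => void> = [];

  return {
    // Call before starting a persistence operation.
    begin(): void {
      inFlight += 1;
    },
    // Call when the operation settles, whether it succeeded or failed.
    end(): void {
      inFlight = Math.max(0, inFlight - 1);
      if (inFlight === 0) {
        const toWake = waiters;
        waiters = [];
        toWake.forEach((wake) => wake());
      }
    },
    pendingCount(): number {
      return inFlight;
    },
    // Resolves at once if nothing is pending, otherwise when the counter hits zero.
    waitForPendingPersistence(): Promise<void> {
      if (inFlight === 0) return Promise.resolve();
      return new Promise<void>((resolve) => {
        waiters.push(resolve);
      });
    },
  };
}
```

A caller that tears down state would await `waitForPendingPersistence()` first, so in-flight saves finish before the data they read is cleared. Calling `end()` in a `finally` block keeps the counter balanced when a save throws.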
### Changed
- ContextPane redesigned as project-scoped: now receives `projectName` + `projectCwd` props from ProjectBox; auto-registers project in ctx database on mount (`INSERT OR IGNORE`); removed project selector list — directly shows context entries, shared context, and session summaries for the current project; empty state shows `ctx set <project> <key> <value>` usage hint; all CSS converted to rem; header shows project name in accent color
- Sidebar simplified to Settings-only: removed Sessions, Docs, Context icons from GlobalTabBar (project-specific tabs already in ProjectBox); removed DocsTab/ContextTab imports from App.svelte; removed Alt+1..4 keyboard shortcuts; drawer always renders SettingsTab
- MarkdownPane file switching: replaced onMount-only `watchFile()` with reactive `$effect` that unwatches previous file and watches new one when `filePath` prop changes; added `highlighterReady` gate to prevent premature watches
- MarkdownPane premium typography overhaul: font changed from `var(--ui-font-family)` (resolved to JetBrains Mono) to hardcoded `'Inter', system-ui, sans-serif` for proper prose rendering; added `text-rendering: optimizeLegibility`, `-webkit-font-smoothing: antialiased`, `font-feature-settings: 'cv01', 'cv02', 'cv03', 'cv04', 'ss01'` (Inter alternates); body color softened from `--ctp-text` to `--ctp-subtext1` for reduced dark-mode contrast; Tailwind-prose-inspired spacing (1.15-1.75em paragraph/heading margins); heading line-height tightened to 1.2-1.4 with negative letter-spacing on h1/h2; gradient HR (`linear-gradient` fading to transparent edges); link underlines use `text-decoration-color` transition (30% opacity → full on hover, VitePress pattern); blockquotes now italic with translucent bg; code blocks have inset `box-shadow` for depth; added h5 (uppercase small) and h6 styles; all colors via `--ctp-*` vars for 17-theme compatibility
- ProjectBox terminal area: only visible on Claude tab, now collapsible — collapsed shows a status bar with chevron toggle, "Terminal" label, and tab count badge; expanded shows full 16rem TerminalTabs area. Default: collapsed. Grid rows: `auto auto 1fr auto`
- SettingsTab project settings: flat row layout replaced with stacked card layout; icon picker rewritten from DOM `classList.toggle('visible')` to Svelte `$state` (iconPickerOpenFor); checkbox replaced with custom toggle switch component
- SettingsTab CSS: all remaining px values in project section converted to rem; add-project form uses dashed border container
- AgentPane prompt: replaced separate initial prompt + follow-up input with single unified prompt area; removed `followUpPrompt` state, `handleSubmit` function; follow-up handled via `isResume` detection in `handleUnifiedSubmit()`
- AgentPane CSS: migrated all legacy CSS vars (`--bg-primary`, `--bg-surface`, `--text-primary`, `--text-secondary`, `--text-muted`, `--border`, `--accent`, `--font-mono`, `--border-radius`) to `--ctp-*` theme vars + rem units
- ContextPane CSS: same legacy-to-theme var migration as AgentPane
- ProjectBox tab CSS: polished with `margin-bottom: -1px` active tab trick (merges with content), `scrollbar-width: none`, `focus-visible` outline, hover with `var(--ctp-surface0)` background
- ProjectBox layout: CSS grid with 4 rows (`auto auto 1fr auto`) — header | tab bar | content | terminal; content area switches by tab
- AgentPane: removed DIR/ACC toolbar entirely — CWD and profile now passed as props from parent (set in Settings, shown in ProjectHeader); clean chat window with prompt + send button only
- AgentPane prompt area: anchored to bottom (`justify-content: flex-end`) instead of vertical center, removed `max-width: 600px` constraint — uses full panel width
- ClaudeSession passes `project.profile` to AgentPane for automatic profile resolution
- ProjectGrid.svelte CSS converted from px to rem: gap 0.25rem, padding 0.25rem, min-width 30rem
- TerminalTabs.svelte CSS converted from px to rem: tab bar, tabs, close/add buttons, empty state
### Removed
- Dead ctx code: `ContextTab.svelte` wrapper component, `CtxProject` struct (Rust), `list_projects()` method, `ctx_list_projects` Tauri command, `ctxListProjects()` bridge function, `CtxProject` TypeScript interface — all unused after ContextPane project-scoped redesign
- Unused Python imports in `ctx` CLI: `os`, `datetime`/`timezone` modules
- AgentPane session toolbar (DIR/ACC inputs) — CWD and profile are now props, not interactive inputs
- Nerd Font codepoints for project icons — replaced with emoji (`📁` default) for cross-platform compatibility
- Nerd Font `font-family` declarations from ProjectHeader and TerminalTabs
- Stub `pick_directory` Tauri command (replaced by `tauri-plugin-dialog` frontend API)
### Fixed
- `ctx init` fails when `~/.claude-context/` directory doesn't exist: `get_db()` called `sqlite3.connect()` without creating the parent directory; added `DB_PATH.parent.mkdir(parents=True, exist_ok=True)` before connect
- Terminal tabs cannot be closed and are all named "Shell 1": `$state<Map<string, TerminalTab[]>>` in workspace store didn't trigger reactive updates for `$derived` consumers when `Map.set()` was called; changed `projectTerminals` from `Map` to `Record<string, TerminalTab[]>` (plain object property access is Svelte 5's strongest reactivity path)
- SettingsTab icon picker not opening: replaced broken DOM `classList.toggle('visible')` approach with Svelte `$state` (`iconPickerOpenFor` keyed by project ID); icon picker now reliably opens/closes and dismisses on click-outside or Escape
- SettingsTab CWD path truncated from right: added `direction: rtl; text-align: left; unicode-bidi: plaintext` on CWD input so path shows the end (project directory) instead of the beginning when truncated
- Project icons showing "?" — Nerd Font codepoint `\uf120` not rendering without font installed; switched to emoji
- Native directory picker not opening: added missing `"dialog:default"` permission to `v2/src-tauri/capabilities/default.json` — Tauri's IPC security layer silently blocked `invoke()` calls without this capability
- Native directory picker not modal on Linux: replaced `@tauri-apps/plugin-dialog` `open()` with custom `pick_directory` Tauri command using `rfd::AsyncFileDialog::set_parent(&window)` — the plugin skips `set_parent` on Linux via `cfg(any(windows, target_os = "macos"))` gate
- Native directory picker not dark-themed: set `GTK_THEME=Adwaita:dark` via `std::env::set_var` at Tauri startup to force dark theme on native GTK dialogs
- Sidebar drawer not scaling to content width: removed leftover v2 grid layout on `#app` in `app.css` (`display: grid; grid-template-columns: var(--sidebar-width) 1fr` + media queries) that constrained `.app-shell` to 260px first column; v3 `.app-shell` manages its own flexbox layout internally
- ContextPane.svelte CSS converted from px to rem: font-size, padding, margin, gap; added `white-space: nowrap` on `.ctx-header`/`.ctx-error` for intrinsic width measurement
### Changed
- GlobalTabBar.svelte CSS converted from px to rem: rail width 2.75rem, button 2rem, gap 0.25rem, padding 0.5rem 0.375rem, border-radius 0.375rem; rail-btn color changed from --ctp-overlay1 to --ctp-subtext0 for better contrast
- App.svelte sidebar header CSS converted from px to rem: padding 0.5rem 0.75rem, close button 1.375rem, border-radius 0.25rem
- App.svelte sidebar drawer: JS `$effect` measures content width via `requestAnimationFrame` + `querySelectorAll` for nowrap elements, headings, inputs, and tab-specific selectors; `panelWidth` state drives inline `style:width` on `aside.sidebar-panel`
- Sidebar panel changed from fixed width (28em) to content-driven sizing with `min-width: 16em` and `max-width: 50%`; each tab component defines its own `min-width: 22em`
- Sidebar panel and panel-content overflow changed from `hidden` to `overflow-y: auto` to allow content to drive parent width
- SettingsTab.svelte padding converted from px to rem (0.75rem 1rem)
- DocsTab.svelte converted from px to rem: file-picker 14em, picker-title/file-btn/empty padding in rem
- ContextTab.svelte, DocsTab.svelte, SettingsTab.svelte all now set `min-width: 22em` for content-driven drawer sizing
- UI redesigned from top tab bar + right-side settings drawer to VSCode-style left sidebar: vertical icon rail (GlobalTabBar, 2.75rem, 4 SVG icons) + expandable drawer panel (content-driven width) + always-visible main workspace (ProjectGrid)
- GlobalTabBar rewritten from horizontal text tabs + gear icon to vertical icon rail with SVG icons for Sessions, Docs, Context, Settings; Props: `expanded`/`ontoggle` (was `settingsOpen`/`ontoggleSettings`)
- Settings is now a regular sidebar tab (not a special right-side drawer); `WorkspaceTab` type: `'sessions' | 'docs' | 'context' | 'settings'`
- App.svelte layout: `.main-row` flex container with icon rail + optional sidebar panel + workspace; state renamed `settingsOpen` -> `drawerOpen`
- Keyboard shortcuts: Alt+1..4 (switch tabs + open drawer), Ctrl+B (toggle sidebar), Ctrl+, (toggle settings), Escape (close drawer)
- SettingsTab CSS: `height: 100%` (was `flex: 1`) for sidebar panel context
### Added
- SettingsTab split font controls: separate UI font (sans-serif options: System Sans-Serif, Inter, Roboto, Open Sans, Lato, Noto Sans, Source Sans 3, IBM Plex Sans, Ubuntu) and Terminal font (monospace options: JetBrains Mono, Fira Code, Cascadia Code, Source Code Pro, IBM Plex Mono, Hack, Inconsolata, Ubuntu Mono, monospace), each with custom themed dropdown + size stepper (8-24px), font previews in own typeface
- `--term-font-family` and `--term-font-size` CSS custom properties in catppuccin.css (defaults: JetBrains Mono fallback chain, 13px)
- Deep Dark theme group: 6 new themes (Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper, Midnight) — total 17 themes across 3 groups (Catppuccin, Editor, Deep Dark). Midnight is pure OLED black (#000000), Ayu Dark near-black (#0b0e14), Vesper warm dark (#101010)
- Multi-theme system: 7 new editor themes (VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark) alongside 4 Catppuccin flavors
- `ThemeId` union type, `ThemePalette` (26-color interface), `ThemeMeta` (id/label/group/isDark), `THEME_LIST` registry with group metadata, `ALL_THEME_IDS` for validation
- Theme store `getCurrentTheme()`/`setTheme()` as primary API; deprecated `getCurrentFlavor()`/`setFlavor()` wrappers for backwards compat
- SettingsTab custom themed dropdown for theme selection: color swatches (base color per theme), 4 accent color dots (red/green/blue/yellow), grouped sections (Catppuccin/Editor/Deep Dark) with styled headers, click-outside and Escape to close
- SettingsTab global settings section: theme selector, UI font dropdown (sans-serif options), Terminal font dropdown (monospace options), each with size stepper (8-24px), default shell input, default CWD input — all custom themed dropdowns (no native `<select>`), all persisted via settings-bridge
- Typography CSS custom properties (`--ui-font-family`, `--ui-font-size`, `--term-font-family`, `--term-font-size`) in catppuccin.css with defaults; consumed by app.css body rule
- `initTheme()` now restores 4 saved font settings (ui_font_family, ui_font_size, term_font_family, term_font_size) from SQLite on startup alongside theme restoration
- v3 Mission Control (All Phases 1-10 complete): multi-project dashboard with project groups, per-project Claude sessions, team agents panel, terminal tabs, 3 workspace tabs (Sessions/Docs/Context) + settings drawer
- v3 session continuity (P6): `persistSessionForProject()` saves agent state + messages to SQLite on session complete; `registerSessionProject()` maps session to project; `ClaudeSession.restoreMessagesFromRecords()` restores cached messages on mount
- v3 workspace teardown (P7): `clearAllAgentSessions()` clears agent sessions on group switch; terminal tabs reset via `switchGroup()`
- v3 data model: `groups.rs` (Rust structs + load/save `~/.config/bterminal/groups.json`), `groups.ts` (TypeScript interfaces), `groups-bridge.ts` (IPC adapter), `--group` CLI argument
- v3 workspace store (`workspace.svelte.ts`): replaces `layout.svelte.ts`, manages groups/activeGroupId/activeTab/focusedProjectId with Svelte 5 runes
- v3 SQLite migrations: `agent_messages` table (per-project message persistence), `project_agent_state` table (sdkSessionId/cost/status per project), `project_id` column on sessions
- 12 new Workspace components: GlobalTabBar, ProjectGrid, ProjectBox, ProjectHeader, ClaudeSession, TeamAgentsPanel, AgentCard, TerminalTabs, CommandPalette, DocsTab, ContextTab, SettingsTab
- v3 App.svelte full rewrite: GlobalTabBar + tab content area + StatusBar (no sidebar, no TilingGrid)
- 24 new vitest tests for workspace store, 7 new cargo tests for groups (total: 138 vitest + 36 cargo)
- v3 adversarial architecture review: 3 agents (Architect, Devil's Advocate, UX+Performance Specialist), 12 issues identified and resolved
- v3 Mission Control redesign planning: architecture docs (`docs/architecture.md`, `docs/decisions.md`, `docs/findings.md`), codebase reuse analysis
- Claude profile/account switching: `claude_list_profiles()` reads `~/.config/switcher/profiles/` directories with `profile.toml` metadata (email, subscription_type, display_name); profile selector dropdown in AgentPane toolbar when multiple profiles available; selected profile's `config_dir` passed as `CLAUDE_CONFIG_DIR` env override to SDK
- Skill discovery and autocomplete: `claude_list_skills()` reads `~/.claude/skills/` (directories with `SKILL.md` or standalone `.md` files); type `/` in agent prompt textarea to trigger autocomplete menu with arrow key navigation, Tab/Enter selection, Escape dismiss; `expandSkillPrompt()` reads skill content and injects as prompt
- New frontend adapter `claude-bridge.ts`: `ClaudeProfile` and `ClaudeSkill` interfaces, `listProfiles()`, `listSkills()`, `readSkill()` IPC wrappers
- AgentPane session toolbar: editable working directory input, profile/account selector (shown when >1 profile), all rendered above prompt form
- Extended `AgentQueryOptions` with 5 new fields across full stack (Rust struct, sidecar JSON, SDK options): `setting_sources` (defaults to `['user', 'project']`), `system_prompt`, `model`, `claude_config_dir`, `additional_directories`
- 4 new Tauri commands: `claude_list_profiles`, `claude_list_skills`, `claude_read_skill`, `pick_directory`
- Claude CLI path auto-detection: `findClaudeCli()` in both sidecar runners checks common paths (~/.local/bin/claude, ~/.claude/local/claude, /usr/local/bin/claude, /usr/bin/claude) then falls back to `which`/`where`; resolved path passed to SDK via `pathToClaudeCodeExecutable` option
- Early error reporting when the Claude CLI is not found: the sidecar emits `agent_error` immediately instead of surfacing a cryptic SDK failure
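
The CLI path auto-detection entry above can be illustrated with a small sketch. It is not the sidecar's actual `findClaudeCli()`; the function name `resolveClaudeCli`, its parameters, and the injected `exists`/`lookupOnPath` callbacks are assumptions made so the lookup order can be shown without touching the filesystem.

```typescript
// Hypothetical sketch of the lookup order described in the changelog entry.
// The candidate list is taken from the entry; everything else is illustrative.
const CANDIDATE_PATHS: readonly string[] = [
  "~/.local/bin/claude",
  "~/.claude/local/claude",
  "/usr/local/bin/claude",
  "/usr/bin/claude",
];

// Expand a leading "~/" using the supplied home directory.
function expandHome(path: string, home: string): string {
  return path.startsWith("~/") ? home + path.slice(1) : path;
}

// Return the first existing candidate, otherwise whatever the PATH lookup
// (standing in for `which`/`where`) reports, otherwise null.
function resolveClaudeCli(
  home: string,
  exists: (path: string) => boolean,
  lookupOnPath: () => string | null,
): string | null {
  for (const candidate of CANDIDATE_PATHS) {
    const full = expandHome(candidate, home);
    if (exists(full)) return full;
  }
  return lookupOnPath();
}
```

A `null` result corresponds to the case where the sidecar would report `agent_error` early rather than hand an unresolved path to the SDK.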
### Changed
- SettingsTab global settings restructured to single-column layout with labels above controls, split into "Appearance" (theme, UI font, terminal font) and "Defaults" (shell, CWD) subsections; all native `<select>` replaced with custom themed dropdowns
- Font setting keys changed from `font_family`/`font_size` to `ui_font_family`/`ui_font_size` + `term_font_family`/`term_font_size`; UI font fallback changed from monospace to sans-serif
- `app.css` body font-family and font-size now use CSS custom properties (`var(--ui-font-family)`, `var(--ui-font-size)`) instead of hardcoded values
- Theme system generalized from Catppuccin-only to multi-theme: all 17 themes map to same `--ctp-*` CSS custom properties (26 vars) — zero component-level changes needed
- `CatppuccinFlavor` type deprecated in favor of `ThemeId`; `CatppuccinPalette` deprecated in favor of `ThemePalette`; `FLAVOR_LABELS` and `ALL_FLAVORS` deprecated in favor of `THEME_LIST` and `ALL_THEME_IDS`
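
A rough sketch of why no component-level changes were needed: if every palette is projected onto the same `--ctp-*` custom properties, components keep reading the same variables regardless of the active theme. The `paletteToCssVars` helper and the loose `Record<string, string>` palette type below are assumptions for illustration; the real `ThemePalette` is described above as a 26-color interface.

```typescript
// Simplified stand-in for the 26-color ThemePalette interface (assumption).
type PaletteSketch = Record<string, string>;

// Map each palette slot (e.g. "base", "text") to its --ctp-* custom property.
function paletteToCssVars(palette: PaletteSketch): Record<string, string> {
  const vars: Record<string, string> = {};
  for (const [name, color] of Object.entries(palette)) {
    vars[`--ctp-${name}`] = color;
  }
  return vars;
}
```

In a browser, the resulting entries could be applied with `document.documentElement.style.setProperty(name, value)`; that step is omitted here to keep the sketch environment-independent.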
### Fixed
- SettingsTab theme dropdown sizing: set `min-width: 180px` on trigger container, `min-width: 280px` and `max-height: 400px` on dropdown menu, `white-space: nowrap` on option labels to prevent text truncation
- SettingsTab input overflow: added `min-width: 0` on `.setting-row` to prevent flex children from overflowing container
- SettingsTab a11y: project field labels changed from `<div><label>` to wrapping `<label><span class="field-label">` pattern for proper label/input association
- SettingsTab CSS: removed unused `.project-field label` selector, simplified input selector to `.project-field input:not([type="checkbox"])`
### Removed
- Dead `update_ssh_session()` method from session.rs and its unit test (method was unused after SSH CRUD refactoring)
- Stale TilingGrid reference in AgentPane.svelte comment (TilingGrid was deleted in v3 P10)
### Changed
- StatusBar rewritten for v3 workspace store: shows active group name, project count, agent count instead of pane counts; version label updated to "BTerminal v3"
- Agent dispatcher subagent routing: project-scoped sessions skip layout pane creation (subagents render in TeamAgentsPanel instead); detached mode still creates layout pane
- AgentPane `cwd` prop renamed to `initialCwd` — now editable via text input in session toolbar instead of fixed prop
### Removed
- Dead v2 components deleted in P10 (~1,836 lines): `TilingGrid.svelte` (328), `PaneContainer.svelte` (113), `PaneHeader.svelte` (44), `SessionList.svelte` (374), `SshSessionList.svelte` (263), `SshDialog.svelte` (281), `SettingsDialog.svelte` (433)
- Empty component directories removed: `Layout/`, `Sidebar/`, `Settings/`, `SSH/`
### Changed
- Sidecar runners now pass `settingSources` (defaults to `['user', 'project']`), `systemPrompt`, `model`, and `additionalDirectories` to SDK `query()` options
- Sidecar runners inject `CLAUDE_CONFIG_DIR` into clean env when `claudeConfigDir` provided in query message (multi-account support)
### Fixed
- AgentPane Svelte 5 event handler syntax: `on:click` changed to `onclick` (Svelte 5 uses lowercase event attributes such as `onclick`; the `on:` directive syntax is deprecated)
- CLAUDE* env var stripping now applied at Rust level in SidecarManager (bterminal-core/src/sidecar.rs): `env_clear()` + `envs(clean_env)` strips all CLAUDE-prefixed vars before spawning sidecar process, providing primary defense against nesting detection (JS-side stripping retained as defense-in-depth)
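
The JS-side half of the defense-in-depth described above might look roughly like the following. The function name `stripClaudeEnv` and its signature are hypothetical; the behavior sketched (drop every `CLAUDE`-prefixed variable, then optionally inject `CLAUDE_CONFIG_DIR`) follows the changelog entries, not the actual runner source.

```typescript
// Hypothetical helper: build a clean environment for the agent process.
function stripClaudeEnv(
  env: Record<string, string | undefined>,
  claudeConfigDir?: string,
): Record<string, string> {
  const clean: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue; // skip unset entries
    if (key.startsWith("CLAUDE")) continue; // drop all CLAUDE* variables
    clean[key] = value;
  }
  // Multi-account support: re-add only the explicitly requested config dir.
  if (claudeConfigDir) clean["CLAUDE_CONFIG_DIR"] = claudeConfigDir;
  return clean;
}
```

Stripping first and injecting afterwards means an inherited `CLAUDE_CONFIG_DIR` cannot leak through; only the value chosen for the session survives.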
### Changed
- Sidecar resolution unified: single pre-built `agent-runner.mjs` bundle replaces separate `agent-runner-deno.ts` + `agent-runner.ts` lookup; same `.mjs` file runs under both Deno and Node.js
- `resolve_sidecar_command()` in sidecar.rs now checks deno/node availability upfront before searching paths, improved error message with runtime availability note
- Removed `agent-runner-deno.ts` from tauri.conf.json bundled resources (only `dist/agent-runner.mjs` shipped)
### Added
- `@anthropic-ai/claude-agent-sdk` ^0.2.70 npm dependency for sidecar agent session management
- `build:sidecar` npm script for esbuild bundling of agent-runner.ts (SDK bundled in, no external dependency at runtime)
- `permission_mode` field in AgentQueryOptions (Rust, TypeScript) — flows from controller through sidecar to SDK, defaults to 'bypassPermissions', supports 'default' mode
### Changed
- Sidecar agent runners migrated from raw `claude` CLI spawning (`child_process.spawn`/`Deno.Command`) to `@anthropic-ai/claude-agent-sdk` query() function — fixes silent hang when CLI spawned with piped stdio (known bug github.com/anthropics/claude-code/issues/6775)
- agent-runner.ts: sessions now use `{ query: Query, controller: AbortController }` map instead of `ChildProcess` map; stop uses `controller.abort()` instead of `child.kill()`
- agent-runner-deno.ts: sessions now use `AbortController` map; uses `npm:@anthropic-ai/claude-agent-sdk` import specifier
- Deno sidecar permissions expanded: added `--allow-write` and `--allow-net` flags in sidecar.rs (required by SDK)
- CLAUDE* env var stripping now passes clean env via SDK's `env` option in query() instead of filtering process.env before spawn
- SDK permissionMode and allowDangerouslySkipPermissions now dynamically set based on permission_mode option (was hardcoded to bypassPermissions)
- build:sidecar esbuild command no longer uses --external for SDK (SDK bundled into output)
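
The move from a `ChildProcess` map to an `AbortController` map can be sketched as below. This is a simplified illustration: the real entries are described as `{ query: Query, controller: AbortController }`, but the SDK's `Query` type is left out here so the block stays self-contained, and the names `activeSessions`, `startSession`, and `stopSession` are assumptions.

```typescript
// Simplified session entry; the SDK query handle is intentionally omitted.
interface SessionEntry {
  controller: AbortController;
}

const activeSessions = new Map<string, SessionEntry>();

// Register a session and return the signal a query would listen on.
function startSession(id: string): AbortSignal {
  const controller = new AbortController();
  activeSessions.set(id, { controller });
  return controller.signal;
}

// Stop a session by aborting instead of killing a child process.
function stopSession(id: string): boolean {
  const entry = activeSessions.get(id);
  if (!entry) return false;
  entry.controller.abort();
  activeSessions.delete(id);
  return true;
}
```

Aborting through the controller lets the consumer of the signal wind down cooperatively, which is the property the changelog entry relies on when it replaces `child.kill()`.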
### Fixed
- AgentPane onDestroy no longer kills running agent sessions on component remount — stopAgent() moved from AgentPane.svelte onDestroy to TilingGrid.svelte onClose handler, ensuring agents only stop on explicit user close action
### Previously Added
- Exponential backoff reconnection in RemoteManager: on disconnect, spawns async task with 1s/2s/4s/8s/16s/30s-cap backoff, uses attempt_tcp_probe() (TCP-only, no WS upgrade, 5s timeout, default port 9750), emits remote-machine-reconnecting and remote-machine-reconnect-ready events
- Frontend reconnection listeners: onRemoteMachineReconnecting and onRemoteMachineReconnectReady in remote-bridge.ts; machines store sets status to 'reconnecting' and auto-calls connectMachine() on ready
- Relay command response propagation: bterminal-relay now sends structured responses (pty_created, pong, error) back to client via shared event channel with commandId correlation
- send_error() helper in bterminal-relay for consistent error reporting across all command handlers
- PTY creation confirmation flow: pty_create command returns pty_created event with session ID and commandId; RemoteManager emits remote-pty-created Tauri event
- bterminal-core shared crate with EventSink trait: extracted PtyManager and SidecarManager into reusable crate at v2/bterminal-core/, EventSink trait abstracts event emission for both Tauri and WebSocket contexts
- bterminal-relay WebSocket server binary: standalone Rust binary at v2/bterminal-relay/ with token auth (--port, --token, --insecure CLI flags), rate limiting (10 attempts, 5min lockout), per-connection isolated PTY + sidecar managers
- RemoteManager for multi-machine WebSocket connections: v2/src-tauri/src/remote.rs manages WebSocket client connections to relay instances, 12 new Tauri commands for remote operations, heartbeat ping every 15s
- Remote machine management UI in settings: SettingsDialog "Remote Machines" section for add/remove/connect/disconnect
- Auto-grouping of remote panes in sidebar: remote panes auto-grouped by machine label in SessionList
- remote-bridge.ts adapter for remote machine IPC operations
- machines.svelte.ts store for remote machine state management (Svelte 5 runes)
- Pane.remoteMachineId field in layout store for local vs remote routing
- TauriEventSink (event_sink.rs) implementing EventSink trait for Tauri AppHandle
- Multi-machine support architecture design (`docs/multi-machine.md`): WebSocket NDJSON protocol, pre-shared token + TLS auth, autonomous relay model
- Subagent cost aggregation: getTotalCost() recursive helper in agents store aggregates cost across parent + all child sessions; total cost displayed in parent pane done-bar when children present
- 10 new subagent routing tests in agent-dispatcher.test.ts: spawn, dedup, child message routing, init/cost forwarding, fallbacks (28 total dispatcher tests, 114 vitest tests overall)
- TAURI_SIGNING_PRIVATE_KEY secret set in GitHub repo for auto-update signing
- Agent teams/subagent support (Phase 7): auto-detects subagent tool calls ('Agent', 'Task', 'dispatch_agent'), spawns child agent panes with parent/child navigation, routes messages via parentId field
- Agent store parent/child hierarchy: AgentSession extended with parentSessionId, parentToolUseId, childSessionIds; findChildByToolUseId() and getChildSessions() query functions
- AgentPane parent link bar: SUB badge with navigate-to-parent button for subagent panes
- AgentPane children bar: clickable chips per child subagent with status-colored indicators (running/done/error)
- SessionList subagent icon: subagent panes show '↳' instead of '*' in sidebar
- Session groups/folders: group_name column in sessions table, setPaneGroup in layout store, collapsible group headers in sidebar with arrow/count, right-click pane to set group
- Auto-update signing key: generated minisign keypair, pubkey configured in tauri.conf.json updater section
- Deno-first sidecar: SidecarCommand struct in sidecar.rs, resolve_sidecar_command() prefers Deno (runs TS directly) with Node.js fallback, both runners bundled via tauri.conf.json resources
- Vitest integration tests: layout.test.ts (30 tests), agent-bridge.test.ts (11 tests), agent-dispatcher.test.ts (28 tests) — total 114 vitest tests passing
- E2E test scaffold: v2/tests/e2e/README.md documenting WebDriver approach
- Terminal copy/paste: Ctrl+Shift+C copies selection, Ctrl+Shift+V pastes from clipboard to PTY (TerminalPane.svelte)
- Terminal theme hot-swap: onThemeChange() callback registry in theme.svelte.ts, open terminals update immediately when flavor changes
- Agent tree node click: clicking a tree node scrolls to the corresponding message in the agent pane (scrollIntoView smooth)
- Agent tree subtree cost: cumulative cost displayed in yellow below each tree node label (subtreeCost utility)
- Agent session resume: follow-up prompt input after session completes or errors, passes resume_session_id to SDK
- Pane drag-resize handles: splitter overlays in TilingGrid with mouse drag, supports 2-col/3-col/2-row layouts with 10-90% ratio clamping
- Auto-update CI workflow: release.yml generates latest.json with version, platform URL, and signature from .sig file; uploads as release artifact
- Deno sidecar proof-of-concept: agent-runner-deno.ts with same NDJSON protocol, compiles to single binary via deno compile
- Vitest test suite: sdk-messages.test.ts (SDK message adapter) and agent-tree.test.ts (tree builder/cost), vite.config.ts test config, npm run test script
- Cargo test suite: session.rs tests (SessionDb CRUD for sessions, SSH sessions, settings, layout) and ctx.rs tests (CtxDb error handling with missing database)
- tempfile dev dependency for Rust test isolation
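
The reconnection schedule listed above (1s/2s/4s/8s/16s, then capped at 30s) is a capped doubling sequence. A minimal sketch, assuming a zero-based attempt counter; the function name `backoffDelayMs` is illustrative, and the actual implementation lives in the Rust RemoteManager.

```typescript
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before reconnection attempt `attempt` (0-based): doubles each time, capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// First eight delays, for illustration.
const schedule = Array.from({ length: 8 }, (_, i) => backoffDelayMs(i));
```

The uncapped sixth value would be 32s, so the cap takes effect from the sixth attempt onward.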
### Fixed
- Sidecar env var leak: both agent-runner.ts and agent-runner-deno.ts now strip ALL `CLAUDE*` prefixed env vars before spawning the claude CLI, preventing silent hangs when BTerminal is launched from within a Claude Code terminal session (previously only CLAUDECODE was removed)
### Changed
- RemoteManager reconnection probe refactored from attempt_ws_connect() (full WS handshake + auth) to attempt_tcp_probe() (TCP-only connect, no resource allocation on relay)
- bterminal-relay command handlers refactored: all error paths now report through the send_error() helper instead of only calling log::error!(); the pong response is sent via the event channel instead of being a no-op
- RemoteManager disconnect handler: scoped mutex release before event emission to prevent deadlocks; spawns reconnection task
- PtyManager and SidecarManager extracted from src-tauri to bterminal-core shared crate (src-tauri now has thin re-export wrappers)
- Cargo workspace structure at v2/ level: members = [src-tauri, bterminal-core, bterminal-relay], Cargo.lock moved from src-tauri/ to workspace root
- agent-bridge.ts and pty-bridge.ts extended with remote routing (check remoteMachineId, route to remote_* commands)
- Agent dispatcher refactored to split messages: parentId-bearing messages routed to child panes via toolUseToChildPane Map, main session messages stay in parent
- Agent store createAgentSession() now accepts optional parent parameter for registering bidirectional parent/child links
- Agent store removeAgentSession() cleans up parent's childSessionIds on removal
- Sidecar manager refactored from Node.js-only to Deno-first with Node.js fallback (SidecarCommand abstraction)
- Session struct: added group_name field with serde default
- SessionDb: added update_group method, list/save queries updated for group_name column
- SessionList sidebar: uses Svelte 5 snippets for grouped pane rendering with collapsible headers
- Agent tree NODE_H increased from 32 to 40 to accommodate subtree cost display
- release.yml build step now passes TAURI_SIGNING_PRIVATE_KEY and PASSWORD env vars from secrets
- release.yml uploads latest.json alongside .deb and .AppImage artifacts
- vitest ^4.0.18 added as npm dev dependency
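
The bidirectional parent/child links and the recursive cost aggregation mentioned in these entries can be sketched together. Field names `parentSessionId` and `childSessionIds` come from the changelog; the `cost` field, the `agentSessions` map, and the exact signatures are assumptions, not the store's real API.

```typescript
interface AgentSessionSketch {
  id: string;
  cost: number; // hypothetical per-session cost field
  parentSessionId: string | undefined;
  childSessionIds: string[];
}

const agentSessions = new Map<string, AgentSessionSketch>();

// Create a session and, if a parent is given, register the link on both sides.
function createAgentSession(id: string, cost: number, parentSessionId?: string): AgentSessionSketch {
  const session: AgentSessionSketch = { id, cost, parentSessionId, childSessionIds: [] };
  agentSessions.set(id, session);
  const parent = parentSessionId ? agentSessions.get(parentSessionId) : undefined;
  if (parent) parent.childSessionIds.push(id);
  return session;
}

// Remove a session and clean up the parent's child list.
function removeAgentSession(id: string): void {
  const session = agentSessions.get(id);
  if (!session) return;
  const parent = session.parentSessionId ? agentSessions.get(session.parentSessionId) : undefined;
  if (parent) parent.childSessionIds = parent.childSessionIds.filter((c) => c !== id);
  agentSessions.delete(id);
}

// Aggregate cost over a session and all of its descendants.
function getTotalCost(id: string): number {
  const session = agentSessions.get(id);
  if (!session) return 0;
  return session.childSessionIds.reduce((sum, childId) => sum + getTotalCost(childId), session.cost);
}
```

Cleaning up `childSessionIds` on removal keeps the recursion from visiting stale ids, which would otherwise silently contribute zero.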
### Previously Added
- SSH session management: SshSession CRUD in SQLite, SshDialog create/edit modal, SshSessionList grouped by folder with color dots, SSH pane type routing to TerminalPane with shell=/usr/bin/ssh (Phase 5)
- ctx context database integration: read-only CtxDb (Rust, SQLITE_OPEN_READ_ONLY), ContextPane with project selector, tabs for entries/summaries/search, ctx-bridge adapter (Phase 5)
- Catppuccin theme flavors: all 4 palettes (Latte/Frappe/Macchiato/Mocha) selectable via Settings dialog, theme.svelte.ts reactive store with SQLite persistence, TerminalPane theme-aware (Phase 5)
- Detached pane mode: pop-out terminal/agent panes into standalone windows via URL params (?detached=1), detach.ts utility, App.svelte conditional rendering (Phase 5)
- Shiki syntax highlighting: lazy singleton highlighter with catppuccin-mocha theme, 13 preloaded languages, integrated in MarkdownPane and AgentPane text messages (Phase 5)
- Tauri auto-updater plugin: tauri-plugin-updater (Rust + npm) + updater.ts frontend utility (Phase 6)
- Markdown rendering in agent text messages with Shiki code highlighting (Phase 5)
- Build-from-source installer `install-v2.sh` with 6-step dependency checking (Node.js 20+, Rust 1.77+, WebKit2GTK, GTK3, and 8 other system libraries), auto-install via apt, binary install to `~/.local/bin/bterminal-v2` with desktop entry (Phase 6)
- Tauri bundle configuration for .deb and AppImage targets with category, descriptions, and deb dependencies (Phase 6)
- GitHub Actions release workflow (`.github/workflows/release.yml`): triggered on `v*` tags, builds on Ubuntu 22.04 with Rust/npm caching, uploads .deb + AppImage as GitHub Release artifacts (Phase 6)
- Regenerated application icons from `bterminal.svg` as RGBA PNGs (32x32, 128x128, 256x256, 512x512, .ico) (Phase 6)
- Agent tree visualization: SVG tree of tool calls with horizontal layout, bezier edges, status-colored nodes (AgentTree.svelte + agent-tree.ts) (Phase 5)
- Global status bar showing terminal/agent pane counts, active agents with pulse animation, total tokens and cost (StatusBar.svelte) (Phase 5)
- Toast notification system with auto-dismiss (4s), max 5 visible, color-coded by type (notifications.svelte.ts + ToastContainer.svelte) (Phase 5)
- Agent dispatcher toast integration: notifications on agent complete, error, and sidecar crash (Phase 5)
- Settings dialog with default shell, working directory, and max panes configuration (SettingsDialog.svelte) (Phase 5)
- Settings persistence: key-value settings table in SQLite, Tauri commands settings_get/set/list, settings-bridge.ts adapter (Phase 5)
- Keyboard shortcuts: Ctrl+W close focused pane, Ctrl+, open settings dialog (Phase 5)
- SQLite session persistence with rusqlite (bundled, WAL mode) — sessions table + layout_state singleton (Phase 4)
- Session CRUD: save, delete, update_title, touch with 7 Tauri commands (Phase 4)
- Layout restore on app startup — panes and preset restored from database (Phase 4)
- File watcher backend using notify crate v6 — watches files, emits Tauri events on change (Phase 4)
- MarkdownPane component with marked.js rendering, Catppuccin-themed styles, and live reload (Phase 4)
- Sidebar "M" button for opening markdown/text files via file picker (Phase 4)
- Session bridge adapter for Tauri IPC (session + layout persistence wrappers) (Phase 4)
- File bridge adapter for Tauri IPC (watch, unwatch, read, onChange wrappers) (Phase 4)
- Sidecar crash detection — dispatcher listens for process exit, marks running sessions as error (Phase 3 polish)
- Sidecar restart UI — "Restart Sidecar" button in AgentPane error bar (Phase 3 polish)
- Auto-scroll lock — disables auto-scroll when user scrolls up, shows "Scroll to bottom" button (Phase 3 polish)
- Agent restart Tauri command (agent_restart) (Phase 3 polish)
- Agent pane with prompt input, structured message rendering, stop button, and cost display (Phase 3)
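
The toast limits mentioned above (auto-dismiss after 4s, at most 5 visible) can be expressed as two pure functions. This is a sketch under assumptions: the `ToastSketch` shape, the function names, and the choice to drop the oldest toast when the limit is exceeded are illustrative and not confirmed by the source.

```typescript
interface ToastSketch {
  id: number;
  message: string;
  createdAt: number; // milliseconds
}

const MAX_VISIBLE = 5;
const DISMISS_AFTER_MS = 4000;

// Append a toast, keeping only the newest MAX_VISIBLE entries (assumed policy).
function addToast(toasts: ToastSketch[], toast: ToastSketch): ToastSketch[] {
  return [...toasts, toast].slice(-MAX_VISIBLE);
}

// Drop toasts that have been visible for DISMISS_AFTER_MS or longer.
function pruneExpired(toasts: ToastSketch[], now: number): ToastSketch[] {
  return toasts.filter((t) => now - t.createdAt < DISMISS_AFTER_MS);
}
```

Keeping both operations pure makes them easy to call from a reactive store, where a timer would invoke `pruneExpired` with the current time.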
### Fixed
- Svelte 5 rune stores (layout, agents, sessions) renamed from `.ts` to `.svelte.ts` — runes only work in `.svelte` and `.svelte.ts` files, plain `.ts` caused "rune_outside_svelte" runtime error (blank screen)
- Updated all import paths to use `.svelte` suffix for store modules
### Previously Added
- Node.js sidecar manager (Rust) for spawning and communicating with agent-runner via stdio NDJSON (Phase 3)
- Agent-runner sidecar: spawns `claude` CLI with `--output-format stream-json` for structured agent output (Phase 3)
- SDK message adapter parsing stream-json into 9 typed message types: init, text, thinking, tool_call, tool_result, status, cost, error, unknown (Phase 3)
- Agent bridge adapter for Tauri IPC (invoke + event listeners) (Phase 3)
- Agent dispatcher routing sidecar events to agent session store (Phase 3)
- Agent session store with message history, cost tracking, and lifecycle management (Phase 3)
- Keyboard shortcut: Ctrl+Shift+N to open new agent pane (Phase 3)
- Sidebar button for creating new agent sessions (Phase 3)
- Rust PTY backend with portable-pty: spawn, write, resize, kill with Tauri event streaming (Phase 2)
- xterm.js terminal pane with Canvas addon, FitAddon, and Catppuccin Mocha theme (Phase 2)
- CSS Grid tiling layout with 5 presets: 1-col, 2-col, 3-col, 2x2, master-stack (Phase 2)
- Layout store with Svelte 5 $state runes and auto-preset selection (Phase 2)
- Sidebar with session list, layout preset selector, and new terminal button (Phase 2)
- Keyboard shortcuts: Ctrl+N new terminal, Ctrl+1-4 focus pane (Phase 2)
- PTY bridge adapter for Tauri IPC (invoke + event listeners) (Phase 2)
- PaneContainer component with header bar, status indicator, and close button (Phase 2)
- Terminal resize handling with ResizeObserver and 100ms debounce (Phase 2)
- v2 project scaffolding: Tauri 2.x + Svelte 5 in `v2/` directory (Phase 1)
- Rust backend stubs: main.rs, lib.rs, pty.rs, sidecar.rs, watcher.rs, session.rs (Phase 1)
- Svelte frontend with Catppuccin Mocha CSS variables and component structure (Phase 1)
- Node.js sidecar scaffold with NDJSON communication pattern (Phase 1)
- v2 architecture planning: Tauri 2.x + Svelte 5 + Claude Agent SDK via Node.js sidecar
- Research documentation covering Agent SDK, xterm.js performance, Tauri ecosystem, and ultrawide layout patterns
- Phased implementation plan (6 phases, MVP = Phases 1-4)
- Error handling and testing strategy for v2
- Documentation structure in `docs/` (task_plan, phases, findings, progress)
- 17 operational rules in `.claude/rules/`
- TODO.md for tracking active work
- `.claude/CLAUDE.md` behavioral guide for Claude sessions
- VS Code workspace configuration with Peacock color
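
Several entries above rely on NDJSON over stdio between the Rust backend and the sidecar. As an illustration of that framing only, here is a minimal decoder that buffers partial chunks and yields one parsed object per complete line. The class name `NdjsonDecoder` is hypothetical and the message shapes used with it are placeholders.

```typescript
// Hypothetical NDJSON decoder: one JSON document per newline-terminated line.
class NdjsonDecoder {
  private buffer = "";

  // Feed a chunk of text; returns the messages completed by this chunk.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or ""), so keep it buffered.
    this.buffer = lines.pop() ?? "";
    const messages: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length === 0) continue; // ignore blank lines
      messages.push(JSON.parse(trimmed));
    }
    return messages;
  }
}
```

Buffering the trailing fragment matters because pipe reads do not align with message boundaries; a message split across two reads is only parsed once its newline arrives.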

CLAUDE.md

@ -1,38 +1,227 @@
# agent_orchestrator
# BTerminal — Project Guide for Claude
On session start, load context:
```bash
ctx get agent_orchestrator
```
## Project Overview
Context manager: `ctx --help`
Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (Web Worker sandbox, 26 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, SPKI certificate pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, SidecarManager actor pattern (mpsc), per-message btmsg acknowledgment (seen_messages), Aider autonomous mode toggle.
During work:
- Save important discoveries: `ctx set agent_orchestrator <key> <value>`
- Append to existing: `ctx append agent_orchestrator <key> <value>`
- Before ending session: `ctx summary agent_orchestrator "<what was done>"`
- **Repository:** github.com/DexterFromLab/BTerminal
- **License:** MIT
- **Primary target:** Linux x86_64
## External AI consultation (OpenRouter)
## Documentation (SOURCE OF TRUTH)
Consult other models (GPT, Gemini, DeepSeek, etc.) for code review, cross-checks, or analysis:
```bash
consult "question" # ask default model
consult -m model_id "question" # ask specific model
consult -f file.py "review this code" # include file
consult # show available models
```
**All project documentation lives in [`docs/`](docs/README.md). This is the single source of truth for this project.** Before making changes, consult the docs. After making changes, update the docs. No exceptions.
## Task management (CLI tool)
## Key Paths
IMPORTANT: Use the `tasks` CLI tool via Bash — NOT the built-in TaskCreate/TaskUpdate/TaskList tools.
The built-in task tools are a different system. Always use `tasks` in Bash.
| Path | Description |
|------|-------------|
| `bterminal.py` | v1 main application (2092 lines, GTK3+VTE) |
| `ctx` | Context manager CLI tool (SQLite-based) |
| `install.sh` | v1 system installer |
| `install-v2.sh` | v2 build-from-source installer (Node.js 20+, Rust 1.77+, system libs) |
| `.github/workflows/release.yml` | CI: builds .deb + AppImage on v* tags, uploads to GitHub Releases |
| `docs/architecture.md` | End-to-end system architecture, data model, layout system |
| `docs/decisions.md` | Architecture decisions log with rationale and dates |
| `docs/phases.md` | v2 implementation phases (1-7 + multi-machine A-D) |
| `docs/findings.md` | All research findings (v2 + v3 combined) |
| `docs/progress/` | Session progress logs (v2, v3, archive) |
| `docs/multi-machine.md` | Multi-machine architecture (implemented, Phases A-D) |
| `docs/release-notes.md` | v3.0 release notes |
| `docs/e2e-testing.md` | E2E testing facility: fixtures, test mode, LLM judge, spec phases, CI |
| `Cargo.toml` | Cargo workspace root (members: src-tauri, bterminal-core, bterminal-relay) |
| `bterminal-core/` | Shared crate: EventSink trait, PtyManager, SidecarManager |
| `bterminal-relay/` | Standalone relay binary (WebSocket server, token auth, CLI) |
| `src-tauri/src/pty.rs` | PTY backend (thin re-export from bterminal-core) |
| `src-tauri/src/groups.rs` | Groups config (load/save ~/.config/bterminal/groups.json) |
| `src-tauri/src/fs_watcher.rs` | ProjectFsWatcher (inotify per-project recursive file change detection, S-1 Phase 2) |
| `src-tauri/src/lib.rs` | AppState + setup + handler registration (~170 lines) |
| `src-tauri/src/commands/` | 16 domain command modules (pty, agent, watcher, session, persistence, knowledge, claude, groups, files, remote, misc, bttask, notifications, plugins, search, secrets) |
| `src-tauri/src/btmsg.rs` | Agent messaging backend (agents, DMs, channels, contacts ACL, heartbeats, dead_letter_queue, audit_log; SQLite WAL mode, named column access) |
| `src-tauri/src/bttask.rs` | Task board backend (list, create, update status with optimistic locking, delete, comments, review_queue_count; shared btmsg.db) |
| `src-tauri/src/search.rs` | FTS5 full-text search (SearchDb, 3 virtual tables: search_messages/tasks/btmsg, index/search/rebuild) |
| `src-tauri/src/secrets.rs` | SecretsManager (keyring crate, linux-native/libsecret, store/get/delete/list with metadata tracking) |
| `src-tauri/src/plugins.rs` | Plugin discovery (scan config dir for plugin.json, path-traversal-safe file reading, permission validation) |
| `src-tauri/src/notifications.rs` | Desktop notifications (notify-rust, graceful fallback if daemon unavailable) |
| `bterminal-core/src/supervisor.rs` | SidecarSupervisor (auto-restart, exponential backoff 1s-30s, 5 retries, SidecarHealth enum, 17 tests) |
| `bterminal-core/src/sandbox.rs` | Landlock sandbox (SandboxConfig RW/RO paths, pre_exec() integration, kernel 6.2+ graceful fallback) |
| `src-tauri/src/sidecar.rs` | SidecarManager (thin re-export from bterminal-core) |
| `src-tauri/src/event_sink.rs` | TauriEventSink (implements EventSink for AppHandle) |
| `src-tauri/src/remote.rs` | RemoteManager (WebSocket client connections to relays) |
| `src-tauri/src/session/` | SessionDb module: mod.rs (struct + migrate), sessions.rs, layout.rs, settings.rs, ssh.rs, agents.rs, metrics.rs, anchors.rs |
| `src-tauri/src/watcher.rs` | FileWatcherManager (notify crate, file change events) |
| `src-tauri/src/ctx.rs` | CtxDb (read-only access to ~/.claude-context/context.db) |
| `src-tauri/src/memora.rs` | MemoraDb (read-only access to ~/.local/share/memora/memories.db, FTS5 search) |
| `src-tauri/src/telemetry.rs` | OTEL telemetry (TelemetryGuard, tracing + OTLP export, BTERMINAL_OTLP_ENDPOINT) |
| `src/lib/stores/workspace.svelte.ts` | v3 workspace store (project groups, tabs, focus, replaces layout store) |
| `src/lib/stores/layout.svelte.ts` | v2 layout store (panes, presets, groups, persistence, Svelte 5 runes) |
| `src/lib/stores/agents.svelte.ts` | Agent session store (messages, cost, parent/child hierarchy) |
| `src/lib/components/Terminal/TerminalPane.svelte` | xterm.js terminal pane |
| `src/lib/components/Terminal/AgentPreviewPane.svelte` | Read-only xterm.js showing agent activity (Bash commands, tool results, errors) |
| `src/lib/components/Agent/AgentPane.svelte` | Agent session pane (sans-serif font, tool call/result pairing, hook collapsing, context meter, prompt, cost, profile selector, skill autocomplete) |
| `src/lib/adapters/pty-bridge.ts` | PTY IPC wrapper (Tauri invoke/listen) |
| `src/lib/adapters/agent-bridge.ts` | Agent IPC wrapper (Tauri invoke/listen) |
| `src/lib/adapters/claude-messages.ts` | Claude message adapter (stream-json parser, renamed from sdk-messages.ts) |
| `src/lib/adapters/message-adapters.ts` | Provider message adapter registry (per-provider routing to common AgentMessage) |
| `src/lib/adapters/provider-bridge.ts` | Generic provider bridge (delegates to provider-specific bridges) |
| `src/lib/providers/types.ts` | Provider abstraction types (ProviderId, ProviderCapabilities, ProviderMeta, ProviderSettings) |
| `src/lib/providers/registry.svelte.ts` | Svelte 5 rune-based provider registry (registerProvider, getProviders) |
| `src/lib/providers/claude.ts` | Claude provider metadata constant (CLAUDE_PROVIDER) |
| `src/lib/providers/codex.ts` | Codex provider metadata constant (CODEX_PROVIDER, gpt-5.4 default) |
| `src/lib/providers/ollama.ts` | Ollama provider metadata constant (OLLAMA_PROVIDER, qwen3:8b default) |
| `src/lib/adapters/codex-messages.ts` | Codex message adapter (ThreadEvent parser) |
| `src/lib/adapters/ollama-messages.ts` | Ollama message adapter (streaming chunk parser) |
| `src/lib/agent-dispatcher.ts` | Thin coordinator: routes sidecar events to agent store, delegates to extracted modules |
| `src/lib/utils/session-persistence.ts` | Session-project maps + persistSessionForProject + waitForPendingPersistence |
| `src/lib/utils/auto-anchoring.ts` | triggerAutoAnchor on first compaction event |
| `src/lib/utils/subagent-router.ts` | Subagent pane creation + toolUseToChildPane routing |
| `src/lib/utils/worktree-detection.ts` | detectWorktreeFromCwd pure function (3 provider patterns) |
| `src/lib/adapters/file-bridge.ts` | File watcher IPC wrapper |
| `src/lib/adapters/settings-bridge.ts` | Settings IPC wrapper (get/set/list) |
| `src/lib/adapters/ctx-bridge.ts` | ctx database IPC wrapper |
| `src/lib/adapters/ssh-bridge.ts` | SSH session IPC wrapper |
| `src/lib/adapters/claude-bridge.ts` | Claude profiles + skills IPC wrapper |
| `src/lib/adapters/groups-bridge.ts` | Groups config IPC wrapper (load/save) |
| `src/lib/adapters/remote-bridge.ts` | Remote machine management IPC wrapper |
| `src/lib/adapters/files-bridge.ts` | File browser IPC wrapper (list_directory_children, read_file_content) |
| `src/lib/adapters/memory-adapter.ts` | Pluggable memory adapter interface (MemoryAdapter, registry) |
| `src/lib/adapters/memora-bridge.ts` | Memora IPC bridge + MemoraAdapter (read-only SQLite via Tauri commands) |
| `src/lib/adapters/fs-watcher-bridge.ts` | Filesystem watcher IPC wrapper (project CWD write detection) |
| `src/lib/adapters/anchors-bridge.ts` | Session anchors IPC wrapper (save, load, delete, clear, updateType) |
| `src/lib/adapters/bttask-bridge.ts` | Task board IPC adapter (listTasks, createTask, updateTaskStatus, deleteTask, comments) |
| `src/lib/adapters/telemetry-bridge.ts` | Frontend telemetry bridge (routes events to Rust tracing via IPC) |
| `src/lib/utils/agent-prompts.ts` | Agent prompt generator (generateAgentPrompt: identity, env, team, btmsg/bttask docs, workflow) |
| `docker/tempo/` | Docker compose: Tempo + Grafana for trace visualization (port 9715) |
| `scripts/test-all.sh` | Unified test runner: vitest + cargo + optional E2E (--e2e flag) |
| `tests/e2e/wdio.conf.js` | WebDriverIO config (tauri-driver lifecycle, TCP probe, test env vars) |
| `tests/e2e/fixtures.ts` | E2E test fixture generator (isolated temp dirs, git repos, groups.json) |
| `tests/e2e/results-db.ts` | JSON test results store (run/step tracking, no native deps) |
| `tests/e2e/specs/bterminal.test.ts` | E2E smoke tests (CSS class selectors, 50+ tests) |
| `tests/e2e/specs/agent-scenarios.test.ts` | Phase A E2E scenarios (data-testid selectors, 7 scenarios, 22 tests) |
| `tests/e2e/specs/phase-b.test.ts` | Phase B E2E scenarios (multi-project, LLM-judged assertions, 6 scenarios) |
| `tests/e2e/llm-judge.ts` | LLM judge helper (Claude API assertions, confidence thresholds) |
| `.github/workflows/e2e.yml` | CI: unit + cargo + E2E tests (xvfb-run, path-filtered, LLM tests gated on secret) |
| `src/lib/stores/machines.svelte.ts` | Remote machine state store (Svelte 5 runes) |
| `src/lib/utils/attention-scorer.ts` | Pure attention scoring function (extracted from health store, 14 tests) |
| `src/lib/utils/wake-scorer.ts` | Pure wake signal evaluation (6 signals, 24 tests) |
| `src/lib/types/wake.ts` | WakeStrategy, WakeSignal, WakeEvaluation, WakeContext types |
| `src/lib/stores/wake-scheduler.svelte.ts` | Manager auto-wake scheduler (3 strategies, per-manager timers) |
| `src/lib/utils/type-guards.ts` | Shared runtime guards: str(), num() for untyped wire format parsing |
| `src/lib/utils/agent-tree.ts` | Agent tree builder (hierarchy from messages) |
| `src/lib/utils/highlight.ts` | Shiki syntax highlighter (lazy singleton, 13 languages) |
| `src/lib/utils/detach.ts` | Detached pane mode (pop-out windows via URL params) |
| `src/lib/utils/updater.ts` | Tauri auto-updater utility |
| `src/lib/stores/notifications.svelte.ts` | Notification store (toast + history, 6 NotificationTypes, unread badge, max 100 history) |
| `src/lib/stores/plugins.svelte.ts` | Plugin store (command registry, event bus, loadAllPlugins/unloadAllPlugins) |
| `src/lib/adapters/audit-bridge.ts` | Audit log IPC adapter (logAuditEvent, getAuditLog, AuditEntry, AuditEventType) |
| `src/lib/adapters/notifications-bridge.ts` | Desktop notification IPC wrapper (sendDesktopNotification) |
| `src/lib/adapters/plugins-bridge.ts` | Plugin discovery IPC wrapper (discoverPlugins, readPluginFile) |
| `src/lib/adapters/search-bridge.ts` | FTS5 search IPC wrapper (initSearch, searchAll, rebuildIndex, indexMessage) |
| `src/lib/adapters/secrets-bridge.ts` | Secrets IPC wrapper (storeSecret, getSecret, deleteSecret, listSecrets, hasKeyring) |
| `src/lib/utils/error-classifier.ts` | API error classification (6 types: rate_limit/auth/quota/overloaded/network/unknown, retry logic, 20 tests) |
| `src/lib/plugins/plugin-host.ts` | Sandboxed plugin runtime (Web Worker isolation, permission-gated API via postMessage, load/unload lifecycle) |
| `src/lib/components/Agent/UsageMeter.svelte` | Compact inline usage meter (color thresholds 50/75/90%, hover tooltip) |
| `src/lib/components/Notifications/NotificationCenter.svelte` | Bell icon + dropdown notification panel (unread badge, history, mark read/clear) |
| `src/lib/components/Workspace/AuditLogTab.svelte` | Manager audit log tab (filter by type+agent, 5s auto-refresh, max 200 entries) |
| `src/lib/components/Workspace/SearchOverlay.svelte` | FTS5 search overlay (Ctrl+Shift+F, Spotlight-style, 300ms debounce, grouped results) |
| `src/lib/stores/theme.svelte.ts` | Theme store (17 themes: 4 Catppuccin + 7 Editor + 6 Deep Dark, UI + terminal font restoration on startup) |
| `src/lib/styles/themes.ts` | Theme palette definitions (17 themes), ThemeId/ThemePalette/ThemeMeta types, THEME_LIST |
| `src/lib/styles/catppuccin.css` | CSS custom properties: 26 --ctp-* color vars + --ui-font-* + --term-font-* |
| `src/lib/components/Agent/AgentTree.svelte` | SVG agent tree visualization |
| `src/lib/components/Context/ContextPane.svelte` | ctx database viewer (projects, entries, search) — replaced by ContextTab in ProjectBox |
| `src/lib/components/Workspace/ContextTab.svelte` | LLM context window visualization (stats, token meter, file refs, turn breakdown) |
| `src/lib/components/Workspace/CodeEditor.svelte` | CodeMirror 6 wrapper (15 languages, Catppuccin theme, save/blur callbacks) |
| `src/lib/components/Workspace/PdfViewer.svelte` | PDF viewer (pdfjs-dist, canvas multi-page, zoom 0.5x-3x, HiDPI) |
| `src/lib/components/Workspace/CsvTable.svelte` | CSV table viewer (RFC 4180 parser, delimiter auto-detect, sortable columns) |
| `src/lib/components/Workspace/MetricsPanel.svelte` | Dashboard metrics panel (live health + task counts + history sparklines, 25 tests) |
| `src/lib/stores/health.svelte.ts` | Project health store (activity state, burn rate, context pressure, file conflicts, attention scoring) |
| `src/lib/stores/conflicts.svelte.ts` | File overlap + external write conflict detection (per-project, session-scoped, worktree-aware, dismissible, inotify-backed) |
| `src/lib/stores/anchors.svelte.ts` | Session anchor store (per-project anchors, auto-anchor tracking, re-injection support) |
| `src/lib/types/anchors.ts` | Anchor types (AnchorType, SessionAnchor, AnchorSettings, AnchorBudgetScale, SessionAnchorRecord) |
| `src/lib/utils/anchor-serializer.ts` | Anchor serialization (turn grouping, observation masking, token estimation) |
| `src/lib/utils/tool-files.ts` | Shared file path extraction from tool_call inputs (extractFilePaths, extractWritePaths, extractWorktreePath) |
| `src/lib/components/StatusBar/StatusBar.svelte` | Mission Control bar (agent states, $/hr burn rate, attention queue, cost) |
| `src/lib/components/Notifications/ToastContainer.svelte` | Toast notification display |
| `src/lib/components/Workspace/` | v3 components: GlobalTabBar, ProjectGrid, ProjectBox, ProjectHeader, AgentSession, TeamAgentsPanel, AgentCard, TerminalTabs, ProjectFiles, FilesTab, SshTab, MemoriesTab, CommandPalette, DocsTab, SettingsTab, TaskBoardTab, ArchitectureTab, TestingTab |
| `src/lib/types/groups.ts` | TypeScript interfaces (ProjectConfig, GroupConfig, GroupsFile) |
| `src/lib/adapters/session-bridge.ts` | Session/layout/group persistence IPC wrapper |
| `src/lib/components/Markdown/MarkdownPane.svelte` | Markdown file viewer (marked.js + shiki, live reload) |
| `sidecar/claude-runner.ts` | Claude sidecar source (compiled to .mjs by esbuild, includes findClaudeCli()) |
| `sidecar/codex-runner.ts` | Codex sidecar source (@openai/codex-sdk dynamic import, sandbox/approval mapping) |
| `sidecar/ollama-runner.ts` | Ollama sidecar source (direct HTTP to localhost:11434, zero external deps) |
| `sidecar/aider-parser.ts` | Aider output parser (pure functions: looksLikePrompt, parseTurnOutput, extractSessionCost, execShell) |
| `sidecar/aider-parser.test.ts` | Vitest tests for Aider parser (72 tests: prompt detection, turn parsing, cost extraction, format-drift canaries) |
| `sidecar/agent-runner-deno.ts` | Standalone Deno sidecar runner (not used by SidecarManager, alternative) |
| `sidecar/dist/claude-runner.mjs` | Bundled Claude sidecar (runs on both Deno and Node.js) |
| `src/lib/adapters/claude-messages.test.ts` | Vitest tests for Claude message adapter (25 tests) |
| `src/lib/adapters/codex-messages.test.ts` | Vitest tests for Codex message adapter (19 tests) |
| `src/lib/adapters/ollama-messages.test.ts` | Vitest tests for Ollama message adapter (11 tests) |
| `src/lib/adapters/memora-bridge.test.ts` | Vitest tests for Memora bridge + adapter (16 tests) |
| `src/lib/adapters/btmsg-bridge.test.ts` | Vitest tests for btmsg bridge (17 tests: camelCase, IPC commands) |
| `src/lib/adapters/bttask-bridge.test.ts` | Vitest tests for bttask bridge (10 tests: camelCase, IPC commands) |
| `src/lib/adapters/agent-bridge.test.ts` | Vitest tests for agent IPC bridge (11 tests) |
| `src/lib/agent-dispatcher.test.ts` | Vitest tests for agent dispatcher (29 tests) |
| `src/lib/stores/conflicts.test.ts` | Vitest tests for conflict detection (28 tests) |
| `src/lib/utils/tool-files.test.ts` | Vitest tests for tool file extraction (27 tests) |
| `src/lib/stores/layout.test.ts` | Vitest tests for layout store (30 tests) |
| `src/lib/utils/agent-tree.test.ts` | Vitest tests for agent tree builder (20 tests) |
| `src/lib/stores/workspace.test.ts` | Vitest tests for workspace store (24 tests) |
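
The `error-classifier.ts` row above names six error types (rate_limit, auth, quota, overloaded, network, unknown) plus retry logic. As a rough illustration of that idea, here is a minimal Python sketch. Only the six category names come from the table; the status codes, substrings, the `ClassifiedError` type, and the choice of which kinds are retryable are assumptions made for the example, not the project's actual rules.

```python
from dataclasses import dataclass
from typing import Optional

# Category names are taken from the table row; which ones are retryable
# is an assumption made for this sketch.
RETRYABLE = {"rate_limit", "overloaded", "network"}


@dataclass(frozen=True)
class ClassifiedError:
    kind: str
    retryable: bool


def classify_error(status: Optional[int], message: str) -> ClassifiedError:
    """Map an HTTP status (if any) and an error message to one of six kinds."""
    text = message.lower()
    if status == 429 or "rate limit" in text:
        kind = "rate_limit"
    elif status in (401, 403):
        kind = "auth"
    elif "quota" in text or "billing" in text:
        kind = "quota"
    elif status in (503, 529) or "overloaded" in text:
        kind = "overloaded"
    elif status is None and any(s in text for s in ("timeout", "econnreset", "network")):
        kind = "network"
    else:
        kind = "unknown"
    return ClassifiedError(kind, kind in RETRYABLE)
```

A caller would typically retry only when `retryable` is true and surface the other kinds to the user.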
## v1 Stack
- Python 3, GTK3 (PyGObject), VTE 2.91
- Config: `~/.config/bterminal/` (sessions.json, claude_sessions.json)
- Context DB: `~/.claude-context/context.db`
- Theme: Catppuccin Mocha
## v2/v3 Stack (v2 complete, v3 All Phases 1-10 complete, branch: v2-mission-control)
- Tauri 2.x (Rust backend) + Svelte 5 (frontend)
- Cargo workspace: bterminal-core (shared), bterminal-relay (remote binary), src-tauri (Tauri app)
- xterm.js with Canvas addon (no WebGL on WebKit2GTK)
- Agent sessions via `@anthropic-ai/claude-agent-sdk` query() function (migrated from raw CLI spawning)
- Sidecar uses SDK internally (single .mjs bundle, Deno-first + Node.js fallback, stdio NDJSON to Rust, auto-detects Claude CLI path via findClaudeCli(), supports CLAUDE_CONFIG_DIR override for multi-account)
- portable-pty for terminal management (in bterminal-core)
- Multi-machine: bterminal-relay WebSocket server + RemoteManager WebSocket client
- SQLite session persistence (rusqlite, WAL mode) + layout restore on startup
- File watcher (notify crate) for live markdown viewer
- OpenTelemetry: tracing + tracing-subscriber + opentelemetry 0.28 + tracing-opentelemetry 0.29, OTLP/HTTP to Tempo, BTERMINAL_OTLP_ENDPOINT env var
- Rust deps (src-tauri): tauri, bterminal-core (path), rusqlite (bundled-full, FTS5), dirs, notify, serde, tokio, tokio-tungstenite, futures-util, tracing, tracing-subscriber, opentelemetry, opentelemetry_sdk, opentelemetry-otlp, tracing-opentelemetry, tauri-plugin-updater, tauri-plugin-dialog, notify-rust, keyring (linux-native)
- Rust deps (bterminal-core): portable-pty, uuid, serde, serde_json, log, landlock
- Rust deps (bterminal-relay): bterminal-core, tokio, tokio-tungstenite, clap, env_logger, futures-util
- npm deps: @anthropic-ai/claude-agent-sdk, @xterm/xterm, @xterm/addon-canvas, @xterm/addon-fit, @tauri-apps/api, @tauri-apps/plugin-updater, @tauri-apps/plugin-dialog, marked, shiki, pdfjs-dist, vitest (dev)
- Source: repository root (flat layout; formerly under `v2/`)
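
The stack notes above say the sidecar talks to the Rust backend over stdio using NDJSON (one JSON object per line). A minimal sketch of reading such a stream follows, in Python for illustration only. The message fields (`type`, `content`) and the policy of silently skipping malformed lines are assumptions, not the project's wire format.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def read_ndjson(stream: TextIO) -> Iterator[Dict]:
    """Yield one dict per non-empty line that parses as a JSON object."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # assumption: malformed lines are skipped, not fatal
        if isinstance(msg, dict):
            yield msg


# Hypothetical sample stream: two valid messages, one blank line, one bad line.
sample = io.StringIO(
    '{"type": "ready"}\n\nnot json\n{"type": "text", "content": "hi"}\n'
)
messages = list(read_ndjson(sample))
```

Framing by newline keeps the reader simple: each line is a complete message, so no length prefix or streaming JSON parser is needed.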
## Build / Run
```bash
tasks list agent_orchestrator # show all tasks
tasks context agent_orchestrator # show tasks + next task instructions
tasks add agent_orchestrator "description" # add a task
tasks done agent_orchestrator <task_id> # mark task as done
tasks --help # full help
# v1 (current production)
./install.sh # Install system-wide
bterminal # Run
# v1 Dependencies (Debian/Ubuntu)
sudo apt install python3-gi gir1.2-gtk-3.0 gir1.2-vte-2.91
# Development
npm install && npm run tauri dev # Dev mode
npm run tauri build # Release build
# Tests
npm run test:all # All tests (vitest + cargo)
npm run test:all:e2e # All tests + E2E (needs built binary)
npm run test # Vitest only (frontend)
npm run test:cargo # Cargo only (backend)
npm run test:e2e # E2E only (WebDriverIO)
# Telemetry stack (Tempo + Grafana)
cd docker/tempo && docker compose up -d # Grafana at http://localhost:9715
BTERMINAL_OTLP_ENDPOINT=http://localhost:4318 npm run tauri dev # Enable OTLP export
```
Do NOT pick up tasks on your own. Only execute tasks when the auto-trigger system sends you a command.
## Conventions
- 17 themes in 3 groups: 4 Catppuccin (Mocha default) + 7 Editor + 6 Deep Dark (Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper, Midnight)
- CSS uses rem/em for layout; px only for icons/borders (see `.claude/rules/18-relative-units.md`)
- Session configs stored as JSON
- Single-file Python app (v1) — will change to multi-file Rust+Svelte (v2)
- Polish language in some code comments (v1 legacy)

160
README.md

@ -1,47 +1,141 @@
# Svelte + TS + Vite
# Agent Orchestrator
This template should help get you started developing with Svelte and TypeScript in Vite.
Multi-project agent dashboard for orchestrating Claude AI teams across multiple codebases simultaneously. Built with Tauri 2.x (Rust) + Svelte 5 + Claude Agent SDK.
## Recommended IDE Setup
![Agent Orchestrator](screenshot.png)
[VS Code](https://code.visualstudio.com/) + [Svelte](https://marketplace.visualstudio.com/items?itemName=svelte.svelte-vscode).
## What it does
## Need an official Svelte framework?
Agent Orchestrator lets you run multiple Claude Code agents in parallel, organized into project groups. Each agent gets its own terminal, file browser, and Claude session. Agents communicate with each other via built-in messaging (btmsg) and coordinate work through a shared task board (bttask).
Check out [SvelteKit](https://github.com/sveltejs/kit#readme), which is also powered by Vite. Deploy anywhere with its serverless-first approach and adapt to various platforms, with out of the box support for TypeScript, SCSS, and Less, and easily-added support for mdsvex, GraphQL, PostCSS, Tailwind CSS, and more.
## Key Features
## Technical considerations
### Multi-Agent Orchestration
- **Project groups** — up to 5 projects side-by-side, adaptive layout (5 @ultrawide, 3 @1920px)
- **Tier 1 agents** — Manager, Architect, Tester, Reviewer with role-specific tabs and auto-generated system prompts
- **Tier 2 agents** — per-project Claude sessions with custom context
- **btmsg** — inter-agent messaging CLI + UI (Activity Feed, DMs, Channels)
- **bttask** — Kanban task board with role-based visibility (Manager CRUD, others read-only)
- **Auto-wake scheduler** — 3 strategies (persistent, on-demand, smart) with configurable wake signals
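
Because bttask is a shared board that several agents may update at once, the feature list mentions optimistic locking for it. One common way to implement that is a version column checked on every write. The sketch below shows the pattern with Python's `sqlite3`; the `tasks` table, its columns, and `update_status` are hypothetical and not taken from the bttask schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical schema: a version counter that every successful write bumps.
db.execute(
    "CREATE TABLE tasks ("
    "id INTEGER PRIMARY KEY, status TEXT NOT NULL, "
    "version INTEGER NOT NULL DEFAULT 1)"
)
db.execute("INSERT INTO tasks (id, status) VALUES (1, 'todo')")
db.commit()


def update_status(conn, task_id, new_status, expected_version):
    """Apply the update only if the row still has the version the caller read."""
    cur = conn.execute(
        "UPDATE tasks SET status = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_status, task_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


first = update_status(db, 1, "in_progress", 1)  # caller read version 1: applied
stale = update_status(db, 1, "done", 1)         # version is now 2: rejected
row = db.execute("SELECT status, version FROM tasks WHERE id = 1").fetchone()
```

A rejected update tells the caller to re-read the task and decide whether its change still makes sense.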
**Why use this over SvelteKit?**
### Per-Project Workspace
- **Claude sessions** with session continuity, anchors, and structured output
- **Terminal tabs** — shell, SSH, agent preview per project
- **File browser** — CodeMirror 6 editor (15 languages), PDF viewer, CSV table
- **Docs viewer** — live Markdown with Shiki syntax highlighting
- **Context tab** — LLM context window visualization (token meter, turn breakdown)
- **Metrics panel** — live health, SVG sparkline history, session stats
- It brings its own routing solution which might not be preferable for some users.
- It is first and foremost a framework that just happens to use Vite under the hood, not a Vite app.
### Multi-Provider Support
- **Claude** (primary) — via Agent SDK sidecar
- **Codex** — OpenAI Codex SDK adapter
- **Ollama** — local models via native fetch
This template contains as little as possible to get started with Vite + TypeScript + Svelte, while taking into account the developer experience with regards to HMR and intellisense. It demonstrates capabilities on par with the other `create-vite` templates and is a good starting point for beginners dipping their toes into a Vite + Svelte project.
### Production Hardening
- **Sidecar supervisor** — crash recovery with exponential backoff
- **Landlock sandbox** — Linux kernel process isolation for sidecar
- **FTS5 search** — full-text search with Spotlight-style overlay (Ctrl+F)
- **Plugin system** — sandboxed runtime with permission gates
- **Secrets management** — system keyring integration
- **Notifications** — OS + in-app notification center
- **Agent health monitoring** — heartbeats, dead letter queue, audit log
- **Optimistic locking** — bttask concurrent access protection
- **Error classification** — 6 error types with auto-retry logic
- **TLS relay** — encrypted WebSocket for remote machines
- **WAL checkpoint** — periodic SQLite maintenance (5min interval)
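
The sidecar supervisor above is described as recovering from crashes with exponential backoff. A minimal Python sketch of that retry schedule is shown here. The base delay, factor, cap, restart limit, and the `supervise` helper are all assumptions for illustration; the real supervisor watches a child process rather than a Python callable.

```python
import time
from typing import Callable, List


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0, max_restarts: int = 5) -> List[float]:
    """Delays before each restart: base, base*factor, ... limited to cap."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_restarts)]


def supervise(start: Callable[[], None], delays: List[float],
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Call start(); on failure wait the next delay and retry.

    Returns the number of restarts that were needed. Re-raises the last
    error once every delay has been used up.
    """
    restarts = 0
    for delay in [*delays, None]:
        try:
            start()
            return restarts
        except Exception:
            if delay is None:
                raise
            sleep(delay)
            restarts += 1
    return restarts
```

Passing `sleep` as a parameter makes the schedule easy to test without actually waiting.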
Should you later need the extended capabilities and extensibility provided by SvelteKit, the template has been structured similarly to SvelteKit so that it is easy to migrate.
### Developer Experience
- **17 themes** — Catppuccin (4), Editor (7), Deep Dark (6)
- **Keyboard-first UX** — Command Palette (Ctrl+K), 18+ commands, vi-style navigation
- **Claude profiles** — per-project account switching
- **Skill discovery** — type `/` in agent prompt for autocomplete
- **ctx integration** — SQLite context database for cross-session memory
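
The `ctx` script added in this commit keeps an FTS5 index in sync with its `contexts` table through triggers. The sketch below reproduces that pattern on a small in-memory database so it can be tried in isolation. The `notes` table and its columns are hypothetical, and it assumes the local SQLite build includes FTS5.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE notes (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        project TEXT NOT NULL,
        body TEXT NOT NULL
    );
    -- External-content index: the text lives in notes, FTS5 stores only the index.
    CREATE VIRTUAL TABLE notes_fts USING fts5(
        project, body, content=notes, content_rowid=id
    );
    CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
        INSERT INTO notes_fts(rowid, project, body)
        VALUES (new.id, new.project, new.body);
    END;
    CREATE TRIGGER notes_ad AFTER DELETE ON notes BEGIN
        INSERT INTO notes_fts(notes_fts, rowid, project, body)
        VALUES ('delete', old.id, old.project, old.body);
    END;
""")

db.execute("INSERT INTO notes (project, body) VALUES (?, ?)",
           ("demo", "relay uses TLS pinning"))
db.execute("INSERT INTO notes (project, body) VALUES (?, ?)",
           ("demo", "sidecar restarts with backoff"))

query = "SELECT body FROM notes_fts WHERE notes_fts MATCH ?"
hits = [r[0] for r in db.execute(query, ("backoff",))]

db.execute("DELETE FROM notes WHERE body LIKE '%backoff%'")
hits_after_delete = [r[0] for r in db.execute(query, ("backoff",))]
```

With an external-content table the index is not updated automatically, so every table that has an FTS5 mirror needs its own insert, delete, and update triggers.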
**Why `global.d.ts` instead of `compilerOptions.types` inside `jsconfig.json` or `tsconfig.json`?**
### Testing
- **516 vitest** + **159 cargo** + **109 E2E** tests
- **E2E engine** — WebDriverIO + tauri-driver, Phase A/B/C scenarios
- **LLM judge** — dual-mode CLI/API for semantic assertion (claude-haiku)
- **CI** — GitHub Actions with xvfb + LLM-judged test gating
Setting `compilerOptions.types` shuts out all other types not explicitly listed in the configuration. Using triple-slash references keeps the default TypeScript setting of accepting type information from the entire workspace, while also adding `svelte` and `vite/client` type information.
## Architecture
**Why include `.vscode/extensions.json`?**
Other templates indirectly recommend extensions via the README, but this file allows VS Code to prompt the user to install the recommended extension upon opening the project.
**Why enable `allowJs` in the TS template?**
While `allowJs: false` would indeed prevent the use of `.js` files in the project, it does not prevent the use of JavaScript syntax in `.svelte` files. In addition, it would force `checkJs: false`, bringing the worst of both worlds: not being able to guarantee the entire codebase is TypeScript, and also having worse typechecking for the existing JavaScript. In addition, there are valid use cases in which a mixed codebase may be relevant.
**Why is HMR not preserving my local component state?**
HMR state preservation comes with a number of gotchas! It has been disabled by default in both `svelte-hmr` and `@sveltejs/vite-plugin-svelte` due to its often surprising behavior. You can read the details [here](https://github.com/rixo/svelte-hmr#svelte-hmr).
If you have state that's important to retain within a component, consider creating an external store which would not be replaced by HMR.
```ts
// store.ts
// An extremely simple external store
import { writable } from 'svelte/store'
export default writable(0)
```

```
Agent Orchestrator (Tauri 2.x)
├── Rust backend (src-tauri/)
│   ├── Commands: groups, sessions, btmsg, bttask, search, secrets, plugins, notifications
│   ├── bterminal-core: PtyManager, SidecarManager, EventSink trait
│   └── bterminal-relay: WebSocket server for remote machines (+ TLS)
├── Svelte 5 frontend (src/)
│   ├── Workspace: ProjectGrid, ProjectBox (per-project tabs), StatusBar
│   ├── Stores: workspace, agents, health, conflicts, wake-scheduler, plugins
│   ├── Adapters: claude-bridge, btmsg-bridge, bttask-bridge, groups-bridge
│   └── Agent dispatcher: sidecar event routing, session persistence, auto-anchoring
└── Node.js sidecar (sidecar/)
    ├── claude-runner.mjs (Agent SDK)
    ├── codex-runner.mjs (OpenAI Codex)
    └── ollama-runner.mjs (local models)
```
## Installation
Requires Node.js 20+, Rust 1.77+, WebKit2GTK 4.1, GTK3.
```bash
git clone https://github.com/DexterFromLab/agent-orchestrator.git
cd agent-orchestrator/v2
npm install
npm run build:sidecar
npm run tauri:dev
```
### Build for distribution
```bash
npm run tauri:build
# Output: .deb + AppImage in target/release/bundle/
```
## Configuration
Config: `~/.config/bterminal/groups.json` — project groups, agents, prompts (human-editable JSON).
Database: `~/.local/share/bterminal/` — sessions.db (sessions, metrics, anchors), btmsg.db (messages, tasks, agents).
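
The databases above are SQLite files, and the feature list mentions a periodic WAL checkpoint (every 5 minutes). A hedged sketch of a single checkpoint pass is shown below, using Python's `sqlite3` against a throwaway file. The `checkpoint_wal` helper and the choice of `TRUNCATE` mode are assumptions; the application's Rust implementation may use a different mode, and the timer loop is omitted.

```python
import sqlite3
import tempfile
from pathlib import Path


def checkpoint_wal(db_path: Path) -> int:
    """Run a truncating WAL checkpoint; return SQLite's busy flag (0 = completed)."""
    conn = sqlite3.connect(str(db_path))
    try:
        busy, _log_frames, _checkpointed = conn.execute(
            "PRAGMA wal_checkpoint(TRUNCATE)"
        ).fetchone()
        return busy
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sessions.db"  # throwaway file, not the real database
    writer = sqlite3.connect(str(path))
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute("CREATE TABLE t (x INTEGER)")
    writer.execute("INSERT INTO t VALUES (1)")
    writer.commit()  # a checkpoint cannot complete while a write is pending
    busy = checkpoint_wal(path)
    writer.close()
```

A non-zero busy flag means another connection blocked the checkpoint, in which case the next scheduled pass can simply try again.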
## Multi-Machine Support
```
Agent Orchestrator --WebSocket/TLS--> bterminal-relay (Remote Machine)
                                      ├── PtyManager (remote terminals)
                                      └── SidecarManager (remote agents)
```
```bash
cd v2 && cargo build --release -p bterminal-relay
./target/release/bterminal-relay --port 9750 --token <secret> --tls-cert cert.pem --tls-key key.pem
```
## Keyboard Shortcuts
| Shortcut | Action |
|----------|--------|
| `Ctrl+K` | Command Palette |
| `Ctrl+M` | Messages (CommsTab) |
| `Ctrl+B` | Toggle sidebar |
| `Ctrl+,` | Settings |
| `Ctrl+F` | Full-text search |
| `Ctrl+1-5` | Focus project by index |
| `Escape` | Close overlay/sidebar |
## Documentation
| Document | Description |
|----------|-------------|
| [docs/decisions.md](docs/decisions.md) | Architecture decisions log |
| [docs/progress/](docs/progress/) | Session progress logs (v2, v3, archive) |
| [docs/release-notes.md](docs/release-notes.md) | v3.0 release notes |
| [docs/e2e-testing.md](docs/e2e-testing.md) | E2E testing facility documentation |
| [docs/multi-machine.md](docs/multi-machine.md) | Multi-machine relay architecture |
## License
MIT

28
TODO.md Normal file

@ -0,0 +1,28 @@
# Agent Orchestrator — TODO
## Multi-Machine (v3.1)
- [ ] **Real-world relay testing** — TLS added, code complete in bridges/stores. Needs 2-machine test to verify relay + RemoteManager end-to-end. Multi-machine UI not yet surfaced in v3 ProjectBox.
- [ ] **SPKI pin persistence** — TOFU pinning implemented (probe_spki_hash + in-memory pin store in RemoteManager), but pins are lost on restart. Persist to groups.json or separate config file.
## Multi-Agent (v3.1)
- [ ] **Agent Teams real-world testing** — Subagent delegation prompt + `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env injection done. Needs real multi-agent session to verify Manager spawns child agents via SDK teams.
## Reliability
- [ ] **Soak test** — Run 4-hour soak with 6+ agents across 3+ projects. Monitor: memory growth, SQLite WAL size, xterm.js instance count, sidecar supervisor restarts.
- [ ] **WebKit2GTK Worker verification** — Verify Web Worker Blob URL approach works in Tauri's WebKit2GTK webview (tested in vitest only so far).
## Completed
- [x] Plugin sandbox migration — new Function() → Web Worker isolation, 26 tests | Done: 2026-03-15
- [x] seen_messages startup pruning — pruneSeen() on app startup, fire-and-forget | Done: 2026-03-15
- [x] Tribunal priorities: Aider security, SidecarManager actor, SPKI pinning, btmsg reliability, Aider tests | Done: 2026-03-14
- [x] Dead code cleanup — 7 warnings resolved, 4 new Tauri commands wired | Done: 2026-03-14
- [x] E2E fixture + judge hardening | Done: 2026-03-12
- [x] LLM judge refactor + E2E docs | Done: 2026-03-12
- [x] v3 Hardening Sprint (TLS, WAL, Landlock, plugin tests, Phase C E2E) | Done: 2026-03-12
- [x] v3 Production Readiness — all 13 tribunal items | Done: 2026-03-12
- [x] Unified test runner + testing gate rule | Done: 2026-03-12
- [x] E2E Phase B + 27 test fixes | Done: 2026-03-12

1268
consult Executable file

File diff suppressed because it is too large.

472
ctx Executable file

@ -0,0 +1,472 @@
#!/usr/bin/env python3
"""
ctx — Cross-session context manager for Claude Code.
Stores project contexts and shared data in SQLite for instant access.
Usage: ctx <command> [args]
"""
import sqlite3
import sys
import json
from pathlib import Path

DB_PATH = Path.home() / ".claude-context" / "context.db"

def get_db():
    DB_PATH.parent.mkdir(parents=True, exist_ok=True)
    db = sqlite3.connect(str(DB_PATH))
    db.row_factory = sqlite3.Row
    db.execute("PRAGMA journal_mode=WAL")
    return db
def init_db():
    db = get_db()
    db.executescript("""
        CREATE TABLE IF NOT EXISTS sessions (
            name TEXT PRIMARY KEY,
            description TEXT,
            work_dir TEXT,
            created_at TEXT DEFAULT (datetime('now'))
        );
        CREATE TABLE IF NOT EXISTS contexts (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            project TEXT NOT NULL,
            key TEXT NOT NULL,
            value TEXT NOT NULL,
            updated_at TEXT DEFAULT (datetime('now')),
            UNIQUE(project, key)
        );
        CREATE TABLE IF NOT EXISTS shared (
            key TEXT PRIMARY KEY,
            value TEXT NOT NULL,
            updated_at TEXT DEFAULT (datetime('now'))
        );
        CREATE TABLE IF NOT EXISTS summaries (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            project TEXT NOT NULL,
            summary TEXT NOT NULL,
            created_at TEXT DEFAULT (datetime('now'))
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS contexts_fts USING fts5(
            project, key, value, content=contexts, content_rowid=id
        );
        CREATE VIRTUAL TABLE IF NOT EXISTS shared_fts USING fts5(
            key, value, content=shared
        );
        -- Triggers to keep FTS in sync
        CREATE TRIGGER IF NOT EXISTS contexts_ai AFTER INSERT ON contexts BEGIN
            INSERT INTO contexts_fts(rowid, project, key, value)
            VALUES (new.id, new.project, new.key, new.value);
        END;
        CREATE TRIGGER IF NOT EXISTS contexts_ad AFTER DELETE ON contexts BEGIN
            INSERT INTO contexts_fts(contexts_fts, rowid, project, key, value)
            VALUES ('delete', old.id, old.project, old.key, old.value);
        END;
        CREATE TRIGGER IF NOT EXISTS contexts_au AFTER UPDATE ON contexts BEGIN
            INSERT INTO contexts_fts(contexts_fts, rowid, project, key, value)
            VALUES ('delete', old.id, old.project, old.key, old.value);
            INSERT INTO contexts_fts(rowid, project, key, value)
            VALUES (new.id, new.project, new.key, new.value);
        END;
    """)
    db.close()
# ─── Commands ───────────────────────────────────────────────────────────
def cmd_init(args):
    """Register a new project. Usage: ctx init <project> <description> [work_dir]"""
    if len(args) < 2:
        print("Usage: ctx init <project> <description> [work_dir]")
        sys.exit(1)
    name, desc = args[0], args[1]
    work_dir = args[2] if len(args) > 2 else None
    db = get_db()
    db.execute(
        "INSERT OR REPLACE INTO sessions (name, description, work_dir) VALUES (?, ?, ?)",
        (name, desc, work_dir),
    )
    db.commit()
    db.close()
    print(f"Project '{name}' registered.")
def cmd_get(args):
    """Get full context for a project (shared + project-specific + recent summaries)."""
    if len(args) < 1:
        print("Usage: ctx get <project>")
        sys.exit(1)
    project = args[0]
    db = get_db()
    # Session info
    session = db.execute("SELECT * FROM sessions WHERE name = ?", (project,)).fetchone()
    # Shared context
    shared = db.execute("SELECT key, value FROM shared ORDER BY key").fetchall()
    # Project context
    contexts = db.execute(
        "SELECT key, value FROM contexts WHERE project = ? ORDER BY key", (project,)
    ).fetchall()
    # Recent summaries (last 5)
    summaries = db.execute(
        "SELECT summary, created_at FROM summaries WHERE project = ? ORDER BY created_at DESC LIMIT 5",
        (project,),
    ).fetchall()
    # Output
    print("=" * 60)
    if session:
        print(f"PROJECT: {session['name']} — {session['description']}")
        if session["work_dir"]:
            print(f"DIR: {session['work_dir']}")
    else:
        print(f"PROJECT: {project} (not registered, use: ctx init)")
    print("=" * 60)
    if shared:
        print("\n--- Shared Context ---")
        for row in shared:
            print(f"\n[{row['key']}]")
            print(row["value"])
    if contexts:
        print(f"\n--- {project} Context ---")
        for row in contexts:
            print(f"\n[{row['key']}]")
            print(row["value"])
    if summaries:
        print("\n--- Recent Sessions ---")
        for row in reversed(summaries):
            print(f"\n[{row['created_at']}]")
            print(row["summary"])
    if not shared and not contexts and not summaries:
        print("\nNo context stored yet. Use 'ctx set' or 'ctx shared set' to add.")
    db.close()
def cmd_set(args):
    """Set a project context entry. Usage: ctx set <project> <key> <value>"""
    if len(args) < 3:
        print("Usage: ctx set <project> <key> <value>")
        sys.exit(1)
    project, key, value = args[0], args[1], " ".join(args[2:])
    db = get_db()
    db.execute(
        """INSERT INTO contexts (project, key, value, updated_at)
           VALUES (?, ?, ?, datetime('now'))
           ON CONFLICT(project, key) DO UPDATE SET value=excluded.value, updated_at=excluded.updated_at""",
        (project, key, value),
    )
    db.commit()
    db.close()
    print(f"[{project}] {key} = saved.")
def cmd_append(args):
    """Append to an existing context entry. Usage: ctx append <project> <key> <value>"""
    if len(args) < 3:
        print("Usage: ctx append <project> <key> <value>")
        sys.exit(1)
    project, key, new_value = args[0], args[1], " ".join(args[2:])
    db = get_db()
    existing = db.execute(
        "SELECT value FROM contexts WHERE project = ? AND key = ?", (project, key)
    ).fetchone()
    if existing:
        value = existing["value"] + "\n" + new_value
    else:
        value = new_value
    db.execute(
        """INSERT INTO contexts (project, key, value, updated_at)
           VALUES (?, ?, ?, datetime('now'))
           ON CONFLICT(project, key) DO UPDATE SET value=excluded.value, updated_at=excluded.updated_at""",
        (project, key, value),
    )
    db.commit()
    db.close()
    print(f"[{project}] {key} += appended.")
def cmd_shared(args):
    """Manage shared context. Usage: ctx shared get | ctx shared set <key> <value>"""
    if len(args) < 1:
        print("Usage: ctx shared get | ctx shared set <key> <value>")
        sys.exit(1)
    if args[0] == "get":
        db = get_db()
        rows = db.execute("SELECT key, value FROM shared ORDER BY key").fetchall()
        db.close()
        if rows:
            for row in rows:
                print(f"\n[{row['key']}]")
                print(row["value"])
        else:
            print("No shared context yet.")
    elif args[0] == "set":
        if len(args) < 3:
            print("Usage: ctx shared set <key> <value>")
            sys.exit(1)
        key, value = args[1], " ".join(args[2:])
        db = get_db()
        db.execute(
            """INSERT INTO shared (key, value, updated_at)
               VALUES (?, ?, datetime('now'))
               ON CONFLICT(key) DO UPDATE SET value=excluded.value, updated_at=excluded.updated_at""",
            (key, value),
        )
        db.commit()
        db.close()
        print(f"[shared] {key} = saved.")
    elif args[0] == "delete":
        if len(args) < 2:
            print("Usage: ctx shared delete <key>")
            sys.exit(1)
        db = get_db()
        db.execute("DELETE FROM shared WHERE key = ?", (args[1],))
        db.commit()
        db.close()
        print(f"[shared] {args[1]} deleted.")
    else:
        print("Usage: ctx shared get | ctx shared set <key> <value> | ctx shared delete <key>")
def cmd_summary(args):
    """Save a session summary. Usage: ctx summary <project> <text>"""
    if len(args) < 2:
        print("Usage: ctx summary <project> <summary text>")
        sys.exit(1)
    project, summary = args[0], " ".join(args[1:])
    db = get_db()
    db.execute(
        "INSERT INTO summaries (project, summary) VALUES (?, ?)", (project, summary)
    )
    # Keep last 20 summaries per project
    db.execute(
        """DELETE FROM summaries WHERE project = ? AND id NOT IN (
               SELECT id FROM summaries WHERE project = ? ORDER BY created_at DESC LIMIT 20
           )""",
        (project, project),
    )
    db.commit()
    db.close()
    print(f"[{project}] Summary saved.")
def cmd_history(args):
    """Show session history. Usage: ctx history <project> [limit]"""
    if len(args) < 1:
        print("Usage: ctx history <project> [limit]")
        sys.exit(1)
    project = args[0]
    try:
        limit = int(args[1]) if len(args) > 1 else 10
    except ValueError:
        print(f"Error: limit must be an integer, got '{args[1]}'")
        sys.exit(1)
    db = get_db()
    rows = db.execute(
        "SELECT summary, created_at FROM summaries WHERE project = ? ORDER BY created_at DESC LIMIT ?",
        (project, limit),
    ).fetchall()
    db.close()
    if rows:
        for row in reversed(rows):
            print(f"\n[{row['created_at']}]")
            print(row["summary"])
    else:
        print(f"No history for '{project}'.")
def cmd_search(args):
    """Full-text search across all contexts. Usage: ctx search <query>"""
    if len(args) < 1:
        print("Usage: ctx search <query>")
        sys.exit(1)
    query = " ".join(args)
    db = get_db()
    # Search project contexts (FTS5 MATCH can fail on malformed query syntax)
    try:
        results_ctx = db.execute(
            "SELECT project, key, value FROM contexts_fts WHERE contexts_fts MATCH ?",
            (query,),
        ).fetchall()
    except sqlite3.OperationalError:
        print(f"Invalid search query: '{query}' (FTS5 syntax error)")
        db.close()
        sys.exit(1)
    # Search shared contexts
    try:
        results_shared = db.execute(
            "SELECT key, value FROM shared_fts WHERE shared_fts MATCH ?", (query,)
        ).fetchall()
    except sqlite3.OperationalError:
        results_shared = []
    # Search summaries (simple LIKE since no FTS on summaries)
    results_sum = db.execute(
        "SELECT project, summary, created_at FROM summaries WHERE summary LIKE ?",
        (f"%{query}%",),
    ).fetchall()
    db.close()
    total = len(results_ctx) + len(results_shared) + len(results_sum)
    print(f"Found {total} result(s) for '{query}':\n")
    if results_shared:
        print("--- Shared ---")
        for row in results_shared:
            print(f" [{row['key']}] {row['value'][:100]}")
    if results_ctx:
        print("--- Project Contexts ---")
        for row in results_ctx:
            print(f" [{row['project']}:{row['key']}] {row['value'][:100]}")
    if results_sum:
        print("--- Summaries ---")
        for row in results_sum:
            print(f" [{row['project']} @ {row['created_at']}] {row['summary'][:100]}")


def cmd_list(args):
    """List all registered projects."""
    db = get_db()
    sessions = db.execute("SELECT * FROM sessions ORDER BY name").fetchall()
    # Also find projects with context but no session registration
    orphans = db.execute(
        """SELECT DISTINCT project FROM contexts
        WHERE project NOT IN (SELECT name FROM sessions)
        ORDER BY project"""
    ).fetchall()
    db.close()
    if sessions:
        print("Registered projects:")
        for s in sessions:
            ctx_count = _count_contexts(s["name"])
            print(f" {s['name']:25s} — {s['description']} ({ctx_count} entries)")
    if orphans:
        print("\nUnregistered (have context but no init):")
        for o in orphans:
            print(f" {o['project']}")
    if not sessions and not orphans:
        print("No projects yet. Use 'ctx init <name> <description>' to start.")


def cmd_delete(args):
    """Delete a project or specific key. Usage: ctx delete <project> [key]"""
    if len(args) < 1:
        print("Usage: ctx delete <project> [key]")
        sys.exit(1)
    project = args[0]
    db = get_db()
    if len(args) >= 2:
        # Delete a single context entry
        key = args[1]
        db.execute(
            "DELETE FROM contexts WHERE project = ? AND key = ?", (project, key)
        )
        db.commit()
        print(f"[{project}] {key} deleted.")
    else:
        # Delete the project and everything stored under it
        db.execute("DELETE FROM contexts WHERE project = ?", (project,))
        db.execute("DELETE FROM summaries WHERE project = ?", (project,))
        db.execute("DELETE FROM sessions WHERE name = ?", (project,))
        db.commit()
        print(f"Project '{project}' and all its data deleted.")
    db.close()


def cmd_export(args):
    """Export all data as JSON. Usage: ctx export"""
    db = get_db()
    data = {
        "sessions": [dict(r) for r in db.execute("SELECT * FROM sessions").fetchall()],
        "shared": [dict(r) for r in db.execute("SELECT * FROM shared").fetchall()],
        "contexts": [dict(r) for r in db.execute("SELECT * FROM contexts").fetchall()],
        "summaries": [dict(r) for r in db.execute("SELECT * FROM summaries").fetchall()],
    }
    db.close()
    print(json.dumps(data, indent=2, ensure_ascii=False))


def _count_contexts(project):
    db = get_db()
    row = db.execute(
        "SELECT COUNT(*) as c FROM contexts WHERE project = ?", (project,)
    ).fetchone()
    db.close()
    return row["c"]


# ─── Main ───────────────────────────────────────────────────────────────

COMMANDS = {
    "init": cmd_init,
    "get": cmd_get,
    "set": cmd_set,
    "append": cmd_append,
    "shared": cmd_shared,
    "summary": cmd_summary,
    "history": cmd_history,
    "search": cmd_search,
    "list": cmd_list,
    "delete": cmd_delete,
    "export": cmd_export,
}


def print_help():
    print("ctx — Cross-session context manager for Claude Code\n")
    print("Commands:")
    print("  init <project> <desc> [dir]   Register a new project")
    print("  get <project>                 Load full context (shared + project)")
    print("  set <project> <key> <value>   Set project context entry")
    print("  append <project> <key> <val>  Append to existing entry")
    print("  shared get|set|delete         Manage shared context")
    print("  summary <project> <text>      Save session work summary")
    print("  history <project> [limit]     Show session history")
    print("  search <query>                Full-text search across everything")
    print("  list                          List all projects")
    print("  delete <project> [key]        Delete project or entry")
    print("  export                        Export all data as JSON")


if __name__ == "__main__":
    init_db()
    if len(sys.argv) < 2 or sys.argv[1] in ("-h", "--help", "help"):
        print_help()
        sys.exit(0)
    cmd = sys.argv[1]
    if cmd not in COMMANDS:
        print(f"Unknown command: {cmd}")
        print_help()
        sys.exit(1)
    COMMANDS[cmd](sys.argv[2:])


@ -0,0 +1,27 @@
services:
  tempo:
    image: grafana/tempo:latest
    command: ["-config.file=/etc/tempo.yaml"]
    volumes:
      - ./tempo.yaml:/etc/tempo.yaml:ro
      - tempo-data:/var/tempo
    ports:
      - "4317:4317"  # OTLP gRPC
      - "4318:4318"  # OTLP HTTP
      - "3200:3200"  # Tempo query API

  grafana:
    image: grafana/grafana:latest
    environment:
      - GF_AUTH_ANONYMOUS_ENABLED=true
      - GF_AUTH_ANONYMOUS_ORG_ROLE=Admin
      - GF_AUTH_DISABLE_LOGIN_FORM=true
    volumes:
      - ./grafana-datasources.yaml:/etc/grafana/provisioning/datasources/datasources.yaml:ro
    ports:
      - "9715:3000"  # Grafana UI (project port convention)
    depends_on:
      - tempo

volumes:
  tempo-data:


@ -0,0 +1,11 @@
apiVersion: 1

datasources:
  - name: Tempo
    type: tempo
    access: proxy
    url: http://tempo:3200
    isDefault: true
    jsonData:
      tracesToLogsV2:
        datasourceUid: ''

docker/tempo/tempo.yaml Normal file

@ -0,0 +1,19 @@
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

docs/README.md Normal file

@ -0,0 +1,91 @@
---
title: "Documentation"
role: part
parent: null
order: 1
description: "Project documentation index"
---
# Agent Orchestrator Documentation
Agent Orchestrator (formerly BTerminal) is a multi-project AI agent orchestration dashboard built with Tauri 2.x, Svelte 5, and the Claude Agent SDK. It transforms a traditional terminal emulator into a mission control for running, monitoring, and coordinating multiple AI agent sessions across multiple codebases simultaneously.
The application has three major version milestones:
- **v1** — A single-file Python GTK3+VTE terminal emulator with Claude Code session management. Production-stable, still shipped as `bterminal`.
- **v2** — A ground-up rewrite using Tauri 2.x (Rust backend) + Svelte 5 (frontend). Multi-pane terminal with structured agent sessions, subagent tree visualization, session persistence, multi-machine relay support, 17 themes, and comprehensive packaging.
- **v3 (Mission Control)** — A further redesign on top of v2's codebase. Replaces the free-form pane grid with a project-group dashboard. Adds multi-agent orchestration (4 management roles), inter-agent messaging (btmsg), task boards (bttask), session anchors, health monitoring, FTS5 search, plugin system, Landlock sandboxing, secrets management, and 704 automated tests.
> **Important:** The `docs/` directory is the single source of truth for this project. Before making changes, consult the docs. After making changes, update the docs.
---
## Documentation Map
### Architecture & Design
| Document | What It Covers |
|----------|---------------|
| [architecture.md](architecture.md) | End-to-end system architecture: Rust backend, Svelte frontend, sidecar layer, data model, layout system, data flow, IPC patterns |
| [decisions.md](decisions.md) | Architecture decisions log: rationale and dates for all major design choices |
| [multi-machine.md](multi-machine.md) | Multi-machine relay architecture: bterminal-core extraction, bterminal-relay binary, RemoteManager, WebSocket protocol, reconnection |
### Subsystem Guides
| Document | What It Covers |
|----------|---------------|
| [sidecar.md](sidecar.md) | Sidecar process lifecycle, multi-provider runners (Claude/Codex/Ollama), env var stripping, CLI detection, NDJSON protocol |
| [orchestration.md](orchestration.md) | Multi-agent orchestration: btmsg messaging, bttask kanban, Tier 1/2 agent roles, wake scheduler, system prompts |
| [production.md](production.md) | Production hardening: sidecar supervisor, Landlock sandbox, FTS5 search, plugin system, secrets management, notifications, health monitoring, audit logging |
| [provider-adapter/](provider-adapter/) | Multi-provider adapter pattern: architecture decisions, coupling analysis, implementation progress |
### Implementation & Progress
| Document | What It Covers |
|----------|---------------|
| [phases.md](phases.md) | v2 implementation phases (1-7 + multi-machine A-D + profiles/skills) with checklists |
| [progress/v3.md](progress/v3.md) | v3 session-by-session progress log (Phases 1-10 + production hardening) |
| [progress/v2.md](progress/v2.md) | v2 session-by-session progress log (recent sessions) |
| [progress/v2-archive.md](progress/v2-archive.md) | Archived v2 progress (2026-03-05 to early 2026-03-06) |
### Research & Analysis
| Document | What It Covers |
|----------|---------------|
| [findings.md](findings.md) | All research: Claude Agent SDK, Tauri+xterm.js, terminal performance, adversarial review, provider coupling, codebase reuse, session anchors, multi-agent design, theme evolution, performance measurements |
### Release & Testing
| Document | What It Covers |
|----------|---------------|
| [release-notes.md](release-notes.md) | v3.0 release notes: feature summary, breaking changes, test coverage, known limitations |
| [e2e-testing.md](e2e-testing.md) | E2E testing facility: WebDriverIO fixtures, test mode, LLM judge, CI integration, troubleshooting |
---
## Quick Orientation
If you are new to this codebase, read the documents in this order:
1. **[architecture.md](architecture.md)** — Understand how the pieces fit together
2. **[decisions.md](decisions.md)** — Understand why things are built the way they are
3. **[sidecar.md](sidecar.md)** — Understand how agent sessions actually run
4. **[orchestration.md](orchestration.md)** — Understand multi-agent coordination
5. **[e2e-testing.md](e2e-testing.md)** — Understand how to test changes
For research context, read [findings.md](findings.md). For implementation history, see [phases.md](phases.md) and [progress/](progress/).
---
## Key Directories
| Path | Purpose |
|------|---------|
| `src-tauri/src/` | Rust backend: commands, SQLite, btmsg, bttask, search, secrets, plugins |
| `bterminal-core/` | Shared Rust crate: PtyManager, SidecarManager, EventSink trait, Landlock sandbox |
| `bterminal-relay/` | Standalone relay binary for remote machine support |
| `src/lib/` | Svelte 5 frontend: components, stores, adapters, utils, providers |
| `sidecar/` | Agent sidecar runners (Claude, Codex, Ollama) — compiled to ESM bundles |
| `tests/e2e/` | WebDriverIO E2E tests, fixtures, LLM judge |
| `ctx/` | Context manager CLI tool (SQLite-based, standalone) |
| `consult/` | Multi-model tribunal CLI (OpenRouter, standalone Python) |

docs/architecture.md Normal file

@ -0,0 +1,530 @@
# System Architecture
This document describes the end-to-end architecture of Agent Orchestrator — how the Rust backend, Svelte 5 frontend, and Node.js/Deno sidecar processes work together to provide a multi-project AI agent orchestration dashboard.
---
## High-Level Overview
Agent Orchestrator is a Tauri 2.x desktop application. Tauri provides a Rust backend process and a WebKit2GTK-based webview for the frontend. The application manages AI agent sessions by spawning sidecar child processes that communicate with AI provider APIs (Claude, Codex, Ollama).
```
┌────────────────────────────────────────────────────────────────┐
│                 Agent Orchestrator (Tauri 2.x)                 │
│                                                                │
│  ┌─────────────────┐    Tauri IPC    ┌────────────────────┐    │
│  │ WebView         │ ◄─────────────► │ Rust Backend       │    │
│  │ (Svelte 5)      │  invoke/listen  │                    │    │
│  │                 │                 │ ├── PtyManager     │    │
│  │ ├── ProjectGrid │                 │ ├── SidecarManager │    │
│  │ ├── AgentPane   │                 │ ├── SessionDb      │    │
│  │ ├── TerminalPane│                 │ ├── BtmsgDb        │    │
│  │ ├── StatusBar   │                 │ ├── SearchDb       │    │
│  │ └── Stores      │                 │ ├── SecretsManager │    │
│  └─────────────────┘                 │ ├── RemoteManager  │    │
│                                      │ └── FileWatchers   │    │
│                                      └────────────────────┘    │
│                                           │                    │
└───────────────────────────────────────────┼────────────────────┘
                                            │ stdio NDJSON
                                  ┌───────────────────┐
                                  │ Sidecar Processes │
                                  │ (Deno or Node.js) │
                                  │                   │
                                  │ claude-runner.mjs │
                                  │ codex-runner.mjs  │
                                  │ ollama-runner.mjs │
                                  └───────────────────┘
```
### Why Three Layers?
1. **Rust backend** — Manages OS-level resources (PTY processes, file watchers, SQLite databases) with memory safety and low overhead. Exposes everything to the frontend via Tauri IPC commands and events.
2. **Svelte 5 frontend** — Renders the UI with fine-grained reactivity (no VDOM). Svelte 5 runes (`$state`, `$derived`, `$effect`) provide signal-based reactivity comparable to Solid.js but with a larger ecosystem.
3. **Sidecar processes** — The Claude Agent SDK, OpenAI Codex SDK, and Ollama API are all JavaScript/TypeScript libraries. They cannot run in Rust or in the WebKit2GTK webview (no Node.js APIs). The sidecar layer bridges this gap: Rust spawns a JS process, communicates via stdio NDJSON, and forwards structured messages to the frontend.
---
## Rust Backend (`src-tauri/`)
The Rust backend is the central coordinator. It owns all OS resources and database connections.
### Cargo Workspace
The Rust code is organized as a Cargo workspace with three members:
```
.
├── Cargo.toml                  # Workspace root
├── bterminal-core/             # Shared crate
│   └── src/
│       ├── lib.rs
│       ├── pty.rs              # PtyManager (portable-pty)
│       ├── sidecar.rs          # SidecarManager (multi-provider)
│       ├── supervisor.rs       # SidecarSupervisor (crash recovery)
│       ├── sandbox.rs          # Landlock sandbox
│       └── event.rs            # EventSink trait
├── bterminal-relay/            # Remote machine relay
│   └── src/main.rs             # WebSocket server + token auth
└── src-tauri/                  # Tauri application
    └── src/
        ├── lib.rs              # AppState + setup + handler registration
        ├── commands/           # 16 domain command modules
        ├── btmsg.rs            # Inter-agent messaging (SQLite)
        ├── bttask.rs           # Task board (SQLite, shared btmsg.db)
        ├── search.rs           # FTS5 full-text search
        ├── secrets.rs          # System keyring (libsecret)
        ├── plugins.rs          # Plugin discovery
        ├── notifications.rs    # Desktop notifications
        ├── session/            # SessionDb (sessions, layout, settings, agents, metrics, anchors)
        ├── remote.rs           # RemoteManager (WebSocket client)
        ├── ctx.rs              # Read-only ctx database access
        ├── memora.rs           # Read-only Memora database access
        ├── telemetry.rs        # OpenTelemetry tracing
        ├── groups.rs           # Project groups config
        ├── watcher.rs          # File watcher (notify crate)
        ├── fs_watcher.rs       # Per-project filesystem watcher (inotify)
        ├── event_sink.rs       # TauriEventSink implementation
        ├── pty.rs              # Thin re-export from bterminal-core
        └── sidecar.rs          # Thin re-export from bterminal-core
```
### Why a Workspace?
The `bterminal-core` crate exists so that both the Tauri application and the standalone `bterminal-relay` binary can share PtyManager and SidecarManager code. The `EventSink` trait abstracts event emission — TauriEventSink wraps Tauri's AppHandle, while the relay uses a WebSocket-based EventSink.
### AppState
All backend state lives in `AppState`, initialized during Tauri setup:
```rust
pub struct AppState {
    pub pty_manager: Mutex<PtyManager>,
    pub sidecar_manager: Mutex<SidecarManager>,
    pub session_db: Mutex<SessionDb>,
    pub remote_manager: Mutex<RemoteManager>,
    pub telemetry: Option<TelemetryGuard>,
}
```
### SQLite Databases
The backend manages two SQLite databases, both in WAL mode with a 5-second busy timeout for concurrent access:
| Database | Location | Purpose |
|----------|----------|---------|
| `sessions.db` | `~/.local/share/bterminal/` | Sessions, layout, settings, agent state, metrics, anchors |
| `btmsg.db` | `~/.local/share/bterminal/` | Inter-agent messages, tasks, agents registry, audit log |
WAL checkpoints run every 5 minutes via a background tokio task to prevent unbounded WAL growth.
All queries use **named column access** (`row.get("column_name")`) — never positional indices. Rust structs use `#[serde(rename_all = "camelCase")]` so TypeScript interfaces receive camelCase field names on the wire.
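As a rough illustration of these settings from the point of view of an external client (for example one of the Python CLIs), the sketch below opens a database in WAL mode with a 5-second busy timeout and reads a row by column name. The helper name `open_db`, the demo table, and the stored value are hypothetical; the real schema and the Rust-side checkpoint task are not shown here.

```python
import os
import sqlite3
import tempfile


def open_db(path: str) -> sqlite3.Connection:
    """Open a SQLite database in WAL mode with a 5-second busy timeout."""
    # timeout is given in seconds; waiting writers retry for up to 5s
    conn = sqlite3.connect(path, timeout=5.0)
    # sqlite3.Row allows access by column name instead of position
    conn.row_factory = sqlite3.Row
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


# Demo against a throwaway file (WAL is not available for in-memory databases).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = open_db(demo_path)
conn.execute("CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO settings (key, value) VALUES (?, ?)", ("theme", "mocha"))
conn.commit()

journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
row = conn.execute("SELECT key, value FROM settings").fetchone()
theme = row["value"]  # named access rather than row[1]

# Manual checkpoint, similar in spirit to a periodic background checkpoint
conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
conn.close()
```

Named access keeps queries robust when columns are added or reordered, which is presumably why positional indices are avoided.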
### Command Modules
Tauri commands are organized into 16 domain modules under `commands/`:
| Module | Commands | Purpose |
|--------|----------|---------|
| `pty` | spawn, write, resize, kill | Terminal management |
| `agent` | query, stop, ready, restart | Agent session lifecycle |
| `session` | session CRUD, layout, settings | Session persistence |
| `persistence` | agent state, messages | Agent session continuity |
| `knowledge` | ctx, memora queries | External knowledge bases |
| `claude` | profiles, skills | Claude-specific features |
| `groups` | load, save | Project group config |
| `files` | list_directory, read/write file | File browser |
| `watcher` | start, stop | File change monitoring |
| `remote` | 12 commands | Remote machine management |
| `bttask` | list, create, update, delete, comments | Task board |
| `search` | init, search, rebuild, index | FTS5 search |
| `secrets` | store, get, delete, list, has_keyring | Secrets management |
| `plugins` | discover, read_file | Plugin discovery |
| `notifications` | send_desktop | OS notifications |
| `misc` | test_mode, frontend_log | Utilities |
---
## Svelte 5 Frontend (`src/`)
The frontend uses Svelte 5 with runes for reactive state management. The UI follows a VSCode-inspired layout with a left icon rail, expandable drawer, project grid, and status bar.
### Component Hierarchy
```
App.svelte                                 [Root: VSCode-style layout]
├── CommandPalette.svelte                  [Ctrl+K overlay, 18+ commands]
├── SearchOverlay.svelte                   [Ctrl+Shift+F, FTS5 Spotlight-style]
├── NotificationCenter.svelte              [Bell icon + dropdown]
├── GlobalTabBar.svelte                    [Left icon rail, 2.75rem wide]
├── [Sidebar Panel]                        [Expandable drawer, max 50%]
│   └── SettingsTab.svelte                 [Global settings + group/project CRUD]
├── ProjectGrid.svelte                     [Flex + scroll-snap, adaptive count]
│   └── ProjectBox.svelte                  [Per-project container, 11 tab types]
│       ├── ProjectHeader.svelte           [Icon + name + status + badges]
│       ├── AgentSession.svelte            [Main Claude session wrapper]
│       │   ├── AgentPane.svelte           [Structured message rendering]
│       │   └── TeamAgentsPanel.svelte     [Tier 1 subagent cards]
│       ├── TerminalTabs.svelte            [Shell/SSH/agent-preview tabs]
│       │   ├── TerminalPane.svelte        [xterm.js + Canvas addon]
│       │   └── AgentPreviewPane.svelte    [Read-only agent activity]
│       ├── DocsTab.svelte                 [Markdown file browser]
│       ├── ContextTab.svelte              [LLM context visualization]
│       ├── FilesTab.svelte                [Directory tree + CodeMirror editor]
│       ├── SshTab.svelte                  [SSH connection manager]
│       ├── MemoriesTab.svelte             [Memora database viewer]
│       ├── MetricsPanel.svelte            [Health + history sparklines]
│       ├── TaskBoardTab.svelte            [Kanban board, Manager only]
│       ├── ArchitectureTab.svelte         [PlantUML viewer, Architect only]
│       └── TestingTab.svelte              [Selenium/test files, Tester only]
└── StatusBar.svelte                       [Agent counts, burn rate, attention queue]
```
### Stores (Svelte 5 Runes)
All store files use the `.svelte.ts` extension — this is required for Svelte 5 runes (`$state`, `$derived`, `$effect`). Files with plain `.ts` extension will compile but fail at runtime with "rune_outside_svelte".
| Store | Purpose |
|-------|---------|
| `workspace.svelte.ts` | Project groups, active group, tabs, focus |
| `agents.svelte.ts` | Agent sessions, messages, cost, parent/child hierarchy |
| `health.svelte.ts` | Per-project health tracking, attention scoring, burn rate |
| `conflicts.svelte.ts` | File overlap + external write detection |
| `anchors.svelte.ts` | Session anchor management (auto/pinned/promoted) |
| `notifications.svelte.ts` | Toast + history (6 types, unread badge) |
| `plugins.svelte.ts` | Plugin command registry, event bus |
| `theme.svelte.ts` | 17 themes, font restoration |
| `machines.svelte.ts` | Remote machine state |
| `wake-scheduler.svelte.ts` | Manager auto-wake (3 strategies, per-manager timers) |
### Adapters (IPC Bridge Layer)
Adapters wrap Tauri `invoke()` calls and `listen()` event subscriptions. They isolate the frontend from IPC details and provide typed TypeScript interfaces.
| Adapter | Backend Module | Purpose |
|---------|---------------|---------|
| `agent-bridge.ts` | sidecar + commands/agent | Agent query/stop/restart |
| `pty-bridge.ts` | pty + commands/pty | Terminal spawn/write/resize |
| `claude-messages.ts` | — (frontend-only) | Parse Claude SDK NDJSON → AgentMessage |
| `codex-messages.ts` | — (frontend-only) | Parse Codex ThreadEvents → AgentMessage |
| `ollama-messages.ts` | — (frontend-only) | Parse Ollama chunks → AgentMessage |
| `message-adapters.ts` | — (frontend-only) | Provider registry for message parsers |
| `provider-bridge.ts` | commands/claude | Generic provider bridge (profiles, skills) |
| `btmsg-bridge.ts` | btmsg | Inter-agent messaging |
| `bttask-bridge.ts` | bttask | Task board operations |
| `groups-bridge.ts` | groups | Group config load/save |
| `session-bridge.ts` | session | Session/layout persistence |
| `settings-bridge.ts` | session/settings | Key-value settings |
| `files-bridge.ts` | commands/files | File browser operations |
| `search-bridge.ts` | search | FTS5 search |
| `secrets-bridge.ts` | secrets | System keyring |
| `anchors-bridge.ts` | session/anchors | Session anchor CRUD |
| `remote-bridge.ts` | remote | Remote machine management |
| `ssh-bridge.ts` | session/ssh | SSH session CRUD |
| `ctx-bridge.ts` | ctx | Context database queries |
| `memora-bridge.ts` | memora | Memora database queries |
| `fs-watcher-bridge.ts` | fs_watcher | Filesystem change events |
| `audit-bridge.ts` | btmsg (audit_log) | Audit log queries |
| `telemetry-bridge.ts` | telemetry | Frontend → Rust tracing |
| `notifications-bridge.ts` | notifications | Desktop notification trigger |
| `plugins-bridge.ts` | plugins | Plugin discovery |
### Agent Dispatcher
The agent dispatcher (`agent-dispatcher.ts`, ~260 lines) is the central router between sidecar events and the agent store. When the Rust backend emits a `sidecar-message` Tauri event, the dispatcher:
1. Looks up the provider for the session (via `sessionProviderMap`)
2. Routes the raw message through the appropriate adapter (claude-messages.ts, codex-messages.ts, or ollama-messages.ts) via `message-adapters.ts`
3. Feeds the resulting `AgentMessage[]` into the agent store
4. Handles side effects: subagent pane spawning, session persistence, auto-anchoring, worktree detection, health tracking, conflict recording
The dispatcher delegates to four extracted utility modules:
- `utils/session-persistence.ts` — session-project maps, persistSessionForProject
- `utils/subagent-router.ts` — spawn + route subagent panes
- `utils/auto-anchoring.ts` — triggerAutoAnchor on first compaction event
- `utils/worktree-detection.ts` — detectWorktreeFromCwd pure function
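
The routing steps above can be sketched in a few lines. This is an illustration of the registry pattern only, written in Python rather than the project's TypeScript; the parser functions, the message shape, and the fallback to `"claude"` for unknown sessions are assumptions, not the actual `agent-dispatcher.ts` logic.

```python
from typing import Any, Callable

# Hypothetical normalized message shape; the real AgentMessage type is not shown here.
AgentMessage = dict[str, Any]
Parser = Callable[[dict[str, Any]], list[AgentMessage]]


def parse_claude(raw: dict[str, Any]) -> list[AgentMessage]:
    # Placeholder parser: wraps the raw payload without interpreting it
    return [{"provider": "claude", "payload": raw}]


def parse_ollama(raw: dict[str, Any]) -> list[AgentMessage]:
    # Placeholder parser, same shape as above
    return [{"provider": "ollama", "payload": raw}]


# Provider registry: provider name -> parser (stands in for message-adapters.ts)
PARSERS: dict[str, Parser] = {"claude": parse_claude, "ollama": parse_ollama}

session_provider_map: dict[str, str] = {}      # session id -> provider name
store: dict[str, list[AgentMessage]] = {}      # stands in for the agent store


def dispatch(session_id: str, raw: dict[str, Any]) -> int:
    """Route one raw sidecar message to its parser and append the result."""
    # Assumption: sessions with no recorded provider fall back to "claude"
    provider = session_provider_map.get(session_id, "claude")
    messages = PARSERS[provider](raw)
    store.setdefault(session_id, []).extend(messages)
    return len(messages)


session_provider_map["s1"] = "ollama"
dispatch("s1", {"text": "hello"})
dispatch("s2", {"text": "no provider recorded"})
```

Keeping the parsers behind a registry means a new provider only needs one new entry; the dispatcher itself does not change.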
---
## Sidecar Layer (`sidecar/`)
See [sidecar.md](sidecar.md) for the full sidecar architecture. In brief:
- Each AI provider has its own runner file (e.g., `claude-runner.ts`) compiled to an ESM bundle (`claude-runner.mjs`) by esbuild
- Rust's SidecarManager spawns the appropriate runner based on the `provider` field in AgentQueryOptions
- Communication uses stdio NDJSON — one JSON object per line, newline-delimited
- Deno is preferred (faster startup), Node.js is the fallback
- The Claude runner uses `@anthropic-ai/claude-agent-sdk` query() internally
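
A minimal sketch of the NDJSON framing described above: each message is serialized as one JSON object followed by a newline, and the reader parses one line at a time. The message fields (`type`, `sessionId`, `prompt`) are illustrative assumptions, not the actual sidecar message schema.

```python
import io
import json
from typing import Any, Iterator, TextIO


def write_message(stream: TextIO, message: dict[str, Any]) -> None:
    """Write one message as a single NDJSON line."""
    # json.dumps escapes newlines inside strings, so the object stays on one line
    stream.write(json.dumps(message) + "\n")
    stream.flush()


def read_messages(stream: TextIO) -> Iterator[dict[str, Any]]:
    """Yield one parsed object per non-empty line."""
    for line in stream:
        line = line.strip()
        if not line:
            continue
        yield json.loads(line)


# An in-memory buffer stands in for the sidecar's stdin/stdout pipes
buffer = io.StringIO()
write_message(buffer, {"type": "query", "sessionId": "s1", "prompt": "line one\nline two"})
write_message(buffer, {"type": "stop", "sessionId": "s1"})
buffer.seek(0)
decoded = list(read_messages(buffer))
```

Because embedded newlines are escaped by the JSON encoder, a multi-line prompt still occupies exactly one line on the wire.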
---
## Data Flow: Agent Query Lifecycle
Here is the complete path of a user prompt through the system:
```
1. User types prompt in AgentPane
2. AgentPane calls agentBridge.queryAgent(options)
3. agent-bridge.ts invokes Tauri command 'agent_query'
4. Rust agent_query handler calls SidecarManager.query()
5. SidecarManager resolves provider runner (e.g., claude-runner.mjs)
6. SidecarManager writes QueryMessage as NDJSON to sidecar stdin
7. Sidecar runner calls provider SDK (e.g., Claude Agent SDK query())
8. Provider SDK streams responses
9. Runner forwards each response as NDJSON to stdout
10. SidecarManager reads stdout line-by-line
11. SidecarManager emits Tauri event 'sidecar-message' with sessionId + data
12. Frontend agent-dispatcher.ts receives event
13. Dispatcher routes through message-adapters.ts → provider-specific parser
14. Parser converts to AgentMessage[]
15. Dispatcher feeds messages into agents.svelte.ts store
16. AgentPane reactively re-renders via $derived bindings
```
### Session Stop Flow
```
1. User clicks Stop button in AgentPane
2. AgentPane calls agentBridge.stopAgent(sessionId)
3. agent-bridge.ts invokes Tauri command 'agent_stop'
4. Rust handler calls SidecarManager.stop(sessionId)
5. SidecarManager writes StopMessage to sidecar stdin
6. Runner calls AbortController.abort() on the SDK query
7. SDK terminates the Claude subprocess
8. Runner emits final status message, then closes
```
---
## Configuration
### Project Groups (`~/.config/bterminal/groups.json`)
Human-editable JSON file defining project groups and their projects. Loaded at startup by `groups.rs`. Not hot-reloaded — changes require app restart or group switch.
### SQLite Settings (`sessions.db` → `settings` table)
Key-value store for user preferences: theme, fonts, shell, CWD, provider settings. Accessed via `settings-bridge.ts` → `settings_get`/`settings_set` Tauri commands.
### Environment Variables
| Variable | Purpose |
|----------|---------|
| `BTERMINAL_TEST` | Enables test mode (disables watchers, wake scheduler) |
| `BTERMINAL_TEST_DATA_DIR` | Redirects SQLite database storage |
| `BTERMINAL_TEST_CONFIG_DIR` | Redirects groups.json config |
| `BTERMINAL_OTLP_ENDPOINT` | Enables OpenTelemetry OTLP export |
---
## Data Model
### Project Group Config (`~/.config/bterminal/groups.json`)
Human-editable JSON file defining workspaces. Each group contains up to 5 projects. Loaded at startup by `groups.rs`, not hot-reloaded.
```jsonc
{
  "version": 1,
  "groups": [
    {
      "id": "work-ai",
      "name": "AI Projects",
      "projects": [
        {
          "id": "bterminal",
          "name": "BTerminal",
          "identifier": "bterminal",
          "description": "Terminal emulator with Claude integration",
          "icon": "\uf120",
          "cwd": "/home/user/code/BTerminal",
          "profile": "default",
          "enabled": true
        }
      ]
    }
  ],
  "activeGroupId": "work-ai"
}
```
### TypeScript Types (`src/lib/types/groups.ts`)
```typescript
export interface ProjectConfig {
  id: string;
  name: string;
  identifier: string;
  description: string;
  icon: string;
  cwd: string;
  profile: string;
  enabled: boolean;
}

export interface GroupConfig {
  id: string;
  name: string;
  projects: ProjectConfig[]; // max 5
}

export interface GroupsFile {
  version: number;
  groups: GroupConfig[];
  activeGroupId: string;
}
```
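
Since `groups.json` is edited by hand, a small check before loading can catch common mistakes. The sketch below is a hypothetical validator, not part of the project: it enforces the 5-project limit from the type comment and, as an assumption, requires `activeGroupId` to name an existing group.

```python
import json

MAX_PROJECTS_PER_GROUP = 5  # "max 5" from the GroupConfig comment


def validate_groups(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    seen_ids = set()
    for group in data.get("groups", []):
        group_id = group.get("id", "<missing id>")
        if group_id in seen_ids:
            errors.append(f"duplicate group id: {group_id}")
        seen_ids.add(group_id)
        if len(group.get("projects", [])) > MAX_PROJECTS_PER_GROUP:
            errors.append(
                f"group {group_id} has more than {MAX_PROJECTS_PER_GROUP} projects"
            )
    # Assumption: activeGroupId should point at a group that exists
    if data.get("activeGroupId") not in seen_ids:
        errors.append("activeGroupId does not match any group")
    return errors


sample = json.loads("""
{
  "version": 1,
  "groups": [
    {"id": "work-ai", "name": "AI Projects", "projects": [
      {"id": "bterminal", "name": "BTerminal", "cwd": "/home/user/code/BTerminal"}
    ]}
  ],
  "activeGroupId": "work-ai"
}
""")
problems = validate_groups(sample)
```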
### SQLite Schema (v3 Additions)
Beyond the core `sessions` and `settings` tables, v3 added project-scoped agent persistence:
```sql
ALTER TABLE sessions ADD COLUMN project_id TEXT DEFAULT '';

CREATE TABLE IF NOT EXISTS agent_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    project_id TEXT NOT NULL,
    sdk_session_id TEXT,
    message_type TEXT NOT NULL,
    content TEXT NOT NULL,
    parent_id TEXT,
    created_at INTEGER NOT NULL,
    FOREIGN KEY (session_id) REFERENCES sessions(id) ON DELETE CASCADE
);

CREATE TABLE IF NOT EXISTS project_agent_state (
    project_id TEXT PRIMARY KEY,
    last_session_id TEXT NOT NULL,
    sdk_session_id TEXT,
    status TEXT NOT NULL,
    cost_usd REAL DEFAULT 0,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    last_prompt TEXT,
    updated_at INTEGER NOT NULL
);
```
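
One detail worth keeping in mind with the `ON DELETE CASCADE` clause: SQLite enforces foreign keys only on connections that enable them. The sketch below demonstrates this with a trimmed copy of the schema. The minimal `sessions` table is an assumption (its real columns are not shown in this section), and whether the Rust backend sets this pragma is not stated here.

```python
import sqlite3

SCHEMA = """
CREATE TABLE sessions (id TEXT PRIMARY KEY);  -- assumed minimal shape
CREATE TABLE agent_messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL,
    project_id TEXT NOT NULL,
    message_type TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at INTEGER NOT NULL,
    FOREIGN KEY (session_id) REFERENCES sessions(id) ON DELETE CASCADE
);
"""


def remaining_after_delete(enforce_foreign_keys: bool) -> int:
    """Delete a session and report how many of its messages are left."""
    conn = sqlite3.connect(":memory:")
    if enforce_foreign_keys:
        # Must be set per connection, outside of a transaction
        conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO sessions (id) VALUES ('s1')")
    conn.execute(
        "INSERT INTO agent_messages "
        "(session_id, project_id, message_type, content, created_at) "
        "VALUES ('s1', 'p1', 'text', 'hi', 0)"
    )
    conn.execute("DELETE FROM sessions WHERE id = 's1'")
    count = conn.execute("SELECT COUNT(*) FROM agent_messages").fetchone()[0]
    conn.close()
    return count


with_cascade = remaining_after_delete(enforce_foreign_keys=True)
```

With enforcement enabled, deleting the session removes its messages as well; without the pragma, a default SQLite build would leave them behind.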
---
## Layout System
### Project Grid (Flexbox + scroll-snap)
Projects are arranged horizontally in a flex container with CSS scroll-snap for clean project-to-project scrolling:
```css
.project-grid {
  display: flex;
  gap: 4px;
  height: 100%;
  overflow-x: auto;
  scroll-snap-type: x mandatory;
}

.project-box {
  flex: 0 0 calc((100% - (N-1) * 4px) / N);
  scroll-snap-align: start;
  min-width: 480px;
}
```
N is computed from viewport width: `Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520)))`
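
The formula is easy to check by hand. The sketch below ports it to Python and also renders the `flex` basis for a given N; the function names are hypothetical, and the 4px gap is taken from the CSS above.

```python
def visible_project_count(project_count: int, container_width: float) -> int:
    """Port of Math.min(projects.length, Math.max(1, Math.floor(width / 520)))."""
    return min(project_count, max(1, int(container_width // 520)))


def flex_basis_css(n: int, gap_px: int = 4) -> str:
    """Render the flex-basis expression for n visible projects."""
    return f"calc((100% - {(n - 1) * gap_px}px) / {n})"


n_at_1920 = visible_project_count(5, 1920)   # floor(1920 / 520) = 3
n_at_5120 = visible_project_count(5, 5120)   # floor gives 9, capped at 5 projects
n_narrow = visible_project_count(5, 400)     # never drops below 1
basis = flex_basis_css(n_at_1920)
```

The `max(1, ...)` guard keeps one project visible on very narrow containers, and the `min` cap avoids reserving slots for projects that do not exist.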
### Project Box Internal Layout
Each project box uses a CSS grid with 4 rows:
```
┌─ ProjectHeader (auto) ─────────────────┐
├─────────────────────┬──────────────────┤
│ AgentSession        │ TeamAgentsPanel  │
│ (flex: 1)           │ (240px/overlay)  │
├─────────────────────┴──────────────────┤
│ [Tab1] [Tab2] [+]        TabBar auto   │
├────────────────────────────────────────┤
│ Terminal content (xterm or scrollback) │
└────────────────────────────────────────┘
```
Team panel: inline above a 2560px viewport width (240px wide), overlay at 2560px and below. Collapsed when no subagents are running.
### Responsive Breakpoints
| Viewport Width | Visible Projects | Team Panel Mode |
|---------------|-----------------|-----------------|
| 5120px+ | 5 | inline 240px |
| 3840px | 4 | inline 200px |
| 2560px | 3 | overlay |
| 1920px | 3 | overlay |
| <1600px | 1 + project tabs | overlay |
### xterm.js Budget: 4 Active Instances
WebKit2GTK OOMs at ~5 simultaneous xterm.js instances. The budget system manages this:
| State | xterm.js Instance? | Memory |
|-------|--------------------|--------|
| Active-Focused | Yes | ~20MB |
| Active-Background | Yes (if budget allows) | ~20MB |
| Suspended | No (HTML pre scrollback) | ~200KB |
| Uninitialized | No (placeholder) | 0 |
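On focus, when the budget is already used up: serialize the scrollback of the least-recently-used xterm, destroy that instance, create a new instance for the focused tab, and reconnect it to its PTY (the PTY itself stays alive while suspended). A suspend/resume cycle takes under 50ms.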
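
The budget behaves like a small least-recently-used cache. The sketch below models only the bookkeeping, in Python rather than the frontend's TypeScript: the class name, the list-of-lines scrollback, and the string serialization are assumptions, and no real xterm.js or PTY calls are involved.

```python
from collections import OrderedDict


class TerminalBudget:
    """Tracks which tabs hold a live terminal instance (at most max_active)."""

    def __init__(self, max_active: int = 4):
        self.max_active = max_active
        # tab id -> live scrollback lines, ordered from least to most recently focused
        self.active = OrderedDict()
        # tab id -> serialized scrollback of a suspended tab
        self.suspended = {}

    def focus(self, tab_id: str) -> None:
        if tab_id in self.active:
            # Already live: just mark it as most recently used
            self.active.move_to_end(tab_id)
            return
        if len(self.active) >= self.max_active:
            # Suspend the least recently used tab: keep its text, drop the instance
            victim, lines = self.active.popitem(last=False)
            self.suspended[victim] = "\n".join(lines)
        # Resume from serialized scrollback if there is any
        restored = self.suspended.pop(tab_id, "")
        self.active[tab_id] = restored.split("\n") if restored else []


budget = TerminalBudget()
for tab in ["t1", "t2", "t3", "t4"]:
    budget.focus(tab)
budget.active["t1"].append("$ ls")
budget.focus("t5")  # budget full: t1 is least recent and gets suspended
budget.focus("t1")  # t2 is suspended next, t1 is restored with its scrollback
```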
### Project Accent Colors
Each project slot gets a distinct Catppuccin accent color for visual distinction:
| Slot | Color | CSS Variable |
|------|-------|-------------|
| 1 | Blue | `var(--ctp-blue)` |
| 2 | Green | `var(--ctp-green)` |
| 3 | Mauve | `var(--ctp-mauve)` |
| 4 | Peach | `var(--ctp-peach)` |
| 5 | Pink | `var(--ctp-pink)` |
Applied to border tint and header accent via `var(--accent)` CSS custom property set per ProjectBox.
---
## Keyboard Shortcuts
A layered shortcut system prevents conflicts between terminal input, project and workspace navigation, and app-level commands:
| Shortcut | Action | Layer |
|----------|--------|-------|
| Ctrl+K | Command palette | App |
| Ctrl+G | Switch group (palette filtered) | App |
| Ctrl+1..5 | Focus project by index | App |
| Alt+1..4 | Switch sidebar tab + open drawer | App |
| Ctrl+B | Toggle sidebar open/closed | App |
| Ctrl+, | Toggle settings panel | App |
| Escape | Close sidebar drawer | App |
| Ctrl+Shift+F | FTS5 search overlay | App |
| Ctrl+N | New terminal in focused project | Workspace |
| Ctrl+Shift+N | New agent query | Workspace |
| Ctrl+Tab | Next terminal tab | Project |
| Ctrl+W | Close terminal tab | Project |
| Ctrl+Shift+C/V | Copy/paste in terminal | Terminal |
Terminal layer captures raw keys only when focused. App layer has highest priority.
---
## Key Constraints
1. **WebKit2GTK has no WebGL** — xterm.js must use the Canvas addon explicitly. Maximum 4 active xterm.js instances to avoid OOM.
2. **Svelte 5 runes require `.svelte.ts`** — Store files using `$state`/`$derived` must have the `.svelte.ts` extension. The compiler silently accepts `.ts` but runes fail at runtime.
3. **Single shared sidecar** — All agent sessions share one SidecarManager. Per-project isolation is via `cwd`, `claude_config_dir`, and `session_id` routing. Per-project sidecar pools deferred to v3.1.
4. **SQLite WAL mode** — Both databases use WAL with 5s busy_timeout for concurrent access from Rust backend + Python CLIs (btmsg/bttask).
5. **camelCase wire format** — Rust uses `#[serde(rename_all = "camelCase")]`. TypeScript interfaces must match. This was a source of bugs during development (see [findings.md](findings.md) for context).

docs/decisions.md Normal file

@ -0,0 +1,51 @@
# Architecture Decisions Log
This document records significant architecture decisions made during the development of Agent Orchestrator. Each entry captures the decision, its rationale, and the date it was made. Decisions are listed chronologically within each category.
---
## Data & Configuration
| Decision | Rationale | Date |
|----------|-----------|------|
| JSON for groups config, SQLite for session state | JSON is human-editable, shareable, version-controllable. SQLite for ephemeral runtime state. Load at startup only — no hot-reload, no split-brain risk. | 2026-03-07 |
| btmsg/bttask shared SQLite DB | Both CLI tools share `~/.local/share/bterminal/btmsg.db`. Single DB simplifies deployment — agents already have the path. Read-only for non-Manager roles via CLI permissions. | 2026-03-11 |
## Layout & UI
| Decision | Rationale | Date |
|----------|-----------|------|
| Adaptive project count from viewport width | `Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520)))` — 5 at 5120px, 3 at 1920px, scroll-snap for overflow. min-width 480px. Better than forcing 5 at all sizes. | 2026-03-07 |
| Flexbox + scroll-snap over CSS Grid | Allows horizontal scroll on narrow screens. Scroll-snap gives clean project-to-project scrolling. | 2026-03-07 |
| Team panel: inline >2560px, overlay <2560px | Adapts to available space. Collapsed when no subagents running. Saves ~240px on smaller screens. | 2026-03-07 |
| VSCode-style left sidebar (replaces top tab bar) | Vertical icon rail (2.75rem) + expandable drawer (max 50%) + always-visible workspace. Settings is a regular tab, not a special drawer. ProjectGrid always visible. Ctrl+B toggles. | 2026-03-08 |
| CSS relative units (rule 18) | rem/em for all layout CSS. Pixels only for icon sizes, borders, box shadows. Exception: `--ui-font-size`/`--term-font-size` store px for xterm.js API. | 2026-03-08 |
| Project accent colors from Catppuccin palette | Visual distinction: blue/green/mauve/peach/pink per slot 1-5. Applied to border + header tint via `var(--accent)`. | 2026-03-07 |
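
The adaptive project count formula from the table above can be checked with a small sketch. The function and constant names are hypothetical; only the expression itself comes from the decision entry.

```typescript
// Divisor used in the documented formula (px per project slot).
const PROJECT_SLOT_WIDTH_PX = 520;

// Hypothetical wrapper around the documented expression:
// Math.min(projects.length, Math.max(1, Math.floor(containerWidth / 520)))
function visibleProjectCount(projectCount: number, containerWidth: number): number {
  const slotsThatFit = Math.floor(containerWidth / PROJECT_SLOT_WIDTH_PX);
  return Math.min(projectCount, Math.max(1, slotsThatFit));
}

// With 5 projects configured:
const onUltrawide = visibleProjectCount(5, 5120); // 9 slots fit, capped at 5 projects
const onFullHd = visibleProjectCount(5, 1920);    // floor(1920 / 520) = 3
const onNarrow = visibleProjectCount(5, 400);     // floor is 0, clamped up to 1
```

The inner `Math.max(1, ...)` keeps at least one project visible on very narrow containers, and the outer `Math.min` avoids reserving empty slots when a group has fewer projects than would fit.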
## Agent Architecture
| Decision | Rationale | Date |
|----------|-----------|------|
| Single shared sidecar (v3.0) | Existing multiplexed protocol handles concurrent sessions. Per-project pool deferred to v3.1 if crash isolation needed. Saves ~200MB RAM. | 2026-03-07 |
| xterm budget: 4 active, unlimited suspended | WebKit2GTK OOM at ~5 instances. Serialize scrollback to text buffer, destroy xterm, recreate on focus. PTY stays alive. Suspend/resume < 50ms. | 2026-03-07 |
| AgentPane splits into AgentSession + TeamAgentsPanel | Team agents shown inline in right panel, not as separate panes. Saves xterm/pane slots. | 2026-03-07 |
| Tier 1 agents as ProjectBoxes via `agentToProject()` | Agents render as full ProjectBoxes (not separate UI). `getAllWorkItems()` merges agents + projects. Unified rendering = less code, same capabilities. | 2026-03-11 |
| `extra_env` 5-layer passthrough for BTMSG_AGENT_ID | TS → Rust AgentQueryOptions → NDJSON → JS runner → SDK env. Minimal surface — only agent projects get env injection. | 2026-03-11 |
| Periodic system prompt re-injection (1 hour) | LLM context degrades over long sessions. 1-hour timer re-sends role/tools reminder when agent is idle. `autoPrompt`/`onautopromptconsumed` callback pattern. | 2026-03-11 |
| Role-specific tabs via conditional rendering | Manager=Tasks, Architect=Arch, Tester=Selenium+Tests, Reviewer=Tasks. PERSISTED-LAZY pattern (mount on first activation). Conditional on `isAgent && agentRole`. | 2026-03-11 |
| PlantUML via plantuml.com server (~h hex encoding) | Avoids Java dependency. Hex encoding simpler than deflate+base64. Works with free tier. Trade-off: requires internet. | 2026-03-11 |
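
The `~h` hex encoding mentioned in the PlantUML entry can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name is hypothetical, and the base URL `https://www.plantuml.com/plantuml/` is an assumption about the public server's address.

```typescript
// Hypothetical helper: builds a PlantUML server URL using the "~h" hex form,
// where the diagram source is sent as the hex of its UTF-8 bytes.
function plantUmlHexUrl(source: string, format: "svg" | "png" = "svg"): string {
  const bytes = new TextEncoder().encode(source);
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  // Assumed base URL of the public PlantUML server.
  return `https://www.plantuml.com/plantuml/${format}/~h${hex}`;
}

const diagramUrl = plantUmlHexUrl("@startuml\nA -> B\n@enduml");
```

Hex doubles the payload size compared with the raw text, so the deflate-based encoding produces shorter URLs for large diagrams; the decision above trades that for a simpler encoder.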
## Themes & Typography
| Decision | Rationale | Date |
|----------|-----------|------|
| All 17 themes map to `--ctp-*` CSS vars | 4 Catppuccin + 7 Editor + 6 Deep Dark themes. All map to same 26 CSS custom properties — zero component changes when adding themes. Pure data operation. | 2026-03-07 |
| Typography via CSS custom properties | `--ui-font-family`/`--ui-font-size` + `--term-font-family`/`--term-font-size` in `:root`. Restored by `initTheme()` on startup. Persisted as SQLite settings. | 2026-03-07 |
## System Design
| Decision | Rationale | Date |
|----------|-----------|------|
| Keyboard shortcut layers: App > Workspace > Terminal | Prevents conflicts. Terminal captures raw keys only when focused. App layer uses Ctrl+K/G/B. | 2026-03-07 |
| Unmount/remount on group switch | Serialize xterm scrollbacks, destroy, remount new group. <100ms perceived. Frees ~80MB per switch. | 2026-03-07 |
| Remote machines deferred to v3.1 | Elevate to project level (`project.remote_machine_id`) but don't implement in MVP. Focus on local orchestration first. | 2026-03-07 |

282
docs/e2e-testing.md Normal file

@ -0,0 +1,282 @@
# E2E Testing Facility
BTerminal's end-to-end testing uses **WebDriverIO + tauri-driver** to drive the real Tauri application through WebKit2GTK's inspector protocol. The facility has three pillars:
1. **Test Fixtures** — isolated fake environments with dummy projects
2. **Test Mode** — app-level env vars that disable watchers and redirect data/config paths
3. **LLM Judge** — Claude-powered semantic assertions for evaluating agent behavior
## Quick Start
```bash
# Run all tests (vitest + cargo + E2E)
npm run test:all:e2e
# Run E2E only (requires pre-built debug binary)
SKIP_BUILD=1 npm run test:e2e
# Build debug binary separately (faster iteration)
cargo tauri build --debug --no-bundle
# Run with LLM judge via CLI (default, auto-detected)
npm run test:e2e
# Force LLM judge to use API instead of CLI
LLM_JUDGE_BACKEND=api ANTHROPIC_API_KEY=sk-... npm run test:e2e
```
## Prerequisites
| Dependency | Purpose | Install |
|-----------|---------|---------|
| Rust + Cargo | Build Tauri backend | [rustup.rs](https://rustup.rs) |
| Node.js 20+ | Frontend + test runner | `mise install node` |
| tauri-driver | WebDriver bridge to WebKit2GTK | `cargo install tauri-driver` |
| X11 display | WebKit2GTK needs a display | Real X, or `xvfb-run` in CI |
| Claude CLI | LLM judge (optional) | [claude.ai/download](https://claude.ai/download) |
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│  WebDriverIO (mocha runner)                             │
│  specs/*.test.ts                                        │
│    └─ browser.execute() → DOM queries + assertions      │
│    └─ assertWithJudge() → LLM semantic evaluation       │
├─────────────────────────────────────────────────────────┤
│  tauri-driver (port 4444)                               │
│  WebDriver protocol ↔ WebKit2GTK inspector              │
├─────────────────────────────────────────────────────────┤
│  BTerminal debug binary                                 │
│  BTERMINAL_TEST=1 (disables watchers, wake scheduler)   │
│  BTERMINAL_TEST_DATA_DIR → isolated SQLite DBs          │
│  BTERMINAL_TEST_CONFIG_DIR → test groups.json           │
└─────────────────────────────────────────────────────────┘
```
## Pillar 1: Test Fixtures (`fixtures.ts`)
The fixture generator creates isolated temporary environments so tests never touch real user data. Each fixture includes:
- **Temp root dir** under `/tmp/bterminal-e2e-{timestamp}/`
- **Data dir** — empty, SQLite databases created at runtime
- **Config dir** — contains a generated `groups.json` with test projects
- **Project dir** — a real git repo with `README.md` and `hello.py` (for agent testing)
### Single-Project Fixture
```typescript
import { createTestFixture, destroyTestFixture } from '../fixtures';
const fixture = createTestFixture('my-test');
// fixture.rootDir → /tmp/my-test-1710234567890/
// fixture.dataDir → /tmp/my-test-1710234567890/data/
// fixture.configDir → /tmp/my-test-1710234567890/config/
// fixture.projectDir → /tmp/my-test-1710234567890/test-project/
// fixture.env → { BTERMINAL_TEST: '1', BTERMINAL_TEST_DATA_DIR: '...', BTERMINAL_TEST_CONFIG_DIR: '...' }
// The test project is a git repo with:
// README.md — "# Test Project\n\nA simple test project for BTerminal E2E tests."
// hello.py — "def greet(name: str) -> str:\n return f\"Hello, {name}!\""
// Both committed as "initial commit"
// groups.json contains one group "Test Group" with one project pointing at projectDir
// Cleanup when done:
destroyTestFixture(fixture);
```
### Multi-Project Fixture
```typescript
import { createMultiProjectFixture } from '../fixtures';
const fixture = createMultiProjectFixture(3); // 3 separate git repos
// Creates project-0, project-1, project-2 under fixture.rootDir
// Each is a git repo with README.md
// groups.json has one group "Multi Project Group" with all 3 projects
```
### Fixture Environment Variables
Pass `fixture.env` to the app to redirect all data/config paths:
| Variable | Effect |
|----------|--------|
| `BTERMINAL_TEST=1` | Disables file watchers, wake scheduler, enables `is_test_mode` |
| `BTERMINAL_TEST_DATA_DIR` | Redirects `sessions.db` and `btmsg.db` storage |
| `BTERMINAL_TEST_CONFIG_DIR` | Redirects `groups.json` config loading |
## Pillar 2: Test Mode
When `BTERMINAL_TEST=1` is set:
- **Rust backend**: `watcher.rs` and `fs_watcher.rs` skip file watchers
- **Frontend**: `is_test_mode` Tauri command returns true, wake scheduler disabled via `disableWakeScheduler()`
- **Data isolation**: `BTERMINAL_TEST_DATA_DIR` / `BTERMINAL_TEST_CONFIG_DIR` override default paths
The WebDriverIO config (`wdio.conf.js`) passes these env vars via `tauri:options.env` in capabilities.
## Pillar 3: LLM Judge (`llm-judge.ts`)
The LLM judge enables semantic assertions — evaluating whether agent output "looks right" rather than exact string matching. Useful for testing AI agent responses where exact output is non-deterministic.
### Dual Backend
The judge supports two backends, auto-detected or explicitly set:
| Backend | How it works | Requires |
|---------|-------------|----------|
| `cli` (default) | Spawns `claude` CLI with `--output-format text` | Claude CLI installed |
| `api` | Raw `fetch` to `https://api.anthropic.com/v1/messages` | `ANTHROPIC_API_KEY` env var |
**Auto-detection order**: CLI first → API fallback → skip test.
**Override**: Set `LLM_JUDGE_BACKEND=cli` or `LLM_JUDGE_BACKEND=api`.
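
The detection order above can be sketched as a pure function. This is an assumption-laden illustration: `selectJudgeBackend` and the `cliInstalled` flag are hypothetical names, and returning `null` when an explicit override cannot be satisfied is a guess at reasonable behavior rather than something documented for `llm-judge.ts`.

```typescript
type JudgeBackend = "cli" | "api";

// Hypothetical selection logic: explicit override first, then CLI, then API.
// A null result means "no backend available", so the caller skips the test.
function selectJudgeBackend(
  env: Record<string, string | undefined>,
  cliInstalled: boolean,
): JudgeBackend | null {
  const override = env.LLM_JUDGE_BACKEND;
  const hasApiKey = Boolean(env.ANTHROPIC_API_KEY);

  // Assumption: an override is honored only if its requirement is met.
  if (override === "cli") return cliInstalled ? "cli" : null;
  if (override === "api") return hasApiKey ? "api" : null;

  // Auto-detection order from the text: CLI first, API fallback, else skip.
  if (cliInstalled) return "cli";
  if (hasApiKey) return "api";
  return null;
}
```

Keeping the environment and CLI probe as parameters makes the ordering testable without spawning a process or reading the real environment.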
### API
```typescript
import { isJudgeAvailable, judge, assertWithJudge } from '../llm-judge';
// Check availability (CLI or API key present)
if (!isJudgeAvailable()) {
  this.skip(); // graceful skip in mocha
  return;
}
// Basic judge call
const verdict = await judge(
  'The output should contain a file listing with at least one filename', // criteria
  actualOutput, // actual
  'Agent was asked to list files in a directory containing README.md', // context (optional)
);
// verdict: { pass: boolean, reasoning: string, confidence: number }
// With confidence threshold (default 0.7); named differently so both calls can share a scope
const strictVerdict = await assertWithJudge(
  'Response should describe the greet function',
  agentMessages,
  { context: 'hello.py contains def greet(name)', minConfidence: 0.8 },
);
```
### How It Works
1. Builds a structured prompt with criteria, actual output, and optional context
2. Asks Claude (Haiku) to evaluate as a test assertion judge
3. Expects JSON response: `{"pass": true/false, "reasoning": "...", "confidence": 0.0-1.0}`
4. Validates and returns structured `JudgeVerdict`
The CLI backend unsets `CLAUDECODE` env var to avoid nested session errors when running inside Claude Code.
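
Steps 3 and 4 above (expecting a JSON verdict and validating it) might look like the following sketch. The function names are hypothetical, and both the tolerance for text surrounding the JSON object and the way the confidence threshold is combined with `pass` are assumptions, not confirmed behavior of `llm-judge.ts`.

```typescript
interface JudgeVerdict {
  pass: boolean;
  reasoning: string;
  confidence: number; // expected range 0.0-1.0
}

// Hypothetical validator: extracts the first {...} span and checks field types.
function parseVerdict(raw: string): JudgeVerdict {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("judge response contains no JSON object");
  }
  const data: unknown = JSON.parse(raw.slice(start, end + 1));
  if (typeof data !== "object" || data === null) {
    throw new Error("judge response is not an object");
  }
  const { pass, reasoning, confidence } = data as Record<string, unknown>;
  if (typeof pass !== "boolean") throw new Error("pass must be boolean");
  if (typeof reasoning !== "string") throw new Error("reasoning must be string");
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("confidence must be a number in [0, 1]");
  }
  return { pass, reasoning, confidence };
}

// Assumed threshold rule: a verdict counts only if it passes with enough confidence.
function meetsThreshold(verdict: JudgeVerdict, minConfidence = 0.7): boolean {
  return verdict.pass && verdict.confidence >= minConfidence;
}
```

Rejecting malformed verdicts outright, instead of coercing them, keeps a confused model response from being counted as a silent pass.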
## Test Spec Files
| File | Phase | Tests | Focus |
|------|-------|-------|-------|
| `bterminal.test.ts` | Smoke | ~50 | Basic UI rendering, CSS class selectors |
| `agent-scenarios.test.ts` | A | 22 | `data-testid` selectors, 7 deterministic scenarios |
| `phase-b.test.ts` | B | ~15 | Multi-project grid, LLM-judged agent responses |
| `phase-c.test.ts` | C | 27 | Hardening features (palette, search, notifications, keyboard, settings, health, metrics, context, files) |
### Phase A: Deterministic Agent Scenarios
Uses `data-testid` attributes for reliable selectors. Tests app structure, project rendering, and agent pane states without live agent interaction.
### Phase B: Multi-Project + LLM Judge
Tests multi-project grid rendering, independent tab switching, status bar fleet state. LLM-judged tests (B4, B5) send real prompts to agents and evaluate response quality — these require Claude CLI or API key and are skipped otherwise.
### Phase C: Production Hardening
Tests v3 hardening features: command palette commands (C1), search overlay (C2), notification center (C3), keyboard navigation (C4), settings panel (C5), project health indicators (C6), metrics tab (C7), context tab (C8), files tab with editor (C9), LLM-judged settings (C10), LLM-judged status bar (C11).
## Test Results Tracking (`results-db.ts`)
A lightweight JSON store for tracking test runs and individual step results:
```typescript
import { ResultsDb } from '../results-db';
const db = new ResultsDb(); // writes to test-results/results.json
db.startRun('run-001', 'v2-mission-control', 'abc123');
db.recordStep({
  run_id: 'run-001',
  scenario_name: 'B4',
  step_name: 'should send prompt and get meaningful response',
  status: 'passed',
  duration_ms: 15000,
  error_message: null,
  screenshot_path: null,
  agent_cost_usd: 0.003,
});
db.finishRun('run-001', 'passed', 45000);
```
## CI Integration (`.github/workflows/e2e.yml`)
The CI pipeline runs on push/PR with path-filtered triggers:
1. **Unit tests**: `npm run test` (vitest)
2. **Cargo tests**: `cargo test` (with `env -u BTERMINAL_TEST` to prevent env leakage)
3. **E2E tests**: `xvfb-run npm run test:e2e` (virtual framebuffer for headless WebKit2GTK)
LLM-judged tests are gated on the `ANTHROPIC_API_KEY` secret — they skip gracefully in forks or when the secret is absent.
## Writing New Tests
### Adding a New Scenario
1. Pick the appropriate spec file (or create a new phase file)
2. Use `data-testid` selectors where possible (more stable than CSS classes)
3. For DOM queries, use `browser.execute()` to run JS in the app context
4. For semantic assertions, use `assertWithJudge()` with clear criteria
### Common Helpers
All spec files share similar helper patterns:
```typescript
// Get project IDs
const ids: string[] = await browser.execute(() => {
  const boxes = document.querySelectorAll('[data-testid="project-box"]');
  return Array.from(boxes).map(b => b.getAttribute('data-project-id') ?? '').filter(Boolean);
});
// Focus a project
await browser.execute((id) => {
  const box = document.querySelector(`[data-project-id="${id}"]`);
  const header = box?.querySelector('.project-header');
  if (header) (header as HTMLElement).click();
}, projectId);
// Switch tab in a project
await browser.execute((id, idx) => {
  const box = document.querySelector(`[data-project-id="${id}"]`);
  const tabs = box?.querySelectorAll('[data-testid="project-tabs"] .ptab');
  if (tabs && tabs[idx]) (tabs[idx] as HTMLElement).click();
}, projectId, tabIndex);
```
### WebDriverIO Config (`wdio.conf.js`)
Key settings:
- **Single session**: `maxInstances: 1` — tauri-driver can't handle parallel sessions
- **Lifecycle**: `onPrepare` builds debug binary, `beforeSession` spawns tauri-driver with TCP readiness probe, `afterSession` kills tauri-driver
- **Timeouts**: 60s per test (mocha), 10s waitfor, 30s connection retry
- **Skip build**: Set `SKIP_BUILD=1` to reuse existing binary
## Troubleshooting
| Problem | Solution |
|---------|----------|
| "Callback was not called before unload" | Stale binary — rebuild with `cargo tauri build --debug --no-bundle` |
| Tests hang on startup | Kill stale `tauri-driver` processes: `pkill -f tauri-driver` |
| All tests skip LLM judge | Install Claude CLI or set `ANTHROPIC_API_KEY` |
| SIGUSR2 / exit code 144 | Stale tauri-driver on port 4444 — kill and retry |
| `BTERMINAL_TEST` leaking to cargo | Run cargo tests with `env -u BTERMINAL_TEST cargo test` |
| No display available | Use `xvfb-run` or ensure X11/Wayland display is set |

398
docs/findings.md Normal file

@ -0,0 +1,398 @@
# Research Findings
This document captures research conducted during v2 and v3 development — technology evaluations, architecture reviews, performance measurements, and design analysis. Each finding informed implementation decisions recorded in [decisions.md](decisions.md).
---
## 1. Claude Agent SDK (v2 Research, 2026-03-05)
**Source:** https://platform.claude.com/docs/en/agent-sdk/overview
The Claude Agent SDK (formerly Claude Code SDK, renamed Sept 2025) provides structured streaming, subagent detection, hooks, and telemetry — everything needed for a rich agent UI without terminal emulation.
### Streaming API
```typescript
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
  prompt: "Fix the bug",
  options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
  console.log(message); // structured, typed, parseable
}
```
### Subagent Detection
Messages from subagents include `parent_tool_use_id`:
```typescript
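// `msg` is a single message yielded by the query() stream shown above.
// A "Task" tool_use block in the parent's output marks a subagent being launched.
for (const block of msg.message?.content ?? []) {
  if (block.type === "tool_use" && block.name === "Task") {
    console.log(`Subagent invoked: ${block.input.subagent_type}`);
  }
}
// Messages produced inside a subagent carry the parent's tool_use id.
if (msg.parent_tool_use_id) {
  console.log("Running inside subagent");
}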
```
### Session Management
- `session_id` captured from init message
- Resume with `options: { resume: sessionId }`
- Subagent transcripts persist independently
### Hooks
`PreToolUse`, `PostToolUse`, `Stop`, `SessionStart`, `SessionEnd`, `UserPromptSubmit`
### Telemetry
Every `SDKResultMessage` contains: `total_cost_usd`, `duration_ms`, per-model `modelUsage` breakdowns.
### Key Insight
The SDK gives structured data — we render it as rich UI (markdown, diff views, file cards, agent trees) instead of raw terminal text. Terminal emulation (xterm.js) is only needed for SSH, local shell, and legacy CLI sessions.
---
## 2. Tauri + xterm.js Integration (v2 Research, 2026-03-05)
### Existing Projects
- **tauri-terminal** — basic Tauri + xterm.js + portable-pty
- **Terminon** — Tauri v2 + React + xterm.js, SSH profiles, split panes
- **tauri-plugin-pty** — PTY plugin for Tauri 2, xterm.js bridge
### Integration Pattern
```
Frontend (xterm.js) <-> Tauri IPC <-> Rust PTY (portable-pty) <-> Shell/SSH/Claude
```
- `pty.onData()` -> `term.write()` (output)
- `term.onData()` -> `pty.write()` (input)
---
## 3. Terminal Performance Benchmarks (v2 Research, 2026-03-05)
### Native Terminal Latency
| Terminal | Latency | Notes |
|----------|---------|-------|
| xterm (native) | ~10ms | Gold standard |
| Alacritty | ~12ms | GPU-rendered Rust |
| Kitty | ~13ms | GPU-rendered |
| VTE (GNOME Terminal) | ~50ms | GTK3/4, spikes above |
| Hyper (Electron+xterm.js) | ~40ms | Web-based worst case |
### Memory
- Alacritty: ~30MB, WezTerm: ~45MB, xterm native: ~5MB
### Verdict
xterm.js in Tauri: ~20-30ms latency, ~20MB per instance. For AI output (not vim), perfectly fine. The VTE we used in v1 GTK3 is actually slower at ~50ms.
---
## 4. Zellij Architecture (v2 Inspiration, 2026-03-05)
Zellij uses WASM plugins for extensibility: message passing at WASM boundary, permission model, event types for rendering/input/lifecycle, KDL layout files.
**Relevance:** We don't need WASM plugins — our "plugins" are different pane types. But the layout concept (JSON layout definitions) is worth borrowing for saved layouts.
---
## 5. Ultrawide Design Patterns (v2 Research, 2026-03-05)
**Key Insight:** 5120px width / ~600px per pane = ~8 panes max, ~4-5 comfortable.
**Layout Philosophy:**
- Center = primary attention (1-2 main agent panes)
- Left edge = navigation (sidebar, 250-300px)
- Right edge = context (agent tree, file viewer, 350-450px)
- Never use tabs for primary content — everything visible
- Tabs only for switching saved layouts
---
## 6. Frontend Framework Choice (v2 Research, 2026-03-05)
### Why Svelte 5
- **Fine-grained reactivity**: `$state`/`$derived` runes match Solid's signals model
- **No VDOM** — critical when 4-8 panes stream data simultaneously
- **Small bundle** — ~5KB runtime vs React's ~40KB
- **Larger ecosystem** than Solid.js — more component libraries, better tooling
### Why NOT Solid.js (initially considered)
- Ecosystem too small for production use
- Svelte 5 runes eliminated the ceremony gap
### Why NOT React
- VDOM reconciliation across 4-8 simultaneously updating panes = CPU waste
- Larger bundle, state management complexity (Redux/Zustand needed)
---
## 7. Claude Code CLI Observation (v2 Research, 2026-03-05)
Three observation tiers for Claude sessions:
1. **SDK sessions** (best): Full structured streaming, subagent detection, hooks, cost tracking
2. **CLI with stream-json** (good): `claude -p "prompt" --output-format stream-json` — structured output but non-interactive
3. **Interactive CLI** (fallback): Tail JSONL session files at `~/.claude/projects/<encoded-dir>/<session-uuid>.jsonl` + show terminal via xterm.js
### JSONL Session Files
Path encoding: `/home/user/project` -> `-home-user-project`. Append-only, written immediately. Can be `tail -f`'d for external observation.
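
The path encoding can be sketched as below. Only the slash-to-dash substitution is shown in the example above; how other characters (dots, spaces, non-ASCII) are treated is not stated here, so this sketch leaves them untouched as an assumption. Both function names are hypothetical.

```typescript
// Hypothetical helper: "/home/user/project" -> "-home-user-project".
// Assumption: only "/" is rewritten; other characters pass through unchanged.
function encodeProjectDir(absPath: string): string {
  return absPath.split("/").join("-");
}

// Hypothetical helper: builds the JSONL session file location described above.
function sessionFilePath(homeDir: string, projectDir: string, sessionUuid: string): string {
  return `${homeDir}/.claude/projects/${encodeProjectDir(projectDir)}/${sessionUuid}.jsonl`;
}

const encoded = encodeProjectDir("/home/user/project");
```

An observer that tails session files would compute this path once per project and then watch the directory for new `<session-uuid>.jsonl` files.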
### Hooks (SDK only)
`SubagentStart`, `SubagentStop` (gives `agent_transcript_path`), `PreToolUse`, `PostToolUse`, `Stop`, `Notification`, `TeammateIdle`
---
## 8. Agent Teams (v2 Research, 2026-03-05)
`CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` enables full independent Claude Code instances sharing a task list and mailbox.
- 3-5 teammates is the practical sweet spot (linear token cost)
- Display modes: in-process (Shift+Down cycles), tmux (own pane each), auto
- Session resumption is broken for in-process teammates
- Agent Orchestrator is the ideal frontend for Agent Teams — each teammate gets its own ProjectBox
---
## 9. Competing Approaches (v2 Research, 2026-03-05)
- **claude-squad** (Go+tmux): Most adopted multi-agent manager
- **agent-deck**: MCP socket pooling (~85-90% memory savings)
- **Git worktrees**: Dominant isolation strategy for parallel Claude sessions
---
## 10. Adversarial Architecture Review (v3, 2026-03-07)
Three specialized agents reviewed the v3 Mission Control architecture before implementation. This adversarial process caught 12 issues (4 critical) that would have required expensive rework if discovered later.
### Agent: Architect (Advocate)
Proposed the core design:
- **Project Groups** as primary organizational unit (replacing free-form panes)
- **JSON config** for human-editable definitions, SQLite for runtime state
- **Single shared sidecar** with per-project isolation via `cwd`, `claude_config_dir`, `session_id`
- **Component split:** AgentPane -> AgentSession + TeamAgentsPanel
- **MVP boundary at Phase 5** (5 phases core, 5 polish)
### Agent: Devil's Advocate
Found 12 issues across the Architect's proposal:
| # | Issue | Severity | Why It Matters |
|---|-------|----------|----------------|
| 1 | xterm.js 4-instance ceiling | **Critical** | WebKit2GTK OOMs at ~5 instances. 5 projects x 1 terminal = immediate wall. |
| 2 | Single sidecar = SPOF | **Critical** | One crash kills all 5 project agents. No isolation. |
| 3 | Layout store has no workspace concept | **Critical** | v2 pane-based store cannot represent project groups. Full rewrite needed. |
| 4 | 384px per project on 1920px | **Critical** | 5 projects on 1920px = 384px each — too narrow for code. Must adapt to viewport. |
| 5 | Session identity collision | Major | Without persisted `sdkSessionId`, resuming wrong session corrupts state. |
| 6 | JSON + SQLite = split-brain risk | Major | Two sources of truth can diverge. Must clearly separate config vs state. |
| 7 | Dispatcher has no project scoping | Major | Singleton routes all messages globally. Needs projectId and per-project cleanup. |
| 8 | Markdown discovery undefined | Minor | No spec for which .md files appear in Docs tab. |
| 9 | Keyboard shortcut conflicts | Major | Three input layers can conflict without explicit precedence. |
| 10 | Remote machine support orphaned | Major | v2 remote UI doesn't map to project model. |
| 11 | No graceful degradation | Major | Broken CWD or git could fail the whole group. |
| 12 | Flat event stream wastes CPU | Minor | Messages for hidden projects still process through adapters. |
All 12 resolved before implementation. Critical items addressed in architecture. Major items implemented in MVP or deferred to v3.1 with rationale.
### Agent: UX + Performance Specialist
Provided concrete wireframes and performance budgets:
- **Adaptive layout** formula: 5 at 5120px, 3 at 1920px, 1 with scroll at <1600px
- **xterm budget:** 4 active max, suspend/resume < 50ms
- **Memory budget:** ~225MB total (4 xterm @ 20MB + Tauri + SQLite + agent stores)
- **Workspace switch:** <100ms perceived (serialize scrollbacks + unmount/mount)
- **RAF batching:** For 5 concurrent agent streams, batch DOM updates to avoid layout thrashing
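
The RAF batching idea from the list above can be sketched as a small queue that flushes once per frame. This is an illustrative sketch, not the project's code: `createBatcher` is a hypothetical name, and the scheduler is injected so the same logic runs outside a browser.

```typescript
// A scheduler takes a flush callback and runs it later (e.g. next animation frame).
type Schedule = (flush: () => void) => void;

// Hypothetical batcher: collects items and applies them in one call per scheduled flush.
function createBatcher<T>(apply: (items: T[]) => void, schedule: Schedule): (item: T) => void {
  let pending: T[] = [];
  let scheduled = false;

  return (item: T): void => {
    pending.push(item);
    if (scheduled) return; // a flush is already queued for this frame
    scheduled = true;
    schedule(() => {
      const batch = pending;
      pending = [];
      scheduled = false;
      apply(batch); // one DOM update for everything received since the last frame
    });
  };
}

// In the webview, the scheduler would typically be:
//   (flush) => { requestAnimationFrame(flush); }
```

With five agents streaming at once, each incoming message only appends to the queue, so the number of DOM writes is bounded by the frame rate and not by the message rate.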
---
## 11. Provider Adapter Coupling Analysis (v3, 2026-03-11)
Before implementing multi-provider support, a systematic coupling analysis mapped every Claude-specific dependency. 13+ files examined and classified into 4 severity levels.
### Coupling Severity Map
**CRITICAL — hardcoded SDK, must abstract:**
- `sidecar/agent-runner.ts` — imports Claude Agent SDK, calls `query()`, hardcoded `findClaudeCli()`. Became `claude-runner.ts` with other providers getting separate runners.
- `bterminal-core/src/sidecar.rs`: `AgentQueryOptions` had no `provider` field. `SidecarCommand` hardcoded runner path. Added provider-based runner selection.
- `src/lib/adapters/sdk-messages.ts`: `parseMessage()` assumed Claude SDK JSON format. Became `claude-messages.ts` with per-provider parsers.
**HIGH — TS mirror types, provider-specific commands:**
- `agent-bridge.ts`: `AgentQueryOptions` interface mirrored Rust with no provider field.
- `lib.rs`: `claude_list_profiles`, `claude_list_skills` are Claude-specific (kept, gated by capability).
- `claude-bridge.ts` — provider-specific adapter (kept, genericized via `provider-bridge.ts`).
**MEDIUM — provider-aware routing:**
- `agent-dispatcher.ts` — called `parseMessage()` (Claude-specific), subagent tool names hardcoded.
- `AgentPane.svelte` — profile selector, skill autocomplete assumed Claude.
**LOW — already generic:**
- `agents.svelte.ts`, `health.svelte.ts`, `conflicts.svelte.ts` — provider-agnostic.
- `bterminal-relay/` — forwards `AgentQueryOptions` as-is.
### Key Insights
1. **Sidecar is the natural abstraction boundary.** Each provider needs its own runner because SDKs are incompatible.
2. **Message format is the main divergence point.** Per-provider adapters normalize to `AgentMessage`.
3. **Capability flags eliminate provider switches.** UI checks `capabilities.hasProfiles` instead of `provider === 'claude'`.
4. **Env var stripping is provider-specific.** Claude strips `CLAUDE*`, Codex strips `CODEX*`, Ollama strips nothing.
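
Insight 4 can be sketched as a per-provider prefix table. This is a minimal illustration under stated assumptions: the table and function names are hypothetical, the provider keys `claude`, `codex`, and `ollama` are assumed identifiers, and treating unknown providers as "strip nothing" is a guess.

```typescript
// Hypothetical prefix table derived from the text:
// Claude strips CLAUDE*, Codex strips CODEX*, Ollama strips nothing.
const STRIP_PREFIXES: Record<string, readonly string[]> = {
  claude: ["CLAUDE"],
  codex: ["CODEX"],
  ollama: [],
};

// Returns a copy of env without variables that match the provider's prefixes.
function stripProviderEnv(
  provider: string,
  env: Record<string, string | undefined>,
): Record<string, string> {
  const prefixes = STRIP_PREFIXES[provider] ?? []; // assumption: unknown provider strips nothing
  const cleaned: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (prefixes.some((prefix) => key.startsWith(prefix))) continue;
    cleaned[key] = value;
  }
  return cleaned;
}
```

Keeping the rule in data means a new provider adds one table entry, which matches the capability-flag approach of avoiding `provider === 'claude'` branches.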
---
## 12. Codebase Reuse Analysis: v2 to v3 (2026-03-07)
### Survived (with modifications)
| Component/Module | Modifications |
|-----------------|---------------|
| TerminalPane.svelte | Added suspend/resume lifecycle for xterm budget |
| MarkdownPane.svelte | Unchanged |
| AgentTree.svelte | Reused inside AgentSession |
| StatusBar.svelte | Rewritten for workspace store (group name, fleet status, attention queue) |
| ToastContainer.svelte | Unchanged |
| agents.svelte.ts | Added projectId field to AgentSession |
| theme.svelte.ts | Unchanged |
| notifications.svelte.ts | Unchanged |
| All adapters | Minor updates for provider routing |
| All Rust backend | Added new modules (btmsg, bttask, search, secrets, plugins) |
### Replaced
| v2 Component | v3 Replacement | Reason |
|-------------|---------------|--------|
| layout.svelte.ts | workspace.svelte.ts | Pane-based model -> project-group model |
| TilingGrid.svelte | ProjectGrid.svelte | Free-form grid -> fixed project boxes |
| PaneContainer.svelte | ProjectBox.svelte | Generic pane -> per-project container with 11 tabs |
| SessionList.svelte | ProjectHeader + CommandPalette | Sidebar list -> inline headers + Ctrl+K |
| SettingsDialog.svelte | SettingsTab.svelte | Modal dialog -> sidebar drawer tab |
| AgentPane.svelte | AgentSession + TeamAgentsPanel | Monolithic -> split for team support |
| App.svelte | Full rewrite | Tab bar -> VSCode-style sidebar layout |
### Dropped (v3.0)
| Feature | Reason |
|---------|--------|
| Detached pane mode | Doesn't fit workspace model (projects are grouped) |
| Drag-resize splitters | Project boxes have fixed internal layout |
| Layout presets | Replaced by adaptive project count from viewport |
| Remote machine UI | Deferred to v3.1 (elevated to project level) |
---
## 13. Session Anchor Design (v3, 2026-03-12)
Session anchors solve context loss during Claude's automatic context compaction.
### Problem
When Claude's context window fills up (~80% of model limit), the SDK automatically compacts older turns. This is lossy — important early decisions, architecture context, and debugging breakthroughs can be permanently lost.
### Design Decisions
1. **Auto-anchor on first compaction** — Automatically captures the first 3 turns when compaction is first detected. Preserves the session's initial context (task definition, first architecture decisions).
2. **Observation masking** — Tool outputs (Read results, Bash output) are compacted in anchors, but reasoning text is preserved in full. Dramatically reduces anchor token cost while keeping important reasoning.
3. **Budget system** — Fixed scales (2K/6K/12K/20K tokens) instead of percentage-based. "6,000 tokens" is more intuitive than "15% of context."
4. **Re-injection via system prompt** — Promoted anchors are serialized and injected as the `system_prompt` field. Simplest integration with the SDK — no conversation history modification needed.
---
## 14. Multi-Agent Orchestration Design (v3, 2026-03-11)
### Evaluated Approaches
| Approach | Pros | Cons | Decision |
|----------|------|------|----------|
| Claude Agent Teams (native) | Zero custom code, SDK-managed | Experimental, session resume broken | Supported but not primary |
| Message bus (Redis/NATS) | Proven, scalable | Runtime dependency, deployment complexity | Rejected |
| Shared SQLite + CLI tools | Zero deps, agents use shell | Polling-based, no real-time push | **Selected** |
| MCP server for agent comm | Standard protocol | Overhead per message, complex setup | Rejected |
### Why SQLite + CLI
Agents run Claude Code sessions with full shell access. Python CLI tools (`btmsg`, `bttask`) reading/writing SQLite is the lowest-friction integration:
- Zero configuration (`btmsg send architect "review this"`)
- No runtime services (no Redis, no MCP server)
- WAL mode handles concurrent access from multiple agent processes
- Same database readable by Rust backend for UI display
- 5s polling is acceptable — agents don't need millisecond latency
### Role Hierarchy
4 Tier 1 roles based on common development workflows:
- **Manager** — coordinates work (tech lead assigning sprint tasks). Unique: Task board tab, full bttask CRUD.
- **Architect** — designs solutions (senior engineer doing design reviews). Unique: PlantUML tab.
- **Tester** — runs tests (QA monitoring test suites). Unique: Selenium + Tests tabs.
- **Reviewer** — reviews code (processing PR queue). Unique: review queue depth in attention scoring.
---
## 15. Theme System Evolution (v3, 2026-03-07)
### Phase 1: 4 Catppuccin Flavors (v2)
Mocha, Macchiato, Frappe, Latte. All colors mapped to 26 `--ctp-*` CSS custom properties.
### Phase 2: +7 Editor Themes
VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark. Same 26 variables — zero component changes. `CatppuccinFlavor` type generalized to `ThemeId`.
### Phase 3: +6 Deep Dark Themes
Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper (warm dark), Midnight (pure OLED black). Same mapping.
### Key Decision
All 17 themes map to the same CSS custom property names. No component ever needs to know which theme is active. Adding new themes is a pure data operation: define 26 color values and add to `THEME_LIST`.
---
## 16. Performance Measurements (v3, 2026-03-11)
### xterm.js Canvas Performance
WebKit2GTK lacks WebGL — xterm.js falls back to Canvas 2D:
- **Latency:** ~20-30ms per keystroke (acceptable for AI output)
- **Memory:** ~20MB per active instance
- **OOM threshold:** ~5 simultaneous instances causes WebKit2GTK crash
- **Mitigation:** 4-instance budget with suspend/resume
### Tauri IPC Latency
- **Linux:** ~5ms for typical payloads
- **Terminal keystroke echo:** 5ms IPC + xterm render = 10-15ms total
- **Agent message forwarding:** Negligible (human-readable speed)
### SQLite WAL Concurrent Access
Both databases accessed concurrently by Rust backend + Python CLIs + frontend reads via IPC. WAL mode with 5s busy_timeout handles this reliably. 5-minute checkpoint prevents WAL growth.
### Workspace Switch Latency
- Serialize 4 xterm scrollbacks: ~30ms
- Destroy 4 xterm instances: ~10ms
- Unmount ProjectGrid children: ~5ms
- Mount new group: ~20ms
- Create new xterm instances: ~35ms
- **Total perceived: ~100ms** (acceptable)

docs/multi-machine.md Normal file

@ -0,0 +1,323 @@
# Multi-Machine Support — Architecture & Implementation
**Status: Implemented (Phases A-D complete, 2026-03-06)**
## Overview
Extend BTerminal to manage Claude agent sessions and terminal panes running on **remote machines** over WebSocket, while keeping the local sidecar path unchanged.
## Problem
Current architecture is local-only:
```
WebView ←→ Rust (Tauri IPC) ←→ Local Sidecar (stdio NDJSON)
←→ Local PTY (portable-pty)
```
Target state: BTerminal acts as a **mission control** that observes agents and terminals running on multiple machines (dev servers, cloud VMs, CI runners).
## Design Constraints
1. **Zero changes to local path** — local sidecar/PTY must work identically
2. **Same NDJSON protocol** — remote and local agents speak the same message format
3. **No new runtime dependencies** — use Rust's `tokio-tungstenite` (already available via Tauri)
4. **Graceful degradation** — remote machine goes offline → pane shows disconnected state, reconnects automatically
5. **Security** — all remote connections authenticated and encrypted (TLS + token)
## Architecture
### Three-Layer Model
```
┌──────────────────────────────────────────────────────────────────┐
│  BTerminal (Controller)                                          │
│                                                                  │
│  ┌──────────┐   Tauri IPC      ┌──────────────────────────────┐  │
│  │ WebView  │  ←────────────→  │ Rust Backend                 │  │
│  │ (Svelte) │                  │                              │  │
│  └──────────┘                  │ ├── PtyManager (local)       │  │
│                                │ ├── SidecarManager (local)   │  │
│                                │ └── RemoteManager ───────────┼──┤
│                                └──────────────────────────────┘  │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
       │                                    │
       │ (local stdio)                      │ (WebSocket wss://)
       ▼                                    ▼
 ┌───────────┐                   ┌──────────────────────┐
 │ Local     │                   │ Remote Machine       │
 │ Sidecar   │                   │                      │
 │ (Deno/    │                   │  ┌────────────────┐  │
 │ Node.js)  │                   │  │ bterminal-relay│  │
 │           │                   │  │ (Rust binary)  │  │
 └───────────┘                   │  │                │  │
                                 │  │ ├── PTY mgr    │  │
                                 │  │ ├── Sidecar mgr│  │
                                 │  │ └── WS server  │  │
                                 │  └────────────────┘  │
                                 └──────────────────────┘
```
### Components
#### 1. `bterminal-relay` — Remote Agent (Rust binary)
A standalone Rust binary that runs on each remote machine. It:
- Listens on a WebSocket port (default: 9750)
- Manages local PTYs and claude sidecar processes
- Forwards NDJSON events to the controller over WebSocket
- Receives commands (query, stop, resize, write) from the controller
**Why a Rust binary?** Reuses existing `PtyManager` and `SidecarManager` code from `src-tauri/src/`. Extracted into a shared crate.
```
bterminal-relay/
├── Cargo.toml          # depends on bterminal-core
├── src/
│   └── main.rs         # WebSocket server + auth
bterminal-core/         # shared crate (extracted from src-tauri)
├── Cargo.toml
├── src/
│   ├── pty.rs          # PtyManager (from src-tauri/src/pty.rs)
│   ├── sidecar.rs      # SidecarManager (from src-tauri/src/sidecar.rs)
│   └── lib.rs
```
#### 2. `RemoteManager` — Controller-Side (in Rust backend)
New module in `src-tauri/src/remote.rs`. Manages WebSocket connections to multiple relays.
```rust
pub struct RemoteMachine {
    pub id: String,
    pub label: String,
    pub url: String,          // wss://host:9750
    pub token: String,        // auth token
    pub status: RemoteStatus, // connected | connecting | disconnected | error
}

pub enum RemoteStatus {
    Connected,
    Connecting,
    Disconnected,
    Error(String),
}

pub struct RemoteManager {
    machines: Arc<Mutex<Vec<RemoteMachine>>>,
    connections: Arc<Mutex<HashMap<String, WsConnection>>>,
}
```
#### 3. Frontend Adapters — Unified Interface
The frontend doesn't care whether a pane is local or remote. The bridge layer abstracts this:
```typescript
// adapters/agent-bridge.ts — extended
export async function queryAgent(options: AgentQueryOptions): Promise<void> {
  if (options.remote_machine_id) {
    return invoke('remote_agent_query', { machineId: options.remote_machine_id, options });
  }
  return invoke('agent_query', { options });
}
```
Same pattern for `pty-bridge.ts` — add optional `remote_machine_id` to all operations.
## Protocol
### WebSocket Wire Format
Same NDJSON as local sidecar, wrapped in an envelope for multiplexing:
```typescript
// Controller → Relay (commands)
interface RelayCommand {
  id: string; // request correlation ID
  type: 'pty_create' | 'pty_write' | 'pty_resize' | 'pty_close'
      | 'agent_query' | 'agent_stop' | 'sidecar_restart'
      | 'ping';
  payload: Record<string, unknown>;
}

// Relay → Controller (events)
interface RelayEvent {
  type: 'pty_data' | 'pty_exit' | 'pty_created'
      | 'sidecar_message' | 'sidecar_exited'
      | 'error' | 'pong' | 'ready';
  sessionId?: string;
  payload: unknown;
}
```
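A receiver of this envelope would typically validate each frame before routing it. The following is a minimal sketch that assumes one JSON envelope per WebSocket text frame; `parseRelayEvent` and `RELAY_EVENT_TYPES` are hypothetical helpers, not names from the codebase.

```typescript
type RelayEventType =
  | 'pty_data' | 'pty_exit' | 'pty_created'
  | 'sidecar_message' | 'sidecar_exited'
  | 'error' | 'pong' | 'ready';

interface RelayEvent {
  type: RelayEventType;
  sessionId?: string;
  payload: unknown;
}

const RELAY_EVENT_TYPES: readonly string[] = [
  'pty_data', 'pty_exit', 'pty_created',
  'sidecar_message', 'sidecar_exited',
  'error', 'pong', 'ready',
];

// Parse one text frame; returns null for anything that is not a valid envelope.
function parseRelayEvent(frame: string): RelayEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(frame);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== 'object' || value === null) return null;
  const obj = value as Record<string, unknown>;
  const kind = obj['type'];
  if (typeof kind !== 'string' || !RELAY_EVENT_TYPES.includes(kind)) return null;
  const sessionId = obj['sessionId'];
  if (sessionId !== undefined && typeof sessionId !== 'string') return null;
  const parsed: RelayEvent = { type: kind as RelayEventType, payload: obj['payload'] };
  if (typeof sessionId === 'string') parsed.sessionId = sessionId;
  return parsed;
}
```

Returning `null` instead of throwing lets the caller drop malformed frames without tearing down the connection.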
### Authentication
1. **Pre-shared token** — relay starts with `--token <secret>`. Controller sends token in WebSocket upgrade headers (`Authorization: Bearer <token>`).
2. **TLS required** — relay rejects non-TLS connections in production mode. Dev mode allows `ws://` with `--insecure` flag.
3. **Token rotation** — future: relay exposes endpoint to rotate token. Controller stores tokens in SQLite settings table.
### Connection Lifecycle
```
Controller Relay
│ │
│── WSS connect ─────────────────→│
│── Authorization: Bearer token ──→│
│ │
│←── { type: "ready", ...} ───────│
│ │
│── { type: "ping" } ────────────→│
│←── { type: "pong" } ────────────│ (every 15s)
│ │
│── { type: "agent_query", ... }──→│
│←── { type: "sidecar_message" }──│ (streaming)
│←── { type: "sidecar_message" }──│
│ │
│ (disconnect) │
│── reconnect (exp backoff) ─────→│ (1s, 2s, 4s, 8s, max 30s)
```
### Reconnection (Implemented)
- Controller reconnects with exponential backoff (1s, 2s, 4s, 8s, 16s, 30s cap)
- Reconnection runs as an async tokio task spawned on disconnect
- Uses `attempt_tcp_probe()`: TCP connect only (no WS upgrade), 5s timeout, default port 9750. Avoids allocating per-connection resources (PtyManager, SidecarManager) on the relay during probes.
- Emits `remote-machine-reconnecting` event (with backoff duration) and `remote-machine-reconnect-ready` when probe succeeds
- Frontend listens via `onRemoteMachineReconnecting` and `onRemoteMachineReconnectReady` in remote-bridge.ts; machines store sets status to 'reconnecting' and auto-calls `connectMachine()` on ready
- Cancels if machine is removed or manually reconnected (checks status == "disconnected" && connection == None)
- On reconnect, relay sends current state snapshot (active sessions, PTY list)
- Controller reconciles: updates pane states, re-subscribes to streams
- Active agent sessions continue on relay regardless of controller connection
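The backoff schedule above (1s, 2s, 4s, 8s, 16s, 30s cap) is a doubling delay with a ceiling. The controller implements it in Rust; the TypeScript sketch below only illustrates the arithmetic, and `backoffSeconds` is a hypothetical name.

```typescript
// Delay in seconds before reconnect attempt `attempt` (0-based):
// doubles each time and is clamped to the cap.
function backoffSeconds(attempt: number, capSeconds: number = 30): number {
  return Math.min(2 ** attempt, capSeconds);
}

// First seven delays of the schedule.
const schedule: number[] = [0, 1, 2, 3, 4, 5, 6].map((n) => backoffSeconds(n));
```

The sixth attempt would be 32s uncapped, so the cap is what produces the 30s value in the documented sequence.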
## Session Persistence Across Reconnects
Key insight: **remote agents keep running even when the controller disconnects**. The relay is autonomous — it doesn't need the controller to operate.
On reconnect:
1. Relay sends `{ type: "state_sync", activeSessions: [...], activePtys: [...] }`
2. Controller matches against known panes, updates status
3. Missed messages are NOT replayed (too complex, marginal value). Agent panes show "reconnected — some messages may be missing" notice
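Step 2 of the reconnect flow can be sketched as a set lookup. This assumes `activeSessions` and `activePtys` are arrays of string IDs, which the message example above does not spell out; `KnownPane`, `PaneStatus` and `reconcile` are hypothetical names.

```typescript
type PaneStatus = 'connected' | 'lost';

interface KnownPane {
  id: string;
  sessionId: string; // agent session ID or PTY ID on the relay (assumed)
}

interface StateSync {
  type: 'state_sync';
  activeSessions: string[]; // assumed: plain ID strings
  activePtys: string[];     // assumed: plain ID strings
}

// Mark each known pane as connected if the relay still reports its session.
function reconcile(panes: KnownPane[], sync: StateSync): Map<string, PaneStatus> {
  const alive = new Set<string>(sync.activeSessions.concat(sync.activePtys));
  const result = new Map<string, PaneStatus>();
  for (const pane of panes) {
    result.set(pane.id, alive.has(pane.sessionId) ? 'connected' : 'lost');
  }
  return result;
}
```

Panes marked `connected` would then show the "some messages may be missing" notice, since nothing is replayed.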
## Frontend Integration
### Pane Model Changes
```typescript
// stores/layout.svelte.ts
export interface Pane {
  id: string;
  type: 'terminal' | 'agent';
  title: string;
  group?: string;
  remoteMachineId?: string; // NEW: undefined = local
}
```
### Sidebar — Machine Groups
Remote panes auto-group by machine label in the sidebar:
```
▾ Local
├── Terminal 1
└── Agent: fix bug
▾ devbox (192.168.1.50) ← remote machine
├── SSH session
└── Agent: deploy
▾ ci-runner (10.0.0.5) ← remote machine (disconnected)
└── Agent: test suite ⚠️
```
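The grouping shown in the tree can be derived from `remoteMachineId` alone. A minimal sketch, assuming a lookup from machine ID to label is available; `groupByMachine` and the `labels` parameter are hypothetical.

```typescript
interface Pane {
  id: string;
  type: 'terminal' | 'agent';
  title: string;
  group?: string;
  remoteMachineId?: string; // undefined = local
}

// Bucket panes under "Local" or the machine's label
// (falling back to the raw machine ID when no label is known).
function groupByMachine(
  panes: Pane[],
  labels: Record<string, string>,
): Map<string, Pane[]> {
  const groups = new Map<string, Pane[]>();
  for (const pane of panes) {
    const key = pane.remoteMachineId === undefined
      ? 'Local'
      : (labels[pane.remoteMachineId] ?? pane.remoteMachineId);
    const bucket = groups.get(key) ?? [];
    bucket.push(pane);
    groups.set(key, bucket);
  }
  return groups;
}
```

A `Map` preserves insertion order, so groups appear in the order their first pane was created.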
### Settings Panel
New "Machines" section in settings:
| Field | Type | Notes |
|-------|------|-------|
| Label | string | Human-readable name |
| URL | string | `wss://host:9750` |
| Token | password | Pre-shared auth token |
| Auto-connect | boolean | Connect on app launch |
Stored in SQLite `settings` table as JSON: `remote_machines` key.
## Implementation (All Phases Complete)
### Phase A: Extract `bterminal-core` crate [DONE]
- Cargo workspace at the repository root (Cargo.toml with members: src-tauri, bterminal-core, bterminal-relay)
- PtyManager and SidecarManager extracted to bterminal-core/
- EventSink trait (bterminal-core/src/event.rs) abstracts event emission
- TauriEventSink (src-tauri/src/event_sink.rs) implements EventSink for AppHandle
- src-tauri pty.rs and sidecar.rs are thin re-export wrappers
### Phase B: Build `bterminal-relay` binary [DONE]
- bterminal-relay/src/main.rs — WebSocket server (tokio-tungstenite)
- Token auth on WebSocket upgrade (Authorization: Bearer header)
- CLI: --port (default 9750), --token (required), --insecure (allow ws://)
- Routes RelayCommand to bterminal-core managers, forwards RelayEvent over WebSocket
- Rate limiting: 10 failed auth attempts triggers 5-minute lockout
- Per-connection isolated PtyManager + SidecarManager instances
- Command response propagation: structured responses (pty_created, pong, error) sent back via shared event channel
- send_error() helper: all command failures emit RelayEvent with commandId + error message
- PTY creation confirmation: pty_create command returns pty_created event with session ID and commandId for correlation
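The lockout rule from the list above (10 failed auth attempts triggers a 5-minute lockout) can be sketched as a small counter. The relay itself is Rust; this TypeScript version is illustrative only. `AuthLockout` is a hypothetical name, and the sketch assumes the failure counter resets on success and after a lockout starts, which the source does not state.

```typescript
const MAX_FAILURES = 10;
const LOCKOUT_MS = 5 * 60 * 1000;

class AuthLockout {
  private failures = 0;
  private lockedUntilMs = 0;

  // True while the lockout window is still running.
  isLocked(nowMs: number): boolean {
    return nowMs < this.lockedUntilMs;
  }

  // Count a failed attempt; the 10th failure starts a 5-minute lockout.
  recordFailure(nowMs: number): void {
    this.failures += 1;
    if (this.failures >= MAX_FAILURES) {
      this.lockedUntilMs = nowMs + LOCKOUT_MS;
      this.failures = 0; // assumption: counting restarts after a lockout
    }
  }

  // Assumption: a successful auth clears the failure count.
  recordSuccess(): void {
    this.failures = 0;
  }
}
```

Passing the clock in as `nowMs` keeps the logic deterministic and easy to test.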
### Phase C: Add `RemoteManager` to controller [DONE]
- src-tauri/src/remote.rs — RemoteManager struct with WebSocket client connections
- 12 Tauri commands: remote_add_machine, remote_remove_machine, remote_connect, remote_disconnect, remote_list_machines, remote_pty_spawn/write/resize/kill, remote_agent_query/stop, remote_sidecar_restart
- Heartbeat ping every 15s
- PTY creation event: emits `remote-pty-created` Tauri event with machineId, ptyId, commandId
- Exponential backoff reconnection on disconnect (1s/2s/4s/8s/16s/30s cap) via `attempt_tcp_probe()` (TCP-only, no WS upgrade)
- Reconnection events: `remote-machine-reconnecting`, `remote-machine-reconnect-ready`
### Phase D: Frontend integration [DONE]
- src/lib/adapters/remote-bridge.ts — machine management IPC adapter
- src/lib/stores/machines.svelte.ts — remote machine state store
- Pane.remoteMachineId field in layout store
- agent-bridge.ts and pty-bridge.ts route to remote commands when remoteMachineId is set
- SettingsDialog "Remote Machines" section
- Sidebar auto-groups remote panes by machine label
### Remaining Work
- [x] Reconnection logic with exponential backoff (1s-30s cap) — implemented in remote.rs
- [x] Relay command response propagation (pty_created, pong, error events) — implemented in main.rs
- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning
## Security Considerations
| Threat | Mitigation |
|--------|-----------|
| Token interception | TLS required (reject `ws://` without `--insecure`) |
| Token brute-force | Rate limit auth attempts (5/min), lockout after 10 failures |
| Relay impersonation | Pin relay certificate fingerprint (future: mTLS) |
| Command injection | Relay validates all command payloads against schema |
| Lateral movement | Relay runs as unprivileged user, no shell access beyond PTY/sidecar |
| Data exfiltration | Agent output streams to controller only, no relay-to-relay traffic |
## Performance Considerations
| Concern | Mitigation |
|---------|-----------|
| WebSocket latency | Typical LAN: <1ms. WAN: 20-100ms. Acceptable for agent output (text, not video) |
| Bandwidth | Agent NDJSON: ~50KB/s peak. Terminal: ~200KB/s peak. Trivial even on slow links |
| Connection count | Max 10 machines initially (UI constraint, not technical) |
| Message ordering | Single WebSocket per machine = ordered delivery guaranteed |
## What This Does NOT Cover (Future)
- **Multi-controller** — multiple BTerminal instances observing the same relay (needs pub/sub)
- **Relay discovery** — automatic detection of relays on LAN (mDNS/Bonjour)
- **Agent migration** — moving a running agent from one machine to another
- **Relay-to-relay** — direct communication between remote machines
- **mTLS** — mutual TLS for enterprise environments (Phase B+ enhancement)

docs/orchestration.md Normal file

@ -0,0 +1,362 @@
# Multi-Agent Orchestration
Agent Orchestrator supports running multiple AI agents that communicate with each other, coordinate work through a shared task board, and are managed by a hierarchy of specialized roles. This document covers the inter-agent messaging system (btmsg), the task board (bttask), agent roles and system prompts, and the auto-wake scheduler.
---
## Agent Roles (Tier 1 and Tier 2)
Agents are organized into two tiers:
### Tier 1 — Management Agents
Defined in `groups.json` under a group's `agents[]` array. Each management agent gets a full ProjectBox in the UI (converted via `agentToProject()` in the workspace store). They have role-specific capabilities, tabs, and system prompts.
| Role | Tabs | btmsg Permissions | bttask Permissions | Purpose |
|------|------|-------------------|-------------------|---------|
| **Manager** | Model, Tasks | Full (send, receive, create channels) | Full CRUD | Coordinates work, creates/assigns tasks, delegates to subagents |
| **Architect** | Model, Architecture | Send, receive | Read-only + comments | Designs solutions, creates PlantUML diagrams, reviews architecture |
| **Tester** | Model, Selenium, Tests | Send, receive | Read-only + comments | Runs tests, monitors screenshots, discovers test files |
| **Reviewer** | Model, Tasks | Send, receive | Read + status + comments | Reviews code, manages review queue, approves/rejects tasks |
### Tier 2 — Project Agents
Regular `ProjectConfig` entries in `groups.json`. Each project gets its own Claude session with optional custom context via `project.systemPrompt`. They have standard tabs (Model, Docs, Context, Files, SSH, Memory) but no role-specific tabs.
### System Prompt Generation
Tier 1 agents receive auto-generated system prompts built by `generateAgentPrompt()` in `utils/agent-prompts.ts`. The prompt has 7 sections:
1. **Identity** — Role name, project context, team membership
2. **Environment** — Working directory, available tools, shell info
3. **Team** — List of other agents in the group with their roles
4. **btmsg documentation** — CLI usage, channel commands, message format
5. **bttask documentation** — CLI usage, task lifecycle, role-specific permissions
6. **Custom context** — Optional `project.systemPrompt` (Tier 2) or role-specific instructions
7. **Workflow** — Role-specific workflow guidelines (e.g., Manager delegates, Reviewer checks review queue)
Tier 2 agents receive only the custom context section (if `project.systemPrompt` is set), injected as the `system_prompt` field in AgentQueryOptions.
### BTMSG_AGENT_ID
Tier 1 agents receive the `BTMSG_AGENT_ID` environment variable, injected via `extra_env` in AgentQueryOptions. This flows through 5 layers: TypeScript → Rust AgentQueryOptions → NDJSON → JS runner → SDK env. The CLI tools (`btmsg`, `bttask`) read this variable to identify which agent is sending messages or creating tasks.
### Periodic Re-injection
LLM context degrades over long sessions as important instructions scroll out of the context window. To counter this, AgentSession runs a 1-hour timer that re-sends the system prompt when the agent is idle. The mechanism:
1. AgentSession timer fires after 60 minutes of agent inactivity
2. Sets `autoPrompt` flag, which AgentPane reads via `onautopromptconsumed` callback
3. AgentPane calls `startQuery()` with `resume=true` and the refresh prompt
4. The agent receives the role/tools reminder as a follow-up message
---
## btmsg — Inter-Agent Messaging
btmsg is a messaging system that lets agents communicate with each other. It consists of a Rust backend (SQLite), a Python CLI tool (for agents to use in their shell), and a Svelte frontend (CommsTab).
### Architecture
```
Agent (via btmsg CLI)
├── btmsg send <recipient> "message" → writes to btmsg.db
├── btmsg read → reads from btmsg.db
├── btmsg channel create #review-queue → creates channel
├── btmsg channel post #review-queue "msg" → posts to channel
└── btmsg heartbeat → updates agent heartbeat
btmsg.db (SQLite, WAL mode, ~/.local/share/bterminal/btmsg.db)
├── agents table — registered agents with roles
├── messages table — DMs and channel messages
├── channels table — named channels (#review-queue, #review-log)
├── contacts table — ACL (who can message whom)
├── heartbeats table — agent liveness tracking
├── dead_letter_queue — undeliverable messages
└── audit_log — all operations for debugging
Rust Backend (btmsg.rs, ~600 lines)
├── btmsg_list_messages, btmsg_send_message, ...
├── 15+ Tauri commands for full CRUD
└── Shared database connection (WAL + 5s busy_timeout)
Frontend (btmsg-bridge.ts → CommsTab.svelte)
├── Activity feed — all messages across all agents
├── DM view — direct messages between specific agents
└── Channel view — channel messages (#review-queue, etc.)
```
### Database Schema
The btmsg database (`btmsg.db`) stores all messaging data:
| Table | Purpose | Key Columns |
|-------|---------|-------------|
| `agents` | Agent registry | id, name, role, project_id, status, created_at |
| `messages` | All messages | id, sender_id, recipient_id, channel_id, content, read, created_at |
| `channels` | Named channels | id, name, created_by, created_at |
| `contacts` | ACL | agent_id, contact_id (bidirectional) |
| `heartbeats` | Liveness | agent_id, last_heartbeat, status |
| `dead_letter_queue` | Failed delivery | message_id, reason, created_at |
| `audit_log` | All operations | id, event_type, agent_id, details, created_at |
### CLI Usage (for agents)
Agents use the `btmsg` Python CLI tool in their shell. The tool reads `BTMSG_AGENT_ID` to identify the sender:
```bash
# Send a direct message
btmsg send architect "Please review the auth module design"
# Read unread messages
btmsg read
# Create a channel (quote names starting with '#', otherwise the shell treats them as comments)
btmsg channel create '#architecture-decisions'
# Post to a channel
btmsg channel post '#review-queue' "PR #42 ready for review"
# Send heartbeat (agents do this periodically)
btmsg heartbeat
# List all agents
btmsg agents
```
### Frontend (CommsTab)
The CommsTab component (rendered in ProjectBox for all agents) shows:
- **Activity Feed** — chronological view of all messages across all agents
- **DMs** — direct message threads between agents
- **Channels** — named channel message streams
- Polling-based updates (5s interval)
### Dead Letter Queue
Messages sent to non-existent or offline agents are moved to the dead letter queue instead of being silently dropped. The Rust backend checks agent status before delivery and queues failures. The Manager agent's health dashboard shows dead letter count.
### Audit Logging
Every btmsg operation is logged to the `audit_log` table with event type, agent ID, and JSON details. Event types include: message_sent, message_read, channel_created, agent_registered, heartbeat, and prompt_injection_detected.
---
## bttask — Task Board
bttask is a kanban-style task board that agents use to coordinate work. It shares the same SQLite database as btmsg (`btmsg.db`) for deployment simplicity.
### Architecture
```
Agent (via bttask CLI)
├── bttask list → list all tasks
├── bttask create "Fix auth bug" → create task (Manager only)
├── bttask status <id> in_progress → update status
├── bttask comment <id> "Done" → add comment
└── bttask review-count → count review queue tasks
btmsg.db → tasks table + task_comments table
Rust Backend (bttask.rs, ~300 lines)
├── 7 Tauri commands: list, create, update_status, delete, add_comment, comments, review_queue_count
└── Optimistic locking via version column
Frontend (bttask-bridge.ts → TaskBoardTab.svelte)
└── Kanban board: 5 columns, 5s poll, drag-and-drop
```
### Task Lifecycle
```
┌──────────┐  assign   ┌─────────────┐  complete   ┌──────────┐
│ Backlog  │──────────►│ In Progress │────────────►│  Review  │
└──────────┘           └─────────────┘             └──────────┘
                                            ┌───────────┼───────────┐
                                            ▼                       ▼
                                        ┌────────┐            ┌──────────┐
                                        │  Done  │            │ Rejected │
                                        └────────┘            └──────────┘
```
When a task moves to the "Review" column, the system automatically posts a notification to the `#review-queue` btmsg channel. The `ensure_review_channels()` function creates `#review-queue` and `#review-log` channels idempotently on first use.
### Optimistic Locking
To prevent concurrent updates from corrupting task state, bttask uses optimistic locking via a `version` column:
1. Client reads task with current version (e.g., version=3)
2. Client sends update with expected version=3
3. Server's UPDATE query includes `WHERE version = 3`
4. If another client updated first (version=4), the WHERE clause matches 0 rows
5. Server returns a conflict error, client must re-read and retry
This is critical because multiple agents may try to update the same task simultaneously.
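The five steps can be modelled in memory. The sketch below stands in for a SQL statement of the form `UPDATE ... WHERE id = ? AND version = ?`. It assumes the version is incremented by one on each successful update, and `Task`, `UpdateResult` and `updateStatus` are hypothetical names.

```typescript
interface Task {
  id: string;
  status: string;
  version: number;
}

type UpdateResult =
  | { kind: 'updated'; task: Task }
  | { kind: 'conflict'; currentVersion: number }
  | { kind: 'not_found' };

// Apply the update only if the caller's expected version matches the stored one.
function updateStatus(
  tasks: Map<string, Task>,
  id: string,
  newStatus: string,
  expectedVersion: number,
): UpdateResult {
  const current = tasks.get(id);
  if (current === undefined) return { kind: 'not_found' };
  if (current.version !== expectedVersion) {
    // Equivalent of the WHERE clause matching 0 rows.
    return { kind: 'conflict', currentVersion: current.version };
  }
  const updated: Task = { ...current, status: newStatus, version: current.version + 1 };
  tasks.set(id, updated);
  return { kind: 'updated', task: updated };
}
```

On `conflict`, the caller re-reads the task, reapplies its change against the new version, and retries.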
### Role-Based Permissions
| Role | List | Create | Update Status | Delete | Comments |
|------|------|--------|---------------|--------|----------|
| Manager | Yes | Yes | Yes | Yes | Yes |
| Reviewer | Yes | No | Yes (review decisions) | No | Yes |
| Architect | Yes | No | No | No | Yes |
| Tester | Yes | No | No | No | Yes |
| Project (Tier 2) | Yes | No | No | No | Yes |
Permissions are enforced in the CLI tool based on the agent's role (read from `BTMSG_AGENT_ID` → agents table lookup).
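The permission table translates directly into a lookup. The real check lives in the Python CLI; this TypeScript sketch only mirrors the table, with hypothetical names (`Role`, `TaskAction`, `canPerform`), and it simplifies the Reviewer's "review decisions" restriction to a plain yes.

```typescript
type Role = 'manager' | 'reviewer' | 'architect' | 'tester' | 'project';
type TaskAction = 'list' | 'create' | 'update_status' | 'delete' | 'comment';

// One entry per row of the permission table.
const PERMISSIONS: Record<Role, readonly TaskAction[]> = {
  manager: ['list', 'create', 'update_status', 'delete', 'comment'],
  reviewer: ['list', 'update_status', 'comment'], // status limited to review decisions in practice
  architect: ['list', 'comment'],
  tester: ['list', 'comment'],
  project: ['list', 'comment'],
};

function canPerform(role: Role, action: TaskAction): boolean {
  return PERMISSIONS[role].includes(action);
}
```

Keeping the table as data means a new role only needs one more row.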
### Review Queue Integration
The Reviewer agent gets special treatment in the attention scoring system:
- `reviewQueueDepth` is an input to attention scoring: 10 points per review task, capped at 50
- Priority: between file_conflict (70) and context_high (40)
- ProjectBox polls `review_queue_count` every 10 seconds for reviewer agents
- Results feed into `setReviewQueueDepth()` in the health store
### Frontend (TaskBoardTab.svelte)
The kanban board renders 5 columns (Backlog, In Progress, Review, Done, Rejected) with task cards. Features:
- 5-second polling for updates
- Click to expand task details + comments
- Manager-only create/delete buttons
- Color-coded status badges
---
## Wake Scheduler
The wake scheduler automatically re-activates idle Manager agents when attention-worthy events occur. It runs in `wake-scheduler.svelte.ts` and supports three user-selectable strategies.
### Strategies
| Strategy | Behavior | Use Case |
|----------|----------|----------|
| **Persistent** | Sends a resume prompt to the existing session | Long-running managers that should maintain context |
| **On-demand** | Starts a fresh session | Managers that work in bursts |
| **Smart** | On-demand, but only when wake score exceeds threshold | Avoids waking for minor events |
Strategy and threshold are configurable per group agent via `GroupAgentConfig.wakeStrategy` and `GroupAgentConfig.wakeThreshold` fields, persisted in `groups.json`.
### Wake Signals
The wake scorer evaluates 6 signals (defined in `types/wake.ts`, scored by `utils/wake-scorer.ts`):
| Signal | Weight | Trigger |
|--------|--------|---------|
| AttentionSpike | 1.0 | Any project's attention score exceeds threshold |
| ContextPressureCluster | 0.9 | Multiple projects have >75% context usage |
| BurnRateAnomaly | 0.8 | Cost rate deviates significantly from baseline |
| TaskQueuePressure | 0.7 | Task backlog grows beyond threshold |
| ReviewBacklog | 0.6 | Review queue has pending items |
| PeriodicFloor | 0.1 | Minimum periodic check (floor signal) |
The pure scoring function in `wake-scorer.ts` is tested with 24 unit tests. The types are in `types/wake.ts` (WakeStrategy, WakeSignal, WakeEvaluation, WakeContext).
### Lifecycle
1. ProjectBox registers manager agents via `$effect` on mount
2. Wake scheduler creates per-manager timers
3. Every 5 seconds, AgentSession polls wake events
4. If score exceeds threshold (for smart strategy), triggers wake
5. On group switch, `clearWakeScheduler()` cancels all timers
6. In test mode (`BTERMINAL_TEST=1`), wake scheduler is disabled via `disableWakeScheduler()`
---
## Health Monitoring & Attention Scoring
The health store (`health.svelte.ts`) tracks per-project health with a 5-second tick timer. It provides the data that feeds the StatusBar, wake scheduler, and attention queue.
### Activity States
| State | Meaning | Visual |
|-------|---------|--------|
| Inactive | No agent running, no recent activity | Dim dot |
| Running | Agent actively processing | Green pulse |
| Idle | Agent finished, waiting for input | Gray dot |
| Stalled | Agent hasn't produced output for >N minutes | Orange pulse |
The stall threshold is configurable per-project via `stallThresholdMin` in ProjectConfig (default 15 min, range 5-60, step 5).
### Attention Scoring
Each project gets an attention score (0-100) based on its current state. The attention queue in the StatusBar shows the top 5 projects sorted by urgency:
| Condition | Score | Priority |
|-----------|-------|----------|
| Stalled agent | 100 | Highest — agent may be stuck |
| Error state | 90 | Agent crashed or API error |
| Context >90% | 80 | Context window nearly full |
| File conflict | 70 | Two agents wrote same file |
| Review queue depth | 10/task, cap 50 | Reviewer has pending reviews |
| Context >75% | 40 | Context pressure building |
The pure scoring function is in `utils/attention-scorer.ts` (14 tests). It takes `AttentionInput` and returns a numeric score.
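A scoring function consistent with the table might look like the sketch below. Two things are assumptions: the field names of `AttentionInput` (the real type is not shown here), and the rule that the highest applicable condition wins when several apply.

```typescript
// Hypothetical field names; the real AttentionInput may differ.
interface AttentionInput {
  stalled: boolean;
  errored: boolean;
  contextPercent: number; // 0-100
  fileConflict: boolean;
  reviewQueueDepth: number;
}

// Assumption: the score is the highest-scoring condition that applies.
function scoreAttention(input: AttentionInput): number {
  const candidates: number[] = [0];
  if (input.stalled) candidates.push(100);
  if (input.errored) candidates.push(90);
  if (input.contextPercent > 90) candidates.push(80);
  if (input.fileConflict) candidates.push(70);
  candidates.push(Math.min(input.reviewQueueDepth * 10, 50)); // 10 per task, cap 50
  if (input.contextPercent > 75) candidates.push(40);
  return Math.max(...candidates);
}
```

Under this rule a reviewer with three queued tasks scores 30, and the cap of 50 keeps a long review queue below a file conflict (70).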
### Burn Rate
Cost tracking uses a 5-minute exponential moving average (EMA) of cost snapshots. The StatusBar displays aggregate $/hr across all running agents.
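One common way to build such an average is a time-weighted EMA with a 5-minute time constant. The sketch below uses that formulation as an assumption; the store's actual smoothing formula is not given here, and `BurnRateEma` is a hypothetical name.

```typescript
const WINDOW_MS = 5 * 60 * 1000;
const MS_PER_HOUR = 3600000;

class BurnRateEma {
  private ratePerHour = 0;
  private lastCostUsd: number | null = null;
  private lastTimeMs = 0;

  // Feed a cumulative cost snapshot (USD) taken at `timeMs`; returns smoothed $/hr.
  addSnapshot(costUsd: number, timeMs: number): number {
    if (this.lastCostUsd !== null && timeMs > this.lastTimeMs) {
      const dtMs = timeMs - this.lastTimeMs;
      const instant = ((costUsd - this.lastCostUsd) / dtMs) * MS_PER_HOUR;
      // Weight of the new sample grows with the time since the last one.
      const alpha = 1 - Math.exp(-dtMs / WINDOW_MS);
      this.ratePerHour += alpha * (instant - this.ratePerHour);
    }
    this.lastCostUsd = costUsd;
    this.lastTimeMs = timeMs;
    return this.ratePerHour;
  }
}
```

With steady spending the smoothed value converges on the true hourly rate, while a single expensive turn only nudges it.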
### File Conflict Detection
The conflicts store (`conflicts.svelte.ts`) detects two types of conflicts:
1. **Agent overlap** — Two agents in the same worktree write the same file (tracked via tool_call analysis in the dispatcher)
2. **External writes** — A file watched by an agent is modified externally (detected via inotify in `fs_watcher.rs`, uses 2s timing heuristic `AGENT_WRITE_GRACE_MS` to distinguish agent writes from external)
Both types show badges in ProjectHeader (orange ⚡ for external, red ⚠ for agent overlap).
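The timing heuristic for external writes can be sketched as a window comparison. The real check is in `fs_watcher.rs`; this version assumes that a change within 2s of the agent's last recorded write to the same file is attributed to the agent, and `isExternalWrite` is a hypothetical name.

```typescript
const AGENT_WRITE_GRACE_MS = 2000;

// Returns true when a filesystem change should be treated as an external write.
function isExternalWrite(
  changeTimeMs: number,
  lastAgentWriteMs: number | undefined,
): boolean {
  if (lastAgentWriteMs === undefined) return true; // agent never wrote this file
  // Assumption: anything within the grace window is the agent's own write.
  return Math.abs(changeTimeMs - lastAgentWriteMs) > AGENT_WRITE_GRACE_MS;
}
```

The trade-off of a timing heuristic is that a genuine external edit landing inside the grace window goes undetected.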
---
## Session Anchors
Session anchors preserve important conversation turns through Claude's context compaction process. Without anchors, valuable early context (architecture decisions, debugging breakthroughs) can be lost when the context window fills up.
### Anchor Types
| Type | Created By | Behavior |
|------|-----------|----------|
| **Auto** | System (on first compaction) | Captures first 3 turns, observation-masked (reasoning preserved, tool outputs compacted) |
| **Pinned** | User (pin button in AgentPane) | Marks specific turns as important |
| **Promoted** | User (from pinned) | Re-injectable into future sessions via system prompt |
### Anchor Budget
The budget controls how many tokens are spent on anchor re-injection:
| Scale | Token Budget | Use Case |
|-------|-------------|----------|
| Small | 2,000 | Quick sessions, minimal context needed |
| Medium | 6,000 | Default, covers most scenarios |
| Large | 12,000 | Complex debugging sessions |
| Full | 20,000 | Maximum context preservation |
Configurable per-project via slider in SettingsTab, stored as `ProjectConfig.anchorBudgetScale` in `groups.json`.
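The budget table maps naturally onto a lookup plus a greedy cut-off. In this sketch the lowercase scale identifiers, the `Anchor` shape and `selectWithinBudget` are hypothetical, and it assumes anchors are taken in order until the next one would exceed the budget.

```typescript
type AnchorBudgetScale = 'small' | 'medium' | 'large' | 'full';

// Token budgets from the table above.
const ANCHOR_BUDGET_TOKENS: Record<AnchorBudgetScale, number> = {
  small: 2000,
  medium: 6000,
  large: 12000,
  full: 20000,
};

interface Anchor {
  id: string;
  estimatedTokens: number;
}

// Keep anchors in order until the next one would exceed the budget.
function selectWithinBudget(anchors: Anchor[], scale: AnchorBudgetScale): Anchor[] {
  const budget = ANCHOR_BUDGET_TOKENS[scale];
  const selected: Anchor[] = [];
  let used = 0;
  for (const anchor of anchors) {
    if (used + anchor.estimatedTokens > budget) break;
    selected.push(anchor);
    used += anchor.estimatedTokens;
  }
  return selected;
}
```

The same running total could also drive the budget meter shown in the ContextTab.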
### Re-injection Flow
When a session resumes with promoted anchors:
1. `anchors.svelte.ts` loads promoted anchors for the project
2. `anchor-serializer.ts` serializes them (turn grouping, observation masking, token estimation)
3. `AgentPane.startQuery()` includes serialized anchors in the `system_prompt` field
4. The sidecar passes the system prompt to the SDK
5. Claude receives the anchors as context alongside the new prompt
### Storage
Anchors are persisted in the `session_anchors` table in `sessions.db`. The ContextTab shows an anchor section with a budget meter (derived from the configured scale) and promote/demote buttons.

docs/phases.md Normal file

@ -0,0 +1,330 @@
# BTerminal v2 — Implementation Phases
See [architecture.md](architecture.md) for system architecture and [decisions.md](decisions.md) for design decisions.
---
## Phase 1: Project Scaffolding [status: complete] — MVP
- [x] Create feature branch `v2-mission-control`
- [x] Initialize Tauri 2.x project with Svelte 5 frontend
- [x] Project structure (see below)
- [x] Basic Tauri window with Catppuccin Mocha CSS variables
- [x] Verify Tauri builds and launches on target system
- [x] Set up dev scripts (dev, build, lint)
### File Structure
```
bterminal-v2/
  src-tauri/
    src/
      main.rs  # Tauri app entry
      pty.rs  # PTY management (portable-pty, not plugin)
      sidecar.rs  # Sidecar lifecycle (unified .mjs bundle, Deno-first + Node.js fallback)
      watcher.rs  # File watcher for markdown viewer
      session.rs  # Session + SSH session persistence (SQLite via rusqlite)
      ctx.rs  # Read-only ctx context DB access
    Cargo.toml
  src/
    App.svelte  # Root layout + detached pane mode
    lib/
      components/
        Layout/
          TilingGrid.svelte  # Dynamic tiling manager
          PaneContainer.svelte  # Individual pane wrapper
        Terminal/
          TerminalPane.svelte  # xterm.js terminal pane (theme-aware)
        Agent/
          AgentPane.svelte  # SDK agent structured output
          AgentTree.svelte  # Subagent tree visualization (SVG)
        Markdown/
          MarkdownPane.svelte  # Live markdown file viewer (shiki highlighting)
        Context/
          ContextPane.svelte  # ctx database viewer (projects, entries, search)
        SSH/
          SshDialog.svelte  # SSH session create/edit modal
          SshSessionList.svelte  # SSH session list in sidebar
        Sidebar/
          SessionList.svelte  # Session browser + SSH list
        StatusBar/
          StatusBar.svelte  # Global status bar (pane counts, cost)
        Notifications/
          ToastContainer.svelte  # Toast notification display
        Settings/
          SettingsDialog.svelte  # Settings modal (shell, cwd, max panes, theme)
      stores/
        sessions.svelte.ts  # Session state ($state runes)
        agents.svelte.ts  # Active agent tracking
        layout.svelte.ts  # Pane layout state
        notifications.svelte.ts  # Toast notification state
        theme.svelte.ts  # Catppuccin theme flavor state
      adapters/
        sdk-messages.ts  # SDK message abstraction layer
        pty-bridge.ts  # PTY IPC wrapper
        agent-bridge.ts  # Agent IPC wrapper (local + remote routing)
        claude-bridge.ts  # Claude profiles + skills IPC wrapper
        settings-bridge.ts  # Settings IPC wrapper
        ctx-bridge.ts  # ctx database IPC wrapper
        ssh-bridge.ts  # SSH session IPC wrapper
        remote-bridge.ts  # Remote machine management IPC wrapper
        session-bridge.ts  # Session/layout persistence IPC wrapper
      utils/
        agent-tree.ts  # Agent tree builder (hierarchy from messages)
        highlight.ts  # Shiki syntax highlighter (lazy singleton)
        detach.ts  # Detached pane mode (pop-out windows)
        updater.ts  # Tauri auto-updater utility
      styles/
        catppuccin.css  # Theme CSS variables (Mocha defaults)
        themes.ts  # All 4 Catppuccin flavor definitions
    app.css
  sidecar/
    agent-runner.ts  # Sidecar source (compiled to .mjs by esbuild)
    dist/
      agent-runner.mjs  # Bundled sidecar (runs on both Deno and Node.js)
    package.json  # Agent SDK dependency
  package.json
  svelte.config.js
  vite.config.ts
  tauri.conf.json
```
**Key change from v1:** Using portable-pty directly from Rust instead of tauri-plugin-pty (38-star community plugin). portable-pty is well-maintained (used by WezTerm). More work upfront, more reliable long-term.
---
## Phase 2: Terminal Pane + Layout [status: complete] — MVP
### Layout (responsive)
**32:9 (5120px) — full density:**
```
+--------+------------------------------------+--------+
|Sidebar | 2-4 panes, CSS Grid, resizable     | Right  |
| 260px  |                                    | 380px  |
+--------+------------------------------------+--------+
```
**16:9 (1920px) — degraded but functional:**
```
+--------+-------------------------+
|Sidebar | 1-2 panes               | (right panel collapses to overlay)
| 240px  |                         |
+--------+-------------------------+
```
- [x] CSS Grid layout with sidebar + main area + optional right panel
- [x] Responsive breakpoints (ultrawide / standard / narrow)
- [x] Pane resize via drag handles (splitter overlays in TilingGrid with mouse drag, min/max 10%/90%)
- [x] Layout presets: 1-col, 2-col, 3-col, 2x2, master+stack
- [ ] Save/restore layout to SQLite (Phase 4)
- [x] Keyboard: Ctrl+1-4 focus pane, Ctrl+N new terminal
### Terminal
- [x] xterm.js with Canvas addon (explicit — no WebGL dependency)
- [x] Catppuccin Mocha theme for xterm.js
- [x] PTY spawn from Rust (portable-pty), stream to frontend via Tauri events
- [x] Terminal resize -> PTY resize (100ms debounce)
- [x] Copy/paste (Ctrl+Shift+C/V) — via attachCustomKeyEventHandler
- [x] SSH session: spawn `ssh` command in PTY (via shell args)
- [x] Local shell: spawn user's $SHELL
- [x] Claude Code CLI: spawn `claude` in PTY (via shell args)
**Milestone: After Phase 2, we have a working multi-pane terminal.** Usable as a daily driver even without agent features.
---
## Phase 3: Agent SDK Integration [status: complete] — MVP
### Backend
- [x] Node.js/Deno sidecar: uses `@anthropic-ai/claude-agent-sdk` query() function (migrated from raw CLI spawning due to piped stdio hang bug #6775)
- [x] Sidecar communication: Rust spawns Node.js, stdio NDJSON
- [x] Sidecar lifecycle: auto-start on app launch, shutdown on exit
- [x] Sidecar lifecycle: detect crash, offer restart in UI (agent_restart command + restart button)
- [x] Tauri commands: agent_query, agent_stop, agent_ready, agent_restart
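The stdio NDJSON channel between Rust and the sidecar can be illustrated with a small decoder. This is a minimal sketch, not the project's actual code: `NdjsonDecoder`, `encodeNdjson`, and the message shapes in the usage note are hypothetical names chosen for illustration.

```typescript
// Hypothetical helper: incremental NDJSON decoder for a stdio text stream.
// A chunk may end in the middle of a JSON line, so partial lines are buffered.
class NdjsonDecoder {
  private buffer = "";

  // Feed a chunk of text; returns every complete message it contained.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    // The last element is an incomplete line (or "" if the chunk ended in \n).
    this.buffer = lines.pop() ?? "";
    const messages: unknown[] = [];
    for (const line of lines) {
      const trimmed = line.trim();
      if (trimmed.length === 0) continue; // tolerate blank lines
      messages.push(JSON.parse(trimmed));
    }
    return messages;
  }
}

// Encoding is the inverse: exactly one JSON document per line.
function encodeNdjson(message: unknown): string {
  return JSON.stringify(message) + "\n";
}
```

Usage: call `push()` for every chunk read from the child's stdout and dispatch each returned message; write `encodeNdjson({ type: "query" })` (an assumed message shape) to its stdin.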
### Frontend
- [x] SDK message adapter: parses stream-json into 9 typed AgentMessage types (abstraction layer)
- [x] Agent bridge: Tauri IPC adapter (invoke + event listeners)
- [x] Agent dispatcher: singleton routing sidecar events to store, crash detection
- [x] Agent store: session state, message history, cost tracking (Svelte 5 $state)
- [x] Agent pane: renders structured messages
- [x] Text -> plain text (markdown rendering deferred)
- [x] Tool calls -> collapsible cards (tool name + input)
- [x] Tool results -> collapsible cards
- [x] Thinking -> collapsible details
- [x] Init -> model badge
- [x] Cost -> USD/tokens/turns/duration summary
- [x] Errors -> highlighted error card
- [x] Subagent spawn -> auto-creates child agent pane with parent/child navigation (Phase 7)
- [x] Agent status indicator (starting/running/done/error)
- [x] Start/stop agent from UI (prompt form + stop button)
- [x] Auto-scroll with scroll-lock on user scroll-up
- [x] Session resume (follow-up prompt in AgentPane, resume_session_id passed to SDK)
- [x] Keyboard: Ctrl+Shift+N new agent
- [x] Sidebar: agent session button
**Milestone: After Phase 3, we have the core differentiator.** SDK agents run in structured panes alongside raw terminals.
---
## Phase 4: Session Management + Markdown Viewer [status: complete] — MVP
### Sessions
- [x] SQLite persistence for sessions (rusqlite with bundled feature)
- [x] Session types: terminal, agent, markdown (SSH via terminal args)
- [x] Session CRUD: save, delete, update_title, touch (last_used_at)
- [x] Session groups/folders — group_name column, setPaneGroup, grouped sidebar with collapsible headers
- [x] Remember last layout on restart (preset + pane_ids in layout_state table)
- [x] Auto-restore panes on app startup (restoreFromDb in layout store)
### Markdown Viewer
- [x] File watcher (notify crate v6) -> Tauri events -> frontend
- [x] Markdown rendering (marked.js)
- [x] Syntax highlighting (Shiki) — added in Phase 5 (highlight.ts, 13 preloaded languages)
- [x] Open from sidebar (file picker button "M")
- [x] Catppuccin-themed markdown styles (h1-h3, code, pre, tables, blockquotes)
- [x] Live reload on file change
**Milestone: After Phase 4 = MVP ship.** Full session management, structured agent panes, terminal panes, markdown viewer.
---
## Phase 5: Agent Tree + Polish [status: complete] — Post-MVP
- [x] Agent tree visualization (SVG, compact horizontal layout) — AgentTree.svelte + agent-tree.ts utility
- [x] Click tree node -> scroll to message (handleTreeNodeClick in AgentPane, scrollIntoView smooth)
- [x] Aggregate cost per subtree (subtreeCost displayed in yellow below each tree node label)
- [x] Terminal copy/paste (Ctrl+Shift+C/V via attachCustomKeyEventHandler)
- [x] Terminal theme hot-swap (onThemeChange callback registry in theme.svelte.ts, TerminalPane subscribes)
- [x] Pane drag-resize handles (splitter overlays in TilingGrid with mouse drag)
- [x] Session resume (follow-up prompt, resume_session_id to SDK)
- [x] Global status bar (terminal/agent counts, active agents pulse, token/cost totals) — StatusBar.svelte
- [x] Notification system (toast: success/error/warning/info, auto-dismiss 4s, max 5) — notifications.svelte.ts + ToastContainer.svelte
- [x] Agent dispatcher toast integration (agent complete, error, sidecar crash notifications)
- [x] Global keyboard shortcuts — Ctrl+W close focused pane, Ctrl+, open settings
- [x] Settings dialog (default shell, cwd, max panes, theme flavor) — SettingsDialog.svelte + settings-bridge.ts
- [x] Settings backend — settings table in SQLite (session.rs), Tauri commands settings_get/set/list (lib.rs)
- [x] ctx integration — read-only access to ~/.claude-context/context.db (ctx.rs, ctx-bridge.ts, ContextPane.svelte)
- [x] SSH session management — CRUD in SQLite (SshSession struct, SshDialog.svelte, SshSessionList.svelte, ssh-bridge.ts)
- [x] Catppuccin theme flavors — Latte/Frappe/Macchiato/Mocha selectable (themes.ts, theme.svelte.ts)
- [x] Detached pane mode — pop-out terminal/agent into standalone windows (detach.ts, App.svelte)
- [x] Syntax highlighting — Shiki integration for markdown + agent messages (highlight.ts, shiki dep)
---
## Phase 6: Packaging + Distribution [status: complete] — Post-MVP
- [x] install-v2.sh — build-from-source installer with dependency checks (Node.js 20+, Rust 1.77+, system libs)
- Checks: WebKit2GTK, GTK3, GLib, libayatana-appindicator, librsvg, openssl, build-essential, pkg-config, curl, wget, FUSE
- Prompts to install missing packages via apt
- Builds with `npx tauri build`, installs binary as `bterminal-v2` in `~/.local/bin/`
- Creates desktop entry and installs SVG icon
- [x] Tauri bundle configuration — targets: `["deb", "appimage"]`, category: DeveloperTool
- .deb depends: libwebkit2gtk-4.1-0, libgtk-3-0, libayatana-appindicator3-1
- AppImage: bundleMediaFramework disabled
- [x] Icons regenerated from bterminal.svg — RGBA PNGs (32x32, 128x128, 128x128@2x, 512x512, .ico)
- [x] GitHub Actions release workflow (`.github/workflows/release.yml`)
- Triggered on `v*` tags, Ubuntu 22.04 runner
- Caches Rust and npm dependencies
- Builds .deb + AppImage, uploads as GitHub Release artifacts
- [x] Build verified: .deb (4.3 MB), AppImage (103 MB)
- [x] Auto-updater plugin integrated (tauri-plugin-updater Rust + @tauri-apps/plugin-updater npm + updater.ts)
- [x] Auto-update latest.json generation in CI (version, platform URL, signature from .sig file)
- [x] release.yml: TAURI_SIGNING_PRIVATE_KEY env vars passed to build step
- [x] Auto-update signing key generated, pubkey set in tauri.conf.json
- [x] TAURI_SIGNING_PRIVATE_KEY secret set in GitHub repo settings via `gh secret set`
---
## Phase 7: Agent Teams / Subagent Support [status: complete] — Post-MVP
- [x] Agent store parent/child hierarchy — parentSessionId, parentToolUseId, childSessionIds fields on AgentSession
- [x] Agent store functions — findChildByToolUseId(), getChildSessions(), parent-aware createAgentSession()
- [x] Agent dispatcher subagent detection — SUBAGENT_TOOL_NAMES Set ('Agent', 'Task', 'dispatch_agent')
- [x] Agent dispatcher message routing — parentId-bearing messages routed to child panes via toolUseToChildPane Map
- [x] Agent dispatcher pane spawning — spawnSubagentPane() creates child session + layout pane, auto-grouped under parent
- [x] AgentPane parent navigation — SUB badge + button to focus parent agent
- [x] AgentPane children bar — clickable chips per child subagent with status colors (running/done/error)
- [x] SessionList subagent icon — '↳' for subagent panes
- [x] Subagent cost aggregation — getTotalCost() recursive helper in agents.svelte.ts, total cost shown in parent pane done-bar
- [x] Dispatcher tests for subagent routing — 10 tests covering spawn, dedup, child message routing, init/cost forwarding, fallbacks (28 total dispatcher tests)
- [ ] Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1
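The recursive cost aggregation mentioned in the list above could look roughly like the sketch below. It is written in the spirit of `getTotalCost()`, but the real signature is not shown in this document: `SessionNode`, `costUsd`, and `sumSubtreeCost` are assumed names, and only `childSessionIds` comes from the field names listed above.

```typescript
// Hypothetical, trimmed-down session shape for illustration only.
interface SessionNode {
  id: string;
  costUsd: number; // assumed field name
  childSessionIds: string[];
}

// Sum the cost of a session and all of its descendants.
// The `seen` set guards against accidental cycles in the hierarchy.
function sumSubtreeCost(
  sessions: Map<string, SessionNode>,
  rootId: string,
  seen: Set<string> = new Set<string>(),
): number {
  const node = sessions.get(rootId);
  if (node === undefined || seen.has(rootId)) return 0;
  seen.add(rootId);
  let total = node.costUsd;
  for (const childId of node.childSessionIds) {
    total += sumSubtreeCost(sessions, childId, seen);
  }
  return total;
}
```

Unknown child IDs contribute zero, so a parent whose child session has already been removed still reports a sensible total.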
### System Requirements
- Node.js 20+ (for Agent SDK sidecar)
- Rust 1.77+ (for building from source)
- WebKit2GTK 4.1+ (Tauri runtime)
- Linux x86_64 (primary target)
---
## Multi-Machine Support (Phases A-D) [status: complete]
Architecture designed in [multi-machine.md](multi-machine.md). Implementation extends BTerminal to manage agents and terminals on remote machines over WebSocket.
### Phase A: Extract `bterminal-core` crate [status: complete]
- [x] Created Cargo workspace at the repository root (Cargo.toml with members)
- [x] Extracted PtyManager and SidecarManager into shared `bterminal-core` crate
- [x] Created EventSink trait to abstract Tauri event emission (bterminal-core/src/event.rs)
- [x] TauriEventSink in src-tauri/src/event_sink.rs implements EventSink
- [x] src-tauri pty.rs and sidecar.rs now thin re-exports from bterminal-core
### Phase B: Build `bterminal-relay` binary [status: complete]
- [x] WebSocket server using tokio-tungstenite with token auth
- [x] CLI flags: --port, --token, --insecure (clap)
- [x] Routes RelayCommand to PtyManager/SidecarManager, forwards RelayEvent over WebSocket
- [x] Rate limiting on auth failures (10 attempts, 5min lockout)
- [x] Per-connection isolated PTY + sidecar managers
- [x] Command response propagation: structured responses (pty_created, pong, error) via shared event channel
- [x] send_error() helper for consistent error reporting with commandId correlation
- [x] PTY creation confirmation: pty_created event with session ID and commandId
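The auth rate limiting above (10 attempts, 5-minute lockout) can be sketched as follows. The relay itself is written in Rust; this TypeScript version only illustrates the logic. Keying by peer address and resetting the counter on success are assumptions, and `AuthRateLimiter` is a hypothetical name.

```typescript
const MAX_FAILURES = 10;
const LOCKOUT_MS = 5 * 60 * 1000;

interface FailureRecord {
  failures: number;
  lockedUntil: number; // epoch ms; 0 means not locked
}

// Hypothetical limiter keyed by peer address (an assumption, see lead-in).
class AuthRateLimiter {
  private records = new Map<string, FailureRecord>();

  isLocked(peer: string, nowMs: number): boolean {
    const rec = this.records.get(peer);
    return rec !== undefined && nowMs < rec.lockedUntil;
  }

  recordFailure(peer: string, nowMs: number): void {
    const rec = this.records.get(peer) ?? { failures: 0, lockedUntil: 0 };
    rec.failures += 1;
    if (rec.failures >= MAX_FAILURES) {
      // Start the lockout window and begin counting afresh afterwards.
      rec.lockedUntil = nowMs + LOCKOUT_MS;
      rec.failures = 0;
    }
    this.records.set(peer, rec);
  }

  recordSuccess(peer: string): void {
    this.records.delete(peer);
  }
}
```

Passing the clock in as `nowMs` keeps the limiter deterministic and easy to unit test.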
### Phase C: Add `RemoteManager` to controller [status: complete]
- [x] New remote.rs module in src-tauri — WebSocket client connections to relay instances
- [x] Machine lifecycle: add/remove/connect/disconnect
- [x] 12 new Tauri commands for remote operations
- [x] Heartbeat ping every 15s
- [x] PTY creation event: emits remote-pty-created Tauri event with machineId, ptyId, commandId
- [x] Exponential backoff reconnection on disconnect (1s/2s/4s/8s/16s/30s cap)
- [x] attempt_tcp_probe() function: TCP-only probe (5s timeout, default port 9750) — avoids allocating per-connection resources on relay during probes
- [x] Reconnection events: remote-machine-reconnecting, remote-machine-reconnect-ready
### Phase D: Frontend integration [status: complete]
- [x] remote-bridge.ts adapter for machine management + remote events
- [x] machines.svelte.ts store for remote machine state
- [x] Layout store: Pane.remoteMachineId field
- [x] agent-bridge.ts and pty-bridge.ts route to remote commands when remoteMachineId is set
- [x] SettingsDialog "Remote Machines" section (add/remove/connect/disconnect)
- [x] Sidebar auto-groups remote panes by machine label
### Remaining Work
- [x] Reconnection logic with exponential backoff — implemented in remote.rs
- [x] Relay command response propagation — implemented in bterminal-relay main.rs
- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning
---
## Extras: Claude Profiles & Skill Discovery [status: complete]
### Claude Profile / Account Switching
- [x] Tauri command claude_list_profiles(): reads ~/.config/switcher/profiles/ directories
- [x] Profile metadata from profile.toml (email, subscription_type, display_name)
- [x] Config dir resolution: ~/.config/switcher-claude/{name}/ or fallback ~/.claude/
- [x] Default profile fallback when no switcher profiles exist
- [x] Profile selector dropdown in AgentPane toolbar (shown when >1 profile)
- [x] Selected profile's config_dir passed as claude_config_dir -> CLAUDE_CONFIG_DIR env override
### Skill Discovery & Autocomplete
- [x] Tauri command claude_list_skills(): reads ~/.claude/skills/ (dirs with SKILL.md or .md files)
- [x] Tauri command claude_read_skill(path): reads skill file content
- [x] Frontend adapter: claude-bridge.ts (ClaudeProfile, ClaudeSkill interfaces, listProfiles/listSkills/readSkill)
- [x] Skill autocomplete in AgentPane: `/` prefix triggers menu, arrow keys navigate, Tab/Enter select
- [x] expandSkillPrompt(): reads skill content, injects as prompt with optional user args
### Extended AgentQueryOptions
- [x] Rust struct (bterminal-core/src/sidecar.rs): setting_sources, system_prompt, model, claude_config_dir, additional_directories
- [x] Sidecar JSON passthrough (both agent-runner.ts and agent-runner-deno.ts)
- [x] SDK query() options: settingSources defaults to ['user', 'project'], systemPrompt, model, additionalDirectories
- [x] CLAUDE_CONFIG_DIR env injection for multi-account support
- [x] Frontend AgentQueryOptions interface (agent-bridge.ts) updated with new fields

364
docs/production.md Normal file

@ -0,0 +1,364 @@
# Production Hardening
Agent Orchestrator includes several production-readiness features that ensure reliability, security, and observability. This document covers each subsystem in detail.
---
## Sidecar Supervisor (Crash Recovery)
The `SidecarSupervisor` in `bterminal-core/src/supervisor.rs` automatically restarts crashed sidecar processes.
### Behavior
When the sidecar child process exits unexpectedly:
1. The supervisor detects the exit via process monitoring
2. Waits with exponential backoff before restarting:
- Attempt 1: wait 1 second
- Attempt 2: wait 2 seconds
- Attempt 3: wait 4 seconds
- Attempt 4: wait 8 seconds
- Attempt 5: wait 16 seconds (capped at 30s)
3. After 5 failed attempts, the supervisor gives up and reports `SidecarHealth::Failed`
### Health States
```rust
pub enum SidecarHealth {
Healthy,
Restarting { attempt: u32, next_retry: Duration },
Failed { attempts: u32, last_error: String },
}
```
The frontend can query health state and offer a manual restart button when auto-recovery fails. 17 unit tests cover all recovery scenarios including edge cases like rapid successive crashes.
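The retry schedule and health states can be modeled together. The following TypeScript sketch mirrors the Rust enum as a discriminated union purely for illustration; `backoffDelayMs` and `onSidecarExit` are hypothetical names, and the real supervisor may track attempts differently.

```typescript
// TypeScript mirror of the Rust SidecarHealth enum (illustrative only).
type SidecarHealth =
  | { state: "healthy" }
  | { state: "restarting"; attempt: number; nextRetryMs: number }
  | { state: "failed"; attempts: number; lastError: string };

const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;

// Delay before restart attempt N (1-based): 1s, 2s, 4s, 8s, 16s, capped at 30s.
function backoffDelayMs(attempt: number): number {
  return Math.min(BASE_DELAY_MS * Math.pow(2, attempt - 1), MAX_DELAY_MS);
}

// Decide the next health state after the sidecar exits unexpectedly.
// `failedAttempts` is how many restart attempts have already failed.
function onSidecarExit(failedAttempts: number, lastError: string): SidecarHealth {
  if (failedAttempts >= MAX_ATTEMPTS) {
    return { state: "failed", attempts: failedAttempts, lastError };
  }
  const attempt = failedAttempts + 1;
  return { state: "restarting", attempt, nextRetryMs: backoffDelayMs(attempt) };
}
```

With five attempts the 30-second cap is never reached (the longest wait is 16 seconds); it only matters if the attempt limit is raised.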
---
## Landlock Sandbox
Landlock is a Linux security module that restricts filesystem access for processes. It has been part of the mainline kernel since 5.13; Agent Orchestrator requires kernel 6.2+ and uses Landlock to sandbox sidecar processes, limiting which files they can read and write.
### Configuration
```rust
pub struct SandboxConfig {
pub read_write_paths: Vec<PathBuf>, // Full access (project dir, temp)
pub read_only_paths: Vec<PathBuf>, // Read-only (system libs, SDK)
}
```
The sandbox is applied via `pre_exec()` on the child process command, before the sidecar starts executing.
### Path Rules
| Path | Access | Reason |
|------|--------|--------|
| Project CWD | Read/Write | Agent needs to read and modify project files |
| `/tmp` | Read/Write | Temporary files during operation |
| `~/.local/share/bterminal/` | Read/Write | SQLite databases (btmsg, sessions) |
| System library paths | Read-only | Node.js/Deno runtime dependencies |
| `~/.claude/` or config dir | Read-only | Claude configuration and credentials |
### Graceful Fallback
If the kernel doesn't support Landlock (< 6.2) or Landlock isn't enabled, the sandbox degrades gracefully: the sidecar runs without filesystem restrictions. This is logged as a warning but doesn't prevent operation.
---
## FTS5 Full-Text Search
The search system uses SQLite's FTS5 extension for full-text search across three data types. It is accessed via a Spotlight-style overlay (Ctrl+Shift+F).
### Architecture
```
SearchOverlay.svelte (Ctrl+Shift+F)
└── search-bridge.ts → Tauri commands
└── search.rs → SearchDb (separate FTS5 tables)
├── search_messages — agent session messages
├── search_tasks — bttask task content
└── search_btmsg — btmsg inter-agent messages
```
### Virtual Tables
The `SearchDb` struct in `search.rs` manages three FTS5 virtual tables:
| Table | Source | Indexed Columns |
|-------|--------|----------------|
| `search_messages` | Agent session messages | content, session_id, project_id |
| `search_tasks` | bttask tasks | title, description, assignee, status |
| `search_btmsg` | btmsg messages | content, sender, recipient, channel |
### Operations
| Tauri Command | Purpose |
|---------------|---------|
| `search_init` | Creates the FTS5 virtual tables if they do not exist |
| `search_all` | Queries all 3 tables, returns ranked results |
| `search_rebuild` | Drops and rebuilds all indices (maintenance) |
| `search_index_message` | Indexes a single new message (real-time) |
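Raw user input is not always a valid FTS5 `MATCH` expression (quotes, `*`, `:` and bare keywords such as `AND` have special meaning). One way to handle this, shown here as a sketch and not as what `search.rs` actually does, is to quote every term as an FTS5 string before sending it to `search_all`. `toFts5Query` is a hypothetical helper.

```typescript
// Hypothetical helper: turn free-form input into a safe FTS5 MATCH expression.
// Each whitespace-separated term becomes a quoted FTS5 string; embedded double
// quotes are escaped by doubling them. Adjacent strings are implicitly ANDed.
function toFts5Query(input: string, prefixLastTerm = false): string {
  const terms = input
    .trim()
    .split(/\s+/)
    .filter((t) => t.length > 0);
  const quoted = terms.map((t) => '"' + t.replace(/"/g, '""') + '"');
  if (prefixLastTerm && quoted.length > 0) {
    // A string followed by * marks its final token as a prefix token.
    quoted[quoted.length - 1] += "*";
  }
  return quoted.join(" ");
}
```

An empty result means there is nothing to search for, so the caller should skip the query in that case. Prefix matching on the last term suits search-as-you-type input.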
### Frontend (SearchOverlay.svelte)
- Triggered by Ctrl+Shift+F
- Spotlight-style floating overlay centered on screen
- 300ms debounce on input to avoid excessive queries
- Results grouped by type (Messages, Tasks, Communications)
- Click result to navigate to source (focus project, switch tab)
---
## Plugin System
The plugin system allows extending Agent Orchestrator with custom commands and event handlers. Plugins are sandboxed JavaScript executing in a restricted environment.
### Plugin Discovery
Plugins live in `~/.config/bterminal/plugins/`. Each plugin is a directory containing a `plugin.json` manifest:
```json
{
"name": "my-plugin",
"version": "1.0.0",
"description": "A custom plugin",
"main": "index.js",
"permissions": ["notifications", "settings"]
}
```
The Rust `plugins.rs` module scans for `plugin.json` files with path-traversal protection (rejects `..` in paths).
### Sandboxed Runtime (plugin-host.ts)
Plugins execute via `new Function()` in a restricted scope:
**Shadowed globals (13):**
`fetch`, `XMLHttpRequest`, `WebSocket`, `Worker`, `eval`, `Function`, `importScripts`, `require`, `process`, `globalThis`, `window`, `document`, `localStorage`
**Provided API (permission-gated):**
| API | Permission | Purpose |
|-----|-----------|---------|
| `bt.notify(msg)` | `notifications` | Show toast notification |
| `bt.getSetting(key)` | `settings` | Read app setting |
| `bt.setSetting(key, val)` | `settings` | Write app setting |
| `bt.registerCommand(name, fn)` | — (always allowed) | Add command to palette |
| `bt.on(event, fn)` | — (always allowed) | Subscribe to app events |
The API object is frozen (`Object.freeze`) to prevent tampering. Strict mode is enforced.
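The shadowing technique can be sketched as below. This is an illustration under stated assumptions, not the code in `plugin-host.ts`: `createApi`, `runPlugin`, and the `sink` object are hypothetical, and only two of the `bt` methods are modeled. One detail worth noting is that `eval` cannot be a parameter name of a strict-mode function, so the sketch shadows the globals in a non-strict outer function and runs the plugin body in a nested strict function.

```typescript
type Permission = "notifications" | "settings";

// The 13 names listed above; passed as parameters so they resolve to undefined.
const SHADOWED_GLOBALS: string[] = [
  "fetch", "XMLHttpRequest", "WebSocket", "Worker", "eval", "Function",
  "importScripts", "require", "process", "globalThis", "window", "document",
  "localStorage",
];

interface PluginApi {
  notify(msg: string): void;
  registerCommand(name: string, fn: () => void): void;
}

interface Sink {
  toasts: string[];
  commands: Map<string, () => void>;
}

function createApi(permissions: Permission[], sink: Sink): PluginApi {
  const api: PluginApi = {
    notify(msg: string): void {
      if (permissions.indexOf("notifications") === -1) {
        throw new Error("permission denied: notifications");
      }
      sink.toasts.push(msg);
    },
    registerCommand(name: string, fn: () => void): void {
      sink.commands.set(name, fn); // always allowed
    },
  };
  return Object.freeze(api); // plugins cannot replace or add methods
}

function runPlugin(source: string, api: PluginApi): unknown {
  // Outer function: non-strict, so `eval` is a legal parameter name.
  // Inner function: strict mode, runs the plugin source with `this` undefined.
  const body =
    'return (function () { "use strict"; ' + source + "\n}).call(undefined);";
  const runner = new Function(...SHADOWED_GLOBALS, "bt", body);
  const blanks = SHADOWED_GLOBALS.map(() => undefined);
  return runner(...blanks, api);
}
```

Inside the plugin, `typeof fetch` evaluates to `"undefined"` and `bt.notify()` throws unless the manifest granted `notifications`. Shadowing hides names only; it does not remove the underlying objects.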
### Plugin Store (`plugins.svelte.ts`)
The store manages plugin lifecycle:
- `loadAllPlugins()` — discover, validate permissions, execute in sandbox
- `unloadAllPlugins()` — cleanup event listeners, remove commands
- Command registry integrates with CommandPalette
- Event bus distributes app events to subscribed plugins
### Security Notes
The `new Function()` sandbox is best-effort — it is not a security boundary. A determined attacker could escape it. Landlock provides the actual filesystem restriction. The plugin sandbox primarily prevents accidental damage from buggy plugins.
35 tests cover the plugin system including permission validation, sandbox escape attempts, and lifecycle management.
---
## Secrets Management
Secrets (API keys, tokens) are stored in the system keyring rather than in plaintext files or SQLite.
### Backend (`secrets.rs`)
Uses the `keyring` crate with the `linux-native` feature (libsecret/D-Bus):
```rust
pub struct SecretsManager;
impl SecretsManager {
pub fn store(key: &str, value: &str) -> Result<()>;
pub fn get(key: &str) -> Result<Option<String>>;
pub fn delete(key: &str) -> Result<()>;
pub fn list() -> Result<Vec<SecretMetadata>>;
pub fn has_keyring() -> bool;
}
```
Metadata (key names, last modified timestamps) is stored in SQLite settings. The actual secret values are never written to SQLite or to plaintext files; they live only in the system keyring (gnome-keyring, KWallet, or equivalent).
### Frontend (`secrets-bridge.ts`)
| Function | Purpose |
|----------|---------|
| `storeSecret(key, value)` | Store a secret in keyring |
| `getSecret(key)` | Retrieve a secret |
| `deleteSecret(key)` | Remove a secret |
| `listSecrets()` | List all secret metadata |
| `hasKeyring()` | Check if system keyring is available |
### No Fallback
If no keyring daemon is available (no D-Bus session, no gnome-keyring), secret operations fail with a clear error message. There is no plaintext fallback; this is intentional, to prevent accidental credential leakage.
---
## Notifications
Agent Orchestrator has two notification systems: in-app toasts and OS-level desktop notifications.
### In-App Toasts (`notifications.svelte.ts`)
- 6 notification types: `success`, `error`, `warning`, `info`, `agent_complete`, `agent_error`
- Maximum 5 visible toasts, 4-second auto-dismiss
- Toast history (up to 100 entries) with unread badge in NotificationCenter
- Agent dispatcher emits toasts on: agent completion, agent error, sidecar crash
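The limits above (5 visible toasts, 4-second auto-dismiss, 100-entry history) can be sketched as a small queue. This is not the Svelte store itself: `ToastQueue` is a hypothetical class, the clock is passed in explicitly, and evicting the oldest toast when the limit is exceeded is an assumption.

```typescript
type ToastType =
  | "success" | "error" | "warning" | "info" | "agent_complete" | "agent_error";

interface Toast {
  id: number;
  type: ToastType;
  message: string;
  expiresAt: number; // epoch ms
}

const MAX_VISIBLE = 5;
const DISMISS_MS = 4000;
const MAX_HISTORY = 100;

class ToastQueue {
  visible: Toast[] = [];
  history: Toast[] = [];
  private nextId = 1;

  notify(type: ToastType, message: string, nowMs: number): Toast {
    const toast: Toast = {
      id: this.nextId++,
      type,
      message,
      expiresAt: nowMs + DISMISS_MS,
    };
    this.visible.push(toast);
    if (this.visible.length > MAX_VISIBLE) this.visible.shift(); // assumed: drop oldest
    this.history.push(toast);
    if (this.history.length > MAX_HISTORY) this.history.shift();
    return toast;
  }

  // Remove toasts whose auto-dismiss deadline has passed.
  sweep(nowMs: number): void {
    this.visible = this.visible.filter((t) => t.expiresAt > nowMs);
  }
}
```

In a UI, `sweep()` would typically be driven by a timer; history is unaffected by dismissal so the notification center can still show older entries.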
### Desktop Notifications (`notifications.rs`)
Uses `notify-rust` crate for native Linux notifications. Graceful fallback if notification daemon is unavailable (e.g., no D-Bus session).
Frontend triggers via `sendDesktopNotification()` in `notifications-bridge.ts`. Used for events that should be visible even when the app is not focused.
### Notification Center (`NotificationCenter.svelte`)
Bell icon in the top-right with unread badge. Dropdown panel shows notification history with timestamps, type icons, and clear/mark-read actions.
---
## Agent Health Monitoring
### Heartbeats
Tier 1 agents send periodic heartbeats via `btmsg heartbeat` CLI command. The heartbeats table tracks last heartbeat timestamp and status per agent.
### Stale Detection
The health store detects stalled agents via the `stallThresholdMin` setting (default 15 minutes). If an agent hasn't produced output within the threshold, its activity state transitions to `Stalled` and the attention score jumps to 100 (highest priority).
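The stall check reduces to a time comparison. The sketch below is a minimal illustration: `isStalled` and `attentionScore` are hypothetical names, treating the threshold as inclusive is an assumption, and the non-stalled score is simply passed through because its calculation is not described here.

```typescript
const DEFAULT_STALL_THRESHOLD_MIN = 15;
const STALLED_ATTENTION_SCORE = 100;

// True when the agent has produced no output for at least the threshold.
function isStalled(
  lastOutputMs: number,
  nowMs: number,
  stallThresholdMin: number = DEFAULT_STALL_THRESHOLD_MIN,
): boolean {
  return nowMs - lastOutputMs >= stallThresholdMin * 60 * 1000;
}

// A stalled agent always gets the highest attention score.
function attentionScore(stalled: boolean, baseScore: number): number {
  return stalled ? STALLED_ATTENTION_SCORE : baseScore;
}
```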
### Dead Letter Queue
Messages sent to agents that are offline or have crashed are moved to the dead letter queue in `btmsg.db`. This prevents silent message loss and allows debugging delivery failures.
### Audit Logging
All significant events are logged to the `audit_log` table:
| Event Type | Logged When |
|-----------|-------------|
| `message_sent` | Agent sends a btmsg message |
| `message_read` | Agent reads messages |
| `channel_created` | New btmsg channel created |
| `agent_registered` | Agent registers with btmsg |
| `heartbeat` | Agent sends heartbeat |
| `task_created` | New bttask task |
| `task_status_changed` | Task status update |
| `wake_event` | Wake scheduler triggers |
| `prompt_injection_detected` | Suspicious content in agent messages |
The AuditLogTab component in the workspace UI displays audit entries with filtering by event type and agent, with 5-second auto-refresh and max 200 entries.
---
## Error Classification
The error classifier (`utils/error-classifier.ts`) categorizes API errors into 6 types with appropriate retry behavior:
| Type | Examples | Retry? | User Message |
|------|----------|--------|--------------|
| `rate_limit` | HTTP 429, "rate limit exceeded" | Yes (with backoff) | "Rate limited — retrying in Xs" |
| `auth` | HTTP 401/403, "invalid API key" | No | "Authentication failed — check API key" |
| `quota` | "quota exceeded", "billing" | No | "Usage quota exceeded" |
| `overloaded` | HTTP 529, "overloaded" | Yes (longer backoff) | "Service overloaded — retrying" |
| `network` | ECONNREFUSED, timeout, DNS failure | Yes | "Network error — check connection" |
| `unknown` | Anything else | No | "Unexpected error" |
20 unit tests cover classification accuracy across various error message formats.
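A pattern-table classifier along the lines of the table above might look like this. It is a sketch only: the regular expressions, rule order, and the `classifyError` signature are assumptions and may differ from `utils/error-classifier.ts`.

```typescript
type ErrorType =
  | "rate_limit" | "auth" | "quota" | "overloaded" | "network" | "unknown";

interface Classified {
  type: ErrorType;
  retryable: boolean;
}

interface Rule extends Classified {
  pattern: RegExp;
}

// First matching rule wins. Patterns are illustrative, not exhaustive.
const RULES: Rule[] = [
  { type: "rate_limit", retryable: true, pattern: /\b429\b|rate limit/i },
  { type: "auth", retryable: false, pattern: /\b40[13]\b|invalid api key/i },
  { type: "quota", retryable: false, pattern: /quota exceeded|billing/i },
  { type: "overloaded", retryable: true, pattern: /\b529\b|overloaded/i },
  {
    type: "network",
    retryable: true,
    pattern: /ECONNREFUSED|ENOTFOUND|timeout|timed out/i,
  },
];

function classifyError(message: string): Classified {
  for (const rule of RULES) {
    if (rule.pattern.test(message)) {
      return { type: rule.type, retryable: rule.retryable };
    }
  }
  return { type: "unknown", retryable: false };
}
```

Keeping the rules in a table makes it straightforward to add one test case per row and per message format.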
---
## WAL Checkpoint
Both SQLite databases (`sessions.db` and `btmsg.db`) use WAL (Write-Ahead Logging) mode for concurrent read/write access. Without periodic checkpoints, the WAL file can grow without bound.
A background tokio task runs `PRAGMA wal_checkpoint(TRUNCATE)` every 5 minutes on both databases. This moves WAL data into the main database file and resets the WAL.
---
## TLS Relay Support
The `bterminal-relay` binary supports TLS for encrypted WebSocket connections:
```bash
bterminal-relay \
--port 9750 \
--token <secret> \
--tls-cert /path/to/cert.pem \
--tls-key /path/to/key.pem
```
Without `--tls-cert`/`--tls-key`, the relay serves plain WebSocket (`ws://`) only when the `--insecure` flag is explicitly set; otherwise it rejects unencrypted connections. In production, TLS is mandatory.
Certificate pinning (comparing relay certificate fingerprints) is planned for v3.1.
---
## OpenTelemetry Observability
The Rust backend supports optional OTLP trace export via the `BTERMINAL_OTLP_ENDPOINT` environment variable.
### Backend (`telemetry.rs`)
- `TelemetryGuard` initializes tracing + OTLP export pipeline
- Uses `tracing` + `tracing-subscriber` + `opentelemetry` 0.28 + `tracing-opentelemetry` 0.29
- OTLP/HTTP export to configured endpoint
- `Drop`-based shutdown ensures spans are flushed
### Frontend (`telemetry-bridge.ts`)
The frontend cannot use the browser OTEL SDK (WebKit2GTK incompatible). Instead, it routes events through a `frontend_log` Tauri command that pipes into Rust's tracing system:
```typescript
tel.info('agent-started', { sessionId, provider });
tel.warn('context-pressure', { projectId, usage: 0.85 });
tel.error('sidecar-crash', { error: msg });
```
### Docker Stack
A pre-configured Tempo + Grafana stack lives in `docker/tempo/`:
```bash
cd docker/tempo && docker compose up -d
# Grafana at http://localhost:9715
# Set BTERMINAL_OTLP_ENDPOINT=http://localhost:4318 to enable export
```
---
## Session Metrics
Per-project historical session data is stored in the `session_metrics` table:
| Column | Type | Purpose |
|--------|------|---------|
| `project_id` | TEXT | Which project |
| `session_id` | TEXT | Agent session ID |
| `start_time` | INTEGER | Session start timestamp |
| `end_time` | INTEGER | Session end timestamp |
| `peak_tokens` | INTEGER | Maximum context tokens used |
| `turn_count` | INTEGER | Total conversation turns |
| `tool_call_count` | INTEGER | Total tool calls made |
| `cost_usd` | REAL | Total cost in USD |
| `model` | TEXT | Model used |
| `status` | TEXT | Final status (success/error/stopped) |
| `error_message` | TEXT | Error details if failed |
Retention is capped at 100 rows per project; the oldest rows are pruned on insert. Metrics are persisted on agent completion via the agent dispatcher.
The MetricsPanel component displays this data as:
- **Live view** — fleet aggregates, project health grid, task board summary, attention queue
- **History view** — SVG sparklines for cost/tokens/turns/tools/duration, stats row, session table

273
docs/progress/v2-archive.md Normal file

@ -0,0 +1,273 @@
# v2 Progress Log (Archive: 2026-03-05 to 2026-03-06 early)
> Archived from [v2.md](v2.md). Covers research, Phases 1-6, polish, testing, agent teams, and subagent support.
## Session: 2026-03-05
### Research Phase (complete)
- [x] Analyzed current BTerminal v1 codebase (2092 lines Python, GTK3+VTE)
- [x] Queried Memora — no existing BTerminal memories
- [x] Researched Claude Agent SDK — found structured streaming, subagent tracking, hooks
- [x] Researched Tauri + xterm.js ecosystem — found 4+ working projects
- [x] Researched terminal latency benchmarks — xterm.js acceptable for AI output
- [x] Researched 32:9 ultrawide layout patterns
- [x] Evaluated GTK4 vs Tauri vs pure Rust — Tauri wins for this use case
- [x] Created task_plan.md with 8 phases
- [x] Created findings.md with 7 research areas
### Technology Decision (complete)
- Decision: **Tauri 2.x + Solid.js + Claude Agent SDK + xterm.js**
- Rationale documented in task_plan.md Phase 0
### Adversarial Review (complete)
- [x] Spawned devil's advocate agent to attack the plan
- [x] Identified 5 fatal/critical issues:
1. Node.js sidecar requirement unacknowledged
2. SDK 0.2.x instability — need abstraction layer
3. Three-tier observation overengineered — simplified to two-tier
4. Solid.js ecosystem too small — switched to Svelte 5
5. Missing: packaging, error handling, testing, responsive design
- [x] Revised plan (Rev 2) incorporating all corrections
- [x] Added error handling strategy table
- [x] Added testing strategy table
- [x] Defined MVP boundary (Phases 1-4)
- [x] Added responsive layout requirement (1920px degraded mode)
### Phase 1 Scaffolding (complete)
- [x] Created feature branch `v2-mission-control`
- [x] Initialized Tauri 2.x + Svelte 5 project in `v2/` directory
- [x] Rust backend stubs: main.rs, lib.rs, pty.rs, sidecar.rs, watcher.rs, session.rs
- [x] Svelte frontend: App.svelte with Catppuccin Mocha CSS variables, component stubs
- [x] Node.js sidecar scaffold: agent-runner.ts with NDJSON communication pattern
- [x] Tauri builds and launches (cargo build --release verified)
- [x] Dev scripts: npm run dev, npm run build, npm run tauri dev/build
- [x] 17 operational rules added to `.claude/rules/`
- [x] Project meta files: CLAUDE.md, .claude/CLAUDE.md, TODO.md, CHANGELOG.md
- [x] Documentation structure: docs/README.md, task_plan.md, phases.md, findings.md, progress.md
### Phase 2: Terminal Pane + Layout (complete)
- [x] Rust PTY backend with portable-pty (PtyManager: spawn, write, resize, kill)
- [x] PTY reader thread emitting Tauri events (pty-data-{id}, pty-exit-{id})
- [x] Tauri commands: pty_spawn, pty_write, pty_resize, pty_kill
- [x] xterm.js terminal pane with Canvas addon (explicit, no WebGL)
- [x] Catppuccin Mocha theme for xterm.js (16 ANSI colors)
- [x] FitAddon with ResizeObserver + 100ms debounce
- [x] PTY bridge adapter (spawnPty, writePty, resizePty, killPty, onPtyData, onPtyExit)
- [x] CSS Grid tiling layout with 5 presets (1-col, 2-col, 3-col, 2x2, master-stack)
- [x] Layout store with Svelte 5 $state runes and auto-preset selection
- [x] Sidebar with session list, layout preset selector, new terminal button
- [x] Keyboard shortcuts: Ctrl+N new terminal, Ctrl+1-4 focus pane
- [x] PaneContainer with header bar (title, status, close)
- [x] Empty state welcome screen with Ctrl+N hint
- [x] npm dependencies: @xterm/xterm, @xterm/addon-canvas, @xterm/addon-fit
- [x] Cargo dependencies: portable-pty, uuid
### Phase 3: Agent SDK Integration (complete)
- [x] Rust SidecarManager: spawn Node.js, stdio NDJSON, query/stop/shutdown (sidecar.rs, 218 lines)
- [x] Node.js agent-runner: spawns `claude -p --output-format stream-json`, manages sessions (agent-runner.ts, 176 lines)
- [x] Tauri commands: agent_query, agent_stop, agent_ready in lib.rs
- [x] Sidecar auto-start on app launch
- [x] SDK message adapter: full stream-json parser with 9 typed message types (sdk-messages.ts, 234 lines)
- [x] Agent bridge: Tauri IPC adapter for sidecar communication (agent-bridge.ts, 53 lines)
- [x] Agent dispatcher: routes sidecar events to agent store (agent-dispatcher.ts, 87 lines)
- [x] Agent store: session state with messages, cost tracking (agents.svelte.ts, 91 lines)
- [x] AgentPane component: prompt input, message rendering, stop button, cost display (AgentPane.svelte, 420 lines)
- [x] UI integration: Ctrl+Shift+N for new agent, sidebar agent button, TilingGrid routing
Architecture decision: Initially used `claude` CLI with `--output-format stream-json`. Migrated to `@anthropic-ai/claude-agent-sdk` query() due to CLI piped stdio hang bug (#6775). SDK outputs same message format, so adapter unchanged.
### Bug Fix: Svelte 5 Rune File Extensions (2026-03-06)
- [x] Diagnosed blank screen / "rune_outside_svelte" runtime error
- [x] Root cause: store files used `.ts` extension but contain Svelte 5 `$state`/`$derived` runes, which only work in `.svelte` and `.svelte.ts` files
- [x] Renamed: `layout.ts` -> `layout.svelte.ts`, `agents.ts` -> `agents.svelte.ts`, `sessions.ts` -> `sessions.svelte.ts`
- [x] Updated all import paths in 5 files to use `.svelte` suffix (e.g., `from './stores/layout.svelte'`)
### Phase 3 Polish (2026-03-06)
- [x] Sidecar crash detection: dispatcher listens for sidecar-exited event, marks running sessions as error
- [x] Restart UI: "Restart Sidecar" button in AgentPane error bar, calls agent_restart command
- [x] Auto-scroll lock: scroll handler disables auto-scroll when user scrolls >50px from bottom, "Scroll to bottom" button appears
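The auto-scroll lock boils down to measuring the distance from the bottom of the scroll container. A minimal sketch follows, assuming the standard DOM scroll metrics; `isNearBottom` and `ScrollMetrics` are hypothetical names, not the ones used in AgentPane.

```typescript
const SCROLL_LOCK_THRESHOLD_PX = 50;

// Subset of HTMLElement scroll properties, so the logic is testable without a DOM.
interface ScrollMetrics {
  scrollTop: number;
  clientHeight: number;
  scrollHeight: number;
}

// Auto-scroll stays enabled while the viewport is within the threshold
// of the bottom; scrolling further up disables it.
function isNearBottom(
  m: ScrollMetrics,
  thresholdPx: number = SCROLL_LOCK_THRESHOLD_PX,
): boolean {
  const distanceFromBottom = m.scrollHeight - (m.scrollTop + m.clientHeight);
  return distanceFromBottom <= thresholdPx;
}
```

A scroll handler would set `autoScroll = isNearBottom(element)` and show the "Scroll to bottom" button whenever the result is `false`.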
### Phase 4: Session Management + Markdown Viewer (2026-03-06)
- [x] rusqlite 0.31 (bundled) + dirs 5 + notify 6 added to Cargo.toml
- [x] SessionDb: SQLite with WAL mode, sessions table + layout_state singleton
- [x] Session CRUD: list, save, delete, update_title, touch (7 Tauri commands)
- [x] Frontend session-bridge.ts: typed invoke wrappers for all session/layout commands
- [x] Layout store wired to persistence: addPane/removePane/focusPane/setPreset all persist
- [x] restoreFromDb() on app startup restores panes in layout order
- [x] FileWatcherManager: notify crate watches files, emits Tauri "file-changed" events
- [x] MarkdownPane component: marked.js rendering, Catppuccin-themed styles, live reload
- [x] Sidebar "M" button opens file picker for .md/.markdown/.txt files
- [x] TilingGrid routes markdown pane type to MarkdownPane component
### Phase 5: Agent Tree + Polish (2026-03-06, complete)
- [x] Agent tree visualization (SVG): AgentTree.svelte component with horizontal tree layout, bezier edges, status-colored nodes; agent-tree.ts utility (buildAgentTree, countTreeNodes, subtreeCost)
- [x] Agent tree toggle in AgentPane: collapsible tree view shown when tool_call messages exist
- [x] Global status bar: StatusBar.svelte showing terminal/agent pane counts, active agents with pulse animation, total tokens and cost
- [x] Notification system: notifications.svelte.ts store (notify, dismissNotification, max 5 toasts, 4s auto-dismiss) + ToastContainer.svelte (slide-in animation, color-coded by type)
- [x] Agent dispatcher notifications: toast on agent_stopped (success), agent_error (error), sidecar crash (error), cost result (success with cost/turns)
- [x] Settings dialog: SettingsDialog.svelte modal (default shell, cwd, max panes, theme flavor) with settings-bridge.ts adapter
- [x] Settings backend: settings table (key/value) in session.rs, Tauri commands settings_get/set/list in lib.rs
- [x] Keyboard shortcuts: Ctrl+W close focused pane, Ctrl+, open settings dialog
- [x] CSS grid update: app.css grid-template-rows '1fr' -> '1fr auto' for status bar row
- [x] App.svelte: integrated StatusBar, ToastContainer, SettingsDialog components
### Phase 6: Packaging + Distribution (2026-03-06)
- [x] Created install-v2.sh — build-from-source installer with 6-step dependency check process
- [x] Updated v2/src-tauri/tauri.conf.json: bundle targets ["deb", "appimage"]
- [x] Regenerated all icons in v2/src-tauri/icons/ from bterminal.svg as RGBA PNGs
- [x] Created .github/workflows/release.yml — CI workflow triggered on v* tags
- [x] Build verified: .deb (4.3 MB), AppImage (103 MB) both built successfully
### Phase 5 continued: SSH, ctx, themes, detached mode, auto-updater (2026-03-06)
- [x] ctx integration: Rust ctx.rs, 5 Tauri commands, ctx-bridge.ts adapter, ContextPane.svelte
- [x] SSH session management: SshSession struct, ssh-bridge.ts, SshDialog.svelte, SshSessionList.svelte
- [x] Catppuccin theme flavors: Latte/Frappe/Macchiato/Mocha selectable
- [x] Detached pane mode: pop-out windows via URL params
- [x] Syntax highlighting: Shiki lazy singleton (13 languages)
- [x] Tauri auto-updater plugin integrated
- [x] AgentPane markdown rendering with Shiki highlighting
### Session: 2026-03-06 (continued) — Polish, Testing, Extras
#### Terminal Copy/Paste + Theme Hot-Swap
- [x] Copy/paste in TerminalPane via Ctrl+Shift+C/V
- [x] Theme hot-swap: onThemeChange() callback registry
#### Agent Tree Enhancements
- [x] Click tree node -> scroll to message, subtree cost display
#### Session Resume
- [x] Follow-up prompt input, resume_session_id passed to SDK
#### Pane Drag-Resize Handles
- [x] Splitter overlays in TilingGrid with mouse drag (min 10% / max 90%)
#### Auto-Update Workflow Enhancement
- [x] release.yml: signing key env vars, latest.json generation
#### Deno Sidecar Evaluation
- [x] Created agent-runner-deno.ts proof-of-concept
#### Testing Infrastructure
- [x] Vitest + Cargo tests: 104 vitest + 29 cargo tests, all passing
### Session: 2026-03-06 (continued) — Session Groups, Auto-Update Key, Deno Sidecar, Tests
#### Auto-Update Signing Key
- [x] Generated Tauri signing keypair (minisign), set pubkey in tauri.conf.json
#### Session Groups/Folders
- [x] group_name column, setPaneGroup, grouped sidebar with collapsible headers
#### Deno Sidecar Integration (upgraded from PoC)
- [x] SidecarCommand struct, Deno-first resolution, Node.js fallback
#### E2E/Integration Tests
- [x] layout.test.ts (30), agent-bridge.test.ts (11), agent-dispatcher.test.ts (18), sdk-messages.test.ts (25)
- [x] Total: 104 vitest tests + 29 cargo tests
### Session: 2026-03-06 (continued) — Agent Teams / Subagent Support
#### Agent Teams Frontend Support
- [x] Parent/child hierarchy in agent store, subagent detection in dispatcher
- [x] spawnSubagentPane(), toolUseToChildPane routing, parent/child navigation in AgentPane
- [x] 10 new dispatcher tests for subagent routing (28 total, 114 vitest overall)
#### Subagent Cost Aggregation
- [x] getTotalCost() recursive helper, total cost shown in parent pane
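
A recursive cost helper of this kind can be sketched as follows. This is an illustration only: the `AgentNode` shape and the `costUsd` field are assumptions, and the real agent store may keep sessions in a flat map with parent/child ids rather than a nested tree.

```typescript
// Assumed tree shape for illustration; not the actual agent store type.
interface AgentNode {
  id: string;
  costUsd: number;       // cost of this session alone
  children: AgentNode[]; // subagent sessions spawned by this one
}

// Sums a session's own cost with the total cost of every descendant.
function getTotalCost(node: AgentNode): number {
  return node.children.reduce(
    (sum, child) => sum + getTotalCost(child),
    node.costUsd,
  );
}
```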
#### TAURI_SIGNING_PRIVATE_KEY
- [x] Set via `gh secret set` on DexterFromLab/BTerminal GitHub repo
### Session: 2026-03-06 (continued) — Multi-Machine Architecture Design
#### Multi-Machine Support Architecture
- [x] Designed full multi-machine architecture in docs/multi-machine.md (303 lines)
- [x] Three-layer model: BTerminal (controller) + bterminal-relay (remote binary) + unified frontend
- [x] WebSocket NDJSON protocol: RelayCommand/RelayEvent envelope wrapping existing sidecar format
- [x] Authentication: pre-shared token + TLS, rate limiting, lockout
- [x] Autonomous relay model: agents keep running when controller disconnects
- [x] Reconnection with exponential backoff (1s-30s), state_sync on reconnect
- [x] 4-phase implementation plan: A (extract bterminal-core crate), B (relay binary), C (RemoteManager), D (frontend)
- [x] Updated TODO.md and docs/task_plan.md to reference the design
### Session: 2026-03-06 (continued) — Multi-Machine Implementation (Phases A-D)
#### Phase A: bterminal-core crate extraction
- [x] Created Cargo workspace at v2/ level (v2/Cargo.toml, workspace members: src-tauri, bterminal-core, bterminal-relay)
- [x] Extracted PtyManager into v2/bterminal-core/src/pty.rs
- [x] Extracted SidecarManager into v2/bterminal-core/src/sidecar.rs
- [x] Created EventSink trait (v2/bterminal-core/src/event.rs) to abstract event emission
- [x] TauriEventSink (v2/src-tauri/src/event_sink.rs) implements EventSink for Tauri AppHandle
- [x] src-tauri/src/pty.rs and sidecar.rs now thin re-export wrappers
- [x] Cargo.lock moved from src-tauri/ to workspace root (v2/)
#### Phase B: bterminal-relay binary
- [x] New Rust binary at v2/bterminal-relay/ with WebSocket server (tokio-tungstenite)
- [x] Token auth via Authorization: Bearer header on WebSocket upgrade
- [x] CLI flags: --port (default 9750), --token (required), --insecure (allow ws://)
- [x] Routes RelayCommand types (pty_create/write/resize/close, agent_query/stop, sidecar_restart, ping)
- [x] Forwards RelayEvent types (pty_data/exit, sidecar_message/exited, error, pong, ready)
- [x] Rate limiting: 10 failed auth attempts triggers 5-minute lockout
- [x] Per-connection isolated PtyManager + SidecarManager instances
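
The lockout rule above (10 failed attempts, 5-minute lockout) can be sketched as a small counter. The relay itself is written in Rust; this TypeScript version only illustrates the logic. Keying by a client identifier and resetting the counter on success are assumptions the log does not confirm.

```typescript
const MAX_FAILED_ATTEMPTS = 10;      // from the log entry
const LOCKOUT_MS = 5 * 60 * 1000;    // 5 minutes, from the log entry

interface AuthState {
  failures: number;
  lockedUntil: number; // epoch millis; 0 means not locked
}

// Hypothetical limiter; the keying strategy (per client id) is an assumption.
class AuthRateLimiter {
  private readonly clients = new Map<string, AuthState>();

  isLocked(client: string, now: number): boolean {
    const state = this.clients.get(client);
    return state !== undefined && now < state.lockedUntil;
  }

  recordFailure(client: string, now: number): void {
    const state = this.clients.get(client) ?? { failures: 0, lockedUntil: 0 };
    state.failures += 1;
    if (state.failures >= MAX_FAILED_ATTEMPTS) {
      state.lockedUntil = now + LOCKOUT_MS;
      state.failures = 0; // start counting afresh once the lockout expires
    }
    this.clients.set(client, state);
  }

  recordSuccess(client: string): void {
    this.clients.delete(client);
  }
}
```

Passing `now` in explicitly keeps the limiter deterministic and easy to test.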
#### Phase C: RemoteManager in controller
- [x] New v2/src-tauri/src/remote.rs module — RemoteManager struct
- [x] WebSocket client connections to relay instances (tokio-tungstenite)
- [x] RemoteMachine struct: id, label, url, token, status (Connected/Connecting/Disconnected/Error)
- [x] Machine lifecycle: add_machine, remove_machine, connect, disconnect
- [x] 12 new Tauri commands: remote_add_machine, remote_remove_machine, remote_connect, remote_disconnect, remote_list_machines, remote_pty_spawn/write/resize/kill, remote_agent_query/stop, remote_sidecar_restart
- [x] Heartbeat ping every 15s to detect stale connections
#### Phase D: Frontend integration
- [x] v2/src/lib/adapters/remote-bridge.ts — IPC adapter for machine management + remote events
- [x] v2/src/lib/stores/machines.svelte.ts — Svelte 5 store for remote machine state
- [x] Layout store: added remoteMachineId?: string to Pane interface
- [x] agent-bridge.ts: routes to remote_agent_query/stop when pane has remoteMachineId
- [x] pty-bridge.ts: routes to remote_pty_spawn/write/resize/kill when pane has remoteMachineId
- [x] SettingsDialog: new "Remote Machines" section (add/remove/connect/disconnect UI)
- [x] SessionList sidebar: auto-groups remote panes by machine label
#### Verification
- cargo check --workspace: clean (0 errors)
- vitest: 114/114 tests passing
- svelte-check: clean (0 errors)
#### New dependencies added
- bterminal-core: serde, serde_json, log, portable-pty, uuid (extracted from src-tauri)
- bterminal-relay: tokio, tokio-tungstenite, clap, env_logger, futures-util
- src-tauri: tokio-tungstenite, tokio, futures-util, uuid (added for RemoteManager)
### Session: 2026-03-06 (continued) — Relay Hardening & Reconnection
#### Relay Command Response Propagation
- [x] Shared event channel between EventSink and command response sender (sink_tx clone in bterminal-relay)
- [x] send_error() helper function: all command failures now emit RelayEvent with commandId + error message instead of just logging
- [x] ping command: now sends pong response via event channel (was a no-op)
- [x] pty_create: returns pty_created event with session ID and commandId for correlation
- [x] All error paths (pty_write, pty_resize, pty_close, agent_query, agent_stop, sidecar_restart) use send_error()
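
The commandId correlation described above can be sketched as follows. The command and event type names come from this log; the envelope field names (`id`, `commandId`, `payload`) and the pty id format are assumptions, and the real relay is a Rust binary, so this is only a model of the behaviour.

```typescript
type RelayCommandType =
  | "pty_create" | "pty_write" | "pty_resize" | "pty_close"
  | "agent_query" | "agent_stop" | "sidecar_restart" | "ping";

// Assumed envelope shapes; field names are illustrative.
interface RelayCommand { id: string; type: RelayCommandType; payload?: unknown; }
interface RelayEvent { type: string; commandId?: string; payload?: unknown; }

// Mirrors the idea of send_error(): failures become events, not just log lines.
function errorEvent(commandId: string, message: string): RelayEvent {
  return { type: "error", commandId, payload: { message } };
}

// Handles only two commands; everything else reports an error in this sketch.
function handleCommand(cmd: RelayCommand, ptys: Set<string>): RelayEvent {
  switch (cmd.type) {
    case "ping":
      return { type: "pong", commandId: cmd.id };
    case "pty_create": {
      const ptyId = `pty-${ptys.size + 1}`; // hypothetical id scheme
      ptys.add(ptyId);
      return { type: "pty_created", commandId: cmd.id, payload: { ptyId } };
    }
    default:
      return errorEvent(cmd.id, `not handled in this sketch: ${cmd.type}`);
  }
}

// NDJSON framing: one JSON object per line.
function toNdjsonLine(event: RelayEvent): string {
  return JSON.stringify(event) + "\n";
}
```

Because every response carries the originating command's id, the controller can match replies to requests without relying on ordering.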
#### RemoteManager Reconnection
- [x] Exponential backoff reconnection in remote.rs: spawns async tokio task on disconnect
- [x] Backoff schedule: 1s, 2s, 4s, 8s, 16s, 30s (capped)
- [x] attempt_tcp_probe() function: TCP-only connect probe (5s timeout, default port 9750) — avoids allocating per-connection resources on relay
- [x] Emits remote-machine-reconnecting (with backoffSecs) and remote-machine-reconnect-ready Tauri events
- [x] Cancellation: stops if machine removed (not in HashMap) or manually reconnected (status != disconnected)
- [x] Fixed scoping: disconnection cleanup uses inner block to release mutex before emitting event
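
The backoff schedule and cancellation rule above can be expressed compactly. A minimal sketch, assuming a zero-based attempt counter; the function names are hypothetical, and the actual logic lives in the Rust remote.rs module:

```typescript
const BACKOFF_CAP_SECS = 30; // cap stated in the log entry

// attempt 0, 1, 2, ... yields 1, 2, 4, 8, 16, 30, 30, ...
function backoffSecs(attempt: number): number {
  return Math.min(2 ** attempt, BACKOFF_CAP_SECS);
}

type MachineStatus = "connected" | "connecting" | "disconnected" | "error";

// Retrying continues only while the machine still exists and is still
// marked disconnected; removal or a manual reconnect cancels the loop.
function shouldKeepRetrying(
  machines: Map<string, MachineStatus>,
  machineId: string,
): boolean {
  return machines.get(machineId) === "disconnected";
}
```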
#### RemoteManager PTY Creation Confirmation
- [x] Handles pty_created event type from relay: emits remote-pty-created Tauri event with machineId, ptyId, commandId
### Session: 2026-03-06 (continued) — Reconnection Hardening
#### TCP Probe Refactor
- [x] Replaced attempt_ws_connect() with attempt_tcp_probe() in remote.rs: TCP-only connect (no WS upgrade), 5s timeout, default port 9750
- [x] Avoids allocating per-connection resources (PtyManager, SidecarManager) on the relay during reconnection probes
- [x] Probe no longer needs auth token — only checks TCP reachability
#### Frontend Reconnection Listeners
- [x] Added onRemoteMachineReconnecting() listener in remote-bridge.ts: receives machineId + backoffSecs
- [x] Added onRemoteMachineReconnectReady() listener in remote-bridge.ts: receives machineId when probe succeeds
- [x] machines.svelte.ts: reconnecting handler sets machine status to 'reconnecting', shows toast with backoff duration
- [x] machines.svelte.ts: reconnect-ready handler auto-calls connectMachine() to re-establish full WebSocket connection
- [x] Updated docs/multi-machine.md to reflect TCP probe and frontend listener changes

docs/progress/v2.md (new file, 249 lines)

@ -0,0 +1,249 @@
# v2 Progress Log
> Earlier sessions (2026-03-05 to 2026-03-06 multi-machine): see [v2-archive.md](v2-archive.md)
### Session: 2026-03-09 — AgentPane + MarkdownPane UI Redesign
#### Tribunal-Elected Design (S-3-R4, 88% confidence)
- [x] AgentPane full rewrite: sans-serif root font, tool call/result pairing via `$derived.by` toolResultMap, hook message collapsing, context window meter, cost bar minimized, session summary styling
- [x] Two-phase scroll anchoring (`$effect.pre` + `$effect`)
- [x] Tool-aware output truncation (Bash 500, Read/Write 50, Glob/Grep 20, default 30 lines)
- [x] Colors softened via `color-mix(in srgb, var(--ctp-*) 65%, var(--ctp-surface1) 35%)`
- [x] MarkdownPane: container query wrapper, shared responsive padding variable
- [x] catppuccin.css: `--bterminal-pane-padding-inline: clamp(0.75rem, 3.5cqi, 2rem)`
- [x] 139/139 vitest passing, 0 new TypeScript errors
- [ ] Visual verification in dev mode (pending)
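
The tool-aware truncation limits listed above could be applied along these lines. This is a sketch under assumptions: the function name, the return shape, and splitting on `\n` are illustrative, not taken from AgentPane.svelte.

```typescript
// Line limits from the log entry; anything not listed uses the default.
const LINE_LIMITS = new Map<string, number>([
  ["Bash", 500],
  ["Read", 50],
  ["Write", 50],
  ["Glob", 20],
  ["Grep", 20],
]);
const DEFAULT_LINE_LIMIT = 30;

interface TruncatedOutput {
  text: string;        // output cut to the tool's line limit
  hiddenLines: number; // how many lines were dropped
}

// Hypothetical helper: keeps the first N lines, N depending on the tool.
function truncateToolOutput(toolName: string, output: string): TruncatedOutput {
  const limit = LINE_LIMITS.get(toolName) ?? DEFAULT_LINE_LIMIT;
  const lines = output.split("\n");
  if (lines.length <= limit) {
    return { text: output, hiddenLines: 0 };
  }
  return {
    text: lines.slice(0, limit).join("\n"),
    hiddenLines: lines.length - limit,
  };
}
```

Returning the hidden line count lets the UI render a "show N more lines" affordance instead of silently cutting output.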
### Session: 2026-03-06 (continued) — Sidecar Env Var Bug Fix
#### CLAUDE* Environment Variable Leak (critical fix)
- [x] Diagnosed silent hang in agent sessions when BTerminal launched from Claude Code terminal
- [x] Root cause: Claude Code sets ~8 CLAUDE* env vars for nesting/sandbox detection
- [x] Fixed both sidecar runners to filter out all keys starting with 'CLAUDE'
### Session: 2026-03-06 (continued) — Sidecar SDK Migration
#### Migration from CLI Spawning to Agent SDK
- [x] Diagnosed root cause: claude CLI v2.1.69 hangs with piped stdio (bug #6775)
- [x] Migrated both runners to @anthropic-ai/claude-agent-sdk query() function
- [x] Added build:sidecar script (esbuild bundle, SDK included)
- [x] SDK options: permissionMode configurable, allowDangerouslySkipPermissions conditional
#### Bug Found and Fixed
- [x] AgentPane onDestroy killed running sessions on layout remounts (fixed: moved to TilingGrid onClose)
### Session: 2026-03-06 (continued) — Permission Mode, AgentPane Bug Fix, SDK Bundling
#### Permission Mode Passthrough
- [x] permission_mode field flows Rust -> sidecar -> SDK, defaults to 'bypassPermissions'
#### AgentPane onDestroy Bug Fix
- [x] Stop-on-close moved from AgentPane onDestroy to TilingGrid onClose handler
#### SDK Bundling Fix
- [x] SDK bundled into agent-runner.mjs (no external dependency at runtime)
### Session: 2026-03-07 — Unified Sidecar Bundle
#### Sidecar Resolution Simplification
- [x] Consolidated to single pre-built bundle (dist/agent-runner.mjs) running on both Deno and Node.js
- [x] resolve_sidecar_command() checks runtime availability upfront, prefers Deno
- [x] agent-runner-deno.ts retained in repo but not used by SidecarManager
### Session: 2026-03-07 (continued) — Rust-Side CLAUDE* Env Var Stripping
#### Dual-Layer Env Var Stripping
- [x] Rust SidecarManager uses env_clear() + envs(clean_env) before spawn (primary defense)
- [x] JS-side stripping retained as defense-in-depth
### Session: 2026-03-07 (continued) — Claude CLI Path Detection
#### pathToClaudeCodeExecutable for SDK
- [x] Added findClaudeCli() to agent-runner.ts (Node.js): checks ~/.local/bin/claude, ~/.claude/local/claude, /usr/local/bin/claude, /usr/bin/claude, then falls back to `which claude`/`where claude`
- [x] Added findClaudeCli() to agent-runner-deno.ts (Deno): same candidate paths, uses Deno.statSync() + Deno.Command("which")
- [x] Both runners now pass pathToClaudeCodeExecutable to SDK query() options
- [x] Early error: if Claude CLI not found, agent_error emitted immediately instead of cryptic SDK failure
- [x] CLI path resolved once at sidecar startup, logged for debugging
### Session: 2026-03-07 (continued) — Claude Profiles & Skill Discovery
#### Claude Profile / Account Switching (switcher-claude integration)
- [x] New Tauri commands: claude_list_profiles(), claude_list_skills(), claude_read_skill(), pick_directory()
- [x] claude_list_profiles() reads ~/.config/switcher/profiles/ for profile directories with profile.toml metadata
- [x] Config dir resolution: ~/.config/switcher-claude/{name}/ or fallback ~/.claude/
- [x] extract_toml_value() simple TOML parser for profile metadata (email, subscription_type, display_name)
- [x] Always includes "default" profile if no switcher profiles found
#### Skill Discovery & Autocomplete
- [x] claude_list_skills() reads ~/.claude/skills/ directory (directories with SKILL.md or standalone .md files)
- [x] Description extracted from first non-heading, non-empty line (max 120 chars)
- [x] claude_read_skill(path) reads full skill file content
- [x] New frontend adapter: v2/src/lib/adapters/claude-bridge.ts (ClaudeProfile, ClaudeSkill interfaces)
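
The description rule above (first non-heading, non-empty line, capped at 120 characters) can be sketched as a small pure function. The actual implementation is a Rust Tauri command; the function name here is hypothetical, and any handling of front matter is not covered.

```typescript
const MAX_DESCRIPTION_CHARS = 120; // cap stated in the log entry

// Returns the first line that is neither blank nor a markdown heading,
// truncated to the cap; returns "" if no such line exists.
function extractSkillDescription(markdown: string): string {
  for (const raw of markdown.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    return line.slice(0, MAX_DESCRIPTION_CHARS);
  }
  return "";
}
```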
#### AgentPane Session Toolbar
- [x] Working directory input (cwdInput) — editable text field, replaces fixed cwd prop
- [x] Profile/account selector dropdown (shown when >1 profile available)
- [x] Selected profile's config_dir passed as claude_config_dir in AgentQueryOptions
- [x] Skill autocomplete menu: type `/` to trigger, arrow keys navigate, Tab/Enter select, Escape dismiss
- [x] expandSkillPrompt(): reads skill content via readSkill(), prepends to prompt with optional user args
#### Extended AgentQueryOptions (full stack passthrough)
- [x] New fields in Rust AgentQueryOptions struct: setting_sources, system_prompt, model, claude_config_dir, additional_directories
- [x] Sidecar QueryMessage interface updated with matching fields
- [x] Both sidecar runners (agent-runner.ts, agent-runner-deno.ts) pass new fields to SDK query()
- [x] CLAUDE_CONFIG_DIR injected into cleanEnv when claudeConfigDir provided (multi-account support)
- [x] settingSources defaults to ['user', 'project'] (loads CLAUDE.md and project settings)
- [x] Frontend AgentQueryOptions interface updated in agent-bridge.ts
### Session: 2026-03-07 (continued) — v3 Mission Control Planning
#### v3 Architecture Planning
- [x] Created docs/v3-task_plan.md — core concept, user requirements, architecture questions
- [x] Created docs/v3-findings.md — codebase reuse analysis (what to keep/replace/drop)
- [x] Created docs/v3-progress.md — v3-specific progress log
- [x] Launched 3 adversarial architecture agents (Architect, Devil's Advocate, UX+Perf Specialist)
- [x] Collected adversarial agent findings
- [x] Produced final architecture plan
- [x] Created v3 implementation phases
### Session: 2026-03-07 (continued) — v3 Mission Control MVP Implementation (Phases 1-5)
#### Phase 1: Data Model + Config
- [x] Created v2/src/lib/types/groups.ts (ProjectConfig, GroupConfig, GroupsFile interfaces)
- [x] Created v2/src-tauri/src/groups.rs (Rust structs + load/save groups.json + default_groups())
- [x] Added groups_load, groups_save Tauri commands to lib.rs
- [x] SQLite migrations in session.rs: project_id column, agent_messages table, project_agent_state table
- [x] Created v2/src/lib/adapters/groups-bridge.ts (IPC wrapper)
- [x] Created v2/src/lib/stores/workspace.svelte.ts (replaces layout.svelte.ts for v3, Svelte 5 runes)
- [x] Added --group CLI argument parsing in main.rs
- [x] 24 vitest tests for workspace store + 7 cargo tests for groups
#### Phase 2: Project Box Shell
- [x] Created 12 new Workspace components in v2/src/lib/components/Workspace/
- [x] GlobalTabBar, ProjectGrid, ProjectBox, ProjectHeader, CommandPalette, DocsTab, ContextTab, SettingsTab
- [x] Rewrote App.svelte (no sidebar, no TilingGrid — GlobalTabBar + tab content + StatusBar)
#### Phase 3: Claude Session Integration
- [x] Created ClaudeSession.svelte wrapping AgentPane per-project
#### Phase 4: Terminal Tabs
- [x] Created TerminalTabs.svelte with shell/SSH/agent tab types
#### Phase 5: Team Agents Panel
- [x] Created TeamAgentsPanel.svelte + AgentCard.svelte
#### Bug Fix
- [x] Fixed AgentPane Svelte 5 event modifier: on:click -> onclick
#### Verification
- All 138 vitest + 36 cargo tests pass, vite build succeeds
### Session: 2026-03-07 (continued) — v3 Phases 6-10 Completion
#### Phase 6: Session Continuity
- [x] Added persistSessionForProject() to agent-dispatcher (saves state + messages to SQLite on complete)
- [x] Added registerSessionProject() + sessionProjectMap for session->project persistence routing
- [x] ClaudeSession restoreMessagesFromRecords() restores cached messages on mount
- [x] Added getAgentSession() export to agents store
#### Phase 7: Workspace Teardown
- [x] Added clearAllAgentSessions() to agents store
- [x] switchGroup() calls clearAllAgentSessions() + resets terminal tabs
- [x] Updated workspace.test.ts with clearAllAgentSessions mock
#### Phase 10: Dead Component Removal + Polish
- [x] Deleted 7 dead v2 components (~1,836 lines): TilingGrid, PaneContainer, PaneHeader, SessionList, SshSessionList, SshDialog, SettingsDialog
- [x] Removed empty directories: Layout/, Sidebar/, Settings/, SSH/
- [x] Rewrote StatusBar for workspace store (group name, project count, "BTerminal v3")
- [x] Fixed subagent routing: project-scoped sessions skip layout pane (render in TeamAgentsPanel)
- [x] Updated v3-task_plan.md to mark all 10 phases complete
#### Verification
- All 138 vitest + 36 cargo tests pass, vite build succeeds
### Session: 2026-03-07 (continued) — Multi-Theme System
#### Theme System Generalization (7 Editor Themes)
- [x] Generalized CatppuccinFlavor to ThemeId union type (11 themes)
- [x] Added 7 editor themes: VSCode Dark+, Atom One Dark, Monokai, Dracula, Nord, Solarized Dark, GitHub Dark
- [x] Added ThemePalette, ThemeMeta, THEME_LIST types; deprecated old Catppuccin-only types
- [x] Theme store: getCurrentTheme()/setTheme() with deprecated wrappers for backwards compat
- [x] SettingsTab: optgroup-based theme selector, fixed input overflow with min-width:0
- [x] All themes map to same --ctp-* CSS vars — zero component changes needed
#### Verification
- All 138 vitest + 35 cargo tests pass
### Session: 2026-03-07 (continued) — Deep Dark Theme Group
#### 6 New Deep Dark Themes
- [x] Added Tokyo Night, Gruvbox Dark, Ayu Dark, Poimandres, Vesper, Midnight to themes.ts
- [x] Extended ThemeId from 11 to 17 values, THEME_LIST from 11 to 17 entries
- [x] New "Deep Dark" theme group (3rd group alongside Catppuccin and Editor)
- [x] Midnight is pure OLED black (#000000), Ayu Dark near-black (#0b0e14), Vesper warm dark (#101010)
- [x] All 6 themes map to same 26 --ctp-* CSS vars — zero component changes needed
### Session: 2026-03-07 (continued) — Custom Theme Dropdown
#### SettingsTab Theme Picker Redesign
- [x] Replaced native `<select>` with custom themed dropdown in SettingsTab.svelte
- [x] Trigger: color swatch (base) + label + arrow; menu: grouped sections with styled headers
- [x] Options show color swatch + label + 4 accent dots (red/green/blue/yellow) via getPalette()
- [x] Click-outside and Escape to close; aria-haspopup/aria-expanded for a11y
- [x] Uses --ctp-* CSS vars — fully themed with active theme
### Session: 2026-03-07 (continued) — Global Font Controls
#### SettingsTab Font Controls + Layout Restructure
- [x] Added font family select (9 monospace fonts + Default) with live CSS var preview
- [x] Added font size +/- stepper (8-24px range) with live CSS var preview
- [x] Restructured global settings: 2-column grid layout with labels above controls (replaced inline rows)
- [x] Added --ui-font-family and --ui-font-size CSS custom properties to catppuccin.css
- [x] app.css body rule now uses CSS vars instead of hardcoded font values
- [x] initTheme() in theme.svelte.ts restores saved font settings on startup (try/catch, non-fatal)
- [x] Font settings persisted as 'font_family' and 'font_size' keys in SQLite settings table
### Session: 2026-03-07 (continued) — SettingsTab Global Settings Redesign
#### Font Settings Split (UI Font + Terminal Font)
- [x] Split single font into UI font (sans-serif: System Sans-Serif, Inter, Roboto, etc.) and Terminal font (monospace: JetBrains Mono, Fira Code, etc.)
- [x] Each font dropdown renders preview text in its own typeface
- [x] Independent size steppers (8-24px) for UI and Terminal font
- [x] Setting keys changed: font_family/font_size -> ui_font_family/ui_font_size + term_font_family/term_font_size
#### SettingsTab Layout + CSS Updates
- [x] Rewrote global settings: single-column layout, "Appearance" + "Defaults" subsections
- [x] All dropdowns are custom themed (no native `<select>` anywhere)
- [x] Added --term-font-family and --term-font-size CSS vars to catppuccin.css
- [x] Updated initTheme() to restore 4 font settings instead of 2
### Session: 2026-03-08 — CSS Relative Units Rule
- [x] Created `.claude/rules/18-relative-units.md` — enforces rem/em for layout CSS (px only for icons/borders)
- [x] Converted GlobalTabBar.svelte styles from px to rem (rail width, button size, gap, padding, border-radius)
- [x] Converted App.svelte sidebar header styles from px to rem (padding, close button, border-radius)
- [x] Changed GlobalTabBar rail-btn color from --ctp-overlay1 to --ctp-subtext0
### Session: 2026-03-09 — AgentPane Collapsibles, Aspect Ratio, Desktop Integration
#### AgentPane Collapsible Messages
- [x] Text messages (`msg.type === 'text'`) wrapped in `<details open>` — open by default, collapsible
- [x] Cost summary (`cost.result`) wrapped in `<details>` — collapsed by default, expandable
- [x] CSS: `.msg-text-collapsible` and `.msg-summary-collapsible` with preview text
#### Project Max Aspect Ratio Setting
- [x] New `project_max_aspect` SQLite setting (float, default 1.0, range 0.3-3.0)
- [x] ProjectGrid: `max-width: calc(100vh * var(--project-max-aspect, 1))` on `.project-slot`
- [x] SettingsTab: stepper UI in Appearance section
- [x] App.svelte: restore on startup via getSetting()
#### Desktop Integration
- [x] install-v2.sh: added `StartupWMClass=bterminal` to .desktop template
- [x] GNOME auto-move extension compatible
#### No-Implicit-Push Rule
- [x] Created `.claude/rules/52-no-implicit-push.md` — never push unless explicitly asked
### Next Steps
- [ ] Real-world relay testing (2 machines)
- [ ] TLS/certificate pinning for relay connections
- [ ] Test agent teams with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

docs/progress/v3.md (new file, 1076 lines; diff suppressed because it is too large)


@ -0,0 +1,62 @@
# Agent Provider Adapter — Findings
## Architecture Exploration (2026-03-11)
### Claude-Specific Coupling Severity Map
Full codebase exploration of 13+ files revealed coupling at 4 severity levels:
#### CRITICAL (hardcoded SDK, must abstract)
| File | Coupling | Impact |
|------|----------|--------|
| `sidecar/agent-runner.ts` | Imports `@anthropic-ai/claude-agent-sdk`, calls `query()`, hardcoded `findClaudeCli()` | Entire sidecar is Claude-only. Must become `claude-runner.ts`. Other providers get own runners. |
| `bterminal-core/src/sidecar.rs` | `AgentQueryOptions` struct has no `provider` field. `SidecarCommand` hardcodes `agent-runner.mjs` path. | Must add `provider: String` field. Runner selection must be provider-based. |
| `src/lib/adapters/sdk-messages.ts` | `parseMessage()` assumes Claude SDK JSON format (assistant/user/result types, subagent tool names like `dispatch_agent`) | Must become `claude-messages.ts`. Other providers get own parsers. Registry selects by provider. |
#### HIGH (TS mirror types, provider-specific commands)
| File | Coupling | Impact |
|------|----------|--------|
| `src/lib/adapters/agent-bridge.ts` | `AgentQueryOptions` interface mirrors Rust struct — no provider field. `queryAgent()` passes options directly. | Add `provider` field. Options shape stays generic (provider_config blob). |
| `src-tauri/src/lib.rs` | `claude_list_profiles`, `claude_list_skills`, `claude_read_skill` commands are Claude-specific. | Keep as-is — they're provider-specific commands, not generic agent commands. UI gates by capability. |
| `src/lib/adapters/claude-bridge.ts` | `listClaudeProfiles()`, `listClaudeSkills()` — provider-specific adapter. | Stays as `claude-bridge.ts`. Other providers get their own bridges. A new `provider-bridge.ts` handles generic routing. |
#### MEDIUM (provider-aware routing, UI rendering)
| File | Coupling | Impact |
|------|----------|--------|
| `src/lib/agent-dispatcher.ts` | `handleAgentMessage()` calls `parseMessage()` (Claude-specific). Subagent tool names hardcoded (`dispatch_agent`). | Route through message adapter registry. Subagent detection becomes provider-capability. |
| `src/lib/components/Agent/AgentPane.svelte` | Profile selector, skill autocomplete, Claude-specific tool names in rendering logic. | Gate by `ProviderCapabilities`. No `if(provider==='claude')` — use `capabilities.hasProfiles`. |
| `src/lib/components/Workspace/ClaudeSession.svelte` | Name says "Claude" but logic is mostly generic (session management, prompt, AgentPane). | Rename to `AgentSession.svelte`. Add provider prop. |
#### LOW (mostly generic already)
| File | Coupling | Impact |
|------|----------|--------|
| `src/lib/stores/agents.svelte.ts` | AgentMessage type is already generic (text, tool_call, tool_result). No Claude-specific logic. | No changes needed. Common AgentMessage type stays. |
| `src/lib/stores/health.svelte.ts` | Tracks activity/cost/context per project. Provider-agnostic. | No changes needed. |
| `src/lib/stores/conflicts.svelte.ts` | File overlap detection. Provider-agnostic (operates on tool_call file paths). | No changes needed. |
| `bterminal-relay/` | Forwards AgentQueryOptions as-is. No provider logic. | No changes needed (will forward `provider` field transparently). |
### Key Design Insights
1. **Sidecar is the natural boundary**: Each provider needs its own JS runner because SDKs are incompatible (Claude Agent SDK vs Codex CLI vs Ollama REST). The Rust sidecar manager selects which runner to spawn based on `provider` field.
2. **Message format is the main divergence**: Claude SDK emits structured JSON (assistant/user/result with specific fields). Codex CLI has different output format. Ollama uses OpenAI-compatible streaming. Per-provider message adapters normalize to common AgentMessage.
3. **Settings are per-provider + per-project**: Global defaults (API keys, model preferences) are per-provider. Project-level setting is just "which provider to use" (with override for model). Current SettingsTab has room for a collapsible Providers section without needing tabs.
4. **Capability flags eliminate provider switches**: Instead of `if (provider === 'claude') showProfiles()`, use `if (capabilities.hasProfiles) showProfiles()`. This means adding a new provider only requires registering its capabilities — no UI code changes.
5. **env var stripping is provider-specific**: Claude needs CLAUDE* vars stripped (nesting detection). Codex may need CODEX* stripped. Ollama needs nothing stripped. This is part of provider config, not generic logic.
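
Insight 4 can be illustrated with a short sketch of capability-gated rendering. The flag names `hasProfiles`, `hasSkills` and `supportsResume` appear elsewhere in this log; the control names and the `visibleControls` helper are hypothetical, and the Codex values follow the provider notes recorded later.

```typescript
interface ProviderCapabilities {
  hasProfiles: boolean;
  hasSkills: boolean;
  supportsResume: boolean;
}

// Example capability sets; Claude starts with everything enabled.
const CLAUDE_CAPS: ProviderCapabilities = { hasProfiles: true, hasSkills: true, supportsResume: true };
const CODEX_CAPS: ProviderCapabilities = { hasProfiles: false, hasSkills: false, supportsResume: true };

// The UI asks "what can this provider do?" and never "which provider is this?".
function visibleControls(caps: ProviderCapabilities): string[] {
  const controls: string[] = [];
  if (caps.hasProfiles) controls.push("profile-selector");
  if (caps.hasSkills) controls.push("skill-autocomplete");
  if (caps.supportsResume) controls.push("resume-button");
  return controls;
}
```

Under this scheme a new provider only supplies a capabilities object, and `visibleControls` needs no changes.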
### Risk Assessment
| Risk | Likelihood | Mitigation |
|------|-----------|------------|
| Rename breaks imports across 20+ files | High | Do renames one-at-a-time with full grep verification. Run tests after each. |
| AgentQueryOptions Rust/TS mismatch | Medium | Add provider field to both simultaneously. Default to 'claude'. |
| Message parser regression | Medium | sdk-messages.ts has 25 tests. Copy tests to claude-messages.ts test file. All must pass. |
| Settings persistence migration | Low | New settings keys (provider defaults) — no migration needed, just new keys. |
| UI regression from capability gating | Medium | Start with Claude capabilities = all true. Verify AgentPane renders identically. |


@ -0,0 +1,95 @@
# Agent Provider Adapter — Progress
## Session Log
### 2026-03-11 — Planning Phase
**Duration:** ~30 min
**What happened:**
1. Explored 13+ files across Rust backend, TypeScript bridges, Svelte UI, and JS sidecar to map Claude-specific coupling
2. Classified coupling into 4 severity levels (CRITICAL/HIGH/MEDIUM/LOW)
3. Ran /ultra-think for deep architectural analysis — evaluated 3 design options for sidecar routing, message adapters, and settings UI
4. Made 6 architecture decisions (PA-1 through PA-6)
5. Created 3-phase implementation plan (16 + 5 + 3 tasks)
6. Created planning files: task_plan.md, findings.md, progress.md
**Architecture decisions made:**
- PA-1: Per-provider sidecar binaries (not single multi-SDK bundle)
- PA-2: Generic provider_config blob in AgentQueryOptions
- PA-3: Per-provider message adapter files → common AgentMessage type
- PA-4: Provider selection per-project with global default
- PA-5: Capability flags drive UI rendering (not provider ID checks)
- PA-6: Providers section in SettingsTab scroll (not inner tabs)
**Status:** Planning complete. Ready for Phase 1 implementation.
### 2026-03-11 — Implementation (All 3 Phases)
**Duration:** ~60 min
**What happened:**
**Phase 1 — Core Abstraction Layer (16 tasks):**
1. Created provider types (ProviderId, ProviderCapabilities, ProviderMeta, ProviderSettings)
2. Created Svelte 5 rune-based provider registry (registry.svelte.ts)
3. Created Claude provider meta constant (claude.ts)
4. Renamed sdk-messages.ts → claude-messages.ts, updated 13+ import references
5. Created message adapter registry (message-adapters.ts) with per-provider routing
6. Updated Rust AgentQueryOptions with `provider` and `provider_config` fields (serde defaults for backward compat)
7. Updated agent-bridge.ts TypeScript options
8. Renamed agent-runner.ts → claude-runner.ts, rebuilt dist bundle
9. Added provider field to ProjectConfig (groups.ts)
10. Renamed ClaudeSession.svelte → AgentSession.svelte with provider awareness
11. Updated agent-dispatcher.ts with sessionProviderMap for provider-based message routing
12. Updated AgentPane.svelte with capability-driven rendering (hasProfiles, hasSkills, supportsResume gates)
13. Created provider-bridge.ts (generic adapter delegating to provider-specific bridges)
14. Registered CLAUDE_PROVIDER in App.svelte onMount
15. Updated all test mocks (dispatcher test: adaptMessage mock with provider param)
16. Verified: 202 vitest + 42 cargo tests pass
**Phase 2 — Settings UI (5 tasks):**
1. Added "Providers" section to SettingsTab with collapsible per-provider config panels
2. Each panel: enabled toggle, default model input, capabilities badge display
3. Added per-project provider dropdown in project cards (conditional on >1 provider)
4. Provider settings persisted as JSON blob via `provider_settings` settings key
5. AgentPane already capability-aware from Phase 1
**Phase 3 — Sidecar Routing (3 tasks):**
1. Refactored resolve_sidecar_command() → resolve_sidecar_for_provider(provider) — looks for `{provider}-runner.mjs`
2. query() validates provider runner exists before sending message
3. Extracted strip_provider_env_var() — strips CLAUDE*/CODEX*/OLLAMA* env vars (whitelists CLAUDE_CODE_EXPERIMENTAL_*)
**Status:** All 3 phases complete. 202 vitest + 42 cargo tests pass. Zero regression.
### 2026-03-11 — Provider Runners (Codex + Ollama)
**Duration:** ~45 min
**What happened:**
**Research:**
1. Researched OpenAI Codex CLI programmatic interface (SDK, NDJSON stream format, thread events, sandbox/approval modes, session resume)
2. Researched Ollama REST API (/api/chat, NDJSON streaming, tool calling, token counts, health check)
**Codex Provider (3 files):**
1. Created providers/codex.ts — ProviderMeta (gpt-5.4 default, hasSandbox=true, supportsResume=true, no profiles/skills/cost)
2. Created adapters/codex-messages.ts — adaptCodexMessage() maps ThreadEvents to AgentMessage[] (agent_message→text, reasoning→thinking, command_execution→Bash tool pair, file_change→Write/Edit/Bash per change, mcp_tool_call→server:tool, web_search→WebSearch, turn.completed→cost with tokens)
3. Created sidecar/codex-runner.ts — @openai/codex-sdk wrapper (dynamic import, graceful failure, sandbox/approval mapping, CODEX_API_KEY auth, session resume via thread ID)
**Ollama Provider (3 files):**
1. Created providers/ollama.ts — ProviderMeta (qwen3:8b default, hasModelSelection only, all other capabilities false)
2. Created adapters/ollama-messages.ts — adaptOllamaMessage() maps synthesized chunk events (text, thinking from Qwen3, done→cost with eval_duration/token counts, always $0)
3. Created sidecar/ollama-runner.ts — Direct HTTP to localhost:11434/api/chat (zero deps, health check, NDJSON stream parsing, configurable host/model/num_ctx/think)
**Registration + Build:**
1. Registered CODEX_PROVIDER + OLLAMA_PROVIDER in App.svelte onMount
2. Registered adaptCodexMessage + adaptOllamaMessage in message-adapters.ts
3. Updated build:sidecar script to build all 3 runners via esbuild
**Tests:**
- 19 new tests for codex-messages.ts (all event types)
- 11 new tests for ollama-messages.ts (all event types)
- 256 vitest + 42 cargo tests pass. Zero regression.
**Status:** Provider runners complete. Both providers infrastructure-ready (will work when CLI/server installed).


@ -0,0 +1,134 @@
# Agent Provider Adapter — Task Plan
## Goal
Multi-provider agent support (Claude Code, Codex CLI, Ollama) via adapter pattern. Claude Code remains primary and fully functional. Zero regression.
## Architecture Decisions
| # | Date | Decision | Rationale |
|---|------|----------|-----------|
| PA-1 | 2026-03-11 | Per-provider sidecar binaries (not single multi-SDK bundle) | Independent testing, no bloat, clean separation. SidecarCommand already abstracts binary path. |
| PA-2 | 2026-03-11 | Generic provider_config blob in AgentQueryOptions (not discriminated union) | Rust passes through without parsing. TypeScript uses discriminated unions for compile-time safety. Minimal Rust changes. |
| PA-3 | 2026-03-11 | Per-provider message adapter files → common AgentMessage type | sdk-messages.ts becomes claude-messages.ts. Registry selects parser by provider. Store/UI unchanged. |
| PA-4 | 2026-03-11 | Provider selection per-project with global default | ProjectConfig.provider field (default: 'claude'). Matches real workflow. |
| PA-5 | 2026-03-11 | Capability flags drive UI rendering (not provider ID checks) | ProviderCapabilities interface. AgentPane checks hasProfiles/hasSkills/etc. No hardcoded if(provider==='claude'). |
| PA-6 | 2026-03-11 | Providers section in SettingsTab scroll (not inner tabs) | Current sections aren't long enough for tabs. Collapsible per-provider config panels. |
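A minimal sketch of what PA-5 could look like in TypeScript. The capability names (`hasProfiles`, `hasSkills`, `hasModelSelection`, `hasSandbox`, `supportsResume`) come from the progress notes; the exact interface shape, the Codex `hasModelSelection` value, and the `visibleControls` helper are assumptions made for illustration, not the project's actual code.

```typescript
type ProviderId = 'claude' | 'codex' | 'ollama';

// Assumed shape: only the flag names are taken from the docs.
interface ProviderCapabilities {
  hasProfiles: boolean;
  hasSkills: boolean;
  hasModelSelection: boolean;
  hasSandbox: boolean;
  supportsResume: boolean;
}

const CAPABILITIES: Record<ProviderId, ProviderCapabilities> = {
  // Claude starts with every capability enabled (regression mitigation).
  claude: { hasProfiles: true, hasSkills: true, hasModelSelection: true, hasSandbox: true, supportsResume: true },
  // hasModelSelection for Codex is an assumption; the notes only state the others.
  codex: { hasProfiles: false, hasSkills: false, hasModelSelection: true, hasSandbox: true, supportsResume: true },
  // Ollama: model selection only.
  ollama: { hasProfiles: false, hasSkills: false, hasModelSelection: true, hasSandbox: false, supportsResume: false },
};

// Hypothetical helper: the UI asks "what can this provider do?",
// never "is this provider Claude?".
function visibleControls(caps: ProviderCapabilities): string[] {
  const controls: string[] = [];
  if (caps.hasModelSelection) controls.push('model');
  if (caps.hasProfiles) controls.push('profiles');
  if (caps.hasSkills) controls.push('skills');
  if (caps.hasSandbox) controls.push('sandbox');
  if (caps.supportsResume) controls.push('resume');
  return controls;
}
```

Adding a fourth provider then means adding one capability record; no rendering code needs a new branch.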
## Phases
### Phase 1: Core Abstraction Layer (no functional change)
**Goal:** Insert abstraction boundary. Claude remains the only registered provider. Zero user-visible change.
| # | Task | Files | Status |
|---|------|-------|--------|
| 1.1 | Create provider types | NEW: `src/lib/providers/types.ts` | done |
| 1.2 | Create provider registry | NEW: `src/lib/providers/registry.svelte.ts` | done |
| 1.3 | Create Claude provider meta | NEW: `src/lib/providers/claude.ts` | done |
| 1.4 | Rename sdk-messages.ts → claude-messages.ts | RENAME + update imports | done |
| 1.5 | Create message adapter registry | NEW: `src/lib/adapters/message-adapters.ts` | done |
| 1.6 | Update Rust AgentQueryOptions | MOD: `bterminal-core/src/sidecar.rs` | done |
| 1.7 | Update agent-bridge.ts options shape | MOD: `src/lib/adapters/agent-bridge.ts` | done |
| 1.8 | Rename agent-runner.ts → claude-runner.ts | RENAME + update build script | done |
| 1.9 | Add provider field to ProjectConfig | MOD: `src/lib/types/groups.ts` | done |
| 1.10 | Rename ClaudeSession.svelte → AgentSession.svelte | RENAME + update imports | done |
| 1.11 | Update agent-dispatcher provider routing | MOD: `src/lib/agent-dispatcher.ts` | done |
| 1.12 | Update AgentPane for capability-driven rendering | MOD: `src/lib/components/Agent/AgentPane.svelte` | done |
| 1.13 | Rename claude-bridge.ts → provider-bridge.ts | RENAME + genericize | done |
| 1.14 | Update Rust lib.rs commands | MOD: `src-tauri/src/lib.rs` | done |
| 1.15 | Update all tests | MOD: test files | done |
| 1.16 | Verify: 202 vitest + 42 cargo tests pass | — | done |
### Phase 2: Settings UI
| # | Task | Files | Status |
|---|------|-------|--------|
| 2.1 | Add Providers section to SettingsTab | MOD: `SettingsTab.svelte` | done |
| 2.2 | Per-provider collapsible config panels | MOD: `SettingsTab.svelte` | done |
| 2.3 | Per-project provider dropdown | MOD: `SettingsTab.svelte` | done |
| 2.4 | Persist provider settings | MOD: `settings-bridge.ts` | done |
| 2.5 | Provider-aware AgentPane | MOD: `AgentPane.svelte` | done |
### Phase 3: Sidecar Routing
| # | Task | Files | Status |
|---|------|-------|--------|
| 3.1 | SidecarManager provider-based runner selection | MOD: `bterminal-core/src/sidecar.rs` | done |
| 3.2 | Per-provider runner discovery | MOD: `bterminal-core/src/sidecar.rs` | done |
| 3.3 | Provider-specific env var stripping | MOD: `bterminal-core/src/sidecar.rs` | done |
## Type System
### ProviderQueryOptions (TypeScript → Rust → Sidecar)
```
Frontend (typed):
  AgentQueryOptions {
    provider: ProviderId  // 'claude' | 'codex' | 'ollama'
    session_id: string
    prompt: string
    model?: string
    max_turns?: number
    provider_config: Record<string, unknown>  // provider-specific
  }
    ↓ (Tauri invoke)
Rust (generic):
  AgentQueryOptions {
    provider: String
    session_id: String
    prompt: String
    model: Option<String>
    max_turns: Option<u32>
    provider_config: serde_json::Value
  }
    ↓ (stdin NDJSON)
Sidecar (provider-specific):
  claude-runner.ts parses provider_config as ClaudeProviderConfig
  codex-runner.ts parses provider_config as CodexProviderConfig
  ollama-runner.ts parses provider_config as OllamaProviderConfig
```
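The split between a typed frontend and a generic Rust pass-through (PA-2) can be sketched with a discriminated union on `provider`. The per-provider config fields below (`claudeConfigDir`, `sandboxMode`, `host`, `numCtx`) and the `toWire` helper are hypothetical; only the outer `AgentQueryOptions` fields are taken from the diagram above.

```typescript
// Hypothetical per-provider config shapes (field names are illustrative).
type ClaudeProviderConfig = { claudeConfigDir?: string };
type CodexProviderConfig = { sandboxMode?: 'full-auto' | 'suggest' };
type OllamaProviderConfig = { host?: string; numCtx?: number };

type BaseOptions = {
  session_id: string;
  prompt: string;
  model?: string;
  max_turns?: number;
};

// The `provider` tag selects which config shape the compiler accepts.
type AgentQueryOptions = BaseOptions & (
  | { provider: 'claude'; provider_config: ClaudeProviderConfig }
  | { provider: 'codex'; provider_config: CodexProviderConfig }
  | { provider: 'ollama'; provider_config: OllamaProviderConfig }
);

// What crosses the Tauri boundary: Rust sees a string and an opaque blob.
type WireQueryOptions = BaseOptions & {
  provider: string;
  provider_config: Record<string, unknown>;
};

// Hypothetical helper that erases the typed config before invoking Rust.
function toWire(options: AgentQueryOptions): WireQueryOptions {
  return { ...options, provider_config: { ...options.provider_config } };
}
```

With this shape, passing an Ollama-only field together with `provider: 'claude'` is rejected at compile time, while the Rust side never has to parse the blob.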
### Message Flow (Sidecar → Frontend)
```
Sidecar stdout (NDJSON, provider-specific format)
    ↓
Rust SidecarManager (pass-through, adds sessionId)
    ↓
agent-dispatcher.ts
  → message-adapters.ts registry
    → claude-messages.ts (if provider=claude)
    → codex-messages.ts (if provider=codex, future)
    → ollama-messages.ts (if provider=ollama, future)
  → AgentMessage (common type)
    ↓
agents.svelte.ts store (unchanged)
    ↓
AgentPane.svelte (renders AgentMessage, capability-driven)
```
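The registry step in this flow could be sketched as follows. `adaptMessage` with a provider parameter is mentioned in the progress notes; `registerMessageAdapter`, the `MessageAdapter` type, the trimmed-down `AgentMessage`, and the fallback error message are assumptions for illustration.

```typescript
type ProviderId = 'claude' | 'codex' | 'ollama';

// Reduced AgentMessage for the sketch; the real type has more fields.
interface AgentMessage {
  id: string;
  type: string;
  content: unknown;
  timestamp: number;
}

type MessageAdapter = (raw: Record<string, unknown>) => AgentMessage[];

const adapters = new Map<ProviderId, MessageAdapter>();

// Hypothetical registration hook, called once per provider at startup.
function registerMessageAdapter(provider: ProviderId, adapter: MessageAdapter): void {
  adapters.set(provider, adapter);
}

// Route a raw sidecar message to the adapter registered for its provider.
function adaptMessage(provider: ProviderId, raw: Record<string, unknown>): AgentMessage[] {
  const adapter = adapters.get(provider);
  if (!adapter) {
    // Surface a missing adapter as an error message instead of throwing,
    // so one misconfigured provider cannot break the dispatcher.
    const now = Date.now();
    return [{ id: `err-${now}`, type: 'error', content: `No adapter for provider "${provider}"`, timestamp: now }];
  }
  return adapter(raw);
}
```

The store and UI only ever see the return value, which is why they can stay unchanged when a provider is added.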
## File Inventory
### New Files (Phase 1)
- `src/lib/providers/types.ts`
- `src/lib/providers/registry.svelte.ts`
- `src/lib/providers/claude.ts`
- `src/lib/adapters/message-adapters.ts`
### Renamed Files (Phase 1)
- `sdk-messages.ts` → `claude-messages.ts`
- `agent-runner.ts` → `claude-runner.ts`
- `ClaudeSession.svelte` → `AgentSession.svelte`
- `claude-bridge.ts` → `provider-bridge.ts` (genericized)
### Modified Files (Phase 1)
- `bterminal-core/src/sidecar.rs` — AgentQueryOptions struct
- `src-tauri/src/lib.rs` — command handlers
- `src/lib/adapters/agent-bridge.ts` — options interface
- `src/lib/agent-dispatcher.ts` — provider routing
- `src/lib/components/Agent/AgentPane.svelte` — capability checks
- `src/lib/components/Workspace/ProjectBox.svelte` — import rename
- `src/lib/types/groups.ts` — ProjectConfig.provider field
- `package.json` — build:sidecar script path
- Test files — import path updates

110
docs/release-notes.md Normal file

@ -0,0 +1,110 @@
# v3.0 Release Notes
## Mission Control — Multi-Project AI Agent Orchestration
BTerminal v3.0 is a ground-up redesign of the terminal interface, built for managing multiple AI agent sessions across multiple projects simultaneously. The Mission Control dashboard replaces the single-pane terminal with a full orchestration workspace.
### What's New
**Mission Control Dashboard**
- VSCode-style layout: icon sidebar + expandable settings drawer + project grid + status bar
- Per-project boxes with 11 tab types (Model, Docs, Context, Files, SSH, Memory, Metrics, Tasks, Architecture, Selenium, Tests)
- Command palette (Ctrl+K) with 18+ commands across 6 categories
- Keyboard-first navigation: Alt+1-5 project jump, Ctrl+H/L vi-nav, Ctrl+Shift+1-9 tab switch
- 17 themes in 3 groups (Catppuccin, Editor, Deep Dark)
**Multi-Agent Orchestration**
- 4 Tier 1 management roles: Manager, Architect, Tester, Reviewer
- btmsg: inter-agent messaging (DMs, channels, contacts ACL, heartbeats, dead letter queue)
- bttask: kanban task board (5 columns, optimistic locking, review queue auto-notifications)
- Agent prompt generator with role-specific workflows and tool documentation
- Manager subagent delegation via Claude Agent SDK teams
- Auto-wake scheduler: 3 strategies (persistent, on-demand, smart) with 6 wake signals
**Multi-Provider Support**
- Claude Code (primary), OpenAI Codex, Ollama
- Provider-specific sidecar runners with unified message adapter layer
- Per-project provider selection with capability-gated UI
**Session Continuity**
- SQLite persistence for agent sessions, messages, and cost tracking
- Session anchors: preserve important turns through context compaction
- Auto-anchoring on first compaction (observation-masked, reasoning preserved)
- Configurable anchor budget (2K–20K tokens)
**Dashboard Metrics**
- Real-time fleet overview: running/idle/stalled counts, burn rate ($/hr)
- Per-project health: activity state, context pressure, file conflicts, attention scoring
- Historical sparklines for cost, tokens, turns, tools, and duration
- Attention queue with priority-scored cards (click to focus)
**File Management**
- VSCode-style directory tree with CodeMirror 6 editor (15 language modes)
- PDF viewer (pdfjs-dist, multi-page, zoom 0.5x–3x)
- CSV table viewer (RFC 4180, delimiter auto-detect, sortable columns)
- Filesystem watcher for external write conflict detection
**Terminal**
- xterm.js with Canvas addon (WebKit2GTK compatible)
- Agent preview pane (read-only view of agent activity)
- SSH session management (native PTY, no library required)
- Worktree isolation per project (optional)
### Production Readiness
**Reliability**
- Sidecar crash recovery: auto-restart with exponential backoff (1s–30s, 5 retries)
- WAL checkpoint: periodic TRUNCATE every 5 minutes (sessions.db + btmsg.db)
- Error classification: 6 types with actionable messages and retry logic
- Optimistic locking for concurrent task board updates
**Security**
- Landlock sandbox: kernel 6.2+ filesystem restriction for sidecar processes
- Plugin sandbox: 13 shadowed globals, strict mode, frozen API, permission-gated
- Secrets management: system keyring (libsecret), no plaintext fallback
- TLS support for bterminal-relay (optional `--tls-cert`/`--tls-key`)
- Sidecar environment stripping: dual-layer (Rust + JS) credential isolation
- Audit logging: agent events, task changes, wake events, prompt injections
**Observability**
- OpenTelemetry: tracing + OTLP export to Tempo (optional)
- FTS5 full-text search across messages, tasks, and agent comms
- Agent health monitoring: heartbeats, stale detection, dead letter queue
- Desktop + in-app notifications with history
### Multi-Machine (Early Access)
bterminal-relay enables running agent sessions across multiple Linux machines via WebSocket. TLS encryption is supported. This feature is architecturally complete but not yet surfaced in the v3 UI — available for advanced users via the relay binary and bridges.
**v3.1 roadmap:** Certificate pinning, UI integration, real-world multi-machine testing.
### Test Coverage
| Suite | Tests | Status |
|-------|-------|--------|
| Vitest (frontend) | 444 | Pass |
| Cargo (backend) | 151 | Pass |
| E2E (WebDriverIO) | 109 | Pass |
| **Total** | **704** | **All passing** |
### Breaking Changes from v2
- Layout system replaced by workspace store (project groups)
- Configuration moved from sessions.json to groups.json
- App.svelte rewritten (VSCode-style sidebar replaces TilingGrid)
- Settings moved from modal dialog to sidebar drawer tab
### Requirements
- Linux x86_64
- Kernel 6.2+ recommended (for Landlock sandbox enforcement)
- libsecret / DBUS session (for secrets management)
- Node.js 20+ and Rust 1.77+ (build from source)
- Claude CLI installed (`~/.local/bin/claude` or system path)
### Known Limitations
- Maximum 4 active xterm.js instances (WebKit2GTK memory constraint)
- Plugin sandbox uses `new Function()` — best-effort, not a security boundary
- Multi-machine UI not yet integrated into Mission Control
- Agent Teams delegation requires complex prompts to trigger reliably

235
docs/sidecar.md Normal file

@ -0,0 +1,235 @@
# Sidecar Architecture
The sidecar is the bridge between Agent Orchestrator's Rust backend and AI provider APIs. The Claude Agent SDK and OpenAI Codex SDK are JavaScript/TypeScript libraries, so they cannot run inside Rust or WebKit2GTK's webview; the Ollama runner only needs HTTP access to Ollama's REST API but is built and launched the same way. The Rust backend therefore spawns child processes (sidecars) that handle AI interactions and communicate back via NDJSON over stdio.
---
## Overview
```
Rust Backend (SidecarManager)
├── Spawns child process (Deno preferred, Node.js fallback)
├── Writes QueryMessage to stdin (NDJSON)
├── Reads response lines from stdout (NDJSON)
├── Emits Tauri events for each message
└── Manages lifecycle (start, stop, crash recovery)
Sidecar Process (one of):
├── claude-runner.mjs → @anthropic-ai/claude-agent-sdk
├── codex-runner.mjs → @openai/codex-sdk
└── ollama-runner.mjs → native fetch to localhost:11434
```
---
## Provider Runners
Each provider has its own runner file in `sidecar/`, compiled to a standalone ESM bundle in `sidecar/dist/` by esbuild. The runners are self-contained — all dependencies (including SDKs) are bundled into the `.mjs` file.
### Claude Runner (`claude-runner.ts` → `claude-runner.mjs`)
The primary runner. Uses `@anthropic-ai/claude-agent-sdk` query() function.
**Startup sequence:**
1. Reads NDJSON messages from stdin in a loop
2. On `query` message: resolves Claude CLI path via `findClaudeCli()`
3. Calls SDK `query()` with options: prompt, cwd, permissionMode, model, settingSources, systemPrompt, additionalDirectories, worktreeName, pathToClaudeCodeExecutable
4. Streams SDK messages as NDJSON to stdout
5. On `stop` message: calls AbortController.abort()
**Claude CLI detection (`findClaudeCli()`):**
Checks paths in order: `~/.local/bin/claude` → `~/.claude/local/claude` → `/usr/local/bin/claude` → `/usr/bin/claude` → `which claude`. If none is found, emits `agent_error` immediately. The path is resolved once at sidecar startup and reused for all sessions.
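The lookup order can be sketched as a pure function. This is an illustration, not the runner's actual `findClaudeCli()`: the `home`, `exists`, and `which` parameters are assumptions introduced so the sketch has no runtime-specific imports (a real runner would presumably check the filesystem and shell out to `which`).

```typescript
// Fixed candidates in the documented order; `home` replaces `~`.
function claudeCliCandidates(home: string): string[] {
  return [
    `${home}/.local/bin/claude`,
    `${home}/.claude/local/claude`,
    '/usr/local/bin/claude',
    '/usr/bin/claude',
  ];
}

// Returns the first existing candidate, then falls back to `which claude`.
// `exists` and `which` are injected (hypothetical) so the logic is testable.
function findClaudeCli(
  home: string,
  exists: (path: string) => boolean,
  which: () => string | null = () => null,
): string | null {
  for (const candidate of claudeCliCandidates(home)) {
    if (exists(candidate)) return candidate;
  }
  // null here corresponds to the runner emitting `agent_error`.
  return which();
}
```

Caching the returned path once at startup matches the "resolved once and reused" behaviour described above.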
**Session resume:** Passes `resume: sessionId` to the SDK when a resume session ID is provided. The SDK handles transcript loading internally.
**Multi-account support:** When `claudeConfigDir` is provided (from profile selection), it is set as `CLAUDE_CONFIG_DIR` in the SDK's env option. This points the Claude CLI at a different configuration directory.
**Worktree isolation:** When `worktreeName` is provided, it is passed as `extraArgs: { worktree: name }` to the SDK, which translates to `--worktree <name>` on the CLI.
### Codex Runner (`codex-runner.ts` → `codex-runner.mjs`)
Uses `@openai/codex-sdk` via dynamic import (graceful failure if not installed).
**Key differences from Claude:**
- Authentication via `CODEX_API_KEY` environment variable
- Sandbox mode mapping: `bypassPermissions` → `full-auto`, `default` → `suggest`
- Session resume via thread ID (Codex's equivalent of session continuity)
- No profile/skill support
- ThreadEvent format differs from Claude's stream-json (parsed by `codex-messages.ts`)
### Ollama Runner (`ollama-runner.ts` → `ollama-runner.mjs`)
Direct HTTP to Ollama's REST API — zero external dependencies.
**Key differences:**
- No SDK — uses native `fetch()` to `http://localhost:11434/api/chat`
- Health check on startup (`GET /api/tags`)
- NDJSON streaming response from Ollama's `/api/chat` endpoint
- Supports Qwen3's `<think>` tags for reasoning display
- Configurable: host, model, num_ctx, temperature
- Cost is always $0 (local inference)
- No subagent support, no profiles, no skills
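Parsing the NDJSON stream is the core of a runner like this, because network chunks do not align with line boundaries. The sketch below is an assumption about how that could be done, not the runner's code: `NdjsonBuffer` and `collectText` are hypothetical names, and `OllamaChunk` lists only a few fields of Ollama's streamed `/api/chat` responses.

```typescript
// Subset of the fields Ollama streams from /api/chat.
interface OllamaChunk {
  message?: { role: string; content: string };
  done: boolean;
  prompt_eval_count?: number; // present on the final chunk
  eval_count?: number;        // present on the final chunk
}

// Accumulates raw text and returns only complete, parsed lines.
class NdjsonBuffer {
  private pending = '';

  push(chunk: string): OllamaChunk[] {
    this.pending += chunk;
    const lines = this.pending.split('\n');
    // The last element is either '' or an incomplete line: keep it for later.
    this.pending = lines.pop() ?? '';
    return lines
      .filter((line) => line.trim() !== '')
      .map((line) => JSON.parse(line) as OllamaChunk);
  }
}

// Concatenate the assistant text carried by a batch of chunks.
function collectText(chunks: OllamaChunk[]): string {
  return chunks.map((c) => c.message?.content ?? '').join('');
}
```

A runner would feed each decoded response chunk into `push()` and forward the parsed objects to stdout; the final chunk (`done: true`) carries the token counts used for the cost message.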
---
## Communication Protocol
### Messages from Rust to Sidecar (stdin)
```typescript
// Query — start a new agent session
{
  "type": "query",
  "session_id": "uuid",
  "prompt": "Fix the bug in auth.ts",
  "cwd": "/home/user/project",
  "provider": "claude",
  "model": "claude-sonnet-4-6",
  "permission_mode": "bypassPermissions",
  "resume_session_id": "previous-uuid",                    // optional
  "system_prompt": "You are an architect...",              // optional
  "claude_config_dir": "~/.config/switcher-claude/work/",  // optional
  "setting_sources": ["user", "project"],                  // optional
  "additional_directories": ["/shared/lib"],               // optional
  "worktree_name": "session-123",                          // optional
  "provider_config": { ... },                              // provider-specific blob
  "extra_env": { "BTMSG_AGENT_ID": "manager-1" }           // optional
}

// Stop — abort a running session
{
  "type": "stop",
  "session_id": "uuid"
}
```
### Messages from Sidecar to Rust (stdout)
The sidecar writes one JSON object per line (NDJSON). The format depends on the provider, but all messages include a `sessionId` field added by the Rust SidecarManager before forwarding as Tauri events.
**Claude messages** follow the same format as the Claude CLI's `--output-format stream-json`:
```typescript
// System init (carries session ID, model info)
{ "type": "system", "subtype": "init", "session_id": "...", "model": "..." }
// Assistant text
{ "type": "assistant", "message": { "content": [{ "type": "text", "text": "..." }] } }
// Tool use
{ "type": "assistant", "message": { "content": [{ "type": "tool_use", "name": "Read", "input": {...} }] } }
// Tool result
{ "type": "user", "message": { "content": [{ "type": "tool_result", "content": "..." }] } }
// Final result
{ "type": "result", "subtype": "success", "cost_usd": 0.05, "duration_ms": 12000, ... }
// Error
{ "type": "agent_error", "error": "Claude CLI not found" }
```
---
## Environment Variable Stripping
When Agent Orchestrator is launched from within a Claude Code terminal session, the parent process sets `CLAUDE*` environment variables for nesting detection and sandbox configuration. If these leak to the sidecar, Claude's SDK detects nesting and either errors or behaves unexpectedly.
The solution is **dual-layer stripping**:
1. **Rust layer (primary):** `SidecarManager` calls `env_clear()` on the child process command, then explicitly sets only the variables needed (`PATH`, `HOME`, `USER`, etc.). This prevents any parent environment from leaking.
2. **JavaScript layer (defense-in-depth):** Each runner also strips provider-specific variables via `strip_provider_env_var()`:
   - Claude: strips all `CLAUDE*` keys (whitelists `CLAUDE_CODE_EXPERIMENTAL_*`)
   - Codex: strips all `CODEX*` keys
   - Ollama: strips all `OLLAMA*` keys (except `OLLAMA_HOST`)
The `extra_env` field in AgentQueryOptions allows injecting specific variables (like `BTMSG_AGENT_ID` for Tier 1 agents) after stripping.
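A rough sketch of the prefix-based stripping, assuming the rules listed above. For brevity it applies all three provider prefixes in one pass rather than per runner, and the function name `stripProviderEnv` and its signature are hypothetical.

```typescript
const STRIP_PREFIXES = ['CLAUDE', 'CODEX', 'OLLAMA'];
const KEEP_PREFIXES = ['CLAUDE_CODE_EXPERIMENTAL_']; // whitelisted prefix
const KEEP_EXACT = ['OLLAMA_HOST'];                  // whitelisted key

// Returns a new environment with provider variables removed and
// `extraEnv` applied afterwards, so injected values always survive.
function stripProviderEnv(
  env: Record<string, string>,
  extraEnv: Record<string, string> = {},
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const key of Object.keys(env)) {
    const stripped = STRIP_PREFIXES.some((p) => key.startsWith(p));
    const kept =
      KEEP_EXACT.indexOf(key) !== -1 ||
      KEEP_PREFIXES.some((p) => key.startsWith(p));
    if (!stripped || kept) out[key] = env[key];
  }
  return { ...out, ...extraEnv };
}
```

Applying `extraEnv` last mirrors the "after stripping" ordering: a variable such as `BTMSG_AGENT_ID` is added even though the parent environment never contained it.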
---
## Sidecar Lifecycle
### Startup
The SidecarManager is initialized during Tauri app setup. It does not spawn any sidecar processes at startup — processes are spawned on-demand when the first agent query arrives.
### Runtime Resolution
When a query arrives, `resolve_sidecar_for_provider(provider)` finds the appropriate runner:
1. Looks for `{provider}-runner.mjs` in the sidecar dist directory
2. Checks for Deno first (`deno` or `~/.deno/bin/deno`), then Node.js
3. Returns a `SidecarCommand` struct with the runtime binary and script path
4. If neither runtime is found, returns an error
Deno is preferred because it has a faster cold-start time (~50ms vs ~150ms for Node.js) and can compile to a single binary for distribution.
### Crash Recovery (SidecarSupervisor)
The `SidecarSupervisor` in `bterminal-core/src/supervisor.rs` provides automatic crash recovery:
- Monitors the sidecar child process for unexpected exits
- On crash: waits with exponential backoff (1s → 2s → 4s → 8s → 16s → 30s cap)
- Maximum 5 restart attempts before giving up
- Reports health via `SidecarHealth` enum: `Healthy`, `Restarting { attempt, next_retry }`, `Failed { attempts, last_error }`
- 17 unit tests covering all recovery scenarios
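The backoff schedule can be expressed as a small function. The supervisor itself is Rust; this TypeScript sketch only illustrates the arithmetic, and the constant names, the 1-based attempt numbering, and the `null` return for "give up" are assumptions.

```typescript
const BASE_DELAY_MS = 1000;  // first retry after 1s
const MAX_DELAY_MS = 30000;  // cap from the docs
const MAX_ATTEMPTS = 5;      // restart attempts before giving up

// Delay before restart attempt `attempt` (1-based), or null once the
// supervisor should stop retrying and report a failed state.
function backoffDelayMs(attempt: number): number | null {
  if (attempt < 1 || attempt > MAX_ATTEMPTS) return null;
  // Doubles each attempt: 1s, 2s, 4s, 8s, 16s, never above the cap.
  return Math.min(BASE_DELAY_MS * 2 ** (attempt - 1), MAX_DELAY_MS);
}
```

Under these assumptions the five attempts wait 1s, 2s, 4s, 8s, and 16s; the 30s cap only comes into play if the attempt limit is raised or attempts are counted differently than assumed here.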
### Shutdown
On app exit, `SidecarManager` sends stop messages to all active sessions and kills remaining child processes. The `Drop` implementation ensures cleanup even on panic.
---
## Build Pipeline
```bash
# Build all 3 runner bundles
npm run build:sidecar
# Internally runs esbuild 3 times:
#   sidecar/claude-runner.ts → sidecar/dist/claude-runner.mjs
#   sidecar/codex-runner.ts  → sidecar/dist/codex-runner.mjs
#   sidecar/ollama-runner.ts → sidecar/dist/ollama-runner.mjs
```
Each bundle is a standalone ESM file with all dependencies included. The Claude runner bundles `@anthropic-ai/claude-agent-sdk` directly — no `node_modules` needed at runtime. The Codex runner uses dynamic import for `@openai/codex-sdk` (graceful failure if not installed). The Ollama runner has zero external dependencies.
The built `.mjs` files are included as Tauri resources in `tauri.conf.json` and copied to the app bundle during `tauri build`.
---
## Message Adapter Layer
On the frontend, raw sidecar messages pass through a provider-specific adapter before reaching the agent store:
```
Sidecar stdout → Rust SidecarManager → Tauri event
  → agent-dispatcher.ts
    → message-adapters.ts (registry)
      → claude-messages.ts / codex-messages.ts / ollama-messages.ts
        → AgentMessage[] (common type)
          → agents.svelte.ts store
```
The `AgentMessage` type is provider-agnostic:
```typescript
interface AgentMessage {
  id: string;
  type: 'text' | 'tool_call' | 'tool_result' | 'thinking' | 'init'
      | 'status' | 'cost' | 'error' | 'hook';
  parentId?: string;   // for subagent tracking
  content: unknown;    // type-specific payload
  timestamp: number;
}
```
This means the agent store and AgentPane rendering code never need to know which provider generated a message. The adapter layer is the only code that understands provider-specific formats.
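To make the adapter's job concrete, here is a simplified sketch that maps the Claude stream-json shapes shown in the protocol section onto `AgentMessage[]`. It is not the project's `claude-messages.ts`: the `RawClaudeMessage` and `ContentBlock` types, the ID scheme, and the payload shapes under `content` are assumptions, and several message types (`init`, `thinking`, `hook`) are left out.

```typescript
interface AgentMessage {
  id: string;
  type: 'text' | 'tool_call' | 'tool_result' | 'thinking' | 'init'
      | 'status' | 'cost' | 'error' | 'hook';
  parentId?: string;
  content: unknown;
  timestamp: number;
}

// Assumed minimal view of a raw Claude message.
interface ContentBlock { type: string; text?: string; name?: string; input?: unknown; content?: unknown }
interface RawClaudeMessage {
  type: string;
  message?: { content?: ContentBlock[] };
  error?: string;
  cost_usd?: number;
}

let counter = 0;
const nextId = (): string => `msg-${++counter}`; // illustrative ID scheme

function adaptClaudeMessage(raw: RawClaudeMessage, now: number = Date.now()): AgentMessage[] {
  if (raw.type === 'agent_error') {
    return [{ id: nextId(), type: 'error', content: raw.error ?? 'unknown error', timestamp: now }];
  }
  if (raw.type === 'result') {
    return [{ id: nextId(), type: 'cost', content: { costUsd: raw.cost_usd ?? 0 }, timestamp: now }];
  }
  // One raw assistant/user message may carry several content blocks.
  const out: AgentMessage[] = [];
  for (const block of raw.message?.content ?? []) {
    if (block.type === 'text') {
      out.push({ id: nextId(), type: 'text', content: block.text ?? '', timestamp: now });
    } else if (block.type === 'tool_use') {
      out.push({ id: nextId(), type: 'tool_call', content: { name: block.name, input: block.input }, timestamp: now });
    } else if (block.type === 'tool_result') {
      out.push({ id: nextId(), type: 'tool_result', content: block.content, timestamp: now });
    }
  }
  return out;
}
```

Returning an array rather than a single message is what lets one raw line fan out into several UI entries, for example a text block followed by a tool call.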
### Test Coverage
- `claude-messages.test.ts` — 25 tests covering all Claude message types
- `codex-messages.test.ts` — 19 tests covering all Codex ThreadEvent types
- `ollama-messages.test.ts` — 11 tests covering all Ollama chunk types

BIN
screenshot.png Normal file

Binary file not shown.
