diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 419bd62..3550cb9 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -5,7 +5,7 @@ - v1 is a single-file Python app (`bterminal.py`). Changes are localized. - v2 docs are in `docs/`. Architecture in `docs/architecture.md`. - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery. -- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, SPKI pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35), SidecarManager actor pattern, per-message btmsg acknowledgment, Aider autonomous mode. 516 vitest + 159 cargo + 109 E2E. +- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (Web Worker sandbox, permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, SPKI pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (26), SidecarManager actor pattern, per-message btmsg acknowledgment, Aider autonomous mode. 507 vitest + 110 cargo + 109 E2E. - Consult Memora (tag: `bterminal`) before making architectural changes. ## Documentation References diff --git a/CHANGELOG.md b/CHANGELOG.md index 4702791..320584d 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -8,12 +8,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] ### Added +- **Plugin sandbox Web Worker migration** — replaced `new Function()` sandbox with dedicated Web Worker per plugin. True process-level isolation — no DOM, no Tauri IPC, no main-thread access. Permission-gated API proxied via postMessage with RPC pattern. 26 tests (MockWorker class in vitest) +- **seen_messages startup pruning** — `pruneSeen()` called fire-and-forget in App.svelte onMount. Removes entries older than 7 days (emergency: 3 days at 200k rows) - **Aider autonomous mode toggle** — per-project `autonomousMode` setting ('restricted'|'autonomous') gates shell command execution in Aider sidecar. Default restricted. SettingsTab toggle - **SPKI certificate pinning (TOFU)** — `remote_probe_spki` Tauri command + `probe_spki_hash()` extracts relay TLS certificate SPKI hash. `remote_add_pin`/`remote_remove_pin` commands. In-memory pin store in RemoteManager - **Per-message btmsg acknowledgment** — `seen_messages` table with session-scoped tracking replaces count-based polling. `btmsg_unseen_messages`, `btmsg_mark_seen`, `btmsg_prune_seen` commands. ON DELETE CASCADE cleanup - **Aider parser test suite** — 72 vitest tests for extracted `aider-parser.ts` (pure parsing functions). 8 realistic Aider output fixtures. Covers prompt detection, suppression, turn parsing, cost extraction, shell execution, format-drift canaries - **Dead code wiring** — 4 orphaned Rust functions wired as Tauri commands: `btmsg_get_agent_heartbeats`, `btmsg_queue_dead_letter`, `search_index_task`, `search_index_btmsg` +### Security +- **Plugin sandbox hardened** — `new Function()` (same-realm, escapable via prototype walking) replaced with Web Worker (separate JS context, no escape vectors). Eliminates `arguments.callee.constructor` and `({}).constructor.constructor` attack paths + ### Changed - **SidecarManager actor refactor** — replaced `Arc>` with dedicated actor thread via `std::sync::mpsc` channel. Eliminates TOCTOU race conditions on session lifecycle. All mutable state owned by single thread - **Aider parser extraction** — pure functions (`looksLikePrompt`, `parseTurnOutput`, `extractSessionCost`, etc.) extracted from `aider-runner.ts` to `aider-parser.ts` for testability. Runner imports from parser module diff --git a/CLAUDE.md b/CLAUDE.md index 73e4974..6efbd95 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,7 +2,7 @@ ## Project Overview -Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (sandboxed, 35 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, SPKI certificate pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, SidecarManager actor pattern (mpsc), per-message btmsg acknowledgment (seen_messages), Aider autonomous mode toggle. +Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (Web Worker sandbox, 26 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, SPKI certificate pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, SidecarManager actor pattern (mpsc), per-message btmsg acknowledgment (seen_messages), Aider autonomous mode toggle. - **Repository:** github.com/DexterFromLab/BTerminal - **License:** MIT @@ -119,7 +119,7 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth | `v2/src/lib/adapters/search-bridge.ts` | FTS5 search IPC wrapper (initSearch, searchAll, rebuildIndex, indexMessage) | | `v2/src/lib/adapters/secrets-bridge.ts` | Secrets IPC wrapper (storeSecret, getSecret, deleteSecret, listSecrets, hasKeyring) | | `v2/src/lib/utils/error-classifier.ts` | API error classification (6 types: rate_limit/auth/quota/overloaded/network/unknown, retry logic, 20 tests) | -| `v2/src/lib/plugins/plugin-host.ts` | Sandboxed plugin runtime (new Function(), permission-gated API, load/unload lifecycle) | +| `v2/src/lib/plugins/plugin-host.ts` | Sandboxed plugin runtime (Web Worker isolation, permission-gated API via postMessage, load/unload lifecycle) | | `v2/src/lib/components/Agent/UsageMeter.svelte` | Compact inline usage meter (color thresholds 50/75/90%, hover tooltip) | | `v2/src/lib/components/Notifications/NotificationCenter.svelte` | Bell icon + dropdown notification panel (unread badge, history, mark read/clear) | | `v2/src/lib/components/Workspace/AuditLogTab.svelte` | Manager audit log tab (filter by type+agent, 5s auto-refresh, max 200 entries) | diff --git a/TODO.md b/TODO.md index fc8d81b..f0ffd92 100644 --- a/TODO.md +++ b/TODO.md @@ -8,18 +8,16 @@ ## Multi-Agent (v3.1) - [ ] **Agent Teams real-world testing** — Subagent delegation prompt + `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env injection done. Needs real multi-agent session to verify Manager spawns child agents via SDK teams. -- [ ] **seen_messages periodic pruning** — `btmsg_prune_seen` command exists but no scheduler calls it. Wire to a periodic timer (e.g., every 6 hours) to prevent unbounded growth. - -## Security (v3.2) - -- [ ] **Plugin sandbox migration** — Current `new Function()` sandbox has escape vectors (prototype walking, `arguments.callee.constructor`). Migrate to Web Worker isolation for true process-level sandboxing. ## Reliability - [ ] **Soak test** — Run 4-hour soak with 6+ agents across 3+ projects. Monitor: memory growth, SQLite WAL size, xterm.js instance count, sidecar supervisor restarts. +- [ ] **WebKit2GTK Worker verification** — Verify Web Worker Blob URL approach works in Tauri's WebKit2GTK webview (tested in vitest only so far). ## Completed +- [x] Plugin sandbox migration — new Function() → Web Worker isolation, 26 tests | Done: 2026-03-15 +- [x] seen_messages startup pruning — pruneSeen() on app startup, fire-and-forget | Done: 2026-03-15 - [x] Tribunal priorities: Aider security, SidecarManager actor, SPKI pinning, btmsg reliability, Aider tests | Done: 2026-03-14 - [x] Dead code cleanup — 7 warnings resolved, 4 new Tauri commands wired | Done: 2026-03-14 - [x] E2E fixture + judge hardening | Done: 2026-03-12 @@ -28,9 +26,3 @@ - [x] v3 Production Readiness — all 13 tribunal items | Done: 2026-03-12 - [x] Unified test runner + testing gate rule | Done: 2026-03-12 - [x] E2E Phase B + 27 test fixes | Done: 2026-03-12 -- [x] Reviewer agent role | Done: 2026-03-12 -- [x] Auto-wake Manager scheduler | Done: 2026-03-12 -- [x] Dashboard metrics panel | Done: 2026-03-12 -- [x] Branded types (GroupId, AgentId, SessionId, ProjectId) | Done: 2026-03-11 -- [x] Regression tests + sidecar env security | Done: 2026-03-11 -- [x] Integration fix (btmsg column, camelCase, PlantUML, Tauri 2.x assets) | Done: 2026-03-11