docs: update CHANGELOG, TODO, CLAUDE.md for Worker sandbox and startup pruning

2026-03-15 02:36:55 +01:00 · 2026-03-15 02:36:55 +01:00 · 19e4a68f22
commit 19e4a68f22
parent 92000f2d6d
4 changed files with 11 additions and 14 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@ -5,7 +5,7 @@
 - v1 is a single-file Python app (`bterminal.py`). Changes are localized.
 - v2 docs are in `docs/`. Architecture in `docs/architecture.md`.
 - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, SPKI pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35), SidecarManager actor pattern, per-message btmsg acknowledgment, Aider autonomous mode. 516 vitest + 159 cargo + 109 E2E.
+- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (Web Worker sandbox, permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, SPKI pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (26), SidecarManager actor pattern, per-message btmsg acknowledgment, Aider autonomous mode. 507 vitest + 110 cargo + 109 E2E.
 - Consult Memora (tag: `bterminal`) before making architectural changes.

 ## Documentation References
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -8,12 +8,17 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]

 ### Added
+- **Plugin sandbox Web Worker migration** — replaced `new Function()` sandbox with dedicated Web Worker per plugin. True process-level isolation — no DOM, no Tauri IPC, no main-thread access. Permission-gated API proxied via postMessage with RPC pattern. 26 tests (MockWorker class in vitest)
+- **seen_messages startup pruning** — `pruneSeen()` called fire-and-forget in App.svelte onMount. Removes entries older than 7 days (emergency: 3 days at 200k rows)
 - **Aider autonomous mode toggle** — per-project `autonomousMode` setting ('restricted'|'autonomous') gates shell command execution in Aider sidecar. Default restricted. SettingsTab toggle
 - **SPKI certificate pinning (TOFU)** — `remote_probe_spki` Tauri command + `probe_spki_hash()` extracts relay TLS certificate SPKI hash. `remote_add_pin`/`remote_remove_pin` commands. In-memory pin store in RemoteManager
 - **Per-message btmsg acknowledgment** — `seen_messages` table with session-scoped tracking replaces count-based polling. `btmsg_unseen_messages`, `btmsg_mark_seen`, `btmsg_prune_seen` commands. ON DELETE CASCADE cleanup
 - **Aider parser test suite** — 72 vitest tests for extracted `aider-parser.ts` (pure parsing functions). 8 realistic Aider output fixtures. Covers prompt detection, suppression, turn parsing, cost extraction, shell execution, format-drift canaries
 - **Dead code wiring** — 4 orphaned Rust functions wired as Tauri commands: `btmsg_get_agent_heartbeats`, `btmsg_queue_dead_letter`, `search_index_task`, `search_index_btmsg`

+### Security
+- **Plugin sandbox hardened** — `new Function()` (same-realm, escapable via prototype walking) replaced with Web Worker (separate JS context, no escape vectors). Eliminates `arguments.callee.constructor` and `({}).constructor.constructor` attack paths
+
 ### Changed
 - **SidecarManager actor refactor** — replaced `Arc<Mutex<HashMap>>` with dedicated actor thread via `std::sync::mpsc` channel. Eliminates TOCTOU race conditions on session lifecycle. All mutable state owned by single thread
 - **Aider parser extraction** — pure functions (`looksLikePrompt`, `parseTurnOutput`, `extractSessionCost`, etc.) extracted from `aider-runner.ts` to `aider-parser.ts` for testability. Runner imports from parser module
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -2,7 +2,7 @@

 ## Project Overview

-Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (sandboxed, 35 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, SPKI certificate pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, SidecarManager actor pattern (mpsc), per-message btmsg acknowledgment (seen_messages), Aider autonomous mode toggle.
+Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (Web Worker sandbox, 26 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, SPKI certificate pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, SidecarManager actor pattern (mpsc), per-message btmsg acknowledgment (seen_messages), Aider autonomous mode toggle.

 - **Repository:** github.com/DexterFromLab/BTerminal
 - **License:** MIT
@ -119,7 +119,7 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth
 | `v2/src/lib/adapters/search-bridge.ts` | FTS5 search IPC wrapper (initSearch, searchAll, rebuildIndex, indexMessage) |
 | `v2/src/lib/adapters/secrets-bridge.ts` | Secrets IPC wrapper (storeSecret, getSecret, deleteSecret, listSecrets, hasKeyring) |
 | `v2/src/lib/utils/error-classifier.ts` | API error classification (6 types: rate_limit/auth/quota/overloaded/network/unknown, retry logic, 20 tests) |
-| `v2/src/lib/plugins/plugin-host.ts` | Sandboxed plugin runtime (new Function(), permission-gated API, load/unload lifecycle) |
+| `v2/src/lib/plugins/plugin-host.ts` | Sandboxed plugin runtime (Web Worker isolation, permission-gated API via postMessage, load/unload lifecycle) |
 | `v2/src/lib/components/Agent/UsageMeter.svelte` | Compact inline usage meter (color thresholds 50/75/90%, hover tooltip) |
 | `v2/src/lib/components/Notifications/NotificationCenter.svelte` | Bell icon + dropdown notification panel (unread badge, history, mark read/clear) |
 | `v2/src/lib/components/Workspace/AuditLogTab.svelte` | Manager audit log tab (filter by type+agent, 5s auto-refresh, max 200 entries) |
--- a/TODO.md
+++ b/TODO.md
@ -8,18 +8,16 @@
 ## Multi-Agent (v3.1)

 - [ ] **Agent Teams real-world testing** — Subagent delegation prompt + `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env injection done. Needs real multi-agent session to verify Manager spawns child agents via SDK teams.
- [ ] **seen_messages periodic pruning** — `btmsg_prune_seen` command exists but no scheduler calls it. Wire to a periodic timer (e.g., every 6 hours) to prevent unbounded growth.
-
-## Security (v3.2)
-
- [ ] **Plugin sandbox migration** — Current `new Function()` sandbox has escape vectors (prototype walking, `arguments.callee.constructor`). Migrate to Web Worker isolation for true process-level sandboxing.

 ## Reliability

 - [ ] **Soak test** — Run 4-hour soak with 6+ agents across 3+ projects. Monitor: memory growth, SQLite WAL size, xterm.js instance count, sidecar supervisor restarts.
+- [ ] **WebKit2GTK Worker verification** — Verify Web Worker Blob URL approach works in Tauri's WebKit2GTK webview (tested in vitest only so far).

 ## Completed

+- [x] Plugin sandbox migration — new Function() → Web Worker isolation, 26 tests | Done: 2026-03-15
+- [x] seen_messages startup pruning — pruneSeen() on app startup, fire-and-forget | Done: 2026-03-15
 - [x] Tribunal priorities: Aider security, SidecarManager actor, SPKI pinning, btmsg reliability, Aider tests | Done: 2026-03-14
 - [x] Dead code cleanup — 7 warnings resolved, 4 new Tauri commands wired | Done: 2026-03-14
 - [x] E2E fixture + judge hardening | Done: 2026-03-12
@ -28,9 +26,3 @@
 - [x] v3 Production Readiness — all 13 tribunal items | Done: 2026-03-12
 - [x] Unified test runner + testing gate rule | Done: 2026-03-12
 - [x] E2E Phase B + 27 test fixes | Done: 2026-03-12
- [x] Reviewer agent role | Done: 2026-03-12
- [x] Auto-wake Manager scheduler | Done: 2026-03-12
- [x] Dashboard metrics panel | Done: 2026-03-12
- [x] Branded types (GroupId, AgentId, SessionId, ProjectId) | Done: 2026-03-11
- [x] Regression tests + sidecar env security | Done: 2026-03-11
- [x] Integration fix (btmsg column, camelCase, PlantUML, Tauri 2.x assets) | Done: 2026-03-11