diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 49f6082..fbd4c7b 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -5,7 +5,7 @@ - v1 is a single-file Python app (`bterminal.py`). Changes are localized. - v2 docs are in `docs/`. Architecture decisions are in `docs/task_plan.md`. - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery. -- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). 409 vitest + 109 cargo + 82 E2E. +- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35). 444 vitest + 111 cargo + 82 E2E. - v3 docs: `docs/v3-task_plan.md`, `docs/v3-findings.md`, `docs/v3-progress.md`. - Consult Memora (tag: `bterminal`) before making architectural changes. diff --git a/CHANGELOG.md b/CHANGELOG.md index 8e1791a..c6e522a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -42,6 +42,10 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Security - `claude_read_skill` path traversal: added `canonicalize()` + `starts_with()` validation to prevent reading arbitrary files via crafted skill paths (commands/claude.rs) - **Sidecar env allowlist hardening** — added `ANTHROPIC_*` to Rust-level `strip_provider_env_var()` as defense-in-depth (Claude CLI uses credentials file, not env for auth). Dual-layer stripping documented: Rust layer (first checkpoint) + JS runner layer (per-provider) +- **Plugin sandbox hardening** — 13 shadowed globals in `new Function()` sandbox (window, document, fetch, globalThis, self, XMLHttpRequest, WebSocket, Function, importScripts, require, process, Deno, __TAURI__, __TAURI_INTERNALS__). `this` bound to undefined via `.call()`. 35 tests covering all shadows, permissions, and lifecycle. Known escape vectors documented in JSDoc +- **WAL checkpoint** — periodic `PRAGMA wal_checkpoint(TRUNCATE)` every 5 minutes on sessions.db + btmsg.db to prevent unbounded WAL growth under sustained multi-agent load. 2 tests +- **TLS support for bterminal-relay** — optional `--tls-cert` and `--tls-key` CLI args. Server wraps TCP streams with native-tls. Client already supports `wss://` URLs. Generic handler refactor avoids code duplication +- **Landlock fallback logging** — improved warning message with kernel version requirement (6.2+) and documented 3 enforcement states ### Fixed - **btmsg.rs column index mismatch** — `get_agents()` used `SELECT a.*` with positional index 7 for `status`, but column 7 is actually `system_prompt`. Converted all query functions in btmsg.rs and bttask.rs from positional to named column access (`row.get("column_name")`). Added SQL aliases for JOIN columns @@ -50,6 +54,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **ArchitectureTab PlantUML encoding** — `rawDeflate()` was a no-op, `encode64()` did hex encoding. Collapsed into single `plantumlEncode()` using PlantUML's `~h` hex encoding - **TestingTab Tauri 2.x asset URL** — used `asset://localhost/` (Tauri 1.x). Fixed to `convertFileSrc()` from `@tauri-apps/api/core` - **Reconnect loop race in RemoteManager** — orphaned reconnect tasks continued running after `remove_machine()` or `disconnect()`. Added `cancelled: Arc` flag to `RemoteMachine`; set on removal/disconnect, checked each reconnect iteration. `connect()` resets flag for new connections (remote.rs) +- **Subagent delegation not triggering** — Manager system prompt had no documentation of Agent tool / delegation capability. Added "Multi-Agent Delegation" section with usage examples and guidelines. Also inject `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env var for Manager agents +- **Gitignore ignoring source code** — root `.gitignore` `plugins/` rule matched `v2/src/lib/plugins/` (source code). Narrowed to `/plugins/` and `/v2/plugins/` (runtime dirs only) ### Added - **Reviewer agent role** — Tier 1 specialist with reviewer workflow in `agent-prompts.ts` (8-step process: inbox → review-queue → analyze → verdict → status update → review-log → report). Rust `bttask.rs` auto-posts to `#review-queue` btmsg channel on task→review transition via `notify_review_channel()` + `ensure_review_channels()` (idempotent). `reviewQueueDepth` in `attention-scorer.ts` (10pts/task, cap 50). `review_queue_count()` Rust function + Tauri command + `reviewQueueCount()` IPC bridge. ProjectBox: 'Tasks' tab for reviewer (reuses TaskBoardTab), 10s review queue polling → `setReviewQueueDepth()` in health store. 7 new vitest + 4 new cargo tests. 388 vitest + 76 cargo total diff --git a/CLAUDE.md b/CLAUDE.md index 094a1b0..890fd99 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,7 +2,7 @@ ## Project Overview -Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system, Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. +Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (sandboxed, 35 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, WAL checkpoint (5min), subagent delegation fix. - **Repository:** github.com/DexterFromLab/BTerminal - **License:** MIT diff --git a/TODO.md b/TODO.md index 9237ef4..08c80c8 100644 --- a/TODO.md +++ b/TODO.md @@ -2,13 +2,16 @@ ## Active -### v3 Remaining -- [ ] **Multi-machine real-world testing** -- Test bterminal-relay with 2 machines. Mark as experimental until docker-compose integration tests pass. -- [ ] **Multi-machine TLS/certificate pinning** -- TLS support for bterminal-relay + certificate pinning in RemoteManager. -- [ ] **Agent Teams real-world testing** -- Env var whitelist fix done. 3 test sessions ran ($1.10, $0.69, $1.70) but model didn't spawn subagents — needs complex multi-part prompts to trigger delegation. Test with CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. +### v3.1 Remaining +- [ ] **Multi-machine real-world testing** -- TLS added to relay. Needs real 2-machine test. Multi-machine UI not surfaced in v3, code exists in bridges/stores only. +- [ ] **Certificate pinning** -- TLS encryption done (v3.0). Pin cert hash in RemoteManager for v3.1. +- [ ] **Agent Teams real-world testing** -- Subagent delegation prompt fix done + env var injection. Needs real multi-agent session to verify Manager spawns child agents. +- [ ] **Plugin sandbox migration** -- `new Function()` has inherent escape vectors (prototype walking, arguments.callee.constructor). Consider Web Worker isolation for v3.2. +- [ ] **Soak test** -- Run 4-hour soak with 6+ agents across 3+ projects. Monitor memory, SQLite WAL size, xterm.js instances. ## Completed +- [x] **v3 Hardening Sprint** -- Fixed subagent delegation (prompt + env var), added TLS to relay, WAL checkpoint (5min), Landlock logging, plugin sandbox tests (35), gitignore fix. 444 vitest + 111 cargo. | Done: 2026-03-12 - [x] **v3 Production Readiness — ALL tribunal items** -- Implemented all 13 features from tribunal assessment: sidecar supervisor, notifications, secrets, keyboard UX, agent health, search, plugins, sandbox, error classifier, audit log, team agent orchestration, optimistic locking, usage meter. 409 vitest + 109 cargo. | Done: 2026-03-12 - [x] **Unified test runner + testing gate rule** -- Created v2/scripts/test-all.sh (vitest + cargo + optional E2E), added npm scripts (test:all, test:all:e2e, test:cargo), added .claude/rules/20-testing-gate.md requiring full suite after major changes. | Done: 2026-03-12 - [x] **E2E testing — Phase B+ & test fixes** -- Phase B: LLM judge (llm-judge.ts, claude-haiku-4-5), 6 multi-project scenarios, CI workflow (3 jobs). Test fixes: 27 failures across 3 spec files. 388 vitest + 68 cargo + 82 E2E (0 fail, 4 skip). | Done: 2026-03-12 diff --git a/docs/v3-progress.md b/docs/v3-progress.md index 031edd1..c8c1734 100644 --- a/docs/v3-progress.md +++ b/docs/v3-progress.md @@ -1041,3 +1041,36 @@ Implemented ALL 13 features from tribunal assessment in 3 parallel waves (11 sub - [x] Vitest: 409 passed, 0 failed (+21 from prior) - [x] Cargo: 109 passed, 0 failed (+41 from prior) - [x] No regressions + +--- + +### Session: v3 Hardening Sprint (2026-03-12) + +Executed tribunal-recommended hybrid S-2/S-1 hardening sprint. Fixed 3 security/resilience issues, added TLS, fixed gitignore bug. + +#### Subagent Delegation Fix +- [x] Root cause: Manager system prompt had no mention of Agent tool / delegation capability +- [x] Added "Multi-Agent Delegation" section to Manager workflow in `agent-prompts.ts` +- [x] Inject `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env var for Manager agents in `AgentSession.svelte` + +#### TLS for bterminal-relay +- [x] Added `--tls-cert` and `--tls-key` optional CLI args to relay binary +- [x] `build_tls_acceptor()` using `native_tls::Identity::from_pkcs8` +- [x] Refactored to generic `accept_ws_with_auth` and `run_ws_session` (avoids code duplication) +- [x] Client side already supports `wss://` via `connect_async` with native-tls feature — no changes needed +- [x] Certificate pinning deferred to v3.1 per tribunal risk matrix + +#### Security Hardening +- [x] **WAL checkpoint** — `checkpoint_wal()` + `spawn_wal_checkpoint_task()` in lib.rs. Runs `PRAGMA wal_checkpoint(TRUNCATE)` every 5 minutes on sessions.db + btmsg.db. 2 tests +- [x] **Landlock logging** — Improved fallback message: "Kernel 6.2+ required for enforcement" + 3-state enforcement comments +- [x] **Plugin sandbox** — Already hardened (13 shadowed globals, `this` binding to undefined). Documented known `new Function()` escape vectors in JSDoc + +#### Gitignore Fix +- [x] Root `.gitignore` had `plugins/` which matched `v2/src/lib/plugins/` (source code). Narrowed to `/plugins/` and `/v2/plugins/` (runtime dirs only) +- [x] Tracked previously-ignored `plugin-host.ts` source file +- [x] Added `plugin-host.test.ts` with 35 tests (sandbox globals, permissions, lifecycle) + +#### Verification +- [x] Vitest: 444 passed, 0 failed (+35 plugin sandbox tests) +- [x] Cargo: 111 passed, 0 failed (+2 WAL checkpoint tests) +- [x] Full workspace compiles clean diff --git a/docs/v3-release-notes.md b/docs/v3-release-notes.md new file mode 100644 index 0000000..9c6a968 --- /dev/null +++ b/docs/v3-release-notes.md @@ -0,0 +1,110 @@ +# BTerminal v3.0 Release Notes + +## Mission Control — Multi-Project AI Agent Orchestration + +BTerminal v3.0 is a ground-up redesign of the terminal interface, built for managing multiple AI agent sessions across multiple projects simultaneously. The Mission Control dashboard replaces the single-pane terminal with a full orchestration workspace. + +### What's New + +**Mission Control Dashboard** +- VSCode-style layout: icon sidebar + expandable settings drawer + project grid + status bar +- Per-project boxes with 11 tab types (Model, Docs, Context, Files, SSH, Memory, Metrics, Tasks, Architecture, Selenium, Tests) +- Command palette (Ctrl+K) with 18+ commands across 6 categories +- Keyboard-first navigation: Alt+1-5 project jump, Ctrl+H/L vi-nav, Ctrl+Shift+1-9 tab switch +- 17 themes in 3 groups (Catppuccin, Editor, Deep Dark) + +**Multi-Agent Orchestration** +- 4 Tier 1 management roles: Manager, Architect, Tester, Reviewer +- btmsg: inter-agent messaging (DMs, channels, contacts ACL, heartbeats, dead letter queue) +- bttask: kanban task board (5 columns, optimistic locking, review queue auto-notifications) +- Agent prompt generator with role-specific workflows and tool documentation +- Manager subagent delegation via Claude Agent SDK teams +- Auto-wake scheduler: 3 strategies (persistent, on-demand, smart) with 6 wake signals + +**Multi-Provider Support** +- Claude Code (primary), OpenAI Codex, Ollama +- Provider-specific sidecar runners with unified message adapter layer +- Per-project provider selection with capability-gated UI + +**Session Continuity** +- SQLite persistence for agent sessions, messages, and cost tracking +- Session anchors: preserve important turns through context compaction +- Auto-anchoring on first compaction (observation-masked, reasoning preserved) +- Configurable anchor budget (2K–20K tokens) + +**Dashboard Metrics** +- Real-time fleet overview: running/idle/stalled counts, burn rate ($/hr) +- Per-project health: activity state, context pressure, file conflicts, attention scoring +- Historical sparklines for cost, tokens, turns, tools, and duration +- Attention queue with priority-scored cards (click to focus) + +**File Management** +- VSCode-style directory tree with CodeMirror 6 editor (15 language modes) +- PDF viewer (pdfjs-dist, multi-page, zoom 0.5x–3x) +- CSV table viewer (RFC 4180, delimiter auto-detect, sortable columns) +- Filesystem watcher for external write conflict detection + +**Terminal** +- xterm.js with Canvas addon (WebKit2GTK compatible) +- Agent preview pane (read-only view of agent activity) +- SSH session management (native PTY, no library required) +- Worktree isolation per project (optional) + +### Production Readiness + +**Reliability** +- Sidecar crash recovery: auto-restart with exponential backoff (1s–30s, 5 retries) +- WAL checkpoint: periodic TRUNCATE every 5 minutes (sessions.db + btmsg.db) +- Error classification: 6 types with actionable messages and retry logic +- Optimistic locking for concurrent task board updates + +**Security** +- Landlock sandbox: kernel 6.2+ filesystem restriction for sidecar processes +- Plugin sandbox: 13 shadowed globals, strict mode, frozen API, permission-gated +- Secrets management: system keyring (libsecret), no plaintext fallback +- TLS support for bterminal-relay (optional `--tls-cert`/`--tls-key`) +- Sidecar environment stripping: dual-layer (Rust + JS) credential isolation +- Audit logging: agent events, task changes, wake events, prompt injections + +**Observability** +- OpenTelemetry: tracing + OTLP export to Tempo (optional) +- FTS5 full-text search across messages, tasks, and agent comms +- Agent health monitoring: heartbeats, stale detection, dead letter queue +- Desktop + in-app notifications with history + +### Multi-Machine (Early Access) + +bterminal-relay enables running agent sessions across multiple Linux machines via WebSocket. TLS encryption is supported. This feature is architecturally complete but not yet surfaced in the v3 UI — available for advanced users via the relay binary and bridges. + +**v3.1 roadmap:** Certificate pinning, UI integration, real-world multi-machine testing. + +### Test Coverage + +| Suite | Tests | Status | +|-------|-------|--------| +| Vitest (frontend) | 444 | Pass | +| Cargo (backend) | 111 | Pass | +| E2E (WebDriverIO) | 82 | Pass | +| **Total** | **637** | **All passing** | + +### Breaking Changes from v2 + +- Layout system replaced by workspace store (project groups) +- Configuration moved from sessions.json to groups.json +- App.svelte rewritten (VSCode-style sidebar replaces TilingGrid) +- Settings moved from modal dialog to sidebar drawer tab + +### Requirements + +- Linux x86_64 +- Kernel 6.2+ recommended (for Landlock sandbox enforcement) +- libsecret / DBUS session (for secrets management) +- Node.js 20+ and Rust 1.77+ (build from source) +- Claude CLI installed (`~/.local/bin/claude` or system path) + +### Known Limitations + +- Maximum 4 active xterm.js instances (WebKit2GTK memory constraint) +- Plugin sandbox uses `new Function()` — best-effort, not a security boundary +- Multi-machine UI not yet integrated into Mission Control +- Agent Teams delegation requires complex prompts to trigger reliably