diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 8b9d156..419bd62 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -5,7 +5,7 @@ - v1 is a single-file Python app (`bterminal.py`). Changes are localized. - v2 docs are in `docs/`. Architecture in `docs/architecture.md`. - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery. -- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35). 444 vitest + 151 cargo + 109 E2E. +- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, SPKI pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35), SidecarManager actor pattern, per-message btmsg acknowledgment, Aider autonomous mode. 516 vitest + 159 cargo + 109 E2E. - Consult Memora (tag: `bterminal`) before making architectural changes. ## Documentation References diff --git a/CHANGELOG.md b/CHANGELOG.md index 5209a97..4702791 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -7,6 +7,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +### Added +- **Aider autonomous mode toggle** — per-project `autonomousMode` setting ('restricted'|'autonomous') gates shell command execution in Aider sidecar. Default restricted. SettingsTab toggle +- **SPKI certificate pinning (TOFU)** — `remote_probe_spki` Tauri command + `probe_spki_hash()` extracts relay TLS certificate SPKI hash. `remote_add_pin`/`remote_remove_pin` commands. In-memory pin store in RemoteManager +- **Per-message btmsg acknowledgment** — `seen_messages` table with session-scoped tracking replaces count-based polling. `btmsg_unseen_messages`, `btmsg_mark_seen`, `btmsg_prune_seen` commands. ON DELETE CASCADE cleanup +- **Aider parser test suite** — 72 vitest tests for extracted `aider-parser.ts` (pure parsing functions). 8 realistic Aider output fixtures. Covers prompt detection, suppression, turn parsing, cost extraction, shell execution, format-drift canaries +- **Dead code wiring** — 4 orphaned Rust functions wired as Tauri commands: `btmsg_get_agent_heartbeats`, `btmsg_queue_dead_letter`, `search_index_task`, `search_index_btmsg` + +### Changed +- **SidecarManager actor refactor** — replaced `Arc>` with dedicated actor thread via `std::sync::mpsc` channel. Eliminates TOCTOU race conditions on session lifecycle. All mutable state owned by single thread +- **Aider parser extraction** — pure functions (`looksLikePrompt`, `parseTurnOutput`, `extractSessionCost`, etc.) extracted from `aider-runner.ts` to `aider-parser.ts` for testability. Runner imports from parser module + +### Fixed +- **groups.rs test failure** — `test_groups_roundtrip` missing 9 Option fields added in P1-P10 (provider, model, use_worktrees, sandbox_enabled, anchor_budget_scale, stall_threshold_min, is_agent, agent_role, system_prompt) +- **remote_probe_spki tracing skip mismatch** — `#[tracing::instrument(skip(state))]` referenced non-existent parameter name. Removed unused State parameter + ### Added - **Comprehensive documentation suite** — 4 new docs: `architecture.md` (end-to-end system architecture with component hierarchy, data flow, IPC patterns), `sidecar.md` (multi-provider runner lifecycle, env stripping, NDJSON protocol, build pipeline), `orchestration.md` (btmsg messaging, bttask kanban, agent roles, wake scheduler, session anchors, health monitoring), `production.md` (sidecar supervisor, Landlock sandbox, FTS5 search, plugin system, secrets management, notifications, audit logging, error classification, telemetry) - **Sidecar crash recovery/supervision** — `bterminal-core/src/supervisor.rs`: SidecarSupervisor wraps SidecarManager with auto-restart, exponential backoff (1s base, 30s cap, 5 retries), SidecarHealth enum (Healthy/Degraded/Failed), 5min stability window. 17 tests diff --git a/CLAUDE.md b/CLAUDE.md index 9aadff2..73e4974 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,7 +2,7 @@ ## Project Overview -Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (sandboxed, 35 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, WAL checkpoint (5min), subagent delegation fix. +Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (sandboxed, 35 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, SPKI certificate pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, SidecarManager actor pattern (mpsc), per-message btmsg acknowledgment (seen_messages), Aider autonomous mode toggle. - **Repository:** github.com/DexterFromLab/BTerminal - **License:** MIT @@ -149,6 +149,8 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth | `v2/sidecar/claude-runner.ts` | Claude sidecar source (compiled to .mjs by esbuild, includes findClaudeCli()) | | `v2/sidecar/codex-runner.ts` | Codex sidecar source (@openai/codex-sdk dynamic import, sandbox/approval mapping) | | `v2/sidecar/ollama-runner.ts` | Ollama sidecar source (direct HTTP to localhost:11434, zero external deps) | +| `v2/sidecar/aider-parser.ts` | Aider output parser (pure functions: looksLikePrompt, parseTurnOutput, extractSessionCost, execShell) | +| `v2/sidecar/aider-parser.test.ts` | Vitest tests for Aider parser (72 tests: prompt detection, turn parsing, cost extraction, format-drift canaries) | | `v2/sidecar/agent-runner-deno.ts` | Standalone Deno sidecar runner (not used by SidecarManager, alternative) | | `v2/sidecar/dist/claude-runner.mjs` | Bundled Claude sidecar (runs on both Deno and Node.js) | | `v2/src/lib/adapters/claude-messages.test.ts` | Vitest tests for Claude message adapter (25 tests) | diff --git a/README.md b/README.md index 34c2973..0573c3b 100644 --- a/README.md +++ b/README.md @@ -52,7 +52,7 @@ Agent Orchestrator lets you run multiple Claude Code agents in parallel, organiz - **ctx integration** — SQLite context database for cross-session memory ### Testing -- **444 vitest** + **151 cargo** + **109 E2E** tests +- **516 vitest** + **159 cargo** + **109 E2E** tests - **E2E engine** — WebDriverIO + tauri-driver, Phase A/B/C scenarios - **LLM judge** — dual-mode CLI/API for semantic assertion (claude-haiku) - **CI** — GitHub Actions with xvfb + LLM-judged test gating diff --git a/TODO.md b/TODO.md index 9f61936..fc8d81b 100644 --- a/TODO.md +++ b/TODO.md @@ -3,11 +3,12 @@ ## Multi-Machine (v3.1) - [ ] **Real-world relay testing** — TLS added, code complete in bridges/stores. Needs 2-machine test to verify relay + RemoteManager end-to-end. Multi-machine UI not yet surfaced in v3 ProjectBox. -- [ ] **Certificate pinning** — TLS encryption works. Pin relay cert hash in RemoteManager to prevent MITM. Planned for v3.1. +- [ ] **SPKI pin persistence** — TOFU pinning implemented (probe_spki_hash + in-memory pin store in RemoteManager), but pins are lost on restart. Persist to groups.json or separate config file. ## Multi-Agent (v3.1) - [ ] **Agent Teams real-world testing** — Subagent delegation prompt + `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env injection done. Needs real multi-agent session to verify Manager spawns child agents via SDK teams. +- [ ] **seen_messages periodic pruning** — `btmsg_prune_seen` command exists but no scheduler calls it. Wire to a periodic timer (e.g., every 6 hours) to prevent unbounded growth. ## Security (v3.2) @@ -19,6 +20,8 @@ ## Completed +- [x] Tribunal priorities: Aider security, SidecarManager actor, SPKI pinning, btmsg reliability, Aider tests | Done: 2026-03-14 +- [x] Dead code cleanup — 7 warnings resolved, 4 new Tauri commands wired | Done: 2026-03-14 - [x] E2E fixture + judge hardening | Done: 2026-03-12 - [x] LLM judge refactor + E2E docs | Done: 2026-03-12 - [x] v3 Hardening Sprint (TLS, WAL, Landlock, plugin tests, Phase C E2E) | Done: 2026-03-12