docs: update CHANGELOG, TODO, README, CLAUDE.md for tribunal session

Update test counts (516 vitest + 159 cargo), add new entries for all 5 tribunal priorities, mark certificate pinning done, add SPKI persistence and seen_messages pruning as new TODOs.
2026-03-14 04:39:40 +01:00 · 2026-03-14 04:39:40 +01:00 · 662cda2daf
commit 662cda2daf
parent 97abd8a434
5 changed files with 24 additions and 4 deletions
--- a/.claude/CLAUDE.md
+++ b/.claude/CLAUDE.md
@ -5,7 +5,7 @@
 - v1 is a single-file Python app (`bterminal.py`). Changes are localized.
 - v2 docs are in `docs/`. Architecture in `docs/architecture.md`.
 - v2 Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Extras: SSH, ctx, themes, detached mode, auto-updater, shiki, copy/paste, session resume, drag-resize, session groups, Deno sidecar, Claude profiles, skill discovery.
- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35). 444 vitest + 151 cargo + 109 E2E.
+- v3 Mission Control (All Phases 1-10 + Production Readiness Complete): project groups, workspace store, 15+ Workspace components, session continuity, multi-provider adapter pattern, worktree isolation, session anchors, Memora adapter, SOLID refactoring, multi-agent orchestration (btmsg/bttask, 4 Tier 1 roles, role-specific tabs), dashboard metrics, auto-wake scheduler, reviewer agent. Production: sidecar supervisor (auto-restart, exponential backoff), FTS5 search (3 virtual tables, Spotlight overlay), plugin system (sandboxed new Function(), permission-gated), Landlock sandbox (kernel 6.2+), secrets management (system keyring), OS+in-app notifications, keyboard-first UX (18+ palette commands, vi-nav), agent health monitoring (heartbeats, dead letter queue), audit logging, error classification (6 types), optimistic locking (bttask). Hardening: TLS relay, SPKI pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, plugin sandbox tests (35), SidecarManager actor pattern, per-message btmsg acknowledgment, Aider autonomous mode. 516 vitest + 159 cargo + 109 E2E.
 - Consult Memora (tag: `bterminal`) before making architectural changes.

 ## Documentation References
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@ -7,6 +7,21 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Added
+- **Aider autonomous mode toggle** — per-project `autonomousMode` setting ('restricted'|'autonomous') gates shell command execution in Aider sidecar. Default restricted. SettingsTab toggle
+- **SPKI certificate pinning (TOFU)** — `remote_probe_spki` Tauri command + `probe_spki_hash()` extracts relay TLS certificate SPKI hash. `remote_add_pin`/`remote_remove_pin` commands. In-memory pin store in RemoteManager
+- **Per-message btmsg acknowledgment** — `seen_messages` table with session-scoped tracking replaces count-based polling. `btmsg_unseen_messages`, `btmsg_mark_seen`, `btmsg_prune_seen` commands. ON DELETE CASCADE cleanup
+- **Aider parser test suite** — 72 vitest tests for extracted `aider-parser.ts` (pure parsing functions). 8 realistic Aider output fixtures. Covers prompt detection, suppression, turn parsing, cost extraction, shell execution, format-drift canaries
+- **Dead code wiring** — 4 orphaned Rust functions wired as Tauri commands: `btmsg_get_agent_heartbeats`, `btmsg_queue_dead_letter`, `search_index_task`, `search_index_btmsg`
+
+### Changed
+- **SidecarManager actor refactor** — replaced `Arc<Mutex<HashMap>>` with dedicated actor thread via `std::sync::mpsc` channel. Eliminates TOCTOU race conditions on session lifecycle. All mutable state owned by single thread
+- **Aider parser extraction** — pure functions (`looksLikePrompt`, `parseTurnOutput`, `extractSessionCost`, etc.) extracted from `aider-runner.ts` to `aider-parser.ts` for testability. Runner imports from parser module
+
+### Fixed
+- **groups.rs test failure** — `test_groups_roundtrip` missing 9 Option fields added in P1-P10 (provider, model, use_worktrees, sandbox_enabled, anchor_budget_scale, stall_threshold_min, is_agent, agent_role, system_prompt)
+- **remote_probe_spki tracing skip mismatch** — `#[tracing::instrument(skip(state))]` referenced non-existent parameter name. Removed unused State parameter
+
 ### Added
 - **Comprehensive documentation suite** — 4 new docs: `architecture.md` (end-to-end system architecture with component hierarchy, data flow, IPC patterns), `sidecar.md` (multi-provider runner lifecycle, env stripping, NDJSON protocol, build pipeline), `orchestration.md` (btmsg messaging, bttask kanban, agent roles, wake scheduler, session anchors, health monitoring), `production.md` (sidecar supervisor, Landlock sandbox, FTS5 search, plugin system, secrets management, notifications, audit logging, error classification, telemetry)
 - **Sidecar crash recovery/supervision** — `bterminal-core/src/supervisor.rs`: SidecarSupervisor wraps SidecarManager with auto-restart, exponential backoff (1s base, 30s cap, 5 retries), SidecarHealth enum (Healthy/Degraded/Failed), 5min stability window. 17 tests
--- a/CLAUDE.md
+++ b/CLAUDE.md
@ -2,7 +2,7 @@

 ## Project Overview

-Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (sandboxed, 35 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, WAL checkpoint (5min), subagent delegation fix.
+Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Python) is production-stable. v2 redesign (Tauri 2.x + Svelte 5 + Claude Agent SDK) Phases 1-7 + multi-machine (A-D) + profiles/skills complete. Packaging: .deb + AppImage via GitHub Actions CI. v3 Mission Control (All Phases 1-10 Complete + Production Readiness): multi-project dashboard with project groups, per-project Claude sessions with session continuity, team agents panel, terminal tabs, VSCode-style left sidebar, multi-agent orchestration (Tier 1 management agents: Manager/Architect/Tester/Reviewer with role-specific tabs, btmsg inter-agent messaging, bttask kanban task board with optimistic locking). Production features: sidecar crash recovery/supervision, FTS5 full-text search, plugin system (sandboxed, 35 tests), Landlock sandboxing, secrets management (system keyring), OS + in-app notifications, keyboard-first UX (18+ palette commands), agent health monitoring + dead letter queue, audit logging, error classification. Hardening: TLS relay support, SPKI certificate pinning (TOFU), WAL checkpoint (5min), subagent delegation fix, SidecarManager actor pattern (mpsc), per-message btmsg acknowledgment (seen_messages), Aider autonomous mode toggle.

 - **Repository:** github.com/DexterFromLab/BTerminal
 - **License:** MIT
@ -149,6 +149,8 @@ Terminal emulator with SSH and Claude Code session management. v1 (GTK3+VTE Pyth
 | `v2/sidecar/claude-runner.ts` | Claude sidecar source (compiled to .mjs by esbuild, includes findClaudeCli()) |
 | `v2/sidecar/codex-runner.ts` | Codex sidecar source (@openai/codex-sdk dynamic import, sandbox/approval mapping) |
 | `v2/sidecar/ollama-runner.ts` | Ollama sidecar source (direct HTTP to localhost:11434, zero external deps) |
+| `v2/sidecar/aider-parser.ts` | Aider output parser (pure functions: looksLikePrompt, parseTurnOutput, extractSessionCost, execShell) |
+| `v2/sidecar/aider-parser.test.ts` | Vitest tests for Aider parser (72 tests: prompt detection, turn parsing, cost extraction, format-drift canaries) |
 | `v2/sidecar/agent-runner-deno.ts` | Standalone Deno sidecar runner (not used by SidecarManager, alternative) |
 | `v2/sidecar/dist/claude-runner.mjs` | Bundled Claude sidecar (runs on both Deno and Node.js) |
 | `v2/src/lib/adapters/claude-messages.test.ts` | Vitest tests for Claude message adapter (25 tests) |
--- a/README.md
+++ b/README.md
@ -52,7 +52,7 @@ Agent Orchestrator lets you run multiple Claude Code agents in parallel, organiz
 - **ctx integration** — SQLite context database for cross-session memory

 ### Testing
- **444 vitest** + **151 cargo** + **109 E2E** tests
+- **516 vitest** + **159 cargo** + **109 E2E** tests
 - **E2E engine** — WebDriverIO + tauri-driver, Phase A/B/C scenarios
 - **LLM judge** — dual-mode CLI/API for semantic assertion (claude-haiku)
 - **CI** — GitHub Actions with xvfb + LLM-judged test gating
--- a/TODO.md
+++ b/TODO.md
@ -3,11 +3,12 @@
 ## Multi-Machine (v3.1)

 - [ ] **Real-world relay testing** — TLS added, code complete in bridges/stores. Needs 2-machine test to verify relay + RemoteManager end-to-end. Multi-machine UI not yet surfaced in v3 ProjectBox.
- [ ] **Certificate pinning** — TLS encryption works. Pin relay cert hash in RemoteManager to prevent MITM. Planned for v3.1.
+- [ ] **SPKI pin persistence** — TOFU pinning implemented (probe_spki_hash + in-memory pin store in RemoteManager), but pins are lost on restart. Persist to groups.json or separate config file.

 ## Multi-Agent (v3.1)

 - [ ] **Agent Teams real-world testing** — Subagent delegation prompt + `CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1` env injection done. Needs real multi-agent session to verify Manager spawns child agents via SDK teams.
+- [ ] **seen_messages periodic pruning** — `btmsg_prune_seen` command exists but no scheduler calls it. Wire to a periodic timer (e.g., every 6 hours) to prevent unbounded growth.

 ## Security (v3.2)

@ -19,6 +20,8 @@

 ## Completed

+- [x] Tribunal priorities: Aider security, SidecarManager actor, SPKI pinning, btmsg reliability, Aider tests | Done: 2026-03-14
+- [x] Dead code cleanup — 7 warnings resolved, 4 new Tauri commands wired | Done: 2026-03-14
 - [x] E2E fixture + judge hardening | Done: 2026-03-12
 - [x] LLM judge refactor + E2E docs | Done: 2026-03-12
 - [x] v3 Hardening Sprint (TLS, WAL, Landlock, plugin tests, Phase C E2E) | Done: 2026-03-12