BTerminal/TODO.md

# BTerminal -- TODO

## Active

### Migration to agent-orchestrator
- [ ] **Review Dexter's 13 feature commits** -- hib_changes rebased onto dexter_changes (55ba8d0). Need to review: Aider provider, splash screen, provider/model unification, Tier 2 btmsg access, auto-wake on btmsg. Then push hib_changes and start working from agent-orchestrator repo.
- [ ] **CLAUDE.md Commit Zero** -- Update agent-orchestrator's CLAUDE.md to reflect rebrand + new features. Update docs/ accordingly.
- [ ] **Switch primary development to agent-orchestrator** -- After review + CLAUDE.md update, develop on hib_changes in agent-orchestrator. BTerminal repo stays for v1 production only.

### v3.1 Remaining
- [ ] **Multi-machine real-world testing** -- TLS added to relay. Needs real 2-machine test. Multi-machine UI not surfaced in v3; code exists in bridges/stores only.
- [ ] **Certificate pinning** -- TLS encryption done (v3.0). Pin cert hash in RemoteManager for v3.1.
- [ ] **Agent Teams real-world testing** -- Subagent delegation prompt fix done + env var injection. Needs real multi-agent session to verify Manager spawns child agents.
- [ ] **Plugin sandbox migration** -- `new Function()` has inherent escape vectors (prototype walking, arguments.callee.constructor). Consider Web Worker isolation for v3.2.
- [ ] **Soak test** -- Run 4-hour soak with 6+ agents across 3+ projects. Monitor memory, SQLite WAL size, xterm.js instances.
- [ ] **E2E agent tests hang in nested Claude sessions** -- B4/B5 phase-b tests time out when run from within a Claude Code session (the sidecar spawns the Claude CLI, which hangs in git repo context). Works in CI. Investigate process isolation or a session nesting guard.
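
Sketch for the **Plugin sandbox migration** item: a minimal illustration of the prototype-walking escape it mentions. `runPlugin` and `api` are hypothetical stand-ins, not BTerminal's actual plugin loader; the point is only that a function built with `new Function()` can rebuild an unrestricted `Function` from any object it is handed.

```typescript
// Hypothetical host-side loader (assumption, not the real BTerminal code):
// runs plugin source in strict mode with a single `api` argument.
function runPlugin(source: string, api: object): unknown {
  const fn = new Function("api", `"use strict"; ${source}`);
  return fn(api);
}

// A frozen API object does not help: freezing protects the object's own
// properties, not the prototype chain it inherits from.
const api = Object.freeze({ log: (msg: string): number => msg.length });

// Prototype walk: api.constructor is Object, Object.constructor is Function.
// Calling that Function with new source yields code outside the "sandbox".
const escaped = runPlugin(
  'return api.constructor.constructor("return globalThis")();',
  api,
);

// True when the plugin code obtained the host's global object.
const escapedToGlobal: boolean = escaped === globalThis;
```

Because the escape relies only on ordinary property access, filtering the plugin source is unlikely to close it, which is the motivation for considering Web Worker isolation.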

## Completed

- [x] **E2E fixture + judge hardening** -- Fixed fixture env propagation (process.env injection; tauri:options.env is unreliable), LLM judge CLI context isolation (--setting-sources user, cwd /tmp, --system-prompt), mocha timeout 180s. Confirmed fixture fakes project list. Agent tests CI-only (nested Claude limitation). | Done: 2026-03-12
- [x] **LLM judge refactor + E2E docs** -- Refactored llm-judge.ts to dual-mode (CLI first, API fallback), env-configurable via LLM_JUDGE_BACKEND. Wrote comprehensive docs/e2e-testing.md covering fixtures, test mode, LLM judge, all spec phases, CI, troubleshooting. 444 vitest + 151 cargo + 109 E2E. | Done: 2026-03-12
- [x] **v3 Hardening Sprint** -- Fixed subagent delegation (prompt + env var), added TLS to relay, WAL checkpoint (5min), Landlock logging, plugin sandbox tests (35), gitignore fix. Phase C E2E tests (27 new, 3 pre-existing fixes). 444 vitest + 151 cargo + 109 E2E. | Done: 2026-03-12
- [x] **v3 Production Readiness: all tribunal items** -- Implemented all 13 features from tribunal assessment: sidecar supervisor, notifications, secrets, keyboard UX, agent health, search, plugins, sandbox, error classifier, audit log, team agent orchestration, optimistic locking, usage meter. 409 vitest + 109 cargo. | Done: 2026-03-12
- [x] **Unified test runner + testing gate rule** -- Created v2/scripts/test-all.sh (vitest + cargo + optional E2E), added npm scripts (test:all, test:all:e2e, test:cargo), added .claude/rules/20-testing-gate.md requiring full suite after major changes. | Done: 2026-03-12
- [x] **E2E testing: Phase B+ & test fixes** -- Phase B: LLM judge (llm-judge.ts, claude-haiku-4-5), 6 multi-project scenarios, CI workflow (3 jobs). Test fixes: 27 failures across 3 spec files. 388 vitest + 68 cargo + 82 E2E (0 fail, 4 skip). | Done: 2026-03-12
- [x] **Reviewer agent role** -- Tier 1 specialist with role='reviewer'. Reviewer workflow in agent-prompts.ts (8-step process). #review-queue/#review-log auto-channels. reviewQueueDepth in attention scoring (10pts/task, cap 50). 388 vitest + 76 cargo. | Done: 2026-03-12
- [x] **Auto-wake Manager** -- wake-scheduler.svelte.ts + wake-scorer.ts (24 tests). 3 strategies: persistent/on-demand/smart. 6 signals. Settings UI. 381 vitest + 72 cargo. | Done: 2026-03-12
- [x] **Dashboard metrics panel** -- MetricsPanel.svelte: live health + task board summary + SVG sparkline history. 25 tests. 357 vitest + 72 cargo. | Done: 2026-03-12
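
Sketch for the **Reviewer agent role** entry: the review-queue contribution to attention scoring (10 points per queued task, capped at 50) might look like the following. The function and constant names are assumptions for illustration; only the two numbers come from the entry above.

```typescript
// Values taken from the TODO entry: 10 pts/task, cap 50.
const POINTS_PER_TASK = 10;
const MAX_REVIEW_POINTS = 50;

// Hypothetical helper: converts the number of tasks waiting in the review
// queue into an attention-score contribution. Negative or fractional
// depths are clamped and floored to keep the result well defined.
function reviewQueueScore(reviewQueueDepth: number): number {
  const depth = Math.max(0, Math.floor(reviewQueueDepth));
  return Math.min(depth * POINTS_PER_TASK, MAX_REVIEW_POINTS);
}

// Contribution for a few sample queue depths.
const sampleScores: number[] = [0, 3, 5, 12].map(reviewQueueScore);
```

With these assumptions, the contribution saturates at five queued tasks, so a long backlog cannot dominate the other attention signals.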