ProjectConfig in groups.rs (Rust) was missing provider, model, and other
optional fields, so serde silently dropped them on save and every project
fell back to claude/Opus on reload. Added the missing fields to both the
ProjectConfig and GroupAgentConfig structs.
Rewrote aider-runner from one-shot --message mode to interactive stdin/stdout:
- Persistent aider process with multi-turn conversation support
- Pre-fetches btmsg inbox and bttask board before sending prompt to LLM
- Autonomous agent override prompt so LLM acts instead of asking for files
- Line-buffered output (no token-by-token fragments)
- Thinking block classification for DeepSeek R1
- Graceful /exit shutdown
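The line-buffering item above could be sketched roughly as follows. This is a minimal illustration, not the actual aider-runner code: the LineBuffer class and its method names are hypothetical, and it assumes stdout arrives as arbitrary string chunks.

```typescript
// Hypothetical helper: accumulates stdout chunks and emits only complete lines.
class LineBuffer {
  private pending = "";
  private readonly onLine: (line: string) => void;

  constructor(onLine: (line: string) => void) {
    this.onLine = onLine;
  }

  // Feed a raw chunk; every newline-terminated line is emitted, the tail is kept.
  push(chunk: string): void {
    this.pending += chunk;
    const parts = this.pending.split("\n");
    // The last element is the (possibly empty) incomplete tail.
    this.pending = parts.pop() ?? "";
    for (const part of parts) {
      this.onLine(part.replace(/\r$/, ""));
    }
  }

  // Emit any unterminated tail, e.g. when the process exits.
  flush(): void {
    if (this.pending.length > 0) {
      this.onLine(this.pending);
      this.pending = "";
    }
  }
}

const lines: string[] = [];
const buffer = new LineBuffer((line) => {
  lines.push(line);
});
buffer.push("Hel");
buffer.push("lo\nWor");
buffer.push("ld");
buffer.flush();
console.log(lines); // two entries: "Hello" and "World"
```

Keeping the incomplete tail in the buffer is what prevents token-by-token fragments from reaching the UI.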
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
scrollIntoView() in AgentPane was scrolling all ancestor containers
including ProjectGrid (overflow-x: auto), causing the entire project
grid to jump horizontally every time any agent produced output.
Replaced with direct scrollTop/scrollTo manipulation that only affects
the intended scroll container. Also removed scroll-snap-type which
caused additional snap recalculation on layout changes.
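A minimal sketch of the scrollTop approach described above, assuming the goal is to pin a single container to its bottom edge. The ScrollBox interface and scrollToBottom name are illustrative only; a real HTMLElement satisfies the same shape.

```typescript
// Structural subset of HTMLElement, so the helper also runs outside a browser.
interface ScrollBox {
  scrollTop: number;
  readonly scrollHeight: number;
  readonly clientHeight: number;
}

// Pin one container to its bottom edge without touching any ancestor.
function scrollToBottom(box: ScrollBox): void {
  box.scrollTop = Math.max(0, box.scrollHeight - box.clientHeight);
}

// Hypothetical stand-in for the AgentPane scroll container.
const pane = { scrollTop: 0, scrollHeight: 500, clientHeight: 200 };
scrollToBottom(pane);
console.log(pane.scrollTop); // 300
```

Unlike scrollIntoView(), assigning scrollTop changes only the element it is set on, so a horizontally scrollable ancestor stays where it is.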
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Set BTMSG_AGENT_ID for all projects (not just Tier 1) so Tier 2
agents can use btmsg/bttask CLI tools
- Add btmsg/bttask documentation to Tier 2 system prompt with
workflow instructions (inbox, tasks, status updates)
- Unify wake/start prompts to always reference btmsg inbox
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add btmsg inbox polling (10s) to AgentSession so agents wake when
they receive messages from other agents (not just admin DMs)
- Remove automatic setActiveProject on agent activation to prevent
focus stealing from the user
- Use untrack() in ProjectGrid scroll effect so agent re-renders
don't trigger unwanted scrollIntoView
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add aider-runner.ts sidecar that spawns aider CLI in non-interactive mode
- Add Aider provider metadata with OpenRouter model presets
- Add aider-messages.ts adapter for Aider event format
- Refactor SidecarManager from single-process to per-provider process management
with lazy startup on first query and session→provider routing
- Add openrouter_api_key to secrets system (keyring storage)
- Inject OPENROUTER_API_KEY from secrets into Aider agent environment
- Register Aider in provider registry, build pipeline, and resource bundle
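The per-provider process management with lazy startup could look roughly like this. It is a sketch under assumptions, not the SidecarManager implementation: ProviderRouter, Sidecar, and SpawnFn are hypothetical names, and the real manager presumably handles process lifecycle and errors as well.

```typescript
// Minimal stand-in for a sidecar process handle (hypothetical shape).
interface Sidecar {
  send(sessionId: string, prompt: string): void;
}

type SpawnFn = (provider: string) => Sidecar;

class ProviderRouter {
  private readonly processes = new Map<string, Sidecar>();
  private readonly sessionProvider = new Map<string, string>();
  private readonly spawn: SpawnFn;

  constructor(spawn: SpawnFn) {
    this.spawn = spawn;
  }

  // Start the provider's process on first use, then remember the routing.
  query(sessionId: string, provider: string, prompt: string): void {
    let proc = this.processes.get(provider);
    if (proc === undefined) {
      proc = this.spawn(provider);
      this.processes.set(provider, proc);
    }
    this.sessionProvider.set(sessionId, provider);
    proc.send(sessionId, prompt);
  }

  // Look up which provider owns a session, e.g. to route a later stop request.
  providerFor(sessionId: string): string | undefined {
    return this.sessionProvider.get(sessionId);
  }
}

const spawned: string[] = [];
const router = new ProviderRouter((provider) => {
  spawned.push(provider);
  return { send: () => {} };
});
router.query("s1", "claude", "hello");
router.query("s2", "aider", "hello");
router.query("s3", "claude", "again");
console.log(spawned); // each provider is spawned once, in first-use order
```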
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add provider and model fields to both GroupAgentConfig and ProjectConfig
- Wire model override through AgentSession → AgentPane → queryAgent → sidecar
- Add model preset dropdown per provider (Opus/Sonnet/Haiku, GPT-5.4/o3, etc.)
with custom model ID input at the bottom
- Add provider dropdown to Tier 1 agents (was Tier 2 only)
- Add "Apply & Restart" button on both tiers to restart agent with new settings
- Changing provider auto-resets model selection
- Admin bypasses stale heartbeat check in btmsg so DMs always deliver
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Play button in GroupAgentsPanel now starts agent session via emitAgentStart
- Stop button now stops running agent session via emitAgentStop
- Sending a DM to a stopped agent auto-wakes it (sets active + emitAgentStart)
- Fix autoPrompt in AgentPane to work for fresh sessions (not just done/error)
- Fix btmsg: admin (tier 0) bypasses stale heartbeat check so messages deliver
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Rename Cargo package from bterminal to agent-orchestrator so that WM_CLASS
matches the desktop entry and the taskbar groups windows correctly
- Update lib name (agent_orchestrator_lib) and telemetry service name
- Add Pandora's Box splash screen with progress steps during startup
- Prevent white window flash with inline CSS and Tauri backgroundColor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace old BTerminal V1 README with comprehensive Agent Orchestrator
documentation covering multi-agent orchestration, production hardening,
testing infrastructure, and architecture overview. Update screenshot
to show current V3 UI with Messages panel and agent workspace.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rebrand all user-visible BTerminal references to Agent Orchestrator
(window title, product name, identifier, status bar, updater URL,
context registration, CLAUDE.md branch reference).
Fix critical btmsg/bttask crash: pragma_update uses execute() internally
but PRAGMA busy_timeout returns a result row, causing "Execute returned
results" error that silently broke all CommsTab message loading.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Increase global mocha timeout from 60s to 180s in wdio.conf.js to accommodate longer-running LLM judge tests that evaluate agent responses and code generation. Add explicit per-test overrides for Phase B scenarios B4 and B5 to ensure adequate time for agent startup, execution, and LLM verification.
- wdio.conf.js: global timeout 60_000 ms → 180_000 ms
- phase-b.test.ts: explicit 180_000 ms timeout for B4 and B5 scenarios
Migrates legacy rule numbering (18, 20) to standardized sequence (53, 54) and adds new 18-preexisting-issues.md for handling pre-existing issues during development. This consolidates duplicate rule coverage across the old and new numbering schemes.
Files changed:
- Removed: 18-relative-units.md (moved to 53-relative-units.md)
- Removed: 20-testing-gate.md (moved to 54-testing-gate.md)
- Added: 18-preexisting-issues.md (new)
- Added: 53-relative-units.md (renamed from 18)
- Added: 54-testing-gate.md (renamed from 20)
New docs/e2e-testing.md covering all 3 pillars: test fixtures
(isolated temp environments), test mode (BTERMINAL_TEST=1), and
LLM judge (dual-mode CLI/API). Includes spec phases, CI integration,
WebKit2GTK pitfalls, and troubleshooting guide.
Refactor llm-judge.ts from raw API-only to dual-mode: CLI first
(spawns claude with --output-format text, unsets CLAUDECODE), API
fallback. Backend selectable via LLM_JUDGE_BACKEND env var.
Fix pre-existing race condition in config.rs tests where parallel
test execution caused env var mutations to interfere. Added static
Mutex to serialize env-mutating tests.
tokio::spawn() panics during Tauri setup in WebDriver E2E mode because
the Tokio runtime is not directly accessible. Switch to
tauri::async_runtime::spawn() which uses Tauri's managed runtime.
Fix .gitignore 'plugins/' rule that was accidentally ignoring source
files in v2/src/lib/plugins/. Narrow to /plugins/ and /v2/plugins/
(runtime plugin directories only). Track plugin-host.ts (was written
but never committed) and add comprehensive test suite covering all 13
shadowed globals, this-binding, permission gating, API freeze, and
lifecycle management.
Add periodic PRAGMA wal_checkpoint(TRUNCATE) every 5 minutes for both
sessions.db and btmsg.db to prevent unbounded WAL growth under sustained
multi-agent load. Improve Landlock fallback log message with kernel
version requirement. Add WAL checkpoint tests.
Add optional --tls-cert and --tls-key CLI args. When provided, the relay
wraps TCP streams with native-tls before WebSocket upgrade. Refactored
to generic accept_ws_with_auth<S> and run_ws_session<S> to avoid code
duplication between plain and TLS paths. Client side already supports
wss:// URLs via connect_async with native-tls feature.
Add multi-agent delegation documentation to Manager system prompt so
Claude knows it can spawn child agents via the Agent tool. Also inject
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 env var for Manager agents.
Version column in tasks table with WHERE id=? AND version=? guard.
Conflict detection in TaskBoardTab. error-classifier.ts: 6 error types
with actionable messages and retry logic. UsageMeter.svelte.
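The version guard above is a form of optimistic locking. An in-memory sketch of the idea follows; TaskStore and its methods are hypothetical, and the SET clause in the comment (incrementing version on every write) is an assumption, since only the WHERE guard is stated.

```typescript
interface Task {
  id: string;
  status: string;
  version: number;
}

class TaskStore {
  private readonly rows = new Map<string, Task>();

  insert(id: string, status: string): Task {
    const task: Task = { id, status, version: 1 };
    this.rows.set(id, task);
    return { ...task };
  }

  // Mirrors (assumed): UPDATE tasks SET status=?, version=version+1
  //                    WHERE id=? AND version=?
  // Returns the number of rows changed; 0 signals a conflict.
  updateStatus(id: string, expectedVersion: number, status: string): number {
    const row = this.rows.get(id);
    if (row === undefined || row.version !== expectedVersion) return 0;
    row.status = status;
    row.version += 1;
    return 1;
  }
}

const store = new TaskStore();
const seen = store.insert("t1", "todo"); // both writers read version 1
const firstWrite = store.updateStatus("t1", seen.version, "in_progress");
const secondWrite = store.updateStatus("t1", seen.version, "done");
console.log(firstWrite, secondWrite); // 1 0
```

A zero row count is the signal a caller such as TaskBoardTab could use to report a conflict and reload the task.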
heartbeats + dead_letter_queue + audit_log tables in btmsg.db. 15s
heartbeat polling in ProjectBox, stale detection, ProjectHeader heart
indicator. AuditLogTab for Manager. register_agents_from_groups() with
bidirectional contacts and review channel creation.
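Stale detection can be reduced to a timestamp comparison, sketched below. The 15s interval comes from this commit; the threshold of three missed intervals and the isStale name are assumptions for illustration only.

```typescript
const HEARTBEAT_INTERVAL_MS = 15_000; // polling interval named in the commit
const STALE_AFTER_MS = 3 * HEARTBEAT_INTERVAL_MS; // assumption: three missed beats

// An agent is stale when its last heartbeat is older than the threshold.
function isStale(
  lastHeartbeatMs: number,
  nowMs: number,
  staleAfterMs: number = STALE_AFTER_MS,
): boolean {
  return nowMs - lastHeartbeatMs > staleAfterMs;
}

console.log(isStale(0, 45_000)); // false: exactly at the threshold
console.log(isStale(0, 45_001)); // true: past the threshold
```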
Plugin discovery from ~/.config/bterminal/plugins/ with plugin.json
manifest. Sandboxed new Function() execution, permission-gated API
(palette, btmsg:read, bttask:read, events). Plugin store + SettingsTab.
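One way to run plugin code through new Function() with shadowed globals is sketched here. The SHADOWED list is a hypothetical subset and runPlugin is an illustrative name; the real host's list and API surface are not given in this commit.

```typescript
// Hypothetical subset of shadowed globals; the real list is not given here.
const SHADOWED = ["window", "document", "fetch", "globalThis", "process", "require"];

function runPlugin(code: string, api: object): unknown {
  // Freeze a shallow copy so the plugin cannot add or replace API members.
  const frozenApi = Object.freeze({ ...api });
  // Each shadowed name becomes a parameter that is never supplied,
  // so it resolves to undefined inside the plugin body.
  const factory = new Function("api", ...SHADOWED, `"use strict";\n${code}`);
  // Strict mode plus an undefined thisArg keeps `this` undefined in the body.
  return factory.call(undefined, frozenApi);
}

console.log(runPlugin("return typeof fetch;", {})); // "undefined"
console.log(runPlugin("return api.greet();", { greet: () => "hi" })); // "hi"
```

Parameter shadowing alone is not a complete security boundary (for example, the Function constructor is still reachable through any function's constructor property), so it works best combined with permission gating of the API object.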
notify-rust for desktop notifications, NotificationCenter.svelte with
bell icon, unread badge, history (max 100), 6 notification types.
Extended notification store with history and type support.
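A capped history with an unread count might be structured like this. It is only a sketch: the limit of 100 is from this commit, while the Notice shape and the NotificationHistory class are hypothetical and not the actual store.

```typescript
const MAX_HISTORY = 100; // limit named in the commit

interface Notice {
  type: string;
  message: string;
  read: boolean;
}

class NotificationHistory {
  private items: Notice[] = [];

  // Append a notification and drop the oldest entries beyond the cap.
  add(type: string, message: string): void {
    this.items.push({ type, message, read: false });
    const overflow = this.items.length - MAX_HISTORY;
    if (overflow > 0) this.items.splice(0, overflow);
  }

  // Drives the unread badge.
  get unreadCount(): number {
    return this.items.filter((n) => !n.read).length;
  }

  markAllRead(): void {
    for (const n of this.items) n.read = true;
  }

  list(): readonly Notice[] {
    return this.items;
  }
}

const history = new NotificationHistory();
for (let i = 0; i < 105; i++) history.add("info", `n${i}`);
console.log(history.list().length, history.list()[0].message); // 100 n5
```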
SandboxConfig with RW/RO paths applied via pre_exec() in sidecar child
process. Requires kernel 6.2+ with graceful fallback. Per-project toggle
in SettingsTab. 9 unit tests.
Update CLAUDE.md with test runner in key paths and build commands.
Update .claude/CLAUDE.md with testing gate rule index entry.
Update TODO.md with tribunal-derived roadmap items.
Update CHANGELOG.md with test runner and testing gate entries.
Create v2/scripts/test-all.sh (vitest + cargo + optional E2E via --e2e).
Add npm scripts: test:all, test:all:e2e, test:cargo.
Add .claude/rules/20-testing-gate.md requiring full suite after major changes.
Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid
rendering, independent tab switching, status bar fleet state, and
LLM-judged agent response quality evaluation via Claude API.
Includes llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5,
structured verdicts with confidence thresholds).
7 human-authored test scenarios (22 tests) using data-testid
selectors. Test fixture generator for isolated environments.
JSON results store (no native deps). WebDriverIO config updated
with TCP readiness probe and multi-spec support.
Stable test selectors for E2E: agent-pane, data-agent-status,
project-box, data-project-id, status-bar, agent-session,
sidebar-rail, command-palette, terminal-tabs and more.