Commit graph

334 commits

Author SHA1 Message Date
DexterFromLab
e8555625ff Fix horizontal grid jumping caused by scrollIntoView bubbling
scrollIntoView() in AgentPane was scrolling all ancestor containers
including ProjectGrid (overflow-x: auto), causing the entire project
grid to jump horizontally every time any agent produced output.

Replaced with direct scrollTop/scrollTo manipulation that only affects
the intended scroll container. Also removed scroll-snap-type which
caused additional snap recalculation on layout changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:32:14 +01:00
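A minimal sketch of the direct-scroll approach this commit describes. The helper name `scrollToBottom` and the `ScrollContainer` type are hypothetical and do not come from the repository. Writing `scrollTop` moves only the one element, whereas `scrollIntoView()` may scroll every scrollable ancestor.

```typescript
// Hypothetical structural type: a real HTMLElement satisfies it in a browser,
// and a plain object satisfies it outside one.
interface ScrollContainer {
  scrollTop: number;
  readonly scrollHeight: number;
  readonly clientHeight: number;
}

// Pin the container to its bottom edge by writing scrollTop directly.
// Only this element's scroll position changes; ancestors are left alone.
function scrollToBottom(container: ScrollContainer): void {
  container.scrollTop = Math.max(0, container.scrollHeight - container.clientHeight);
}

// Plain object standing in for the agent output pane (illustrative values).
const pane: ScrollContainer = { scrollTop: 0, scrollHeight: 1200, clientHeight: 400 };
scrollToBottom(pane);
```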
DexterFromLab
bb09b3c0ff Give Tier 2 agents btmsg/bttask access and instructions
- Set BTMSG_AGENT_ID for all projects (not just Tier 1) so Tier 2
  agents can use btmsg/bttask CLI tools
- Add btmsg/bttask documentation to Tier 2 system prompt with
  workflow instructions (inbox, tasks, status updates)
- Unify wake/start prompts to always reference btmsg inbox

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:16:57 +01:00
DexterFromLab
c6a41018b6 Auto-wake agents on btmsg and fix unwanted auto-scroll
- Add btmsg inbox polling (10s) to AgentSession so agents wake when
  they receive messages from other agents (not just admin DMs)
- Remove automatic setActiveProject on agent activation to prevent
  focus stealing from the user
- Use untrack() in ProjectGrid scroll effect so agent re-renders
  don't trigger unwanted scrollIntoView

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:14:13 +01:00
DexterFromLab
5b7ad30573 Add Aider provider with OpenRouter support and per-provider sidecar routing
- Add aider-runner.ts sidecar that spawns aider CLI in non-interactive mode
- Add Aider provider metadata with OpenRouter model presets
- Add aider-messages.ts adapter for Aider event format
- Refactor SidecarManager from single-process to per-provider process management
  with lazy startup on first query and session→provider routing
- Add openrouter_api_key to secrets system (keyring storage)
- Inject OPENROUTER_API_KEY from secrets into Aider agent environment
- Register Aider in provider registry, build pipeline, and resource bundle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 13:33:39 +01:00
DexterFromLab
35963be686 Unify provider/model config for Tier 1 and Tier 2 agents
- Add provider and model fields to both GroupAgentConfig and ProjectConfig
- Wire model override through AgentSession → AgentPane → queryAgent → sidecar
- Add model preset dropdown per provider (Opus/Sonnet/Haiku, GPT-5.4/o3, etc.)
  with custom model ID input at the bottom
- Add provider dropdown to Tier 1 agents (was Tier 2 only)
- Add "Apply & Restart" button on both tiers to restart agent with new settings
- Changing provider auto-resets model selection
- Admin bypasses stale heartbeat check in btmsg so DMs always deliver

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:54:13 +01:00
DexterFromLab
2c710fa0db Wire play/stop buttons and DM send to agent lifecycle
- Play button in GroupAgentsPanel now starts agent session via emitAgentStart
- Stop button now stops running agent session via emitAgentStop
- Sending a DM to a stopped agent auto-wakes it (sets active + emitAgentStart)
- Fix autoPrompt in AgentPane to work for fresh sessions (not just done/error)
- Fix btmsg: admin (tier 0) bypasses stale heartbeat check so messages deliver

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:33:08 +01:00
DexterFromLab
42907f22a4 Auto-scroll ProjectGrid to focused project on agent selection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:16:13 +01:00
DexterFromLab
adde8462ef Rename binary to agent-orchestrator, add splash screen, fix white flash
- Rename Cargo package from bterminal to agent-orchestrator so WM_CLASS
  matches desktop entry and taskbar groups correctly
- Update lib name (agent_orchestrator_lib) and telemetry service name
- Add Pandora's Box splash screen with progress steps during startup
- Prevent white window flash with inline CSS and Tauri backgroundColor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:09:18 +01:00
DexterFromLab
79e13649a1 docs: rewrite README for Agent Orchestrator v3
Replace old BTerminal V1 README with comprehensive Agent Orchestrator
documentation covering multi-agent orchestration, production hardening,
testing infrastructure, and architecture overview. Update screenshot
to show current V3 UI with Messages panel and agent workspace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:51:05 +01:00
DexterFromLab
4fee567dd9 chore: rebrand to Agent Orchestrator + fix pragma busy_timeout crash
Rebrand all user-visible BTerminal references to Agent Orchestrator
(window title, product name, identifier, status bar, updater URL,
context registration, CLAUDE.md branch reference).

Fix critical btmsg/bttask crash: pragma_update uses execute() internally
but PRAGMA busy_timeout returns a result row, causing "Execute returned
results" error that silently broke all CommsTab message loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:42:13 +01:00
Hibryda
a313da8892 docs: update CHANGELOG and TODO for E2E fixture/judge fixes
2026-03-12 11:10:50 +01:00
Hibryda
a49c2010c7 fix: LLM judge CLI context isolation (--setting-sources user, cwd /tmp)
2026-03-12 11:10:50 +01:00
Hibryda
f339c5918d test: increase WebDriverIO timeout for LLM-judged E2E tests
Increase global mocha timeout from 60s to 180s in wdio.conf.js to accommodate longer-running LLM judge tests that evaluate agent responses and code generation. Add explicit per-test overrides for Phase B scenarios B4 and B5 to ensure adequate time for agent startup, execution, and LLM verification.

- wdio.conf.js: global timeout 60_000 → 180_000ms
- phase-b.test.ts: explicit 180_000ms timeout for B4 and B5 scenarios
2026-03-12 11:10:50 +01:00
Hibryda
070ef3bf48 test: update WebDriverIO configuration with improved fixture setup and logging
2026-03-12 11:10:50 +01:00
Hibryda
768a9b73e7 chore: remove obsolete rules files (consolidated into 53/54 sequence)
2026-03-12 11:10:50 +01:00
Hibryda
0f176f1fb6 chore: reorganize rules files — consolidate duplicates
Migrates legacy rule numbering (18, 20) to standardized sequence (53, 54) and adds new 18-preexisting-issues.md for handling pre-existing issues during development. This consolidates duplicate rule coverage across the old and new numbering schemes.

Files changed:
- Removed: 18-relative-units.md (moved to 53-relative-units.md)
- Removed: 20-testing-gate.md (moved to 54-testing-gate.md)
- Added: 18-preexisting-issues.md (new)
- Added: 53-relative-units.md (renamed from 18)
- Added: 54-testing-gate.md (renamed from 20)
2026-03-12 11:10:50 +01:00
Hibryda
6f93032565 docs: add comprehensive E2E testing facility documentation
New docs/e2e-testing.md covering all 3 pillars: test fixtures
(isolated temp environments), test mode (BTERMINAL_TEST=1), and
LLM judge (dual-mode CLI/API). Includes spec phases, CI integration,
WebKit2GTK pitfalls, and troubleshooting guide.
2026-03-12 11:10:50 +01:00
Hibryda
f88d10888b feat: refactor LLM judge to dual-mode CLI/API and fix config test race
Refactor llm-judge.ts from raw API-only to dual-mode: CLI first
(spawns claude with --output-format text, unsets CLAUDECODE), API
fallback. Backend selectable via LLM_JUDGE_BACKEND env var.

Fix pre-existing race condition in config.rs tests where parallel
test execution caused env var mutations to interfere. Added static
Mutex to serialize env-mutating tests.
2026-03-12 11:10:50 +01:00
Hibryda
9a90c2499a test: add Phase C E2E tests and fix pre-existing test failures
- Add phase-c.test.ts: 27 new E2E tests across 11 scenarios covering
  hardening sprint features (command palette, search overlay, notification
  center, keyboard navigation, settings panel, project health, metrics tab,
  context tab, files tab, LLM-judged settings/status bar)
- Fix 3 pre-existing failures in bterminal.test.ts: update stale CSS
  selectors (.group-name → .cmd-label, .palette-item.active → .selected)
- Register phase-c.test.ts in wdio.conf.js specs array
- Update test counts: 444 vitest + 151 cargo + 109 E2E = 704 total
2026-03-12 11:10:50 +01:00
Hibryda
05629a7204 fix: use tauri::async_runtime::spawn for WAL checkpoint task
tokio::spawn() panics during Tauri setup in WebDriver E2E mode because
the Tokio runtime is not directly accessible. Switch to
tauri::async_runtime::spawn() which uses Tauri's managed runtime.
2026-03-12 11:10:50 +01:00
Hibryda
58054e56fc docs: add v3.0 release notes and update meta files for hardening sprint
- docs/v3-release-notes.md: comprehensive v3.0 release notes covering
  Mission Control, multi-agent orchestration, production readiness,
  multi-machine early access, test coverage, and known limitations
- docs/v3-progress.md: hardening sprint session entry
- CHANGELOG.md: security entries (TLS, WAL, plugin sandbox, Landlock)
  and bug fixes (subagent delegation, gitignore)
- TODO.md: hardening complete, remaining items moved to v3.1
- CLAUDE.md: updated test counts (444 vitest + 111 cargo)
2026-03-12 11:10:50 +01:00
Hibryda
5e949696d5 fix: track plugin-host source and add 35 sandbox security tests
Fix .gitignore 'plugins/' rule that was accidentally ignoring source
files in v2/src/lib/plugins/. Narrow to /plugins/ and /v2/plugins/
(runtime plugin directories only). Track plugin-host.ts (was written
but never committed) and add comprehensive test suite covering all 13
shadowed globals, this-binding, permission gating, API freeze, and
lifecycle management.
2026-03-12 11:10:50 +01:00
Hibryda
9af3cdc637 feat: add WAL checkpoint task and improve Landlock fallback logging
Add periodic PRAGMA wal_checkpoint(TRUNCATE) every 5 minutes for both
sessions.db and btmsg.db to prevent unbounded WAL growth under sustained
multi-agent load. Improve Landlock fallback log message with kernel
version requirement. Add WAL checkpoint tests.
2026-03-12 11:10:50 +01:00
Hibryda
de8d1488e2 feat: add TLS support to bterminal-relay
Add optional --tls-cert and --tls-key CLI args. When provided, the relay
wraps TCP streams with native-tls before WebSocket upgrade. Refactored
to generic accept_ws_with_auth<S> and run_ws_session<S> to avoid code
duplication between plain and TLS paths. Client side already supports
wss:// URLs via connect_async with native-tls feature.
2026-03-12 11:10:50 +01:00
Hibryda
7e65ee2360 feat: fix subagent delegation for Manager agents
Add multi-agent delegation documentation to Manager system prompt so
Claude knows it can spawn child agents via the Agent tool. Also inject
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 env var for Manager agents.
2026-03-12 11:10:50 +01:00
Hibryda
acdab31e43 docs: update meta files for testing facility and tribunal assessment
2026-03-12 11:10:50 +01:00
Hibryda
c94a82b8dd test: update tests for production readiness features
Update btmsg-bridge, bttask-bridge, and agent-dispatcher tests for new
APIs (registerAgents, version param, notification mocks).
2026-03-12 11:10:50 +01:00
Hibryda
afc059b346 feat: integrate all production readiness modules
Register new commands in lib.rs, add command modules, update Cargo deps
(notify-rust, keyring, bundled-full), fix PRAGMA WAL for bundled-full,
add notifications/heartbeats/FTS5 indexing to agent-dispatcher,
update SettingsTab with secrets/plugins/sandbox/updates sections.
2026-03-12 11:10:50 +01:00
Hibryda
66cbee2c53 feat: add optimistic locking for bttask and error classification
Version column in tasks table with WHERE id=? AND version=? guard.
Conflict detection in TaskBoardTab. error-classifier.ts: 6 error types
with actionable messages and retry logic. UsageMeter.svelte.
2026-03-12 11:10:50 +01:00
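The version guard in this commit can be illustrated with an in-memory stand-in for the tasks table. This is a sketch under assumptions: the `TaskRow` shape, the `updateStatus` helper, and the version increment on write are hypothetical, and only the `WHERE id=? AND version=?` guard comes from the commit message.

```typescript
// Hypothetical in-memory stand-in for the tasks table.
interface TaskRow {
  id: string;
  status: string;
  version: number;
}

const tasks = new Map<string, TaskRow>([
  ["t1", { id: "t1", status: "todo", version: 1 }],
]);

// Mirrors a guarded write such as:
//   UPDATE tasks SET status = ?, version = version + 1   (increment is assumed)
//   WHERE id = ? AND version = ?
// Returns true when the row matched, false on a version conflict.
function updateStatus(id: string, expectedVersion: number, status: string): boolean {
  const row = tasks.get(id);
  if (!row || row.version !== expectedVersion) return false;
  tasks.set(id, { ...row, status, version: row.version + 1 });
  return true;
}

const first = updateStatus("t1", 1, "in_progress"); // caller read version 1, row is at 1
const stale = updateStatus("t1", 1, "done");        // caller still holds 1, row is now at 2
```

A caller that gets `false` back has lost a race with another writer and would typically reload the task before retrying.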
Hibryda
a9b7ed0dda feat: add keyboard-first UX and rewrite CommandPalette
Alt+1-5 project jump, Ctrl+H/L vi-nav, Ctrl+Shift+1-9 tab switch,
Ctrl+J terminal toggle, Ctrl+Shift+K focus agent. isEditing() guard.
CommandPalette: 18+ commands, 6 categories, fuzzy filter, arrow nav.
2026-03-12 11:10:50 +01:00
Hibryda
d31a2c3ed7 feat: add agent health monitoring, audit log, and dead letter queue
heartbeats + dead_letter_queue + audit_log tables in btmsg.db. 15s
heartbeat polling in ProjectBox, stale detection, ProjectHeader heart
indicator. AuditLogTab for Manager. register_agents_from_groups() with
bidirectional contacts and review channel creation.
2026-03-12 11:10:50 +01:00
Hibryda
cb9d99d191 feat: add plugin system with sandboxed runtime
Plugin discovery from ~/.config/bterminal/plugins/ with plugin.json
manifest. Sandboxed new Function() execution, permission-gated API
(palette, btmsg:read, bttask:read, events). Plugin store + SettingsTab.
2026-03-12 11:10:50 +01:00
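A rough sketch of the `new Function()` technique this commit names, not the project's plugin-host. The `SHADOWED` list, the `PluginApi` shape, and `runSandboxed` are all hypothetical. Each shadowed global is declared as a parameter that receives `undefined`, so the plugin body cannot see the real binding by name.

```typescript
// Illustrative list only; the globals actually shadowed by the project are not shown here.
const SHADOWED = ["window", "document", "fetch", "process", "globalThis"];

// Hypothetical API surface handed to plugin code.
interface PluginApi {
  readonly log: (msg: string) => void;
}

function runSandboxed(code: string, api: PluginApi): unknown {
  // "api" is the only real argument; every shadowed name becomes a parameter
  // that is left undefined, hiding the global of the same name.
  const fn = new Function("api", ...SHADOWED, `"use strict";\n${code}`);
  // Strict mode plus call(undefined, ...) leaves `this` undefined in the plugin body.
  return fn.call(undefined, Object.freeze(api));
}

const seen: string[] = [];
const result = runSandboxed(
  'api.log("hello"); return typeof process + "/" + typeof this;',
  { log: (m) => seen.push(m) },
);
```

Shadowing hides names only. On its own it does not stop code from reaching the global object through constructor chains, so permission gating on the API object still matters.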
Hibryda
e8acd6c3d5 feat: add OS + in-app notification system
notify-rust for desktop notifications, NotificationCenter.svelte with
bell icon, unread badge, history (max 100), 6 notification types.
Extended notification store with history and type support.
2026-03-12 11:10:50 +01:00
Hibryda
c6836cecf3 feat: add secrets management via system keyring
SecretsManager using keyring crate (linux-native/libsecret). Store/get/
delete/list with __bterminal_keys__ metadata tracking. SettingsTab
Secrets section. No plaintext fallback.
2026-03-12 11:10:50 +01:00
Hibryda
3148d31ab1 feat: add FTS5 full-text search with Spotlight-style overlay
Upgrade rusqlite to bundled-full for FTS5. SearchDb with 3 virtual tables
(messages, tasks, btmsg). SearchOverlay.svelte: Ctrl+Shift+F, 300ms
debounce, grouped results with highlight snippets.
2026-03-12 11:10:50 +01:00
Hibryda
871fd0385f feat: add Landlock sandbox for sidecar process isolation
SandboxConfig with RW/RO paths applied via pre_exec() in sidecar child
process. Requires kernel 6.2+ with graceful fallback. Per-project toggle
in SettingsTab. 9 unit tests.
2026-03-12 11:10:50 +01:00
Hibryda
f868f6f148 feat: add sidecar crash recovery supervisor with exponential backoff
SidecarSupervisor wraps SidecarManager with auto-restart (1s-30s backoff,
5 retries), SidecarHealth enum, 5min stability window. 17 unit tests.
2026-03-12 11:10:50 +01:00
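One way to read the 1s-30s backoff with 5 retries is a doubling schedule clamped at the maximum. The doubling factor, the constant names, and `backoffDelayMs` are assumptions for illustration; the commit gives only the range and the retry count.

```typescript
// Range and retry count from the commit message; the doubling factor is assumed.
const BASE_DELAY_MS = 1000;
const MAX_DELAY_MS = 30000;
const MAX_RETRIES = 5;

// Delay before restart attempt `attempt` (0-based), or null once retries are exhausted.
function backoffDelayMs(attempt: number): number | null {
  if (attempt >= MAX_RETRIES) return null;
  return Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
}

// Delays for attempts 0..5; the last entry is null because retries have run out.
const schedule = Array.from({ length: MAX_RETRIES + 1 }, (_, i) => backoffDelayMs(i));
```

With plain doubling, five attempts top out at 16s, so the 30s clamp would only apply under a steeper factor or a higher retry count.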
Hibryda
6cf06e9ab5 docs: update meta files for testing facility and tribunal assessment
Update CLAUDE.md with test runner in key paths and build commands.
Update .claude/CLAUDE.md with testing gate rule index entry.
Update TODO.md with tribunal-derived roadmap items.
Update CHANGELOG.md with test runner and testing gate entries.
2026-03-12 11:10:50 +01:00
Hibryda
643be87cb4 feat: add unified test runner and testing gate rule
Create v2/scripts/test-all.sh (vitest + cargo + optional E2E via --e2e).
Add npm scripts: test:all, test:all:e2e, test:cargo.
Add .claude/rules/20-testing-gate.md requiring full suite after major changes.
2026-03-12 11:10:50 +01:00
Hibryda
c8df61199f docs: update meta files for E2E test fixes
Update test counts (82 E2E passing), add CHANGELOG entries for
27 fixed failures and AgentPane template fix, update TODO.md.
2026-03-12 11:10:50 +01:00
Hibryda
1f293083b2 fix(e2e): fix 27 E2E test failures across 3 spec files
Fix stale v2 CSS selectors for v3 UI, WebKit2GTK keyboard/focus
quirks (JS-dispatched KeyboardEvent, programmatic focus check,
backdrop click close), conditional render timing (waitUntil for
project boxes, null handling for burn-rate/cost elements), and
AgentPane missing closing > on data-testid div tag.
2026-03-12 11:10:50 +01:00
Hibryda
47f9322948 docs: update meta files for E2E testing engine Phase B+
2026-03-12 11:10:50 +01:00
Hibryda
19a6a788af ci: add E2E test workflow with xvfb and LLM-judged test gating
3 jobs (vitest, cargo, e2e), path-filtered triggers on v2 source changes,
xvfb-run for headless WebKit2GTK, LLM-judged tests gated on
ANTHROPIC_API_KEY secret availability.
2026-03-12 11:10:50 +01:00
Hibryda
90c997d3e9 feat(e2e): add Phase B scenarios with LLM-judged assertions and multi-project tests
Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid
rendering, independent tab switching, status bar fleet state, and
LLM-judged agent response quality evaluation via Claude API.
Includes llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5,
structured verdicts with confidence thresholds).
2026-03-12 11:10:50 +01:00
Hibryda
22fe723816 docs: update meta files for E2E testing engine Phase A
2026-03-12 11:10:50 +01:00
Hibryda
8bc8a1a33d feat(e2e): add Phase A scenarios, fixtures, and results store
7 human-authored test scenarios (22 tests) using data-testid
selectors. Test fixture generator for isolated environments.
JSON results store (no native deps). WebDriverIO config updated
with TCP readiness probe and multi-spec support.
2026-03-12 11:10:50 +01:00
Hibryda
51abc6ee34 feat(e2e): add data-testid attributes to 7 key Svelte components
Stable test selectors for E2E: agent-pane, data-agent-status,
project-box, data-project-id, status-bar, agent-session,
sidebar-rail, command-palette, terminal-tabs and more.
2026-03-12 11:10:50 +01:00
Hibryda
4b86065163 feat(e2e): add test mode infrastructure with BTERMINAL_TEST env isolation
Rust: watcher.rs/fs_watcher.rs skip watchers in test mode,
is_test_mode Tauri command. Frontend: wake-scheduler disable,
App.svelte test mode detection. AppConfig centralization in
bterminal-core (OnceLock pattern for path overrides).
2026-03-12 11:10:50 +01:00
Hibryda
01c8ab8b3e docs: update meta files for reviewer agent role
2026-03-12 11:10:50 +01:00
Hibryda
4b72c26158 feat(reviewer): add Tier 1 reviewer agent role with auto-channel notifications
Reviewer workflow in agent-prompts.ts (8-step process), Rust auto-post
to #review-queue on task->review transition, reviewQueueDepth in
attention scoring (10pts/task cap 50), Tasks tab for reviewer in
ProjectBox with 10s queue polling. 7 vitest + 4 cargo tests.
2026-03-12 11:10:50 +01:00
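The review-queue term in attention scoring ("10pts/task cap 50") reduces to a clamped multiplication. The function and constant names below are hypothetical, and how this term combines with the rest of the attention score is not shown.

```typescript
// Values taken from the commit message: 10 points per queued task, capped at 50.
const POINTS_PER_TASK = 10;
const REVIEW_QUEUE_CAP = 50;

// Contribution of the review queue to an agent's attention score.
// Negative depths are treated as an empty queue.
function reviewQueueScore(reviewQueueDepth: number): number {
  return Math.min(Math.max(0, reviewQueueDepth) * POINTS_PER_TASK, REVIEW_QUEUE_CAP);
}

const lightQueue = reviewQueueScore(3); // 3 tasks, below the cap
const heavyQueue = reviewQueueScore(8); // 8 tasks, clamped to the cap
```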