351 commits

Author SHA1 Message Date
Hibryda
19e4a68f22 docs: update CHANGELOG, TODO, CLAUDE.md for Worker sandbox and startup pruning 2026-03-15 02:36:55 +01:00
Hibryda
92000f2d6d feat: add seen_messages pruning on app startup
Calls pruneSeen() fire-and-forget during onMount to clean up stale
seen_messages entries (7-day default, emergency 3-day at 200k rows).
2026-03-15 02:36:55 +01:00
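The retention rule in this commit message can be sketched roughly as follows. Only the 7-day, 3-day, and 200k figures come from the message; the function names, the `>=` comparison at the threshold, and the error handling are assumptions, not the actual pruneSeen() implementation.

```typescript
// Hypothetical constants: only the three figures are taken from the commit message.
const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_RETENTION_DAYS = 7;
const EMERGENCY_RETENTION_DAYS = 3;
const EMERGENCY_ROW_THRESHOLD = 200_000;

/** Pick the retention window: 3 days once the table reaches the threshold, else 7.
 *  Whether the real code uses ">=" or ">" is an assumption. */
function retentionDays(rowCount: number): number {
  return rowCount >= EMERGENCY_ROW_THRESHOLD
    ? EMERGENCY_RETENTION_DAYS
    : DEFAULT_RETENTION_DAYS;
}

/** Entries whose seen-at timestamp is older than this cutoff would be deleted. */
function pruneCutoffMs(nowMs: number, rowCount: number): number {
  return nowMs - retentionDays(rowCount) * DAY_MS;
}

/** Fire-and-forget: start the prune, do not await it, and log failures instead of throwing. */
function pruneInBackground(prune: () => Promise<void>): void {
  void prune().catch((err: unknown) => {
    console.warn("seen_messages pruning failed", err);
  });
}
```

Called from a mount hook, `pruneInBackground(...)` returns immediately, so startup is not blocked while stale rows are removed.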
Hibryda
a70d45ad21 security: migrate plugin sandbox from new Function() to Web Worker
Each plugin now runs in a dedicated Web Worker with permission-gated
API proxied via postMessage. Eliminates prototype walking and
arguments.callee.constructor escape vectors inherent to same-realm
new Function() sandbox.
2026-03-15 02:36:55 +01:00
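A minimal sketch of the host side of a permission-gated API proxied via postMessage, as described in this commit. The permission names are taken from the earlier plugin-system commit (cb9d99d191); the message shapes, method names, and `handleRequest` itself are hypothetical and not the project's actual protocol.

```typescript
// Permission names come from the plugin-system commit; everything else is illustrative.
type Permission = "palette" | "btmsg:read" | "bttask:read" | "events";

interface PluginRequest {
  id: number;      // correlates the reply with the pending call inside the worker
  method: string;  // hypothetical method name, e.g. "btmsg.list"
  args: unknown[];
}

type PluginReply =
  | { id: number; ok: true; result: unknown }
  | { id: number; ok: false; error: string };

type ApiFn = (...args: unknown[]) => unknown;

// Which permission each proxied method requires (hypothetical mapping).
const REQUIRED = new Map<string, Permission>([
  ["palette.register", "palette"],
  ["btmsg.list", "btmsg:read"],
  ["bttask.list", "bttask:read"],
  ["events.on", "events"],
]);

/** Decide on the host whether a request arriving from a plugin worker may run. */
function handleRequest(
  granted: ReadonlySet<Permission>,
  api: ReadonlyMap<string, ApiFn>,
  req: PluginRequest,
): PluginReply {
  const needed = REQUIRED.get(req.method);
  const fn = api.get(req.method);
  if (needed === undefined || fn === undefined) {
    return { id: req.id, ok: false, error: `unknown method: ${req.method}` };
  }
  if (!granted.has(needed)) {
    return { id: req.id, ok: false, error: `permission denied: ${needed}` };
  }
  return { id: req.id, ok: true, result: fn(...req.args) };
}
```

In a host page this would typically be wired up by replying from the worker's `message` event with `worker.postMessage(handleRequest(granted, api, event.data))`. Because the plugin only ever sees structured-cloned messages, it holds no reference to host objects whose prototypes it could walk.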
Hibryda
662cda2daf docs: update CHANGELOG, TODO, README, CLAUDE.md for tribunal session
Update test counts (516 vitest + 159 cargo), add new entries for all 5
tribunal priorities, mark certificate pinning done, add SPKI persistence
and seen_messages pruning as new TODOs.
2026-03-14 04:39:40 +01:00
Hibryda
97abd8a434 feat: add Aider parser extraction with 72 tests
Tribunal priority 5: Extract pure parsing functions from aider-runner.ts
to aider-parser.ts for testability. 72 vitest tests covering prompt
detection, turn parsing, cost extraction, and format-drift canaries.
2026-03-14 04:39:40 +01:00
Hibryda
23b4d0cf26 feat: add SidecarManager actor pattern, SPKI pinning, btmsg seen_messages, Aider autonomous mode
Tribunal priorities 1-4: SidecarManager refactored to mpsc actor thread
(eliminates TOCTOU race), SPKI TOFU certificate pinning for relay TLS,
per-message btmsg acknowledgment via seen_messages table, Aider
autonomous mode toggle gating shell execution.
2026-03-14 04:39:40 +01:00
Hibryda
949d90887d docs: update all references for restructured docs layout
Update CLAUDE.md, .claude/CLAUDE.md, README.md, CHANGELOG.md to reference
new paths: decisions.md, progress/, release-notes.md, unified findings.md.
Fix branch name reference (dexter_changes -> hib_changes). Rewrite TODO.md
with grouped categories (Multi-Machine, Multi-Agent, Security, Reliability).
2026-03-14 02:51:22 +01:00
Hibryda
a89e2b9f69 docs: restructure docs — eliminate v3- prefix, merge findings, create decisions.md
Merge v3-task_plan.md content into architecture.md (data model, layout system,
keyboard shortcuts) and new decisions.md (22-entry categorized decisions log).
Merge v3-findings.md into unified findings.md (16 sections covering all research).
Move progress logs to progress/ subdirectory (v2.md, v3.md, v2-archive.md).
Rename v3-release-notes.md to release-notes.md. Update all cross-references.
Delete v3-task_plan.md and v3-findings.md (content fully incorporated).
2026-03-14 02:51:13 +01:00
Hibryda
60e2bfb857 docs: remove v2 task_plan.md and update all references
v2 architecture doc superseded by architecture.md, sidecar.md,
orchestration.md, and production.md. Updated cross-references in
README.md, phases.md, and .claude/CLAUDE.md.
2026-03-14 02:36:56 +01:00
Hibryda
7f005db94f docs: add cross-references to task_plan.md and update CHANGELOG 2026-03-14 02:33:59 +01:00
Hibryda
9a295c224c docs: expand README index and v3-findings with deep research content
README.md: from 42-line index to rich documentation hub with project
overview, reading order, and key directory listing.
v3-findings.md: from 63 lines to comprehensive research findings covering
adversarial review details, provider coupling analysis, codebase reuse,
session anchor design, multi-agent design rationale, theme evolution,
and performance measurements.
2026-03-14 02:33:59 +01:00
Hibryda
de8dd04f4b docs: add architecture, sidecar, orchestration, and production guides
New documentation covering end-to-end system architecture, multi-provider
sidecar lifecycle, btmsg/bttask multi-agent orchestration, and production
hardening features (supervisor, sandbox, search, plugins, secrets, audit).
2026-03-14 02:33:59 +01:00
DexterFromLab
f6b3a3e080 feat: add consult (multi-model tribunal) CLI tool
Fix NoneType bug in generate_report (counterArgument can be None).
Add consult to install.sh alongside ctx for symlink-based installation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 13:17:48 +01:00
DexterFromLab
55ba8d0969 Add incoming message visibility and shell command execution to Aider runner
- Emit 'input' events so agents show received prompts in their console
- Execute detected shell commands (btmsg, bttask, etc.) from LLM output
- Feed command results back to aider for iterative autonomous work
- Detect commands in code blocks, bare btmsg/bttask lines, and $ prefixes
- More robust THINKING/ANSWER marker detection (multiple unicode variants)
- Adapter handles new 'input' and 'tool_result' event types

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:24:20 +01:00
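The three detection rules listed in this commit (code blocks, bare btmsg/bttask lines, and $ prefixes) could look roughly like the sketch below. `detectCommands` and its heuristics are assumptions for illustration; the real aider-runner logic is not shown in the log and may differ.

```typescript
// Built with repeat() so the fence marker does not appear literally in this example.
const FENCE = "`".repeat(3);

/** Extract candidate shell commands from one turn of LLM output (illustrative heuristics). */
function detectCommands(output: string): string[] {
  const commands: string[] = [];
  let inFence = false;
  for (const raw of output.split("\n")) {
    const line = raw.trim();
    if (line.startsWith(FENCE)) {
      inFence = !inFence; // opening or closing fence line; never a command itself
      continue;
    }
    if (line === "") continue;
    if (line.startsWith("$ ")) {
      commands.push(line.slice(2).trim()); // "$ cmd" prefix, inside or outside a fence
    } else if (inFence || /^(btmsg|bttask)\b/.test(line)) {
      commands.push(line); // any fenced line, or a bare btmsg/bttask invocation
    }
  }
  return commands;
}
```

Treating every fenced line as a command is deliberately naive. A later commit in this log (23b4d0cf26) gates shell execution behind an autonomous mode toggle, which is the kind of safeguard such a detector needs before anything is actually run.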
DexterFromLab
1a123f6835 Remove aider-runner.mjs from wrong path (build artifact goes to sidecar/dist/)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:14:01 +01:00
DexterFromLab
fd355ab6fe Batch Aider output into structured blocks instead of per-line events
Aider runner now buffers entire turn output and parses it into thinking,
text, shell command, and cost blocks. Adapter updated for new event types.
Fixes console UI showing individual chevrons per output line.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:09:54 +01:00
DexterFromLab
862ddfcbb8 Fix provider/model persistence and rewrite Aider runner for interactive mode
Rust groups.rs ProjectConfig was missing provider, model, and other optional
fields — serde silently dropped them on save, causing all projects to fall
back to claude/Opus on reload. Added all missing fields to both ProjectConfig
and GroupAgentConfig structs.

Rewrote aider-runner from one-shot --message mode to interactive stdin/stdout:
- Persistent aider process with multi-turn conversation support
- Pre-fetches btmsg inbox and bttask board before sending prompt to LLM
- Autonomous agent override prompt so LLM acts instead of asking for files
- Line-buffered output (no token-by-token fragments)
- Thinking block classification for DeepSeek R1
- Graceful /exit shutdown

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 16:01:46 +01:00
DexterFromLab
e8555625ff Fix horizontal grid jumping caused by scrollIntoView bubbling
scrollIntoView() in AgentPane was scrolling all ancestor containers
including ProjectGrid (overflow-x: auto), causing the entire project
grid to jump horizontally every time any agent produced output.

Replaced with direct scrollTop/scrollTo manipulation that only affects
the intended scroll container. Also removed scroll-snap-type which
caused additional snap recalculation on layout changes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:32:14 +01:00
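The replacement described above can be sketched as a helper that writes only to the intended container's scrollTop, so no ancestor is ever scrolled. `ScrollBox` and `scrollToBottom` are hypothetical names; a real `HTMLElement` satisfies the interface structurally.

```typescript
/** The subset of element properties the helper needs. */
interface ScrollBox {
  scrollTop: number;
  readonly scrollHeight: number;
  readonly clientHeight: number;
}

/** Pin one container to its bottom edge without touching any ancestor's scroll position. */
function scrollToBottom(box: ScrollBox): void {
  box.scrollTop = Math.max(0, box.scrollHeight - box.clientHeight);
}
```

Unlike `scrollIntoView()`, which may scroll every scrollable ancestor to reveal the target, this assignment affects exactly one element, so a horizontally scrollable grid further up the tree stays where it is.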
DexterFromLab
bb09b3c0ff Give Tier 2 agents btmsg/bttask access and instructions
- Set BTMSG_AGENT_ID for all projects (not just Tier 1) so Tier 2
  agents can use btmsg/bttask CLI tools
- Add btmsg/bttask documentation to Tier 2 system prompt with
  workflow instructions (inbox, tasks, status updates)
- Unify wake/start prompts to always reference btmsg inbox

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:16:57 +01:00
DexterFromLab
c6a41018b6 Auto-wake agents on btmsg and fix unwanted auto-scroll
- Add btmsg inbox polling (10s) to AgentSession so agents wake when
  they receive messages from other agents (not just admin DMs)
- Remove automatic setActiveProject on agent activation to prevent
  focus stealing from the user
- Use untrack() in ProjectGrid scroll effect so agent re-renders
  don't trigger unwanted scrollIntoView

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 14:14:13 +01:00
DexterFromLab
5b7ad30573 Add Aider provider with OpenRouter support and per-provider sidecar routing
- Add aider-runner.ts sidecar that spawns aider CLI in non-interactive mode
- Add Aider provider metadata with OpenRouter model presets
- Add aider-messages.ts adapter for Aider event format
- Refactor SidecarManager from single-process to per-provider process management
  with lazy startup on first query and session→provider routing
- Add openrouter_api_key to secrets system (keyring storage)
- Inject OPENROUTER_API_KEY from secrets into Aider agent environment
- Register Aider in provider registry, build pipeline, and resource bundle

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 13:33:39 +01:00
DexterFromLab
35963be686 Unify provider/model config for Tier 1 and Tier 2 agents
- Add provider and model fields to both GroupAgentConfig and ProjectConfig
- Wire model override through AgentSession → AgentPane → queryAgent → sidecar
- Add model preset dropdown per provider (Opus/Sonnet/Haiku, GPT-5.4/o3, etc.)
  with custom model ID input at the bottom
- Add provider dropdown to Tier 1 agents (was Tier 2 only)
- Add "Apply & Restart" button on both tiers to restart agent with new settings
- Changing provider auto-resets model selection
- Admin bypasses stale heartbeat check in btmsg so DMs always deliver

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:54:13 +01:00
DexterFromLab
2c710fa0db Wire play/stop buttons and DM send to agent lifecycle
- Play button in GroupAgentsPanel now starts agent session via emitAgentStart
- Stop button now stops running agent session via emitAgentStop
- Sending a DM to a stopped agent auto-wakes it (sets active + emitAgentStart)
- Fix autoPrompt in AgentPane to work for fresh sessions (not just done/error)
- Fix btmsg: admin (tier 0) bypasses stale heartbeat check so messages deliver

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:33:08 +01:00
DexterFromLab
42907f22a4 Auto-scroll ProjectGrid to focused project on agent selection
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:16:13 +01:00
DexterFromLab
adde8462ef Rename binary to agent-orchestrator, add splash screen, fix white flash
- Rename Cargo package from bterminal to agent-orchestrator so WM_CLASS
  matches desktop entry and taskbar groups correctly
- Update lib name (agent_orchestrator_lib) and telemetry service name
- Add Pandora's Box splash screen with progress steps during startup
- Prevent white window flash with inline CSS and Tauri backgroundColor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 12:09:18 +01:00
DexterFromLab
79e13649a1 docs: rewrite README for Agent Orchestrator v3
Replace old BTerminal V1 README with comprehensive Agent Orchestrator
documentation covering multi-agent orchestration, production hardening,
testing infrastructure, and architecture overview. Update screenshot
to show current V3 UI with Messages panel and agent workspace.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:51:05 +01:00
DexterFromLab
4fee567dd9 chore: rebrand to Agent Orchestrator + fix pragma busy_timeout crash
Rebrand all user-visible BTerminal references to Agent Orchestrator
(window title, product name, identifier, status bar, updater URL,
context registration, CLAUDE.md branch reference).

Fix critical btmsg/bttask crash: pragma_update uses execute() internally
but PRAGMA busy_timeout returns a result row, causing "Execute returned
results" error that silently broke all CommsTab message loading.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 11:42:13 +01:00
Hibryda
a313da8892 docs: update CHANGELOG and TODO for E2E fixture/judge fixes 2026-03-12 11:10:50 +01:00
Hibryda
a49c2010c7 fix: LLM judge CLI context isolation (--setting-sources user, cwd /tmp) 2026-03-12 11:10:50 +01:00
Hibryda
f339c5918d test: increase WebDriverIO timeout for LLM-judged E2E tests
Increase global mocha timeout from 60s to 180s in wdio.conf.js to accommodate longer-running LLM judge tests that evaluate agent responses and code generation. Add explicit per-test overrides for Phase B scenarios B4 and B5 to ensure adequate time for agent startup, execution, and LLM verification.

- wdio.conf.js: global timeout 60_000 → 180_000ms
- phase-b.test.ts: explicit 180_000ms timeout for B4 and B5 scenarios
2026-03-12 11:10:50 +01:00
Hibryda
070ef3bf48 test: update WebDriverIO configuration with improved fixture setup and logging 2026-03-12 11:10:50 +01:00
Hibryda
768a9b73e7 chore: remove obsolete rules files (consolidated into 53/54 sequence) 2026-03-12 11:10:50 +01:00
Hibryda
0f176f1fb6 chore: reorganize rules files — consolidate duplicates
Migrates legacy rule numbering (18, 20) to standardized sequence (53, 54) and adds new 18-preexisting-issues.md for handling pre-existing issues during development. This consolidates duplicate rule coverage across the old and new numbering schemes.

Files changed:
- Removed: 18-relative-units.md (moved to 53-relative-units.md)
- Removed: 20-testing-gate.md (moved to 54-testing-gate.md)
- Added: 18-preexisting-issues.md (new)
- Added: 53-relative-units.md (renamed from 18)
- Added: 54-testing-gate.md (renamed from 20)
2026-03-12 11:10:50 +01:00
Hibryda
6f93032565 docs: add comprehensive E2E testing facility documentation
New docs/e2e-testing.md covering all 3 pillars: test fixtures
(isolated temp environments), test mode (BTERMINAL_TEST=1), and
LLM judge (dual-mode CLI/API). Includes spec phases, CI integration,
WebKit2GTK pitfalls, and troubleshooting guide.
2026-03-12 11:10:50 +01:00
Hibryda
f88d10888b feat: refactor LLM judge to dual-mode CLI/API and fix config test race
Refactor llm-judge.ts from raw API-only to dual-mode: CLI first
(spawns claude with --output-format text, unsets CLAUDECODE), API
fallback. Backend selectable via LLM_JUDGE_BACKEND env var.

Fix pre-existing race condition in config.rs tests where parallel
test execution caused env var mutations to interfere. Added static
Mutex to serialize env-mutating tests.
2026-03-12 11:10:50 +01:00
Hibryda
9a90c2499a test: add Phase C E2E tests and fix pre-existing test failures
- Add phase-c.test.ts: 27 new E2E tests across 11 scenarios covering
  hardening sprint features (command palette, search overlay, notification
  center, keyboard navigation, settings panel, project health, metrics tab,
  context tab, files tab, LLM-judged settings/status bar)
- Fix 3 pre-existing failures in bterminal.test.ts: update stale CSS
  selectors (.group-name → .cmd-label, .palette-item.active → .selected)
- Register phase-c.test.ts in wdio.conf.js specs array
- Update test counts: 444 vitest + 151 cargo + 109 E2E = 704 total
2026-03-12 11:10:50 +01:00
Hibryda
05629a7204 fix: use tauri::async_runtime::spawn for WAL checkpoint task
tokio::spawn() panics during Tauri setup in WebDriver E2E mode because
the Tokio runtime is not directly accessible. Switch to
tauri::async_runtime::spawn() which uses Tauri's managed runtime.
2026-03-12 11:10:50 +01:00
Hibryda
58054e56fc docs: add v3.0 release notes and update meta files for hardening sprint
- docs/v3-release-notes.md: comprehensive v3.0 release notes covering
  Mission Control, multi-agent orchestration, production readiness,
  multi-machine early access, test coverage, and known limitations
- docs/v3-progress.md: hardening sprint session entry
- CHANGELOG.md: security entries (TLS, WAL, plugin sandbox, Landlock)
  and bug fixes (subagent delegation, gitignore)
- TODO.md: hardening complete, remaining items moved to v3.1
- CLAUDE.md: updated test counts (444 vitest + 111 cargo)
2026-03-12 11:10:50 +01:00
Hibryda
5e949696d5 fix: track plugin-host source and add 35 sandbox security tests
Fix .gitignore 'plugins/' rule that was accidentally ignoring source
files in v2/src/lib/plugins/. Narrow to /plugins/ and /v2/plugins/
(runtime plugin directories only). Track plugin-host.ts (was written
but never committed) and add comprehensive test suite covering all 13
shadowed globals, this-binding, permission gating, API freeze, and
lifecycle management.
2026-03-12 11:10:50 +01:00
Hibryda
9af3cdc637 feat: add WAL checkpoint task and improve Landlock fallback logging
Add periodic PRAGMA wal_checkpoint(TRUNCATE) every 5 minutes for both
sessions.db and btmsg.db to prevent unbounded WAL growth under sustained
multi-agent load. Improve Landlock fallback log message with kernel
version requirement. Add WAL checkpoint tests.
2026-03-12 11:10:50 +01:00
Hibryda
de8d1488e2 feat: add TLS support to bterminal-relay
Add optional --tls-cert and --tls-key CLI args. When provided, the relay
wraps TCP streams with native-tls before WebSocket upgrade. Refactored
to generic accept_ws_with_auth<S> and run_ws_session<S> to avoid code
duplication between plain and TLS paths. Client side already supports
wss:// URLs via connect_async with native-tls feature.
2026-03-12 11:10:50 +01:00
Hibryda
7e65ee2360 feat: fix subagent delegation for Manager agents
Add multi-agent delegation documentation to Manager system prompt so
Claude knows it can spawn child agents via the Agent tool. Also inject
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 env var for Manager agents.
2026-03-12 11:10:50 +01:00
Hibryda
acdab31e43 docs: update meta files for testing facility and tribunal assessment 2026-03-12 11:10:50 +01:00
Hibryda
c94a82b8dd test: update tests for production readiness features
Update btmsg-bridge, bttask-bridge, and agent-dispatcher tests for new
APIs (registerAgents, version param, notification mocks).
2026-03-12 11:10:50 +01:00
Hibryda
afc059b346 feat: integrate all production readiness modules
Register new commands in lib.rs, add command modules, update Cargo deps
(notify-rust, keyring, bundled-full), fix PRAGMA WAL for bundled-full,
add notifications/heartbeats/FTS5 indexing to agent-dispatcher,
update SettingsTab with secrets/plugins/sandbox/updates sections.
2026-03-12 11:10:50 +01:00
Hibryda
66cbee2c53 feat: add optimistic locking for bttask and error classification
Version column in tasks table with WHERE id=? AND version=? guard.
Conflict detection in TaskBoardTab. error-classifier.ts: 6 error types
with actionable messages and retry logic. UsageMeter.svelte.
2026-03-12 11:10:50 +01:00
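The version guard in this commit follows the usual optimistic-locking pattern, sketched here in memory. The `WHERE id=? AND version=?` guard comes from the commit message; the `SET ... version = version + 1` half of the statement, the `TaskStore` class, and its method names are assumptions.

```typescript
interface Task {
  id: string;
  status: string;
  version: number;
}

/** In-memory stand-in for the tasks table, used only to illustrate the guard. */
class TaskStore {
  private tasks = new Map<string, Task>();

  insert(id: string, status: string): void {
    this.tasks.set(id, { id, status, version: 1 });
  }

  get(id: string): Task | undefined {
    const row = this.tasks.get(id);
    return row ? { ...row } : undefined; // copy, so callers hold a snapshot
  }

  /** Mirrors (assumed): UPDATE tasks SET status=?, version=version+1 WHERE id=? AND version=?
   *  Returns false when zero rows would be affected, i.e. someone else updated first. */
  updateStatus(id: string, expectedVersion: number, status: string): boolean {
    const row = this.tasks.get(id);
    if (row === undefined || row.version !== expectedVersion) {
      return false;
    }
    this.tasks.set(id, { id, status, version: expectedVersion + 1 });
    return true;
  }
}
```

Two writers that both read version 1 will see one update succeed and the other return false; the losing side can then reload the task and surface a conflict, which is presumably what the conflict detection in TaskBoardTab reacts to.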
Hibryda
a9b7ed0dda feat: add keyboard-first UX and rewrite CommandPalette
Alt+1-5 project jump, Ctrl+H/L vi-nav, Ctrl+Shift+1-9 tab switch,
Ctrl+J terminal toggle, Ctrl+Shift+K focus agent. isEditing() guard.
CommandPalette: 18+ commands, 6 categories, fuzzy filter, arrow nav.
2026-03-12 11:10:50 +01:00
Hibryda
d31a2c3ed7 feat: add agent health monitoring, audit log, and dead letter queue
heartbeats + dead_letter_queue + audit_log tables in btmsg.db. 15s
heartbeat polling in ProjectBox, stale detection, ProjectHeader heart
indicator. AuditLogTab for Manager. register_agents_from_groups() with
bidirectional contacts and review channel creation.
2026-03-12 11:10:50 +01:00
Hibryda
cb9d99d191 feat: add plugin system with sandboxed runtime
Plugin discovery from ~/.config/bterminal/plugins/ with plugin.json
manifest. Sandboxed new Function() execution, permission-gated API
(palette, btmsg:read, bttask:read, events). Plugin store + SettingsTab.
2026-03-12 11:10:50 +01:00
Hibryda
e8acd6c3d5 feat: add OS + in-app notification system
notify-rust for desktop notifications, NotificationCenter.svelte with
bell icon, unread badge, history (max 100), 6 notification types.
Extended notification store with history and type support.
2026-03-12 11:10:50 +01:00