Disable auto-scroll on output so users can read scrollback without being
jumped to the bottom. Keep Claude Code tab names from config instead of
overwriting them with the generic VTE title.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Export dialog lets users pick specific projects, entries, summaries,
and shared context to save as JSON. Import dialog previews file
contents with checkboxes and supports overwrite/skip conflict mode.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds a StackSwitcher to the sidebar with Sessions and Ctx tabs. The Ctx panel
provides a tree view of all ctx projects/entries with a detail preview pane,
CRUD operations (add/edit/delete projects and entries), right-click context
menus, and auto-refresh on tab switch.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Shared entries (server, webhooks, workflow) were shown for every project,
causing Claude to misattribute them. Now ctx get shows only project-specific
data. Use the --shared flag to include shared context when needed.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove silent setup_ctx and "Edit ctx entries" button from ClaudeCodeDialog
- Add CtxSetupWizard: 3-step guided flow (project registration, first entry, confirm)
- Show ctx status label in session dialog (registered vs new project)
- Launch wizard automatically on save when project_dir is set and ctx not initialized
- Add ctx cleanup prompt when deleting a Claude session
- Extract helper functions: _detect_project_description, _is_ctx_project_registered
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Increase global mocha timeout from 60s to 180s in wdio.conf.js to accommodate longer-running LLM judge tests that evaluate agent responses and code generation. Add explicit per-test overrides for Phase B scenarios B4 and B5 to ensure adequate time for agent startup, execution, and LLM verification.
- wdio.conf.js: global timeout 60_000 ms → 180_000 ms
- phase-b.test.ts: explicit 180_000 ms timeout for B4 and B5 scenarios
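The global timeout change might look roughly like the fragment below. `mochaOpts.timeout` is WebdriverIO's standard Mocha option; the constant name `wdioTimeoutSketch` and the surrounding shape are illustrative and not copied from the repository's wdio.conf.js.

```typescript
// Sketch of the relevant wdio.conf.js fields only; all other settings are omitted.
const wdioTimeoutSketch = {
  framework: "mocha",
  mochaOpts: {
    ui: "bdd",
    // Global per-test timeout in milliseconds (previously 60_000).
    timeout: 180_000,
  },
};
```

For the per-test overrides, Mocha lets a test declared with `function () {}` (not an arrow function) call `this.timeout(180_000)` in its body.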
Migrates the legacy rule numbers (18, 20) to the standardized sequence (53, 54) and adds a new 18-preexisting-issues.md for handling pre-existing issues during development. This consolidates duplicate rule coverage across the old and new numbering schemes.
Files changed:
- Removed: 18-relative-units.md (moved to 53-relative-units.md)
- Removed: 20-testing-gate.md (moved to 54-testing-gate.md)
- Added: 18-preexisting-issues.md (new)
- Added: 53-relative-units.md (renamed from 18)
- Added: 54-testing-gate.md (renamed from 20)
New docs/e2e-testing.md covering all 3 pillars: test fixtures
(isolated temp environments), test mode (BTERMINAL_TEST=1), and
LLM judge (dual-mode CLI/API). Includes spec phases, CI integration,
WebKit2GTK pitfalls, and troubleshooting guide.
Refactor llm-judge.ts from raw API-only to dual-mode: CLI first
(spawns claude with --output-format text, unsets CLAUDECODE), API
fallback. Backend selectable via LLM_JUDGE_BACKEND env var.
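The CLI-first, API-fallback selection can be sketched as follows. This is a minimal illustration, not the code in llm-judge.ts: the accepted values `"cli"` and `"api"` for LLM_JUDGE_BACKEND, and the names `backendOrder`, `judgeWithFallback`, and `Judge`, are assumptions.

```typescript
type JudgeBackend = "cli" | "api";
// Hypothetical signature: a backend takes a prompt and resolves to the raw verdict text.
type Judge = (prompt: string) => Promise<string>;

// Maps the LLM_JUDGE_BACKEND value to the ordered list of backends to try.
// Assumption: "cli" or "api" forces one backend; anything else means CLI first, then API.
function backendOrder(env: Record<string, string | undefined>): JudgeBackend[] {
  const forced = env["LLM_JUDGE_BACKEND"];
  if (forced === "cli" || forced === "api") return [forced];
  return ["cli", "api"];
}

// Tries each backend in order and returns the first successful answer.
async function judgeWithFallback(
  prompt: string,
  backends: Record<JudgeBackend, Judge>,
  env: Record<string, string | undefined>,
): Promise<string> {
  let lastError: unknown = new Error("no backend configured");
  for (const name of backendOrder(env)) {
    try {
      return await backends[name](prompt);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next backend
    }
  }
  throw lastError;
}
```

Passing the environment in as a parameter keeps the selection logic testable without mutating process-wide state.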
Fix pre-existing race condition in config.rs tests where parallel
test execution caused env var mutations to interfere. Added static
Mutex to serialize env-mutating tests.
tokio::spawn() panics during Tauri setup in WebDriver E2E mode because
the Tokio runtime is not directly accessible. Switch to
tauri::async_runtime::spawn() which uses Tauri's managed runtime.
Fix .gitignore 'plugins/' rule that was accidentally ignoring source
files in v2/src/lib/plugins/. Narrow to /plugins/ and /v2/plugins/
(runtime plugin directories only). Track plugin-host.ts (was written
but never committed) and add comprehensive test suite covering all 13
shadowed globals, this-binding, permission gating, API freeze, and
lifecycle management.
Add periodic PRAGMA wal_checkpoint(TRUNCATE) every 5 minutes for both
sessions.db and btmsg.db to prevent unbounded WAL growth under sustained
multi-agent load. Improve Landlock fallback log message with kernel
version requirement. Add WAL checkpoint tests.
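The periodic checkpoint idea can be illustrated in TypeScript as below. This is a sketch of the scheduling pattern only, not the project's implementation; the `SqlExecutor` interface and the function names are hypothetical, and the only detail taken from the commit is the `PRAGMA wal_checkpoint(TRUNCATE)` statement and the 5-minute interval.

```typescript
// Hypothetical minimal database handle: anything that can run one SQL statement.
interface SqlExecutor {
  name: string;
  exec(sql: string): void;
}

const CHECKPOINT_INTERVAL_MS = 5 * 60 * 1000; // every 5 minutes

// Runs a TRUNCATE checkpoint on each database and returns the names that failed,
// so one failing database does not block the others.
function checkpointAll(dbs: SqlExecutor[]): string[] {
  const failed: string[] = [];
  for (const db of dbs) {
    try {
      db.exec("PRAGMA wal_checkpoint(TRUNCATE);");
    } catch {
      failed.push(db.name);
    }
  }
  return failed;
}

// Starts the periodic task and returns a function that stops it.
function startWalCheckpoints(
  dbs: SqlExecutor[],
  intervalMs: number = CHECKPOINT_INTERVAL_MS,
): () => void {
  const timer = setInterval(() => checkpointAll(dbs), intervalMs);
  return () => clearInterval(timer);
}
```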
Add optional --tls-cert and --tls-key CLI args. When provided, the relay
wraps TCP streams with native-tls before WebSocket upgrade. Refactored
to generic accept_ws_with_auth<S> and run_ws_session<S> to avoid code
duplication between plain and TLS paths. Client side already supports
wss:// URLs via connect_async with native-tls feature.
Add multi-agent delegation documentation to Manager system prompt so
Claude knows it can spawn child agents via the Agent tool. Also inject
CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 env var for Manager agents.
Version column in tasks table with WHERE id=? AND version=? guard.
Conflict detection in TaskBoardTab. error-classifier.ts: 6 error types
with actionable messages and retry logic. UsageMeter.svelte.
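The version guard is a form of optimistic locking. The sketch below models it with an in-memory map rather than SQL; `TaskRow` and `updateTaskStatus` are hypothetical names, and incrementing the version on every successful write is an assumption about how the column is maintained.

```typescript
interface TaskRow {
  id: string;
  version: number;
  status: string;
}

// In-memory stand-in for an UPDATE guarded by "WHERE id=? AND version=?".
// Returns false when the guard matches no row, which the caller treats as a conflict.
function updateTaskStatus(
  rows: Map<string, TaskRow>,
  id: string,
  expectedVersion: number,
  status: string,
): boolean {
  const row = rows.get(id);
  if (!row || row.version !== expectedVersion) return false;
  // Assumption: each successful write bumps the version by one.
  rows.set(id, { id, version: row.version + 1, status });
  return true;
}
```

Two clients that both read version 1 cannot both succeed: the first write moves the row to version 2, so the second write's guard no longer matches and it can surface a conflict to the user.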
heartbeats + dead_letter_queue + audit_log tables in btmsg.db. 15s
heartbeat polling in ProjectBox, stale detection, ProjectHeader heart
indicator. AuditLogTab for Manager. register_agents_from_groups() with
bidirectional contacts and review channel creation.
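Stale detection on top of heartbeat polling could look like the sketch below. The 15 s poll interval comes from the commit; the stale threshold of three missed polls, and the names `isStale` and `staleAgents`, are assumptions made for illustration.

```typescript
const HEARTBEAT_POLL_MS = 15_000; // poll interval stated in the commit
const STALE_AFTER_MS = 3 * HEARTBEAT_POLL_MS; // assumption: three missed polls

// An agent is stale when its last heartbeat is older than the threshold.
function isStale(
  lastHeartbeatMs: number,
  nowMs: number,
  staleAfterMs: number = STALE_AFTER_MS,
): boolean {
  return nowMs - lastHeartbeatMs > staleAfterMs;
}

// Returns the ids of all agents whose heartbeat is stale at time nowMs.
function staleAgents(heartbeats: Map<string, number>, nowMs: number): string[] {
  const stale: string[] = [];
  for (const [agentId, lastSeen] of heartbeats) {
    if (isStale(lastSeen, nowMs)) stale.push(agentId);
  }
  return stale;
}
```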
Plugin discovery from ~/.config/bterminal/plugins/ with plugin.json
manifest. Sandboxed new Function() execution, permission-gated API
(palette, btmsg:read, bttask:read, events). Plugin store + SettingsTab.
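A minimal sketch of the sandboxing pattern follows, assuming plugin source is a function body. The list of shadowed globals here is a hypothetical subset chosen for illustration, and `buildApi` and `runPlugin` are illustrative names rather than the plugin host's actual exports.

```typescript
// Hypothetical subset of globals to shadow; the real host's list is not reproduced here.
const SHADOWED_GLOBALS = ["window", "document", "fetch", "globalThis", "process", "require"] as const;

// Exposes only the API members whose permission the plugin manifest was granted.
function buildApi(
  granted: string[],
  full: Record<string, unknown>,
): Readonly<Record<string, unknown>> {
  const api: Record<string, unknown> = {};
  for (const perm of granted) {
    if (Object.prototype.hasOwnProperty.call(full, perm)) api[perm] = full[perm];
  }
  return Object.freeze(api);
}

// Runs plugin source with the shadowed names bound to undefined parameters,
// `this` set to undefined (strict mode), and a frozen API object.
function runPlugin(source: string, api: object): unknown {
  const fn = new Function("api", ...SHADOWED_GLOBALS, `"use strict";\n${source}`);
  return fn.call(undefined, Object.freeze(api), ...SHADOWED_GLOBALS.map(() => undefined));
}
```

Shadowing names this way limits accidental access to ambient globals; on its own it is not a hard security boundary, since code can still reach the global object through other routes such as function constructors.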
notify-rust for desktop notifications, NotificationCenter.svelte with
bell icon, unread badge, history (max 100), 6 notification types.
Extended notification store with history and type support.
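The bounded history and unread badge can be sketched with two pure helpers. The cap of 100 is from the commit; storing newest entries first, the `Notice` shape, and the helper names are assumptions.

```typescript
const MAX_HISTORY = 100; // cap stated in the commit

// Hypothetical notification record; the concrete notification types are not listed here.
interface Notice {
  id: number;
  type: string;
  message: string;
  read: boolean;
}

// Prepends a notice and drops the oldest entries beyond the cap (newest first).
function pushNotice(history: Notice[], notice: Notice, max: number = MAX_HISTORY): Notice[] {
  const next = [notice, ...history];
  return next.length > max ? next.slice(0, max) : next;
}

// Number shown on the unread badge.
function unreadCount(history: Notice[]): number {
  return history.filter((n) => !n.read).length;
}
```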
SandboxConfig with RW/RO paths applied via pre_exec() in sidecar child
process. Requires kernel 6.2+ with graceful fallback. Per-project toggle
in SettingsTab. 9 unit tests.
Update CLAUDE.md with test runner in key paths and build commands.
Update .claude/CLAUDE.md with testing gate rule index entry.
Update TODO.md with tribunal-derived roadmap items.
Update CHANGELOG.md with test runner and testing gate entries.
Create v2/scripts/test-all.sh (vitest + cargo + optional E2E via --e2e).
Add npm scripts: test:all, test:all:e2e, test:cargo.
Add .claude/rules/20-testing-gate.md requiring full suite after major changes.
Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid
rendering, independent tab switching, status bar fleet state, and
LLM-judged agent response quality evaluation via Claude API.
Includes llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5,
structured verdicts with confidence thresholds).
7 human-authored test scenarios (22 tests) using data-testid
selectors. Test fixture generator for isolated environments.
JSON results store (no native deps). WebDriverIO config updated
with TCP readiness probe and multi-spec support.
Stable test selectors for E2E: agent-pane, data-agent-status,
project-box, data-project-id, status-bar, agent-session,
sidebar-rail, command-palette, terminal-tabs and more.
Reviewer workflow in agent-prompts.ts (8-step process), Rust auto-post
to #review-queue on task->review transition, reviewQueueDepth in
attention scoring (10 pts per task, capped at 50), Tasks tab for reviewer in
ProjectBox with 10s queue polling. 7 vitest + 4 cargo tests.
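The review-queue contribution to the attention score reduces to a capped linear term. A minimal sketch, with `reviewQueueScore` and the constant names as illustrative choices and the clamp for negative input as an added assumption:

```typescript
const POINTS_PER_REVIEW_TASK = 10; // from the commit: 10 pts per queued task
const REVIEW_SCORE_CAP = 50; // from the commit: capped at 50

// Contribution of the review queue to a project's attention score.
function reviewQueueScore(reviewQueueDepth: number): number {
  const depth = Math.max(reviewQueueDepth, 0); // assumption: negative depths count as empty
  return Math.min(depth * POINTS_PER_REVIEW_TASK, REVIEW_SCORE_CAP);
}
```

With these constants the term saturates at five queued tasks, so a very long review queue cannot dominate the other attention signals.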
New wake system for Manager agents: persistent (resume prompt), on-demand
(fresh session), smart (threshold-gated). 6 wake signals from tribunal S-3
hybrid. Pure scorer function (24 tests), Svelte 5 rune scheduler store,
SettingsTab UI (strategy button + threshold slider), AgentSession integration.
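The three strategies could map to a decision function along the lines below. This is a sketch under stated assumptions, not the scheduler's logic: the `WakeAction` shape, the name `decideWake`, and the behaviour of `smart` at or above the threshold (resuming the existing session) are all hypothetical.

```typescript
type WakeStrategy = "persistent" | "on-demand" | "smart";
type WakeAction = { kind: "resume" } | { kind: "fresh-session" } | { kind: "skip" };

// Decides what to do on a wake tick given the strategy and the scorer's output.
function decideWake(strategy: WakeStrategy, score: number, threshold: number): WakeAction {
  switch (strategy) {
    case "persistent":
      return { kind: "resume" }; // resume prompt into the existing session
    case "on-demand":
      return { kind: "fresh-session" }; // start a new session each time
    case "smart":
      // Assumption: wake only when the score reaches the threshold, then resume.
      return score >= threshold ? { kind: "resume" } : { kind: "skip" };
  }
}
```

Keeping the decision pure, like the scorer, means it can be unit tested without the store or any timers.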