feat: Agent Orchestrator — multi-project agent dashboard

Tauri + Svelte 5 + Rust application for orchestrating multiple AI coding agents. Includes Claude, Aider, Codex, and Ollama provider support, multi-agent communication (btmsg/bttask), session anchors, plugin sandbox, FTS5 search, Landlock sandboxing, and 507 vitest + 110 cargo tests.
2026-03-15 15:45:27 +01:00 · 2026-03-15 15:45:27 +01:00 · 3672e92b7e
commit 3672e92b7e
272 changed files with 68600 additions and 0 deletions
--- a/tests/e2e/README.md
+++ b/tests/e2e/README.md
@ -0,0 +1,143 @@
+# E2E Tests (WebDriver)
+
+Tauri apps use the WebDriver protocol for E2E testing (not Playwright directly).
+The app runs inside WebKit2GTK on Linux, so tests interact with the real WebView.
+
+## Prerequisites
+
+- Rust toolchain (for building the Tauri app)
+- Display server (X11 or Wayland) — headless Xvfb works for CI
+- `tauri-driver` installed: `cargo install tauri-driver`
+- `webkit2gtk-driver` system package: `sudo apt install webkit2gtk-driver`
+- npm devDeps already in package.json (`@wdio/cli`, `@wdio/local-runner`, `@wdio/mocha-framework`, `@wdio/spec-reporter`)
+
+## Running
+
+```bash
+# From v2/ directory — builds debug binary automatically, spawns tauri-driver
+npm run test:e2e
+
+# Skip rebuild (use existing binary)
+SKIP_BUILD=1 npm run test:e2e
+
+# With test isolation (custom data/config dirs)
+BTERMINAL_TEST_DATA_DIR=/tmp/bt-test/data BTERMINAL_TEST_CONFIG_DIR=/tmp/bt-test/config npm run test:e2e
+```
+
+The `wdio.conf.js` handles:
+1. Building the debug binary (`cargo tauri build --debug --no-bundle`) in `onPrepare`
+2. Spawning `tauri-driver` before each session (TCP readiness probe, 10s deadline)
+3. Killing `tauri-driver` after each session
+4. Passing `BTERMINAL_TEST=1` env var to the app for test mode isolation
+
+## Test Mode (`BTERMINAL_TEST=1`)
+
+When `BTERMINAL_TEST=1` is set:
+- File watchers (watcher.rs, fs_watcher.rs) are disabled to avoid inotify noise
+- Wake scheduler is disabled (no auto-wake timers)
+- Data/config directories can be overridden via `BTERMINAL_TEST_DATA_DIR` / `BTERMINAL_TEST_CONFIG_DIR`
+
+## CI setup (headless)
+
+```bash
+# Install virtual framebuffer + WebKit driver
+sudo apt install xvfb webkit2gtk-driver
+
+# Run with Xvfb wrapper
+xvfb-run npm run test:e2e
+```
+
+## Writing tests
+
+Tests use WebdriverIO with Mocha. Specs go in `specs/`:
+
+```typescript
+import { browser, expect } from '@wdio/globals';
+
+describe('BTerminal', () => {
+  it('should show the status bar', async () => {
+    const statusBar = await browser.$('[data-testid="status-bar"]');
+    await expect(statusBar).toBeDisplayed();
+  });
+});
+```
+
+### Stable selectors
+
+Prefer `data-testid` attributes over CSS class selectors:
+
+| Element | Selector |
+|---------|----------|
+| Status bar | `[data-testid="status-bar"]` |
+| Sidebar rail | `[data-testid="sidebar-rail"]` |
+| Settings button | `[data-testid="settings-btn"]` |
+| Project box | `[data-testid="project-box"]` |
+| Project ID | `[data-project-id="..."]` |
+| Project tabs | `[data-testid="project-tabs"]` |
+| Agent session | `[data-testid="agent-session"]` |
+| Agent pane | `[data-testid="agent-pane"]` |
+| Agent status | `[data-agent-status="idle\|running\|..."]` |
+| Agent messages | `[data-testid="agent-messages"]` |
+| Agent prompt | `[data-testid="agent-prompt"]` |
+| Agent submit | `[data-testid="agent-submit"]` |
+| Agent stop | `[data-testid="agent-stop"]` |
+| Terminal tabs | `[data-testid="terminal-tabs"]` |
+| Add tab button | `[data-testid="tab-add"]` |
+| Terminal toggle | `[data-testid="terminal-toggle"]` |
+| Command palette | `[data-testid="command-palette"]` |
+| Palette input | `[data-testid="palette-input"]` |
+
+### Key constraints
+
+- `maxInstances: 1` — Tauri doesn't support parallel WebDriver sessions
+- Mocha timeout is 60s — the app needs time to initialize
+- Tests interact with the real WebKit2GTK WebView, not a browser
+- Use `browser.execute()` for JS clicks when WebDriver clicks don't trigger Svelte handlers
+- Agent tests (Scenario 7) require a real Claude CLI install + API key — they skip gracefully if unavailable
+
+## Test infrastructure
+
+### Fixtures (`fixtures.ts`)
+
+Creates isolated test environments with temp data/config dirs and git repos:
+
+```typescript
+import { createTestFixture, destroyTestFixture } from '../fixtures';
+
+const fixture = createTestFixture('my-test');
+// fixture.dataDir, fixture.configDir, fixture.projectDir, fixture.env
+destroyTestFixture(fixture);
+```
+
+### Results DB (`results-db.ts`)
+
+JSON-based test results store for tracking runs and steps:
+
+```typescript
+import { ResultsDb } from '../results-db';
+
+const db = new ResultsDb();
+db.startRun('run-001', 'v2-mission-control', 'abc123');
+db.recordStep({ run_id: 'run-001', scenario_name: 'Smoke', step_name: 'renders', status: 'passed', ... });
+db.finishRun('run-001', 'passed', 5000);
+```
+
+## File structure
+
+```
+tests/e2e/
+├── README.md                         # This file
+├── wdio.conf.js                      # WebdriverIO config with tauri-driver lifecycle
+├── tsconfig.json                     # TypeScript config for test specs
+├── fixtures.ts                       # Test fixture generator (isolated environments)
+├── results-db.ts                     # JSON test results store
+└── specs/
+    ├── bterminal.test.ts             # Smoke tests (CSS class selectors, 50+ tests)
+    └── agent-scenarios.test.ts       # Phase A scenarios (data-testid selectors, 22 tests)
+```
+
+## References
+
+- Tauri WebDriver docs: https://v2.tauri.app/develop/tests/webdriver/
+- WebdriverIO docs: https://webdriver.io/
+- tauri-driver: https://crates.io/crates/tauri-driver
--- a/tests/e2e/fixtures.ts
+++ b/tests/e2e/fixtures.ts
@ -0,0 +1,142 @@
+// Test fixture generator — creates isolated test environments
+// Used by E2E tests to set up temp data/config dirs with valid groups.json
+
+import { mkdirSync, writeFileSync, rmSync, existsSync } from 'node:fs';
+import { join } from 'node:path';
+import { execSync } from 'node:child_process';
+import { tmpdir } from 'node:os';
+
+export interface TestFixture {
+  /** Root temp directory for this test run */
+  rootDir: string;
+  /** BTERMINAL_TEST_DATA_DIR — isolated data dir */
+  dataDir: string;
+  /** BTERMINAL_TEST_CONFIG_DIR — isolated config dir */
+  configDir: string;
+  /** Path to a minimal git repo for agent testing */
+  projectDir: string;
+  /** Environment variables to pass to the app */
+  env: Record<string, string>;
+}
+
+/**
+ * Create an isolated test fixture with:
+ * - Temp data dir (sessions.db, btmsg.db created at runtime)
+ * - Temp config dir with a minimal groups.json
+ * - A simple git repo with one file for agent testing
+ */
+export function createTestFixture(name = 'bterminal-e2e'): TestFixture {
+  const rootDir = join(tmpdir(), `${name}-${Date.now()}`);
+  const dataDir = join(rootDir, 'data');
+  const configDir = join(rootDir, 'config');
+  const projectDir = join(rootDir, 'test-project');
+
+  // Create directory structure
+  mkdirSync(dataDir, { recursive: true });
+  mkdirSync(configDir, { recursive: true });
+  mkdirSync(projectDir, { recursive: true });
+
+  // Create a minimal git repo for agent testing
+  execSync('git init', { cwd: projectDir, stdio: 'ignore' });
+  execSync('git config user.email "test@bterminal.dev"', { cwd: projectDir, stdio: 'ignore' });
+  execSync('git config user.name "BTerminal Test"', { cwd: projectDir, stdio: 'ignore' });
+  writeFileSync(join(projectDir, 'README.md'), '# Test Project\n\nA simple test project for BTerminal E2E tests.\n');
+  writeFileSync(join(projectDir, 'hello.py'), 'def greet(name: str) -> str:\n    return f"Hello, {name}!"\n');
+  execSync('git add -A && git commit -m "initial commit"', { cwd: projectDir, stdio: 'ignore' });
+
+  // Write groups.json with one group containing the test project
+  const groupsJson = {
+    version: 1,
+    groups: [
+      {
+        id: 'test-group',
+        name: 'Test Group',
+        projects: [
+          {
+            id: 'test-project',
+            name: 'Test Project',
+            identifier: 'test-project',
+            description: 'E2E test project',
+            icon: '\uf120',
+            cwd: projectDir,
+            profile: 'default',
+            enabled: true,
+          },
+        ],
+        agents: [],
+      },
+    ],
+    activeGroupId: 'test-group',
+  };
+
+  writeFileSync(
+    join(configDir, 'groups.json'),
+    JSON.stringify(groupsJson, null, 2),
+  );
+
+  const env: Record<string, string> = {
+    BTERMINAL_TEST: '1',
+    BTERMINAL_TEST_DATA_DIR: dataDir,
+    BTERMINAL_TEST_CONFIG_DIR: configDir,
+  };
+
+  return { rootDir, dataDir, configDir, projectDir, env };
+}
+
+/**
+ * Clean up a test fixture's temporary directories.
+ */
+export function destroyTestFixture(fixture: TestFixture): void {
+  if (existsSync(fixture.rootDir)) {
+    rmSync(fixture.rootDir, { recursive: true, force: true });
+  }
+}
+
+/**
+ * Create a groups.json with multiple projects for multi-project testing.
+ */
+export function createMultiProjectFixture(projectCount = 3): TestFixture {
+  const fixture = createTestFixture('bterminal-multi');
+
+  const projects = [];
+  for (let i = 0; i < projectCount; i++) {
+    const projDir = join(fixture.rootDir, `project-${i}`);
+    mkdirSync(projDir, { recursive: true });
+    execSync('git init', { cwd: projDir, stdio: 'ignore' });
+    execSync('git config user.email "test@bterminal.dev"', { cwd: projDir, stdio: 'ignore' });
+    execSync('git config user.name "BTerminal Test"', { cwd: projDir, stdio: 'ignore' });
+    writeFileSync(join(projDir, 'README.md'), `# Project ${i}\n`);
+    execSync('git add -A && git commit -m "init"', { cwd: projDir, stdio: 'ignore' });
+
+    projects.push({
+      id: `project-${i}`,
+      name: `Project ${i}`,
+      identifier: `project-${i}`,
+      description: `Test project ${i}`,
+      icon: '\uf120',
+      cwd: projDir,
+      profile: 'default',
+      enabled: true,
+    });
+  }
+
+  const groupsJson = {
+    version: 1,
+    groups: [
+      {
+        id: 'multi-group',
+        name: 'Multi Project Group',
+        projects,
+        agents: [],
+      },
+    ],
+    activeGroupId: 'multi-group',
+  };
+
+  writeFileSync(
+    join(fixture.configDir, 'groups.json'),
+    JSON.stringify(groupsJson, null, 2),
+  );
+
+  return fixture;
+}
--- a/tests/e2e/llm-judge.ts
+++ b/tests/e2e/llm-judge.ts
@ -0,0 +1,231 @@
+// LLM Judge — evaluates test outcomes via Claude.
+//
+// Two backends, configurable via LLM_JUDGE_BACKEND env var:
+//   "cli"  — Claude CLI (default, no API key needed)
+//   "api"  — Anthropic REST API (requires ANTHROPIC_API_KEY)
+//
+// CLI backend: spawns `claude` with --output-format text, parses JSON verdict.
+// API backend: raw fetch to messages API, same JSON verdict parsing.
+//
+// Skips gracefully when neither backend is available.
+
+import { execFileSync, execSync } from 'node:child_process';
+import { existsSync } from 'node:fs';
+
+const MODEL = 'claude-haiku-4-5-20251001';
+const API_URL = 'https://api.anthropic.com/v1/messages';
+const MAX_TOKENS = 512;
+
+// CLI search paths (in order)
+const CLI_PATHS = [
+  `${process.env.HOME}/.local/bin/claude`,
+  `${process.env.HOME}/.claude/local/claude`,
+  '/usr/local/bin/claude',
+  '/usr/bin/claude',
+];
+
+export type JudgeBackend = 'cli' | 'api';
+
+export interface JudgeVerdict {
+  pass: boolean;
+  reasoning: string;
+  confidence: number; // 0-1
+}
+
+/**
+ * Find the Claude CLI binary path, or null if not installed.
+ */
+function findClaudeCli(): string | null {
+  for (const p of CLI_PATHS) {
+    if (existsSync(p)) return p;
+  }
+  // Fallback: check PATH
+  try {
+    const which = execSync('which claude 2>/dev/null', { encoding: 'utf-8' }).trim();
+    if (which) return which;
+  } catch {
+    // not found
+  }
+  return null;
+}
+
+/**
+ * Determine which backend to use.
+ * Env var LLM_JUDGE_BACKEND overrides auto-detection.
+ * Auto: CLI if available, then API if key set, else null.
+ */
+function resolveBackend(): JudgeBackend | null {
+  const explicit = process.env.LLM_JUDGE_BACKEND?.toLowerCase();
+  if (explicit === 'cli') return findClaudeCli() ? 'cli' : null;
+  if (explicit === 'api') return process.env.ANTHROPIC_API_KEY ? 'api' : null;
+
+  // Auto-detect: CLI first, API fallback
+  if (findClaudeCli()) return 'cli';
+  if (process.env.ANTHROPIC_API_KEY) return 'api';
+  return null;
+}
+
+/**
+ * Check if the LLM judge is available (CLI installed or API key set).
+ */
+export function isJudgeAvailable(): boolean {
+  return resolveBackend() !== null;
+}
+
+/**
+ * Build the prompt for the judge.
+ */
+function buildPrompt(criteria: string, actual: string, context?: string): { system: string; user: string } {
+  const system = `You are a test assertion judge for a terminal emulator application called BTerminal.
+Your job is to evaluate whether actual output from the application meets the given criteria.
+Respond with EXACTLY this JSON format, nothing else:
+{"pass": true/false, "reasoning": "brief explanation", "confidence": 0.0-1.0}`;
+
+  const user = [
+    '## Criteria',
+    criteria,
+    '',
+    '## Actual Output',
+    actual,
+    ...(context ? ['', '## Additional Context', context] : []),
+    '',
+    'Does the actual output satisfy the criteria? Respond with JSON only.',
+  ].join('\n');
+
+  return { system, user };
+}
+
+/**
+ * Extract and validate a JudgeVerdict from raw text output.
+ */
+function parseVerdict(text: string): JudgeVerdict {
+  const jsonMatch = text.match(/\{[\s\S]*\}/);
+  if (!jsonMatch) {
+    throw new Error(`LLM judge returned non-JSON: ${text}`);
+  }
+
+  const verdict = JSON.parse(jsonMatch[0]) as JudgeVerdict;
+
+  if (typeof verdict.pass !== 'boolean') {
+    throw new Error(`LLM judge returned invalid verdict: ${text}`);
+  }
+  verdict.confidence = Number(verdict.confidence) || 0;
+  verdict.reasoning = String(verdict.reasoning || '');
+
+  return verdict;
+}
+
+/**
+ * Judge via Claude CLI (spawns subprocess).
+ * Unsets CLAUDECODE to avoid nested session errors.
+ */
+async function judgeCli(
+  criteria: string,
+  actual: string,
+  context?: string,
+): Promise<JudgeVerdict> {
+  const cliPath = findClaudeCli();
+  if (!cliPath) throw new Error('Claude CLI not found');
+
+  const { system, user } = buildPrompt(criteria, actual, context);
+
+  const output = execFileSync(cliPath, [
+    '-p', user,
+    '--model', MODEL,
+    '--output-format', 'text',
+    '--system-prompt', system,
+    '--setting-sources', 'user',   // skip project CLAUDE.md
+  ], {
+    encoding: 'utf-8',
+    timeout: 60_000,
+    cwd: '/tmp',                   // avoid loading project CLAUDE.md
+    env: { ...process.env, CLAUDECODE: '' },
+    maxBuffer: 1024 * 1024,
+  });
+
+  return parseVerdict(output);
+}
+
+/**
+ * Judge via Anthropic REST API (raw fetch).
+ */
+async function judgeApi(
+  criteria: string,
+  actual: string,
+  context?: string,
+): Promise<JudgeVerdict> {
+  const apiKey = process.env.ANTHROPIC_API_KEY;
+  if (!apiKey) throw new Error('ANTHROPIC_API_KEY not set');
+
+  const { system, user } = buildPrompt(criteria, actual, context);
+
+  const response = await fetch(API_URL, {
+    method: 'POST',
+    headers: {
+      'Content-Type': 'application/json',
+      'x-api-key': apiKey,
+      'anthropic-version': '2023-06-01',
+    },
+    body: JSON.stringify({
+      model: MODEL,
+      max_tokens: MAX_TOKENS,
+      system,
+      messages: [{ role: 'user', content: user }],
+    }),
+  });
+
+  if (!response.ok) {
+    const body = await response.text();
+    throw new Error(`Anthropic API error ${response.status}: ${body}`);
+  }
+
+  const data = await response.json();
+  const text = data.content?.[0]?.text ?? '';
+
+  return parseVerdict(text);
+}
+
+/**
+ * Ask Claude to evaluate whether `actual` output satisfies `criteria`.
+ *
+ * Uses CLI backend by default, falls back to API. Override with
+ * LLM_JUDGE_BACKEND env var ("cli" or "api").
+ *
+ * Returns a structured verdict with pass/fail, reasoning, and confidence.
+ * Throws if no backend available or call fails.
+ */
+export async function judge(
+  criteria: string,
+  actual: string,
+  context?: string,
+): Promise<JudgeVerdict> {
+  const backend = resolveBackend();
+  if (!backend) {
+    throw new Error('LLM judge unavailable — no Claude CLI found and ANTHROPIC_API_KEY not set');
+  }
+
+  if (backend === 'cli') {
+    return judgeCli(criteria, actual, context);
+  }
+  return judgeApi(criteria, actual, context);
+}
+
+/**
+ * Convenience: judge with a minimum confidence threshold.
+ * Returns pass=true only if verdict.pass=true AND confidence >= threshold.
+ */
+export async function assertWithJudge(
+  criteria: string,
+  actual: string,
+  options: { context?: string; minConfidence?: number } = {},
+): Promise<JudgeVerdict> {
+  const { context, minConfidence = 0.7 } = options;
+  const verdict = await judge(criteria, actual, context);
+
+  if (verdict.pass && verdict.confidence < minConfidence) {
+    verdict.pass = false;
+    verdict.reasoning += ` (confidence ${verdict.confidence} below threshold ${minConfidence})`;
+  }
+
+  return verdict;
+}
--- a/tests/e2e/results-db.ts
+++ b/tests/e2e/results-db.ts
@ -0,0 +1,113 @@
+// Test results store — persists test run outcomes as JSON for analysis
+// No native deps needed — reads/writes a JSON file
+
+import { resolve, dirname } from 'node:path';
+import { mkdirSync, readFileSync, writeFileSync, existsSync } from 'node:fs';
+import { fileURLToPath } from 'node:url';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const DEFAULT_PATH = resolve(__dirname, '../../test-results/results.json');
+
+export interface TestRunRow {
+  run_id: string;
+  started_at: string;
+  finished_at: string | null;
+  status: 'running' | 'passed' | 'failed' | 'error';
+  total_tests: number;
+  passed_tests: number;
+  failed_tests: number;
+  duration_ms: number | null;
+  git_branch: string | null;
+  git_sha: string | null;
+}
+
+export interface TestStepRow {
+  run_id: string;
+  scenario_name: string;
+  step_name: string;
+  status: 'passed' | 'failed' | 'skipped' | 'error';
+  duration_ms: number | null;
+  error_message: string | null;
+  screenshot_path: string | null;
+  agent_cost_usd: number | null;
+  created_at: string;
+}
+
+interface ResultsStore {
+  runs: TestRunRow[];
+  steps: TestStepRow[];
+}
+
+export class ResultsDb {
+  private filePath: string;
+  private store: ResultsStore;
+
+  constructor(filePath = DEFAULT_PATH) {
+    this.filePath = filePath;
+    mkdirSync(dirname(filePath), { recursive: true });
+    this.store = this.load();
+  }
+
+  private load(): ResultsStore {
+    if (existsSync(this.filePath)) {
+      try {
+        return JSON.parse(readFileSync(this.filePath, 'utf-8'));
+      } catch {
+        return { runs: [], steps: [] };
+      }
+    }
+    return { runs: [], steps: [] };
+  }
+
+  private save(): void {
+    writeFileSync(this.filePath, JSON.stringify(this.store, null, 2));
+  }
+
+  startRun(runId: string, gitBranch?: string, gitSha?: string): void {
+    this.store.runs.push({
+      run_id: runId,
+      started_at: new Date().toISOString(),
+      finished_at: null,
+      status: 'running',
+      total_tests: 0,
+      passed_tests: 0,
+      failed_tests: 0,
+      duration_ms: null,
+      git_branch: gitBranch ?? null,
+      git_sha: gitSha ?? null,
+    });
+    this.save();
+  }
+
+  finishRun(runId: string, status: 'passed' | 'failed' | 'error', durationMs: number): void {
+    const run = this.store.runs.find(r => r.run_id === runId);
+    if (!run) return;
+
+    const steps = this.store.steps.filter(s => s.run_id === runId);
+    run.finished_at = new Date().toISOString();
+    run.status = status;
+    run.duration_ms = durationMs;
+    run.total_tests = steps.length;
+    run.passed_tests = steps.filter(s => s.status === 'passed').length;
+    run.failed_tests = steps.filter(s => s.status === 'failed' || s.status === 'error').length;
+    this.save();
+  }
+
+  recordStep(step: Omit<TestStepRow, 'created_at'>): void {
+    this.store.steps.push({
+      ...step,
+      created_at: new Date().toISOString(),
+    });
+    this.save();
+  }
+
+  getRecentRuns(limit = 20): TestRunRow[] {
+    return this.store.runs
+      .sort((a, b) => b.started_at.localeCompare(a.started_at))
+      .slice(0, limit);
+  }
+
+  getStepsForRun(runId: string): TestStepRow[] {
+    return this.store.steps.filter(s => s.run_id === runId);
+  }
+}
--- a/tests/e2e/specs/agent-scenarios.test.ts
+++ b/tests/e2e/specs/agent-scenarios.test.ts
@ -0,0 +1,429 @@
+import { browser, expect } from '@wdio/globals';
+
+// Phase A: Human-authored E2E scenarios with deterministic assertions.
+// These test the agent UI flow end-to-end using stable data-testid selectors.
+// Agent-interaction tests require a real Claude CLI install + API key.
+
+// ─── Helpers ──────────────────────────────────────────────────────────
+
+/** Wait for agent status to reach a target value within timeout. */
+async function waitForAgentStatus(
+  status: string,
+  timeout = 30_000,
+): Promise<void> {
+  await browser.waitUntil(
+    async () => {
+      const attr = await browser.execute(() => {
+        const el = document.querySelector('[data-testid="agent-pane"]');
+        return el?.getAttribute('data-agent-status') ?? 'idle';
+      });
+      return attr === status;
+    },
+    { timeout, timeoutMsg: `Agent did not reach status "${status}" within ${timeout}ms` },
+  );
+}
+
+/** Check if an agent pane exists and is visible. */
+async function agentPaneExists(): Promise<boolean> {
+  const el = await browser.$('[data-testid="agent-pane"]');
+  return el.isExisting();
+}
+
+/** Type a prompt into the agent textarea and submit. */
+async function sendAgentPrompt(text: string): Promise<void> {
+  const textarea = await browser.$('[data-testid="agent-prompt"]');
+  await textarea.waitForDisplayed({ timeout: 5000 });
+  await textarea.setValue(text);
+  // Small delay for Svelte reactivity
+  await browser.pause(200);
+  const submitBtn = await browser.$('[data-testid="agent-submit"]');
+  await browser.execute((el) => (el as HTMLElement).click(), submitBtn);
+}
+
+// ─── Scenario 1: App renders with project grid and data-testid anchors ───
+
+describe('Scenario 1 — App Structural Integrity', () => {
+  it('should render the status bar with data-testid', async () => {
+    const bar = await browser.$('[data-testid="status-bar"]');
+    await expect(bar).toBeDisplayed();
+  });
+
+  it('should render the sidebar rail with data-testid', async () => {
+    const rail = await browser.$('[data-testid="sidebar-rail"]');
+    await expect(rail).toBeDisplayed();
+  });
+
+  it('should render at least one project box with data-testid', async () => {
+    const boxes = await browser.$$('[data-testid="project-box"]');
+    expect(boxes.length).toBeGreaterThanOrEqual(1);
+  });
+
+  it('should have data-project-id on project boxes', async () => {
+    const projectId = await browser.execute(() => {
+      const box = document.querySelector('[data-testid="project-box"]');
+      return box?.getAttribute('data-project-id') ?? null;
+    });
+    expect(projectId).not.toBeNull();
+    expect((projectId as string).length).toBeGreaterThan(0);
+  });
+
+  it('should render project tabs with data-testid', async () => {
+    const tabs = await browser.$('[data-testid="project-tabs"]');
+    await expect(tabs).toBeDisplayed();
+  });
+
+  it('should render agent session component', async () => {
+    const session = await browser.$('[data-testid="agent-session"]');
+    await expect(session).toBeDisplayed();
+  });
+});
+
+// ─── Scenario 2: Settings panel via data-testid ──────────────────────
+
+describe('Scenario 2 — Settings Panel (data-testid)', () => {
+  it('should open settings via data-testid button', async () => {
+    // Use JS click for reliability with WebKit2GTK/tauri-driver
+    await browser.execute(() => {
+      const btn = document.querySelector('[data-testid="settings-btn"]');
+      if (btn) (btn as HTMLElement).click();
+    });
+
+    const panel = await browser.$('.sidebar-panel');
+    await panel.waitForDisplayed({ timeout: 5000 });
+    // Wait for settings content to mount
+    await browser.waitUntil(
+      async () => {
+        const count = await browser.execute(() =>
+          document.querySelectorAll('.settings-tab .settings-section').length,
+        );
+        return (count as number) >= 1;
+      },
+      { timeout: 5000 },
+    );
+    await expect(panel).toBeDisplayed();
+  });
+
+  it('should close settings with Escape', async () => {
+    await browser.keys('Escape');
+    const panel = await browser.$('.sidebar-panel');
+    await panel.waitForDisplayed({ timeout: 3000, reverse: true });
+  });
+});
+
+// ─── Scenario 3: Agent pane initial state ────────────────────────────
+
+describe('Scenario 3 — Agent Pane Initial State', () => {
+  it('should display agent pane in idle status', async () => {
+    const exists = await agentPaneExists();
+    if (!exists) {
+      // Agent pane might not be visible until Model tab is active
+      await browser.execute(() => {
+        const tab = document.querySelector('[data-testid="project-tabs"] .ptab');
+        if (tab) (tab as HTMLElement).click();
+      });
+      await browser.pause(300);
+    }
+
+    const pane = await browser.$('[data-testid="agent-pane"]');
+    await expect(pane).toBeExisting();
+
+    const status = await browser.execute(() => {
+      const el = document.querySelector('[data-testid="agent-pane"]');
+      return el?.getAttribute('data-agent-status') ?? 'unknown';
+    });
+    expect(status).toBe('idle');
+  });
+
+  it('should show prompt textarea', async () => {
+    const textarea = await browser.$('[data-testid="agent-prompt"]');
+    await expect(textarea).toBeDisplayed();
+  });
+
+  it('should show submit button', async () => {
+    const btn = await browser.$('[data-testid="agent-submit"]');
+    await expect(btn).toBeExisting();
+  });
+
+  it('should have empty messages area initially', async () => {
+    const msgArea = await browser.$('[data-testid="agent-messages"]');
+    await expect(msgArea).toBeExisting();
+
+    // No message bubbles should exist in a fresh session
+    const msgCount = await browser.execute(() => {
+      const area = document.querySelector('[data-testid="agent-messages"]');
+      if (!area) return 0;
+      return area.querySelectorAll('.message').length;
+    });
+    expect(msgCount).toBe(0);
+  });
+});
+
+// ─── Scenario 4: Terminal tab management ─────────────────────────────
+
+describe('Scenario 4 — Terminal Tab Management (data-testid)', () => {
+  before(async () => {
+    // Ensure Model tab is active and terminal section visible
+    await browser.execute(() => {
+      const tab = document.querySelector('[data-testid="project-tabs"] .ptab');
+      if (tab) (tab as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    // Expand terminal section
+    await browser.execute(() => {
+      const toggle = document.querySelector('[data-testid="terminal-toggle"]');
+      if (toggle) (toggle as HTMLElement).click();
+    });
+    await browser.pause(500);
+  });
+
+  it('should display terminal tabs container', async () => {
+    const tabs = await browser.$('[data-testid="terminal-tabs"]');
+    await expect(tabs).toBeDisplayed();
+  });
+
+  it('should add a shell tab via data-testid button', async () => {
+    await browser.execute(() => {
+      const btn = document.querySelector('[data-testid="tab-add"]');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const tabTitle = await browser.execute(() => {
+      const el = document.querySelector('.tab-bar .tab-title');
+      return el?.textContent ?? '';
+    });
+    expect(tabTitle.toLowerCase()).toContain('shell');
+  });
+
+  it('should show active tab styling', async () => {
+    const activeTab = await browser.$('.tab.active');
+    await expect(activeTab).toBeExisting();
+  });
+
+  it('should close tab and show empty state', async () => {
+    // Close all tabs
+    await browser.execute(() => {
+      const closeBtns = document.querySelectorAll('.tab-close');
+      closeBtns.forEach(btn => (btn as HTMLElement).click());
+    });
+    await browser.pause(500);
+
+    // Should show empty terminal area with "Open terminal" button
+    const emptyBtn = await browser.$('.add-first');
+    await expect(emptyBtn).toBeDisplayed();
+  });
+
+  after(async () => {
+    // Collapse terminal section
+    await browser.execute(() => {
+      const toggle = document.querySelector('[data-testid="terminal-toggle"]');
+      const chevron = toggle?.querySelector('.toggle-chevron.expanded');
+      if (chevron) (toggle as HTMLElement).click();
+    });
+    await browser.pause(300);
+  });
+});
+
+// ─── Scenario 5: Command palette with data-testid ───────────────────
+
+describe('Scenario 5 — Command Palette (data-testid)', () => {
+  it('should open palette and show data-testid input', async () => {
+    await browser.execute(() => document.body.focus());
+    await browser.pause(200);
+    await browser.keys(['Control', 'k']);
+
+    const palette = await browser.$('[data-testid="command-palette"]');
+    await palette.waitForDisplayed({ timeout: 3000 });
+
+    const input = await browser.$('[data-testid="palette-input"]');
+    await expect(input).toBeDisplayed();
+  });
+
+  it('should have focused input', async () => {
+    // Use programmatic focus check (auto-focus may not work in WebKit2GTK/tauri-driver)
+    const isFocused = await browser.execute(() => {
+      const el = document.querySelector('[data-testid="palette-input"]') as HTMLInputElement | null;
+      if (!el) return false;
+      el.focus(); // Ensure focus programmatically
+      return el === document.activeElement;
+    });
+    expect(isFocused).toBe(true);
+  });
+
+  it('should show at least one group item', async () => {
+    const items = await browser.$$('.palette-item');
+    expect(items.length).toBeGreaterThanOrEqual(1);
+  });
+
+  it('should filter and show no-results for nonsense query', async () => {
+    const input = await browser.$('[data-testid="palette-input"]');
+    await input.setValue('zzz_no_match_xyz');
+    await browser.pause(300);
+
+    const noResults = await browser.$('.no-results');
+    await expect(noResults).toBeDisplayed();
+  });
+
+  it('should close on Escape', async () => {
+    await browser.keys('Escape');
+    const palette = await browser.$('[data-testid="command-palette"]');
+    await browser.waitUntil(
+      async () => !(await palette.isDisplayed()),
+      { timeout: 3000 },
+    );
+  });
+});
+
+// ─── Scenario 6: Project focus and tab switching ─────────────────────
+
+describe('Scenario 6 — Project Focus & Tab Switching', () => {
+  it('should focus project on header click', async () => {
+    await browser.execute(() => {
+      const header = document.querySelector('.project-header');
+      if (header) (header as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    const activeBox = await browser.$('.project-box.active');
+    await expect(activeBox).toBeDisplayed();
+  });
+
+  it('should switch to Files tab and back without losing agent session', async () => {
+    // Get current agent session element reference
+    const sessionBefore = await browser.execute(() => {
+      const el = document.querySelector('[data-testid="agent-session"]');
+      return el !== null;
+    });
+    expect(sessionBefore).toBe(true);
+
+    // Switch to Files tab (second tab)
+    await browser.execute(() => {
+      const tabs = document.querySelectorAll('[data-testid="project-tabs"] .ptab');
+      if (tabs.length >= 2) (tabs[1] as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    // AgentSession should still exist in DOM (display:none, not unmounted)
+    const sessionDuring = await browser.execute(() => {
+      const el = document.querySelector('[data-testid="agent-session"]');
+      return el !== null;
+    });
+    expect(sessionDuring).toBe(true);
+
+    // Switch back to Model tab
+    await browser.execute(() => {
+      const tab = document.querySelector('[data-testid="project-tabs"] .ptab');
+      if (tab) (tab as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    // Agent session should be visible again
+    const session = await browser.$('[data-testid="agent-session"]');
+    await expect(session).toBeDisplayed();
+  });
+
+  it('should preserve agent status across tab switches', async () => {
+    const statusBefore = await browser.execute(() => {
+      const el = document.querySelector('[data-testid="agent-pane"]');
+      return el?.getAttribute('data-agent-status') ?? 'unknown';
+    });
+
+    // Switch to Context tab (third tab) and back
+    await browser.execute(() => {
+      const tabs = document.querySelectorAll('[data-testid="project-tabs"] .ptab');
+      if (tabs.length >= 3) (tabs[2] as HTMLElement).click();
+    });
+    await browser.pause(300);
+    await browser.execute(() => {
+      const tab = document.querySelector('[data-testid="project-tabs"] .ptab');
+      if (tab) (tab as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    const statusAfter = await browser.execute(() => {
+      const el = document.querySelector('[data-testid="agent-pane"]');
+      return el?.getAttribute('data-agent-status') ?? 'unknown';
+    });
+    expect(statusAfter).toBe(statusBefore);
+  });
+});
+
+// ─── Scenario 7: Agent prompt interaction (requires Claude CLI) ──────
+
+describe('Scenario 7 — Agent Prompt Submission', () => {
+  // This scenario requires a real Claude CLI + API key.
+  // Skip gracefully if agent doesn't transition to "running" within timeout.
+
+  it('should accept text in prompt textarea', async () => {
+    const textarea = await browser.$('[data-testid="agent-prompt"]');
+    await textarea.waitForDisplayed({ timeout: 5000 });
+    await textarea.setValue('Say hello');
+    await browser.pause(200);
+
+    const value = await textarea.getValue();
+    expect(value).toBe('Say hello');
+
+    // Clear without submitting
+    await textarea.clearValue();
+  });
+
+  it('should enable submit button when prompt has text', async () => {
+    const textarea = await browser.$('[data-testid="agent-prompt"]');
+    await textarea.setValue('Test prompt');
+    await browser.pause(200);
+
+    // Submit button should be interactable (not disabled)
+    const isDisabled = await browser.execute(() => {
+      const btn = document.querySelector('[data-testid="agent-submit"]');
+      if (!btn) return true;
+      return (btn as HTMLButtonElement).disabled;
+    });
+    expect(isDisabled).toBe(false);
+
+    await textarea.clearValue();
+  });
+
+  it('should show stop button during agent execution (if Claude available)', async function () {
+    // Send a minimal prompt
+    await sendAgentPrompt('Reply with exactly: BTERMINAL_TEST_OK');
+
+    // Wait for running status (generous timeout for sidecar spin-up)
+    try {
+      await waitForAgentStatus('running', 15_000);
+    } catch {
+      // Claude CLI not available — skip remaining assertions
+      console.log('Agent did not start — Claude CLI may not be available. Skipping.');
+      this.skip();
+      return;
+    }
+
+    // If agent is still running, check for stop button
+    const status = await browser.execute(() => {
+      const el = document.querySelector('[data-testid="agent-pane"]');
+      return el?.getAttribute('data-agent-status') ?? 'unknown';
+    });
+
+    if (status === 'running') {
+      const stopBtn = await browser.$('[data-testid="agent-stop"]');
+      await expect(stopBtn).toBeDisplayed();
+    }
+
+    // Wait for completion (with shorter timeout to avoid mocha timeout)
+    try {
+      await waitForAgentStatus('idle', 40_000);
+    } catch {
+      console.log('Agent did not complete within 40s — skipping completion checks.');
+      this.skip();
+      return;
+    }
+
+    // Messages area should now have content
+    const msgCount = await browser.execute(() => {
+      const area = document.querySelector('[data-testid="agent-messages"]');
+      if (!area) return 0;
+      return area.children.length;
+    });
+    expect(msgCount).toBeGreaterThan(0);
+  });
+});
--- a/tests/e2e/specs/bterminal.test.ts
+++ b/tests/e2e/specs/bterminal.test.ts
@ -0,0 +1,799 @@
+import { browser, expect } from '@wdio/globals';
+
+// All E2E tests run in a single spec file because Tauri launches one app
+// instance per session, and tauri-driver doesn't support re-creating sessions.
+
+describe('BTerminal — Smoke Tests', () => {
+  it('should render the application window', async () => {
+    // Wait for the app to fully load before any tests
+    await browser.waitUntil(
+      async () => (await browser.getTitle()) === 'BTerminal',
+      { timeout: 10_000, timeoutMsg: 'App did not load within 10s' },
+    );
+    const title = await browser.getTitle();
+    expect(title).toBe('BTerminal');
+  });
+
+  it('should display the status bar', async () => {
+    const statusBar = await browser.$('.status-bar');
+    await expect(statusBar).toBeDisplayed();
+  });
+
+  it('should show version text in status bar', async () => {
+    const version = await browser.$('.status-bar .version');
+    await expect(version).toBeDisplayed();
+    const text = await version.getText();
+    expect(text).toContain('BTerminal');
+  });
+
+  it('should display the sidebar rail', async () => {
+    const sidebarRail = await browser.$('.sidebar-rail');
+    await expect(sidebarRail).toBeDisplayed();
+  });
+
+  it('should display the workspace area', async () => {
+    const workspace = await browser.$('.workspace');
+    await expect(workspace).toBeDisplayed();
+  });
+
+  it('should toggle sidebar with settings button', async () => {
+    const settingsBtn = await browser.$('.rail-btn');
+    await settingsBtn.click();
+
+    const sidebarPanel = await browser.$('.sidebar-panel');
+    await expect(sidebarPanel).toBeDisplayed();
+
+    // Click again to close
+    await settingsBtn.click();
+    await expect(sidebarPanel).not.toBeDisplayed();
+  });
+});
+
+describe('BTerminal — Workspace & Projects', () => {
+  it('should display the project grid', async () => {
+    const grid = await browser.$('.project-grid');
+    await expect(grid).toBeDisplayed();
+  });
+
+  it('should render at least one project box', async () => {
+    const boxes = await browser.$$('.project-box');
+    expect(boxes.length).toBeGreaterThanOrEqual(1);
+  });
+
+  it('should show project header with name', async () => {
+    const header = await browser.$('.project-header');
+    await expect(header).toBeDisplayed();
+
+    const name = await browser.$('.project-name');
+    const text = await name.getText();
+    expect(text.length).toBeGreaterThan(0);
+  });
+
+  it('should show project-level tabs (Model, Docs, Context, Files, SSH, Memory, ...)', async () => {
+    const box = await browser.$('.project-box');
+    const tabs = await box.$$('.ptab');
+    // v3 has 6+ tabs: Model, Docs, Context, Files, SSH, Memory (+ role-specific)
+    expect(tabs.length).toBeGreaterThanOrEqual(6);
+  });
+
+  it('should highlight active project on click', async () => {
+    const header = await browser.$('.project-header');
+    await header.click();
+
+    const activeBox = await browser.$('.project-box.active');
+    await expect(activeBox).toBeDisplayed();
+  });
+
+  it('should switch project tabs', async () => {
+    // Use JS click — WebDriver clicks don't always trigger Svelte onclick
+    // on buttons inside complex components via WebKit2GTK/tauri-driver
+    const switched = await browser.execute(() => {
+      const box = document.querySelector('.project-box');
+      if (!box) return false;
+      const tabs = box.querySelectorAll('.ptab');
+      if (tabs.length < 2) return false;
+      (tabs[1] as HTMLElement).click();
+      return true;
+    });
+    expect(switched).toBe(true);
+    await browser.pause(500);
+
+    const box = await browser.$('.project-box');
+    const activeTab = await box.$('.ptab.active');
+    const text = await activeTab.getText();
+    // Tab[1] is "Docs" in v3 tab bar (Model, Docs, Context, Files, ...)
+    expect(text.toLowerCase()).toContain('docs');
+
+    // Switch back to Model tab
+    await browser.execute(() => {
+      const tab = document.querySelector('.project-box .ptab');
+      if (tab) (tab as HTMLElement).click();
+    });
+    await browser.pause(300);
+  });
+
+  it('should display the status bar with project count', async () => {
+    const statusBar = await browser.$('.status-bar .left');
+    const text = await statusBar.getText();
+    expect(text).toContain('projects');
+  });
+
+  it('should display project and agent info in status bar', async () => {
+    const statusBar = await browser.$('.status-bar .left');
+    const text = await statusBar.getText();
+    // Status bar always shows project count; agent counts only when > 0
+    // (shows "X running", "X idle", "X stalled" — not the word "agents")
+    expect(text).toContain('projects');
+  });
+});
+
+/** Open the settings panel, waiting for content to render. */
+async function openSettings(): Promise<void> {
+  const panel = await browser.$('.sidebar-panel');
+  const isOpen = await panel.isDisplayed().catch(() => false);
+  if (!isOpen) {
+    // Use data-testid for unambiguous selection
+    await browser.execute(() => {
+      const btn = document.querySelector('[data-testid="settings-btn"]');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await panel.waitForDisplayed({ timeout: 5000 });
+  }
+  // Wait for settings content to mount
+  await browser.waitUntil(
+    async () => {
+      const count = await browser.execute(() =>
+        document.querySelectorAll('.settings-tab .settings-section').length,
+      );
+      return (count as number) >= 1;
+    },
+    { timeout: 5000, timeoutMsg: 'Settings sections did not render within 5s' },
+  );
+  await browser.pause(200);
+}
+
+/** Close the settings panel if open. */
+async function closeSettings(): Promise<void> {
+  const panel = await browser.$('.sidebar-panel');
+  if (await panel.isDisplayed().catch(() => false)) {
+    await browser.execute(() => {
+      const btn = document.querySelector('.panel-close');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await browser.pause(500);
+  }
+}
+
+describe('BTerminal — Settings Panel', () => {
+  before(async () => {
+    await openSettings();
+  });
+
+  after(async () => {
+    await closeSettings();
+  });
+
+  it('should display the settings tab container', async () => {
+    const settingsTab = await browser.$('.settings-tab');
+    await expect(settingsTab).toBeDisplayed();
+  });
+
+  it('should show settings sections', async () => {
+    const sections = await browser.$$('.settings-section');
+    expect(sections.length).toBeGreaterThanOrEqual(1);
+  });
+
+  it('should display theme dropdown', async () => {
+    const dropdown = await browser.$('.custom-dropdown .dropdown-trigger');
+    await expect(dropdown).toBeDisplayed();
+  });
+
+  it('should open theme dropdown and show options', async () => {
+    // Use JS click — WebDriver clicks don't reliably trigger Svelte onclick
+    // on buttons inside scrollable panels via WebKit2GTK/tauri-driver
+    await browser.execute(() => {
+      const trigger = document.querySelector('.custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const menu = await browser.$('.dropdown-menu');
+    await menu.waitForExist({ timeout: 3000 });
+
+    const options = await browser.$$('.dropdown-option');
+    expect(options.length).toBeGreaterThan(0);
+
+    // Close dropdown by clicking trigger again
+    await browser.execute(() => {
+      const trigger = document.querySelector('.custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(300);
+  });
+
+  it('should display group list', async () => {
+    // Groups section is below Appearance/Defaults/Providers — scroll into view
+    await browser.execute(() => {
+      const el = document.querySelector('.group-list');
+      if (el) el.scrollIntoView({ behavior: 'instant', block: 'center' });
+    });
+    await browser.pause(300);
+    const groupList = await browser.$('.group-list');
+    await expect(groupList).toBeDisplayed();
+  });
+
+  it('should close settings panel with close button', async () => {
+    // Ensure settings is open
+    await openSettings();
+
+    // Use JS click for reliability
+    await browser.execute(() => {
+      const btn = document.querySelector('.panel-close');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const panel = await browser.$('.sidebar-panel');
+    await expect(panel).not.toBeDisplayed();
+  });
+});
+
+/** Open command palette — idempotent (won't toggle-close if already open). */
+async function openCommandPalette(): Promise<void> {
+  // Ensure sidebar is closed first (it can intercept keyboard events)
+  await closeSettings();
+
+  // Check if already open
+  const alreadyOpen = await browser.execute(() => {
+    const p = document.querySelector('.palette');
+    return p !== null && getComputedStyle(p).display !== 'none';
+  });
+  if (alreadyOpen) return;
+
+  // Dispatch Ctrl+K via JS for reliability with WebKit2GTK/tauri-driver
+  await browser.execute(() => {
+    document.dispatchEvent(new KeyboardEvent('keydown', {
+      key: 'k', code: 'KeyK', ctrlKey: true, bubbles: true, cancelable: true,
+    }));
+  });
+  await browser.pause(300);
+
+  const palette = await browser.$('.palette');
+  await palette.waitForDisplayed({ timeout: 5000 });
+}
+
+/** Close command palette if open — uses backdrop click (more reliable than Escape). */
+async function closeCommandPalette(): Promise<void> {
+  const isOpen = await browser.execute(() => {
+    const p = document.querySelector('.palette');
+    return p !== null && getComputedStyle(p).display !== 'none';
+  });
+  if (!isOpen) return;
+
+  // Click backdrop to close (more reliable than dispatching Escape)
+  await browser.execute(() => {
+    const backdrop = document.querySelector('.palette-backdrop');
+    if (backdrop) (backdrop as HTMLElement).click();
+  });
+  await browser.pause(500);
+}
+
+describe('BTerminal — Command Palette', () => {
+  beforeEach(async () => {
+    await closeCommandPalette();
+  });
+
+  it('should show palette input', async () => {
+    await openCommandPalette();
+
+    const input = await browser.$('.palette-input');
+    await expect(input).toBeDisplayed();
+
+    // Verify input accepts text (functional focus test, not activeElement check
+    // which is unreliable in WebKit2GTK/tauri-driver)
+    const canType = await browser.execute(() => {
+      const el = document.querySelector('.palette-input') as HTMLInputElement | null;
+      if (!el) return false;
+      el.focus();
+      return el === document.activeElement;
+    });
+    expect(canType).toBe(true);
+
+    await closeCommandPalette();
+  });
+
+  it('should show palette items with command labels and categories', async () => {
+    await openCommandPalette();
+
+    const items = await browser.$$('.palette-item');
+    expect(items.length).toBeGreaterThanOrEqual(1);
+
+    // Each command item should have a label
+    const cmdLabel = await browser.$('.palette-item .cmd-label');
+    await expect(cmdLabel).toBeDisplayed();
+    const labelText = await cmdLabel.getText();
+    expect(labelText.length).toBeGreaterThan(0);
+
+    // Commands should be grouped under category headers
+    const categories = await browser.$$('.palette-category');
+    expect(categories.length).toBeGreaterThanOrEqual(1);
+
+    await closeCommandPalette();
+  });
+
+  it('should highlight selected item in palette', async () => {
+    await openCommandPalette();
+
+    // First item should be selected by default
+    const selectedItem = await browser.$('.palette-item.selected');
+    await expect(selectedItem).toBeExisting();
+
+    await closeCommandPalette();
+  });
+
+  it('should filter palette items by typing', async () => {
+    await openCommandPalette();
+
+    const itemsBefore = await browser.$$('.palette-item');
+    const countBefore = itemsBefore.length;
+
+    // Type a nonsense string that won't match any group name
+    const input = await browser.$('.palette-input');
+    await input.setValue('zzz_nonexistent_group_xyz');
+    await browser.pause(300);
+
+    // Should show no results or fewer items
+    const noResults = await browser.$('.no-results');
+    const itemsAfter = await browser.$$('.palette-item');
+    // Either no-results message appears OR item count decreased
+    const filtered = (await noResults.isExisting()) || itemsAfter.length < countBefore;
+    expect(filtered).toBe(true);
+
+    await closeCommandPalette();
+  });
+
+  it('should close palette by clicking backdrop', async () => {
+    await openCommandPalette();
+    const palette = await browser.$('.palette');
+
+    // Click the backdrop (outside the palette)
+    await browser.execute(() => {
+      const backdrop = document.querySelector('.palette-backdrop');
+      if (backdrop) (backdrop as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    await expect(palette).not.toBeDisplayed();
+  });
+});
+
+describe('BTerminal — Terminal Tabs', () => {
+  before(async () => {
+    // Ensure Claude tab is active so terminal section is visible
+    await browser.execute(() => {
+      const tab = document.querySelector('.project-box .ptab');
+      if (tab) (tab as HTMLElement).click();
+    });
+    await browser.pause(300);
+  });
+
+  it('should show terminal toggle on Claude tab', async () => {
+    const toggle = await browser.$('.terminal-toggle');
+    await expect(toggle).toBeDisplayed();
+
+    const label = await browser.$('.toggle-label');
+    const text = await label.getText();
+    expect(text.toLowerCase()).toContain('terminal');
+  });
+
+  it('should expand terminal area on toggle click', async () => {
+    // Click terminal toggle via JS
+    await browser.execute(() => {
+      const toggle = document.querySelector('.terminal-toggle');
+      if (toggle) (toggle as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const termArea = await browser.$('.project-terminal-area');
+    await expect(termArea).toBeDisplayed();
+
+    // Chevron should have expanded class
+    const chevron = await browser.$('.toggle-chevron.expanded');
+    await expect(chevron).toBeExisting();
+  });
+
+  it('should show add tab button when terminal expanded', async () => {
+    const addBtn = await browser.$('.tab-add');
+    await expect(addBtn).toBeDisplayed();
+  });
+
+  it('should add a shell tab', async () => {
+    // Click add tab button via JS (Svelte onclick)
+    await browser.execute(() => {
+      const btn = document.querySelector('.tab-bar .tab-add');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    // Verify tab title via JS to avoid stale element issues
+    const title = await browser.execute(() => {
+      const el = document.querySelector('.tab-bar .tab-title');
+      return el ? el.textContent : '';
+    });
+    expect((title as string).toLowerCase()).toContain('shell');
+  });
+
+  it('should show active tab styling', async () => {
+    const activeTab = await browser.$('.tab.active');
+    await expect(activeTab).toBeExisting();
+  });
+
+  it('should add a second shell tab and switch between them', async () => {
+    // Add second tab via JS
+    await browser.execute(() => {
+      const btn = document.querySelector('.tab-bar .tab-add');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const tabCount = await browser.execute(() => {
+      return document.querySelectorAll('.tab-bar .tab').length;
+    });
+    expect(tabCount as number).toBeGreaterThanOrEqual(2);
+
+    // Click first tab and verify it becomes active with Shell title
+    await browser.execute(() => {
+      const tabs = document.querySelectorAll('.tab-bar .tab');
+      if (tabs[0]) (tabs[0] as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    const activeTitle = await browser.execute(() => {
+      const active = document.querySelector('.tab-bar .tab.active .tab-title');
+      return active ? active.textContent : '';
+    });
+    expect(activeTitle as string).toContain('Shell');
+  });
+
+  it('should close a tab', async () => {
+    const tabsBefore = await browser.$$('.tab');
+    const countBefore = tabsBefore.length;
+
+    // Close the last tab
+    await browser.execute(() => {
+      const closeBtns = document.querySelectorAll('.tab-close');
+      if (closeBtns.length > 0) {
+        (closeBtns[closeBtns.length - 1] as HTMLElement).click();
+      }
+    });
+    await browser.pause(500);
+
+    const tabsAfter = await browser.$$('.tab');
+    expect(tabsAfter.length).toBe(Number(countBefore) - 1);
+  });
+
+  after(async () => {
+    // Clean up: close remaining tabs and collapse terminal
+    await browser.execute(() => {
+      // Close all tabs
+      const closeBtns = document.querySelectorAll('.tab-close');
+      closeBtns.forEach(btn => (btn as HTMLElement).click());
+    });
+    await browser.pause(300);
+
+    // Collapse terminal
+    await browser.execute(() => {
+      const toggle = document.querySelector('.terminal-toggle');
+      if (toggle) {
+        const chevron = toggle.querySelector('.toggle-chevron.expanded');
+        if (chevron) (toggle as HTMLElement).click();
+      }
+    });
+    await browser.pause(300);
+  });
+});
+
+describe('BTerminal — Theme Switching', () => {
+  before(async () => {
+    await openSettings();
+    // Scroll to top for theme dropdown
+    await browser.execute(() => {
+      const content = document.querySelector('.panel-content') || document.querySelector('.sidebar-panel');
+      if (content) content.scrollTop = 0;
+    });
+    await browser.pause(300);
+  });
+
+  after(async () => {
+    await closeSettings();
+  });
+
+  it('should show theme dropdown with group labels', async () => {
+    // Close any open dropdowns first
+    await browser.execute(() => {
+      const openMenu = document.querySelector('.dropdown-menu');
+      if (openMenu) {
+        const trigger = openMenu.closest('.custom-dropdown')?.querySelector('.dropdown-trigger');
+        if (trigger) (trigger as HTMLElement).click();
+      }
+    });
+    await browser.pause(200);
+
+    // Click the first dropdown trigger (theme dropdown)
+    await browser.execute(() => {
+      const trigger = document.querySelector('.settings-tab .custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const menu = await browser.$('.dropdown-menu');
+    await menu.waitForExist({ timeout: 5000 });
+
+    // Should have group labels (Catppuccin, Editor, Deep Dark)
+    const groupLabels = await browser.$$('.dropdown-group-label');
+    expect(groupLabels.length).toBeGreaterThanOrEqual(2);
+
+    // Close dropdown
+    await browser.execute(() => {
+      const trigger = document.querySelector('.settings-tab .custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(300);
+  });
+
+  it('should switch theme and update CSS variables', async () => {
+    // Get current base color
+    const baseBefore = await browser.execute(() => {
+      return getComputedStyle(document.documentElement).getPropertyValue('--ctp-base').trim();
+    });
+
+    // Open theme dropdown (first custom-dropdown in settings)
+    await browser.execute(() => {
+      const trigger = document.querySelector('.settings-tab .custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    // Wait for dropdown menu
+    const menu = await browser.$('.dropdown-menu');
+    await menu.waitForExist({ timeout: 5000 });
+
+    // Click the first non-active theme option
+    const changed = await browser.execute(() => {
+      const options = document.querySelectorAll('.dropdown-menu .dropdown-option:not(.active)');
+      if (options.length > 0) {
+        (options[0] as HTMLElement).click();
+        return true;
+      }
+      return false;
+    });
+    expect(changed).toBe(true);
+    await browser.pause(500);
+
+    // Verify CSS variable changed
+    const baseAfter = await browser.execute(() => {
+      return getComputedStyle(document.documentElement).getPropertyValue('--ctp-base').trim();
+    });
+    expect(baseAfter).not.toBe(baseBefore);
+
+    // Switch back to Catppuccin Mocha (first option) to restore state
+    await browser.execute(() => {
+      const trigger = document.querySelector('.settings-tab .custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(500);
+    await browser.execute(() => {
+      const options = document.querySelectorAll('.dropdown-menu .dropdown-option');
+      if (options.length > 0) (options[0] as HTMLElement).click();
+    });
+    await browser.pause(300);
+  });
+
+  it('should show active theme option', async () => {
+    await browser.execute(() => {
+      const trigger = document.querySelector('.settings-tab .custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const menu = await browser.$('.dropdown-menu');
+    await menu.waitForExist({ timeout: 5000 });
+
+    const activeOption = await browser.$('.dropdown-option.active');
+    await expect(activeOption).toBeExisting();
+
+    await browser.execute(() => {
+      const trigger = document.querySelector('.settings-tab .custom-dropdown .dropdown-trigger');
+      if (trigger) (trigger as HTMLElement).click();
+    });
+    await browser.pause(300);
+  });
+});
+
+describe('BTerminal — Settings Interaction', () => {
+  before(async () => {
+    await openSettings();
+    // Scroll to top for font controls
+    await browser.execute(() => {
+      const content = document.querySelector('.panel-content') || document.querySelector('.sidebar-panel');
+      if (content) content.scrollTop = 0;
+    });
+    await browser.pause(300);
+  });
+
+  after(async () => {
+    await closeSettings();
+  });
+
+  it('should show font size controls with increment/decrement', async () => {
+    const sizeControls = await browser.$$('.size-control');
+    expect(sizeControls.length).toBeGreaterThanOrEqual(1);
+
+    const sizeBtns = await browser.$$('.size-btn');
+    expect(sizeBtns.length).toBeGreaterThanOrEqual(2); // at least - and + for one control
+
+    const sizeInput = await browser.$('.size-input');
+    await expect(sizeInput).toBeExisting();
+  });
+
+  it('should increment font size', async () => {
+    const sizeInput = await browser.$('.size-input');
+    const valueBefore = await sizeInput.getValue();
+
+    // Click the + button (second .size-btn in first .size-control)
+    await browser.execute(() => {
+      const btns = document.querySelectorAll('.size-control .size-btn');
+      // Second button is + (first is -)
+      if (btns.length >= 2) (btns[1] as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    const afterEl = await browser.$('.size-input');
+    const valueAfter = await afterEl.getValue();
+    expect(parseInt(valueAfter as string)).toBe(parseInt(valueBefore as string) + 1);
+  });
+
+  it('should decrement font size back', async () => {
+    const sizeInput = await browser.$('.size-input');
+    const valueBefore = await sizeInput.getValue();
+
+    // Click the - button (first .size-btn)
+    await browser.execute(() => {
+      const btns = document.querySelectorAll('.size-control .size-btn');
+      if (btns.length >= 1) (btns[0] as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    const afterEl = await browser.$('.size-input');
+    const valueAfter = await afterEl.getValue();
+    expect(parseInt(valueAfter as string)).toBe(parseInt(valueBefore as string) - 1);
+  });
+
+  it('should display group rows with active indicator', async () => {
+    // Scroll to Groups section (below Appearance, Defaults, Providers)
+    await browser.execute(() => {
+      const el = document.querySelector('.group-list');
+      if (el) el.scrollIntoView({ behavior: 'instant', block: 'center' });
+    });
+    await browser.pause(300);
+
+    const groupRows = await browser.$$('.group-row');
+    expect(groupRows.length).toBeGreaterThanOrEqual(1);
+
+    const activeGroup = await browser.$('.group-row.active');
+    await expect(activeGroup).toBeExisting();
+  });
+
+  it('should show project cards', async () => {
+    // Scroll to Projects section
+    await browser.execute(() => {
+      const el = document.querySelector('.project-cards');
+      if (el) el.scrollIntoView({ behavior: 'instant', block: 'center' });
+    });
+    await browser.pause(300);
+
+    const cards = await browser.$$('.project-card');
+    expect(cards.length).toBeGreaterThanOrEqual(1);
+  });
+
+  it('should display project card with name and path', async () => {
+    const nameInput = await browser.$('.card-name-input');
+    await expect(nameInput).toBeExisting();
+    const name = await nameInput.getValue() as string;
+    expect(name.length).toBeGreaterThan(0);
+
+    const cwdInput = await browser.$('.cwd-input');
+    await expect(cwdInput).toBeExisting();
+    const cwd = await cwdInput.getValue() as string;
+    expect(cwd.length).toBeGreaterThan(0);
+  });
+
+  it('should show project toggle switch', async () => {
+    const toggle = await browser.$('.card-toggle');
+    await expect(toggle).toBeExisting();
+
+    const track = await browser.$('.toggle-track');
+    await expect(track).toBeDisplayed();
+  });
+
+  it('should show add project form', async () => {
+    // Scroll to add project form (at bottom of Projects section)
+    await browser.execute(() => {
+      const el = document.querySelector('.add-project-form');
+      if (el) el.scrollIntoView({ behavior: 'instant', block: 'center' });
+    });
+    await browser.pause(300);
+
+    const addForm = await browser.$('.add-project-form');
+    await expect(addForm).toBeDisplayed();
+
+    const addBtn = await browser.$('.add-project-form .btn-primary');
+    await expect(addBtn).toBeExisting();
+  });
+});
+
+describe('BTerminal — Keyboard Shortcuts', () => {
+  before(async () => {
+    await closeSettings();
+    await closeCommandPalette();
+  });
+
+  it('should open command palette with Ctrl+K', async () => {
+    await openCommandPalette();
+
+    const input = await browser.$('.palette-input');
+    await expect(input).toBeDisplayed();
+
+    // Close with Escape
+    await closeCommandPalette();
+    const palette = await browser.$('.palette');
+    const isGone = !(await palette.isDisplayed().catch(() => false));
+    expect(isGone).toBe(true);
+  });
+
+  it('should toggle settings with Ctrl+,', async () => {
+    await browser.keys(['Control', ',']);
+
+    const panel = await browser.$('.sidebar-panel');
+    await panel.waitForDisplayed({ timeout: 3000 });
+
+    // Close with Ctrl+,
+    await browser.keys(['Control', ',']);
+    await panel.waitForDisplayed({ timeout: 3000, reverse: true });
+  });
+
+  it('should toggle sidebar with Ctrl+B', async () => {
+    // Open sidebar first
+    await browser.keys(['Control', ',']);
+    const panel = await browser.$('.sidebar-panel');
+    await panel.waitForDisplayed({ timeout: 3000 });
+
+    // Toggle off with Ctrl+B
+    await browser.keys(['Control', 'b']);
+    await panel.waitForDisplayed({ timeout: 3000, reverse: true });
+  });
+
+  it('should close sidebar with Escape', async () => {
+    // Open sidebar
+    await browser.keys(['Control', ',']);
+    const panel = await browser.$('.sidebar-panel');
+    await panel.waitForDisplayed({ timeout: 3000 });
+
+    // Close with Escape
+    await browser.keys('Escape');
+    await panel.waitForDisplayed({ timeout: 3000, reverse: true });
+  });
+
+  it('should show command palette with categorized commands', async () => {
+    await openCommandPalette();
+
+    const items = await browser.$$('.palette-item');
+    expect(items.length).toBeGreaterThanOrEqual(1);
+
+    // Commands should have labels
+    const cmdLabel = await browser.$('.palette-item .cmd-label');
+    await expect(cmdLabel).toBeDisplayed();
+
+    await closeCommandPalette();
+  });
+});
--- a/tests/e2e/specs/phase-b.test.ts
+++ b/tests/e2e/specs/phase-b.test.ts
@ -0,0 +1,377 @@
+import { browser, expect } from '@wdio/globals';
+import { isJudgeAvailable, assertWithJudge } from '../llm-judge';
+
+// Phase B: Multi-project scenarios + LLM-judged assertions.
+// Extends Phase A with tests that exercise multiple project boxes simultaneously
+// and use Claude API to evaluate agent response quality.
+//
+// Prerequisites:
+// - Built debug binary (or SKIP_BUILD=1)
+// - groups.json with 2+ projects (use BTERMINAL_TEST_CONFIG_DIR or default)
+// - ANTHROPIC_API_KEY env var for LLM-judged tests (skipped if absent)
+
+// ─── Helpers ──────────────────────────────────────────────────────────
+
+/** Get all project box IDs currently rendered. */
+async function getProjectIds(): Promise<string[]> {
+  return browser.execute(() => {
+    const boxes = document.querySelectorAll('[data-testid="project-box"]');
+    return Array.from(boxes).map(
+      (b) => b.getAttribute('data-project-id') ?? '',
+    ).filter(Boolean);
+  });
+}
+
+/** Focus a specific project box by its project ID. */
+async function focusProject(projectId: string): Promise<void> {
+  await browser.execute((id) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const header = box?.querySelector('.project-header');
+    if (header) (header as HTMLElement).click();
+  }, projectId);
+  await browser.pause(300);
+}
+
+/** Get the agent status for a specific project box. */
+async function getAgentStatus(projectId: string): Promise<string> {
+  return browser.execute((id) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const pane = box?.querySelector('[data-testid="agent-pane"]');
+    return pane?.getAttribute('data-agent-status') ?? 'not-found';
+  }, projectId);
+}
+
+/** Send a prompt to the agent in a specific project box. */
+async function sendPromptInProject(projectId: string, text: string): Promise<void> {
+  await focusProject(projectId);
+  await browser.execute((id, prompt) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const textarea = box?.querySelector('[data-testid="agent-prompt"]') as HTMLTextAreaElement | null;
+    if (textarea) {
+      textarea.value = prompt;
+      textarea.dispatchEvent(new Event('input', { bubbles: true }));
+    }
+  }, projectId, text);
+  await browser.pause(200);
+  await browser.execute((id) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const btn = box?.querySelector('[data-testid="agent-submit"]') as HTMLElement | null;
+    if (btn) btn.click();
+  }, projectId);
+}
+
+/** Wait for agent in a specific project to reach target status. */
+async function waitForProjectAgentStatus(
+  projectId: string,
+  status: string,
+  timeout = 60_000,
+): Promise<void> {
+  await browser.waitUntil(
+    async () => (await getAgentStatus(projectId)) === status,
+    { timeout, timeoutMsg: `Agent in project ${projectId} did not reach "${status}" within ${timeout}ms` },
+  );
+}
+
+/** Get all message text from an agent pane in a specific project. */
+async function getAgentMessages(projectId: string): Promise<string> {
+  return browser.execute((id) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const area = box?.querySelector('[data-testid="agent-messages"]');
+    return area?.textContent ?? '';
+  }, projectId);
+}
+
+/** Switch to a tab in a specific project box. Tab index: 0=Model, 1=Docs, 2=Context, etc. */
+async function switchProjectTab(projectId: string, tabIndex: number): Promise<void> {
+  await browser.execute((id, idx) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const tabs = box?.querySelectorAll('[data-testid="project-tabs"] .ptab');
+    if (tabs && tabs[idx]) (tabs[idx] as HTMLElement).click();
+  }, projectId, tabIndex);
+  await browser.pause(300);
+}
+
+// ─── Scenario B1: Multi-project grid renders correctly ────────────────
+
+describe('Scenario B1 — Multi-Project Grid', () => {
+  it('should render multiple project boxes', async () => {
+    // Wait for app to fully render project boxes
+    await browser.waitUntil(
+      async () => {
+        const count = await browser.execute(() =>
+          document.querySelectorAll('[data-testid="project-box"]').length,
+        );
+        return (count as number) >= 1;
+      },
+      { timeout: 10_000, timeoutMsg: 'No project boxes rendered within 10s' },
+    );
+
+    const ids = await getProjectIds();
+    // May be 1 project in minimal fixture; test structure regardless
+    expect(ids.length).toBeGreaterThanOrEqual(1);
+    // Each ID should be unique
+    const unique = new Set(ids);
+    expect(unique.size).toBe(ids.length);
+  });
+
+  it('should show project headers with CWD paths', async () => {
+    const headers = await browser.execute(() => {
+      const els = document.querySelectorAll('.project-header .cwd');
+      return Array.from(els).map((e) => e.textContent?.trim() ?? '');
+    });
+    // Each header should have a non-empty CWD
+    for (const cwd of headers) {
+      expect(cwd.length).toBeGreaterThan(0);
+    }
+  });
+
+  it('should have independent agent panes per project', async () => {
+    const ids = await getProjectIds();
+    for (const id of ids) {
+      const status = await getAgentStatus(id);
+      expect(['idle', 'running', 'stalled']).toContain(status);
+    }
+  });
+
+  it('should focus project on click and show active styling', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+
+    await focusProject(ids[0]);
+    const isActive = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      return box?.classList.contains('active') ?? false;
+    }, ids[0]);
+    expect(isActive).toBe(true);
+  });
+});
+
+// ─── Scenario B2: Independent tab switching across projects ───────────
+
+describe('Scenario B2 — Independent Tab Switching', () => {
+  it('should allow different tabs active in different projects', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 2) {
+      console.log('Skipping B2 — need 2+ projects');
+      return;
+    }
+
+    // Switch first project to Files tab (index 3)
+    await switchProjectTab(ids[0], 3);
+    // Keep second project on Model tab (index 0)
+    await switchProjectTab(ids[1], 0);
+
+    // Verify first project has Files tab active
+    const firstActiveTab = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      const active = box?.querySelector('[data-testid="project-tabs"] .ptab.active');
+      return active?.textContent?.trim() ?? '';
+    }, ids[0]);
+
+    const secondActiveTab = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      const active = box?.querySelector('[data-testid="project-tabs"] .ptab.active');
+      return active?.textContent?.trim() ?? '';
+    }, ids[1]);
+
+    // They should be different tabs
+    expect(firstActiveTab).not.toBe(secondActiveTab);
+
+    // Restore first project to Model tab
+    await switchProjectTab(ids[0], 0);
+  });
+});
+
+// ─── Scenario B3: Status bar reflects fleet state ────────────────────
+
+describe('Scenario B3 — Status Bar Fleet State', () => {
+  it('should show agent count in status bar', async () => {
+    const barText = await browser.execute(() => {
+      const bar = document.querySelector('[data-testid="status-bar"]');
+      return bar?.textContent ?? '';
+    });
+    // Status bar should contain at least one count (idle agents)
+    expect(barText.length).toBeGreaterThan(0);
+  });
+
+  it('should show no burn rate when all agents idle', async () => {
+    // When all agents are idle, burn-rate and cost elements are not rendered
+    // (they only appear when totalBurnRatePerHour > 0 or totalCost > 0)
+    const hasBurnRate = await browser.execute(() => {
+      const bar = document.querySelector('[data-testid="status-bar"]');
+      const burnEl = bar?.querySelector('.burn-rate');
+      const costEl = bar?.querySelector('.cost');
+      return { burn: burnEl?.textContent ?? null, cost: costEl?.textContent ?? null };
+    });
+    // Either no burn rate shown (idle) or it shows $0
+    if (hasBurnRate.burn !== null) {
+      expect(hasBurnRate.burn).toMatch(/\$0|0\.00/);
+    }
+    if (hasBurnRate.cost !== null) {
+      expect(hasBurnRate.cost).toMatch(/\$0|0\.00/);
+    }
+    // If both are null, agents are idle — that's the expected state
+  });
+});
+
+// ─── Scenario B4: LLM-judged agent response (requires API key) ──────
+
+describe('Scenario B4 — LLM-Judged Agent Response', () => {
+  const SKIP_MSG = 'Skipping — LLM judge not available (no CLI or API key)';
+
+  it('should send prompt and get meaningful response', async function () {
+    this.timeout(180_000); // agent needs time to start + run + respond
+    if (!isJudgeAvailable()) {
+      console.log(SKIP_MSG);
+      this.skip();
+      return;
+    }
+
+    const ids = await getProjectIds();
+    if (ids.length < 1) {
+      this.skip();
+      return;
+    }
+    const projectId = ids[0];
+
+    // Send a prompt that requires a specific kind of response
+    await sendPromptInProject(projectId, 'List the files in the current directory. Just list them, nothing else.');
+
+    // Wait for agent to start
+    try {
+      await waitForProjectAgentStatus(projectId, 'running', 15_000);
+    } catch {
+      console.log('Agent did not start — Claude CLI may not be available');
+      this.skip();
+      return;
+    }
+
+    // Wait for completion
+    await waitForProjectAgentStatus(projectId, 'idle', 120_000);
+
+    // Get the agent's output
+    const messages = await getAgentMessages(projectId);
+
+    // Use LLM judge to evaluate the response
+    const verdict = await assertWithJudge(
+      'The output should contain a file listing that includes at least one filename (like README.md or hello.py). It should look like a directory listing, not an error message.',
+      messages,
+      { context: 'BTerminal agent was asked to list files in a test project directory containing README.md and hello.py' },
+    );
+
+    expect(verdict.pass).toBe(true);
+    if (!verdict.pass) {
+      console.log(`LLM Judge: ${verdict.reasoning} (confidence: ${verdict.confidence})`);
+    }
+  });
+
+  it('should produce response with appropriate tool usage', async function () {
+    if (!isJudgeAvailable()) {
+      console.log(SKIP_MSG);
+      this.skip();
+      return;
+    }
+
+    const ids = await getProjectIds();
+    if (ids.length < 1) {
+      this.skip();
+      return;
+    }
+    const projectId = ids[0];
+
+    // Check that the previous response (from prior test) involved tool calls
+    const messages = await getAgentMessages(projectId);
+
+    const verdict = await assertWithJudge(
+      'The output should show evidence that the agent used tools (like Bash, Read, Glob, or LS commands) to list files. Tool usage typically appears as tool call names, command text, or file paths in the output.',
+      messages,
+      { context: 'BTerminal renders agent tool calls in collapsible sections showing the tool name and output' },
+    );
+
+    expect(verdict.pass).toBe(true);
+    if (!verdict.pass) {
+      console.log(`LLM Judge: ${verdict.reasoning} (confidence: ${verdict.confidence})`);
+    }
+  });
+});
+
+// ─── Scenario B5: LLM-judged code generation quality ─────────────────
+
+describe('Scenario B5 — LLM-Judged Code Generation', () => {
+  const SKIP_MSG = 'Skipping — LLM judge not available (no CLI or API key)';
+
+  it('should generate valid code when asked', async function () {
+    this.timeout(180_000); // agent needs time to start + run + respond
+    if (!isJudgeAvailable()) {
+      console.log(SKIP_MSG);
+      this.skip();
+      return;
+    }
+
+    const ids = await getProjectIds();
+    if (ids.length < 1) {
+      this.skip();
+      return;
+    }
+    const projectId = ids[0];
+
+    // Ask agent to read and explain existing code
+    await sendPromptInProject(
+      projectId,
+      'Read hello.py and tell me what the greet function does. One sentence answer.',
+    );
+
+    try {
+      await waitForProjectAgentStatus(projectId, 'running', 15_000);
+    } catch {
+      console.log('Agent did not start — Claude CLI may not be available');
+      this.skip();
+      return;
+    }
+
+    await waitForProjectAgentStatus(projectId, 'idle', 120_000);
+
+    const messages = await getAgentMessages(projectId);
+
+    const verdict = await assertWithJudge(
+      'The response should correctly describe that the greet function takes a name parameter and returns a greeting string like "Hello, {name}!". The explanation should be roughly one sentence as requested.',
+      messages,
+      { context: 'hello.py contains: def greet(name: str) -> str:\n    return f"Hello, {name}!"' },
+    );
+
+    expect(verdict.pass).toBe(true);
+    if (!verdict.pass) {
+      console.log(`LLM Judge: ${verdict.reasoning} (confidence: ${verdict.confidence})`);
+    }
+  });
+});
+
+// ─── Scenario B6: Context tab reflects agent activity ────────────────
+
+describe('Scenario B6 — Context Tab After Agent Activity', () => {
+  it('should show token usage in Context tab after agent ran', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+    const projectId = ids[0];
+
+    // Switch to Context tab (index 2)
+    await switchProjectTab(projectId, 2);
+
+    // Check if context tab has any content
+    const contextContent = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      // Look for stats or token meter elements
+      const stats = box?.querySelector('.context-stats, .token-meter, .stat-value');
+      return stats?.textContent ?? '';
+    }, projectId);
+
+    // If an agent has run, context tab should have data
+    // If no agent ran (skipped), this may be empty — that's OK
+    if (contextContent) {
+      expect(contextContent.length).toBeGreaterThan(0);
+    }
+
+    // Switch back to Model tab
+    await switchProjectTab(projectId, 0);
+  });
+});
--- a/tests/e2e/specs/phase-c.test.ts
+++ b/tests/e2e/specs/phase-c.test.ts
@ -0,0 +1,626 @@
+import { browser, expect } from '@wdio/globals';
+import { isJudgeAvailable, assertWithJudge } from '../llm-judge';
+
+// Phase C: Hardening feature tests.
+// Tests the v3 production-readiness features added in the hardening sprint:
+// - Command palette new commands
+// - Search overlay (Ctrl+Shift+F)
+// - Notification center
+// - Keyboard shortcuts (vi-nav, project jump)
+// - Settings panel new sections
+// - Error states and recovery UI
+
+// ─── Helpers ──────────────────────────────────────────────────────────
+
+/** Get all project box IDs currently rendered. */
+async function getProjectIds(): Promise<string[]> {
+  return browser.execute(() => {
+    const boxes = document.querySelectorAll('[data-testid="project-box"]');
+    return Array.from(boxes).map(
+      (b) => b.getAttribute('data-project-id') ?? '',
+    ).filter(Boolean);
+  });
+}
+
+/** Focus a specific project box by its project ID. */
+async function focusProject(projectId: string): Promise<void> {
+  await browser.execute((id) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const header = box?.querySelector('.project-header');
+    if (header) (header as HTMLElement).click();
+  }, projectId);
+  await browser.pause(300);
+}
+
+/** Switch to a tab in a specific project box. */
+async function switchProjectTab(projectId: string, tabIndex: number): Promise<void> {
+  await browser.execute((id, idx) => {
+    const box = document.querySelector(`[data-project-id="${id}"]`);
+    const tabs = box?.querySelectorAll('[data-testid="project-tabs"] .ptab');
+    if (tabs && tabs[idx]) (tabs[idx] as HTMLElement).click();
+  }, projectId, tabIndex);
+  await browser.pause(300);
+}
+
+/** Open command palette via Ctrl+K. */
+async function openPalette(): Promise<void> {
+  await browser.execute(() => document.body.focus());
+  await browser.pause(100);
+  await browser.keys(['Control', 'k']);
+  const palette = await browser.$('[data-testid="command-palette"]');
+  await palette.waitForDisplayed({ timeout: 3000 });
+}
+
+/** Close command palette via Escape. */
+async function closePalette(): Promise<void> {
+  await browser.keys('Escape');
+  await browser.pause(300);
+}
+
+/** Type into palette input and get filtered results. */
+async function paletteSearch(query: string): Promise<string[]> {
+  const input = await browser.$('[data-testid="palette-input"]');
+  await input.setValue(query);
+  await browser.pause(300);
+  return browser.execute(() => {
+    const items = document.querySelectorAll('.palette-item .cmd-label');
+    return Array.from(items).map(el => el.textContent?.trim() ?? '');
+  });
+}
+
+// ─── Scenario C1: Command Palette — Hardening Commands ────────────────
+
+describe('Scenario C1 — Command Palette Hardening Commands', () => {
+  afterEach(async () => {
+    // Ensure palette is closed after each test
+    try {
+      const isVisible = await browser.execute(() => {
+        const el = document.querySelector('[data-testid="command-palette"]');
+        return el !== null && window.getComputedStyle(el).display !== 'none';
+      });
+      if (isVisible) {
+        await closePalette();
+      }
+    } catch {
+      // Ignore if palette doesn't exist
+    }
+  });
+
+  it('should find settings command in palette', async () => {
+    await openPalette();
+    const results = await paletteSearch('settings');
+    expect(results.length).toBeGreaterThanOrEqual(1);
+    const hasSettings = results.some(r => r.toLowerCase().includes('settings'));
+    expect(hasSettings).toBe(true);
+  });
+
+  it('should find terminal command in palette', async () => {
+    await openPalette();
+    const results = await paletteSearch('terminal');
+    expect(results.length).toBeGreaterThanOrEqual(1);
+    const hasTerminal = results.some(r => r.toLowerCase().includes('terminal'));
+    expect(hasTerminal).toBe(true);
+  });
+
+  it('should find keyboard shortcuts command in palette', async () => {
+    await openPalette();
+    const results = await paletteSearch('keyboard');
+    expect(results.length).toBeGreaterThanOrEqual(1);
+    const hasShortcuts = results.some(r => r.toLowerCase().includes('keyboard'));
+    expect(hasShortcuts).toBe(true);
+  });
+
+  it('should list all commands grouped by category when input is empty', async () => {
+    await openPalette();
+    const input = await browser.$('[data-testid="palette-input"]');
+    await input.clearValue();
+    await browser.pause(200);
+
+    const itemCount = await browser.execute(() =>
+      document.querySelectorAll('.palette-item').length,
+    );
+    // v3 has 18+ commands
+    expect(itemCount).toBeGreaterThanOrEqual(10);
+
+    // Commands should be organized in groups (categories)
+    const groups = await browser.execute(() => {
+      const headers = document.querySelectorAll('.palette-category');
+      return Array.from(headers).map(h => h.textContent?.trim() ?? '');
+    });
+    // Should have at least 2 command groups
+    expect(groups.length).toBeGreaterThanOrEqual(2);
+  });
+});
+
+// ─── Scenario C2: Search Overlay (Ctrl+Shift+F) ──────────────────────
+
+describe('Scenario C2 — Search Overlay (FTS5)', () => {
+  it('should open search overlay with Ctrl+Shift+F', async () => {
+    await browser.execute(() => document.body.focus());
+    await browser.pause(100);
+    await browser.keys(['Control', 'Shift', 'f']);
+    await browser.pause(500);
+
+    const overlay = await browser.execute(() => {
+      // SearchOverlay uses .search-overlay class
+      const el = document.querySelector('.search-overlay, [data-testid="search-overlay"]');
+      return el !== null;
+    });
+    expect(overlay).toBe(true);
+  });
+
+  it('should have search input focused', async () => {
+    const isFocused = await browser.execute(() => {
+      const input = document.querySelector('.search-overlay input, [data-testid="search-input"]') as HTMLInputElement | null;
+      if (!input) return false;
+      input.focus();
+      return input === document.activeElement;
+    });
+    expect(isFocused).toBe(true);
+  });
+
+  it('should show no results for nonsense query', async () => {
+    await browser.execute(() => {
+      const input = document.querySelector('.search-overlay input, [data-testid="search-input"]') as HTMLInputElement | null;
+      if (input) {
+        input.value = 'zzz_nonexistent_xyz_999';
+        input.dispatchEvent(new Event('input', { bubbles: true }));
+      }
+    });
+    await browser.pause(500); // 300ms debounce + render time
+
+    const resultCount = await browser.execute(() => {
+      const results = document.querySelectorAll('.search-result, .search-result-item');
+      return results.length;
+    });
+    expect(resultCount).toBe(0);
+  });
+
+  it('should close search overlay with Escape', async () => {
+    await browser.keys('Escape');
+    await browser.pause(300);
+
+    const overlay = await browser.execute(() => {
+      const el = document.querySelector('.search-overlay, [data-testid="search-overlay"]');
+      if (!el) return false;
+      const style = window.getComputedStyle(el);
+      return style.display !== 'none' && style.visibility !== 'hidden';
+    });
+    expect(overlay).toBe(false);
+  });
+});
+
+// ─── Scenario C3: Notification Center ─────────────────────────────────
+
+describe('Scenario C3 — Notification Center', () => {
+  it('should render notification bell in status bar', async () => {
+    const hasBell = await browser.execute(() => {
+      const bar = document.querySelector('[data-testid="status-bar"]');
+      // NotificationCenter is in status bar with bell icon
+      const bell = bar?.querySelector('.notification-bell, .bell-icon, [data-testid="notification-bell"]');
+      return bell !== null;
+    });
+    expect(hasBell).toBe(true);
+  });
+
+  it('should open notification panel on bell click', async () => {
+    await browser.execute(() => {
+      const bell = document.querySelector('.notification-bell, .bell-icon, [data-testid="notification-bell"]');
+      if (bell) (bell as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    const panelOpen = await browser.execute(() => {
+      const panel = document.querySelector('.notification-panel, .notification-dropdown, [data-testid="notification-panel"]');
+      if (!panel) return false;
+      const style = window.getComputedStyle(panel);
+      return style.display !== 'none';
+    });
+    expect(panelOpen).toBe(true);
+  });
+
+  it('should show empty state or notification history', async () => {
+    const content = await browser.execute(() => {
+      const panel = document.querySelector('.notification-panel, .notification-dropdown, [data-testid="notification-panel"]');
+      return panel?.textContent ?? '';
+    });
+    // Panel should have some text content (either "No notifications" or actual notifications)
+    expect(content.length).toBeGreaterThan(0);
+  });
+
+  it('should close notification panel on outside click', async () => {
+    // Click the backdrop overlay to close the panel
+    await browser.execute(() => {
+      const backdrop = document.querySelector('.notification-center .backdrop');
+      if (backdrop) (backdrop as HTMLElement).click();
+    });
+    await browser.pause(300);
+
+    const panelOpen = await browser.execute(() => {
+      const panel = document.querySelector('.notification-panel, .notification-dropdown, [data-testid="notification-panel"]');
+      if (!panel) return false;
+      const style = window.getComputedStyle(panel);
+      return style.display !== 'none';
+    });
+    expect(panelOpen).toBe(false);
+  });
+});
+
+// ─── Scenario C4: Keyboard Navigation ────────────────────────────────
+
+describe('Scenario C4 — Keyboard-First Navigation', () => {
+  it('should toggle settings with Ctrl+Comma', async () => {
+    await browser.execute(() => document.body.focus());
+    await browser.pause(100);
+    await browser.keys(['Control', ',']);
+    await browser.pause(500);
+
+    const settingsVisible = await browser.execute(() => {
+      const panel = document.querySelector('.sidebar-panel');
+      if (!panel) return false;
+      const style = window.getComputedStyle(panel);
+      return style.display !== 'none';
+    });
+    expect(settingsVisible).toBe(true);
+
+    // Close it
+    await browser.keys('Escape');
+    await browser.pause(300);
+  });
+
+  it('should toggle sidebar with Ctrl+B', async () => {
+    await browser.execute(() => document.body.focus());
+    await browser.pause(100);
+
+    // First open settings to have sidebar content
+    await browser.keys(['Control', ',']);
+    await browser.pause(300);
+
+    const initialState = await browser.execute(() => {
+      const panel = document.querySelector('.sidebar-panel');
+      return panel !== null && window.getComputedStyle(panel).display !== 'none';
+    });
+
+    // Toggle sidebar
+    await browser.keys(['Control', 'b']);
+    await browser.pause(300);
+
+    const afterToggle = await browser.execute(() => {
+      const panel = document.querySelector('.sidebar-panel');
+      if (!panel) return false;
+      return window.getComputedStyle(panel).display !== 'none';
+    });
+
+    // State should have changed
+    if (initialState) {
+      expect(afterToggle).toBe(false);
+    }
+
+    // Clean up — close sidebar if still open
+    await browser.keys('Escape');
+    await browser.pause(200);
+  });
+
+  it('should focus project with Alt+1', async () => {
+    await browser.execute(() => document.body.focus());
+    await browser.pause(100);
+    await browser.keys(['Alt', '1']);
+    await browser.pause(300);
+
+    const hasActive = await browser.execute(() => {
+      const active = document.querySelector('.project-box.active');
+      return active !== null;
+    });
+    expect(hasActive).toBe(true);
+  });
+});
+
+// ─── Scenario C5: Settings Panel Sections ─────────────────────────────
+
+describe('Scenario C5 — Settings Panel Sections', () => {
+  before(async () => {
+    // Open settings
+    await browser.execute(() => {
+      const btn = document.querySelector('[data-testid="settings-btn"]');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await browser.pause(500);
+  });
+
+  it('should show Appearance section with theme dropdown', async () => {
+    const hasTheme = await browser.execute(() => {
+      const panel = document.querySelector('.sidebar-panel, .settings-tab');
+      if (!panel) return false;
+      const text = panel.textContent ?? '';
+      return text.toLowerCase().includes('theme') || text.toLowerCase().includes('appearance');
+    });
+    expect(hasTheme).toBe(true);
+  });
+
+  it('should show font settings (UI font and Terminal font)', async () => {
+    const hasFonts = await browser.execute(() => {
+      const panel = document.querySelector('.sidebar-panel, .settings-tab');
+      if (!panel) return false;
+      const text = panel.textContent ?? '';
+      return text.toLowerCase().includes('font');
+    });
+    expect(hasFonts).toBe(true);
+  });
+
+  it('should show default shell setting', async () => {
+    const hasShell = await browser.execute(() => {
+      const panel = document.querySelector('.sidebar-panel, .settings-tab');
+      if (!panel) return false;
+      const text = panel.textContent ?? '';
+      return text.toLowerCase().includes('shell');
+    });
+    expect(hasShell).toBe(true);
+  });
+
+  it('should have theme dropdown with 17 themes', async () => {
+    // Click the theme dropdown to see options
+    const themeCount = await browser.execute(() => {
+      // Find the theme dropdown (custom dropdown, not native select)
+      const dropdowns = document.querySelectorAll('.settings-tab .custom-dropdown, .settings-tab .dropdown');
+      for (const dd of dropdowns) {
+        const label = dd.closest('.settings-row, .setting-row')?.textContent ?? '';
+        if (label.toLowerCase().includes('theme')) {
+          // Click to open it
+          const trigger = dd.querySelector('.dropdown-trigger, .dropdown-selected, button');
+          if (trigger) (trigger as HTMLElement).click();
+          return -1; // Flag: opened dropdown
+        }
+      }
+      return 0;
+    });
+
+    if (themeCount === -1) {
+      // Dropdown was opened, wait and count options
+      await browser.pause(300);
+      const optionCount = await browser.execute(() => {
+        const options = document.querySelectorAll('.dropdown-option, .dropdown-item, .theme-option');
+        return options.length;
+      });
+      // Should have 17 themes
+      expect(optionCount).toBeGreaterThanOrEqual(15);
+
+      // Close dropdown
+      await browser.keys('Escape');
+      await browser.pause(200);
+    }
+  });
+
+  after(async () => {
+    await browser.keys('Escape');
+    await browser.pause(300);
+  });
+});
+
+// ─── Scenario C6: Project Health Indicators ───────────────────────────
+
+describe('Scenario C6 — Project Health Indicators', () => {
+  it('should show status dots on project headers', async () => {
+    const hasDots = await browser.execute(() => {
+      const dots = document.querySelectorAll('.project-header .status-dot, .project-header .health-dot');
+      return dots.length;
+    });
+    // At least one project should have a status dot
+    expect(hasDots).toBeGreaterThanOrEqual(1);
+  });
+
+  it('should show idle status when no agents running', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+
+    const dotColor = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      const dot = box?.querySelector('.status-dot, .health-dot');
+      if (!dot) return 'not-found';
+      const style = window.getComputedStyle(dot);
+      return style.backgroundColor || style.color || 'unknown';
+    }, ids[0]);
+
+    // Should have some color value (not 'not-found')
+    expect(dotColor).not.toBe('not-found');
+  });
+
+  it('should show status bar agent counts', async () => {
+    const counts = await browser.execute(() => {
+      const bar = document.querySelector('[data-testid="status-bar"]');
+      if (!bar) return '';
+      // Status bar shows running/idle/stalled counts
+      return bar.textContent ?? '';
+    });
+    // Should contain at least idle count
+    expect(counts).toMatch(/idle|running|stalled|\d/i);
+  });
+});
+
+// ─── Scenario C7: Metrics Tab ─────────────────────────────────────────
+
+describe('Scenario C7 — Metrics Tab', () => {
+  it('should show Metrics tab in project tab bar', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+
+    const hasMetrics = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      const tabs = box?.querySelectorAll('[data-testid="project-tabs"] .ptab');
+      if (!tabs) return false;
+      return Array.from(tabs).some(t => t.textContent?.trim().toLowerCase().includes('metric'));
+    }, ids[0]);
+
+    expect(hasMetrics).toBe(true);
+  });
+
+  it('should render Metrics panel content when tab clicked', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+    const projectId = ids[0];
+
+    // Find and click Metrics tab
+    await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      const tabs = box?.querySelectorAll('[data-testid="project-tabs"] .ptab');
+      if (!tabs) return;
+      for (const tab of tabs) {
+        if (tab.textContent?.trim().toLowerCase().includes('metric')) {
+          (tab as HTMLElement).click();
+          break;
+        }
+      }
+    }, projectId);
+    await browser.pause(500);
+
+    const hasContent = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      // MetricsPanel has live view with fleet stats
+      const panel = box?.querySelector('.metrics-panel, .metrics-tab');
+      return panel !== null;
+    }, projectId);
+
+    expect(hasContent).toBe(true);
+
+    // Switch back to Model tab
+    await switchProjectTab(projectId, 0);
+  });
+});
+
+// ─── Scenario C8: Context Tab ─────────────────────────────────────────
+
+describe('Scenario C8 — Context Tab Visualization', () => {
+  it('should render Context tab with token meter', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+    const projectId = ids[0];
+
+    // Switch to Context tab (index 2)
+    await switchProjectTab(projectId, 2);
+
+    const hasContextUI = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      // ContextTab has stats, token meter, file references
+      const ctx = box?.querySelector('.context-tab, .context-stats, .token-meter, .stat-value');
+      return ctx !== null;
+    }, projectId);
+
+    expect(hasContextUI).toBe(true);
+
+    // Switch back to Model tab
+    await switchProjectTab(projectId, 0);
+  });
+});
+
+// ─── Scenario C9: Files Tab with Editor ───────────────────────────────
+
+describe('Scenario C9 — Files Tab & Code Editor', () => {
+  it('should render Files tab with directory tree', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+    const projectId = ids[0];
+
+    // Switch to Files tab (index 3)
+    await switchProjectTab(projectId, 3);
+    await browser.pause(500);
+
+    const hasTree = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      // FilesTab has a directory tree
+      const tree = box?.querySelector('.file-tree, .directory-tree, .files-tab');
+      return tree !== null;
+    }, projectId);
+
+    expect(hasTree).toBe(true);
+  });
+
+  it('should list files from the project directory', async () => {
+    const ids = await getProjectIds();
+    if (ids.length < 1) return;
+
+    const fileNames = await browser.execute((id) => {
+      const box = document.querySelector(`[data-project-id="${id}"]`);
+      const items = box?.querySelectorAll('.tree-name');
+      return Array.from(items ?? []).map(el => el.textContent?.trim() ?? '');
+    }, ids[0]);
+
+    // Test project has README.md and hello.py
+    const hasFiles = fileNames.some(f =>
+      f.includes('README') || f.includes('hello') || f.includes('.py') || f.includes('.md'),
+    );
+    expect(hasFiles).toBe(true);
+
+    // Switch back to Model tab
+    await switchProjectTab(ids[0], 0);
+  });
+});
+
+// ─── Scenario C10: LLM-Judged Settings Completeness ──────────────────
+
+describe('Scenario C10 — LLM-Judged Settings Completeness', () => {
+  it('should have comprehensive settings panel', async function () {
+    if (!isJudgeAvailable()) {
+      console.log('Skipping — LLM judge not available (no CLI or API key)');
+      this.skip();
+      return;
+    }
+
+    // Open settings
+    await browser.execute(() => {
+      const btn = document.querySelector('[data-testid="settings-btn"]');
+      if (btn) (btn as HTMLElement).click();
+    });
+    await browser.pause(500);
+
+    const settingsContent = await browser.execute(() => {
+      const panel = document.querySelector('.sidebar-panel, .settings-tab');
+      return panel?.textContent ?? '';
+    });
+
+    const verdict = await assertWithJudge(
+      'The settings panel should contain configuration options for: (1) theme/appearance, (2) font settings (UI and terminal), (3) default shell, and optionally (4) provider settings. It should look like a real settings UI, not an error message.',
+      settingsContent,
+      { context: 'BTerminal v3 settings panel with Appearance section (theme dropdown, UI font, terminal font) and Defaults section (shell, CWD). May also have Providers section.' },
+    );
+
+    expect(verdict.pass).toBe(true);
+    if (!verdict.pass) {
+      console.log(`LLM Judge: ${verdict.reasoning} (confidence: ${verdict.confidence})`);
+    }
+
+    await browser.keys('Escape');
+    await browser.pause(300);
+  });
+});
+
+// ─── Scenario C11: LLM-Judged Status Bar ──────────────────────────────
+
+describe('Scenario C11 — LLM-Judged Status Bar Completeness', () => {
+  it('should render a comprehensive status bar', async function () {
+    if (!isJudgeAvailable()) {
+      console.log('Skipping — LLM judge not available (no CLI or API key)');
+      this.skip();
+      return;
+    }
+
+    const statusBarContent = await browser.execute(() => {
+      const bar = document.querySelector('[data-testid="status-bar"]');
+      return bar?.textContent ?? '';
+    });
+
+    const statusBarHtml = await browser.execute(() => {
+      const bar = document.querySelector('[data-testid="status-bar"]');
+      return bar?.innerHTML ?? '';
+    });
+
+    const verdict = await assertWithJudge(
+      'The status bar should display agent fleet information including: agent status counts (idle/running/stalled with numbers), and optionally burn rate ($/hr) and cost tracking. It should look like a real monitoring dashboard status bar.',
+      `Text: ${statusBarContent}\n\nHTML structure: ${statusBarHtml.substring(0, 2000)}`,
+      { context: 'BTerminal Mission Control status bar shows running/idle/stalled agent counts, total $/hr burn rate, attention queue, and total cost.' },
+    );
+
+    expect(verdict.pass).toBe(true);
+    if (!verdict.pass) {
+      console.log(`LLM Judge: ${verdict.reasoning} (confidence: ${verdict.confidence})`);
+    }
+  });
+});
--- a/tests/e2e/tsconfig.json
+++ b/tests/e2e/tsconfig.json
@ -0,0 +1,11 @@
+{
+  "compilerOptions": {
+    "module": "ESNext",
+    "moduleResolution": "bundler",
+    "target": "ESNext",
+    "types": ["@wdio/mocha-framework", "@wdio/globals/types"],
+    "esModuleInterop": true,
+    "skipLibCheck": true
+  },
+  "include": ["specs/**/*.ts", "*.ts"]
+}
--- a/tests/e2e/wdio.conf.js
+++ b/tests/e2e/wdio.conf.js
@ -0,0 +1,213 @@
+import { spawn, execSync } from 'node:child_process';
+import { createConnection } from 'node:net';
+import { resolve, dirname, join } from 'node:path';
+import { fileURLToPath } from 'node:url';
+import { mkdirSync, writeFileSync, rmSync } from 'node:fs';
+import { tmpdir } from 'node:os';
+
+const __dirname = dirname(fileURLToPath(import.meta.url));
+const projectRoot = resolve(__dirname, '../..');
+
+// Debug binary path (built with `cargo tauri build --debug --no-bundle`)
+// Cargo workspace target dir is at v2/target/, not v2/src-tauri/target/
+const tauriBinary = resolve(projectRoot, 'target/debug/bterminal');
+
+let tauriDriver;
+
+// ── Test Fixture (created eagerly so env vars are available for capabilities) ──
+const fixtureRoot = join(tmpdir(), `bterminal-e2e-${Date.now()}`);
+const fixtureDataDir = join(fixtureRoot, 'data');
+const fixtureConfigDir = join(fixtureRoot, 'config');
+const fixtureProjectDir = join(fixtureRoot, 'test-project');
+
+mkdirSync(fixtureDataDir, { recursive: true });
+mkdirSync(fixtureConfigDir, { recursive: true });
+mkdirSync(fixtureProjectDir, { recursive: true });
+
+// Create a minimal git repo for agent testing
+execSync('git init', { cwd: fixtureProjectDir, stdio: 'ignore' });
+execSync('git config user.email "test@bterminal.dev"', { cwd: fixtureProjectDir, stdio: 'ignore' });
+execSync('git config user.name "BTerminal Test"', { cwd: fixtureProjectDir, stdio: 'ignore' });
+writeFileSync(join(fixtureProjectDir, 'README.md'), '# Test Project\n\nA simple test project for BTerminal E2E tests.\n');
+writeFileSync(join(fixtureProjectDir, 'hello.py'), 'def greet(name: str) -> str:\n    return f"Hello, {name}!"\n');
+execSync('git add -A && git commit -m "initial commit"', { cwd: fixtureProjectDir, stdio: 'ignore' });
+
+// Write groups.json with one group containing the test project
+writeFileSync(
+  join(fixtureConfigDir, 'groups.json'),
+  JSON.stringify({
+    version: 1,
+    groups: [{
+      id: 'test-group',
+      name: 'Test Group',
+      projects: [{
+        id: 'test-project',
+        name: 'Test Project',
+        identifier: 'test-project',
+        description: 'E2E test project',
+        icon: '\uf120',
+        cwd: fixtureProjectDir,
+        profile: 'default',
+        enabled: true,
+      }],
+      agents: [],
+    }],
+    activeGroupId: 'test-group',
+  }, null, 2),
+);
+
+// Inject env vars into process.env so tauri-driver inherits them
+// (tauri:options.env may not reliably set process-level env vars)
+process.env.BTERMINAL_TEST = '1';
+process.env.BTERMINAL_TEST_DATA_DIR = fixtureDataDir;
+process.env.BTERMINAL_TEST_CONFIG_DIR = fixtureConfigDir;
+
+console.log(`Test fixture created at ${fixtureRoot}`);
+
+export const config = {
+  // ── Runner ──
+  runner: 'local',
+  maxInstances: 1, // Tauri doesn't support parallel sessions
+
+  // ── Connection (external tauri-driver on port 4444) ──
+  hostname: 'localhost',
+  port: 4444,
+  path: '/',
+
+  // ── Specs ──
+  // Single spec file — Tauri launches one app instance per session,
+  // and tauri-driver can't re-create sessions between spec files.
+  specs: [
+    resolve(__dirname, 'specs/bterminal.test.ts'),
+    resolve(__dirname, 'specs/agent-scenarios.test.ts'),
+    resolve(__dirname, 'specs/phase-b.test.ts'),
+    resolve(__dirname, 'specs/phase-c.test.ts'),
+  ],
+
+  // ── Capabilities ──
+  capabilities: [{
+    // Disable BiDi negotiation — tauri-driver doesn't support webSocketUrl
+    'wdio:enforceWebDriverClassic': true,
+    'tauri:options': {
+      application: tauriBinary,
+      // Test isolation: fixture-created data/config dirs, disable watchers/telemetry
+      env: {
+        BTERMINAL_TEST: '1',
+        BTERMINAL_TEST_DATA_DIR: fixtureDataDir,
+        BTERMINAL_TEST_CONFIG_DIR: fixtureConfigDir,
+      },
+    },
+  }],
+
+  // ── Framework ──
+  framework: 'mocha',
+  mochaOpts: {
+    ui: 'bdd',
+    timeout: 180_000,
+  },
+
+  // ── Reporter ──
+  reporters: ['spec'],
+
+  // ── Logging ──
+  logLevel: 'warn',
+
+  // ── Timeouts ──
+  waitforTimeout: 10_000,
+  connectionRetryTimeout: 30_000,
+  connectionRetryCount: 3,
+
+  // ── Hooks ──
+
+  /**
+   * Build the debug binary before the test run.
+   * Uses --debug --no-bundle for fastest build time.
+   */
+  onPrepare() {
+    if (process.env.SKIP_BUILD) {
+      console.log('SKIP_BUILD set — using existing debug binary.');
+      return Promise.resolve();
+    }
+    return new Promise((resolve, reject) => {
+      console.log('Building Tauri debug binary...');
+      const build = spawn('cargo', ['tauri', 'build', '--debug', '--no-bundle'], {
+        cwd: projectRoot,
+        stdio: 'inherit',
+      });
+      build.on('close', (code) => {
+        if (code === 0) {
+          console.log('Debug binary ready.');
+          resolve();
+        } else {
+          reject(new Error(`Tauri build failed with exit code ${code}`));
+        }
+      });
+      build.on('error', reject);
+    });
+  },
+
+  /**
+   * Spawn tauri-driver before the session.
+   * tauri-driver bridges WebDriver protocol to WebKit2GTK's inspector.
+   * Uses TCP probe to confirm port 4444 is accepting connections.
+   */
+  beforeSession() {
+    return new Promise((res, reject) => {
+      tauriDriver = spawn('tauri-driver', [], {
+        stdio: ['ignore', 'pipe', 'pipe'],
+      });
+
+      tauriDriver.on('error', (err) => {
+        reject(new Error(
+          `Failed to start tauri-driver: ${err.message}. ` +
+          'Install it with: cargo install tauri-driver'
+        ));
+      });
+
+      // TCP readiness probe — poll port 4444 until it accepts a connection
+      const maxWaitMs = 10_000;
+      const intervalMs = 200;
+      const deadline = Date.now() + maxWaitMs;
+
+      function probe() {
+        if (Date.now() > deadline) {
+          reject(new Error('tauri-driver did not become ready within 10s'));
+          return;
+        }
+        const sock = createConnection({ port: 4444, host: 'localhost' }, () => {
+          sock.destroy();
+          res();
+        });
+        sock.on('error', () => {
+          sock.destroy();
+          setTimeout(probe, intervalMs);
+        });
+      }
+
+      // Give it a moment before first probe
+      setTimeout(probe, 300);
+    });
+  },
+
+  /**
+   * Kill tauri-driver after the test run.
+   */
+  afterSession() {
+    if (tauriDriver) {
+      tauriDriver.kill();
+      tauriDriver = null;
+    }
+    // Clean up test fixture
+    try {
+      rmSync(fixtureRoot, { recursive: true, force: true });
+      console.log('Test fixture cleaned up.');
+    } catch { /* best-effort cleanup */ }
+  },
+
+  // ── TypeScript (auto-compile via tsx) ──
+  autoCompileOpts: {
+    tsNodeOpts: {
+      project: resolve(__dirname, 'tsconfig.json'),
+    },
+  },
+};