Commit graph

13 commits

Author SHA1 Message Date
Hibryda
a3185656eb feat: refactor LLM judge to dual-mode CLI/API and fix config test race
Refactor llm-judge.ts from raw API-only to dual-mode: CLI first
(spawns claude with --output-format text, unsets CLAUDECODE), API
fallback. Backend selectable via LLM_JUDGE_BACKEND env var.

Fix pre-existing race condition in config.rs tests where parallel
test execution caused env var mutations to interfere. Added static
Mutex to serialize env-mutating tests.
2026-03-12 06:35:04 +01:00
Hibryda
05c9e1abbb test: add Phase C E2E tests and fix pre-existing test failures
- Add phase-c.test.ts: 27 new E2E tests across 11 scenarios covering
  hardening sprint features (command palette, search overlay, notification
  center, keyboard navigation, settings panel, project health, metrics tab,
  context tab, files tab, LLM-judged settings/status bar)
- Fix 3 pre-existing failures in bterminal.test.ts: update stale CSS
  selectors (.group-name → .cmd-label, .palette-item.active → .selected)
- Register phase-c.test.ts in wdio.conf.js specs array
- Update test counts: 444 vitest + 151 cargo + 109 E2E = 704 total
2026-03-12 06:20:21 +01:00
Hibryda
9ce7c35325 fix(e2e): fix 27 E2E test failures across 3 spec files
Fix stale v2 CSS selectors for v3 UI, WebKit2GTK keyboard/focus
quirks (JS-dispatched KeyboardEvent, programmatic focus check,
backdrop click close), conditional render timing (waitUntil for
project boxes, null handling for burn-rate/cost elements), and
AgentPane missing closing > on data-testid div tag.
2026-03-12 03:50:13 +01:00
Hibryda
5e4357e4ac feat(e2e): add Phase B scenarios with LLM-judged assertions and multi-project tests
Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid
rendering, independent tab switching, status bar fleet state, and
LLM-judged agent response quality evaluation via Claude API.
Includes llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5,
structured verdicts with confidence thresholds).
2026-03-12 03:07:38 +01:00
Hibryda
c4c673a4b0 docs: update meta files for E2E testing engine Phase A 2026-03-12 02:52:14 +01:00
Hibryda
c6c38b91c6 feat(e2e): add Phase A scenarios, fixtures, and results store
7 human-authored test scenarios (22 tests) using data-testid
selectors. Test fixture generator for isolated environments.
JSON results store (no native deps). WebDriverIO config updated
with TCP readiness probe and multi-spec support.
2026-03-12 02:52:14 +01:00
Hibryda
b9b5ef9cb3 fix(e2e): scope terminal tab selectors to .tab-bar for reliable matching 2026-03-08 22:42:48 +01:00
Hibryda
2eb323fba8 test(e2e): expand coverage from 25 to 48 tests across 8 describe blocks 2026-03-08 22:27:51 +01:00
Hibryda
4c02b87e33 chore: remove old individual E2E spec files
Consolidated into single bterminal.test.ts (Tauri single-session requirement).
2026-03-08 21:58:28 +01:00
Hibryda
d12cbffda7 fix(e2e): consolidate specs into single file and fix WebDriver click issues
Tauri creates one app session per spec file; multiple files caused
invalid session id on subsequent specs. WebDriver clicks on Svelte 5
components inside scrollable panels dont trigger onclick handlers
via WebKit2GTK/tauri-driver - use browser.execute() JS clicks.
Also removed tauri-plugin-log (redundant with telemetry::init()).
2026-03-08 21:58:23 +01:00
Hibryda
bfbdb2cc18 fix(e2e): resolve wdio v9 BiDi + tauri-driver compatibility issues 2026-03-08 21:32:16 +01:00
Hibryda
3c3a8ab54e test(e2e): scaffold WebdriverIO + tauri-driver E2E testing infrastructure 2026-03-08 21:13:38 +01:00
Hibryda
020dc20d4f test(v2): add integration tests for layout, agent-bridge, and dispatcher
Add 59 new vitest tests: layout.test.ts (30), agent-bridge.test.ts (11),
agent-dispatcher.test.ts (18). Fix unused import in sdk-messages.test.ts.
Add WebDriver E2E scaffold README. Total: 104 vitest + 29 cargo tests.
2026-03-06 15:42:34 +01:00