Increase global mocha timeout from 60s to 180s in wdio.conf.js to accommodate longer-running LLM judge tests that evaluate agent responses and code generation. Add explicit per-test overrides for Phase B scenarios B4 and B5 to ensure adequate time for agent startup, execution, and LLM verification.
- wdio.conf.js: global timeout 60_000 → 180_000ms
- phase-b.test.ts: explicit 180_000ms timeout for B4 and B5 scenarios
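
The two timeout changes above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of wdio.conf.js or phase-b.test.ts: the constant name `LLM_JUDGE_TIMEOUT_MS`, the `TimeoutContext` interface, and the `allowJudgeTime` helper are assumptions introduced here. Only `mochaOpts.timeout` and Mocha's `this.timeout(ms)` are real framework settings.

```typescript
// Shared budget for tests that wait on agent startup, execution,
// and an LLM verdict. The name is hypothetical.
export const LLM_JUDGE_TIMEOUT_MS = 180_000;

// Excerpt of what wdio.conf.js might export; all other options omitted.
export const config = {
  framework: "mocha",
  mochaOpts: {
    ui: "bdd",
    timeout: LLM_JUDGE_TIMEOUT_MS, // previously 60_000
  },
};

// Minimal view of Mocha's test context: `this.timeout(ms)` is the
// Mocha call that overrides the timeout for the current test.
interface TimeoutContext {
  timeout(ms: number): unknown;
}

// Hypothetical helper that pins the timeout explicitly for one test,
// so the test keeps its budget even if the global default changes.
export function allowJudgeTime(ctx: TimeoutContext): void {
  ctx.timeout(LLM_JUDGE_TIMEOUT_MS);
}
```

Inside a scenario such as B4, the helper would be called as `allowJudgeTime(this)` from a callback written as `async function () { ... }`. An arrow function would not receive Mocha's context as `this`.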
Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid
rendering, independent tab switching, status bar fleet state, and
LLM-judged agent response quality evaluation via Claude API.
Includes llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5,
structured verdicts with confidence thresholds).
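
A judge helper along the lines described above might look like the sketch below. The endpoint and headers are those of the public Anthropic Messages API. The `Verdict` shape, the function names, and the 0.7 default threshold are assumptions made for illustration and are not taken from llm-judge.ts. The model identifier is left to the caller.

```typescript
// Hypothetical verdict shape; the real llm-judge.ts may differ.
export interface Verdict {
  pass: boolean;
  confidence: number; // assumed to be in the range 0..1
  reasoning: string;
}

// Parse the model's reply, which is assumed to be a bare JSON object.
export function parseVerdict(raw: string): Verdict {
  const data = JSON.parse(raw) as Partial<Verdict>;
  if (typeof data.pass !== "boolean" || typeof data.confidence !== "number") {
    throw new Error("malformed verdict");
  }
  return {
    pass: data.pass,
    confidence: data.confidence,
    reasoning: String(data.reasoning ?? ""),
  };
}

// A verdict counts only if it passes AND meets the confidence threshold.
export function accepts(verdict: Verdict, minConfidence = 0.7): boolean {
  return verdict.pass && verdict.confidence >= minConfidence;
}

// Raw fetch against the Messages API, with no SDK dependency.
export async function judge(
  apiKey: string,
  model: string,
  prompt: string,
): Promise<Verdict> {
  const res = await fetch("https://api.anthropic.com/v1/messages", {
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    body: JSON.stringify({
      model,
      max_tokens: 512,
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Anthropic API returned ${res.status}`);
  const body = (await res.json()) as {
    content: Array<{ type: string; text?: string }>;
  };
  // Take the first text block of the reply and parse it as the verdict.
  const text = body.content.find((block) => block.type === "text")?.text ?? "";
  return parseVerdict(text);
}
```

Keeping the parsing and threshold check separate from the network call means the acceptance logic can be unit-tested without an API key.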
7 human-authored test scenarios (22 tests) using data-testid
selectors. Test fixture generator for isolated environments.
JSON results store (no native deps). WebDriverIO config updated
with TCP readiness probe and multi-spec support.
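
One plausible form of the TCP readiness probe mentioned above is a retry loop around a plain socket connect, as sketched below. The function name `waitForPort` and its default values are assumptions. The actual probe in the WebDriverIO config may be structured differently.

```typescript
import { createConnection } from "node:net";

// Resolve once something accepts TCP connections on host:port;
// reject if the port is still closed when the deadline passes.
export function waitForPort(
  port: number,
  host = "127.0.0.1",
  timeoutMs = 10_000,
  intervalMs = 200,
): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const attempt = (): void => {
      const socket = createConnection({ port, host });
      socket.once("connect", () => {
        socket.destroy(); // only reachability matters, not the stream
        resolve();
      });
      socket.once("error", () => {
        socket.destroy();
        if (Date.now() >= deadline) {
          reject(new Error(`port ${port} not ready after ${timeoutMs}ms`));
        } else {
          setTimeout(attempt, intervalMs); // retry after a short pause
        }
      });
    };
    attempt();
  });
}
```

A config hook could await this before the first session is requested, so that specs do not start while the driver is still booting. The port to probe depends on how the driver is launched.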
Tauri creates one app session per spec file; running multiple spec
files caused "invalid session id" errors on subsequent specs.
WebDriver clicks on Svelte 5 components inside scrollable panels
don't trigger onclick handlers via WebKit2GTK/tauri-driver; use
browser.execute() JS clicks instead.
Also removed tauri-plugin-log (redundant with telemetry::init()).
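
The JS-click workaround described above could be wrapped in a small helper like the one below. This is a sketch under assumptions: `ScriptRunner` is a hypothetical minimal interface standing in for the WebdriverIO `browser` object, and `jsClick` and `testIdSelector` are illustrative names. `browser.execute()` itself is the real WebdriverIO call that runs a function inside the page.

```typescript
// Minimal stand-in for the part of WebdriverIO's `browser` used here.
interface ScriptRunner {
  execute<T>(script: (...args: any[]) => T, ...args: unknown[]): Promise<T>;
}

// Build a CSS selector for a data-testid attribute.
export function testIdSelector(testId: string): string {
  return `[data-testid="${testId}"]`;
}

// Click via the DOM inside the page instead of a WebDriver pointer
// action. Resolves to false if no element matches the selector.
export async function jsClick(
  browser: ScriptRunner,
  testId: string,
): Promise<boolean> {
  return browser.execute((selector: string) => {
    // This callback runs in the webview, where `document` exists.
    const el = (globalThis as any).document.querySelector(selector);
    if (!el) return false;
    el.click();
    return true;
  }, testIdSelector(testId));
}
```

Returning a boolean lets a test assert that the element was actually found, since a DOM click on a missing element would otherwise fail silently.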