feat(e2e): add Phase B scenarios with LLM-judged assertions and multi-project tests

Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid rendering, independent tab switching, status bar fleet state, and LLM-judged evaluation of agent response quality via the Claude API. Includes the llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5, structured verdicts with confidence thresholds).
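
For readers unfamiliar with the "structured verdicts with confidence thresholds" idea, here is a minimal sketch of how such a check might look. It is not the code in llm-judge.ts: the `Verdict` shape, the `parseVerdict` and `accepts` names, and the 0.7 default threshold are all assumptions made for illustration, and the network call to the API is left out entirely.

```typescript
// Hypothetical verdict shape; the real llm-judge.ts may use different fields.
interface Verdict {
  pass: boolean;
  confidence: number; // expected to be in the range 0..1
  reasoning: string;
}

// Pull the outermost JSON object out of the judge's text reply and validate it.
// Judges often wrap JSON in prose, so we slice from the first "{" to the last "}".
function parseVerdict(text: string): Verdict {
  const start = text.indexOf("{");
  const end = text.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in judge reply");
  }
  const raw: unknown = JSON.parse(text.slice(start, end + 1));
  if (typeof raw !== "object" || raw === null) {
    throw new Error("judge reply is not a JSON object");
  }
  const obj = raw as Record<string, unknown>;
  const pass = obj["pass"];
  const confidence = obj["confidence"];
  const reasoning = obj["reasoning"];
  if (typeof pass !== "boolean") {
    throw new Error("verdict.pass must be a boolean");
  }
  if (typeof confidence !== "number" || confidence < 0 || confidence > 1) {
    throw new Error("verdict.confidence must be a number in 0..1");
  }
  return {
    pass,
    confidence,
    reasoning: typeof reasoning === "string" ? reasoning : "",
  };
}

// A verdict counts as accepted only when the judge says "pass" AND is
// at least minConfidence sure. The 0.7 default is an illustrative assumption.
function accepts(verdict: Verdict, minConfidence: number = 0.7): boolean {
  return verdict.pass && verdict.confidence >= minConfidence;
}
```

With this shape, a low-confidence "pass" is treated like a failure, which keeps a flaky judge from silently turning a test green.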
2026-03-12 03:07:38 +01:00 · 2026-03-12 03:07:38 +01:00 · 5e4357e4ac
commit 5e4357e4ac
parent c4c673a4b0
3 changed files with 469 additions and 0 deletions
--- a/v2/tests/e2e/wdio.conf.js
+++ b/v2/tests/e2e/wdio.conf.js
@@ -28,6 +28,7 @@ export const config = {
  specs: [
    resolve(__dirname, 'specs/bterminal.test.ts'),
    resolve(__dirname, 'specs/agent-scenarios.test.ts'),
+    resolve(__dirname, 'specs/phase-b.test.ts'),
  ],

  // ── Capabilities ──