feat(e2e): add Phase B scenarios with LLM-judged assertions and multi-project tests
Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid rendering, independent tab switching, status bar fleet state, and LLM-judged agent response quality evaluation via Claude API. Includes llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5, structured verdicts with confidence thresholds).
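The judge helper described above could look roughly like the following. This is a minimal sketch, not the contents of llm-judge.ts: the names `Verdict`, `parseVerdict`, `meetsThreshold`, and `judge`, the prompt wording, the 0.7 default threshold, and the `claude-haiku-4-5` model alias are all assumptions made for illustration. Only the endpoint and header names come from the public Anthropic Messages API.

```typescript
// Hypothetical sketch of an LLM-judge helper. All names, the prompt, and the
// default threshold are illustrative assumptions, not taken from llm-judge.ts.

interface Verdict {
  pass: boolean;
  confidence: number; // clamped to the range 0..1
  reasoning: string;
}

const API_URL = 'https://api.anthropic.com/v1/messages';

/** Parse the span from the first '{' to the last '}' of the reply as a verdict. */
function parseVerdict(text: string): Verdict {
  const match = text.match(/\{[\s\S]*\}/);
  if (!match) throw new Error('No JSON object found in judge reply');
  const raw = JSON.parse(match[0]) as Record<string, unknown>;
  const pass = raw['pass'];
  const confidence = raw['confidence'];
  const reasoning = raw['reasoning'];
  if (typeof pass !== 'boolean' || typeof confidence !== 'number') {
    throw new Error('Judge reply is missing "pass" or "confidence"');
  }
  return {
    pass,
    confidence: Math.min(1, Math.max(0, confidence)),
    reasoning: typeof reasoning === 'string' ? reasoning : '',
  };
}

/** A verdict counts only if it passes AND the judge is confident enough. */
function meetsThreshold(verdict: Verdict, minConfidence = 0.7): boolean {
  return verdict.pass && verdict.confidence >= minConfidence;
}

/** Ask the model to judge `actual` against `criteria` via a raw fetch call. */
async function judge(
  criteria: string,
  actual: string,
  apiKey: string,
  model = 'claude-haiku-4-5', // assumed alias for the "haiku-4-5" model
): Promise<Verdict> {
  const prompt =
    `Criteria:\n${criteria}\n\nActual output:\n${actual}\n\n` +
    'Reply with JSON only: {"pass": boolean, "confidence": number 0-1, "reasoning": string}';
  const res = await fetch(API_URL, {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      'x-api-key': apiKey,
      'anthropic-version': '2023-06-01',
    },
    body: JSON.stringify({
      model,
      max_tokens: 256,
      messages: [{ role: 'user', content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Anthropic API returned HTTP ${res.status}`);
  const data = (await res.json()) as { content: Array<{ type: string; text?: string }> };
  const text = data.content.find((block) => block.type === 'text')?.text ?? '';
  return parseVerdict(text);
}
```

In a scenario, an assertion might then read `expect(meetsThreshold(await judge(criteria, reply, key))).toBe(true)`. Keeping the parsing and threshold logic separate from the network call lets those parts be unit-tested without an API key.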
parent c4c673a4b0
commit 5e4357e4ac
3 changed files with 469 additions and 0 deletions
@@ -28,6 +28,7 @@ export const config = {
   specs: [
     resolve(__dirname, 'specs/bterminal.test.ts'),
     resolve(__dirname, 'specs/agent-scenarios.test.ts'),
+    resolve(__dirname, 'specs/phase-b.test.ts'),
   ],
 
   // ── Capabilities ──