agents-orchestrator/agent-orchestrator

Author	SHA1	Message	Date
Hibryda	f339c5918d	test: increase WebdriverIO timeout for LLM-judged E2E tests. Increase the global Mocha timeout from 60s to 180s in wdio.conf.js to accommodate longer-running LLM judge tests that evaluate agent responses and code generation. Add explicit per-test overrides for Phase B scenarios B4 and B5 to ensure adequate time for agent startup, execution, and LLM verification. Changes: wdio.conf.js: global timeout 60_000 ms → 180_000 ms; phase-b.test.ts: explicit 180_000 ms timeout for the B4 and B5 scenarios.	2026-03-12 11:10:50 +01:00
Hibryda	f88d10888b	feat: refactor LLM judge to dual-mode CLI/API and fix config test race. Refactor llm-judge.ts from raw API-only to dual-mode: CLI first (spawns claude with --output-format text, unsets CLAUDECODE), then API fallback. The backend is selectable via the LLM_JUDGE_BACKEND env var. Fix a pre-existing race condition in the config.rs tests where parallel test execution caused env var mutations to interfere. Add a static Mutex to serialize env-mutating tests.	2026-03-12 11:10:50 +01:00
Hibryda	1f293083b2	fix(e2e): fix 27 E2E test failures across 3 spec files. Fix stale v2 CSS selectors for the v3 UI, WebKit2GTK keyboard/focus quirks (JS-dispatched KeyboardEvent, programmatic focus check, backdrop click close), conditional render timing (waitUntil for project boxes, null handling for burn-rate/cost elements), and a missing closing > on the AgentPane data-testid div tag.	2026-03-12 11:10:50 +01:00
Hibryda	90c997d3e9	feat(e2e): add Phase B scenarios with LLM-judged assertions and multi-project tests. Add 6 new E2E scenarios in phase-b.test.ts covering multi-project grid rendering, independent tab switching, status bar fleet state, and LLM-judged agent response quality evaluation via the Claude API. Include the llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5, structured verdicts with confidence thresholds).	2026-03-12 11:10:50 +01:00

4 commits
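
Illustrative sketch for f88d10888b: the "CLI first, API fallback" selection described in that commit could look roughly like the following. This is not the repository's llm-judge.ts. The names backendOrder and judgeWithFallback, the Backend and JudgeFn types, and the accepted values "cli" and "api" for LLM_JUDGE_BACKEND are all assumptions made for illustration; the commit message only states that the backend is selectable through that variable. The two backends are passed in as plain functions so the ordering logic can be exercised without spawning a process or calling a network API.

```typescript
// Hypothetical names throughout; not the repository's actual implementation.
type Backend = "cli" | "api";
type JudgeFn = (prompt: string) => Promise<string>;

// Decide which backends to try, and in what order.
// Assumption: an unset or unrecognized value means "CLI first, then API".
function backendOrder(envValue: string | undefined): Backend[] {
  if (envValue === "cli") return ["cli"];
  if (envValue === "api") return ["api"];
  return ["cli", "api"];
}

// Try each backend in order and return the first successful answer,
// remembering why earlier backends failed so the final error is useful.
async function judgeWithFallback(
  prompt: string,
  backends: Record<Backend, JudgeFn>,
  envValue?: string,
): Promise<{ backend: Backend; answer: string }> {
  const errors: string[] = [];
  for (const name of backendOrder(envValue)) {
    try {
      return { backend: name, answer: await backends[name](prompt) };
    } catch (err) {
      errors.push(`${name}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`all judge backends failed (${errors.join("; ")})`);
}
```

In a Node-based test helper the third argument would presumably be process.env.LLM_JUDGE_BACKEND. Forcing a single backend disables the fallback in this sketch, which keeps a misconfigured CLI from being silently masked by the API path.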