feat: refactor LLM judge to dual-mode CLI/API and fix config test race

Refactor llm-judge.ts from API-only to dual-mode: try the CLI first (spawns claude with --output-format text and CLAUDECODE unset), then fall back to the API. The backend is selectable via the LLM_JUDGE_BACKEND env var.

Fix a pre-existing race condition in the config.rs tests, where parallel test execution let env var mutations in one test interfere with others. Add a static Mutex to serialize the env-mutating tests.
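The CLI-first/API-fallback selection described above might look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the code in llm-judge.ts. The function names, the `auto`/`cli`/`api` values for LLM_JUDGE_BACKEND, the rule that an explicit choice disables the fallback, and passing the prompt with `-p` are all hypothetical. Only the CLI-first order, `--output-format text`, unsetting CLAUDECODE, and the ANTHROPIC_API_KEY requirement for the API path come from this change.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical sketch: these names and the "auto" | "cli" | "api" values for
// LLM_JUDGE_BACKEND are assumptions, not the real llm-judge.ts API.
type JudgeBackend = "cli" | "api";
type Env = Record<string, string | undefined>;

// Read LLM_JUDGE_BACKEND; anything unset or unrecognized means "auto".
function backendPreference(env: Env): "auto" | JudgeBackend {
  const raw = (env.LLM_JUDGE_BACKEND ?? "").trim().toLowerCase();
  if (raw === "cli") return "cli";
  if (raw === "api") return "api";
  return "auto";
}

// "auto": CLI first, API fallback. An explicit choice disables the fallback
// (an assumption). null means no judge backend is usable.
function resolveBackend(env: Env, cliAvailable: boolean): JudgeBackend | null {
  const hasApiKey = Boolean(env.ANTHROPIC_API_KEY);
  switch (backendPreference(env)) {
    case "cli":
      return cliAvailable ? "cli" : null;
    case "api":
      return hasApiKey ? "api" : null;
    default:
      if (cliAvailable) return "cli";
      return hasApiKey ? "api" : null;
  }
}

// Parameterized stand-in for an availability check like isJudgeAvailable().
function judgeAvailableFor(env: Env, cliAvailable: boolean): boolean {
  return resolveBackend(env, cliAvailable) !== null;
}

// Child-process invocation for the CLI backend: plain-text output, with
// CLAUDECODE removed from the child's environment (the caller's env object is
// not mutated). Passing the prompt via -p (print mode) is an assumption.
function buildCliInvocation(prompt: string, env: Env) {
  const childEnv: Env = { ...env };
  delete childEnv.CLAUDECODE;
  return {
    command: "claude",
    args: ["-p", prompt, "--output-format", "text"],
    env: childEnv,
  };
}

// Probe for the CLI; if the binary is missing, spawnSync reports it via `error`.
function detectCli(): boolean {
  const probe = spawnSync("claude", ["--version"], { stdio: "ignore" });
  return probe.error === undefined && probe.status === 0;
}
```

In this sketch, `resolveBackend(process.env, detectCli())` returning `null` is the situation the e2e specs report as "no CLI or API key". Copying the environment before deleting CLAUDECODE keeps the test runner's own `process.env` intact.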
2026-03-12 06:35:04 +01:00 · 2026-03-12 06:35:04 +01:00 · f88d10888b
commit f88d10888b
parent 9a90c2499a
4 changed files with 169 additions and 42 deletions
--- a/v2/tests/e2e/specs/phase-b.test.ts
+++ b/v2/tests/e2e/specs/phase-b.test.ts
@@ -217,7 +217,7 @@ describe('Scenario B3 — Status Bar Fleet State', () => {
 // ─── Scenario B4: LLM-judged agent response (requires API key) ──────

 describe('Scenario B4 — LLM-Judged Agent Response', () => {
-  const SKIP_MSG = 'Skipping — ANTHROPIC_API_KEY not set';
+  const SKIP_MSG = 'Skipping — LLM judge not available (no CLI or API key)';

  it('should send prompt and get meaningful response', async function () {
    if (!isJudgeAvailable()) {
@@ -297,7 +297,7 @@ describe('Scenario B4 — LLM-Judged Agent Response', () => {
 // ─── Scenario B5: LLM-judged code generation quality ─────────────────

 describe('Scenario B5 — LLM-Judged Code Generation', () => {
-  const SKIP_MSG = 'Skipping — ANTHROPIC_API_KEY not set';
+  const SKIP_MSG = 'Skipping — LLM judge not available (no CLI or API key)';

  it('should generate valid code when asked', async function () {
    if (!isJudgeAvailable()) {