feat: refactor LLM judge to dual-mode CLI/API and fix config test race
Refactor llm-judge.ts from raw API-only to dual-mode: it tries the CLI first (spawning claude with --output-format text and with CLAUDECODE unset) and falls back to the API. The backend is selectable via the LLM_JUDGE_BACKEND env var.

Fix a pre-existing race condition in the config.rs tests, where parallel test execution let env var mutations interfere with one another, by adding a static Mutex that serializes the env-mutating tests.
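The backend selection described above could look roughly like the sketch below. It is not the code from llm-judge.ts. The function names, the `JudgeEnvironment` shape, and the accepted `LLM_JUDGE_BACKEND` values (`cli` and `api`) are assumptions made for illustration. Only the env var names and the CLI-first, API-fallback order come from the commit message.

```typescript
// Hypothetical sketch: names and accepted values are illustrative,
// not taken from llm-judge.ts.
type JudgeBackend = "cli" | "api";

interface JudgeEnvironment {
  env: Record<string, string | undefined>;
  cliAvailable: boolean; // assumed to come from probing for the `claude` binary
}

// Returns the backends to try, in order. An empty list means "skip the test".
function resolveBackends(e: JudgeEnvironment): JudgeBackend[] {
  const hasApiKey = Boolean(e.env.ANTHROPIC_API_KEY);
  const forced = e.env.LLM_JUDGE_BACKEND; // assumed values: "cli" | "api"

  if (forced === "cli") return e.cliAvailable ? ["cli"] : [];
  if (forced === "api") return hasApiKey ? ["api"] : [];

  // Default: CLI first, API as fallback.
  const order: JudgeBackend[] = [];
  if (e.cliAvailable) order.push("cli");
  if (hasApiKey) order.push("api");
  return order;
}

// Hypothetical counterpart of the guard the specs call before running.
function isJudgeAvailableFor(e: JudgeEnvironment): boolean {
  return resolveBackends(e).length > 0;
}

// Builds the environment for the spawned CLI with CLAUDECODE removed,
// leaving the caller's environment object untouched.
function cliChildEnv(
  env: Record<string, string | undefined>
): Record<string, string | undefined> {
  const child = { ...env };
  delete child.CLAUDECODE;
  return child;
}
```

Keeping the selection logic a pure function of the environment makes it testable without spawning a process or mutating `process.env`, which is the kind of shared state behind the config.rs race.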
parent 9a90c2499a
commit f88d10888b
4 changed files with 169 additions and 42 deletions
@@ -217,7 +217,7 @@ describe('Scenario B3 — Status Bar Fleet State', () => {
 // ─── Scenario B4: LLM-judged agent response (requires API key) ──────

 describe('Scenario B4 — LLM-Judged Agent Response', () => {
-const SKIP_MSG = 'Skipping — ANTHROPIC_API_KEY not set';
+const SKIP_MSG = 'Skipping — LLM judge not available (no CLI or API key)';

 it('should send prompt and get meaningful response', async function () {
 if (!isJudgeAvailable()) {
@@ -297,7 +297,7 @@ describe('Scenario B4 — LLM-Judged Agent Response', () => {
 // ─── Scenario B5: LLM-judged code generation quality ─────────────────

 describe('Scenario B5 — LLM-Judged Code Generation', () => {
-const SKIP_MSG = 'Skipping — ANTHROPIC_API_KEY not set';
+const SKIP_MSG = 'Skipping — LLM judge not available (no CLI or API key)';

 it('should generate valid code when asked', async function () {
 if (!isJudgeAvailable()) {