agents-orchestrator/agent-orchestrator

Commit graph

Author	SHA1	Message	Date
Hibryda	a49c2010c7	fix: LLM judge CLI context isolation (--setting-sources user, cwd /tmp)	2026-03-12 11:10:50 +01:00
Hibryda	f88d10888b	feat: refactor LLM judge to dual-mode CLI/API and fix config test race. Refactor llm-judge.ts from raw API-only to dual-mode: CLI first (spawns claude with --output-format text, unsets CLAUDECODE), API fallback. Backend selectable via LLM_JUDGE_BACKEND env var. Fix pre-existing race condition in config.rs tests where parallel test execution caused env var mutations to interfere. Added static Mutex to serialize env-mutating tests.	2026-03-12 11:10:50 +01:00
Hibryda	90c997d3e9	feat(e2e): add Phase B scenarios with LLM-judged assertions and multi-project tests. Adds 6 new E2E scenarios in phase-b.test.ts covering multi-project grid rendering, independent tab switching, status bar fleet state, and LLM-judged agent response quality evaluation via Claude API. Includes llm-judge.ts helper (raw Anthropic API fetch, haiku-4-5, structured verdicts with confidence thresholds).	2026-03-12 11:10:50 +01:00

3 commits
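
Commit f88d10888b describes a judge that tries the CLI first and falls back to the API, with the backend selectable via LLM_JUDGE_BACKEND. The following is a minimal sketch of that selection-and-fallback logic only, not the contents of llm-judge.ts. The type names, function names, verdict fields, and the accepted values "cli" and "api" are all assumptions made for illustration; the real backends (spawning claude, calling the Anthropic API) are passed in as plain functions so the sketch stays self-contained.

```typescript
// Hypothetical names throughout: none of these identifiers are taken from llm-judge.ts.
type Verdict = { pass: boolean; confidence: number; reasoning: string };
type Backend = (prompt: string) => Promise<Verdict>;
type BackendName = "cli" | "api";

// Decide which backends to try, in order.
// Assumption: LLM_JUDGE_BACKEND accepts "cli" or "api"; any other value
// (including unset) means "CLI first, then API".
function backendOrder(envValue: string | undefined): BackendName[] {
  if (envValue === "cli") return ["cli"];
  if (envValue === "api") return ["api"];
  return ["cli", "api"];
}

// Try each selected backend in turn and return the first verdict obtained.
// If every backend throws, rethrow the last error.
async function judge(
  prompt: string,
  backends: Record<BackendName, Backend>,
  envValue?: string,
): Promise<Verdict> {
  let lastError: unknown = new Error("no backend configured");
  for (const name of backendOrder(envValue)) {
    try {
      return await backends[name](prompt);
    } catch (err) {
      lastError = err; // remember the failure and fall through to the next backend
    }
  }
  throw lastError;
}
```

In a test helper, a caller would presumably pass `process.env.LLM_JUDGE_BACKEND` as `envValue` and supply one function that wraps the CLI invocation and one that wraps the API request. Injecting the backends also lets the fallback path be exercised without a network connection or an installed CLI.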