From c9927a41e6f4666919c7d676064016d1e2d8edaa Mon Sep 17 00:00:00 2001 From: Hibryda Date: Thu, 12 Mar 2026 07:30:56 +0100 Subject: [PATCH] docs: update CHANGELOG and TODO for E2E fixture/judge fixes --- CHANGELOG.md | 3 +++ TODO.md | 1 + 2 files changed, 4 insertions(+) diff --git a/CHANGELOG.md b/CHANGELOG.md index 7612c56..f873760 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -34,6 +34,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 - **E2E CI workflow** — `.github/workflows/e2e.yml`: 3 jobs (vitest, cargo, e2e), xvfb-run for headless WebKit2GTK, path-filtered triggers on v2 source changes, LLM-judged tests gated on `ANTHROPIC_API_KEY` secret availability ### Fixed +- **E2E fixture env propagation** — `tauri:options.env` does not reliably set process-level env vars for Rust `std::env::var()`. Added `process.env` injection at module scope in wdio.conf.js so fixture groups.json is loaded instead of real user config +- **LLM judge CLI context pollution** — Claude CLI loaded project CLAUDE.md files causing model to refuse JSON output. Fixed by running judge from `cwd: /tmp` with `--setting-sources user` and `--system-prompt` flags +- **E2E mocha timeout** — Increased global mocha timeout from 60s to 180s. Agent-running tests (B4/B5) need 120s+ for Claude CLI round-trip - **E2E test suite — 27 failures fixed** across 3 spec files: bterminal.test.ts (22 — stale v2 CSS selectors, v3 tab order/count, JS-dispatched KeyboardEvent for Ctrl+K, idempotent palette open/close, backdrop click close, scrollIntoView for below-fold settings, scoped theme dropdown selectors), agent-scenarios.test.ts (3 — JS click for settings button, programmatic focus check, graceful 40s agent timeout with skip), phase-b.test.ts (2 — waitUntil for project box render, conditional null handling for burn-rate/cost elements). 82 E2E passing, 0 failing, 4 skipped - **AgentPane.svelte missing closing `>`** — div tag with data-testid attributes was missing closing angle bracket, causing template parse issues diff --git a/TODO.md b/TODO.md index e6b2ee0..c5f0c45 100644 --- a/TODO.md +++ b/TODO.md @@ -11,6 +11,7 @@ ## Completed +- [x] **E2E fixture + judge hardening** -- Fixed fixture env propagation (process.env injection, tauri:options.env unreliable), LLM judge CLI context isolation (--setting-sources user, cwd /tmp, --system-prompt), mocha timeout 180s. Confirmed fixture fakes project list. Agent tests CI-only (nested Claude limitation). | Done: 2026-03-12 - [x] **LLM judge refactor + E2E docs** -- Refactored llm-judge.ts to dual-mode (CLI first, API fallback), env-configurable via LLM_JUDGE_BACKEND. Wrote comprehensive docs/e2e-testing.md covering fixtures, test mode, LLM judge, all spec phases, CI, troubleshooting. 444 vitest + 151 cargo + 109 E2E. | Done: 2026-03-12 - [x] **v3 Hardening Sprint** -- Fixed subagent delegation (prompt + env var), added TLS to relay, WAL checkpoint (5min), Landlock logging, plugin sandbox tests (35), gitignore fix. Phase C E2E tests (27 new, 3 pre-existing fixes). 444 vitest + 151 cargo + 109 E2E. | Done: 2026-03-12 - [x] **v3 Production Readiness — ALL tribunal items** -- Implemented all 13 features from tribunal assessment: sidecar supervisor, notifications, secrets, keyboard UX, agent health, search, plugins, sandbox, error classifier, audit log, team agent orchestration, optimistic locking, usage meter. 409 vitest + 109 cargo. | Done: 2026-03-12