affinity-intelligence-rework/im2be-qa-rig

BPMN-style live-execution smoke runner for aim2be (fresh-tree fork patterns from ~/code/vioxen/qa-rig per rule 53)

Latest commit: hibryda 4e84e5f46e, 2026-05-25 02:55:37 +02:00
CI status: qa-rig CI / pytest (push) succeeded in 8s; qa-rig CI / ruff (push) failed after 3s; qa-rig CI / live-k3d smoke (push) was skipped.
feat(qa-rig): L0 T0 #6 PR-OPAQUE-6 — Centrifugo opaque-ticket E2E flows (happy path + single-use rejection) (#5)
Two new SCAFFOLD flows under qa-rig discipline: 21-centrifugo-opaque-ticket-happy-path (login → mint → connect → ticket-consumed) + 22-centrifugo-opaque-ticket-single-use-rejection (with Prometheus delta assertion). BPMN sidecars + drift-detector test. 138 vitest cases. Reused adult-1 mock-data persona per rule 55. R-cycle: R1 BLOCKED → R2 CONDITIONAL (4 findings) → R3 CONDITIONAL (3 findings) → R4 CONDITIONAL_APPROVE quorum (1 minor, reviewer-deferred to carve-out PR). All carve-out atomicity requirements tracked via TOML NOTE + TODO followup.
.forgejo/workflows	feat(M2/tier-2): pytest suite + Forgejo Actions CI workflow	2026-05-14 00:47:30 +02:00
deploy/systemd	chore(deploy): source LLM keys from shared ~/.config/claude/llm-keys.env	2026-05-14 01:40:05 +02:00
flows	fix(qa-rig): apply PR #5 R2 reviewer findings (4 surgical fixes)	2026-05-25 02:38:41 +02:00
harvested	feat(M4/tier-4): harvest + infer (Claude Agent SDK opt-in)	2026-05-14 00:58:56 +02:00
runs	feat(M1): TIER-1 BPMN QA rig — Python smoke runner against aim2be	2026-05-14 00:36:16 +02:00
src/im2be_qa_rig	fix(qa-rig): apply PR #5 R3 reviewer findings (2 MINOR + 1 INFO)	2026-05-25 02:48:31 +02:00
tests	fix(qa-rig): apply PR #5 R3 reviewer findings (2 MINOR + 1 INFO)	2026-05-25 02:48:31 +02:00
web	feat(M3/tier-3): BPMN 2.0 graph export + bpmn-js viewer	2026-05-14 00:52:06 +02:00
.gitignore	feat(M5/tier-5): scheduler + sentinels + systemd unit + README sweep	2026-05-14 01:02:50 +02:00
CLAUDE.md	feat(M1): TIER-1 BPMN QA rig — Python smoke runner against aim2be	2026-05-14 00:36:16 +02:00
pyproject.toml	feat(qa-rig): add OpenTelemetry SDK + OTLP gRPC bootstrap (opt-in)	2026-05-23 21:52:40 +02:00
README.md	feat(qa-rig): L0 T0 #6 PR-OPAQUE-6 — Centrifugo opaque-ticket E2E flows (happy path + single-use rejection)	2026-05-25 02:09:15 +02:00
run-set.example.toml	feat(parallel): TIER 6 parallel-flow dispatcher — S7 spike drop-in	2026-05-16 00:05:30 +02:00
schedule.toml.example	feat(M5/tier-5): scheduler + sentinels + systemd unit + README sweep	2026-05-14 01:02:50 +02:00
uv.lock	feat(qa-rig): add OpenTelemetry SDK + OTLP gRPC bootstrap (opt-in)	2026-05-23 21:52:40 +02:00

README.md

im2be-qa-rig

BPMN-style live-execution smoke runner for the aim2be platform. Driven from im2be-mono; outputs land in runs/<run-id>/ per invocation.

Forked patterns from ~/code/vioxen/qa-rig per rule 53 — fresh tree, no git-clone-rebrand. We borrow the naming, the TOML schema, and the assertion language; every source file in this repo was authored fresh.

Status

M1 — TIER 1, shipped 2026-05-13. Python smoke runner. One flow (flows/01-adult-1-home-family-diary.toml) drives the PWA at localhost:9620 with adult-1's mock-data storage state injected, walks through home/family/diary, captures per-step screenshots + console messages, exits 0 if all assertions pass.
M2 — TIER 2, shipped 2026-05-13. pytest suite (62 cases across spec / storage-state / harvest / infer / bpmn / scheduler) + Forgejo Actions workflow (.forgejo/workflows/qa-rig-ci.yml) with lint + test + gated live-k3d smoke jobs on the aim2be-rework runner.
M3 — TIER 3, shipped 2026-05-13. im2be-qa-rig graph emits BPMN 2.0 XML for any FlowSpec; optional run-report overlay colours tasks pass/fail via bioc namespace. Vite + bpmn-js viewer under web/ on port 9710.
M4 — TIER 4, shipped 2026-05-13. im2be-qa-rig harvest <sub> reads <meta-repo>/code-intelligence/<sub>/ and emits a REST/Kafka/WebSocket inventory. im2be-qa-rig infer <sub> drafts a FlowSpec TOML — either via the Claude Agent SDK ([triage] extra) or the deterministic stub mode.
M5 — TIER 5, shipped 2026-05-13. im2be-qa-rig schedule runs a config-driven scheduler with two job kinds: flow (subprocess invoking the smoke runner) and sentinel (HTTP probe with status-range assertions). Heartbeat at runs/.heartbeat.json after every cycle. Systemd-user unit at deploy/systemd/im2be-qa-rig.service.
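The sentinel job kind described for M5 boils down to an HTTP status compared against an allowed range, with a heartbeat written once the cycle completes. A minimal sketch of that logic follows. The names `StatusRange`, `check_status`, `heartbeat_payload`, the job names, and every heartbeat field are illustrative assumptions, not the rig's actual schema, and the HTTP request itself is left out so the example runs offline.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class StatusRange:
    """Inclusive HTTP status range (hypothetical shape, not the rig's schema)."""

    low: int
    high: int

    def contains(self, status: int) -> bool:
        return self.low <= status <= self.high


def check_status(status: int, allowed: list[StatusRange]) -> bool:
    """Return True when the observed status falls inside any allowed range."""
    return any(r.contains(status) for r in allowed)


def heartbeat_payload(cycle: int, results: dict[str, bool]) -> str:
    """Serialise one scheduler cycle as JSON (field names are assumptions)."""
    return json.dumps(
        {
            "cycle": cycle,
            "written_at": datetime.now(timezone.utc).isoformat(),
            "all_passed": all(results.values()),
            "jobs": results,
        },
        sort_keys=True,
    )


# Two pretend probe results: one healthy, one returning 503.
allowed = [StatusRange(200, 299)]
results = {
    "pwa-root": check_status(200, allowed),
    "identity": check_status(503, allowed),
}
payload = json.loads(heartbeat_payload(1, results))
```

In a real scheduler the payload would be written to `runs/.heartbeat.json` after every cycle, as the M5 entry states; the sketch stops at building the JSON string.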

See ~/code/vioxen/qa-rig for the upstream M1-M26 reference (read-only — DO NOT modify).

Path conventions (mock-data)

FlowSpec TOML files reference the meta-repo's mock-data/ tree via paths like ../../mock-data/pwa/playwright-storage/adult-1.json. These are meta-repo-relative: this rig is always consumed as a submodule of im2be-mono, so <meta-repo>/im2be-qa-rig/flows/<x>.toml resolves ../../mock-data/... to <meta-repo>/mock-data/.... That layout is documented in <meta-repo>/CLAUDE.md and <meta-repo>/.claude/rules/55-mock-data-discipline.md.

A standalone clone of im2be-qa-rig (without the meta-repo around it) will NOT find mock-data/ and the flows will fail at the storage-state load step. This is intentional — qa-rig has no source of truth for mock data outside the meta-repo's deterministic-persona set. If you need to run the rig fully standalone, mount the meta-repo's mock-data/ at ../mock-data relative to this directory (e.g. via symlink) and patch the flows; do NOT copy mock-data into this repo (it has its own commit history under the meta-repo's mock-data/ source).
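The resolution rule above can be checked with a few lines of path arithmetic. This sketch assumes, as the text describes, that a relative `path` is resolved against the directory holding the flow file; the helper name `resolve_from_flow` and the checkout locations `/work/im2be-mono` and `/tmp/im2be-qa-rig` are hypothetical.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_from_flow(flow_file: str, relative: str) -> str:
    """Resolve `relative` against the directory that contains `flow_file`."""
    flow_dir = PurePosixPath(flow_file).parent
    return posixpath.normpath(str(flow_dir / relative))


STORAGE = "../../mock-data/pwa/playwright-storage/adult-1.json"

# Submodule layout: the flow sits two levels below the meta-repo root.
in_meta_repo = resolve_from_flow(
    "/work/im2be-mono/im2be-qa-rig/flows/01-adult-1-home-family-diary.toml",
    STORAGE,
)

# Standalone clone: the same relative path now points next to the clone.
standalone = resolve_from_flow(
    "/tmp/im2be-qa-rig/flows/01-adult-1-home-family-diary.toml",
    STORAGE,
)
```

Here `in_meta_repo` lands in `<meta-repo>/mock-data/`, while `standalone` lands in a `mock-data/` directory beside the clone, which is where a symlink would have to sit.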

Prerequisites

The aim2be Stage A.3 demo environment must be up (run scripts/stage-a-demo-up.sh from the meta-repo). The first flow's [setup] invokes this script automatically; pass --skip-setup if the environment is already running.
Python ≥ 3.11.
uv ≥ 0.9 (for installation).

Installation

From inside this repo:

uv sync
uv run playwright install chromium   # one-time Playwright browser download

Running the smoke flow

# From the im2be-mono meta-repo root:
cd im2be-qa-rig

# Run the first flow (assumes Stage A.3 demo is up via scripts/stage-a-demo-up.sh)
uv run im2be-qa-rig --spec flows/01-adult-1-home-family-diary.toml --skip-setup

# Or let the runner bring up the demo itself (~1 min):
uv run im2be-qa-rig --spec flows/01-adult-1-home-family-diary.toml

# Visible browser for local debugging:
uv run im2be-qa-rig --spec flows/01-adult-1-home-family-diary.toml --skip-setup --headed

A successful run produces runs/<run-id>/report.json + runs/<run-id>/screenshots/NN-step.png and exits 0. A failed step fails fast (subsequent steps are not run) and the runner exits 1.
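The fail-fast behaviour described above can be pictured as a loop that stops at the first failing step and maps the outcome to the exit code. This is a sketch under assumptions: `StepOutcome` and `run_fail_fast` are hypothetical names, and plain callables stand in for the Playwright actions the real runner executes.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class StepOutcome:
    name: str
    passed: bool
    error: str | None = None


def run_fail_fast(
    steps: list[tuple[str, Callable[[], None]]],
) -> tuple[list[StepOutcome], int]:
    """Run steps in order; stop at the first failure and return exit code 1."""
    outcomes: list[StepOutcome] = []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # any step error ends the flow
            outcomes.append(StepOutcome(name, False, str(exc)))
            return outcomes, 1
        outcomes.append(StepOutcome(name, True))
    return outcomes, 0


def ok() -> None:
    """Stand-in for a step that succeeds."""


def boom() -> None:
    """Stand-in for a failing assertion step."""
    raise AssertionError("expected text not found")


outcomes, exit_code = run_fail_fast(
    [("open-home", ok), ("check-diary", boom), ("final-screenshot", ok)]
)
```

The third step never runs, so `outcomes` holds two entries and `exit_code` is 1.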

report.json schema

| Field | Type | Description |
|---|---|---|
| `spec_path` | string | Absolute path to the FlowSpec TOML file. |
| `run_id` | string | 12-char hex run identifier (directory name under `runs/`). |
| `started_at` | ISO 8601 string | UTC timestamp when the browser flow began. |
| `finished_at` | ISO 8601 string | UTC timestamp when the browser flow completed. |
| `base_url` | string \| null | Effective base URL (from FlowSpec or `--base-url` override). |
| `passed` | boolean | `true` iff all steps ran and all assertions passed. |
| `trace_id` | string | 32-char lowercase hex OpenTelemetry trace ID for the `qa_rig.flow` span. All-zeros (`"00000000000000000000000000000000"`) when `QA_RIG_OTEL_ENABLED` is unset (local dev without a collector). Use this to correlate evidence in the `runs/` directory with traces in Grafana/Tempo. |
| `steps[]` | array | Per-step evidence (see below). |
| `steps[].name` | string | Step name from the FlowSpec. |
| `steps[].action` | string | Step action (e.g. `navigate`, `expect_text`). |
| `steps[].passed` | boolean | `true` iff the step assertion succeeded. |
| `steps[].duration_ms` | integer | Wall-clock step duration in milliseconds. |
| `steps[].error` | string \| null | Error message when `passed=false`, otherwise null. |
| `steps[].screenshot_path` | string \| null | Relative path to the step screenshot under `runs/<run-id>/screenshots/`. |
| `steps[].console_slice` | array | Browser console messages emitted during this step. |
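A consumer of `report.json` typically wants the first failing step and whether the run was traced. The sketch below works on an in-memory dict shaped like the schema above; the sample values and the helper name `summarise` are illustrative assumptions, and a real caller would load the dict from `runs/<run-id>/report.json` with `json.load`.

```python
import re

TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")
UNSET_TRACE_ID = "0" * 32  # value reported when QA_RIG_OTEL_ENABLED is unset


def summarise(report: dict) -> dict:
    """Pick out the first failing step and whether a real trace ID is present."""
    trace_id = report["trace_id"]
    if not TRACE_ID_RE.fullmatch(trace_id):
        raise ValueError(f"trace_id is not 32-char lowercase hex: {trace_id!r}")
    failed = next((s for s in report["steps"] if not s["passed"]), None)
    return {
        "passed": report["passed"],
        "first_failure": failed["name"] if failed else None,
        "error": failed["error"] if failed else None,
        "traced": trace_id != UNSET_TRACE_ID,
    }


# Illustrative report; every value here is made up for the example.
SAMPLE = {
    "spec_path": "/work/im2be-mono/im2be-qa-rig/flows/01-adult-1-home-family-diary.toml",
    "run_id": "0123456789ab",
    "started_at": "2026-05-25T00:00:00+00:00",
    "finished_at": "2026-05-25T00:00:05+00:00",
    "base_url": "http://localhost:9620",
    "passed": False,
    "trace_id": UNSET_TRACE_ID,
    "steps": [
        {"name": "open-home", "action": "navigate", "passed": True,
         "duration_ms": 812, "error": None,
         "screenshot_path": "screenshots/01-open-home.png", "console_slice": []},
        {"name": "check-diary", "action": "expect_text", "passed": False,
         "duration_ms": 5003, "error": "text not found",
         "screenshot_path": "screenshots/02-check-diary.png", "console_slice": []},
    ],
}

summary = summarise(SAMPLE)
```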

FlowSpec schema (TIER 1)

[meta]
name = "human-readable name"
type = "browser"
base_url = "http://localhost:9620"

[setup]                                       # optional
shell = "./scripts/stage-a-demo-up.sh --skip-yarn"

[storage_state]                                # optional
path = "../../mock-data/pwa/playwright-storage/adult-1.json"
inject_auth_storage = true                    # default — adds the Zustand auth-storage key

[[steps]]
name = "step-name"
action = "navigate"                           # one of: navigate | fill | click | wait_for | expect_text | expect_url | screenshot | expect_no_errors
# ... action-specific fields

See src/im2be_qa_rig/types.py for the canonical action list + per-action fields.

Active flows

Per-flow status: EXECUTABLE flows run end-to-end on the local k3d cluster today; SCAFFOLD flows lock the BPMN + step anchors + acceptance criteria for a later carve-out that wires the real assertions (rule 57 BPMN-first; http/grpc/metric step kinds land in a follow-up tier). Each scaffold carries the assertion contract in # TODO (real-execution): comments.

| File | Status | Coverage |
|---|---|---|
| `flows/01-adult-1-home-family-diary.toml` | EXECUTABLE | M1 smoke: adult-1 home → family → diary. |
| `flows/02-…` through `flows/20-…` | SCAFFOLD | L-1 happy-path scaffolds (OAuth, tasks, subscriptions, push). |
| `flows/21-centrifugo-opaque-ticket-happy-path.toml` | SCAFFOLD | L0 T0 #6 PR-OPAQUE-6: login → mint → connect → validate happy path against realtime-service `/centrifugo/connect`. |
| `flows/22-centrifugo-opaque-ticket-single-use-rejection.toml` | SCAFFOLD | L0 T0 #6 PR-OPAQUE-6: single-use semantic; second connect with same ticket → 401 + Disconnect{4401}; `realtime_centrifugo_connect_total{outcome="invalid"}` +1. |
| `flows/30-…` through `flows/33-…` | SCAFFOLD | L-1 failure-mode flows (SPIRE down, Centrifugo mid-session, Kafka lag, identity 503). |
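The `+1` expectation on `realtime_centrifugo_connect_total{outcome="invalid"}` in flow 22 is a before/after delta on a Prometheus counter. Since metric step kinds are not wired yet, the following is only a sketch of how such a delta could be computed from two text-exposition scrapes. The helper name `counter_value`, the sample scrapes, and the `outcome="ok"` series are assumptions, and the exact-prefix match only works when the series is rendered with precisely this label set.

```python
def counter_value(exposition: str, series: str) -> float:
    """Return the sample value for an exact series string, or 0.0 if absent."""
    for line in exposition.splitlines():
        line = line.strip()
        if line.startswith(series + " "):
            # Format is: <series> <value> [timestamp]
            return float(line[len(series):].split()[0])
    return 0.0


SERIES = 'realtime_centrifugo_connect_total{outcome="invalid"}'

# Made-up scrapes taken before and after the second connect attempt.
BEFORE = """\
# TYPE realtime_centrifugo_connect_total counter
realtime_centrifugo_connect_total{outcome="ok"} 7
realtime_centrifugo_connect_total{outcome="invalid"} 2
"""
AFTER = """\
# TYPE realtime_centrifugo_connect_total counter
realtime_centrifugo_connect_total{outcome="ok"} 7
realtime_centrifugo_connect_total{outcome="invalid"} 3
"""

delta = counter_value(AFTER, SERIES) - counter_value(BEFORE, SERIES)
```

Treating a missing series as 0.0 covers the case where the counter has not been incremented yet and therefore does not appear in the first scrape.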

The committed .bpmn sidecars (M3 emitter output) are regenerated by im2be-qa-rig graph --spec flows/<name>.toml --output flows/<name>.bpmn and verified against the TOML source-of-truth in tests/test_opaque_ticket_flows.py (rule 57.1 — BPMN-first).

Rules

Live execution only. No mocking at the QA-rig layer. The PWA's own MSW handlers (Phase 3) cover the external SaaS surface; everything else hits real cluster services via kubectl port-forward. (Rule 57)
Allow-list of targets. base_url for production is explicitly forbidden in code. (Rule 57)
No flake tolerance. Intermittent failures are bugs against the application or the FlowSpec, never the rig.
Reproducible. Same mock dataset + same flow + same commits = same evidence. The runner pins Playwright's chromium revision via lockfile.
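The allow-list rule can be illustrated with a small guard on `base_url`. This is a sketch, not the check the rig ships: the function name `assert_allowed_target` and the contents of `ALLOWED_HOSTS` are assumptions chosen to match the local port-forward setup described above.

```python
from urllib.parse import urlsplit

# Assumed allow-list: only loopback targets reached via port-forward.
ALLOWED_HOSTS = frozenset({"localhost", "127.0.0.1"})


def assert_allowed_target(base_url: str) -> str:
    """Return base_url unchanged, or raise if its host is not allow-listed."""
    parts = urlsplit(base_url)
    if parts.scheme not in {"http", "https"}:
        raise ValueError(f"unsupported scheme in base_url: {base_url!r}")
    if parts.hostname not in ALLOWED_HOSTS:
        raise ValueError(f"target host not on the allow-list: {parts.hostname!r}")
    return base_url


checked = assert_allowed_target("http://localhost:9620")
```

Checking the parsed hostname rather than a string prefix avoids accepting look-alike URLs such as `http://localhost.example.com`.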

Layout

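im2be-qa-rig/
├── pyproject.toml
├── .forgejo/workflows/       # Forgejo Actions CI (qa-rig-ci.yml)
├── deploy/systemd/           # systemd-user unit (im2be-qa-rig.service)
├── src/im2be_qa_rig/
│   ├── __init__.py
│   ├── cli.py               # `im2be-qa-rig` entry point
│   ├── browser.py           # Playwright sync-API flow runner
│   ├── spec.py              # TOML FlowSpec parser + validator
│   └── types.py             # StepResult / FlowResult dataclasses + VALID_ACTIONS
├── flows/                    # FlowSpec TOML files + .bpmn sidecars
│   └── 01-adult-1-home-family-diary.toml
├── harvested/                # `harvest` output (harvested/<sub>/inventory.json)
├── tests/                    # pytest suite
├── web/                      # Vite + bpmn-js viewer
├── runs/                     # per-run evidence (gitignored)
└── docs/                     # TIER-1+ design notes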

Subcommands

| Command | Tier | Purpose |
|---|---|---|
| `im2be-qa-rig run --spec <toml>` | M1 | Execute a FlowSpec (Playwright). Default if no subcommand. |
| `im2be-qa-rig graph --spec <toml> [--run <report.json>] [--output <path>]` | M3 | Emit BPMN 2.0 XML; optional pass/fail colour overlay from a run report. |
| `im2be-qa-rig harvest <sub>` | M4 | Read `<meta-repo>/code-intelligence/<sub>/` and emit `harvested/<sub>/inventory.json`. |
| `im2be-qa-rig infer <sub> [--stub] [--out <toml>]` | M4 | Draft a FlowSpec TOML from a harvested inventory. `--stub` skips the Claude Agent SDK call. |
| `im2be-qa-rig schedule --config schedule.toml [--max-cycles N]` | M5 | Run the scheduler loop. Writes `runs/.heartbeat.json` after every cycle. |
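The "default if no subcommand" behaviour can be approximated with `argparse` by prepending `run` whenever the first argument is not a known subcommand. This sketch mirrors the flags listed in the table and in the examples earlier in this README; it is not the contents of `cli.py`, and the function names `build_parser` and `parse` are hypothetical.

```python
import argparse

SUBCOMMANDS = ("run", "graph", "harvest", "infer", "schedule")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="im2be-qa-rig")
    sub = parser.add_subparsers(dest="command", required=True)

    run = sub.add_parser("run")
    run.add_argument("--spec", required=True)
    run.add_argument("--base-url")
    run.add_argument("--skip-setup", action="store_true")
    run.add_argument("--headed", action="store_true")

    graph = sub.add_parser("graph")
    graph.add_argument("--spec", required=True)
    graph.add_argument("--run")
    graph.add_argument("--output")

    harvest = sub.add_parser("harvest")
    harvest.add_argument("sub")

    infer = sub.add_parser("infer")
    infer.add_argument("sub")
    infer.add_argument("--stub", action="store_true")
    infer.add_argument("--out")

    schedule = sub.add_parser("schedule")
    schedule.add_argument("--config", required=True)
    schedule.add_argument("--max-cycles", type=int)
    return parser


def parse(argv: list[str]) -> argparse.Namespace:
    """Treat a bare flag list as `run ...` so the subcommand stays optional."""
    if not argv or argv[0] not in SUBCOMMANDS:
        argv = ["run", *argv]
    return build_parser().parse_args(argv)


implicit = parse(["--spec", "flows/01-adult-1-home-family-diary.toml", "--skip-setup"])
explicit = parse(["schedule", "--config", "schedule.toml", "--max-cycles", "3"])
```

One consequence of this design is that a top-level `-h` is routed to the `run` subparser, so the help shown is the one for `run`.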