feat(rmp): RMP_DISTRIBUTED — distributed quorum as a production review mode #7

Merged
hibryda merged 4 commits from feat/rmp-distributed-production into main 2026-06-03 09:46:25 +02:00
Owner

Promotes the distributed-quorum topology (which won the live reliability comparison) to a flag-gated production review path.

What

  • RMP_DISTRIBUTED=1 (+ REVIEWER_QUORUM_ENABLED=1) → the engine reviews PRs via distributed quorum: the two reviewer legs run in SEPARATE module containers (cross-process, Redis-Streams transport), reconciled by an in-engine arbiter. Per-leg failure isolation — a dead/hung leg degrades to the surviving leg rather than losing the review (the reliability win demonstrated live).
  • Default OFF → the in-process Quorum runs unchanged. Full worker suite passes flag-off → byte-identical.
  • Heterogeneous legs via RMP_DISTRIBUTED_LEG_A_MODEL/_LEG_B_MODEL.

How

  • in-process.ts: generalized runReviewViaProducer (channel↔driver wiring); runReviewInProcess delegates to it.
  • worker-bridge.ts: runDistributedReviewViaDriver (tested) wires a LivenessWatchdog + shared DriverSink to the worker callbacks + drives runDistributedQuorum over the transport leg-runner.
  • worker.ts: the reviewer instanceof Quorum && RMP_DISTRIBUTED_ENABLED branch; mirrors the RMP_DRIVER wiring (no onRawMessage — raw SDK messages don't cross the wire; the leg containers run their own IrregularityDetector + tool log).
  • sdk-event-adapter.ts: emits token/cost telemetry from the SDK result message (closes the compare-harness cost-axis gap).

Tests

  • distributed-bridge tests (divergent→arbitrate, dead-leg→degrade) + adapter token tests; full suite 2142; typecheck clean.

Requires the rmp-modules compose profile up. NOT flipped on the engine — needs a deliberate flag + the leg containers.

Promotes the distributed-quorum topology (which won the live reliability comparison) to a flag-gated production review path. ## What - **`RMP_DISTRIBUTED=1`** (+ `REVIEWER_QUORUM_ENABLED=1`) → the engine reviews PRs via distributed quorum: the two reviewer legs run in SEPARATE module containers (cross-process, Redis-Streams transport), reconciled by an in-engine arbiter. **Per-leg failure isolation** — a dead/hung leg degrades to the surviving leg rather than losing the review (the reliability win demonstrated live). - Default OFF → the in-process `Quorum` runs unchanged. **Full worker suite passes flag-off → byte-identical.** - Heterogeneous legs via `RMP_DISTRIBUTED_LEG_A_MODEL`/`_LEG_B_MODEL`. ## How - `in-process.ts`: generalized `runReviewViaProducer` (channel↔driver wiring); `runReviewInProcess` delegates to it. - `worker-bridge.ts`: `runDistributedReviewViaDriver` (tested) wires a LivenessWatchdog + shared DriverSink to the worker callbacks + drives `runDistributedQuorum` over the transport leg-runner. - `worker.ts`: the `reviewer instanceof Quorum && RMP_DISTRIBUTED_ENABLED` branch; mirrors the RMP_DRIVER wiring (no onRawMessage — raw SDK messages don't cross the wire; the leg containers run their own IrregularityDetector + tool log). - `sdk-event-adapter.ts`: emits token/cost telemetry from the SDK result message (closes the compare-harness cost-axis gap). ## Tests + distributed-bridge tests (divergent→arbitrate, dead-leg→degrade) + adapter token tests; full suite 2142; typecheck clean. **Requires the `rmp-modules` compose profile up.** NOT flipped on the engine — needs a deliberate flag + the leg containers.
Two bugs surfaced by the live A/B comparison runs:
- bin/compare-modules.ts: the requestId was deterministic per (PR, round, sha),
  so re-running the same PR REPLAYED the prior run's event streams (still on
  Redis under the event TTL) instead of a fresh review — a re-run silently
  reported the cached result. requestId now carries Date.now() for per-run
  uniqueness.
- compare.ts: finding-overlap keyed on the full findingKey (severity|file|line|
  title), so two models flagging the SAME issue with different prose/severity
  scored as 0 shared (over-reporting divergence). Overlap now keys on LOCATION
  (file|line); the full findingKeys stay in the per-module detail. + a test.
Closes the compare-harness cost-axis gap (read $0.0000 / 0 tokens): the SDK
`result` message carries usage but the adapter mapped it to `note`, dropping it.
It now emits a `telemetry` event with inputTokens/outputTokens/costUsd (cache
tokens folded into input, mirroring engine-telemetry.recordSdkResultUsage),
falling back to `note` only when usage is absent. Token-only telemetry has NO
rateLimit field, so the LivenessWatchdog treats it exactly like the old note
(onProgress, no state change) — the liveness equivalence (10 tests) still holds.
The compare harness already sums these fields per module/leg, so cost/tokens now
populate. +3 adapter tests; full suite 2140.
feat(rmp): RMP_DISTRIBUTED — distributed quorum as a flag-gated review mode
All checks were successful
pr-reviewer CI / tsc --noEmit (pull_request) Successful in 22s
pr-reviewer CI / vitest (pull_request) Successful in 1m1s
pr-reviewer CI / tsc build (pull_request) Successful in 35s
66bee224fc
Promotes the distributed-quorum topology to a production review path, behind
RMP_DISTRIBUTED (default OFF → in-process Quorum unchanged; full worker suite
passes flag-off → byte-identical).

- in-process.ts: generalized runReviewViaProducer (the channel↔driver wiring);
  runReviewInProcess now delegates to it, and the distributed path reuses it.
- worker-bridge.ts: runDistributedReviewViaDriver — wires a LivenessWatchdog +
  shared DriverSink (makeWatchdogSink) to the worker callbacks and drives
  runDistributedQuorum over the transport leg-runner. Per-leg failure isolation
  holds end-to-end. + tests (divergent→arbitrate, dead-leg→degrade).
- worker.ts: the queue.process branch (reviewer instanceof Quorum &&
  RMP_DISTRIBUTED_ENABLED) builds the ReviewRequest, constructs the in-engine
  Arbiter, routes the two legs to their module containers (leg-a/leg-b, optional
  per-leg model for heterogeneity), and feeds the result into the unchanged post
  path. Startup log surfaces the active review path.
- sdk-event-adapter.ts: emit token/cost telemetry from the SDK result message
  (closes the compare-harness cost-axis gap). + tests.
- .env.example / docs/configuration.md: document RMP_DISTRIBUTED + leg config.

Requires the rmp-modules compose profile up. Full suite 2142.

Superseded by round 2.

Show previous round

hib-pr-reviewer review — PR #7 (hib/hib-pr-reviewer)

Round 1 — head 66bee224fc69, base main, trigger opened

TL;DR: CONDITIONAL_APPROVE — kept 2 unique-to-one findings (both verified at cited lines), no agreed findings, no blocking issues.

Summary

Reconciliation

Two unique-to-one findings, both independently verified against the checked-out HEAD.

Finding A (line 2014 — startup log silent on RMP_DISTRIBUTED=1 + quorum off): Confirmed. Lines 2006–2020 show that the only log written in the else block is the single console.log at line 2011. The ternary quorumEnabled && RMP_DISTRIBUTED_ENABLED silently falls through to the RMP_DRIVER_ENABLED / default branches when RMP_DISTRIBUTED_ENABLED=true but quorumEnabled=false. No console.warn exists for this misconfiguration path anywhere in the block. Kept.

Finding B (line 3067 — maxIdleReads bounds dead-leg detection at full job timeout): Confirmed. readOpts: { blockMs: 2000, maxIdleReads: Math.ceil(cfg.jobTimeoutMs / 2000) } at line 3067 sets the idle-read bound to exactly the full job timeout (e.g. 900 × 2 s = 1800 s for a 30-min job). transport.ts line 111 confirms the default is Number.POSITIVE_INFINITY, so without this bound the loop runs forever — but the chosen bound means a silently dead leg idles for the entire job window. The signal at line 3066 comes from reviewController.abort() which only fires when the LivenessWatchdog trips — and per the topology the watchdog stays alive as long as leg A feeds it events. maxIdleReads is therefore the sole dead-leg escape valve and it fires at the last possible moment. Kept.

No agreed findings (different lines, different concerns). No dropped findings.

Blast Radius

The diff spans 12 files across src/rmp/, src/worker.ts, tests/, docs/, and bin/. The new distributed-quorum path is fully flag-gated (RMP_DISTRIBUTED=1 && REVIEWER_QUORUM_ENABLED=1), so the existing in-process quorum and default reviewer paths are unaffected in production. The exported ModuleRunReport interface gains a new field (findingLocations) that is additive and tested.

BLAST_SCORE: 5/10

Risk Indicators

Indicator Value
Sensitive functions runDistributedQuorum, makeTransportLegRunner, readReviewEvents, runReviewViaProducer, runDistributedReviewViaDriver
Migration touched
Test delta +187 / -0 lines in test files
Dependency changes

CI status (head 66bee224fc69)

Overall: ✓ success

3 checks: 3 pending

Check State Link
pr-reviewer CI / tsc --noEmit (pull_request) pending details
pr-reviewer CI / vitest (pull_request) pending details
pr-reviewer CI / tsc build (pull_request) pending details

Findings (2)

[MINOR] Startup log silently swallows RMP_DISTRIBUTED=1 when quorum is disabled

src/worker.ts:2014

The ternary at line 2014 (quorumEnabled && RMP_DISTRIBUTED_ENABLED) evaluates false when RMP_DISTRIBUTED_ENABLED=true but quorumEnabled=false, so the startup log prints either the RMP_DRIVER path or reviewer.run (default) with no indication that RMP_DISTRIBUTED is set but inert. An operator who sets the flag but forgets REVIEWER_QUORUM_ENABLED=1 gets zero runtime feedback.

Proposed fix — add a warn before the existing console.log at line 2011:

if (RMP_DISTRIBUTED_ENABLED && !quorumEnabled) {
  console.warn(
    '[hib-pr-reviewer] RMP_DISTRIBUTED=1 is set but REVIEWER_QUORUM_ENABLED is off ' +
    '— distributed path is inactive; set REVIEWER_QUORUM_ENABLED=1 to enable it'
  );
}

This follows the rule-14 pattern the PR already uses for the liveness-path log.

[MINOR] maxIdleReads equals the full job timeout — dead-leg isolation only activates at deadline

src/worker.ts:3067

readOpts: { blockMs: 2000, maxIdleReads: Math.ceil(cfg.jobTimeoutMs / 2000) },

With a 30-min job timeout (jobTimeoutMs = 1_800_000), maxIdleReads = 900 and 900 × 2 s = 1800 s — the full timeout. If leg A completes in 5 minutes and leg B's container dies immediately after startup (emitting zero events), the readReviewEvents loop for leg B silently idles for the full 30 minutes before declaring the leg dead.

The LivenessWatchdog does not help here: it stays alive because leg A keeps feeding it events, so reviewController.abort() is never called and the signal (line 3066) does not cut leg B's loop short. maxIdleReads is the only dead-leg detection mechanism and it fires at the last possible moment.

The test at line 87 (maxIdleReads: 1) shows the knob is deliberately tunable. Consider a shorter fixed bound for fast silent-crash detection:

readOpts: { blockMs: 2_000, maxIdleReads: 60 }, // 2 min silence → dead; signal catches legitimately long legs

Alternatively, document the intentional trade-off ("degrade at deadline, not proactively") explicitly in a code comment so operators are not surprised by 30-min single-leg-dead reviews.

Verdict

CONDITIONAL_APPROVE


hib-pr-reviewer • round 1 • 2 findings (2m) • 2026-06-03T07:40:17.502Z → 2026-06-03T07:41:28.637Z • posted-as: pr-reviewer-bot • model: auto

<!-- hib-pr-reviewer collapsed --> > _Superseded by round 2._ <details> <summary>Show previous round</summary> <!-- hib-pr-reviewer round:1 --> ## hib-pr-reviewer review — PR #7 (hib/hib-pr-reviewer) **Round 1** — head `66bee224fc69`, base `main`, trigger `opened` **TL;DR:** CONDITIONAL_APPROVE — kept 2 unique-to-one findings (both verified at cited lines), no agreed findings, no blocking issues. ### Summary ## Reconciliation Two unique-to-one findings, both independently verified against the checked-out HEAD. **Finding A (line 2014 — startup log silent on `RMP_DISTRIBUTED=1` + quorum off):** Confirmed. Lines 2006–2020 show that the only log written in the `else` block is the single `console.log` at line 2011. The ternary `quorumEnabled && RMP_DISTRIBUTED_ENABLED` silently falls through to the `RMP_DRIVER_ENABLED` / default branches when `RMP_DISTRIBUTED_ENABLED=true` but `quorumEnabled=false`. No `console.warn` exists for this misconfiguration path anywhere in the block. **Kept.** **Finding B (line 3067 — `maxIdleReads` bounds dead-leg detection at full job timeout):** Confirmed. `readOpts: { blockMs: 2000, maxIdleReads: Math.ceil(cfg.jobTimeoutMs / 2000) }` at line 3067 sets the idle-read bound to exactly the full job timeout (e.g. 900 × 2 s = 1800 s for a 30-min job). `transport.ts` line 111 confirms the default is `Number.POSITIVE_INFINITY`, so without this bound the loop runs forever — but the chosen bound means a silently dead leg idles for the entire job window. The `signal` at line 3066 comes from `reviewController.abort()` which only fires when the `LivenessWatchdog` trips — and per the topology the watchdog stays alive as long as leg A feeds it events. `maxIdleReads` is therefore the sole dead-leg escape valve and it fires at the last possible moment. **Kept.** No agreed findings (different lines, different concerns). No dropped findings. ### Blast Radius The diff spans 12 files across `src/rmp/`, `src/worker.ts`, `tests/`, `docs/`, and `bin/`. The new distributed-quorum path is fully flag-gated (`RMP_DISTRIBUTED=1 && REVIEWER_QUORUM_ENABLED=1`), so the existing in-process quorum and default reviewer paths are unaffected in production. The exported `ModuleRunReport` interface gains a new field (`findingLocations`) that is additive and tested. **BLAST_SCORE: 5/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `runDistributedQuorum`, `makeTransportLegRunner`, `readReviewEvents`, `runReviewViaProducer`, `runDistributedReviewViaDriver` | | Migration touched | — | | Test delta | +187 / -0 lines in test files | | Dependency changes | — | ### CI status (head `66bee224fc69`) **Overall: ✓ success** 3 checks: 3 pending | Check | State | Link | |---|---|---| | pr-reviewer CI / tsc --noEmit (pull_request) | ⏳ pending | [details](/hib/hib-pr-reviewer/actions/runs/67/jobs/0) | | pr-reviewer CI / vitest (pull_request) | ⏳ pending | [details](/hib/hib-pr-reviewer/actions/runs/67/jobs/1) | | pr-reviewer CI / tsc build (pull_request) | ⏳ pending | [details](/hib/hib-pr-reviewer/actions/runs/67/jobs/2) | ### Related PRs - #4 - #5 ### Findings (2) #### **[MINOR]** Startup log silently swallows `RMP_DISTRIBUTED=1` when quorum is disabled _src/worker.ts:2014_ The ternary at line 2014 (`quorumEnabled && RMP_DISTRIBUTED_ENABLED`) evaluates false when `RMP_DISTRIBUTED_ENABLED=true` but `quorumEnabled=false`, so the startup log prints either the `RMP_DRIVER` path or `reviewer.run (default)` with no indication that `RMP_DISTRIBUTED` is set but inert. An operator who sets the flag but forgets `REVIEWER_QUORUM_ENABLED=1` gets zero runtime feedback. Proposed fix — add a warn before the existing `console.log` at line 2011: ```typescript if (RMP_DISTRIBUTED_ENABLED && !quorumEnabled) { console.warn( '[hib-pr-reviewer] RMP_DISTRIBUTED=1 is set but REVIEWER_QUORUM_ENABLED is off ' + '— distributed path is inactive; set REVIEWER_QUORUM_ENABLED=1 to enable it' ); } ``` This follows the rule-14 pattern the PR already uses for the liveness-path log. #### **[MINOR]** `maxIdleReads` equals the full job timeout — dead-leg isolation only activates at deadline _src/worker.ts:3067_ ```ts readOpts: { blockMs: 2000, maxIdleReads: Math.ceil(cfg.jobTimeoutMs / 2000) }, ``` With a 30-min job timeout (`jobTimeoutMs = 1_800_000`), `maxIdleReads = 900` and `900 × 2 s = 1800 s` — the full timeout. If leg A completes in 5 minutes and leg B's container dies immediately after startup (emitting zero events), the `readReviewEvents` loop for leg B silently idles for the full **30 minutes** before declaring the leg dead. The `LivenessWatchdog` does not help here: it stays alive because leg A keeps feeding it events, so `reviewController.abort()` is never called and the signal (line 3066) does not cut leg B's loop short. `maxIdleReads` is the **only** dead-leg detection mechanism and it fires at the last possible moment. The test at line 87 (`maxIdleReads: 1`) shows the knob is deliberately tunable. Consider a shorter fixed bound for fast silent-crash detection: ```ts readOpts: { blockMs: 2_000, maxIdleReads: 60 }, // 2 min silence → dead; signal catches legitimately long legs ``` Alternatively, document the intentional trade-off ("degrade at deadline, not proactively") explicitly in a code comment so operators are not surprised by 30-min single-leg-dead reviews. ### Verdict **CONDITIONAL_APPROVE** --- <sub>hib-pr-reviewer • round 1 • 2 findings (2m) • 2026-06-03T07:40:17.502Z → 2026-06-03T07:41:28.637Z • posted-as: pr-reviewer-bot • model: auto</sub> </details>
fix(rmp): address PR #7 review — fast dead-leg isolation + inert-flag warning
All checks were successful
pr-reviewer CI / tsc --noEmit (pull_request) Successful in 22s
pr-reviewer CI / vitest (pull_request) Successful in 59s
pr-reviewer CI / tsc build (pull_request) Successful in 35s
9933cfbea1
Round-1 review (CONDITIONAL_APPROVE, 2 minor):
- The distributed leg read used maxIdleReads = full job timeout, so a dead/hung
  leg wasn't detected for ~20min — defeating the FAST failure isolation that is
  the topology's whole value. Added RMP_DISTRIBUTED_LEG_SILENCE_MS (default 5min,
  must be < the job timeout): a leg silent that long is presumed dead → degrade
  to the survivor. A live leg streams events (incl. rate_limit_event while
  throttled) so it never accrues this much silence.
- Startup log now WARNS when RMP_DISTRIBUTED=1 but quorum is off (the flag is
  inert without quorum) — rule 14 visible-config, no longer silently swallowed.
- Documented the new knob (.env.example + configuration.md).
hibryda deleted branch feat/rmp-distributed-production 2026-06-03 09:46:25 +02:00

hib-pr-reviewer review — PR #7 (hib/hib-pr-reviewer)

Round 2 — head 9933cfbea1e7, base main, trigger synchronize

TL;DR: CONDITIONAL_APPROVE — 0 agreed findings (A had 0); both unique-to-B findings verified at HEAD and kept; no blocking or major issues.

Summary

Arbitration — Round 2

Reviewer A found 0 issues; Reviewer B found 2 minor issues, both unique-to-B. Both verified via Read and Grep at HEAD.

Finding B-1 — verified

src/worker.ts lines 2021-2030: the distributed-quorum branch of the startup log hard-codes the string "RMP distributed quorum (RMP_DISTRIBUTED=1, leg containers)" and does not interpolate RMP_DISTRIBUTED_LEG_A_MODULE (resolved at line 173), RMP_DISTRIBUTED_LEG_B_MODULE (line 174), or RMP_DISTRIBUTED_LEG_SILENCE_MS (lines 183-186, with a silent fallback to 300000 for non-integer inputs). An operator overriding any of these has no log evidence of the effective values. Rule 14 explicit intent: "Log effective config (secrets masked)." Kept.

Finding B-2 — verified

src/worker.ts lines 2033-2037: the only guard in the distributed-quorum startup block is the !quorumEnabled inert-mode warning. There is no assertion that RMP_DISTRIBUTED_LEG_SILENCE_MS < cfg.jobTimeoutMs. The .env.example (line 26) and docs/configuration.md both document this as a hard requirement; the parsing at lines 183-185 accepts any positive integer, so a value ≥ job timeout silently passes. This is exactly the misconfiguration class that round 1 fixed in the old hard-coded cfg.jobTimeoutMs / 2000 path — the fix is now re-openable via explicit config. Rule 14: "Validate required config at startup — fail fast." Kept.

No agreed findings (A had 0). Round-1 conditions both confirmed resolved by both reviewers.

Blast Radius

The diff touches 12 files across the RMP subsystem (src/rmp/, src/worker.ts, bin/, docs/, tests/). The core change (RMP_DISTRIBUTED flag-gate) is additive and flag-off is documented as byte-identical to the prior path. The two kept findings are both scoped to the startup-log / startup-validation block in worker.ts and have no runtime data-path consequence under correct configuration.

BLAST_SCORE: 4/10

Risk Indicators

Indicator Value
Sensitive functions runDistributedReviewViaDriver, runReviewViaProducer, makeWatchdogSink
Migration touched
Test delta +187 / -0 lines in test files
Dependency changes

CI status (head 9933cfbea1e7)

Overall: ✓ success

3 checks: 3 pending

Check State Link
pr-reviewer CI / tsc --noEmit (pull_request) pending details
pr-reviewer CI / vitest (pull_request) pending details
pr-reviewer CI / tsc build (pull_request) pending details

Findings (2)

[MINOR] Startup log omits effective RMP_DISTRIBUTED_LEG_SILENCE_MS and leg module IDs

src/worker.ts:2024

The startup log (lines 2021-2030) announces the distributed-quorum path as active but prints only the literal string "RMP distributed quorum (RMP_DISTRIBUTED=1, leg containers)" — it does not interpolate the resolved values of RMP_DISTRIBUTED_LEG_A_MODULE, RMP_DISTRIBUTED_LEG_B_MODULE, or RMP_DISTRIBUTED_LEG_SILENCE_MS. When any of these are overridden (or when RMP_DISTRIBUTED_LEG_SILENCE_MS silently reverts to 300000 because the supplied value is not a positive integer — see parsing at lines 183-185), there is no log evidence of what the engine is actually using.

Concrete fix — extend the distributed branch of the log template:

quorumEnabled && RMP_DISTRIBUTED_ENABLED
  ? `RMP distributed quorum (RMP_DISTRIBUTED=1, legs: ${RMP_DISTRIBUTED_LEG_A_MODULE}/${RMP_DISTRIBUTED_LEG_B_MODULE}, silence: ${RMP_DISTRIBUTED_LEG_SILENCE_MS}ms)`
  : ...

This satisfies the rule-14 intent already cited in the surrounding comment: "Log effective config (secrets masked)."

[MINOR] No startup assertion that RMP_DISTRIBUTED_LEG_SILENCE_MS < cfg.jobTimeoutMs

src/worker.ts:2033

.env.example (line 26) and docs/configuration.md both document that RMP_DISTRIBUTED_LEG_SILENCE_MS "MUST be < the job timeout", but there is no runtime enforcement. The parsing at lines 183-185 accepts any positive integer, so an operator setting e.g. RMP_DISTRIBUTED_LEG_SILENCE_MS=1800000 (equal to JOB_TIMEOUT_MS) passes validation silently — the per-leg watchdog fires at or after job expiry, nullifying the failure-isolation guarantee that is the primary value of this feature. This is precisely the regression class fixed in round 1 (old cfg.jobTimeoutMs / 2000), now re-openable via explicit misconfiguration.

Concrete fix — add a fail-fast assertion after the inert-mode warning block (around line 2037), gated on the active mode:

if (RMP_DISTRIBUTED_ENABLED && quorumEnabled && RMP_DISTRIBUTED_LEG_SILENCE_MS >= cfg.jobTimeoutMs) {
  throw new Error(
    `[hib-pr-reviewer] RMP_DISTRIBUTED_LEG_SILENCE_MS (${RMP_DISTRIBUTED_LEG_SILENCE_MS}ms) must be ` +
    `less than JOB_TIMEOUT_MS (${cfg.jobTimeoutMs}ms) — silence bound must trigger BEFORE job expiry`
  );
}

Rule 14: "Validate required config at startup — fail fast."

Verdict

CONDITIONAL_APPROVE


hib-pr-reviewer • round 2 • 2 findings (2m) • 2026-06-03T07:48:58.873Z → 2026-06-03T07:50:00.611Z • posted-as: pr-reviewer-bot • model: auto

<!-- hib-pr-reviewer round:2 --> ## hib-pr-reviewer review — PR #7 (hib/hib-pr-reviewer) **Round 2** — head `9933cfbea1e7`, base `main`, trigger `synchronize` **TL;DR:** CONDITIONAL_APPROVE — 0 agreed findings (A had 0); both unique-to-B findings verified at HEAD and kept; no blocking or major issues. ### Summary ## Arbitration — Round 2 Reviewer A found 0 issues; Reviewer B found 2 minor issues, both unique-to-B. Both verified via `Read` and `Grep` at HEAD. ### Finding B-1 — verified ✅ `src/worker.ts` lines 2021-2030: the distributed-quorum branch of the startup log hard-codes the string `"RMP distributed quorum (RMP_DISTRIBUTED=1, leg containers)"` and does **not** interpolate `RMP_DISTRIBUTED_LEG_A_MODULE` (resolved at line 173), `RMP_DISTRIBUTED_LEG_B_MODULE` (line 174), or `RMP_DISTRIBUTED_LEG_SILENCE_MS` (lines 183-186, with a silent fallback to 300000 for non-integer inputs). An operator overriding any of these has no log evidence of the effective values. Rule 14 explicit intent: *"Log effective config (secrets masked)."* **Kept.** ### Finding B-2 — verified ✅ `src/worker.ts` lines 2033-2037: the only guard in the distributed-quorum startup block is the `!quorumEnabled` inert-mode warning. There is no assertion that `RMP_DISTRIBUTED_LEG_SILENCE_MS < cfg.jobTimeoutMs`. The `.env.example` (line 26) and `docs/configuration.md` both document this as a hard requirement; the parsing at lines 183-185 accepts any positive integer, so a value ≥ job timeout silently passes. This is exactly the misconfiguration class that round 1 fixed in the old hard-coded `cfg.jobTimeoutMs / 2000` path — the fix is now re-openable via explicit config. Rule 14: *"Validate required config at startup — fail fast."* **Kept.** **No agreed findings (A had 0).** Round-1 conditions both confirmed resolved by both reviewers. ### Blast Radius The diff touches 12 files across the RMP subsystem (src/rmp/, src/worker.ts, bin/, docs/, tests/). The core change (RMP_DISTRIBUTED flag-gate) is additive and flag-off is documented as byte-identical to the prior path. The two kept findings are both scoped to the startup-log / startup-validation block in worker.ts and have no runtime data-path consequence under correct configuration. **BLAST_SCORE: 4/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `runDistributedReviewViaDriver`, `runReviewViaProducer`, `makeWatchdogSink` | | Migration touched | — | | Test delta | +187 / -0 lines in test files | | Dependency changes | — | ### CI status (head `9933cfbea1e7`) **Overall: ✓ success** 3 checks: 3 pending | Check | State | Link | |---|---|---| | pr-reviewer CI / tsc --noEmit (pull_request) | ⏳ pending | [details](/hib/hib-pr-reviewer/actions/runs/68/jobs/0) | | pr-reviewer CI / vitest (pull_request) | ⏳ pending | [details](/hib/hib-pr-reviewer/actions/runs/68/jobs/1) | | pr-reviewer CI / tsc build (pull_request) | ⏳ pending | [details](/hib/hib-pr-reviewer/actions/runs/68/jobs/2) | ### Related PRs - #4 - #5 ### Findings (2) #### **[MINOR]** Startup log omits effective `RMP_DISTRIBUTED_LEG_SILENCE_MS` and leg module IDs _src/worker.ts:2024_ The startup log (lines 2021-2030) announces the distributed-quorum path as active but prints only the literal string `"RMP distributed quorum (RMP_DISTRIBUTED=1, leg containers)"` — it does not interpolate the resolved values of `RMP_DISTRIBUTED_LEG_A_MODULE`, `RMP_DISTRIBUTED_LEG_B_MODULE`, or `RMP_DISTRIBUTED_LEG_SILENCE_MS`. When any of these are overridden (or when `RMP_DISTRIBUTED_LEG_SILENCE_MS` silently reverts to 300000 because the supplied value is not a positive integer — see parsing at lines 183-185), there is no log evidence of what the engine is actually using. Concrete fix — extend the distributed branch of the log template: ```typescript quorumEnabled && RMP_DISTRIBUTED_ENABLED ? `RMP distributed quorum (RMP_DISTRIBUTED=1, legs: ${RMP_DISTRIBUTED_LEG_A_MODULE}/${RMP_DISTRIBUTED_LEG_B_MODULE}, silence: ${RMP_DISTRIBUTED_LEG_SILENCE_MS}ms)` : ... ``` This satisfies the rule-14 intent already cited in the surrounding comment: *"Log effective config (secrets masked)."* #### **[MINOR]** No startup assertion that `RMP_DISTRIBUTED_LEG_SILENCE_MS < cfg.jobTimeoutMs` _src/worker.ts:2033_ `.env.example` (line 26) and `docs/configuration.md` both document that `RMP_DISTRIBUTED_LEG_SILENCE_MS` *"MUST be < the job timeout"*, but there is no runtime enforcement. The parsing at lines 183-185 accepts any positive integer, so an operator setting e.g. `RMP_DISTRIBUTED_LEG_SILENCE_MS=1800000` (equal to `JOB_TIMEOUT_MS`) passes validation silently — the per-leg watchdog fires at or after job expiry, nullifying the failure-isolation guarantee that is the primary value of this feature. This is precisely the regression class fixed in round 1 (old `cfg.jobTimeoutMs / 2000`), now re-openable via explicit misconfiguration. Concrete fix — add a fail-fast assertion after the inert-mode warning block (around line 2037), gated on the active mode: ```typescript if (RMP_DISTRIBUTED_ENABLED && quorumEnabled && RMP_DISTRIBUTED_LEG_SILENCE_MS >= cfg.jobTimeoutMs) { throw new Error( `[hib-pr-reviewer] RMP_DISTRIBUTED_LEG_SILENCE_MS (${RMP_DISTRIBUTED_LEG_SILENCE_MS}ms) must be ` + `less than JOB_TIMEOUT_MS (${cfg.jobTimeoutMs}ms) — silence bound must trigger BEFORE job expiry` ); } ``` Rule 14: *"Validate required config at startup — fail fast."* ### Verdict **CONDITIONAL_APPROVE** --- <sub>hib-pr-reviewer • round 2 • 2 findings (2m) • 2026-06-03T07:48:58.873Z → 2026-06-03T07:50:00.611Z • posted-as: pr-reviewer-bot • model: auto</sub>
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
hib/hib-pr-reviewer!7
No description provided.