affinity-intelligence-rework/im2be-realtime-service

Fork 0

feat: L0 T0 #6 PR-OPAQUE-7 — consume static fallback ticket pool (ADR-0002 §3.4) #2

Merged

hibryda merged 4 commits from feat/l0-t0-opaque-7-fallback-pool into main

2026-05-25 04:17:37 +02:00

hibryda commented

2026-05-25 03:27:00 +02:00

Owner

Summary

Final piece of the L0 Tranche-0 #6 PR-OPAQUE arc. Wires the realtime-service consumer side of the static fallback ticket pool per ADR-0002 §3.4 + the contract documented in flux-applications/apps/centrifugo/scripts/mint-fallback-tickets.sh.

The pool is loaded ONCE at startup into an in-memory Set<string>; degraded-mode connects look up tickets via O(1) Set membership, atomically delete on consumption, and decrement the new Prometheus gauge.

Decision flow

On /centrifugo/connect when identity-service Validate fails with UNAVAILABLE / DEADLINE_EXCEEDED:

OutageDetector.notifyFailure(now).
If (now - lastSuccessMs) <= FALLBACK_DEGRADED_WINDOW_MS → 503 (today's behaviour).
Else if pool is empty → 503.
Else if pool.has(ticket) → 200 + info.degraded=true + atomically delete.
Else → 503.

On any Validate response branch (success/disconnect/error), OutageDetector.notifySuccess runs — a wire response of any kind proves channel reachability.

NOT_FOUND / INVALID_ARGUMENT remain pure rejection paths — no fallback bypass for those.

Files

Application code (new):

src/fallback/pool.ts — loader (parses one-ticket-per-line file → Set; LF + CRLF; whitespace tolerant; security-sensitive — no ticket logging)
src/fallback/outage.ts — OutageDetector (windowMs gate; notifySuccess clears, notifyFailure increments; cold-start safe; lock-free, single-threaded-event-loop atomic)

Application code (modified):

src/config.ts — FALLBACK_TICKETS_PATH / FALLBACK_DEGRADED_WINDOW_MS / FALLBACK_LOW_WATERMARK env vars
src/handlers/connect.ts — fallback decision wired into the catch block; new helpers consumeFallbackTicket + buildFallbackPoolResult; FALLBACK_USER_PLACEHOLDER sentinel
src/metrics.ts — new outcome fallback_pool_consumed + new gauge realtime_fallback_pool_size
src/main.ts — boot-time pool load + gauge initialisation; INFO log carries pool status
src/server.ts — accepts fallbackPool + outageDetector + watermark deps; /health JSON surface extended with identity_outage + fallback_pool.size
src/types.ts — ConnectOutcome adds fallback_pool_consumed

Tests (new):

tests/fallback.pool.test.ts (14 tests) — parseLines, missing file, empty file, 10k load, low-watermark WARN, whitespace handling, no-ticket-logging
tests/fallback.outage.test.ts (10 tests) — cold-start, window-not-yet-elapsed, degraded after window, notifySuccess clears, consecutive count, boundary exact-vs-past-window, negative windowMs rejection
tests/handlers.connect.fallback.test.ts (9 tests) — happy fallback acceptance, ticket-not-in-pool 503, pre-degraded 503, DEADLINE_EXCEEDED parity, NOT_FOUND no-bypass, single-use second-call rejected, low-watermark WARN once, success clears detector, no ticket logging

Tests (modified):

tests/handlers.connect.test.ts — adapted to new ConnectHandlerDeps shape
tests/metrics.test.ts — gauge + new outcome label coverage (+7 tests)
tests/server.test.ts — /health shape assertion

Docs:

README.md — new "Fallback pool consumption (ADR-0002 §3.4)" section + env-var table extensions
.env.example — FALLBACK_* knobs documented

Out of scope

Flux deployment overlay mounting the centrifugo-fallback-tickets Secret + setting the FALLBACK_* env vars in the deployment manifest. That is a follow-up PR (PR-OPAQUE-5-FLUX-FOLLOWUP-fallback-mount) in the flux-applications repo. THIS PR ships the application-code side only.

Verification

npx tsc --noEmit --noUnusedLocals --noUnusedParameters — clean
npm test — 8 files, 82 tests passing (was 41 → +41)
npm run build — clean
docker build . — clean
Rule 62 (zero warnings) — clean across the touched module
Rule 01 (security) — ticket values never logged; verified by explicit test
Rule 14 (resilience-and-config) — fail-loud config validation; degradation visible via /health + gauge
Rule 69 (no duplicate reviews) — checked bq:pr-review:active/:waiting empty before push

Test plan

Reviewer R1: ADR-0002 §3.4 contract compliance check
Reviewer R1: per-process monotonic depletion vs per-outage rationale verified
Reviewer R1: ticket-non-logging discipline (rule 01) verified across all log paths
Reviewer R1: fail-closed contract on info.degraded=true documented for downstream proxies
Reviewer R1: handler ripple — OutageDetector.notifySuccess placement on all Validate response branches sound
Follow-up: flux-applications/apps/realtime-service overlay adds Secret mount + env vars

## Summary Final piece of the L0 Tranche-0 #6 PR-OPAQUE arc. Wires the realtime-service consumer side of the static fallback ticket pool per ADR-0002 §3.4 + the contract documented in `flux-applications/apps/centrifugo/scripts/mint-fallback-tickets.sh`. The pool is loaded ONCE at startup into an in-memory `Set<string>`; degraded-mode connects look up tickets via O(1) Set membership, atomically delete on consumption, and decrement the new Prometheus gauge. ## Decision flow On `/centrifugo/connect` when identity-service Validate fails with `UNAVAILABLE` / `DEADLINE_EXCEEDED`: 1. `OutageDetector.notifyFailure(now)`. 2. If `(now - lastSuccessMs) <= FALLBACK_DEGRADED_WINDOW_MS` → 503 (today's behaviour). 3. Else if pool is empty → 503. 4. Else if `pool.has(ticket)` → 200 + `info.degraded=true` + atomically delete. 5. Else → 503. On any Validate response branch (success/disconnect/error), `OutageDetector.notifySuccess` runs — a wire response of any kind proves channel reachability. `NOT_FOUND` / `INVALID_ARGUMENT` remain pure rejection paths — no fallback bypass for those. ## Files **Application code (new):** - `src/fallback/pool.ts` — loader (parses one-ticket-per-line file → Set; LF + CRLF; whitespace tolerant; security-sensitive — no ticket logging) - `src/fallback/outage.ts` — `OutageDetector` (windowMs gate; notifySuccess clears, notifyFailure increments; cold-start safe; lock-free, single-threaded-event-loop atomic) **Application code (modified):** - `src/config.ts` — `FALLBACK_TICKETS_PATH` / `FALLBACK_DEGRADED_WINDOW_MS` / `FALLBACK_LOW_WATERMARK` env vars - `src/handlers/connect.ts` — fallback decision wired into the catch block; new helpers `consumeFallbackTicket` + `buildFallbackPoolResult`; `FALLBACK_USER_PLACEHOLDER` sentinel - `src/metrics.ts` — new outcome `fallback_pool_consumed` + new gauge `realtime_fallback_pool_size` - `src/main.ts` — boot-time pool load + gauge initialisation; INFO log carries pool status - `src/server.ts` — accepts `fallbackPool` + `outageDetector` + watermark deps; `/health` JSON surface extended with `identity_outage` + `fallback_pool.size` - `src/types.ts` — `ConnectOutcome` adds `fallback_pool_consumed` **Tests (new):** - `tests/fallback.pool.test.ts` (14 tests) — parseLines, missing file, empty file, 10k load, low-watermark WARN, whitespace handling, no-ticket-logging - `tests/fallback.outage.test.ts` (10 tests) — cold-start, window-not-yet-elapsed, degraded after window, notifySuccess clears, consecutive count, boundary exact-vs-past-window, negative windowMs rejection - `tests/handlers.connect.fallback.test.ts` (9 tests) — happy fallback acceptance, ticket-not-in-pool 503, pre-degraded 503, DEADLINE_EXCEEDED parity, NOT_FOUND no-bypass, single-use second-call rejected, low-watermark WARN once, success clears detector, no ticket logging **Tests (modified):** - `tests/handlers.connect.test.ts` — adapted to new `ConnectHandlerDeps` shape - `tests/metrics.test.ts` — gauge + new outcome label coverage (+7 tests) - `tests/server.test.ts` — `/health` shape assertion **Docs:** - `README.md` — new "Fallback pool consumption (ADR-0002 §3.4)" section + env-var table extensions - `.env.example` — `FALLBACK_*` knobs documented ## Out of scope - **Flux deployment overlay** mounting the `centrifugo-fallback-tickets` Secret + setting the `FALLBACK_*` env vars in the deployment manifest. That is a follow-up PR (`PR-OPAQUE-5-FLUX-FOLLOWUP-fallback-mount`) in the `flux-applications` repo. THIS PR ships the application-code side only. ## Verification - `npx tsc --noEmit --noUnusedLocals --noUnusedParameters` — clean - `npm test` — 8 files, 82 tests passing (was 41 → +41) - `npm run build` — clean - `docker build .` — clean - Rule 62 (zero warnings) — clean across the touched module - Rule 01 (security) — ticket values never logged; verified by explicit test - Rule 14 (resilience-and-config) — fail-loud config validation; degradation visible via /health + gauge - Rule 69 (no duplicate reviews) — checked `bq:pr-review:active`/`:waiting` empty before push ## Test plan - [ ] Reviewer R1: ADR-0002 §3.4 contract compliance check - [ ] Reviewer R1: per-process monotonic depletion vs per-outage rationale verified - [ ] Reviewer R1: ticket-non-logging discipline (rule 01) verified across all log paths - [ ] Reviewer R1: fail-closed contract on `info.degraded=true` documented for downstream proxies - [ ] Reviewer R1: handler ripple — `OutageDetector.notifySuccess` placement on all Validate response branches sound - [ ] Follow-up: `flux-applications/apps/realtime-service` overlay adds Secret mount + env vars

hibryda added 1 commit

2026-05-25 03:27:00 +02:00

feat(realtime-service): L0 T0 #6 PR-OPAQUE-7 — consume static fallback ticket pool (ADR-0002 §3.4) 5e9d351dbb

Wire the realtime-service consumer side of the static fallback ticket pool.
Per ADR-0002 §3.4 + the contract documented in flux-applications/apps/
centrifugo/scripts/mint-fallback-tickets.sh: the file is loaded once at
startup into an in-memory Set<string>; degraded-mode connects look up the
ticket via O(1) Set membership, atomically delete on consumption, and
decrement the Prometheus gauge.

Decision flow on /centrifugo/connect when identity-service Validate fails
with UNAVAILABLE / DEADLINE_EXCEEDED:

  1. OutageDetector.notifyFailure(now).
  2. If (now - lastSuccessMs) <= FALLBACK_DEGRADED_WINDOW_MS  → 503 (today).
  3. Else if pool is empty                                    → 503 (today).
  4. Else if pool.has(ticket)                                 → 200 + degraded.
  5. Else                                                     → 503 (today).

On any Validate response branch (success / disconnect / error), the
OutageDetector resets — a wire response of any kind proves channel
reachability. The "error" branch from identity-service is its own transient
signal (Redis unreachable inside identity-service) but does NOT mean
identity-service is unreachable from our side. NOT_FOUND / INVALID_ARGUMENT
remain pure rejection paths — no fallback bypass.

Design decisions (load-bearing):

* Per-process monotonic depletion. The Set is loaded ONCE at boot and
  drains until process restart. NO online repopulation. ADR-0002 §3.4
  documents this as "Pool drains as connections consume tickets; alert
  when below 1k remaining" + recovery via quarterly operator rotation
  (mint-fallback-tickets.sh --force + pod rollout). Online repopulation
  would defeat the single-use contract.

* Fail-closed fallback identity. Fallback-pool connections expose NO real
  user / family / subscription context — identity-service is the source
  of truth and it's unreachable. The result body uses a sentinel user
  ("fallback:degraded") + info.degraded=true. Downstream subscribe /
  publish / rpc proxies (future PRs) MUST fail-closed on every sensitive
  operation when info.degraded=true.

* Atomic single-use via has+delete. Pool consumption is synchronous
  has()/delete() with no await between — Node single-threaded semantics
  make the race-window zero. Double-check via the return of delete().

* Low-watermark crossing. WARN logged exactly once per process when
  consumption crosses below FALLBACK_LOW_WATERMARK (gauge is the durable
  signal; log is for triage). Startup WARN if initial size already below.

* Local-dev safe. Default FALLBACK_TICKETS_PATH points at the production
  Secret mount; missing-file is a startup WARN (the local-dev path),
  not a fatal error. Empty-string disables the path entirely.

* Security (rule 01). Ticket VALUES are never logged on any path —
  startup, consumption, or watermark. Pool size + outage age are the
  only observable quantities.

Test coverage: 41 → 82 (+41 tests across 4 files).

Out of scope for this PR: the Flux deployment overlay that mounts the
centrifugo-fallback-tickets Secret + sets the FALLBACK_* env vars. That
is PR-OPAQUE-5-FLUX-FOLLOWUP-fallback-mount and lives in flux-applications/.

Refs ADR-0002 §3.4, mint-fallback-tickets.sh header contract.

pr-reviewer-bot commented

2026-05-25 03:33:45 +02:00

Member

Superseded by round 2.

Show previous round

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Round 1 — head 5e9d351dbb37, base main, trigger opened

TL;DR: NEEDS_WORK — kept 3 findings (2 agreed, 1 unique-to-A verified): one major config bug makes the documented disable-by-empty-string path unreachable, one minor JSDoc/implementation mismatch in the fallback user sentinel, one info-level misleading parameter on notifyFailure.

Summary

Arbitration — PR #2 im2be-realtime-service (Round 1)

Prior run history: None — first arbitration for this PR.

No-history submodule context: No stored patterns for this service.

Reconciliation

Finding 1 (F1) — Agreed A + B, verified MAJOR: Both reviewers independently flagged FALLBACK_TICKETS_PATH="" being swallowed by readString. Verified at src/config.ts:85–94: the helper checks value === undefined || value === "" and returns the fallback argument in both cases. The fallback for FALLBACK_TICKETS_PATH is the production Secret mount path (/var/run/secrets/...), NOT empty string. Consequence confirmed: setting FALLBACK_TICKETS_PATH= in a ConfigMap does NOT produce config.fallbackTicketsPath === "" — it produces the production path, so the if (path === "") disabled branch in pool.ts:72 is dead code from any env-var configuration. JSDoc, README, and .env.example all document this as the disable mechanism. B's major severity is the more accurate assessment: the production safety risk (operator believing fallback is disabled when it isn't because the Secret may be mounted at the default path) is real and verified.

Finding 2 (F2) — Agreed A + B, verified MINOR: Both reviewers flagged buildFallbackPoolResult ignoring _ticket. Verified at src/handlers/connect.ts:363–374: function signature is buildFallbackPoolResult(_ticket: string), the _ticket argument is never read, and the returned user is always FALLBACK_USER_PLACEHOLDER ("fallback:degraded"). The JSDoc at lines 352–362 explicitly describes a sha256-derived sentinel ("fallback:<sha256(ticket)[0..12]>"). The discrepancy collapses all concurrent fallback connections into one Centrifugo user bucket. Both reviewers agree on minor severity — kept.

Finding 3 (F3) — Unique to A, verified INFO: Reviewer A flagged notifyFailure(_now) discarding its parameter. Verified at src/fallback/outage.ts:94–101: the implementation ignores _now entirely; only hasAnyFailure and the counter are updated. The @param now JSDoc implies the timestamp is recorded, but it is not. Callers in connect.ts:177 pass a real now. This is by design (degraded gate depends only on lastSuccessMs set by notifySuccess), but the JSDoc misleads. Verified correct at info severity — kept.

Blast Radius

The diff spans 16 files across config, fallback logic (new outage detector + pool loader), the connect handler, metrics, server wiring, and tests. The fallback pool and outage detector are new modules injected into the existing connect handler. The config bug (Finding 1) affects startup behaviour in every deployment environment; the sentinel mismatch (Finding 2) affects Centrifugo-side observability for any degraded-mode connection.

BLAST_SCORE: 5/10

Risk Indicators

Indicator	Value
Sensitive functions	`consumeFallbackTicket`, `buildFallbackPoolResult`, `loadFallbackPool`, `OutageDetector.isInDegradedMode`
Migration touched	—
Test delta	+623 / -21 lines in test files
Dependency changes	—

CI status (head `5e9d351dbb37`)

No CI checks reported for this commit.

Findings (3)

[MAJOR] `FALLBACK_TICKETS_PATH=""` is silently absorbed by `readString`; the documented "disable by empty string" path is unreachable via env var

src/config.ts:156

readString (lines 85–94) collapses both undefined and "" to the fallback argument:

function readString(env: NodeJS.ProcessEnv, key: string, fallback?: string): string {
  const value = env[key];
  if (value === undefined || value === "") {
    if (fallback === undefined) { throw new ConfigError(...); }
    return fallback;  // ← reached for BOTH undefined AND ""
  }
  return value;
}

For FALLBACK_TICKETS_PATH the fallback is "/var/run/secrets/centrifugo-fallback-tickets/fallback-tickets.txt" — the production Secret mount path — not empty string. Therefore:

Setting FALLBACK_TICKETS_PATH="" in a Kubernetes ConfigMap gives config.fallbackTicketsPath === "/var/run/secrets/..." not "".
The if (path === "") disabled branch in pool.ts:72 is dead code from any real env-var configuration.
Five documentation points are wrong: .env.example ("Empty value disables the path"), src/config.ts JSDoc ("Empty string DISABLES fallback acceptance"), pool.ts JSDoc status table, pool.ts:62 param doc, and README.md ("Set FALLBACK_TICKETS_PATH=\"\" to disable the path entirely").

Production risk: an operator following the runbook who sets FALLBACK_TICKETS_PATH="" in a ConfigMap will find that if the Secret is mounted at the default path, the pool loads normally and fallback remains active — silently defeating the intended disable.

Minimal fix: introduce a helper that treats "" as a valid configured value (meaning disabled) rather than falling through to the default:

function readStringAllowEmpty(env: NodeJS.ProcessEnv, key: string, fallback: string): string {
  const value = env[key];
  if (value === undefined) return fallback;  // unset → default
  return value;                               // "" → "" (disabled), "/foo" → "/foo"
}
// In loadConfig:
fallbackTicketsPath: readStringAllowEmpty(env, "FALLBACK_TICKETS_PATH", ""),
// default="" means disabled locally; prod overlay sets the Secret mount path explicitly

This makes FALLBACK_TICKETS_PATH= actually work as documented and removes the confusing ENOENT WARN on dev laptops.

[MINOR] `buildFallbackPoolResult` accepts `_ticket` but never reads it; JSDoc describes a per-ticket sha256 sentinel that is not implemented — all fallback sessions share one Centrifugo user identity

src/handlers/connect.ts:363

The function signature is buildFallbackPoolResult(_ticket: string) and the docblock at lines 347–362 states:

"The sentinel is 'fallback:<sha256(ticket)[0..12]>' so two reconnects with the same fallback ticket would share an identity"

But the implementation returns FALLBACK_USER_PLACEHOLDER ("fallback:degraded") unconditionally; _ticket is never read. All fallback connections share a single Centrifugo user identity.

Behavioural consequence: Centrifugo uses user in its presence, connection-limit, and unsubscribe-by-user APIs. With a static sentinel, N simultaneously degraded connections all appear as the same Centrifugo user, collapsing per-connection tracking into one bucket.

Note: FALLBACK_USER_PLACEHOLDER is tested and exported, so any fix to Option A below must update the corresponding test assertions (e.g. tests/handlers.connect.fallback.test.ts:124).

Two acceptable resolutions:

Option A — implement the sha256 derivation as documented:

import { createHash } from "node:crypto";
function buildFallbackPoolResult(ticket: string) {
  const prefix = createHash("sha256").update(ticket).digest("hex").slice(0, 12);
  return { user: `fallback:${prefix}`, info: { degraded: true, fallback_path: "pool" } };
}

Option B — keep the static sentinel intentionally and fix the comment:

/**
 * All fallback-pool sessions intentionally share one Centrifugo user identity
 * ("fallback:degraded"). Downstream proxies gate on info.degraded, not user.
 * Centrifugo presence/connection-limit tracking will bucket all fallback
 * sessions together — acceptable given the degraded-mode context.
 */
export const FALLBACK_USER_PLACEHOLDER = "fallback:degraded";

and remove the incorrect sha256 description from the JSDoc.

[INFO] `notifyFailure(_now)` accepts and silently discards its timestamp parameter — JSDoc and callers imply it is recorded

src/fallback/outage.ts:94

notifyFailure(_now: number = Date.now()): void {
  this.hasAnyFailure = true;
  if (this.consecutiveFailureCount < Number.MAX_SAFE_INTEGER) {
    this.consecutiveFailureCount += 1;
  }
  // _now is never used
}

The @param now JSDoc says "wall-clock at the time the failure was observed", implying the timestamp is stored. It is not — only hasAnyFailure and the counter are updated. The degraded gate is now - this.lastSuccessMs > windowMs, driven entirely by notifySuccess, not by failure timestamps.

In connect.ts:177, the caller passes a real now (deps.outageDetector.notifyFailure(now)) under the apparent belief it matters. It does not.

Fix (either approach): Remove the _now parameter entirely and update callers:

/**
 * Record a transient Validate failure (UNAVAILABLE / DEADLINE_EXCEEDED).
 * The degraded gate is driven by lastSuccessMs (set by notifySuccess),
 * NOT by failure timestamps — no timestamp is recorded here.
 */
notifyFailure(): void {
  this.hasAnyFailure = true;
  if (this.consecutiveFailureCount < Number.MAX_SAFE_INTEGER) {
    this.consecutiveFailureCount += 1;
  }
}

Or, at minimum, add a clarifying note to the existing JSDoc and remove the @param now line so future maintainers understand the parameter is intentionally absent.

Verdict

NEEDS_WORK

_{hib-pr-reviewer • round 1 • 3 findings (1M/1m/1i) • 2026-05-25T01:31:25.463Z → 2026-05-25T01:33:45.729Z • posted-as: pr-reviewer-bot}

> _Superseded by round 2._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service) **Round 1** — head `5e9d351dbb37`, base `main`, trigger `opened` **TL;DR:** NEEDS_WORK — kept 3 findings (2 agreed, 1 unique-to-A verified): one major config bug makes the documented disable-by-empty-string path unreachable, one minor JSDoc/implementation mismatch in the fallback user sentinel, one info-level misleading parameter on notifyFailure. ### Summary ## Arbitration — PR #2 im2be-realtime-service (Round 1) **Prior run history:** None — first arbitration for this PR. **No-history submodule context:** No stored patterns for this service. ### Reconciliation **Finding 1 (F1) — Agreed A + B, verified MAJOR:** Both reviewers independently flagged `FALLBACK_TICKETS_PATH=""` being swallowed by `readString`. Verified at `src/config.ts:85–94`: the helper checks `value === undefined || value === ""` and returns the `fallback` argument in both cases. The fallback for `FALLBACK_TICKETS_PATH` is the production Secret mount path (`/var/run/secrets/...`), NOT empty string. Consequence confirmed: setting `FALLBACK_TICKETS_PATH=` in a ConfigMap does NOT produce `config.fallbackTicketsPath === ""` — it produces the production path, so the `if (path === "")` disabled branch in `pool.ts:72` is dead code from any env-var configuration. JSDoc, README, and `.env.example` all document this as the disable mechanism. B's `major` severity is the more accurate assessment: the production safety risk (operator believing fallback is disabled when it isn't because the Secret may be mounted at the default path) is real and verified. **Finding 2 (F2) — Agreed A + B, verified MINOR:** Both reviewers flagged `buildFallbackPoolResult` ignoring `_ticket`. Verified at `src/handlers/connect.ts:363–374`: function signature is `buildFallbackPoolResult(_ticket: string)`, the `_ticket` argument is never read, and the returned `user` is always `FALLBACK_USER_PLACEHOLDER` (`"fallback:degraded"`). The JSDoc at lines 352–362 explicitly describes a sha256-derived sentinel (`"fallback:<sha256(ticket)[0..12]>"`). The discrepancy collapses all concurrent fallback connections into one Centrifugo user bucket. Both reviewers agree on minor severity — kept. **Finding 3 (F3) — Unique to A, verified INFO:** Reviewer A flagged `notifyFailure(_now)` discarding its parameter. Verified at `src/fallback/outage.ts:94–101`: the implementation ignores `_now` entirely; only `hasAnyFailure` and the counter are updated. The `@param now` JSDoc implies the timestamp is recorded, but it is not. Callers in `connect.ts:177` pass a real `now`. This is by design (degraded gate depends only on `lastSuccessMs` set by `notifySuccess`), but the JSDoc misleads. Verified correct at info severity — kept. ### Blast Radius The diff spans 16 files across config, fallback logic (new outage detector + pool loader), the connect handler, metrics, server wiring, and tests. The fallback pool and outage detector are new modules injected into the existing connect handler. The config bug (Finding 1) affects startup behaviour in every deployment environment; the sentinel mismatch (Finding 2) affects Centrifugo-side observability for any degraded-mode connection. **BLAST_SCORE: 5/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `consumeFallbackTicket`, `buildFallbackPoolResult`, `loadFallbackPool`, `OutageDetector.isInDegradedMode` | | Migration touched | — | | Test delta | +623 / -21 lines in test files | | Dependency changes | — | ### CI status (head `5e9d351dbb37`) _No CI checks reported for this commit._ ### Findings (3) #### **[MAJOR]** `FALLBACK_TICKETS_PATH=""` is silently absorbed by `readString`; the documented "disable by empty string" path is unreachable via env var _src/config.ts:156_ `readString` (lines 85–94) collapses both `undefined` and `""` to the `fallback` argument: ```typescript function readString(env: NodeJS.ProcessEnv, key: string, fallback?: string): string { const value = env[key]; if (value === undefined || value === "") { if (fallback === undefined) { throw new ConfigError(...); } return fallback; // ← reached for BOTH undefined AND "" } return value; } ``` For `FALLBACK_TICKETS_PATH` the fallback is `"/var/run/secrets/centrifugo-fallback-tickets/fallback-tickets.txt"` — the production Secret mount path — **not** empty string. Therefore: 1. Setting `FALLBACK_TICKETS_PATH=""` in a Kubernetes ConfigMap gives `config.fallbackTicketsPath === "/var/run/secrets/..."` not `""`. 2. The `if (path === "")` disabled branch in `pool.ts:72` is dead code from any real env-var configuration. 3. Five documentation points are wrong: `.env.example` ("Empty value disables the path"), `src/config.ts` JSDoc ("Empty string DISABLES fallback acceptance"), `pool.ts` JSDoc status table, `pool.ts:62` param doc, and `README.md` ("Set `FALLBACK_TICKETS_PATH=\"\"` to disable the path entirely"). **Production risk:** an operator following the runbook who sets `FALLBACK_TICKETS_PATH=""` in a ConfigMap will find that if the Secret is mounted at the default path, the pool loads normally and fallback remains active — silently defeating the intended disable. **Minimal fix:** introduce a helper that treats `""` as a valid configured value (meaning disabled) rather than falling through to the default: ```typescript function readStringAllowEmpty(env: NodeJS.ProcessEnv, key: string, fallback: string): string { const value = env[key]; if (value === undefined) return fallback; // unset → default return value; // "" → "" (disabled), "/foo" → "/foo" } // In loadConfig: fallbackTicketsPath: readStringAllowEmpty(env, "FALLBACK_TICKETS_PATH", ""), // default="" means disabled locally; prod overlay sets the Secret mount path explicitly ``` This makes `FALLBACK_TICKETS_PATH=` actually work as documented and removes the confusing ENOENT WARN on dev laptops. #### **[MINOR]** `buildFallbackPoolResult` accepts `_ticket` but never reads it; JSDoc describes a per-ticket sha256 sentinel that is not implemented — all fallback sessions share one Centrifugo user identity _src/handlers/connect.ts:363_ The function signature is `buildFallbackPoolResult(_ticket: string)` and the docblock at lines 347–362 states: > "The sentinel is `'fallback:<sha256(ticket)[0..12]>'` so two reconnects with the same fallback ticket would share an identity" But the implementation returns `FALLBACK_USER_PLACEHOLDER` (`"fallback:degraded"`) unconditionally; `_ticket` is never read. All fallback connections share a single Centrifugo user identity. **Behavioural consequence:** Centrifugo uses `user` in its presence, connection-limit, and unsubscribe-by-user APIs. With a static sentinel, N simultaneously degraded connections all appear as the same Centrifugo user, collapsing per-connection tracking into one bucket. Note: `FALLBACK_USER_PLACEHOLDER` is tested and exported, so any fix to Option A below must update the corresponding test assertions (e.g. `tests/handlers.connect.fallback.test.ts:124`). **Two acceptable resolutions:** **Option A** — implement the sha256 derivation as documented: ```typescript import { createHash } from "node:crypto"; function buildFallbackPoolResult(ticket: string) { const prefix = createHash("sha256").update(ticket).digest("hex").slice(0, 12); return { user: `fallback:${prefix}`, info: { degraded: true, fallback_path: "pool" } }; } ``` **Option B** — keep the static sentinel intentionally and fix the comment: ```typescript /** * All fallback-pool sessions intentionally share one Centrifugo user identity * ("fallback:degraded"). Downstream proxies gate on info.degraded, not user. * Centrifugo presence/connection-limit tracking will bucket all fallback * sessions together — acceptable given the degraded-mode context. */ export const FALLBACK_USER_PLACEHOLDER = "fallback:degraded"; ``` and remove the incorrect sha256 description from the JSDoc. #### **[INFO]** `notifyFailure(_now)` accepts and silently discards its timestamp parameter — JSDoc and callers imply it is recorded _src/fallback/outage.ts:94_ ```typescript notifyFailure(_now: number = Date.now()): void { this.hasAnyFailure = true; if (this.consecutiveFailureCount < Number.MAX_SAFE_INTEGER) { this.consecutiveFailureCount += 1; } // _now is never used } ``` The `@param now` JSDoc says "wall-clock at the time the failure was observed", implying the timestamp is stored. It is not — only `hasAnyFailure` and the counter are updated. The degraded gate is `now - this.lastSuccessMs > windowMs`, driven entirely by `notifySuccess`, not by failure timestamps. In `connect.ts:177`, the caller passes a real `now` (`deps.outageDetector.notifyFailure(now)`) under the apparent belief it matters. It does not. **Fix (either approach):** Remove the `_now` parameter entirely and update callers: ```typescript /** * Record a transient Validate failure (UNAVAILABLE / DEADLINE_EXCEEDED). * The degraded gate is driven by lastSuccessMs (set by notifySuccess), * NOT by failure timestamps — no timestamp is recorded here. */ notifyFailure(): void { this.hasAnyFailure = true; if (this.consecutiveFailureCount < Number.MAX_SAFE_INTEGER) { this.consecutiveFailureCount += 1; } } ``` Or, at minimum, add a clarifying note to the existing JSDoc and remove the `@param now` line so future maintainers understand the parameter is intentionally absent. ### Verdict **NEEDS_WORK** --- hib-pr-reviewer • round 1 • 3 findings (1M/1m/1i) • 2026-05-25T01:31:25.463Z → 2026-05-25T01:33:45.729Z • posted-as: pr-reviewer-bot </details>

hibryda added 1 commit

2026-05-25 03:41:49 +02:00

fix(realtime-service): PR-OPAQUE-7 R1 verdict — 3 findings (1 MAJOR + 1 MINOR + 1 INFO) 3833938e82

R1 verdict (head 5e9d351) findings (kept=3):

(1) MAJOR src/config.ts:156 — FALLBACK_TICKETS_PATH="" did not disable
    the path. The prior `readString` helper (lines 85-94) collapses both
    undefined and "" to the fallback argument; with the prod-mount-path
    as the fallback, an operator setting FALLBACK_TICKETS_PATH="" in a
    ConfigMap silently kept the pool active. Documentation in 5 places
    (.env.example, src/config.ts JSDoc, pool.ts JSDoc, README.md ×2)
    claimed empty=disabled.

    Fix: new `readStringAllowEmpty` helper that treats "" as a valid
    configured value (only `undefined` falls back to the default). The
    default for FALLBACK_TICKETS_PATH changes from the prod-mount-path
    to "" — disabled by default, opt-in via explicit ConfigMap entry in
    the production overlay. Local dev is now never accidentally opted
    in. Existing fallthrough-to-503 behaviour preserved by the loader's
    empty-path branch.

    Loader's empty-path log demoted from WARN to INFO — empty IS the
    documented default; WARN is reserved for anomalies (missing file
    when path is set, empty file, low-watermark crossed).

    Tests added (tests/config.test.ts):
    - FALLBACK_TICKETS_PATH unset → ""
    - FALLBACK_TICKETS_PATH="" → "" (explicit disable honoured)
    - FALLBACK_TICKETS_PATH=/path → /path verbatim
    Plus existing window + watermark coverage extended.

(2) MINOR src/handlers/connect.ts:363 — buildFallbackPoolResult(_ticket)
    accepted the ticket but never read it; JSDoc described a per-ticket
    sha256 sentinel but the implementation collapsed every fallback
    connection into the static FALLBACK_USER_PLACEHOLDER. Centrifugo's
    presence + connection-limit + unsubscribe-by-user APIs bucket by
    `user` string — collapsing all degraded connections into one
    bucket meant: presence reports 1 user with N clients; disconnect-
    by-user would drop ALL degraded connections; connection-limit
    treated the pool as a single user.

    Fix (Option A — implement the documented sha256 derivation): user
    = "fallback:" + first 12 hex chars of sha256(ticket). 48 bits gives
    1.8×10⁻⁷ birthday collision probability across 10k tickets — and
    a collision is only a cosmetic presence-merge, NOT an authorisation
    grant (info.degraded is the actual gate). FALLBACK_USER_PLACEHOLDER
    renamed to FALLBACK_USER_PREFIX = "fallback:". Ticket value flows
    through Node crypto's update() only — never echoed in spans, logs,
    or response body.

    Tests added (tests/handlers.connect.fallback.test.ts):
    - distinct tickets → distinct user identifiers
    - same ticket twice → same user identifier (derivation is stable)
    Existing 200-OK fallback assertion updated to match
    /^fallback:[0-9a-f]{12}$/.

(3) INFO src/fallback/outage.ts:94 — notifyFailure(_now) accepted a
    timestamp that the implementation never read. The degraded gate is
    `now - lastSuccessMs > windowMs` driven entirely by notifySuccess'
    recorded timestamp; the failure side only flips hasAnyFailure +
    bumps the counter.

    Fix: drop the `_now` parameter from notifyFailure. Callers in
    src/handlers/connect.ts:179 + tests updated. JSDoc reframed to make
    the timeless semantics explicit.

Fix:

Verification:
- npm test                                       → 94 passing (was 82; +12)
- npx tsc --noEmit --noUnusedLocals --noUnusedParameters → clean
- npm run build                                  → clean
- ssh NAS redis bq:pr-review:active/:waiting     → 0/0 (rule 69)

pr-reviewer-bot commented

2026-05-25 03:48:51 +02:00

Member

Superseded by round 3.

Show previous round

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Round 2 — head 3833938e8294, base main, trigger synchronize

TL;DR: NEEDS_WORK — kept 4 findings (1 agreed minor, 2 unique-A minor+info, 1 unique-B minor); all 3 R1 findings confirmed resolved.

Summary

Round 2 Arbitration — 4 findings kept (3 minor, 1 info)

Memora persistence

Made best-effort Memora persistence; all tool calls were denied by the permission layer. No prior Memora history found for this PR — treating as first recorded run.

R1 disposition (all three resolved)

All three R1 findings confirmed resolved by both reviewers and independently verified: readStringAllowEmpty at src/config.ts:194; ticket actively hashed at src/handlers/connect.ts:382; notifyFailure() parameter removed at src/fallback/outage.ts:102. These are closed and NOT re-raised.

Reconciliation summary

A#3 + B#1 → AGREED MINOR (src/fallback/pool.ts lines 94–98): Both reviewers independently flagged the same root cause — the catch block does not distinguish ENOENT from EACCES/EIO/other. Log message says "file missing" and status is "missing" regardless of error class. Verified: lines 94–98 confirm a single uniform return path covering all errors. Kept at minor (B's severity, higher than A's info; both agree on the defect).
A#1 → UNIQUE-A MINOR (src/handlers/connect.ts line 378): buildFallbackPoolResult returns { user, info } with no expire_at. Verified: lines 378–393 confirm absence. buildSuccessResult (line 425–436) conditionally sets expire_at; the asymmetry is real. Kept.
A#2 → UNIQUE-A INFO (src/fallback/pool.ts line 90): Comment says "fail open / to an empty Set" but an empty pool means every degraded-mode connect gets 503 — that is fail closed for callers. Verified: line 90 contains exact phrase. Kept.
B#2 → UNIQUE-B MINOR (src/config.ts line 198): Guard rejects < 0 but silently accepts 0. Verified: isInDegradedMode at outage.ts:121 uses > (strict), so windowMs=0 means any ms after boot satisfies the condition — first gRPC fault immediately opens the gate. Constructor comment at line 31–34 explicitly documents cold-start protection, which breaks at windowMs=0. Kept.

Blast Radius

This PR touches 9 production source files spanning config loading, the connect handler (auth-path hot code), the outage detector, pool loader, health endpoint, and metrics. The fallback ticket consumption is a security-sensitive code path — it is the mechanism by which un-authenticated connections are accepted during identity-service outages. Errors in this path affect every connection attempt made while identity-service is unreachable.

BLAST_SCORE: 6/10

Risk Indicators

Indicator	Value
Sensitive functions	`buildFallbackPoolResult`, `loadFallbackPool`, `isInDegradedMode`, `consumeFallbackTicket`, `connectHandler`
Migration touched	—
Test delta	+737 / -21 lines in test files
Dependency changes	—

CI status (head `3833938e8294`)

No CI checks reported for this commit.

PR-OPAQUE-5
PR-OPAQUE-5-FLUX-FOLLOWUP-fallback-mount

Findings (4)

[MINOR] catch block emits wrong log message and `status: "missing"` for non-ENOENT read errors — operator cannot distinguish mis-configured RBAC from unprovisioned Secret

src/fallback/pool.ts:94

Lines 86–98: the catch block has a single execution path regardless of errno. For ENOENT (Secret not yet provisioned) AND for EACCES / EIO / any other class, the logger emits "fallback ticket pool file missing — degraded-mode fallback DISABLED" and returns status: "missing". An operator whose pod has the Secret mounted at the correct path but lacks read permission (wrong fsGroup / missing defaultMode) would see size: 0 in /health, the literal text "file missing" in the WARN log, and waste time verifying Secret provisioning rather than checking RBAC.

The errno field is present in the structured log payload (line 95), which helps a careful reader, but the human-readable message body actively misleads.

Concrete fix — branch on errno inside the catch:

const errno = (err as NodeJS.ErrnoException).code ?? "unknown";
if (errno === "ENOENT") {
  logger.warn({ path, errno, pool_status: "missing" },
    "fallback ticket pool file missing — degraded-mode fallback DISABLED");
  return { pool: new Set<string>(), initialSize: 0, status: "missing" };
} else {
  logger.warn({ path, errno, pool_status: "unreadable" },
    "fallback ticket pool file unreadable — degraded-mode fallback DISABLED");
  return { pool: new Set<string>(), initialSize: 0, status: "unreadable" };
}

Also extend FallbackPoolLoadResult.status to "loaded" | "disabled" | "missing" | "empty" | "unreadable" and surface the status string in the /health response alongside size. Alertmanager / a human reader can then distinguish the two failure classes without grepping logs.

[MINOR] `buildFallbackPoolResult` emits no `expire_at` — fallback-accepted connections persist indefinitely after identity-service recovers

src/handlers/connect.ts:378

buildFallbackPoolResult (lines 378–393) returns { user, info } with no expire_at field. By contrast, buildSuccessResult (lines 425–436) conditionally sets expire_at from success.expiresAt — this is the field Centrifugo uses to schedule re-authentication via refresh_proxy. Without it, a client accepted during a 20-minute outage remains connected with info.degraded=true indefinitely — even hours after identity-service is back — unless it disconnects voluntarily.

Downstream proxies are required to fail-closed on degraded=true, which limits blast radius. But the stale connection is still undesirable: the user never gets upgraded to a real identity without an explicit reconnect.

Concrete fix — add an expire_at parameter to the call site and function:

function buildFallbackPoolResult(ticket: string, nowS: number): {
  user: string;
  info: Record<string, unknown>;
  expire_at: number;
} {
  const prefix = createHash("sha256").update(ticket).digest("hex").slice(0, 12);
  return {
    user: `${FALLBACK_USER_PREFIX}${prefix}`,
    info: { degraded: true, fallback_path: "pool" },
    expire_at: nowS + FALLBACK_CONNECT_TTL_S, // e.g. 300
  };
}

The call site already has const now = Date.now() (line 179); pass Math.floor(now / 1000). FALLBACK_CONNECT_TTL_S can be a new config knob (default 300 s) or a file-level constant. The Centrifugo refresh_proxy callback (future PR) would then prompt an inline token refresh, reverting to the real identity once it succeeds.

[MINOR] `FALLBACK_DEGRADED_WINDOW_MS=0` silently accepted — any single gRPC fault immediately opens the fallback pool gate

src/config.ts:198

Line 198 guards fallbackDegradedWindowMs < 0 but silently accepts 0. In src/fallback/outage.ts:121:

return now - this.lastSuccessMs > this.windowMs; // windowMs=0 → > 0

The > is strict. lastSuccessMs is initialised to process-start time (constructor line 80). So with windowMs=0: process boots, some time elapses, the first UNAVAILABLE / DEADLINE_EXCEEDED arrives → notifyFailure() sets hasAnyFailure=true → isInDegradedMode(now) evaluates now − startMs > 0 → true (trivially, since any ms has passed) → degraded mode is immediately active → the fallback pool is drained on the very first gRPC blip. The constructor's own JSDoc (line 31–34) explicitly documents that cold-start protection breaks down at windowMs=0, but config validation doesn't enforce it.

Test helpers intentionally inject windowMs: 0 to avoid sleeping, but nothing prevents an operator from setting FALLBACK_DEGRADED_WINDOW_MS=0 in a ConfigMap.

Concrete fix — add a combined guard after per-field validation, before return Object.freeze(config):

if (config.fallbackTicketsPath !== "" && config.fallbackDegradedWindowMs === 0) {
  throw new ConfigError(
    "FALLBACK_DEGRADED_WINDOW_MS=0 is not valid when fallback pool is enabled " +
    "(any single gRPC fault immediately opens the gate — ADR-0002 §3.4 default: 5000). " +
    "Test code should inject OutageDetector directly."
  );
}

Alternatively emit a loud WARN and accept 0 only when fallbackTicketsPath is empty. Throwing ConfigError is the safer choice: it fails fast at boot rather than silently misbehaving during a production incident.

[INFO] `"fail open"` comment uses the phrase in the opposite sense from its standard security meaning

src/fallback/pool.ts:90

Line 90: // Other errors (permission denied, etc.) are also tolerated — fail open / to an empty Set rather than crash boot.

In security literature, "fail open" means granting access when the guard is unavailable. The actual behaviour here is the opposite: an unreadable file produces an empty pool, which means every degraded-mode connect falls through to 503 — that is fail closed for clients attempting fallback-path connections. A future security auditor scanning the codebase for "fail open" patterns will find this comment and incorrectly conclude that the error path grants access.

Suggested replacement:

// Other errors (EACCES, etc.) are also tolerated — the service starts
// with an empty pool (fallback path stays DISABLED) rather than crashing
// at boot. The /health surface + errno in the WARN log make the
// mis-mount visible to operators. This is fail-CLOSED for callers:
// every degraded-mode connect will receive 503 until the pool is fixed.

Verdict

NEEDS_WORK

_{hib-pr-reviewer • round 2 • 4 findings (3m/1i) • 2026-05-25T01:45:39.767Z → 2026-05-25T01:48:51.111Z • posted-as: pr-reviewer-bot}

> _Superseded by round 3._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service) **Round 2** — head `3833938e8294`, base `main`, trigger `synchronize` **TL;DR:** NEEDS_WORK — kept 4 findings (1 agreed minor, 2 unique-A minor+info, 1 unique-B minor); all 3 R1 findings confirmed resolved. ### Summary ## Round 2 Arbitration — 4 findings kept (3 minor, 1 info) ### Memora persistence Made best-effort Memora persistence; all tool calls were denied by the permission layer. No prior Memora history found for this PR — treating as first recorded run. ### R1 disposition (all three resolved) All three R1 findings confirmed resolved by both reviewers and independently verified: `readStringAllowEmpty` at `src/config.ts:194`; `ticket` actively hashed at `src/handlers/connect.ts:382`; `notifyFailure()` parameter removed at `src/fallback/outage.ts:102`. These are closed and NOT re-raised. ### Reconciliation summary - **A#3 + B#1 → AGREED MINOR** (`src/fallback/pool.ts` lines 94–98): Both reviewers independently flagged the same root cause — the catch block does not distinguish `ENOENT` from `EACCES`/`EIO`/other. Log message says "file missing" and `status` is `"missing"` regardless of error class. Verified: lines 94–98 confirm a single uniform return path covering all errors. Kept at **minor** (B's severity, higher than A's info; both agree on the defect). - **A#1 → UNIQUE-A MINOR** (`src/handlers/connect.ts` line 378): `buildFallbackPoolResult` returns `{ user, info }` with no `expire_at`. Verified: lines 378–393 confirm absence. `buildSuccessResult` (line 425–436) conditionally sets `expire_at`; the asymmetry is real. Kept. - **A#2 → UNIQUE-A INFO** (`src/fallback/pool.ts` line 90): Comment says "fail open / to an empty Set" but an empty pool means every degraded-mode connect gets 503 — that is fail **closed** for callers. Verified: line 90 contains exact phrase. Kept. - **B#2 → UNIQUE-B MINOR** (`src/config.ts` line 198): Guard rejects `< 0` but silently accepts `0`. Verified: `isInDegradedMode` at `outage.ts:121` uses `>` (strict), so `windowMs=0` means any ms after boot satisfies the condition — first gRPC fault immediately opens the gate. Constructor comment at line 31–34 explicitly documents cold-start protection, which breaks at `windowMs=0`. Kept. ### Blast Radius This PR touches 9 production source files spanning config loading, the connect handler (auth-path hot code), the outage detector, pool loader, health endpoint, and metrics. The fallback ticket consumption is a security-sensitive code path — it is the mechanism by which un-authenticated connections are accepted during identity-service outages. Errors in this path affect every connection attempt made while identity-service is unreachable. **BLAST_SCORE: 6/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `buildFallbackPoolResult`, `loadFallbackPool`, `isInDegradedMode`, `consumeFallbackTicket`, `connectHandler` | | Migration touched | — | | Test delta | +737 / -21 lines in test files | | Dependency changes | — | ### CI status (head `3833938e8294`) _No CI checks reported for this commit._ ### Related PRs - PR-OPAQUE-5 - PR-OPAQUE-5-FLUX-FOLLOWUP-fallback-mount ### Findings (4) #### **[MINOR]** catch block emits wrong log message and `status: "missing"` for non-ENOENT read errors — operator cannot distinguish mis-configured RBAC from unprovisioned Secret _src/fallback/pool.ts:94_ Lines 86–98: the `catch` block has a single execution path regardless of `errno`. For `ENOENT` (Secret not yet provisioned) AND for `EACCES` / `EIO` / any other class, the logger emits `"fallback ticket pool file missing — degraded-mode fallback DISABLED"` and returns `status: "missing"`. An operator whose pod has the Secret mounted at the correct path but lacks read permission (wrong `fsGroup` / missing `defaultMode`) would see `size: 0` in `/health`, the literal text "file missing" in the WARN log, and waste time verifying Secret provisioning rather than checking RBAC. The `errno` field is present in the structured log payload (line 95), which helps a careful reader, but the human-readable message body actively misleads. **Concrete fix** — branch on `errno` inside the catch: ```typescript const errno = (err as NodeJS.ErrnoException).code ?? "unknown"; if (errno === "ENOENT") { logger.warn({ path, errno, pool_status: "missing" }, "fallback ticket pool file missing — degraded-mode fallback DISABLED"); return { pool: new Set<string>(), initialSize: 0, status: "missing" }; } else { logger.warn({ path, errno, pool_status: "unreadable" }, "fallback ticket pool file unreadable — degraded-mode fallback DISABLED"); return { pool: new Set<string>(), initialSize: 0, status: "unreadable" }; } ``` Also extend `FallbackPoolLoadResult.status` to `"loaded" | "disabled" | "missing" | "empty" | "unreadable"` and surface the status string in the `/health` response alongside `size`. Alertmanager / a human reader can then distinguish the two failure classes without grepping logs. #### **[MINOR]** `buildFallbackPoolResult` emits no `expire_at` — fallback-accepted connections persist indefinitely after identity-service recovers _src/handlers/connect.ts:378_ `buildFallbackPoolResult` (lines 378–393) returns `{ user, info }` with no `expire_at` field. By contrast, `buildSuccessResult` (lines 425–436) conditionally sets `expire_at` from `success.expiresAt` — this is the field Centrifugo uses to schedule re-authentication via `refresh_proxy`. Without it, a client accepted during a 20-minute outage remains connected with `info.degraded=true` indefinitely — even hours after identity-service is back — unless it disconnects voluntarily. Downstream proxies are required to fail-closed on `degraded=true`, which limits blast radius. But the stale connection is still undesirable: the user never gets upgraded to a real identity without an explicit reconnect. **Concrete fix** — add an `expire_at` parameter to the call site and function: ```typescript function buildFallbackPoolResult(ticket: string, nowS: number): { user: string; info: Record<string, unknown>; expire_at: number; } { const prefix = createHash("sha256").update(ticket).digest("hex").slice(0, 12); return { user: `${FALLBACK_USER_PREFIX}${prefix}`, info: { degraded: true, fallback_path: "pool" }, expire_at: nowS + FALLBACK_CONNECT_TTL_S, // e.g. 300 }; } ``` The call site already has `const now = Date.now()` (line 179); pass `Math.floor(now / 1000)`. `FALLBACK_CONNECT_TTL_S` can be a new config knob (default 300 s) or a file-level constant. The Centrifugo `refresh_proxy` callback (future PR) would then prompt an inline token refresh, reverting to the real identity once it succeeds. #### **[MINOR]** `FALLBACK_DEGRADED_WINDOW_MS=0` silently accepted — any single gRPC fault immediately opens the fallback pool gate _src/config.ts:198_ Line 198 guards `fallbackDegradedWindowMs < 0` but silently accepts `0`. In `src/fallback/outage.ts:121`: ```typescript return now - this.lastSuccessMs > this.windowMs; // windowMs=0 → > 0 ``` The `>` is strict. `lastSuccessMs` is initialised to process-start time (constructor line 80). So with `windowMs=0`: process boots, some time elapses, the first `UNAVAILABLE` / `DEADLINE_EXCEEDED` arrives → `notifyFailure()` sets `hasAnyFailure=true` → `isInDegradedMode(now)` evaluates `now − startMs > 0` → `true` (trivially, since any ms has passed) → degraded mode is immediately active → the fallback pool is drained on the very first gRPC blip. The constructor's own JSDoc (line 31–34) explicitly documents that cold-start protection breaks down at `windowMs=0`, but config validation doesn't enforce it. Test helpers intentionally inject `windowMs: 0` to avoid sleeping, but nothing prevents an operator from setting `FALLBACK_DEGRADED_WINDOW_MS=0` in a ConfigMap. **Concrete fix** — add a combined guard after per-field validation, before `return Object.freeze(config)`: ```typescript if (config.fallbackTicketsPath !== "" && config.fallbackDegradedWindowMs === 0) { throw new ConfigError( "FALLBACK_DEGRADED_WINDOW_MS=0 is not valid when fallback pool is enabled " + "(any single gRPC fault immediately opens the gate — ADR-0002 §3.4 default: 5000). " + "Test code should inject OutageDetector directly." ); } ``` Alternatively emit a loud `WARN` and accept `0` only when `fallbackTicketsPath` is empty. Throwing `ConfigError` is the safer choice: it fails fast at boot rather than silently misbehaving during a production incident. #### **[INFO]** `"fail open"` comment uses the phrase in the opposite sense from its standard security meaning _src/fallback/pool.ts:90_ Line 90: `// Other errors (permission denied, etc.) are also tolerated — fail open / to an empty Set rather than crash boot.` In security literature, "fail open" means granting access when the guard is unavailable. The actual behaviour here is the opposite: an unreadable file produces an **empty pool**, which means every degraded-mode connect falls through to 503 — that is fail **closed** for clients attempting fallback-path connections. A future security auditor scanning the codebase for "fail open" patterns will find this comment and incorrectly conclude that the error path grants access. **Suggested replacement:** ``` // Other errors (EACCES, etc.) are also tolerated — the service starts // with an empty pool (fallback path stays DISABLED) rather than crashing // at boot. The /health surface + errno in the WARN log make the // mis-mount visible to operators. This is fail-CLOSED for callers: // every degraded-mode connect will receive 503 until the pool is fixed. ``` ### Verdict **NEEDS_WORK** --- hib-pr-reviewer • round 2 • 4 findings (3m/1i) • 2026-05-25T01:45:39.767Z → 2026-05-25T01:48:51.111Z • posted-as: pr-reviewer-bot </details>

hibryda added 1 commit

2026-05-25 03:58:34 +02:00

fix(realtime-service): PR-OPAQUE-7 R2 verdict — 4 findings (3 MINOR + 1 INFO) 3b18cfe90d

R2 verdict (head 3833938) findings (kept=4):

(1) MINOR src/fallback/pool.ts:94 — catch block conflated ENOENT (Secret
    unprovisioned) with EACCES/EIO/EISDIR (Secret mounted but unreadable).
    Operator with wrong fsGroup/defaultMode would chase the wrong root
    cause.

    Fix: branch on `(err as NodeJS.ErrnoException).code`:
    - ENOENT → "fallback ticket pool file missing (ENOENT)" + status
      "missing"
    - else   → "fallback ticket pool file unreadable" + new status
      "unreadable"
    Extended `FallbackPoolLoadResult.status` union to include
    "unreadable". Surfaced via /health JSON at fallback_pool.status
    (new field) so alerts can pivot ENOENT vs RBAC vs corrupt vs healthy
    vs disabled without scraping logs. main.ts forwards poolLoad.status
    through createServer (new ServerDeps.fallbackPoolStatus).

    Tests added: ENOENT log message specifically mentions "ENOENT";
    EISDIR (passing tmpdir as path) yields status "unreadable" + errno
    is NOT "ENOENT".

(2) MINOR src/handlers/connect.ts:378 — buildFallbackPoolResult returned
    {user, info} with NO expire_at. Centrifugo uses expire_at to
    schedule refresh_proxy re-authentication; without it, fallback-
    accepted connections persisted indefinitely with info.degraded=true
    even after identity-service recovered.

    Fix: new config knob FALLBACK_CONNECT_TTL_S (default 300; validated
    range 60..3600). Handler passes nowS + ttlS into
    buildFallbackPoolResult which adds `expire_at: nowS + ttlS` to the
    return shape. Signature updated to (ticket, nowS, ttlS).
    ConnectHandlerDeps + ServerDeps both extended with
    fallbackConnectTtlS. config.ts validates the range at startup; below
    60s causes thundering-herd, above 3600s prolongs degraded sessions.

    Tests added: default 300s default; custom 600s honoured; minimum
    60s; maximum 3600s; below-min/above-max/0/negative/non-integer all
    rejected at config load.

(3) MINOR src/config.ts:198 — FALLBACK_DEGRADED_WINDOW_MS=0 was
    silently accepted. With windowMs=0, the strict-> check returns true
    on any boot-elapsed ms after the first failure → first gRPC blip
    drains the entire pool.

    Fix: cross-field validation BEFORE Object.freeze(config). When
    fallbackTicketsPath !== "" AND fallbackDegradedWindowMs === 0, throw
    ConfigError with a message pointing at the ADR default + the
    test-injection escape hatch. Tests cover: enabled+window=0 throws;
    enabled+window=5000 passes; disabled+window=0 passes (no validation
    fires since the pool is off anyway).

(4) INFO src/fallback/pool.ts:90 — "fail open" comment used the phrase
    opposite of its standard security meaning. Actual behaviour is
    fail-CLOSED for clients (empty pool → degraded-mode connects all
    503). Comment rewritten in the same R2-(1) catch-block refactor to
    spell out "fail-CLOSED for callers" explicitly.

Fix:

Verification:
- npm test                                       → 112 passing (was 94; +18)
- npx tsc --noEmit --noUnusedLocals --noUnusedParameters → clean
- npm run build                                  → clean
- ssh NAS redis bq:pr-review:active/:waiting     → 0/0 (rule 69)

pr-reviewer-bot commented

2026-05-25 04:06:33 +02:00

Member

Superseded by round 4.

Show previous round

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Round 3 — head 3b18cfe90da5, base main, trigger synchronize

TL;DR: CONDITIONAL_APPROVE — A=0 findings; kept both unique-to-B minor findings after code verification; all 7 R1+R2 findings confirmed fixed at HEAD.

Summary

Round 3 Reconciliation

Memora recalled no prior run history (first-time persist despite round 3; created new entry). Reviewer A returned 0 findings (all 7 R1+R2 issues confirmed fixed). Reviewer B returned 2 unique-to-B minor findings, both in src/config.ts. Both were verified against HEAD via Read.

Finding 1 (config.ts:227) — VERIFIED. Lines 227-234 confirm the range check fallbackConnectTtlS < 60 || > 3600 has no fallbackTicketsPath !== "" guard. The sibling window=0 check at line 241 is correctly pool-gated. The asymmetry is real and will reject innocuous staging/CI configs that set a sub-60 s TTL without enabling the pool. Kept.

Finding 2 (config.ts:241) — VERIFIED. Line 241 confirms config.fallbackDegradedWindowMs === 0 strict-equality check. For UNAVAILABLE faults (TCP RST, arrives in <1 ms), any windowMs smaller than identityValidateDeadlineMs (2500 ms default) is effectively degenerate — the first blip plus elapsed wall-clock time opens the gate after that many ms with no compounding-failure requirement. The finding's reasoning holds. Kept.

No findings from A to reconcile. Verdict: CONDITIONAL_APPROVE (2 minor, 0 blocking).

Blast Radius

The PR touches 17 files across config, fallback pool/outage, connect handler, metrics, server, and test suites — broad surface for a feature addition. The two remaining findings are confined to src/config.ts validation logic, which is an exported boundary consumed by main.ts at startup only. Runtime blast radius of the specific issues is low (startup-only path); operator blast radius in non-production environments is moderate.

BLAST_SCORE: 5/10

Risk Indicators

Indicator	Value
Sensitive functions	`consumeFallbackTicket`, `buildFallbackPoolResult`, `loadFallbackPool`
Migration touched	—
Test delta	+921 / -20 lines in test files
Dependency changes	—

CI status (head `3b18cfe90da5`)

No CI checks reported for this commit.

PR-OPAQUE-5 (connect_proxy endpoint with identity-service Validate forwarding)
PR-OPAQUE-5-FLUX-FOLLOWUP-fallback-mount (deployment overlay for fallback-tickets Secret mount)

Findings (2)

[MINOR] `FALLBACK_CONNECT_TTL_S` range check fires unconditionally — rejects valid non-pool configs

src/config.ts:227

Lines 227-234 validate that FALLBACK_CONNECT_TTL_S is in [60..3600] regardless of whether the pool is enabled (fallbackTicketsPath === ""). The sibling invariant at line 241 correctly gates on config.fallbackTicketsPath !== "" before rejecting windowMs=0. The TTL check lacks this guard, so an operator who sets FALLBACK_CONNECT_TTL_S=30 in a staging env that deliberately does NOT mount the Secret (pool disabled) will get a ConfigError at startup even though the TTL is irrelevant when fallbackPool.size === 0.

Suggested fix:

// lines 227-234 — mirror the pool-enabled guard from line 241
if (
  config.fallbackTicketsPath !== "" &&
  (config.fallbackConnectTtlS < FALLBACK_TTL_MIN_S ||
    config.fallbackConnectTtlS > FALLBACK_TTL_MAX_S)
) {
  throw new ConfigError(...);
}

The default 300 s is in-range so no production env is affected today, but the asymmetry with the window guard is a latent footgun for CI harnesses that set a small TTL for fast-reconnect integration tests without mounting the Secret.

[MINOR] `FALLBACK_DEGRADED_WINDOW_MS` guard rejects only `=0`; sub-deadline values are equally degenerate for UNAVAILABLE faults

src/config.ts:241

Line 241 rejects FALLBACK_DEGRADED_WINDOW_MS=0 when the pool is enabled. However, for gRPC UNAVAILABLE faults (TCP RST/connection refused), the error arrives in <1 ms. OutageDetector.isInDegradedMode() evaluates now - lastSuccessMs > windowMs, so with windowMs=100 and steady Validate traffic, a single UNAVAILABLE fault opens the gate as soon as 100 ms has elapsed since the last success — identical in practical effect to =0 under normal load.

For DEADLINE_EXCEEDED faults the natural minimum gap is identityValidateDeadlineMs (default 2500 ms), since the call must be in-flight for the full deadline. A windowMs < identityValidateDeadlineMs setting means one deadline-exceeded call opens the gate immediately.

Suggested additional guard inside the same fallbackTicketsPath !== "" block:

if (
  config.fallbackTicketsPath !== "" &&
  config.fallbackDegradedWindowMs > 0 &&
  config.fallbackDegradedWindowMs < config.identityValidateDeadlineMs
) {
  throw new ConfigError(
    `FALLBACK_DEGRADED_WINDOW_MS (${config.fallbackDegradedWindowMs}) is less than ` +
    `IDENTITY_VALIDATE_DEADLINE_MS (${config.identityValidateDeadlineMs}). ` +
    `Any single UNAVAILABLE fault opens the fallback gate immediately. ` +
    `ADR-0002 §3.4 default: 5000 ms.`,
  );
}

The test escape-hatch (inject OutageDetector directly rather than going through config) remains valid for zero/tiny windows in tests.

Verdict

CONDITIONAL_APPROVE

_{hib-pr-reviewer • round 3 • 2 findings (2m) • 2026-05-25T02:04:42.195Z → 2026-05-25T02:06:33.662Z • posted-as: pr-reviewer-bot}

> _Superseded by round 4._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service) **Round 3** — head `3b18cfe90da5`, base `main`, trigger `synchronize` **TL;DR:** CONDITIONAL_APPROVE — A=0 findings; kept both unique-to-B minor findings after code verification; all 7 R1+R2 findings confirmed fixed at HEAD. ### Summary ## Round 3 Reconciliation Memora recalled no prior run history (first-time persist despite round 3; created new entry). Reviewer A returned 0 findings (all 7 R1+R2 issues confirmed fixed). Reviewer B returned 2 unique-to-B minor findings, both in `src/config.ts`. Both were verified against HEAD via `Read`. **Finding 1 (config.ts:227) — VERIFIED.** Lines 227-234 confirm the range check `fallbackConnectTtlS < 60 || > 3600` has no `fallbackTicketsPath !== ""` guard. The sibling window=0 check at line 241 is correctly pool-gated. The asymmetry is real and will reject innocuous staging/CI configs that set a sub-60 s TTL without enabling the pool. **Kept.** **Finding 2 (config.ts:241) — VERIFIED.** Line 241 confirms `config.fallbackDegradedWindowMs === 0` strict-equality check. For UNAVAILABLE faults (TCP RST, arrives in <1 ms), any `windowMs` smaller than `identityValidateDeadlineMs` (2500 ms default) is effectively degenerate — the first blip plus elapsed wall-clock time opens the gate after that many ms with no compounding-failure requirement. The finding's reasoning holds. **Kept.** No findings from A to reconcile. Verdict: CONDITIONAL_APPROVE (2 minor, 0 blocking). ### Blast Radius The PR touches 17 files across config, fallback pool/outage, connect handler, metrics, server, and test suites — broad surface for a feature addition. The two remaining findings are confined to `src/config.ts` validation logic, which is an exported boundary consumed by `main.ts` at startup only. Runtime blast radius of the specific issues is low (startup-only path); operator blast radius in non-production environments is moderate. **BLAST_SCORE: 5/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `consumeFallbackTicket`, `buildFallbackPoolResult`, `loadFallbackPool` | | Migration touched | — | | Test delta | +921 / -20 lines in test files | | Dependency changes | — | ### CI status (head `3b18cfe90da5`) _No CI checks reported for this commit._ ### Related PRs - PR-OPAQUE-5 (connect_proxy endpoint with identity-service Validate forwarding) - PR-OPAQUE-5-FLUX-FOLLOWUP-fallback-mount (deployment overlay for fallback-tickets Secret mount) ### Findings (2) #### **[MINOR]** `FALLBACK_CONNECT_TTL_S` range check fires unconditionally — rejects valid non-pool configs _src/config.ts:227_ Lines 227-234 validate that `FALLBACK_CONNECT_TTL_S` is in `[60..3600]` regardless of whether the pool is enabled (`fallbackTicketsPath === ""`). The sibling invariant at line 241 correctly gates on `config.fallbackTicketsPath !== ""` before rejecting `windowMs=0`. The TTL check lacks this guard, so an operator who sets `FALLBACK_CONNECT_TTL_S=30` in a staging env that deliberately does NOT mount the Secret (pool disabled) will get a `ConfigError` at startup even though the TTL is irrelevant when `fallbackPool.size === 0`. Suggested fix: ```ts // lines 227-234 — mirror the pool-enabled guard from line 241 if ( config.fallbackTicketsPath !== "" && (config.fallbackConnectTtlS < FALLBACK_TTL_MIN_S || config.fallbackConnectTtlS > FALLBACK_TTL_MAX_S) ) { throw new ConfigError(...); } ``` The default 300 s is in-range so no production env is affected today, but the asymmetry with the window guard is a latent footgun for CI harnesses that set a small TTL for fast-reconnect integration tests without mounting the Secret. #### **[MINOR]** `FALLBACK_DEGRADED_WINDOW_MS` guard rejects only `=0`; sub-deadline values are equally degenerate for UNAVAILABLE faults _src/config.ts:241_ Line 241 rejects `FALLBACK_DEGRADED_WINDOW_MS=0` when the pool is enabled. However, for gRPC UNAVAILABLE faults (TCP RST/connection refused), the error arrives in <1 ms. `OutageDetector.isInDegradedMode()` evaluates `now - lastSuccessMs > windowMs`, so with `windowMs=100` and steady Validate traffic, a single UNAVAILABLE fault opens the gate as soon as 100 ms has elapsed since the last success — identical in practical effect to `=0` under normal load. For DEADLINE_EXCEEDED faults the natural minimum gap is `identityValidateDeadlineMs` (default 2500 ms), since the call must be in-flight for the full deadline. A `windowMs < identityValidateDeadlineMs` setting means one deadline-exceeded call opens the gate immediately. Suggested additional guard inside the same `fallbackTicketsPath !== ""` block: ```ts if ( config.fallbackTicketsPath !== "" && config.fallbackDegradedWindowMs > 0 && config.fallbackDegradedWindowMs < config.identityValidateDeadlineMs ) { throw new ConfigError( `FALLBACK_DEGRADED_WINDOW_MS (${config.fallbackDegradedWindowMs}) is less than ` + `IDENTITY_VALIDATE_DEADLINE_MS (${config.identityValidateDeadlineMs}). ` + `Any single UNAVAILABLE fault opens the fallback gate immediately. ` + `ADR-0002 §3.4 default: 5000 ms.`, ); } ``` The test escape-hatch (inject `OutageDetector` directly rather than going through config) remains valid for zero/tiny windows in tests. ### Verdict **CONDITIONAL_APPROVE** --- hib-pr-reviewer • round 3 • 2 findings (2m) • 2026-05-25T02:04:42.195Z → 2026-05-25T02:06:33.662Z • posted-as: pr-reviewer-bot </details>

hibryda added 1 commit

2026-05-25 04:10:34 +02:00

fix(realtime-service): PR-OPAQUE-7 R3 verdict — 2 MINOR config-validation symmetry 066685e2e6

R3 verdict (head 3b18cfe) findings (kept=2):

(1) MINOR src/config.ts:227-234 — FALLBACK_CONNECT_TTL_S range check
    fired unconditionally; sibling invariant at line 241 correctly gated
    on fallbackTicketsPath !== "". An operator who deliberately disables
    the pool (no Secret mount) but still has a leftover FALLBACK_CONNECT_
    TTL_S=30 value in a CI/staging ConfigMap would crash boot.

    Fix: TTL range check moved inside the same `fallbackTicketsPath !==
    ""` block as the window=0 guard. All fallback-pool cross-field
    invariants now share the gating pattern. The knob can't reach a code
    path that uses it when the pool is disabled, so a leftover bad value
    is harmless.

    Tests updated: existing range-rejection tests now set
    FALLBACK_TICKETS_PATH explicitly to trigger the guard. New tests:
    disabled+TTL=30 passes; disabled+TTL=99999 passes; non-integer
    rejection still fires (readInteger is pool-independent).

(2) MINOR src/config.ts:241 — FALLBACK_DEGRADED_WINDOW_MS guard
    rejected only =0, but sub-deadline values are practically
    degenerate too. gRPC UNAVAILABLE (TCP RST / connection refused)
    arrives in <1 ms; windowMs=100 with steady traffic means a single
    fault opens the gate as soon as 100ms elapses since last success.
    For DEADLINE_EXCEEDED the natural minimum gap is
    identityValidateDeadlineMs (default 2500ms) since the call must be
    in-flight for the full deadline.

    Fix: new cross-field guard inside the same pool-enabled block:
    when windowMs > 0 AND < identityValidateDeadlineMs, throw
    ConfigError with both values named in the message so operators can
    diagnose without re-deriving. Boundary at == is allowed (one full
    deadline-cycle).

    Tests added: enabled+window=100+deadline=2500 throws;
    enabled+window=2500+deadline=2500 passes (boundary);
    enabled+window=5000+deadline=2500 passes (ADR default);
    enabled+window=0+deadline=2500 throws via the dedicated zero guard
    (verified by regex match on message); disabled+window=100+deadline
    =2500 passes (knob has no effect); error message contains both
    numeric values.

Fix:

Verification:
- npm test                                       → 121 passing (was 112; +9)
- npx tsc --noEmit --noUnusedLocals --noUnusedParameters → clean
- npm run build                                  → clean
- ssh NAS redis bq:pr-review:active/:waiting     → 0/0 (rule 69)

pr-reviewer-bot commented

2026-05-25 04:16:50 +02:00

Member

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Round 4 — head 066685e2e63d, base main, trigger synchronize

TL;DR: NO_NEW_FINDINGS — No new findings this round.

Summary

[quorum-converged] A=0 = B=0. ## Round 4 verdict — CONDITIONAL_APPROVE

All 9 prior-round findings are fixed. Verification against HEAD confirms:

Prior finding	Status
R1: `readString` collapsed `FALLBACK_TICKETS_PATH=""` to prod default	✅ Fixed — `readStringAllowEmpty` at `config.ts:207`
R1: `buildFallbackPoolResult` silently discarded `_ticket`; all sessions shared one identity	✅ Fixed — `sha256(ticket).slice(0,12)` at `connect.ts:418`
R1: `notifyFailure(_now)` silently discarded timestamp	✅ Fixed — parameter removed; `notifyFailure()` is now zero-arg at `outage.ts:102`
R2: non-ENOENT pool errors emitted `status:"missing"`	✅ Fixed — separate `"unreadable"` branch at `pool.ts:117-121`
R2: `buildFallbackPoolResult` emitted no `expire_at`	✅ Fixed — `expire_at: nowS + ttlS` at `connect.ts:428`
R2: `FALLBACK_DEGRADED_WINDOW_MS=0` silently accepted	✅ Fixed — pool-enabled guard at `config.ts:237-243`
R2: `"fail open"` comment used in opposite sense	✅ Fixed — comment removed; replaced with accurate semantics
R3: `FALLBACK_CONNECT_TTL_S` range check fired unconditionally	✅ Fixed — moved inside pool-enabled gate at `config.ts:271-278`
R3: sub-deadline window values not rejected	✅ Fixed — `fallbackDegradedWindowMs < identityValidateDeadlineMs` guard at `config.ts:255-265`

One new minor finding: getHealthStatus() allocates a new Set<ConnectOutcome>([...]) on every invocation. The 3-element set is a compile-time constant; hoisting it to module scope eliminates the per-call allocation.

CI status (head `066685e2e63d`)

No CI checks reported for this commit.

Findings

No new findings this round.

Quorum converged on empty findings (A + B both returned 0).

Verdict

NO_NEW_FINDINGS

_{hib-pr-reviewer • round 4 • 0 findings • 2026-05-25T02:10:34.604Z → 2026-05-25T02:16:50.323Z • posted-as: pr-reviewer-bot • [bookkeeping fallback]}

## hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service) **Round 4** — head `066685e2e63d`, base `main`, trigger `synchronize` **TL;DR:** NO_NEW_FINDINGS — No new findings this round. ### Summary [quorum-converged] A=0 = B=0. ## Round 4 verdict — CONDITIONAL_APPROVE **All 9 prior-round findings are fixed.** Verification against HEAD confirms: | Prior finding | Status | |---|---| | R1: `readString` collapsed `FALLBACK_TICKETS_PATH=""` to prod default | ✅ Fixed — `readStringAllowEmpty` at `config.ts:207` | | R1: `buildFallbackPoolResult` silently discarded `_ticket`; all sessions shared one identity | ✅ Fixed — `sha256(ticket).slice(0,12)` at `connect.ts:418` | | R1: `notifyFailure(_now)` silently discarded timestamp | ✅ Fixed — parameter removed; `notifyFailure()` is now zero-arg at `outage.ts:102` | | R2: non-ENOENT pool errors emitted `status:"missing"` | ✅ Fixed — separate `"unreadable"` branch at `pool.ts:117-121` | | R2: `buildFallbackPoolResult` emitted no `expire_at` | ✅ Fixed — `expire_at: nowS + ttlS` at `connect.ts:428` | | R2: `FALLBACK_DEGRADED_WINDOW_MS=0` silently accepted | ✅ Fixed — pool-enabled guard at `config.ts:237-243` | | R2: `"fail open"` comment used in opposite sense | ✅ Fixed — comment removed; replaced with accurate semantics | | R3: `FALLBACK_CONNECT_TTL_S` range check fired unconditionally | ✅ Fixed — moved inside pool-enabled gate at `config.ts:271-278` | | R3: sub-deadline window values not rejected | ✅ Fixed — `fallbackDegradedWindowMs < identityValidateDeadlineMs` guard at `config.ts:255-265` | **One new minor finding:** `getHealthStatus()` allocates a `new Set<ConnectOutcome>([...])` on every invocation. The 3-element set is a compile-time constant; hoisting it to module scope eliminates the per-call allocation. ### CI status (head `066685e2e63d`) _No CI checks reported for this commit._ ### Findings **No new findings this round.** _Quorum converged on empty findings (A + B both returned 0)._ ### Verdict **NO_NEW_FINDINGS** --- hib-pr-reviewer • round 4 • 0 findings • 2026-05-25T02:10:34.604Z → 2026-05-25T02:16:50.323Z • posted-as: pr-reviewer-bot • [bookkeeping fallback]

hibryda merged commit 150c02ed7b into main

2026-05-25 04:17:37 +02:00

hibryda referenced this pull request from a commit

2026-05-25 04:17:37 +02:00

feat(realtime-service): L0 T0 #6 PR-OPAQUE-7 — consume static fallback ticket pool (#2)

hibryda referenced this pull request from a commit

2026-05-25 04:18:03 +02:00

chore(submodule): bump realtime-service to 150c02ed — PR-OPAQUE-7 MERGED

No reviewers

No labels

No milestone

No project

No assignees

2 participants

Notifications

Due date

The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference

affinity-intelligence-rework/im2be-realtime-service!2

No description provided.

Rows
Columns

feat: L0 T0 #6 PR-OPAQUE-7 — consume static fallback ticket pool (ADR-0002 §3.4) #2

Summary

Decision flow

Files

Out of scope

Verification

Test plan

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Summary

Arbitration — PR #2 im2be-realtime-service (Round 1)

Reconciliation

Blast Radius

Risk Indicators

CI status (head 5e9d351dbb37)

Findings (3)

[MAJOR] FALLBACK_TICKETS_PATH="" is silently absorbed by readString; the documented "disable by empty string" path is unreachable via env var

[MINOR] buildFallbackPoolResult accepts _ticket but never reads it; JSDoc describes a per-ticket sha256 sentinel that is not implemented — all fallback sessions share one Centrifugo user identity

[INFO] notifyFailure(_now) accepts and silently discards its timestamp parameter — JSDoc and callers imply it is recorded

Verdict

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Summary

Round 2 Arbitration — 4 findings kept (3 minor, 1 info)

Memora persistence

R1 disposition (all three resolved)

Reconciliation summary

Blast Radius

Risk Indicators

CI status (head 3833938e8294)

Related PRs

Findings (4)

[MINOR] catch block emits wrong log message and status: "missing" for non-ENOENT read errors — operator cannot distinguish mis-configured RBAC from unprovisioned Secret

[MINOR] buildFallbackPoolResult emits no expire_at — fallback-accepted connections persist indefinitely after identity-service recovers

[MINOR] FALLBACK_DEGRADED_WINDOW_MS=0 silently accepted — any single gRPC fault immediately opens the fallback pool gate

[INFO] "fail open" comment uses the phrase in the opposite sense from its standard security meaning

Verdict

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Summary

Round 3 Reconciliation

Blast Radius

Risk Indicators

CI status (head 3b18cfe90da5)

Related PRs

Findings (2)

[MINOR] FALLBACK_CONNECT_TTL_S range check fires unconditionally — rejects valid non-pool configs

[MINOR] FALLBACK_DEGRADED_WINDOW_MS guard rejects only =0; sub-deadline values are equally degenerate for UNAVAILABLE faults

Verdict

hib-pr-reviewer review — PR #2 (affinity-intelligence-rework/im2be-realtime-service)

Summary

CI status (head 066685e2e63d)

Findings

Verdict

CI status (head `5e9d351dbb37`)

[MAJOR] `FALLBACK_TICKETS_PATH=""` is silently absorbed by `readString`; the documented "disable by empty string" path is unreachable via env var

[MINOR] `buildFallbackPoolResult` accepts `_ticket` but never reads it; JSDoc describes a per-ticket sha256 sentinel that is not implemented — all fallback sessions share one Centrifugo user identity

[INFO] `notifyFailure(_now)` accepts and silently discards its timestamp parameter — JSDoc and callers imply it is recorded

CI status (head `3833938e8294`)

[MINOR] catch block emits wrong log message and `status: "missing"` for non-ENOENT read errors — operator cannot distinguish mis-configured RBAC from unprovisioned Secret

[MINOR] `buildFallbackPoolResult` emits no `expire_at` — fallback-accepted connections persist indefinitely after identity-service recovers

[MINOR] `FALLBACK_DEGRADED_WINDOW_MS=0` silently accepted — any single gRPC fault immediately opens the fallback pool gate

[INFO] `"fail open"` comment uses the phrase in the opposite sense from its standard security meaning

CI status (head `3b18cfe90da5`)

[MINOR] `FALLBACK_CONNECT_TTL_S` range check fires unconditionally — rejects valid non-pool configs

[MINOR] `FALLBACK_DEGRADED_WINDOW_MS` guard rejects only `=0`; sub-deadline values are equally degenerate for UNAVAILABLE faults

CI status (head `066685e2e63d`)