feat(grpc): L0 T0 #6 PR-OPAQUE-3 — ConnectProxyService.Validate (Redis GETDEL atomic + CRC32 pre-check) #3

Merged
hibryda merged 7 commits from feat/l0-t0-opaque-3-validate-rpc into main 2026-05-24 17:03:20 +02:00
Owner

Header

Repo: affinity-intelligence-rework/im2be-identity-service
PR: TBD (this PR)
Round: R0 — initial submission
Branch: feat/l0-t0-opaque-3-validate-rpc (off main, includes PR-OPAQUE-1 #1 merged state — does NOT depend on PR-OPAQUE-2 #2)
Worktree: identity-service-validate-rpc/ (sibling worktree)

TL;DR

APPROVE-CANDIDATE: Replaces the ConnectProxyService.Validate UNIMPLEMENTED stub with the full ADR-0002 §3 validate path — CRC32 pre-check, Redis GETDEL atomic single-use, Jackson-deserialise user context, project to ConnectValidateResponse.Success. 30/30 unit tests pass, zero warnings, BUILD SUCCESS.

Summary

Scope of this round = the entire PR (R0). Covers the four-commit split:

  1. feat(ticket): TicketChecksumVerifier — CRC32 + base64url helper (dd167ba)
  2. feat(grpc): wire ConnectProxyService.Validate — Redis GETDEL atomic (3d68318)
  3. feat(grpc): OTel span + counter + error events for validate flow (ddf94ee)
  4. test(grpc): concurrent-validate smoke (GETDEL single-use enforcement) (b9dcab6)

Scopes verified vs deferred:

  • In scope (this PR): ConnectProxyService.Validate logic, CRC32 verifier, Redis GETDEL, OTel span + counter, error events for transient/terminal paths, concurrent-validate smoke.
  • Deferred (later PRs): mint endpoint (PR-OPAQUE-2 — parallel sub-agent's PR #2), JWT-reuse (PR-OPAQUE-4), realtime-service client (PR-OPAQUE-5), qa-rig E2E (PR-OPAQUE-6), fallback-pool consumption + degraded=true (PR-OPAQUE-7). degraded hardcoded to false in this PR.

Last reviewed SHA: b9dcab6 (HEAD of branch at PR open time).

Findings

No new findings this round — this is R0 / initial submission. Verifications carried out before opening:

Sev File:Line Title Body
INFO grpc/ConnectProxyServiceImpl.java Filter chain ripple inspection (rule 63) gRPC server runs on port 50051 (net.devh starter) — does NOT pass through Spring Security HttpSecurity filter chain. Centrifugo presents ticket as request field, NOT JWT Bearer header. mTLS peer auth enforced one layer up by Envoy sidecar (PR-OPAQUE-3-follow-up: SPIFFE workload-API + ServerInterceptor).
INFO config/JwksClient.java:147-160 R0-deferred JwksClient double-fetch fix verified intact PR #1 R3 fix (freshlyRefreshed flag) confirmed present at the cited lines. No regression. PR-OPAQUE-3 does not touch the JWT/JWKS path.
INFO grpc/ConnectProxyServiceImpl.java No coupling to PR-OPAQUE-2 mint code Authored own TicketChecksumVerifier.java (NOT colliding with PR-OPAQUE-2's likely class name). computeChecksumBigEndian() exposed as public static so PR-OPAQUE-2 can reuse identical byte-order semantics; a follow-up cleanup PR can de-dup if both PRs land with parallel implementations.
INFO Visibility doValidate / Outcome / ValidateOutcome / SPAN_NAME / COUNTER_NAME are public Exposed for unit-test access from sibling unit/grpc package + dashboard pinning. Production callers go through the gRPC observer entry point.

Risk Indicators

Indicator Count Notes
Sensitive functions touched 1 ConnectProxyService.Validate — the entire ticket-validation gate; covered by 21 new tests (8 unit + 4 OTel + 1 concurrent + 4 context + new context-load coverage).
Migration files touched 0 None.
Test delta +21 / -0 TicketChecksumVerifier 12, TicketUserContext 4, ConnectProxyServiceImpl 8, OTel 4, Concurrency 1 — minus an existing reuse offset. 30/30 pass total.
Dependency changes 0 No new pom.xml entries; uses existing Spring Data Redis + OTel SDK.

Blast Radius

Score: 3/10 — scope is localised to grpc/ConnectProxyServiceImpl.java + new ticket/ package. Cross-repo impact:

  • Centrifugo cluster (separate repo) will start receiving real responses instead of UNIMPLEMENTED — but this is the intended effect per ADR-0002 §3. No behaviour change for Centrifugo until PR-OPAQUE-2 merges and mint starts populating Redis with real tickets.
  • realtime-service (separate repo, PR-OPAQUE-5) will eventually consume the validated user context — out of scope this PR.
  • PWA + admin-panel (cross-repo) unaffected — they don't call gRPC directly.

Verdict

APPROVE-CANDIDATE on:

  • Functional contract per ADR-0002 §3 + planning §3.3 (CRC32 → GETDEL → deserialise → project).
  • Rule 62 zero warnings + zero errors (docker maven:3.8.4-openjdk-17 install → BUILD SUCCESS, no [WARNING] lines).
  • Rule 63 ripple inspection completed (gRPC vs Servlet filter-chain separation).
  • Rule 61 doc-grounded verification (Centrifugo v6.7.x proxy contract, ADR-0002 ticket wire format, Spring Data Redis 3.x ValueOperations.getAndDelete GETDEL mapping).
  • Single-use semantics empirically verified by concurrency smoke (1 000 tickets × 10 threads × 1 000 calls each → exactly 1 000 successes, 9 000 ticket-not-founds, zero duplicates).

Conditions / carried-forward items:

  • degraded=false hardcoded for now; PR-OPAQUE-7 wires the fallback-pool path.
  • The PR's TicketChecksumVerifier may overlap with PR-OPAQUE-2's mint-side checksum helper once that PR lands. De-dup follow-up PR if both ship.
  • Forward-compat assumption: PR-OPAQUE-2's mint must serialise to the JSON shape documented in TicketUserContext (see ADR-0002 §3.3 wire shape). Verified at design-time; integration-test will lock the cross-PR contract at qa-rig E2E (PR-OPAQUE-6).

Project: im2be-identity-service • Mode: Stage B / L0 T0 #6Round: R0 / initial submission • Tally: 0 resolved / 0 new / 0 carried-forward • Timestamp: 2026-05-24

Authored by Claude on behalf of @hibryda per Stage D PR-OPAQUE-3 directive.

## Header **Repo:** affinity-intelligence-rework/im2be-identity-service **PR:** TBD (this PR) **Round:** R0 — initial submission **Branch:** feat/l0-t0-opaque-3-validate-rpc (off main, includes PR-OPAQUE-1 #1 merged state — does NOT depend on PR-OPAQUE-2 #2) **Worktree:** identity-service-validate-rpc/ (sibling worktree) ## TL;DR **APPROVE-CANDIDATE:** Replaces the `ConnectProxyService.Validate` UNIMPLEMENTED stub with the full ADR-0002 §3 validate path — CRC32 pre-check, Redis GETDEL atomic single-use, Jackson-deserialise user context, project to `ConnectValidateResponse.Success`. 30/30 unit tests pass, zero warnings, BUILD SUCCESS. ## Summary Scope of this round = the entire PR (R0). Covers the four-commit split: 1. `feat(ticket): TicketChecksumVerifier — CRC32 + base64url helper` (dd167ba) 2. `feat(grpc): wire ConnectProxyService.Validate — Redis GETDEL atomic` (3d68318) 3. `feat(grpc): OTel span + counter + error events for validate flow` (ddf94ee) 4. `test(grpc): concurrent-validate smoke (GETDEL single-use enforcement)` (b9dcab6) Scopes verified vs deferred: - **In scope (this PR):** ConnectProxyService.Validate logic, CRC32 verifier, Redis GETDEL, OTel span + counter, error events for transient/terminal paths, concurrent-validate smoke. - **Deferred (later PRs):** mint endpoint (PR-OPAQUE-2 — parallel sub-agent's PR #2), JWT-reuse (PR-OPAQUE-4), realtime-service client (PR-OPAQUE-5), qa-rig E2E (PR-OPAQUE-6), fallback-pool consumption + `degraded=true` (PR-OPAQUE-7). `degraded` hardcoded to false in this PR. Last reviewed SHA: b9dcab6 (HEAD of branch at PR open time). ## Findings **No new findings this round** — this is R0 / initial submission. Verifications carried out before opening: | Sev | File:Line | Title | Body | |-----|-----------|-------|------| | INFO | grpc/ConnectProxyServiceImpl.java | Filter chain ripple inspection (rule 63) | gRPC server runs on port 50051 (net.devh starter) — does NOT pass through Spring Security `HttpSecurity` filter chain. Centrifugo presents ticket as request field, NOT JWT Bearer header. mTLS peer auth enforced one layer up by Envoy sidecar (PR-OPAQUE-3-follow-up: SPIFFE workload-API + ServerInterceptor). | | INFO | config/JwksClient.java:147-160 | R0-deferred JwksClient double-fetch fix verified intact | PR #1 R3 fix (`freshlyRefreshed` flag) confirmed present at the cited lines. No regression. PR-OPAQUE-3 does not touch the JWT/JWKS path. | | INFO | grpc/ConnectProxyServiceImpl.java | No coupling to PR-OPAQUE-2 mint code | Authored own `TicketChecksumVerifier.java` (NOT colliding with PR-OPAQUE-2's likely class name). `computeChecksumBigEndian()` exposed as public static so PR-OPAQUE-2 can reuse identical byte-order semantics; a follow-up cleanup PR can de-dup if both PRs land with parallel implementations. | | INFO | Visibility | doValidate / Outcome / ValidateOutcome / SPAN_NAME / COUNTER_NAME are public | Exposed for unit-test access from sibling `unit/grpc` package + dashboard pinning. Production callers go through the gRPC observer entry point. | ## Risk Indicators | Indicator | Count | Notes | |-----------|-------|-------| | Sensitive functions touched | 1 | `ConnectProxyService.Validate` — the entire ticket-validation gate; covered by 21 new tests (8 unit + 4 OTel + 1 concurrent + 4 context + new context-load coverage). | | Migration files touched | 0 | None. | | Test delta | +21 / -0 | TicketChecksumVerifier 12, TicketUserContext 4, ConnectProxyServiceImpl 8, OTel 4, Concurrency 1 — minus an existing reuse offset. 30/30 pass total. | | Dependency changes | 0 | No new pom.xml entries; uses existing Spring Data Redis + OTel SDK. | ## Blast Radius **Score: 3/10** — scope is localised to `grpc/ConnectProxyServiceImpl.java` + new `ticket/` package. Cross-repo impact: - Centrifugo cluster (separate repo) will start receiving real responses instead of UNIMPLEMENTED — but this is the *intended* effect per ADR-0002 §3. No behaviour change for Centrifugo until PR-OPAQUE-2 merges and mint starts populating Redis with real tickets. - realtime-service (separate repo, PR-OPAQUE-5) will eventually consume the validated user context — out of scope this PR. - PWA + admin-panel (cross-repo) unaffected — they don't call gRPC directly. ## Verdict **APPROVE-CANDIDATE** on: - Functional contract per ADR-0002 §3 + planning §3.3 (CRC32 → GETDEL → deserialise → project). - Rule 62 zero warnings + zero errors (`docker maven:3.8.4-openjdk-17 install` → BUILD SUCCESS, no [WARNING] lines). - Rule 63 ripple inspection completed (gRPC vs Servlet filter-chain separation). - Rule 61 doc-grounded verification (Centrifugo v6.7.x proxy contract, ADR-0002 ticket wire format, Spring Data Redis 3.x `ValueOperations.getAndDelete` GETDEL mapping). - Single-use semantics empirically verified by concurrency smoke (1 000 tickets × 10 threads × 1 000 calls each → exactly 1 000 successes, 9 000 ticket-not-founds, zero duplicates). Conditions / carried-forward items: - `degraded=false` hardcoded for now; PR-OPAQUE-7 wires the fallback-pool path. - The PR's `TicketChecksumVerifier` may overlap with PR-OPAQUE-2's mint-side checksum helper once that PR lands. De-dup follow-up PR if both ship. - Forward-compat assumption: PR-OPAQUE-2's mint must serialise to the JSON shape documented in `TicketUserContext` (see ADR-0002 §3.3 wire shape). Verified at design-time; integration-test will lock the cross-PR contract at qa-rig E2E (PR-OPAQUE-6). ## Footer **Project:** im2be-identity-service • **Mode:** Stage B / L0 T0 #6 • **Round:** R0 / initial submission • **Tally:** 0 resolved / 0 new / 0 carried-forward • **Timestamp:** 2026-05-24 Authored by Claude on behalf of @hibryda per Stage D PR-OPAQUE-3 directive.
PR-OPAQUE-3 §3.2: pre-Redis integrity check for opaque connection
tickets.  Ticket wire format = base64url(24-byte random || 4-byte
big-endian CRC32(random)); total 28 raw bytes → 38 unpadded base64url
characters.

Static utility class — every method side-effect-free, no Spring
bean wiring. verifyChecksum(String) returns false on null/empty input,
malformed base64url decode, wrong payload length, or CRC mismatch
(does NOT throw — callers map a single boolean to a gRPC status
without an extra try/catch layer).

computeChecksumBigEndian(byte[]) is exposed as well so PR-OPAQUE-2's
mint path can reuse the exact same helper, eliminating the risk of
mint + validate disagreeing on byte order.

12 unit tests cover the contract:
  - happy path: freshly-built ticket via the helper round-trips through
    the verifier (single + 1 000-iteration variants)
  - padded base64url accepted as well as unpadded
  - null / empty input → false
  - malformed base64url (illegal chars) → false
  - payload too short / too long → false
  - random 0xDEADBEEF checksum → false
  - single bit-flip in entropy → false
  - single bit-flip in checksum → false
  - ENCODED_LENGTH constant matches actual unpadded encoding

Verification:
  - docker maven:3.8.4-openjdk-17 -P unit-tests test → 12/12 pass
    + IdentityServiceApplicationTest 1/1 pass (no regressions)
Replaces the PR-OPAQUE-1 UNIMPLEMENTED stub with the full ADR-0002 §3
validate flow:

  1. CRC32 pre-check via TicketChecksumVerifier — invalid tickets
     are rejected with a Disconnect WITHOUT touching Redis (spares
     a round-trip on the connect-flood path where Centrifugo would
     otherwise forward every junk ticket to identity-service).

  2. Redis GETDEL atomic on key  centrifugo🎫<base64url-ticket>
     via StringRedisTemplate.opsForValue().getAndDelete().  A single
     Redis command both reads the stored user-context JSON AND
     removes the row — this is the single-use enforcement gate.  A
     second connect attempt with the same ticket sees nil and is
     rejected.

  3. Jackson-deserialise the JSON blob into TicketUserContext (new
     record matching the ADR-0002 §3.3 wire shape: user_id, family_id,
     subscription_tier, parental_state, is_admin, expires_at).

  4. Project into ConnectValidateResponse.Success — Centrifugo
     receives the user context and proceeds with the WS upgrade.

Failure routing follows Centrifugo's status-code-transforms config:

  - INVALID_CRC | TICKET_NOT_FOUND | DESERIALIZE_ERROR → Disconnect
    (4401 UNAUTHENTICATED — terminal, do not retry).
  - REDIS_FAILURE → Error (transient code 14 / UNAVAILABLE; client
    SDK retries per the temporary=true envelope).

degraded=false is hardcoded for now; PR-OPAQUE-7 wires the static
fallback ticket pool (ADR-0002 §3.4) that surfaces degraded=true.

Out-of-scope (later PRs):
  - Mint path (PR-OPAQUE-2 — parallel sub-agent).
  - JWT-reuse (PR-OPAQUE-4).
  - realtime-service client (PR-OPAQUE-5).
  - qa-rig E2E (PR-OPAQUE-6).
  - Fallback pool consumption (PR-OPAQUE-7).

Ripple inspection (rule 63):

  - The gRPC server runs on port 50051 (net.devh starter) — it does
    NOT pass through Spring Security's HttpSecurity filter chain
    (which only binds to the Servlet stack on the HTTP port).
    Centrifugo presents the ticket as a request field, NOT as a JWT
    Bearer header — and that's exactly what the design wants.  mTLS
    peer auth is enforced one layer up by the Envoy sidecar; a future
    PR will wire the SPIFFE workload-API + ServerInterceptor for
    caller-identity assertion.

  - JwksClient.getPublicKeyByKid `freshlyRefreshed` guard from PR
    #1 R3 is intact (verified at config/JwksClient.java:147-160).
    PR-OPAQUE-3 doesn't touch the JWT/JWKS path.

  - PR-OPAQUE-2's TicketGenerator (parallel sub-agent) will reuse
    TicketChecksumVerifier.computeChecksumBigEndian() so mint +
    validate cannot disagree on byte order.  No coupling on the
    Redis-key prefix beyond the documented  centrifugo🎫
    convention (constant exported as TICKET_KEY_PREFIX).

Tests:

  - 4× TicketUserContextTest: canonical JSON round-trip; null
    family_id; admin flag preserved; unknown future fields tolerated.
  - 8× ConnectProxyServiceImplTest: happy path projects all fields;
    invalid CRC short-circuits Redis; empty ticket short-circuits
    Redis; ticket-not-found → Disconnect; Redis QueryTimeoutException
    → transient Error; corrupt stored JSON → Disconnect; canonical
    key prefix usage; unknown subscription tier maps to UNSPECIFIED.

Verification:

  - docker maven:3.8.4-openjdk-17 -P unit-tests test → 25/25 pass
    (12 verifier + 4 context + 8 grpc + 1 context-load).
  - Zero [WARNING] lines in the full mvn output.

Visibility note: doValidate(), ValidateOutcome, Outcome are exposed
as public so the unit test in the sibling com.aim2be.identity.unit.grpc
package can drive the logic without spinning up an in-process gRPC
channel.  Production callers go through the gRPC observer entry point.
Adds W2 OTel instrumentation to the Validate path (matches the
calendar / diary / family / notification-service convention — rule 10):

Span:
  - name: identity.validate_centrifugo_ticket (matches the documented
    contract in OtelTracers).
  - kind: SERVER.
  - attributes always set:
      * ticket.lookup_result ∈ {success | invalid-crc | ticket-not-found
        | deserialize-error | redis-failure}
      * degraded (false; PR-OPAQUE-7 flips this for fallback-pool path)
  - attribute set ONLY on SUCCESS AND when GDPR kill-switch is enabled:
      * user_id_hash (16-hex-char SHA-256 prefix; OtelErrorEvents.hashUserId)

  Kill-switch: otel.attributes.user-id-hash-enabled property (defaults
  to true, mirrors the same toggle already in application.properties).

Counter:
  - name: centrifugo_tickets_validated_total (Prometheus-canonical
    snake_case).
  - unit: 1.
  - label: outcome (kebab-case Outcome.label() — stable for dashboards;
    a change is a breaking-change requiring dashboard updates).
  - lazy-init via volatile + double-checked-locking, matching
    OtelErrorEvents' instance-cache pattern.  Without this the cached
    meter would pin the no-op meter present at class-load time and
    every increment would silently drop.

Error-event recording (PR-OPAQUE-3 entrypoints):
  - REDIS_FAILURE       → recordSanitizedEvent (TICKET_LOOKUP_FAILED,
    recoverable=true, downstream=redis).
  - DESERIALIZE_ERROR   → recordSanitizedEvent
    (TICKET_CONTEXT_DESERIALIZE_FAILED, recoverable=false,
    downstream=jackson).

  Sanitized variants are used because the cause chain may carry raw
  ticket material or stored-JSON fragments (W3 chatbot pattern,
  Memora #3144); recordSanitizedEvent skips Span.recordException to
  keep the exception.stacktrace attribute off OTLP exports.

Visibility:

  - SPAN_NAME, COUNTER_NAME exposed as public constants — pinned so
    dashboards + tests can reference the exact strings; changing the
    value would be a dashboard-breaking change.

Tests added:

  - 4× ConnectProxyServiceImplOtelTest (in-process OTel SDK via
    opentelemetry-sdk-testing's InMemoryMetricReader +
    InMemorySpanExporter):
      * success path → span attributes (ticket.lookup_result=success,
        degraded=false, user_id_hash set) + counter outcome=success.
      * invalid-crc → counter outcome=invalid-crc.
      * ticket-not-found → counter outcome=ticket-not-found.
      * GDPR kill-switch off → user_id_hash absent; ticket.lookup_result
        still set.

  DELTA aggregation temporality on InMemoryMetricReader so each test
  sees only its own counter increments without prior-test residue.

Constructor change:

  - ConnectProxyServiceImpl now takes a third @Value-resolved
    userIdHashEnabled boolean.  Existing test in
    ConnectProxyServiceImplTest updated to pass true.

Verification:

  - docker maven:3.8.4-openjdk-17 -P unit-tests test → 29/29 pass
    (12 verifier + 4 context + 8 grpc + 4 OTel + 1 context-load).
  - docker maven:3.8.4-openjdk-17 install → BUILD SUCCESS, zero
    [WARNING] lines.
Strict-load smoke confirming the GETDEL atomic single-use guarantee
under contended access — the entire load-bearing reason
ConnectProxyService.Validate exists.

Design:
  - Pre-seed 1 000 distinct, CRC-valid tickets into an in-memory
    ConcurrentHashMap-backed Redis surrogate (Mockito-bridged
    StringRedisTemplate → ConcurrentMap.remove, which provides the
    same atomic compare-and-remove contract as Redis GETDEL).
  - Launch 10 threads, each running through the full TOTAL_TICKETS
    pool — 10 000 validate attempts on a 1 000-ticket pool.  Race
    conditions on every overlapping key are guaranteed; the test
    relies on the underlying GETDEL atomicity to resolve them.

Strict assertions (a regression in either direction fails the test):

  - successes == 1 000 (exactly one win per ticket).
  - successfulTickets.size() == 1 000 (the success set is a strict
    subset of unique tickets — no duplicates).
  - notFound == 9 000 (every redundant call saw nil).
  - unexpected == 0 (no DESERIALIZE_ERROR / REDIS_FAILURE /
    INVALID_CRC, no double-success on any ticket).
  - store.isEmpty() (every ticket consumed; no orphans).

Why not Testcontainers / a real Redis cluster: the standard test
runner image doesn't always grant Docker socket access; spinning a
Redis container per CI run adds 5-10s of wall-clock.  The Mockito-
bridged ConcurrentHashMap honours the GETDEL contract precisely
(ConcurrentMap.remove is itself atomic compare-and-remove).  A
future Stage C / qa-rig E2E will exercise the same flow against a
real Redis cluster.

Regression coverage: a future refactor that broke the atomicity —
say, replacing  opsForValue().getAndDelete(key)  with a
non-atomic GET-then-DEL — would cause 11+ successes for the same
ticket and immediately fail this test.

Verification:

  - docker maven:3.8.4-openjdk-17 -P unit-tests test
      -Dtest='ConnectProxyServiceImplConcurrencyTest'
      → 1/1 pass in 1.277s
  - docker maven:3.8.4-openjdk-17 install
      → 30/30 tests pass, BUILD SUCCESS, zero [WARNING] lines
        (12 verifier + 4 context + 8 grpc + 4 OTel + 1 concurrency
         + 1 context-load).

Superseded by round 2.

Show previous round

hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)

Round 1 — head b9dcab662be2, base main, trigger opened

TL;DR: NEEDS_WORK — 5 findings kept (A=2, B=3), all verified via Read; 3 major correctness/PII/maintenance issues require fixes before merge.

Summary

Arbitration — PR #3 ConnectProxyService.Validate (Redis GETDEL + CRC32)

First run; no prior arbitration history for this PR in Memora. Verified all 5 findings (2 unique-to-A, 3 unique-to-B) via Read on the checked-out HEAD. Kept all 5: three major, one minor, one info.

Verification notes per finding:

  • A1 (PII log, line 338): Confirmed at line 338: ex.getOriginalMessage() logged; line 310 (Redis path) correctly uses ex.getClass().getSimpleName(). Reviewer B dismisses this claiming Jackson messages don't echo raw JSON content — that claim is incorrect for MismatchedInputException (type-coercion failures), where Jackson's getOriginalMessage() produces strings like Cannot deserialize value of type 'java.time.Instant' from String "<raw-value>". The GDPR inconsistency with the Redis-failure path and the OTel path (which already uses recordSanitizedEvent) confirms the finding. Kept as major.

  • A2 (null userId NPE, line 344): Confirmed three ways: (a) OtelErrorEvents.hashUserId() calls Objects.requireNonNull(userId, ...) at OtelErrorEvents.java:402; (b) TicketUserContext.java:56 shows userId is a plain String with no @NotNull or Jackson required=true — JSON {"user_id": null} or omission yields null; (c) line 297 (GETDEL) atomically consumes the ticket before the NPE at line 344, making the ticket irrecoverable. The null-guard asymmetry at lines 356/359 for familyId/expiresAt corroborates the omission. Kept as major.

  • B1 (contradictory comment, lines 324-328): Confirmed: lines 324-328 read "Surface as a transient error so Centrifugo retries" while line 339 returns new ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR). Also confirmed: recordSanitizedEvent at line 333 passes /* recoverable */ false, consistent with terminal behaviour. Copy-paste from the Redis-failure path (line 299-300) where transientError() is actually returned. Dangerous maintenance hazard. Kept as major.

  • B2 (observer completion unguarded, line 222): Confirmed: lines 220-225 show span.end() inside finally, but recordCounter(outcome.outcome()), responseObserver.onNext(...), and responseObserver.onCompleted() are all outside any try/finally. A RuntimeException from the double-checked-locking lazy-init path in recordCounter (e.g. propagated from GlobalOpenTelemetry.getMeter()) would leave the gRPC call permanently unanswered. Kept as minor.

  • B3 (doValidate public, line 286): Confirmed at line 286; Javadoc at lines 274-281 explicitly acknowledges the visibility is test-driven. Low urgency; kept as info.

Blast Radius

The diff introduces the authentication gate (doValidate) for every Centrifugo WebSocket connection in the system — a null-userId NPE there causes a silent ticket loss and a raw INTERNAL gRPC error on every affected connection attempt. The OTel instrumentation added spans a counter, a span, and a GDPR kill-switch used across the observe plane. The 8 touched files span the gRPC layer, two new domain types, and five test suites.

BLAST_SCORE: 7/10

Risk Indicators

Indicator Value
Sensitive functions doValidate, hashUserId, recordCounter, getAndDelete
Migration touched
Test delta
Dependency changes

CI status (head b9dcab662be2)

No CI checks reported for this commit.

Findings (5)

[MAJOR] DESERIALIZE_ERROR log uses ex.getOriginalMessage() — may surface PII field values

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:338

Line 338:

log.error("Centrifugo validate failed: stored user-context JSON unreadable; client_id={} cause={}",
        request.getClientId(), ex.getOriginalMessage());

JsonProcessingException.getOriginalMessage() returns Jackson's raw parser message. For MismatchedInputException (type-coercion failures, e.g. a mis-typed expires_at field), the message is of the form Cannot deserialize value of type 'java.time.Instant' from String "<raw-value>" — the offending string from the stored JSON is included verbatim and can carry PII.

The Redis-failure path at line 310 correctly uses ex.getClass().getSimpleName(). The OTel path (lines 329-336) already passes a sanitized message via recordSanitizedEvent, so the OTel signal is clean; only the log.error is inconsistent.

Fix: Replace ex.getOriginalMessage() with ex.getClass().getSimpleName() at line 338:

log.error("Centrifugo validate failed: stored user-context JSON unreadable; client_id={} cause={}",
        request.getClientId(), ex.getClass().getSimpleName());

[MAJOR] Null userId from stored JSON causes uncaught NPE after ticket is irrevocably consumed

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:344

If the Redis-stored JSON carries "user_id": null or omits the field, context.userId() is null. TicketUserContext.java declares userId as a plain String with no @NotNull or Jackson required=true constraint — Jackson silently sets it to null. The first crash is at line 344:

OtelErrorEvents.hashUserId(context.userId()); // Objects.requireNonNull → NullPointerException

OtelErrorEvents.hashUserId() opens with Objects.requireNonNull(userId, "userId must not be null") (confirmed at OtelErrorEvents.java:402). Even if that were guarded, line 351 would NPE:

.setUserId(UserId.newBuilder().setValue(context.userId()).build()) // proto rejects null

The GETDEL at line 297 atomically consumed and deleted the ticket before reaching line 344, so the ticket is irrecoverably lost. The gRPC framework surfaces a raw INTERNAL error to Centrifugo instead of a controlled Disconnect.

The null-guard pattern is already applied for familyId() (line 356) and expiresAt() (line 359); userId is the only required field missing the same treatment. No test exercises this path (corruptStoredJsonReturnsDisconnect uses invalid JSON syntax, not a null field value).

Fix: Add a null/blank guard immediately after deserialization (before line 342) and return a controlled DESERIALIZE_ERROR:

if (context.userId() == null || context.userId().isBlank()) {
    log.error("Centrifugo validate failed: stored user-context missing user_id; client_id={}",
            request.getClientId());
    return new ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR);
}

Also add a companion test: store {"user_id": null, ...valid fields...} in the mock, assert DESERIALIZE_ERROR + hasDisconnect().

[MAJOR] Comment on DESERIALIZE_ERROR branch says 'Surface as a transient error' — code returns terminal disconnect()

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:325

Lines 324-328 read:

// The ticket was consumed (GETDEL is atomic) but the stored
// value is corrupt. Surface as a transient error so Centrifugo
// retries — but the next attempt will fail with TICKET_NOT_FOUND
// (the row is gone). Operators see this via the error counter
// and investigate the mint path.

But line 339 returns new ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR) — a terminal Disconnect, not a transient Error. The code is correct (retry is pointless after GETDEL has consumed the key), but the comment directly contradicts it. Two further signals confirm the comment is a copy-paste artifact from the Redis-failure path at lines 299-300: (a) recordSanitizedEvent at line 333 passes /* recoverable */ false, consistent with terminal behaviour; (b) the Redis-failure path at line 311 actually does return transientError().

A future reviewer seeing the mismatch is likely to 'fix' the return to transientError(), breaking the single-use authentication guarantee and producing a confusing TICKET_NOT_FOUND sequence on Centrifugo's retry.

Fix: Update the comment to match the code:

// The ticket was consumed (GETDEL is atomic) but the stored
// value is corrupt. Return Disconnect (terminal) — retry is
// pointless because the key is gone. Operators see this via the
// DESERIALIZE_ERROR counter and investigate the mint path.

[MINOR] recordCounter and observer completion are unguarded outside the span try/finally — any exception leaves the gRPC call unanswered

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:222

Lines 220-225:

} finally {
    span.end();                          // span closed here
}
recordCounter(outcome.outcome());        // line 223 — outside try/finally
responseObserver.onNext(outcome.response());  // line 224
responseObserver.onCompleted();               // line 225

The gRPC contract requires responseObserver to always receive exactly one of onError or onCompleted. If recordCounter throws (e.g. a RuntimeException propagated from the synchronized lazy-init block during a degraded OTel SDK startup), onNext and onCompleted are never called. Centrifugo's gRPC client holds the stream open until its own deadline fires.

LongCounter.add() is documented non-throwing in practice, but the structural guarantee should not depend on that.

Fix:

try {
    recordCounter(outcome.outcome());
} finally {
    responseObserver.onNext(outcome.response());
    responseObserver.onCompleted();
}

[INFO] doValidate is public to enable cross-package test access — tech debt for future PR

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:286

Line 286 declares doValidate as public because the unit tests live in com.aim2be.identity.unit.grpc, a different package. The Javadoc (lines 274-281) acknowledges this explicitly. Any code in any package can bypass the OTel instrumentation wiring in validate() by calling doValidate() directly.

A future PR should either (a) annotate with @VisibleForTesting (Guava) and add a Checkstyle/ArchUnit rule, or (b) move the unit tests into src/test/java/com/aim2be/identity/grpc/ so package-private access suffices. Track as tech-debt before PR-OPAQUE-7 wires the full auth chain.

Verdict

NEEDS_WORK


hib-pr-reviewer • round 1 • 5 findings (3M/1m/1i) • 2026-05-24T02:30:47.462Z → 2026-05-24T02:34:19.230Z • posted-as: pr-reviewer-bot

<!-- im2be-pr-reviewer collapsed --> > _Superseded by round 2._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service) **Round 1** — head `b9dcab662be2`, base `main`, trigger `opened` **TL;DR:** NEEDS_WORK — 5 findings kept (A=2, B=3), all verified via Read; 3 major correctness/PII/maintenance issues require fixes before merge. ### Summary ## Arbitration — PR #3 ConnectProxyService.Validate (Redis GETDEL + CRC32) First run; no prior arbitration history for this PR in Memora. Verified all 5 findings (2 unique-to-A, 3 unique-to-B) via `Read` on the checked-out HEAD. Kept all 5: three major, one minor, one info. **Verification notes per finding:** - **A1 (PII log, line 338):** Confirmed at line 338: `ex.getOriginalMessage()` logged; line 310 (Redis path) correctly uses `ex.getClass().getSimpleName()`. Reviewer B dismisses this claiming Jackson messages don't echo raw JSON content — that claim is incorrect for `MismatchedInputException` (type-coercion failures), where Jackson's `getOriginalMessage()` produces strings like `Cannot deserialize value of type 'java.time.Instant' from String "<raw-value>"`. The GDPR inconsistency with the Redis-failure path and the OTel path (which already uses `recordSanitizedEvent`) confirms the finding. Kept as **major**. - **A2 (null userId NPE, line 344):** Confirmed three ways: (a) `OtelErrorEvents.hashUserId()` calls `Objects.requireNonNull(userId, ...)` at `OtelErrorEvents.java:402`; (b) `TicketUserContext.java:56` shows `userId` is a plain `String` with no `@NotNull` or Jackson `required=true` — JSON `{"user_id": null}` or omission yields `null`; (c) line 297 (GETDEL) atomically consumes the ticket before the NPE at line 344, making the ticket irrecoverable. The null-guard asymmetry at lines 356/359 for `familyId`/`expiresAt` corroborates the omission. Kept as **major**. - **B1 (contradictory comment, lines 324-328):** Confirmed: lines 324-328 read `"Surface as a transient error so Centrifugo retries"` while line 339 returns `new ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR)`. Also confirmed: `recordSanitizedEvent` at line 333 passes `/* recoverable */ false`, consistent with terminal behaviour. Copy-paste from the Redis-failure path (line 299-300) where `transientError()` is actually returned. Dangerous maintenance hazard. Kept as **major**. - **B2 (observer completion unguarded, line 222):** Confirmed: lines 220-225 show `span.end()` inside `finally`, but `recordCounter(outcome.outcome())`, `responseObserver.onNext(...)`, and `responseObserver.onCompleted()` are all outside any try/finally. A RuntimeException from the double-checked-locking lazy-init path in `recordCounter` (e.g. propagated from `GlobalOpenTelemetry.getMeter()`) would leave the gRPC call permanently unanswered. Kept as **minor**. - **B3 (doValidate public, line 286):** Confirmed at line 286; Javadoc at lines 274-281 explicitly acknowledges the visibility is test-driven. Low urgency; kept as **info**. ### Blast Radius The diff introduces the authentication gate (`doValidate`) for every Centrifugo WebSocket connection in the system — a null-userId NPE there causes a silent ticket loss and a raw INTERNAL gRPC error on every affected connection attempt. The OTel instrumentation added spans a counter, a span, and a GDPR kill-switch used across the observe plane. The 8 touched files span the gRPC layer, two new domain types, and five test suites. **BLAST_SCORE: 7/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `doValidate`, `hashUserId`, `recordCounter`, `getAndDelete` | | Migration touched | — | | Test delta | — | | Dependency changes | — | ### CI status (head `b9dcab662be2`) _No CI checks reported for this commit._ ### Findings (5) #### **[MAJOR]** DESERIALIZE_ERROR log uses `ex.getOriginalMessage()` — may surface PII field values _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:338_ Line 338: ```java log.error("Centrifugo validate failed: stored user-context JSON unreadable; client_id={} cause={}", request.getClientId(), ex.getOriginalMessage()); ``` `JsonProcessingException.getOriginalMessage()` returns Jackson's raw parser message. For `MismatchedInputException` (type-coercion failures, e.g. a mis-typed `expires_at` field), the message is of the form `Cannot deserialize value of type 'java.time.Instant' from String "<raw-value>"` — the offending string from the stored JSON is included verbatim and can carry PII. The Redis-failure path at line 310 correctly uses `ex.getClass().getSimpleName()`. The OTel path (lines 329-336) already passes a sanitized message via `recordSanitizedEvent`, so the OTel signal is clean; only the `log.error` is inconsistent. **Fix:** Replace `ex.getOriginalMessage()` with `ex.getClass().getSimpleName()` at line 338: ```java log.error("Centrifugo validate failed: stored user-context JSON unreadable; client_id={} cause={}", request.getClientId(), ex.getClass().getSimpleName()); ``` #### **[MAJOR]** Null `userId` from stored JSON causes uncaught NPE after ticket is irrevocably consumed _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:344_ If the Redis-stored JSON carries `"user_id": null` or omits the field, `context.userId()` is `null`. `TicketUserContext.java` declares `userId` as a plain `String` with no `@NotNull` or Jackson `required=true` constraint — Jackson silently sets it to `null`. The first crash is at line 344: ```java OtelErrorEvents.hashUserId(context.userId()); // Objects.requireNonNull → NullPointerException ``` `OtelErrorEvents.hashUserId()` opens with `Objects.requireNonNull(userId, "userId must not be null")` (confirmed at `OtelErrorEvents.java:402`). Even if that were guarded, line 351 would NPE: ```java .setUserId(UserId.newBuilder().setValue(context.userId()).build()) // proto rejects null ``` The GETDEL at line 297 atomically consumed and deleted the ticket *before* reaching line 344, so the ticket is irrecoverably lost. The gRPC framework surfaces a raw INTERNAL error to Centrifugo instead of a controlled `Disconnect`. The null-guard pattern is already applied for `familyId()` (line 356) and `expiresAt()` (line 359); `userId` is the only required field missing the same treatment. No test exercises this path (`corruptStoredJsonReturnsDisconnect` uses invalid JSON syntax, not a null field value). **Fix:** Add a null/blank guard immediately after deserialization (before line 342) and return a controlled `DESERIALIZE_ERROR`: ```java if (context.userId() == null || context.userId().isBlank()) { log.error("Centrifugo validate failed: stored user-context missing user_id; client_id={}", request.getClientId()); return new ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR); } ``` Also add a companion test: store `{"user_id": null, ...valid fields...}` in the mock, assert `DESERIALIZE_ERROR` + `hasDisconnect()`. #### **[MAJOR]** Comment on DESERIALIZE_ERROR branch says 'Surface as a transient error' — code returns terminal `disconnect()` _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:325_ Lines 324-328 read: ```java // The ticket was consumed (GETDEL is atomic) but the stored // value is corrupt. Surface as a transient error so Centrifugo // retries — but the next attempt will fail with TICKET_NOT_FOUND // (the row is gone). Operators see this via the error counter // and investigate the mint path. ``` But line 339 returns `new ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR)` — a **terminal Disconnect**, not a transient Error. The code is correct (retry is pointless after GETDEL has consumed the key), but the comment directly contradicts it. Two further signals confirm the comment is a copy-paste artifact from the Redis-failure path at lines 299-300: (a) `recordSanitizedEvent` at line 333 passes `/* recoverable */ false`, consistent with terminal behaviour; (b) the Redis-failure path at line 311 actually does return `transientError()`. A future reviewer seeing the mismatch is likely to 'fix' the return to `transientError()`, breaking the single-use authentication guarantee and producing a confusing `TICKET_NOT_FOUND` sequence on Centrifugo's retry. **Fix:** Update the comment to match the code: ```java // The ticket was consumed (GETDEL is atomic) but the stored // value is corrupt. Return Disconnect (terminal) — retry is // pointless because the key is gone. Operators see this via the // DESERIALIZE_ERROR counter and investigate the mint path. ``` #### **[MINOR]** `recordCounter` and observer completion are unguarded outside the span try/finally — any exception leaves the gRPC call unanswered _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:222_ Lines 220-225: ```java } finally { span.end(); // span closed here } recordCounter(outcome.outcome()); // line 223 — outside try/finally responseObserver.onNext(outcome.response()); // line 224 responseObserver.onCompleted(); // line 225 ``` The gRPC contract requires `responseObserver` to always receive exactly one of `onError` or `onCompleted`. If `recordCounter` throws (e.g. a `RuntimeException` propagated from the `synchronized` lazy-init block during a degraded OTel SDK startup), `onNext` and `onCompleted` are never called. Centrifugo's gRPC client holds the stream open until its own deadline fires. `LongCounter.add()` is documented non-throwing in practice, but the structural guarantee should not depend on that. **Fix:** ```java try { recordCounter(outcome.outcome()); } finally { responseObserver.onNext(outcome.response()); responseObserver.onCompleted(); } ``` #### **[INFO]** `doValidate` is `public` to enable cross-package test access — tech debt for future PR _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:286_ Line 286 declares `doValidate` as `public` because the unit tests live in `com.aim2be.identity.unit.grpc`, a different package. The Javadoc (lines 274-281) acknowledges this explicitly. Any code in any package can bypass the OTel instrumentation wiring in `validate()` by calling `doValidate()` directly. A future PR should either (a) annotate with `@VisibleForTesting` (Guava) and add a Checkstyle/ArchUnit rule, or (b) move the unit tests into `src/test/java/com/aim2be/identity/grpc/` so package-private access suffices. Track as tech-debt before PR-OPAQUE-7 wires the full auth chain. ### Verdict **NEEDS_WORK** --- <sub>hib-pr-reviewer • round 1 • 5 findings (3M/1m/1i) • 2026-05-24T02:30:47.462Z → 2026-05-24T02:34:19.230Z • posted-as: pr-reviewer-bot</sub> </details>
Closes 4 actionable findings (3 MAJOR + 1 MINOR; INFO doValidate-public
deferred per reviewer's tech-debt tag):

**MAJOR #1 — PII leak via JsonProcessingException.getOriginalMessage()**:
MismatchedInputException raw parser message echoes the offending JSON
value (e.g. 'Cannot deserialize value of type Instant from String
"<raw-value>"') — may carry PII. Sibling Redis-failure path uses
.getClass().getSimpleName(); applied same here. OTel signal was
already sanitized via recordSanitizedEvent.

**MAJOR #2 — null userId NPE after GETDEL consumed the ticket**:
TicketUserContext.userId is plain String, no @NotNull / Jackson
required=true. {"user_id":null} or omitted field silently sets null
→ OtelErrorEvents.hashUserId(null) NPEs after GETDEL has already
deleted the key → raw INTERNAL surfaces to Centrifugo instead of
controlled Disconnect, ticket irrecoverably lost. Added required-
field guard immediately after deserialization that returns
DESERIALIZE_ERROR + disconnect() (terminal — same as parser-failure
path; retry pointless after GETDEL). Mirrors the existing
familyId/expiresAt null-guards.

**MAJOR #3 — Comment said 'transient' but code returned terminal**:
Comment at the JsonProcessingException catch said 'Surface as a
transient error so Centrifugo retries' but the code returned
disconnect() (terminal). recordSanitizedEvent's
/* recoverable */ false flag confirms the code is correct;
comment was copy-paste from the Redis-failure path which
genuinely does transientError(). Comment rewritten to match.

**MINOR — recordCounter + observer.onNext outside try/finally**:
gRPC contract requires responseObserver to receive exactly one of
onError or onCompleted (paired with onNext). If recordCounter()
throws (unlikely but structurally not impossible), onNext +
onCompleted never run; Centrifugo's gRPC client holds the stream
open until its own deadline. Wrapped in defensive try/finally.

Verification:
- docker run --rm -v $PWD:/app -w /app maven:3.8.4-openjdk-17
  mvn install -> BUILD SUCCESS, 30/30 tests pass, 0 warnings
- (Note: docker maven required because target/ was created by
  earlier sub-agent's docker-maven build; my user can't write
  to root-owned target/. JDK-17-via-mise approach used for PR #2
  R1 didn't apply here due to that ownership.)

Files:
- src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java

Superseded by round 3.

Show previous round

hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)

Round 2 — head 765570b3ab3f, base main, trigger synchronize

TL;DR: NEEDS_WORK — kept 2 agreed/verified minor findings (no test for null/blank userId guard; unchecked-exception gRPC observer gap) and 2 info/deferred findings (DESERIALIZE_ERROR label conflation; ticket-mint duplication); all R1 findings confirmed resolved.

Summary

Arbitration Summary — Round 2

Recalled Memora: no prior run history found (Memora persist unavailable due to tag allowlist constraint — treated as first recorded run).

Agreed finding kept (A1 ≡ B2): Both reviewers independently identified missing test coverage for the null/blank userId guard at ConnectProxyServiceImpl.java:366-378. Verified: 8 test cases in ConnectProxyServiceImplTest, none inject structurally-valid JSON with null/blank user_id. A cites the test file end (line 239); B cites the guard (line 366). Merged into one finding citing the test file.

Unique-to-B finding B1 verified and kept: B claims unchecked exceptions from doValidate (line 218) or annotateSpan (line 219) escape the first try (Scope) block, skipping the second try { recordCounter } finally { responseObserver } block entirely. Verified at lines 217-234: the two try blocks are sequential — an exception from the first propagates past the second, leaving responseObserver uncalled. Real correctness gap, kept as minor.

Unique-to-A finding A2 verified and kept (deferred): Outcome.DESERIALIZE_ERROR returned at both line 355 (JsonProcessingException) and line 377 (null userId guard). Verified. Both share one counter bucket — operators cannot distinguish parse corruption from semantic mint bugs without drilling OTel spans. Kept as info/deferred.

Unique-to-B finding B3 verified and kept (deferred): Ticket-mint algorithm duplicated across 4 test helpers. Grep confirmed validTicket() in ConnectProxyServiceImplTest:66 and ConnectProxyServiceImplOtelTest:119; a parameterized validTicket(SecureRandom) in ConnectProxyServiceImplConcurrencyTest:81; mintTicket(SecureRandom) in TicketChecksumVerifierTest:37. B's characterization as "verbatim" is slightly overstated (signatures vary), but the wire-format divergence risk is genuine. Kept as info/deferred.

Final tally: A=2 kept, B=3 kept, total distinct kept=4 (2 minor, 2 info). Verdict: NEEDS_WORK.

Blast Radius

The PR adds 471 net lines to the core gRPC auth handler (ConnectProxyServiceImpl) — the single chokepoint for every Centrifugo WebSocket connection attempt. It also introduces two new support classes (TicketChecksumVerifier, TicketUserContext) and five new test files. The service runs on a separate gRPC port with its own channel, but a defect here affects all real-time connections rather than a single API endpoint.

BLAST_SCORE: 6/10

Risk Indicators

Indicator Value
Sensitive functions doValidate, hashUserId, getAndDelete, recordSanitizedEvent
Migration touched
Test delta
Dependency changes

CI status (head 765570b3ab3f)

No CI checks reported for this commit.

  • PR-OPAQUE-1 (ConnectProxyService scaffold)
  • PR-OPAQUE-2 (TicketMinter — mint path; ticket-helper duplication risk materialises here)
  • PR-OPAQUE-7 (metrics wiring + fallback-pool degraded path)

Findings (4)

[MINOR] No test covers null/blank userId guard (agreed: A + B)

src/test/java/com/aim2be/identity/unit/grpc/ConnectProxyServiceImplTest.java:239

The null/blank userId guard at ConnectProxyServiceImpl.java:366-378 — the fix for the previously-blocking R1-MAJOR-2 NPE — has no direct test coverage. Verified: ConnectProxyServiceImplTest ends at line 239 with 8 test cases; the corruptStoredJsonReturnsDisconnect test (line 183) exercises a parse failure, not a parse success with null/blank required field. A regression that removed the guard would silently restore an NPE-after-ticket-consumed path — the ticket is irrevocably consumed by GETDEL before the check, so Centrifugo would see an uncontrolled INTERNAL error instead of a clean Disconnect.

Add two tests:

@Test
void nullUserIdInStoredJsonReturnsDisconnect() {
    String json = """
            {"user_id": null, "family_id": null,
             "subscription_tier": "FREE",
             "parental_state": {"screen_time_lock":false,
               "feature_lock_chat":false,"feature_lock_diary":false,
               "feature_lock_flowcoach":false},
             "is_admin": false, "expires_at": "2026-05-23T12:05:00Z"}
            """;
    String ticket = validTicket();
    when(redisTemplate.opsForValue()).thenReturn(valueOperations);
    when(valueOperations.getAndDelete(any())).thenReturn(json);

    ValidateOutcome outcome = service.doValidate(requestWith(ticket));

    assertThat(outcome.outcome()).isEqualTo(Outcome.DESERIALIZE_ERROR);
    assertThat(outcome.response().hasDisconnect()).isTrue();
}

@Test
void blankUserIdInStoredJsonReturnsDisconnect() {
    // Replace "user_id": null with "user_id": "   " — exercises isBlank() branch
    // (same mock setup as nullUserIdInStoredJsonReturnsDisconnect)
    assertThat(outcome.outcome()).isEqualTo(Outcome.DESERIALIZE_ERROR);
    assertThat(outcome.response().hasDisconnect()).isTrue();
}

[MINOR] Unchecked exception from doValidate/annotateSpan leaves responseObserver uncalled (unique to B, verified)

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:218

Verified at lines 217-234. The two try blocks are sequential, not nested:

try (Scope ignored = span.makeCurrent()) {
    outcome = doValidate(request);   // line 218
    annotateSpan(span, outcome);     // line 219
} finally {
    span.end();                      // line 221
}
// ... (lines 223-228 comment)
try {
    recordCounter(outcome.outcome()); // line 230
} finally {
    responseObserver.onNext();       // line 232
    responseObserver.onCompleted();   // line 233
}

If doValidate or annotateSpan throw any unchecked exception (e.g. OTel SDK race, unexpected NPE, IllegalArgumentException from proto builder), the first finally runs (span.end()) then the exception propagates past lines 229-234 entirely — responseObserver.onNext and responseObserver.onCompleted are never invoked. Centrifugo's gRPC client holds the stream open until its own deadline fires, logging a spurious timeout error.

doValidate itself catches DataAccessException and JsonProcessingException, but any other unchecked exception escapes. The class-comment at lines 223-228 correctly identifies the gRPC contract (responseObserver MUST receive exactly one of onError or onCompleted) but the guard does not protect the earlier operations.

Fix — nest or catch around the full body:

try (Scope ignored = span.makeCurrent()) {
    outcome = doValidate(request);
    annotateSpan(span, outcome);
} catch (Throwable t) {
    span.recordException(t);
    span.end();
    responseObserver.onError(
        io.grpc.Status.INTERNAL.withCause(t).asRuntimeException());
    return;
} finally {
    span.end(); // OTel spec: safe to call multiple times
}
try {
    recordCounter(outcome.outcome());
} finally {
    responseObserver.onNext(outcome.response());
    responseObserver.onCompleted();
}

[INFO] DESERIALIZE_ERROR counter label conflates JSON parse failures with missing-required-field failures (unique to A, verified, deferred)

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:377

Verified: line 355 returns Outcome.DESERIALIZE_ERROR on JsonProcessingException; line 377 returns the same outcome for the null/blank userId guard — two distinct root causes sharing one counter bucket.

  • Corrupt JSON (line 355): mint is writing garbage bytes — storage corruption or serialiser regression.
  • Missing required field (line 377): mint is writing structurally-valid JSON with a semantic defect — likely a mint-path code bug.

Operators paging on a spike in centrifugo_tickets_validated_total{outcome="deserialize-error"} cannot distinguish these without drilling into the OTel span events (TICKET_CONTEXT_DESERIALIZE_FAILED vs. TICKET_CONTEXT_MISSING_USER_ID).

Propose a new Outcome entry:

/** Stored JSON parsed but lacked a required field; Disconnect returned (one-shot). */
MISSING_REQUIRED_FIELD("missing-required-field"),

And at line 377:

return new ValidateOutcome(disconnect(), Outcome.MISSING_REQUIRED_FIELD);

Safe to defer alongside PR-OPAQUE-7 metric wiring — no correctness regression in the current form.

[INFO] Ticket-mint algorithm duplicated across 4 test helpers — consolidate before PR-OPAQUE-2 ships (unique to B, verified, deferred)

src/test/java/com/aim2be/identity/unit/grpc/ConnectProxyServiceImplTest.java:66

Verified: the 24-byte-entropy + CRC32-big-endian → base64url algorithm appears in 4 test helpers:

File Method Signature
ConnectProxyServiceImplTest:66 validTicket() no param
ConnectProxyServiceImplOtelTest:119 validTicket() no param
ConnectProxyServiceImplConcurrencyTest:81 validTicket(SecureRandom) parameterized
TicketChecksumVerifierTest:37 mintTicket(SecureRandom) parameterized

B's characterization as "verbatim" is slightly overstated (signatures and names vary), but the duplication risk is real: any change to the wire format (e.g. extending entropy from 24 to 32 bytes) requires 4 parallel edits — a recipe for test/production divergence. When PR-OPAQUE-2 ships TicketMinter, extract a single TicketTestHelper under src/test/java/com/aim2be/identity/util/ and replace all four copies.

Verdict

NEEDS_WORK


hib-pr-reviewer • round 2 • 4 findings (2m/2i) • 2026-05-24T02:43:06.291Z → 2026-05-24T02:47:31.603Z • posted-as: pr-reviewer-bot

<!-- im2be-pr-reviewer collapsed --> > _Superseded by round 3._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service) **Round 2** — head `765570b3ab3f`, base `main`, trigger `synchronize` **TL;DR:** NEEDS_WORK — kept 2 agreed/verified minor findings (no test for null/blank userId guard; unchecked-exception gRPC observer gap) and 2 info/deferred findings (DESERIALIZE_ERROR label conflation; ticket-mint duplication); all R1 findings confirmed resolved. ### Summary ## Arbitration Summary — Round 2 Recalled Memora: no prior run history found (Memora persist unavailable due to tag allowlist constraint — treated as first recorded run). **Agreed finding kept (A1 ≡ B2):** Both reviewers independently identified missing test coverage for the null/blank `userId` guard at `ConnectProxyServiceImpl.java:366-378`. Verified: 8 test cases in `ConnectProxyServiceImplTest`, none inject structurally-valid JSON with `null`/blank `user_id`. A cites the test file end (line 239); B cites the guard (line 366). Merged into one finding citing the test file. **Unique-to-B finding B1 verified and kept:** B claims unchecked exceptions from `doValidate` (line 218) or `annotateSpan` (line 219) escape the first `try (Scope)` block, skipping the second `try { recordCounter } finally { responseObserver }` block entirely. Verified at lines 217-234: the two `try` blocks are sequential — an exception from the first propagates past the second, leaving `responseObserver` uncalled. Real correctness gap, kept as minor. **Unique-to-A finding A2 verified and kept (deferred):** `Outcome.DESERIALIZE_ERROR` returned at both line 355 (`JsonProcessingException`) and line 377 (null userId guard). Verified. Both share one counter bucket — operators cannot distinguish parse corruption from semantic mint bugs without drilling OTel spans. Kept as info/deferred. **Unique-to-B finding B3 verified and kept (deferred):** Ticket-mint algorithm duplicated across 4 test helpers. Grep confirmed `validTicket()` in `ConnectProxyServiceImplTest:66` and `ConnectProxyServiceImplOtelTest:119`; a parameterized `validTicket(SecureRandom)` in `ConnectProxyServiceImplConcurrencyTest:81`; `mintTicket(SecureRandom)` in `TicketChecksumVerifierTest:37`. B's characterization as "verbatim" is slightly overstated (signatures vary), but the wire-format divergence risk is genuine. Kept as info/deferred. **Final tally:** A=2 kept, B=3 kept, total distinct kept=4 (2 minor, 2 info). Verdict: NEEDS_WORK. ### Blast Radius The PR adds 471 net lines to the core gRPC auth handler (`ConnectProxyServiceImpl`) — the single chokepoint for every Centrifugo WebSocket connection attempt. It also introduces two new support classes (`TicketChecksumVerifier`, `TicketUserContext`) and five new test files. The service runs on a separate gRPC port with its own channel, but a defect here affects all real-time connections rather than a single API endpoint. **BLAST_SCORE: 6/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `doValidate`, `hashUserId`, `getAndDelete`, `recordSanitizedEvent` | | Migration touched | — | | Test delta | — | | Dependency changes | — | ### CI status (head `765570b3ab3f`) _No CI checks reported for this commit._ ### Related PRs - PR-OPAQUE-1 (ConnectProxyService scaffold) - PR-OPAQUE-2 (TicketMinter — mint path; ticket-helper duplication risk materialises here) - PR-OPAQUE-7 (metrics wiring + fallback-pool degraded path) ### Findings (4) #### **[MINOR]** No test covers null/blank `userId` guard (agreed: A + B) _src/test/java/com/aim2be/identity/unit/grpc/ConnectProxyServiceImplTest.java:239_ The null/blank `userId` guard at `ConnectProxyServiceImpl.java:366-378` — the fix for the previously-blocking R1-MAJOR-2 NPE — has no direct test coverage. Verified: `ConnectProxyServiceImplTest` ends at line 239 with 8 test cases; the `corruptStoredJsonReturnsDisconnect` test (line 183) exercises a parse *failure*, not a parse *success* with `null`/blank required field. A regression that removed the guard would silently restore an NPE-after-ticket-consumed path — the ticket is irrevocably consumed by `GETDEL` before the check, so Centrifugo would see an uncontrolled `INTERNAL` error instead of a clean `Disconnect`. Add two tests: ```java @Test void nullUserIdInStoredJsonReturnsDisconnect() { String json = """ {"user_id": null, "family_id": null, "subscription_tier": "FREE", "parental_state": {"screen_time_lock":false, "feature_lock_chat":false,"feature_lock_diary":false, "feature_lock_flowcoach":false}, "is_admin": false, "expires_at": "2026-05-23T12:05:00Z"} """; String ticket = validTicket(); when(redisTemplate.opsForValue()).thenReturn(valueOperations); when(valueOperations.getAndDelete(any())).thenReturn(json); ValidateOutcome outcome = service.doValidate(requestWith(ticket)); assertThat(outcome.outcome()).isEqualTo(Outcome.DESERIALIZE_ERROR); assertThat(outcome.response().hasDisconnect()).isTrue(); } @Test void blankUserIdInStoredJsonReturnsDisconnect() { // Replace "user_id": null with "user_id": " " — exercises isBlank() branch // (same mock setup as nullUserIdInStoredJsonReturnsDisconnect) assertThat(outcome.outcome()).isEqualTo(Outcome.DESERIALIZE_ERROR); assertThat(outcome.response().hasDisconnect()).isTrue(); } ``` #### **[MINOR]** Unchecked exception from `doValidate`/`annotateSpan` leaves `responseObserver` uncalled (unique to B, verified) _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:218_ Verified at lines 217-234. The two `try` blocks are sequential, not nested: ```java try (Scope ignored = span.makeCurrent()) { outcome = doValidate(request); // line 218 annotateSpan(span, outcome); // line 219 } finally { span.end(); // line 221 } // ... (lines 223-228 comment) try { recordCounter(outcome.outcome()); // line 230 } finally { responseObserver.onNext(…); // line 232 responseObserver.onCompleted(); // line 233 } ``` If `doValidate` or `annotateSpan` throw any unchecked exception (e.g. OTel SDK race, unexpected NPE, `IllegalArgumentException` from proto builder), the first `finally` runs (`span.end()`) then the exception propagates **past** lines 229-234 entirely — `responseObserver.onNext` and `responseObserver.onCompleted` are never invoked. Centrifugo's gRPC client holds the stream open until its own deadline fires, logging a spurious timeout error. `doValidate` itself catches `DataAccessException` and `JsonProcessingException`, but any other unchecked exception escapes. The class-comment at lines 223-228 correctly identifies the gRPC contract (`responseObserver MUST receive exactly one of onError or onCompleted`) but the guard does not protect the earlier operations. **Fix — nest or catch around the full body:** ```java try (Scope ignored = span.makeCurrent()) { outcome = doValidate(request); annotateSpan(span, outcome); } catch (Throwable t) { span.recordException(t); span.end(); responseObserver.onError( io.grpc.Status.INTERNAL.withCause(t).asRuntimeException()); return; } finally { span.end(); // OTel spec: safe to call multiple times } try { recordCounter(outcome.outcome()); } finally { responseObserver.onNext(outcome.response()); responseObserver.onCompleted(); } ``` #### **[INFO]** `DESERIALIZE_ERROR` counter label conflates JSON parse failures with missing-required-field failures (unique to A, verified, deferred) _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:377_ Verified: line 355 returns `Outcome.DESERIALIZE_ERROR` on `JsonProcessingException`; line 377 returns the same outcome for the null/blank `userId` guard — two distinct root causes sharing one counter bucket. - **Corrupt JSON** (line 355): mint is writing garbage bytes — storage corruption or serialiser regression. - **Missing required field** (line 377): mint is writing structurally-valid JSON with a semantic defect — likely a mint-path code bug. Operators paging on a spike in `centrifugo_tickets_validated_total{outcome="deserialize-error"}` cannot distinguish these without drilling into the OTel span events (`TICKET_CONTEXT_DESERIALIZE_FAILED` vs. `TICKET_CONTEXT_MISSING_USER_ID`). Propose a new `Outcome` entry: ```java /** Stored JSON parsed but lacked a required field; Disconnect returned (one-shot). */ MISSING_REQUIRED_FIELD("missing-required-field"), ``` And at line 377: ```java return new ValidateOutcome(disconnect(), Outcome.MISSING_REQUIRED_FIELD); ``` Safe to defer alongside PR-OPAQUE-7 metric wiring — no correctness regression in the current form. #### **[INFO]** Ticket-mint algorithm duplicated across 4 test helpers — consolidate before PR-OPAQUE-2 ships (unique to B, verified, deferred) _src/test/java/com/aim2be/identity/unit/grpc/ConnectProxyServiceImplTest.java:66_ Verified: the 24-byte-entropy + CRC32-big-endian → base64url algorithm appears in 4 test helpers: | File | Method | Signature | |---|---|---| | `ConnectProxyServiceImplTest:66` | `validTicket()` | no param | | `ConnectProxyServiceImplOtelTest:119` | `validTicket()` | no param | | `ConnectProxyServiceImplConcurrencyTest:81` | `validTicket(SecureRandom)` | parameterized | | `TicketChecksumVerifierTest:37` | `mintTicket(SecureRandom)` | parameterized | B's characterization as "verbatim" is slightly overstated (signatures and names vary), but the duplication risk is real: any change to the wire format (e.g. extending entropy from 24 to 32 bytes) requires 4 parallel edits — a recipe for test/production divergence. When PR-OPAQUE-2 ships `TicketMinter`, extract a single `TicketTestHelper` under `src/test/java/com/aim2be/identity/util/` and replace all four copies. ### Verdict **NEEDS_WORK** --- <sub>hib-pr-reviewer • round 2 • 4 findings (2m/2i) • 2026-05-24T02:43:06.291Z → 2026-05-24T02:47:31.603Z • posted-as: pr-reviewer-bot</sub> </details>
R2 verdict findings (kept=2):

(1) MINOR ConnectProxyServiceImpl.java:218 — unchecked exception from
    doValidate/annotateSpan left responseObserver uncalled. The two
    try blocks were sequential not nested; if doValidate or annotateSpan
    threw any unchecked exception (OTel SDK race, NPE from unexpected
    JSON shape, IAE from proto builder), the first finally ran
    span.end() then the exception propagated past lines 229-234 entirely
    — responseObserver.onNext + onCompleted never invoked, Centrifugos
    gRPC client holds the stream open until its own deadline fires +
    logs a spurious timeout. doValidate itself catches DataAccessException
    and JsonProcessingException, but any other unchecked exception
    escaped. Fix: wrapped doValidate+annotateSpan inner block with
    catch(Throwable) → recordSanitizedEvent + span.end() +
    responseObserver.onError(Status.INTERNAL) + return. Outer finallys
    span.end() is a no-op per OTel spec (end is idempotent).

(2) MINOR ConnectProxyServiceImplTest — no direct test coverage for the
    null/blank userId guard at ConnectProxyServiceImpl.java:366-378
    (the R1-MAJOR-2 NPE fix). A regression that removed the guard would
    silently restore an NPE-after-ticket-consumed path — GETDEL
    irrevocably consumes the ticket BEFORE the guard fires, so
    Centrifugo would see an uncontrolled INTERNAL error instead of a
    clean Disconnect. Fix: added two new tests:
    - nullUserIdInStoredJsonReturnsDisconnect — exercises the
      userId() == null branch
    - blankUserIdInStoredJsonReturnsDisconnect — exercises the
      isBlank() branch (whitespace user_id)
    Both assert outcome=DESERIALIZE_ERROR + Disconnect response.

INFO/deferred (NOT addressed this round, per reviewer guidance):
- DESERIALIZE_ERROR counter label conflation (corrupt JSON vs missing
  required field) — deferred to a follow-up dashboard-label split.
- Ticket-mint duplication risk — materialises in PR-OPAQUE-2; cross-PR
  refactor deferred to a unification task once both PRs merge.

Verification:
- mvn -DskipTests=false clean install → BUILD SUCCESS
- 32 tests pass (ConnectProxyServiceImplTest 10/10 — was 8, +2 new;
  ConnectProxyServiceImplOtelTest 4/4; ConnectProxyServiceImplConcurrency
  Test 1/1; TicketChecksumVerifierTest 12/12; TicketUserContextTest 4/4;
  IdentityServiceApplicationTest 1/1)
- Zero warnings on touched modules

Superseded by round 4.

Show previous round

hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)

Round 3 — head 95e1211bc686, base main, trigger synchronize

TL;DR: CONDITIONAL_APPROVE — 0 agreed findings, all 4 unique-to-one findings verified and kept; no blocking issues; all prior round blockers confirmed resolved.

Summary

Round 3 Arbitration — all 4 findings are unique-to-one; all 4 verified against HEAD

Reconciliation actions:

  • No findings appear in both A and B (zero agreed pairs).
  • Verified all 4 unique findings via Read and Grep at the checkout path.
  • A-1 kept: TicketChecksumVerifier.java:113computeChecksumBigEndian returns crc.getValue() (a plain long); no endian-specific byte serialisation occurs in the method. The BigEndian suffix is genuinely misleading. Verified.
  • A-2 kept: ConnectProxyServiceImpl.java:96static final String TICKET_KEY_PREFIX has no access modifier (package-private). Grep confirms 4 occurrences of the hardcoded literal in tests (ConnectProxyServiceImplTest:109, :202, :208; ConnectProxyServiceImplConcurrencyTest:104). Verified.
  • B-1 kept: ConnectProxyServiceImpl.java:407-409log.info("… user_id_hash={}", …, OtelErrorEvents.hashUserId(context.userId())) is unconditional. annotateSpan at line 276 correctly gates behind userIdHashEnabled; the log does not. GDPR kill-switch bypass in log path confirmed. Verified.
  • B-2 kept: ConnectProxyServiceImpl.java:234-248 — the catch arm executes OtelErrorEvents.recordSanitizedEvent (line 235), span.end() (line 243), then responseObserver.onError (line 244) sequentially with no inner try/finally. A throw from lines 235 or 243 leaves responseObserver unsignalled. Verified.
  • All prior round blockers confirmed resolved by both reviewers; not re-raised.
  • Memora run summary persisted (memory ID 217; new entry, review_count=3).

Blast Radius

The diff adds 497 lines to ConnectProxyServiceImpl (the live gRPC surface Centrifugo calls on every WebSocket connection attempt), introduces two new domain classes (TicketChecksumVerifier, TicketUserContext), and lands four new test files. The gRPC handler is an exported service boundary touched by network I/O, Redis, OTel, and Jackson; a defect there affects every Centrifugo connection attempt. TicketChecksumVerifier is shared with the upcoming PR-OPAQUE-2 mint path, making its naming contract cross-PR critical.

BLAST_SCORE: 6/10

Risk Indicators

Indicator Value
Sensitive functions doValidate, hashUserId, computeChecksumBigEndian, redisTemplate.opsForValue().getAndDelete, annotateSpan
Migration touched
Test delta
Dependency changes

CI status (head 95e1211bc686)

No CI checks reported for this commit.

  • PR-OPAQUE-1 (scaffold stub, superseded)
  • PR-OPAQUE-2 (mint path — shares TicketChecksumVerifier)
  • PR-OPAQUE-7 (fallback-pool / degraded path)

Findings (4)

[MINOR] computeChecksumBigEndian name implies byte serialisation but returns a raw long

src/main/java/com/aim2be/identity/ticket/TicketChecksumVerifier.java:113

Line 113: public static long computeChecksumBigEndian(byte[] input) calls crc.getValue() and returns a plain Java long — no endian-specific byte serialisation is performed inside the method. The big-endian byte layout is handled externally by callers. A mint developer reading the name without inspecting the return type may reasonably skip the explicit byte-order extraction step, producing a little-endian or native-order CRC that never matches the verifier.

Proposed fix: rename to computeChecksum and update the Javadoc: "returns the numeric CRC32 value in [0, 2^32); callers must serialise to big-endian bytes themselves." Alternatively add a writeCrcBigEndian(byte[] dest, int offset, long crc) companion so mint and verify both delegate to the same place.

[MINOR] TICKET_KEY_PREFIX is package-private; tests duplicate the literal in 4 places

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:96

Line 96: static final String TICKET_KEY_PREFIX = "centrifugo:ticket:"; — no public modifier.

Tests in com.aim2be.identity.unit.grpc (a different package) cannot reference the constant and hardcode the literal in four verified locations:

  • ConnectProxyServiceImplTest:109 — mock stub
  • ConnectProxyServiceImplTest:202 — mock stub
  • ConnectProxyServiceImplTest:208 — verify assertion
  • ConnectProxyServiceImplConcurrencyTest:104 — pre-seed map key

A future prefix change silently causes mock stubs to return null, surfacing as spurious TICKET_NOT_FOUND failures. SPAN_NAME and COUNTER_NAME are already public for the same reason; TICKET_KEY_PREFIX warrants identical treatment.

Proposed fix: change static final to public static final (one word); update tests to reference ConnectProxyServiceImpl.TICKET_KEY_PREFIX.

[MINOR] userIdHashEnabled kill-switch not honored in the success info log

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:407

Lines 407–409:

log.info("Centrifugo validate success; client_id={} user_id_hash={}",
        request.getClientId(),
        OtelErrorEvents.hashUserId(context.userId()));

annotateSpan at line 276 correctly gates the user_id_hash span attribute behind userIdHashEnabled. The success log at 407–409 calls hashUserId unconditionally. When an operator sets OTEL_ATTRIBUTES_USER_ID_HASH_ENABLED=false (GDPR Art. 5(1)(c) kill-switch), the hash is suppressed from OTel spans but continues to appear in every successful validate log line — violating the operator's intent for log pipelines forwarded to external collectors (e.g. Loki → Grafana Cloud).

Proposed fix — mirror the flag check from annotateSpan:

if (userIdHashEnabled) {
    log.info("Centrifugo validate success; client_id={} user_id_hash={}",
            request.getClientId(), OtelErrorEvents.hashUserId(context.userId()));
} else {
    log.info("Centrifugo validate success; client_id={}", request.getClientId());
}

[MINOR] responseObserver.onError is skipped if OtelErrorEvents.recordSanitizedEvent throws inside catch(Throwable)

src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:235

The catch arm (lines 234–248) executes sequentially: OtelErrorEvents.recordSanitizedEvent(...) at 235 → span.end() at 243 → responseObserver.onError(...) at 244. No inner try/finally wraps this sequence. If recordSanitizedEvent (or span.end()) raises, responseObserver.onError is never called — gRPC contract violated. Centrifugo holds the stream open until its proxy timeout fires, logging a spurious timeout and potentially blocking a reconnect slot. The outer finally { span.end(); } at line 251 still runs (OTel preserved) but the observer is never signalled.

Proposed fix — guard responseObserver.onError with an inner finally:

} catch (Throwable t) {
    try {
        OtelErrorEvents.recordSanitizedEvent(
                span, t, "VALIDATE_UNHANDLED", true, COMPONENT, "grpc",
                "Unhandled exception in doValidate/annotateSpan");
        span.end();
    } finally {
        responseObserver.onError(io.grpc.Status.INTERNAL
                .withDescription("internal_error")
                .withCause(t)
                .asRuntimeException());
    }
    return;
}

span.end() in the outer finally remains as the idempotent safety net.

Verdict

CONDITIONAL_APPROVE


hib-pr-reviewer • round 3 • 4 findings (4m) • 2026-05-24T03:09:00.239Z → 2026-05-24T03:11:26.581Z • posted-as: pr-reviewer-bot

<!-- im2be-pr-reviewer collapsed --> > _Superseded by round 4._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service) **Round 3** — head `95e1211bc686`, base `main`, trigger `synchronize` **TL;DR:** CONDITIONAL_APPROVE — 0 agreed findings, all 4 unique-to-one findings verified and kept; no blocking issues; all prior round blockers confirmed resolved. ### Summary ## Round 3 Arbitration — all 4 findings are unique-to-one; all 4 verified against HEAD **Reconciliation actions:** - No findings appear in both A and B (zero agreed pairs). - Verified all 4 unique findings via `Read` and `Grep` at the checkout path. - **A-1 kept**: `TicketChecksumVerifier.java:113` — `computeChecksumBigEndian` returns `crc.getValue()` (a plain `long`); no endian-specific byte serialisation occurs in the method. The `BigEndian` suffix is genuinely misleading. Verified. - **A-2 kept**: `ConnectProxyServiceImpl.java:96` — `static final String TICKET_KEY_PREFIX` has no access modifier (package-private). Grep confirms 4 occurrences of the hardcoded literal in tests (`ConnectProxyServiceImplTest:109`, `:202`, `:208`; `ConnectProxyServiceImplConcurrencyTest:104`). Verified. - **B-1 kept**: `ConnectProxyServiceImpl.java:407-409` — `log.info("… user_id_hash={}", …, OtelErrorEvents.hashUserId(context.userId()))` is unconditional. `annotateSpan` at line 276 correctly gates behind `userIdHashEnabled`; the log does not. GDPR kill-switch bypass in log path confirmed. Verified. - **B-2 kept**: `ConnectProxyServiceImpl.java:234-248` — the catch arm executes `OtelErrorEvents.recordSanitizedEvent` (line 235), `span.end()` (line 243), then `responseObserver.onError` (line 244) sequentially with no inner try/finally. A throw from lines 235 or 243 leaves `responseObserver` unsignalled. Verified. - All prior round blockers confirmed resolved by both reviewers; not re-raised. - Memora run summary persisted (memory ID 217; new entry, review_count=3). ### Blast Radius The diff adds 497 lines to ConnectProxyServiceImpl (the live gRPC surface Centrifugo calls on every WebSocket connection attempt), introduces two new domain classes (TicketChecksumVerifier, TicketUserContext), and lands four new test files. The gRPC handler is an exported service boundary touched by network I/O, Redis, OTel, and Jackson; a defect there affects every Centrifugo connection attempt. TicketChecksumVerifier is shared with the upcoming PR-OPAQUE-2 mint path, making its naming contract cross-PR critical. **BLAST_SCORE: 6/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `doValidate`, `hashUserId`, `computeChecksumBigEndian`, `redisTemplate.opsForValue().getAndDelete`, `annotateSpan` | | Migration touched | — | | Test delta | — | | Dependency changes | — | ### CI status (head `95e1211bc686`) _No CI checks reported for this commit._ ### Related PRs - PR-OPAQUE-1 (scaffold stub, superseded) - PR-OPAQUE-2 (mint path — shares TicketChecksumVerifier) - PR-OPAQUE-7 (fallback-pool / degraded path) ### Findings (4) #### **[MINOR]** computeChecksumBigEndian name implies byte serialisation but returns a raw long _src/main/java/com/aim2be/identity/ticket/TicketChecksumVerifier.java:113_ Line 113: `public static long computeChecksumBigEndian(byte[] input)` calls `crc.getValue()` and returns a plain Java `long` — no endian-specific byte serialisation is performed inside the method. The big-endian byte layout is handled externally by callers. A mint developer reading the name without inspecting the return type may reasonably skip the explicit byte-order extraction step, producing a little-endian or native-order CRC that never matches the verifier. **Proposed fix:** rename to `computeChecksum` and update the Javadoc: "returns the numeric CRC32 value in [0, 2^32); callers must serialise to big-endian bytes themselves." Alternatively add a `writeCrcBigEndian(byte[] dest, int offset, long crc)` companion so mint and verify both delegate to the same place. #### **[MINOR]** TICKET_KEY_PREFIX is package-private; tests duplicate the literal in 4 places _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:96_ Line 96: `static final String TICKET_KEY_PREFIX = "centrifugo:ticket:";` — no `public` modifier. Tests in `com.aim2be.identity.unit.grpc` (a different package) cannot reference the constant and hardcode the literal in four verified locations: - `ConnectProxyServiceImplTest:109` — mock stub - `ConnectProxyServiceImplTest:202` — mock stub - `ConnectProxyServiceImplTest:208` — verify assertion - `ConnectProxyServiceImplConcurrencyTest:104` — pre-seed map key A future prefix change silently causes mock stubs to return null, surfacing as spurious TICKET_NOT_FOUND failures. `SPAN_NAME` and `COUNTER_NAME` are already `public` for the same reason; `TICKET_KEY_PREFIX` warrants identical treatment. **Proposed fix:** change `static final` to `public static final` (one word); update tests to reference `ConnectProxyServiceImpl.TICKET_KEY_PREFIX`. #### **[MINOR]** `userIdHashEnabled` kill-switch not honored in the success info log _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:407_ Lines 407–409: ```java log.info("Centrifugo validate success; client_id={} user_id_hash={}", request.getClientId(), OtelErrorEvents.hashUserId(context.userId())); ``` `annotateSpan` at line 276 correctly gates the `user_id_hash` span attribute behind `userIdHashEnabled`. The success log at 407–409 calls `hashUserId` unconditionally. When an operator sets `OTEL_ATTRIBUTES_USER_ID_HASH_ENABLED=false` (GDPR Art. 5(1)(c) kill-switch), the hash is suppressed from OTel spans but continues to appear in every successful validate log line — violating the operator's intent for log pipelines forwarded to external collectors (e.g. Loki → Grafana Cloud). **Proposed fix** — mirror the flag check from `annotateSpan`: ```java if (userIdHashEnabled) { log.info("Centrifugo validate success; client_id={} user_id_hash={}", request.getClientId(), OtelErrorEvents.hashUserId(context.userId())); } else { log.info("Centrifugo validate success; client_id={}", request.getClientId()); } ``` #### **[MINOR]** `responseObserver.onError` is skipped if `OtelErrorEvents.recordSanitizedEvent` throws inside `catch(Throwable)` _src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:235_ The catch arm (lines 234–248) executes sequentially: `OtelErrorEvents.recordSanitizedEvent(...)` at 235 → `span.end()` at 243 → `responseObserver.onError(...)` at 244. No inner try/finally wraps this sequence. If `recordSanitizedEvent` (or `span.end()`) raises, `responseObserver.onError` is never called — gRPC contract violated. Centrifugo holds the stream open until its proxy timeout fires, logging a spurious timeout and potentially blocking a reconnect slot. The outer `finally { span.end(); }` at line 251 still runs (OTel preserved) but the observer is never signalled. **Proposed fix** — guard `responseObserver.onError` with an inner finally: ```java } catch (Throwable t) { try { OtelErrorEvents.recordSanitizedEvent( span, t, "VALIDATE_UNHANDLED", true, COMPONENT, "grpc", "Unhandled exception in doValidate/annotateSpan"); span.end(); } finally { responseObserver.onError(io.grpc.Status.INTERNAL .withDescription("internal_error") .withCause(t) .asRuntimeException()); } return; } ``` `span.end()` in the outer `finally` remains as the idempotent safety net. ### Verdict **CONDITIONAL_APPROVE** --- <sub>hib-pr-reviewer • round 3 • 4 findings (4m) • 2026-05-24T03:09:00.239Z → 2026-05-24T03:11:26.581Z • posted-as: pr-reviewer-bot</sub> </details>
R3 verdict findings (kept=4, all 2 prior R2 closed → CONDITIONAL_APPROVE):

(1) MINOR ticket/TicketChecksumVerifier.java:113 — name
    computeChecksumBigEndian implies byte serialisation but returns
    a raw long; risks mint-side regression where a developer skips
    the explicit BE shift step and silently produces little-endian
    or native-order bytes that never match the verifier. Fix:
    renamed to computeChecksum + expanded Javadoc clarifying the
    method returns the numeric CRC32 value and that callers must
    serialise to big-endian bytes themselves (verify side uses
    readUnsignedIntBigEndian; mint side will use four shifts).
    Updated all callers: TicketChecksumVerifierTest,
    ConnectProxyServiceImplTest, ConnectProxyServiceImplOtelTest,
    ConnectProxyServiceImplConcurrencyTest, and the class-level
    Javadoc cross-reference at line 33.

(2) MINOR grpc/ConnectProxyServiceImpl.java:96 — TICKET_KEY_PREFIX
    was package-private; tests in com.aim2be.identity.unit.grpc
    (different package) couldn't reference it and hardcoded the
    literal "centrifugo🎫" in 4 locations (mock stubs +
    verify assertion + concurrency pre-seed). A future prefix
    change silently caused spurious TICKET_NOT_FOUND failures.
    SPAN_NAME + COUNTER_NAME already public for the same reason.
    Fix: TICKET_KEY_PREFIX → public static final; test references
    now use ConnectProxyServiceImpl.TICKET_KEY_PREFIX.

(3) MINOR grpc/ConnectProxyServiceImpl.java:428 (was 407) — the
    success info log called OtelErrorEvents.hashUserId(context.userId())
    unconditionally, ignoring the OTEL_ATTRIBUTES_USER_ID_HASH_ENABLED
    kill-switch (GDPR Art. 5(1)(c) data-minimisation). annotateSpan()
    correctly gates the user_id_hash SPAN attribute but the log line
    leaked the hash into external log collectors (Loki → Grafana
    Cloud) when the operator explicitly disabled hash propagation.
    Fix: gate the log line behind userIdHashEnabled, with a
    no-hash variant when disabled.

(4) MINOR grpc/ConnectProxyServiceImpl.java:235 (was) — the catch
    (Throwable) block I added in R2 ran recordSanitizedEvent →
    span.end → responseObserver.onError SEQUENTIALLY. If
    recordSanitizedEvent (event-buffer overflow under stress) or
    span.end (future OTel SDK regression — currently idempotent
    no-op) raised, responseObserver.onError would never fire and
    Centrifugo would hold the stream open until its own deadline.
    Fix: nested try/finally so even if either of the earlier calls
    raises, onError ALWAYS fires (gRPC contract: exactly one of
    {onNext+onCompleted | onError} per call).

INFO/deferred (NOT addressed this round, per reviewer's tech-debt
tag): DESERIALIZE_ERROR label conflation + ticket-mint duplication
risk in PR-OPAQUE-2.

Verification:
- mvn -DskipTests=false clean install → BUILD SUCCESS
- 32/32 tests pass (test signatures unchanged; literal-to-constant
  refactor is symbol-only)
- Zero warnings on touched modules

hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)

Round 4 — head 05dc31cc0564, base main, trigger synchronize

TL;DR: NO_NEW_FINDINGS — No new findings this round.

Summary

[quorum-converged] A=0 = B=0. # hib-pr-reviewer/quorum/A — round 4

Reviewing PR #3 in affinity-intelligence-rework/im2be-identity-service
(head 05dc31c, base main).

CI status (head 05dc31cc0564)

No CI checks reported for this commit.

Findings

No new findings this round.

Quorum converged on empty findings (A + B both returned 0).

Verdict

NO_NEW_FINDINGS


hib-pr-reviewer • round 4 • 0 findings • 2026-05-24T03:16:46.652Z → 2026-05-24T03:25:32.267Z • posted-as: pr-reviewer-bot • [bookkeeping fallback]

## hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service) **Round 4** — head `05dc31cc0564`, base `main`, trigger `synchronize` **TL;DR:** NO_NEW_FINDINGS — No new findings this round. ### Summary [quorum-converged] A=0 = B=0. # hib-pr-reviewer/quorum/A — round 4 Reviewing PR **#3** in `affinity-intelligence-rework/im2be-identity-service` (head `05dc31c`, base `main`). ### CI status (head `05dc31cc0564`) _No CI checks reported for this commit._ ### Findings **No new findings this round.** _Quorum converged on empty findings (A + B both returned 0)._ ### Verdict **NO_NEW_FINDINGS** --- <sub>hib-pr-reviewer • round 4 • 0 findings • 2026-05-24T03:16:46.652Z → 2026-05-24T03:25:32.267Z • posted-as: pr-reviewer-bot • [bookkeeping fallback]</sub>
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
affinity-intelligence-rework/im2be-identity-service!3
No description provided.