feat(grpc): L0 T0 #6 PR-OPAQUE-3 — ConnectProxyService.Validate (Redis GETDEL atomic + CRC32 pre-check) #3
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/l0-t0-opaque-3-validate-rpc"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Header
Repo: affinity-intelligence-rework/im2be-identity-service
PR: TBD (this PR)
Round: R0 — initial submission
Branch: feat/l0-t0-opaque-3-validate-rpc (off main, includes PR-OPAQUE-1 #1 merged state — does NOT depend on PR-OPAQUE-2 #2)
Worktree: identity-service-validate-rpc/ (sibling worktree)
TL;DR
APPROVE-CANDIDATE: Replaces the
ConnectProxyService.ValidateUNIMPLEMENTED stub with the full ADR-0002 §3 validate path — CRC32 pre-check, Redis GETDEL atomic single-use, Jackson-deserialise user context, project toConnectValidateResponse.Success. 30/30 unit tests pass, zero warnings, BUILD SUCCESS.Summary
Scope of this round = the entire PR (R0). Covers the four-commit split:
feat(ticket): TicketChecksumVerifier — CRC32 + base64url helper(dd167ba)feat(grpc): wire ConnectProxyService.Validate — Redis GETDEL atomic(3d68318)feat(grpc): OTel span + counter + error events for validate flow(ddf94ee)test(grpc): concurrent-validate smoke (GETDEL single-use enforcement)(b9dcab6)Scopes verified vs deferred:
degraded=true(PR-OPAQUE-7).degradedhardcoded to false in this PR.Last reviewed SHA:
b9dcab6(HEAD of branch at PR open time).Findings
No new findings this round — this is R0 / initial submission. Verifications carried out before opening:
HttpSecurityfilter chain. Centrifugo presents ticket as request field, NOT JWT Bearer header. mTLS peer auth enforced one layer up by Envoy sidecar (PR-OPAQUE-3-follow-up: SPIFFE workload-API + ServerInterceptor).freshlyRefreshedflag) confirmed present at the cited lines. No regression. PR-OPAQUE-3 does not touch the JWT/JWKS path.TicketChecksumVerifier.java(NOT colliding with PR-OPAQUE-2's likely class name).computeChecksumBigEndian()exposed as public static so PR-OPAQUE-2 can reuse identical byte-order semantics; a follow-up cleanup PR can de-dup if both PRs land with parallel implementations.unit/grpcpackage + dashboard pinning. Production callers go through the gRPC observer entry point.Risk Indicators
ConnectProxyService.Validate— the entire ticket-validation gate; covered by 21 new tests (8 unit + 4 OTel + 1 concurrent + 4 context + new context-load coverage).Blast Radius
Score: 3/10 — scope is localised to
grpc/ConnectProxyServiceImpl.java+ newticket/package. Cross-repo impact:Verdict
APPROVE-CANDIDATE on:
docker maven:3.8.4-openjdk-17 install→ BUILD SUCCESS, no [WARNING] lines).ValueOperations.getAndDeleteGETDEL mapping).Conditions / carried-forward items:
degraded=falsehardcoded for now; PR-OPAQUE-7 wires the fallback-pool path.TicketChecksumVerifiermay overlap with PR-OPAQUE-2's mint-side checksum helper once that PR lands. De-dup follow-up PR if both ship.TicketUserContext(see ADR-0002 §3.3 wire shape). Verified at design-time; integration-test will lock the cross-PR contract at qa-rig E2E (PR-OPAQUE-6).Footer
Project: im2be-identity-service • Mode: Stage B / L0 T0 #6 • Round: R0 / initial submission • Tally: 0 resolved / 0 new / 0 carried-forward • Timestamp: 2026-05-24
Authored by Claude on behalf of @hibryda per Stage D PR-OPAQUE-3 directive.
PR-OPAQUE-3 §3.2: pre-Redis integrity check for opaque connection tickets. Ticket wire format = base64url(24-byte random || 4-byte big-endian CRC32(random)); total 28 raw bytes → 38 unpadded base64url characters. Static utility class — every method side-effect-free, no Spring bean wiring. verifyChecksum(String) returns false on null/empty input, malformed base64url decode, wrong payload length, or CRC mismatch (does NOT throw — callers map a single boolean to a gRPC status without an extra try/catch layer). computeChecksumBigEndian(byte[]) is exposed as well so PR-OPAQUE-2's mint path can reuse the exact same helper, eliminating the risk of mint + validate disagreeing on byte order. 12 unit tests cover the contract: - happy path: freshly-built ticket via the helper round-trips through the verifier (single + 1 000-iteration variants) - padded base64url accepted as well as unpadded - null / empty input → false - malformed base64url (illegal chars) → false - payload too short / too long → false - random 0xDEADBEEF checksum → false - single bit-flip in entropy → false - single bit-flip in checksum → false - ENCODED_LENGTH constant matches actual unpadded encoding Verification: - docker maven:3.8.4-openjdk-17 -P unit-tests test → 12/12 pass + IdentityServiceApplicationTest 1/1 pass (no regressions)Replaces the PR-OPAQUE-1 UNIMPLEMENTED stub with the full ADR-0002 §3 validate flow: 1. CRC32 pre-check via TicketChecksumVerifier — invalid tickets are rejected with a Disconnect WITHOUT touching Redis (spares a round-trip on the connect-flood path where Centrifugo would otherwise forward every junk ticket to identity-service). 2. Redis GETDEL atomic on key centrifugo🎫<base64url-ticket> via StringRedisTemplate.opsForValue().getAndDelete(). A single Redis command both reads the stored user-context JSON AND removes the row — this is the single-use enforcement gate. A second connect attempt with the same ticket sees nil and is rejected. 3. Jackson-deserialise the JSON blob into TicketUserContext (new record matching the ADR-0002 §3.3 wire shape: user_id, family_id, subscription_tier, parental_state, is_admin, expires_at). 4. Project into ConnectValidateResponse.Success — Centrifugo receives the user context and proceeds with the WS upgrade. Failure routing follows Centrifugo's status-code-transforms config: - INVALID_CRC | TICKET_NOT_FOUND | DESERIALIZE_ERROR → Disconnect (4401 UNAUTHENTICATED — terminal, do not retry). - REDIS_FAILURE → Error (transient code 14 / UNAVAILABLE; client SDK retries per the temporary=true envelope). degraded=false is hardcoded for now; PR-OPAQUE-7 wires the static fallback ticket pool (ADR-0002 §3.4) that surfaces degraded=true. Out-of-scope (later PRs): - Mint path (PR-OPAQUE-2 — parallel sub-agent). - JWT-reuse (PR-OPAQUE-4). - realtime-service client (PR-OPAQUE-5). - qa-rig E2E (PR-OPAQUE-6). - Fallback pool consumption (PR-OPAQUE-7). Ripple inspection (rule 63): - The gRPC server runs on port 50051 (net.devh starter) — it does NOT pass through Spring Security's HttpSecurity filter chain (which only binds to the Servlet stack on the HTTP port). Centrifugo presents the ticket as a request field, NOT as a JWT Bearer header — and that's exactly what the design wants. mTLS peer auth is enforced one layer up by the Envoy sidecar; a future PR will wire the SPIFFE workload-API + ServerInterceptor for caller-identity assertion. - JwksClient.getPublicKeyByKid `freshlyRefreshed` guard from PR #1 R3 is intact (verified at config/JwksClient.java:147-160). PR-OPAQUE-3 doesn't touch the JWT/JWKS path. - PR-OPAQUE-2's TicketGenerator (parallel sub-agent) will reuse TicketChecksumVerifier.computeChecksumBigEndian() so mint + validate cannot disagree on byte order. No coupling on the Redis-key prefix beyond the documented centrifugo🎫 convention (constant exported as TICKET_KEY_PREFIX). Tests: - 4× TicketUserContextTest: canonical JSON round-trip; null family_id; admin flag preserved; unknown future fields tolerated. - 8× ConnectProxyServiceImplTest: happy path projects all fields; invalid CRC short-circuits Redis; empty ticket short-circuits Redis; ticket-not-found → Disconnect; Redis QueryTimeoutException → transient Error; corrupt stored JSON → Disconnect; canonical key prefix usage; unknown subscription tier maps to UNSPECIFIED. Verification: - docker maven:3.8.4-openjdk-17 -P unit-tests test → 25/25 pass (12 verifier + 4 context + 8 grpc + 1 context-load). - Zero [WARNING] lines in the full mvn output. Visibility note: doValidate(), ValidateOutcome, Outcome are exposed as public so the unit test in the sibling com.aim2be.identity.unit.grpc package can drive the logic without spinning up an in-process gRPC channel. Production callers go through the gRPC observer entry point.Adds W2 OTel instrumentation to the Validate path (matches the calendar / diary / family / notification-service convention — rule 10): Span: - name: identity.validate_centrifugo_ticket (matches the documented contract in OtelTracers). - kind: SERVER. - attributes always set: * ticket.lookup_result ∈ {success | invalid-crc | ticket-not-found | deserialize-error | redis-failure} * degraded (false; PR-OPAQUE-7 flips this for fallback-pool path) - attribute set ONLY on SUCCESS AND when GDPR kill-switch is enabled: * user_id_hash (16-hex-char SHA-256 prefix; OtelErrorEvents.hashUserId) Kill-switch: otel.attributes.user-id-hash-enabled property (defaults to true, mirrors the same toggle already in application.properties). Counter: - name: centrifugo_tickets_validated_total (Prometheus-canonical snake_case). - unit: 1. - label: outcome (kebab-case Outcome.label() — stable for dashboards; a change is a breaking-change requiring dashboard updates). - lazy-init via volatile + double-checked-locking, matching OtelErrorEvents' instance-cache pattern. Without this the cached meter would pin the no-op meter present at class-load time and every increment would silently drop. Error-event recording (PR-OPAQUE-3 entrypoints): - REDIS_FAILURE → recordSanitizedEvent (TICKET_LOOKUP_FAILED, recoverable=true, downstream=redis). - DESERIALIZE_ERROR → recordSanitizedEvent (TICKET_CONTEXT_DESERIALIZE_FAILED, recoverable=false, downstream=jackson). Sanitized variants are used because the cause chain may carry raw ticket material or stored-JSON fragments (W3 chatbot pattern, Memora #3144); recordSanitizedEvent skips Span.recordException to keep the exception.stacktrace attribute off OTLP exports. Visibility: - SPAN_NAME, COUNTER_NAME exposed as public constants — pinned so dashboards + tests can reference the exact strings; changing the value would be a dashboard-breaking change. Tests added: - 4× ConnectProxyServiceImplOtelTest (in-process OTel SDK via opentelemetry-sdk-testing's InMemoryMetricReader + InMemorySpanExporter): * success path → span attributes (ticket.lookup_result=success, degraded=false, user_id_hash set) + counter outcome=success. * invalid-crc → counter outcome=invalid-crc. * ticket-not-found → counter outcome=ticket-not-found. * GDPR kill-switch off → user_id_hash absent; ticket.lookup_result still set. DELTA aggregation temporality on InMemoryMetricReader so each test sees only its own counter increments without prior-test residue. Constructor change: - ConnectProxyServiceImpl now takes a third @Value-resolved userIdHashEnabled boolean. Existing test in ConnectProxyServiceImplTest updated to pass true. Verification: - docker maven:3.8.4-openjdk-17 -P unit-tests test → 29/29 pass (12 verifier + 4 context + 8 grpc + 4 OTel + 1 context-load). - docker maven:3.8.4-openjdk-17 install → BUILD SUCCESS, zero [WARNING] lines.Strict-load smoke confirming the GETDEL atomic single-use guarantee under contended access — the entire load-bearing reason ConnectProxyService.Validate exists. Design: - Pre-seed 1 000 distinct, CRC-valid tickets into an in-memory ConcurrentHashMap-backed Redis surrogate (Mockito-bridged StringRedisTemplate → ConcurrentMap.remove, which provides the same atomic compare-and-remove contract as Redis GETDEL). - Launch 10 threads, each running through the full TOTAL_TICKETS pool — 10 000 validate attempts on a 1 000-ticket pool. Race conditions on every overlapping key are guaranteed; the test relies on the underlying GETDEL atomicity to resolve them. Strict assertions (a regression in either direction fails the test): - successes == 1 000 (exactly one win per ticket). - successfulTickets.size() == 1 000 (the success set is a strict subset of unique tickets — no duplicates). - notFound == 9 000 (every redundant call saw nil). - unexpected == 0 (no DESERIALIZE_ERROR / REDIS_FAILURE / INVALID_CRC, no double-success on any ticket). - store.isEmpty() (every ticket consumed; no orphans). Why not Testcontainers / a real Redis cluster: the standard test runner image doesn't always grant Docker socket access; spinning a Redis container per CI run adds 5-10s of wall-clock. The Mockito- bridged ConcurrentHashMap honours the GETDEL contract precisely (ConcurrentMap.remove is itself atomic compare-and-remove). A future Stage C / qa-rig E2E will exercise the same flow against a real Redis cluster. Regression coverage: a future refactor that broke the atomicity — say, replacing opsForValue().getAndDelete(key) with a non-atomic GET-then-DEL — would cause 11+ successes for the same ticket and immediately fail this test. Verification: - docker maven:3.8.4-openjdk-17 -P unit-tests test -Dtest='ConnectProxyServiceImplConcurrencyTest' → 1/1 pass in 1.277s - docker maven:3.8.4-openjdk-17 install → 30/30 tests pass, BUILD SUCCESS, zero [WARNING] lines (12 verifier + 4 context + 8 grpc + 4 OTel + 1 concurrency + 1 context-load).Show previous round
hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)
Round 1 — head
b9dcab662be2, basemain, triggeropenedTL;DR: NEEDS_WORK — 5 findings kept (A=2, B=3), all verified via Read; 3 major correctness/PII/maintenance issues require fixes before merge.
Summary
Arbitration — PR #3 ConnectProxyService.Validate (Redis GETDEL + CRC32)
First run; no prior arbitration history for this PR in Memora. Verified all 5 findings (2 unique-to-A, 3 unique-to-B) via
Readon the checked-out HEAD. Kept all 5: three major, one minor, one info.Verification notes per finding:
A1 (PII log, line 338): Confirmed at line 338:
ex.getOriginalMessage()logged; line 310 (Redis path) correctly usesex.getClass().getSimpleName(). Reviewer B dismisses this claiming Jackson messages don't echo raw JSON content — that claim is incorrect forMismatchedInputException(type-coercion failures), where Jackson'sgetOriginalMessage()produces strings likeCannot deserialize value of type 'java.time.Instant' from String "<raw-value>". The GDPR inconsistency with the Redis-failure path and the OTel path (which already usesrecordSanitizedEvent) confirms the finding. Kept as major.A2 (null userId NPE, line 344): Confirmed three ways: (a)
OtelErrorEvents.hashUserId()callsObjects.requireNonNull(userId, ...)atOtelErrorEvents.java:402; (b)TicketUserContext.java:56showsuserIdis a plainStringwith no@NotNullor Jacksonrequired=true— JSON{"user_id": null}or omission yieldsnull; (c) line 297 (GETDEL) atomically consumes the ticket before the NPE at line 344, making the ticket irrecoverable. The null-guard asymmetry at lines 356/359 forfamilyId/expiresAtcorroborates the omission. Kept as major.B1 (contradictory comment, lines 324-328): Confirmed: lines 324-328 read
"Surface as a transient error so Centrifugo retries"while line 339 returnsnew ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR). Also confirmed:recordSanitizedEventat line 333 passes/* recoverable */ false, consistent with terminal behaviour. Copy-paste from the Redis-failure path (line 299-300) wheretransientError()is actually returned. Dangerous maintenance hazard. Kept as major.B2 (observer completion unguarded, line 222): Confirmed: lines 220-225 show
span.end()insidefinally, butrecordCounter(outcome.outcome()),responseObserver.onNext(...), andresponseObserver.onCompleted()are all outside any try/finally. A RuntimeException from the double-checked-locking lazy-init path inrecordCounter(e.g. propagated fromGlobalOpenTelemetry.getMeter()) would leave the gRPC call permanently unanswered. Kept as minor.B3 (doValidate public, line 286): Confirmed at line 286; Javadoc at lines 274-281 explicitly acknowledges the visibility is test-driven. Low urgency; kept as info.
Blast Radius
The diff introduces the authentication gate (
doValidate) for every Centrifugo WebSocket connection in the system — a null-userId NPE there causes a silent ticket loss and a raw INTERNAL gRPC error on every affected connection attempt. The OTel instrumentation added spans a counter, a span, and a GDPR kill-switch used across the observe plane. The 8 touched files span the gRPC layer, two new domain types, and five test suites.BLAST_SCORE: 7/10
Risk Indicators
doValidate,hashUserId,recordCounter,getAndDeleteCI status (head
b9dcab662be2)No CI checks reported for this commit.
Findings (5)
[MAJOR] DESERIALIZE_ERROR log uses
ex.getOriginalMessage()— may surface PII field valuessrc/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:338
Line 338:
JsonProcessingException.getOriginalMessage()returns Jackson's raw parser message. ForMismatchedInputException(type-coercion failures, e.g. a mis-typedexpires_atfield), the message is of the formCannot deserialize value of type 'java.time.Instant' from String "<raw-value>"— the offending string from the stored JSON is included verbatim and can carry PII.The Redis-failure path at line 310 correctly uses
ex.getClass().getSimpleName(). The OTel path (lines 329-336) already passes a sanitized message viarecordSanitizedEvent, so the OTel signal is clean; only thelog.erroris inconsistent.Fix: Replace
ex.getOriginalMessage()withex.getClass().getSimpleName()at line 338:[MAJOR] Null
userIdfrom stored JSON causes uncaught NPE after ticket is irrevocably consumedsrc/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:344
If the Redis-stored JSON carries
"user_id": nullor omits the field,context.userId()isnull.TicketUserContext.javadeclaresuserIdas a plainStringwith no@NotNullor Jacksonrequired=trueconstraint — Jackson silently sets it tonull. The first crash is at line 344:OtelErrorEvents.hashUserId()opens withObjects.requireNonNull(userId, "userId must not be null")(confirmed atOtelErrorEvents.java:402). Even if that were guarded, line 351 would NPE:The GETDEL at line 297 atomically consumed and deleted the ticket before reaching line 344, so the ticket is irrecoverably lost. The gRPC framework surfaces a raw INTERNAL error to Centrifugo instead of a controlled
Disconnect.The null-guard pattern is already applied for
familyId()(line 356) andexpiresAt()(line 359);userIdis the only required field missing the same treatment. No test exercises this path (corruptStoredJsonReturnsDisconnectuses invalid JSON syntax, not a null field value).Fix: Add a null/blank guard immediately after deserialization (before line 342) and return a controlled
DESERIALIZE_ERROR:Also add a companion test: store
{"user_id": null, ...valid fields...}in the mock, assertDESERIALIZE_ERROR+hasDisconnect().[MAJOR] Comment on DESERIALIZE_ERROR branch says 'Surface as a transient error' — code returns terminal
disconnect()src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:325
Lines 324-328 read:
But line 339 returns
new ValidateOutcome(disconnect(), Outcome.DESERIALIZE_ERROR)— a terminal Disconnect, not a transient Error. The code is correct (retry is pointless after GETDEL has consumed the key), but the comment directly contradicts it. Two further signals confirm the comment is a copy-paste artifact from the Redis-failure path at lines 299-300: (a)recordSanitizedEventat line 333 passes/* recoverable */ false, consistent with terminal behaviour; (b) the Redis-failure path at line 311 actually does returntransientError().A future reviewer seeing the mismatch is likely to 'fix' the return to
transientError(), breaking the single-use authentication guarantee and producing a confusingTICKET_NOT_FOUNDsequence on Centrifugo's retry.Fix: Update the comment to match the code:
[MINOR]
recordCounterand observer completion are unguarded outside the span try/finally — any exception leaves the gRPC call unansweredsrc/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:222
Lines 220-225:
The gRPC contract requires
responseObserverto always receive exactly one ofonErrororonCompleted. IfrecordCounterthrows (e.g. aRuntimeExceptionpropagated from thesynchronizedlazy-init block during a degraded OTel SDK startup),onNextandonCompletedare never called. Centrifugo's gRPC client holds the stream open until its own deadline fires.LongCounter.add()is documented non-throwing in practice, but the structural guarantee should not depend on that.Fix:
[INFO]
doValidateispublicto enable cross-package test access — tech debt for future PRsrc/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:286
Line 286 declares
doValidateaspublicbecause the unit tests live incom.aim2be.identity.unit.grpc, a different package. The Javadoc (lines 274-281) acknowledges this explicitly. Any code in any package can bypass the OTel instrumentation wiring invalidate()by callingdoValidate()directly.A future PR should either (a) annotate with
@VisibleForTesting(Guava) and add a Checkstyle/ArchUnit rule, or (b) move the unit tests intosrc/test/java/com/aim2be/identity/grpc/so package-private access suffices. Track as tech-debt before PR-OPAQUE-7 wires the full auth chain.Verdict
NEEDS_WORK
hib-pr-reviewer • round 1 • 5 findings (3M/1m/1i) • 2026-05-24T02:30:47.462Z → 2026-05-24T02:34:19.230Z • posted-as: pr-reviewer-bot
Show previous round
hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)
Round 2 — head
765570b3ab3f, basemain, triggersynchronizeTL;DR: NEEDS_WORK — kept 2 agreed/verified minor findings (no test for null/blank userId guard; unchecked-exception gRPC observer gap) and 2 info/deferred findings (DESERIALIZE_ERROR label conflation; ticket-mint duplication); all R1 findings confirmed resolved.
Summary
Arbitration Summary — Round 2
Recalled Memora: no prior run history found (Memora persist unavailable due to tag allowlist constraint — treated as first recorded run).
Agreed finding kept (A1 ≡ B2): Both reviewers independently identified missing test coverage for the null/blank
userIdguard atConnectProxyServiceImpl.java:366-378. Verified: 8 test cases inConnectProxyServiceImplTest, none inject structurally-valid JSON withnull/blankuser_id. A cites the test file end (line 239); B cites the guard (line 366). Merged into one finding citing the test file.Unique-to-B finding B1 verified and kept: B claims unchecked exceptions from
doValidate(line 218) orannotateSpan(line 219) escape the firsttry (Scope)block, skipping the secondtry { recordCounter } finally { responseObserver }block entirely. Verified at lines 217-234: the twotryblocks are sequential — an exception from the first propagates past the second, leavingresponseObserveruncalled. Real correctness gap, kept as minor.Unique-to-A finding A2 verified and kept (deferred):
Outcome.DESERIALIZE_ERRORreturned at both line 355 (JsonProcessingException) and line 377 (null userId guard). Verified. Both share one counter bucket — operators cannot distinguish parse corruption from semantic mint bugs without drilling OTel spans. Kept as info/deferred.Unique-to-B finding B3 verified and kept (deferred): Ticket-mint algorithm duplicated across 4 test helpers. Grep confirmed
validTicket()inConnectProxyServiceImplTest:66andConnectProxyServiceImplOtelTest:119; a parameterizedvalidTicket(SecureRandom)inConnectProxyServiceImplConcurrencyTest:81;mintTicket(SecureRandom)inTicketChecksumVerifierTest:37. B's characterization as "verbatim" is slightly overstated (signatures vary), but the wire-format divergence risk is genuine. Kept as info/deferred.Final tally: A=2 kept, B=3 kept, total distinct kept=4 (2 minor, 2 info). Verdict: NEEDS_WORK.
Blast Radius
The PR adds 471 net lines to the core gRPC auth handler (
ConnectProxyServiceImpl) — the single chokepoint for every Centrifugo WebSocket connection attempt. It also introduces two new support classes (TicketChecksumVerifier,TicketUserContext) and five new test files. The service runs on a separate gRPC port with its own channel, but a defect here affects all real-time connections rather than a single API endpoint.BLAST_SCORE: 6/10
Risk Indicators
doValidate,hashUserId,getAndDelete,recordSanitizedEventCI status (head
765570b3ab3f)No CI checks reported for this commit.
Related PRs
Findings (4)
[MINOR] No test covers null/blank
userIdguard (agreed: A + B)src/test/java/com/aim2be/identity/unit/grpc/ConnectProxyServiceImplTest.java:239
The null/blank
userIdguard atConnectProxyServiceImpl.java:366-378— the fix for the previously-blocking R1-MAJOR-2 NPE — has no direct test coverage. Verified:ConnectProxyServiceImplTestends at line 239 with 8 test cases; thecorruptStoredJsonReturnsDisconnecttest (line 183) exercises a parse failure, not a parse success withnull/blank required field. A regression that removed the guard would silently restore an NPE-after-ticket-consumed path — the ticket is irrevocably consumed byGETDELbefore the check, so Centrifugo would see an uncontrolledINTERNALerror instead of a cleanDisconnect.Add two tests:
[MINOR] Unchecked exception from
doValidate/annotateSpanleavesresponseObserveruncalled (unique to B, verified)src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:218
Verified at lines 217-234. The two
tryblocks are sequential, not nested:If
doValidateorannotateSpanthrow any unchecked exception (e.g. OTel SDK race, unexpected NPE,IllegalArgumentExceptionfrom proto builder), the firstfinallyruns (span.end()) then the exception propagates past lines 229-234 entirely —responseObserver.onNextandresponseObserver.onCompletedare never invoked. Centrifugo's gRPC client holds the stream open until its own deadline fires, logging a spurious timeout error.doValidateitself catchesDataAccessExceptionandJsonProcessingException, but any other unchecked exception escapes. The class-comment at lines 223-228 correctly identifies the gRPC contract (responseObserver MUST receive exactly one of onError or onCompleted) but the guard does not protect the earlier operations.Fix — nest or catch around the full body:
[INFO]
DESERIALIZE_ERRORcounter label conflates JSON parse failures with missing-required-field failures (unique to A, verified, deferred)src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:377
Verified: line 355 returns
Outcome.DESERIALIZE_ERRORonJsonProcessingException; line 377 returns the same outcome for the null/blankuserIdguard — two distinct root causes sharing one counter bucket.Operators paging on a spike in
centrifugo_tickets_validated_total{outcome="deserialize-error"}cannot distinguish these without drilling into the OTel span events (TICKET_CONTEXT_DESERIALIZE_FAILEDvs.TICKET_CONTEXT_MISSING_USER_ID).Propose a new
Outcomeentry:And at line 377:
Safe to defer alongside PR-OPAQUE-7 metric wiring — no correctness regression in the current form.
[INFO] Ticket-mint algorithm duplicated across 4 test helpers — consolidate before PR-OPAQUE-2 ships (unique to B, verified, deferred)
src/test/java/com/aim2be/identity/unit/grpc/ConnectProxyServiceImplTest.java:66
Verified: the 24-byte-entropy + CRC32-big-endian → base64url algorithm appears in 4 test helpers:
ConnectProxyServiceImplTest:66validTicket()ConnectProxyServiceImplOtelTest:119validTicket()ConnectProxyServiceImplConcurrencyTest:81validTicket(SecureRandom)TicketChecksumVerifierTest:37mintTicket(SecureRandom)B's characterization as "verbatim" is slightly overstated (signatures and names vary), but the duplication risk is real: any change to the wire format (e.g. extending entropy from 24 to 32 bytes) requires 4 parallel edits — a recipe for test/production divergence. When PR-OPAQUE-2 ships
TicketMinter, extract a singleTicketTestHelperundersrc/test/java/com/aim2be/identity/util/and replace all four copies.Verdict
NEEDS_WORK
hib-pr-reviewer • round 2 • 4 findings (2m/2i) • 2026-05-24T02:43:06.291Z → 2026-05-24T02:47:31.603Z • posted-as: pr-reviewer-bot
R2 verdict findings (kept=2): (1) MINOR ConnectProxyServiceImpl.java:218 — unchecked exception from doValidate/annotateSpan left responseObserver uncalled. The two try blocks were sequential not nested; if doValidate or annotateSpan threw any unchecked exception (OTel SDK race, NPE from unexpected JSON shape, IAE from proto builder), the first finally ran span.end() then the exception propagated past lines 229-234 entirely — responseObserver.onNext + onCompleted never invoked, Centrifugos gRPC client holds the stream open until its own deadline fires + logs a spurious timeout. doValidate itself catches DataAccessException and JsonProcessingException, but any other unchecked exception escaped. Fix: wrapped doValidate+annotateSpan inner block with catch(Throwable) → recordSanitizedEvent + span.end() + responseObserver.onError(Status.INTERNAL) + return. Outer finallys span.end() is a no-op per OTel spec (end is idempotent). (2) MINOR ConnectProxyServiceImplTest — no direct test coverage for the null/blank userId guard at ConnectProxyServiceImpl.java:366-378 (the R1-MAJOR-2 NPE fix). A regression that removed the guard would silently restore an NPE-after-ticket-consumed path — GETDEL irrevocably consumes the ticket BEFORE the guard fires, so Centrifugo would see an uncontrolled INTERNAL error instead of a clean Disconnect. Fix: added two new tests: - nullUserIdInStoredJsonReturnsDisconnect — exercises the userId() == null branch - blankUserIdInStoredJsonReturnsDisconnect — exercises the isBlank() branch (whitespace user_id) Both assert outcome=DESERIALIZE_ERROR + Disconnect response. INFO/deferred (NOT addressed this round, per reviewer guidance): - DESERIALIZE_ERROR counter label conflation (corrupt JSON vs missing required field) — deferred to a follow-up dashboard-label split. - Ticket-mint duplication risk — materialises in PR-OPAQUE-2; cross-PR refactor deferred to a unification task once both PRs merge. Verification: - mvn -DskipTests=false clean install → BUILD SUCCESS - 32 tests pass (ConnectProxyServiceImplTest 10/10 — was 8, +2 new; ConnectProxyServiceImplOtelTest 4/4; ConnectProxyServiceImplConcurrency Test 1/1; TicketChecksumVerifierTest 12/12; TicketUserContextTest 4/4; IdentityServiceApplicationTest 1/1) - Zero warnings on touched modulesShow previous round
hib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)
Round 3 — head
95e1211bc686, basemain, triggersynchronizeTL;DR: CONDITIONAL_APPROVE — 0 agreed findings, all 4 unique-to-one findings verified and kept; no blocking issues; all prior round blockers confirmed resolved.
Summary
Round 3 Arbitration — all 4 findings are unique-to-one; all 4 verified against HEAD
Reconciliation actions:
ReadandGrepat the checkout path.TicketChecksumVerifier.java:113—computeChecksumBigEndianreturnscrc.getValue()(a plainlong); no endian-specific byte serialisation occurs in the method. TheBigEndiansuffix is genuinely misleading. Verified.ConnectProxyServiceImpl.java:96—static final String TICKET_KEY_PREFIXhas no access modifier (package-private). Grep confirms 4 occurrences of the hardcoded literal in tests (ConnectProxyServiceImplTest:109,:202,:208;ConnectProxyServiceImplConcurrencyTest:104). Verified.ConnectProxyServiceImpl.java:407-409—log.info("… user_id_hash={}", …, OtelErrorEvents.hashUserId(context.userId()))is unconditional.annotateSpanat line 276 correctly gates behinduserIdHashEnabled; the log does not. GDPR kill-switch bypass in log path confirmed. Verified.ConnectProxyServiceImpl.java:234-248— the catch arm executesOtelErrorEvents.recordSanitizedEvent(line 235),span.end()(line 243), thenresponseObserver.onError(line 244) sequentially with no inner try/finally. A throw from lines 235 or 243 leavesresponseObserverunsignalled. Verified.Blast Radius
The diff adds 497 lines to ConnectProxyServiceImpl (the live gRPC surface Centrifugo calls on every WebSocket connection attempt), introduces two new domain classes (TicketChecksumVerifier, TicketUserContext), and lands four new test files. The gRPC handler is an exported service boundary touched by network I/O, Redis, OTel, and Jackson; a defect there affects every Centrifugo connection attempt. TicketChecksumVerifier is shared with the upcoming PR-OPAQUE-2 mint path, making its naming contract cross-PR critical.
BLAST_SCORE: 6/10
Risk Indicators
doValidate,hashUserId,computeChecksumBigEndian,redisTemplate.opsForValue().getAndDelete,annotateSpanCI status (head
95e1211bc686)No CI checks reported for this commit.
Related PRs
Findings (4)
[MINOR] computeChecksumBigEndian name implies byte serialisation but returns a raw long
src/main/java/com/aim2be/identity/ticket/TicketChecksumVerifier.java:113
Line 113:
public static long computeChecksumBigEndian(byte[] input)callscrc.getValue()and returns a plain Javalong— no endian-specific byte serialisation is performed inside the method. The big-endian byte layout is handled externally by callers. A mint developer reading the name without inspecting the return type may reasonably skip the explicit byte-order extraction step, producing a little-endian or native-order CRC that never matches the verifier.Proposed fix: rename to
computeChecksumand update the Javadoc: "returns the numeric CRC32 value in [0, 2^32); callers must serialise to big-endian bytes themselves." Alternatively add awriteCrcBigEndian(byte[] dest, int offset, long crc)companion so mint and verify both delegate to the same place.[MINOR] TICKET_KEY_PREFIX is package-private; tests duplicate the literal in 4 places
src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:96
Line 96:
static final String TICKET_KEY_PREFIX = "centrifugo:ticket:";— nopublicmodifier.Tests in
com.aim2be.identity.unit.grpc(a different package) cannot reference the constant and hardcode the literal in four verified locations:ConnectProxyServiceImplTest:109— mock stubConnectProxyServiceImplTest:202— mock stubConnectProxyServiceImplTest:208— verify assertionConnectProxyServiceImplConcurrencyTest:104— pre-seed map keyA future prefix change silently causes mock stubs to return null, surfacing as spurious TICKET_NOT_FOUND failures.
SPAN_NAMEandCOUNTER_NAMEare alreadypublicfor the same reason;TICKET_KEY_PREFIXwarrants identical treatment.Proposed fix: change
static finaltopublic static final(one word); update tests to referenceConnectProxyServiceImpl.TICKET_KEY_PREFIX.[MINOR]
userIdHashEnabledkill-switch not honored in the success info logsrc/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:407
Lines 407–409:
annotateSpanat line 276 correctly gates theuser_id_hashspan attribute behinduserIdHashEnabled. The success log at 407–409 callshashUserIdunconditionally. When an operator setsOTEL_ATTRIBUTES_USER_ID_HASH_ENABLED=false(GDPR Art. 5(1)(c) kill-switch), the hash is suppressed from OTel spans but continues to appear in every successful validate log line — violating the operator's intent for log pipelines forwarded to external collectors (e.g. Loki → Grafana Cloud).Proposed fix — mirror the flag check from
annotateSpan:[MINOR]
responseObserver.onErroris skipped ifOtelErrorEvents.recordSanitizedEventthrows insidecatch(Throwable)src/main/java/com/aim2be/identity/grpc/ConnectProxyServiceImpl.java:235
The catch arm (lines 234–248) executes sequentially:
OtelErrorEvents.recordSanitizedEvent(...)at 235 →span.end()at 243 →responseObserver.onError(...)at 244. No inner try/finally wraps this sequence. IfrecordSanitizedEvent(orspan.end()) raises,responseObserver.onErroris never called — gRPC contract violated. Centrifugo holds the stream open until its proxy timeout fires, logging a spurious timeout and potentially blocking a reconnect slot. The outerfinally { span.end(); }at line 251 still runs (OTel preserved) but the observer is never signalled.Proposed fix — guard
responseObserver.onErrorwith an inner finally:span.end()in the outerfinallyremains as the idempotent safety net.Verdict
CONDITIONAL_APPROVE
hib-pr-reviewer • round 3 • 4 findings (4m) • 2026-05-24T03:09:00.239Z → 2026-05-24T03:11:26.581Z • posted-as: pr-reviewer-bot
R3 verdict findings (kept=4, all 2 prior R2 closed → CONDITIONAL_APPROVE): (1) MINOR ticket/TicketChecksumVerifier.java:113 — name computeChecksumBigEndian implies byte serialisation but returns a raw long; risks mint-side regression where a developer skips the explicit BE shift step and silently produces little-endian or native-order bytes that never match the verifier. Fix: renamed to computeChecksum + expanded Javadoc clarifying the method returns the numeric CRC32 value and that callers must serialise to big-endian bytes themselves (verify side uses readUnsignedIntBigEndian; mint side will use four shifts). Updated all callers: TicketChecksumVerifierTest, ConnectProxyServiceImplTest, ConnectProxyServiceImplOtelTest, ConnectProxyServiceImplConcurrencyTest, and the class-level Javadoc cross-reference at line 33. (2) MINOR grpc/ConnectProxyServiceImpl.java:96 — TICKET_KEY_PREFIX was package-private; tests in com.aim2be.identity.unit.grpc (different package) couldn't reference it and hardcoded the literal "centrifugo🎫" in 4 locations (mock stubs + verify assertion + concurrency pre-seed). A future prefix change silently caused spurious TICKET_NOT_FOUND failures. SPAN_NAME + COUNTER_NAME already public for the same reason. Fix: TICKET_KEY_PREFIX → public static final; test references now use ConnectProxyServiceImpl.TICKET_KEY_PREFIX. (3) MINOR grpc/ConnectProxyServiceImpl.java:428 (was 407) — the success info log called OtelErrorEvents.hashUserId(context.userId()) unconditionally, ignoring the OTEL_ATTRIBUTES_USER_ID_HASH_ENABLED kill-switch (GDPR Art. 5(1)(c) data-minimisation). annotateSpan() correctly gates the user_id_hash SPAN attribute but the log line leaked the hash into external log collectors (Loki → Grafana Cloud) when the operator explicitly disabled hash propagation. Fix: gate the log line behind userIdHashEnabled, with a no-hash variant when disabled. (4) MINOR grpc/ConnectProxyServiceImpl.java:235 (was) — the catch (Throwable) block I added in R2 ran recordSanitizedEvent → span.end → responseObserver.onError SEQUENTIALLY. If recordSanitizedEvent (event-buffer overflow under stress) or span.end (future OTel SDK regression — currently idempotent no-op) raised, responseObserver.onError would never fire and Centrifugo would hold the stream open until its own deadline. Fix: nested try/finally so even if either of the earlier calls raises, onError ALWAYS fires (gRPC contract: exactly one of {onNext+onCompleted | onError} per call). INFO/deferred (NOT addressed this round, per reviewer's tech-debt tag): DESERIALIZE_ERROR label conflation + ticket-mint duplication risk in PR-OPAQUE-2. Verification: - mvn -DskipTests=false clean install → BUILD SUCCESS - 32/32 tests pass (test signatures unchanged; literal-to-constant refactor is symbol-only) - Zero warnings on touched moduleshib-pr-reviewer review — PR #3 (affinity-intelligence-rework/im2be-identity-service)
Round 4 — head
05dc31cc0564, basemain, triggersynchronizeTL;DR: NO_NEW_FINDINGS — No new findings this round.
Summary
[quorum-converged] A=0 = B=0. # hib-pr-reviewer/quorum/A — round 4
Reviewing PR #3 in
affinity-intelligence-rework/im2be-identity-service(head
05dc31c, basemain).CI status (head
05dc31cc0564)No CI checks reported for this commit.
Findings
No new findings this round.
Quorum converged on empty findings (A + B both returned 0).
Verdict
NO_NEW_FINDINGS
hib-pr-reviewer • round 4 • 0 findings • 2026-05-24T03:16:46.652Z → 2026-05-24T03:25:32.267Z • posted-as: pr-reviewer-bot • [bookkeeping fallback]