fix(observability): ADR-0015 follow-up — null-span + bridge test + recordEventOnly (#332) #7
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/identity-service-error-event-followup-332"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Header
identity-service · PR follow-up to the merged ADR-0015 error-event wiring · R1 reviewer findings ·
#332OTEL sweep · branchfeat/identity-service-error-event-followup-332offmain.TL;DR
CONDITIONAL_APPROVEcandidate — addresses all three R1 MINOR findings on the DORMANT durable-error-event code; full suite GREEN (152 tests), 0 warnings.recordEventOnlywired + tested (it is an error path).Summary
Scope: three R1 follow-up fixes on
observability/OtelErrorEvents.java(null-span trace context,resetForTestinglock scope,recordEventOnlydurable wiring) plus two new durable-stream tests and one new bridge-lifecycle test. The ADR-0015 wiring itself is already merged tomain; this PR only hardens the dormant (im2be.error-event.enabled=falseby default) path. Reviewed commit05b8a96. Theerror-event-publisherJPA exclusion inpom.xmlis load-bearing for the Redis-only service and was left untouched.Findings
OtelErrorEvents.java:buildErrorEventSpan.current(), risking cross-link of the durable event to an unrelated ambient spanSpan.current()fallback (consistent with the span-event path which no-ops whenspan == null). Javadoc updated.OtelErrorEvents.java:resetForTestingsynchronizedblock while the counter clear was insideerrorEventPublisher = nullunder the same lock as the counter clearOtelErrorEvents.java:recordEventOnlyrecordEventOnly decision: WIRED + TESTED
Determination: it IS an error-recording path.
recordEventOnlysetsspan.setStatus(StatusCode.ERROR, errorCode)and emits the same typederror.eventspan attributes (error.type,error.code,error.recoverable,component) as the otherrecord*families — it is the W3 plane-integrationrecord_event_onlymirror for typed errors that have noThrowablein hand (e.g. a validation rejection returning 400 without throwing).Per ADR-0015 D-4 ("one call site → durable event"), an error-recording path must also thread the durable emit, or the invariant holds for every
record*family except this one. Resolution mirrors the other families: the 6-arg form became a delegating overload defaulting toRECOVERY_ACTION_UNSPECIFIED(D-7: action decided at the catch site, never inferred), and a new 7-argRecoveryActionoverload callsemitDurableErrorEventwitherror_class= the caller-supplied type string (noThrowable→ no FQN available) andmessage= the stable error code (Rule 01 — no raw exception text).recordEventOnlycurrently has no production callers (dormant), but wiring it now keeps the D-4 invariant uniform so the first real caller inherits durable emission for free.Verdict
CONDITIONAL_APPROVE— all three R1 findings resolved;recordEventOnlywired (not excluded) because it is an error-recording path. No production behaviour change while the flag is off. Conditions carried forward: none.Footer
identity-service • R1-followup • 152 tests / 0 failures / 0 warnings • commit
05b8a96• 2026-05-31R1 verdict findings (kept=3): (1) MINOR observability/OtelErrorEvents.java:695 — null-span trace-context fallback could cross-link the durable ErrorEvent to an unrelated ambient span. buildErrorEvent() sourced trace/span ids from (span != null ? span : Span.current()), so a null-span catch site emitted a durable event stamped with whatever ambient span happened to be in scope. (2) MINOR observability/OtelErrorEvents.java:132 — resetForTesting() cleared errorEventPublisher OUTSIDE the synchronized block while the counter clear was inside it; the publisher clear belongs under the same lock as the counter clear (consistent publication of both test-reset writes). (3) MINOR observability/OtelErrorEvents.java:580 — recordEventOnly was an error-recording path (sets StatusCode.ERROR + emits the typed error.event attributes) but the ADR-0015 wiring PR left it UNwired, breaking D-4 ("one call site -> durable event") for the no-Throwable path. Fix: (1) buildErrorEvent() now sources trace/span context ONLY from the explicitly passed span (no Span.current() fallback) — consistent with the span-event path which already no-ops when span == null. Javadoc updated. (2) Moved errorEventPublisher = null INTO the synchronized block in resetForTesting(). (3) recordEventOnly: kept the 6-arg form as a delegating overload defaulting to RECOVERY_ACTION_UNSPECIFIED, added a 7-arg RecoveryAction overload (matching the record / recordPropagated / recordSanitizedEvent families) that threads emitDurableErrorEvent through with error_class = the supplied type string (no Throwable in hand) and message = the stable error code (Rule 01, no raw exception text). The JPA-exclusion on error-event-publisher left untouched (load-bearing for the Redis-only service). Tests: - OtelErrorEventsDurableStreamTest: +wiredPublisherNullSpanLeavesTraceAndSpanEmpty (wired publisher + null span -> captured ErrorEvent has empty traceId AND spanId), +recordEventOnlyWiredEmitsTypeAndCode (recordEventOnly durable emit: error_class=type string, message=errorCode, action + trace ids mapped). 10 tests. - NEW OtelErrorEventsBridgeTest in the bridge's own package (destroy() is package-private): in-tree installSdk() harness (replicated from ConnectProxyServiceImplOtelTest), mock ErrorEventPublisher, new bridge -> record() verify(pub).publish(any()) -> bridge.destroy() -> record() again -> verifyNoMoreInteractions(pub). 1 test. Verification: - export JAVA_HOME=temurin-17; mvn -B compile test-compile -> 0 [WARNING] - mvn -B test -> 152 tests, 0 failures, 0 errors (GREEN)Show previous round
hib-pr-reviewer review — PR #7 (affinity-intelligence-rework/im2be-identity-service)
Round 1 — head
05b8a9674e99, basemain, triggeropenedTL;DR: CONDITIONAL_APPROVE — kept 1 unique-to-A info finding (verified: dead
sdkstatic field in bridge test), Reviewer B found nothing; no blocking or major issues.Summary
Arbitration — Round 1
No prior run history in Memora (first arbitration for this PR). Memora persistence attempted but blocked by tag-allowlist restriction; summary embedded here instead.
Agreement baseline: Reviewer B found 0 issues; Reviewer A found 1 (info). The single unique-to-A finding required verification.
Verification performed: Read
OtelErrorEventsBridgeTest.javain full. Confirmed:private static OpenTelemetry sdk;is a static field.installSdk()(sdk = OpenTelemetrySdk.builder()…build(); GlobalOpenTelemetry.set(sdk);).sdkis never referenced again after that assignment — not inshutdownSdk(), not in any test method.shutdownSdk()(lines 59-62) callsGlobalOpenTelemetry.resetForTest()directly and never invokessdk.close()or((OpenTelemetrySdk) sdk).getSdkTracerProvider().close().Finding confirmed as stated. Kept at original
infoseverity.All three production changes verified clean by both reviewers independently:
resetForTesting()sync fix,buildErrorEventambient-span removal, andrecordEventOnly7-arg overload are all correct; no disagreement to resolve there.Outcome: Kept 1 verified unique-to-A info finding. 0 dropped. No blocking or major issues.
Blast Radius
The diff touches one static observability utility (
OtelErrorEvents.java) and two test files.OtelErrorEventsis a service-wide static helper but the changes are additive (new 7-arg overload) and corrective (sync and ambient-span fixes), not a protocol or API contract change. Blast radius is bounded to the observability layer.BLAST_SCORE: 3/10
Risk Indicators
buildErrorEvent,resetForTesting,recordEventOnly,emitDurableErrorEventCI status (head
05b8a9674e99)No CI checks reported for this commit.
Findings (1)
[INFO] Dead
sdkstatic field: assigned ininstallSdk()but never read; SDK never explicitly closedsrc/test/java/com/aim2be/identity/observability/OtelErrorEventsBridgeTest.java:41
Line 41 declares
private static OpenTelemetry sdk;. It is written once ininstallSdk()(line 52-55:sdk = OpenTelemetrySdk.builder()…build(); GlobalOpenTelemetry.set(sdk);) and never read again —GlobalOpenTelemetry.set(sdk)retains the live reference internally.shutdownSdk()(lines 59-62) callsGlobalOpenTelemetry.resetForTest()directly and never invokessdk.close(), so the field serves no purpose after the assignment and may attract a static-analysis warning.Concrete options:
sdkas a local variable insideinstallSdk().shutdownSdk()already works without it.((OpenTelemetrySdk) sdk).getSdkTracerProvider().close()inshutdownSdk()beforeGlobalOpenTelemetry.resetForTest(). For the current in-memory tracer with no exporters this is effectively a no-op, but it documents intent and prevents resource-leak warnings from tools like SpotBugs.Note: the same pattern exists in the pre-existing
OtelErrorEventsDurableStreamTest(out of scope for this diff); both classes would benefit from a follow-on tech-debt pass.Verdict
CONDITIONAL_APPROVE
hib-pr-reviewer • round 1 • 1 finding (1i) • 2026-05-31T00:10:26.582Z → 2026-05-31T00:11:37.644Z • posted-as: pr-reviewer-bot • model: auto
hib-pr-reviewer review — PR #7 (affinity-intelligence-rework/im2be-identity-service)
Round 2 — head
a9360c5df83f, basemain, triggersynchronizeTL;DR: CONDITIONAL_APPROVE — kept 1 unique-to-A minor finding (verified); B's 0-finding result does not contradict; no blocking issues.
Summary
Arbitration summary
Recalled Memora: no prior runs for this PR found (step 2 returned empty). Created new run-history memory (ID 440) after reconciliation.
Unique-to-A (1 finding) — verified with
Read. Line 49 ofOtelErrorEventsBridgeTest.javais confirmed as:tracerProvideris a method-local variable ininstallSdk().shutdownSdk()(lines 56–60) only callsGlobalOpenTelemetry.resetForTest()andOtelErrorEvents.resetForTesting()— no reference totracerProvider, so it cannot callclose()orshutdown(). Finding is kept as minor.Reviewer B (0 findings) — B's explanation that the R1 dead-static-field concern was resolved is consistent with HEAD (line 50 shows
sdkas a local variable passed immediately toGlobalOpenTelemetry.set()). No contradiction with A's finding — A targetstracerProvider, notsdk.Kept 1 agreed finding (unique-to-A, verified). No findings dropped.
Blast Radius
The diff touches one observability utility class (
OtelErrorEvents.java) and its test harness. The production change (null-span fix +recordEventOnlyoverload +resetForTestingrace fix) is self-contained within the observability package. The test-only finding is in a new test file and does not affect production paths.BLAST_SCORE: 3/10
Risk Indicators
resetForTesting,buildErrorEvent,emitDurableErrorEvent,recordEventOnlyCI status (head
a9360c5df83f)No CI checks reported for this commit.
Findings (1)
[MINOR] SdkTracerProvider is a local variable — unreachable in @AfterAll, cannot be explicitly closed
src/test/java/com/aim2be/identity/observability/OtelErrorEventsBridgeTest.java:49
tracerProvideris created as a method-local variable insideinstallSdk()and goes out of scope when the method returns.shutdownSdk()(lines 56–60) callsGlobalOpenTelemetry.resetForTest()andOtelErrorEvents.resetForTesting()but has no reference to calltracerProvider.close()/tracerProvider.shutdown(). Today this is low-risk because no span exporter is registered (so no background flush threads are started), but the pattern silently breaks if an exporter is ever added to the builder, leavingSdkTracerProvider's internal resources without a clean lifecycle signal.Fix: promote to a static field so
shutdownSdk()can close it explicitly:Verdict
CONDITIONAL_APPROVE
hib-pr-reviewer • round 2 • 1 finding (1m) • 2026-05-31T00:29:50.828Z → 2026-05-31T00:31:07.897Z • posted-as: pr-reviewer-bot • model: auto