feat(observability): wire ADR-0015 durable error-event stream (#332) #6

Merged
hibryda merged 1 commit from feat/identity-service-error-event-stream-332 into main 2026-05-31 01:48:50 +02:00
Owner

Header

identity-service · PR (R0/open) · ADR-0015 durable error-event stream (#332) · base main

TL;DR

NEEDS_REVIEW — wires observability.v1.ErrorEventim2be.error.event via the platform-libs error-event-publisher (REQUIRES_NEW outbox write) on top of the existing span event + counter, default OFF behind im2be.error-event.enabled. Load-bearing per-service deviation: a JPA exclusion on the new dep keeps Redis-only identity-service bootable.

Summary

Replicates the review-approved user-service reference (merged to dev) for the SB-3.3.13 / Redis-outbox variant. OtelErrorEvents gains explicit-RecoveryAction (D-7) overloads of record / recordPropagated / recordSanitizedEvent(String) / recordSanitizedEvent(Throwable) plus the recordSanitizedEventInternal(spanErrorType, durableErrorClass, …) FQN-split core (span error.type + counter stay simple-name; durable error_class is the FQN). A new OtelErrorEventsBridge (@ConditionalOnProperty, not @ConditionalOnBean — ordering trap) wires the publisher in when the flag is on. Commit d4834ae. Verified at HEAD: mvn -B compile test-compile 0 warnings; mvn -B test 149/149 green (8 new).

Findings

No findings from the author — net-new wiring. Self-flagged review points:

Severity File Note
INFO pom.xml error-event-publisher carries a spring-boot-starter-data-jpa exclusion — identity is Redis-only; the publisher's transitive outbox-publisher declares JPA at compile scope, which would activate DataSource/JPA auto-config and fail startup. Mirrors the identical exclusion already on redis-outbox-backend.
INFO OtelErrorEvents.java emitDurableErrorEvent has no try/catch by design — publish() is the best-effort boundary and buildErrorEvent is fully null-safe; a blanket catch would feed the NoBlanketCatch ratchet.
INFO OtelErrorEvents.java Durable message is the sanitized string or the stable error code — never raw exception text (Rule 01).
INFO OtelErrorEvents.java recordEventOnly (identity-specific W3 extra, no Throwable) is intentionally NOT wired to the durable stream — outside the reference pattern's three families.

Verdict

NEEDS_REVIEW. Default-OFF flag → zero behaviour change until per-service rollout (post topic-CR kubectl apply gate). Conditions: none carried forward. DO NOT MERGE until reviewed.

identity-service • #332 ADR-0015 • R0/open • new=0 carried=0 • SB 3.3.13 / Redis-outbox variant • 2026-05-31

## Header **identity-service** · PR (R0/open) · ADR-0015 durable error-event stream (#332) · base `main` ## TL;DR `NEEDS_REVIEW` — wires `observability.v1.ErrorEvent` → `im2be.error.event` via the platform-libs `error-event-publisher` (`REQUIRES_NEW` outbox write) on top of the existing span event + counter, default OFF behind `im2be.error-event.enabled`. Load-bearing per-service deviation: a JPA exclusion on the new dep keeps Redis-only identity-service bootable. ## Summary Replicates the review-approved user-service reference (merged to `dev`) for the SB-3.3.13 / Redis-outbox variant. `OtelErrorEvents` gains explicit-`RecoveryAction` (D-7) overloads of `record` / `recordPropagated` / `recordSanitizedEvent(String)` / `recordSanitizedEvent(Throwable)` plus the `recordSanitizedEventInternal(spanErrorType, durableErrorClass, …)` FQN-split core (span `error.type` + counter stay simple-name; durable `error_class` is the FQN). A new `OtelErrorEventsBridge` (`@ConditionalOnProperty`, not `@ConditionalOnBean` — ordering trap) wires the publisher in when the flag is on. Commit `d4834ae`. Verified at HEAD: `mvn -B compile test-compile` 0 warnings; `mvn -B test` 149/149 green (8 new). ## Findings **No findings from the author — net-new wiring.** Self-flagged review points: | Severity | File | Note | |---|---|---| | INFO | `pom.xml` | `error-event-publisher` carries a `spring-boot-starter-data-jpa` **exclusion** — identity is Redis-only; the publisher's transitive `outbox-publisher` declares JPA at compile scope, which would activate DataSource/JPA auto-config and fail startup. Mirrors the identical exclusion already on `redis-outbox-backend`. | | INFO | `OtelErrorEvents.java` | `emitDurableErrorEvent` has **no try/catch** by design — `publish()` is the best-effort boundary and `buildErrorEvent` is fully null-safe; a blanket catch would feed the NoBlanketCatch ratchet. | | INFO | `OtelErrorEvents.java` | Durable `message` is the sanitized string **or** the stable error code — never raw exception text (Rule 01). | | INFO | `OtelErrorEvents.java` | `recordEventOnly` (identity-specific W3 extra, no Throwable) is intentionally NOT wired to the durable stream — outside the reference pattern's three families. | ## Verdict `NEEDS_REVIEW`. Default-OFF flag → zero behaviour change until per-service rollout (post topic-CR `kubectl apply` gate). Conditions: none carried forward. DO NOT MERGE until reviewed. ## Footer identity-service • #332 ADR-0015 • R0/open • new=0 carried=0 • SB 3.3.13 / Redis-outbox variant • 2026-05-31
In addition to the existing OtelErrorEvents span event + identity_errors_total
counter, OtelErrorEvents now also serializes observability.v1.ErrorEvent and
publishes it to im2be.error.event via the platform-libs error-event-publisher
(OutboxPublisher write in a REQUIRES_NEW tx, ADR-0015 D-3) so a domain error
survives the rolling-back business transaction it was caught in. Default OFF
behind im2be.error-event.enabled (ADR-0015 migration order); span event +
counter behaviour is byte-for-byte unchanged when the flag is off.

Wiring:
- OtelErrorEvents: static volatile ErrorEventPublisher + setErrorEventPublisher;
  cleared in resetForTesting(). Explicit-RecoveryAction overloads (D-7) of
  record / recordPropagated / recordSanitizedEvent(String) /
  recordSanitizedEvent(Throwable). New private recordSanitizedEventInternal
  (spanErrorType, durableErrorClass, ...) FQN-split core: the span error.type +
  counter stay the SIMPLE name (uniform with the other record* paths), while the
  durable error_class carries the FQN so the error-event stream buckets the same
  class consistently regardless of which path emitted it. emitDurableErrorEvent
  is a no-op when no publisher is wired; otherwise publishes buildErrorEvent —
  NO try/catch (publish() is the best-effort boundary and buildErrorEvent is
  fully null-safe, so a blanket catch would only feed the NoBlanketCatch ratchet).
- buildErrorEvent: service_name=INSTRUMENTATION_NAME, error_class=getName() FQN,
  PII-safe message = sanitized string ?: stable error code (NEVER the raw
  exception text — Rule 01), operation=component, trace/span ids from the span
  (or Span.current()) only when the context is valid, request_id="",
  occurred_at=Instant.now(), explicit recovery_action, all strings null-coalesced.
- OtelErrorEventsBridge: @Component @ConditionalOnProperty(im2be.error-event,
  enabled=true) → OtelErrorEvents.setErrorEventPublisher; @PreDestroy clears.
  @ConditionalOnProperty NOT @ConditionalOnBean — a component-scanned bean's
  conditions evaluate before the platform-libs autoconfig registers the
  publisher, so @ConditionalOnBean would silently never wire (ordering trap);
  @ConditionalOnProperty reads the Environment, which is order-safe.

Dependencies (pom.xml):
- error-event-publisher (${platform-libs.version}=1.1.0-SNAPSHOT) WITH a
  spring-boot-starter-data-jpa exclusion — identity-service is Redis-ONLY and
  the publisher's transitive outbox-publisher declares the JPA starter at
  compile scope; without the exclusion JPA returns to the classpath and Spring
  Boot's DataSource/JPA auto-configs activate and FAIL at startup (no datasource).
  This mirrors the identical exclusion already on the redis-outbox-backend dep.
- proto-observability (new property proto-observability.version=1.0.0, fixed
  release per D-5) for the ErrorEvent / RecoveryAction generated types.
- The error-event autoconfig is @ConditionalOnBean(OutboxPublisher.class):
  identity-service supplies OutboxPublisher via the Redis backend, so the same
  backend-agnostic facade backs the REQUIRES_NEW error-event write.

Config (application.properties): im2be.error-event.enabled (default false) +
im2be.error-event.topic=im2be.error.event. im2be.outbox.enabled is already true.

Test: OtelErrorEventsDurableStreamTest (8 tests) — no-publisher no-op; wired
field + trace mapping; PII guard (durable message == error code, not raw text);
explicit RecoveryAction; recordPropagated FQN + code fallback;
recordSanitizedEvent(Throwable) FQN durable class while span stays simple;
recordSanitizedEvent(Throwable, RecoveryAction) (D-7); resetForTesting clears.
Uses TicketMintFailedException as the concrete IdentityError and the in-tree
@BeforeAll installSdk() harness pattern (no shared OtelTestHarness exists).

Verification:
- JAVA_HOME=temurin-17; mvn -B compile test-compile → 0 [WARNING] (rule 62)
- mvn -B test → 149/149 green (8 new); NoBlanketCatch path untouched.

hib-pr-reviewer review — PR #6 (affinity-intelligence-rework/im2be-identity-service)

Round 1 — head d4834ae95861, base main, trigger opened

TL;DR: CONDITIONAL_APPROVE — kept 2 minor findings (recordEventOnly durable gap verified in code; proto-observability Enforcer gap confirmed by grep); no blocking issues.

Summary

Arbitration Round 1 — PR #6 (feat: ADR-0015 durable error-event stream)

Prior run history: None (first arbitration for this PR).

Reviewer A produced 1 JSON finding (pom.xml:111, minor) and mentioned a second gap (recordEventOnly) in prose only. Reviewer B produced 0 JSON findings but also prose-mentioned the recordEventOnly gap.

Verification performed

  1. recordEventOnly durable emission gap — Read OtelErrorEvents.java in full (727 lines) and Grep for all emitDurableErrorEvent call-sites. Confirmed: emitDurableErrorEvent is called from the record full-fidelity overload (line 258), recordPropagated full-fidelity overload (line 368), and recordSanitizedEventInternal (line 561). The recordEventOnly method (lines 580–608) ends after the counter increment with no emitDurableErrorEvent call. The class-level Javadoc at line 103 explicitly lists recordEventOnly in the durable-stream family. Both reviewers independently flagged this in prose; code confirms it is real → kept (Rule 2 / Rule 4 — own verification succeeds).

  2. No Enforcer guard on proto-observability version — Grep for enforcer|requireUpperBoundDeps|dependencyConvergence in pom.xml returned zero matches, confirming A's finding that no CI guard exists. Comment at lines 107–110 documents the risk but provides no build-time enforcement → kept (Rule 2 — unique-to-A, verified confirms).

Outcome: 2 findings kept (both minor), no blockers. CONDITIONAL_APPROVE.

Blast Radius

The diff is concentrated in the observability layer (OtelErrorEvents, OtelErrorEventsBridge, application.properties) plus pom.xml. OtelErrorEvents is a static utility called from every catch block in the service, so the recordEventOnly gap affects all validation-rejection paths. The feature flag (im2be.error-event.enabled, default off) constrains blast radius: no behaviour changes until the flag is turned on.

BLAST_SCORE: 4/10

Risk Indicators

Indicator Value
Sensitive functions emitDurableErrorEvent, setErrorEventPublisher, buildErrorEvent, OtelErrorEventsBridge
Migration touched
Test delta
Dependency changes

CI status (head d4834ae95861)

No CI checks reported for this commit.

Findings (1)

[MINOR] No CI guard against proto-observability / error-event-publisher version drift

pom.xml:111

Line 111 pins <proto-observability.version>1.0.0</proto-observability.version> as a bare <properties> entry used directly in <dependencies>, making it the Maven "nearest definition" for proto-observability. The inline comment (lines 107–110) correctly documents the downgrade risk:

a value here LOWER than the publisher's transitive would silently DOWNGRADE the generated types (Maven nearest-definition wins) → NoSuchMethodError on any newer field.

However, no build-time check enforces alignment. Grep for enforcer, requireUpperBoundDeps, and dependencyConvergence in pom.xml returns zero matches. When platform-libs.version is bumped in a future PR and error-event-publisher pulls a higher proto-observability transitively, the mismatch is invisible at compile time and surfaces as a NoSuchMethodError at runtime.

Concrete fix (either approach):

  1. Add a scoped Maven Enforcer requireUpperBoundDeps execution:
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-proto-observability-convergence</id>
      <goals><goal>enforce</goal></goals>
      <configuration>
        <rules><requireUpperBoundDeps/></rules>
      </configuration>
    </execution>
  </executions>
</plugin>
  1. Or move proto-observability into <dependencyManagement> sourced from error-event-publisher's BOM (if one exists), so the two versions are mechanically tied rather than manually synchronized. If no BOM exists, the Enforcer rule is the pragmatic path.

Verdict

CONDITIONAL_APPROVE


hib-pr-reviewer • round 1 • 1 finding (1m) • 2026-05-30T23:35:59.569Z → 2026-05-30T23:38:03.260Z • posted-as: pr-reviewer-bot • model: auto

## hib-pr-reviewer review — PR #6 (affinity-intelligence-rework/im2be-identity-service) **Round 1** — head `d4834ae95861`, base `main`, trigger `opened` **TL;DR:** CONDITIONAL_APPROVE — kept 2 minor findings (recordEventOnly durable gap verified in code; proto-observability Enforcer gap confirmed by grep); no blocking issues. ### Summary ## Arbitration Round 1 — PR #6 (feat: ADR-0015 durable error-event stream) **Prior run history:** None (first arbitration for this PR). **Reviewer A** produced 1 JSON finding (pom.xml:111, minor) and mentioned a second gap (`recordEventOnly`) in prose only. **Reviewer B** produced 0 JSON findings but also prose-mentioned the `recordEventOnly` gap. ### Verification performed 1. **`recordEventOnly` durable emission gap** — Read `OtelErrorEvents.java` in full (727 lines) and Grep for all `emitDurableErrorEvent` call-sites. Confirmed: `emitDurableErrorEvent` is called from the `record` full-fidelity overload (line 258), `recordPropagated` full-fidelity overload (line 368), and `recordSanitizedEventInternal` (line 561). The `recordEventOnly` method (lines 580–608) ends after the counter increment with **no** `emitDurableErrorEvent` call. The class-level Javadoc at line 103 explicitly lists `recordEventOnly` in the durable-stream family. Both reviewers independently flagged this in prose; code confirms it is real → **kept** (Rule 2 / Rule 4 — own verification succeeds). 2. **No Enforcer guard on proto-observability version** — Grep for `enforcer|requireUpperBoundDeps|dependencyConvergence` in `pom.xml` returned zero matches, confirming A's finding that no CI guard exists. Comment at lines 107–110 documents the risk but provides no build-time enforcement → **kept** (Rule 2 — unique-to-A, verified confirms). **Outcome:** 2 findings kept (both minor), no blockers. CONDITIONAL_APPROVE. ### Blast Radius The diff is concentrated in the observability layer (`OtelErrorEvents`, `OtelErrorEventsBridge`, `application.properties`) plus `pom.xml`. `OtelErrorEvents` is a static utility called from every catch block in the service, so the `recordEventOnly` gap affects all validation-rejection paths. The feature flag (`im2be.error-event.enabled`, default off) constrains blast radius: no behaviour changes until the flag is turned on. **BLAST_SCORE: 4/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `emitDurableErrorEvent`, `setErrorEventPublisher`, `buildErrorEvent`, `OtelErrorEventsBridge` | | Migration touched | — | | Test delta | — | | Dependency changes | — | ### CI status (head `d4834ae95861`) _No CI checks reported for this commit._ ### Findings (1) #### **[MINOR]** No CI guard against `proto-observability` / `error-event-publisher` version drift _pom.xml:111_ Line 111 pins `<proto-observability.version>1.0.0</proto-observability.version>` as a bare `<properties>` entry used directly in `<dependencies>`, making it the Maven "nearest definition" for `proto-observability`. The inline comment (lines 107–110) correctly documents the downgrade risk: > a value here LOWER than the publisher's transitive would silently DOWNGRADE the generated types (Maven nearest-definition wins) → NoSuchMethodError on any newer field. However, no build-time check enforces alignment. Grep for `enforcer`, `requireUpperBoundDeps`, and `dependencyConvergence` in `pom.xml` returns zero matches. When `platform-libs.version` is bumped in a future PR and `error-event-publisher` pulls a higher `proto-observability` transitively, the mismatch is invisible at compile time and surfaces as a `NoSuchMethodError` at runtime. **Concrete fix (either approach):** 1. Add a scoped Maven Enforcer `requireUpperBoundDeps` execution: ```xml <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-enforcer-plugin</artifactId> <executions> <execution> <id>enforce-proto-observability-convergence</id> <goals><goal>enforce</goal></goals> <configuration> <rules><requireUpperBoundDeps/></rules> </configuration> </execution> </executions> </plugin> ``` 2. Or move `proto-observability` into `<dependencyManagement>` sourced from `error-event-publisher`'s BOM (if one exists), so the two versions are mechanically tied rather than manually synchronized. If no BOM exists, the Enforcer rule is the pragmatic path. ### Verdict **CONDITIONAL_APPROVE** --- <sub>hib-pr-reviewer • round 1 • 1 finding (1m) • 2026-05-30T23:35:59.569Z → 2026-05-30T23:38:03.260Z • posted-as: pr-reviewer-bot • model: auto</sub>
hibryda deleted branch feat/identity-service-error-event-stream-332 2026-05-31 01:48:50 +02:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
affinity-intelligence-rework/im2be-identity-service!6
No description provided.