feat(error-event-publisher): durable error-event Kafka stream via outbox (ADR-0015, #332) #18
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/error-event-publisher"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
New 6th platform-libs module — the foundation for the #332 error-event Kafka stream (ADR-0015, ACCEPTED). Serializes
observability.v1.ErrorEventand publishes it toim2be.error.eventvia the ADR-0014 outbox, in addition to the existingOtelErrorEventsspan-event + counter — giving a durable, 100% (unsampled), replayable, queryable cross-service error log.Design (per ADR-0015)
ErrorEventPublisher— best-effort facade;publish(ErrorEvent)never throws (catchesRuntimeException, logs, swallows: observability must not break the caller; D-3).ErrorEventOutboxWriter—@Transactional(REQUIRES_NEW)so the error-event row survives a rolling-back business tx (the D-3 hazard); a separate bean so the tx proxy applies (not self-invoked). Writes raw protobuf bytes (D-2 — no Apicurio; registry-outage resilience) via the deferred-serialisationOutboxPublisheroverload (ErrorEventhas noevent_idto embed → supplied id ignored). Partition key =trace_id→128-bit UUID (random fallback).ErrorEventProperties—im2be.error-event.{enabled(default off),topic}.ErrorEventAutoConfiguration—@ConditionalOnProperty(enabled)+@ConditionalOnBean(OutboxPublisher); opt-in per service.Verification
mvn -pl error-event-publisher test→ Tests run: 6, Failures: 0, Errors: 0 (best-effort swallow, null no-op, delegation, raw-bytes payload, trace→UUID determinism + random fallback).mvn -pl error-event-publisher -am install→ BUILD SUCCESS, zero warnings, javadoc:jar clean.Blast radius
Additive new module; no change to existing modules (only the parent
<modules>list + README). Default off — no behaviour change until a service setsim2be.error-event.enabled=true. Score 3/10.Next (separate, post-merge+deploy)
flux-applications
KafkaTopicCRs (im2be.error.event+.DLT) — operatorkubectl applygated; then per-serviceOtelErrorEvents.record(...)wiring.Show previous round
hib-pr-reviewer review — PR #18 (affinity-intelligence-rework/im2be-platform-libs)
Round 1 — head
56198fcfc49b, basemain, triggeropenedTL;DR: NEEDS_WORK — kept 5 findings (2 agreed pairs + 2 unique-to-one each verified, + 1 agreed): 1 major silent non-wiring defect, 4 minor issues including swallowed stack trace, untested REQUIRES_NEW boundary, regex recompile, and missing startup validation.
Summary
Arbitration — Round 1
No prior Memora history for this PR (first run). Memora persist attempted but blocked by tag-allowlist restrictions in this environment — proceeding with reconciliation output only.
Verification performed (all 5 source files read at HEAD):
@AutoConfigureAfter@AutoConfigurationat line 19 has noafter=attribute;@ConditionalOnBean(OutboxPublisher.class)at line 30 is present with no ordering guardString.matches()regex recompileErrorEventOutboxWriter.javais&& traceId.matches("[0-9a-fA-F]{32}")LOG.warnswallows stack traceErrorEventPublisher.javapasse.toString()as the{}format arg, SLF4J will not append the stack traceErrorEventOutboxWriterTest.javaline 34 usesnew ErrorEventOutboxWriter(...),write()is called on the plain instance at line 41 — proxy never engaged@Validated/@NotBlankmissingErrorEventProperties.javahas no@Validatedon the class and no@NotBlankon thetopicfieldFinal tally: 5 kept (1 major, 4 minor). Zero dropped.
Blast Radius
This is an entirely new module (all 10 files are additions) exporting a public autoconfigured API surface (
ErrorEventPublisher,ErrorEventOutboxWriter,ErrorEventProperties). The blast radius is self-contained — the module wraps the existingoutbox-publisherand is opt-in via a property flag — but the major wiring defect affects every service that activates bothim2be.error-event.enabled=trueandim2be.outbox.enabled=true, potentially silently disabling the error stream across all adopters.BLAST_SCORE: 4/10
CI status (head
56198fcfc49b)Overall: ✗ failure
2 checks: 2 pending
Findings (5)
[MAJOR] Missing
@AutoConfigureAfter(OutboxAutoConfiguration.class)—@ConditionalOnBean(OutboxPublisher.class)silently evaluates false under non-deterministic classpath orderingerror-event-publisher/src/main/java/com/aim2be/platform/errorevent/ErrorEventAutoConfiguration.java:19
ErrorEventAutoConfiguration(line 19) carries only@AutoConfigurationwith noafter=directive. Spring Boot evaluates@ConditionalOnBeanagainst bean definitions already registered at the moment each auto-configuration is processed. Because this module andOutboxAutoConfigurationship in separate JARs with separate.importsfiles, Spring Boot may (and commonly does, since alphabetical ordering puts 'E' before 'O' in merged import lists) processErrorEventAutoConfigurationfirst. At that pointOutboxPublisheris not yet a registered bean definition,@ConditionalOnBean(OutboxPublisher.class)at line 30 returnsfalse, and neithererrorEventOutboxWriternorerrorEventPublisheris created — with no warning or startup failure. A service that sets bothim2be.error-event.enabled=trueandim2be.outbox.enabled=truesilently gets no error-event publishing.Fix (idiomatic Spring Boot 2.7+): Change line 19 from:
to:
Alternatively, add
@AutoConfigureAfter(OutboxAutoConfiguration.class)as a separate annotation. The Spring Boot docs explicitly state: "we strongly recommend that you use @AutoConfigureAfter and @AutoConfigureBefore annotations to create an ordering between auto-configurations when @ConditionalOnBean is used."[MINOR]
String.matches()recompiles the hex-validation regex on everyaggregateId()callerror-event-publisher/src/main/java/com/aim2be/platform/errorevent/ErrorEventOutboxWriter.java:80
Line 80:
&& traceId.matches("[0-9a-fA-F]{32}")callsPattern.compile(regex)internally on each invocation, allocating a newPatternobject per call toaggregateId(). Thelength == W3C_TRACE_ID_HEX_LENpre-check on line 79 passes for every well-formed W3C trace ID (the common case), so the regex allocation is on the hot path — once perwrite().Fix: Hoist the pattern to a
private static finalconstant:and replace line 80 with:
The existing
W3C_TRACE_ID_HEX_LENlength pre-check remains useful and should be kept.[MINOR]
LOG.warnpassese.toString()as a{}placeholder argument — full stack trace is swallowed from log outputerror-event-publisher/src/main/java/com/aim2be/platform/errorevent/ErrorEventPublisher.java:47
Lines 47-49:
SLF4J appends a Throwable's full stack trace only when the Throwable is passed as the last argument with no corresponding
{}placeholder. Here the trailing{}causes SLF4J to consumee.toString()as a plain string argument; the stack trace is discarded entirely. During an outage triage the log will show only the exception message (e.g.IllegalStateException: db down) with no indication of where in the outbox stack the failure originated.Fix: Remove the final
{}placeholder and passedirectly as the trailing argument:SLF4J recognises the trailing
Throwableand appends the full cause chain and stack frames automatically.[MINOR] ADR-0015 D-3 guarantee (error survives business-tx rollback) is untested — unit tests bypass the Spring transactional proxy
error-event-publisher/src/test/java/com/aim2be/platform/errorevent/ErrorEventOutboxWriterTest.java:34
All three tests instantiate the writer with
new ErrorEventOutboxWriter(outbox, TOPIC)at line 34 and callwriter.write(event)directly at line 41. A plain-newinstance carries no Spring AOP proxy, so@Transactional(propagation = REQUIRES_NEW)onwrite()is a no-op in these tests. The module's primary correctness claim — that the outbox row commits independently when an outer transaction rolls back (ADR-0015 D-3) — is verified nowhere in the test suite.The existing Mockito tests are valuable for payload/key logic and should be retained. Recommend adding at least one
@DataJpaTest(or@SpringBootTestslice) integration test that:REQUIREDtransaction.errorEventOutboxWriter.write(event)through the Spring-managed bean (notnew).[MINOR]
ErrorEventPropertieslacks@Validated+@NotBlankontopic— blank misconfiguration surfaces at runtime, not startuperror-event-publisher/src/main/java/com/aim2be/platform/errorevent/ErrorEventProperties.java:9
ErrorEventProperties(lines 9-39) has no@Validatedannotation on the class and no JSR-303 constraint on thetopicfield. An operator who accidentally setsim2be.error-event.topic=(blank) orim2be.error-event.topic=(whitespace) will get a blank topic silently propagated intoErrorEventOutboxWriter; the misconfiguration surfaces only at the firstwrite()call as aRuntimeExceptionthatErrorEventPublisher.publish()swallows with a single log line — the error stream fails silently with no indication of misconfiguration. The siblingOutboxPropertiescorrectly carries@Validated.Fix: Add
@Validatedto the class and@NotBlankto thetopicfield:Spring Boot will then reject the application context at startup with a clear
BindValidationException, consistent withOutboxProperties.Verdict
NEEDS_WORK
hib-pr-reviewer • round 1 • 5 findings (1M/4m) • 2026-05-30T22:17:25.096Z → 2026-05-30T22:19:49.977Z • posted-as: pr-reviewer-bot • model: auto
R1 verdict NEEDS_WORK; all 5 findings (kept=5): (MAJOR) ErrorEventAutoConfiguration lacked ordering → @ConditionalOnBean(OutboxPublisher) could evaluate before OutboxAutoConfiguration registered the bean (separate JARs, 'E' before 'O' in merged imports) → error-event SILENTLY not wired even when enabled. Fix: @AutoConfiguration(after = OutboxAutoConfiguration.class). (MINOR) aggregateId() recompiled the hex regex per call (hot path) → hoisted to a static final Pattern TRACE_ID_HEX_PATTERN. (MINOR) ErrorEventPublisher.warn passed e.toString() as a {} arg → stack trace swallowed. Now passes `e` as the trailing arg with no {} → SLF4J logs the trace. (MINOR) REQUIRES_NEW boundary untested (plain-new tests bypass the proxy). Added ErrorEventOutboxWriterTxTest: a @SpringBootTest under @EnableTransactionManagement with a recording PlatformTransactionManager proving write() runs through the proxy in a PROPAGATION_REQUIRES_NEW tx (the part in this module's control; rollback survival is then Spring's documented REQUIRES_NEW guarantee). (MINOR) ErrorEventProperties not validated → blank topic surfaced at runtime. Added @Validated + @NotBlank on topic (+ spring-boot-starter-validation) → fail-fast at startup, parity with OutboxProperties. Verification: - mvn -pl error-event-publisher test → Tests run: 7, Failures: 0, Errors: 0 - mvn -pl error-event-publisher install → BUILD SUCCESS, zero warnings, javadoc cleanhib-pr-reviewer review — PR #18 (affinity-intelligence-rework/im2be-platform-libs)
Round 2 — head
e0a906749494, basemain, triggersynchronizeTL;DR: NO_NEW_FINDINGS — No new findings this round.
Summary
[quorum-converged] A=0 = B=0. ## Round 2 review — all five prior-round findings addressed
Finding-by-finding status:
ErrorEventAutoConfiguration.java:19— missing@AutoConfigureAfter@AutoConfiguration(after = OutboxAutoConfiguration.class)at line 20ErrorEventOutboxWriter.java:80— regex recompiled per callTRACE_ID_HEX_PATTERNhoisted asstatic final PatternErrorEventPublisher.java:47— stack trace swallowed bye.toString()eis the third variadic argument (no matching{}), SLF4J logs the full stackErrorEventOutboxWriterTest.java:34— D-3 guarantee (REQUIRES_NEW proxy) untestedErrorEventOutboxWriterTxTestwires the writer as a Spring bean under@EnableTransactionManagement+RecordingTransactionManagerand assertsPROPAGATION_REQUIRES_NEWErrorEventProperties.java:9—@Validated/@NotBlankabsentspring-boot-starter-validationadded to pomAll five conditions are satisfied. No new actionable findings were identified in this round.
CI status (head
e0a906749494)Overall: ✗ failure
2 checks: 2 pending
Findings
No new findings this round.
Quorum converged on empty findings (A + B both returned 0).
Verdict
NO_NEW_FINDINGS
hib-pr-reviewer • round 2 • 0 findings • 2026-05-30T22:26:57.523Z → 2026-05-30T22:31:05.533Z • posted-as: pr-reviewer-bot • model: auto • [bookkeeping fallback]