feat(outbox): carry W3C traceparent through the outbox to Kafka headers (GC-23 (b) producer half) #24
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feat/outbox-traceparent-carry"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Header
im2be-platform-libs · GC-23 (b) producer→relay origin (producer half) · branch
feat/outbox-traceparent-carry→mainTL;DR
The transactional outbox now carries the producing span's W3C
traceparentthrough to Kafka — captured at outbox-write time, persisted on the row, and re-emitted as Kafka message headers by both the hot relay and the cold poller — so a business event continues the SAMEtrace_idfrom the producing service → Kafka → the consuming relay (which extracts it, realtime #8) → Centrifugo → client.Summary
Closes the producer half of GC-23 boundary (b). The consumer half shipped in
im2be-realtime-service#8 (relaypropagation.extractof the Kafka-headertraceparent); the relay→Centrifugo segment (boundary (a)) is already closed end-to-end (Tempod2b8b4ab…). This PR makes the outbox the trace bridge across its store-and-forward boundary:OutboxTraceContext(package-private) — W3C-explicit (W3CTraceContextPropagator, not the configured propagator, so thetraceparentheader the polyglot consumer expects is always emitted, ADR-0013):captureCurrent(),restore(traceParent, traceState),currentKafkaHeaders(),kafkaHeadersFor(traceParent, traceState).OutboxRecord—trace_parent(VARCHAR 64) +trace_state(VARCHAR 512) columns + accessors (setters truncate to bound). equals/hashCode unchanged (PK-only).OutboxPublisher.publish(...)— captures the current W3C context at write (while the business span is current) and persists it on the row; the AFTER_COMMIT hot relay nests itsim2be.outbox.publish.hotspan under the restored business context and emits that span's context as Kafka headers viaProducerRecord.OutboxPollerWorker.tryColdPublish(...)— re-emits the persistedtraceparent/tracestateverbatim as Kafka headers (the cold poller may run after a restart with no live context), viaProducerRecord.No-op when no recording span is active (
OpenTelemetry.noop()/ sampled-out):trace_parentisNULL, no header is emitted, downstream roots its own trace — safe with or without a real OTel SDK, and forward-compatible with services not yet upgraded.Risk indicators
outbox-publishertrace_parent,trace_state); local ddl-auto auto-applies; dev/stage/prod owners add a FlywayALTER TABLE(documented in README)send(topic,key,payload)→send(ProducerRecord)in both relays (tests updated for the matcher ripple)W3CTraceContextPropagatoris in the already-presentopentelemetry-api;ProducerRecord/Headervia the presentspring-kafka/kafka-clients)Verification
mvn -pl outbox-publisher test— green, zero[WARNING](parent pomfailOnWarning=true).trace_parenton the row; the hot relay'sProducerRecordcarries atraceparentheader under an active SDK span; the cold poller re-emits a persistedtraceparentverbatim (and emits none when null); accessor round-trip + truncation.trace_id) is gated on publishing the new1.1.0-SNAPSHOTto the Forgejo Maven registry + a family-service--no-cacherebuild (ci.yml does not yet auto-deploy — PR-PLATFORM-CI-1); tracked as the remaining verification step.Verdict
NEEDS_REVIEW — shared-library change with a send-shape + schema ripple; backward/forward compatible; no new deps. README updated under "Distributed-trace carry through the outbox (GC-23 (b))".
Footer
im2be-platform-libs · GC-23(b) producer half · R0 (open) · outbox-publisher tests green/0-warn · 2026-06-01
Show previous round
hib-pr-reviewer review — PR #24 (affinity-intelligence-rework/im2be-platform-libs)
Round 1 — head
6e2661d4b572, basemain, triggeropenedTL;DR: CONDITIONAL_APPROVE — kept 2 minor findings (1 agreed-pair + 1 verified unique-to-A); no blocking issues.
Summary
Arbitration: PR #24 — W3C traceparent carry through outbox to Kafka headers
Memora lookup: No prior run history for this PR; no reusable submodule patterns stored. This is round 1.
Finding reconciliation:
F1 (unique to A) —
OutboxPublisher.javascope-management gap. Verified viaReadat lines 469–500:scope = span.makeCurrent()is called at line 469, thentraceHeaders = OutboxTraceContext.currentKafkaHeaders()at line 474 sits outside thetryblock that starts at line 475. Thefinally { scope.close(); }is at lines 498–500. If line 474 were to throw, thefinallyis never reached and the scope leaks. Finding confirmed; kept.F2 (agreed A + B) —
OutboxPublisherTest.javatest gap. Verified viaReadat lines 780–786:tryHotPublishis called withnull, nullfortraceParent/traceState. Therestore(traceParent, traceState) → spanBuilder.setParent(businessContext)path (non-null parent, sametrace_idinvariant) is never exercised. Both reviewers independently flag the same test at the same line with materially equivalent bodies. Finding confirmed; kept.Kept 2 findings (1 agreed-pair + 1 verified unique-to-A). All minor. Verdict: CONDITIONAL_APPROVE.
Blast Radius
The diff touches 10 files across the outbox-publisher module — entity, two relay code paths (hot + cold), a new helper class, tests, and README. Impact is contained within a single library module; no cross-module API surface or protocol contracts are changed. Consuming services require a manual DDL migration (documented in README) but the library change itself is backward-compatible (new columns nullable, carry degrades gracefully to no-op).
BLAST_SCORE: 4/10
Risk Indicators
OutboxTraceContext.captureCurrent,OutboxTraceContext.restore,OutboxTraceContext.currentKafkaHeaders,OutboxTraceContext.kafkaHeadersForCI status (head
6e2661d4b572)Overall: ✓ success
2 checks: 2 pending
Findings (2)
[MINOR] Scope opened before try-block:
currentKafkaHeaders()call at line 474 sits outside the finally that closes itoutbox-publisher/src/main/java/com/aim2be/platform/outbox/OutboxPublisher.java:474
At line 469,
scope = span.makeCurrent()is called. Thetryblock that containsfinally { scope.close(); }(lines 475–500) does not start until line 475.OutboxTraceContext.currentKafkaHeaders()at line 474 sits in the unprotected gap between them. If that call were to throw (the OTel inject path does not in current practice, but it is outside the API contract),scopewould never be closed — leaking the hot-publish span as the thread's current span indefinitely and preventing the span from ending.Fix: move
traceHeadersinitialization to be the first statement inside thetryblock:Context.current()at line 470 is infallible (it reads a thread-local) so it is safe to leave outside the try.currentKafkaHeaders()performs OTel map injection and is the first non-trivial call that could theoretically propagate an unchecked exception.[MINOR] Hot-relay GC-23 test exercises only
traceParent=null; therestore()→setParent()path with a non-null parent is never verifiedoutbox-publisher/src/test/java/com/aim2be/platform/outbox/OutboxPublisherTest.java:761
hotPublishCarriesTraceparentHeaderWhenSpanActive(lines 761–804) callstryHotPublishwithtraceParent=null, traceState=null(lines 785–786). This exercises only therestore(null, null) → Context.root()branch — the hot-publish span starts a fresh trace andcurrentKafkaHeaders()captures that fresh trace's ID. The primary GC-23 (b) invariant — a non-null persistedtraceParentcausesrestore()to rebuild the business context, the hot-publish span becomes its child (sametrace_id, newspan_id), and the Kafka header carries that sametrace_id— is never asserted. This is the only path that would catch a regression inOutboxTraceContext.restore()(e.g. if the W3Cextractcall returnedContext.root()due to an API change).The cold-poller equivalent (
coldPublishReEmitsPersistedTraceparentAndOmitsWhenAbsent) correctly covers its non-null path. Add a symmetric hot-relay test:Verdict
CONDITIONAL_APPROVE
hib-pr-reviewer • round 1 • 2 findings (2m) • 2026-06-01T14:07:20.615Z → 2026-06-01T14:09:04.195Z • posted-as: pr-reviewer-bot • model: auto
R1 findings addressed — head
5839776TL;DR: Both R1 MINOR findings fixed;
mvn -pl outbox-publisher verifygreen (116 unit + 4 Testcontainers IT, 0[WARNING]). Re-review (R2) requested.Summary: R1 = CONDITIONAL_APPROVE (2 minor, no blocking). Addressed at
5839776; no carried conditions.Findings resolved
OutboxPublisher.javahot relaycurrentKafkaHeaders()+ thepublishContextcapture INSIDE thetry, sofinally { scope.close(); }covers them — closes themakeCurrent()→trygap where an OTel-inject throw would have leaked the current scope. Rule-63 ripple: also protects the pre-existingpublishContextcapture, not just the flagged line.OutboxPublisherTest.javahotPublishNestsUnderRestoredParentWhenTraceParentProvided— passes a craftedtraceparent, asserts the emitted Kafka header keeps the parent'strace_idwith a NEW span_id (exercises therestore(...)→setParentnon-null-parent path).Verification:
mvn -pl outbox-publisher verify→ 116 unit + 4 Testcontainers IT green, 0[WARNING];OutboxPublisherITexercises the realProducerRecordsend vs a live Kafka broker.Verdict: NEEDS_REVIEW (R2).
Footer: im2be-platform-libs · GC-23(b) producer half · R1→fix · 116u+4it green · 2026-06-01
Show previous round
hib-pr-reviewer review — PR #24 (affinity-intelligence-rework/im2be-platform-libs)
Round 2 — head
583977601c50, basemain, triggersynchronizeTL;DR: CONDITIONAL_APPROVE — 1 agreed minor finding kept (no OutboxTraceContextTest for boundary cases); both R1 findings confirmed resolved.
Summary
Round 2 arbitration — 1 agreed finding kept
Memora persistence
Attempted to persist run summary to Memora; all
memory_createcalls failed withinvalid_input: Tag 'memora/issues' is not in the allowed tag listregardless of tag parameters provided. Proceeding with reconciliation output only.R1 findings
Both A and B confirm R1 findings resolved:
publishContext/traceHeadersmoved inside thetryblock (lines 467–476 ofOutboxPublisher.java, confirmed via diff).hotPublishNestsUnderRestoredParentWhenTraceParentProvidedadded (line 815+ ofOutboxPublisherTest.java).Finding reconciliation
Agreed (both A and B): Both cite
OutboxTraceContext.javaline 112 with materially-equivalent bodies — no dedicated direct unit test forOutboxTraceContextboundary cases. Verified:Globfinds noOutboxTraceContextTest.java;GrepfindsOutboxTraceContextappears in test sources only once (as a comment inOutboxPublisherTest.java).ReadofOutboxTraceContext.javaconfirmsrestore()at line 112 delegates malformed-but-non-blank input directly toW3C.extract()with no test asserting the silent-fallback contract. B's broader body (3 cases: malformed restore, captureCurrent noop, kafkaHeadersFor null) is adopted as it is a strict superset of A's 1-case claim and all three cases are independently verified against the code at lines 96–165. Kept as 1 merged minor finding.Blast Radius
The diff adds
OutboxTraceContext(a new shared utility) and wires it into both the hot-relay and cold-poller publish paths inOutboxPublisherandOutboxPollerWorker. Every service that depends onim2be-platform-libsand uses the outbox will pick up this carry automatically. The surface is bounded to the outbox module but the library is a platform shared dependency, so a regression would affect all consuming services.BLAST_SCORE: 5/10
Risk Indicators
OutboxTraceContext.restore,OutboxTraceContext.captureCurrent,OutboxTraceContext.currentKafkaHeaders,OutboxTraceContext.kafkaHeadersFor,OutboxPublisher.tryHotPublish,OutboxPollerWorker.tryColdPublishCI status (head
583977601c50)Overall: ✓ success
2 checks: 2 pending
Findings (1)
[MINOR] No dedicated unit test for
OutboxTraceContext— three boundary cases are unassertedoutbox-publisher/src/main/java/com/aim2be/platform/outbox/OutboxTraceContext.java:112
OutboxTraceContextis a new, non-trivial utility class (166 lines, zero prior tests). The only test coverage it receives is transitive — throughOutboxPublisherTestandOutboxPollerWorkerTest. Three boundary cases are exercised by neither caller test:restore(malformed, null)— a syntactically invalidtraceparent(e.g."garbage","00-short", or a value truncated byOutboxRecord.TRACE_PARENT_MAX_LENGTH=64that exceeds the W3C fixed-55-char format) reaches line 121 (W3C.extract(...)) whose OTel spec contract is to silently returnContext.root()— correct fallback — but that contract is nowhere asserted. A future OTel upgrade that changed extractor behaviour on malformed input would pass all tests while breaking trace isolation.captureCurrent()with the noop SDK — should return an empty map (notraceparententry), so callers correctly persistNULLrather than a garbage header. Currently only implied by indirect null-parent Kafka-header tests, not verified directly.kafkaHeadersFor(null, null)— should return an empty list;kafkaHeadersFor(traceParent, null)should return exactly one header (notracestate). The null-guard logic at lines 156–163 is correct but unasserted.Recommended fix: add
OutboxTraceContextTest.java:This pins the "no regression on bad DB data" and "no-op SDK produces no header" guarantees that the code already implements but does not assert.
Verdict
CONDITIONAL_APPROVE
hib-pr-reviewer • round 2 • 1 finding (1m) • 2026-06-01T14:16:39.679Z → 2026-06-01T14:18:28.811Z • posted-as: pr-reviewer-bot • model: auto
R2 finding addressed — head
16aa8c1TL;DR: R2's lone minor (no
OutboxTraceContextTest) resolved;mvn -pl outbox-publisher test→ 127 unit (was 116, +11), 0[WARNING]. Both R1 findings already confirmed resolved by R2. Re-review (R3) requested.Finding resolved
OutboxTraceContextboundary coverageOutboxTraceContextTest(11 tests) pinning the no-op/null/blank paths the carry relies on:captureCurrent()/currentKafkaHeaders()empty with no active span;restore(null|blank)→ root context,restore(validTP)→ valid REMOTE span context (right trace_id/span_id) + tracestate round-trip;kafkaHeadersFornull/blank → empty, (tp) → 1 header, (tp,ts) → 2, (tp, blank-ts) → tracestate skipped. Active-span emit paths remain covered against the SDK inOutboxPublisherTest.Verification:
mvn -pl outbox-publisher test→ 127 unit, 0[WARNING](IT unchanged: 4 Testcontainers IT green in the priorverify).Verdict: NEEDS_REVIEW (R3) — no carried conditions; R1 (F1+F2) + R2 (boundary tests) all resolved.
Footer: im2be-platform-libs · GC-23(b) producer half · R2→fix · 127u green · 2026-06-01
hib-pr-reviewer review — PR #24 (affinity-intelligence-rework/im2be-platform-libs)
Round 3 — head
16aa8c17de14, basemain, triggersynchronizeTL;DR: CONDITIONAL_APPROVE — kept 1 minor agreed/verified finding (unique-to-A, confirmed by file read); 0 findings from B; all 3 prior-round issues resolved.
Summary
Round 3 Arbitration
Kept 1 finding (minor, unique-to-A, verified).
Dropped 0 findings.
Prior-round findings: all resolved
publishContextandcurrentKafkaHeaders()are now inside thetryblock.hotPublishNestsUnderRestoredParentWhenTraceParentProvidedis present and exercises the full invariant.OutboxTraceContextTest.java(11 tests) covers the full boundary matrix.Unique-to-A finding: verified and kept
Read
OutboxPollerWorkerTest.javalines 340–382: line 353 setswithTrace.setTraceParent(persistedTraceparent)only. NosetTraceState(...)call. No assertion on atracestateheader. Production code atOutboxPollerWorker.javalines 218–219 callskafkaHeadersFor(record.getTraceParent(), record.getTraceState())with both arguments. The wiring fromrecord.getTraceState()→ Kafka header is untested at the integration level. The unit-level logic is covered inOutboxTraceContextTest.kafkaHeadersForTraceParentAndStateEmitsBothHeaders, so risk is low — kept at minor.Memora note
Tag allowlist blocked persistence (tags
pr-review-runand default auto-tags both rejected); Memora write skipped.Blast Radius
The diff touches 11 files across the outbox-publisher main and test trees. The production surface is an exported library used by consuming services (OutboxRecord schema change, OutboxTraceContext new class, OutboxPublisher and OutboxPollerWorker wiring changes). The schema change (new nullable columns) requires a manual migration step for services with explicit Flyway/Liquibase migrations, as documented in README.md.
BLAST_SCORE: 4/10
Risk Indicators
OutboxTraceContext.captureCurrent,OutboxTraceContext.restore,OutboxTraceContext.kafkaHeadersFor,OutboxTraceContext.currentKafkaHeadersCI status (head
16aa8c17de14)Overall: ✓ success
2 checks: 2 pending
Findings (1)
[MINOR] Cold-poller test exercises only
traceParent; thetraceStateco-carry path is never asserted at the integration leveloutbox-publisher/src/test/java/com/aim2be/platform/outbox/OutboxPollerWorkerTest.java:353
Line 353 sets
withTrace.setTraceParent(persistedTraceparent)but never callswithTrace.setTraceState(...). The cold-poller sendskafkaHeadersFor(record.getTraceParent(), record.getTraceState())(OutboxPollerWorker.java:218–219); thetraceStateargument only produces a header when it is non-blank. The test asserts thetraceparentheader is present and correct but has no assertion ontracestate, so a future refactor that drops or shadows the second argument would pass all tests silently.Fix: add
withTrace.setTraceState("vendorA=t61rcWkgMzE")after line 353, then assertfirst.headers().lastHeader("tracestate")is non-null and equals"vendorA=t61rcWkgMzE". The unit-level logic is already covered byOutboxTraceContextTest.kafkaHeadersForTraceParentAndStateEmitsBothHeaders, so this is a narrow wiring-level gap only.Verdict
CONDITIONAL_APPROVE
hib-pr-reviewer • round 3 • 1 finding (1m) • 2026-06-01T14:24:05.179Z → 2026-06-01T14:25:36.580Z • posted-as: pr-reviewer-bot • model: auto