feat(redis-outbox): Redis Lua-EVAL OutboxBackend module (Wave A.2 Phase 0b) #8

Merged
hibryda merged 4 commits from feat/redis-outbox-backend into main 2026-05-28 14:22:58 +02:00
Owner

Wave A.2 Phase 0b — Redis Lua-EVAL OutboxBackend module

New redis-outbox-backend Maven module: a Redis Lua-EVAL implementation of the Phase 0a OutboxBackend SPI, so identity-service keeps ticket mint + outbox event atomic (the mint is itself a Redis op — ADR-0014 D-3 / docs/decisions/0014-supporting/identity-redis-outbox-backend.md). Separate module so the 7 PG consumers never pull in spring-data-redis. Design + the 12 required refinements: .planning/32-wave-a2-redis-outbox-design.md.

Storage model

One Hash {identity:outbox}:entries (field=event_id, value=JSON entry) + companion sorted-sets :pending (score=nextAttemptAt) + :expiry-due (score=expiresAt), all hash-tagged {identity:outbox} for single-slot EVAL (#8). requiresActiveTransaction()=false; the backend owns its own inline relay (RedisOutboxRelay) + cold poller (RedisOutboxPollerWorker) — it does not route through the PG OutboxPublisher's unconditional AFTER_COMMIT relay (resolves the Phase 0a R3 @apiNote).

12 refinements → artifacts

  • #1 fail-closed Lua (outbox-enqueue): TYPE precheck before any write; HSET entry before ZADD index so a mid-script death leaves only a repairable orphan-Hash-entry (never an unpersisted index member); HEXISTS idempotent guard.
  • #2 durable expiry intent: recordExpiryDue/findDueExpiries/decideExpiry + expiry-decide.lua + TicketExpiryEventId (UUIDv3 collapse). Primitive only — identity wiring is Phase 1.
  • #3 parked FAILED: mark-failure returns [applied,parked]; FAILED retained in Hash, ZREM'd from :pending, never auto-deleted; parked metric.
  • #4 lock: SET NX PX owner-token + lock-renew/lock-release compare-and-* Lua; batch bounded < lock TTL; lost-lock aborts batch; renewMargin < ttl validated at startup.
  • #5 nextAttemptAt scoring: RedisOutboxBackoff (exp, capped, overflow-safe); failure re-scores :pending; findPendingBatch = ZRANGEBYSCORE -inf now.
  • #7 durability: RedisDurabilityValidator (startup + @Scheduled) asserts appendonly/appendfsync/noeviction; FAIL_FAST/WARN modes.
  • #8 hash-tags on all 4 keys.
  • #9 retention: HSCAN + status-guarded delete of SENT past a window; FAILED never reaped.
  • #10 repair: reconciles both orphan classes.
  • #11 explicit selection: @ConditionalOnProperty backend=REDIS (NOT classpath auto-select); supplies OutboxBackend so PG @ConditionalOnMissingBean backs off.
  • #12 client-retry idempotency via the enqueue HEXISTS guard.

markSent/markFailureAttempt honour the SPI 1/0 contract (the mark-failure 2-element [applied,parked] Lua return is unwrapped to applied; parked drives the metric). 16 main classes + 8 Lua scripts.

Rule-61 falsification (flagged for review)

Verified against redis.io/commands/wait: Redis WAIT is a replication barrier — in standalone mode (identity's audited Valkey, cluster_enabled:0) WAIT 0 … returns immediately and does not flush AOF. So appendfsync is the only single-node durability lever. fsyncBarrierEnabled is kept as a documented no-op toggle (default false) for a future replicated topology; RedisDurabilityValidator is the actual enforcement. The design doc §7 floated a "WAIT-after-EVAL fsync barrier" — this corrects it.

Other judgment calls

  • ARCHIVED (PG-only) has no Redis analogue; retention HDELs SENT instead.
  • @JsonInclude(NON_NULL) is load-bearing (Lua cjson decodes JSON null to a truthy sentinel) — documented + unit-tested.
  • markFailureAttempt reads current retries via one GET before the EVAL to size the backoff (best-effort hint; the script is the atomic mutator; duplicates are consumer-deduped).
  • The PG OutboxAutoConfiguration's eager JPA wiring (its own Javadoc's Phase-0b/1 follow-up) is a FROZEN Phase 0a file — left untouched; a Redis-only consumer simply omits spring-data-jpa from its classpath.

Verification

  • 66 unit + 31 Testcontainers-Valkey IT = 97, all green. IT scenarios: crash-after-EVAL→repair; crash-after-ack-before-markSent→stays PENDING; OOM-under-noeviction→enqueue fail-closed (real CONFIG SET maxmemory 1mb); lock-expiry-mid-batch + single-winner; Kafka-outage-beyond-budget→parked; expire/revoke race + UUIDv3 collapse; durability CONFIG SET drift; REDIS-vs-PG autoconfig selection.
  • Full-reactor mvn -B install -DskipTests=falseBUILD SUCCESS (6 modules), 0 [WARNING]. outbox-publisher (Phase 0a) untouched.

Note

The Forgejo mvn install CI check is red on this repo for all commits (pre-existing — runner default node:22-bookworm has no JDK/Maven, mvn -version→exit 127; PR-PLATFORM-CI-1). Verified locally green.

## Wave A.2 Phase 0b — Redis Lua-EVAL `OutboxBackend` module New `redis-outbox-backend` Maven module: a Redis Lua-EVAL implementation of the Phase 0a `OutboxBackend` SPI, so identity-service keeps **ticket mint + outbox event atomic** (the mint is itself a Redis op — ADR-0014 D-3 / `docs/decisions/0014-supporting/identity-redis-outbox-backend.md`). Separate module so the 7 PG consumers never pull in spring-data-redis. Design + the 12 required refinements: `.planning/32-wave-a2-redis-outbox-design.md`. ### Storage model One Hash `{identity:outbox}:entries` (field=`event_id`, value=JSON entry) + companion sorted-sets `:pending` (score=`nextAttemptAt`) + `:expiry-due` (score=`expiresAt`), all hash-tagged `{identity:outbox}` for single-slot `EVAL` (#8). `requiresActiveTransaction()=false`; the backend owns its **own** inline relay (`RedisOutboxRelay`) + cold poller (`RedisOutboxPollerWorker`) — it does **not** route through the PG `OutboxPublisher`'s unconditional AFTER_COMMIT relay (resolves the Phase 0a R3 `@apiNote`). ### 12 refinements → artifacts - **#1 fail-closed Lua** (`outbox-enqueue`): TYPE precheck before any write; HSET entry **before** ZADD index so a mid-script death leaves only a repairable orphan-Hash-entry (never an unpersisted index member); HEXISTS idempotent guard. - **#2 durable expiry intent**: `recordExpiryDue`/`findDueExpiries`/`decideExpiry` + `expiry-decide.lua` + `TicketExpiryEventId` (UUIDv3 collapse). Primitive only — identity wiring is Phase 1. - **#3 parked FAILED**: `mark-failure` returns `[applied,parked]`; FAILED retained in Hash, ZREM'd from `:pending`, never auto-deleted; parked metric. - **#4 lock**: `SET NX PX` owner-token + `lock-renew`/`lock-release` compare-and-* Lua; batch bounded `< lock TTL`; lost-lock aborts batch; `renewMargin < ttl` validated at startup. - **#5 `nextAttemptAt` scoring**: `RedisOutboxBackoff` (exp, capped, overflow-safe); failure re-scores `:pending`; `findPendingBatch` = `ZRANGEBYSCORE -inf now`. - **#7 durability**: `RedisDurabilityValidator` (startup + `@Scheduled`) asserts appendonly/appendfsync/noeviction; FAIL_FAST/WARN modes. - **#8 hash-tags** on all 4 keys. - **#9 retention**: HSCAN + status-guarded delete of SENT past a window; FAILED never reaped. - **#10 repair**: reconciles both orphan classes. - **#11 explicit selection**: `@ConditionalOnProperty backend=REDIS` (NOT classpath auto-select); supplies `OutboxBackend` so PG `@ConditionalOnMissingBean` backs off. - **#12 client-retry idempotency** via the enqueue HEXISTS guard. `markSent`/`markFailureAttempt` honour the SPI `1`/`0` contract (the `mark-failure` 2-element `[applied,parked]` Lua return is unwrapped to `applied`; `parked` drives the metric). 16 main classes + 8 Lua scripts. ### Rule-61 falsification (flagged for review) Verified against `redis.io/commands/wait`: Redis **`WAIT` is a replication barrier** — in standalone mode (identity's audited Valkey, `cluster_enabled:0`) `WAIT 0 …` returns immediately and does **not** flush AOF. So `appendfsync` is the only single-node durability lever. `fsyncBarrierEnabled` is kept as a documented **no-op toggle** (default false) for a future replicated topology; `RedisDurabilityValidator` is the actual enforcement. The design doc §7 floated a "WAIT-after-EVAL fsync barrier" — this corrects it. ### Other judgment calls - `ARCHIVED` (PG-only) has no Redis analogue; retention HDELs SENT instead. - `@JsonInclude(NON_NULL)` is load-bearing (Lua `cjson` decodes JSON `null` to a truthy sentinel) — documented + unit-tested. - `markFailureAttempt` reads current retries via one GET before the EVAL to size the backoff (best-effort hint; the script is the atomic mutator; duplicates are consumer-deduped). - The PG `OutboxAutoConfiguration`'s eager JPA wiring (its own Javadoc's Phase-0b/1 follow-up) is a FROZEN Phase 0a file — left untouched; a Redis-only consumer simply omits spring-data-jpa from its classpath. ### Verification - 66 unit + 31 Testcontainers-Valkey IT = **97**, all green. IT scenarios: crash-after-EVAL→repair; crash-after-ack-before-markSent→stays PENDING; OOM-under-noeviction→enqueue fail-closed (real `CONFIG SET maxmemory 1mb`); lock-expiry-mid-batch + single-winner; Kafka-outage-beyond-budget→parked; expire/revoke race + UUIDv3 collapse; durability `CONFIG SET` drift; REDIS-vs-PG autoconfig selection. - Full-reactor `mvn -B install -DskipTests=false` → **BUILD SUCCESS** (6 modules), **0 `[WARNING]`**. `outbox-publisher` (Phase 0a) untouched. ### Note The Forgejo `mvn install` CI check is red on this repo for **all** commits (pre-existing — runner default `node:22-bookworm` has no JDK/Maven, `mvn -version`→exit 127; PR-PLATFORM-CI-1). Verified locally green.
feat(redis-outbox): Redis Lua-EVAL OutboxBackend module (Wave A.2 Phase 0b)
Some checks failed
im2be-platform-libs CI / mvn install (pull_request) Failing after 2s
im2be-platform-libs CI / mvn verify (main only) (pull_request) Has been skipped
f3f96df505
New redis-outbox-backend Maven module: a Redis Lua-EVAL implementation of the
Phase 0a OutboxBackend SPI, so identity-service can keep ticket mint + outbox
event atomic (the mint is itself a Redis op — ADR-0014 D-3 /
0014-supporting/identity-redis-outbox-backend.md). Separate module so the 7 PG
consumers never pull in spring-data-redis. Design + the 12 required refinements:
.planning/32-wave-a2-redis-outbox-design.md.

Storage: one Hash {identity:outbox}:entries (field=event_id, JSON entry) +
companion sorted-sets :pending (score=nextAttemptAt) + :expiry-due
(score=expiresAt), all hash-tagged for single-slot EVAL (#8). requiresActive
Transaction()=false; the backend owns its OWN inline relay (RedisOutboxRelay)
+ cold poller (RedisOutboxPollerWorker) — it does NOT route through the PG
OutboxPublisher's unconditional AFTER_COMMIT relay (Phase 0a R3 @apiNote).

12 refinements → artifacts:
- #1 fail-closed Lua (outbox-enqueue): TYPE precheck before any write; HSET
  entry BEFORE ZADD index so a mid-script death leaves only a repairable
  orphan-Hash-entry, never an unpersisted index member; HEXISTS idempotent guard.
- #2 durable expiry intent: recordExpiryDue/findDueExpiries/decideExpiry +
  expiry-decide.lua + TicketExpiryEventId (UUIDv3) — primitive only, identity
  wiring is Phase 1.
- #3 parked FAILED: mark-failure returns [applied,parked]; FAILED retained in
  Hash, ZREM'd from :pending, never auto-deleted; parked metric.
- #4 lock: SET NX PX owner-token + lock-renew/lock-release compare-and-* Lua,
  batch bounded < lock TTL, lost-lock aborts batch; renewMargin<ttl validated.
- #5 nextAttemptAt scoring: RedisOutboxBackoff (exp, capped, overflow-safe);
  failure re-scores :pending; findPendingBatch = ZRANGEBYSCORE -inf now.
- #7 durability: RedisDurabilityValidator (startup + @Scheduled) asserts
  appendonly/appendfsync/noeviction; FAIL_FAST/WARN modes.
- #8 hash-tags on all 4 keys.
- #9 retention: HSCAN + status-guarded delete of SENT past a window; FAILED
  never reaped.
- #10 repair: reconciles both orphan classes (zset-member-no-hash,
  hash-PENDING-no-zset).
- #11 explicit selection: @ConditionalOnProperty backend=REDIS (NOT classpath
  auto-select); supplies OutboxBackend so PG @ConditionalOnMissingBean backs off.
- #12 client-retry idempotency via the enqueue HEXISTS guard.

markSent / markFailureAttempt honour the SPI 1/0 contract (mark-failure's
2-element [applied,parked] Lua return is unwrapped to applied; parked drives the
metric). 16 main classes + 8 Lua scripts.

Rule-61 falsification (verified against redis.io/commands/wait): Redis WAIT is a
REPLICATION barrier — in standalone (identity's audited Valkey) WAIT 0 returns
immediately and does NOT flush AOF. So appendfsync is the only single-node
durability lever; fsyncBarrierEnabled is kept as a documented no-op toggle
(default false) for a future replicated topology, and RedisDurabilityValidator
is the actual enforcement.

Tests: 66 unit + 31 Testcontainers-Valkey IT = 97, all green. IT scenarios:
crash-after-EVAL→repair, crash-after-ack-before-markSent→stays PENDING,
OOM-under-noeviction→enqueue fail-closed (real CONFIG SET maxmemory),
lock-expiry-mid-batch + single-winner, Kafka-outage-beyond-budget→parked,
expire/revoke race + UUIDv3 collapse, durability CONFIG-SET drift, REDIS-vs-PG
autoconfig selection. Full-reactor mvn install BUILD SUCCESS (6 modules), 0
warnings. outbox-publisher (Phase 0a) untouched; parent pom +module +3 shared
version props.

Superseded by round 2.

Show previous round

hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs)

Round 1 — head f3f96df505e9, base main, trigger opened

TL;DR: NEEDS_WORK — kept 2 agreed+verified major findings (corrupted-JSON poison-pill in cold poller; unbounded .get() in hot relay) plus 2 minors and 1 info; no findings dropped.

Summary

Arbitration — Redis Outbox Backend (PR #8)

Reconciled 7 candidate findings (4 from A, 3 from B). Used Read to verify all five unique-to-one findings at their cited lines.

Agreed finding kept (1): Both A and B independently flag RedisOutboxBackend.java:154 for the LinkedHashSet ordering assumption — confirmed at that line (rangeByScore()Set<String>new ArrayList<>(dueEventIds)).

Unique-to-A kept (3):

  • Line 179 — decode(json).toRecord() throws unchecked RedisOutboxStorageException on malformed JSON with no catch in the for-loop; decode() at lines 410-416 confirmed. Permanently blocks cold poller. KEPT major.
  • Line 188-189 in Properties.java — Javadoc cites RedisOutboxBackendValidation which grep confirms exists nowhere in the codebase; actual validation is in RedisOutboxAutoConfiguration constructor lines 96-103. KEPT minor.
  • Line 81 — @EnableScheduling on the @AutoConfiguration class confirmed. KEPT info.

Unique-to-B kept (2):

  • RedisOutboxRelay.java:112.get() with no timeout args confirmed; contrast with tryColdPublish at PollerWorker line 194 which uses .get(sendTimeoutMs, TimeUnit.MILLISECONDS). Blocking indefinitely on Kafka stall is a real major defect. KEPT major.
  • RedisOutboxPollerWorker.java:147elapsedMs is computed from tickStartNanos and only grows; once threshold is crossed there is no lastRenewNanos reset, so renewLock is called on every remaining record in the batch. Confirmed by reading lines 143-167. KEPT minor.

No findings dropped: all six verified positively.

Blast Radius

This PR adds a single new opt-in module (redis-outbox-backend) gated behind im2be.outbox.backend=REDIS. The blast radius is contained to consumers that explicitly enable the Redis backend. However, the @EnableScheduling placement activates the Spring task scheduler globally within those consumer contexts, and the hot-relay threading issue affects any request path that calls relay() under Kafka degradation.

BLAST_SCORE: 5/10

Risk Indicators

Indicator Value
Sensitive functions RedisOutboxRelay.relay, RedisOutboxRelay.relayAsync, RedisOutboxPollerWorker.tryColdPublish, RedisOutboxPollerWorker.pollAndPublishPending, RedisOutboxBackend.findPendingBatch, RedisOutboxBackend.decode, RedisDurabilityValidator.validate
Migration touched
Test delta
Dependency changes

CI status (head f3f96df505e9)

Overall: ✗ failure

2 checks: 2 pending

Check State Link
im2be-platform-libs CI / mvn install (pull_request) pending details
im2be-platform-libs CI / mvn verify (main only) (pull_request) pending details

Findings (6)

[MAJOR] Corrupted Hash entry propagates unhandled RedisOutboxStorageException from findPendingBatch, permanently blocking the cold poller

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxBackend.java:179

Line 179:

batch.add(decode(json).toRecord());

decode() (lines 410-416) throws RedisOutboxStorageException (unchecked) on JsonProcessingException. This exception propagates out of findPendingBatch() and out of sweep() — it is never caught before the finally { releaseLock(ownerToken); } in pollAndPublishPending(). The @Scheduled framework swallows it and reschedules. On the next tick the same corrupted entry is still at the front of :pending (its score is unchanged), and the cycle repeats indefinitely, silencing all cold-path relaying.

The null-check at line 172 handles orphan members, but there is no catch for malformed JSON. The repair worker's own decode() path correctly continues on parse error but does not detect this condition as needing repair.

Fix: Add a try-catch inside the for-loop body:

try {
    batch.add(decode(json).toRecord());
} catch (RedisOutboxStorageException e) {
    LOG.warn("redis.outbox.poll skipping un-decodable entry eventId={} — will not be relayed",
            orderedIds.get(i), e);
}

Also add a unit test that injects a malformed JSON value and asserts findPendingBatch returns the other valid entries rather than throwing.

[MAJOR] relay() blocks on .get() with no timeout — can exhaust calling threads when Kafka degrades

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxRelay.java:112

Lines 110-112:

kafkaTemplate.send(record.getTopic(), aggregateKey(record.getAggregateId()),
        record.getPayloadBytes())
        .get();

This blocks indefinitely if the Kafka producer stalls. The cold-path equivalent in RedisOutboxPollerWorker.tryColdPublish (line 194) correctly uses .get(sendTimeoutMs, TimeUnit.MILLISECONDS). When the broker becomes unreachable and no delivery.timeout.ms is set in the producer config, every thread that calls relay() hangs until interrupted. Crucially, a hung .get() never raises an error to the circuit breaker — so the breaker never opens, and thread-pool exhaustion occurs before any self-healing kicks in.

Fix: Apply the same bounded-timeout pattern as the poller:

kafkaTemplate.send(record.getTopic(), aggregateKey(record.getAggregateId()),
        record.getPayloadBytes())
        .get(sendTimeoutMs, TimeUnit.MILLISECONDS);

Either reuse RedisOutboxProperties.Poller.sendTimeoutMs (already wired) or add a dedicated relay.send-timeout-ms property. Propagate TimeoutException to circuitBreaker.onError(...) the same way ExecutionException is handled at line 121.

[MINOR] Javadoc cites non-existent class RedisOutboxBackendValidation for the lockRenewMargin cross-field check

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxProperties.java:189

Lines 187-188 of the lockRenewMarginMs field Javadoc:

* Must be strictly less than {@link #lockTtlMs} (validated at
* post-binding in {@link RedisOutboxBackendValidation}).

RedisOutboxBackendValidation does not exist anywhere in this module (confirmed via repo-wide grep — the only occurrence is this comment). The actual validation lives in RedisOutboxAutoConfiguration constructor lines 96-103. Future maintainers following the @link will hit a dead end.

Fix:

* Must be strictly less than {@link #lockTtlMs} (validated at
* context startup in {@link RedisOutboxAutoConfiguration}).

[MINOR] Renew condition fires on every remaining record after the threshold is crossed — O(N) redundant Redis round-trips

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxPollerWorker.java:147

Line 144 computes elapsedMs from tickStartNanos; it only grows. Line 147:

if (elapsedMs >= renewAtElapsedMs && !renewLock(ownerToken, lockTtlMs)) {

Once elapsedMs crosses renewAtElapsedMs, the condition stays true for every subsequent record in the batch. There is no lastRenewNanos reset after a successful renewal, so renewLock is called (and makes a Redis EVAL round-trip) for every record from the threshold-crossing position to the end of the batch. For a 500-item batch with a 30s TTL and 10s margin this is potentially ~100 extra Redis calls per tick.

Correctness is intact (TTL reset is idempotent), but the unnecessary round-trips add latency under load.

Fix: Track a lastRenewNanos and gate on both elapsed and since-last-renew:

long lastRenewNanos = tickStartNanos;
for (final OutboxRecord record : batch) {
    final long elapsedMs = (System.nanoTime() - tickStartNanos) / 1_000_000L;
    final long sinceLastMs = (System.nanoTime() - lastRenewNanos) / 1_000_000L;
    if (elapsedMs >= renewAtElapsedMs && sinceLastMs >= renewAtElapsedMs) {
        if (!renewLock(ownerToken, lockTtlMs)) { break; }
        lastRenewNanos = System.nanoTime();
    }
    ...
}

Alternatively a simple boolean renewed flag reset per tick achieves the same goal.

[MINOR] findPendingBatch ordering claim relies on LinkedHashSet implementation detail of Spring Data Redis

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxBackend.java:154

Lines 154-163:

final Set<String> dueEventIds = redis.opsForZSet().rangeByScore(
        keys.pending(), Double.NEGATIVE_INFINITY, nowMs, 0, batchSize);
// ...
final List<String> orderedIds = new ArrayList<>(dueEventIds);

The method Javadoc states results are in "ascending-score (oldest-due-first) order", but opsForZSet().rangeByScore() is declared to return Set<V>, which carries no ordering contract. The implementation works because Spring Data Redis / Lettuce returns a LinkedHashSet that preserves ZRANGEBYSCORE insertion order, and ArrayList's copy constructor iterates in that order — but this is an undeclared implementation detail.

A Spring Data upgrade switching to a non-insertion-ordered set would silently break relay ordering without any compile error or test failure.

Preferred fix: Use rangeByScoreWithScores() (as already done in findDueExpiries() at line 336) and collect values from the TypedTuple stream, which is explicitly sorted. Minimal fix: Add a code comment documenting the LinkedHashSet dependency so a future library upgrade triggers a conscious review.

[INFO] @EnableScheduling on @AutoConfiguration activates the Spring task scheduler globally for the consumer context

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxAutoConfiguration.java:81

Line 81: @EnableScheduling is placed directly on the @AutoConfiguration class. This registers ScheduledAnnotationBeanPostProcessor for the entire application context, not just for this module's beans. Any @Scheduled beans in the consumer application that were previously dormant (because the consumer had not opted in to Spring's task scheduler) will be activated as a side effect of setting im2be.outbox.backend=REDIS.

The annotation is required for the poller/durability/retention/repair ticks, and the blast is scoped to consumers who explicitly opt in, so the risk is low. However this side effect is not documented anywhere in the module's README or CLAUDE.md.

Suggested mitigation: Add a sentence to the module README noting that enabling im2be.outbox.backend=REDIS also enables the Spring task scheduler in the application context, so consumer teams are not surprised.

Verdict

NEEDS_WORK


hib-pr-reviewer • round 1 • 6 findings (2M/3m/1i) • 2026-05-28T11:35:02.016Z → 2026-05-28T11:37:02.457Z • posted-as: pr-reviewer-bot

<!-- hib-pr-reviewer collapsed --> > _Superseded by round 2._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs) **Round 1** — head `f3f96df505e9`, base `main`, trigger `opened` **TL;DR:** NEEDS_WORK — kept 2 agreed+verified major findings (corrupted-JSON poison-pill in cold poller; unbounded .get() in hot relay) plus 2 minors and 1 info; no findings dropped. ### Summary ## Arbitration — Redis Outbox Backend (PR #8) Reconciled 7 candidate findings (4 from A, 3 from B). Used `Read` to verify all five unique-to-one findings at their cited lines. **Agreed finding kept (1):** Both A and B independently flag `RedisOutboxBackend.java:154` for the `LinkedHashSet` ordering assumption — confirmed at that line (`rangeByScore()` → `Set<String>` → `new ArrayList<>(dueEventIds)`). **Unique-to-A kept (3):** - Line 179 — `decode(json).toRecord()` throws unchecked `RedisOutboxStorageException` on malformed JSON with no catch in the for-loop; `decode()` at lines 410-416 confirmed. Permanently blocks cold poller. **KEPT major.** - Line 188-189 in Properties.java — Javadoc cites `RedisOutboxBackendValidation` which grep confirms exists nowhere in the codebase; actual validation is in `RedisOutboxAutoConfiguration` constructor lines 96-103. **KEPT minor.** - Line 81 — `@EnableScheduling` on the `@AutoConfiguration` class confirmed. **KEPT info.** **Unique-to-B kept (2):** - `RedisOutboxRelay.java:112` — `.get()` with no timeout args confirmed; contrast with `tryColdPublish` at PollerWorker line 194 which uses `.get(sendTimeoutMs, TimeUnit.MILLISECONDS)`. Blocking indefinitely on Kafka stall is a real major defect. **KEPT major.** - `RedisOutboxPollerWorker.java:147` — `elapsedMs` is computed from `tickStartNanos` and only grows; once threshold is crossed there is no `lastRenewNanos` reset, so `renewLock` is called on every remaining record in the batch. Confirmed by reading lines 143-167. **KEPT minor.** No findings dropped: all six verified positively. ### Blast Radius This PR adds a single new opt-in module (`redis-outbox-backend`) gated behind `im2be.outbox.backend=REDIS`. The blast radius is contained to consumers that explicitly enable the Redis backend. However, the `@EnableScheduling` placement activates the Spring task scheduler globally within those consumer contexts, and the hot-relay threading issue affects any request path that calls `relay()` under Kafka degradation. **BLAST_SCORE: 5/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `RedisOutboxRelay.relay`, `RedisOutboxRelay.relayAsync`, `RedisOutboxPollerWorker.tryColdPublish`, `RedisOutboxPollerWorker.pollAndPublishPending`, `RedisOutboxBackend.findPendingBatch`, `RedisOutboxBackend.decode`, `RedisDurabilityValidator.validate` | | Migration touched | — | | Test delta | — | | Dependency changes | — | ### CI status (head `f3f96df505e9`) **Overall: ✗ failure** 2 checks: 2 pending | Check | State | Link | |---|---|---| | im2be-platform-libs CI / mvn install (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/66/jobs/0) | | im2be-platform-libs CI / mvn verify (main only) (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/66/jobs/1) | ### Findings (6) #### **[MAJOR]** Corrupted Hash entry propagates unhandled RedisOutboxStorageException from findPendingBatch, permanently blocking the cold poller _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxBackend.java:179_ Line 179: ```java batch.add(decode(json).toRecord()); ``` `decode()` (lines 410-416) throws `RedisOutboxStorageException` (unchecked) on `JsonProcessingException`. This exception propagates out of `findPendingBatch()` and out of `sweep()` — it is never caught before the `finally { releaseLock(ownerToken); }` in `pollAndPublishPending()`. The `@Scheduled` framework swallows it and reschedules. On the next tick the same corrupted entry is still at the front of `:pending` (its score is unchanged), and the cycle repeats indefinitely, **silencing all cold-path relaying**. The `null`-check at line 172 handles orphan members, but there is no catch for malformed JSON. The repair worker's own `decode()` path correctly `continue`s on parse error but does not detect this condition as needing repair. **Fix:** Add a try-catch inside the for-loop body: ```java try { batch.add(decode(json).toRecord()); } catch (RedisOutboxStorageException e) { LOG.warn("redis.outbox.poll skipping un-decodable entry eventId={} — will not be relayed", orderedIds.get(i), e); } ``` Also add a unit test that injects a malformed JSON value and asserts `findPendingBatch` returns the other valid entries rather than throwing. #### **[MAJOR]** `relay()` blocks on `.get()` with no timeout — can exhaust calling threads when Kafka degrades _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxRelay.java:112_ Lines 110-112: ```java kafkaTemplate.send(record.getTopic(), aggregateKey(record.getAggregateId()), record.getPayloadBytes()) .get(); ``` This blocks **indefinitely** if the Kafka producer stalls. The cold-path equivalent in `RedisOutboxPollerWorker.tryColdPublish` (line 194) correctly uses `.get(sendTimeoutMs, TimeUnit.MILLISECONDS)`. When the broker becomes unreachable and no `delivery.timeout.ms` is set in the producer config, every thread that calls `relay()` hangs until interrupted. Crucially, a hung `.get()` never raises an error to the circuit breaker — so the breaker never opens, and thread-pool exhaustion occurs before any self-healing kicks in. **Fix:** Apply the same bounded-timeout pattern as the poller: ```java kafkaTemplate.send(record.getTopic(), aggregateKey(record.getAggregateId()), record.getPayloadBytes()) .get(sendTimeoutMs, TimeUnit.MILLISECONDS); ``` Either reuse `RedisOutboxProperties.Poller.sendTimeoutMs` (already wired) or add a dedicated `relay.send-timeout-ms` property. Propagate `TimeoutException` to `circuitBreaker.onError(...)` the same way `ExecutionException` is handled at line 121. #### **[MINOR]** Javadoc cites non-existent class RedisOutboxBackendValidation for the lockRenewMargin cross-field check _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxProperties.java:189_ Lines 187-188 of the `lockRenewMarginMs` field Javadoc: ``` * Must be strictly less than {@link #lockTtlMs} (validated at * post-binding in {@link RedisOutboxBackendValidation}). ``` `RedisOutboxBackendValidation` does not exist anywhere in this module (confirmed via repo-wide grep — the only occurrence is this comment). The actual validation lives in `RedisOutboxAutoConfiguration` constructor lines 96-103. Future maintainers following the `@link` will hit a dead end. **Fix:** ```java * Must be strictly less than {@link #lockTtlMs} (validated at * context startup in {@link RedisOutboxAutoConfiguration}). ``` #### **[MINOR]** Renew condition fires on every remaining record after the threshold is crossed — O(N) redundant Redis round-trips _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxPollerWorker.java:147_ Line 144 computes `elapsedMs` from `tickStartNanos`; it only grows. Line 147: ```java if (elapsedMs >= renewAtElapsedMs && !renewLock(ownerToken, lockTtlMs)) { ``` Once `elapsedMs` crosses `renewAtElapsedMs`, the condition stays true for every subsequent record in the batch. There is no `lastRenewNanos` reset after a successful renewal, so `renewLock` is called (and makes a Redis EVAL round-trip) for every record from the threshold-crossing position to the end of the batch. For a 500-item batch with a 30s TTL and 10s margin this is potentially ~100 extra Redis calls per tick. Correctness is intact (TTL reset is idempotent), but the unnecessary round-trips add latency under load. **Fix:** Track a `lastRenewNanos` and gate on both elapsed and since-last-renew: ```java long lastRenewNanos = tickStartNanos; for (final OutboxRecord record : batch) { final long elapsedMs = (System.nanoTime() - tickStartNanos) / 1_000_000L; final long sinceLastMs = (System.nanoTime() - lastRenewNanos) / 1_000_000L; if (elapsedMs >= renewAtElapsedMs && sinceLastMs >= renewAtElapsedMs) { if (!renewLock(ownerToken, lockTtlMs)) { break; } lastRenewNanos = System.nanoTime(); } ... } ``` Alternatively a simple `boolean renewed` flag reset per tick achieves the same goal. #### **[MINOR]** `findPendingBatch` ordering claim relies on LinkedHashSet implementation detail of Spring Data Redis _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxBackend.java:154_ Lines 154-163: ```java final Set<String> dueEventIds = redis.opsForZSet().rangeByScore( keys.pending(), Double.NEGATIVE_INFINITY, nowMs, 0, batchSize); // ... final List<String> orderedIds = new ArrayList<>(dueEventIds); ``` The method Javadoc states results are in "ascending-score (oldest-due-first) order", but `opsForZSet().rangeByScore()` is declared to return `Set<V>`, which carries no ordering contract. The implementation works because Spring Data Redis / Lettuce returns a `LinkedHashSet` that preserves `ZRANGEBYSCORE` insertion order, and `ArrayList`'s copy constructor iterates in that order — but this is an undeclared implementation detail. A Spring Data upgrade switching to a non-insertion-ordered set would silently break relay ordering without any compile error or test failure. **Preferred fix:** Use `rangeByScoreWithScores()` (as already done in `findDueExpiries()` at line 336) and collect values from the `TypedTuple` stream, which is explicitly sorted. **Minimal fix:** Add a code comment documenting the `LinkedHashSet` dependency so a future library upgrade triggers a conscious review. #### **[INFO]** @EnableScheduling on @AutoConfiguration activates the Spring task scheduler globally for the consumer context _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxAutoConfiguration.java:81_ Line 81: `@EnableScheduling` is placed directly on the `@AutoConfiguration` class. This registers `ScheduledAnnotationBeanPostProcessor` for the **entire** application context, not just for this module's beans. Any `@Scheduled` beans in the consumer application that were previously dormant (because the consumer had not opted in to Spring's task scheduler) will be activated as a side effect of setting `im2be.outbox.backend=REDIS`. The annotation is required for the poller/durability/retention/repair ticks, and the blast is scoped to consumers who explicitly opt in, so the risk is low. However this side effect is not documented anywhere in the module's README or CLAUDE.md. **Suggested mitigation:** Add a sentence to the module README noting that enabling `im2be.outbox.backend=REDIS` also enables the Spring task scheduler in the application context, so consumer teams are not surprised. ### Verdict **NEEDS_WORK** --- <sub>hib-pr-reviewer • round 1 • 6 findings (2M/3m/1i) • 2026-05-28T11:35:02.016Z → 2026-05-28T11:37:02.457Z • posted-as: pr-reviewer-bot</sub> </details>
fix(redis-outbox): apply PR #8 R1 reviewer findings (Phase 0b)
Some checks failed
im2be-platform-libs CI / mvn install (pull_request) Failing after 2s
im2be-platform-libs CI / mvn verify (main only) (pull_request) Has been skipped
3a50299300
R1 verdict NEEDS_WORK, findings (kept=6: 2 MAJOR / 3 MINOR / 1 INFO):

(1) MAJOR RedisOutboxBackend.findPendingBatch:179 — corrupted-JSON poison-pill.
    decode() throws unchecked RedisOutboxStorageException uncaught in the
    for-loop → propagates out of the @Scheduled sweep every tick while the
    entry's :pending score is unchanged → permanently stalls ALL cold relaying.
(2) MAJOR RedisOutboxRelay.relay:112 — unbounded .get(). A stalled broker
    blocks the calling thread forever; a hung .get() never raises to the breaker
    so it never opens → thread exhaustion before self-heal.
(3) MINOR RedisOutboxProperties:189 — Javadoc @link to non-existent
    RedisOutboxBackendValidation.
(4) MINOR RedisOutboxPollerWorker:147 — renew fired on every remaining row once
    the elapsed threshold was crossed (O(batch) redundant Redis EVALs).
(5) MINOR RedisOutboxBackend:154 — findPendingBatch ordering relied on the
    undeclared LinkedHashSet impl detail of rangeByScore's Set<V> return.
(6) INFO RedisOutboxAutoConfiguration:81 — @EnableScheduling activates the
    Spring scheduler context-wide (consumer side effect, undocumented).

Fix:
- (1) try-catch the per-entry decode in findPendingBatch: skip + WARN + a new
  im2be_outbox_redis_undecodable_total metric (observable, not a silent
  forever-skip; operator inspects/repairs off the metric). Other due entries
  still relay. New unit test injects malformed JSON and asserts the valid entry
  survives + the metric fires.
- (2) bound relay's send on a new sendTimeoutMs ctor arg (wired from
  poller.send-timeout-ms in the autoconfig); add TimeoutException to the
  onError+markFailureAttempt catch so the breaker can open. New unit test proves
  a never-completing future returns promptly + records a failure attempt.
- (3) Javadoc @link → RedisOutboxAutoConfiguration (the real validation site).
- (4) gate renewal on time-SINCE-LAST-RENEW (lastRenewNanos), so it fires at
  most once per (ttl - margin) window.
- (5) findPendingBatch uses rangeByScoreWithScores + an explicit
  Comparator.comparingDouble(score) sort, so oldest-due-first ordering is OUR
  guarantee, not a library impl detail (consistent with findDueExpiries).
- (6) document the context-wide @EnableScheduling side effect in the autoconfig
  class Javadoc.

Ripple (rule 63): the relay ctor change (5→6 args) broke
RedisOutboxFailureLifecycleIT's construction — updated. The findPendingBatch
switch to rangeByScoreWithScores broke 2 RedisOutboxBackendTest stubs
(rangeByScore → rangeByScoreWithScores returning Set<DefaultTypedTuple>) —
updated.

Verification:
- mvn -pl redis-outbox-backend -am clean verify → BUILD SUCCESS, 0 [WARNING]
- 68 unit + 31 IT green (+2 new: malformed-JSON skip + relay send-timeout).

Superseded by round 3.

Show previous round

hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs)

Round 2 — head 3a502993001b, base main, trigger synchronize

TL;DR: CONDITIONAL_APPROVE — kept all 3 unique-to-one findings after direct Read verification; all minor; no blocking issues.

Summary

Arbitration — Round 2

All three findings are unique-to-one reviewer and required verification. All three were confirmed against HEAD.

A-1 (unique-to-A)@EnableScheduling at line 88 of RedisOutboxAutoConfiguration.java confirmed. Javadoc at lines 70–79 documents the side-effect but the annotation and its context-wide behaviour are unchanged. Kept.

A-2 (unique-to-A)scanHashEntries() at lines 157–170 of RedisOutboxRepairWorker.java confirmed. The method accumulates both keys and full JSON values (field.getValue().toString()) into a HashMap before any filtering. Kept.

B-1 (unique-to-B)findDueExpiries() at lines 372–378 of RedisOutboxBackend.java confirmed. The method iterates a Set<TypedTuple<String>> without calling .sorted(Comparator.comparingDouble(...)), despite the Javadoc at line 354 guaranteeing "(oldest-due-first)". The analogous fix for findPendingBatch() is visible at line 171 of the same file, with a comment explicitly noting the reviewer-R1-MINOR rationale — making the omission in findDueExpiries() a clear oversight. Kept.

Result: 3 unique-to-one findings kept after verification, 0 dropped. All minor.

Blast Radius

All changes are confined to the new redis-outbox-backend module. The auto-configuration registers via AutoConfiguration.imports and activates only when im2be.outbox.backend=REDIS is set, so existing consumers using the default PostgreSQL backend are unaffected. The @EnableScheduling side-effect (finding 1) is the only cross-cutting concern — it reaches into the consumer's full application context when the backend is selected.

BLAST_SCORE: 4/10

Risk Indicators

Indicator Value
Sensitive functions RedisOutboxAutoConfiguration (EnableScheduling context-wide side-effect), RedisDurabilityValidator (AOF + noeviction enforcement gate), RedisOutboxRelay (hot-path Kafka send with circuit-breaker), RedisOutboxBackend.findDueExpiries (expiry ordering contract)
Migration touched
Test delta
Dependency changes

CI status (head 3a502993001b)

Overall: ✗ failure

2 checks: 2 pending

Check State Link
im2be-platform-libs CI / mvn install (pull_request) pending details
im2be-platform-libs CI / mvn verify (main only) (pull_request) pending details

Findings (3)

[MINOR] @EnableScheduling side-effect still present — documentation improved but not fixed (R1-6 carry-over)

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxAutoConfiguration.java:88

The @EnableScheduling annotation on the auto-configuration class (line 88) registers ScheduledAnnotationBeanPostProcessor for the entire application context. The Javadoc at lines 70–79 now explicitly documents this, which is a meaningful improvement, but the behaviour is unchanged: any consumer @Scheduled bean that was dormant because the consumer had not opted into scheduling will silently start running the moment im2be.outbox.backend=REDIS is set.

A safe, minimal fix that preserves the documented semantics without removing scheduling support:

// Remove @EnableScheduling from the class-level annotation.
// Add a nested activator that scopes the opt-in:

@Configuration
@ConditionalOnProperty(prefix = "im2be.outbox.redis.poller",
        name = "enabled", havingValue = "true", matchIfMissing = true)
@EnableScheduling
static class SchedulingActivator {}

This scopes the scheduling opt-in to consumers that have the backend workers active.

[MINOR] scanHashEntries() materialises every Hash field value (full JSON) into JVM heap with no upper bound

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxRepairWorker.java:157

scanHashEntries() (lines 157–170) accumulates both the key and full JSON value of every entry in the :entries Hash into a java.util.HashMap before the cross-set comparison begins. Under the default 24-hour retention TTL, a moderately active system can accumulate tens of thousands of entries; at ~500 bytes/entry JSON this becomes tens-to-hundreds of MB materialised per repair-worker tick.

The Class A check (orphan zset members) requires only field names, not values. The Class B check (orphan Hash entries) needs only the status field for candidate IDs not found in the pending set. Both can be served with a two-pass approach:

// Class A: scan only Hash keys
private Set<String> scanHashKeys() {
    final Set<String> keys = new HashSet<>();
    final ScanOptions options = ScanOptions.scanOptions().count(256).build();
    try (Cursor<Map.Entry<Object, Object>> cursor =
                 redis.<Object, Object>opsForHash().scan(this.keys.entries(), options)) {
        while (cursor.hasNext()) {
            final Object k = cursor.next().getKey();
            if (k != null) keys.add(k.toString());
        }
    }
    return keys;
}

For Class B, decode on demand via a targeted HMGET for only those candidate IDs that are in Hash but NOT in pending, rather than pre-loading all values. This keeps the repair worker heap-neutral at the cost of one additional targeted fetch for the candidate set — a worthwhile trade given unbounded growth under noeviction.

[MINOR] findDueExpiries() iterates Set without sorting — breaks the "oldest-due-first" Javadoc contract

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxBackend.java:372

findDueExpiries() at lines 372–378 iterates the Set<ZSetOperations.TypedTuple<String>> returned by rangeByScoreWithScores() without sorting by score, yet the Javadoc at line 354 guarantees "(oldest-due-first)". Set<V> carries no ordering guarantee; the contract relies on the undocumented insertion-order behaviour of whatever LinkedHashSet Spring Data Redis happens to return.

The identical issue was fixed for findPendingBatch() at line 171 (same file), where a comment explicitly calls out the reviewer-R1-MINOR rationale:

.sorted(Comparator.comparingDouble(ZSetOperations.TypedTuple::getScore))

Apply the same pattern to findDueExpiries():

final List<UUID> ids = due.stream()
    .filter(t -> t.getValue() != null && t.getScore() != null)
    .sorted(Comparator.comparingDouble(ZSetOperations.TypedTuple::getScore))
    .map(ZSetOperations.TypedTuple::getValue)
    .map(UUID::fromString)
    .toList();

The Phase-1 due-time worker will drive expiry ordering off this list; incorrect ordering (e.g. newest-first under a different Set impl) would process recent expirations before older ones, violating the promised fairness contract.

Verdict

CONDITIONAL_APPROVE


hib-pr-reviewer • round 2 • 3 findings (3m) • 2026-05-28T11:52:53.172Z → 2026-05-28T11:54:04.320Z • posted-as: pr-reviewer-bot

<!-- hib-pr-reviewer collapsed --> > _Superseded by round 3._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs) **Round 2** — head `3a502993001b`, base `main`, trigger `synchronize` **TL;DR:** CONDITIONAL_APPROVE — kept all 3 unique-to-one findings after direct Read verification; all minor; no blocking issues. ### Summary ## Arbitration — Round 2 All three findings are unique-to-one reviewer and required verification. All three were confirmed against HEAD. **A-1 (unique-to-A)** — `@EnableScheduling` at line 88 of `RedisOutboxAutoConfiguration.java` confirmed. Javadoc at lines 70–79 documents the side-effect but the annotation and its context-wide behaviour are unchanged. **Kept.** **A-2 (unique-to-A)** — `scanHashEntries()` at lines 157–170 of `RedisOutboxRepairWorker.java` confirmed. The method accumulates both keys *and* full JSON values (`field.getValue().toString()`) into a `HashMap` before any filtering. **Kept.** **B-1 (unique-to-B)** — `findDueExpiries()` at lines 372–378 of `RedisOutboxBackend.java` confirmed. The method iterates a `Set<TypedTuple<String>>` without calling `.sorted(Comparator.comparingDouble(...))`, despite the Javadoc at line 354 guaranteeing `"(oldest-due-first)"`. The analogous fix for `findPendingBatch()` is visible at line 171 of the same file, with a comment explicitly noting the reviewer-R1-MINOR rationale — making the omission in `findDueExpiries()` a clear oversight. **Kept.** Result: 3 unique-to-one findings kept after verification, 0 dropped. All minor. ### Blast Radius All changes are confined to the new `redis-outbox-backend` module. The auto-configuration registers via `AutoConfiguration.imports` and activates only when `im2be.outbox.backend=REDIS` is set, so existing consumers using the default PostgreSQL backend are unaffected. The `@EnableScheduling` side-effect (finding 1) is the only cross-cutting concern — it reaches into the consumer's full application context when the backend is selected. **BLAST_SCORE: 4/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `RedisOutboxAutoConfiguration (EnableScheduling context-wide side-effect)`, `RedisDurabilityValidator (AOF + noeviction enforcement gate)`, `RedisOutboxRelay (hot-path Kafka send with circuit-breaker)`, `RedisOutboxBackend.findDueExpiries (expiry ordering contract)` | | Migration touched | — | | Test delta | — | | Dependency changes | — | ### CI status (head `3a502993001b`) **Overall: ✗ failure** 2 checks: 2 pending | Check | State | Link | |---|---|---| | im2be-platform-libs CI / mvn install (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/67/jobs/0) | | im2be-platform-libs CI / mvn verify (main only) (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/67/jobs/1) | ### Findings (3) #### **[MINOR]** @EnableScheduling side-effect still present — documentation improved but not fixed (R1-6 carry-over) _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxAutoConfiguration.java:88_ The `@EnableScheduling` annotation on the auto-configuration class (line 88) registers `ScheduledAnnotationBeanPostProcessor` for the entire application context. The Javadoc at lines 70–79 now explicitly documents this, which is a meaningful improvement, but the behaviour is unchanged: any consumer `@Scheduled` bean that was dormant because the consumer had not opted into scheduling will silently start running the moment `im2be.outbox.backend=REDIS` is set. A safe, minimal fix that preserves the documented semantics without removing scheduling support: ```java // Remove @EnableScheduling from the class-level annotation. // Add a nested activator that scopes the opt-in: @Configuration @ConditionalOnProperty(prefix = "im2be.outbox.redis.poller", name = "enabled", havingValue = "true", matchIfMissing = true) @EnableScheduling static class SchedulingActivator {} ``` This scopes the scheduling opt-in to consumers that have the backend workers active. #### **[MINOR]** scanHashEntries() materialises every Hash field value (full JSON) into JVM heap with no upper bound _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxRepairWorker.java:157_ `scanHashEntries()` (lines 157–170) accumulates both the key **and** full JSON value of every entry in the `:entries` Hash into a `java.util.HashMap` before the cross-set comparison begins. Under the default 24-hour retention TTL, a moderately active system can accumulate tens of thousands of entries; at ~500 bytes/entry JSON this becomes tens-to-hundreds of MB materialised per repair-worker tick. The Class A check (orphan zset members) requires only field names, not values. The Class B check (orphan Hash entries) needs only the status field for candidate IDs not found in the pending set. Both can be served with a two-pass approach: ```java // Class A: scan only Hash keys private Set<String> scanHashKeys() { final Set<String> keys = new HashSet<>(); final ScanOptions options = ScanOptions.scanOptions().count(256).build(); try (Cursor<Map.Entry<Object, Object>> cursor = redis.<Object, Object>opsForHash().scan(this.keys.entries(), options)) { while (cursor.hasNext()) { final Object k = cursor.next().getKey(); if (k != null) keys.add(k.toString()); } } return keys; } ``` For Class B, decode on demand via a targeted `HMGET` for only those candidate IDs that are in Hash but NOT in pending, rather than pre-loading all values. This keeps the repair worker heap-neutral at the cost of one additional targeted fetch for the candidate set — a worthwhile trade given unbounded growth under `noeviction`. #### **[MINOR]** findDueExpiries() iterates Set<TypedTuple> without sorting — breaks the "oldest-due-first" Javadoc contract _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisOutboxBackend.java:372_ `findDueExpiries()` at lines 372–378 iterates the `Set<ZSetOperations.TypedTuple<String>>` returned by `rangeByScoreWithScores()` without sorting by score, yet the Javadoc at line 354 guarantees `"(oldest-due-first)"`. `Set<V>` carries no ordering guarantee; the contract relies on the undocumented insertion-order behaviour of whatever `LinkedHashSet` Spring Data Redis happens to return. The identical issue was fixed for `findPendingBatch()` at line 171 (same file), where a comment explicitly calls out the reviewer-R1-MINOR rationale: ```java .sorted(Comparator.comparingDouble(ZSetOperations.TypedTuple::getScore)) ``` Apply the same pattern to `findDueExpiries()`: ```java final List<UUID> ids = due.stream() .filter(t -> t.getValue() != null && t.getScore() != null) .sorted(Comparator.comparingDouble(ZSetOperations.TypedTuple::getScore)) .map(ZSetOperations.TypedTuple::getValue) .map(UUID::fromString) .toList(); ``` The Phase-1 due-time worker will drive expiry ordering off this list; incorrect ordering (e.g. newest-first under a different `Set` impl) would process recent expirations before older ones, violating the promised fairness contract. ### Verdict **CONDITIONAL_APPROVE** --- <sub>hib-pr-reviewer • round 2 • 3 findings (3m) • 2026-05-28T11:52:53.172Z → 2026-05-28T11:54:04.320Z • posted-as: pr-reviewer-bot</sub> </details>
fix(redis-outbox): apply PR #8 R2 reviewer findings (Phase 0b)
Some checks failed
im2be-platform-libs CI / mvn install (pull_request) Failing after 2s
im2be-platform-libs CI / mvn verify (main only) (pull_request) Has been skipped
71f9e4525a
R2 verdict CONDITIONAL_APPROVE, findings (kept=3, all MINOR):

(1) RedisOutboxBackend.findDueExpiries:372 — iterated the rangeByScoreWithScores
    Set without sorting, contradicting its "oldest-due-first" Javadoc. Direct
    ripple of the R1 findPendingBatch fix that wasn't applied to its sibling.
(2) RedisOutboxRepairWorker.scanHashEntries:157 — materialised every Hash field
    VALUE (full JSON) into heap before filtering; unbounded under noeviction as
    :entries accumulates SENT/FAILED entries within the retention window.
(3) RedisOutboxAutoConfiguration:88 — @EnableScheduling activates the scheduler
    context-wide (R1 INFO escalated to MINOR with a suggested nested-activator).

Fix:
- (1) findDueExpiries now sorts by score (Comparator.comparingDouble) +
  filters null value/score, identical to findPendingBatch — ordering is ours,
  not a Set-impl detail.
- (2) repair worker is now heap-neutral: scanHashKeys() retains only field
  NAMES (HSCAN streams values but each goes out of scope immediately), Class A
  compares against that key set, and Class B fetches only the non-pending
  candidates' values in bounded HMGET chunks (CLASS_B_FETCH_BATCH=256), decoding
  on demand. Peak heap = chunk×entry, never Hash-size×entry.
- (3) NOT applied — the reviewer's nested-activator gated on poller.enabled is
  flawed: durability (unconditional) + retention + repair are ALSO @Scheduled,
  so a poller-only gate would silently stop them; and Spring's @EnableScheduling
  registers the BPP context-wide wherever declared, so a nested activator does
  not scope the leak. Documented as an intrinsic, accepted trade-off in the
  autoconfig Javadoc (true isolation would need a module-private programmatic
  TaskScheduler — a deliberate non-goal). Reasoned won't-fix.

Verification:
- mvn -pl redis-outbox-backend -am clean verify → BUILD SUCCESS, 0 [WARNING]
- 68 unit + 31 IT green; RedisOutboxCrashRecoveryIT (real Valkey) exercises the
  refactored repair worker.

Superseded by round 4.

Show previous round

hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs)

Round 3 — head 71f9e4525a81, base main, trigger synchronize

TL;DR: CONDITIONAL_APPROVE — dropped A-1 (decode() already returns null on corrupt JSON; no exception propagates); kept B-1 (minor: @Scheduled swallows FAIL_FAST throw, verified no custom ErrorHandler in AutoConfiguration); kept B-2 (info/deferred: TOCTOU on alreadyRevoked ARGV[6], Phase 1 concern).

Summary

Arbitration — Round 3

Methodology

All three findings (A-1, B-1, B-2) are unique-to-one, so each required independent verification via Read.

Finding A-1 — DROPPED (disproved by verification)

Reviewer A claimed decode(json) at RedisOutboxRepairWorker.java:148 throws an unchecked RedisOutboxStorageException when a Hash value is corrupt, with no try-catch in the Class B loop.

Actual code (lines 202-209):

private RedisOutboxEntry decode(final String json) {
    try {
        return objectMapper.readValue(json, RedisOutboxEntry.class);
    } catch (com.fasterxml.jackson.core.JsonProcessingException e) {
        LOG.warn("redis.outbox.repair skipping un-decodable entry", e);
        return null;
    }
}

decode() already catches JsonProcessingException internally and returns null. Then at lines 149-151, if (decoded == null || ...) { continue; } handles the null case — the corrupted entry is skipped and the loop continues. The method never throws RedisOutboxStorageException. The premise of A-1 is factually wrong; the fix the reviewer requested is already in place.

Finding B-1 — KEPT (verified)

RedisDurabilityValidator.periodicCheck() (line 109) calls validate("periodic"), which at lines 128-135 throws RedisOutboxDurabilityException extends RuntimeException in FAIL_FAST mode. Spring's @Scheduled dispatcher uses TaskUtils.LOG_AND_SUPPRESS_ERROR_HANDLER by default, which logs and re-schedules on any RuntimeException. Verified that RedisOutboxAutoConfiguration (all 389 lines) registers NO custom TaskScheduler, SchedulingConfigurer, or ErrorHandler bean. The class-level Javadoc (line 22) and the @throws tag (lines 105-106) both imply the periodic check halts the application on FAIL_FAST violation — but this is mechanically impossible under @Scheduled without an overriding ErrorHandler. The relay continues uninterrupted after a post-boot durability regression. B-1 is confirmed.

Finding B-2 — KEPT (verified, info/deferred)

expiry-decide.lua:35 documents ARGV[6] = revoked as a Java-side pre-computed value. The script header confirms Phase 0b exposes the primitive but the due-time worker caller is not wired until Phase 1. The TOCTOU gap (concurrent revocation between Java check and EVAL execution) is architecturally real; the script-level mitigation comment (UUIDv3 dedup) addresses the duplicate-event collapse but not the incorrect event-type selection. Correctly deferred to Phase 1.

Blast Radius

The diff introduces a net-new module (redis-outbox-backend) with its own auto-configuration, four scheduled workers, Lua scripts, and a Spring Boot AutoConfiguration.imports registration. Blast radius is self-contained for consumers that do not opt in to im2be.outbox.backend=REDIS. For those that do, the module activates the global task scheduler context-wide (@EnableScheduling side-effect, previously documented and accepted). The durability-enforcement gap (B-1) affects all FAIL_FAST adopters post-boot; the TOCTOU gap (B-2) is Phase-1-gated.

BLAST_SCORE: 5/10

Risk Indicators

Indicator Value
Sensitive functions RedisDurabilityValidator.validate, RedisDurabilityValidator.afterPropertiesSet, RedisDurabilityValidator.periodicCheck, RedisOutboxBackend.decideExpiry, RedisOutboxRelay.relay, RedisOutboxPollerWorker.pollOnce, expiry-decide.lua (EVAL boundary)
Migration touched
Test delta
Dependency changes

CI status (head 71f9e4525a81)

Overall: ✗ failure

2 checks: 2 pending

Check State Link
im2be-platform-libs CI / mvn install (pull_request) pending details
im2be-platform-libs CI / mvn verify (main only) (pull_request) pending details

Findings (2)

[MINOR] periodicCheck() throws RedisOutboxDurabilityException in FAIL_FAST mode but @Scheduled silently swallows it — relay continues against a non-durable store post-boot

redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisDurabilityValidator.java:109

periodicCheck() (line 109) delegates to validate("periodic"), which at lines 128-135 throws RedisOutboxDurabilityException extends RuntimeException in FAIL_FAST mode. Spring's @Scheduled dispatcher uses TaskUtils.LOG_AND_SUPPRESS_ERROR_HANDLER as its default ErrorHandler, which logs the exception and re-schedules the task — it does NOT propagate the exception or halt the application. RedisOutboxAutoConfiguration registers no custom TaskScheduler, SchedulingConfigurer, or ErrorHandler bean that would override this behaviour.

Consequence: the FAIL_FAST halting guarantee holds only at startup (via afterPropertiesSet()). A live CONFIG SET appendonly no detected by the periodic check fires the im2be_outbox_redis_durability_violations_total metric counter but leaves RedisOutboxRelay and RedisOutboxPollerWorker running uninterrupted against the now-non-durable store. The class-level Javadoc (line 22: "so a live CONFIG SET that weakens durability after boot is caught") and the @throws declaration on periodicCheck() (lines 105-106) both imply a stronger halting semantics than the dispatch model provides.

Fix options (pick one):

  1. Register a SchedulingConfigurer or TaskScheduler bean with a custom ErrorHandler that calls context.close() on RedisOutboxDurabilityException — matches the startup-gate semantics for periodic violations.
  2. Keep current behaviour, remove the @throws on periodicCheck(), and update the class-level Javadoc to state explicitly that the periodic check is metric-only in FAIL_FAST mode and does NOT halt the application; operators respond to the alert.

Option 2 is the minimal/safe fix; option 1 is correct for audit-grade enforcement.

[INFO] Caller-supplied revoked ARGV[6] creates a TOCTOU gap — concurrent ticket revocation between Java check and EVAL can produce both TicketExpired and TicketRevoked events

redis-outbox-backend/src/main/resources/META-INF/scripts/expiry-decide.lua:35

Line 35 documents ARGV[6] = revoked ('1' when the ticket was already revoked, else '0'). The Java caller (RedisOutboxBackend.decideExpiry) receives alreadyRevoked as a pre-computed boolean determined before the EVAL is issued. A concurrent TicketRevoked processing between when the caller checks revocation state and when the Lua script executes atomically would cause the script to enqueue a TicketExpired event even though the ticket was revoked in that window, yielding both a TicketRevoked and a TicketExpired event for the same ticket.

The design mitigates this via ADR-0014 D-4 consumer-side dedup (the deterministic UUIDv3 expiredEventId collapses double-fires), and the security-audit pipeline (ADR-0002 §3b) must be idempotent with respect to this race. However, the revocation-suppression path (script lines 63-66) is bypassed in the race window, and consumers receive two distinct event types rather than one. Depending on downstream processing, both events being simultaneously present could cause an incorrect state transition (ticket showing as both expired AND revoked rather than only revoked).

A robust fix would move the revocation check inside the Lua script itself against a Redis-resident revocation marker (e.g. EXISTS :ticket:{id} or a dedicated revoked-set membership check), eliminating the Java-side TOCTOU window entirely. Deferred to Phase 1 — the due-time worker that calls decideExpiry is not yet wired in Phase 0b; the gap becomes exploitable only when Phase 1 lands.

Verdict

CONDITIONAL_APPROVE


hib-pr-reviewer • round 3 • 2 findings (1m/1i) • 2026-05-28T12:07:27.767Z → 2026-05-28T12:09:06.769Z • posted-as: pr-reviewer-bot

<!-- hib-pr-reviewer collapsed --> > _Superseded by round 4._ <details> <summary>Show previous round</summary> ## hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs) **Round 3** — head `71f9e4525a81`, base `main`, trigger `synchronize` **TL;DR:** CONDITIONAL_APPROVE — dropped A-1 (decode() already returns null on corrupt JSON; no exception propagates); kept B-1 (minor: @Scheduled swallows FAIL_FAST throw, verified no custom ErrorHandler in AutoConfiguration); kept B-2 (info/deferred: TOCTOU on alreadyRevoked ARGV[6], Phase 1 concern). ### Summary ## Arbitration — Round 3 ### Methodology All three findings (A-1, B-1, B-2) are unique-to-one, so each required independent verification via `Read`. ### Finding A-1 — DROPPED (disproved by verification) Reviewer A claimed `decode(json)` at `RedisOutboxRepairWorker.java:148` throws an unchecked `RedisOutboxStorageException` when a Hash value is corrupt, with no try-catch in the Class B loop. **Actual code (lines 202-209):** ```java private RedisOutboxEntry decode(final String json) { try { return objectMapper.readValue(json, RedisOutboxEntry.class); } catch (com.fasterxml.jackson.core.JsonProcessingException e) { LOG.warn("redis.outbox.repair skipping un-decodable entry", e); return null; } } ``` `decode()` already catches `JsonProcessingException` internally and returns `null`. Then at lines 149-151, `if (decoded == null || ...) { continue; }` handles the null case — the corrupted entry is skipped and the loop continues. The method never throws `RedisOutboxStorageException`. The premise of A-1 is factually wrong; the fix the reviewer requested is already in place. ### Finding B-1 — KEPT (verified) `RedisDurabilityValidator.periodicCheck()` (line 109) calls `validate("periodic")`, which at lines 128-135 throws `RedisOutboxDurabilityException extends RuntimeException` in FAIL_FAST mode. Spring's `@Scheduled` dispatcher uses `TaskUtils.LOG_AND_SUPPRESS_ERROR_HANDLER` by default, which logs and re-schedules on any `RuntimeException`. Verified that `RedisOutboxAutoConfiguration` (all 389 lines) registers NO custom `TaskScheduler`, `SchedulingConfigurer`, or `ErrorHandler` bean. The class-level Javadoc (line 22) and the `@throws` tag (lines 105-106) both imply the periodic check halts the application on FAIL_FAST violation — but this is mechanically impossible under `@Scheduled` without an overriding `ErrorHandler`. The relay continues uninterrupted after a post-boot durability regression. B-1 is confirmed. ### Finding B-2 — KEPT (verified, info/deferred) `expiry-decide.lua:35` documents `ARGV[6] = revoked` as a Java-side pre-computed value. The script header confirms Phase 0b exposes the primitive but the due-time worker caller is not wired until Phase 1. The TOCTOU gap (concurrent revocation between Java check and EVAL execution) is architecturally real; the script-level mitigation comment (UUIDv3 dedup) addresses the duplicate-event collapse but not the incorrect event-type selection. Correctly deferred to Phase 1. ### Blast Radius The diff introduces a net-new module (redis-outbox-backend) with its own auto-configuration, four scheduled workers, Lua scripts, and a Spring Boot AutoConfiguration.imports registration. Blast radius is self-contained for consumers that do not opt in to im2be.outbox.backend=REDIS. For those that do, the module activates the global task scheduler context-wide (@EnableScheduling side-effect, previously documented and accepted). The durability-enforcement gap (B-1) affects all FAIL_FAST adopters post-boot; the TOCTOU gap (B-2) is Phase-1-gated. **BLAST_SCORE: 5/10** ### Risk Indicators | Indicator | Value | |---|---| | Sensitive functions | `RedisDurabilityValidator.validate`, `RedisDurabilityValidator.afterPropertiesSet`, `RedisDurabilityValidator.periodicCheck`, `RedisOutboxBackend.decideExpiry`, `RedisOutboxRelay.relay`, `RedisOutboxPollerWorker.pollOnce`, `expiry-decide.lua (EVAL boundary)` | | Migration touched | — | | Test delta | — | | Dependency changes | — | ### CI status (head `71f9e4525a81`) **Overall: ✗ failure** 2 checks: 2 pending | Check | State | Link | |---|---|---| | im2be-platform-libs CI / mvn install (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/68/jobs/0) | | im2be-platform-libs CI / mvn verify (main only) (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/68/jobs/1) | ### Findings (2) #### **[MINOR]** periodicCheck() throws RedisOutboxDurabilityException in FAIL_FAST mode but @Scheduled silently swallows it — relay continues against a non-durable store post-boot _redis-outbox-backend/src/main/java/com/aim2be/platform/outbox/redis/RedisDurabilityValidator.java:109_ `periodicCheck()` (line 109) delegates to `validate("periodic")`, which at lines 128-135 throws `RedisOutboxDurabilityException extends RuntimeException` in FAIL_FAST mode. Spring's `@Scheduled` dispatcher uses `TaskUtils.LOG_AND_SUPPRESS_ERROR_HANDLER` as its default `ErrorHandler`, which logs the exception and re-schedules the task — it does NOT propagate the exception or halt the application. `RedisOutboxAutoConfiguration` registers no custom `TaskScheduler`, `SchedulingConfigurer`, or `ErrorHandler` bean that would override this behaviour. Consequence: the FAIL_FAST halting guarantee holds only at startup (via `afterPropertiesSet()`). A live `CONFIG SET appendonly no` detected by the periodic check fires the `im2be_outbox_redis_durability_violations_total` metric counter but leaves `RedisOutboxRelay` and `RedisOutboxPollerWorker` running uninterrupted against the now-non-durable store. The class-level Javadoc (line 22: "so a live CONFIG SET that weakens durability after boot is caught") and the `@throws` declaration on `periodicCheck()` (lines 105-106) both imply a stronger halting semantics than the dispatch model provides. **Fix options (pick one):** 1. Register a `SchedulingConfigurer` or `TaskScheduler` bean with a custom `ErrorHandler` that calls `context.close()` on `RedisOutboxDurabilityException` — matches the startup-gate semantics for periodic violations. 2. Keep current behaviour, remove the `@throws` on `periodicCheck()`, and update the class-level Javadoc to state explicitly that the periodic check is metric-only in FAIL_FAST mode and does NOT halt the application; operators respond to the alert. Option 2 is the minimal/safe fix; option 1 is correct for audit-grade enforcement. #### **[INFO]** Caller-supplied `revoked` ARGV[6] creates a TOCTOU gap — concurrent ticket revocation between Java check and EVAL can produce both TicketExpired and TicketRevoked events _redis-outbox-backend/src/main/resources/META-INF/scripts/expiry-decide.lua:35_ Line 35 documents `ARGV[6] = revoked ('1' when the ticket was already revoked, else '0')`. The Java caller (`RedisOutboxBackend.decideExpiry`) receives `alreadyRevoked` as a pre-computed boolean determined *before* the `EVAL` is issued. A concurrent `TicketRevoked` processing between when the caller checks revocation state and when the Lua script executes atomically would cause the script to enqueue a `TicketExpired` event even though the ticket was revoked in that window, yielding both a `TicketRevoked` and a `TicketExpired` event for the same ticket. The design mitigates this via ADR-0014 D-4 consumer-side dedup (the deterministic UUIDv3 `expiredEventId` collapses double-fires), and the security-audit pipeline (ADR-0002 §3b) must be idempotent with respect to this race. However, the revocation-suppression path (script lines 63-66) is bypassed in the race window, and consumers receive two distinct event types rather than one. Depending on downstream processing, both events being simultaneously present could cause an incorrect state transition (ticket showing as both expired AND revoked rather than only revoked). A robust fix would move the revocation check inside the Lua script itself against a Redis-resident revocation marker (e.g. `EXISTS :ticket:{id}` or a dedicated revoked-set membership check), eliminating the Java-side TOCTOU window entirely. **Deferred to Phase 1** — the due-time worker that calls `decideExpiry` is not yet wired in Phase 0b; the gap becomes exploitable only when Phase 1 lands. ### Verdict **CONDITIONAL_APPROVE** --- <sub>hib-pr-reviewer • round 3 • 2 findings (1m/1i) • 2026-05-28T12:07:27.767Z → 2026-05-28T12:09:06.769Z • posted-as: pr-reviewer-bot</sub> </details>
fix(redis-outbox): apply PR #8 R3 reviewer findings (Phase 0b)
Some checks failed
im2be-platform-libs CI / mvn install (pull_request) Failing after 2s
im2be-platform-libs CI / mvn verify (main only) (pull_request) Has been skipped
9853b72e52
R3 verdict CONDITIONAL_APPROVE. A-1 (findPendingBatch poison-pill) correctly
DROPPED by the arbiter (already handled in R1). Findings (kept=2: 1 MINOR / 1
INFO-deferred):

(1) MINOR RedisDurabilityValidator.periodicCheck:109 — overstated halting
    semantics. periodicCheck() throws RedisOutboxDurabilityException in FAIL_FAST,
    but Spring's @Scheduled default ErrorHandler (LOG_AND_SUPPRESS) logs +
    reschedules — it does NOT halt. So FAIL_FAST only halts at STARTUP; a live
    CONFIG SET weakening durability post-boot fired the metric but left the
    relay/poller running, while the Javadoc + @throws implied a hard stop.
(2) INFO expiry-decide.lua:35 — TOCTOU on the caller-supplied `revoked` ARGV.
    A concurrent TicketRevoked in the Java-check->EVAL window can yield both
    TicketExpired + TicketRevoked. Deferred to Phase 1 (the due-time worker that
    calls decideExpiry is unreachable in Phase 0b).

Fix:
- (1) periodicCheck() now CATCHES the FAIL_FAST throw and converts it into an
  explicit, actionable ERROR alert (paired with the existing
  durability-violations metric) instead of letting Spring swallow it — honest
  (a @Scheduled method cannot abort a running app) + loud. The startup gate
  (afterPropertiesSet) keeps the hard fail-closed throw. Class Javadoc +
  periodicCheck @throws corrected to state the two-phase enforcement. New unit
  test asserts periodicCheck() alerts-but-does-not-throw on a FAIL_FAST violation.
- (2) documented the TOCTOU + the planned in-script revocation-check fix (against
  a Redis-resident revocation marker) as TODO(Phase 1) on both decideExpiry's
  Javadoc and the expiry-decide.lua ARGV[6] comment — to land with the Phase-1
  due-time worker that first exercises this path. Not fixed in Phase 0b (no
  caller; the gap is unreachable until Phase 1).

Ripple (rule 63): the periodicCheck semantics change broke 2 RedisOutboxDurabilityIT
tests that asserted periodicCheck() THROWS (the old, incorrect behavior). Rewrote
them to assert detection (validate("periodic") throws internally) AND
alert-don't-halt (periodicCheck() does not throw) against real Valkey.

Verification:
- mvn -pl redis-outbox-backend -am clean verify → BUILD SUCCESS, 0 [WARNING]
- 69 unit + 31 IT green.

hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs)

Round 4 — head 9853b72e5276, base main, trigger synchronize

TL;DR: NO_NEW_FINDINGS — No new findings this round.

Summary

[quorum-converged] A=0 = B=0. ## Round 4 Review — Wave A.2 Phase 0b Redis Outbox Backend

All prior actionable findings from rounds 1–3 have been addressed in this synchronize push:

Finding Status
R1-1 Corrupt Hash entry poisons cold poller (findPendingBatch) Fixed — RedisOutboxStorageException caught, entry skipped, recordUndecodable() metric fired (lines 192–206)
R1-2 relay() unbounded .get() exhausts threads Fixed — .get(sendTimeoutMs, TimeUnit.MILLISECONDS) with TimeoutException routed to circuit-breaker onError (lines 123–125)
R1-3 Javadoc cites non-existent RedisOutboxBackendValidation Fixed — now links {@link RedisOutboxAutoConfiguration} (line 188)
R1-4 Renew fires O(N) times after threshold crossed Fixed — lastRenewNanos reset after each successful renewal, subsequent records compare against time-since-last-renew (lines 143–160)
R1-5 Ordering relies on LinkedHashSet implementation detail Fixed — explicit .sorted(Comparator.comparingDouble(...)) in findPendingBatch (line 171)
R2-8 scanHashEntries() materialises full JSON into JVM heap Fixed — scanHashKeys() discards values immediately, retains only field names (lines 184–199)
R2-9 findDueExpiries() unsorted Fixed — explicit .sorted() ascending by score (line 379)
R3-10 periodicCheck() silently drops FAIL_FAST violations Fixed — explicit catch (RedisOutboxDurabilityException) block converts to LOG.error with actionable message (lines 122–131)

Two items remain as documented trade-offs with author-accepted rationale — neither is a new finding:

  • @EnableScheduling (R1-6, R2-7): Still active on @AutoConfiguration (line 100 of RedisOutboxAutoConfiguration). Author has added extensive Javadoc (lines 75–92) explaining why narrowing it to a nested @Configuration would not scope the side-effect, and that a programmatic TaskScheduler is a deliberate non-goal. Reasoning is sound; the context-wide activation is now explicitly disclosed to library consumers.
  • TOCTOU in expiry-decide.lua (R3-11): alreadyRevoked ARGV[6] still computed by the Java caller before EVAL (line 35). Author added a detailed TODO(Phase 1, reviewer R3 INFO) block (lines 36–44) describing the gap and the in-script EXISTS-based fix that lands with the Phase-1 due-time worker. The decideExpiry call site is unreachable in Phase 0b.

No new findings this round. The module is substantially complete for Phase 0b.

CI status (head 9853b72e5276)

Overall: ✗ failure

2 checks: 2 pending

Check State Link
im2be-platform-libs CI / mvn install (pull_request) pending details
im2be-platform-libs CI / mvn verify (main only) (pull_request) pending details

Findings

No new findings this round.

Quorum converged on empty findings (A + B both returned 0).

Verdict

NO_NEW_FINDINGS


hib-pr-reviewer • round 4 • 0 findings • 2026-05-28T12:15:21.914Z → 2026-05-28T12:20:52.951Z • posted-as: pr-reviewer-bot • [bookkeeping fallback]

## hib-pr-reviewer review — PR #8 (affinity-intelligence-rework/im2be-platform-libs) **Round 4** — head `9853b72e5276`, base `main`, trigger `synchronize` **TL;DR:** NO_NEW_FINDINGS — No new findings this round. ### Summary [quorum-converged] A=0 = B=0. ## Round 4 Review — Wave A.2 Phase 0b Redis Outbox Backend All prior actionable findings from rounds 1–3 have been addressed in this synchronize push: | Finding | Status | |---|---| | R1-1 Corrupt Hash entry poisons cold poller (`findPendingBatch`) | ✅ Fixed — `RedisOutboxStorageException` caught, entry skipped, `recordUndecodable()` metric fired (lines 192–206) | | R1-2 `relay()` unbounded `.get()` exhausts threads | ✅ Fixed — `.get(sendTimeoutMs, TimeUnit.MILLISECONDS)` with `TimeoutException` routed to circuit-breaker `onError` (lines 123–125) | | R1-3 Javadoc cites non-existent `RedisOutboxBackendValidation` | ✅ Fixed — now links `{@link RedisOutboxAutoConfiguration}` (line 188) | | R1-4 Renew fires O(N) times after threshold crossed | ✅ Fixed — `lastRenewNanos` reset after each successful renewal, subsequent records compare against time-since-last-renew (lines 143–160) | | R1-5 Ordering relies on `LinkedHashSet` implementation detail | ✅ Fixed — explicit `.sorted(Comparator.comparingDouble(...))` in `findPendingBatch` (line 171) | | R2-8 `scanHashEntries()` materialises full JSON into JVM heap | ✅ Fixed — `scanHashKeys()` discards values immediately, retains only field names (lines 184–199) | | R2-9 `findDueExpiries()` unsorted | ✅ Fixed — explicit `.sorted()` ascending by score (line 379) | | R3-10 `periodicCheck()` silently drops `FAIL_FAST` violations | ✅ Fixed — explicit `catch (RedisOutboxDurabilityException)` block converts to `LOG.error` with actionable message (lines 122–131) | **Two items remain as documented trade-offs with author-accepted rationale — neither is a new finding:** - **@EnableScheduling (R1-6, R2-7)**: Still active on `@AutoConfiguration` (line 100 of `RedisOutboxAutoConfiguration`). Author has added extensive Javadoc (lines 75–92) explaining why narrowing it to a nested `@Configuration` would not scope the side-effect, and that a programmatic `TaskScheduler` is a deliberate non-goal. Reasoning is sound; the context-wide activation is now explicitly disclosed to library consumers. - **TOCTOU in `expiry-decide.lua` (R3-11)**: `alreadyRevoked` ARGV[6] still computed by the Java caller before EVAL (line 35). Author added a detailed `TODO(Phase 1, reviewer R3 INFO)` block (lines 36–44) describing the gap and the in-script `EXISTS`-based fix that lands with the Phase-1 due-time worker. The `decideExpiry` call site is unreachable in Phase 0b. No new findings this round. The module is substantially complete for Phase 0b. ### CI status (head `9853b72e5276`) **Overall: ✗ failure** 2 checks: 2 pending | Check | State | Link | |---|---|---| | im2be-platform-libs CI / mvn install (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/69/jobs/0) | | im2be-platform-libs CI / mvn verify (main only) (pull_request) | ⏳ pending | [details](/affinity-intelligence-rework/im2be-platform-libs/actions/runs/69/jobs/1) | ### Findings **No new findings this round.** _Quorum converged on empty findings (A + B both returned 0)._ ### Verdict **NO_NEW_FINDINGS** --- <sub>hib-pr-reviewer • round 4 • 0 findings • 2026-05-28T12:15:21.914Z → 2026-05-28T12:20:52.951Z • posted-as: pr-reviewer-bot • [bookkeeping fallback]</sub>
hibryda deleted branch feat/redis-outbox-backend 2026-05-28 14:22:58 +02:00
Sign in to join this conversation.
No reviewers
No labels
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
affinity-intelligence-rework/im2be-platform-libs!8
No description provided.