Class RedisDurabilityValidator

java.lang.Object
com.aim2be.platform.outbox.redis.RedisDurabilityValidator
All Implemented Interfaces:
org.springframework.beans.factory.InitializingBean

public class RedisDurabilityValidator extends Object implements org.springframework.beans.factory.InitializingBean
Durability-SLA validator (refinement #7). The Redis outbox is only as durable as the store it runs on: under appendfsync everysec a crash loses up to ~1s of acked writes, and under an eviction policy other than noeviction an OOM silently drops keys instead of failing the write closed. This validator asserts the store is configured for durable, fail-closed writes — at startup (InitializingBean) AND periodically (Scheduled) so a live CONFIG SET that weakens durability after boot is detected. The two phases have DIFFERENT enforcement (reviewer R3): the startup gate HALTS bootstrap on a FAIL_FAST violation; the periodic check can only ALERT (a @Scheduled method cannot abort a running app — see periodicCheck()).

Required configuration:

  • appendonly = yes — AOF persistence on (RDB-only would lose all writes since the last snapshot on crash).
  • appendfsync ∈ {everysec, always}: no means the OS decides when to flush (unbounded loss window). When require-always-fsync=true (audit-grade), ONLY always passes.
  • maxmemory-policy = noeviction — turns an OOM into a fail-closed write error (correct only when the mint fails-closed + alerts, which the enqueue script + this validator together ensure); any eviction policy could silently evict an un-relayed outbox entry.
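
The three rules above can be expressed as a pure function over the values returned by CONFIG GET. The following is a minimal sketch, not the validator's actual implementation: the class name DurabilityRules, the method violations, and the message wording are hypothetical, and the Redis lookup is assumed to have already produced a directive-to-value map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper illustrating the required-configuration rules; not part of the documented API. */
final class DurabilityRules {

    private DurabilityRules() {}

    /**
     * Returns one message per violated rule; an empty list means the store
     * satisfies all three requirements.
     *
     * @param config values as reported by CONFIG GET (directive name to value)
     * @param requireAlwaysFsync mirrors the require-always-fsync flag
     */
    static List<String> violations(Map<String, String> config, boolean requireAlwaysFsync) {
        List<String> found = new ArrayList<>();

        // Rule 1: AOF persistence must be on.
        String appendOnly = config.getOrDefault("appendonly", "");
        if (!"yes".equals(appendOnly)) {
            found.add("appendonly must be yes but was '" + appendOnly + "'");
        }

        // Rule 2: fsync policy must bound the loss window; audit-grade accepts only 'always'.
        String fsync = config.getOrDefault("appendfsync", "");
        boolean fsyncOk = requireAlwaysFsync
                ? "always".equals(fsync)
                : "always".equals(fsync) || "everysec".equals(fsync);
        if (!fsyncOk) {
            found.add("appendfsync '" + fsync + "' is not allowed"
                    + (requireAlwaysFsync ? " (require-always-fsync=true)" : ""));
        }

        // Rule 3: an OOM must fail the write instead of evicting keys.
        String policy = config.getOrDefault("maxmemory-policy", "");
        if (!"noeviction".equals(policy)) {
            found.add("maxmemory-policy must be noeviction but was '" + policy + "'");
        }

        return found;
    }
}
```

Keeping the rules free of any Redis or Spring dependency makes them easy to unit-test; a missing directive is treated here as a violation, which is an assumption of this sketch.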

On violation: RedisOutboxProperties.EnforcementMode.FAIL_FAST at STARTUP throws (aborting bootstrap before any event is relayed against a non-durable store); on the PERIODIC path it instead emits an explicit ERROR alert (the throw cannot halt a live app — see periodicCheck()). RedisOutboxProperties.EnforcementMode.WARN logs loudly + lets the app run (dev only). Both modes increment im2be_outbox_redis_durability_violations_total on every violation.

Doc-grounded note (rule 61). A WAIT-after-EVAL fsync barrier was considered (design doc §7) but VERIFIED to be a no-op for AOF durability in a standalone deployment (https://redis.io/docs/latest/commands/wait/): WAIT is a replication barrier; with zero replicas it returns immediately and does NOT flush the AOF. The appendfsync policy is the only real single-node durability lever, hence this validator enforces it rather than relying on a barrier (see RedisOutboxProperties.Durability.isFsyncBarrierEnabled()).

  • Constructor Details

    • RedisDurabilityValidator

      public RedisDurabilityValidator(org.springframework.data.redis.core.StringRedisTemplate redis, RedisOutboxProperties.Durability config, RedisOutboxMetrics metrics)
      Parameters:
      redis - Redis template (used for CONFIG GET); never null
      config - durability config (enforcement mode, always-fsync flag); never null
      metrics - metrics binder (durability-violation counter); never null
  • Method Details

    • afterPropertiesSet

      public void afterPropertiesSet()
      Startup gate. In FAIL_FAST mode a violation throws here, aborting application bootstrap before any event is relayed against a non-durable store.
      Specified by:
      afterPropertiesSet in interface org.springframework.beans.factory.InitializingBean
      Throws:
      RedisOutboxDurabilityException - in FAIL_FAST mode when the store is not durably configured
    • periodicCheck

      @Scheduled(fixedDelayString="${im2be.outbox.redis.durability.check-interval-ms:60000}") public void periodicCheck()
      Periodic re-check. Detects a live CONFIG SET that weakens durability after boot. Cadence governed by im2be.outbox.redis.durability.check-interval-ms.

      Does NOT halt a running app (reviewer R3). A @Scheduled method cannot abort the application: Spring's default scheduled-task ErrorHandler (LOG_AND_SUPPRESS_ERROR_HANDLER) logs + swallows any thrown exception and re-schedules the next tick. So in FAIL_FAST mode a post-boot violation is converted here into an explicit, actionable ERROR alert (paired with the im2be_outbox_redis_durability_violations_total metric) rather than thrown — the startup gate remains the hard fail-closed; live regressions are alerted, and an operator restores durability (or restarts to re-trigger the startup gate). The exception is intentionally NOT propagated out of this method.
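
The difference between the two phases can be sketched without Spring. This is an illustration under stated assumptions, not the class's source: TwoPhaseEnforcementSketch, its in-memory alerts list, and the AtomicLong counter are hypothetical stand-ins for the real logger and the im2be_outbox_redis_durability_violations_total metric; IllegalStateException stands in for RedisOutboxDurabilityException; and incrementing the counter once per failed check is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

/** Hypothetical sketch of startup-gate versus periodic-check enforcement. */
final class TwoPhaseEnforcementSketch {

    enum Mode { FAIL_FAST, WARN }

    private final Mode mode;
    private final Supplier<List<String>> check;          // returns the current violations, empty if none
    private final AtomicLong violationCounter = new AtomicLong(); // stand-in for the metric
    private final List<String> alerts = new ArrayList<>();        // stand-in for log output

    TwoPhaseEnforcementSketch(Mode mode, Supplier<List<String>> check) {
        this.mode = mode;
        this.check = check;
    }

    /** Startup phase: FAIL_FAST throws, which would abort bootstrap. */
    void startupGate() {
        List<String> violations = check.get();
        if (violations.isEmpty()) {
            return;
        }
        violationCounter.incrementAndGet();
        if (mode == Mode.FAIL_FAST) {
            throw new IllegalStateException("Redis store is not durably configured: " + violations);
        }
        alerts.add("WARN: " + violations);
    }

    /** Periodic phase: never throws; FAIL_FAST is downgraded to an ERROR alert. */
    void periodicCheck() {
        List<String> violations = check.get();
        if (violations.isEmpty()) {
            return;
        }
        violationCounter.incrementAndGet();
        String level = (mode == Mode.FAIL_FAST) ? "ERROR" : "WARN";
        alerts.add(level + ": durability weakened after boot: " + violations);
    }

    long violationCount() {
        return violationCounter.get();
    }

    List<String> alerts() {
        return alerts;
    }
}
```

In this sketch the same check feeds both phases; only the reaction differs, which keeps the fail-closed guarantee at startup while leaving live regressions to alerting.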