CI: mvn install runs Testcontainers IT but the runner job-container has no Docker → chronic red CI #20
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Problem
im2be-platform-libsCI is red on every PR. Themvn installjob(
.forgejo/workflows/ci.yml→ jobmaven-install, runner labeljava17) runsthe full Maven lifecycle, which includes the Failsafe integration-test phase.
OutboxPublisherITis a Testcontainers IT (spins apostgres:15-alpinecontainer + uses
@EmbeddedKafka). The Forgejo Actions runner executes the jobinside the
maven-node-ci:3.9-temurin17container, which has no access toa Docker daemon, so Testcontainers cannot start the PG container → the IT
errors →
mvn installfails → PR CI red.Evidence: PR #19 (and historically #15/#16/#18) all show CI red while the
identical
mvn install(incl. the IT) is green locally (Docker present).This is environmental (no Docker-in-job), not a code failure.
This blocks "green CI everywhere" and erodes the value of required-status
checks (a real failure would be indistinguishable from this chronic red).
Affected
affinity-intelligence-rework/im2be-platform-libs.forgejo/workflows/ci.ymljobsmaven-install(PR) andmaven-verify(main push — same Testcontainers dependency).forgejo-runner.service(labeljava17→docker://…/maven-node-ci:3.9-temurin17), managed by im2be-monoscripts/forgejo-runner-setup.sh+ci/forgejo-runner-maven-node/.Resolution options
A — give the runner job-container Docker access (preferred; keeps IT
coverage in CI). Mount the host Docker socket into the job container so
Testcontainers can launch sibling containers: add
/var/run/docker.sock:/var/run/docker.sockto the runner'scontainer.valid_volumes(and have the workflow request it / setoptions), plusTESTCONTAINERS_HOST_OVERRIDE/TESTCONTAINERS_RYUK_DISABLEDas needed. Runner-config change on the operator's laptop — operator-owned
(persistent infra; do not edit unilaterally — see im2be-mono session notes on
not mutating the working runner).
B — split ITs out of the PR job. Change
maven-installtomvn install -DskipITs; run ITs only in a Docker-capable stage. Loses CI-sideIT coverage on PRs (the IT still runs in the local dev loop). No runner change.
C — guard the IT to skip (not fail) when Docker is absent (fast "green now",
keeps coverage where Docker exists). Add a Testcontainers/JUnit assumption
(
assumeTrue(DockerClientFactory.instance().isDockerAvailable())or@EnabledIf…) so the IT is skipped on a Docker-less runner and runslocally + on a Docker-enabled runner. Code-only PR (no operator/runner change).
Best combined with A later for full CI coverage.
Recommendation
Ship C now to get CI green without losing local coverage, then do A so
the IT actually runs in CI. Owner of A = operator (runner config).
Acceptance
im2be-platform-libsPR CI (maven-install) and main-push (maven-verify)report success (not red) for a clean build.
recorded reason (option C) — never a silent red.
Refs: PR #19 (outbox byte[]-producer fix) merged green-locally but CI-red on
this exact condition. Memora #3640 (the fix) notes the CI signature.
Option C merged — CI is now GREEN (both jobs).
PR #21 (squash →
main c524f89) shipped option C plus the dependency-resolution wall behind it:@Testcontainers(disabledWithoutDocker = true)→ they SKIP (not error) on the Docker-less runner job-container.<repositories>in the parent pom soerror-event-publisherresolvesproto-observability:1.0.0(published only to forgejo-air; anonymous read = HTTP 200) on a clean runner without~/.m2/settings.xml.Runner result on
main:mvn install (push)— ✅ Successful in 1m36smvn verify (main only) (push)— ✅ Successful in 1m39sProven beforehand via a faithful bare-container repro (
maven-node-ci:3.9-temurin17, no Docker socket, no~/.m2) → BUILD SUCCESS, 8 modules, 44 IT methods skipped.Residual (this issue stays open for it): option A — mount the host Docker socket into the runner job-container (
container.valid_volumes += /var/run/docker.sock,TESTCONTAINERS_HOST_OVERRIDE/RYUK) so the ITs actually run in CI instead of skipping. That's a persistent runner-config change on the operator's laptop → operator-owned. Until then, IT coverage is via the local dev loop (where Docker is present).Resolved — option C shipped; option A declined as net-negative. Closing.
CI is green (PR #21): the 11 Testcontainers ITs skip cleanly when no Docker is reachable and run in full locally. After inspecting the runner, option A (mount the Docker socket so the ITs also run in CI) is not worth doing on this infra:
Steady state: skip-in-CI (green) + local IT coverage. If a Docker-capable CI runner ever exists (e.g. a dedicated host-exec label or a separate runner),
disabledWithoutDockermeans the ITs will automatically start running there with no code change — so reopening is cheap if the calculus changes.