fix(health): cancel/restart 202 (not 503) when only the publish ack times out #13
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "fix/cancel-restart-202-on-deadline"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Fixes a false-negative I hit live:
POST /cancel/:jobIdreturned 503 (cancel publish deadline exceeded) even though the engine subscriber received the message and aborted the job.Cause
Promise.race(redis.publish, deadline)doesn't cancel the losingredis.publish— so on a deadline the publish is already in flight and (cancel/restart are idempotent best-effort) may well have delivered. Reporting503 failedwas misleading.Fix
PublishDeadlineErrordistinguishes a deadline from a real Redis rejection: deadline →202 {status:"dispatched-unconfirmed"}(caller verifies via the queue), a genuine publish error stays 503. Applied symmetrically to/cancel+/restart.cancelPublishDeadlineMs/restartPublishDeadlineMs) — fast-testable + an ops tuning knob; the two existing deadline tests now assert 202 (no 2s waits), +1 restart real-rejection-503 test.Also backfills the CHANGELOG entry for the merged multi-origin OAuth (PR #11). full suite 2169.
Observed live cancelling a moot review: POST /cancel returned 503 ("cancel publish deadline exceeded") even though the engine subscriber received it and aborted the job. Promise.race(publish, deadline) doesn't cancel the losing redis.publish, so on a deadline the publish is in flight and (cancel/ restart being idempotent best-effort) may well have delivered — 503 was wrong. - New PublishDeadlineError distinguishes a deadline from a real Redis rejection: deadline → 202 {status:"dispatched-unconfirmed"} (caller verifies via queue); a genuine publish error stays 503. - Applied symmetrically to /cancel and /restart. - Publish deadlines made injectable (cancelPublishDeadlineMs / restartPublishDeadlineMs) — fast-testable + an ops tuning knob. The two existing deadline tests now assert 202 (fast, no 2s waits); +1 restart real-rejection-503 test. Also backfills the CHANGELOG entry for the merged multi-origin OAuth (PR #11). full suite 2169.Show previous round
hib-pr-reviewer review — PR #13 (hib/hib-pr-reviewer)
Round 1 — head
434be224faf3, basemain, triggeropenedTL;DR: CONDITIONAL_APPROVE — kept 1 agreed CHANGELOG finding (A+B) and 1 verified unique-to-B stale-comment finding; no blocking issues.
Summary
Reconciliation Summary
Verified both findings sets against HEAD.
Finding 1 (AGREED — A + B): CHANGELOG.md line 18 — confirmed. The line ends
…(+1 restart real-rejection-503 test).and then immediately continues withhandleCallback never wrote the bootstrapAdmin(provider,userId) key…— the full bootstrapAdmin bullet text folded in without any separator, erasing the standalone bootstrap-admin entry. Kept.Finding 2 (unique to B): src/health.ts lines 396–398 — confirmed. The inline comment still reads
// returns 503 instead of hanging the request foreverdespite the behavior having been changed to202 dispatched-unconfirmedviaPublishDeadlineError. The comment immediately precedes thePromise.racecall that now throwsPublishDeadlineErroron timeout; it was not updated. Kept.No findings dropped. Both issues are minor/documentation-class.
Blast Radius
The diff is tightly scoped to two HTTP endpoints (
POST /cancel/:jobIdandPOST /restart) insrc/health.tsand their test coverage. No shared state or exported interfaces are affected beyond the injectable deadline options onHealthServerOpts. Both issues found are documentation-class (CHANGELOG prose and an inline comment) with no runtime impact.BLAST_SCORE: 2/10
Risk Indicators
CI status (head
434be224faf3)Overall: ✓ success
3 checks: 3 pending
Findings (1)
[MINOR] CHANGELOG Fixed bullet corrupted — bootstrapAdmin fix text folded into cancel/restart entry
CHANGELOG.md:18
Line 18 is a single mega-bullet that begins with the cancel/restart 202 fix description and continues — without any separator — with the full text of the previously-standalone First-login admin self-bootstrap (audit #2, 2026-06-02) entry:
handleCallback never wrote the bootstrapAdmin(provider,userId) key, so isCallerAdmin always returned false → /admin/promote + /admin/demote were permanently unreachable…. That bootstrap text (which coverssrc/auth/{routes,redis-keys}.ts+tests/auth-routes.test.ts) is now invisible as a discrete changelog entry and is incorrectly attributed to this PR.Fix: split line 18 at the end of
(+1 restart real-rejection-503 test).and restore the bootstrap text as its own- **First-login admin self-bootstrap…**bullet under### Fixed.Verdict
CONDITIONAL_APPROVE
hib-pr-reviewer • round 1 • 1 finding (1m) • 2026-06-03T16:47:01.369Z → 2026-06-03T16:52:19.311Z • posted-as: pr-reviewer-bot • model: auto
hib-pr-reviewer review — PR #13 (hib/hib-pr-reviewer)
Round 2 — head
7119f1694f56, basemain, triggersynchronizeTL;DR: APPROVE — prior CHANGELOG finding fixed, implementation and test coverage are complete and correct.
Summary
[quorum-converged] A=0 = B=0. ## Round 2 review — prior finding resolved, no new issues
The round-1 finding (CHANGELOG.md:18 — bootstrapAdmin bullet folded into cancel/restart entry) is fixed: both bullets are now separate entries in
### Fixed(lines 18 and 19).The
health.tsimplementation is correct:PublishDeadlineErrorcleanly discriminates deadline-timeout from real Redis rejection.dispatched-unconfirmedon deadline and 503 on genuine error.cancelDeadlineTimer/restartDeadlineTimerare cleared in.finally()— no timer leak.cancelPublishDeadlineMs/restartPublishDeadlineMs) avoid 2 s waits in tests and serve as an ops knob.Test coverage is now symmetric:
No actionable findings this round.
Blast Radius
Changes are confined to the
POST /cancel/:jobIdandPOST /restarthandlers insidesrc/health.ts, an injectable options extension, and corresponding tests. No shared state, middleware, or external protocol contract is altered; the new 202 path is purely additive on top of the existing deadline branch.BLAST_SCORE: 2/10
Risk Indicators
CI status (head
7119f1694f56)Overall: ✓ success
3 checks: 3 pending
Findings
No new findings this round.
Quorum converged on empty findings (A + B both returned 0).
Verdict
APPROVE
hib-pr-reviewer • round 2 • 0 findings • 2026-06-03T16:56:34.926Z → 2026-06-03T16:58:28.609Z • posted-as: pr-reviewer-bot • model: auto