Production live ✅

/health 200, version 2.5.0. Image refreshed 2026-04-30 22:54 UTC to ghcr.io/htt-brands/control-tower@sha256:f762c98a… via run 25193020385.

v2.5.1 release-gate

Internal rehearsal verdict moved CONDITIONAL_PASS → PASS-pending-9lfn. Only 9lfn (SECRETS_OF_RECORD.md, Tyler-only) remains. Pillar 4 Infrastructure cleared; Pillar 8 Rollback now field-tested.

Auto-rollback field-tested

First real-world auto-rollback execution surfaced bd 1vui (GNU base64 line-wrap broke $GITHUB_OUTPUT); fix landed in 9ccd870; redeploy clean. Fail-closed safety property held throughout — prod was never mutated by the failed run.
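The failure class behind bd 1vui can be sketched in a few lines. GNU base64 wraps its output at 76 columns by default, so a long encoded value spills across several lines, while a $GITHUB_OUTPUT-style file expects one name=value pair per line. The sketch below is illustrative only: the payload, the output name `previous`, and the `parse_output` helper are hypothetical, the parser is a simplified stand-in for the runner's real one, and the actual change in 9ccd870 is not reproduced here.

```python
import base64


def to_output_line(name: str, payload: bytes) -> str:
    """Format one name=value line for a $GITHUB_OUTPUT-style file."""
    # b64encode never inserts newlines, so the value stays on one line.
    value = base64.b64encode(payload).decode("ascii")
    return f"{name}={value}\n"


def parse_output(text: str) -> dict:
    """Simplified stand-in parser: every non-empty line must be name=value."""
    result = {}
    for line in text.splitlines():
        if not line:
            continue
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed output line: {line!r}")
        result[name] = value
    return result


# Hypothetical payload, long enough to exceed one 76-column base64 line.
payload = b'{"image": "DOCKER|registry.example/app@sha256:0000"}' * 3

# encodebytes wraps every 76 characters, mimicking GNU base64's default.
wrapped = "previous=" + base64.encodebytes(payload).decode("ascii")

# The single-line form survives a write/parse round trip.
safe = to_output_line("previous", payload)
```

Parsing `wrapped` raises on the first continuation line, whereas `safe` round-trips back to the original bytes. On the shell side, the usual way to get the single-line form from GNU base64 is `base64 -w 0`.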

Current work queue

What is claimable vs blocked right now

Tyler-only (P1)

9lfn: fill SECRETS_OF_RECORD.md with non-secret pointers, owners, access, and rotation metadata. ~30 min. The last v2.5.1 gate condition.

Scheduled (P2)

uchp: execute Q3 2026 quarterly DR test cycle (PITR + redeploy + Key Vault recover). Due 2026-07-31. Will absorb Dustin Boyd's first hands-on tabletop.

In progress (P2)

xzt4: Bicep drift reconciliation. All 12 child tasks closed; staging Bicep recovered + hardened. Production Bicep apply intentionally deferred — Tyler-gated.

Coordinated cutover (P3)

l96f: JWT issuer rotation. Phase 1 shipped (88d7cf1, auth accepts both azure-governance-platform and control-tower issuers in transition mode). Phase 2 (drop old issuer) needs coordinated cutover window.
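The two phases of l96f reduce to a change in the set of accepted `iss` values. A minimal sketch, assuming a boolean `transition_mode` switch (a hypothetical name; the real configuration surface from 88d7cf1 is not shown here) and covering only the issuer check, not signature or expiry validation:

```python
OLD_ISSUER = "azure-governance-platform"
NEW_ISSUER = "control-tower"


def accepted_issuers(transition_mode: bool) -> frozenset:
    """Return the issuers the auth layer should accept in the given phase."""
    if transition_mode:
        # Phase 1: tokens minted under either name are still valid.
        return frozenset({OLD_ISSUER, NEW_ISSUER})
    # Phase 2: only the new issuer remains.
    return frozenset({NEW_ISSUER})


def issuer_allowed(claims: dict, transition_mode: bool) -> bool:
    """Check the 'iss' claim of an already signature-verified token."""
    return claims.get("iss") in accepted_issuers(transition_mode)
```

Under this model, Phase 2 is the moment `transition_mode` flips to false, which is why it needs a coordinated window: any client still presenting an old-issuer token is rejected from then on.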

Deferred (re-enters bd ready on trigger date)

rtwi: stop domain-intelligence App Service / pause PG if zero-traffic at 60-day mark (~2026-05-17).

m4xw: automate quarterly audit-log archive to Azure Blob Archive tier (trigger 2026-07-01).

Recently closed (May 1–3, 2026)

t88h Azure SQL backup workflow fix · wnyx production-backup environment routing · rxki Bicep drift role definition documented · mp9y stale Bicep parameters removed · k1q7 drift label seeding · 6vrk Bicep drift RG what-if RBAC unblocked · xzt4.1–xzt4.12 all closed · staging Bicep apply recovered (228923d + 6b2a8c7).

Bus-factor and rollback authority

Who can recover production when

Authorized humans

Tyler Granlund (lead engineer) and Dustin Boyd (second rollback human). Both hold required-reviewer status on the production environment. Provisioned 2026-04-30 (bd 213e closed). Bus-factor moved 1→2.

Auto-rollback

Active in deploy-production.yml. Captures the previous-good linuxFxVersion before deploying; if the image swap fails its post-deploy health gate, the workflow fails closed and restores the prior digest. Field-tested 2026-04-30: bd 1vui regression discovered + fixed; safety property held (prod un-mutated).
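The control flow described above can be sketched as a capture, swap, gate, restore sequence. This is a simplified model and not the workflow's actual implementation: the function and parameter names are hypothetical, and refusing to deploy when the capture step yields nothing is an assumed reading of "fail-closed".

```python
from typing import Callable


class DeployFailed(Exception):
    """Raised when the deploy is refused or rolled back."""


def deploy_with_rollback(
    get_current: Callable[[], str],
    set_image: Callable[[str], None],
    healthy: Callable[[], bool],
    new_image: str,
) -> str:
    """Swap to new_image; restore the previous value if the health gate fails."""
    # 1. Capture the previous-good value before touching anything.
    previous = get_current()
    if not previous:
        # Fail closed: with nothing to roll back to, do not mutate prod.
        raise DeployFailed("no previous-good value captured; refusing to deploy")

    # 2. Perform the image swap.
    set_image(new_image)

    # 3. Post-deploy health gate.
    if healthy():
        return new_image

    # 4. Gate failed: restore the captured value, then surface the failure.
    set_image(previous)
    raise DeployFailed(f"health gate failed; restored {previous}")
```

Passing the three operations in as callables keeps the sequence testable without a live App Service; in a real pipeline they would wrap whatever CLI or API calls read and write linuxFxVersion and probe /health.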

Machine-verifiable waiver state

See docs/release-gate/rollback-current-state.yaml. waiver.status: resolved, current_authorized_humans: [Tyler, Dustin], requires_min_authorized_humans: 2.
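A check over that state might look like the sketch below. It operates on an already-parsed mapping, since the Python standard library has no YAML parser, and it assumes the three fields sit at the top level of the file with `status` nested under `waiver`; the real layout of rollback-current-state.yaml may differ. The function name is hypothetical.

```python
def waiver_satisfied(state: dict) -> bool:
    """Return True when the waiver is resolved and enough humans are authorized."""
    waiver = state.get("waiver") or {}
    humans = state.get("current_authorized_humans") or []
    required = state.get("requires_min_authorized_humans")

    # Treat a missing or non-integer threshold as a failed check.
    if not isinstance(required, int):
        return False

    # Count distinct names so a duplicated entry cannot satisfy the minimum.
    return waiver.get("status") == "resolved" and len(set(humans)) >= required


# Values as quoted above, in the assumed layout.
current_state = {
    "waiver": {"status": "resolved"},
    "current_authorized_humans": ["Tyler", "Dustin"],
    "requires_min_authorized_humans": 2,
}
```

With the values quoted above the check passes; dropping either name, or leaving the waiver in any status other than `resolved`, makes it fail.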

First scheduled DR exercise

bd uchp — Q3 2026 quarterly DR test cycle, due 2026-07-31. Will exercise PITR restore, image redeploy, Key Vault soft-delete recovery, and Dustin's formal tabletop.

Backup / RPO state

Schema-only validation green; bd jzpa closed

Schema-only database backup workflow (backup.yml) is fully green end-to-end on both environments after a multi-day diagnostic series in late April 2026 (OIDC permission fix in bd 3flq; runner ODBC + SQL tooling gaps; ephemeral AZURE_STORAGE_KEY path after RBAC AuthorizationPermissionMismatch; --yes flag removal on the firewall cleanup). Final passing runs: staging 25169438794, production 25171354807. No temporary GitHubActions-* SQL firewall rules left behind. Long-form BACPAC validation (cz89) remains operationally blocked by Azure SQL Free edition's lack of ImportExport — see the BACPAC validation decision doc below.

Continuity documents

Canonical places to update — no treasure maps hidden under the carpet