Production live ✅
/health 200, version 2.5.0. Image refreshed 2026-04-30 22:54 UTC to ghcr.io/htt-brands/control-tower@sha256:f762c98a… via run 25193020385.
v2.5.1 release-gate
Internal rehearsal verdict moved CONDITIONAL_PASS → PASS-pending-9lfn. The only remaining condition is the Tyler-only task 9lfn (SECRETS_OF_RECORD.md). Pillar 4 Infrastructure cleared; Pillar 8 Rollback now field-tested.
Auto-rollback field-tested
First real-world auto-rollback execution surfaced bd 1vui (GNU base64 line-wrap broke $GITHUB_OUTPUT); fix landed in 9ccd870; redeploy clean. Fail-closed safety property held throughout — prod was never mutated by the failed run.
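The failure mode behind bd 1vui can be reproduced in a few lines. GNU base64 wraps its output at 76 columns by default, so a long encoded value spans several lines, while a plain name=value entry in $GITHUB_OUTPUT has to fit on one line. The Python sketch below is an illustration only, not the fix that landed in 9ccd870: encodebytes stands in for the wrapped output, and output_line is a hypothetical helper that refuses multi-line values.

```python
import base64


def wrapped_b64(data: bytes) -> str:
    """Encode with a newline after every 76 output characters.

    This mirrors the default line wrapping of GNU base64.
    """
    return base64.encodebytes(data).decode("ascii")


def unwrapped_b64(data: bytes) -> str:
    """Encode as a single line with no embedded newlines."""
    return base64.b64encode(data).decode("ascii")


def output_line(name: str, value: str) -> str:
    """Build one name=value line (hypothetical helper).

    Fails closed on multi-line values instead of emitting a
    truncated or corrupted output entry.
    """
    if "\n" in value or "\r" in value:
        raise ValueError(f"{name}: value spans multiple lines")
    return f"{name}={value}\n"
```

For a 100-byte payload, wrapped_b64 returns two lines and output_line rejects it, while unwrapped_b64 yields a single 136-character line. In shell, GNU base64 accepts -w 0 to disable wrapping; whether 9ccd870 took that route is not stated here.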
Current work queue
What is claimable vs blocked right now
Tyler-only (P1)
9lfn: fill SECRETS_OF_RECORD.md with non-secret pointers, owners, access, and rotation metadata. ~30 min. The last v2.5.1 gate condition.
Scheduled (P2)
uchp: execute Q3 2026 quarterly DR test cycle (PITR + redeploy + Key Vault recover). Due 2026-07-31. Will absorb Dustin Boyd's first hands-on tabletop.
In progress (P2)
xzt4: Bicep drift reconciliation. All 12 child tasks closed; staging Bicep recovered + hardened. Production Bicep apply intentionally deferred — Tyler-gated.
Coordinated cutover (P3)
l96f: JWT issuer rotation. Phase 1 shipped (88d7cf1, auth accepts both azure-governance-platform and control-tower issuers in transition mode). Phase 2 (drop old issuer) needs coordinated cutover window.
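The two phases of l96f can be sketched as a single toggle on the accepted-issuer set. This is a minimal Python illustration, not the code shipped in 88d7cf1: the function names and the transition_mode flag are hypothetical, and it assumes azure-governance-platform is the old issuer that Phase 2 drops.

```python
# Assumption: azure-governance-platform is the issuer Phase 2 removes.
OLD_ISSUER = "azure-governance-platform"
NEW_ISSUER = "control-tower"


def accepted_issuers(transition_mode: bool) -> frozenset:
    """Return the issuers the auth layer should trust.

    Phase 1 (transition_mode=True) trusts both issuers;
    Phase 2 (transition_mode=False) trusts only the new one.
    """
    if transition_mode:
        return frozenset({OLD_ISSUER, NEW_ISSUER})
    return frozenset({NEW_ISSUER})


def issuer_allowed(claims: dict, transition_mode: bool) -> bool:
    """Check only the iss claim; signature and expiry checks are out of scope."""
    return claims.get("iss") in accepted_issuers(transition_mode)
```

Flipping the flag is the whole of Phase 2 in this sketch, which is why the cutover needs coordination: any token still minted with the old issuer is rejected from that moment on.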
Deferred (re-enters bd ready on trigger date)
rtwi: stop domain-intelligence App Service / pause PG if zero-traffic at 60-day mark (~2026-05-17).
m4xw: automate quarterly audit-log archive to Azure Blob Archive tier (trigger 2026-07-01).
Recently closed (May 1–3, 2026)
t88h Azure SQL backup workflow fix · wnyx production-backup environment routing · rxki Bicep drift role definition documented · mp9y stale Bicep parameters removed · k1q7 drift label seeding · 6vrk Bicep drift RG what-if RBAC unblocked · xzt4.1–xzt4.12 all closed · staging Bicep apply recovered (228923d + 6b2a8c7).
Bus-factor and rollback authority
Who can recover production when
Authorized humans
Tyler Granlund (lead engineer) and Dustin Boyd (second rollback human). Both hold required-reviewer status on the production environment. Provisioned 2026-04-30 (bd 213e closed). Bus-factor moved 1→2.
Auto-rollback
Active in deploy-production.yml. Captures the previous-good linuxFxVersion before deploying; if the image swap fails its post-deploy health gate, the workflow fails closed and restores the prior digest. Field-tested 2026-04-30: bd 1vui regression discovered and fixed; safety property held (prod un-mutated).
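The control flow described above can be sketched in Python. This is an assumption-laden illustration, not the contents of deploy-production.yml: the three callables are hypothetical stand-ins for the capture, image-swap, and health-gate steps, and the function name is invented for the example.

```python
from typing import Callable


def deploy_with_rollback(
    get_current_image: Callable[[], str],
    set_image: Callable[[str], None],
    is_healthy: Callable[[], bool],
    new_image: str,
) -> str:
    """Deploy new_image and return the image left running.

    Fail-closed: if the previous-good image cannot be captured,
    raise before anything in production is changed.
    """
    previous = get_current_image()
    if not previous:
        raise RuntimeError("previous-good image not captured; refusing to deploy")

    set_image(new_image)
    if is_healthy():
        return new_image

    # Health gate failed: restore the image captured before the swap.
    set_image(previous)
    return previous
```

The ordering is the point of the design: capture happens before any mutation, so a failure in the capture step leaves production untouched.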
Machine-verifiable waiver state
See docs/release-gate/rollback-current-state.yaml. waiver.status: resolved, current_authorized_humans: [Tyler, Dustin], requires_min_authorized_humans: 2.
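A check against that file could look like the Python sketch below. It assumes the YAML has already been parsed into a dict by whatever parser the repo uses, and that current_authorized_humans and requires_min_authorized_humans sit at the top level next to waiver; the real nesting may differ. The function name waiver_satisfied is hypothetical.

```python
def waiver_satisfied(state: dict) -> bool:
    """Return True when the rollback waiver state meets its own minimum.

    Assumed layout (may differ from the real file):
      waiver: {status: resolved}
      current_authorized_humans: [...]
      requires_min_authorized_humans: 2
    """
    waiver = state.get("waiver") or {}
    humans = state.get("current_authorized_humans") or []
    required = state.get("requires_min_authorized_humans")

    # Fail closed on a missing or malformed minimum.
    if not isinstance(required, int):
        return False
    if waiver.get("status") != "resolved":
        return False
    # Count distinct names so a duplicated entry cannot satisfy the minimum.
    return len(set(humans)) >= required
```

With the values quoted above (status resolved, two humans, minimum of two) the check passes; removing either human makes it fail.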
First scheduled DR exercise
bd uchp — Q3 2026 quarterly DR test cycle, due 2026-07-31. Will exercise PITR restore, image redeploy, Key Vault soft-delete recovery, and Dustin's formal tabletop.
Backup / RPO state
Schema-only validation green; bd jzpa closed
Schema-only database backup workflow (backup.yml) is fully green end-to-end on both environments after a multi-day diagnostic series in late April 2026. The series covered: the OIDC permission fix in bd 3flq; runner ODBC and SQL tooling gaps; the ephemeral AZURE_STORAGE_KEY path adopted after RBAC AuthorizationPermissionMismatch; and removal of the --yes flag on the firewall cleanup. Final passing runs: staging 25169438794, production 25171354807. No temporary GitHubActions-* SQL firewall rules were left behind. Long-form BACPAC validation (cz89) remains operationally blocked by Azure SQL Free edition's lack of ImportExport; see the BACPAC validation decision doc below.
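The "no temporary rules left behind" condition is easy to assert mechanically once the firewall rule names are in hand. The Python sketch below assumes the names have already been listed by some other step; the helper name and the sample rule names in the usage note are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def leftover_temp_rules(
    rule_names: Iterable[str], pattern: str = "GitHubActions-*"
) -> List[str]:
    """Return firewall rule names that match the temporary-rule pattern.

    Matching is case-sensitive; an empty result means the cleanup
    step removed every temporary rule.
    """
    return [name for name in rule_names if fnmatchcase(name, pattern)]
```

For example, leftover_temp_rules(["AllowAzureServices", "GitHubActions-123"]) returns ["GitHubActions-123"], and a post-run check would fail on any non-empty result.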
Continuity documents
Canonical places to update — no treasure maps hidden under the carpet
SECRETS_OF_RECORD.md (9lfn): Tyler-only authorship of credential pointers, owners, access, and rotation metadata.
BACPAC validation decision doc (cz89).