Skip to content

Rollback Plans

First PublishedByAtif Alam

A rollback plan is a short, written plan for a specific change or release: what you will revert, under what conditions, how, and how you will know it worked.

It is not the same document as an incident runbook—runbooks describe how to respond to symptoms (see SRE runbook template); rollback plans describe up-front intent for a deploy or change window so responders are not inventing recovery under pressure.

Mechanisms (redeploy, traffic switch, flags, database difficulty) and rollout shapes are explained on their own pages—Feature flags and rollback, Progressive delivery.

This page only answers: what belongs in the rollback plan artifact and how it connects to risk and runbooks.

Use judgment: high blast radius, hard reversibility, coordinated releases, or org policy. Change risk and deployment SLOs treats reversibility as a risk factor—if rollback is non-trivial, the plan should be explicit before you ship.

Triggers

  • SLI or SLO Gates — e.g. error rate or latency vs baseline for N minutes after cutover; tie to alerting and error budgets where you use them.
  • Manual Trigger — Who can call rollback (on-call, release manager) if observation or judgment says stop.

Scope

  • What Gets Reverted — Binary or container image id, config revision, traffic weight, feature flag keys, or IaC commit—specific enough that execution is unambiguous.

Chosen Mechanism (by Reference, Not by Copying)

  • State which pattern applies for this release—for example traffic switch vs redeploy vs disabling a flag—and point to the relevant section of Feature flags and rollback or Progressive delivery instead of pasting procedure here. The canonical tables and per-strategy notes stay on those pages.

Preflight

  • Artifacts — Previous build or image retained; flag or LB state known.
  • Reversibility — Confirm the change is still rollback-safe per change risk (backward-compatible deploys, expand–contract schema work, etc.).

Execution and Verification

  • Who Runs Rollback — Role or runbook link if automation owns it.
  • Success Criteria After Rollback — Metrics or checks that show you are back to a known-good state.
  • Escalation — If rollback fails or is incomplete, where to go next (on-call, vendor, incident commander).

Communication

Some changes cannot be undone by flipping traffic or redeploying alone—schema and data paths need design-time safety. Point engineers to Schema migrations and data safety and keep the rollback plan honest: mitigation (feature off, read-only mode) may replace a full revert.

Link to mechanisms; do not duplicate them. One table of strategies in Feature flags and rollback is enough. The rollback plan should name your trigger, your scope, and your verification—plus pointers for everything else.