Change Risk and Deployment SLOs
Every change is a potential incident. The question is not “will this change cause a problem?” but “how likely is it, and how bad would it be?” Change risk management is about answering that question before you deploy, and about having policies that reduce risk when the stakes are high.
Change Risk Assessment
Not all changes carry the same risk.
A typo fix in a log message is not the same as a database schema migration.
Factors that increase risk:
- Blast radius — How many users, services, or systems are affected if this goes wrong? A change to a shared library or a core database has a larger blast radius than a change to a single endpoint.
- Reversibility — Can you roll back? Code deploys are usually reversible. Database migrations, data deletions, and third-party API changes often are not.
- Novelty — Is this a well-understood change (e.g. a dependency bump you’ve done before) or something new (e.g. a new replication topology)?
- Coupling — Does this change require coordinated changes across multiple services or teams?
- Timing — Deploying during peak traffic or right before a holiday increases the cost of failure.
Risk categories:
| Risk Level | Characteristics | Example |
|---|---|---|
| Low | Small blast radius, easily reversible, well-understood | Config change, minor UI fix |
| Medium | Moderate blast radius, reversible with effort | New API endpoint, feature flag rollout |
| High | Large blast radius, hard to reverse, or novel | Schema migration, core service rewrite, data pipeline change |
| Critical | Affects all users, irreversible, or crosses compliance boundaries | Payment system change, auth infrastructure change, data deletion |
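A table like this can back a lightweight scoring heuristic. The sketch below is illustrative only: the factor weights and level thresholds are assumptions, not a standard, and any real rubric should be calibrated against your own incident history.

```python
# Hypothetical risk-scoring sketch for the factors above.
# Weights and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Change:
    blast_radius: int   # 1 (single endpoint) .. 5 (all users)
    reversible: bool    # can it be rolled back?
    novel: bool         # first time doing this kind of change?
    coupled: bool       # requires coordinated multi-service changes?
    peak_window: bool   # peak traffic or right before a holiday?

def risk_level(c: Change) -> str:
    score = c.blast_radius
    score += 0 if c.reversible else 3   # irreversibility weighs heavily
    score += 2 if c.novel else 0
    score += 1 if c.coupled else 0
    score += 1 if c.peak_window else 0
    if score <= 2:
        return "low"
    if score <= 5:
        return "medium"
    if score <= 9:
        return "high"
    return "critical"
```

For example, a well-understood config change with a tiny blast radius scores “low”, while an irreversible, novel schema migration touching a core database lands in “high” or “critical”.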
Deployment Impact on SLOs
Deployments consume error budget. A bad deploy that causes elevated errors for 10 minutes eats into your monthly availability SLO.
Before and After Every Deploy: A Checklist
- Before deploying, check your current error budget burn rate. If the budget is already low, the cost of a failed deploy is higher.
- After deploying, monitor SLIs (error rate, latency) and compare against pre-deploy baselines.
- Set a policy: e.g. “if error budget is below 20%, only low-risk changes may deploy without additional approval.”
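A policy like the 20% example above can be expressed as a simple gate. A minimal sketch, assuming request-based SLIs; the helper names and the threshold are illustrative, not a specific tool's API:

```python
# Hypothetical error-budget gate for deploys (names and threshold are assumptions).

def error_budget_remaining(slo: float, good: int, total: int) -> float:
    """Fraction of the error budget left in the current window."""
    allowed_bad = (1 - slo) * total   # budget, in requests
    actual_bad = total - good
    if allowed_bad == 0:
        return 0.0
    return max(0.0, 1 - actual_bad / allowed_bad)

def may_deploy(risk: str, budget_remaining: float, threshold: float = 0.20) -> bool:
    """Below the threshold, only low-risk changes deploy without approval."""
    if budget_remaining >= threshold:
        return True
    return risk == "low"
```

With a 99.9% SLO over 1,000,000 requests, the budget is 1,000 bad requests; 700 bad requests so far leaves 30% of the budget, so any risk level may deploy. At 10% remaining, only low-risk changes pass the gate.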
Deployment SLOs are targets for the deployment process itself:
- Deploy frequency — How often can you deploy? More frequent = smaller changes = lower risk per deploy.
- Deploy lead time — How long from commit to production? Shorter = faster feedback = faster fixes.
- Deploy failure rate — What percentage of deploys cause a rollback or incident? Track and trend this.
- MTTR for bad deploys — How long to detect and roll back a bad deploy? This is your deployment recovery time.
These correspond to the DORA metrics: deployment frequency, lead time for changes, change failure rate, and time to restore service (MTTR).
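Given a log of deploys, these indicators are straightforward to compute. A sketch, assuming a made-up record format of (commit time, deploy time, failed?, minutes to restore); real data would come from your CI/CD system and incident tracker:

```python
from datetime import datetime, timedelta

# Assumed record format: (commit_time, deploy_time, failed, minutes_to_restore)
deploys = [
    (datetime(2024, 6, 3, 10), datetime(2024, 6, 3, 12), False, 0),
    (datetime(2024, 6, 4, 9),  datetime(2024, 6, 4, 10), True,  25),
    (datetime(2024, 6, 5, 14), datetime(2024, 6, 5, 15), False, 0),
    (datetime(2024, 6, 6, 11), datetime(2024, 6, 6, 13), True,  35),
]

def change_failure_rate(log) -> float:
    """Fraction of deploys that caused a rollback or incident."""
    return sum(1 for d in log if d[2]) / len(log)

def mean_lead_time(log) -> timedelta:
    """Average time from commit to production."""
    return sum((d[1] - d[0] for d in log), timedelta()) / len(log)

def mttr_minutes(log) -> float:
    """Average time to restore service after a bad deploy."""
    bad = [d[3] for d in log if d[2]]
    return sum(bad) / len(bad)
```

The point is less the arithmetic than the habit: track these over time and watch the trend, not a single snapshot.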
Change Management
Change management is broader than deployment.
It covers any change that could affect production: code, config, infrastructure, database, network, third-party integrations.
Practices:
- Change advisory / review — For high-risk or critical changes, require a review with relevant stakeholders before proceeding. This doesn’t have to be a formal board; a short sync with the on-call engineer and a tech lead may be enough.
- Change windows — Some organizations define preferred deployment windows (e.g. “deploy during business hours when the team is available to respond, not on Friday afternoon”). The goal is to have people available if something goes wrong.
- Change freeze — Periods where non-emergency changes are not allowed (e.g. during a major sales event, holiday season, or active incident). Define what qualifies as an exception.
- Change log — Maintain a record of what changed, when, and by whom. Correlate with incidents to identify patterns (e.g. “most incidents follow config changes on Thursdays”).
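The “what changed recently?” correlation can start as a simple query over the change log. A sketch with an assumed one-hour lookback window and illustrative entries:

```python
from datetime import datetime, timedelta

# Hypothetical change-log entries: (change_id, change_type, timestamp).
changes = [
    ("cfg-142", "config", datetime(2024, 6, 6, 13, 0)),
    ("dep-901", "deploy", datetime(2024, 6, 6, 13, 40)),
    ("db-017",  "schema", datetime(2024, 6, 5, 9, 0)),
]

def changes_before(incident_start: datetime,
                   window: timedelta = timedelta(hours=1)) -> list[str]:
    """Return change IDs that landed within `window` before the incident."""
    return [cid for cid, _, ts in changes
            if timedelta(0) <= incident_start - ts <= window]
```

An incident starting at 14:00 on June 6 would surface the config change and the deploy from the preceding hour as the first suspects, while the day-old schema change falls outside the window.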
Risk Reduction Strategies
- Deploy small, deploy often. Smaller changes are easier to understand, review, test, and roll back. Batching changes increases risk.
- Use progressive delivery. Canary and traffic shifting limit blast radius by exposing changes to a fraction of traffic first.
- Use feature flags. Decouple deploy from release so you can disable a feature without rolling back the deploy.
- Automate rollback. If SLIs breach a threshold, roll back automatically. Don’t rely on a human noticing and acting at 3 AM.
- Test in production-like environments. The closer your staging environment matches production, the fewer surprises in prod. See CI/CD for Applications.
- Backward compatibility. Design changes so the old and new versions can coexist. This makes rollback safe and reduces the risk of coordinated deployments.
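The automated-rollback decision above can be as simple as comparing post-deploy SLIs against the pre-deploy baseline. A sketch; the hard ceiling and relative factor are assumptions to tune against your own SLOs, not recommended values:

```python
# Hypothetical rollback decision: trip on either an absolute error-rate
# ceiling or a large regression relative to the pre-deploy baseline.

def should_rollback(baseline_error_rate: float,
                    current_error_rate: float,
                    absolute_ceiling: float = 0.05,
                    relative_factor: float = 2.0) -> bool:
    """Roll back if errors exceed a hard ceiling or double the baseline."""
    if current_error_rate > absolute_ceiling:
        return True
    return current_error_rate > baseline_error_rate * relative_factor
```

Using both conditions matters: a relative check alone misses a service that was already unhealthy before the deploy, and an absolute check alone misses a tenfold regression that still sits under the ceiling.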
See Also
- Error Budgets — How error budget burn rate can gate deployments.
- Incident Lifecycle — Phase 3 (Triage & Mitigation) often starts with “what changed recently?”
- Alerting — SLO-based alerting can detect deploy-related degradation quickly.