Feature Flags and Rollback
A deploy puts new code on production servers. A release exposes that code to users. These don’t have to be the same event.
Feature flags let you deploy code without releasing it.
Rollback automation lets you undo a bad release in seconds instead of minutes or hours. Together, they give you a safety net that makes deploying less risky.
Feature Flags
A feature flag (or feature toggle) is a conditional check in code that controls whether a feature is active.
The flag value is managed externally—a config service, a database, or a feature flag platform—so you can turn features on or off without redeploying.
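A minimal sketch of such a conditional check follows. The names `is_enabled`, `checkout`, and the `new_checkout` flag are hypothetical, and the in-memory dictionary only stands in for whatever external store actually holds the flag values.

```python
from typing import Mapping


def is_enabled(flags: Mapping[str, bool], name: str, default: bool = False) -> bool:
    """Return the flag's current value, falling back to a safe default."""
    return flags.get(name, default)


def checkout(flags: Mapping[str, bool]) -> str:
    # Both code paths are deployed; the flag decides which one runs.
    if is_enabled(flags, "new_checkout"):
        return "new checkout flow"
    return "legacy checkout flow"


# Stand-in for an external store (config service, database, flag platform).
flag_store = {"new_checkout": False}

print(checkout(flag_store))        # flag off: legacy path
flag_store["new_checkout"] = True  # flipped externally, no redeploy
print(checkout(flag_store))        # flag on: new path
```

Defaulting an unknown flag to off means a missing or unreachable flag value falls back to the existing behavior, which is usually the safer failure mode.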
Common uses:
- Gradual rollout — Enable for 1% of users, then 5%, then 50%, then 100%. Similar to canary, but at the feature level rather than the deployment level.
- Dark launch — Deploy a new code path and execute it in production, but don’t show results to users. Useful for validating performance or correctness under real load.
- Kill switch — If a feature causes problems, disable it instantly without a rollback deploy.
- A/B testing — Show different experiences to different user segments and measure outcomes.
- Operational flags — Control system behavior: enable/disable a cache layer, switch between data sources, toggle a rate limiter.
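One common way to implement a gradual rollout is to hash each user into a stable bucket and enable the feature for buckets below the current percentage. The sketch below assumes string user IDs; `bucket` and `in_rollout` are illustrative names, not the API of any particular flag platform.

```python
import hashlib


def bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def in_rollout(flag_name: str, user_id: str, percent: int) -> bool:
    """True if the user falls inside the first `percent` buckets."""
    return bucket(flag_name, user_id) < percent


# Raising `percent` from 1 to 5 to 50 to 100 only ever adds users:
# anyone enabled at 5% stays enabled at 50%.
for percent in (1, 5, 50, 100):
    enabled = sum(in_rollout("new_search", f"user-{i}", percent) for i in range(10_000))
    print(percent, enabled)
```

Including the flag name in the hash input keeps the same users from always landing in the earliest cohort of every rollout.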
Flag Lifecycle
Flags should not live forever. A long-lived, forgotten flag becomes tech debt and a source of unexpected behavior.
- Short-lived (release flags) — Used during rollout. Remove once the feature is fully launched and stable (typically days to weeks).
- Long-lived (operational flags) — Kill switches, A/B tests, or config-driven behavior. Review periodically; document why each exists.
- Cleanup — Track flag age. Set a policy: e.g. “release flags must be removed within 30 days of full rollout.” Stale flags increase code complexity and risk.
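A cleanup policy like the one above can be checked mechanically. The sketch below assumes a hypothetical flag registry that records each flag's kind and the date it reached full rollout; the `Flag` fields and the 30-day default mirror the example policy and are not a standard schema.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List, Optional


@dataclass
class Flag:
    name: str
    kind: str  # "release" or "operational"
    fully_rolled_out_on: Optional[date] = None  # None while rollout is in progress


def stale_release_flags(
    flags: Iterable[Flag],
    today: date,
    max_age: timedelta = timedelta(days=30),
) -> List[str]:
    """Names of release flags still present more than `max_age` after full rollout."""
    return [
        f.name
        for f in flags
        if f.kind == "release"
        and f.fully_rolled_out_on is not None
        and today - f.fully_rolled_out_on > max_age
    ]


registry = [
    Flag("new_checkout", "release", date(2024, 1, 15)),
    Flag("new_search", "release", date(2024, 2, 20)),
    Flag("cache_kill_switch", "operational", date(2023, 6, 1)),
    Flag("beta_profile", "release"),  # still rolling out
]
print(stale_release_flags(registry, today=date(2024, 3, 1)))
```

Running a check like this in CI or on a schedule turns the policy into a reminder rather than something people have to remember.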
Rollback
Rollback means reverting to the previous known-good state. The faster you can roll back, the shorter your incidents.
Rollback strategies:
| Strategy | How It Works | Speed |
|---|---|---|
| Redeploy previous version | Deploy the last known-good artifact again (rebuild only if it is no longer available) | Minutes (depends on pipeline) |
| Traffic switch (blue/green) | Point traffic back to the previous environment | Seconds |
| Feature flag disable | Turn off the flag; code is still deployed but feature is inactive | Seconds |
| Database rollback | Revert a migration; much harder and riskier than code rollback | Minutes to hours |
Best practices:
- Keep the previous artifact available. Don’t overwrite or garbage-collect the last successful build immediately.
- Test rollback regularly. A rollback that’s never been tested may not work when you need it. Include rollback in your game-day drills.
- Automate the decision when possible. If SLIs breach a threshold after deploy, trigger rollback automatically. See Error Budgets for how error budget burn rate can drive this.
- Make changes backward compatible. Design database and API changes so the previous version can still run alongside or after the new version. This is what makes rolling back the code safe.
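The automated decision described in the best practices above can be sketched as a simple threshold check on an SLI. This example assumes the SLI is an error rate sampled at regular intervals after the deploy; `should_roll_back`, `watch_deploy`, the threshold, and the `roll_back` callback are all hypothetical, and a real setup would read samples from a monitoring system and call the deploy tooling.

```python
from typing import Callable, Iterable


def should_roll_back(
    error_rates: Iterable[float],
    threshold: float,
    min_breaches: int = 3,
) -> bool:
    """True once `min_breaches` consecutive samples exceed the threshold."""
    consecutive = 0
    for rate in error_rates:
        # A healthy sample resets the streak, so a single spike is ignored.
        consecutive = consecutive + 1 if rate > threshold else 0
        if consecutive >= min_breaches:
            return True
    return False


def watch_deploy(
    error_rates: Iterable[float],
    threshold: float,
    roll_back: Callable[[], None],
) -> bool:
    """Trigger `roll_back` if the post-deploy samples breach the threshold."""
    if should_roll_back(error_rates, threshold):
        roll_back()
        return True
    return False


samples = [0.01, 0.06, 0.07, 0.08]  # error rate per interval after deploy
watch_deploy(samples, threshold=0.05, roll_back=lambda: print("rolling back"))
```

Requiring several consecutive breaches is one way to avoid rolling back on noise; a burn-rate rule over the error budget is an alternative trigger.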
Dark Launches
A dark launch deploys new functionality and runs it in production but hides the results from users.
The new code path executes alongside the old one; you compare outputs, measure performance, and validate correctness before exposing it.
- When to use — Risky changes (new algorithms, new data pipelines, new integrations) where you want real-traffic validation without user impact.
- How — Feature flag routes traffic to the new path. Results are logged or compared but not returned to the user. Monitor latency, error rate, and resource consumption of the new path.
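A rough sketch of the mechanism described above: the old path always serves the response, and the new path runs in its shadow when the flag is on. The function name `dark_launch` and the logger name are illustrative assumptions, and the comparison here is plain equality, which may be too strict for some outputs.

```python
import logging
import time
from typing import Any, Callable

logger = logging.getLogger("dark_launch")


def dark_launch(
    old_path: Callable[[Any], Any],
    new_path: Callable[[Any], Any],
    request: Any,
    enabled: bool,
) -> Any:
    """Serve the old path's result; run the new path only for comparison."""
    result = old_path(request)
    if not enabled:
        return result
    try:
        start = time.perf_counter()
        shadow = new_path(request)
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("new path took %.2f ms", elapsed_ms)
        if shadow != result:
            logger.warning("mismatch for %r: old=%r new=%r", request, result, shadow)
    except Exception:
        # A failure in the new path must never reach the user.
        logger.exception("new path failed for %r", request)
    return result
```

This version runs the new path synchronously, so it adds latency to the request. Running the shadow call asynchronously or on a sample of traffic is a common adjustment, and a new path with side effects (writes, emails, charges) needs them disabled or redirected during the dark launch.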
See Also
- Progressive Delivery — Canary, blue/green, and rolling strategies for deploying gradually.
- Change Risk and Deployment SLOs — How to assess whether a change is safe to deploy.
- Runbooks and Playbooks — Document your rollback procedures so on-call can execute them under pressure.