
Progressive Delivery

First published by Atif Alam

Progressive delivery means rolling out changes to a small subset of users or traffic first, verifying health, then gradually expanding.

If something breaks, only a fraction of users are affected and you can stop or roll back before it gets worse.

The alternative—deploying to 100% of traffic at once—means every bad change is a full-blast incident.

| Strategy | How It Works | Rollback Speed | Complexity |
| --- | --- | --- | --- |
| Rolling update | Replace instances one at a time (or in batches); old and new versions run side by side briefly | Moderate — redeploy previous version | Low |
| Blue/green | Run two identical environments; deploy to the inactive one, then switch traffic | Fast — switch traffic back | Medium (two full environments) |
| Canary | Route a small percentage of traffic to the new version; increase gradually if healthy | Fast — route traffic away from canary | Medium-high (traffic splitting, observability) |
| Traffic shifting | Gradually move traffic from old to new (e.g. 1% → 5% → 25% → 100%) with automated health checks | Fast — shift back | High (automation, SLI integration) |

Rolling Update

The simplest progressive strategy: your orchestrator (e.g. Kubernetes, ECS) replaces instances in batches.

Each batch is health-checked before the next begins.

  • When to use — Default for most services. Good when you have health checks and can tolerate brief mixed-version traffic.
  • Watch out for — Schema changes or API contract changes where old and new versions are incompatible. If the deploy fails midway, you have a mixed fleet.
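The batch-and-check loop above can be sketched in a few lines. This is a minimal illustration, not a real orchestrator API; `deploy_to` and `is_healthy` are hypothetical stand-ins for your platform's deploy and health-check hooks:

```python
# Hedged sketch of a batched rolling update with per-batch health checks.
# `deploy_to` and `is_healthy` are hypothetical callables, not a real API.

def rolling_update(instances, deploy_to, is_healthy, batch_size=2):
    """Replace instances in batches; abort if any batch fails its health check."""
    updated = []
    for i in range(0, len(instances), batch_size):
        batch = instances[i:i + batch_size]
        for inst in batch:
            deploy_to(inst)  # replace this instance with the new version
        if not all(is_healthy(inst) for inst in batch):
            # Stop here: the fleet is now mixed (some updated, some old),
            # which is exactly the mid-deploy failure caveat above.
            return updated, False
        updated.extend(batch)
    return updated, True
```

Real orchestrators express the same idea declaratively (batch size, health gates) rather than as an imperative loop.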

Blue/Green

Two identical environments: “blue” (current) and “green” (new).

Deploy to green, verify, then switch the load balancer or DNS to point at green.

  • When to use — When you want zero-downtime cutover and fast rollback. Common for stateless services.
  • Watch out for — Database migrations must be backward-compatible (both versions may read/write during cutover). Cost of running two environments, even briefly.
  • Rollback — Switch traffic back to blue. Green becomes the next deployment target.
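A minimal sketch of the cutover logic, assuming a hypothetical `Router` that stands in for whatever actually switches traffic (a load balancer rule, a DNS record):

```python
# Illustrative blue/green cutover; all names here are assumptions, not a real API.

class Router:
    """Stand-in for the traffic switch (load balancer rule, DNS record, etc.)."""
    def __init__(self, active="blue"):
        self.active = active

    def switch_to(self, env):
        self.active = env

def blue_green_deploy(router, deploy, verify):
    """Deploy to the inactive environment, verify it, then switch traffic to it."""
    target = "green" if router.active == "blue" else "blue"
    deploy(target)                 # deploy new version to the idle environment
    if verify(target):
        previous = router.active
        router.switch_to(target)   # zero-downtime cutover
        return previous            # keep the old env around for instant rollback
    return None                    # verification failed; traffic never moved
```

Rollback is just `router.switch_to(previous)`; the failed environment becomes the next deployment target.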

Canary

Deploy the new version to a small slice of traffic (e.g. 1-5%). Monitor SLIs (error rate, latency, throughput) for that slice.

If healthy, promote to more traffic. If not, roll back the canary.

  • When to use — When you need confidence that a change works under real traffic before full rollout. Especially valuable for high-traffic services.
  • What to monitor — Compare canary SLIs against the baseline (the non-canary instances). Look for elevated error rates, latency spikes, or resource consumption changes. See Error Rate and Throughput and Latency Percentiles.
  • Automation — Mature teams automate canary analysis: if SLIs degrade beyond a threshold, the canary is automatically rolled back.
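The automated comparison can be sketched as a simple gate over baseline and canary SLIs. The threshold values below are illustrative examples, not recommendations:

```python
# Hedged sketch of automated canary analysis: compare the canary's SLIs
# against the baseline (non-canary) fleet and decide promote vs. rollback.
# Thresholds are example values only.

def canary_verdict(baseline, canary,
                   max_error_delta=0.01,    # tolerate up to +1pp error rate
                   max_latency_ratio=1.2):  # tolerate up to +20% p99 latency
    """Return 'promote' if the canary's SLIs stay within bounds, else 'rollback'."""
    if canary["error_rate"] - baseline["error_rate"] > max_error_delta:
        return "rollback"
    if canary["p99_latency_ms"] > baseline["p99_latency_ms"] * max_latency_ratio:
        return "rollback"
    return "promote"
```

Production systems typically use statistical comparisons over many metrics rather than fixed thresholds, but the shape of the decision is the same.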

Traffic Shifting

A more granular version of canary. Instead of a binary “canary or not,” you shift traffic in controlled increments (1% → 5% → 25% → 50% → 100%) with automated health gates at each step.

  • When to use — Critical services where you want maximum control and automated safety.
  • Requires — Traffic splitting (service mesh, load balancer rules, or feature flag routing), SLI-based health checks, and automation to advance or roll back.
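The whole loop might look like this sketch, where `set_traffic_split` and `slis_healthy` are hypothetical hooks into your traffic layer (service mesh, load balancer rules) and monitoring:

```python
import time

# Sketch of an automated traffic-shift loop with a health gate at each step.
# `set_traffic_split` and `slis_healthy` are hypothetical hooks, not a real API.

STEPS = [1, 5, 25, 50, 100]  # percent of traffic routed to the new version

def shift_traffic(set_traffic_split, slis_healthy, bake_seconds=300):
    """Advance through STEPS; on any unhealthy check, shift all traffic back."""
    for pct in STEPS:
        set_traffic_split(new_pct=pct)
        time.sleep(bake_seconds)          # let metrics accumulate before judging
        if not slis_healthy():
            set_traffic_split(new_pct=0)  # fast rollback: shift traffic back
            return False
    return True
```

The bake time matters: advancing before enough traffic has flowed through the new version makes the health gate meaningless.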
Choosing a Strategy

| Concern | Recommendation |
| --- | --- |
| Simplest to start | Rolling update |
| Fast rollback, stateless service | Blue/green |
| High-traffic, need real-traffic validation | Canary |
| Maximum safety, willing to invest in automation | Traffic shifting |

Most teams start with rolling updates and move to canary or blue/green as they grow.

The right choice depends on your service’s risk profile, traffic volume, and how much you’ve invested in observability and automation.