Planning and Operations
This page covers how to run capacity planning in practice: inputs, scaling thresholds, headroom policies, autoscaling, forecasting, dependency and failure-domain capacity, multi-region, and operational processes.
The Basics
Capacity planning starts with three inputs:
- Current utilization — How much of your resources (CPU, memory, storage, network, database connections) are you using today? See Infrastructure Metrics.
- Traffic patterns — What does your traffic look like over a day, week, month? Are there seasonal peaks (holidays, launches, campaigns)? See Workload and Modeling for workload characterization.
- Growth rate — How fast is traffic growing? Use historical data, business forecasts, or both.
From these, you project when you’ll hit capacity limits and plan accordingly.
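The projection in the last step can be sketched in a few lines. This is an illustrative helper (the function name and numbers are not from any particular tool), assuming traffic grows at a steady compounding rate:

```python
from math import log

def months_until_threshold(current_util, monthly_growth, threshold):
    """Months until utilization crosses the scaling threshold,
    assuming compounding growth. Returns 0 if already at or over it."""
    if current_util >= threshold:
        return 0.0
    return log(threshold / current_util) / log(1 + monthly_growth)

# 55% CPU today, traffic growing 8%/month, scale at 70%:
months_until_threshold(0.55, 0.08, 0.70)  # ~3.1 months of runway
```

If the answer comes back shorter than your provisioning lead time, you are already late.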
Scaling Thresholds
A scaling threshold is the utilization level at which you need to add capacity—not the point where the system falls over, but well before that.
| Resource | Typical Threshold | Why Not Higher |
|---|---|---|
| CPU | 60-70% sustained | Spikes above sustained average need headroom; at 90% you’re one spike from degradation |
| Memory | 70-80% | OOM kills are catastrophic; GC pressure increases at high utilization |
| Disk | 70-80% | Some systems (databases, logs) need free space for operations; full disk = outage |
| Database connections | 60-70% of pool | Connection exhaustion causes cascading failures |
These are starting points. Tune based on your system’s behavior under load—load testing gives you the data.
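A threshold check like the table above is easy to encode. A minimal sketch, with the thresholds and resource names purely illustrative:

```python
# Starting-point thresholds from the table above; tune per system.
THRESHOLDS = {
    "cpu": 0.70,
    "memory": 0.80,
    "disk": 0.80,
    "db_connections": 0.70,
}

def over_threshold(utilization):
    """Return the resources whose current utilization exceeds
    their scaling threshold and therefore need added capacity."""
    return [r for r, u in utilization.items()
            if u > THRESHOLDS.get(r, 0.70)]

over_threshold({"cpu": 0.75, "memory": 0.60, "disk": 0.85})
# cpu and disk are over; memory still has headroom
```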
Headroom Policies
Headroom is the gap between your current utilization and your capacity limit. A headroom policy formalizes how much buffer you maintain. For the types of headroom (N+1, failure-domain, surge), see Workload and Modeling.
Example policy:
- Maintain at least 30% headroom on CPU and memory for production services.
- Maintain at least 3 months of growth headroom — if traffic is growing 10% per month, provision for 30% more than current peak.
- Review headroom quarterly; adjust if growth rate changes.
Headroom protects against:
- Traffic Spikes — Unexpected surges (viral content, marketing campaigns, DDoS).
- Deployment Overhead — Rolling deploys temporarily run more instances.
- Cascade Effects — If one component slows down, upstream components queue up and consume more resources.
Autoscaling
Autoscaling adjusts capacity automatically based on demand. It reduces cost (scale down when idle) and improves reliability (scale up when busy).
Key decisions:
- Scaling Metric — What triggers scale-up? CPU, request count, queue depth, custom metrics. Choose the metric that best predicts user impact.
- Scale-Up Speed — How fast can new instances be ready? If it takes 5 minutes to spin up a new node, you need enough headroom to absorb 5 minutes of traffic growth.
- Scale-Down Policy — Don’t scale down too aggressively. A cooldown period prevents flapping (scaling up and down repeatedly).
- Minimum Instances — Always keep enough instances running to handle baseline traffic without waiting for autoscale.
Autoscaling is not a substitute for capacity planning. It handles short-term fluctuations; capacity planning handles long-term growth.
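The scale-down cooldown and minimum-instances decisions can be sketched as a toy decision loop. This is not any real autoscaler's API, just an illustration of proportional sizing with flap protection:

```python
import time

class Autoscaler:
    """Toy scale decision: size the fleet to hold average utilization
    at a target, scale up immediately, but delay scale-down by a
    cooldown so the fleet doesn't flap. All names are illustrative."""

    def __init__(self, min_instances=3, target_util=0.6, cooldown_s=300):
        self.min_instances = min_instances
        self.target_util = target_util
        self.cooldown_s = cooldown_s
        self.last_scale = 0.0

    def desired(self, instances, avg_util, now=None):
        now = time.monotonic() if now is None else now
        # Proportional sizing: keep average utilization at target.
        want = max(self.min_instances,
                   round(instances * avg_util / self.target_util))
        # Scale up immediately; scale down only after the cooldown.
        if want < instances and now - self.last_scale < self.cooldown_s:
            return instances
        if want != instances:
            self.last_scale = now
        return want
```

Real autoscalers (Kubernetes HPA, EC2 Auto Scaling) expose the same knobs under names like stabilization windows and cooldown periods.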
Forecasting
Forecasting projects future capacity needs based on historical trends and business inputs.
Approaches:
- Trend-Based — Fit a line (or curve) to your historical traffic data. Simple and often good enough. Works when growth is steady.
- Seasonal Adjustment — If you have recurring peaks (weekday vs weekend, holiday spikes), model them separately. Don’t plan based on averages when your peaks are 3x your average.
- Business-Driven — Product team planning a launch? Marketing running a campaign? Sales closing a large customer? Factor these into your forecast—historical trends won’t capture step changes.
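The trend-based approach is a least-squares fit projected forward. A stdlib-only sketch (names and data are illustrative), fitting weekly peaks rather than averages, per the seasonal-adjustment point above:

```python
def forecast_peak(weekly_peaks, weeks_ahead):
    """Least-squares linear trend over historical weekly peaks,
    projected `weeks_ahead` weeks past the last observation."""
    n = len(weekly_peaks)
    x_mean = (n - 1) / 2
    y_mean = sum(weekly_peaks) / n
    slope = (sum((x - x_mean) * (y - y_mean)
                 for x, y in enumerate(weekly_peaks))
             / sum((x - x_mean) ** 2 for x in range(n)))
    intercept = y_mean - slope * x_mean
    return slope * (n - 1 + weeks_ahead) + intercept

# 12 weeks of peak RPS growing ~50 RPS/week, projected one quarter out:
history = [1000 + 50 * w for w in range(12)]
forecast_peak(history, 13)  # ~2200 RPS
```

A linear fit will miss step changes; that is exactly what the business-driven inputs are for.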
How often to forecast:
- Quarterly — Review and update capacity forecasts.
- Before Major Events — Product launches, seasonal peaks, expected traffic step-changes.
- After Incidents — If a capacity-related incident occurred, revisit assumptions.
Dependencies and Failure Domains
Capacity isn’t only about your service. Downstream dependencies and failure domains affect how much you need to provision.
- Dependency bottlenecks — If your service can scale but the database or a third-party API cannot, that dependency defines the effective capacity. Model and monitor the slowest link; plan capacity (or degradation behavior) for when dependencies are saturated.
- Failure-domain capacity — When one AZ or region fails, the remaining domains must absorb its load, which caps how hot each domain can run in steady state. Size for N-1 (any one domain down) and test failover. See Failover and Failback.
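N-1 sizing reduces to simple arithmetic. A sketch (function name is illustrative):

```python
def max_util_per_domain(n_domains):
    """With N failure domains sized for N-1, each domain may run at
    most (N-1)/N of its capacity so survivors absorb one failure."""
    return (n_domains - 1) / n_domains

max_util_per_domain(2)  # 0.5  — two regions: each capped at 50%
max_util_per_domain(3)  # ~0.67 — three AZs: each may run up to ~67%
```

This is why adding a third failure domain is often cheaper than it looks: it raises the steady-state utilization ceiling in every domain.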
Multi-Region Considerations
If you operate across regions, capacity planning needs to account for:
- Traffic Distribution — How is traffic split across regions? Can one region absorb the other’s traffic during failover?
- Failover Headroom — If Region A goes down, Region B needs enough capacity to handle both regions’ traffic. This often means each region runs at 50% or less of its capacity.
- Data Replication Lag — Cross-region replication consumes bandwidth and adds latency. Factor this into network capacity.
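The replication-bandwidth point can be estimated from the write rate. A rough steady-state sketch (names and figures are illustrative; it ignores compression, batching, and catch-up traffic after an outage):

```python
def replication_bandwidth_mbps(writes_per_s, avg_record_kb, replica_regions):
    """Steady-state cross-region replication bandwidth, assuming
    every write is shipped once to each replica region."""
    kb_per_s = writes_per_s * avg_record_kb * replica_regions
    return kb_per_s * 8 / 1000  # KB/s -> megabits/s

# 5,000 writes/s, 2 KB records, one replica region:
replication_bandwidth_mbps(5000, 2, 1)  # 80 Mbps sustained
```

Catch-up after a replication outage can need several times this, since the backlog replays on top of live traffic.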
Operational Capacity Processes
- Planning cycles — Run capacity reviews on a schedule (e.g. quarterly). Use forecasts, growth rate, and headroom policy to decide when to add capacity. Involve product and business when step changes (launches, big customers) are expected.
- Surge runbooks — Define what to do when traffic spikes unexpectedly: who checks metrics, when to scale or shed load, when to page. Link to Runbooks and Playbooks.
- Post-incident capacity updates — After a capacity-related incident, update utilization assumptions, thresholds, or headroom. Document the new baseline in runbooks and forecasts so the next review uses accurate data.
See Also
- Workload and Modeling — Workload characterization and headroom taxonomy.
- Load and Stress Testing — Provides the data that feeds capacity models.
- Caching Strategies — Caching can dramatically change your capacity requirements.
- Availability and the Nines — Capacity failures directly impact availability.
- Infrastructure / redundancy example — How infrastructure redundancy and scaling work together.