Availability and the Nines
Availability is how much of the time the system is working for users. It’s often expressed as “nines” (e.g. 99.9%).
This page explains what that means and how it’s measured.
What the nines mean
- 99% — Two nines; up to ~3.65 days of downtime per year, ~7.2 hours per month.
- 99.9% — Three nines; up to ~8.76 hours per year, ~43.2 minutes per month.
- 99.99% — Four nines; up to ~52.6 minutes per year, ~4.32 minutes per month.
When you set an availability SLO, the number of nines determines how much downtime you can afford in a window: the allowed downtime is the window length multiplied by (1 − target).
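As a rough sketch, the downtime budget for a target is the window length multiplied by (1 − target). The hypothetical `allowed_downtime` helper below computes that with Python's standard `datetime.timedelta`. The 30-day month and 365-day year are assumptions; calendar months and leap years shift the figures slightly.

```python
from datetime import timedelta


def allowed_downtime(target: float, window: timedelta) -> timedelta:
    """Return the maximum downtime an availability target permits in a window.

    `target` is a fraction, e.g. 0.999 for three nines.
    """
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    # Downtime budget = window length x fraction of the window allowed to fail.
    return window * (1.0 - target)


# Assumed window lengths: a 30-day month and a 365-day year.
MONTH = timedelta(days=30)
YEAR = timedelta(days=365)

for target in (0.99, 0.999, 0.9999):
    per_year_h = allowed_downtime(target, YEAR).total_seconds() / 3600
    per_month_min = allowed_downtime(target, MONTH).total_seconds() / 60
    print(f"{target:.2%}: {per_year_h:.2f} h/year, {per_month_min:.1f} min/month")
```

Each extra nine shrinks the budget tenfold, which is why moving from three to four nines is usually far more expensive than moving from two to three.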
How it’s measured
Two common approaches:
- Success rate — Successful requests / total requests over a window. This is request-based: each request is either a success or a failure (e.g. 5xx, timeout).
- Probe-based uptime — Percentage of time health checks pass. A probe hits an endpoint (or set of endpoints) on a schedule; availability = % of probes that succeed.
Clarify which you’re using when you set an SLO. Success rate reflects real traffic; probe-based uptime can miss issues that only appear under load.
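A minimal sketch of both measurements, assuming requests and probe results have already been collected for one window. The `Request` type, the `is_success` rule (timeouts and 5xx responses count as failures; everything else, including 4xx, counts as success), and the function names are illustrative assumptions, not a standard API. Adjust the success definition to match your SLO.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Request:
    """One request observed in the measurement window (illustrative type)."""
    status: int              # HTTP status code (ignored if timed_out)
    timed_out: bool = False  # True if no response arrived in time


def is_success(req: Request) -> bool:
    # Assumed definition: timeouts and 5xx responses are failures;
    # everything else (including 4xx client errors) counts as success.
    return not req.timed_out and not (500 <= req.status <= 599)


def success_rate(requests: Sequence[Request]) -> float:
    """Request-based availability: successful requests / total requests."""
    if not requests:
        raise ValueError("no requests in window")
    return sum(is_success(r) for r in requests) / len(requests)


def probe_uptime(probe_results: Sequence[bool]) -> float:
    """Probe-based uptime: fraction of scheduled health checks that passed."""
    if not probe_results:
        raise ValueError("no probes in window")
    return sum(probe_results) / len(probe_results)


window = [Request(200), Request(200), Request(503),
          Request(0, timed_out=True), Request(404)]
print(success_rate(window))  # 3 of 5 succeed -> 0.6

probes = [True] * 59 + [False]  # e.g. one probe per minute for an hour, one failed
print(probe_uptime(probes))
```

The two functions can disagree for the same window: a probe that only checks one health endpoint may keep passing while a fraction of real requests fail.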
Relation to error rate
When availability is defined as success rate, failed requests reduce availability. Error rate (failed / total) and availability (successful / total) are two sides of the same coin: error rate = 1 − availability (when both use the same window and definition of “success”).
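To make the relationship concrete, here is a small sketch that derives both numbers from the same counts. It assumes every request in the window is classified as exactly one of success or failure under a single definition; the `rates` helper is hypothetical.

```python
from typing import Tuple


def rates(successful: int, failed: int) -> Tuple[float, float]:
    """Return (availability, error_rate) for one window.

    Assumes every request is classified as exactly one of success or
    failure under the same definition, so the counts sum to the total.
    """
    total = successful + failed
    if total == 0:
        raise ValueError("no requests in window")
    return successful / total, failed / total


availability, error_rate = rates(successful=99_950, failed=50)
print(f"availability={availability:.4%} error_rate={error_rate:.4%}")
# The two sum to 1, up to floating-point rounding.
print(availability + error_rate)
```

If availability comes from probes and error rate from request logs, the identity no longer holds, because the two metrics are measuring different things.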
Connection to SLOs
Availability is a common SLI with an SLO target (e.g. “99.9% availability”). For the full framework, see SLOs, SLIs & SLAs.