Error Rate and Throughput
Two other SLIs that show up in requirements and SLOs: error rate (how often requests fail) and throughput (how much traffic the system handles).
This page explains what they mean and when to use them.
Error rate
Definition — The proportion of requests that fail. You define “fail” (e.g. 5xx, timeouts, or your own rules).
How it’s measured — Failed requests / total requests over a time window.
Relation to availability — When availability is defined as success rate, error rate = 1 − availability. They’re two sides of the same coin: more failed requests mean lower availability.
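The ratio and its relation to availability can be sketched in a few lines. This is a minimal illustration, not a monitoring implementation: the `Request` type, the `is_failure` rule (5xx or timeout), and the choice to report an empty window as 0 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    status: int              # HTTP status code
    timed_out: bool = False  # True if the caller gave up waiting


def is_failure(req: Request) -> bool:
    # Assumed failure rule: a timeout or any 5xx response counts as failed.
    return req.timed_out or 500 <= req.status <= 599


def error_rate(requests: list[Request]) -> float:
    # Failed requests / total requests over the window.
    if not requests:
        return 0.0  # assumption: an empty window is reported as 0
    failed = sum(1 for r in requests if is_failure(r))
    return failed / len(requests)


# 97 successes, 2 server errors, 1 timeout in the window.
window = [Request(200)] * 97 + [Request(503)] * 2 + [Request(200, timed_out=True)]
rate = error_rate(window)
availability = 1 - rate  # holds when availability is defined as success rate
print(f"error rate = {rate:.2%}, availability = {availability:.2%}")
# prints: error rate = 3.00%, availability = 97.00%
```

Changing `is_failure` changes both numbers at once, which is why the failure definition should be agreed on before setting targets.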
Example — “Rollback if error rate > 5%,” “monitor error rate,” “alert when error rate spikes.”
Set thresholds and rollback criteria from your SLO and risk tolerance (e.g. 1% error rate might be acceptable for some endpoints, not for others).
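A rollback check with per-endpoint thresholds might look like the sketch below. The endpoint names and threshold values are hypothetical placeholders; real values would come from your SLOs. The 5% default mirrors the “rollback if error rate > 5%” example above.

```python
# Hypothetical per-endpoint thresholds (fractions, not percentages).
THRESHOLDS = {"/checkout": 0.001, "/search": 0.01}
DEFAULT_THRESHOLD = 0.05  # fallback for endpoints without their own threshold


def should_roll_back(endpoint: str, failed: int, total: int) -> bool:
    if total == 0:
        return False  # assumption: no traffic means no evidence to act on
    threshold = THRESHOLDS.get(endpoint, DEFAULT_THRESHOLD)
    # Strictly greater than, matching "error rate > threshold".
    return failed / total > threshold


print(should_roll_back("/checkout", failed=1, total=100))  # 1% exceeds 0.1%
print(should_roll_back("/search", failed=1, total=100))    # 1% does not exceed 1%
```

The same 1% error rate triggers a rollback for one endpoint and not the other, which is the point of deriving thresholds from risk tolerance rather than using one global number.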
Throughput
When to use as an SLI — Capacity planning, scaling decisions, or when “how much” matters: events per second, requests or queries per second (RPS or QPS), or writes per second.
How it appears in docs — “Read throughput,” “write throughput,” “high throughput,” “QPS.”
Throughput is typically a target or capacity indicator rather than a user-facing SLO. It answers “can we handle N requests per second?” rather than “how fast did each request complete?”
In practice — Often paired with latency: e.g. “support N QPS at p95 < X ms.”
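A combined target of this form can be checked as in the following sketch. It assumes you have the latency of every request completed in a measurement window. The function names and the nearest-rank percentile method are choices made for the example; monitoring systems often estimate percentiles from histograms instead.

```python
def throughput_qps(request_count: int, window_seconds: float) -> float:
    # Requests completed per second over the window.
    return request_count / window_seconds


def p95(latencies_ms: list[float]) -> float:
    # Nearest-rank percentile: the smallest sample with at least 95% of
    # samples at or below it. Integer arithmetic avoids float rounding.
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def meets_target(latencies_ms: list[float], window_seconds: float,
                 min_qps: float, max_p95_ms: float) -> bool:
    # "Support N QPS at p95 < X ms": both conditions must hold together.
    qps = throughput_qps(len(latencies_ms), window_seconds)
    return qps >= min_qps and p95(latencies_ms) < max_p95_ms


# 100 requests in 10 seconds: 95 fast ones and 5 slow ones.
samples = [20.0] * 95 + [300.0] * 5
print(throughput_qps(len(samples), 10))                        # 10.0
print(p95(samples))                                            # 20.0
print(meets_target(samples, 10, min_qps=10, max_p95_ms=50))    # True
```

Checking the two conditions together matters because a system can hit its throughput number by letting latency degrade, or keep latency low only at light load.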
See Latency percentiles and targets for latency targets.
Connection to SLOs
Both are SLIs you can set SLO targets for (e.g. “error rate < 0.1%,” “throughput ≥ N QPS”). For the full framework, see SLOs, SLIs & SLAs.