
RTO and RPO

First Published By Atif Alam

RTO and RPO are the two numbers that define your disaster recovery requirements.

Before you design backups, replication, or failover, you need to know what “good enough” looks like.

RTO (Recovery Time Objective) — The maximum acceptable downtime. How long can the system be unavailable before the impact is unacceptable? Measured in minutes or hours.

RPO (Recovery Point Objective) — The maximum acceptable data loss. How much data can you afford to lose? Measured in time (e.g. “no more than 1 hour of data”) or transactions.

Together they drive your strategy: a strict RPO pushes you toward synchronous replication; a strict RTO pushes you toward hot standby and fast failover.

Relaxed targets allow simpler, cheaper approaches.

RTO answers: “If everything goes wrong, how long do we have to restore service before it’s unacceptable?”

  • Minutes — Critical revenue or safety systems; hot standby, automatic failover.
  • Hours — Many business systems; warm standby or manual failover with a runbook.
  • Days — Non-critical systems; cold standby or restore from backup.

RTO is a business decision. It informs how much you invest in redundancy, automation, and runbook quality.

RPO answers: “How much data can we afford to lose if we have to restore from a backup or fail over?”

  • Zero (or near-zero) — Financial transactions, critical state; synchronous replication or continuous backup.
  • Minutes — Most applications; asynchronous replication with short lag, or frequent backups.
  • Hours or days — Batch or non-critical data; periodic backups may be enough.

RPO is often driven by compliance or business rules. It informs replication mode, backup frequency, and whether you need point-in-time recovery.
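As a rough illustration of how RPO constrains backup frequency and replication mode, the sketch below compares a worst-case data-loss estimate against an RPO target. The function names are hypothetical, and the model is deliberately simplified: it assumes the loss window equals the replication lag when a replica exists, and the full backup interval otherwise, ignoring how long a backup takes to complete or ship off-site.

```python
from datetime import timedelta
from typing import Optional


def worst_case_data_loss(
    backup_interval: timedelta,
    replication_lag: Optional[timedelta] = None,
) -> timedelta:
    """Estimate the largest window of data that could be lost.

    Simplifying assumption: with a replica, loss is bounded by its lag;
    without one, loss is bounded by the time between backups.
    """
    if replication_lag is not None:
        return replication_lag
    return backup_interval


def meets_rpo(
    rpo: timedelta,
    backup_interval: timedelta,
    replication_lag: Optional[timedelta] = None,
) -> bool:
    """Return True if the estimated worst-case loss fits within the RPO."""
    return worst_case_data_loss(backup_interval, replication_lag) <= rpo


if __name__ == "__main__":
    one_hour = timedelta(hours=1)
    # Backups every 4 hours cannot satisfy a 1-hour RPO on their own.
    print(meets_rpo(one_hour, backup_interval=timedelta(hours=4)))
    # Adding an async replica with 30 seconds of lag brings it within target.
    print(meets_rpo(one_hour, timedelta(hours=4), timedelta(seconds=30)))
```

A real check would measure lag and backup completion times from monitoring rather than take them as constants.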

RTO      | RPO           | Example strategy
Minutes  | Zero          | Synchronous replication, hot standby, automatic failover
Minutes  | Minutes       | Async replication, hot standby, automatic failover
Hours    | Minutes       | Async replication or hourly backups, warm standby, manual failover
Hours    | Hours         | Periodic backups (e.g. every 4–6 hours), warm or cold standby
Days     | Hours or days | Daily backups, cold standby or restore-from-backup

These are guidelines, not rules. Your actual design depends on workload, cost, and complexity tolerance.
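The RTO/RPO combinations above could be encoded as a simple lookup, for example to tag services in an inventory with a starting-point strategy. This is only a sketch: the tier names, the `STRATEGIES` dictionary, and `suggest_strategy` are hypothetical, and combinations not listed in the guidelines return a fallback message rather than a guess.

```python
# Keys are (RTO tier, RPO tier); values are the example strategies listed above.
STRATEGIES = {
    ("minutes", "zero"): "Synchronous replication, hot standby, automatic failover",
    ("minutes", "minutes"): "Async replication, hot standby, automatic failover",
    ("hours", "minutes"): "Async replication or hourly backups, warm standby, manual failover",
    ("hours", "hours"): "Periodic backups (e.g. every 4-6 hours), warm or cold standby",
    ("days", "hours"): "Daily backups, cold standby or restore-from-backup",
    ("days", "days"): "Daily backups, cold standby or restore-from-backup",
}


def suggest_strategy(rto_tier: str, rpo_tier: str) -> str:
    """Look up an example strategy for an RTO/RPO tier pair (case-insensitive)."""
    key = (rto_tier.strip().lower(), rpo_tier.strip().lower())
    return STRATEGIES.get(key, "No guideline for this combination; decide case by case")


if __name__ == "__main__":
    print(suggest_strategy("Minutes", "Zero"))
    print(suggest_strategy("Days", "Zero"))  # not covered above, so falls back
```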

Availability SLOs (e.g. “99.9% uptime”) define how much total downtime you can accumulate over a period, whatever the cause. RTO defines how long recovery from a single disaster may take. They’re related but not the same:

  • A 99.9% SLO allows ~43 minutes of downtime per month, but that could be many small outages or one big one.
  • RTO says: “In a disaster, we must be back within X.” If your RTO is 4 hours, a single disaster that uses the full window costs 240 minutes of downtime, more than five times the ~43-minute monthly budget of a 99.9% SLO.
  • Align RTO with your availability target: if 99.9% is the goal and the RTO is 4 hours, one disaster blows the budget for the month, so either tighten the RTO or accept that risk explicitly.
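The arithmetic behind these bullets can be checked with a short sketch. The function names are hypothetical, and a 30-day month is assumed for the budget window.

```python
def downtime_budget_minutes(slo_percent: float, days: int = 30) -> float:
    """Allowed downtime, in minutes, for an availability SLO over a window of days."""
    return (1 - slo_percent / 100) * days * 24 * 60


def budget_multiples(rto_hours: float, slo_percent: float, days: int = 30) -> float:
    """How many times over the downtime budget one RTO-length outage would go."""
    return rto_hours * 60 / downtime_budget_minutes(slo_percent, days)


if __name__ == "__main__":
    # 99.9% over 30 days leaves roughly 43 minutes of downtime.
    print(round(downtime_budget_minutes(99.9), 1))  # 43.2
    # One outage that uses a full 4-hour RTO exceeds that budget several times over.
    print(round(budget_multiples(4, 99.9), 1))  # 5.6
```

Running the same calculation with a 99% SLO gives a budget of about 432 minutes, which a single 4-hour outage would fit within.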

For how availability is measured, see Availability and the nines.

For replication and failover design, see System Design Checklist Section 7 (Replication mode, failover strategy).