On-Call and Escalation
On-call is how you ensure someone is available to detect and respond when something breaks.
A clear on-call rotation and escalation matrix belong to phase 0 (Preparation) of the Incident lifecycle, so that in phase 1 (Detection & Declaration) the right people get paged and know what to do.
On-Call Rotation Models
- Primary + secondary — One person is primary (first to be paged); a second is backup if the primary does not acknowledge or is unavailable. Simple and works well for small teams.
- Follow-the-sun — Hand off on-call by region or time zone so that the person on call is usually in working hours. Reduces burnout and improves response for global services.
- Pool or rotation — A fixed roster (e.g. weekly or bi-weekly) so everyone shares the load. Define who is primary and who escalates to whom.
Document the rotation in a shared place (calendar, on-call tool) and keep it updated so people know who to page and when.
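To make the rotation models concrete, here is a minimal sketch of two of them, assuming a weekly pool where each person in turn is primary and the next person is secondary, plus fixed follow-the-sun windows in UTC. The roster names, the start date, the region names and the hour windows are all hypothetical; a real on-call tool would also handle overrides, holidays and swaps.

```python
from dataclasses import dataclass
from datetime import date, datetime, timezone


@dataclass(frozen=True)
class Shift:
    primary: str    # first to be paged
    secondary: str  # backup if the primary does not acknowledge


def weekly_shift(roster: list[str], start: date, day: date) -> Shift:
    """Pool rotation: each week the next person in the roster is primary,
    and the person after them is secondary, so everyone shares the load."""
    if len(roster) < 2:
        raise ValueError("primary + secondary needs at least two people")
    weeks = (day - start).days // 7       # whole weeks since the rotation began
    i = weeks % len(roster)               # wraps around the roster
    return Shift(primary=roster[i], secondary=roster[(i + 1) % len(roster)])


# Hypothetical follow-the-sun windows, as [start_hour, end_hour) in UTC.
REGIONS = [("APAC", 0, 8), ("EMEA", 8, 16), ("AMER", 16, 24)]


def follow_the_sun_region(now: datetime) -> str:
    """Return the region whose working-hours window contains `now`.
    Expects a timezone-aware datetime."""
    hour = now.astimezone(timezone.utc).hour
    for name, start_hour, end_hour in REGIONS:
        if start_hour <= hour < end_hour:
            return name
    raise ValueError("REGIONS must cover all 24 hours")
```

For example, with the roster `["alice", "bob", "carol"]` starting on 2024-01-01, the second week has bob as primary and carol as secondary. Computing the shift from a fixed start date (rather than storing each week) keeps the schedule reproducible, which helps when the rotation is published to a shared calendar.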
Escalation Paths
Escalation means involving more people when the first responder cannot resolve the issue or when severity warrants it.
- Time-based — If the incident is not acknowledged or not mitigated within N minutes, escalate to the next level or to the Incident Commander (IC).
- Severity-based — SEV-1 (or your top severity) automatically triggers pages to IC, Operations Lead (OL), and Communications Lead (CL) per your Incident lifecycle roles.
- Skill-based — Escalate to a specialist (e.g. database, network) when the runbook or first responder identifies that expertise is needed.
Maintain an escalation matrix: who is on call at each level, how to reach them (pager, chat, phone), and under what conditions you escalate. Review it when the team or services change.
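An escalation matrix can be kept as plain data and evaluated mechanically. The sketch below combines time-based escalation (each level is paged once the incident has gone unacknowledged or unmitigated for a set number of minutes) with severity-based escalation (SEV-1 pages IC, OL and CL at once). The level names, contact methods and minute thresholds are hypothetical placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str           # who is on call at this level
    contact: str        # how to reach them (pager, chat, phone)
    after_minutes: int  # page this level once the incident is this old and unresolved


# Hypothetical time-based matrix, ordered by after_minutes.
MATRIX = [
    Level("primary", "pager", 0),
    Level("secondary", "pager", 5),
    Level("IC", "phone", 15),
]

# Hypothetical severity-based rule: the top severity pages the full response team.
SEVERITY_PAGES = {"SEV-1": ["IC", "OL", "CL"]}


def who_to_page(severity: str, minutes_unresolved: int) -> list[str]:
    """Everyone who should have been paged by now, in paging order.

    minutes_unresolved: minutes the incident has gone unacknowledged or
    unmitigated (the time-based trigger)."""
    paged = [lvl.name for lvl in MATRIX if minutes_unresolved >= lvl.after_minutes]
    for role in SEVERITY_PAGES.get(severity, []):
        if role not in paged:  # avoid paging the same role twice
            paged.append(role)
    return paged
```

With these placeholder values, a SEV-3 that is still unresolved after 6 minutes has paged the primary and secondary, while a SEV-1 pages the primary, IC, OL and CL immediately. Keeping the matrix as data means reviewing it when the team or services change is a data edit, not a logic change.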
When To Escalate
Escalate when:
- The first responder cannot resolve the issue within the expected time or does not have the right skills.
- Severity dictates a full response (e.g. SEV-1).
- A runbook or playbook says “escalate to X” for this scenario.
Avoid escalating too late (impact grows) or too early (every minor issue draws a crowd). Your severity matrix and runbooks help draw the line.
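The escalation criteria above can be expressed as a check that returns the reasons to escalate, where an empty list means the first responder keeps working alone. This is a minimal sketch: the `Situation` fields, the "expected time" (assumed here to come from the runbook or severity matrix) and the set of full-response severities are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only the top severity dictates a full response.
FULL_RESPONSE_SEVERITIES = {"SEV-1"}


@dataclass
class Situation:
    severity: str
    minutes_elapsed: int
    expected_minutes: int                      # hypothetical: from runbook or severity matrix
    responder_has_skills: bool
    runbook_escalate_to: Optional[str] = None  # e.g. "database on-call"


def escalation_reasons(s: Situation) -> list[str]:
    """Return why to escalate; an empty list means do not escalate yet."""
    reasons = []
    if s.minutes_elapsed > s.expected_minutes:
        reasons.append("exceeded expected resolution time")
    if not s.responder_has_skills:
        reasons.append("responder lacks required skills")
    if s.severity in FULL_RESPONSE_SEVERITIES:
        reasons.append(f"{s.severity} requires a full response")
    if s.runbook_escalate_to:
        reasons.append(f"runbook says escalate to {s.runbook_escalate_to}")
    return reasons
```

Returning reasons rather than a bare boolean gives the responder something to paste into the incident channel when they escalate, and makes it visible when a low-severity issue is being escalated without any stated trigger.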
Ties To Preparation and Declaration
In phase 0 (Preparation), the on-call rotation and escalation matrix are core artifacts. In phase 1 (Detection & Declaration), the person who receives the alert verifies the issue, declares severity, and opens the incident, then follows the escalation path if needed.
Clear roles and expectations (see Incident lifecycle and Severity and Classification) make this predictable so responders spend less time figuring out who to call and more time fixing the problem.