5 Signs You Need Better On-Call Schedule Management
Current Situation
Your team supports services that must be up 24/7. You’ve implemented an on-call strategy, but it’s just not working. Your on-call process is clunky, outdated, and inefficient. You know there has to be a better way to manage an on-call rotation, but you don’t know where to start. Organizations use several on-call scheduling methods today. Some are more sophisticated than others, but each has limitations.
1. Unfair On-Call Burden
A simple and common on-call solution is a single phone or pager that is handed off to the next on-call person. In the age of smartphones, this may sound antiquated, but many organizations still use this method. However, what happens if:
- Your team is spread across multiple physical locations?
- The on-call phone or pager is lost or damaged?
- The on-call phone has no signal?
2. Delayed Response Times
Another simple but labor-intensive option is to staff a 24/7 Network Operations Center (NOC). This means paying staff to watch metrics around the clock and identify problems manually. When an issue arises, they have to look up the appropriate contact in a directory and notify whoever is on call for that team. Not only is this inefficient, but it’s also costly in many ways:
- How much time is lost while the NOC identifies the problem, looks up who is on call, and finally notifies them?
- If another team identifies an issue, how should they notify the person on call? Must they relay it through the NOC instead of contacting the responsible team directly?
- What is the opportunity cost of having the NOC watch metrics all day rather than fixing other issues that already exist?
3. Alert Fatigue
Some companies keep it simple by sending email or SMS blasts to their entire team. Not only is this inefficient, but it’s unfair to your employees. This method creates spam and reduces the sense of urgency when alerts arrive. It also encourages the following behaviors from your employees:
- Carrying two phones, one for work and one for personal use.
- Leaving the issue unacknowledged until the last minute, in hopes that another team member will acknowledge the incident first.
- Leaving the company because it does not respect their downtime.
4. Dropped Alerts
A more sophisticated option is to automate incident alerts through your monitoring tools. For example, your monitoring tool could send an email or SMS message to a preconfigured address or phone number. However, this approach supports only a single level of on-call.
- What happens if the person receiving the email or SMS cannot respond at that moment?
- Are you reconfiguring the alert every week to simulate an on-call schedule?
- As your monitoring grows, how will this solution scale?
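The gap described above is the lack of escalation: if the first person doesn’t acknowledge, nobody else is told. A minimal sketch of a multi-level escalation policy follows. It is illustrative only; `EscalationLevel`, `escalate`, and the `acknowledges` callback are hypothetical names, not PagerTree’s API, and the simulation escalates immediately instead of actually waiting out each timeout.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class EscalationLevel:
    responders: List[str]  # hypothetical: who is notified at this level
    timeout_minutes: int   # how long a real system would wait for an acknowledgement


def escalate(levels: List[EscalationLevel],
             acknowledges: Callable[[str], bool]) -> Tuple[Optional[str], List[str]]:
    """Notify each level in order until someone acknowledges.

    Returns (who acknowledged, or None if nobody did; everyone notified).
    """
    notified: List[str] = []
    for level in levels:
        # In practice this is where an SMS, email, or push notification is sent.
        notified.extend(level.responders)
        for person in level.responders:
            if acknowledges(person):
                return person, notified
        # A real system would wait level.timeout_minutes here before escalating;
        # this simulation moves straight on to the next level.
    return None, notified


# Usage: Alice is primary; Bob and Carol are the secondary level.
policy = [
    EscalationLevel(["alice"], timeout_minutes=5),
    EscalationLevel(["bob", "carol"], timeout_minutes=10),
]
who, notified = escalate(policy, acknowledges=lambda p: p == "carol")
print(who, notified)  # carol ['alice', 'bob', 'carol']
```

Unlike a single preconfigured address, the policy is data: adding a level or a responder doesn’t require touching the monitoring tool’s alert configuration.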
5. No Authoritative Source of On-Call Schedules
Many companies email an Excel sheet or CSV file stating who is on call for the next week, month, or quarter. This, too, creates a noisy inbox and leaves open questions:
- What happens if our on-call schedule changes? What is the current and correct schedule?
- What happens if a team forgets to send out the on-call schedule?
- Does one team member always get paged because they are the “go-to” person who always responds?
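One way to answer “What is the current and correct schedule?” is to compute who is on call from a single rotation definition plus explicit overrides, rather than circulating spreadsheets. The sketch below assumes a fixed-length rotation (one week by default) over a roster. `Rotation`, `on_call`, and the override tuple format are hypothetical names chosen for illustration, not any product’s API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Tuple


@dataclass
class Rotation:
    roster: List[str]                    # hypothetical: people in rotation order
    start: datetime                      # when the first shift begins
    shift: timedelta = timedelta(weeks=1)
    # Hypothetical override format: (begin, end, person), end exclusive.
    overrides: List[Tuple[datetime, datetime, str]] = field(default_factory=list)

    def on_call(self, when: datetime) -> str:
        """Return who is on call at `when`, honoring overrides first."""
        for begin, end, person in self.overrides:
            if begin <= when < end:
                return person
        if when < self.start:
            raise ValueError("rotation has not started yet")
        # timedelta // timedelta gives the number of whole shifts elapsed.
        shifts_elapsed = (when - self.start) // self.shift
        return self.roster[shifts_elapsed % len(self.roster)]


# Usage: a weekly rotation starting Monday 2024-01-01 at 09:00.
rotation = Rotation(["alice", "bob", "carol"], datetime(2024, 1, 1, 9, 0))
print(rotation.on_call(datetime(2024, 1, 9, 12, 0)))   # bob

# A schedule change is recorded once, as an override, instead of re-sending a file.
rotation.overrides.append((datetime(2024, 1, 10), datetime(2024, 1, 11), "carol"))
print(rotation.on_call(datetime(2024, 1, 10, 12, 0)))  # carol
```

Because every lookup goes through the same definition, a forgotten email or a stale spreadsheet can’t produce a conflicting answer.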
A Better On-Call Solution
If you suffer from any of the issues above, you are not alone. We have personally experienced the nightmare of bad on-call, and that’s why we created PagerTree. By using a single platform to manage your entire company’s on-call schedules, escalation rules, and alerts, you can address all of these issues simply and efficiently, and benefit from:
- A Centralized and Authoritative On-Call Incident Management System
- Faster Incident Response Times
- Lower Mean Time To Resolution (MTTR)
- User, Team, and Company Performance Metrics
- Happier Employees and Customers
If you haven’t signed up already, sign up today! With PagerTree, your on-call situation could be so much better.