A deep dive on Routing Rules

 

Hi there! Folks new to Opsgenie can get a bit perplexed when setting up their Routing Rules and Escalation Policies, especially if they're used to scheduling with an Excel spreadsheet, a traditional NOC, or a call list. So for today, I figured a deep dive might be just what's needed.

Opsgenie alerts are routed based on source, content, and time. Routing Rules and Escalation Policies work together with the On-call schedule to notify the people on your team who are able to take action on the alert. That means, for example, not notifying someone who is on vacation. This is all managed from the Team's page, where you'll see the Routing rules, Escalation policies, and On-call schedules.

[Screenshot: Routing rules, Escalation policies, and On-call schedules on the Team's page]

 

When a team is created, the first action is to create an on-call schedule. You can add multiple rotations to suit different rotation types, for example "working hours," "off-hours," "weekends," etc. 
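To make the idea of multiple rotations concrete, here is a minimal Python sketch of how a moment in time maps to one of the rotations named above. It is only an illustration, not how Opsgenie works internally: the function name `active_rotation` and the 09:00 to 17:00 working-hours window are assumptions, and real rotations are configured in the Opsgenie UI.

```python
from datetime import datetime


def active_rotation(when: datetime) -> str:
    """Return the name of the (hypothetical) rotation covering `when`."""
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6
    if when.weekday() >= 5:
        return "weekends"
    # Assumed working hours of 09:00-17:00; adjust to your team's hours.
    if 9 <= when.hour < 17:
        return "working hours"
    return "off-hours"


if __name__ == "__main__":
    # 2020-03-23 was a Monday, 2020-03-28 a Saturday.
    for moment in (datetime(2020, 3, 23, 10, 0),
                   datetime(2020, 3, 23, 22, 0),
                   datetime(2020, 3, 28, 10, 0)):
        print(moment, "->", active_rotation(moment))
```

Every moment falls into exactly one rotation here, which is the property you want from a real schedule too: no gaps in coverage.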

Once the schedule is created, you'll set up your Escalation policies. 

[Screenshot: a basic escalation policy]

This is a pretty basic escalation. When an alert for the DevOps team is created, all on-call users in the schedule are notified immediately. If the alert is not acknowledged within 5 minutes, the next user is notified. If it is still not acknowledged, all team members are notified. Finally, if no one has acknowledged the alert, Kate, the manager, is notified. Many teams have similar escalations set up. Let's take a closer look at how the Routing Rules come into play.
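If it helps to see the escalation as data, here is a rough Python sketch of the policy described above. It is a toy model, not Opsgenie's implementation: the names `Step`, `POLICY`, and `notified_so_far` are made up, and only the 0 and 5 minute delays come from the example. The 10 and 15 minute delays are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    delay_min: int  # minutes after the alert is created
    notify: str     # who is notified at this step


# Step order follows the example; the 10 and 15 minute delays are assumed.
POLICY = [
    Step(0, "on-call users in schedule"),
    Step(5, "next user"),
    Step(10, "all team members"),
    Step(15, "Kate (manager)"),
]


def notified_so_far(policy, minutes_elapsed, acked_at=None):
    """List who has been notified; acknowledging stops later steps."""
    fired = []
    for step in policy:
        if step.delay_min > minutes_elapsed:
            break  # this step is not due yet
        if acked_at is not None and step.delay_min >= acked_at:
            break  # alert was acknowledged before this step fired
        fired.append(step.notify)
    return fired


if __name__ == "__main__":
    print(notified_so_far(POLICY, 7))               # unacknowledged for 7 minutes
    print(notified_so_far(POLICY, 20, acked_at=3))  # acknowledged at minute 3
```

The second call shows the point of acknowledging: once someone takes the alert, nobody further up the chain gets paged.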

 

[Screenshot: routing rule setup]

Routing Rules offer you the option to Route to a schedule, Route to No One, or Route to an Escalation Policy. 

If you Route to a schedule, the users who are on call in that schedule when the alert is created will be notified, but no escalation will take place if the alert goes unacknowledged.

If you Route to no one, the alert is recorded but teams aren't notified. A good case for routing to no one is a low-priority alert that you still want a record of. For example, this routing rule routes P4s and P5s to no one.

[Screenshot: routing rule that routes P4 and P5 alerts to no one]

Notice that in this rule, if the priority is lower than P3 (that is, P4 or P5), the alert routes to no one, but if it is P3, P2, or P1, it routes to the Escalation Policy.
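The priority rule above boils down to a single comparison. Here is a small Python sketch of that logic. The function name `route` and the returned strings are assumptions for illustration. This is not the Opsgenie API, just the decision the rule makes.

```python
def route(priority: str) -> str:
    """Return the routing target for an alert priority from 'P1' to 'P5'."""
    level = int(priority.lstrip("Pp"))
    if not 1 <= level <= 5:
        raise ValueError(f"unknown priority: {priority}")
    # A higher number means a lower priority, so P4 and P5 are "less than P3".
    return "no one" if level > 3 else "escalation policy"


if __name__ == "__main__":
    for p in ("P1", "P2", "P3", "P4", "P5"):
        print(p, "->", route(p))
```

Note the direction of the comparison: "less than P3" in priority terms means a larger number, which is an easy thing to flip by accident.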


A common mistake is to route alerts to a schedule and NOT an escalation policy, which means the alert will not escalate. You must always route to the escalation policy if you want the alert to escalate. It seems clear, but it's an easy mistake to make.
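Here is a toy Python comparison of the three routing targets for an alert that nobody acknowledges. Everything in it is a simplified assumption: the function `who_gets_notified`, the target strings, and the escalation delays are made up to show the difference, not taken from Opsgenie.

```python
# Hypothetical escalation steps as (delay in minutes, who is notified).
ESCALATION_STEPS = [(0, "on-call user"), (5, "next user"), (10, "whole team")]


def who_gets_notified(target, minutes_unacked):
    """Model the three routing targets for an alert nobody acknowledges."""
    if target == "no one":
        return []  # the alert is recorded, but nobody is notified
    if target == "schedule":
        # Only whoever was on call at creation; waiting longer changes nothing.
        return ["on-call user"]
    if target == "escalation":
        return [who for delay, who in ESCALATION_STEPS if delay <= minutes_unacked]
    raise ValueError(f"unknown routing target: {target}")


if __name__ == "__main__":
    for target in ("no one", "schedule", "escalation"):
        print(target, "after 60 min:", who_gets_notified(target, 60))
```

With the schedule target the list never grows no matter how long the alert sits there, which is exactly the mistake described above.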

These examples are not exhaustive; they're shared to give you a better idea of how On-call schedules, escalation policies, and routing rules work together to ensure that no alert goes unacknowledged, without giving teams alert fatigue. Feel free to post your questions below!

best,
Kate 




3 comments

Mikhail Stepanov February 21, 2022

Nice explanations, thank you

Mani Arugonda September 30, 2022

Hi Kate,

Thank you for the explanation. 

I created an escalation policy to call on-call users at 0 minutes, then call again after 5 minutes, and if the alert is still not acknowledged, then call the secondary on-call at 10 minutes.

My expectation with this kind of setup is that a call will be paged out, but when we tested it, the on-call person did not receive any call. What step could we be missing?

Priya Nambiar November 21, 2022

Hi Kate, is there a limit to how many routing rules can be set up, be it within an Integration or under Global policies?
