We ran the Allthethings Prioritization Matrix on our team's on-call alerts!

An alert is a specialized log from a software component in a computing system that indicates a problem. Tools like JSM and Opsgenie help us manage alerts and avert incidents. If an alert does turn into an incident, these tools help us mitigate it as well.

An on-call is an engineer whose job is to Keep The Lights On. They stand in the lighthouse, on the lookout for alerts, and prevent issues that might impact the customer. Our teams follow a very similar process.

Our alert process has been refined over the years. We have closed down multiple alerts and reduced the occurrence of many others, and we know the common alerts by intuition and, even more so, through run-books.

The Allthethings Prioritization Matrix, also known as the "Eisenhower Matrix", is a tabular system for sorting tasks, issues, alerts, and anything else that needs attention into an order of attack, based on the urgency and importance of each item.

Based on that information, we can rate each item as Important or Not Important, and as Urgent or Not Urgent. This leads us to 4 kinds of tasks.

  • Urgent and Important: Fix it right now! It is crucial. 
  • Urgent and Not Important: Fix it right now. It is not important, but it is still a problem.
  • Not Urgent but Important: It is important to fix things, maybe not today, but whenever we have time.
  • Not Urgent and Not Important: We don't need to fix it.
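
The four categories above boil down to two yes/no answers per item. Here is a minimal Python sketch of that mapping. The names `Quadrant` and `classify` are made up for illustration and are not part of JSM or Opsgenie.

```python
from enum import Enum


class Quadrant(Enum):
    # Labels copied from the four kinds of tasks listed above.
    URGENT_IMPORTANT = "Urgent and Important"
    URGENT_NOT_IMPORTANT = "Urgent and Not Important"
    NOT_URGENT_IMPORTANT = "Not Urgent but Important"
    NOT_URGENT_NOT_IMPORTANT = "Not Urgent and Not Important"


def classify(urgent: bool, important: bool) -> Quadrant:
    """Map the two yes/no answers onto one of the four quadrants."""
    if urgent and important:
        return Quadrant.URGENT_IMPORTANT
    if urgent:
        return Quadrant.URGENT_NOT_IMPORTANT
    if important:
        return Quadrant.NOT_URGENT_IMPORTANT
    return Quadrant.NOT_URGENT_NOT_IMPORTANT


if __name__ == "__main__":
    # An alert that pages someone at night but never needs action.
    print(classify(urgent=True, important=False).value)
```

Deciding what counts as "urgent" or "important" for a given alert is still a team judgment; the sketch only records the result of that decision.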

[Image: Eisenhower Matrix diagram]

Source: Picture from Wikimedia Commons by Davidjcmorris

Questions to ask

Well, when you encounter stuff in these categories, you've got to ask yourself the following questions:

  1. If it's Urgent and Important: Why did we not realise it before? Did it become urgent and important suddenly? What is the way to fix this?
  2. If it's Urgent and Not Important: Why is this even here? Do we have a way to never have this unimportant thing again?
  3. If it's Not Urgent and Important: When will we do this? When it becomes urgent and important? No, right? The best time to plant a tree was 25 years ago and the second-best time is right now!
  4. If it's Not Urgent and Not Important: Why is this here!? Can we please please remove it?

 

How did we run it?

  1. We collected our closed alerts from the past 6 months
  2. We sorted them by alias
  3. We prioritised our existing alerts against the matrix by voting
  4. We readjusted priorities
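
As a rough illustration of the steps above, here is a small Python sketch that counts closed alerts per alias and then settles each alias by majority vote. The alert records, alias names, and vote labels are invented sample data, and the helper names `count_by_alias` and `majority_vote` are hypothetical. This is not our actual tooling or an Opsgenie API.

```python
from collections import Counter

# Made-up stand-in for an export of closed alerts; only the alias matters here.
closed_alerts = [
    {"alias": "disk-usage-high"},
    {"alias": "api-latency"},
    {"alias": "disk-usage-high"},
    {"alias": "cert-expiry"},
    {"alias": "api-latency"},
    {"alias": "disk-usage-high"},
]


def count_by_alias(alerts):
    """Return (alias, occurrences) pairs, most frequent first."""
    return Counter(alert["alias"] for alert in alerts).most_common()


def majority_vote(votes_by_alias):
    """Pick the most common vote per alias.

    Assumes every alias has at least one vote. On a tie, the label
    that was cast first wins.
    """
    return {
        alias: Counter(votes).most_common(1)[0][0]
        for alias, votes in votes_by_alias.items()
    }


if __name__ == "__main__":
    for alias, occurrences in count_by_alias(closed_alerts):
        print(alias, occurrences)

    # Each team member votes a quadrant for each alias (sample votes).
    votes = {
        "disk-usage-high": ["urgent+important", "urgent+important", "important"],
        "cert-expiry": ["important", "important", "urgent+important"],
    }
    print(majority_vote(votes))
```

Sorting by count first puts the noisiest aliases at the top of the list, which is a natural place to start the voting.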

 

Outcomes

  1. Time-based patterns
  2. Reprioritisation
  3. Action Items for long recurring alerts
  4. Tagging of actionable and non-actionable items

2 comments

Bill Sheboy
Rising Star
July 26, 2021

Hi @Nipun Aggarwal 

Thanks for the information.  Please consider improving this with the concepts of Classes of Service and Major Incident Management, and by clearly defining incident terminology with your stakeholders.  The first and last ones are key: people need an aligned understanding of "urgent", "important", "scope", "impact", etc., and teams and stakeholders need alignment on the cost/benefit of classes of service: with better service comes higher cost.

Best regards,
Bill

Nipun Aggarwal
Atlassian Team
July 27, 2021

Hey @Bill Sheboy , thanks for the call out.

We used the following as the parameters, in order of priority:

Customer Security > Service Availability > Intended usage > Informational alerts

I hope this gives a clearer picture of how the alerts were ranked.
