Is it possible to create an alert based on alerts?

Darren Sunley October 14, 2022

I currently have a working setup where Dynatrace runs a bunch of synthetic monitors, and they are tied into Opsgenie to fire alerts to a Slack channel when the monitors fail.

However, because this is an IAM solution, we have a lot of different DT monitors for the different applications that we protect. At the moment, if ANY application is down (whether because of a problem or for scheduled maintenance), an alert is triggered... which is fine, but it means that on-call staff might get called out for one app's scheduled maintenance that its owners didn't tell us was happening.

Is it possible to create a hierarchical/composite sort of alert that says "if you get alerts from more than 2 different DT monitors, then create an alert"? The aim is to call the on-call staff only when multiple apps are having problems (i.e. a much more likely indication that the IAM infrastructure has a problem, rather than just a particular app).

Ideally we would leave the existing alerts in place, but just send them to Slack for information only, whereas the overarching alert would trigger a call to the on-call phone to alert the on-call person immediately (who could then see the Slack alerts for more detail on which apps are experiencing issues).

1 answer

Tom Russell
October 14, 2022

@Darren Sunley we're in a similar situation with one of our application clusters. We get alerted on a node-by-node basis, but want to escalate to a higher-priority alert if multiple nodes are having problems. We're just starting to work on a solution, but the two strategies we plan to look at are:

  • Leverage the alias field and notification policies to suppress alerts unless several come in with the same alias within a set period of time.
  • Suppress the alerts for n minutes and have an OEC (Opsgenie Edge Connector) script running that:
    • watches for that alert and checks whether several come in
    • closes the individual alerts and creates a new alert with a higher priority that links to the closed individual alerts
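
To make the second strategy a bit more concrete, here is a minimal sketch of the decision a watcher script might make. It is plain Python with no Opsgenie calls: the `MonitorAlert` record, the `plan_escalation` function, the 10-minute window and the `P1` priority are all assumptions for illustration, and fetching, closing and creating the real alerts is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MonitorAlert:
    """Hypothetical record for one open alert; not an Opsgenie type."""
    alert_id: str
    monitor: str          # which monitor (or node) raised the alert
    created_at: datetime


def plan_escalation(alerts, now, window=timedelta(minutes=10), min_monitors=3):
    """Return a description of the composite alert to raise, or None.

    Escalates only when at least `min_monitors` DISTINCT monitors have
    raised an alert within `window` before `now`. The default of 3
    matches "more than 2 different monitors" from the question.
    """
    recent = [a for a in alerts if timedelta(0) <= now - a.created_at <= window]
    monitors = sorted({a.monitor for a in recent})
    if len(monitors) < min_monitors:
        return None
    return {
        "message": f"{len(monitors)} monitors failing: {', '.join(monitors)}",
        "priority": "P1",  # assumed priority for the composite alert
        "linked_alert_ids": [a.alert_id for a in recent],
    }
```

Counting distinct monitors rather than raw alerts means one app flapping during its own maintenance would not trigger the call-out on its own.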

I'm not sure what we'll go with, and I don't know what level of Opsgenie you're running (and its capabilities), but that's just a couple of ideas.

Darren Sunley October 17, 2022

Hi Tom - thanks for that!

I've got a support ticket open too and they've suggested Deduplication and Notification Policies, which sounds like your first suggestion (relating to the alias value).

I'm busy trying to work that through with them, but hopefully it'll get me there. If I figure it out I'll come back and give you an update!
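
For anyone following along, here is a rough sketch of how the alias idea might play out. It assumes every monitor's integration is set up to send the same alias (the value `iam-outage` is made up) and that a notification policy holds notifications until the deduplication count reaches a threshold. This is plain Python simulating the counting only, not Opsgenie configuration or API calls, and the function names are hypothetical.

```python
from collections import Counter


def dedup_counts(events):
    """Count occurrences per alias.

    `events` is an iterable of (alias, monitor) pairs in arrival order.
    Alerts sharing an alias are treated as one alert whose count grows.
    """
    return Counter(alias for alias, _monitor in events)


def aliases_to_notify(events, threshold=3):
    """Aliases whose occurrence count has reached the assumed threshold."""
    counts = dedup_counts(events)
    return sorted(alias for alias, n in counts.items() if n >= threshold)


def distinct_monitors(events):
    """Number of different monitors seen per alias, for comparison."""
    seen = {}
    for alias, monitor in events:
        seen.setdefault(alias, set()).add(monitor)
    return {alias: len(monitors) for alias, monitors in seen.items()}
```

One thing the simulation makes visible: a count-based threshold counts occurrences, not distinct monitors, so a single monitor that fails three times in a row would look the same as three different apps failing once each. Whether that matters depends on how often each monitor re-sends its alert.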
