Is it possible to only notify team members on the first alert (or first couple of alerts) that are received by an OpsGenie integration within a given time period? For example, when we have a physical site go down, 20+ alerts may be generated by our monitoring system within a 5 minute period due to various devices at that site becoming unreachable. We only wish to notify team members of the first alert that occurs during that 5 minute period to prevent them from being flooded with notifications at the start of a site outage. It appears that PagerDuty has a feature called "Time-Based Alert Grouping" that provides this type of functionality. Is there a way to accomplish this type of functionality in OpsGenie?
I have already looked into OpsGenie's deduplication options that involve modifying the alias on incoming alerts in order to automatically group them into one alert. However, that's not quite what I'm looking for. I still want all of the unique alerts to be displayed in OpsGenie and available for acknowledgement, as this helps give a technician a sense of the scope of the outage. But a technician only needs to be notified of the first alert (or first couple of alerts) that comes into OpsGenie to know that an issue is occurring and needs attention.
Thanks for any help someone can provide here!
Hi @Tim Tate ,
We don't have a time-based feature that exactly mirrors PagerDuty's "Time-Based Alert Grouping". However, based on what you've described, I think this could be a good use case for incidents and incident rules. These would allow you to group alerts under a single incident for as long as the incident is still open, rather than within a static time frame, which may or may not capture all alerts.
With incident rules, you can automatically create an incident from an incoming alert that meets your filter conditions. Then, when additional alerts reach Opsgenie that also meet the incident rule conditions (as long as the incident is still open) the alerts will aggregate under the incident.
The related alerts will still exist in Opsgenie separately from the incident, but because they met the incident rule condition, they would be considered part of that same incident and thus would not send out any notifications.
Another advantage of using incidents and incident rules is that you'd be able to easily locate all of the alerts related to a single incident, because they would be linked to the incident under "associated alerts".
It's important to note that Incident rules are available on Opsgenie’s Enterprise and Standard plans.
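To make the aggregation behaviour described above a bit more concrete, here is a rough Python model of it. This is only an illustration of the logic, not Opsgenie's API or implementation: the `IncidentRuleModel` class, the `site` field, and the `site-42` value are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    title: str
    is_open: bool = True
    associated_alerts: list = field(default_factory=list)


class IncidentRuleModel:
    """Toy model of an incident rule (hypothetical, not the Opsgenie API)."""

    def __init__(self, matches):
        self.matches = matches      # hypothetical filter: alert dict -> bool
        self.incident = None
        self.notifications = []     # messages that would page someone

    def receive(self, alert):
        if not self.matches(alert):
            # Alerts outside the rule notify as usual.
            self.notifications.append(alert["message"])
            return None
        if self.incident is None or not self.incident.is_open:
            # The first matching alert opens the incident and notifies once.
            self.incident = Incident(title=alert["message"])
            self.notifications.append("incident: " + alert["message"])
        # Later matching alerts are attached without a new notification.
        self.incident.associated_alerts.append(alert)
        return self.incident


rule = IncidentRuleModel(lambda a: a.get("site") == "site-42")
for device in ("router", "switch", "ap-1"):
    rule.receive({"message": f"{device} unreachable", "site": "site-42"})
```

In this model, three alerts from the same site produce one notification, while all three remain listed under the incident's associated alerts.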
@John M , thank you for the response! It sounds like incidents + incident rules will be the best route forward at the moment. I have upgraded our plan to Opsgenie Standard so that we can test out incidents further.
Follow up question: is there a way to configure an incident rule such that an incident only gets created once "x" number of alerts are generated by an integration? From my research, that did not seem to be possible based on the conditions/filters that are currently available.
Hi @Tim Tate ,
You're correct, there is no way to create an incident based on 'X' number of alerts. However, if you can find a way to raise the priority of incoming alerts as the volume of alerts increases, you could use a delay/suppress notification policy to block the responder alerts (the alerts that actually send out the notifications for the incidents) from notifying incident responders unless the priority is high enough.
This works because the incoming alert's priority will be reflected in the incident and aggregated alerts will continue to increase the priority of an open incident. I realize that you may not have an easy way to increase the priority of the incoming payloads based on the count/time, which is why you need Opsgenie to handle that.
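If you did want to try raising the priority on your side before the alerts reach Opsgenie, the idea could look something like the sketch below. It assumes you have somewhere to run a small script between your monitoring system and Opsgenie; the `PriorityEscalator` class, the 5-minute window, and the thresholds are all hypothetical. Only the P1 to P5 priority values correspond to Opsgenie's priority levels.

```python
from collections import deque


class PriorityEscalator:
    """Picks a priority from the number of alerts seen in a sliding window."""

    def __init__(self, window_seconds=300,
                 thresholds=((10, "P1"), (5, "P2"), (2, "P3"))):
        self.window_seconds = window_seconds
        # (minimum alert count, priority), ordered from highest count down.
        self.thresholds = thresholds
        self.seen = deque()  # timestamps of recent alerts, oldest first

    def priority_for(self, timestamp):
        self.seen.append(timestamp)
        # Drop alerts that have fallen out of the window.
        while self.seen and timestamp - self.seen[0] > self.window_seconds:
            self.seen.popleft()
        for minimum, priority in self.thresholds:
            if len(self.seen) >= minimum:
                return priority
        return "P5"  # a lone alert stays at the lowest priority


escalator = PriorityEscalator()
# Twelve alerts arriving 10 seconds apart, as in a site outage.
priorities = [escalator.priority_for(t) for t in range(0, 120, 10)]
```

The returned value would then be set as the `priority` on the alert you send, and the notification policy would only let through the priorities you consider high enough.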
With that said, I took a look at your original question again and you said:
"I still want all of the unique alerts to be displayed in OpsGenie and available for acknowledgment as this helps give a technician a sense for the scope of the outage."
I think it's important to note that with deduplication you will still be able to see the scope of the outage based on the alert count (which increases as alerts come in) and the activity log, which logs the time and incoming message on each subsequent alert.
So, while deduplication does not display separate alerts that can individually be acknowledged, you are still able to see the scope of the outage based on the incoming data logged to the aggregated alert.
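As a simplified picture of what deduplication keeps, the Python sketch below folds alerts that share an alias into one record with a count and an activity log. It is a toy model rather than Opsgenie's implementation: the `DedupModel` class and the `site-42-outage` alias are invented for the example, and it ignores what happens once an alert is closed.

```python
class DedupModel:
    """Toy model of alias-based deduplication (not the Opsgenie API)."""

    def __init__(self):
        self.alerts = {}  # alias -> aggregated alert record

    def receive(self, alias, message, timestamp):
        alert = self.alerts.setdefault(alias, {"count": 0, "activity_log": []})
        alert["count"] += 1
        # Each incoming message is logged with its arrival time.
        alert["activity_log"].append((timestamp, message))
        return alert


model = DedupModel()
for t, device in enumerate(("router", "switch", "ap-1")):
    model.receive("site-42-outage", f"{device} unreachable", t)
```

Here a single alert record ends up with a count of 3 and a log entry per device, which is the information a technician would use to judge the scope of the outage.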