I see that the integration by default makes it easy to send alerts to teams in Opsgenie. I think that's not entirely correct since Opsgenie has the concept of a Service.
It seems much cleaner to send a service monitor from Datadog to a Service in Opsgenie that is covered by a team. That way teams can cover multiple services and move them around, and it should also be easier to attribute alerts to a service.
Setting this up is not immediately obvious. Does anybody have a good guide for how to do this? I think it involves some setting or API key, but the documentation on both sides is lacking.
Hi @Alper Cugun,
Opsgenie has a prebuilt Datadog integration that can create alerts to notify your Opsgenie users. Best practice is to send alerts to a team, since a team's On-call tab can manage which users are notified, as well as how the alert is escalated if no action is taken. This can be done through routing rules and escalations.
Services are another aspect of Opsgenie that can be leveraged to notify multiple responders (teams / users) if there is a serious disruption or outage within your infrastructure affecting your customers, the business, or both. Services might represent a public web site, order processing, mobile apps, customer portal, backup services, etc.
If you know that a certain monitoring alert or metric in Datadog would qualify as a service disruption like those mentioned above, you could automatically turn the alert into an incident using incident rules.
The incident rule could be tied to a specific Service, and the Service could then notify multiple responders to remediate the incident, as well as stakeholders so they are kept in the loop on what is going on.
Realistically, most of the time customers only need an integration to notify their Opsgenie users. I think some Opsgenie competitors' terminology might classify their integrations as "services", so maybe that's where the confusion is. Not sure that helps, but hopefully that clarifies the difference between Opsgenie integrations and services.
It's a relatively weird factoring. Let's say we have a PaymentService.
In Datadog there is the concept of a Service Catalog, but we don't use it. We have Monitors set up with, in this case, the tag: service:payment-service
These Monitors then send alerts to a specific team in Opsgenie, let's call it Payment Team. Are the Services in Opsgenie at a coarser resolution or could we map this to a service called "Payment Service"?
The setup I'm thinking of is:
Is that possible? Is that too complicated?
Yes, that should be possible - but I don't necessarily think it's needed.
Creating an incident for all of these types of Datadog alerts is somewhat of an overkill in my opinion. This can realistically be accomplished just using Opsgenie alerts - unless you have a use case to create an incident for all of these?
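To illustrate the alert-only approach, here is a minimal Python sketch of the kind of alert body that keeps the service visible without involving incidents. It assumes the field names of Opsgenie's Create Alert API (message, responders, tags, details, priority); the function name, the "Payment Team" responder, and the idea of copying the service tag into details are hypothetical choices for this example. The prebuilt Datadog integration normally builds this for you, so treat it as a picture of the data rather than something you need to write.

```python
def build_alert_payload(monitor_name, datadog_tags, team_name, priority="P3"):
    """Build an alert body routed to a team, carrying the service as metadata.

    Assumption: the shape follows Opsgenie's Create Alert request body.
    The 'service' entry under details is an illustrative convention only.
    """
    # Pull the value out of a tag such as "service:payment-service", if present
    service = next(
        (tag.split(":", 1)[1] for tag in datadog_tags if tag.startswith("service:")),
        None,
    )
    payload = {
        "message": monitor_name,
        "responders": [{"name": team_name, "type": "team"}],
        "tags": list(datadog_tags),
        "priority": priority,
    }
    if service is not None:
        payload["details"] = {"service": service}
    return payload


payload = build_alert_payload(
    "Payment latency high",
    ["service:payment-service", "env:prod"],
    "Payment Team",
)
```

With the service carried as a tag or detail, alerts can still be filtered and attributed per service while the team's routing rules and escalations decide who gets notified.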
OK. But if we handle all of this with alerts then we're effectively not using the Services part in Opsgenie?
That's not to say that you should not or cannot use Services and Incidents, but again, realistically these should be reserved for serious service disruptions and outages. So creating an incident for ALL Datadog alerts does not make much sense to me.
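As a rough way to think about which alerts deserve an incident, the condition can be pictured as a small predicate. This is only a sketch of the decision, not Opsgenie's incident rule syntax (incident rules are configured in Opsgenie itself); the function name, the P2 threshold, and the set of customer-facing services are assumptions made for the example.

```python
def should_open_incident(priority, service, customer_facing_services, threshold="P2"):
    """Return True only for severe alerts on customer-facing services.

    Opsgenie priorities run from P1 (most severe) to P5, so a lower number
    means a more severe alert. The threshold is a hypothetical choice.
    """
    severe_enough = int(priority[1:]) <= int(threshold[1:])
    return severe_enough and service in customer_facing_services


customer_facing = {"payment-service"}
opens_incident = should_open_incident("P1", "payment-service", customer_facing)
stays_alert = should_open_incident("P3", "payment-service", customer_facing)
```

Everything that fails the check would remain an ordinary alert handled by the team's on-call schedule.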