
OpsGenie Best Practice on scheduled alerts

We have many events scheduled throughout the day, and we want to use Opsgenie to notify us when any of them fail.

For example, at 5 pm we receive an email with an attachment that gets dropped to a network store.
At 9 pm, four files go out via FTP and we receive a confirmation of success.
At 4 am, we receive a set of files that we load and process.
And so on.

Essentially, I want scheduled 'soft' alerts that are acknowledged or closed by each operation's success confirmation and that, if no confirmation arrives by a set time, escalate and notify the team.

I didn't see a way to generate these events/alerts on a schedule, so I'm wondering what approaches others have tried and what the most practical way to do this would be.

Thanks!

Answer accepted

@johnathan_blanco Opsgenie can't do this natively without some external scripting or automation. There are several ways to approach it, but each requires an external system that sends an alert, an alert close, or a heartbeat. Here are some suggestions:

  1. The simplest option is to run a script at each checkpoint you mentioned and create an alert if the condition is not met. We use our batch scheduling system to do this all the time.
  2. If you want visibility into what you're still waiting on, send an alert at the start of the day and suppress it until the checkpoint time. As soon as the action completes, have it send an alert close (use the alias field to match the correct alert so you don't have to store the alert ID anywhere). This keeps the alert visible until it is verified and closed.
  3. Opsgenie Heartbeats are meant for situations like this. Set up a daily heartbeat and have your jobs send a heartbeat ping when they complete. This works less well when completion times vary a lot: if your transfer finishes early one day, a 24-hour heartbeat will expect it to finish at that same time the next day.
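A minimal Python sketch of suggestion 2 (which also covers the alert creation in suggestion 1), assuming the Opsgenie Alert REST API v2 at the default `https://api.opsgenie.com` host and an API integration key sent as `GenieKey`. It only builds the create and close-by-alias requests. The `daily_alias` helper and its alias format are hypothetical, chosen so the closing job can recompute the alias instead of storing the alert ID.

```python
import datetime
import json
import urllib.parse
import urllib.request

# Assumption: default (US) Opsgenie API host; other regions use a different host.
API_BASE = "https://api.opsgenie.com/v2/alerts"


def daily_alias(job: str, day: datetime.date) -> str:
    """Hypothetical alias scheme: one alias per job per day, e.g. 'ftp-transfer-2022-04-01'."""
    return f"{job}-{day.isoformat()}"


def create_alert_request(api_key: str, alias: str, message: str) -> urllib.request.Request:
    """Build a POST /v2/alerts request; the alias lets a later call find this alert."""
    body = json.dumps({"message": message, "alias": alias}).encode("utf-8")
    return urllib.request.Request(
        API_BASE,
        data=body,
        method="POST",
        headers={"Authorization": f"GenieKey {api_key}", "Content-Type": "application/json"},
    )


def close_alert_request(api_key: str, alias: str, note: str = "") -> urllib.request.Request:
    """Build a close request that identifies the alert by alias rather than by alert ID."""
    path = urllib.parse.quote(alias, safe="")
    url = f"{API_BASE}/{path}/close?identifierType=alias"
    body = json.dumps({"note": note} if note else {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"GenieKey {api_key}", "Content-Type": "application/json"},
    )
```

A job would send either request with `urllib.request.urlopen(req, timeout=10)`. For example, a start-of-day job sends `create_alert_request(key, daily_alias("ftp-transfer", today), "FTP transfer not yet confirmed")`, and the 9 pm transfer sends the matching `close_alert_request` once it receives its confirmation. Suppressing the alert until the checkpoint time is left to Opsgenie-side configuration and is not shown here.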

Just some thoughts...
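If the jobs can make an HTTPS call when they finish, they could also ping a heartbeat directly instead of going through a separate client. A minimal sketch, assuming the Heartbeat API's `/v2/heartbeats/{name}/ping` endpoint with `GenieKey` authorization at the default `https://api.opsgenie.com` host; the helper names and the heartbeat name in the usage note are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: default (US) Opsgenie API host and the v2 Heartbeat API ping endpoint.
HEARTBEAT_BASE = "https://api.opsgenie.com/v2/heartbeats"


def heartbeat_ping_request(api_key: str, heartbeat_name: str) -> urllib.request.Request:
    """Build a ping request for one named heartbeat."""
    name = urllib.parse.quote(heartbeat_name, safe="")
    return urllib.request.Request(
        f"{HEARTBEAT_BASE}/{name}/ping",
        method="GET",
        headers={"Authorization": f"GenieKey {api_key}"},
    )


def send(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code (raises on network or HTTP errors)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

At the end of the 9 pm transfer, for example, the job would call `send(heartbeat_ping_request(API_KEY, "ftp-transfer"))`. Using one heartbeat per scheduled job, rather than one shared heartbeat, keeps a missed ping attributable to a specific job.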

Thank you for the recommendations. I believe a heartbeat will be the best fit. I was reviewing this document and started testing the "Script Monitoring Client" linked from: https://docs.opsgenie.com/v1.0/docs/heartbeat-monitoring

I'm not having any luck with it, but I'm not sure yet whether it's our firewall (which I can check next week) or something I'm doing wrong. I've pasted the output below in case it points in either direction.

One other question about the article linked above. It says: "Send a built-in email to Opsgenie to ping a Heartbeat. While creating the Heartbeats on Opsgenie, you will see the built-in email address configured according to your account and Heartbeat name...."
Is emailing a heartbeat limited to pings, or can emails also trigger other actions such as generate, close, or other heartbeat actions?

Here's the output of the script monitoring client:

PS H:\oghb-windows-amd64-v2.0.3> ./oghb-windows-amd64.exe -apiKey=REMOVEDFORSHARING -name=testbeats -action=start
time="2022-04-01T16:37:16-07:00" level=info msg="Couldn't send the request to opsgenie"
time="2022-04-01T16:37:16-07:00" level=error msg="Get https://api.opsgenie.com/v2/heartbeats/testbeats: dial tcp: lookup api.opsgenie.com: getaddrinfow: A non-recoverable error occurred during a database lookup."
panic: interface conversion: interface is nil, not string

goroutine 1 [running]:
panic(0x6136e0, 0xc04205c700)
/usr/local/Cellar/go/1.7.3/libexec/src/runtime/panic.go:500 +0x1af
main.getHeartbeat(0x0, 0xc042087ee0)
/Users/caglaarikan/heartbeat/opsgenie-heartbeat/script monitor/src/oghb.go:78 +0x2a4
main.startHeartbeat()
/Users/caglaarikan/heartbeat/opsgenie-heartbeat/script monitor/src/oghb.go:63 +0x29
main.main()
/Users/caglaarikan/heartbeat/opsgenie-heartbeat/script monitor/src/oghb.go:29 +0x1a3
PS H:\oghb-windows-amd64-v2.0.3>
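The error line shows the lookup of `api.opsgenie.com` failing (`getaddrinfow` is the Windows name-resolution call) before any HTTP request goes out, which points at DNS, proxy, or firewall configuration rather than the API key; the `panic` that follows looks like the client failing on the missing response rather than a separate problem. A small Python check, run from the same machine, can separate the two network failure modes: `can_resolve` tests name resolution only, and `can_connect` tests whether a TCP connection to port 443 succeeds. Both helper names are hypothetical.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if `host` resolves to at least one address (DNS step only)."""
    try:
        return len(socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)) > 0
    except socket.gaierror:
        # Same class of failure as the "getaddrinfow" error in the client output.
        return False


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers resolution errors, refused connections, and timeouts.
        return False
```

If `can_resolve("api.opsgenie.com")` returns False, that reproduces the lookup error above. If it returns True but `can_connect("api.opsgenie.com")` returns False, outbound traffic to port 443 is likely being blocked. On networks that require an HTTP proxy, direct resolution may fail by design, so a negative result there says more about the network policy than about Opsgenie.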

 

I believe the heartbeat integration's built-in email is limited to pings, but nothing stops you from creating a dedicated email integration to process close or other actions. You just need to know the alert ID.
