Our company uses Control-M to schedule all of our system's processes. Currently, when a failure occurs in Control-M, a "first-response" team is notified and manually calls someone responsible for that failure, based on who is on call within "Teams" in Opsgenie. We're looking to automate this process through Opsgenie so there's less manual intervention. Cool, great!
My question/concern is this: our system runs 24/7, with Control-M processes cycling through multiple times a day. Some processes within Control-M are set up to automatically rerun 1-3 times. I'm looking for a way to set up Opsgenie so that it does not place automated calls when a process fails once but then automatically restarts (within Control-M, within about 10 seconds) and completes successfully.
I've reviewed the flowchart for "Alert Notifications" and the ways to suppress or delay alerts, but I'm not sure whether either of these options is what I'm looking for. I'm thinking of how a human sees a failure, gets ready to manually call someone, then rechecks whether the failed process is still failed before calling. In this case, the failed process would automatically rerun and finish successfully. Thus, no call to the on-call team!
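To make the "recheck before calling" idea concrete, here is a rough sketch of the logic I have in mind. It is not Opsgenie or Control-M code: the function name, the `"FAILED"` status string, and the `get_job_status` callback are all hypothetical placeholders for however the job status would actually be looked up.

```python
import time
from typing import Callable


def should_call_on_call(
    get_job_status: Callable[[], str],          # hypothetical status lookup
    grace_seconds: float = 10.0,                # assumed rerun window
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True only if the job is still failed after the grace period."""
    # No failure at all: nothing to escalate.
    if get_job_status() != "FAILED":
        return False

    # Give the scheduler time to rerun the job automatically.
    sleep(grace_seconds)

    # Recheck, as a human would before picking up the phone.
    return get_job_status() == "FAILED"
```

With this logic, a job that fails once and then succeeds on its automatic rerun never triggers a call; only a job that is still failed after the wait does.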