Our company uses Control-M to schedule all of our system's processes. Currently, when a failure occurs in Control-M, a "first-response" team is notified and manually calls someone responsible for that failure, based on who is on call within "Teams" in Opsgenie. We're looking to automate this process through Opsgenie, so there's less manual intervention - cool, great!
My question/concern is this: our system runs 24/7, with Control-M processes cycling through multiple times a day. Some processes within Control-M are set up to automatically re-run 1-3 times. I'm looking for a way to set up Opsgenie so that it does not make automated calls when a process fails once but then automatically restarts (within Control-M, and within 10 seconds) and completes successfully.
I've reviewed the flowchart for "Alert Notifications" and the ways to Suppress or Delay alerts, but I'm not sure whether either of these options is what I'm looking for. I'm thinking more about how a human handles it: they see a failure, get ready to manually call someone, then "recheck" whether the failed process is still failed before calling. In this case, the failed process would have automatically rerun and finished successfully by the time of the recheck. Thus, no call to on-call teams!
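
To make the "recheck" idea concrete, here is a rough sketch of the logic I have in mind. It is not Opsgenie or Control-M code: `should_page`, `get_status`, the `"FAILED"` status string, and the 30-second grace period are all hypothetical names and values I made up for illustration.

```python
import time
from typing import Callable


def should_page(
    get_status: Callable[[], str],
    grace_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Decide whether to call the on-call person after a failure is reported.

    get_status is a hypothetical callback that returns the job's current
    status; "FAILED" is an assumed status value, not a real Control-M one.
    """
    # Give the scheduler time to perform its automatic re-run.
    sleep(grace_seconds)
    # Recheck, the way a human would before picking up the phone.
    return get_status() == "FAILED"


if __name__ == "__main__":
    # A job that recovered during the grace period: nobody gets called.
    print(should_page(lambda: "OK", grace_seconds=0))
    # A job that is still failed after the grace period: on-call gets called.
    print(should_page(lambda: "FAILED", grace_seconds=0))
```

The grace period would need to be longer than the time Control-M takes to restart and complete the job, so a failure that recovers on its own never reaches the on-call team.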