📣 Retrying automation rules for enhanced reliability

 

Over the next few weeks, we'll be rolling out a new Atlassian Automation enhancement across all products and editions, designed to increase the reliability of your rules. Automation will now attempt to retry Automation rules that have been stopped by temporary system issues, ensuring that your workflows continue to run smoothly without interrupting your team.

 

What causes a rule to be retried?

Understanding the difference between configuration and system errors is key to knowing when rules may be retried. Configuration errors arise from issues in setting up the components in your rule (e.g. creating a work item in a project that has been deleted) or from changes in external systems (such as when the Send Web Request action points to a deprecated third-party API endpoint). These types of errors will not trigger an automatic retry.

Conversely, system errors are temporary interruptions within the Atlassian platform, such as brief service outages. For instance, if you have an Automation rule that transitions a work item from one status to another based on specific conditions, and it halts mid-run due to a transient error, Automation may attempt to retry the rule from the point of interruption. Retries continue until either the action is successfully executed or the 7-day retry window, measured from the time of the initial error, expires. The retry mechanism itself does not count toward a rule's processing time; only the time spent processing the rule's actions does. To learn more about when a rule may be retried after a system error, check out the support documentation here.
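The distinction can be pictured with a minimal Python sketch. It is an illustration only, not Atlassian's implementation: the names `ErrorKind`, `should_retry`, and `RETRY_WINDOW` are hypothetical. The only facts taken from the text are that configuration errors are never retried and that system errors stay eligible until the 7-day window from the initial error expires.

```python
from datetime import datetime, timedelta
from enum import Enum

# Hypothetical names for illustration; only the 7-day window comes from the announcement.
RETRY_WINDOW = timedelta(days=7)


class ErrorKind(Enum):
    CONFIGURATION = "configuration"  # e.g. the target project was deleted
    SYSTEM = "system"                # e.g. a brief service outage


def should_retry(kind: ErrorKind, first_error_at: datetime, now: datetime) -> bool:
    """Return True if a stopped rule is still eligible for another retry."""
    if kind is ErrorKind.CONFIGURATION:
        return False  # configuration errors never trigger an automatic retry
    # System errors stay eligible until the window, measured from the initial error, expires.
    return now - first_error_at < RETRY_WINDOW


first_error = datetime(2025, 3, 3, 9, 0)
print(should_retry(ErrorKind.SYSTEM, first_error, first_error + timedelta(hours=2)))         # True
print(should_retry(ErrorKind.SYSTEM, first_error, first_error + timedelta(days=8)))          # False
print(should_retry(ErrorKind.CONFIGURATION, first_error, first_error + timedelta(hours=2)))  # False
```

Measuring the window from the initial error, rather than from the latest attempt, matches the wording above and keeps a rule from being retried indefinitely.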

 

Tracking when rules are retried

Typically, when rule retry is working it will go unnoticed. However, you can view which rules are queued for retry and which have been successfully retried in the Audit Log. There are three statuses you may encounter:

Queued for retry - new status

When a rule stops due to a system error and is either waiting to be retried or in the process of being retried, it will have the status Queued for retry. To find out exactly where the rule stopped, click 'Show more' to view the component that stopped.


 

Successfully retried - existing status

If a rule was interrupted due to a system error and subsequently retried successfully, it will show the existing success status. When you click ‘show more’, the component that was retried will display a circular arrow icon instead of a check mark, indicating it was retried.

 

Retried and failed - existing status

If a rule stopped due to a system error and was retried but ultimately failed, it will show the existing failure status, with details available under ‘show more’.


 

Rule retry rollout

The rollout of rule retry will start on the 3rd of March, 2025. Initially, it will be gradually introduced with a select few components, with additional components to be added in the following weeks. As always, if you have any questions or feedback, please drop a comment below.

7 comments

Bill Sheboy
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
March 3, 2025

Hi @Simon Chan 

Thank you for this information.  After reviewing the documentation page you linked to, I wonder...

1) Can this feature be disabled at a site, project, and / or rule level?

2) Which rule actions will be candidates for retry attempts: all of them or specific ones?

3) The 7-day retry window seems quite long for many rule usage scenarios.  Will that duration be configurable in the future?

4) Once a retry attempt is queued, can it be cancelled to prevent the rule from executing?  For example, in cases where the rule's successful completion is no longer relevant due to other changes.

5) Will the access and permissions of the rule actor / initiator be cached such that when the retry is attempted it will process in the same manner as the original attempt, or will the current settings be used for the actor / initiator?

6) The documentation indicates the retried rule will continue "from the point of failure".  Will all current rule state be used from that point forward in execution?  That is, trigger field values, trigger incoming webhook data, Lookup Issue results, Lookup Table contents, Created Variable values, Send Web Request response data, etc.  Or, will data be refreshed to the current values in the database, such as reloading the fields of issues?

7) What happens to queued retries when a rule is modified?  Will they be deleted, proceed with the state and definition of the rule before changes, try to continue if the failed rule action (by ID) still exists in the rule, etc.?

8) Will the new "Queued for Retry" be added to the Status filters for the global automation audit log view?

9) When a rule retries and either succeeds or fails, the audit log is updated to that status.  How can customers use the global automation log to track the frequency of "Queued for Retry"?

10) When a rule retries and either succeeds or fails, how can customers later review which step had the temporary interruption?

11) Currently when there is an Atlassian outage impacting the running of automation rules, it is unclear to customers which rules will eventually trigger (or complete) execution or not.  This new retry capability impacts "temporary interruptions within the Atlassian platform, such as brief service outages."  What is the threshold at which to expect retries to happen versus the current environment?

Thank you in advance for your responses.

Kind regards,
Bill

Norbert Hoppe
Contributor
March 4, 2025

Hi @Simon Chan 

In addition, I also have some questions:

- How often / at what frequency will the retries be executed?

- Can I be informed (like on error) when an action is queued for retry?

- (How) can this 7-day period be interrupted?

- Can I update the rule during this 7-day period (which is IMHO too long and should be configurable)?

- Will it be possible to skip the retry for specific actions?

Br Norbert

Simon Chan
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
March 10, 2025

Hey @Norbert Hoppe & @Bill Sheboy,

Pulled both of your questions together here. We'll be adding some of these to the Support Docs to help make things clearer for everyone - appreciate your questions!

  1. Can this feature be disabled at a site, project, and / or rule level?
    No, this capability cannot be configured. 

  2. Which rule actions will be candidates for retry attempts: all of them or specific ones?
    The rollout is happening over the coming months, during which we will gradually add retry support to all actions.

  3. The 7-day retry window seems quite long for many rule usage scenarios.  Will that duration be configurable in the future?

    The 7-day retry window is the maximum limit and is in place to handle the very rare cases of a long-running outage. The vast majority of system errors are short-running and we do not currently plan on making duration configurable.

  4. Once a retry attempt is queued, can it be cancelled to prevent the rule from executing?  For example, in cases where the rule's successful completion is no longer relevant due to other changes.
    Editing a rule (i.e. opening it and saving it - you do not need to add/update/remove components) will pull it out from the queue.

  5. Will the access and permissions of the rule actor / initiator be cached such that when the retry is attempted it will process in the same manner as the original attempt, or will the current settings be used for the actor / initiator?
    Assuming that the rule is queued and has not been edited whilst queued, the configured rule actor will be applied when it is re-run. If that rule-actor’s permissions are changed whilst the rule has been queued, they will apply once re-run wherever permission checks occur, which is dependent on the action/s configuration and implementation. We do not cache this in the retry mechanism as it references the existing rule configuration.

  6. The documentation indicates the retried rule will continue "from the point of failure".  Will all current rule state be used from that point forward in execution?  That is, trigger field values, trigger incoming webhook data, Lookup Issue results, Lookup Table contents, Created Variable values, Send Web Request response data, etc.  Or, will data be refreshed to the current values in the database, such as reloading the fields of issues?
    The state is saved as of the point of failure and will not be refreshed.

  7. What happens to queued retries when a rule is modified?  Will they be deleted, proceed with the state and definition of the rule before changes, try to continue if the failed rule action (by ID) still exists in the rule, etc.?
    Rules that are modified when queued for retry will be pulled out of the queue.

  8. Will the new "Queued for Retry" be added to the Status filters for the global automation audit log view?
    Yes, there is a new status filter called "Queued for Retry".

  9. When a rule retries and either succeeds or fails, the audit log is updated to that status.  How can customers use the global automation log to track the frequency of "Queued for Retry"?
    With the current Audit Log capabilities, it is not possible to see how often rules that ended in success or failure had been retried along the way.

  10. When a rule retries and either succeeds or fails, how can customers later review which step had the temporary interruption?
    This will be indicated by the retry circular arrow icon in the ‘show more’ view of the rule run.

  11. Currently when there is an Atlassian outage impacting the running of automation rules, it is unclear to customers which rules will eventually trigger (or complete) execution or not.  This new retry capability impacts "temporary interruptions within the Atlassian platform, such as brief service outages."  What is the threshold at which to expect retries to happen versus the current environment?
    If an outage prevents a rule from triggering, that rule will not be eligible for retry. We have made, and are continuing to make, improvements to the reliability of rules that trigger during an outage.

  12. How often / at what frequency will the retries be executed?
    Most retries will complete quickly. To handle this, retries follow a set of back-off intervals ranging from 15 minutes up to 6 hours. These intervals may be tuned during the rollout period.

  13. Can I be informed (like on error) when an action is queued for retry?
    Alerting for this capability is not available. We do, however, have this broader capability on our radar for future development.

  14. (How) can this 7-day period be interrupted? Can I update the rule during this 7-day period (which is IMHO too long and should be configurable)?
    Editing the rule whilst it is queued will pull it out of the queue. The 7-day period is in place for long-running outages; most transient errors resolve in less than an hour. Pulling an edited rule from the queue is consistent with how rule runs work today: upon saving a rule, any current runs of it are stopped.
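
To make the retry timing described in the answers above more concrete, here is a rough Python sketch. It is not Atlassian's implementation. The only figures taken from the answers are the 15-minute and 6-hour interval bounds and the 7-day maximum window. The doubling rule between attempts and the names `retry_offsets`, `FIRST_INTERVAL`, `MAX_INTERVAL`, and `RETRY_WINDOW` are assumptions made purely for illustration, since the actual interval sequence is not published and may be tuned during the rollout.

```python
from datetime import timedelta

# From the answers above: intervals range from 15 minutes to 6 hours, within a 7-day window.
FIRST_INTERVAL = timedelta(minutes=15)
MAX_INTERVAL = timedelta(hours=6)
RETRY_WINDOW = timedelta(days=7)


def retry_offsets(factor: int = 2) -> list[timedelta]:
    """Offsets from the initial error at which retries would be attempted.

    ASSUMPTION: each interval is `factor` times the previous one, capped at
    MAX_INTERVAL. The real interval sequence is not published.
    """
    offsets = []
    elapsed = timedelta(0)
    interval = FIRST_INTERVAL
    # Stop scheduling once the next attempt would fall outside the retry window.
    while elapsed + interval <= RETRY_WINDOW:
        elapsed += interval
        offsets.append(elapsed)
        interval = min(interval * factor, MAX_INTERVAL)
    return offsets


offsets = retry_offsets()
print([str(o) for o in offsets[:6]])
print(len(offsets), "attempts; last at", offsets[-1])
```

Under these assumed intervals, the early attempts cluster within the first few hours, which fits the statement that most transient errors resolve in less than an hour, while the 6-hour cap keeps later attempts sparse during a long outage.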

 

Bill Sheboy
Rising Star
March 11, 2025

@Simon Chan thanks for your detailed responses to the questions!

I feel a "gap" is still the missing alerts when queued retries exist for more than a few minutes... 

Some customers seem to create cascading rule chains, primarily due to the absence of serial processing of branches and scheduled triggers, the limits on the number of issues processed by a rule, etc.  Thus when one of those rules in a chain fails to complete when expected, manual interventions by people may be needed to halt / help rule processing.  This could be worsened by the duration of the 7-day retry window leading to a collision between when a rule was expected to complete and when the next actually starts.  (In my opinion, these are brittle solutions in the usage of rules, even though they are sometimes required.)

Knowing that something stopped in the middle of the chain may help customers intervene in a timely manner.

Thanks again!

chris sieverts
Contributor
March 11, 2025

Reading this makes me wonder whether there would be value in creating a configurable dashboard that pulls key administrative information from the audit logs. Don't have this stuff buried, have it displayed.

Bill Sheboy
Rising Star
March 11, 2025

Hi @chris sieverts 

I hypothesize as soon as there is a public REST API for automation rules and the audit logs, that will be possible: https://jira.atlassian.com/browse/AUTO-51

Kind regards,
Bill

chris sieverts
Contributor
March 11, 2025

That's a step in the right direction. Thank you @Bill Sheboy
