
Outage on 4 January

Hi all, 

First off, the changes we made in November are already bearing fruit, resulting in a two-month streak with no incidents. Yay!

However, Jira Product Discovery was down for 6.5 hours on 4 January. 

 

What was the impact?

No one was able to access their projects during that time, and there were no updates on the Statuspage.

 

What happened? 

A combination of things: 

  • Atlassian has a deployment freeze over the end-of-year break, and deployments restarted automatically on January 4. In this case the changes had piled up, and the front-end was deployed automatically while the back-end changes it needed were not yet in production. And kaboom.
  • Because the product is still in beta with a small team (with members in the US and Europe), we do not yet have on-call support outside office hours. Up until now, all the incidents we faced came after back-end deployments, so we optimized for having people available to watch production for a few hours after each one. In this case the deployment happened at the worst possible time, when everyone was asleep, so the team did not learn of the incident until 6 hours after it started. Resolving it then took only a few minutes.

 

What are we changing?

We are taking this very seriously and are working on both the root cause of the incident and our incident response:

  • To address the root cause, we're looking at how to stop deployments from happening out of sequence, so that this kind of incident cannot recur.
  • To improve the incident response, we're implementing on-call support outside office hours over the next few weeks.
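
For anyone curious what "preventing out-of-sequence deployments" can look like in practice, here is a minimal sketch of one common approach: the front-end release declares the oldest back-end version it can run against, and the deploy step refuses to proceed if production is behind. This is only an illustration and not our actual pipeline; the names FrontendRelease, can_deploy and deploy_frontend, and the version numbers, are all made up for the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical version format: (major, minor, patch).
Version = Tuple[int, int, int]


@dataclass(frozen=True)
class FrontendRelease:
    """A front-end build plus the oldest back-end version it can run against."""
    version: str
    min_backend_version: Version


def can_deploy(release: FrontendRelease, backend_in_production: Version) -> bool:
    """Return True only if production already runs a compatible back-end."""
    # Tuples compare element by element, so (2, 5, 0) >= (2, 4, 9) is True.
    return backend_in_production >= release.min_backend_version


def deploy_frontend(release: FrontendRelease, backend_in_production: Version) -> str:
    """Gate the deploy: fail loudly instead of shipping an incompatible front-end."""
    if not can_deploy(release, backend_in_production):
        raise RuntimeError(
            f"front-end {release.version} needs back-end >= "
            f"{release.min_backend_version}, production has {backend_in_production}"
        )
    # A real pipeline would trigger the rollout here; this sketch only reports it.
    return f"deployed front-end {release.version}"
```

With a gate like this, a front-end build that requires back-end (2, 5, 0) would be blocked while production is still on (2, 4, 9), and would go through once the back-end has caught up.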

 

Thank you for your continued support!
