Outage on 4 January

Hi all, 

First off, the changes we made in November are already bearing fruit, resulting in a two-month streak with no incidents. Yay!

However, Jira Product Discovery was down for 6.5 hours on 4 January. 

 

What was the impact?

No one was able to access their projects during that time, and no updates were posted on the Statuspage.

 

What happened? 

A combination of things: 

  • Atlassian has a deployment freeze over the end-of-year break, and deployments restarted automatically on 4 January. Changes had piled up during the freeze, and the front-end was deployed automatically while the back-end changes it needed were not yet in production. And kaboom.
  • Because the product is still in beta with a small team (with team members in the US and Europe), we do not yet have on-call support outside office hours. Up until now, all the incidents we faced came after back-end deployments, so we optimized for having people available to watch production for a few hours after each one. In this case the deployment happened at the worst possible time, when everyone was asleep, so the team did not learn of the incident until 6 hours after it started. Resolving the incident took only a few minutes.

 

What are we changing?

We are taking this very seriously and are acting on both the root cause of the incident and the incident response:

  • To address the root cause, we're looking at how to prevent deployments from happening out of sequence, so that a front-end change cannot go live before the back-end changes it depends on.
  • To improve the incident response, we're implementing on-call support outside office hours over the next few weeks.
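
To make the first point concrete, here is a minimal sketch of what a deployment ordering check could look like. It is an illustration only, not a description of Atlassian's actual pipeline: the names `FrontendRelease`, `min_backend_version`, `can_deploy`, and `releases_safe_to_deploy` are all hypothetical, and it assumes every front-end release declares the lowest back-end version it needs, using simple dotted numeric versions.

```python
from __future__ import annotations

from dataclasses import dataclass


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '2.10.0' into (2, 10, 0) for comparison."""
    return tuple(int(part) for part in text.split("."))


@dataclass(frozen=True)
class FrontendRelease:
    """Hypothetical record of a queued front-end release."""
    name: str
    min_backend_version: str  # assumption: declared by the release author


def can_deploy(release: FrontendRelease, backend_in_production: str) -> bool:
    """Allow the release only if production already runs a new enough back-end."""
    return parse_version(backend_in_production) >= parse_version(
        release.min_backend_version
    )


def releases_safe_to_deploy(
    queue: list[FrontendRelease], backend_in_production: str
) -> list[FrontendRelease]:
    """Return queued releases in order, stopping at the first blocked one.

    Stopping early keeps releases in sequence, which matters when changes
    have piled up (for example after a deployment freeze).
    """
    safe = []
    for release in queue:
        if not can_deploy(release, backend_in_production):
            break
        safe.append(release)
    return safe


if __name__ == "__main__":
    queue = [
        FrontendRelease("ui-small-fix", "2.9.0"),
        FrontendRelease("ui-new-feature", "2.10.0"),
    ]
    for release in releases_safe_to_deploy(queue, backend_in_production="2.9.3"):
        print("deploying", release.name)
```

With the back-end at 2.9.3, only the first release in the queue would go out. The second stays blocked until a back-end at 2.10.0 or later is in production, which is the kind of sequencing that would have held back the front-end deployment in this incident.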

 

Thank you for your continued support!
