Atlassian Post-Incident Review on the April 2022 Outage

Hi Atlassian Community,

I'm Stephen Deasy, Head of Engineering at Atlassian. 

Earlier this month, several hundred Atlassian customers were impacted by a site outage. We have published a Post-Incident Review which includes a technical deep dive on what happened, details on how we restored customers' sites, and the immediate actions we’ve taken to improve our operations and approach to incident management.

To our customers and our partners, we thank you for your continued trust and partnership. We hope the details and actions outlined in our Post-Incident Review demonstrate that we’ll continue to provide a world-class cloud platform and a powerful portfolio of products to meet the needs of every team.

3 comments

Commitment, Focus, Openness, Respect, Courage.

Wisdom comes by suffering 👍

I have read the PIR and found it very informative. One point that wasn't covered was the order in which sites were restored. It would be very helpful to understand the criteria for that order: was it based on licence SLAs (e.g. Enterprise first, then Premium), on the order of deletion, or on some other criteria?


I have to admit, this report shows some serious issues in both technical architecture and collaboration. The fact that this could even happen is baffling.

I can see no mention of securing the API so that each call has a single point of failure instead of multiple points based on assumed input values. Nor is there any mention of clear instructions on the steps that are needed, preferably written by the first team that prepared the actions (and presumably verified them). That does not put my mind at ease.

Even if this is a one-time accident, it shows a lack of architectural control and poor disaster preparedness.
