Your company's portal goes down. You're immediately in firefighting mode, trying to fix the bug and restore the portal. But are you being transparent with your customers? Are your support teams seeing queues fill with tickets, tweets, or chats via your support portal?
Good incident response isn't just about getting services back up quickly; it's about being upfront with your customers and updating them frequently.
I recently ran my first Atlassian Team Playbook with our Incident Communication Team. The playbook we chose was 'Incident response communications', and I'd like to share my thoughts.
We have more than 30 enterprise cloud products, over half of which use Statuspage. The team supports over 20 engineering/dev teams and handles several customer-impacting incidents a day.
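For teams at that volume, it's worth automating the customer updates themselves. Here's a minimal sketch of posting an update to an existing incident via the Statuspage REST API; the page ID, API key, and incident ID are placeholders you'd swap for your own.

```python
import requests

# Placeholders -- PAGE_ID, API_KEY, and the incident ID passed in below
# are hypothetical; substitute your own Statuspage credentials.
PAGE_ID = "your_page_id"
API_KEY = "your_statuspage_api_key"

def post_incident_update(incident_id: str, status: str, body: str) -> dict:
    """PATCH an existing Statuspage incident with a new status and message.

    Valid statuses include "investigating", "identified",
    "monitoring", and "resolved".
    """
    url = f"https://api.statuspage.io/v1/pages/{PAGE_ID}/incidents/{incident_id}"
    resp = requests.patch(
        url,
        headers={"Authorization": f"OAuth {API_KEY}"},
        json={"incident": {"status": status, "body": body}},
    )
    resp.raise_for_status()
    return resp.json()

# Example: tell customers the cause has been identified
# post_incident_update("an_incident_id", "identified",
#                      "We've identified the cause and are working on a fix.")
```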
We needed to step back and take a look at how well we were communicating.
I've been fortunate enough to have one of the Atlassians run several team playbooks for our service teams here in Reading. Based on the positive reaction, I knew playbooks were the way forward. And why reinvent the wheel? Atlassian puts a lot of research and dedication into these playbooks, which are completely free.
I stumbled upon the Incident response communications playbook and knew this was what we needed. I shared it with the managers, and they asked me to go ahead and schedule a play session with the team.
I worked with one of our Incident Communication Managers to review the past 30 days' worth of incidents and narrow down which one we wanted to focus on. Out of the handful of candidates, we chose one and gathered the incident details collaboratively on a Confluence page.
As the entire team is remote, we had to come up with a way for the team to interact and engage. If the team were in the same office, we could have used a whiteboard wall to draw the timeline and use sticky notes, but that wasn't an option.
So I took the timeline from Confluence and recreated it as a slide. We then added "sticky notes" to the presentation using a whiteboard in Webex (our video conferencing tool). One team member acted as scribe, taking the whiteboard notes and adding them to the timeline in the slide. The team walked through the entire incident from start to finish: from the point engineering received a monitoring alert to our last update. It was interesting to see that we were actually posting publicly before we had internal comms.
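To make ordering surprises like that easy to spot in future reviews, a quick pass over the timeline timestamps does the job. A small sketch; the events and times below are purely illustrative, not from the actual incident.

```python
from datetime import datetime

# Illustrative timeline events -- hypothetical timestamps only.
events = {
    "monitoring alert fired": datetime(2019, 6, 3, 9, 2),
    "first public Statuspage post": datetime(2019, 6, 3, 9, 10),
    "internal comms sent": datetime(2019, 6, 3, 9, 25),
    "incident resolved": datetime(2019, 6, 3, 11, 40),
}

# Print each event as minutes elapsed since the earliest one,
# which makes gaps and ordering obvious at a glance.
start = min(events.values())
for name, ts in sorted(events.items(), key=lambda kv: kv[1]):
    offset = (ts - start).total_seconds() / 60
    print(f"+{offset:>5.0f} min  {name}")
```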
Once the team had assessed the incident, we put an action plan in place covering people, process, and technology. We assigned each action an "owner" along with a recommendation on next steps.
A few weeks later we regrouped, and the owners provided updates on their actions, which have since improved our communications.
But this isn't a one-off exercise. It's something that we will continue to repeat every few months to help refine and improve our incident communication process.
Have you run this play? What did you find? Have you made any changes to your incident process? Let me know!
Thanks for reading!
Nick Coates
Product Owner - Symantec Status