How well are you communicating to customers during an outage?

Your company's portal goes down. You're immediately in firefighting mode, trying to fix the bug and restore the portal. But are you being transparent with your customers? Are your support teams starting to see queues fill up with tickets, tweets or chats via your support portal?

Good incident response isn't just about getting services back up quickly - it's about being upfront and frequently updating your customers.

I recently ran my first Atlassian Team Playbook with our Incident Communication Team. We chose the 'Incident response communications' play, and I would like to share my thoughts.

Why did the team agree to run this play?

We have more than 30 enterprise cloud products, over half of which use Statuspage. The team responds to more than 20 engineering/dev teams, with several incidents a day that impact our customers.

We needed time to step back and take a look at how well we are communicating.

Choosing a playbook

I've been fortunate enough to have one of the Atlassians run several team playbooks for our service teams here in Reading. Based on the positive reaction, I knew playbooks were the way forward. Also, why reinvent the wheel? Atlassian puts a lot of research and dedication into these playbooks, which are completely free.

I stumbled upon the Incident response communications playbook, and knew this was what we needed. I shared this with the managers, and they asked me to go ahead and schedule a play session with the team.

Preparation

I worked with one of our Incident Communication Managers to look at the past 30 days' worth of incidents and narrow down which incident we wanted to focus on. Out of the handful of incidents, we chose one and started to gather the incident details, collaborating on a Confluence page.


Playbook: Engage!

As the entire team is remote, we had to come up with a way for the team to interact and engage. If the team were in the same office, we could have used a whiteboard wall to draw the timeline and use sticky notes, but that wasn't an option.

Therefore I took the timeline from Confluence and recreated it as a slide. We then added "sticky notes" to the presentation using a whiteboard in Webex (we use this as our video conferencing tool). One of the team members acted as the scribe, taking the whiteboard notes and adding them to the timeline in the slide. The team focused on the entire incident from start to finish: from the point that engineering had an alert from monitoring to our last update. It was interesting to see that we were actually posting updates before we had internal comms.
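
If you keep your timeline as structured data rather than on a slide, a few lines of Python can surface the same kind of finding. This is only a minimal sketch, assuming each timeline entry has a timestamp and a label. The `TimelineEvent` class, the labels `alert`, `internal` and `external`, and the sample data are all made up for illustration; they are not part of the playbook and not what we used in our session.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TimelineEvent:
    at: datetime
    kind: str   # hypothetical labels: "alert", "internal", "external"
    note: str


def first_of(events, kind):
    """Return the earliest event of the given kind, or None if there is none."""
    matches = [e for e in events if e.kind == kind]
    return min(matches, key=lambda e: e.at) if matches else None


def comms_gaps(events):
    """Time from the first monitoring alert to the first update of each kind."""
    alert = first_of(events, "alert")
    gaps = {}
    for kind in ("internal", "external"):
        first = first_of(events, kind)
        if alert is not None and first is not None:
            gaps[kind] = first.at - alert.at
    return gaps


# Made-up example data, not taken from a real incident.
events = [
    TimelineEvent(datetime(2019, 7, 1, 9, 0), "alert", "Monitoring alert fires"),
    TimelineEvent(datetime(2019, 7, 1, 9, 10), "external", "First status page post"),
    TimelineEvent(datetime(2019, 7, 1, 9, 25), "internal", "First internal update"),
    TimelineEvent(datetime(2019, 7, 1, 11, 0), "external", "Resolved update"),
]

gaps = comms_gaps(events)
for kind, gap in gaps.items():
    print(f"first {kind} update: {gap} after the alert")

# Flag the pattern where customers hear about it before internal teams do.
if "internal" in gaps and "external" in gaps and gaps["external"] < gaps["internal"]:
    print("customers heard before internal teams did")
```

Running the same calculation over each incident you review would give you a simple number to compare from one session to the next.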


Actions! Actions! Actions!

Once the team had assessed the incident, we put an action plan in place which included people, process and technology actions. We assigned each action an "owner" and a recommendation for the next steps.

A few weeks later we regrouped and the owners provided updates on their actions, which have since improved our communications.

But this isn't a one-off exercise. It's something that we will continue to repeat every few months to help refine and improve our incident communication process.

Have you run this play? What did you find? Have you made any changes to your incident process? Let me know!

Thanks for reading!


Nick Coates
Product Owner - Symantec Status

3 comments

shannyshan
Atlassian Team
July 22, 2019

@Nick Coates this is an awesome write-up, thank you for sharing and for being a champion of our Team Playbook! I remember you tweeting out this (very impressive) timeline, but it's great to read more about your approach/outcomes. It's also wonderful how you were able to make this work with a fully remote team. Way to get creative with Confluence timelines! 

Have any customers noticed your team doing a better job with incident comms? It would be interesting to see if trust/sentiment seems to change over time as you continue to make improvements.

Keep it up!!!

Nick Coates
Rising Star
July 25, 2019

@shannyshan I think it's still early days for us to see any real benefit for our customers. We will only start seeing significant improvements after several runs of the playbook. I'll definitely write a follow-up after the next 3 sessions, with an analysis of the before and after.

Tim Keyes
Atlassian Team
January 20, 2020

Hi Nick,

Thank you for the post. We use the processes as described in the play internally on the Jira Align team as guided by our ticketing process, fields on tickets that facilitate the play, and our incident managers facilitating discussion. This is the first time I have actually gone through the play.

When an incident first occurs it is pretty easy to get distracted and lose focus. I will definitely bring up the play with our team to ensure we communicate internally/externally effectively and efficiently.
