How to set up an incident workflow from the VP of Engineering at Sentry

Hey Atlassian community,

I help lead engineering at Sentry, an open-source error-tracking and monitoring tool that integrates with Jira. We started using Jira Software Cloud internally last year, and learned some best practices from setting up our own incident workflow using Jira and Sentry along the way.

As we grew Sentry’s engineering team from five to thirty, we realized that our own incident workflows weren’t scaling efficiently. As a result, we started building out workflows that combined the potential of Jira and Sentry. The time we spent learning Jira’s workflow rules and automation capabilities was well spent: there’s power under Jira’s hood for automating repetitive tasks, like incident response.

The process and learnings from our revised workflow served as the catalyst for launching a new Jira integration for Sentry. Of course, we’ve also adopted this integration into our own automated error and user-reported workflows.

Automated error workflow

It’s probably not a surprise that we’ve always used Sentry to debug Sentry. But as we started using Jira to organize project planning, we wanted our sprint planning process to give us more visibility into production errors and who’s working on them. To do that, we created a new workflow built on top of Sentry’s new Jira integration.

The workflow looks something like this:

pasted image 0.png

When a production error is recorded by Sentry, we use Sentry’s rules to alert the team that “owns” the affected codebase (e.g., if the error occurred in /app/api.py, the team responsible for API development is alerted). That team then uses Sentry’s Jira integration to create a linked ticket inside Jira, which gets placed on their Kanban board.
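As a rough illustration of the ownership idea, here is a minimal Python sketch of path-based routing. It is not Sentry’s rule syntax or implementation; the rule list, the team names, and the owning_team function are all hypothetical, and it assumes that the first matching rule wins.

```python
from fnmatch import fnmatchcase

# Hypothetical ownership rules: (glob pattern, owning team).
# Rules are checked in order and the first match wins.
OWNERSHIP_RULES = [
    ("/app/api.py", "api-team"),
    ("/app/billing/*", "billing-team"),
    ("*", "platform-team"),  # catch-all fallback
]


def owning_team(path: str) -> str:
    """Return the team that should be alerted for an error in `path`."""
    for pattern, team in OWNERSHIP_RULES:
        if fnmatchcase(path, pattern):
            return team
    raise LookupError(f"no ownership rule matches {path!r}")


if __name__ == "__main__":
    print(owning_team("/app/api.py"))
    print(owning_team("/app/billing/invoice.py"))
```

Keeping a catch-all rule at the end means every error has an owner, so nothing lands on a board that nobody watches.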

issue.png

We don’t just use Sentry to alert us to new errors; we also use Sentry’s dashboard to determine the impact, scope, and cause. This can be done in a number of ways: looking at histograms of the error frequency, examining the stack trace, or connecting the error to a recent deploy and recognizing that the impact is indeed severe. Knowing this information not only helps us get to resolution faster, it also helps us prioritize the work on our sprint boards.

pasted image 0 (1).png

During the investigation phase, any comments recorded in Sentry are synced to Jira. We use this information to prioritize bugs and to decide how much time an engineer should spend fixing them.

activity.png

When a fix is committed, an engineer marks the ticket as resolved in Sentry, which is also automatically reflected in Jira. If Sentry later discovers a regression (e.g., the fix didn’t actually solve the problem), the Sentry issue and Jira issue are both automatically marked as unresolved.
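The resolve-and-regress cycle can be pictured as a small state machine. The Python sketch below is only a model of the behavior described above, not the integration’s actual code; the LinkedIssue class, its field names, and the status strings are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class LinkedIssue:
    """Hypothetical model of a Sentry issue linked to a Jira ticket."""

    sentry_status: str = "unresolved"
    jira_status: str = "unresolved"

    def _set_both(self, status: str) -> None:
        # Both sides always move together in this model.
        self.sentry_status = status
        self.jira_status = status

    def resolve(self) -> None:
        """An engineer marks the issue resolved; Jira follows."""
        self._set_both("resolved")

    def record_event(self) -> bool:
        """A new occurrence of the error arrives.

        Returns True if this is a regression, meaning the issue was
        resolved and has now been reopened on both sides.
        """
        if self.sentry_status == "resolved":
            self._set_both("unresolved")
            return True
        return False


if __name__ == "__main__":
    issue = LinkedIssue()
    issue.resolve()
    print(issue.record_event(), issue.jira_status)
```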

One significant benefit of Sentry’s Jira integration is that Sentry “verifies” that the Jira ticket is actually resolved. Traditionally, when your code fails to fix a bug, your support team notifies you, and you’re forced to play a back-and-forth game of debugging with users. Sentry eliminates that struggle by giving you the information that you need to debug the error and by ensuring that a later regression doesn’t go unnoticed.

User-reported error workflow

Sentry’s engineers also use a separate board and workflow for capturing errors and incidents that are surfaced directly by users. This workflow shortens time to resolution because our support team does the legwork of communicating with users to determine cause and scope, while the engineering team focuses on the fix (or fixes).

unnamed (2) copy.png

Most emails coming through Zendesk -- our customer support tool -- are focused on product help (e.g., “Where do I find feature X?”). We also receive emails where the product isn’t working as expected. Our support engineers triage these issues into our “Customer Issues” Jira board.

We use Jira’s Zendesk plugin to synchronize helpdesk comments into Jira so that engineers get full visibility into the customer-support conversation. Similarly, issue resolution in Jira is noted in Zendesk with an internal comment, keeping status communication consistent.

The “Customer Issues” board has a “verified” status indicating when an issue has been resolved by an engineer, verified by the support team, and communicated to the customer. Custom triggers automatically re-assign the Jira ticket to the support team member who opened the issue, removing any confusion about next steps.
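To make the re-assignment trigger concrete, here is a minimal Python sketch. It is not Jira’s automation syntax; the Ticket class, the on_status_change function, and the status names are hypothetical, and it assumes the trigger fires when an engineer moves the ticket to a resolved state so that support can verify the fix.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    """Hypothetical stand-in for a ticket on the Customer Issues board."""

    reporter: str  # support team member who opened the issue
    assignee: str  # whoever currently owns the next step
    status: str = "Open"


def on_status_change(ticket: Ticket, new_status: str) -> Ticket:
    """Apply a status change and run the re-assignment trigger.

    Assumption: once engineering resolves the ticket, the next step
    (verifying and telling the customer) belongs to the reporter.
    """
    ticket.status = new_status
    if new_status == "Resolved":
        ticket.assignee = ticket.reporter
    return ticket


if __name__ == "__main__":
    ticket = Ticket(reporter="support-sam", assignee="engineer-erin")
    on_status_change(ticket, "Resolved")
    print(ticket.assignee)
```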

subs.png

Essentially, we use Jira to present an engineering view of active customer issues. However, there’s no requirement to browse to the board itself; we use a custom issue subscription to let managers know about support-surfaced issues as they’re introduced. When managers want to take action, they can link or clone these issues into their own sprint boards to make sure the work is prioritized and assigned appropriately.

Check out Sentry on the Atlassian Marketplace to give these (or other) incident workflows a try yourself.

 
