SLA Breached. Now What? 5 steps to stop the damage and prevent the next one

The ticket sat in queue for 6 hours past the SLA target. Nobody noticed – until the client emailed your VP directly. Now there's an incident thread, a meeting request, and a very uncomfortable conversation about "what went wrong." The breach itself took minutes to happen. The aftermath will take days to manage.

Most teams know SLA breaches are inevitable at some point. What fewer teams have is a clear, practiced response – something that goes beyond "escalate to the manager" and actually covers the steps from detection to resolution to making sure it doesn't repeat. Here's how to run that process.

🚨 Step 1. Detect it before your client does

The worst version of a breach is when the customer discovers it before you do. It signals that nobody was watching – and it hands control of the narrative to someone who's already frustrated.

This is where pre-breach notifications matter. In native Jira Service Management, you can set up basic SLA tracking, but proactive alerting – especially "the SLA will breach in 30 minutes" type of notifications – requires either custom automation rules or an additional layer of tooling. SLA Time and Report for Jira handles this with configurable breach notifications: you can set alerts at any percentage of the SLA cycle (e.g., at 75% elapsed time, at 90%, or immediately on breach), and send them to specific Slack channels, assignees, or team leads – with custom message templates that include ticket context.
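
If you're on native Jira Service Management without an extra app, a rough version of this alerting can be scripted against the REST API and run on a schedule. A minimal sketch, assuming Jira Cloud, a Slack incoming webhook, and an SLA field named "Time to first response" – the project key, threshold, credentials, and URLs are placeholders:

# Cron-style sketch: find tickets close to breaching and ping Slack.
# All names below are assumptions – adjust to your instance and SLA config.
import requests

JIRA_BASE = "https://your-domain.atlassian.net"
AUTH = ("you@example.com", "API_TOKEN")                             # hypothetical credentials
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"   # placeholder webhook

# Jira Service Management's SLA JQL functions: less than 30 minutes remaining.
JQL = 'project = SUP AND "Time to first response" < remaining("30m") AND resolution = EMPTY'

def tickets_at_risk():
    resp = requests.get(
        f"{JIRA_BASE}/rest/api/2/search",
        params={"jql": JQL, "fields": "summary,assignee"},
        auth=AUTH,
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["issues"]

def alert(issue):
    assignee = (issue["fields"]["assignee"] or {}).get("displayName", "Unassigned")
    text = (f":rotating_light: {issue['key']} is under 30 min from breaching first response "
            f"(assignee: {assignee}) {JIRA_BASE}/browse/{issue['key']}")
    requests.post(SLACK_WEBHOOK, json={"text": text}, timeout=10)

if __name__ == "__main__":
    for issue in tickets_at_risk():
        alert(issue)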


The goal is simple: the first person to know about an impending breach should be on your team, not the client.

👥 Step 2. Get the right people on it immediately

Once a breach is detected (or has just happened), the first operational step is identifying ownership. Not "who's responsible in a blame sense" – but who can actually act right now.

Pull up the ticket. Check: Is it assigned? Is the assignee available? Is there a blocker – waiting on a third party, waiting on the customer, stuck in a status it shouldn't be in? These questions take 90 seconds to answer if you have visibility into the current SLA state. They take much longer if you're digging through Jira filters manually.

A common pattern: the ticket technically has an assignee, but it's been sitting in "Waiting for review" for 4 hours and nobody looped in the reviewer. The breach wasn't caused by workload – it was caused by a handoff gap. Knowing this immediately changes who you alert.

If you use automated actions in SLA Time and Report, you can pre-configure what happens when a breach occurs: reassign the ticket to a backup agent, bump the priority to Critical, or post a Slack message to the on-call channel. Some of these responses can fire automatically without any manual intervention – which matters a lot for breaches that happen overnight or outside business hours.
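
If you're wiring this up yourself with a script or with Jira's built-in automation rules rather than the app's automated actions, the same three responses look roughly like this. The account ID, priority name, and webhook URL are placeholders, and the priority name has to match your own priority scheme:

# Sketch of a scripted breach response: reassign, escalate, notify on-call.
import requests

JIRA_BASE = "https://your-domain.atlassian.net"
AUTH = ("you@example.com", "API_TOKEN")
SLACK_WEBHOOK = "https://hooks.slack.com/services/T000/B000/XXXX"
BACKUP_AGENT_ID = "5b10a2844c20165700ede21g"   # hypothetical Jira Cloud accountId

def handle_breach(issue_key):
    # 1. Reassign to the backup agent.
    requests.put(f"{JIRA_BASE}/rest/api/2/issue/{issue_key}/assignee",
                 json={"accountId": BACKUP_AGENT_ID}, auth=AUTH, timeout=30)
    # 2. Bump the priority (the name must exist in your priority scheme).
    requests.put(f"{JIRA_BASE}/rest/api/2/issue/{issue_key}",
                 json={"fields": {"priority": {"name": "Critical"}}},
                 auth=AUTH, timeout=30)
    # 3. Tell the on-call channel what just happened.
    requests.post(SLACK_WEBHOOK,
                  json={"text": f"SLA breached on {issue_key}: reassigned and escalated. "
                                f"{JIRA_BASE}/browse/{issue_key}"},
                  timeout=10)

handle_breach("SUP-123")   # placeholder ticket key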


🔍 Step 3. Diagnose before you communicate

This is the step most teams skip in the scramble – and it's the one that protects you in the client conversation. Before you reach out to the customer, take 5–10 minutes to understand what actually happened.

Was the SLA target realistic for this ticket type? Did the timer start at the right moment, or was there a misconfiguration that caused it to run during non-business hours? Was the ticket stuck in a paused state that should have been active – or the reverse? Was there a surge in ticket volume that day that overwhelmed the team?

The reason this matters: if you have a call with your client and say "we're sorry, we missed the deadline" – that's an apology. If you call and say "we missed the 4-hour response target because the ticket was mis-routed during a spike in P1 volume – here's what we're putting in place" – that's a professional service recovery. The second version only happens if you did the diagnosis first.

In SLA Time and Report, the SLA Grid Report and Chart Reports show exactly when the SLA clock started, when it paused, and where time was spent across the ticket lifecycle. You're not guessing – you can see the actual timeline and identify precisely where the process broke down.
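
If you don't have that reporting layer, the issue changelog still gives you the raw material for the same timeline: every status transition with a timestamp. A minimal sketch, assuming Jira Cloud and a placeholder ticket key:

# Print the status timeline for one ticket straight from its changelog.
import requests

JIRA_BASE = "https://your-domain.atlassian.net"
AUTH = ("you@example.com", "API_TOKEN")

def status_timeline(issue_key):
    resp = requests.get(f"{JIRA_BASE}/rest/api/2/issue/{issue_key}",
                        params={"expand": "changelog", "fields": "created,summary"},
                        auth=AUTH, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    print(f"{issue_key} created {data['fields']['created']}")
    # The embedded changelog is capped, which is enough for one ticket's recent history.
    for history in data["changelog"]["histories"]:
        for item in history["items"]:
            if item["field"] == "status":
                print(f"{history['created']}  {item['fromString']} -> {item['toString']}")

status_timeline("SUP-123")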


📞 Step 4. Communicate: first the solution, then the apology

The sequencing here matters more than the wording.

Don't contact the customer while you're still figuring out what happened. Contact them when you have two things ready: a status update on what's being done right now, and a realistic timeline for resolution. The message doesn't need to be long – it needs to be honest and specific.

Something like:

"We've identified that your ticket missed our 4-hour response target. It's now assigned to [name] and marked as our highest priority. We expect to have an update for you within [X hours]. We'll follow up directly – no need to chase us."

What you want to avoid: vague reassurances ("we're looking into it"), over-explaining the internal cause before you have the fix, or silence – which the customer reads as a sign that nobody is working on it.

After the situation is resolved, a follow-up communication is worth the 5 minutes it takes. Thank them for their patience, confirm what was done, and if the breach was significant, acknowledge it directly. Some teams offer a small concession or gesture – that's a judgment call based on the client relationship.

🔁 Step 5. Run the retrospective (while it's still fresh!)

This is the step that separates teams who keep breaching the same SLAs from those who actually improve. The retrospective doesn't need to be a formal post-mortem with a 15-slide deck. It needs to happen within 48 hours of the breach, while the context is still accessible.

Three questions worth answering:

Was this breach predictable? Look at the last 30 days of data for the same ticket type or project. If you see a pattern – specific days of the week, specific request categories, specific agents – the breach wasn't a one-off. It's a systemic gap.

Was the SLA goal itself realistic? Sometimes teams set targets based on what they promised years ago, without revisiting whether current team size, volume, and complexity still support them. A breach can be a signal that the agreement itself needs renegotiation.

Did the tooling fail or the process fail? If your team didn't know the ticket was at risk until after the breach, the detection process failed. If they knew but couldn't act – that's a workload or priority problem. These have different fixes.

In practice: pull a sample of 20–30 breached tickets from the last month using the SLA Grid Report in SLA Time and Report. Group them by ticket type, assignee, and time-to-breach. In most cases, you'll find that 60–70% of breaches cluster around 2–3 root causes. That's where you focus.
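
However you export that sample – from the Grid Report or from a plain Jira filter – a few lines of pandas do the grouping. The column names below ("Issue Type", "Assignee", "Breach Minutes", "Root Cause") are placeholders to match to your export, and the "Root Cause" column is something you tag by hand during the retro:

# Group breached tickets to see where they cluster.
import pandas as pd

df = pd.read_csv("breached_tickets.csv")   # placeholder export file

# Where do breaches concentrate?
print(df.groupby(["Issue Type", "Assignee"]).size().sort_values(ascending=False).head(10))

# How severe are they when they happen?
print(df.groupby("Issue Type")["Breach Minutes"].describe())

# With a root-cause tag per ticket, the 60–70% cluster usually shows up
# as the top two or three rows here.
print(df["Root Cause"].value_counts(normalize=True).round(2))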


Once you've identified the patterns, it's worth reviewing your SLA configuration – start/stop/pause conditions, calendar settings, and whether automated actions are in place to catch at-risk tickets earlier. A breach that happens because the SLA clock ran over a weekend (when it shouldn't have) is a calendar misconfiguration, not a team performance issue.
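
A quick way to separate a real breach from a calendar artifact is to compare wall-clock elapsed time against what your business calendar should have counted. A rough sketch, assuming a Mon–Fri, 9:00–17:00 calendar (swap in the hours your SLA calendar actually uses):

# Compare wall-clock hours with business-calendar hours between two timestamps.
from datetime import datetime, timedelta

WORK_START, WORK_END = 9, 17   # assumed business hours

def business_hours(start: datetime, end: datetime) -> float:
    total = 0.0
    cursor = start
    while cursor < end:
        if cursor.weekday() < 5:   # Monday–Friday only
            day_open = cursor.replace(hour=WORK_START, minute=0, second=0, microsecond=0)
            day_close = cursor.replace(hour=WORK_END, minute=0, second=0, microsecond=0)
            window_start = max(cursor, day_open)
            window_end = min(end, day_close)
            if window_end > window_start:
                total += (window_end - window_start).total_seconds() / 3600
        # jump to the start of the next calendar day
        cursor = (cursor + timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return total

start = datetime(2026, 5, 8, 16, 0)   # Friday 16:00
end = datetime(2026, 5, 11, 10, 0)    # Monday 10:00
print("Wall-clock hours:", (end - start).total_seconds() / 3600)   # 66.0
print("Business hours:  ", business_hours(start, end))             # 2.0

If the elapsed time in your SLA report looks like the first number rather than the second, the calendar is the problem, not the team.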



Final thoughts

An SLA breach is always uncomfortable – but it's also one of the clearest signals you'll get about where your service delivery process actually breaks. The teams that improve over time aren't the ones that never breach; they're the ones that have a practiced response and a habit of looking at the data afterward.

The five steps here – detect early, get ownership fast, diagnose before communicating, communicate with context, and run a real retrospective – won't prevent every breach. But they'll prevent most of the avoidable ones, and they'll make the unavoidable ones much less damaging to client relationships.

If you're working in Jira and don't currently have visibility into which tickets are approaching breach in real time, or if your post-breach analysis consists of "check the comments and try to remember what happened," it's worth looking at what SLA Time and Report for Jira can add to your setup – specifically the notification system, automated actions on breach, and the reporting layer that makes retrospectives something you can actually do in under an hour.


What does your current breach response look like? Is there a step in this process that consistently breaks down for your team? 
