Reliability tracking - incident vs. problem

We have a non-software product and are trying to figure out if JIRA can be used to track incidents to help build a reliability profile.

An incident is an occurrence of a problem.

So if my problem is that the product burst into flames, each time one burst into flames would be a separate incident.

I suspect some of this is terminology, but everyone uses different terms for these things.

Accepted Answer

At its most basic, Jira is an "issue tracker". Most people use it more as a "bug tracker", which is one use for an issue tracker, but it can easily be used to track "incidents" as well. Exactly as you say, each time system X goes "bang", you raise a new incident. When your team puts out the fire, you move it to a status that says "panic over", and then, if you're doing proper incident management, your process will spawn out into "what caused it", "how do we prevent it", "what have we learned" and so on.

(As for terminology, I think you've used your English in exactly the way I would. I then got a bit fuzzy about the differences between "issues" and "bugs", but I hope I've been clear enough!)

Thanks Nic. I think that's how we want to use it. I'll see if I can design the workflow to do what I need.

If the number of problems you are tracking is relatively small (a few hundred), you could use components to associate an incident with the problem type.

If you have a large set of problem types to track, then I might consider using issues as problems and then either linking the incident issue to the problem issue or turning the incident issue into a sub-issue. Linking is a bit more generic, since you can have incident issues linked to multiple problem issues.

If your problem tree is more than two levels deep, you will need to use a hierarchy plugin to help with the associations.

For reporting you will probably need some assistance from some linking-related plugins, but it depends on the approach you will be taking.

I am assuming you want to know how many incidents of a particular problem type occurred in a given period (today, this week, month, quarter, year).

You can put your various solutions into the problem type comments.
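As a rough sketch of that kind of report, assuming you have exported your incidents from Jira into a simple list (the keys, problem types, dates, and function names below are all made up for illustration, not anything Jira provides), counting per problem type and period could look like this:

```python
from collections import Counter
from datetime import date

# Hypothetical export: (incident key, problem type, date the incident occurred).
incidents = [
    ("INC-1", "burst into flames", date(2018, 1, 3)),
    ("INC-2", "burst into flames", date(2018, 1, 20)),
    ("INC-3", "operator error", date(2018, 2, 11)),
    ("INC-4", "burst into flames", date(2018, 4, 2)),
]


def period_key(day, period):
    """Map a date to a label for the reporting period it falls in."""
    if period == "year":
        return str(day.year)
    if period == "quarter":
        return f"{day.year}-Q{(day.month - 1) // 3 + 1}"
    if period == "month":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown period: {period}")


def count_incidents(incidents, period):
    """Count incidents per (problem type, period label) pair."""
    return Counter(
        (problem, period_key(day, period)) for _key, problem, day in incidents
    )


quarterly = count_incidents(incidents, "quarter")
for (problem, label), n in sorted(quarterly.items()):
    print(f"{label}  {problem}: {n}")
```

Whether the problem type comes from a component, a link, or a parent issue only changes how you build the list; the counting stays the same.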

As Norman says, it's well worth thinking about the long term.

A good approach is to have "incidents" which essentially go through statuses like:

  • New bad thing happened
  • Support is looking at it
  • It works now

... However, there's a whole second phase which is often neglected:

  • We don't know why
  • We do know why, and we need to work on prevention and/or cure
  • Are there more things we can't directly fix *
  • We've fixed everything we can

The second phase is a root-cause analysis function, and the line with the * on the end is very important - that's where you spawn out the more generic "problem" items.
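To make the two phases concrete, here is a minimal sketch of the incident workflow as a small state machine. The short status names are my own paraphrases of the bullets above, and the allowed transitions are an assumption; this is not a Jira workflow definition, just a way to think the design through before building it.

```python
# Statuses paraphrase the two lists above; the transitions are assumed.
TRANSITIONS = {
    # Phase 1: get the thing working again.
    "new": ["investigating"],
    "investigating": ["restored"],
    "restored": ["cause_unknown"],
    # Phase 2: root-cause analysis.
    "cause_unknown": ["cause_known"],
    "cause_known": ["residual_review"],
    "residual_review": ["closed"],
    "closed": [],
}

# The starred step: where the more generic "problem" items get spawned.
SPAWNS_PROBLEMS = "residual_review"


def move(current, target):
    """Return the new status, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current!r} to {target!r}")
    return target


def spawns_problems(status):
    """True when entering this status should raise linked problem items."""
    return status == SPAWNS_PROBLEMS


status = "new"
for step in ["investigating", "restored", "cause_unknown",
             "cause_known", "residual_review", "closed"]:
    status = move(status, step)
    if spawns_problems(status):
        print("raise 'problem' items for anything we can't directly fix")
```

The point of writing it down this way is that "restored" is not the end state; an incident only closes after the root-cause steps have been walked.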

Personally, I'm not sure "hierarchy" reporting will help you much here - the point of creating "problems" is that you have, well, a problem - you might have found it because of X, which is now fixed, but it's not actually important how you found the problem.

Without a better understanding of the problem space I really have no idea whether a hierarchical structure would be useful or not. I'm just allowing for the concept in case the problem type space has structure, such as (client down -> hardware, network, software, (Security -> Denial of Service, ...))

The good thing is you can experiment to see what works for you.

Great feedback, everyone.

Hierarchy reporting seems to be how it's done in JIRA and some other tools. Perhaps the piece I'm missing or not getting is how to figure out the reliability numbers.

Ideally I'd tie JIRA to my bill of materials + some non-bill components like operator error. I'd then track incidents. These, when understood, would spawn problems to be fixed in a many-to-many relationship. That is to say one incident can be caused by multiple problems, and one problem can cause multiple incidents.

To prioritize which problems I work on, I'd like to be able to report which problems have the most associated high-priority incidents.
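A minimal sketch of that report, assuming the incident priorities and the incident-to-problem links have been exported from JIRA into plain Python structures (the keys, the "High" priority name, and the function name are hypothetical):

```python
from collections import Counter

# Hypothetical export: incident key -> priority.
incident_priority = {
    "INC-1": "High",
    "INC-2": "Low",
    "INC-3": "High",
    "INC-4": "High",
}

# Many-to-many links: (incident key, problem key).
links = [
    ("INC-1", "PRB-1"),
    ("INC-1", "PRB-2"),  # one incident caused by two problems
    ("INC-2", "PRB-1"),
    ("INC-3", "PRB-2"),
    ("INC-4", "PRB-2"),
]


def rank_problems(links, incident_priority, priority="High"):
    """Rank problems by the number of linked incidents at the given priority."""
    counts = Counter(
        problem
        for incident, problem in set(links)  # ignore duplicate links
        if incident_priority.get(incident) == priority
    )
    return counts.most_common()


ranking = rank_problems(links, incident_priority)
for problem, n in ranking:
    print(f"{problem}: {n} high-priority incident(s)")
```

An incident linked to two problems counts once toward each of them, which seems like the right behaviour for prioritizing, but it does mean the totals add up to more than the number of incidents.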

To understand component reliability, I'd also like to be able to see which components have the most incidents. I'd compare this to the "exposure" of that component in terms of time in service, and be able to calculate effective MTBF.
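The arithmetic I have in mind is just total time in service divided by the number of incidents. A sketch, where the component names and figures are invented for illustration and the exposure hours would have to come from outside JIRA:

```python
# Hypothetical figures: total fleet hours in service per component,
# and the number of incidents attributed to each component.
exposure_hours = {"pump": 10_000.0, "controller": 8_000.0, "hose": 12_000.0}
incident_counts = {"pump": 4, "controller": 1, "hose": 0}


def effective_mtbf(hours_in_service, incidents):
    """Mean time between failures: time in service per incident.

    Returns None when there are no incidents yet, since the data then
    only shows the component ran that long without failing.
    """
    if incidents == 0:
        return None
    return hours_in_service / incidents


mtbf = {
    component: effective_mtbf(hours, incident_counts.get(component, 0))
    for component, hours in exposure_hours.items()
}

for component, value in mtbf.items():
    shown = "no incidents yet" if value is None else f"{value:.0f} h"
    print(f"{component}: {shown}")
```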

When I fix a problem that supposedly caused incidents, I'd like to ensure there are no more incidents relating to that problem. (Note that in the world of hardware products, fixes have to be installed.)

We're getting closer.
