How do you manage accurate uptime?

I am curious about how people are managing accuracy with incidents.  

When incidents are created manually, you may not have all the details of exactly when the outage began at the time you create the incident. Only after service is restored and teams review the logs can they determine the actual downtime.

For example:

  • 12:25am - person on call gets alert and starts doing initial triage
  • 12:30am - person creates Statuspage incident specifying major outage.
  • 12:35am - person begins restoration process
  • 1:15am - person completes sanity tests to ensure service has been restored and updates the Statuspage incident specifying service has been restored.

According to the incident creation and resolution times, the downtime is calculated as 45 minutes (12:30am to 1:15am).

That said, after looking at the logs, we find that the service outage really started at 12:15am and that the servers were fully functional at 12:45am (which was when sanity testing started).

According to the logs, the downtime was actually 30 minutes.
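
To make the arithmetic concrete, here is a minimal Python sketch that compares the two durations from the example above. The calendar date and the helper names (`at`, `minutes_between`) are assumptions made purely for illustration; nothing here calls Statuspage.

```python
from datetime import datetime

# Hypothetical calendar date: the example only gives times of day.
DAY = "2021-01-01"


def at(hhmm: str) -> datetime:
    """Build a datetime on the assumed day from a 24-hour HH:MM string."""
    return datetime.strptime(f"{DAY} {hhmm}", "%Y-%m-%d %H:%M")


def minutes_between(start: datetime, end: datetime) -> int:
    """Whole minutes elapsed from start to end."""
    return int((end - start).total_seconds() // 60)


# Times taken from the incident record (creation and resolution).
incident_created = at("00:30")
incident_resolved = at("01:15")

# Times taken from the logs after the fact.
outage_started = at("00:15")
service_restored = at("00:45")

reported_minutes = minutes_between(incident_created, incident_resolved)
actual_minutes = minutes_between(outage_started, service_restored)
overstated_minutes = reported_minutes - actual_minutes

print(f"Reported downtime: {reported_minutes} min")
print(f"Actual downtime:   {actual_minutes} min")
print(f"Overstated by:     {overstated_minutes} min")
```

In this example the incident record overstates the downtime by 15 minutes, and both the start and the end of the recorded window are off, so correcting only one of the two times would not be enough.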

Do you go back and correct the record, either by editing the incident times or by updating the downtime directly in the component uptime history? Are there other options?


Thanks in advance!

1 comment

Nick Coates Community Leader Jun 07, 2021

Hi @Greg Lee 

Welcome to the Atlassian Community.

At Broadcom, if the automated calculation doesn't accurately portray the incident duration, we go in and change the component's uptime after the incident has closed and the RCA timeline has been generated.

We do this quite frequently across all 14 of our pages and it seems to work. We wouldn't bother updating the incident as we would generate the RCA report that goes to customers.

Hope this helps.

Thanks,
Nick
