Best practices for monitoring tool test design

Hello, Community! We have a SaaS product where multiple tenants are hosted within a single instance. 

What is the best practice for implementing automated checks from a monitoring tool when behavior can differ from one tenant to the next? If you run a pass/fail check against a given tenant and the test passes, another tenant could still be having issues with the same component. The passing check in the test tenant does show that the other tenant's issue is not a system-wide incident. Is this method of checking sufficient?

The alternative would be to gather statistics and run a check against the average of the data points collected from all tenants in the system. In this case, one tenant with a tenant-specific issue could skew the average and cause the check to fail erroneously.

Thanks for any guidance you may have :)
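
To make the trade-off in the question concrete, here is a minimal Python sketch of the two approaches. The tenant names, latency values, and threshold are all hypothetical and only serve to show how one misbehaving tenant can push the group average over a threshold while the single test tenant still passes.

```python
from statistics import mean

# Hypothetical per-tenant latency samples in milliseconds (illustrative only).
latency_ms = {
    "test-tenant": 120,
    "tenant-a": 130,
    "tenant-b": 110,
    "tenant-c": 2400,  # one tenant with a tenant-specific problem
}
THRESHOLD_MS = 500  # assumed pass/fail threshold


def single_tenant_check(samples, tenant, threshold):
    """Pass/fail check against one designated tenant."""
    return samples[tenant] <= threshold


def average_check(samples, threshold):
    """Pass/fail check against the mean across all tenants."""
    return mean(samples.values()) <= threshold


single_ok = single_tenant_check(latency_ms, "test-tenant", THRESHOLD_MS)
average_ok = average_check(latency_ms, THRESHOLD_MS)
print(single_ok, average_ok)
```

With this made-up data the single-tenant check passes while the average check fails, even though three of the four tenants are healthy.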

1 answer

1 vote
Shivam Naik
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
Nov 10, 2022

Hi @Corey Garretson ,

Happy to help!

Statuspage itself wouldn't be able to perform those checks, but it could certainly work in tandem with another service, like Pingdom, that would. Another option would be to use Opsgenie, again with a service that tests for pass/fail on different components, and then alert based on that. That being said, I would recommend that you look into services like Pingdom or Datadog to establish that pass/fail test between components, and then use Statuspage or Opsgenie to notify the associated parties so that action can be taken on the error.

Please let me know if you have any follow-up questions!

Hi @Shivam Naik , thanks for your reply! We already use another service that runs pass/fail tests on our components, and we pipe the outputs to Statuspage to update status automatically. My question is more around best practices for configuring those tests. Is running the checks against a single test tenant sufficient, or is it better to gather stats from across all tenants and have your Pingdom/Datadog monitoring tool read out a failure for any check where the group average exceeds a threshold? 

Shivam Naik
Atlassian Team
Nov 11, 2022

Hi @Corey Garretson ,

Thank you for the clarification!

I think with Statuspage it would be best to use a single test tenant so that you can alert immediately on a failure. The grouping option could work, but I believe it would rely more on Pingdom/Datadog assessing whether results cross that threshold in order to alert properly. Both options could work, but for what Statuspage does on its own, the single-tenant test would be the better option: it lets you notify on components and at least distribute a message stating that an irregularity was found and that testing is under way to assess whether other tenants are affected.

Please let me know if you have any follow-up questions!
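
The flow described in this reply (alert as soon as the test tenant fails, then assess whether other tenants are affected) could be sketched roughly as follows in Python. This is only an illustration: the function names, the message text, and the sample results are hypothetical, and no Statuspage, Pingdom, or Datadog API is called.

```python
def initial_message(test_tenant_passed):
    """Return an immediate notice when the test tenant's check fails, else None."""
    if test_tenant_passed:
        return None
    return "Irregularity detected; checking whether other tenants are affected."


def affected_tenants(results):
    """List the tenants whose check failed (results maps tenant -> passed?)."""
    return sorted(tenant for tenant, passed in results.items() if not passed)


# Hypothetical check results gathered by the monitoring tool.
results = {"test-tenant": False, "tenant-a": True, "tenant-b": False}

message = initial_message(results["test-tenant"])
affected = affected_tenants(results)
print(message)
print(affected)
```

The first function corresponds to the immediate notification on the single test tenant; the second is the follow-up assessment across tenants, which would feed whatever update is posted next.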

DEPLOYMENT TYPE
CLOUD