I've been doing this for years and have run into all sorts of ways of monitoring at many levels. Most of it was good and useful, some less so. I do not know a lot about it beyond "you should monitor" and "there are good tools out there to do it".
So I'm interested to see how everyone here does it (or not). Some starter questions:
Timely question!
I would first like to know which parameters we should monitor. IMHO, there are many metrics that can be monitored using system-level utilities (disk space, memory/swap usage, CPU, etc.), but they would mostly be symptomatic of a deeper problem with the stack.
Anyone tried sending custom metrics to Datadog?
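For anyone wondering what a custom metric looks like on the wire: one common route is the Datadog agent's DogStatsD listener, which accepts plain-text datagrams over UDP. Below is a minimal sketch, not a tested production client. The metric name, tag, host and port are placeholders, and it assumes an agent is listening on the default DogStatsD port 8125 on the same machine.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

public class DogStatsdGauge {

    /** Formats one gauge in the DogStatsD datagram format: name:value|g|#tag1,tag2 */
    public static String format(String name, double value, String... tags) {
        StringBuilder line = new StringBuilder(name).append(':').append(value).append("|g");
        if (tags.length > 0) {
            line.append("|#").append(String.join(",", tags));
        }
        return line.toString();
    }

    /** Sends the datagram over UDP. Fire-and-forget: a missing agent is not detected. */
    public static void send(String host, int port, String datagram) throws Exception {
        byte[] payload = datagram.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(payload, payload.length,
                    InetAddress.getByName(host), port));
        }
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical metric name and tag, for illustration only.
        String datagram = format("bitbucket.example.gauge", 42.0, "env:test");
        send("127.0.0.1", 8125, datagram); // assumed local agent on the default port
        System.out.println(datagram);
    }
}
```

Because UDP is used, the sender never blocks on the agent, which keeps the monitoring path from slowing down the application. An official client library would add batching and error handling that this sketch leaves out.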
My team monitors everything we can. We collect 100+ metrics in real time and render them on dashboards. When Bitbucket is having an issue (it's slow, users are reporting errors, etc.), the metrics enable us to quickly focus on what the problem is and on possible remediations.
Some metrics that have proven to be critical:
We also monitor the usual CPU, system memory, JVM memory, JVM GC, disk I/O, etc., but historically these haven't figured into health or scaling issues.
The list above is also useful when load testing: you can determine bottlenecks and tune the system to squeeze out more performance.
For data collection we use a custom Java agent that sends metrics directly to our internal, home-grown metrics datastore.
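To give a rough idea of the collection side, here is a minimal sketch of sampling JVM memory and GC numbers from inside the JVM using the standard `java.lang.management` beans. It is not the agent described above: the class name and metric names are made up for illustration, and shipping the values to a datastore is left out.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

public class JvmMetricsSampler {

    /** Takes one snapshot of a few JVM-level metrics. Metric names are illustrative. */
    public static Map<String, Double> sample() {
        Map<String, Double> metrics = new LinkedHashMap<>();

        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        metrics.put("jvm.heap.used_bytes", (double) heap.getUsed());
        metrics.put("jvm.heap.committed_bytes", (double) heap.getCommitted());

        long gcCount = 0;
        long gcTimeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            gcCount += Math.max(0, gc.getCollectionCount());
            gcTimeMs += Math.max(0, gc.getCollectionTime());
        }
        metrics.put("jvm.gc.count", (double) gcCount);
        metrics.put("jvm.gc.time_ms", (double) gcTimeMs);

        // Negative when the platform cannot provide a load average.
        metrics.put("system.load_avg_1m",
                ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        return metrics;
    }

    public static void main(String[] args) {
        // A real agent would run this on a schedule and forward the values.
        sample().forEach((name, value) -> System.out.println(name + "=" + value));
    }
}
```

The GC counters are cumulative since JVM start, so a dashboard would normally plot the difference between consecutive samples rather than the raw value.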
Whatever tool you use, setting it up takes some effort, but it is worth it. It's a true force multiplier for ops support and scaling/tuning.