How are you monitoring your Atlassian stack?

I've been doing this for years and have run into all sorts of approaches to monitoring, at many levels. Most of it was good and useful, some less so. I don't know a lot about it beyond "you should monitor" and "there are good tools out there to do it".

So I'm interested to see how everyone here does it (or not).  Some starter questions:

  • What tools are you using for it?
  • Has the recent addition of JMX stuff in Jira been useful?
  • Barring the obvious "it responds to standard requests", what do you look for in general? (A minimal sketch of that kind of probe follows this list.)
  • When you can quantify, what metrics do you use?
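
On the "responds to standard requests" baseline: a minimal probe might look like the sketch below. It assumes a hypothetical base URL and uses the /status health endpoint that Jira and Confluence expose (adjust for your instance); it only reports availability and response latency, nothing deeper.

```python
# Minimal availability/latency probe. BASE_URL is a hypothetical instance;
# /status is the standard Jira/Confluence health endpoint (adjust as needed).
import time
import requests

BASE_URL = "https://jira.example.com"  # hypothetical instance URL

def probe(url: str, timeout: float = 10.0) -> None:
    start = time.monotonic()
    try:
        resp = requests.get(f"{url}/status", timeout=timeout)
        latency_ms = (time.monotonic() - start) * 1000
        print(f"HTTP {resp.status_code} in {latency_ms:.0f} ms: {resp.text.strip()}")
    except requests.RequestException as exc:
        print(f"Probe failed: {exc}")

if __name__ == "__main__":
    probe(BASE_URL)
```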

3 comments

This is definitely a question of high importance.

I had a discussion with a friendly company some months ago, when I casually recommended New Relic to them for monitoring their Confluence instance.

They told me they had already tried it, but New Relic slowed their Confluence down to a degree that end users actually noticed (which means: significantly). I can't confirm or refute it; this is just something I heard.

They also told me they have a periodic "health check" procedure that they run on their Atlassian application instances every week. This is a collection of home-grown scripts, SQL queries and so on. Not strictly monitoring, but it improves their applications' health and uptime.
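
For illustration, a weekly home-grown check of that sort might look roughly like the sketch below; the Confluence home path, the PostgreSQL connection details, the threshold, and the query are all assumptions for the example, not what that company actually runs.

```python
# Rough sketch of a home-grown weekly "health check": free disk space under the
# application home plus a sanity SQL query against the app database.
# Paths, credentials, and thresholds are illustrative assumptions.
import shutil
import psycopg2  # assumes a PostgreSQL-backed Confluence

CONFLUENCE_HOME = "/var/atlassian/application-data/confluence"
MIN_FREE_GB = 10

def check_disk() -> None:
    usage = shutil.disk_usage(CONFLUENCE_HOME)
    free_gb = usage.free / 1024**3
    status = "OK" if free_gb >= MIN_FREE_GB else "WARN"
    print(f"[{status}] free disk on {CONFLUENCE_HOME}: {free_gb:.1f} GB")

def check_db() -> None:
    conn = psycopg2.connect(host="db.example.com", dbname="confluence",
                            user="healthcheck", password="secret")
    with conn, conn.cursor() as cur:
        # Example sanity query: how many DB sessions the application holds open.
        cur.execute("SELECT count(*) FROM pg_stat_activity WHERE datname = %s",
                    ("confluence",))
        print(f"[INFO] active DB sessions: {cur.fetchone()[0]}")
    conn.close()

if __name__ == "__main__":
    check_disk()
    check_db()
```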

Timely question!

I would first like to know which parameters we should monitor. IMHO, there are many metrics that can be monitored using system-level utilities (like disk space, memory/swap usage, CPU, etc.), but they would mostly be symptomatic of a deeper problem with the stack.

Anyone tried sending custom metrics to Datadog?
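
For illustration, here is a rough sketch of both ideas together: pulling a few of those system-level numbers with psutil and shipping them to Datadog as custom metrics via DogStatsD from the official datadog Python package. The metric names and tags are made up for the example, and it assumes a local Datadog Agent listening on the default StatsD port.

```python
# Sketch: collect basic system metrics with psutil and push them to a local
# Datadog Agent (DogStatsD, default port 8125) as custom gauges.
# Metric names and tags are illustrative, not an established convention.
import psutil
from datadog import initialize, statsd

initialize(statsd_host="127.0.0.1", statsd_port=8125)

def report_system_metrics(tags=None):
    tags = tags or ["app:confluence", "env:prod"]  # hypothetical tags
    statsd.gauge("atlassian.host.cpu_percent", psutil.cpu_percent(interval=1), tags=tags)
    statsd.gauge("atlassian.host.mem_percent", psutil.virtual_memory().percent, tags=tags)
    statsd.gauge("atlassian.host.swap_percent", psutil.swap_memory().percent, tags=tags)
    statsd.gauge("atlassian.host.disk_percent", psutil.disk_usage("/").percent, tags=tags)

if __name__ == "__main__":
    report_system_metrics()
```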

My team monitors everything we can. We collect 100+ metrics in real time and render them on dashboards. When Bitbucket is having an issue (it's slow, users are reporting errors, etc.), the metrics let us quickly narrow down what the problem is and what the possible remediations are.

Some metrics that have proven to be critical:

  • SCM hosting tickets
  • SSH sessions active
  • SSH sessions errors
  • DB connection pool in-use / max
  • Requests/sec
  • Response latency
  • Network incoming/outgoing connections
  • Network TCP errors
  • Network throughput

We also monitor the usual CPU, system memory, JVM memory, JVM GC, disk I/O, etc., but historically those haven't figured into health or scaling issues.
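
For the OS-level subset of the list above (network connections, TCP errors, throughput), the raw counters can be sampled with something as simple as psutil. The sketch below is only an illustration under that assumption (NIC-level error counters stand in for TCP errors); it is not the actual collection pipeline described here.

```python
# Sketch: sample the network-related counters from the list above with psutil.
# Throughput and error rates are derived from deltas between two samples;
# errin/errout are NIC-level error counters used as a rough proxy for TCP errors.
import time
import psutil

def sample_network(interval: float = 10.0) -> None:
    before = psutil.net_io_counters()
    t0 = time.monotonic()
    time.sleep(interval)
    after = psutil.net_io_counters()
    elapsed = time.monotonic() - t0

    conns = psutil.net_connections(kind="tcp")
    established = sum(1 for c in conns if c.status == psutil.CONN_ESTABLISHED)

    print(f"tcp_established={established}")
    print(f"errors_in={after.errin - before.errin} errors_out={after.errout - before.errout}")
    print(f"throughput_in_kBps={(after.bytes_recv - before.bytes_recv) / elapsed / 1024:.1f}")
    print(f"throughput_out_kBps={(after.bytes_sent - before.bytes_sent) / elapsed / 1024:.1f}")

if __name__ == "__main__":
    sample_network()
```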

The list above is also useful when load testing -- you can find bottlenecks and tune the system to squeeze out more performance.

For data collection we use a custom Java agent that sends metrics directly to our internal, home-grown datastore.

Whatever tool you use, it takes some effort to set up, but it's worth it. It's a true force multiplier for ops support and scaling/tuning.
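
As a rough illustration of that collect-and-push shape (not the actual Java agent), a minimal periodic loop might look like this in Python; the ingest URL and payload format are hypothetical stand-ins for an internal datastore.

```python
# Sketch of the collect-and-push idea: sample a few metrics on a timer and
# POST them to a metrics ingest endpoint. The URL and payload format are
# hypothetical stand-ins for an internal datastore, not the real agent.
import time
import socket
import psutil
import requests

INGEST_URL = "https://metrics.internal.example.com/ingest"  # hypothetical
INTERVAL_SECONDS = 60

def collect() -> dict:
    return {
        "host": socket.gethostname(),
        "timestamp": int(time.time()),
        "cpu_percent": psutil.cpu_percent(interval=1),
        "mem_percent": psutil.virtual_memory().percent,
        "disk_percent": psutil.disk_usage("/").percent,
    }

def run_agent() -> None:
    while True:
        payload = collect()
        try:
            requests.post(INGEST_URL, json=payload, timeout=5)
        except requests.RequestException as exc:
            print(f"push failed, will retry next cycle: {exc}")
        time.sleep(INTERVAL_SECONDS)

if __name__ == "__main__":
    run_agent()
```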
