Want to know how Atlassians monitor their enterprise deployments?

ddomingo Atlassian Team Nov 13, 2018

At Atlassian, we believe in our own products. That's why we use them, even at the enterprise level. Doing so gives us first-hand experience of how they perform at scale, and to get that experience we monitor each instance closely.

We also believe in transparency. That's why we published reference architectures that describe how we monitor some of our Data Center deployments. As you start forming your own monitoring strategies, you can use these references to guide your decisions.

How do you monitor your enterprise deployments? Use this thread to share your best practices with the community.


We monitor Jira in a number of different ways.

1. We have a script that creates, updates and deletes an issue every minute on each node of our 4-node cluster. The time for each of those tells us what users are experiencing.

2. We poll the Health Check REST endpoint every minute to monitor all of its health checks.

3. We use Jolokia to expose JVM JMX metrics as REST resources, so we can monitor metrics such as Full GC.

4. We have various services that check the Jira log files for certain critical messages, and that count the number of ERROR and WARN messages per minute.

5. We have custom scripts that extract metrics from the database, and we also collect the database server metrics.

6. We monitor the Jira node server metrics as we would for any other machine: CPU utilization, free disk space, disk IO, network IO.

7. We have a scraping script that monitors the outgoing mail queue size and flushes the queue as necessary.
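
As an illustration of item 4, here is a minimal sketch of counting ERROR and WARN messages per minute. It assumes each log line starts with a timestamp, a thread name without spaces, and then the level, for example "2018-11-13 10:15:02,120 http-nio-8080-exec-1 ERROR ...". That layout and the name `count_levels_per_minute` are assumptions for the example, not the actual service described above.

```python
import re
from collections import Counter

# Assumed line layout: "YYYY-MM-DD HH:MM:SS,mmm <thread> <LEVEL> ...".
# Adjust the pattern if your log4j layout differs.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2},\d{3} \S+ (ERROR|WARN)\b"
)


def count_levels_per_minute(lines):
    """Return a Counter keyed by (minute, level) for ERROR and WARN lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # stack-trace lines and other levels are skipped
            counts[(match.group(1), match.group(2))] += 1
    return counts


sample = [
    "2018-11-13 10:15:02,120 http-nio-8080-exec-1 ERROR admin something failed",
    "2018-11-13 10:15:40,001 http-nio-8080-exec-2 WARN  admin slow query",
    "2018-11-13 10:15:59,999 http-nio-8080-exec-3 ERROR anonymous another failure",
    "2018-11-13 10:16:00,000 http-nio-8080-exec-1 INFO  admin all good",
    "    at com.example.Foo.bar(Foo.java:42)",
]
counts = count_levels_per_minute(sample)
```

The per-minute counts can then be pushed to whatever dashboard or alerting system is in use.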

Then we have a dashboard that combines all the metrics, and alerts that are triggered by metric levels and by how long those levels persist. The alerts are processed by a custom service that decides who to contact depending on on-call schedules and personal preferences. The alert info also contains links to wiki pages about handling the problem.
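
One way to read "triggered by levels and duration" is that an alert fires only when a metric stays at or above a level for some minimum time. The sketch below shows that idea only. The class name `DurationAlert`, its fields, and the example threshold are hypothetical, and it does not represent the custom service described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DurationAlert:
    """Fire only when a metric stays at or above `threshold` for `min_duration_s`."""

    threshold: float
    min_duration_s: float
    breach_started_at: Optional[float] = None

    def observe(self, timestamp_s: float, value: float) -> bool:
        """Record one sample; return True when the alert should fire."""
        if value < self.threshold:
            # Metric recovered, so forget the current breach.
            self.breach_started_at = None
            return False
        if self.breach_started_at is None:
            self.breach_started_at = timestamp_s
        return timestamp_s - self.breach_started_at >= self.min_duration_s


# Hypothetical rule: CPU utilization at or above 90% for two minutes.
cpu_alert = DurationAlert(threshold=90.0, min_duration_s=120.0)
results = [
    cpu_alert.observe(t, v)
    for t, v in [(0, 95.0), (60, 97.0), (120, 96.0), (180, 50.0), (240, 95.0)]
]
```

Requiring a duration as well as a level avoids paging someone for a single short spike.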

Which software do you use for the dashboard that combines all the metrics? I have been playing with Tableau, but I'm not sure it will be our final dashboard, as I have hundreds of millions of rows to process (access logs, app logs, etc.).

For example, on my Confluence dashboard (Tableau) I have:

- the top slow pages, which get looked at each day,

- the users that consumed the most CPU, with thread ID,

- the pages that consume the most server-side resources and were viewed the most per day, to see who uses automation to refresh pages too often.

All of this data is sent from Apache NiFi to MySQL.
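
For readers who want to try the "top slow pages" view without Tableau, here is a minimal sketch. It assumes the access log has already been parsed into (day, path, elapsed milliseconds) records. The record shape and the function name `top_slow_pages` are assumptions for the example, not the NiFi and MySQL pipeline described above.

```python
from collections import defaultdict


def top_slow_pages(records, n=3):
    """Return {day: [(path, average_ms), ...]} with the n slowest pages per day.

    `records` is an iterable of (day, path, elapsed_ms) tuples.
    """
    totals = defaultdict(lambda: [0, 0])  # (day, path) -> [sum_ms, hits]
    for day, path, elapsed_ms in records:
        entry = totals[(day, path)]
        entry[0] += elapsed_ms
        entry[1] += 1

    per_day = defaultdict(list)
    for (day, path), (sum_ms, hits) in totals.items():
        per_day[day].append((path, sum_ms / hits))

    # Slowest average first, keep only the top n for each day.
    return {
        day: sorted(pages, key=lambda page: page[1], reverse=True)[:n]
        for day, pages in per_day.items()
    }


records = [
    ("2018-11-13", "/display/DOC/Home", 100),
    ("2018-11-13", "/display/DOC/Home", 300),
    ("2018-11-13", "/pages/viewpage.action", 1000),
    ("2018-11-13", "/dashboard.action", 50),
]
slowest = top_slow_pages(records, n=2)
```

At hundreds of millions of rows this aggregation would more realistically run in the database, but the grouping logic is the same.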


We use a custom app for combining the metrics on different pages.

Ah ok.

For example, one of our Tableau metrics for Confluence lists every IP address together with the name associated with that computer. Just parsing the access log saves us from a lot of performance problems.


ddomingo Atlassian Team Nov 15, 2018

Thanks for sharing, Matt! Also, I found this part of your strategy interesting:

1. We have a script that creates, updates and deletes an issue every minute on each node of our 4-node cluster. The time for each of those tells us what users are experiencing.

I'll check with our team what they think about it, and whether we're doing something similar on any of our production or test instances. For one of our Confluence DC instances, we monitor the site's Apdex and alert support whenever its response time degrades to 4 seconds.
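
For context, Apdex scores a set of response times against a target time T: samples at or below T count as satisfied, samples between T and 4T count as tolerating, and the score is (satisfied + tolerating / 2) / total. The sketch below works through that arithmetic. The target of 1 second and the sample values are purely illustrative and are not the settings used on the instance mentioned above.

```python
def apdex(response_times_s, target_s):
    """Apdex = (satisfied + tolerating / 2) / total samples."""
    if not response_times_s:
        raise ValueError("need at least one sample")
    satisfied = sum(1 for t in response_times_s if t <= target_s)
    tolerating = sum(
        1 for t in response_times_s if target_s < t <= 4 * target_s
    )
    # Anything slower than 4 * target_s is "frustrated" and adds nothing.
    return (satisfied + tolerating / 2) / len(response_times_s)


# Illustrative samples in seconds: 3 satisfied, 2 tolerating, 1 frustrated.
samples = [0.4, 0.8, 1.0, 2.5, 3.9, 6.0]
score = apdex(samples, target_s=1.0)  # (3 + 2 / 2) / 6
```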

Yes, it's not a common approach, but we like it because it gives us a sense of what our customers are experiencing in Jira. We use an internal Jira user as the service account, with the internal user directory first in the list of directories. So the absolute values for that metric are probably a little better than what our AD users experience.

One odd thing about creating and deleting an issue each minute for years on end is that our issue keys have grown large, e.g. JPT-4612542. But that hasn't broken anything yet. OpenJDK has even larger numbers in their issue keys.
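
The create, update and delete probe discussed in this thread could be sketched roughly as follows. This is not the actual script: it assumes the Jira REST API v2 issue resources (`/rest/api/2/issue`), basic authentication for the service account, and an issue type named "Task" in the probe project. The helper names `jira_request` and `probe_node` are hypothetical.

```python
import base64
import json
import time
import urllib.request


def jira_request(base_url, user, password, method, path, payload=None):
    """Send one authenticated JSON request to a Jira node; return the decoded body or None."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(base_url + path, data=data, method=method)
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    request.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(request, timeout=30) as response:
        body = response.read()
    return json.loads(body) if body else None


def probe_node(call, project_key="JPT"):
    """Create, update and delete one issue via call(method, path, payload).

    Returns the elapsed seconds for each step, keyed by step name.
    """
    timings = {}

    start = time.perf_counter()
    created = call("POST", "/rest/api/2/issue", {
        "fields": {
            "project": {"key": project_key},
            "summary": "monitoring probe",
            "issuetype": {"name": "Task"},  # assumed issue type
        }
    })
    timings["create"] = time.perf_counter() - start
    issue_path = "/rest/api/2/issue/" + created["key"]

    start = time.perf_counter()
    call("PUT", issue_path, {"fields": {"summary": "monitoring probe (updated)"}})
    timings["update"] = time.perf_counter() - start

    start = time.perf_counter()
    call("DELETE", issue_path, None)
    timings["delete"] = time.perf_counter() - start
    return timings
```

To run it against a real node, `call` can be built with `functools.partial(jira_request, node_url, user, password)`, where `node_url` points directly at one cluster node rather than at the load balancer. Scheduling it once per minute per node is left to cron or a similar scheduler.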

