Why does each node in Jira Data Center have its own copy of the Lucene indexes?

From reading this article on how Jira indexing works in a Data Center environment, I understand that each node has its own copy of the index.

According to the Elasticsearch documentation on how it works in a cluster, there is only one instance of an index, which may be split across the nodes.

My question is: why does each Jira node have its own copy of the index? It isn't efficient, and it could mean getting different results depending on which node you are on.

3 answers

1 accepted

0 votes
Answer accepted
Nic Brough Community Leader Aug 03, 2017

It's actually highly efficient, in the one single area that matters for an index file: speed.

Shared file systems are simply too slow to serve up index files at the rate that even a lightly used JIRA system needs.

Thanks for the reply Nic,

In our multi-node Data Center environment we have a cron job that triggers an overnight background re-index on an admin node. Once the index is built, it gets copied across to the other nodes. As events are generated, they are broadcast to the other Jira nodes.

Every night we trigger the background re-index on the admin node to avoid the risk of indexes becoming out of date.

Does this setup sound correct?

Nic Brough Community Leader Aug 07, 2017

Not really. There's generally no need for regular re-indexing. I'd take a look at why you're doing this regular indexing and see if it's actually useful.

We are suffering from ongoing issues with corrupted indexes on our Jira data centre nodes.

Atlassian haven't got this right at all. For data centre environments using NetApp or other shared storage, I don't agree with you that keeping the indexes locally is faster.

A more scalable, robust solution would be for each node to contribute towards building an index in Elasticsearch (which can itself be multi-node and resilient).

Splitting the reindex task across nodes would also reduce the reindex time from the several hours it currently takes.

Nic Brough Community Leader Dec 06, 2017

The fastest shared storage I've measured is 10 times slower than equivalent local storage. A shared Elasticsearch would be nice, but it would still introduce far too much delay to be useful.
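For anyone wanting to check a local-versus-shared comparison like this on their own hardware, here is a minimal sketch of a random-read timing test. It is not the method Nic used. The function names, block size and read count are assumptions for illustration, and the OS page cache can hide real device latency unless the test file is much larger than RAM or the cache is dropped first.

```python
import os
import random
import time


def mean_read_latency(path, reads=200, block_size=4096, seed=0):
    """Return the mean seconds per random block read from the file at `path`."""
    size = os.path.getsize(path)
    if size < block_size:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)  # fixed seed so both files see the same offsets
    # buffering=0 avoids Python-level buffering; the OS cache still applies.
    with open(path, "rb", buffering=0) as handle:
        start = time.perf_counter()
        for _ in range(reads):
            handle.seek(rng.randrange(0, size - block_size + 1))
            handle.read(block_size)
        elapsed = time.perf_counter() - start
    return elapsed / reads


def slowdown(local_file, shared_file, **kwargs):
    """How many times slower reads from `shared_file` are than from `local_file`."""
    shared = mean_read_latency(shared_file, **kwargs)
    local = mean_read_latency(local_file, **kwargs)
    return shared / local
```

Calling `slowdown("/local/disk/test.bin", "/mnt/shared/test.bin")` (both paths hypothetical, files of equal size) gives a rough ratio. A real Lucene workload mixes reads and writes, so treat the number as indicative only.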

Paul, we cannot use a shared storage system for the Lucene indexes, as Nic pointed out, because of the access latency of non-local storage. In fact, we highly recommend SSD devices for local index storage as well, for the best reindex/search performance.

Your problem sounds like the JIRA DC bug we have run into (among various others) in 7.2.12 and lower, and also in 7.5.2, which you are using. I saw another thread from you about a JDC node hanging, and I highly recommend upgrading to 7.6.4 or higher (Enterprise Release). We have seen considerable instability with JDC in 7.4/7.5, and most of the issues found have been addressed in 7.6.x. Besides, staying on an Enterprise Release will get you significant advantages, detailed here

Also, you shouldn't have to sync indexes. If out-of-sync indexes ever become a concern, I recommend pulling the node out of the cluster (out of the LB), doing a foreground reindex, and putting the node back into the cluster after verifying that the indexing was successful. The indexes among the nodes will be synced automatically, so there shouldn't be any need for you to copy indexes over to each of the nodes. In fact, I recommend you don't do this, because you might lose any indexes built in the interim. Please see the KB on "Reindexing JIRA Datacenter without downtime".
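The foreground reindex step above is normally started from the Jira admin UI. If you want to script it, a minimal sketch follows. It assumes your Jira version exposes the Jira Server REST resource `POST /rest/api/2/reindex` with a `type` query parameter, and that basic authentication with an admin account is allowed; check the REST documentation for your release. The base URL, credentials and the helper name `build_reindex_request` are hypothetical. The sketch only builds the request and does not send it.

```python
import base64
import urllib.parse
import urllib.request


def build_reindex_request(base_url, username, password, reindex_type="FOREGROUND"):
    """Build (but do not send) a POST request that asks one Jira node to reindex.

    Assumption: the node accepts POST /rest/api/2/reindex?type=... with
    HTTP basic authentication for an administrator account.
    """
    query = urllib.parse.urlencode({"type": reindex_type})
    url = f"{base_url.rstrip('/')}/rest/api/2/reindex?{query}"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Basic {token}"},
    )


# Hypothetical usage, pointing at the node directly rather than at the LB:
# request = build_reindex_request("http://jira-node-2.example.internal:8080", "admin", "secret")
# urllib.request.urlopen(request)  # only after the node is out of the load balancer
```

Point the request at the individual node's own address rather than the load balancer, so that the reindex runs on the node you have taken out of rotation.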

Hi Kavitha,

Many thanks for your reply.

If a search returns results in a few seconds then I'm happy, and for that to happen indexing and search don't have to sit on the same box as Jira; that doesn't scale, plus you could get inconsistent results depending on which box you hit.

The large-scale enterprise systems I have worked on have an Elastic cluster, with application servers carving up the task of indexing and feeding the Elastic cluster, and Elastic serving out search results. These are large-scale systems using something like a NetApp MetroCluster for storage. I hear what you and Nic have said about local storage; however, good design of your Elasticsearch cluster can mitigate this.

Jira indexing only runs on one node, so in our 4-node Jira data centre instance, indexing takes 4 times longer than it would if all 4 nodes were able to participate. In fact it's longer than that, as the indexes have to be copied to the other 3 nodes after the index is built.
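The arithmetic behind that last paragraph can be sketched as below. This is a back-of-the-envelope model and not a measurement. It assumes indexing work would split perfectly evenly across nodes, which real systems rarely achieve, and that index copies to the other nodes happen one after another. The 8-hour and 0.5-hour figures are made-up inputs; the thread only says a reindex takes "several hours".

```python
def single_node_reindex_hours(build_hours, nodes, copy_hours_per_node):
    """One node builds the whole index, then copies it to each other node in turn."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return build_hours + copy_hours_per_node * (nodes - 1)


def shared_reindex_hours(build_hours, nodes):
    """All nodes split the build evenly into one shared index; no copy step."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return build_hours / nodes


# Illustrative inputs only (not taken from the thread).
current = single_node_reindex_hours(build_hours=8.0, nodes=4, copy_hours_per_node=0.5)
proposed = shared_reindex_hours(build_hours=8.0, nodes=4)
print(current, proposed, current / proposed)  # 9.5 2.0 4.75
```

Under these assumptions the ratio is always at least the number of nodes, which is the point being made. The model leaves out any slowdown from a shared index, which is the cost Nic and Kavitha are concerned about.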



