Why does each node in Jira Data Center have its own copy of the Lucene indexes?

According to this article on how Jira indexing works in a Data Center environment, each node has its own copy of the index.

According to the Elasticsearch documentation on how it functions in a cluster, there is only one instance of an index, which may be split across the nodes.

My question is: why does each Jira node have its own copy of the index? It isn't efficient, and it could mean getting different results depending on which node you are on.

3 answers

1 accepted

0 vote

It's actually highly efficient, in the one single area that matters for an index file: speed.

Shared file systems are simply too slow to serve up index files at the rate that even a lightly used JIRA system needs.

Thanks for the reply Nic,

In our multi-node Data Center environment we have a cron job that triggers an overnight background re-index on an admin node. Once the index is built, it gets copied across to the other nodes. As events are generated, they are broadcast to the other Jira nodes.

Every night we trigger the background re-index on the admin node to avoid the risk of the indexes becoming out of date.

Does this setup sound correct?

Not really. There's generally no need for regular re-indexing. I'd take a look at why you're doing this regular indexing and see if it's actually useful.

We are suffering from ongoing issues with corrupted indexes on our Jira Data Center nodes.

Atlassian haven't got this right at all. For Data Center environments using NetApp or other shared storage, I don't agree with you that keeping the indexes local is faster.

A more scalable, robust solution would be one where each node contributes to building an index in Elasticsearch (which can itself be multi-node and resilient).

Splitting the reindex task across nodes would also reduce the reindex time from the several hours it takes currently.

The fastest shared storage I've measured is 10 times slower than equivalent local storage. A shared Elasticsearch would be nice, but it would still introduce far too much delay to be useful.
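
For anyone who wants to compare their own storage, here is a minimal sketch of one way to measure it. It is not how the figure above was obtained. The function name, file size, and read count are all hypothetical choices. It writes a scratch file into a directory and times random 4 KiB reads from it. Note that the operating system's page cache will often serve a freshly written file from memory, so treat the result as a rough lower bound on latency unless you drop caches or use a file larger than RAM.

```python
import os
import random
import tempfile
import time


def random_read_latency(directory, file_size=4 * 1024 * 1024, block=4096, reads=200):
    """Return the mean seconds per random read of a scratch file in `directory`.

    Hypothetical helper: all parameters are illustrative defaults.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        # Write random data and force it out to the storage device.
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(file_size))
            f.flush()
            os.fsync(f.fileno())
        # Time unbuffered reads at random offsets.
        start = time.perf_counter()
        with open(path, "rb", buffering=0) as f:
            for _ in range(reads):
                f.seek(random.randrange(0, file_size - block))
                f.read(block)
        return (time.perf_counter() - start) / reads
    finally:
        os.remove(path)
```

Run it once against a directory on local disk and once against a directory on the shared mount, then compare the two numbers.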

Paul, as Nic pointed out, we cannot use a shared storage system for the Lucene indexes because of the access latency of non-local storage. In fact, we highly recommend SSDs for local index storage as well, for the best reindex and search performance.

Your problem sounds like one of the various JIRA DC bugs we have run into in 7.2.12 and lower, and also in 7.5.2, which you are using. I saw another thread from you about a JDC node hanging, and I highly recommend upgrading to 7.6.4 or higher (Enterprise Release). We have seen considerable instability with JDC in 7.4/7.5, and most of the issues found have been addressed in 7.6.x. Besides, staying on an Enterprise Release will get you significant advantages, detailed here.

Also, you shouldn't have to sync indexes. If out-of-sync indexes ever become a concern, I recommend pulling the node out of the cluster (out of the LB), doing a foreground reindex, and putting the node back into the cluster after verifying that the indexing was successful. The indexes are synced among the nodes automatically, so there shouldn't be any need for you to copy indexes over to each node. In fact, I recommend you don't do this, because you might lose any indexes built in the interim. Please see the KB article "Reindexing JIRA Datacenter without downtime".
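
The foreground reindex step above can also be triggered over REST instead of through the admin UI. The following is only a sketch: it assumes the Jira Server REST resource `POST /rest/api/2/reindex` with its `type` query parameter is available on your version (check the REST API documentation for your release), and the node URL and credentials are hypothetical placeholders. It only builds the request. Taking the node out of the load balancer, sending the request, and verifying the result before re-adding the node remain manual steps.

```python
import base64
import urllib.parse
import urllib.request


def build_reindex_request(node_url, user, password, reindex_type="FOREGROUND"):
    """Build (but do not send) a reindex request aimed at one specific node.

    Assumption: the node is addressed directly, bypassing the load balancer.
    """
    query = urllib.parse.urlencode({"type": reindex_type})
    url = f"{node_url.rstrip('/')}/rest/api/2/reindex?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, data=b"", method="POST")
    req.add_header("Authorization", f"Basic {token}")
    return req


# Hypothetical node address and admin account, for illustration only.
req = build_reindex_request("http://node1.example.internal:8080", "admin", "secret")
# To actually start the reindex you would call urllib.request.urlopen(req).
```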

Hi Kavitha,

Many thanks for your reply.

If a search returns results in a few seconds then I'm happy, and for that to happen indexing and search don't have to sit on the same box as Jira; that doesn't scale, plus you could get inconsistent results depending on which box you hit.

The large-scale enterprise systems I have worked on have an Elastic cluster, with application servers carving up the task of indexing and feeding the Elastic cluster, and Elastic serving out search results. These are large-scale systems using something like a NetApp MetroCluster for storage. I hear what you and Nic have said about local storage; however, good design of your Elasticsearch cluster can mitigate this.

Jira indexing only runs on one node, so in our 4-node Jira Data Center instance, indexing takes 4 times longer than if it had been designed to allow all 4 nodes to participate. In fact it's longer than that, as it has to copy the indexes to the other 3 nodes after the index is built.
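
The arithmetic behind this estimate can be sketched as follows. It assumes ideal linear scaling, where the work divides evenly across nodes with no coordination overhead, which real systems rarely achieve. The hour figures are made-up illustrations, not measurements from this thread.

```python
def reindex_hours(single_node_hours, nodes, copy_hours=0.0):
    """Idealised estimate: indexing work split evenly, plus any index copy time."""
    return single_node_hours / nodes + copy_hours


# Hypothetical figures: an 8-hour single-node reindex plus 0.5 hours of copying.
one_node_then_copy = reindex_hours(8.0, nodes=1, copy_hours=0.5)
four_nodes_shared_index = reindex_hours(8.0, nodes=4)
```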
