The same full-text search engine for different products

Hi, awesome community!

 

This article describes a tool that helps me investigate search indexes.

At the moment I am investigating Russian-language stemming and looking at how to reuse morphological analysis from existing libraries.

So today we will talk about the Apache Lucene index and a small, awesome utility called Apache Luke.

Lucene matters here because it is the search engine library behind full-text search in Apache Solr and Elasticsearch as well.

This means that the on-premises versions of Jira, Confluence, and Bamboo use Lucene, while Bitbucket uses Elasticsearch. For Cloud, I imagine the Atlassian team uses Elasticsearch, since it scales more easily than a local Apache Lucene index. For example, with plain Lucene you need to set up replication yourself (lucene-replicator - https://lucene.apache.org/core/7_4_0/replicator/org/apache/lucene/replicator/Replicator.html), or you can just use Elasticsearch.

 

Let’s open the Confluence search indexes in Apache Luke.

With a recent Luke build, you may see log messages like these:

[2020-03-08T18:04:08,397]  WARN (IndexUtils.java:86) - Format version is not supported (resource BufferedChecksumIndexInput(SimpleFSIndexInput(path="/Users/gonchik.tsymzhitov/temp/lucene/META-INF/112/edge/segments_1"))): 0 (needs to be between 7 and 9). This version of Lucene only supports indexes created with release 6.0 and later.

org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(SimpleFSIndexInput(path="/Users/gonchik.tsymzhitov/temp/lucene/META-INF/112/edge/segments_1"))): 0 (needs to be between 7 and 9). This version of Lucene only supports indexes created with release 6.0 and later.
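
If you check many index directories, it can help to pull the reported format version and the supported range out of such a message. The following is only a sketch: the function name `parse_format_error` and the regular expression are assumptions based on the single message shown above, not part of Luke or Lucene.

```python
import re

# Pattern for the message text shown in the log above; the group names are our own.
_FORMAT_ERROR = re.compile(
    r"Format version is not supported \(resource (?P<resource>.*)\): "
    r"(?P<found>-?\d+) \(needs to be between (?P<min>\d+) and (?P<max>\d+)\)"
)


def parse_format_error(line):
    """Return found/min/max format versions and the segments path, or None."""
    match = _FORMAT_ERROR.search(line)
    if match is None:
        return None
    path = re.search(r'path="([^"]*)"', match.group("resource"))
    return {
        "found": int(match.group("found")),
        "min": int(match.group("min")),
        "max": int(match.group("max")),
        "path": path.group(1) if path else None,
    }


sample = (
    'org.apache.lucene.index.IndexFormatTooOldException: Format version is not '
    'supported (resource BufferedChecksumIndexInput(SimpleFSIndexInput('
    'path="/Users/gonchik.tsymzhitov/temp/lucene/META-INF/112/edge/segments_1"))): '
    '0 (needs to be between 7 and 9). This version of Lucene only supports '
    'indexes created with release 6.0 and later.'
)
info = parse_format_error(sample)
print(info)
```

A found version below the supported minimum suggests that the index was written by an older Lucene release than the Luke build expects.
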
  • Because of this format error, I recommend using luke-4.10.1 and running luke.sh or luke.bat. Don’t forget to tick the options “Don’t open IndexReader (when opening corrupted index)” and “Force unlock, if locked”.

  • After that you will see the statistics of your index.
  • Next, you can see the contentBody field, which holds the Confluence content, and in the next table you will see the top terms.

  • Then, on the Documents tab (after double-clicking the top term), you will see which documents contain the top term “сво”.
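
The top-terms table and the term-to-documents lookup in the steps above both come from an inverted index. As a rough illustration only, here is a toy version in plain Python. It is not how Lucene stores its index, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return postings


def top_terms(postings, n=3):
    """Rank terms by document frequency, ties broken alphabetically."""
    freq = {term: len(ids) for term, ids in postings.items()}
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical, already-analyzed field values (not real Confluence data).
docs = {
    1: "lucene index search",
    2: "confluence search index",
    3: "search syntax",
}
postings = build_index(docs)
print(top_terms(postings, 2))      # terms ranked by how many documents contain them
print(sorted(postings["index"]))   # ids of the documents that contain "index"
```

Ranking terms by document frequency resembles the top-terms table, and looking up a term's posting set resembles jumping from a term to the documents that contain it.
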

If you click “Explain structure”, you can see how the query is broken down, which explains the reasoning behind the Confluence search syntax rules: https://confluence.atlassian.com/doc/confluence-search-syntax-158720.html

That’s all for today. 

 

Conclusion

  1. I am happy to see the Apache Luke tool included in the Apache Lucene binary builds as a built-in tool.
  2. The tool helps me understand how search and index statistics work in Atlassian products and in many other Lucene-based full-text search projects.
  3. I hope the Atlassian team will one day upgrade the Lucene libraries in Confluence: https://jira.atlassian.com/browse/CONFSERVER-57452 Feel free to vote if you want to see improvements in Confluence tokenising, stemming, search, and ranking functionality.
  4. One day, we may be able to use morphological analysis as an optional feature of the products, e.g. Russian morphology (https://github.com/AKuznetsov/russianmorphology).
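
To make point 4 more concrete, here is a toy contrast between stemming and morphological analysis. A suffix-stripping stemmer cuts several word forms down to a truncated stem (such as the “сво” term seen in the index, assuming that term came from a stemmer), while a morphological analyzer maps them to a dictionary form. Everything here is an assumption for illustration: the suffix list, the lemma dictionary, and both function names are made up, and the code reflects neither the Snowball stemmer nor the russianmorphology library's API.

```python
# Toy suffix list, longest first. This is NOT the Snowball Russian stemmer.
SUFFIXES = ("его", "ей", "их", "й", "и")

# Tiny hand-written lemma dictionary standing in for a morphology library.
LEMMAS = {
    "своего": "свой",
    "своей": "свой",
    "своих": "свой",
    "свои": "свой",
    "свой": "свой",
}


def toy_stem(word):
    """Strip the first (longest) matching suffix, keeping at least two letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def toy_lemma(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    return LEMMAS.get(word, word)


forms = ["своего", "своей", "своих", "свои", "свой"]
stems = {toy_stem(w) for w in forms}
lemmas = {toy_lemma(w) for w in forms}
print(stems)
print(lemmas)
```

In this sketch all five forms share one stem that is not itself a word, while the dictionary lookup returns a real base form, which is the kind of difference a morphology-aware analyzer could bring to search.
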



Cheers,

Gonchik Tsymzhitov
