The same full-text search engine for different products

Hi, awesome community!

 

In this article, I describe a tool that helps me investigate search indexes.

At the moment, I am investigating Russian-language stemming and trying to reuse existing libraries for morphological analysis.

So today we will talk about the Apache Lucene index and a small, awesome utility: Apache Luke.

Lucene is the search engine library behind full-text search not only in Apache Lucene itself, but in Solr and Elasticsearch as well.

This means that the on-premises versions of Jira, Confluence, and Bamboo use Lucene, while Bitbucket uses Elasticsearch. For Cloud, I imagine the Atlassian team uses Elasticsearch, because it scales more easily than a local Apache Lucene index. With plain Lucene, for example, you have to handle replication yourself (lucene-replicator - https://lucene.apache.org/core/7_4_0/replicator/org/apache/lucene/replicator/Replicator.html) or just use Elasticsearch.

 

Let’s use Apache Luke for the Confluence search indexes:

image.png

If the Luke version is too new for the index, you will see log entries like these:

[2020-03-08T18:04:08,397]  WARN (IndexUtils.java:86) - Format version is not supported (resource BufferedChecksumIndexInput(SimpleFSIndexInput(path="/Users/gonchik.tsymzhitov/temp/lucene/META-INF/112/edge/segments_1"))): 0 (needs to be between 7 and 9). This version of Lucene only supports indexes created with release 6.0 and later.

org.apache.lucene.index.IndexFormatTooOldException: Format version is not supported (resource BufferedChecksumIndexInput(SimpleFSIndexInput(path="/Users/gonchik.tsymzhitov/temp/lucene/META-INF/112/edge/segments_1"))): 0 (needs to be between 7 and 9). This version of Lucene only supports indexes created with release 6.0 and later.
  • Therefore, I recommend using luke-4.10.1 and running luke.sh or luke.bat. Don’t forget to tick the options “Don’t open IndexReader (when opening corrupted index)” and “Force unlock, if locked”.

image.png
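
The version check reported in the log above can be illustrated with a small sketch. It assumes the usual Lucene codec header layout (a 4-byte big-endian magic number, a length-prefixed codec name, then a 4-byte big-endian version) and takes the supported range 7 to 9 from the log message. The function names and the sample bytes are hypothetical; this is not Luke's or Lucene's own code.

```python
import struct

# Magic number that Lucene writes at the start of a codec header (assumption
# based on the documented header layout, not read from a real index here).
CODEC_MAGIC = 0x3FD76C17


def read_vint(data, pos):
    """Decode a variable-length int: 7 bits per byte, low-order group first."""
    value, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return value, pos
        shift += 7


def read_codec_header(data):
    """Return (codec_name, format_version) from the first bytes of a file."""
    (magic,) = struct.unpack_from(">I", data, 0)
    if magic != CODEC_MAGIC:
        raise ValueError("no codec header found")
    length, pos = read_vint(data, 4)
    name = data[pos:pos + length].decode("utf-8")
    (version,) = struct.unpack_from(">i", data, pos + length)
    return name, version


def is_supported(version, minimum=7, maximum=9):
    """Range taken from the log message: 'needs to be between 7 and 9'."""
    return minimum <= version <= maximum


# Synthetic header with format version 0, the value reported in the log.
sample = (struct.pack(">I", CODEC_MAGIC) + bytes([8]) + b"segments"
          + struct.pack(">i", 0))
name, version = read_codec_header(sample)
print(name, version, is_supported(version))
```

To try it on a real index, you could pass the first bytes of a segments_N file to read_codec_header, for example the result of open(path, "rb").read(64).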

  • After that, you will see the statistics of your index, e.g.:

image.png

  • Next, you can see the contentBody field, which is used for the Confluence content, and in the table next to it you will see the top terms.

image.png
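
To make the idea of a top-terms table concrete, here is a minimal sketch that ranks terms by document frequency, i.e. by the number of documents that contain them. The tokenizer, the function names, and the sample documents are hypothetical, and it is an assumption that the ranking is by document frequency; a real analyzer also applies stemming and stop-word rules.

```python
import re
from collections import Counter


def tokenize(text):
    """Very rough stand-in for an analyzer: lowercase word characters only."""
    return re.findall(r"\w+", text.lower())


def top_terms(docs, n=3):
    """Return the n terms that occur in the most documents."""
    doc_freq = Counter()
    for text in docs:
        # Use a set so each document counts a term at most once.
        doc_freq.update(set(tokenize(text)))
    # Sort by descending document frequency, then alphabetically.
    return sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


docs = [
    "Lucene index and Luke",
    "Luke opens a Lucene index",
    "Confluence search index",
]
print(top_terms(docs))  # [('index', 3), ('lucene', 2), ('luke', 2)]
```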

  • Then, on the Documents tab, if you double-click a top term, you will see which documents contain it, for example the top term “сво”.

image.png

image.png
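
Looking up the documents for a term is, conceptually, a postings lookup in an inverted index. The following sketch shows the idea with a plain dictionary; the names and the sample documents are hypothetical, and a real Lucene index stores postings in a far more compact on-disk format.

```python
import re
from collections import defaultdict


def build_postings(docs):
    """Map each term to {doc_id: [positions]} for a list of texts."""
    postings = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        terms = re.findall(r"\w+", text.lower())
        for position, term in enumerate(terms):
            postings[term].setdefault(doc_id, []).append(position)
    return postings


docs = [
    "search index and search syntax",
    "Confluence content",
    "index search",
]
postings = build_postings(docs)

# Which documents contain the term, and at which token positions?
print(sorted(postings["search"]))  # [0, 2]
print(postings["search"][0])       # [0, 3]
```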

If you click “Explain structure”, you can see how the query is built, which shows the reasoning behind the rules described at https://confluence.atlassian.com/doc/confluence-search-syntax-158720.html

image.png

image.png


That’s all for today. 

 

Conclusion

  1. I am happy to see that Apache Luke is now shipped with the Apache Lucene binary builds as a built-in tool.
  2. The tool helps me understand how search and statistics work in Atlassian products, and in many other Lucene-based full-text search projects.
  3. I hope the Atlassian team will one day upgrade the Lucene libraries in Confluence: https://jira.atlassian.com/browse/CONFSERVER-57452  Feel free to vote if you want to see improvements in Confluence tokenizing, stemming, search, and ranking.
  4. Perhaps one day we will be able to use morphological analysis as an optional feature of the products, e.g. Russian morphology (https://github.com/AKuznetsov/russianmorphology).



Cheers,

Gonchik Tsymzhitov
