Problem with searching for combination of text and numbers in Confluence 3.5 - poor results

Joanne Gerrits January 2, 2012

We are running Confluence version 3.5 (hosted locally/download) and are experiencing some problems with searching for a combination of letters and numbers.

For example: when searching for 'd560', all pages that contain items (terms) starting with d followed by a number are listed in the search results (e.g., d577, d701, d60). Searching for 'd AND 560': same results. Solely using 'd': same results. Solely using numbers gives no results at all.

Since we use Confluence for factory manuals, these queries (they are the numbers of specific machines) are quite important to us. Would anyone know what we can do / how this can be fixed?

Thanks, Joanne

5 answers

1 accepted

3 votes
Answer accepted
Daniel
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
January 15, 2012

I have spoken to Joanne about this, and I believe the problem is that their instance has the "index language" set to "other". Unfortunately, this causes Confluence to use a very basic tokeniser that will produce (even) worse results. Selecting English has the downside of introducing English stop words (i.e., words that are removed at index time and stripped from the query). The stop words used for English are:

"a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
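
To make the effect of stop words concrete, here is a minimal sketch in plain Java (no Lucene dependency). The class name `StopWordSketch` and the whitespace-based splitting are assumptions for illustration only; this is not Confluence's actual analyser, just the general idea of dropping listed words at index and query time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordSketch {
    // Stop-word list copied from the answer above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"));

    // Lowercases the text, splits it on whitespace, and drops stop words.
    // Whitespace splitting is a simplification (hypothetical), not the real tokeniser.
    static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stop words such as "for" and "the" disappear; "d560" survives.
        System.out.println(filter("Manual for the d560 machine"));
        // A phrase made only of stop words leaves nothing to search for.
        System.out.println(filter("to be or not to be"));
    }
}
```

The second call shows the downside mentioned above: a query consisting only of stop words ends up empty.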

I really hope that we can revisit our search/indexing code and fix many of the problems that I know are causing pain.

Joanne Gerrits January 31, 2012

Just a quick update: we are going to change the indexing language, and I will post the results here once I know more!

Joanne Gerrits February 20, 2012

We have changed the indexing language, and I've now been able to perform quite a few queries, and everything is working as it should! That is, searching for a combination of text and numbers now gives the correct results.

Carol Jones October 24, 2018

@Joanne Gerrits: What did you change the index language to? I'm having a similar problem in JIRA and am hoping your solution might work for us too. I tried updating it to *Other* and rerunning the index, but that didn't work for us. Then I found the issue below, which has me worried that I might be out of luck, but I'm hoping you can share your insight for us all!

https://jira.atlassian.com/browse/JRASERVER-31882

Thanks!!

1 vote
NielsJ
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
January 2, 2012

Another option is to write a plugin that stores these alphanumeric identifiers as metadata and puts them into the search index (extractor plugin module: https://developer.atlassian.com/display/CONFDEV/Extractor+Module). That would give you full control over the search field configuration.

Perhaps the Metadata plugin (https://plugins.atlassian.com/plugin/details/5295?versionId=43798) is a quicker solution, but I am not quite sure about its abilities regarding the search index.

In both cases (own plugin or Metadata plugin) you would need additional markup in your search query as stated in my first answer. If you want to avoid that you would have to add another search box that is enriched via JavaScript or another plugin module.

Perhaps you could also do the query modification in your web browser. For example, in Firefox you can add a smart keyword (http://support.mozilla.org/en-US/kb/Smart%20keywords) that builds an enriched search URL. But such a solution depends on your IT infrastructure, its restrictions, and the number of users you want to provide this feature to...

Joanne Gerrits January 3, 2012

Thanks again, Niels. Too bad I lack the skills to implement these changes, so I will look in my network for someone who can. In the meantime, I am hoping that there is (someone with) a simpler solution out there :-)

1 vote
Xiaoliang Wan January 2, 2012

Not sure if Confluence uses Lucene, but in older versions of Lucene the default analyser ignores numbers, and that is why we cannot query numbers in the index. Newer versions have an alphanumeric analyser; if we set it as the default, it will work with numbers. Cheers.

NielsJ
January 2, 2012

Yes, Confluence 3.5 uses Lucene version 2.9.3. I suspect the problem lies in the search field configuration settings (all "interesting" fields get tokenized...).

Joanne Gerrits January 2, 2012

Thanks for your reply. Does that mean that it would help to upgrade to a more recent version of Confluence (4.0/4.1)? Or is there something we can change in the configuration?

NielsJ
January 2, 2012

I doubt an upgrade would change this behavior. The Lucene configuration will most probably not change, since it is appropriate for most cases. But searching for these letter/number combinations of yours seems not to be a case Atlassian has thought about ;-)

1 vote
NielsJ
January 2, 2012

You can try surrounding d560 with double quotes: "d560". It may also help to restrict the search to the field you wish to search in. Examples:

  • title:"d560"
  • contentBody:"d560"
  • labelText:"d560"

You can find more information in the documentation:

http://confluence.atlassian.com/display/CONF35/Confluence+Search+Fields

http://confluence.atlassian.com/display/CONF35/Confluence+Search+Syntax

Joanne Gerrits January 2, 2012

Thanks for your suggestions, Niels. Unfortunately, using double quotes gives the exact same results (i.e., all pages with d-number items are shown in the search results).

Adding labels with a d-number to the pages (and using labelText:"d560" with searches) does the trick, but I would prefer if people could just type d560 in the search box as the users of the wiki need very simple / straightforward (search) instructions.

(Additionally: narrowing down the results with 'contentBody' or 'title' still shows all titles with all the different d-numbers in the search results.)

NielsJ
January 2, 2012

Then you could add another search box to your theme (or directly in the Confluence administration) that prefixes the entered search term with labelText: on submit. Then this search box could be used to search for those numbers. The drawback would be that you would have to label all your documents accordingly...
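
The query rewriting such a search box would perform can be sketched as below. This is only an illustration in plain Java: the class and method names (`LabelQuerySketch`, `toLabelQuery`) are hypothetical, and in a real theme the same logic would more likely live in JavaScript on the form's submit handler. Only the `labelText:` field name comes from this thread.

```java
public class LabelQuerySketch {
    // Turns the raw text typed by the user into a labelText: query.
    // Double quotes are stripped so the term cannot break out of the phrase.
    static String toLabelQuery(String input) {
        String term = input.trim().replace("\"", "");
        if (term.isEmpty()) {
            return ""; // nothing to search for
        }
        return "labelText:\"" + term + "\"";
    }

    public static void main(String[] args) {
        // The user types just the machine number; the box submits the enriched query.
        System.out.println(toLabelQuery(" d560 "));
    }
}
```

With this in place, users only type d560 and never see the field syntax.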

0 votes
Xiaoliang Wan January 3, 2012

I just ran a test in the latest version of Confluence. Text searching works fine, but I get no results when I search for numbers or text/number combinations. Therefore an upgrade won't solve this issue. I think you could try adjusting the Lucene configuration; it won't hurt the functionality, just add numbers when indexing. I had the same issue when I implemented Lucene on my site: everything worked well except numbers and text/number combinations. I solved the problem using the solution I mentioned above. Lucene is such a powerful creature, it should really do better than this. Hope this helps, cheers.

Joanne Gerrits January 3, 2012

Thanks Wallern, I will keep this solution in mind (see above :-))

NielsJ
January 3, 2012

What do you mean by "just add numbers when indexing"? I am only aware of the default behavior, which adds the whole content body (whether or not it contains numbers) to the index document with tokenization enabled:

document.add(new Field(FieldName.CONTENT_BODY, contentBody.toString(), store, Field.Index.TOKENIZED));

Is there any special extractor that cares for numbers? I think the tokenization somehow splits the letters and numbers apart...
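
To illustrate that suspicion, here is a minimal sketch in plain Java (no Lucene dependency) comparing two hypothetical tokenisation rules. The class and method names are made up for this example, and neither rule is claimed to be what Confluence actually uses; the sketch only shows that a letters-only rule would reproduce the symptoms described in the question.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerSketch {
    // Letters-only rule: a token is a maximal run of letters; digits act as separators.
    static List<String> lettersOnly(String text) {
        return split(text, false);
    }

    // Alphanumeric rule: a token is a maximal run of letters or digits.
    static List<String> alphanumeric(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean tokenChar = Character.isLetter(c) || (keepDigits && Character.isDigit(c));
            if (tokenChar) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Every d-number collapses to the same token "d", so they all match each other.
        System.out.println(lettersOnly("d560 d577 d60"));
        // A purely numeric query yields no tokens at all, hence no results.
        System.out.println(lettersOnly("560"));
        // An alphanumeric rule keeps the identifiers distinct.
        System.out.println(alphanumeric("d560 d577 d60"));
    }
}
```

Under the letters-only rule, 'd560', 'd AND 560' and 'd' would all reduce to the single term d, which matches the behaviour reported at the top of the thread.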
