Problem with searching for combination of text and numbers in Confluence 3.5 - poor results

We are running Confluence version 3.5 (hosted locally/download) and are experiencing some problems with searching for a combination of letters and numbers.

For example: when searching for 'd560', all pages that contain items (terms) starting with d followed by a number are listed in the search results (e.g., d577, d701, d60). Searching for 'd AND 560': same results. Solely using 'd': same results. Solely using numbers gives no results at all.

Since we use Confluence for factory manuals, these queries (they are the numbers of specific machines) are quite important to us. Would anyone know what we can do or how this can be fixed?

Thanks, Joanne

5 answers

1 accepted

3 votes

I have spoken to Joanne about this and I believe the problem here is that their instance has the "index language" set to "other". Unfortunately this causes Confluence to use a very basic tokeniser that will produce (even) worse results. Selecting English has the downside of introducing English stop words (i.e., words that are removed at index time and stripped from the query). The stop words used for English are:

"a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
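To make the effect of that list concrete, here is a minimal sketch of stop-word stripping in plain Java. It is only an illustration: the class and method names are made up, and the whitespace splitting is a simplification rather than what Confluence's English analyser actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not Confluence or Lucene code.
class StopWordSketch {
    // The English stop words exactly as listed above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"));

    // Lower-cases the text, splits it on whitespace (a simplification),
    // and drops every term that appears in the stop-word list.
    static List<String> strip(String text) {
        List<String> kept = new ArrayList<>();
        for (String term : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!term.isEmpty() && !STOP_WORDS.contains(term)) {
                kept.add(term);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // "for" and "the" are stop words, so only three terms survive.
        System.out.println(strip("manual for the d560 press")); // [manual, d560, press]
    }
}
```

The same stripping would apply at index time and at query time, which is why a stop word can never be matched once English is selected.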

I really hope that we can revisit our search/indexing code and fix many of the problems that I know are causing pain.

Just a quick update: we are going to change the indexing language, and I will post the results here once I know more!

We have changed the indexing language, and I've now been able to perform quite a few queries. Everything is working as it should, meaning that searching for a combination of text and numbers now gives the correct results.

You can try surrounding d560 with double quotes: "d560". It may also help to restrict the search to the field you want to search in. Examples:

  • title:"d560"
  • contentBody:"d560"
  • labelText:"d560"

You can find more information in the documentation:

http://confluence.atlassian.com/display/CONF35/Confluence+Search+Fields

http://confluence.atlassian.com/display/CONF35/Confluence+Search+Syntax

Thanks for your suggestions, Niels. Unfortunately, using double quotes gives the exact same results (i.e., all pages with d-number items are shown in the search results).

Adding labels with a d-number to the pages (and using labelText:"d560" with searches) does the trick, but I would prefer if people could just type d560 in the search box as the users of the wiki need very simple / straightforward (search) instructions.

(Additionally: narrowing down the results with 'contentBody' or 'title' still shows all titles with all the different d-numbers in the search results.)

Then you could add another search box to your theme (or directly in the Confluence administration) that prefixes the entered search term with labelText: on submit. Then this search box could be used to search for those numbers. The drawback would be that you would have to label all your documents accordingly...
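A rough sketch of the query rewriting such a search box would do, written in Java for illustration. The class name, the example host, and the `/dosearchsite.action?queryString=` path are assumptions here and should be checked against your own instance; only the `labelText:"..."` syntax comes from the thread above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; not part of Confluence.
class LabelSearchSketch {
    // Turns plain user input such as d560 into labelText:"d560".
    // Double quotes in the input are removed so the phrase stays well-formed.
    static String toLabelQuery(String userInput) {
        String term = userInput.trim().replace("\"", "");
        return "labelText:\"" + term + "\"";
    }

    // Builds a search URL for the rewritten query.
    // ASSUMPTION: the search action path and parameter name below.
    static String toSearchUrl(String baseUrl, String userInput) {
        try {
            return baseUrl + "/dosearchsite.action?queryString="
                + URLEncoder.encode(toLabelQuery(userInput), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java runtime.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toLabelQuery("d560"));
        // http://wiki.example.com is a placeholder host.
        System.out.println(toSearchUrl("http://wiki.example.com", "d560"));
    }
}
```

In a real theme the same rewrite would more likely be done in JavaScript on form submit; the logic is the same either way.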

Not sure if Confluence uses Lucene, but in older versions of Lucene the default analyser ignores numbers, and that is why we cannot query numbers in the index. Newer versions have an alphanumeric analyser; if we set it as the default, it will work with numbers. Cheers.
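The difference between the two kinds of analyser can be sketched in plain Java without Lucene. This is a toy model under the assumption that the basic tokeniser keeps only runs of letters; it is not confirmed in this thread that Confluence's "other" setting behaves exactly like this, and the class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy tokenisers for illustration; not Lucene or Confluence code.
class TokenizerSketch {
    // Keeps runs of letters only; digits act as separators and are discarded.
    static List<String> lettersOnly(String text) {
        return tokenize(text, false);
    }

    // Keeps runs of letters and digits together as one token.
    static List<String> alphanumeric(String text) {
        return tokenize(text, true);
    }

    private static List<String> tokenize(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toLowerCase(Locale.ROOT).toCharArray()) {
            boolean partOfToken = Character.isLetter(c)
                || (keepDigits && Character.isDigit(c));
            if (partOfToken) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lettersOnly("d560"));       // [d]
        System.out.println(lettersOnly("560"));        // []
        System.out.println(alphanumeric("d560 d577")); // [d560, d577]
    }
}
```

With the letters-only variant, d560, d577 and d701 all index as the single term d, and a purely numeric query produces no terms at all, which would match the symptoms described in the question.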

Yes, Confluence 3.5 uses Lucene version 2.9.3. I suspect the problem lies in the search field configuration settings (all "interesting" fields get tokenized...)

Thanks for your reply. Does that mean that it would help to upgrade to a more recent version of Confluence (4.0/4.1)? Or is there something we can change in the configuration?

I doubt an update would change this behavior. The Lucene configuration will most probably not change, since it is appropriate for most cases. But searching for these letter/number combinations of yours does not seem to be a case Atlassian has thought about ;-)

Another option is to write a plugin that stores these alphanumeric identifiers as metadata and puts them into the search index (extractor plugin module: https://developer.atlassian.com/display/CONFDEV/Extractor+Module). There you would have full control over the search field configuration.

Perhaps the Metadata plugin (https://plugins.atlassian.com/plugin/details/5295?versionId=43798) is a quicker solution, but I am not quite sure about its abilities regarding the search index.

In both cases (own plugin or Metadata plugin) you would need additional markup in your search query as stated in my first answer. If you want to avoid that you would have to add another search box that is enriched via JavaScript or another plugin module.

Perhaps you could also do the query modification in your web browser. For example, in Firefox you can add a smart keyword (http://support.mozilla.org/en-US/kb/Smart%20keywords) that builds an enriched search URL. But such a solution depends on your IT infrastructure, its restrictions, and the number of users you want to provide this feature to...

Thanks again, Niels. Too bad I lack the skills to implement these changes, so I will look for help in my network. In the meantime I am hoping that there is (someone with) a simpler solution out there :-)

I just ran a test on the latest version of Confluence. Text searching works fine, but I get no results when I search for numbers or text/number combinations. Therefore an upgrade won't solve this issue. I think you could try adjusting the Lucene configuration; it won't hurt the functionality, just add numbers when indexing. I had the same issue when I implemented Lucene on my site: everything worked well except numbers and text/number combinations. I solved the problem using the solution I mentioned above. Lucene is such a powerful creature, it should really do better than this. Hope this helps, cheers.

Thanks Wallern, I will keep this solution in mind (see above :-))

What do you mean by "just add numbers when indexing"? I am only aware of the default behavior, which adds the whole content body (whether or not it contains numbers) to the index document with tokenization enabled:

document.add(new Field(FieldName.CONTENT_BODY, contentBody.toString(), store, Field.Index.TOKENIZED));

Is there any special extractor that cares for numbers? I think the tokenization somehow splits the letters and numbers apart...
