We are running Confluence version 3.5 (hosted locally/download) and are experiencing some problems with searching for a combination of letters and numbers.
For example: when searching for 'd560', all pages that contain items (terms) starting with d followed by a number are listed in the search results (e.g., d577, d701, d60). Searching for 'd AND 560': same results. Solely using 'd': same results. Solely using numbers gives no results at all.
Since we use Confluence for factory manuals, these queries (they are the numbers of specific machines) are quite important to us. Would anyone know what we can do / how this can be fixed?
I have spoken to Joanne about this, and I believe the problem is that their instance has the "index language" set to "other". Unfortunately, this causes Confluence to use a very basic tokeniser that produces (even) worse results. Selecting English has the downside of introducing English stop words (i.e., words that are removed at index time and stripped from the query). The stop words used for English are:
"a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
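To make the stop-word downside concrete, here is a minimal sketch of what "stripped from the query" means in practice. It is plain Java written for illustration, not Confluence's or Lucene's actual filter code; the class and method names are made up, and only the word list comes from the post above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: a hand-rolled stop-word filter, not Confluence's implementation.
class StopWordSketch {

    // The English stop words quoted in the post above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"));

    // Lowercase the query, split on whitespace, and drop every stop word.
    static List<String> stripStopWords(String query) {
        List<String> kept = new ArrayList<>();
        for (String word : query.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                kept.add(word);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Only "manual" survives; every other word is in the stop list.
        System.out.println(stripStopWords("this is not in the manual"));
    }
}
```

A side effect worth noting: a single-letter term such as "a", "s" or "t" would vanish entirely under this list, whereas "d" would be kept.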
I really hope that we can revisit our search/indexing code and fix many of the problems that I know are causing pain.
@Joanne Gerrits -- What did you change the index language to? I'm having a similar problem in JIRA and am hoping your solution might work for us too. I tried changing it to *Other* and rerunning the index, but that didn't work for us. Then I found the below, which has me worried that I might be out of luck, but I'm hoping you can share your insight for us all!
You can try surrounding d560 with double quotes: "d560". It may also help to restrict the search to the field you wish to search in. Examples:
You can find more information in the documentation:
Thanks for your suggestions, Niels. Unfortunately, using double quotes gives exactly the same results (i.e., all pages with d-number items are shown in the search results).
Adding labels with a d-number to the pages (and using labelText:"d560" with searches) does the trick, but I would prefer if people could just type d560 in the search box as the users of the wiki need very simple / straightforward (search) instructions.
(Additionally: narrowing down the results with 'contentbody' or 'title' still shows all titles with all the different d-numbers in the search results.)
Then you could add another search box to your theme (or directly in the Confluence administration) that prefixes the entered search term with labelText: on submit. Then this search box could be used to search for those numbers. The drawback would be that you would have to label all your documents accordingly...
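The query rewriting that such a search box would do on submit can be sketched roughly as follows. This is only an illustration in Java: the base URL is made up, and the `dosearchsite.action` path and `queryString` parameter are assumptions about the Confluence search URL that you should check against your own instance.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Illustrative only: turns a bare machine number into a labelText: search URL.
class LabelSearchUrlSketch {

    // Assumption: the search page lives at <baseUrl>/dosearchsite.action
    // and takes the query in a "queryString" parameter.
    static String buildSearchUrl(String baseUrl, String term) {
        // Drop any quotes the user typed, then wrap the term ourselves.
        String cleaned = term.trim().replace("\"", "");
        String query = "labelText:\"" + cleaned + "\"";
        return baseUrl + "/dosearchsite.action?queryString="
                + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical host name, used only for the demonstration.
        System.out.println(buildSearchUrl("http://wiki.example.com", "d560"));
    }
}
```

The same string concatenation could of course be done in a few lines of JavaScript inside the theme; the point is only that the user types d560 and the prefix is added for them.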
Another option is to write a plugin that stores these alphanumeric identifiers as metadata and puts them into the search index (extractor plugin module: https://developer.atlassian.com/display/CONFDEV/Extractor+Module). There you would have full control over the search field configuration.
Perhaps the Metadata plugin (https://plugins.atlassian.com/plugin/details/5295?versionId=43798) is a quicker solution, but I am not quite sure about its abilities regarding the search index.
Perhaps you could also do the query modification in your web browser. For example, in Firefox you can add a smart keyword (http://support.mozilla.org/en-US/kb/Smart%20keywords) that builds an enriched search URL. But such a solution depends on your IT infrastructure, its restrictions, and the number of users you want to provide this feature to...
I just ran a test in your latest version of Confluence. Text searching works fine, but I get no results when I search for numbers or for text-number combinations, so an upgrade won't solve this issue. I think you could try adjusting the Lucene configuration; it won't hurt the functionality, just add numbers when indexing. I had the same issue when I implemented Lucene on my site: everything worked well except numbers and text-number combinations. I solved the problem using the solution I mentioned above. Lucene is such a powerful creature, it should really do better than this. Hope this helps, cheers.
What do you mean by "just add numbers when indexing"? I am only aware of the default behavior, which adds the whole content body (whether or not it contains numbers) to the index document with tokenization enabled:
document.add(new Field(FieldName.CONTENT_BODY, contentBody.toString(), store, Field.Index.TOKENIZED));
Is there any special extractor that cares for numbers? I think the tokenization somehow splits the letters and numbers apart...
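To illustrate what I suspect is happening, here is a small standalone sketch of a tokenizer that keeps only runs of letters and treats everything else, digits included, as a separator. It is plain Java written for this post, not Confluence's actual analyzer, and whether the "other" index language really behaves like this is an assumption on my part. It would, however, produce exactly the symptoms described in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: mimics a tokenizer that keeps runs of letters and
// drops everything else. This is NOT Confluence's actual analyzer code.
class LetterOnlyTokenizerSketch {

    // Split text into lowercase runs of letters; digits and punctuation
    // act as separators and are discarded.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("d560")); // [d]
        System.out.println(tokenize("d577")); // [d]
        System.out.println(tokenize("560"));  // []
    }
}
```

With a tokenizer like this, d560, d577 and d701 all end up indexed as the single term "d", so a query for d560 matches every one of them, and a purely numeric query produces no terms at all and therefore no results.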