We are running Confluence version 3.5 (hosted locally/download) and are experiencing some problems with searching for a combination of letters and numbers.
For example: when searching for 'd560', all pages that contain items (terms) starting with d followed by a number are listed in the search results (e.g., d577, d701, d60). Searching for 'd AND 560': same results. Solely using 'd': same results. Solely using numbers gives no results at all.
Since we use Confluence for factory manuals, these queries (they are the numbers of specific machines) are quite important to us. Would anyone know what we can do or how this can be fixed?
Thanks, Joanne
I have spoken to Joanne about this and I believe the problem here is that their instance has the "index language" set to "other". Unfortunately this causes Confluence to use a very basic tokeniser that will produce (even) worse results. Selecting English has the downside of introducing English stop words (i.e. words that are removed at index time and stripped from the query). The stop words used for English are:
"a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in", "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t", "that", "the", "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"
I really hope that we can revisit our search/indexing code and fix the many problems that I know are causing pain.
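To make the stop-word effect concrete, here is a minimal sketch in plain Java. It does not use Confluence's or Lucene's real analyser classes; StopWordSketch and filter are hypothetical names, and the whitespace splitting is a simplification. It only shows that words from the list above disappear from both the indexed text and the query, while a term such as d560 is kept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: imitates stop-word removal, not Confluence's actual analyser.
final class StopWordSketch {
    // The English stop words quoted in the post above.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "s", "such", "t",
        "that", "the", "their", "then", "there", "these", "they", "this",
        "to", "was", "will", "with"));

    // Split on whitespace (a simplification), lower-case, and drop stop words.
    static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.isEmpty() && !STOP_WORDS.contains(lower)) {
                kept.add(lower);
            }
        }
        return kept;
    }
}
```

With this sketch, filter("the manual for d560") keeps only "manual" and "d560", so a page about "the d560" can no longer be found by searching for "the".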
Just a quick update: we are going to change the indexing language, and I will post the results here once I know more!
We have changed the indexing language, and I've now been able to perform quite a few queries and everything is working as it should! Searching for a combination of text and numbers now gives the correct results.
@Joanne Gerrits -- What did you change the index language to? I'm having a similar problem in JIRA and am hoping your solution might work for us too. I tried updating it to *Other* and rerunning the index, but that didn't work for us. Then I found the issue below, which has me worried that I might be out of luck, but I'm hoping you can share your insight for us all!
https://jira.atlassian.com/browse/JRASERVER-31882
Thanks!!
Another option is to write a plugin that stores these alphanumeric identifiers as metadata and puts them into the search index (extractor plugin module: https://developer.atlassian.com/display/CONFDEV/Extractor+Module). That would give you full control over the search field configuration.
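As a rough illustration of the first step such a plugin would need, the sketch below pulls machine identifiers out of a page body with a regular expression. It is plain Java and deliberately leaves out the Confluence extractor API; MachineIdSketch and extractIds are hypothetical names, and the identifier shape (one letter followed by digits, like d560 or d60) is an assumption based on the examples in the question.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: finds identifiers such as d560 in free text.
final class MachineIdSketch {
    // Assumed shape: a single letter followed by one or more digits,
    // standing alone as a word (so "abcd560" is not matched).
    private static final Pattern MACHINE_ID =
        Pattern.compile("\\b[A-Za-z]\\d+\\b");

    // Returns the distinct identifiers in order of first appearance, lower-cased.
    static Set<String> extractIds(String body) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher matcher = MACHINE_ID.matcher(body);
        while (matcher.find()) {
            ids.add(matcher.group().toLowerCase(Locale.ROOT));
        }
        return ids;
    }
}
```

An extractor could then add each identifier to its own search field without tokenising it, so that d560 is stored and matched as one whole term.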
Perhaps the Metadata plugin (https://plugins.atlassian.com/plugin/details/5295?versionId=43798) is a quicker solution, but I am not quite sure about its abilities regarding the search index.
In both cases (your own plugin or the Metadata plugin) you would need additional markup in your search query, as stated in my first answer. If you want to avoid that, you would have to add another search box that is enriched via JavaScript or another plugin module.
Perhaps you could also do the query modification in your web browser. For example, in Firefox you can add a smart keyword (http://support.mozilla.org/en-US/kb/Smart%20keywords) that builds an enriched search URL. But such a solution depends on your IT infrastructure, its restrictions and the number of users you want to provide this feature to...
Thanks again Niels. Too bad I lack the skills to implement these changes, so I will look in my network for someone who can. In the meantime I am hoping that there is (someone with) a simpler solution out there :-)
Not sure if Confluence uses Lucene, but in older versions of Lucene the default analyser ignores numbers, and that is why we cannot query numbers in the index. Newer versions have an alphanumeric analyser; if we set it as the default, it will work with numbers. Cheers.
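To show why a letters-only analyser would produce exactly the symptoms in the question, here is a small simulation in plain Java. It does not call Lucene; TokenizerSketch and its methods are hypothetical names. Lucene's LetterTokenizer does divide text at non-letters, but whether Confluence 3.5 uses it for the "other" index language is an assumption here.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: compares a letters-only tokeniser with an alphanumeric one.
final class TokenizerSketch {
    // Tokens are runs of letters; digits act as separators and are discarded.
    static List<String> letterTokens(String text) {
        return split(text, false);
    }

    // Tokens are runs of letters and digits, so d560 stays in one piece.
    static List<String> alnumTokens(String text) {
        return split(text, true);
    }

    private static List<String> split(String text, boolean keepDigits) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean partOfToken =
                Character.isLetter(c) || (keepDigits && Character.isDigit(c));
            if (partOfToken) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A separator ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

Under the letters-only rule, d560, d577 and d60 all become the single term "d", and "560" on its own yields no term at all. That would match searches for 'd560' returning every d-number page and number-only searches returning nothing.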
Yes, Confluence 3.5 uses Lucene 2.9.3. I suspect the problem lies in the search field configuration settings (all "interesting" fields get tokenized...)
Thanks for your reply. Does that mean that it would help to upgrade to a more recent version of Confluence (4.0/4.1)? Or is there something we can change in the configuration?
I doubt an update would change this behavior. The Lucene configuration will most probably not change, since it is appropriate for most cases. But searching for these letter/number combinations of yours does not seem to be a case Atlassian has thought about ;-)
You can try surrounding d560 with double quotes: "d560". It may also help to restrict the search to the field you wish to search in. Examples:
You can find more information in the documentation:
http://confluence.atlassian.com/display/CONF35/Confluence+Search+Fields
http://confluence.atlassian.com/display/CONF35/Confluence+Search+Syntax
Thanks for your suggestions Niels. Unfortunately, using double quotes gives the exact same results (i.e. all pages with d-number items are shown in the search results).
Adding labels with a d-number to the pages (and using labelText:"d560" with searches) does the trick, but I would prefer if people could just type d560 in the search box as the users of the wiki need very simple / straightforward (search) instructions.
(Additionally: narrowing down the results with 'contentbody' or 'title' still shows all titles with all the different d-numbers in the search results.)
Then you could add another search box to your theme (or directly in the Confluence administration) that prefixes the entered search term with labelText: on submit. Then this search box could be used to search for those numbers. The drawback would be that you would have to label all your documents accordingly...
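The prefixing step itself is small. The sketch below shows it in plain Java purely to illustrate the logic; in a theme it would more likely be a few lines of JavaScript on the form's submit handler. LabelSearchSketch, labelQuery and searchUrl are hypothetical names, and the /dosearchsite.action path with its queryString parameter is an assumption about the search URL that should be checked against your own instance.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Illustrative only: turns a bare term such as d560 into a labelText: search.
final class LabelSearchSketch {
    // Wrap the term as labelText:"term", dropping any quotes the user typed.
    static String labelQuery(String term) {
        String cleaned = term.trim().replace("\"", "");
        return "labelText:\"" + cleaned + "\"";
    }

    // Build a search URL. The action path and parameter name are assumptions.
    static String searchUrl(String baseUrl, String term) {
        try {
            return baseUrl + "/dosearchsite.action?queryString="
                + URLEncoder.encode(labelQuery(term), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }
}
```

For example, labelQuery("d560") produces labelText:"d560", which is the query that already works when typed by hand.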
I just ran a test on the latest version of Confluence. Text searching works fine, but I get no results when I search for numbers or text/number combinations. An upgrade therefore won't solve this issue. You could try adjusting the Lucene configuration so that numbers are included when indexing; it shouldn't hurt the functionality. I had the same issue when I implemented Lucene on my site: everything worked well except numbers and text/number combinations. I solved the problem using the solution I mentioned above. Lucene is such a powerful creature, it should really do better than this. Hope this helps, cheers.
Thanks Wallern, I will keep this solution in mind (see above :-))
What do you mean by "just add numbers when indexing"? I am only aware of the default behavior, which adds the whole content body (whether or not it contains numbers) to the index document with tokenization enabled:
document.add(new Field(FieldName.CONTENT_BODY, contentBody.toString(), store, Field.Index.TOKENIZED));
Is there any special extractor that takes care of numbers? I think the tokenization somehow splits the letters and numbers apart...