I am trying to add fields to the Lucene index via an implementation of the com.atlassian.confluence.plugins.index.api.Extractor2 interface (https://developer.atlassian.com/display/CONFDEV/How+to+fix+broken+Extractors).
As far as I understand that interface, I simply have to add FieldDescriptors as needed and they will all end up in the index. But it seems it is not that simple, and that I have misunderstood some basic concepts.
Here is what works: Analyzed Fields. I add them like this:
final FieldDescriptor analyzed = new FieldDescriptor(name, value, FieldDescriptor.Store.NO, Index.ANALYZED);
fieldDescriptors.add(analyzed);
For the examples below, assume I store a field named "Name" with the value "Men in Black".
After reindexing ("Content Indexing") via the UI I can launch a search like this:
Name:(Men in Black)
and get results matching "Men" and "Black", since "in" is a stopword (as far as I understand).
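To make the analyzed case concrete, here is a minimal pure-Java sketch of what an analyzer roughly does to the value. It assumes a hypothetical toy analyzer (lowercase, split on whitespace, drop a tiny stopword list); it is not Lucene's StandardAnalyzer or the analyzer Confluence actually uses, and `ToyAnalyzer`/`STOPWORDS` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical stand-in for an analyzer (NOT Lucene's real one):
 * lowercases the text, splits on whitespace and drops stopwords.
 */
public class ToyAnalyzer {
    // Tiny illustrative stopword list (assumption; real lists are longer).
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "in", "of", "the");

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            // Skip the empty string produced by blank input, and stopwords.
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Index time: the ANALYZED "Name" field would receive these terms.
        List<String> indexed = analyze("Men in Black");
        System.out.println(indexed); // [men, black]

        // Query time: Name:(Men in Black) is analyzed the same way, so each
        // remaining term can match the indexed terms independently.
        for (String queryTerm : analyze("Men in Black")) {
            System.out.println(queryTerm + " matches: " + indexed.contains(queryTerm));
        }
    }
}
```

Because index-time and query-time text go through the same processing, "in" disappears on both sides and the remaining terms line up.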
Next, to get an exact match on "Men in Black", I tried this:
final FieldDescriptor notAnalyzed = new FieldDescriptor(name + "Exact", value, FieldDescriptor.Store.NO, Index.NOT_ANALYZED);
fieldDescriptors.add(notAnalyzed);
and expected to find a match with
NameExact:(Men in Black) or
NameExact:"Men in Black" (preferred).
But I got no results.
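One way such a miss can happen is a mismatch between index-time and query-time processing. The sketch below assumes that the search side still runs an analyzer over the query text for the NOT_ANALYZED field. Lucene's classic QueryParser does this unless a per-field analyzer (for example KeywordAnalyzer via PerFieldAnalyzerWrapper) is configured; whether Confluence's search behaves this way is an assumption. The analyzer, `indexNotAnalyzed` and `matches` are hypothetical simplifications, not Lucene or Confluence APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ExactFieldMismatch {
    // Tiny illustrative stopword list (assumption).
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "in", "of", "the");

    // Hypothetical toy analyzer: lowercase, whitespace split, drop stopwords.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // NOT_ANALYZED: the whole value is indexed as one verbatim term.
    static Set<String> indexNotAnalyzed(String value) {
        return Set.of(value);
    }

    // Simplified OR matching: true if any query term is in the field's terms.
    // (A phrase query would be stricter, so it cannot match here either.)
    static boolean matches(Set<String> fieldTerms, List<String> queryTerms) {
        for (String term : queryTerms) {
            if (fieldTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> nameExact = indexNotAnalyzed("Men in Black");

        // If the query text is analyzed, it becomes [men, black]: no match.
        System.out.println(matches(nameExact, analyze("Men in Black"))); // false

        // Looking up the raw, unanalyzed string does hit the single term.
        System.out.println(matches(nameExact, List.of("Men in Black"))); // true
    }
}
```

In this model, the indexed term "Men in Black" can only be found if the query side hands it over verbatim, so changing how the value is indexed alone does not fix the search.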
What do I have to do to get exact matches working with Lucene? I assume using Index.NOT_ANALYZED is not enough?
BTW: I would expect that adding multiple field descriptors with the same name works. So I could drop the "Exact" suffix and it should still work (once it works at all ;-) )?
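Plain Lucene does allow several fields with the same name in one document (a multi-valued field); their terms are all indexed under that one name. The sketch below models that with a hypothetical toy document (`MultiValuedField`, `add`, `terms` are illustrative names, and the analyzer is a simplified stand-in). Whether Confluence's Extractor2 pipeline passes same-named FieldDescriptors through unchanged is an assumption, not something the documentation quoted here confirms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MultiValuedField {
    // Tiny illustrative stopword list (assumption).
    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "in", "of", "the");

    // Hypothetical toy analyzer: lowercase, whitespace split, drop stopwords.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Toy document: field name -> every term indexed under that name.
    private final Map<String, Set<String>> fields = new LinkedHashMap<>();

    // Adding a value under an existing name appends terms instead of replacing.
    void add(String name, String value, boolean analyzed) {
        Set<String> terms = fields.computeIfAbsent(name, k -> new LinkedHashSet<>());
        if (analyzed) {
            terms.addAll(analyze(value));
        } else {
            terms.add(value); // verbatim single term, like NOT_ANALYZED
        }
    }

    Set<String> terms(String name) {
        return fields.getOrDefault(name, Set.of());
    }

    public static void main(String[] args) {
        MultiValuedField doc = new MultiValuedField();
        doc.add("Name", "Men in Black", true);  // like Index.ANALYZED
        doc.add("Name", "Men in Black", false); // like Index.NOT_ANALYZED
        System.out.println(doc.terms("Name")); // [men, black, Men in Black]
    }
}
```

Dropping the suffix would therefore mix the analyzed tokens and the verbatim value in one field; that by itself does not make an exact match succeed, since the query still has to produce the verbatim term.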
It's not the analyzer here; it's the stemmer. I'm not a Confluence developer, so I have my JIRA hat on, but without changing the underlying tokenizer this isn't possible. I'd suggest voting for https://jira.atlassian.com/browse/CONF-14910, or at least watching it to see whether we fix it.