Exact matches with Lucene: Problems with not-analyzed Field Descriptors

Robert Reiner _smartics_
Rising Star
July 11, 2014

I am trying to add fields to the Lucene index through an implementation of the com.atlassian.confluence.plugins.index.api.Extractor2 interface (see https://developer.atlassian.com/display/CONFDEV/How+to+fix+broken+Extractors).

As I understand that interface, I simply add FieldDescriptors as needed and they all end up in the index. But it does not seem to be that simple, so I have probably misunderstood some basic concepts.

Here is what works: Analyzed Fields. I add them like this:

final FieldDescriptor analyzed =
    new FieldDescriptor(name, value,
        FieldDescriptor.Store.NO, Index.ANALYZED);
fieldDescriptors.add(analyzed);

For these examples, assume that I store a field named "Name" with the value "Men in Black".

After reindexing ("Content Indexing") via the UI, I can run a search like this:

Name:(Men in Black)

This returns results matching "Men" and "Black"; as far as I understand, "in" is dropped because it is a stopword.
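To make the analyzed case concrete, here is a minimal plain-Java sketch of roughly what an analyzing field does to "Men in Black". It is not Lucene's real analyzer: the whitespace splitting, the lower-casing and the tiny STOPWORDS list are assumptions chosen for illustration, and the class name AnalyzedFieldSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class AnalyzedFieldSketch {
    // Assumed, tiny stopword list for illustration only; real analyzers ship their own lists.
    private static final Set<String> STOPWORDS = Set.of("a", "an", "and", "in", "of", "the");

    // Rough model of an analyzed field: split on whitespace, lower-case, drop stopwords.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String raw : value.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Men in Black")); // [men, black]
    }
}
```

So the analyzed "Name" field never contains "in" or the original capitalization, only the separate terms "men" and "black".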

Next, to get an exact match on "Men in Black", I tried this:

final FieldDescriptor notAnalyzed =
    new FieldDescriptor(name + "Exact", value,
        FieldDescriptor.Store.NO, Index.NOT_ANALYZED);
fieldDescriptors.add(notAnalyzed);

and expected a match for either of these queries (I would prefer the second one):

NameExact:(Men in Black)

NameExact:"Men in Black"

But neither query returns anything.
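A sketch of one possible explanation for the empty result, assuming (this is not confirmed in the thread) that the query text is run through an analyzer like the one used for analyzed fields, even when the field was indexed with Index.NOT_ANALYZED. In that case the exact field holds the single term "Men in Black", while the query looks for "men" and "black", which never occur in that field. The class and method names are hypothetical, and phrase positions are ignored for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ExactFieldSketch {
    // Assumed stopword list, for illustration only.
    private static final Set<String> STOPWORDS = Set.of("a", "an", "and", "in", "of", "the");

    // Assumed query-side analysis: whitespace split, lower-case, drop stopwords.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // A NOT_ANALYZED field contributes exactly one term: the value as given, case included.
    static Set<String> indexNotAnalyzed(String value) {
        return Set.of(value);
    }

    // Analyzed phrase query: every query token must be an indexed term (positions ignored).
    static boolean phraseMatches(Set<String> indexedTerms, String phrase) {
        List<String> queryTokens = analyze(phrase);
        return !queryTokens.isEmpty() && indexedTerms.containsAll(queryTokens);
    }

    // Un-analyzed term query: the raw string is compared with the indexed term as-is.
    static boolean termMatches(Set<String> indexedTerms, String term) {
        return indexedTerms.contains(term);
    }

    public static void main(String[] args) {
        Set<String> exactField = indexNotAnalyzed("Men in Black");
        System.out.println(phraseMatches(exactField, "Men in Black")); // false: looks for "men" and "black"
        System.out.println(termMatches(exactField, "Men in Black"));   // true
    }
}
```

If that assumption holds, only a query that reaches the index without being analyzed could hit the verbatim term; whether Confluence's search syntax offers such a query is outside what this sketch can show.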

What do I have to do to get exact matches working with Lucene? I assume that using Index.NOT_ANALYZED alone is not enough?

BTW: I would expect adding multiple field descriptors with the same name to work, so I could drop the "Exact" suffix and it should still work (once it works at all ;-) )?
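On the BTW question: in plain Lucene, several fields added to one document under the same name are indexed into a single multi-valued field, so their terms are combined. The sketch below models a document as a hypothetical Map from field name to terms (not a Confluence or Lucene API) to show what "Name" would contain with both descriptors. Whether a query can actually hit the verbatim term still depends on how the query itself is analyzed.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SameNameFieldsSketch {
    // Assumed stopword list, for illustration only.
    private static final Set<String> STOPWORDS = Set.of("a", "an", "and", "in", "of", "the");

    // Rough model of an analyzed field: split on whitespace, lower-case, drop stopwords.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String raw : value.split("\\s+")) {
            String token = raw.toLowerCase(Locale.ROOT);
            if (!token.isEmpty() && !STOPWORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Adding a field whose name already exists appends to that field's terms.
    static void addField(Map<String, Set<String>> doc, String name, Collection<String> terms) {
        doc.computeIfAbsent(name, k -> new TreeSet<>()).addAll(terms);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> doc = new HashMap<>();
        addField(doc, "Name", analyze("Men in Black")); // like Index.ANALYZED
        addField(doc, "Name", List.of("Men in Black")); // like Index.NOT_ANALYZED
        System.out.println(doc.get("Name")); // [Men in Black, black, men]
    }
}
```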


Answer accepted
tier-0 grump
Atlassian Team
July 14, 2014

The problem here is not the analyzer, it's the stemmer. I'm not a Confluence developer, so I'm answering with my JIRA hat on, but this isn't possible without changing the underlying tokenizer. I'd suggest voting for https://jira.atlassian.com/browse/CONF-14910, or at least watching it to see whether we fix it.
