Wrong search results with Chinese labels

Felix Grund (Scandio)
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
July 26, 2015

Hi everybody,

One of our customers is experiencing a strange search issue in their Chinese department. I reduced the issue to the simplest case: one page has the label 其他产品 (other products) and another has the label 产品信息 (product information). When I do a label search like labelText:其他产品, both pages are found:

[Screenshot: screen-pocketsearch.png]

Does anyone have a clue why this happens?

Regards, Felix [Scandio]

2 answers

1 accepted

1 vote
Answer accepted
Stephen Deutsch
Rising Star
August 4, 2015

Hi Felix,

Looking at the documentation for the Lucene tokenizer that handles CJK characters, it seems to split the text into overlapping two-character tokens (bigrams):

https://lucene.apache.org/core/3_5_0/api/all/org/apache/lucene/analysis/cjk/CJKTokenizer.html

That would explain why both pages match: the two characters for "product" (产品) form a token that appears in both strings.
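To illustrate, here is a minimal Python sketch of overlapping two-character tokenization. It is not Lucene's actual code; the function name `cjk_bigrams` is made up for this example, and it only assumes the bigram behaviour described in the documentation linked above.

```python
def cjk_bigrams(text):
    """Split a string into overlapping two-character tokens.

    Hypothetical helper: a rough approximation of bigram tokenization,
    not the real Lucene CJKTokenizer.
    """
    if len(text) < 2:
        # A single character (or empty string) cannot form a bigram.
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


label_a = "其他产品"  # other products
label_b = "产品信息"  # product information

tokens_a = cjk_bigrams(label_a)
tokens_b = cjk_bigrams(label_b)

# Tokens that both labels have in common.
shared = set(tokens_a) & set(tokens_b)

print(tokens_a)
print(tokens_b)
print(shared)
```

Under this assumption both labels produce the token 产品, so a search that matches on any single token would return both pages.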

0 votes
Felix Grund (Scandio)
Rising Star
August 17, 2015

Hi Stephen,

Yes, we already solved this issue with Atlassian support. It actually works if you put the search string in double quotes, e.g. labelText:"其他产品". I'll accept your answer ;)
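The difference between the unquoted and the quoted search can be sketched with a toy model in Python. This is only an assumption about the behaviour, not Confluence's or Lucene's real query logic: it assumes the unquoted query matches when any bigram is shared, while the quoted query requires all bigrams to appear consecutively and in order. The function names are hypothetical.

```python
def bigrams(text):
    """Overlapping two-character tokens (toy approximation of CJK tokenization)."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def matches_any_token(query, label):
    """Assumed unquoted behaviour: a single shared bigram is enough."""
    return bool(set(bigrams(query)) & set(bigrams(label)))


def matches_phrase(query, label):
    """Assumed quoted behaviour: the query bigrams must appear in the label
    consecutively and in the same order."""
    q = bigrams(query)
    tokens = bigrams(label)
    if not q:
        # Queries shorter than two characters are out of scope for this sketch.
        return False
    return any(
        tokens[i:i + len(q)] == q
        for i in range(len(tokens) - len(q) + 1)
    )


query = "其他产品"
labels = ["其他产品", "产品信息"]

unquoted_hits = [label for label in labels if matches_any_token(query, label)]
quoted_hits = [label for label in labels if matches_phrase(query, label)]

print(unquoted_hits)
print(quoted_hits)
```

In this model the unquoted search returns both labels, while the quoted search returns only the exact label.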

Regards, Felix
