When a repository has a dot in the name, like `this-is-my.new-repository`, the search doesn't work when you search for the word after the dot.
The cause is how the name is indexed in Elasticsearch. Bitbucket uses a field that is not tokenized on dots, so the search matches only the full field value or one of the words separated by a hyphen or underscore, but not words separated by a dot. In Elasticsearch, for fields of type string, the word before the dot, the dot and the word after the dot are tokenized together as a single word. So for the repository name above, Elasticsearch produces these tokens: `this`, `is`, `my.new`, `repository`.
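The behaviour described above can be modelled with a toy tokenizer. This is only a sketch of the splitting rules as reported in this post (split on hyphens and underscores, keep dots), not Elasticsearch's actual analyzer; the function names `tokenize` and `term_matches` are made up for illustration.

```python
import re
from typing import List


def tokenize(name: str) -> List[str]:
    """Toy analyzer: lowercase, split on hyphens and underscores, keep dots."""
    return [token for token in re.split(r"[-_]+", name.lower()) if token]


def term_matches(name: str, term: str) -> bool:
    """Mimic an exact term lookup: the term must equal one whole token."""
    return term.lower() in tokenize(name)


repo = "this-is-my.new-repository"
print(tokenize(repo))                # ['this', 'is', 'my.new', 'repository']
print(term_matches(repo, "new"))     # False: "new" is not a token on its own
print(term_matches(repo, "my.new"))  # True: the dotted word is a single token
```

Under these assumptions a search for `new` finds nothing, while `my.new` or `repository` would match.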
Here you can see the query executed by Bitbucket to find projects and repositories by name.
Query executed to find the repository name in Elasticsearch
{
  "from": 0,
  "size": 25,
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "quickSearchProjectName": {
                        "value": "new",
                        "boost": 1.0
                      }
                    }
                  }
                ],
                "disable_coord": false,
                "adjust_pure_negative": true,
                "boost": 1.0
              }
            },
            "functions": [
              {
                "filter": {
                  "match_all": {
                    "boost": 1.0
                  }
                },
                "weight": 1.5,
                "field_value_factor": {
                  "field": "quickSearchProjectName.length",
                  "factor": 1.0,
                  "modifier": "reciprocal"
                }
              }
            ],
            "score_mode": "multiply",
            "boost_mode": "replace",
            "max_boost": 3.4028235E38,
            "boost": 1.0
          }
        },
        {
          "function_score": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "quickSearchRepositoryName": {
                        "value": "new",
                        "boost": 1.0
                      }
                    }
                  }
                ],
                "disable_coord": false,
                "adjust_pure_negative": true,
                "boost": 1.0
              }
            },
            "functions": [
              {
                "filter": {
                  "match_all": {
                    "boost": 1.0
                  }
                },
                "weight": 5.0,
                "field_value_factor": {
                  "field": "quickSearchRepositoryName.length",
                  "factor": 1.0,
                  "modifier": "reciprocal"
                }
              }
            ],
            "score_mode": "multiply",
            "boost_mode": "replace",
            "max_boost": 3.4028235E38,
            "boost": 1.0
          }
        }
      ],
      "disable_coord": false,
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "quickSearchProjectName.raw": {
        "order": "asc"
      }
    },
    {
      "quickSearchRepositoryName.raw": {
        "order": "asc"
      }
    }
  ]
}
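To retry the same search with other terms (for example `my.new` instead of `new`), the logged query can be rebuilt programmatically. The following is a minimal sketch: the helper names `name_clause` and `build_quick_search` are hypothetical, the field names and weights are copied from the query above, and the keys left out (`disable_coord`, `adjust_pure_negative`, `max_boost` and the outer `boost` values) are omitted on the assumption that they are Elasticsearch defaults.

```python
import json


def name_clause(field: str, term: str, weight: float) -> dict:
    """One function_score clause: exact term filter on `field`,
    scored by `weight` times the reciprocal of the field's length."""
    return {
        "function_score": {
            "query": {
                "bool": {
                    "filter": [{"term": {field: {"value": term, "boost": 1.0}}}]
                }
            },
            "functions": [
                {
                    "filter": {"match_all": {"boost": 1.0}},
                    "weight": weight,
                    "field_value_factor": {
                        "field": field + ".length",
                        "factor": 1.0,
                        "modifier": "reciprocal",
                    },
                }
            ],
            "score_mode": "multiply",
            "boost_mode": "replace",
        }
    }


def build_quick_search(term: str, size: int = 25) -> dict:
    """Rebuild the quick-search body for an arbitrary search term."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "should": [
                    name_clause("quickSearchProjectName", term, 1.5),
                    name_clause("quickSearchRepositoryName", term, 5.0),
                ]
            }
        },
        "sort": [
            {"_score": {"order": "desc"}},
            {"quickSearchProjectName.raw": {"order": "asc"}},
            {"quickSearchRepositoryName.raw": {"order": "asc"}},
        ],
    }


body = build_quick_search("my.new")
print(json.dumps(body, indent=2))
```

The printed JSON could then be sent to the `_search` endpoint of the index Bitbucket uses; the index name is not shown in the logs above, so it is left out here.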
We are using v5.8.1 in production, but this bug also happens in v5.11.1, the version I used to reproduce it and to extract the logs and the queries.
Have a nice week :),
Ariel - Vocento
Thanks for your detailed report (awesome job there).
With the Bitbucket Server 5.7 release, we introduced functionality to support "Punctuation aware code search". However, this focused mainly on searching within code, not on project or repository names. Here is the link to the request for more details:
BSERV-8782 - Punctuation aware code search
Could you share which version of Bitbucket Server you are using so that I can do some testing and confirm if we already have a request around this or raise one for you?
Cheers,
Caterina - Atlassian