Bug searching by repository name with a dot in the name

Ariel Ferrandini Price July 6, 2018

When a repository has a dot in the name, like `this-is-my.new-repository`, the search doesn't work when you search for the word after the dot.

  • [NOT OK] Search: new
  • [OK] Search: this
  • [OK] Search: is
  • [OK] Search: my
  • [OK] Search: this-is-my
  • [OK] Search: my.new
  • [OK] Search: this-is-my.new-repository

Looking into why this happens, the cause is how the name is indexed in Elasticsearch. Bitbucket uses a field that is not tokenized on dots, so the search works only for a full match of the field or for a match of the words separated by a hyphen or underscore, but not for words separated by a dot. In Elasticsearch, for fields of type string, the word before the dot, the dot, and the word after the dot are tokenized together as a single word. So for the repository name used above, Elasticsearch produces these tokens:

  • this
  • is
  • my.new
  • repository
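
The behaviour above can be modelled with a small Python sketch. This is not Bitbucket's or Elasticsearch's actual analyzer: the function names are made up for illustration, and the prefix indexing is an assumption I added because it reproduces all seven search results listed at the top (for example, that `my` and `this-is-my` match). The only part taken from the analysis is that names are split on hyphens and underscores but not on dots.

```python
import re


def index_terms(name: str) -> set:
    """Hypothetical model of the terms indexed for a repository name."""
    name = name.lower()
    # Split on hyphens, underscores and whitespace only; dots stay inside a token.
    tokens = [t for t in re.split(r"[-_\s]+", name) if t]
    terms = set()
    # Assumption: the full name and each token are indexed with all their prefixes.
    for piece in [name, *tokens]:
        for end in range(1, len(piece) + 1):
            terms.add(piece[:end])
    return terms


def quick_search_matches(name: str, query: str) -> bool:
    # A term filter is an exact lookup; the query text is not analyzed.
    return query.lower() in index_terms(name)


name = "this-is-my.new-repository"
for q in ["new", "this", "is", "my", "this-is-my", "my.new", name]:
    print(f"{q!r}: {quick_search_matches(name, q)}")
```

In this model `new` never becomes a term of its own, because it only appears in the middle of the `my.new` token, which is why it is the only search that fails.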

 

Here you can see the query executed by Bitbucket to find projects and repositories by name.

Query executed to find the repository name in Elasticsearch

{
  "from": 0,
  "size": 25,
  "query": {
    "bool": {
      "should": [
        {
          "function_score": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "quickSearchProjectName": {
                        "value": "new",
                        "boost": 1.0
                      }
                    }
                  }
                ],
                "disable_coord": false,
                "adjust_pure_negative": true,
                "boost": 1.0
              }
            },
            "functions": [
              {
                "filter": {
                  "match_all": {
                    "boost": 1.0
                  }
                },
                "weight": 1.5,
                "field_value_factor": {
                  "field": "quickSearchProjectName.length",
                  "factor": 1.0,
                  "modifier": "reciprocal"
                }
              }
            ],
            "score_mode": "multiply",
            "boost_mode": "replace",
            "max_boost": 3.4028235E38,
            "boost": 1.0
          }
        },
        {
          "function_score": {
            "query": {
              "bool": {
                "filter": [
                  {
                    "term": {
                      "quickSearchRepositoryName": {
                        "value": "new",
                        "boost": 1.0
                      }
                    }
                  }
                ],
                "disable_coord": false,
                "adjust_pure_negative": true,
                "boost": 1.0
              }
            },
            "functions": [
              {
                "filter": {
                  "match_all": {
                    "boost": 1.0
                  }
                },
                "weight": 5.0,
                "field_value_factor": {
                  "field": "quickSearchRepositoryName.length",
                  "factor": 1.0,
                  "modifier": "reciprocal"
                }
              }
            ],
            "score_mode": "multiply",
            "boost_mode": "replace",
            "max_boost": 3.4028235E38,
            "boost": 1.0
          }
        }
      ],
      "disable_coord": false,
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    },
    {
      "quickSearchProjectName.raw": {
        "order": "asc"
      }
    },
    {
      "quickSearchRepositoryName.raw": {
        "order": "asc"
      }
    }
  ]
}
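
To reproduce the problem in isolation, the query can probably be reduced to just the `term` filter on the repository name, since the `function_score` and `sort` parts only affect ordering. The Python sketch below builds such a request body. The helper name is hypothetical, and the index name and Elasticsearch URL are not known from the logs, so the body is only printed rather than sent.

```python
import json


def repository_term_query(search_text: str, size: int = 25) -> dict:
    """Build a reduced search body with only the repository-name term filter."""
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    # Field name taken from the logged query; a term filter
                    # looks the value up exactly, without analyzing it.
                    {"term": {"quickSearchRepositoryName": {"value": search_text}}}
                ]
            }
        },
    }


print(json.dumps(repository_term_query("new"), indent=2))
```

Swapping `"new"` for `"my.new"` in the same body should be enough to compare the failing and working cases.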

 

2 comments

Caterina Curti (Atlassian Team) July 6, 2018

Hi @Ariel Ferrandini Price,

Thanks for your detailed report (awesome job there).

 

With Bitbucket Server 5.7, we introduced functionality to support "Punctuation aware code search". However, that work focused mainly on searching within code, not on project or repository names. Here is the link to the request for more details:

BSERV-8782 - Punctuation aware code search

 

Could you share which version of Bitbucket Server you are using so that I can do some testing and confirm if we already have a request around this or raise one for you?

 

Cheers,

Caterina - Atlassian

Ariel Ferrandini Price July 8, 2018

Hi @Caterina Curti

We are using v5.8.1 in production, but this bug also happens in v5.11.1, the version I used to reproduce it and to extract the logs and the queries.

 

Have a nice week :),

Ariel - Vocento

Ariel Ferrandini Price July 31, 2018

Hi, any news about this?
