Code Search results missing hits

I've built a small app around the Code Search API that uses filters to search files with particular extensions for occurrences of a certain word.

However, some results that should have been returned are omitted: when I view the raw file I can see lines that should have matched. Other "hits" from the same file are returned, but not all of them.

I'm having the same experience with the web based code search as well.

As the repo is private I can't provide a public example, but I'm happy to provide one to any @Staff.

Answer accepted

Hi @Serdar Kilic

Thank you for reaching out to us.

With the CodeSearch tool you need to keep in mind a few things:

Search uses the main branch in a repo (usually the main branch will be master).
We index files smaller than 320 KB – you won't see search results from larger files.
Wildcard searches (e.g. qu?ck buil*) are not supported.
We strip the following characters from search terms: !"#$%&'()*+,/;:<=>?@[\]^`{|}~-
Regular expressions are not supported in queries.
Case is not preserved (but search operators must be in ALL CAPS).
Queries can have up to 9 expressions (i.e. combinations of terms and operators).
Queries can be up to 250 characters in length.
We make sure that you only see the code you have permission to view in search results.
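As a rough illustration of the limits listed above, here is a minimal Python sketch that checks a search term, a query, and a file size against them before a request is sent. The function names are hypothetical and not part of any Bitbucket API, and treating "320 KB" as 320 * 1024 bytes is an assumption.

```python
from typing import List

# Values copied from the list above. The helper names are illustrative only.
STRIPPED_CHARS = set("!\"#$%&'()*+,/;:<=>?@[\\]^`{|}~-")
MAX_QUERY_LENGTH = 250
MAX_INDEXED_FILE_BYTES = 320 * 1024  # assumption: "320 KB" means 320 * 1024 bytes


def term_warnings(term: str) -> List[str]:
    """Return warnings for a single search term (not a full query)."""
    warnings = []
    stripped = sorted(set(term) & STRIPPED_CHARS)
    if stripped:
        warnings.append("stripped characters: " + " ".join(stripped))
    if "?" in term or "*" in term:
        warnings.append("wildcards are not supported")
    return warnings


def query_fits(query: str) -> bool:
    """True if the whole query is within the documented length limit."""
    return len(query) <= MAX_QUERY_LENGTH


def file_is_indexed(size_bytes: int) -> bool:
    """True if a file of this size is small enough to be indexed."""
    return size_bytes < MAX_INDEXED_FILE_BYTES
```

For example, `term_warnings("qu?ck")` reports both the stripped `?` and the unsupported wildcard, while a plain alphanumeric term produces no warnings.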

You mentioned that the results returned by the API match those returned in the UI, which indicates that the behavior is consistent.

If the missing "hits" meet all of the criteria above and should have been returned, please reach out to us via a support ticket with the repository URL and an example search string, and we will be happy to take a look.

https://support.atlassian.com/contact/#/

We also have a few issues already reported against CodeSearch; you can view them here:

https://jira.atlassian.com/browse/BCLOUD-19842?jql=project%20%3D%20BCLOUD%20AND%20text%20~%20%22code%20search%22

Thank you.

Yana

Comment

Hello @Serdar Kilic,

In addition to what Yana referred to above I can add the following: you might've observed another limitation of our search engine.

For each file we return up to 3 chunks, and each chunk can be up to a certain size (simply speaking, a number of consecutive characters). So if your file has, say, 4 hits that are too far away from each other to share a chunk, Code Search will only return 3 of them.

This limitation is by design, and the main reason for it is performance. Making the search window larger has implications for the response time, and showing more than 3 chunks per file also didn't look great from the UI design perspective.
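To make the effect of this limit concrete, here is a toy Python model of it. This is not Bitbucket's actual algorithm: the real chunk size is not stated in this thread, so `chunk_size` is a hypothetical parameter, and the rule that a chunk starts at the first hit it contains is an assumption.

```python
from typing import List, Tuple


def visible_hits(
    hit_offsets: List[int], chunk_size: int, max_chunks: int = 3
) -> Tuple[List[int], List[int]]:
    """Split hit offsets into (returned, omitted) under a toy chunking model.

    A chunk starts at the first hit it contains and spans `chunk_size`
    characters. Once `max_chunks` chunks exist, hits outside them are omitted.
    """
    returned: List[int] = []
    omitted: List[int] = []
    chunk_starts: List[int] = []
    for offset in sorted(hit_offsets):
        if chunk_starts and offset < chunk_starts[-1] + chunk_size:
            returned.append(offset)      # falls inside the current chunk
        elif len(chunk_starts) < max_chunks:
            chunk_starts.append(offset)  # open a new chunk at this hit
            returned.append(offset)
        else:
            omitted.append(offset)       # chunk budget is used up
    return returned, omitted
```

With a hypothetical chunk size of 200 characters, four hits at offsets 0, 5000, 10000 and 15000 yield three returned hits and one omitted hit, whereas four hits at offsets 0, 50, 100 and 150 all fit into a single chunk and are all returned.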

Hope this helps. Let me know if you have any questions.

Cheers,
Daniil

Comment

Thank you @Yana and @Daniil Penkin for your in-depth responses.

The search criteria that Yana mentions don't seem to be the issue, as the results I'm expecting are in the same file as the hits that are returned, and I'm meeting all the listed conditions.

Daniil, what you mention seems to be my experience! Is it possible to have the API result return all hits and have the UI only display the first three?

In the meantime I'll probably re-architect to pull down the file on the first hit and parse locally.
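A minimal sketch of the local parsing step, assuming the file contents have already been downloaded into a string (how the file is fetched is left out here). The function name is hypothetical, and matching case-insensitively is a choice made to mirror the note above that case is not preserved.

```python
import re
from typing import List, Tuple


def find_all_hits(text: str, word: str) -> List[Tuple[int, int]]:
    """Return (line_number, column) for every occurrence of `word`, 1-based."""
    # re.escape keeps the word literal, so characters such as "." are not
    # treated as regular-expression syntax.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    hits: List[Tuple[int, int]] = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((line_number, match.start() + 1))
    return hits
```

Because this scans the whole file, it reports every occurrence regardless of how far apart the hits are.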

Comment

"Is it possible to have the API result return all hits and have the UI only display the first three?"

Not at the moment, but we have a task in our backlog to consider rethinking and maybe changing this behaviour in some way.

Another concern we had was that without such a limit the API response might blow up to around 10MB: take the size limit for indexed files, add the overhead of line and hit highlighting, and multiply by up to 10 results per page. Obviously, this is rather an edge case, but it's not too hard to actually trigger it, for instance by searching for some language keywords (assuming the source files are big enough).
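The arithmetic behind that estimate can be sketched as follows. The 320 KB file limit and the 10 results per page come from this thread; the overhead factor is a hypothetical placeholder, since the real cost of line and hit highlighting is not stated here.

```python
MAX_INDEXED_FILE_KB = 320  # indexed-file size limit from the accepted answer
RESULTS_PER_PAGE = 10      # maximum results per page from this comment


def worst_case_response_mb(overhead_factor: float) -> float:
    """Estimate the worst-case page size in MB if every result returned a whole file.

    `overhead_factor` is a hypothetical multiplier for highlighting and
    line metadata; 1.0 means no overhead at all.
    """
    raw_kb = MAX_INDEXED_FILE_KB * RESULTS_PER_PAGE
    return raw_kb * overhead_factor / 1024
```

With no overhead the raw file content alone is about 3.1 MB per page, so an overhead factor of roughly 3 would be needed to reach the figure of around 10MB mentioned above.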

Comment

These limitations are easily upsetting when search fails to work on the most basic level.
The codebase I work in has many files over 10K lines, and searching the repo for the full name of a function consisting of all alpha-characters, no underscores, (expecting one result) returns nothing.

It's kind of a "WTF?!" moment.

These indexing/search constraints make Bitbucket searches completely unreliable (for repos of any moderate size, or larger), which forces me to be skeptical of ANY result, which essentially makes the feature unusable.

I feel that sacrificing some of the intelligence of Elasticsearch (which oftentimes is overkill) for the ability to reliably find basic results is something to strongly consider when fixing your current search paradigm.

Comment

@Daniil Penkin, could you please address the above comment and, related to it, this open ticket? I'm upvoting it as much as I can:

Ability to raise the size limitation for code search
