I am getting duplicate repo list while working with get repolist API

Deepak Govindram Kumbhar December 2, 2019

Hi,

 

I was working with the get-repolist API when it suddenly started returning duplicate repositories. Here is the scenario: I have 100+ repositories in my account, and when I call the API it returns 100 repositories, since I set pagelen to 100. When I then follow the next URL from that response, it returns repositories that were already returned on the first page.

Can someone please advise? I am using this URL: https://bitbucket.org/!api/2.0/repositories?sort=-updated_on&access_token=<TOKEN>&role=admin&pagelen=100

1 answer

Daniil Penkin
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
December 2, 2019

Hello @Deepak Govindram Kumbhar,

Thanks for reaching out.

You're sorting the repositories by the updated_on property, and I believe what you observed was caused by some repositories being updated, which moved them to the top of the list you're fetching (to its first page). That pushed all other repositories down the list, so some of them drifted to the next pages and re-appeared in your results.

In fact, with naive pagination you can get duplicates or misses no matter which property you sort on. For instance, imagine that you sort by an immutable value like the repository UUID. If a new repository is created while you're traversing pages, and its UUID happens to place it on one of the pages you've already fetched, it will "push" all repositories with a larger UUID forward, and you'll get a duplicate on the next page you fetch. A miss happens if a repo on an already-fetched page is deleted: everything after it shifts back by one position, so the first item of the next page slips onto a page you won't fetch again.
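To make the drift concrete, here is a small in-memory simulation. It doesn't call Bitbucket at all: `fetch_page` is a made-up helper that sorts a list and slices out a page, and the single-letter names stand in for UUIDs.

```python
def fetch_page(repos, page, pagelen):
    """Offset-style pagination: sort, then slice out the requested page (1-based)."""
    ordered = sorted(repos)
    start = (page - 1) * pagelen
    return ordered[start:start + pagelen]

# Duplicate: a new repo lands on a page that was already fetched.
repos = ["b", "c", "d", "e"]
first = fetch_page(repos, 1, 2)    # b, c
repos.append("a")                  # sorts before everything fetched so far
second = fetch_page(repos, 2, 2)   # c, d, so "c" shows up a second time
duplicates = set(first) & set(second)

# Miss: a repo on an already-fetched page is deleted.
repos = ["b", "c", "d", "e"]
seen = fetch_page(repos, 1, 2)     # b, c
repos.remove("b")
seen += fetch_page(repos, 2, 2)    # only e; "d" moved to page 1 and is never seen
missed = set(repos) - set(seen)
```

Here `duplicates` ends up containing "c" and `missed` ends up containing "d", even though the sort key itself never changed.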

Unfortunately, there's no cursor API available for this endpoint. However, you can still make page traversal somewhat consistent, i.e. avoid duplicates and misses. Instead of pagination, use a BBQL query like this:

uuid > "{uuid_of_the_last_item_on_the_last_fetched_page}"

So as an example, here's how I fetch the first page (I also limited the payload to just UUIDs and names using the fields parameter):

https://api.bitbucket.org/2.0/repositories/atlassian?fields=values.uuid,values.name&sort=uuid

Now, the last item on that page has UUID {046f666c-d011-41f5-b70a-7480aa02798e}, so to fetch the next page I add a query like the one shown above (it's URL-encoded, so it's not easy to read):

https://api.bitbucket.org/2.0/repositories/atlassian?fields=values.uuid,values.name&sort=uuid&q=uuid%20%3E%20%22%7B046f666c-d011-41f5-b70a-7480aa02798e%7D%22
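For anyone scripting this, the two URLs above could be produced roughly as follows. This is only a sketch: `next_page_url` and `iter_repositories` are names made up here, and `fetch_json` is a placeholder for whatever authenticated HTTP call you already have (it is assumed to take a URL and return the decoded JSON body with a `values` list). It also assumes the `q` filter compares UUIDs in the same order that `sort=uuid` uses.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.bitbucket.org/2.0/repositories/atlassian"

def next_page_url(last_uuid=None):
    """Build the URL for the first page, or for the page after last_uuid."""
    params = {"fields": "values.uuid,values.name", "sort": "uuid"}
    if last_uuid is not None:
        # BBQL filter: only repositories that sort after the last one seen.
        params["q"] = f'uuid > "{last_uuid}"'
    # quote (not quote_plus) encodes spaces as %20; commas are left readable.
    return BASE_URL + "?" + urlencode(params, safe=",", quote_via=quote)

def iter_repositories(fetch_json):
    """Yield repositories page by page; fetch_json is a caller-supplied function."""
    last_uuid = None
    while True:
        values = fetch_json(next_page_url(last_uuid)).get("values", [])
        if not values:
            return
        yield from values
        last_uuid = values[-1]["uuid"]
```

Calling `next_page_url("{046f666c-d011-41f5-b70a-7480aa02798e}")` should give back the second URL shown above. The loop stops on the first empty page, so it never relies on the next link from the response.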

Hope this helps. Let me know if you have any questions.

Cheers,
Daniil 

Deepak Govindram Kumbhar December 2, 2019

Hi Daniil,

Thanks for your reply, but in this case I will lose my updated_on sorting. Any suggestions for this?

Daniil Penkin
Atlassian Team
December 2, 2019

Not really. Pagination for this particular endpoint doesn't work like a cursor. When changes happen, you see them immediately, which means that while traversing you might miss repositories that have just been updated (because they jumped from a future page to the first page) and get duplicates (because those jumped repos shifted all other repositories).

"Any suggestions for this?"

Well, this depends on what you're going to use the fetched data for. For instance, if the goal is to index all repositories, I'd fetch them the way I described above and then sort by updated date on the client side. What's your use case?
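A rough sketch of that client-side step, assuming each fetched item carries `uuid` and `updated_on` fields and that `updated_on` is an ISO 8601 timestamp (check the actual payload first). `dedupe_and_sort` is a made-up helper name and the sample records are invented for illustration.

```python
from datetime import datetime

def dedupe_and_sort(repos):
    """Drop repeated UUIDs, then order by updated_on, newest first."""
    unique = {repo["uuid"]: repo for repo in repos}
    return sorted(
        unique.values(),
        key=lambda repo: datetime.fromisoformat(repo["updated_on"]),
        reverse=True,
    )

# Invented sample data; "{a}" appears twice, as it would after page drift.
fetched = [
    {"uuid": "{a}", "updated_on": "2019-11-30T08:00:00+00:00"},
    {"uuid": "{b}", "updated_on": "2019-12-02T09:30:00+00:00"},
    {"uuid": "{a}", "updated_on": "2019-11-30T08:00:00+00:00"},
    {"uuid": "{c}", "updated_on": "2019-12-01T12:00:00+00:00"},
]
ordered = [repo["uuid"] for repo in dedupe_and_sort(fetched)]
```

Keying on the UUID also cleans up any duplicates that slipped in during traversal, so the result is the same sorted view you wanted from the API, just computed locally.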

Deepak Govindram Kumbhar December 2, 2019

Even though there are no new commits while we fetch the repository list, we still get duplicates on the next page.
