How to list all repositories of a team through Bitbucket REST API 2.0

Hello everyone,

I am trying to get a list of all repositories of a team using the Bitbucket REST API 2.0:

https://api.bitbucket.org/2.0/repositories/TEAM_NAME?page=PAGE_NUMBER

Then I build a list of all repositories found across all PAGE_NUMBERs, each of which holds 10 repositories.

But after doing that I noticed that I had only about half of the total: out of almost 200 repositories, I got only 100 across all pages, 10 repositories per page.

My question is: how can I get all of those repositories, and not just half?


Hello @apollopt,

What value does the size element have in the JSON you get for a page? There's no limit on the number of pages you can get from this endpoint. Could it be an issue with authentication, for example that you're seeing the public repositories only?

You can increase the page size using the pagelen query parameter – you can set it to 300 to get all of your repositories in one page.

Hope this helps.

Cheers,
Daniil

Hello @Daniil Penkin,

Thanks for the reply.

For size I got 203, and for pagelen 10.

I have double-checked whether the repositories are public or private: all of them are private and belong to that team. I verified this through the Bitbucket web service.

If I try to request a page beyond 10, for example page 11, I receive this message:

{"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/gfetecteam?page=10", "values": [], "page": 11, "size": 1}

I really don't know...

Also, how do I write that query so that all the info comes back on one page?

What you get back is a list of 1 repository (note size is 1 in the response), but you requested page 11 which doesn't make sense (Bitbucket should probably return 400 instead, we can improve here).

The reason for this is that you're making an unauthenticated call, and the API returns only public repositories – there's exactly one in that team, which I and anyone else, including anonymous users, can see (you can try it out by navigating to your team page in an incognito browser window).

To get all repositories, including private ones, you need to authenticate with api.bitbucket.org using any of the following: your regular credentials, username and app password, or an OAuth token.
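For illustration, here is a minimal sketch of what sending a username and app password can look like at the HTTP level, assuming HTTP Basic authentication. The helper name build_authenticated_request and the placeholder credentials are made up for this example, and the snippet only builds the request object without sending anything over the network.

```python
import base64
import urllib.request


def build_authenticated_request(url, username, app_password):
    """Return a urllib Request carrying an HTTP Basic Authorization header.

    Hypothetical helper for illustration; assumes Basic authentication.
    """
    credentials = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values; the request is only constructed, never sent.
request = build_authenticated_request(
    "https://api.bitbucket.org/2.0/repositories/TEAM_NAME",
    "my-user",
    "my-app-password",
)
print(request.get_header("Authorization"))
```

Passing credentials in a header like this, rather than embedding them in the URL, keeps them out of any place where the URL itself might be printed or logged.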

The query to fetch repositories and increase page size will look like this:

https://api.bitbucket.org/2.0/repositories/<team>?pagelen=300

Let me know if you have any questions.

Cheers,
Daniil

I am using my username and password for those requests. This is the entry that I am using in the Python script to get, page by page, all available pages (from 1 to 10), with authentication:

print(requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?page=%d' % (username, password, team, i)).json())

With pagelen at 300 as you said:

print (requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?pagelen=300' %(username,password,team)).json())

I got this message:

{'type': 'error', 'error': {'message': 'Invalid pagelen'}}

> This is the entry that I am using in the Python script to get, page by page, all available pages (from 1 to 10), with authentication

This should work. In your previous message you had "size": 1, which is the total number of items on all pages, and it didn't match your expectation of around 200 repositories.

> page by page, all available pages (from 1 to 10)

So does your script check the next parameter in the response to keep the fetching loop running? What I mean is, where does that number 10 come from? Do you get anything back if you request page 11 (with the default page size)?

> With pagelen at 300 as you said, I got this message

I'm sorry about this one. I was under the impression that the upper limit for page size was 1000, but it's 100. So you can request with pagelen=100; higher values will result in the error you got.
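As a small sketch of that limit, the URL can be built by a helper that rejects page sizes above 100 before any request is made. The function name repositories_url and the constant MAX_PAGELEN are invented for this example; the value 100 is simply the limit mentioned in this thread.

```python
from urllib.parse import quote, urlencode

# Upper limit for pagelen as stated in this thread (an assumption of this sketch).
MAX_PAGELEN = 100


def repositories_url(team, pagelen=MAX_PAGELEN):
    """Build the repositories URL for a team, validating pagelen locally."""
    if not 1 <= pagelen <= MAX_PAGELEN:
        raise ValueError("pagelen must be between 1 and %d" % MAX_PAGELEN)
    query = urlencode({"pagelen": pagelen})
    return "https://api.bitbucket.org/2.0/repositories/%s?%s" % (quote(team, safe=""), query)


print(repositories_url("TEAM_NAME"))
```

Validating locally is optional, since the API already answers with "Invalid pagelen", but it fails earlier and with a clearer message.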

Update: here's a guide for pagination in Bitbucket btw. Note that some resources use cursor pagination, i.e. there's no page number as such. These endpoints have this note in their documentation.

Cheers,
Daniil

 

> So does your script check the next parameter in the response to keep the fetching loop running? What I mean is, where does that number 10 come from? Do you get anything back if you request page 11 (with the default page size)?

My script goes from page to page and stores all slugs in one list. It stops when no more slugs are present, which in this case happens at page 11. I have also tried in the Python console to manually retrieve pages above 10, but with no success: I cannot get more pages through the API. It gives me:

{"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/<TEAM>?page=10", "values": [], "page": 11, "size": 1}

By "from 1 to 10" I mean the page number, without setting any size or pagelen. I use the default output when querying by page only.

> I'm sorry about this one. I was under the impression that the upper limit for page size was 1000, but it's 100. So you can request with pagelen=100; higher values will result in the error you got.

No problem, I noticed. I am following the documentation.

> I have also tried in the Python console to manually retrieve pages above 10, but with no success: I cannot get more pages through the API. It gives me:
>
> {"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/<TEAM>?page=10", "values": [], "page": 11, "size": 1}

Well, in the response you got, size is equal to 1, which suggests that you only fetched public repositories (and it explains why the 11th page is empty), which in turn suggests that the call was unauthenticated.

> without setting any size

Just a note: size is what Bitbucket returns in the response payload; it means the total number of items on all pages. So there's no point in passing it in the request, as it will be ignored.

All in all, I've run out of ideas. Do you mind sharing your Python script so that I can take a look at why it isn't working as expected?

Cheers,
Daniil

Hello Daniil, sorry for the delayed answer.

Here is one part of it:

##Login
username = 'xxx'
password = 'xxx'
team = 'xxx'

## Get a list of all repos from bitbucket team

#Get number of pages
r_pages = requests.get('https://%s:%s@api.bitbucket.org/2.0/teams/%s/repositories' %(username,password,team))

json_r_pages = r_pages.json()
print (json_r_pages['pagelen'])
r_pagelen = json_r_pages['pagelen']
full_repo_list = [] # string array initialized

#Get all repositories
for i in range(1,r_pagelen+1):
    print ('page: %d\n' %i)
    # get json of each page till there are no more new pages
    try:
        r_pages=requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?page=%d' %(username,password,team,i))
        json_r_pages = r_pages.json()
    except:
        break

    # search in each page for all slugs (aka repository names) till no more slugs can be found
    j=0 # slug control for each page
    while (1):
        try:
            print ("%s" %json_r_pages['values'][j]['slug'])
            full_repo_list.append(json_r_pages['values'][j]['slug'])
            j=j+1
        except:
            break

Hi @apollopt,

Thanks for posting the script, now things are clear.

The reason it only fetches 10 pages is the very first request, where you're getting the number of pages (line 15). You read the pagelen attribute; however, it isn't the number of pages but rather the maximum number of items on each page (page length). With the default page length of 10, your loop therefore stops after page 10.

The overall number of pages isn't returned anywhere in the payload, but you can calculate it using pagelen and size (total number of items on all pages). In fact, you don't even need to know the number of pages. Here are the improvements I'd make to your script:

  1. As I said, you don't need to get the number of pages upfront, you can use the navigation values returned in the response.
  2. You can reduce the number of fields in the response JSON which will make the requests a bit faster, and the script memory footprint smaller (it won't need to parse all the things you'll never use).
  3. You can increase the number of items on each page to reduce the number of requests and avoid potential API rate limiting. Maximum number is 100, as we discussed earlier.

Also, just an observation: you're using different endpoints for initial request and for fetching pages. Not a big deal since, as I mentioned, you don't need the first call at all.
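As a quick worked example of the page arithmetic mentioned above, the number of pages is size divided by pagelen, rounded up. The function name page_count is made up for this sketch; the inputs 203 and 10 are the size and pagelen values reported earlier in this thread.

```python
import math


def page_count(size, pagelen):
    """Number of pages needed to hold `size` items at `pagelen` items per page."""
    if pagelen <= 0:
        raise ValueError("pagelen must be positive")
    return math.ceil(size / pagelen)


print(page_count(203, 10))   # 21 pages at the default page length
print(page_count(203, 100))  # 3 pages at the maximum page length
```

This is why looping over pages 1 to 10 returned only about half of the roughly 200 repositories.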

So here's my version:

import requests

##Login
username = 'xxx'
password = 'xxx'
team = 'xxx'

full_repo_list = []

# Request 100 repositories per page (and only their slugs), and the next page URL
next_page_url = 'https://api.bitbucket.org/2.0/repositories/%s?pagelen=100&fields=next,values.slug' % team

# Keep fetching pages while there's a page to fetch
while next_page_url is not None:
    response = requests.get(next_page_url, auth=(username, password))
    page_json = response.json()

    # Parse repositories from the JSON
    for repo in page_json['values']:
        full_repo_list.append(repo['slug'])

    # Get the next page URL, if present
    # It will include same query parameters, so no need to append them again
    next_page_url = page_json.get('next', None)

# Result length will be equal to `size` returned on any page
print ("Result:", len(full_repo_list))
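The same "follow next until it is absent" loop can also be sketched as a generator that takes the page-fetching function as an argument, which makes the logic easy to try out without network access. The names iter_slugs, fetch_json and FAKE_PAGES are invented for this illustration, and the fake pages only mimic the values/next shape seen in the responses above.

```python
def iter_slugs(first_url, fetch_json):
    """Yield repository slugs, following the `next` link of each page.

    `fetch_json` is any callable that maps a URL to the decoded JSON page.
    """
    url = first_url
    while url is not None:
        page = fetch_json(url)
        for repo in page.get("values", []):
            yield repo["slug"]
        # The last page has no `next` key, which ends the loop.
        url = page.get("next")


# Fake two-page response set standing in for the real API (assumption).
FAKE_PAGES = {
    "page-1": {"values": [{"slug": "alpha"}, {"slug": "beta"}], "next": "page-2"},
    "page-2": {"values": [{"slug": "gamma"}]},
}

print(list(iter_slugs("page-1", FAKE_PAGES.__getitem__)))
```

With the real API, fetch_json could be something like lambda url: requests.get(url, auth=(username, password)).json(), matching the request made in the script above.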

Let me know if this helped.

Cheers,
Daniil


It worked like a charm! =D

I didn't really know how to set up the HTTP GET request in order to get a sanitized JSON response.

I had tried so many times in the Python console and in that script that the pagelen got left there, forgotten. Hehe.

Also, your version is more straightforward than mine. I am not a professional programmer, I am a system administrator; writing resource-efficient code is not my best skill, lol.

Thanks a lot for the help!


No worries, I'm happy to help :)
