How to list all repositories of a team through Bitbucket REST API 2.0

FLoureiro July 30, 2019

Hello everyone,

I am trying to get a list of all repositories of a team using the Bitbucket REST API 2.0:

https://api.bitbucket.org/2.0/repositories/TEAM_NAME?page=PAGE_NUMBER

I then build a list of the repositories found on every PAGE_NUMBER; each page holds 10 repositories.

After doing that, I noticed that I had only about half of the total: of almost 200 repositories, I got only 100 across all pages, 10 repositories per page.

My question is: how can I get all of those repositories, and not just half?


Daniil Penkin
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
July 30, 2019

Hello @apollopt,

What is the value of the size element in the JSON you get for a page? There's no limit on the number of pages you can get from this endpoint. Could it be an issue with authentication, for example that you're looking at the public repositories only?

You can increase the page size using the pagelen query parameter; you can set it to 300 to get all of your repositories in one page.

Hope this helps.

Cheers,
Daniil

FLoureiro July 30, 2019

Hello @Daniil Penkin ,

Thanks for the reply.

For size I got 203, and for pagelen 10.

I have double-checked whether the repositories are public or private: all of them are private and belong to that team. I verified this through the Bitbucket web service.

If I try to request a page number above 10, for example page 11, I receive this response:

{"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/gfetecteam?page=10", "values": [], "page": 11, "size": 1}

I really don't know...

Also, how do I set up that query to put all the information on one page?

Daniil Penkin
Atlassian Team
July 30, 2019

What you get back is a list of 1 repository (note size is 1 in the response), but you requested page 11 which doesn't make sense (Bitbucket should probably return 400 instead, we can improve here).

The reason for this is that you're making an unauthenticated call, and the API returns only public repositories. There's exactly one in that team, which I and anyone else, including anonymous users, can see (you can try it out by navigating to your team page in an incognito browser window).

To get all repositories, including private ones, you need to authenticate with api.bitbucket.org using any of the following: your regular credentials, username and app password, or an OAuth token.
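As a minimal sketch of the "username and app password" option: HTTP Basic authentication boils down to a single Authorization header. The helper name basic_auth_header and the sample credentials are made up for illustration; with the requests library, passing auth=(username, app_password) to requests.get should produce an equivalent header.

```python
import base64


def basic_auth_header(username, app_password):
    """Build the HTTP Basic Authorization header for the given credentials."""
    raw = f"{username}:{app_password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": "Basic " + token}


# Hypothetical credentials, for illustration only
headers = basic_auth_header("user", "pass")
```

An unauthenticated request simply lacks this header, which would explain why only public repositories come back.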

The query to fetch repositories and increase page size will look like this:

https://api.bitbucket.org/2.0/repositories/<team>?pagelen=300

Let me know if you have any questions.

Cheers,
Daniil

FLoureiro July 31, 2019

I am using my username and password for those requests. This is the entry that I am using in the Python script to get all available pages one by one (from 1 to 10), with authentication:

print(requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?page=%d' % (username, password, team, i)).json())

With pagelen set to 300, as you suggested:

print (requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?pagelen=300' %(username,password,team)).json())

I got this message:

{'type': 'error', 'error': {'message': 'Invalid pagelen'}}
Daniil Penkin
Atlassian Team
July 31, 2019

> This is the entrie that I am using in the python script to get page by page of all available(from 1 to 10), with an authentication

This should work. In your previous message you had "size": 1, which is the total number of items on all pages, and it didn't match your expectation of around 200 repositories.

> page by page of all available(from 1 to 10)

So does your script check the next parameter in the response to keep the fetching loop running? What I mean is, where does that number 10 come from? Do you get anything back if you request page 11 (with the default page size)?

> Through pagelen at 300 as you said I got this message

I'm sorry about this one: I was under the impression that the upper limit for page size is 1000, but it's 100. So you can request with pagelen=100; higher values will result in the error you got.
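A small sketch of building the request URL while respecting that limit. The function name repositories_url, the constant MAX_PAGELEN, and the workspace name "abc" are illustrative and not part of any Bitbucket client library; the value 100 is the limit quoted above.

```python
from urllib.parse import quote, urlencode

API_ROOT = "https://api.bitbucket.org/2.0/repositories"
MAX_PAGELEN = 100  # upper limit for this endpoint, as stated in the thread


def repositories_url(workspace, pagelen=MAX_PAGELEN, fields=None):
    """Return the URL of the first page of repositories for a team/workspace."""
    if not 1 <= pagelen <= MAX_PAGELEN:
        raise ValueError("pagelen must be between 1 and %d" % MAX_PAGELEN)
    params = {"pagelen": pagelen}
    if fields:
        params["fields"] = fields
    # Keep commas readable in the fields list instead of encoding them as %2C
    query = urlencode(params, safe=",")
    return "%s/%s?%s" % (API_ROOT, quote(workspace, safe=""), query)


# "abc" is a placeholder workspace name
url = repositories_url("abc", fields="next,values.slug")
```

Rejecting an out-of-range pagelen locally avoids a round trip that would only return the "Invalid pagelen" error.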

Update: here's a guide for pagination in Bitbucket btw. Note that some resources use cursor pagination, i.e. there's no page number as such. These endpoints have this note in their documentation.

Cheers,
Daniil

FLoureiro July 31, 2019

 

> So does your script check the next parameter in the response to keep the fetching loop running? What I mean is where that number 10 comes from? Do you get anything back if you request page 11 (with default page size)?

My script goes from page to page and stores all slugs in one list; it stops when no more slugs are present, which in this case happens at page 11. I have also tried to retrieve pages above 10 manually in the Python console, but without success. I cannot get more pages through the API; it gives me:

{"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/<TEAM>?page=10", "values": [], "page": 11, "size": 1}

By "from 1 to 10" I mean the page number, without setting size or pagelen. I use the default output when querying only by page.

> I'm sorry about this one, I was under impression the upper limit for page size is 1000 but it's 100. So you can request with pagelen=100, higher values will result in the error you got.

No problem, I noticed. I am following the documentation.

Daniil Penkin
Atlassian Team
July 31, 2019

> I have also tried in python console to retrieve manually values above 10, but no success, cannot get more pages through the API, it gives me:
>
> {"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/<TEAM>?page=10", "values": [], "page": 11, "size": 1}

Well, in the response you got size is equal to 1 which suggests that you only fetched public repositories (and it explains why 11th page is empty), which in turn suggests that the call was unauthenticated.

> without setting any size

Just a note: size is what Bitbucket returns in the response payload; it means the total number of items on all pages. So there's no point in passing it in the request, as it will be ignored.
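Because size is the total number of items and pagelen is the maximum per page, the page count follows by ceiling division. A short sketch using the numbers from this thread; page_count is an illustrative helper name, not a field of the API:

```python
import math


def page_count(size, pagelen):
    """Number of pages needed to hold `size` items at `pagelen` items per page."""
    if pagelen <= 0:
        raise ValueError("pagelen must be positive")
    return math.ceil(size / pagelen)


# Payload reported above for the call that returned an empty page 11
payload = {"pagelen": 10, "values": [], "page": 11, "size": 1}
pages_public = page_count(payload["size"], payload["pagelen"])  # 1, so page 11 is out of range

# size 203 was reported earlier in the thread for the full set of repositories
pages_all = page_count(203, 10)  # 21
```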

All in all, I ran out of ideas. Do you mind sharing your python script so that I can take a look why it isn't working as expected?

Cheers,
Daniil

FLoureiro August 20, 2019

Hello Daniil, sorry for the delayed answer.

Here is one part of it:

##Login
username = 'xxx'
password = 'xxx'
team = 'xxx'

## Get a list of all repos from bitbucket team

#Get number of pages
r_pages = requests.get('https://%s:%s@api.bitbucket.org/2.0/teams/%s/repositories' %(username,password,team))

json_r_pages = r_pages.json()
print (json_r_pages['pagelen'])
r_pagelen = json_r_pages['pagelen']
full_repo_list = [] # string array inicialized

#Get all repositories
for i in range(1,r_pagelen+1):
    print ('page: %d\n' %i)
    # get json of each page till there is no more new pages
    try:
        r_pages=requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?page=%d' %(username,password,team,i))
        json_r_pages = r_pages.json()
    except:
        break

    # search in each page for all slugs(aka repository names) till no more slugs could be found
    j=0 # slug control for each page
    while (1):
        try:
            print ("%s" %json_r_pages['values'][j]['slug'])
            full_repo_list.append(json_r_pages['values'][j]['slug'])
            j=j+1
        except:
            break

 

Daniil Penkin
Atlassian Team
August 20, 2019

Hi @apollopt,

Thanks for posting the script, now things are clear.

The reason it only fetches 10 pages is the very first request, where you're getting the number of pages (line 15). You read the pagelen attribute; however, it isn't the number of pages but rather the maximum number of items on each page (page length).

The overall number of pages isn't returned anywhere in the payload, but you can calculate it from pagelen and size (the total number of items on all pages). In fact, you don't even need to know the number of pages. Here are the improvements I'd make to your script:

  1. As I said, you don't need to get the number of pages upfront, you can use the navigation values returned in the response.
  2. You can reduce the number of fields in the response JSON which will make the requests a bit faster, and the script memory footprint smaller (it won't need to parse all the things you'll never use).
  3. You can increase the number of items on each page to reduce the number of requests and avoid potential API rate limiting. Maximum number is 100, as we discussed earlier.

Also, just an observation: you're using different endpoints for initial request and for fetching pages. Not a big deal since, as I mentioned, you don't need the first call at all.

So here's my version:

import requests

##Login
username = 'xxx'
password = 'xxx'
team = 'xxx'

full_repo_list = []

# Request 100 repositories per page (and only their slugs), and the next page URL
next_page_url = 'https://api.bitbucket.org/2.0/repositories/%s?pagelen=100&fields=next,values.slug' % team

# Keep fetching pages while there's a page to fetch
while next_page_url is not None:
    response = requests.get(next_page_url, auth=(username, password))
    page_json = response.json()

    # Parse repositories from the JSON
    for repo in page_json['values']:
        full_repo_list.append(repo['slug'])

    # Get the next page URL, if present
    # It will include same query parameters, so no need to append them again
    next_page_url = page_json.get('next', None)

# Result length will be equal to `size` returned on any page
print ("Result:", len(full_repo_list))

Let me know if this helped.

Cheers,
Daniil

FLoureiro August 20, 2019

It worked like a charm! =D

I didn't really know how to set up the HTTP GET request in order to get a sanitized JSON response.

I tried so many times in the Python console and in that script that the pagelen was left behind there, forgotten. Hehe.

Also, your version is more straightforward than mine. I am not a professional programmer, I am a system administrator; writing code that uses fewer resources is not my best skill, lol.

Thanks a lot for the help!

Daniil Penkin
Atlassian Team
August 20, 2019

No worries, I'm happy to help :)

Saquib Fraz
Contributor
February 4, 2021

Hi All,

 

A small question: in this new API 2.0, could you please tell me where to specify my company URL?
Let's say my company URL is abc.com.

Should the URL be:

https://api.bitbucket.org/2.0/repositories/abc.com?pagelen=100&fields=next,values.slug

Is this right? If not, please suggest the correct URL.

Esther Strom
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
February 11, 2022

You use your company's workspace name in place of the user param for the repositories endpoint. So if your company's workspace is https://bitbucket.org/abc, then your call should be 

https://api.bitbucket.org/2.0/repositories/abc?pagelen=100&fields=next,values.slug
Alexey Ostrovski
I'm New Here
Those new to the Atlassian Community have posted less than three times. Give them a warm welcome!
December 20, 2020

Hello Daniil, 

I wanted to write in this thread since it relates to API access and might be of interest to others.

I'm struggling to use OAuth2 with Python to get a session so I can retrieve some information from the scopes account, email, and repository.

Here's what I've tried:

from rauth import OAuth2Service
import json
bitbucket = OAuth2Service(
    name='test',
    client_id='1234',
    client_secret='qwerty',
    access_token_url='https://bitbucket.org/site/oauth2/access_token',
    authorize_url='https://bitbucket.org/site/oauth2/authorize',
    base_url='https://api.bitbucket.org/')
params = {'redirect_uri': 'http://localhost?dump',
          'response_type': 'code'}
url = bitbucket.get_authorize_url(**params)

After that I tried a number of different data={**params} values to get a session:

session = bitbucket.get_auth_session(data=data)

It seems I can't retrieve the token; I keep getting errors:

KeyError(PROCESS_TOKEN_ERROR.format(key=bad_key, raw=r.content))

KeyError: 'Decoder failed to handle access_token with data as returned by provider. A different decoder may be needed.

The example on the Bitbucket website uses the OAuth1 method, but I still can't get it to work.

Would you please help and point me in the right direction?

Thank you.
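One possible cause of that KeyError, offered as an assumption rather than a confirmed diagnosis: rauth's default token decoder expects a form-encoded body, while the token endpoint replies with JSON. The sketch below uses only the standard library and a made-up response body to show how the two decoding styles differ.

```python
import json
from urllib.parse import parse_qsl

# Made-up token response body, for illustration only
sample_body = '{"access_token": "example-token", "token_type": "bearer"}'

# Form-style decoding finds no key=value pairs in a JSON document
as_form = dict(parse_qsl(sample_body))

# JSON decoding exposes the access_token key
as_json = json.loads(sample_body)

print("access_token" in as_form, "access_token" in as_json)
```

If this is indeed the cause, passing decoder=json.loads to get_auth_session, with the code returned to the redirect URI and grant_type set to authorization_code in data, may resolve it. If the provider returned an error document instead of a token, the key would still be missing.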

suraj_nahak April 25, 2021

Hi @apollopt  @Daniil Penkin 

I was using the Python script with a slight modification and it was working.

But now I am getting a Python decoder error.

python /tmp/jenkins1523230902563255794.py
Traceback (most recent call last):
  File "/tmp/jenkins1523230902563255794.py", line 23, in <module>
    page_json = response.json()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 808, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded


After further debugging, I found that the GET request does not return JSON output.
Even the curl command below returns null output.

curl -X GET -u username:password 'https://api.bitbucket.org/2.0/repositories/workspacename'

Can you suggest what's wrong ?

Thank you

suraj_nahak April 26, 2021

The issue is resolved: our admin team has implemented single sign-on, so I had to use an app password, which fixed it.
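The traceback above comes from calling response.json() on a body that is not JSON. A defensive sketch, assuming the caller passes in the status code, Content-Type header, and body text of the response (with requests these would be response.status_code, response.headers.get('Content-Type', '') and response.text); the function name parse_page and the sample responses are made up for illustration.

```python
import json


def parse_page(status_code, content_type, body_text):
    """Decode one API page, failing with a readable message instead of a bare ValueError."""
    if status_code != 200:
        raise RuntimeError("request failed with HTTP %d: %r" % (status_code, body_text[:200]))
    if "json" not in content_type.lower():
        raise RuntimeError("expected JSON but got %r" % content_type)
    return json.loads(body_text)


# Hypothetical responses, for illustration only
page = parse_page(200, "application/json; charset=utf-8", '{"values": [{"slug": "repo-a"}]}')
try:
    parse_page(401, "text/html", "")
    error = None
except RuntimeError as exc:
    error = str(exc)
```

Checking the status first turns an authentication problem into an explicit message rather than a decoder error several frames deep.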
