Hello everyone,
I am trying to get a list of all repositories of a team using the Bitbucket REST API 2.0:
https://api.bitbucket.org/2.0/repositories/TEAM_NAME?page=PAGE_NUMBER
I then build a list of all repositories found across all PAGE_NUMBERs, each page holding 10 repositories.
However, I noticed that I only got about half of them: out of a total of almost 200 repositories I got only 100 across all pages, 10 repositories per page.
My question is: how can I get all of those repositories, and not just half?
Hello @Daniil Penkin ,
Thanks for the reply.
The size parameter is 203 and pagelen is 10.
I have double-checked whether the repositories are public or private: all of them are private and belong to that team, which I verified through the Bitbucket web interface.
If I try to request a page number above 10, for example page 11, I receive this message:
{"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/gfetecteam?page=10", "values": [], "page": 11, "size": 1}
I really don't know ...
Also, how do I set up the query to put all the info on one page?
What you get back is a list of 1 repository (note size is 1 in the response), but you requested page 11 which doesn't make sense (Bitbucket should probably return 400 instead, we can improve here).
The reason for this is that you're making an unauthenticated call, and the API returns only public repositories. There's exactly one in that team, which I and anyone else, including anonymous users, can see (you can try it out by navigating to your team page in an incognito browser window).
To get all repositories, including private ones, you need to authenticate with api.bitbucket.org using any of the following: your regular credentials, username and app password, or an OAuth token.
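For illustration, here is a minimal Python 3 sketch of what authenticating with a username and app password amounts to: an HTTP Basic Authorization header on the request. It uses only the standard library and builds the request without sending it. The function name authenticated_request and the placeholder credentials are made up for this example. As far as I know, passing auth=(username, password) to requests.get produces the same kind of header.

```python
import base64
import urllib.request


def authenticated_request(url, username, app_password):
    """Build (but do not send) a GET request carrying HTTP Basic credentials."""
    raw = ('%s:%s' % (username, app_password)).encode('utf-8')
    token = base64.b64encode(raw).decode('ascii')
    return urllib.request.Request(url, headers={'Authorization': 'Basic ' + token})


# Placeholder values, not real credentials
req = authenticated_request(
    'https://api.bitbucket.org/2.0/repositories/TEAM_NAME', 'user', 'secret')
print(req.get_header('Authorization'))
```

Keeping the credentials out of the URL also avoids trouble when a password contains characters such as @ or /.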
The query to fetch repositories and increase page size will look like this:
https://api.bitbucket.org/2.0/repositories/<team>?pagelen=300
Let me know if you have any questions.
Cheers,
Daniil
I am using my username and password for those requests. This is the entry I am using in the Python script to go page by page through all available pages (from 1 to 10), with authentication:
print (requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?page=%d' %(username,password,team,i)).json())
With pagelen set to 300, as you said:
print (requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?pagelen=300' %(username,password,team)).json())
I got this message:
{'type': 'error', 'error': {'message': 'Invalid pagelen'}}
This is the entry I am using in the Python script to go page by page through all available pages (from 1 to 10), with authentication
This should work. In your previous message you had "size": 1 which is the total number of items on all pages, and it didn't match your expectation of around 200 repositories.
page by page of all available(from 1 to 10)
So does your script check the next parameter in the response to keep the fetching loop running? What I mean is, where does that number 10 come from? Do you get anything back if you request page 11 (with the default page size)?
With pagelen set to 300, as you said, I got this message
I'm sorry about this one, I was under the impression that the upper limit for page size was 1000, but it's 100. So you can request with pagelen=100; higher values will result in the error you got.
Update: here's a guide for pagination in Bitbucket btw. Note that some resources use cursor pagination, i.e. there's no page number as such. These endpoints have this note in their documentation.
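As a rough illustration of the limit above, here is a small Python 3 sketch that builds the first-page URL and rejects a pagelen outside 1..100 before any request is made. The helper name repositories_url and the MAX_PAGELEN constant are hypothetical names for this example; the limit of 100 is the one mentioned for this endpoint and may differ elsewhere.

```python
from urllib.parse import urlencode

MAX_PAGELEN = 100  # upper limit for this endpoint, as stated above (assumption for other endpoints)


def repositories_url(workspace, pagelen=MAX_PAGELEN, fields=None):
    """Build the URL of the first page of a workspace's repositories."""
    if not 1 <= pagelen <= MAX_PAGELEN:
        raise ValueError('pagelen must be between 1 and %d' % MAX_PAGELEN)
    params = {'pagelen': pagelen}
    if fields:
        params['fields'] = fields
    # safe=',' keeps the comma-separated fields list readable in the URL
    return 'https://api.bitbucket.org/2.0/repositories/%s?%s' % (
        workspace, urlencode(params, safe=','))


print(repositories_url('TEAM_NAME', fields='next,values.slug'))
```

Calling repositories_url('TEAM_NAME', pagelen=300) raises ValueError locally instead of coming back from the API as "Invalid pagelen".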
Cheers,
Daniil
So does your script check the next parameter in the response to keep the fetching loop running? What I mean is, where does that number 10 come from? Do you get anything back if you request page 11 (with the default page size)?
My script goes from page to page, storing all slugs in one list, and stops when no more slugs are present, which in this case happens at page 11. I have also tried manually retrieving pages above 10 in the Python console, but without success; I cannot get more pages through the API. It gives me:
{"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/<TEAM>?page=10", "values": [], "page": 11, "size": 1}
By "from 1 to 10" I mean the page number, without setting any size or pagelen. I use the default output when querying by page only.
I'm sorry about this one, I was under the impression that the upper limit for page size was 1000, but it's 100. So you can request with pagelen=100; higher values will result in the error you got.
No problem, I noticed; I am following the documentation.
I have also tried manually retrieving pages above 10 in the Python console, but without success; I cannot get more pages through the API. It gives me:
{"pagelen": 10, "previous": "https://api.bitbucket.org/2.0/repositories/<TEAM>?page=10", "values": [], "page": 11, "size": 1}
Well, in the response you got size is equal to 1 which suggests that you only fetched public repositories (and it explains why 11th page is empty), which in turn suggests that the call was unauthenticated.
without setting any size
Just a note: size is what Bitbucket returns in the response payload; it is the total number of items on all pages. So there's no point in passing it in the request, as it will be ignored.
All in all, I ran out of ideas. Do you mind sharing your python script so that I can take a look why it isn't working as expected?
Cheers,
Daniil
Hello Daniil, sorry for the delayed answer.
Here is one part of it:
## Login
username = 'xxx'
password = 'xxx'
team = 'xxx'
## Get a list of all repos from bitbucket team
# Get number of pages
r_pages = requests.get('https://%s:%s@api.bitbucket.org/2.0/teams/%s/repositories' %(username,password,team))
json_r_pages = r_pages.json()
print (json_r_pages['pagelen'])
r_pagelen = json_r_pages['pagelen']
full_repo_list = [] # string array initialized
# Get all repositories
for i in range(1,r_pagelen+1):
    print ('page: %d\n' %i)
    # get json of each page till there are no more new pages
    try:
        r_pages=requests.get('https://%s:%s@api.bitbucket.org/2.0/repositories/%s?page=%d' %(username,password,team,i))
        json_r_pages = r_pages.json()
    except:
        break
    # search in each page for all slugs (aka repository names) till no more slugs can be found
    j=0 # slug control for each page
    while (1):
        try:
            print ("%s" %json_r_pages['values'][j]['slug'])
            full_repo_list.append(json_r_pages['values'][j]['slug'])
            j=j+1
        except:
            break
Hi @apollopt,
Thanks for posting the script, now things are clear.
The reason it only fetches 10 pages is the very first request, where you're getting the number of pages (line 15). You read the pagelen attribute; however, it isn't the number of pages but rather the maximum number of items on each page (page length).
The overall number of pages isn't returned anywhere in the payload; you can calculate it using pagelen and size (the total number of items on all pages). But in fact you don't even need to know the number of pages. Here are the improvements I'd make to your script:
Also, just an observation: you're using different endpoints for initial request and for fetching pages. Not a big deal since, as I mentioned, you don't need the first call at all.
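For reference only, since the loop does not actually need it: the page count can be derived from size and pagelen with a ceiling division. The function name page_count is a made-up helper for this sketch, and the numbers are the ones reported earlier in this thread.

```python
def page_count(size, pagelen):
    """Number of pages needed to hold `size` items at `pagelen` items per page."""
    # Integer ceiling division, so a partially filled last page still counts
    return (size + pagelen - 1) // pagelen


print(page_count(203, 10))   # 21 pages at the default page length
print(page_count(203, 100))  # 3 pages at the maximum page length
print(page_count(1, 10))     # 1 page, the unauthenticated case seen earlier
```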
So here's my version:
import requests
## Login
username = 'xxx'
password = 'xxx'
team = 'xxx'
full_repo_list = []
# Request 100 repositories per page (and only their slugs), and the next page URL
next_page_url = 'https://api.bitbucket.org/2.0/repositories/%s?pagelen=100&fields=next,values.slug' % team
# Keep fetching pages while there's a page to fetch
while next_page_url is not None:
    response = requests.get(next_page_url, auth=(username, password))
    page_json = response.json()
    # Parse repositories from the JSON
    for repo in page_json['values']:
        full_repo_list.append(repo['slug'])
    # Get the next page URL, if present
    # It will include same query parameters, so no need to append them again
    next_page_url = page_json.get('next', None)
# Result length will be equal to `size` returned on any page
print ("Result:", len(full_repo_list))
Let me know if this helped.
Cheers,
Daniil
It worked like a charm! =D
I didn't really know how to set up the HTTP GET request in order to get a sanitized JSON response.
I tried so many times in the Python console and in that script that pagelen was left forgotten in there. Hehe
Also, your version is more straightforward than mine. I am not a professional programmer, I am a system administrator; writing resource-efficient code is not my best skill lol.
Thanks a lot for the help!
No worries, I'm happy to help :)
Hi All,
A small question: in this new API 2.0, could you please tell me where to specify my company URL?
Let's say my company URL is abc.com.
Should the URL be:
https://api.bitbucket.org/2.0/repositories/abc.com?pagelen=100&fields=next,values.slug
Is this right? If not, please suggest the correct URL.
You use your company's workspace name in place of the user param for the repositories endpoint. So if your company's workspace is https://bitbucket.org/abc, then your call should be
https://api.bitbucket.org/2.0/repositories/abc?pagelen=100&fields=next,values.slug
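To make the mapping concrete, here is a small Python 3 sketch that takes the workspace's web address and derives the API URL from its first path segment. The helper name workspace_from_url is hypothetical, and abc is just the example workspace from above.

```python
from urllib.parse import urlparse


def workspace_from_url(web_url):
    """Return the first path segment of a bitbucket.org URL, e.g. 'abc'."""
    segments = [s for s in urlparse(web_url).path.split('/') if s]
    if not segments:
        raise ValueError('no workspace found in %r' % web_url)
    return segments[0]


workspace = workspace_from_url('https://bitbucket.org/abc')
api_url = ('https://api.bitbucket.org/2.0/repositories/%s'
           '?pagelen=100&fields=next,values.slug' % workspace)
print(api_url)
```

Note that the company domain (abc.com) plays no part here; only the workspace name from the bitbucket.org address is used.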
Hello Daniil,
I wanted to write in this thread since it relates to API access and might be of interest to others.
I'm struggling to use OAuth2 with Python to get a session so I can retrieve some info from the scopes account, email and repository.
Here's what I've tried:
from rauth import OAuth2Service
import json
bitbucket = OAuth2Service(
    name='test',
    client_id='1234',
    client_secret='qwerty',
    access_token_url='https://bitbucket.org/site/oauth2/access_token',
    authorize_url='https://bitbucket.org/site/oauth2/authorize',
    base_url='https://api.bitbucket.org/')
params = {'redirect_uri': 'http://localhost?dump',
          'response_type': 'code'}
url = bitbucket.get_authorize_url(**params)
After that I tried a number of different data={**params} values to get a session:
session = bitbucket.get_auth_session(data=data)
It seems I can't retrieve the token; I keep getting errors:
KeyError(PROCESS_TOKEN_ERROR.format(key=bad_key, raw=r.content))
KeyError: 'Decoder failed to handle access_token with data as returned by provider. A different decoder may be needed.
The example on the Bitbucket website uses the OAuth1 method, but I still can't get it to work.
Would you please help and point me in the right direction?
Thank you.
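One possible lead, offered as a guess rather than a confirmed fix: the error text itself suggests a different decoder. Bitbucket's access token endpoint answers with a JSON body, while rauth's default decoder, as far as I understand it, expects a form-encoded query string. The Python 3 sketch below uses only the standard library and a made-up sample body to show why a query-string parser finds no access_token key while json.loads does.

```python
import json
from urllib.parse import parse_qsl

# Made-up sample of a JSON token response; real values will differ
sample_body = b'{"access_token": "abc123", "token_type": "bearer", "expires_in": 7200}'

# Parsing JSON as a query string yields no usable keys
as_form = dict(parse_qsl(sample_body.decode('utf-8')))
print('access_token' in as_form)

# Parsing it as JSON exposes the token
as_json = json.loads(sample_body)
print(as_json['access_token'])
```

If that is the cause, passing decoder=json.loads to get_auth_session, with the code from the redirect and grant_type='authorization_code' in data, may be worth trying. Please treat that as an assumption to check against the rauth documentation.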
Hi @apollopt @Daniil Penkin
I was using the Python script with a slight modification and it was working.
But now I am getting a Python decoder error:
python /tmp/jenkins1523230902563255794.py
Traceback (most recent call last):
  File "/tmp/jenkins1523230902563255794.py", line 23, in <module>
    page_json = response.json()
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 808, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
After further debugging, I found that the GET request does not return any JSON output.
Even the curl command below returns null output:
curl -X GET -u username:password 'https://api.bitbucket.org/2.0/repositories/workspacename'
Can you suggest what's wrong?
Thank you
The issue is resolved: our admin team implemented single sign-on, so I had to use an app password, which resolved my issue.
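In case it helps others hitting the same decoder error, here is a minimal Python 3 sketch of checking a response before calling .json(), so that an authentication problem shows up as a readable message instead of a JSON traceback. The names ApiError and page_json_or_error are made up for this example; it only relies on the status_code, headers, text and json() members that requests responses provide. What the API actually returns when a password is rejected is not something I have verified.

```python
class ApiError(Exception):
    """Raised when a response cannot be used as a JSON page."""


def page_json_or_error(response):
    """Return the decoded JSON body, or raise ApiError with a readable reason."""
    if response.status_code != 200:
        # Include the start of the body, which may be empty
        raise ApiError('HTTP %d: %s' % (response.status_code, response.text[:200]))
    content_type = response.headers.get('Content-Type', '')
    if 'json' not in content_type:
        raise ApiError('expected JSON but got content type %r' % content_type)
    return response.json()
```

In the fetching loop, page_json = page_json_or_error(response) would replace the bare response.json() call.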