Finding storage usage of Confluence space and page using REST API

Jonny Klaas August 29, 2022

Hi, 

I am trying to fetch the storage consumed by our spaces using the Python script documented here: https://confluence.atlassian.com/confkb/finding-storage-usage-of-confluence-space-and-page-using-rest-api-1063555292.html

The script runs fine but stops after 25 spaces. I can't find any limit in the script itself, so maybe the REST API is limiting the results.

Can somebody help me fetch all results?

 

Regards,

Jonny

2 answers

1 vote
marc -Collabello--Phase Locked-
Community Leader
August 29, 2022

It seems the Python script does not follow the pagination link (`_links.next` in the response) when there are more than 25 results.

I believe the script needs to be modified to do this.
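
For illustration, here is a minimal sketch of what following the next link could look like. It assumes each v1 response carries a `results` list and, when more data exists, a relative path in `_links.next` that has to be prefixed (assumed here to be the site URL plus `/wiki`). The names `iter_results`, `requests_get_json`, and `link_prefix` are hypothetical and not part of the KB script.

```python
def iter_results(get_json, first_url, link_prefix):
    """Yield every item of a paginated listing by following `_links.next`.

    get_json:    callable taking a URL and returning the decoded JSON body
    first_url:   full URL of the first page
    link_prefix: string prepended to the relative `_links.next` path
                 (assumption: something like BASE_URL + "/wiki")
    """
    url = first_url
    while url:
        body = get_json(url)
        yield from body.get("results", [])
        next_path = body.get("_links", {}).get("next")
        # Stop when the API no longer advertises a next page.
        url = link_prefix + next_path if next_path else None


def requests_get_json(user, token):
    """Build a fetcher backed by `requests` (imported lazily on first use)."""
    import requests

    def get_json(url):
        response = requests.get(
            url, headers={"Accept": "application/json"}, auth=(user, token)
        )
        response.raise_for_status()
        return response.json()

    return get_json
```

With that in place, the space loop could become something like `for space in iter_results(requests_get_json(USER, TOKEN), BASE_URL + "/wiki/rest/api/space?limit=100", BASE_URL + "/wiki"):`. The same helper should also work for the per-space content and per-page attachment calls, which are presumably paginated in the same way.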

Patrick Alexander December 14, 2022

Hi there,

Has anyone already updated the script?

Thanks

Patrick

0 votes
Patrick Alexander December 14, 2022

I found a solution and edited the script to request up to 500 results:

# This code sample uses the 'requests', 'json' and 'csv' libraries:
import requests
import json
import csv

# INSERT "USER", "TOKEN", "BASE_URL" HERE
USER = "User"
TOKEN = "Token"
BASE_URL = "https://xxxxxxxxx.atlassian.net"

# newline='' is what the csv module recommends for files passed to csv.writer
with open('per_page.csv', 'w', newline='') as pagecsvfile, open('per_space.csv', 'w', newline='') as spacecsvfile:
    perPageWriter = csv.writer(pagecsvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    perSpaceWriter = csv.writer(spacecsvfile, delimiter=',', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    perPageWriter.writerow(['pageid', 'attachment_size(byte)'])
    perSpaceWriter.writerow(['space_name', 'space_key', 'attachment_size(byte)'])

    headers = {
        "Accept": "application/json"
    }

    # Request up to 500 spaces in a single call instead of the default 25
    response = requests.request(
        "GET",
        BASE_URL + "/wiki/rest/api/space?limit=500",
        headers=headers,
        auth=(USER, TOKEN)
    )

    site_attachment_volume = 0

    # Get all space keys
    space_key_results = json.loads(response.text)["results"]
    for space in space_key_results:
        space_attachment_volume = 0
        # Get related page IDs from space keys
        response = requests.request(
            "GET",
            BASE_URL + "/wiki/rest/api/space/" + space["key"] + "/content",
            headers=headers,
            auth=(USER, TOKEN)
        )
        print("Space Key: " + space["key"])
        page_results = json.loads(response.text)["page"]["results"]
        for page in page_results:
            page_attachment_volume = 0
            # Get attachments from each page
            print(" " + "Page ID: " + page["id"])
            response = requests.request(
                "GET",
                BASE_URL + "/wiki/rest/api/content/" + page["id"] + "/child/attachment",
                headers=headers,
                auth=(USER, TOKEN)
            )
            attachment_results = json.loads(response.text)["results"]
            for attachment in attachment_results:
                print(" " + "Attachment Name: " + json.dumps(attachment["title"]) + ", " + json.dumps(attachment["extensions"]["fileSize"]) + " bytes")
                page_attachment_volume += int(attachment["extensions"]["fileSize"])

            space_attachment_volume += page_attachment_volume
            print(" -->" + "PAGE TOTAL: " + str(page_attachment_volume))

            # Write the per-page total to CSV
            perPageWriter.writerow([page["id"], str(page_attachment_volume)])

        print("\n " + "SPACE TOTAL: " + str(space_attachment_volume) + " bytes")
        print("----------")

        # Write the per-space total to CSV
        perSpaceWriter.writerow([space["name"], space["key"], str(space_attachment_volume)])

Jan June 23, 2023

Hi there,

Would anyone be able to help update this script to work with Atlassian's v2 API for Confluence? I want to use its pagination and the response header to retrieve more than 250 results, which I need because of the number of spaces we have.

I have updated the endpoints but receive the following error:


    page_results = json.loads(response.text)["page"]["results"]
                   ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 3 column 1 (char 10)
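
A `JSONDecodeError: Expecting value` like the one above usually means the response body was not JSON at all (for example an HTML error page), so it can help to look at the status code and content type before parsing. A minimal sketch, assuming the `requests` response object from the script; the helper name `parse_json_or_explain` is hypothetical:

```python
import json


def parse_json_or_explain(status_code, content_type, text):
    """Decode a JSON body, or raise an error that shows what came back instead."""
    if status_code != 200 or "application/json" not in content_type:
        # Show the start of the body so an HTML error page or a
        # "not found" message becomes visible in the traceback.
        snippet = text.strip()[:200]
        raise ValueError(
            f"Expected JSON but got HTTP {status_code} "
            f"({content_type or 'no content type'}): {snippet!r}"
        )
    return json.loads(text)
```

In the script this would be called as `parse_json_or_explain(response.status_code, response.headers.get("Content-Type", ""), response.text)` in place of `json.loads(response.text)`. Once the body does decode, the `["page"]["results"]` lookup may still need adjusting, since the v2 responses are not necessarily shaped like the v1 ones.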


Thanks

Jan

marc -Collabello--Phase Locked-
July 13, 2023

Hi,

This will probably take some programming work, since the v2 API differs from v1.

Do you need the output on the command line, or would a table in Confluence also be OK?

Jan July 13, 2023

Hi,

Ideally the output would stay in the same format, saved to a CSV file. I only need per_space.csv, not the individual pages, if that helps.

I am also not sure how to work with the response header in Python. I already have the Get Spaces request working in Postman with the v2 API and can see the response header there.

Thanks

Jan
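
On reading the response header in Python: `requests` parses a `Link` response header into `response.links`, so the next page can be followed from there. Below is a minimal sketch, assuming the v2 API advertises the next page as a `Link` header with `rel="next"` and returns items under `results`. The names `iter_v2_results` and `send_get` are hypothetical.

```python
from urllib.parse import urljoin


def iter_v2_results(send_get, base_url, path):
    """Yield items from a cursor-paginated listing by following the Link header.

    send_get: callable taking a URL and returning a requests-style response,
              e.g. lambda url: requests.get(url, headers=headers, auth=(USER, TOKEN))
    base_url: site root, e.g. "https://xxxxxxxxx.atlassian.net"
    path:     first request path (assumption: "/wiki/api/v2/spaces?limit=250")
    """
    url = urljoin(base_url, path)
    while url:
        response = send_get(url)
        response.raise_for_status()
        yield from response.json().get("results", [])
        # response.links is keyed by rel; the entry holds the target under "url".
        next_link = response.links.get("next", {}).get("url")
        # The link may be relative, so resolve it against the site root.
        url = urljoin(base_url, next_link) if next_link else None
```

Each yielded space could then be written to per_space.csv in the same way as in the v1 script, leaving out the per-page writer.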

Deployment type: Cloud
Product plan: Standard
Permissions level: Site Admin