I'm trying to download an attached image from a Confluence page using the Python API (atlassian-python-api). However, I keep getting 404 errors because the request URL is malformed.
Here's my code:
from atlassian import Confluence
# define variables here
...
# Initialize Confluence client
confluence = Confluence(
    url=base_url, password=confluence_token, username=username, cloud=True
)
# This works:
page = confluence.get_page_by_id(page_id, expand='body.storage')
# This doesn't:
downloads = confluence.download_attachments_from_page(
    page_id, path=image_dir
)
The get_page_by_id() request before the download request works, but the second request fails, apparently because it's trying to access a URL that doesn't exist.
HTTPError: HTTP error occurred while downloading attachments:
404 Client Error: Not Found for url:
https://safetymasterehs.atlassian.net/wiki/https://safetymasterehs.atlassian.net/wiki/download/attachments/8716303/Screenshot%202024-02-29%20at%2022.42.33.png?version=1&modificationDate=1743358284752&cacheVersion=1&api=v2
The base part of the URL it's trying to download the PNG from appears twice. If I make the obvious fix by deleting the first copy and pasting the result into a browser, the download starts immediately. However, I'm not sure why the REST API client is putting together an incorrect URL. I've also tried specifying a single file by passing a filename=[file name here] argument, but got the same result.
Is this a bug in the API, or is there something wrong with my setup?
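For illustration, the doubled prefix can be reproduced with plain string handling. This is only a sketch of one way the URL could end up like this, assuming the client prepends the base URL to a download link that is already absolute; it is not the library's actual code, and naive_join() and to_relative() are hypothetical helpers.

```python
# Values copied from the error message (query string shortened).
BASE_URL = "https://safetymasterehs.atlassian.net/wiki"
DOWNLOAD_LINK = (
    "https://safetymasterehs.atlassian.net/wiki/download/attachments/"
    "8716303/Screenshot%202024-02-29%20at%2022.42.33.png?version=1"
)


def naive_join(base: str, path: str) -> str:
    """Join base and path the way a client might if it assumes path is relative."""
    return base.rstrip("/") + "/" + path.lstrip("/")


def to_relative(base: str, link: str) -> str:
    """Strip the base prefix from an absolute link so joining does not repeat it."""
    base = base.rstrip("/")
    if link.startswith(base):
        return link[len(base):]
    return link


# Joining an already-absolute link reproduces the doubled prefix.
doubled = naive_join(BASE_URL, DOWNLOAD_LINK)
# Making the link relative first yields the URL that works in a browser.
fixed = naive_join(BASE_URL, to_relative(BASE_URL, DOWNLOAD_LINK))

print(doubled)
print(fixed)
```

The traceback also shows that get() accepts an absolute argument, which may be meant for links that are already full URLs; that is untested here.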
Full traceback:
JSONDecodeError                           Traceback (most recent call last)
File /usr/local/lib/python3.10/site-packages/requests/models.py:971, in Response.json(self, **kwargs)
    970 try:
--> 971 return complexjson.loads(self.text, **kwargs)
    972 except JSONDecodeError as e:
    973 # Catch JSON-related errors and raise as requests.JSONDecodeError
    974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File /usr/local/lib/python3.10/site-packages/simplejson/__init__.py:525, in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, use_decimal, **kw)
    521 if (cls is None and encoding is None and object_hook is None and
    522 parse_int is None and parse_float is None and
    523 parse_constant is None and object_pairs_hook is None
    524 and not use_decimal and not kw):
--> 525 return _default_decoder.decode(s)
    526 if cls is None:

File /usr/local/lib/python3.10/site-packages/simplejson/decoder.py:370, in JSONDecoder.decode(self, s, _w, _PY3)
    369 s = str(s, self.encoding)
--> 370 obj, end = self.raw_decode(s)
    371 end = _w(s, end).end()

File /usr/local/lib/python3.10/site-packages/simplejson/decoder.py:400, in JSONDecoder.raw_decode(self, s, idx, _w, _PY3)
    399 idx += 3
--> 400 return self.scan_once(s, idx=_w(s, idx).end())

JSONDecodeError: Expecting value: line 3 column 1 (char 10)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/atlassian/confluence.py:3571, in Confluence.raise_for_status(self, response)
   3570 try:
-> 3571 j = response.json()
   3572 error_msg = j["message"]

File /usr/local/lib/python3.10/site-packages/requests/models.py:975, in Response.json(self, **kwargs)
    972 except JSONDecodeError as e:
    973 # Catch JSON-related errors and raise as requests.JSONDecodeError
    974 # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 975 raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 3 column 1 (char 10)

During handling of the above exception, another exception occurred:

HTTPError                                 Traceback (most recent call last)
File ~/.local/lib/python3.10/site-packages/atlassian/confluence.py:1473, in Confluence.download_attachments_from_page(self, page_id, path, start, limit, filename, to_memory)
   1472 # Fetch the file content
-> 1473 response = self.get(str(download_link), not_json_response=True)
   1475 if to_memory:
   1476 # Store in BytesIO object

File ~/.local/lib/python3.10/site-packages/atlassian/rest_client.py:441, in AtlassianRestAPI.get(self, path, data, flags, params, headers, not_json_response, trailing, absolute, advanced_mode)
    428 """
    429 Get request based on the python-requests module. You can override headers, and also, get not json response
    430 :param path:
   (...)
    439 :return:
    440 """
--> 441 response = self.request(
    442 "GET",
    443 path=path,
    444 flags=flags,
    445 params=params,
    446 data=data,
    447 headers=headers,
    448 trailing=trailing,
    449 absolute=absolute,
    450 advanced_mode=advanced_mode,
    451 )
    452 if self.advanced_mode or advanced_mode:

File ~/.local/lib/python3.10/site-packages/atlassian/rest_client.py:413, in AtlassianRestAPI.request(self, method, path, data, json, flags, params, headers, files, trailing, absolute, advanced_mode)
    411 return response
--> 413 self.raise_for_status(response)
    414 return response

File ~/.local/lib/python3.10/site-packages/atlassian/confluence.py:3575, in Confluence.raise_for_status(self, response)
   3574 log.error(e)
-> 3575 response.raise_for_status()
   3576 else:

File /usr/local/lib/python3.10/site-packages/requests/models.py:1021, in Response.raise_for_status(self)
   1020 if http_error_msg:
-> 1021 raise HTTPError(http_error_msg, response=self)

HTTPError: 404 Client Error: Not Found for url: https://safetymasterehs.atlassian.net/wiki/https://safetymasterehs.atlassian.net/wiki/download/attachments/8716303/Screenshot%202024-02-29%20at%2022.42.33.png?version=1&modificationDate=1743358284752&cacheVersion=1&api=v2

During handling of the above exception, another exception occurred:

HTTPError                                 Traceback (most recent call last)
Input In [3], in <cell line: 9>()
      7 image_dir = 'test_confluence/images'
      8 filename = attachments_container['results'][0]['title']
----> 9 downloads = confluence.download_attachments_from_page(page_id, path=image_dir)#, filename=filename)
     10 print("Found files:", glob.glob(os.path.join([image_dir, '*'])))
     11 for image in glob.glob(os.path.join([image_dir, '*'])):

File ~/.local/lib/python3.10/site-packages/atlassian/confluence.py:1495, in Confluence.download_attachments_from_page(self, page_id, path, start, limit, filename, to_memory)
   1493 raise PermissionError(f"Permission denied when trying to save files to '{path}'.")
   1494 except requests.HTTPError as http_err:
-> 1495 raise requests.HTTPError(
   1496 f"HTTP error occurred while downloading attachments: {http_err}",
   1497 response=http_err.response,
   1498 request=http_err.request,
   1499 )
   1500 except Exception as err:
   1501 raise Exception(f"An unexpected error occurred: {err}")

HTTPError: HTTP error occurred while downloading attachments: 404 Client Error: Not Found for url: https://safetymasterehs.atlassian.net/wiki/https://safetymasterehs.atlassian.net/wiki/download/attachments/8716303/Screenshot%202024-02-29%20at%2022.42.33.png?version=1&modificationDate=1743358284752&cacheVersion=1&api=v2
I'd suggest following the approach in this Stack Overflow answer: https://stackoverflow.com/questions/60038509/how-to-download-a-confluence-page-attachment-with-python
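A rough sketch of that kind of direct approach is shown here; it is not the code from the linked answer. It assumes the Confluence REST endpoint /rest/api/content/{page_id}/child/attachment and a relative _links.download field on each result, neither of which appears in this thread, so check both against your instance. attachment_urls() and download_attachments() are hypothetical helper names, and pagination of the listing is ignored.

```python
import os


def attachment_urls(listing: dict, base_url: str) -> dict:
    """Map attachment title -> absolute download URL from an attachment listing."""
    base = base_url.rstrip("/")
    urls = {}
    for item in listing.get("results", []):
        link = item["_links"]["download"]
        # Prepend the base only when the link is relative, so it is never doubled.
        urls[item["title"]] = link if link.startswith("http") else base + link
    return urls


def download_attachments(session, base_url: str, page_id: str, out_dir: str) -> list:
    """Download the listed attachments of a page using a requests-style session."""
    # Assumed endpoint for listing a page's attachments (first page of results only).
    listing_url = f"{base_url.rstrip('/')}/rest/api/content/{page_id}/child/attachment"
    resp = session.get(listing_url)
    resp.raise_for_status()

    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for title, url in attachment_urls(resp.json(), base_url).items():
        file_resp = session.get(url)
        file_resp.raise_for_status()
        # basename() guards against titles that contain path separators.
        target = os.path.join(out_dir, os.path.basename(title))
        with open(target, "wb") as fh:
            fh.write(file_resp.content)
        saved.append(target)
    return saved
```

With the requests package, the session could be created as session = requests.Session() with session.auth = (username, confluence_token), then passed in as download_attachments(session, base_url, page_id, image_dir), reusing the variable names from the question.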
Thank you, this works. Still no luck with download_attachments_from_page(), but the requests version is good enough.