Extracting all attachments from a space

June 2, 2023

We're retiring a Confluence server and migrating some of its spaces to another server, but many of the spaces just aren't updated any more. We wanted to take snapshots of those spaces for later reference.

We exported each space as a PDF, a reference to squirrel away somewhere for later. But the PDF export only includes image attachments. It ignores all other file types.

The XML export does include all attachments, but it names the files with their internal IDs rather than something human-readable.

So I threw together this Python script, which parses the entities.xml file in an XML export, gathers the attachment filenames, and maps them to their IDs. It creates a folder named for the space, then subfolders named for each page with an attachment, and copies the attachment files into those subfolders.
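For anyone who wants to see the general idea before opening the repo, here is a minimal sketch of the approach. It is not the linked script, and it rests on assumptions about the export layout that you should check against your own export: that entities.xml holds `<object class="Attachment">` and `<object class="Page">` elements, that an attachment's filename is in its `title` (or `fileName`) property, that its page is referenced by a `containerContent` property, and that the files sit at `attachments/<page id>/<attachment id>/<version>`. The function names `index_export`, `safe_name`, and `extract` are made up for illustration.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def _prop(obj, name):
    """Return the <property name=...> child of an element, or None."""
    for child in obj.findall("property"):
        if child.get("name") == name:
            return child
    return None


def _text(obj, name, default=""):
    """Return the stripped text of a named property, or a default."""
    node = _prop(obj, name)
    return (node.text or default).strip() if node is not None else default


def _obj_id(obj):
    """Return the text of the element's <id> child, or an empty string."""
    node = obj.find("id")
    return (node.text or "").strip() if node is not None else ""


def index_export(entities_xml):
    """Collect page titles by ID and a list of attachment records.

    Assumed layout: top-level <object> elements with a class attribute.
    """
    root = ET.parse(entities_xml).getroot()
    pages, attachments = {}, []
    for obj in root.findall("object"):
        cls = obj.get("class")
        if cls in ("Page", "BlogPost"):
            pages[_obj_id(obj)] = _text(obj, "title", "untitled")
        elif cls == "Attachment":
            container = _prop(obj, "containerContent")
            attachments.append({
                "id": _obj_id(obj),
                "name": _text(obj, "title") or _text(obj, "fileName"),
                "version": _text(obj, "version", "1"),
                "page_id": _obj_id(container) if container is not None else "",
            })
    return pages, attachments


def safe_name(name):
    """Replace characters that are awkward in file or folder names."""
    cleaned = "".join("_" if c in '\\/:*?"<>|' else c for c in name).strip()
    return cleaned or "untitled"


def extract(export_dir, out_dir, space_name):
    """Copy attachments into out_dir/<space>/<page title>/<filename>."""
    export_dir, out_dir = Path(export_dir), Path(out_dir)
    pages, attachments = index_export(export_dir / "entities.xml")
    copied = []
    for att in attachments:
        # Assumed on-disk layout of the unzipped XML export.
        src = (export_dir / "attachments" / att["page_id"]
               / att["id"] / att["version"])
        if not src.is_file():
            continue  # skip records whose file is not in the export
        page_title = pages.get(att["page_id"], att["page_id"])
        page_dir = out_dir / safe_name(space_name) / safe_name(page_title)
        page_dir.mkdir(parents=True, exist_ok=True)
        dest = page_dir / safe_name(att["name"])
        shutil.copy2(src, dest)
        copied.append(dest)
    return copied
```

You would call it with something like `extract("unzipped_export", "snapshots", "MYSPACE")`, where all three arguments are placeholders. If several attachment records on a page share a filename, later copies overwrite earlier ones in this sketch.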

I hope it is useful to someone else.

https://github.com/rtphokie/confluence_attachment_extract
