We're retiring a Confluence server. Some spaces are being migrated to another server, but many just aren't updated any more, and we wanted to take snapshots of those for later reference.
We exported each one as a PDF, a reference to squirrel away somewhere for later. But the PDF export only includes image attachments; it ignores all other file types.
The XML export does include all attachments, but it names the files with their internal IDs rather than something more human-readable.
So I threw together this Python script. It parses the entities.xml file in an XML export, gathers the attachment filenames from there, and maps them to their IDs. It then creates a folder named for the space, with a subfolder for each page that has an attachment, and copies the attachment files into the matching subfolder.
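For anyone who wants the general shape of the approach, here is a minimal sketch of the idea. It is not the script itself, and it rests on assumptions about the export layout that you should check against your own export: that pages and attachments appear in entities.xml as `<object class="Page">` and `<object class="Attachment">` elements, that each has an `<id>` child and a `title` property, that an attachment points at its page through a `containerContent` property, and that the files sit at `attachments/<page id>/<attachment id>/<version>`. The function names (`map_attachments`, `copy_attachments`, `safe_name`) are made up for this illustration.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def _prop(obj, name):
    """Return the <property name="..."> child of an element, or None."""
    for child in obj.findall("property"):
        if child.get("name") == name:
            return child
    return None


def _text(el):
    """Return an element's stripped text, or None if missing or empty."""
    return el.text.strip() if el is not None and el.text else None


def map_attachments(root):
    """Map attachment ID -> (page ID, page title, attachment filename).

    ASSUMPTION: the element and property names used here ("Page",
    "Attachment", "title", "containerContent") match your export.
    """
    page_titles = {}
    for obj in root.iter("object"):
        if obj.get("class") == "Page":
            page_id = _text(obj.find("id"))
            title = _text(_prop(obj, "title"))
            if page_id and title:
                page_titles[page_id] = title

    mapping = {}
    for obj in root.iter("object"):
        if obj.get("class") != "Attachment":
            continue
        att_id = _text(obj.find("id"))
        filename = _text(_prop(obj, "title"))
        owner = _prop(obj, "containerContent")
        page_id = _text(owner.find("id")) if owner is not None else None
        # Skip attachments whose owning page is unknown.
        if att_id and filename and page_id in page_titles:
            mapping[att_id] = (page_id, page_titles[page_id], filename)
    return mapping


def safe_name(name):
    """Replace characters that are awkward in file and folder names."""
    cleaned = "".join("_" if c in '\\/:*?"<>|' else c for c in name).strip()
    return cleaned or "untitled"


def copy_attachments(export_dir, dest_dir, space_name):
    """Copy attachments into dest_dir/<space>/<page title>/<filename>."""
    export_dir, dest_dir = Path(export_dir), Path(dest_dir)
    root = ET.parse(export_dir / "entities.xml").getroot()
    copied = []
    for att_id, (page_id, page_title, filename) in map_attachments(root).items():
        # ASSUMPTION: files live at attachments/<page id>/<attachment id>/<version>
        src_dir = export_dir / "attachments" / page_id / att_id
        if not src_dir.is_dir():
            continue
        versions = [p for p in src_dir.iterdir() if p.is_file()]
        if not versions:
            continue
        # Take the highest-numbered version file.
        newest = max(versions, key=lambda p: int(p.name) if p.name.isdigit() else -1)
        target = (dest_dir / safe_name(space_name)
                  / safe_name(page_title) / safe_name(filename))
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(newest, target)
        copied.append(target)
    return copied
```

Calling something like `copy_attachments("unzipped-export", "snapshots", "My Space")` (paths and space name are placeholders) would then fill `snapshots/My Space/` with one subfolder per page. Two pages with the same title, or two attachments with the same name on one page, would overwrite each other in this sketch, so a real run may need extra handling for that.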
I hope it is useful to someone else.