My goal is to use the Confluence API to get the content of a page, parse it, edit it, and update that same page with the edited content.
At first, I assumed Confluence's storage format was HTML. Based on that, my original plan was to use Python's BeautifulSoup module to parse and edit the content once I retrieved it from the Confluence API. I now know that the Confluence page storage format is "XHTML-based". I've tried to parse it with various BeautifulSoup parsers (lxml, xml, html.parser) but they all get caught on the standard-breaking macro elements like this:
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="283daa7d-46af-4d6d-a177-a00b4a2bc342"><ac:plain-text-body><![CDATA[*\[CDRL:\]*]]></ac:plain-text-body></ac:structured-macro>
Is there a preferred method for parsing this?