My goal is to use the Confluence API to get the content of a page, parse it, edit it, and update that same page with the edited content.
At first, I assumed Confluence's storage format was HTML. Based on that, my original plan was to use Python's BeautifulSoup module to parse and edit the content once I retrieved it from the Confluence API. I now know that the Confluence page storage format is "XHTML-based". I've tried to parse it with various BeautifulSoup parsers (lxml, xml, html.parser) but they all get caught on the standard-breaking macro elements like this:
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="283daa7d-46af-4d6d-a177-a00b4a2bc342"><ac:plain-text-body><![CDATA[*\[CDRL:\]*]]></ac:plain-text-body></ac:structured-macro>
Is there a preferred method for parsing this?
Hi Atlassian Community, My name is DJ Chung, and I’m a Product Manager on the Confluence Cloud team. Today, I’m excited to share a new and improved version of Home. The new Home helps you ...
Connect with like-minded Atlassian users at free events near you!Find an event
Connect with like-minded Atlassian users at free events near you!
Unfortunately there are no Community Events near you at the moment.Host an event
You're one step closer to meeting fellow Atlassian users at your local event. Learn more about Community Events