My goal is to use the Confluence API to get the content of a page, parse it, edit it, and update that same page with the edited content.
At first, I assumed Confluence's storage format was HTML. Based on that, my original plan was to use Python's BeautifulSoup module to parse and edit the content once I retrieved it from the Confluence API. I now know that the Confluence page storage format is "XHTML-based". I've tried parsing it with several BeautifulSoup parsers (lxml, xml, html.parser), but they all trip over the non-standard macro elements, such as this one:
<ac:structured-macro ac:name="unmigrated-wiki-markup" ac:schema-version="1" ac:macro-id="283daa7d-46af-4d6d-a177-a00b4a2bc342"><ac:plain-text-body><![CDATA[*\[CDRL:\]*]]></ac:plain-text-body></ac:structured-macro>
Is there a preferred method for parsing this?
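One approach that may be worth trying: strict XML parsers tend to reject the fragment because the `ac:` and `ri:` prefixes are never declared in it. The sketch below wraps the fragment in a temporary root element that declares them, parses it with the standard library's `xml.etree.ElementTree`, and strips the wrapper again on output. The namespace URIs and the helper names (`parse_storage`, `serialize_storage`, `ac`) are placeholders invented for illustration, not official Atlassian values or an existing library API.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs (an assumption, not official Atlassian values).
# The fragment declares none, so any stable URIs work for local parsing.
NAMESPACES = {"ac": "urn:example:confluence:ac", "ri": "urn:example:confluence:ri"}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)  # keep the ac:/ri: prefixes on output


def parse_storage(fragment):
    """Parse a storage-format fragment by wrapping it in a namespaced root."""
    decls = " ".join(f'xmlns:{p}="{u}"' for p, u in NAMESPACES.items())
    return ET.fromstring(f"<root {decls}>{fragment}</root>")


def serialize_storage(root):
    """Serialize the tree and strip the temporary <root> wrapper again."""
    xml = ET.tostring(root, encoding="unicode")
    if xml.endswith("/>"):  # the wrapper had no content at all
        return ""
    return xml[xml.index(">") + 1 : xml.rindex("</root>")]


def ac(tag):
    """Build the qualified name ElementTree uses for an ac: tag or attribute."""
    return f"{{{NAMESPACES['ac']}}}{tag}"


if __name__ == "__main__":
    fragment = (
        r'<p>Hello</p><ac:structured-macro ac:name="unmigrated-wiki-markup">'
        r"<ac:plain-text-body><![CDATA[*\[CDRL:\]*]]></ac:plain-text-body>"
        r"</ac:structured-macro>"
    )
    root = parse_storage(fragment)
    root.find("p").text = "Edited"
    for macro in root.iter(ac("structured-macro")):
        print(macro.get(ac("name")), macro.find(ac("plain-text-body")).text)
    print(serialize_storage(root))
```

Two caveats with this sketch. ElementTree writes CDATA sections back out as ordinary escaped text, and I have not verified whether Confluence accepts that for every macro body. Named HTML entities such as `&nbsp;` are also undefined in plain XML, so a page containing them will still raise a parse error unless they are replaced with numeric references first.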
I am having the same issue! If you found a solution could you share? Thanks!
I never found a parser that worked perfectly for Confluence. Ultimately, I was forced to edit Confluence pages using regular expressions.
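For anyone going the same route, here is a minimal sketch of what a regex-based edit might look like for the macro in the question. It is not the exact code I used: the function name `rewrite_macro_body` and its parameters are made up for illustration, and it assumes that macros of the given name are not nested and that the opening tag is followed directly by the plain-text body.

```python
import re


def rewrite_macro_body(storage, macro_name, transform):
    """Apply transform() to the CDATA body of every macro named macro_name.

    Assumes the macros are not nested and that <ac:plain-text-body> follows
    the opening tag directly (optionally after whitespace).
    """
    pattern = re.compile(
        r'(<ac:structured-macro\b[^>]*\bac:name="' + re.escape(macro_name) + r'"[^>]*>\s*'
        r"<ac:plain-text-body><!\[CDATA\[)"   # group 1: everything up to the body
        r"(.*?)"                               # group 2: the body itself
        r"(\]\]></ac:plain-text-body>)",       # group 3: the closing delimiters
        re.DOTALL,
    )
    # Rebuild each match with the original delimiters around the new body.
    return pattern.sub(
        lambda m: m.group(1) + transform(m.group(2)) + m.group(3), storage
    )


if __name__ == "__main__":
    page = (
        r'<ac:structured-macro ac:name="unmigrated-wiki-markup">'
        r"<ac:plain-text-body><![CDATA[*\[CDRL:\]*]]></ac:plain-text-body>"
        r"</ac:structured-macro>"
    )
    print(rewrite_macro_body(page, "unmigrated-wiki-markup",
                             lambda body: body.replace("CDRL", "SOW")))
```

Everything outside the matched body is left byte-for-byte untouched, which is the main attraction over round-tripping through a parser. The transform must not produce the sequence `]]>`, since that would end the CDATA section early.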
I have had the same issue. Ultimately, I resorted to using an HTML parser (where that works), an XML parser (where that works), and regex for everything else.
Same issue here. I need to find a way to parse Confluence XHTML, including the macro notation.
Hey Bill, thanks for your response.
Alas, while these two libraries are helpful wrappers for the Confluence API, as far as I can tell, neither is able to parse the XHTML that the API returns for the content of a page.
This might not be the solution you are after; however, if you are interested in writing Confluence wiki text to docx format (while preserving the wiki formatting), you can try the jirawiki2docx Python library.
https://pypi.org/project/jirawiki2docx/
@Woodson Miles I am using the ConfluencePS PowerShell module to achieve this. It's quite simple to fetch and upload page contents using the Get-ConfluencePage and Set-ConfluencePage commands.
You can get this module from PowerShell Gallery.