Extracting current tables from exported XML

Rieks Joosten August 20, 2024

I'm new to Confluence; please forgive me if I'm asking silly things.

I'm in a project where Confluence is used to create documentation. Various pages include a glossary (i.e., a table whose first column header contains `Term`) that I want to extract. When doing an XML export, I can extract such tables from `<![CDATA[` blocks. However, the XML appears to contain all page revisions as well, and I cannot tell which of the extracted tables is the most recent one.

How can I find this out? Or, better still: how can I make sure that only the most recent content of a page ends up in an XML export? I don't care about comments, thumbs-up, notifications, and all sorts of other stuff.


0 votes
Answer accepted
Marc - Devoteam
August 20, 2024

Hi @Rieks Joosten 

What is the purpose of making an XML export? Is it used somewhere else afterwards?

The export includes all revisions, as it can be used to import into another Confluence instance and history will be important.

You might want to look into apps on the Marketplace, which offer export options that may suit you better than the full XML export.

Marc - Devoteam
August 20, 2024

Why don't you export the page as PDF or HTML? Those exports contain only the latest version.

export-content-to-word-pdf-html-and-xml 

There is no documentation about the XML structure. The XML option is for backup and restore.

For apps, go to https://marketplace.atlassian.com/ and search for "export".

You can try most apps for a limited time for free.

1 vote
Rieks Joosten August 23, 2024

I tried the single page HTML export before, which for some reason didn't work.
However, I got the site-export working for me, so thanks for pointing that out.
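For anyone landing here with the same goal, below is a minimal sketch of pulling glossary tables out of the exported HTML. It is not an official Confluence tool; it assumes the export writes ordinary `<table>`, `<tr>`, `<th>` and `<td>` elements with explicit closing tags, that glossary tables are not nested inside other tables, and that the first header cell reads exactly `Term`. The names `GlossaryTableParser` and `extract_glossaries` are made up for this example.

```python
from html.parser import HTMLParser


class GlossaryTableParser(HTMLParser):
    """Collect the rows of every table whose first cell reads 'Term'."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished glossary tables, each a list of rows
        self._rows = None  # rows of the table being read, or None
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            # Assumption: a glossary is identified by its first header cell.
            if self._rows and self._rows[0] and self._rows[0][0] == "Term":
                self.tables.append(self._rows)
            self._rows = None


def extract_glossaries(html_text):
    """Return all glossary tables found in one HTML document."""
    parser = GlossaryTableParser()
    parser.feed(html_text)
    parser.close()
    return parser.tables


# Hypothetical sample input, standing in for one exported page.
sample = """
<table><tr><th>Term</th><th>Definition</th></tr>
<tr><td>Party</td><td>An entity that <em>sets</em> its own objectives.</td></tr></table>
<table><tr><th>Name</th></tr><tr><td>ignored</td></tr></table>
"""

for table in extract_glossaries(sample):
    for row in table:
        print(row)
```

To process a real export, read each exported `.html` file and pass its text to `extract_glossaries`; the rows can then be written to whatever format the terminology tooling expects.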

0 votes
Rieks Joosten August 20, 2024

The overall purpose is to allow non-technically oriented people to draft definitions for terms they use in their documentation, which Confluence allows them to do. 

However, we then need to extract these definitions from the various locations in a way that we can further process them with our own terminology management tools, so we can compare the (different) definitions that (different) people use for a particular term, produce glossaries with various contents, etc. 

This only needs to be done after the documentation is approved for release.

As I don't know my way around Confluence, would you have any suggestions for apps or scripts or ... that could help me out? Or is there documentation about the design of the exported XML, particularly about the elements that contain the actual data that people have been editing?

Deployment type: Cloud
Product plan: Standard