Attempting to Convert Historical Revisions of MediaWiki Pages to Confluence

I'm trying to export MediaWiki pages into Confluence using the Universal Wiki Converter.

I've managed to get all of my MediaWiki pages (with history) saved locally, but whenever I go to convert them to Confluence, I get the following error:

CONVERTER_ERROR Error while parsing xml. Skipping

The UWC UI keeps spitting out this error as it goes through each page. Note that I've added the "exported_mediawiki_pages\Pages" folder to the Pages list and every txt file in that folder is a historical revision of each page.

It works (kinda) whenever I select a single txt file in the Pages directory. But that's a different issue entirely (I think). That is, the page gets converted and uploaded to Confluence just fine, but the page is pretty nonsensical. It just shows the user that created the page, followed by the date it was created, and then the page's contents.

{user:<username>}
{timestamp:20110930164746}

<Page text goes here>

I realize that the {user} and {timestamp} strings are really macros, but shouldn't those execute upon rendering the page? Is there a plugin I'm missing? Furthermore, shouldn't that information be in the page's header where it says:

Added by <last_name>, <first_name>, last edited by <last_name>, <first_name> on <date>
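For anyone inspecting the exported files by hand, here is a minimal Python sketch that splits that header off a revision file and parses the timestamp. It assumes the layout is exactly what the sample above shows (a {user:...} line, then a {timestamp:...} line with 14 digits read as YYYYMMDDHHMMSS); the username "jdoe" and the function name are hypothetical, and this is not how UWC itself processes the header.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample in the question: a {user:...} line,
# then a {timestamp:YYYYMMDDHHMMSS} line, then the page text.
HEADER = re.compile(
    r"\A\{user:(?P<user>[^}]*)\}\s*\n\{timestamp:(?P<ts>\d{14})\}\s*\n?"
)

def split_revision_header(text):
    """Return (user, datetime, body); user and datetime are None without a header."""
    match = HEADER.match(text)
    if match is None:
        return None, None, text
    when = datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S")
    return match.group("user"), when, text[match.end():].lstrip("\n")

# "jdoe" is a made-up username used only for illustration.
sample = "{user:jdoe}\n{timestamp:20110930164746}\n\nPage text goes here\n"
user, when, body = split_revision_header(sample)
print(user, when.isoformat(), repr(body))
```

Read this way, the timestamp in the sample corresponds to 2011-09-30 16:47:46.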

How can I use UWC to convert all of my local MediaWiki pages into Confluence pages while preserving the history?

3 answers

1 accepted

I finally managed to get things working with Confluence 4.1.3.

Historical revisions are now imported into Confluence, along with the author and timestamp of each revision.

I did it by simply checking out the latest devel version of UWC and building it with Ant.

Running the devel version seems to work for everything I need it to. Woot! :)

Thanks for your help, Laura.

Hi Phillip,

You may want to read the configuration options section of the UWC MediaWiki notes. In particular, you're hitting issues with HTML conversion caused by non-well-formed HTML or XML, so check out the advice in the HTML Conversions subsection. You've also turned on the user and timestamp export option, but you haven't yet turned on the associated conversion option, so check out the user/timestamp subsection. You'll need to install the UDMF plugin for that option to work (as Confluence does not support user and timestamp alterations out of the box).

Cheers,

Laura

I already had the UDMF plugin installed, so what I've done is turned on htmltidy:

Mediawiki.0003.xml-use-htmltidy.property=true

Turned on the user/timestamp conversion options:

Mediawiki.0004.userdate.class=com.atlassian.uwc.converters.mediawiki.UserDateConverter
Mediawiki.0004.users-must-exist.property=true
Mediawiki.0004.userdate-disabled.property=false

And I still get the same converter error.

CONVERTER_ERROR Error while parsing xml. Skipping

This time I tried exporting a random page by itself and it didn't report any errors (like usual), but when I looked at it in Confluence it was badly garbled. Turns out the page uses Wiki table formatting (using brackets, pipes, etc.) rather than the good ol' fashioned table, tr, and td HTML tags. Does UWC not support Wiki table formatting? Or is there something I'm still missing?

Ok, so in the HTML Conversions subsection of the doc it says:

Option 1: try the default settings (you've done that).

Option 2: you have non-well-formed HTML (and possibly XML, but let's not worry about that yet), so try htmltidy (you've done that).

Option 3: you probably have both non-well-formed HTML and XML, or some other XML-parsing-related problem in your content, which means you can't use htmltidy as the sole solution. (The doc follows this with some suggestions that it sounds like you haven't tried yet.)

At this point in a conversion testing process I would do a couple of things:

a) examine the specific error message in the uwc.log to determine what the issue is for each page that's having a problem. Could be entity related. Could be illegal characters or bytes. (There's some troubleshooting advice in the HTML Conversions link I gave you earlier that you might want to take a closer look at.)

b) examine the page content of the pages that are failing to see what kind of data there is which will help with determining course of action. (you've mentioned checking one of them, we'll get to that below)
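As a rough aid for steps a) and b), here is a minimal Python sketch that flags content likely to upset a strict XML parser: characters outside the XML 1.0 character range, named entities other than the five XML predefines, and bare ampersands. It applies general XML 1.0 rules only; whether UWC's parser trips on exactly these cases is an assumption, and the function name and sample string are made up for illustration.

```python
import re

# Anything outside the XML 1.0 "Char" production is illegal in a document.
ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)
# Only these five named entities are defined by XML itself.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")
# An ampersand that does not start a named or numeric entity reference.
BARE_AMPERSAND = re.compile(r"&(?!#?[A-Za-z0-9]+;)")

def xml_hazards(text):
    """Summarize content that a strict XML parser is likely to reject."""
    illegal = {f"U+{ord(ch):04X}" for ch in ILLEGAL_XML_CHARS.findall(text)}
    undefined = {
        name for name in NAMED_ENTITY.findall(text) if name not in XML_PREDEFINED
    }
    return {
        "illegal_chars": sorted(illegal),
        "undefined_entities": sorted(undefined),
        "bare_ampersands": len(BARE_AMPERSAND.findall(text)),
    }

# Made-up sample: one bare ampersand, one HTML-only entity, one control character.
sample = "Tom & Jerry&nbsp;&amp; friends\x0b"
print(xml_hazards(sample))
```

Running something like this over the failing pages can narrow down whether the problem is entity related or caused by stray control characters before you dig into individual log entries.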

re: one page vs. a group of pages

So, a group of pages is going to be made up of very different data. It's unusual for every page in a data set to have the same XML parsing errors (if there are errors) because the problems are caused by the data itself, which is different from page to page. So, if you have converter errors with a group of pages and not with one in particular, that just means some of your pages have the problem and some don't. You can identify which pages are having a problem by examining the uwc.log.
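To make that log check less tedious, here is a minimal Python sketch that pulls out the lines containing CONVERTER_ERROR and counts the distinct messages. The sample text is stitched together from lines quoted in this thread; the real uwc.log layout, and how it ties an error to a page name, is not shown here, so treat the format as an assumption and use the reported line numbers to read the surrounding entries.

```python
from collections import Counter

def error_lines(log_text, marker="CONVERTER_ERROR"):
    """Return (line_number, line) pairs for log lines that contain the marker."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if marker in line
    ]

def summarize(log_text, marker="CONVERTER_ERROR"):
    """Count how often each distinct message follows the marker."""
    counts = Counter()
    for _, line in error_lines(log_text, marker):
        counts[line.split(marker, 1)[1].strip()] += 1
    return counts

# Sample assembled from lines quoted in this thread; a real uwc.log will
# contain more detail around each error.
sample_log = "\n".join([
    "2012-02-13 14:44:17,369 INFO  [Thread-5] - Uploading Pages to Confluence...",
    "CONVERTER_ERROR Error while parsing xml. Skipping",
    "CONVERTER_ERROR Error while parsing xml. Skipping",
])

for number, line in error_lines(sample_log):
    print(number, line)
print(summarize(sample_log))
```

To run it against a real file, read the text first, for example with pathlib's Path("uwc.log").read_text(errors="replace"), where the path is whatever your UWC installation uses.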

re: table - Mediawiki vs HTML

The Mediawiki converter handles both types of syntax, but that doesn't mean it can handle every possible variation of syntax combinations. For example, if your mediawiki table syntax contains additional html within the mediawiki syntax style table sections, that's not supported.

So, your options: try turning off the XmlConverter and turning on the optional unnested HTML converters. That may help with some portion of your HTML problems. It will certainly get rid of the parsing errors. I would recommend analysing your source data to determine how much of it is made up of which kinds of syntax. This will help you decide how much needs to be converted automatically by a tool and how much you would rather transform manually with a team of content writers. You may also choose at that point to look into developing custom converters to handle pervasive use cases that aren't already handled automatically. The UWC is an open source extensible framework, and the expectation is that you can update the converters to suit your needs. If you do not have an in-house development team, there are contractors available who do this kind of consulting.
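One way to do that kind of source analysis is a quick survey script. The Python sketch below classifies each page by whether it contains MediaWiki-style tables, HTML table tags, and HTML tags inside a MediaWiki-style table (the combination described above as unsupported). The regular expressions are rough heuristics, not a MediaWiki parser: nested tables and unusual markup will be misclassified, and the page names and contents in the sample are made up.

```python
import re
from collections import Counter

# A MediaWiki table runs from a line starting with "{|" to a line starting with "|}".
WIKI_TABLE = re.compile(r"^\{\|.*?^\|\}", re.MULTILINE | re.DOTALL)
HTML_TABLE_TAG = re.compile(r"<\s*(?:table|tr|td|th)\b", re.IGNORECASE)
ANY_HTML_TAG = re.compile(r"<\s*/?\s*[A-Za-z][A-Za-z0-9]*\b[^>]*>")

def classify(text):
    """Report which table syntaxes a page appears to use."""
    wiki_tables = WIKI_TABLE.findall(text)
    return {
        "wiki_table": bool(wiki_tables),
        "html_table": bool(HTML_TABLE_TAG.search(text)),
        "html_inside_wiki_table": any(ANY_HTML_TAG.search(t) for t in wiki_tables),
    }

def survey(pages):
    """Tally how many pages fall into each category; pages maps name -> text."""
    totals = Counter()
    for text in pages.values():
        totals.update(key for key, found in classify(text).items() if found)
    return totals

# Hypothetical pages for illustration only.
pages = {
    "Mixed": '{| class="wikitable"\n|-\n| <b>bold</b> || plain\n|}\n',
    "PlainHtml": "<table><tr><td>cell</td></tr></table>\n",
    "NoTables": "Just some text.\n",
}
print(survey(pages))
```

Pages that land in the html_inside_wiki_table bucket are the likeliest candidates for manual cleanup or a custom converter.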

Cheers,

Laura

Sorry for taking so long to get back to this; I've been working through simply getting the current revisions of every MediaWiki page to export/convert properly. Now that I'm finally there, I'm back to this.

I've decided to take a more narrowly-scoped approach to getting historical revisions into Confluence. I'm currently using one page with only two revisions and both revisions have well-formed HTML and MediaWiki syntax.

The revisions are indicated by "-#.txt" at the end of each filename.
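To check which revision files belong to which page, here is a minimal Python sketch that groups filenames by page title and sorts them by revision number. It assumes only the "-#.txt" suffix convention described above; the filenames in the sample are hypothetical, and a page whose real title ends in a hyphen followed by digits would be ambiguous under this pattern.

```python
import re
from collections import defaultdict

# "<title>-<revision number>.txt", per the naming convention described above.
REVISION_NAME = re.compile(r"^(?P<title>.+)-(?P<rev>\d+)\.txt$")

def group_revisions(filenames):
    """Map each page title to its revision files, ordered by revision number."""
    groups = defaultdict(list)
    for name in filenames:
        match = REVISION_NAME.match(name)
        if match:  # files without a revision suffix are ignored
            groups[match.group("title")].append((int(match.group("rev")), name))
    return {title: [name for _, name in sorted(revs)] for title, revs in groups.items()}

# Hypothetical filenames for illustration only.
names = ["Home-2.txt", "Home-10.txt", "Home-1.txt", "Release-Notes-1.txt", "README.txt"]
print(group_revisions(names))
```

Sorting on the integer revision rather than the raw filename keeps revision 10 after revision 2, which a plain alphabetical sort would get wrong.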

When I try to convert the pages and upload them into Confluence (v4.1.3), I get an error indicating that the page already exists (I'm thinking it means the first revision was already uploaded and since both pages have the same name, it's throwing this error since it doesn't like overwriting pages):

2012-02-13 14:44:17,369 INFO  [Thread-5] - Uploading Pages to Confluence...
2012-02-13 14:44:18,006 INFO  [Thread-5] - page added may already exist
org.apache.xmlrpc.XmlRpcException: java.lang.Exception: com.atlassian.confluence.rpc.RemoteException: Unsupported operation: Wiki formatted content can no longer be retrieved from this API. Please use
 the version 2 API. The version 2 WSDL is available at: https://wiki-test.aviation.garmin.com/rpc/soap-axis/confluenceservice-v2?wsdl. XML-RPC requests should prefixed with "confluence2.". Please use
getPageSummary() to get page data without its content.
        at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
        at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
        at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
        at ...
2012-02-13 14:44:18,033 INFO  [Thread-5] - Uploaded 1 out of 2 page.
2012-02-13 14:44:18,034 INFO  [Thread-5] - Conversion Complete

Any thoughts?

Phillip, Laura, I'm interested in doing the same thing. Were either of you able to get the multi-page import to work?
