I'm trying to export MediaWiki pages into Confluence using the Universal Wiki Converter.
I've managed to get all of my MediaWiki pages (with history) saved locally, but whenever I go to convert them to Confluence, I get the following error:
CONVERTER_ERROR Error while parsing xml. Skipping
The UWC UI keeps spitting out this error as it goes through each page. Note that I've added the "exported_mediawiki_pages\Pages" folder to the Pages list, and every txt file in that folder is a historical revision of a page.
It works (kinda) whenever I select a single txt file in the Pages directory, but that leads to a different issue entirely (I think): the page gets converted and uploaded to Confluence just fine, but the result is pretty nonsensical. It just shows the user that created the page, followed by the date it was created, and then the page's contents:
{user:<username>} {timestamp:20110930164746} <Page text goes here>
I realize that the {user} and {timestamp} strings are really macros, but shouldn't those execute upon rendering the page? Is there a plugin I'm missing? Furthermore, shouldn't that information be in the page's header where it says:
Added by <last_name>, <first_name>, last edited by <last_name>, <first_name> on <date>
How can I use UWC to convert all of my local MediaWiki pages into Confluence pages while preserving the history?
I finally managed to get things working with Confluence 4.1.3.
Historical revisions are now imported into Confluence, along with the author and timestamp of each revision.
I did it by simply checking out the latest devel version of UWC and building it with Ant.
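In case it helps anyone else, the build boiled down to a checkout and an Ant build. Roughly (the repository URL is a placeholder here; use whatever the UWC source page lists, and check build.xml for the right target):

    svn checkout <uwc-devel-repo-url> uwc-devel
    cd uwc-devel
    ant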
Running the devel version seems to work for everything I need it to. Woot! :)
Thanks for your help, Laura.
Ok, so in the HTML Conversion subsection of the doc it says:
option 1, try default settings (you've done that)
option 2, you have non-wellformed HTML (and possibly XML, but let's not worry about that yet), try htmltidy (you've done that)
option 3, you probably have both non-wellformed HTML and XML, or some other XML-parsing-related problem in your content, which means you can't use htmltidy as the sole solution (the doc then follows with some suggestions that I'm hearing you haven't tried yet)
At this point in a conversion testing process I would do a couple of things:
a) examine the specific error message in the uwc.log to determine what the issue is for each page that's having a problem. Could be entity related. Could be illegal character bytes. (There's some troubleshooting advice in the Html Conversions link I gave you earlier that you might want to take a closer look at.)
b) examine the page content of the pages that are failing to see what kind of data there is which will help with determining course of action. (you've mentioned checking one of them, we'll get to that below)
re: one page vs. a group of pages
So, a group of pages is going to be comprised of very different data. It's unusual for every page in a data set to have the same XML parsing errors (if there are errors) because the problems are caused by the data itself, which differs from page to page. So, if you have converter errors with a group of pages and not with one page in particular, that just means some of your pages have the problem and some don't. You can identify which pages are having a problem by examining the uwc.log.
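Something like this will pull the relevant entries out of the log (a rough sketch; it assumes the lines just before each error name the page file, so adjust the amount of context to taste):

    grep -B 2 "CONVERTER_ERROR" uwc.log    # show 2 lines of leading context around each error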
re: table - Mediawiki vs HTML
The Mediawiki converter handles both types of syntax, but that doesn't mean it can handle every possible variation of syntax combinations. For example, if your mediawiki table syntax contains additional HTML inside the mediawiki-style table sections, that's not supported.
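Concretely, a construct like this, where raw HTML cells sit inside a mediawiki pipe-syntax table, is the kind of mix that isn't supported (a made-up illustration, not from your data):

    {| border="1"
    |-
    | a plain mediawiki cell
    | <td>an html cell nested inside the mediawiki table</td>
    |}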
So, your options: try turning off the XmlConverter and turning on the optional unnested html converters. That may help with some portion of your html problems, and it will certainly get rid of the parsing errors. I would recommend analysing your source data to determine how much of it is made up of which kinds of syntax. This will help you decide how much needs to be converted automatically by a tool and how much you would rather transform manually with a team of content writers. You may also choose at that point to look into developing custom converters to handle pervasive use cases that aren't already handled automatically. The UWC is an open source, extensible framework, and the expectation is that you can update the converters to suit your needs. If you do not have an in-house development team, there are contractors available who do this kind of consulting.
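As a rough sketch of what that toggle looks like in the converter properties file (the key number and class name below are illustrative; check your converter.mediawiki.properties for the real entries):

    ## turning off a converter just means commenting out its entry, e.g.:
    #Mediawiki.NNNN.xml.class=com.atlassian.uwc.converters.xml.XmlConverter
    ## then uncomment the optional unnested html converter entries documented nearby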
Cheers,
Laura
Sorry for taking so long to get back to this, I've been working through simply getting the current revisions of every MediaWiki page to export/convert properly. And now that I'm finally there, I'm back to this.
I've decided to take a more narrowly-scoped approach to getting historical revisions into Confluence. I'm currently using one page with only two revisions and both revisions have well-formed HTML and MediaWiki syntax.
The revisions are indicated by "-#.txt" at the end of each filename.
When I try to convert the pages and upload them into Confluence (v4.1.3), I get an error indicating that the page already exists. (I'm thinking the first revision was already uploaded, and since both revisions share the same page name, it throws this error because it doesn't like overwriting pages.)
2012-02-13 14:44:17,369 INFO [Thread-5] - Uploading Pages to Confluence...
2012-02-13 14:44:18,006 INFO [Thread-5] - page added may already exist
org.apache.xmlrpc.XmlRpcException: java.lang.Exception: com.atlassian.confluence.rpc.RemoteException: Unsupported operation: Wiki formatted content can no longer be retrieved from this API. Please use the version 2 API. The version 2 WSDL is available at: https://wiki-test.aviation.garmin.com/rpc/soap-axis/confluenceservice-v2?wsdl. XML-RPC requests should prefixed with "confluence2.". Please use getPageSummary() to get page data without its content.
    at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeException(XmlRpcClientResponseProcessor.java:104)
    at org.apache.xmlrpc.XmlRpcClientResponseProcessor.decodeResponse(XmlRpcClientResponseProcessor.java:71)
    at org.apache.xmlrpc.XmlRpcClientWorker.execute(XmlRpcClientWorker.java:73)
    at ...
2012-02-13 14:44:18,033 INFO [Thread-5] - Uploaded 1 out of 2 page.
2012-02-13 14:44:18,034 INFO [Thread-5] - Conversion Complete
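Digging into the trace, it looks like Confluence 4.x wants the v2 remote API (methods prefixed with "confluence2."). Just to sanity-check the server, I can hit the v2 API directly with the same Apache XML-RPC client the trace shows. A rough sketch, with placeholder host, credentials, space key, and page title:

    import java.util.Vector;
    import org.apache.xmlrpc.XmlRpcClient;

    public class V2ApiCheck {
        public static void main(String[] args) throws Exception {
            // placeholder host; the error above points at this server's v2 endpoint
            XmlRpcClient client = new XmlRpcClient("https://wiki.example.com/rpc/xmlrpc");

            // log in via the v2 API (note the "confluence2." prefix)
            Vector<Object> login = new Vector<Object>();
            login.add("username");   // placeholder
            login.add("password");   // placeholder
            String token = (String) client.execute("confluence2.login", login);

            // getPageSummary() is what the error message suggests for fetching
            // page data without its (wiki-formatted) content
            Vector<Object> page = new Vector<Object>();
            page.add(token);
            page.add("SPACEKEY");     // placeholder space key
            page.add("Page Title");   // placeholder page title
            System.out.println(client.execute("confluence2.getPageSummary", page));
        }
    }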
Any thoughts?
Phillip, Laura, I'm interested in doing the same thing. Were either of you able to get the multi-page import to work?
Hi Phillip,
You may want to read the configuration options section of the uwc mediawiki notes. In particular, you're hitting issues with HTML conversion caused by non-wellformed HTML or XML, so check out the advice in the HTML Conversions subsection. You've also turned on the user and timestamp export option but haven't yet turned on the associated conversion option, so check out the user/timestamp subsection. You'll need to install the UDMF plugin for that option to work (Confluence does not support user and timestamp alterations out of the box).
Cheers,
Laura
I already had the UDMF plugin installed, so what I've done is turn on htmltidy:
Mediawiki.0003.xml-use-htmltidy.property=true
Turned on the user/timestamp conversion options:
Mediawiki.0004.userdate.class=com.atlassian.uwc.converters.mediawiki.UserDateConverter
Mediawiki.0004.users-must-exist.property=true
Mediawiki.0004.userdate-disabled.property=false
And I still get the same converter error.
CONVERTER_ERROR Error while parsing xml. Skipping
This time I tried exporting a random page by itself and it didn't report any errors (like usual), but when I looked at it in Confluence it was pretty garbled. It turns out the page uses wiki table formatting (brackets, pipes, etc.) rather than the good ol' fashioned table, tr, and td HTML tags. Does UWC not support wiki table formatting? Or is there something I'm missing still?
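To illustrate the difference (both snippets are made up, not my actual content), the page builds tables like this:

    {|
    |-
    | cell one || cell two
    |}

rather than like this:

    <table><tr><td>cell one</td><td>cell two</td></tr></table>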