Hi all,
we are troubleshooting problems with restoring a Confluence space export.
Short story first: we run our own Confluence Data Center instance, and for the worst case of a total failure we want an offline backup of some Confluence spaces on two local notebooks. So we got two notebooks, installed the same Confluence version as on-prem plus a database (Postgres instead of the MS SQL used on-prem), and are trying to restore the space exports that were made on our on-prem instance.
This takes some time but worked in the past. Now the restore fails with an error saying there might be invalid characters in the export. Using atlassian-xml-cleaner-0.1.jar helped last time, but now I still get an error about an invalid XML character in the export.
So, how should I proceed if the XML cleaner does not work?
Or, more generally: is this the preferred way to make a small offline backup of some Confluence spaces for emergency use, or is there a more efficient way to handle this?
best
edgar
The XML cleaner was not run with Perl; it was run with Java.
The first error message says: "com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: An invalid XML character (Unicode: 0x0) was found in the CDATA section." After running the cleaner, the error message is: "com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Content is not allowed in prolog."
An invalid XML character (Unicode: 0x0) was found in the CDATA section.
The 0x0 is the offending character, which in this case is the "NULL" character:
# https://www.ssec.wisc.edu/~tomw/java/unicode.html#x0000
You can strip specific characters with standard command-line tools such as sed or tr. Searching a bit for 0x0 removal turns up, e.g., https://superuser.com/questions/287997/how-to-use-sed-to-remove-null-bytes
You could try something like this to get rid of it:
sed 's/\x0//g' file1 > file2
You should also be able to grep for the null characters to find the lines that have the problem, e.g.
grep -Pa '\x00' problematic.xml
to check, before and after, whether the tr/sed cleanup worked.
Such characters generally get in when people copy and paste something into Confluence, which accepts and stores it even though restoring that content from an XML backup will fail later.
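NULL is only one of the control characters that XML 1.0 does not allow; the others are 0x01-0x08, 0x0B, 0x0C and 0x0E-0x1F. If the import keeps failing on a different code point after the null is gone, a single tr call can drop all of them at once. This is only a sketch: the file names are made up for the demo, and it assumes the export is UTF-8 (where every byte below 0x80 is a standalone character, so deleting these bytes cannot damage multi-byte characters).

```shell
#!/bin/sh
# Hypothetical demo file with a NUL (octal 000) and a vertical tab (octal 013)
# embedded in the text; a real export would be used here instead.
printf '<a>bad\000byte\013here</a>\n' > sample.xml

# Delete every control byte that XML 1.0 forbids, keeping tab (011),
# line feed (012) and carriage return (015), which are allowed.
tr -d '\000-\010\013\014\016-\037' < sample.xml > sample.clean.xml

cat sample.clean.xml
```

Work on a copy of the export so the original stays available for comparison.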
Testing:
$ cat -A xx3.txt
this is null ^@
// knowing that null is 00, we expect to see '00' at the end of file
// 20 is a space, so 2000 = {space}NULL
$ xxd xx3.txt
00000000: 7468 6973 2069 7320 6e75 6c6c 2000 this is null .
// grepping for it (will grep, but not print the NULL in output)
$ grep -Pa '\x00' xx3.txt
this is null
// removing it
$ sed 's#\x0##g' xx3.txt > xx4.txt
// trying to grep for it now
$ grep -Pa '\x00' xx4.txt || echo 'no match'
no match
// hex dump for good measure, no more null at the end of this file
$ xxd xx4.txt
00000000: 7468 6973 2069 7320 6e75 6c6c 20 this is null
As for the other error you are getting
Unable to complete import: Content is not allowed in prolog.
This doesn't really ring a bell for me. I have done a ton of migrations and a ton of backup modifications, but I haven't seen this one. Did it appear after you ran the Atlassian cleaner?
All I can find is https://confluence.atlassian.com/jirakb/upgrade-fails-due-to-content-is-not-allowed-in-prolog-error-245825960.html which points to the same cleaner you used.
But anyway, perhaps it will go away once the remaining invalid characters are removed. Let's take it one step at a time and get rid of the null first to see what happens next.
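One thing that may be worth checking in the meantime: XML parsers generally report "Content is not allowed in prolog" when something other than whitespace comes before the XML declaration or the root element, for example a byte order mark or stray bytes written by a cleanup step. Whether that is what happened to your export is only a guess. The sketch below builds a hypothetical demo file with a UTF-8 byte order mark in front and shows how to look at the first bytes; you would point it at the cleaned XML file instead.

```shell
#!/bin/sh
# Hypothetical demo file: a UTF-8 byte order mark (EF BB BF, written in octal)
# placed in front of the XML declaration.
f=prolog-demo.xml
printf '\357\273\277<?xml version="1.0"?><root/>\n' > "$f"

# Compare the first five bytes with the expected start of an XML document.
# (head -c is not in POSIX but is available in GNU and BSD head.)
if [ "$(head -c 5 "$f")" = '<?xml' ]; then
  prolog_status="starts with the XML declaration"
else
  prolog_status="unexpected leading bytes"
fi
echo "$f: $prolog_status"

# Dump the first 16 bytes to see exactly what precedes the declaration.
head -c 16 "$f" | od -An -c
```

A file that does not begin with an XML declaration is not necessarily broken, so treat the comparison as a hint and rely on the byte dump.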
Hi @Edgar Fast
Can you check if there are Perl issues as described here in your exported spaces?
The cleaner doesn't fix issues like these, so you need to run the perl command as well.
Test this, and if the issue persists, can you share the exact error, or perhaps raise a support ticket with Atlassian if the data involved is sensitive?
Regards.
If that doesn't help, please include the error message you're getting, because it should indicate the invalid character and/or the tag or line where it was found.