
Technical issues during restore of space export

Hi all,

We are running into issues when restoring a Confluence space export.

The short version: we run our own Confluence Data Center instance and, for the worst case of a total failure, we want an offline backup of some Confluence spaces on two local notebooks. To that end we installed the same Confluence version as on-prem on both notebooks, together with a database (Postgres instead of the MS SQL used on-prem), and we are trying to restore space exports that were made on our on-prem instance.

This takes some time but worked in the past. Now, when restoring, we get an error saying there might be invalid characters in the export. Running atlassian-xml-cleaner-0.1.jar helped last time, but now I still get an error about an invalid XML character in the export.

So, how should I proceed if the XML cleaner does not work?

Or, more generally: is this the preferred way to make a small offline backup of some Confluence spaces for emergency use, or is there a more efficient way to handle this?

 

best
edgar

2 answers

0 votes

The XML cleaner was not run with Perl; it was run with Java.

The first error message says: "com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: An invalid XML character (Unicode: 0x0) was found in the CDATA section." After running the cleaner, the error message changes to: "com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: Content is not allowed in prolog."

An invalid XML character (Unicode: 0x0) was found in the CDATA section.

The 0x0 is the offending character, which is the NULL character (U+0000):

 # https://www.ssec.wisc.edu/~tomw/java/unicode.html#x0000

 

You can strip specific characters with standard command-line tools such as sed or tr. A quick search for 0x0 removal turns up, e.g., https://superuser.com/questions/287997/how-to-use-sed-to-remove-null-bytes

You could try something like this to get rid of it (the \x escape is GNU sed syntax):

sed 's/\x0//g' file1 > file2
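As an alternative to the sed one-liner, tr can drop the same bytes, and the set can be widened to the other control characters that XML 1.0 does not allow (everything below 0x20 except tab, line feed and carriage return). This is only a minimal sketch, assuming the export is UTF-8 or another ASCII-compatible encoding; the function name strip_invalid_xml_chars and the file names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: reads XML on stdin, writes a cleaned copy to stdout.
# Deletes NUL plus the other control bytes that XML 1.0 forbids:
# 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F (written in octal for tr).
# Tab (0x09), LF (0x0A) and CR (0x0D) are kept.
strip_invalid_xml_chars() {
    # LC_ALL=C makes tr work on raw bytes regardless of the locale.
    LC_ALL=C tr -d '\000-\010\013\014\016-\037'
}

# Usage (file names are placeholders):
#   strip_invalid_xml_chars < file1 > file2
```

Because this works byte by byte, it leaves multi-byte UTF-8 sequences alone: all of their bytes are 0x80 or above.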

 

You should also be able to grep for those null characters to find the affected lines, e.g.

grep -Pa '\x00' problematic.xml

Run it before and after the cleanup to see whether the tr/sed command worked.
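If grep -P is not available (it is a GNU grep option), counting the null bytes gives a similar before/after check. A small sketch using only tr and wc; count_nul_bytes is a hypothetical name, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: prints how many NUL (0x00) bytes a file contains.
count_nul_bytes() {
    # tr -cd deletes everything except NUL, wc -c counts what is left,
    # and the last tr strips the padding some wc implementations add.
    LC_ALL=C tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Usage (file name is a placeholder):
#   count_nul_bytes problematic.xml
```

A count of 0 after the cleanup means no null bytes are left in that file.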

 

Such characters generally get in when people copy and paste content into Confluence, which accepts and stores it even though it will later fail to restore that content from an XML backup.

 

Testing:

$ cat -A xx3.txt
this is null ^@


// knowing that null is 00, we expect to see '00' at the end of file
// 20 is a space, so 2000 = {space}NULL
$ xxd xx3.txt
00000000: 7468 6973 2069 7320 6e75 6c6c 2000 this is null .

// grepping for it (will grep, but not print the NULL in output)
$ grep -Pa '\x00' xx3.txt
this is null

// removing it
$ sed 's#\x0##g' xx3.txt > xx4.txt

// trying to grep for it now
$ grep -Pa '\x00' xx4.txt || echo 'no match'
no match

// hex dump for good measure: no more null at the end of this file
$ xxd xx4.txt
00000000: 7468 6973 2069 7320 6e75 6c6c 20 this is null

 

 

As for the other error you are getting:

Unable to complete import: Content is not allowed in prolog.

 

This doesn't ring a bell for me. I have done a ton of migrations and backup modifications, but I have not run into this one. Did it appear after you ran the Atlassian cleaner?

All I could find is https://confluence.atlassian.com/jirakb/upgrade-fails-due-to-content-is-not-allowed-in-prolog-error-245825960.html, which points to the same cleaner you used.

Either way, it might go away once the remaining invalid characters are removed. Let's take it one step at a time: get rid of the nulls first and see what happens next.
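For what it's worth, "Content is not allowed in prolog" is generally what the Java XML parser reports when it meets something other than markup before the document's first tag, so it may be worth looking at the first few bytes of the cleaned file. A rough sketch, assuming the file is supposed to start with an `<?xml` declaration; check_xml_prolog is a made-up name and this is not an Atlassian-provided tool:

```shell
#!/bin/sh
# Hypothetical helper: reports what the first five bytes of a file look like.
check_xml_prolog() {
    # Dump the first 5 bytes as hex and squeeze out the whitespace.
    first=$(od -An -tx1 -N5 "$1" | tr -d ' \n')
    case "$first" in
        3c3f786d6c) echo "ok" ;;                 # file starts with "<?xml"
        efbbbf*)    echo "bom" ;;                # UTF-8 byte order mark first
        *)          echo "unexpected: $first" ;; # something else comes first
    esac
}

# Usage (file name is a placeholder):
#   check_xml_prolog problematic.xml
```

Anything other than "ok" would at least show which bytes sit in front of the declaration; whether that is the actual cause here is a guess until the file has been inspected.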

0 votes
Ismael Jimoh Community Leader Aug 29, 2022

Hi @Edgar Fast 

 

Can you check whether your exported spaces have the Perl issues described here?

The cleaner doesn't fix issues like these, so you need to run the Perl command as well.

Test this and, if the issue persists, can you share the exact error, or raise a support ticket with Atlassian if the data involved is sensitive?

Regards.

If that doesn't help, please include the error message you're getting, because it should indicate the invalid character and/or the tag or line where it was found.

Deployment type: Server
Version: 7.3.4