Removing Invalid Characters from Confluence Restore entities.xml

I recently ran into a long-standing space restore error on Confluence 7.15.1 (Data Center):

Import failed. Check your server logs for more information. com.atlassian.confluence.importexport.ImportExportException: Unable to complete import: An invalid XML character (Unicode: 0xb) was found in the CDATA section.

These are my notes on resolving it with PowerShell.

Background

Known Confluence bug: Unable to complete import: An invalid XML character (Unicode: 0xffff) was found in the CDATA section

https://jira.atlassian.com/browse/CONFSERVER-38089

Solution

Atlassian solution references:

Identifying Invalid Characters: https://confluence.atlassian.com/confkb/how-to-identify-invalid-characters-on-a-confluence-page-911180364.html

Removing Invalid Characters: https://confluence.atlassian.com/jira/removing-invalid-characters-from-xml-backups-12079.html

Unicode Lookup (0xb is the vertical tab control character): https://unicodelookup.com/#0xb/1
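Before removing anything, it can help to see which invalid characters are actually present and where. The following is a minimal sketch, not part of the Atlassian guidance: the function name Find-InvalidXmlChar is hypothetical, and it only looks for the control characters and U+FFFE/U+FFFF that XML 1.0 disallows.

```powershell
# Hypothetical helper: lists characters in a string that XML 1.0 does not allow.
function Find-InvalidXmlChar {
    param(
        [Parameter(Mandatory = $true)]
        [string]$Text
    )
    # C0 control characters other than tab, LF and CR, plus U+FFFE and U+FFFF.
    $pattern = '[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]'
    foreach ($m in [regex]::Matches($Text, $pattern)) {
        [pscustomobject]@{
            Offset    = $m.Index
            CodePoint = 'U+{0:X4}' -f [int][char]$m.Value
        }
    }
}

# Example on an in-memory string that contains a vertical tab (0xB).
$sample = "<![CDATA[bad" + [char]0x0B + "text]]>"
$found = @(Find-InvalidXmlChar -Text $sample)
$found | Format-Table -AutoSize
```

To scan the real file, the text could be loaded with [System.IO.File]::ReadAllText and passed to -Text. A large entities.xml is read fully into memory this way.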

On a Windows computer, extract the restore zip, use a PowerShell script to remove the invalid characters from entities.xml, then re-zip the restore file.

Content of PowerShell script:

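# Paths to the extracted entities.xml and the cleaned copy to write.
$yourfile = "C:\Users\x889390\Downloads\ADTA Wiki Copy\entities.xml"
$outputfile = "C:\Users\x889390\Downloads\ADTA Wiki Copy\entities_clean.xml"

function Repair-XmlString {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true, Position = 0, ValueFromPipeline = $true)]
        [ValidateNotNullOrEmpty()]
        [string]$String
    )
    process {
        # Characters XML 1.0 does not allow: C0 controls other than tab, LF and CR,
        # U+FFFE/U+FFFF, and unpaired surrogates. .NET regex reads only two hex
        # digits after \x, so code points above 0xFF must be written as \uXXXX.
        $rPattern = '[\x00-\x08\x0B\x0C\x0E-\x1F\uFFFE\uFFFF]' +
                    '|[\uD800-\uDBFF](?![\uDC00-\uDFFF])' +
                    '|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]'
        $String -replace $rPattern, ''
    }
}

# Read and write as UTF-8 (no BOM) explicitly so non-ASCII page content is not re-encoded.
$utf8NoBom = New-Object System.Text.UTF8Encoding($false)
$raw = [System.IO.File]::ReadAllText($yourfile, $utf8NoBom)
[System.IO.File]::WriteAllText($outputfile, (Repair-XmlString $raw), $utf8NoBom)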
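The script only writes a cleaned copy of the file. The cleaned file still has to go back into the restore zip under the name entities.xml. A minimal sketch of that step using .NET's ZipFile class follows. The function name Update-ExportZip and its parameters are hypothetical, and it assumes entities.xml sits at the root of the export zip.

```powershell
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: copies an export zip, swapping in a cleaned entities.xml.
# Assumes entities.xml sits at the root of the zip.
function Update-ExportZip {
    param(
        [Parameter(Mandatory = $true)][string]$SourceZip,
        [Parameter(Mandatory = $true)][string]$CleanEntitiesPath,
        [Parameter(Mandatory = $true)][string]$DestinationZip
    )
    # Temporary working folder; ExtractToDirectory creates it.
    $work = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
    try {
        [System.IO.Compression.ZipFile]::ExtractToDirectory($SourceZip, $work)
        # Overwrite the original entities.xml with the cleaned copy.
        Copy-Item -LiteralPath $CleanEntitiesPath -Destination (Join-Path $work 'entities.xml') -Force
        # CreateFromDirectory fails if the destination already exists.
        if (Test-Path -LiteralPath $DestinationZip) { Remove-Item -LiteralPath $DestinationZip }
        [System.IO.Compression.ZipFile]::CreateFromDirectory($work, $DestinationZip)
    }
    finally {
        if (Test-Path -LiteralPath $work) { Remove-Item -LiteralPath $work -Recurse -Force }
    }
}

# Example call (paths are placeholders):
# Update-ExportZip -SourceZip 'C:\temp\export.zip' -CleanEntitiesPath 'C:\temp\entities_clean.xml' -DestinationZip 'C:\temp\export_clean.zip'
```

Pass full paths, since the .NET methods resolve relative paths against the process working directory, not the current PowerShell location.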

Useful references for PowerShell

https://powershellexplained.com/2017-07-31-Powershell-regex-regular-expression/

https://stackoverflow.com/questions/45706565/how-to-remove-special-bad-characters-from-xml-using-powershell

https://stackoverflow.com/questions/70451059/on-converting-the-uft-8-xml-to-unicode-in-powershell-encoding-attribute-value
