How can I handle special characters when importing Word documents?

We have a set of old Microsoft Word documents (primarily .doc) that we want to import into Confluence (5.9.12). Most content imports OK, but Word's "special characters" that were inserted as symbols do not. For example, a μ (mu) symbol from these documents shows up as  in the Confluence web interface. I can import a test .doc file containing both a proper Unicode μ and the non-functional  from the Word documents. The Unicode character works where Word's "symbol" doesn't. So it seems to be a problem of handling whatever Microsoft Word is doing when it stores these special characters. Does anyone know of a way that Confluence could handle this, or failing that, a way we could convert these goofy characters into their Unicode equivalents before importing?

A bit more gory detail on my troubleshooting:

If I copy and paste the  into a text file and check the contents on that one-character file byte-for-byte, I see 0xef81ad, which matches what I get if I copy the character directly from the Word document.  I can also do a manual search-and-replace in the Confluence web interface for that specific  (literally pasting in the box symbol) and put μ in its place, and the replacement leaves alone the other identical-looking but different special characters (like for a degree symbol).  So it does seem that Confluence has all the information after importing, the display is just garbled since it doesn't know that 0xef81ad should be shown as a mu character.  I'm playing around with the XML-RPC API to see if I can do a batch search-and-replace, but then I still need to figure out all the possible characters we'd run into and make sure I can actually get at that weird text via the API.
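The byte check described above can be reproduced with a few lines of Python. This is only a sketch: it decodes the three bytes 0xef81ad as UTF-8 and asks Python's standard `unicodedata` module which code point and general category they correspond to. The variable names are illustrative.

```python
import unicodedata

# The three bytes observed for the garbled mu symbol.
raw = bytes.fromhex("ef81ad")

# Decode them as UTF-8 to get the single character they encode.
ch = raw.decode("utf-8")
code_point = ord(ch)

# The general category tells us what kind of character this is;
# "Co" is the category Unicode assigns to private-use code points.
category = unicodedata.category(ch)

print(f"U+{code_point:04X} category={category}")
```

If the category comes back as `Co`, the character carries no standard meaning of its own, which would explain why a browser cannot know it is meant to be a mu.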

Thanks in advance for any ideas,


2 answers

0 votes
Ann Worley Atlassian Team Jul 11, 2017

Are you using MySQL as the underlying database for your Confluence instance? If so, you could be impacted by: MySQL databases incapable of handling 4byte UTF-8 Characters. Confluence should handle this gracefully.

It looks like some folks experienced Word import issues due to database collation as well:


Thanks Ann.  We're on MariaDB but configured for UTF8.  These symbols look like three bytes and I don't have any errors matching "Incorrect string" in the logs, so I don't think it looks like we're suffering from that problem.

Everything's running smoothly right up until the special characters are presented to the web browser, but then it has no knowledge of how to render them.  When I looked into it more just now, I found that Word is apparently using one of the "private use areas" in Unicode, which are by definition left undefined.  Here's that character being used by Word's mu symbol:

So it seems like any attempt to import these special characters would need to understand Microsoft's custom encoding to handle them properly.  Any chance Confluence's import feature can do that?  Or are we stuck with some kind of search-and-replace to get "real Unicode"?  Thanks!

I also put in a ticket -- CSP-208778 -- before I saw your reply here, thinking this was more likely something Atlassian could help us with directly, but thanks for the quick response on this side.

My support request ticket led to a bug ticket, so it looks like this is a limitation of the import process after all, whatever the database used:

So I suppose the answer to my question for the time being is, perform a search-and-replace on any strange characters, at least until issue 52857 is fixed somehow.  Thanks for your help!
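For anyone attempting the same search-and-replace, here is a rough Python sketch of the idea, not a tested Confluence integration. It assumes you already have the page text as a string (fetching and saving it through the API is left out). The only mapping confirmed in this thread is U+F06D (the bytes 0xef81ad) to μ; the degree entry at U+F0B0 is an assumption based on the Symbol font layout and should be checked against your own documents. The names `PUA_TO_UNICODE` and `replace_symbols` are made up for illustration.

```python
import re

# Private-use code point -> real Unicode character.
# U+F06D -> mu is the case observed in this thread (bytes 0xEF 0x81 0xAD).
# U+F0B0 -> degree sign is an ASSUMPTION; verify before relying on it.
PUA_TO_UNICODE = {
    "\uf06d": "\u03bc",  # Greek small letter mu
    "\uf0b0": "\u00b0",  # degree sign (assumed mapping)
}

# Matches any character in the Basic Multilingual Plane private use area.
PUA_PATTERN = re.compile("[\ue000-\uf8ff]")


def replace_symbols(text):
    """Replace known private-use characters and report the unknown ones.

    Returns a tuple (converted_text, unmapped), where unmapped is a sorted
    list of "U+XXXX" strings for private-use characters with no mapping yet.
    """
    unmapped = set()

    def substitute(match):
        ch = match.group(0)
        if ch in PUA_TO_UNICODE:
            return PUA_TO_UNICODE[ch]
        # Leave unknown characters untouched, but remember them.
        unmapped.add(f"U+{ord(ch):04X}")
        return ch

    return PUA_PATTERN.sub(substitute, text), sorted(unmapped)


converted, unmapped = replace_symbols("5 \uf06dm at 20 \uf0b0C \uf0a7")
print(converted)
print("Still unmapped:", unmapped)
```

Unknown private-use characters are left in place and listed rather than guessed at, so the mapping table can be grown one symbol at a time as new ones turn up in the imported pages.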

Ann Worley Atlassian Team Jul 13, 2017

Thank you so much for circling back to let the Community know the outcome!
