How to retain Word document formatting when converting to Confluence?

Hi community,

We have many MS Word documents (most in the 10-30 page range) that we'd like to convert to Confluence pages, but the resulting pages lose quite a bit of the formatting and would create a huge amount of manual cleanup for us. These Word files include tables (with and without merged cells), images, numbered lists, bullet lists with hierarchies, and more. Specifically, the loss of text alignments and tabs will likely cause us the most aggravation, since the spacing in most of these documents is critical. Copy/pasting tab characters seems to work well enough, but copy/pasting has other limitations that might be just as bad.

I'm not a programmer, but I thought that we might have better results if, rather than importing or copy/pasting, we convert the Word files to code (HTML or XML or ???) and then use a source code editor plugin to copy it into a Confluence page's source code. That keeps throwing errors when we try it, though, I guess because the languages don't exactly match up. Maybe this could work if we did it the right way or made a couple of tweaks to the process?

So in a nutshell, my question is: what's the best way nowadays to convert Word documents into Confluence pages so that you lose as little content/formatting as possible? If initial setup takes a while but creates a repeatable process, it would be worth it because we have many documents.

Thanks so much for any help!

2 answers

1 accepted

1 vote
Answer accepted
James Dellow Community Leader May 03, 2019

Confluence stores content in what they call Confluence storage format - it is 'XHTML-based', but not pure XML or normal HTML, as it contains special tags related to Confluence functionality.
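To illustrate why pasted HTML can throw errors in a source editor: an XHTML-based format has to be well-formed XML, and HTML exported from Word often is not (for example, unclosed `<br>` tags). The following is a minimal sketch of a pre-check using Python's standard library. The function name `find_xml_error` and the sample strings are hypothetical, and this is not Confluence's own validator. It only checks plain XHTML, so Confluence's special macro tags and named entities such as `&nbsp;` will be reported as errors by this parser even where Confluence may accept them.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def find_xml_error(fragment: str) -> Optional[str]:
    """Return the parser's error message if the fragment is not well-formed XML, else None."""
    # Wrap the fragment in a dummy root so several top-level elements can be parsed together.
    wrapped = f"<root>{fragment}</root>"
    try:
        ET.fromstring(wrapped)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical samples: the first is typical loose HTML, the second is the XHTML equivalent.
loose_html = "<p>Line one<br>Line two</p>"   # <br> is never closed
xhtml = "<p>Line one<br />Line two</p>"      # self-closed, as XML requires

print(find_xml_error(loose_html))  # prints a parser error message
print(find_xml_error(xhtml))       # prints None
```

Running a check like this before pasting can show which line and column the parser objects to, which is often quicker than trial and error in the editor.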

Some of the issues could be CSS related too, particularly for text alignments and tabs.

What do your documents look like if you use a standalone Word-to-HTML converter? Try using one that is designed to help people publish content drafted in Word so it can be copied into a generic Web Content Management System. They'll strip some of the incompatible formatting from Word.

But if this is causing you enough pain and retaining the formatting is important to your business, I would consider engaging a developer to help solve this.

Thanks James.  I did only some limited research into converters thinking that the Save As feature in Word basically did the same thing, but that is not at all true as I'm finding out! 

Today I tried https://documentconverter.pro/ and both the desktop app and online app are working much better than other methods so far. The desktop app also has the option to do multiple files at once, which will certainly come in handy. If anyone has recommendations for the most useful Word-to-HTML converters, I'm all ears.

Thanks James!

0 votes
Bill Bailey Community Leader May 04, 2019

The first thing to keep in mind is that this is HTML, so you have to think in terms of what is possible there. For example, tabs are a foreign concept in HTML. And you shouldn't really be using them much in Word either (too many people use Word like a typewriter).
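One way to carry tab-aligned content over to HTML is to turn it into a table, since a table is the HTML construct that keeps columns lined up. The sketch below assumes the text has been exported from Word as plain text with tab characters between columns. The function name `tabs_to_table` and the sample data are hypothetical, and the output is a plain HTML table, not a complete Confluence page.

```python
from html import escape


def tabs_to_table(text: str) -> str:
    """Turn tab-separated lines into a plain HTML table, one row per non-blank line."""
    rows = [line.split("\t") for line in text.splitlines() if line.strip()]
    width = max((len(row) for row in rows), default=0)
    html_rows = []
    for row in rows:
        # Pad short rows so every row has the same number of cells.
        cells = row + [""] * (width - len(row))
        html_rows.append(
            "<tr>" + "".join(f"<td>{escape(c.strip())}</td>" for c in cells) + "</tr>"
        )
    return "<table><tbody>" + "".join(html_rows) + "</tbody></table>"


# Hypothetical sample: three columns, with the last cell missing on one line.
sample = "Part\tQty\tNotes\nBolt M6\t40\nWasher\t80\tZinc & steel"
print(tabs_to_table(sample))
```

Text that uses tabs only for indentation rather than for columns would need a different treatment, such as list nesting.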

And even if you try to paste HTML into the source editor, and even if it doesn't throw errors, it will often strip out most manual formatting.

The editor in Confluence is limited by design, but there are things you can replicate with custom CSS and user macros. It will take some work, and if you want to control the format tightly, you need to become a power user.

Bottom line, you will have to stop thinking in Word and move to thinking in Confluence and using its macros and methods for formatting content.

Yes, that's the idea.  It's just the conversion to Confluence that is the challenge now.  Once we're up and running, we plan on making full use of Confluence's macros and other formatting features.  Thanks Bill for the input.

Bill Bailey Community Leader May 06, 2019

Generally my process for Word docs is to import them, then use Regex in the Source Editor to clean out all the low-level formatting, then go from there.
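The exact patterns used in this process are not given in the thread. As a rough illustration only, the sketch below shows the kind of regex cleanup that removes low-level formatting from Word-generated HTML. It is written in Python so it can be run outside the editor. The function name `strip_low_level_formatting`, the choice of patterns, and the sample markup are all assumptions. Regex is a blunt tool for HTML, so results should be checked on real pages. Note that removing `style` attributes also removes alignment information.

```python
import re


def strip_low_level_formatting(html: str) -> str:
    """Remove inline formatting that Word exports tend to leave behind (assumed patterns)."""
    # Drop style, class and lang attributes from any tag.
    html = re.sub(
        r"\s+(?:style|class|lang)\s*=\s*(?:\"[^\"]*\"|'[^']*')",
        "",
        html,
        flags=re.IGNORECASE,
    )
    # Drop <span> and <font> tags but keep the text inside them.
    html = re.sub(r"</?(?:span|font)\b[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop Office-namespaced tags such as <o:p>.
    html = re.sub(r"</?o:[^>]*>", "", html, flags=re.IGNORECASE)
    # Drop paragraphs that contain only whitespace or non-breaking spaces.
    html = re.sub(r"<p>(?:\s|&nbsp;)*</p>", "", html, flags=re.IGNORECASE)
    return html


# Hypothetical sample of markup produced by a Word import.
imported = (
    '<p class="MsoNormal" style="margin-left:36pt">'
    '<span style="font-family:Calibri">Step 1</span><o:p></o:p></p>'
    '<p class="MsoNormal">&nbsp;</p>'
)
print(strip_low_level_formatting(imported))
```

The same patterns could be adapted to a find-and-replace dialog that supports regular expressions, applying one pattern at a time and reviewing the page after each pass.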

Bill_Bailey,
I find myself in the same situation as Miguel.
We have a very large number of existing documents that we are looking to import into Confluence, but formatting is very important.
Could you please elaborate on your comment about your Word import process using Regex?
Thanks in advance.

Bill_Bailey,
I find myself in the same situation as Miguel and Jaime.
We too have a large number of existing documents that we are looking to import into Confluence, but formatting is very important.
Could you please elaborate on your comment about your Word import process using Regex?

Or any other workaround, please.
Thanks in advance.

Bill Bailey Community Leader Aug 21, 2020

There is a source editor plugin you can install. I then use that source editor with Regex to clean up the imported HTML. It is best to start with clean HTML when working with imported content.

Once you have clean HTML, you can adjust the formatting using Confluence tools.
