Michelle's "Five Stages of Word Import" Survival Guide

A great thing about the Atlassian Community: we are not alone in our struggles. 

I'd developed some "personal coping mechanisms" for using the Import Word Document feature in Confluence. When I joined the Atlassian Community, I learned that others had the same struggles, some on a much larger scale. So I'm sharing my personal "import survival guide," which has helped me move through the "denial" and "anger" stages of import to the more useful "bargaining" stage, even if I never make it all the way to full "acceptance." 

I don't have all the answers here. My way of "getting closure" is to concede that text-heavy content with simple images and formatting is the best match to the import tool's modest capabilities. Until something changes, having realistic expectations of the tool and taking some steps to prep documents increases my chances of getting a significant head start on building new pages. (It lowers my blood pressure too.)

Denial

I hadn't encountered any issues with importing Word documents until I helped a developer import a particularly stubborn one. My eyes were opened: Word import is both convenient AND confusing. I had to weigh the benefits of instant import against the need to clean up the new pages. Was it worth it? I investigated.

Anger

I experienced the same confusion and frustration as other users, though on a smaller scale. My opinion: we've been spoiled by Word's very robust formatting capabilities and our expectations are high. However, the importer's capabilities are very modest (and sometimes buggy). That's not a good combination! Since "lowering my expectations" isn't a solution, I investigated to see what I could actually do something about. 

Bargaining

In the "bargaining" stage of Word import, I discovered what risk factors existed and how I could mitigate them. This is the bulk of my survival guide.

General

  • Backup. I always make a backup first.
  • File type. Must be .doc or .docx saved in MS Word. (Other apps such as LibreOffice may not encode the file the same way.)
  • File size. If the Word document is larger than Confluence's attachment size limit, I usually save large images to disk (see next section) then remove them from the Word doc. Other options include splitting the document into multiple documents or increasing the attachment file size limit.
  • Scan. I go through each Word document for problem areas and add text "bookmarks" such as CHECK_THIS or FIX_THIS to mark sections for review after import.

Images

Images get some extra attention, since images larger than 900 x 1200 pixels will stop the import process. Screenshots created on large monitors can easily exceed this limit. As suggested by the error message, one option might be to raise the size limit on imported images.

image2021-1-19_16-10-5.png

In addition, images may be resized by the import process; I personally think that converted images appear less crisp and sharp after import. I very often save images to disk and add them to the page afterwards.

  • I set Word to show image dimensions in pixels. In Word, I went to File > Options > Advanced > Display > checked the box for "Show pixels for HTML features". Now I can right-click on an image in Word and choose Size and Position. I can see what size the image is and whether it has been scaled down.
    image2021-1-19_15-32-45.png
  • Reset image size. I use the Reset button here to restore images to 100% size, which seems to improve post-import image quality. It's OK if an image extends past the margins of the document.
  • Large images. If the image is larger than 900 x 1200 pixels, I save it to a folder (right-click and choose "save as picture"), add an "INSERT_PICTURE" reminder with the filename, then remove it from the Word document. After the import, I insert it into the Confluence page. (There are other ways to extract the images from Word docs, such as by using macros or changing the file type to .zip.)
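
The .zip route mentioned above works because a .docx file is a ZIP archive, with embedded pictures normally stored under word/media/. Here is a minimal Python sketch of that idea, not a tested part of this guide. It copies the embedded files to a folder and flags PNGs that exceed the 900 x 1200 limit quoted earlier. The function names are made up for illustration, "either dimension over the limit" is an assumed reading of the limit, only PNG dimensions are read, and older binary .doc files are not covered.

```python
import struct
import zipfile
from pathlib import Path

# Limit quoted in this guide; an instance may be configured differently.
MAX_WIDTH, MAX_HEIGHT = 900, 1200
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) for PNG bytes, or None for any other format."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE:
        return None
    # After the 8-byte signature come the IHDR chunk length (4 bytes),
    # the chunk type b"IHDR" (4 bytes), then width and height (4 bytes each).
    return struct.unpack(">II", data[16:24])


def extract_images(docx_path, out_dir):
    """Copy files under word/media/ to out_dir and return oversized PNGs."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    oversized = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            if not name.startswith("word/media/") or name.endswith("/"):
                continue
            data = archive.read(name)
            target = out / Path(name).name
            target.write_bytes(data)
            size = png_size(data)
            # Assumption: an image is too large if either dimension exceeds the limit.
            if size and (size[0] > MAX_WIDTH or size[1] > MAX_HEIGHT):
                oversized.append((target.name, size))
    return oversized
```

Calling extract_images("report.docx", "extracted") with a hypothetical file name would return a list of (filename, (width, height)) pairs for the images that still need to be removed from the document and added to the page by hand.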

MS Word features, objects or complex formatting

Some MS Word features won't import, while others may return unexpected results. Your mileage may vary based on Confluence and/or Word versions. If I find these elements while scanning, I bookmark them with CHECK_THIS or FIX_THIS, then check them after import.

  • Marked-up images. (Images with added arrows, lines, boxes, callouts, etc.) These markup objects won't import. I either save the image to disk and mark it up in an image editor, or I take a screenshot of the marked-up image and add it to the page afterwards. 

    image2021-1-19_16-33-30.png
  • SmartArt won't import, so I take screenshots.
  • Charts won't import, so I take screenshots.
  • Icons import but I still check them.
  • Fonts don't appear to import. Bold and italic usually do.
  • WordArt and text boxes might get converted to plain text.
  • Tables of contents convert to lists of links that go to new anchors. I add a Table of Contents macro and check that the headings are set correctly. Sometimes I remove the anchors and sometimes I just leave them.
  • Footnotes and/or endnotes may become inline text. 
  • Comments may be omitted or become inline text.
  • Numbering of lists may change.
  • Numbered headers may change.
  • Heading levels may change (for example, heading 2 on a document becomes heading 1 if heading 1 has become the page title), so I review and/or fix.
  • Spacing and carriage returns may change so I scan for paragraph weirdness.
  • Symbols such as μ or characters inserted as "special characters" may or may not import correctly. 
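
Part of the scanning step could be scripted for .docx files. The sketch below is a rough heuristic rather than a complete detector: it assumes Word's usual package layout (charts under word/charts/, SmartArt under word/diagrams/) and the conventional "w:" prefix in word/document.xml. The function name and the list of markers are illustrative assumptions, and anything it misses still needs a manual look.

```python
import zipfile

# Substrings to look for in the main document part (word/document.xml).
BODY_MARKERS = {
    "footnotes": "<w:footnoteReference",
    "endnotes": "<w:endnoteReference",
    "comments": "<w:commentReference",
    "text boxes": "<w:txbxContent",
}
# Folders inside the archive that normally hold these objects.
PART_PREFIXES = {
    "charts": "word/charts/",
    "SmartArt": "word/diagrams/",
}


def find_risky_features(docx_path):
    """Return a sorted list of feature labels that appear to be present."""
    found = []
    with zipfile.ZipFile(docx_path) as archive:
        names = archive.namelist()
        # Assumption: the main part is UTF-8, as Word normally writes it.
        body = archive.read("word/document.xml").decode("utf-8", errors="replace")
    for label, prefix in PART_PREFIXES.items():
        if any(name.startswith(prefix) for name in names):
            found.append(label)
    for label, marker in BODY_MARKERS.items():
        if marker in body:
            found.append(label)
    return sorted(found)
```

The returned labels could serve as a checklist of where to add CHECK_THIS or FIX_THIS bookmarks before import.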

Markup characters

If the Word document contains characters that are special in Confluence wiki markup, such as - (dash), * (asterisk), _ (underscore), | (pipe) or ! (exclamation point), they may be converted to their wiki markup meaning or wrapped in a wiki markup macro to preserve the formatting, especially when they appear at the beginning of a line. Pipe characters in particular can create strange-looking tables with cells and columns all over the place!

  • Markup characters. I add "FIX_THIS" reminders. Other options include adding a ' (apostrophe) at the beginning of the line, or finding/replacing ASCII characters with their entity numbers.

Here's what happened to a code comment box made with pipe characters after import:

image2021-1-19_16-37-43.png

When I added an apostrophe at the beginning of each line, I got this instead, which I can work with.

image2021-1-19_16-38-21.png
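
For text that can be copied out of Word as plain text, the apostrophe workaround can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the character set covers just the five characters named above, and the result would still need to be pasted back into the document before import.

```python
# The characters named in this guide; other markup characters are not covered.
MARKUP_CHARS = "-*_|!"


def flag_markup_lines(text):
    """Return (line_number, line) pairs whose first non-blank character is a markup character."""
    flagged = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.lstrip()
        if stripped and stripped[0] in MARKUP_CHARS:
            flagged.append((number, line))
    return flagged


def guard_markup_lines(text):
    """Prefix an apostrophe to each flagged line, mirroring the manual workaround."""
    guarded = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped and stripped[0] in MARKUP_CHARS:
            line = "'" + line
        guarded.append(line)
    return "\n".join(guarded)
```

flag_markup_lines is useful for deciding where to put FIX_THIS reminders, while guard_markup_lines applies the apostrophe to every flagged line at once.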

Headings

My investigations revealed that identical heading text plus specific import options could lead to overwriting existing pages without any warning! Yikes! So I take the following precautions.

  • I always, always choose "new page" for "Where to import."
  • I always, always choose "rename imported pages if exists" for "Title conflicts."
  • I usually split documents at headings 1, 2 or 3 to avoid nesting pages too deep.
  • Modifying headings slightly before import could reduce this risk as well, though I have not had to do that.
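
Since identical heading text is the risk factor here, a quick duplicate check before import may help. The following Python sketch reads word/document.xml from a .docx file and reports heading text that occurs more than once. It assumes the default English style IDs (Heading1, Heading2, Heading3) and the standard WordprocessingML namespace; localized or customized styles would need different IDs. The function names are hypothetical, and the check only looks inside one document, so it cannot see conflicts with pages that already exist in the space.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def heading_texts(docx_path, max_level=3):
    """Return the text of paragraphs styled Heading1 through Heading<max_level>."""
    wanted = {f"Heading{n}" for n in range(1, max_level + 1)}
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    headings = []
    for para in root.iter(f"{{{W}}}p"):
        style = para.find("w:pPr/w:pStyle", NS)
        if style is None or style.get(f"{{{W}}}val") not in wanted:
            continue
        # A heading may be split across several runs, so join all text nodes.
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        headings.append(text.strip())
    return headings


def duplicate_headings(docx_path, max_level=3):
    """Return heading texts that occur more than once, sorted alphabetically."""
    counts = Counter(heading_texts(docx_path, max_level))
    return sorted(text for text, n in counts.items() if n > 1)
```

Any heading that shows up in the result is a candidate for the slight rewording mentioned in the last bullet.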

image2020-8-20_11-6-16.png

After import

  • I check the top-level pages to ensure that any text before the first heading (page split) has imported correctly. (Sometimes it works, sometimes it doesn't.)
  • I fix the page titles if necessary.
  • I move pages to the correct location in the page tree if necessary. (Sometimes imported pages end up at the root of the page tree.)
  • I tag or label each page as needed.
  • I set any page restrictions necessary.
  • I review and/or fix any issues I bookmarked earlier.
  • I share the newly imported pages for review and collaboration.

Depression

That's a lot of preventative measures, review and fixing. It doesn't scale. I'm sorry.

If I had a lot of Word documents to process, I would consider Confluence add-ons that can bulk import MS Word documents, such as the All-in-one File Importer for Confluence or the Import/Export Utility for Confluence. Or I would look for a converter, browser extension or MS Word plugin to convert the document type. If I find an answer to that dilemma, I'll share.

Acceptance

Some documents do import smoothly. On those days, I feel blessed.

Other times, I end up creating, copying and pasting content one page at a time. Advantages: new pages where I want them; formatting preserved better; I can strip formatting altogether and start over; I'm warned if a page title conflicts; and touching the source document provides a review opportunity. This is often the method I recommend, unless the plan is to rely on Confluence's collaborative authoring to share and refresh the document.

Fortunately, I don't import many Word documents, and the ones I do don't give me too much grief. At least I can now more realistically assess the amount of effort it will take, and can better calculate the return on investment of using the tool. Its limitations are a barrier to sharing via Confluence, so I very much look forward to future improvements. 

If readers have other tips for ensuring Word import success, I'd love to include them in the survival guide!

8 comments

Matt Reiner _K15t_
Marketplace Partner
January 20, 2021

Fantastic article @Michelle Rau HP

Thanks for sharing all your knowledge with the community.

Donna Marr January 20, 2021

Thanks @Matt Reiner _K15t_ and great article @Michelle Rau HP

One struggle that I am encountering is Server vs. Cloud capabilities of Confluence. In particular it seems there are limitations against usage of HTML with Cloud.

Based on this, two questions:

- Does this article refer to Cloud or Server, or does the formatting described apply to both?

- Are there any other great references specific to Cloud and its formatting capabilities (including HTML)?

Thanks,

Donna 

Michelle Rau HP
Rising Star
January 20, 2021

Thank you @Donna Marr. My investigation so far has been in Data Center. I will compare functionality in my personal Cloud instance in the near future. I have gotten used to the add-ons we have at work and I'm not sure what I'll do without them in Cloud!

I too would appreciate a more detailed comparison chart covering specific macros, features and functionality in the different flavors of Confluence. There is this one, https://www.atlassian.com/migration/cloud/explore , but it is a fairly high-level comparison.

Darryl Lee
Community Leader
January 25, 2021

There's a huge need for side-by-side comparisons of DC/Server vs Cloud features/usage. I think maybe I'll start a collaborative doc for this.

Darryl Lee
Community Leader
January 25, 2021

Ok, I've created some very rough tables for people to contribute to:

Basically I did a quick copy/paste of the existing KB articles (referenced in the Notes column), and tried to boil things down to a binary yes/no for each feature.

I know there is much much more to this, down to very detailed differences. I welcome everyone's input (hopefully it doesn't get too out of control.)

Thank you!

Jay Maechtlen
Contributor
September 1, 2022

Thanks for the article. You'll find that for Word graphics, you can:

1) use a Canvas and place the image and its markup on that canvas.

2) Right-click the canvas and 'save as picture'.

3) import the picture back to Word.

This gives good rasterization and includes any border that the Canvas had.

Yeah, if the graphic took any effort to create, save it in a Word doc so you have the original editable version. Makes it much easier if you need to change anything later...

(Oh, yeah - Confluence ignores borders on graphics on import - but doing the above builds it into the image so it shows up in the resulting page.)

cheers

Jay

Jay Maechtlen
Contributor
September 2, 2022

Another aspect of Word import - links and cross-references:

1. External hyperlinks import and work fine

2. hyperlinks to bookmarks in the doc work fine.

3. cross-references to headings and such - {REF } and {PAGEREF } fail.
REF and PAGEREF links get the blue text, they look like links - but they're dead, Jim.

The only workaround I see is to insert a bookmark at each target and link to the bookmark.

(the bookmarks become anchors when the Word doc is imported)

(maybe I can do this with VBA, I have a LOT of cross-refs in my doc)

 

And one last thing: The links issues apply to docs embedded with the Office Word macro.

We're running Confluence 7.13.8

That's a LTS server version, not Cloud.
