Importing Existing Documentation: Best Practices for Automation, Administration and Naming of Images

Hello,

I have a huge task ahead of me:

We have existing documentation of approx. 300 pages and 500 images that must be transferred from another content management system ("Help&Manual") into Confluence.

What matters is the existing structure of that documentation:
Currently, all images live in one separate directory, from which they are linked into the pages where they appear. This makes updating images very easy: instead of browsing through the pages in search of outdated images, my colleague just goes to that image directory, checks the images, uploads new versions, and everything else happens automatically.

What I am searching for is a process that allows me to achieve the following:

  1. Automatically transfer the existing documentation into Confluence
  2. Preserve the existing page structure (chapters, subchapters, pages, etc.)
  3. Keep all the images of the documentation in one central place (from where they are automatically linked into the pages)
  4. Preserve the existing names of all those images (e.g. keep names like "DeviceA_FunctionB.png" instead of changing them into something like "worddavb444d7b3bded254698986b55a645d015.png")

I know of two methods, each of which covers half of what I want:

  • With the "Import Word Document" feature I can transfer the existing documentation. (I have a Word version of this documentation, and I have done this before with other documents, so that would work.) -> Points 1 + 2
  • For the images I could create one page that acts as a "container", attach all the images to that page, and then insert them into other pages by referencing that container page. -> Points 3 + 4
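For reference, Confluence 4.x stores page content in an XHTML-based storage format, and an image attached to a different page is referenced by naming that page inside the attachment element. The sketch below builds that markup; the helper name `container_image` and the title "Image Container" are hypothetical, chosen only for illustration.

```python
from xml.sax.saxutils import quoteattr


def container_image(filename: str, container_title: str) -> str:
    """Return storage-format markup that shows an image attached to
    another page (the hypothetical "container"), keeping its original name."""
    # quoteattr() adds the surrounding quotes and escapes &, <, > and quotes.
    return (
        "<ac:image>"
        f"<ri:attachment ri:filename={quoteattr(filename)}>"
        f"<ri:page ri:content-title={quoteattr(container_title)}/>"
        "</ri:attachment>"
        "</ac:image>"
    )


if __name__ == "__main__":
    # "Image Container" is an assumed title for the central image page.
    print(container_image("DeviceA_FunctionB.png", "Image Container"))
```

If the container page lives in a different space, the `ri:page` element also takes an `ri:space-key` attribute.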

BUT:

I cannot combine these two methods:

  • When I do a Word import, the images are attached to each of the pages where they appear, and their names are completely destroyed (see the "worddavb44....a645d015.png" example above).
  • Creating a container page must be done "by hand", and I don't know of any way to combine that with an automated approach.

So what I want should be clear. I am looking for something better than...

  • ...doing a Word import and then checking hundreds of pages and images afterwards and fixing each image's source file, or
  • ...creating a container page and then splitting it apart, creating new pages and copying and pasting all the pages' content by hand.

It would be great if someone could come up with some ideas.

Thanks, Steffen

--------------------------

We are using Confluence (Download edition), version 4.3.7. The other CMS is "Help&Manual".
Available are all the image files, all the .mnl text files (the Help&Manual format) and one big Word file with the complete documentation, exported from the old CMS.


Can you provide a sample of the .mnl format and maybe an example of your folder structure?

I must correct myself.

The .mnl files are not the main text files. I just realized that the content is stored in normal .xml files. Additionally, the whole content is also available as .htm files (the CMS can also be used to create .chm help files). I don't know what the .mnl files do; they must be some generic file type.

The thing is, I do have all the original files available but I haven't even installed the CMS software to open them. My plan didn't include doing too much with these files anyhow. I was hoping to get away with a Word export/import.

But if there is a better approach I could change that plan.

@Discountrobot:

What kind of folder structure do you mean?

The finished document has a normal chapter hierarchy with three levels of headings.
In Confluence I would like to rebuild that structure with pages/subpages/subsubpages and a "container" page for the images.
The Help&Manual files are spread over lots of different folders, obviously with different folders for XML, HTML, images and others. But I don't know exactly how they interact.
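Mapping three heading levels to pages/subpages/subsubpages is mostly bookkeeping: each heading's parent is the most recent heading one level up. A minimal sketch, assuming the headings have already been extracted in document order as (level, title) pairs; the input shape and the name `build_page_tree` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def build_page_tree(headings: List[Tuple[int, str]]) -> List[Tuple[str, Optional[str]]]:
    """Map (level, title) headings, in document order, to (title, parent_title).

    Level 1 headings become top-level pages (parent None); a level-n heading
    becomes a child of the closest preceding heading of level n-1.
    """
    last_at_level: Dict[int, str] = {}  # level -> most recent title at that level
    result = []
    for level, title in headings:
        if level > 1 and (level - 1) not in last_at_level:
            raise ValueError(f"heading {title!r} skips a level")
        parent = last_at_level[level - 1] if level > 1 else None
        last_at_level[level] = title
        # Forget deeper levels so a later subsection cannot attach to a stale parent.
        for deeper in [lvl for lvl in last_at_level if lvl > level]:
            del last_at_level[deeper]
        result.append((title, parent))
    return result
```

Titles serve as keys here, which fits Confluence's rule that page titles are unique within a space; the pages can then be created top-down so every parent exists before its children.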

Alright. I don't have any experience with the Word importer, so I cannot help you with that.

My initial thought was to take the files in some parseable format (.xml, .mnl) and process them with steps like the following:

  1. Parse the tree structure
  2. Create the space with a remote API (I prefer XML-RPC)
  3. Parse each page and its children and resolve the image URLs
  4. Create each page with a remote API and attach said images
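Steps 3 and 4 could look roughly like this in Python. This is a sketch under several assumptions: the exported .htm topics are used as input (the .xml schema isn't shown in this thread), the remote API is enabled in Confluence 4.x, and pages are created through its `confluence2` XML-RPC namespace. `ImageCollector`, `image_names` and `store_page` are hypothetical helpers, and converting the topic HTML itself into Confluence storage format is left out.

```python
import posixpath
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import unquote, urlparse


class ImageCollector(HTMLParser):
    """Collect the file names referenced by <img> tags in an exported .htm topic."""

    def __init__(self) -> None:
        super().__init__()
        self.filenames: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # "images/DeviceA_FunctionB.png" -> "DeviceA_FunctionB.png"
            # (backslashes are normalised in case the export uses Windows paths)
            path = unquote(urlparse(src).path).replace("\\", "/")
            self.filenames.append(posixpath.basename(path))


def image_names(html: str) -> List[str]:
    """Step 3: return the image file names used by one topic, in order."""
    collector = ImageCollector()
    collector.feed(html)
    collector.close()
    return collector.filenames


def store_page(server, token: str, space: str, title: str,
               content: str, parent_id: Optional[str] = None) -> str:
    """Step 4: create a page via confluence2.storePage and return its id.

    Assumed setup (not run here):
      server = xmlrpc.client.ServerProxy("https://<your-confluence>/rpc/xmlrpc")
      token = server.confluence2.login("user", "password")
    `content` must already be in Confluence storage format (XHTML).
    """
    page = {"space": space, "title": title, "content": content}
    if parent_id is not None:
        page["parentId"] = parent_id  # makes the new page a child page
    return server.confluence2.storePage(token, page)["id"]
```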

I'm not sure how links to images residing on a different page are structured, but I'd create a '_resource' page decoupled from the space tree and reference the images from there.
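The upload side of the '_resource' page idea could be sketched as follows, again assuming the Confluence 4.x `confluence2` XML-RPC API. `upload_images` and the attachment comment text are hypothetical. Because every file is attached under its original name, names like "DeviceA_FunctionB.png" survive the import, and, as far as I know, re-uploading a file with the same name to that page adds a new attachment version rather than a second file.

```python
import mimetypes
import os
import xmlrpc.client
from typing import List


def upload_images(server, token: str, resource_page_id: str, image_dir: str) -> List[str]:
    """Attach every file in image_dir to one resource page, keeping its name.

    `server`/`token` are assumed to come from xmlrpc.client.ServerProxy and
    server.confluence2.login(...); subdirectories are skipped.
    """
    uploaded = []
    for name in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, name)
        if not os.path.isfile(path):
            continue
        content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
        with open(path, "rb") as fh:
            data = xmlrpc.client.Binary(fh.read())  # sent as base64 over XML-RPC
        attachment = {
            "fileName": name,  # original name, e.g. "DeviceA_FunctionB.png"
            "contentType": content_type,
            "comment": "Imported from Help&Manual",
        }
        server.confluence2.addAttachment(token, resource_page_id, attachment, data)
        uploaded.append(name)
    return uploaded
```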

I've done something very similar when creating a parser for the DITA DTD.

Hi Discountrobot,

thanks for explaining. I'm afraid this is where I lack the necessary experience. I will look up some of the keywords, but I think this is too big a step for now.
