Import of existing Documentation: Best Practice for Automation, Administration and Naming of Images

Steffen Heller
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 7, 2014

Hello,

I have a huge task before me:

We have existing documentation of approx. 300 pages and 500 images that must be transferred from another content management system ("Help&Manual") into Confluence.

What is important is the existing structure of that documentation:
Currently, all the images live in one separate directory, and the pages where they appear link to them there. This makes updating images very easy. Instead of browsing through the pages to search for outdated images, my colleague just goes to that image directory, checks the images and uploads new versions; everything else happens automatically.

What I am searching for is a process that allows me to achieve the following:

  1. Automatically transfer the existing documentation into Confluence
  2. Preserve the existing page structure (chapters, subchapters, pages, etc.)
  3. Have all the images of the documentation in one central place (from where they are automatically linked to the pages)
  4. Preserve the existing names of all those images (e.g. keep names like "DeviceA_FunctionB.png" and not change them into something like "worddavb444d7b3bded254698986b55a645d015.png")

I am aware of two methods that would help me achieve one half of what I want:

  • With the "Import Word Document" feature I can transfer the existing documentation. (I have a Word version of this documentation available and I have done that before with other documents. So that would work.) -> Point 1 + 2
  • For the images I could create one page that functions as a "container", attach all the images to that one page and then insert them into other pages with a link to that container page. -> Point 3 + 4

BUT:

I cannot combine these two methods:

  • When I do a Word import, the images are attached to all the different pages where they appear, and the original image names are lost (the "worddavb44....a645d015.png" example above)
  • Creating a container page must be done "by hand" and I don't know of any way to combine that with an automated approach

So what I want should be clear. I am searching for something that is better than...

  • ...doing a Word import and afterwards checking hundreds of pages and images and correcting the image sources, or
  • ...creating a container page and afterwards splitting it apart, creating new pages and copying and pasting all the pages' content by hand

It would be great if someone had some ideas.

Thanks, Steffen

--------------------------

We are using Confluence (Download), version 4.3.7. The other CMS is "Help&Manual".
What is available: all the image files, all the .mnl text files (the Help&Manual format) and one big Word file with the complete documentation, exported from the old CMS.

3 answers

0 votes
Steffen Heller
Rising Star
May 7, 2014

Hi Discountrobot,

Thanks for explaining. I am afraid I lack the experience needed for that. I will look up some of the keywords, but I think this is too big a step for now.

0 votes
Steffen Heller
Rising Star
May 7, 2014

I must correct myself.

The .mnl files are not the main text files. I just realized that the content is stored in ordinary .xml files. Additionally, the whole content is also available as .htm files (the CMS can also be used to create .chm help files). What the .mnl files do, I don't know. That must be some generic file type.
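
Since the content also exists as .htm files, one way to find out which image belongs to which page is to read the image references straight out of that HTML. The following is only a minimal sketch using Python's standard library; it assumes the exported pages use ordinary `<img src="...">` tags, which has not been checked against a real Help&Manual export, and the names `ImageRefCollector` and `image_names_in` are made up for illustration.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath


class ImageRefCollector(HTMLParser):
    """Collects the file names of all <img src="..."> references in one page."""

    def __init__(self):
        super().__init__()
        self.image_names = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:
            # Keep only the file name; the directory part belongs to the
            # old export layout and is not needed in Confluence.
            normalized = src.replace("\\", "/")
            self.image_names.append(PurePosixPath(normalized).name)


def image_names_in(html_text):
    """Return the image file names referenced in html_text, in document order."""
    collector = ImageRefCollector()
    collector.feed(html_text)
    return collector.image_names
```

Run over every .htm file, this gives a page-to-images list that keeps the original file names such as "DeviceA_FunctionB.png".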

The thing is, I do have all the original files available but I haven't even installed the CMS software to open them. My plan didn't include doing too much with these files anyhow. I was hoping to get away with a Word export/import.

But if there is a better approach I could change that plan.

@Discountrobot:

What kind of folder structure do you mean?

The finished document has a normal chapter hierarchy with three levels of headings.
In Confluence I would like to rebuild that structure with pages/subpages/subsubpages and a "container" page for the images.
The Help&Manual files are spread over lots of different folders. Obviously there are different folders for xml, html, images and others. But I don't know exactly how they interact.
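
To illustrate the mapping from three heading levels to pages/subpages/subsubpages: given the headings in document order, each page's parent is simply the most recent heading one level higher. This is a rough sketch, not tied to any Help&Manual file format; the function name `assign_parents` and the `(level, title)` input shape are assumptions made for the example.

```python
def assign_parents(headings):
    """Map a flat heading list to (title, parent_title) pairs.

    headings: list of (level, title) tuples in document order, level 1..3.
    A top-level heading gets parent_title None. If a level is skipped
    (e.g. level 1 followed directly by level 3), the parent is also None.
    """
    result = []
    current = {}  # level -> most recent title seen at that level
    for level, title in headings:
        parent = current.get(level - 1) if level > 1 else None
        result.append((title, parent))
        current[level] = title
        # A new heading closes every deeper level opened before it.
        for deeper in [lvl for lvl in current if lvl > level]:
            del current[deeper]
    return result
```

The resulting pairs can then be used to create each page under its parent page, top level first.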

discountrobot
Rising Star
May 7, 2014

Alright. I don't have any experience with the Word importer, so I cannot help you with that.

My initial thought was to fetch the files in some format (.xml, .mnl) and parse them accordingly, with something like the following steps:

  1. Parse the tree structure
  2. Create the space with a remote API (I prefer XML-RPC)
  3. Parse each page and its child pages and resolve image URLs
  4. Create the page with a remote API and attach said images
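
Steps 2 and 4 could look roughly like the sketch below, using Python's standard `xmlrpc.client`. It assumes the remote API is enabled on the server and uses the `confluence2` XML-RPC methods `login`, `storePage` and `addAttachment` as I understand them from the Confluence remote API documentation; the helper names (`connect`, `create_page`, `attach_image`) are made up for illustration and nothing here has been run against a 4.3.7 instance.

```python
import mimetypes
import xmlrpc.client
from pathlib import Path


def connect(base_url, user, password):
    """Return (api, token) for the server's XML-RPC endpoint (assumed path /rpc/xmlrpc)."""
    server = xmlrpc.client.ServerProxy(base_url.rstrip("/") + "/rpc/xmlrpc")
    api = server.confluence2
    return api, api.login(user, password)


def create_page(api, token, space_key, title, body, parent_id=None):
    """Create a page in storage format and return its id."""
    page = {"space": space_key, "title": title, "content": body}
    if parent_id is not None:
        # Ids are passed as strings over XML-RPC.
        page["parentId"] = str(parent_id)
    return api.storePage(token, page)["id"]


def attach_image(api, token, page_id, path):
    """Attach one image file to a page, keeping its original file name."""
    path = Path(path)
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    attachment = {"fileName": path.name, "contentType": content_type}
    data = xmlrpc.client.Binary(path.read_bytes())
    return api.addAttachment(token, str(page_id), attachment, data)
```

Because the file name is taken from the file on disk, names like "DeviceA_FunctionB.png" survive the upload unchanged. Looping `attach_image` over the image directory with the id of a single container page would put all images in one place.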

I'm not sure what the link syntax is for referencing images that reside on a different page, but I'd create a '_resource' page decoupled from the space tree and reference the images from there.
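
For what it's worth, my understanding of the Confluence 4.x storage format is that an image attached to another page is referenced by nesting a page identifier inside the attachment identifier. The sketch below builds that markup; treat the exact element layout as an assumption and verify it by inserting one image by hand and looking at the page's storage format. The function name `image_macro` is made up for illustration.

```python
from xml.sax.saxutils import quoteattr


def image_macro(file_name, container_title="_resource"):
    """Build storage-format markup for an image attached to another page.

    container_title is the title of the page holding all images
    (assumed to be in the same space as the referencing page).
    """
    # quoteattr escapes the value and adds the surrounding quotes.
    return (
        "<ac:image>"
        f"<ri:attachment ri:filename={quoteattr(file_name)}>"
        f"<ri:page ri:content-title={quoteattr(container_title)}/>"
        "</ri:attachment>"
        "</ac:image>"
    )
```

While parsing each page, every resolved image reference would be replaced by the output of `image_macro(...)`, so that uploading a new version of the attachment on the '_resource' page updates every page that shows it.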

I've done something very similar when creating a parser for the DITA DTD.

0 votes
discountrobot
Rising Star
May 7, 2014

Can you provide a sample of the .mnl format and maybe an example of your folder structure?
