
Migrating data from Confluence to Google Drive

Hi,

 

My team is currently planning to move our documentation to the shared drive. I tried searching for an add-on or a way to do it in one go but unfortunately, I haven't found any.

 

Is there a way to migrate data from Confluence to Google Drive in one go?

3 answers

What I did:

- Export the whole space in HTML format

- Use Pandoc to convert the HTML files to .docx. Do it in the folder where you unzipped the archive containing the Confluence files, so that the .docx conversion includes all the images and links

A nice Windows script is available here: https://gist.github.com/pagelab/fa73b9ee0263a7d25ae7eba420ad8f67

Pandoc: https://pandoc.org/

 

- Last, write a script to recreate the folder hierarchy (I can't post mine here)
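For anyone not on Windows, the Pandoc step above could be sketched in Python roughly as follows. This is a minimal sketch, not the script from the gist: the function names and the `dry_run` parameter are made up for illustration, and it assumes the export's HTML pages sit directly in one folder and that `pandoc` is on your PATH.

```python
import subprocess
from pathlib import Path


def pandoc_command(html_file: Path) -> list[str]:
    """Build the Pandoc call that writes a .docx next to the HTML file."""
    docx_file = html_file.with_suffix(".docx")
    # File names only: the command is meant to run inside the export folder.
    return ["pandoc", html_file.name, "-f", "html", "-t", "docx", "-o", docx_file.name]


def convert_export(export_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one Pandoc command per .html file."""
    commands = []
    for html_file in sorted(export_dir.glob("*.html")):
        cmd = pandoc_command(html_file)
        commands.append(cmd)
        if not dry_run:
            # Run from the export folder so relative image paths resolve.
            subprocess.run(cmd, cwd=export_dir, check=True)
    return commands
```

Calling `convert_export(Path("my-export"))` only returns the commands so you can check them first; pass `dry_run=False` to actually run Pandoc. The folder name here is a placeholder.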

If you want to move your documents from Atlassian to Google Drive, then:

  1. Click Space Settings from the sidebar menu in Confluence. 
  2. Click Content Tools
  3. Click Export
  4. Pick HTML and click Next.
  5. Choose Custom Export
  6. Download the zip file to your desktop
  7. Expand the zip file
  8. Open Google Drive and create a folder
  9. Copy the expanded files to the Google Drive folder
  10. Right-click an HTML file and select "Open with" > "Google Docs"
  11. Open index.html first. This creates the file tree that each file references
  12. Delete each HTML file once "Open with" has converted it (you still have the export on your desktop)

This is laborious if you have a ton of files, but will get the job done.
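The local part of the steps above (expanding the zip and working out which files to open, index.html first) can be scripted. A minimal sketch using only the Python standard library; `extract_export` is a made-up helper name, and the upload and "Open with" steps in Google Drive remain manual.

```python
import zipfile
from pathlib import Path


def extract_export(zip_path: Path, target_dir: Path) -> list[Path]:
    """Unzip the export and return its HTML files, index.html first."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    html_files = sorted(target_dir.rglob("*.html"))
    # False sorts before True, so index.html moves to the front;
    # the sort is stable, so the other files keep their alphabetical order.
    html_files.sort(key=lambda p: p.name != "index.html")
    return html_files
```

The returned list gives you a checklist of pages to convert, which helps when a space has a lot of them.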

Confluence, please be more transparent and helpful. There are times where it is important to move these files. It could be company edict, financial, whatever! If you create more reasons to stay on the platform, there should be less risk in doing so.

Thanks for your help, Frank!

I get an error with the document references.

I opened index.html first with the "Open with" > "Google Docs" option and then converted the other .html files.

The link tree tries to reference the pages inside Google Docs, but the paths aren't correct and end up as 404 pages. Am I missing something here?

I also tried it the other way round: converting all the other HTML files first and then index.html. Same result...

Thanks 
Dennis
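One way to see exactly which targets the link tree points at is to list the relative links in index.html before converting anything. The sketch below is only a diagnostic under assumptions: it assumes the export links its pages with plain relative `href` values, and `LinkCollector` and `local_links` are made-up names. A converted Google Doc gets its own URL, so relative targets like these would likely stop resolving, which may be what produces the 404s.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, link text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside an <a href=...> element.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def local_links(html: str) -> list[tuple[str, str]]:
    """Return links with relative targets, i.e. other files in the export."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [
        (href, text)
        for href, text in parser.links
        # Skip empty hrefs, absolute URLs (they have a scheme) and in-page anchors.
        if href and not urlparse(href).scheme and not href.startswith("#")
    ]
```

Running `local_links(Path("index.html").read_text())` (with `from pathlib import Path`) gives the list of page files each entry expects, which you can compare against what actually ended up in the Drive folder.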


Hi @Anand Vardhan 

How do you want the data exported?

  • PDF,
  • HTML,
  • XML ?

Confluence has a built-in Export that lets you pick the data you want and export it in any of the three formats above.

 

-Mike

@mike  Could you please guide me on how to do the bulk data export? Also, the preferred format is .doc or .docx

Hi @Anand Vardhan 

You can't export to .doc or .docx, the three choices are PDF, HTML, XML. 

To get the Export section do the following: 

  1. Click Space Settings from the sidebar menu in Confluence. 
  2. Click Content Tools
  3. Click Export
  4. Pick HTML or XML or PDF and click Next.
  5. Pick Normal Export (Generates a PDF file for each page in this space, excluding blogs, comments, and attachments) or Custom Export (Generates a PDF file of selected pages based on options that you choose from below).
  6. With Custom Export, choose which pages you want, or select all or deselect all.

 

-Mike

What a disappointing reply.

Confluence should support the mass export of Word files so that its customers can more easily migrate away from the service if required.

This sort of 'lock in' can only generate bad feeling.


Same question for us, same disappointment.


The same for us. 

The silence from Atlassian on the comments is deafening! 

Guys, I think you're being a little difficult. .docx is a proprietary format that is controlled by an external party. The other formats, PDF, XML and HTML are all open formats and you can actually take those, extract needed information and get a .docx format if you needed to.

The XML and HTML formats are especially good because they are easy to import into other applications, whereas DOCX is not, and DOCX is subject to non-sanctioned changes that Atlassian would then have to support.

My personal opinion: The supported options check all the boxes of open formats, export is possible, and ... you're not running a professional organisation if you believe that your knowledge repository is better served in static word pages from Microsoft. 

@Simon Kaastrup-Olsen You are wrong. DOCX is not a proprietary format; it is implemented close to the ISO standard for Open XML. So Atlassian doesn't need to support it, and exporting to DOCX is actually much easier than exporting to PDF (which Atlassian already has). With DOCX it is also much easier to move documents between Google Docs, OneDrive and so on, since the documents can be edited in the cloud.

Export to Markdown should be the chosen path here.  Atlassian should offer export to Markdown.

@Ilya Mokin, while the XML encoding might be open, I don't see how the schema is. E.g. MS could rename, add or remove any XML elements (<toc></toc>) it wants without a prior "standards" request-for-comment process. If the DOCX schema/document structure is an open standard, then there should be a standards document with versioning, like RFC standards documents.

 

While I wish there was an export to GDoc, I'm glad that there is at least a mass export to PDF. Just cut-paste when one needs to edit the doc.

 IF DOCX schema/document structure is open standards, then there should be a standards document with versioning, like RFC standards documents.

yes...

https://docs.microsoft.com/en-us/openspecs/office_standards/ms-docx/b839fe1f-e1ca-4fa6-8c26-5954d0abbccd

