File storage limits on Confluence - feasibility

We have 350,000 files, distributed over 54,000 different folders, for a total storage size of 750 GB.

  • Would Confluence be able to handle this number of files (i.e. with one page per folder)?
  • What's the impact on the index (size, ...)?
  • Does anyone have experience migrating this amount of content to Confluence?

 

 

 

1 answer

A very interesting problem indeed!

A simple question to start with: Would each of these 54,000 pages benefit from collaboration that Confluence provides?  I mean to say, would you imagine each of these pages requiring further conversation above and beyond simply adding a comment here or there?

The next question is about storage: do you imagine Confluence storing these files?  Would a page with a link to each of the 54,000 folders be adequate to make the jump into the relevant content? Or do you see this content embedded in some way so that the contents are visible?

The answers to the above would determine which platform(s) you might use, and the answers on index, performance, and migration should then become a bit clearer, I think.

Thanks @Neal Riley

  1. usage: no, the majority of these 54,000 pages will be read-only (or rather download-only) pages. Let's say that about 10% of them will be collaboration pages where people use comments to cooperate.
  2. storage: again, I think that for most of the files a hyperlink to jump to the relevant content would be sufficient. So let's assume that 70% of the content could live in another repo and that 30% is stored in Confluence. Which other file storage are you referring to, then? Git, using LFS? Or more classic WebDAV linking?
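
To get a feel for what these percentages mean in absolute numbers, here is a minimal Python sketch. It uses only the figures quoted in this thread. The function name `split_estimate` is made up for illustration, and the sketch assumes that storage scales evenly with the share of content and that 1 GB = 1000 MB, neither of which has been verified against the real data.

```python
# Figures quoted in the thread for the largest product family.
FILES = 350_000
FOLDERS = 54_000            # one Confluence page per folder
STORAGE_GB = 750

# Rough shares guessed above (estimates, not measurements).
COLLAB_SHARE = 0.10         # pages where people actually collaborate via comments
IN_CONFLUENCE_SHARE = 0.30  # content stored in Confluence rather than linked


def split_estimate(files, folders, storage_gb, collab_share, in_confluence_share):
    """Turn the percentage guesses into absolute numbers.

    Assumption: files and storage are spread evenly, so a 30% share of
    content also means roughly 30% of the files and 30% of the bytes.
    """
    collab_pages = round(folders * collab_share)
    return {
        "collab_pages": collab_pages,
        "read_only_pages": folders - collab_pages,
        "files_in_confluence": round(files * in_confluence_share),
        "storage_in_confluence_gb": storage_gb * in_confluence_share,
        "avg_file_mb": storage_gb * 1000 / files,  # assumes 1 GB = 1000 MB
    }


if __name__ == "__main__":
    estimate = split_estimate(
        FILES, FOLDERS, STORAGE_GB, COLLAB_SHARE, IN_CONFLUENCE_SHARE
    )
    for name, value in estimate.items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions that is roughly 5,400 collaboration pages, about 105,000 files and 225 GB actually held in Confluence, and an average file size of a little over 2 MB.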

 

 

OK, let's work backwards:

  • Storage: The next question then is where/how do these files exist today, and who uses them?  This would dictate how one would store such a file repository.  I would caution against using Confluence as a CDN out of the box.  Something like Bitbucket with the recent LFS support might be a good fit, but again it depends on how these files are (re)used outside of the Atlassian ecosystem.
  • Usage: Assuming the answers to the above, the next thing to ask is: would you need to automatically create/update 5400 pages as files change, move, etc.?  Or would it be a better UX to let users create a page (automatically linked to the file's current location) in Confluence on demand, so that collaboration can happen when the user needs it?  My gut says the second option would be ideal, but this will depend on the customer's requirements.

Neal,

storage: today these files live in a custom CMS. Now they want one big repo for their files, and they want collaboration. So basically, given the number of files, they want to use Confluence as a DMS.

If we used the Git plugin or WebDAV integration, would the files still be indexed by Confluence? I mean, could you still search within these files?

usage: it will be a deeply nested hierarchy of pages: starting from a company homepage, then the different product families, then a page per product, etc.

The numbers given in the initial question were for only one product family (but it's the biggest one). They guess that the total number of files and folders across all product families is about a factor of 2 to 3 higher.
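
As a quick back-of-the-envelope check, the factor of 2 to 3 can be applied to the original figures like this. This is only a sketch: the helper `scale_range` is made up for illustration, it reads "factor 2 to 3" as the total being 2 to 3 times the biggest family, and scaling the 750 GB by the same factor is an assumption, since the guess only covers files and folders.

```python
# Figures for the biggest product family, as given in the initial question.
BIGGEST_FAMILY = {
    "files": 350_000,
    "folders": 54_000,      # one Confluence page per folder
    "storage_gb": 750,      # ASSUMPTION: storage grows by the same factor
}


def scale_range(value, low_factor=2, high_factor=3):
    """Return the (low, high) totals when scaling by the guessed factors."""
    return value * low_factor, value * high_factor


if __name__ == "__main__":
    for name, value in BIGGEST_FAMILY.items():
        low, high = scale_range(value)
        print(f"{name}: {low:,} to {high:,}")
```

So the whole installation would be somewhere around 700,000 to 1,050,000 files and 108,000 to 162,000 folders (pages), and, if storage scales the same way, 1,500 to 2,250 GB.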

Do you have numbers on the biggest Confluence install out there?

Thanks

As for the Git plugin, I would check with the vendor (http://addons.avisi.com/git-for-confluence/documentation/) to see whether they have integrated Git all the way into the Lucene index for searching.

Confluence inherently indexes page content etc., and, depending on which indexing modules are enabled or disabled, will also scan certain file types.  If the information you need is not extracted automatically, you could write a custom extractor as a plugin, using the following information: https://developer.atlassian.com/confdev/confluence-plugin-guide/confluence-plugin-module-types/extractor-module

54,000 pages, representing the largest product as you say, is quite large, but I still question whether the entirety of such a system would necessarily need to be migrated.  In fact, your example highlights a perfect reason why the "biggest Confluence install out there" is a slightly misleading metric: performance is affected by the size, complexity, overall use, underlying infrastructure, and so on.  I would strongly suggest that an install like this would require Confluence Data Center, but this would need to be determined accurately further along, in the testing phase.

 
