We have 350,000 files spread over 54,000 different folders, for a total of 750 GB of storage.
A very interesting problem indeed!
A simple question to start with: would each of these 54,000 pages benefit from the collaboration that Confluence provides? In other words, do you imagine each of these pages needing further conversation beyond a comment here or there?
The next question is about storage: do you imagine Confluence storing these files itself? Would creating a page with a link to each of the 54,000 folders be enough to make the jump into the relevant content? Or do you see this content embedded in some way, so that the contents are visible?
The answers to the above would determine the platform(s) you might use, and the answers on indexing, performance and migration would then become a bit clearer, I think.
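To make the "page with a link to each folder" option a bit more concrete, here is a minimal sketch of the JSON body such a page might use, assuming pages are created through the Confluence REST content endpoint (`POST /rest/api/content`). The function name, space key, parent page id and folder URL are all hypothetical placeholders, and the snippet only builds the payload; it does not call Confluence.

```python
import html
import json


def build_link_page(space_key, parent_id, folder_name, folder_url):
    """Build the payload for one page that simply links to one folder.

    All argument values are illustrative; nothing here comes from the
    real system discussed in this thread.
    """
    # Storage format is XHTML-based, so escape the URL and the link text.
    link = '<p><a href="{}">{}</a></p>'.format(
        html.escape(folder_url, quote=True),
        html.escape(folder_name),
    )
    return {
        "type": "page",
        "title": folder_name,
        "space": {"key": space_key},
        "ancestors": [{"id": parent_id}],  # parent page in the hierarchy
        "body": {"storage": {"value": link, "representation": "storage"}},
    }


# Hypothetical example values.
payload = build_link_page(
    "DOCS", 12345, "Pump A & B manuals",
    "https://files.example.com/pumps/a-b?x=1&y=2",
)
print(json.dumps(payload, indent=2))
```

One thing to keep in mind with this approach: page titles must be unique within a space, so folders that share a name in different branches would need a disambiguated title.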
Thanks @Neal Riley
Ok let's work backwards
Storage: today these files live in a custom CMS. Now they want one big repository for their files, with collaboration on top. So basically, given the number of files, they want to use Confluence as a DMS.
If we used the Git plugin or a WebDAV integration, would the files then still be indexed by Confluence? I mean, could you still search within these files?
Usage: it will be a deeply nested hierarchy of pages, starting from a company homepage, then the different product families, then a page per product, etc.
The numbers given in the initial question were for only one product family (but it's the biggest one). They guess that the total number of files and folders across all product families is about two to three times that.
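As a rough back-of-the-envelope check, the figures above can be scaled like this. The sketch assumes the "factor 2 to 3" applies to the overall total (that is, the total is two to three times the single-family numbers) and that storage grows in the same proportion, which the thread does not state.

```python
# Figures quoted for the largest product family.
FILES = 350_000
FOLDERS = 54_000
STORAGE_GB = 750


def scale(factor):
    """Scale the single-family figures by an assumed overall factor."""
    return {
        "files": FILES * factor,
        "folders": FOLDERS * factor,
        "storage_gb": STORAGE_GB * factor,
    }


low, high = scale(2), scale(3)
print("low estimate: ", low)
print("high estimate:", high)
```

Under those assumptions the whole estate would be somewhere around 700,000 to 1,050,000 files in 108,000 to 162,000 folders, and roughly 1.5 to 2.25 TB.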
Do you have numbers on the biggest Confluence install out there?
As for the Git plugin, I would check with the vendor (http://addons.avisi.com/git-for-confluence/documentation/) to see whether they have integrated Git all the way through to the Lucene index for searching.
Confluence indexes page content and the like out of the box and, depending on which indexing modules are enabled or disabled, will also scan certain file types. If the information you need is not extracted automatically, you could write a custom extractor as a plugin, using the following information: https://developer.atlassian.com/confdev/confluence-plugin-guide/confluence-plugin-module-types/extractor-module
54,000 pages, representing the largest product as you say, is quite large, but I still question whether the entirety of such a system would necessarily need to be migrated. In fact, your example highlights a perfect reason why the "biggest Confluence install out there" is a slightly misleading metric: performance is affected by the size, complexity and overall use of the instance, the underlying infrastructure, and so on. I would strongly suggest that an install like this would require Confluence Data Center, but this would need to be determined accurately further along, in the testing phase.