LinkedIn, the world's largest business network, has an exceptionally large Confluence instance. In their case, large means millions of pages and thousands of spaces. In this Showcase, we focus on how LinkedIn uses an Atlassian Marketplace app, Archiving Plugin for Confluence, to implement proper Content Lifecycle Management. This Showcase is a joint contribution from LinkedIn and Midori (makers of the Archiving Plugin), who want to share their findings with companies facing the same problems of running Confluence at a large, enterprise scale.
LinkedIn's internal Confluence instance is used by 16,000 employees and consultants. It contains 120,000 unique pages, and 55,000 of them have not been updated for 2 years or longer. We waited too long (10 years) to deploy an archiving strategy at LinkedIn, and the user frustration level was very high.
Here are some deployment tips and lessons learned based on their experience with Archiving Plugin 4.3.0 running in Confluence 5.7.4.
Don't put it off. The sooner employees are accustomed to automatic archiving the better. As junk pages pile up over time, the archive loses its credibility and usefulness.
Running the plugin in a staging environment first gives you a preview of an archived space, surfaces any issues or challenges, gives management a picture of what archiving looks like (which helps justify the cost), and lets you refine your deployment game plan.
In a future plugin version we plan to offer an actual dry run feature.
Although it will not completely replace the staging environment experiments, it will work like this: "show me what would happen if I ran the archiving job with the current settings - but do not actually archive anything!".
Dry runs will significantly reduce the efforts to find the best archiving settings for your content.
Aron Gombas, lead developer of the Archiving Plugin
Start small. Don't surprise your users by archiving hundreds of pages at once.
LinkedIn archived their space over several batches like this:
LinkedIn has one Engineering space containing 70,000 pages, 32,000 of which had to be archived. In a staging environment (with the mail server disabled), they attempted to expire and archive all 32,000 pages at once. Unfortunately, the job took several weeks to run and crashed the staging space several times. They changed their approach and decided to expire and archive pages using smaller date ranges. In the end, a 2,000-page archiving job took approximately 1.5 days to run.
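The idea of working through smaller date ranges can be sketched in a few lines of Python. This is only an illustration, not part of the Archiving Plugin: the function name `plan_date_ranges`, its parameters, and the input (a list of last-modified dates, however you export them from Confluence) are all assumptions. It groups pages by last-modified day into consecutive ranges that each stay at or below a page cap, such as the 2,000 pages mentioned above.

```python
from collections import Counter
from datetime import date


def plan_date_ranges(last_modified, max_pages=2000):
    """Split last-modified dates into consecutive (start, end, count) ranges.

    Hypothetical helper: each range holds at most max_pages pages, except
    that a single day with more pages than the cap becomes its own range.
    """
    per_day = sorted(Counter(last_modified).items())  # oldest day first
    ranges = []
    start = end = None
    count = 0
    for day, pages_on_day in per_day:
        # Close the current range if this day would push it over the cap.
        if count and count + pages_on_day > max_pages:
            ranges.append((start, end, count))
            start, count = None, 0
        if start is None:
            start = day
        end = day
        count += pages_on_day
    if count:
        ranges.append((start, end, count))
    return ranges


if __name__ == "__main__":
    sample = (
        [date(2013, 1, 1)] * 3
        + [date(2013, 2, 1)] * 2
        + [date(2013, 3, 1)] * 4
    )
    for start, end, count in plan_date_ranges(sample, max_pages=5):
        print(start, end, count)
```

Each resulting range could then be used as the date window for one archiving run, oldest content first. How a date window maps onto the plugin's settings depends on your plugin version.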
Pre-5.1.0 plugin versions implement the following archiving strategy: copy the page to the archive space, then trash the original in its source space. (Note that this requires replicating the page content, its comments, its labels and even its attachments.) This strategy is resource intensive, although the heaviest parts are done in Confluence core, not in the plugin.
5.1.0 and newer versions offer an alternative, the "move" strategy. As the name suggests, it moves the page instead of copying its data, resulting in improved performance. (We also kept the "copy and trash" strategy as an option, as it has its own merits.)
LinkedIn measured a 30% spike in CPU usage during large archiving jobs (2,000 pages and greater). To save CPU and memory resources, they decided to run the plugin at scheduled intervals when most employees are not using their Confluence instance: Saturdays at 3:00 A.M.
The "move" strategy mentioned previously decreases this load. In addition, the archiving procedure has been fully rewritten in plugin version 6.0.0, optimizing the add-on for robustness and scalability. Nevertheless, running the job outside regular working hours is still a good idea to avoid conflicts.
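As a small illustration of the off-peak schedule, the following Python sketch computes the next Saturday 3:00 A.M. slot from a given moment. The function name `next_off_peak_run` and its parameters are hypothetical, and it says nothing about how the schedule is actually entered in the plugin; it only shows the date arithmetic.

```python
from datetime import datetime, time, timedelta


def next_off_peak_run(now, weekday=5, hour=3):
    """Return the next occurrence of the given weekday and hour after `now`.

    Hypothetical helper. Weekdays follow Python's convention
    (Monday is 0), so the default 5 means Saturday.
    """
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), time(hour))
    # If the slot for this week has already started or passed, use next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


if __name__ == "__main__":
    print(next_off_peak_run(datetime(2024, 1, 3, 12, 0)))
```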
Many employees filter the mail sent by Confluence, so they might not see the expiration emails sent by the plugin. If you are the person running the plugin, copy yourself as a Space Admin on all messages. You'll be able to reference a paper trail of communication if needed.
You could alternatively select any user as "supervisor". It does not require changing the space admin permissions.
When the plugin moves pages into the archive, all space watchers are notified by email. Contact these users in advance and make them aware of the issue. No space watcher wants to receive 10,000 emails in their inbox!
Email flooding should not occur in more recent plugin versions.
We have invested major effort in suppressing Confluence's built-in notifications.
LinkedIn's strategy was to expire pages, wait for 30 days, and then archive them.
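The expire, wait, then archive sequence can be pictured as a small decision function. The sketch below is an assumption-laden illustration and not the plugin's actual logic: the name `lifecycle_action`, the two-year age threshold, and the rule that an update after the expiration notice rescues a page are all hypothetical choices made for the example. Only the 30-day grace period comes from LinkedIn's strategy.

```python
from datetime import date


def lifecycle_action(last_updated, expired_on, today,
                     max_age_days=730, grace_days=30):
    """Decide what to do with a page (hypothetical rules, see lead-in).

    last_updated: date of the page's last edit
    expired_on:   date the expiration notice was sent, or None
    """
    if expired_on is None:
        # Not yet expired: expire only if the page is old enough.
        age = (today - last_updated).days
        return "expire" if age >= max_age_days else "keep"
    if last_updated > expired_on:
        # Assumption: an edit after the notice rescues the page.
        return "keep"
    # Expired and untouched: archive once the grace period is over.
    waited = (today - expired_on).days
    return "archive" if waited >= grace_days else "wait"


if __name__ == "__main__":
    print(lifecycle_action(date(2020, 1, 1), None, date(2023, 1, 1)))
    print(lifecycle_action(date(2020, 1, 1), date(2023, 1, 1), date(2023, 1, 15)))
    print(lifecycle_action(date(2020, 1, 1), date(2023, 1, 1), date(2023, 1, 31)))
```

The grace period gives page owners time to react to the expiration email before anything is moved.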
Some employees do not want their pages archived, regardless of the age or value of the content.
First, do a dry run in a staging environment (not production). This approach enables you to preview all the broken links and pages that must be fixed after the Archiving Plugin has completed.
In your customized Velocity email template, include a link to an archiving support page with FAQs and contact information.
After a space had been archived successfully, LinkedIn waited for 2 months and then emptied the space's trash. Keeping only the archived pages (not the trashed copies) is sufficient for page restoration. With the now-default "move" strategy, emptying the trash is no longer necessary, because pages are moved directly to their new home in the archive space instead of being copied and trashed.
Cheers! We hope you enjoyed this Showcase. If you have any questions for Midori, makers of Archiving Plugin, please comment below.