How LinkedIn does Content Lifecycle Management (12 Steps)

LinkedIn, the world's largest business network, has an exceptionally large Confluence instance. In their case, large means millions of pages and thousands of spaces. In this Showcase, we're focusing on how LinkedIn uses an Atlassian Marketplace App, Archiving Plugin for Confluence, to implement proper Content Lifecycle Management. This Showcase is a joint contribution from LinkedIn and Midori (makers of the Archiving Plugin), who want to share their findings with companies facing the same problems of running Confluence at a large, enterprise scale.

The problem

LinkedIn's internal Confluence instance is used by 16,000 employees and consultants. It contains 120,000 unique pages, and 55,000 of them have not been updated for 2 years or longer. We waited too long (10 years) to deploy an archiving strategy at LinkedIn, and the user frustration level was very high.

Here are some deployment tips and lessons learned based on their experience with Archiving Plugin 4.3.0 running in Confluence 5.7.4. 

Tip 1: Install the Archiving Plugin as soon as possible

Don't put it off. The sooner employees are accustomed to automatic archiving the better. As junk pages pile up over time, the archive loses its credibility and usefulness.

content-quality.png

Tip 2: First use a staging test instance for a dry run

Doing this gives you a preview of an archived space, identifies any issues or challenges, gives management a picture of what archiving looks like (assists with justifying the cost), and refines your deployment game plan.

In a future plugin version we plan to offer an actual dry run feature.
Although it will not completely replace the staging environment experiments, it will work like this: "show me what would happen if I ran the archiving job with the current settings - but do not actually archive anything!".
Dry runs will significantly reduce the efforts to find the best archiving settings for your content.

Aron Gombas, lead developer of the Archiving Plugin

Tip 3: Phase in archiving gradually

Start small. Don't surprise your users by archiving hundreds of pages at once.

LinkedIn archived their spaces over several batches like this:

  1. Batch 1: 60 pages, small spaces 1 (a lightweight rollout)
  2. Batch 2: 100 pages, small spaces 2
  3. Batch 3: 1000 pages, medium spaces 1
  4. Batch 4: 1300 pages, medium spaces 2
  5. Batch 5: 1800 pages, medium spaces 3
  6. Batch 6: 1000 pages, medium spaces 4
  7. Batch 7: 2900 pages, large spaces 1
  8. Batch 8: 2900 pages, large spaces 2
  9. Batch 9: 2000 pages, large spaces 3
  10. Batch 10: 1400 pages, large spaces 4
  11. Batch 11: 7600 pages, jumbo space 1
  12. Batch 12: 32,000 pages, jumbo space 2
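The ramp-up in the list above is easier to judge as a running total. The following is a minimal sketch that only adds up the page counts quoted in the list; the names BATCH_PAGES and cumulative_totals are made up for illustration and are not part of the plugin.

```python
from itertools import accumulate

# Page counts per batch, copied from the rollout list above.
BATCH_PAGES = [60, 100, 1000, 1300, 1800, 1000, 2900, 2900, 2000, 1400, 7600, 32000]


def cumulative_totals(batch_pages):
    """Return the running total of archived pages after each batch."""
    return list(accumulate(batch_pages))


totals = cumulative_totals(BATCH_PAGES)
for number, (pages, total) in enumerate(zip(BATCH_PAGES, totals), start=1):
    print(f"Batch {number:>2}: {pages:>6} pages, {total:>6} archived so far")
```

The final running total is 54,060 pages, which is in line with the roughly 55,000 stale pages mentioned in the problem statement. More than half of that volume sits in the last batch, which is why the jumbo space needed the special treatment described in Tip 4.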

Tip 4: Archive large spaces in small batches

LinkedIn has one Engineering space containing 70,000 pages, 32,000 of which had to be archived. In a staging environment (with the mail server disabled), they attempted to expire and archive all 32,000 pages at once. Unfortunately, the plugin took several weeks to run and crashed the staging space several times. They changed their approach and expired/archived pages in smaller date ranges instead. In the end, a 2,000-page archiving job took approximately 1.5 days to run:

  1. Archive in 1-year batches:
    1. 2920+ days (8+ years), 308 pages
    2. 2555+ days (7+ years), 1500 pages
    3. 2190+ days (6+ years), 1520 pages
    4. 1825+ days (5+ years), 2468 pages
  2. Archive in 6-month batches:
    1. 1642+ days (4.5+ years), 2000 pages
    2. 1460+ days (4+ years), 2000 pages
    3. 1277+ days (3.5+ years), 3210 pages
  3. Archive in 3-month batches:
    1. 1186+ days (3.25+ years), 2500 pages
    2. 1095+ days (3+ years), 2500 pages
    3. 1003+ days (2.75+ years), 3162 pages
    4. 912+ days (2.50+ years), 3625 pages
    5. 821+ days (2.25+ years), 3655 pages
    6. 730+ days (2+ years), 2862 pages
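The day thresholds in a plan like this can be generated rather than typed by hand. The sketch below assumes each cut-off is simply the age in years multiplied by 365 and truncated to whole days, which reproduces values such as 1642 days for 4.5 years. The function name age_cutoffs and the PHASES tuples are hypothetical helpers, not plugin settings.

```python
from fractions import Fraction


def age_cutoffs(oldest_years, newest_years, step_years):
    """Return (years, days) cut-offs from oldest to newest, inclusive.

    Fractions avoid floating-point drift when stepping by 0.25 or 0.5 years.
    """
    oldest, newest, step = (
        Fraction(str(value)) for value in (oldest_years, newest_years, step_years)
    )
    cutoffs = []
    years = oldest
    while years >= newest:
        # Assumption: one year counts as 365 days, truncated to whole days.
        cutoffs.append((float(years), int(years * 365)))
        years -= step
    return cutoffs


# (oldest, newest, step) in years: 1-year, 6-month, then 3-month batches.
PHASES = [(8, 5, 1), (4.5, 3.5, 0.5), (3.25, 2, 0.25)]
plan = [cutoff for phase in PHASES for cutoff in age_cutoffs(*phase)]

for years, days in plan:
    print(f"{days}+ days ({years}+ years)")
```

Shrinking the step as the cut-off approaches the present keeps each job at a manageable size, because recent date ranges tend to contain more pages than old ones.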

Pre-5.1.0 plugin versions implement the following archiving strategy: copy the page to the archive space, then trash the original in its source space. (Note that this requires replicating the page content, its comments, its labels and even its attachments.) This strategy is resource intensive, although the heaviest parts are done in Confluence core, not in the plugin.
Versions 5.1.0 and newer offer an alternative, the "move" strategy. As the name suggests, it moves the data instead of copying it, resulting in improved performance. (We also kept the "copy and trash" strategy as an option, which has its own merits.)

Tip 5: Run the plugin during off hours

LinkedIn measured a 30% spike in CPU usage during large archiving jobs (2,000 pages and greater). To save CPU and memory resources, they decided to run the plugin at scheduled intervals when most employees are not using their Confluence instance: Saturdays at 3:00 A.M.

The "move" strategy mentioned previously decreases this load. In addition, the archiving procedure has been fully rewritten in plugin version 6.0.0, optimizing the add-on for robustness and scalability. Nevertheless, running the job outside regular working hours is a good idea, to avoid conflicts.
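If you announce the archiving window to users in advance, it helps to compute the next run time instead of guessing it. This is a minimal sketch of the "Saturdays at 3:00 A.M." rule using only the Python standard library. The function name next_off_hours_run is hypothetical, and the plugin's own scheduling settings are not modeled here.

```python
from datetime import datetime, timedelta

SATURDAY = 5  # datetime.weekday() counts Monday as 0


def next_off_hours_run(now, weekday=SATURDAY, hour=3):
    """Return the next occurrence of the given weekday and hour after `now`."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - now.weekday()) % 7)
    if candidate <= now:
        # This week's window has already passed; use next week's.
        candidate += timedelta(days=7)
    return candidate


print(next_off_hours_run(datetime.now()))
```

The times are naive local times, so run it in the time zone where most of your users work.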

Tip 6: Beware of email filters

Many employees filter the mail sent by Confluence, so they might not see the expiration emails sent by the plugin. If you are the person running the plugin, copy yourself as a Space Admin on all messages. You'll be able to reference a paper trail of communication if needed.

You could alternatively select any user as "supervisor". It does not require changing the space admin permissions.

status-indicator.png

Tip 7: Remove space watchers before they hate you

When the plugin moves pages into the archive, all space watchers are notified by email. Contact these users and make them aware of the issue. No space watcher wants to receive 10,000 emails in their inbox!

Email flooding should not occur in more recent plugin versions.
We invested major efforts in suppressing Confluence's built-in notifications.

Tip 8: Give employees enough time to keep pages

LinkedIn's strategy was to expire pages, wait for 30 days, and then archive them.
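The expire, wait, then archive sequence can be pictured as a simple timeline per page. The sketch below is a simplified model and not the plugin's actual logic: it assumes a page expires once it has gone a given number of days without an update and is archived after a further grace period. The function page_status, the state names and the 730-day threshold are illustrative assumptions.

```python
from datetime import date


def page_status(last_updated, today, expire_after_days, grace_days=30):
    """Classify a page as 'active', 'expired' or 'archive' by its age in days."""
    age = (today - last_updated).days
    if age >= expire_after_days + grace_days:
        return "archive"  # grace period is over, the page can be archived
    if age >= expire_after_days:
        return "expired"  # owner has been warned and can still update the page
    return "active"


last_edit = date(2020, 1, 1)
for today in (date(2021, 6, 1), date(2022, 1, 10), date(2022, 2, 1)):
    print(today, page_status(last_edit, today, expire_after_days=730))
```

In this model, any update during the grace period resets the page's age, which is how an owner keeps a page out of the archive.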

Tip 9: Brace yourself for pack rats

Some employees do not want their pages archived, regardless of the age or value of the content.

Tip 10: Prepare to fix or remove broken links everywhere

First, do a dry run in a staging environment (not production). This approach enables you to preview all the broken links and pages that must be fixed after the Archiving Plugin has completed.
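One way to review the results of such a dry run is to list which remaining pages link to pages that were archived. The sketch below assumes you have already collected, by whatever means, a mapping from each page title to the titles it links to. The data structure, the sample titles and the function links_broken_by_archiving are hypothetical and not something the plugin provides.

```python
def links_broken_by_archiving(outgoing_links, archived_titles):
    """Map each remaining page to the links it holds to archived pages."""
    archived = set(archived_titles)
    broken = {}
    for page, targets in outgoing_links.items():
        if page in archived:
            continue  # the page itself moves to the archive
        hits = sorted(target for target in targets if target in archived)
        if hits:
            broken[page] = hits
    return broken


# Hypothetical sample data: page title -> titles of the pages it links to.
SAMPLE_LINKS = {
    "Team Home": ["Release Plan 2012", "Onboarding"],
    "Onboarding": ["Team Home"],
    "Release Plan 2012": ["Team Home"],
}

report = links_broken_by_archiving(SAMPLE_LINKS, ["Release Plan 2012"])
for page, targets in report.items():
    print(f"{page}: fix or remove links to {', '.join(targets)}")
```

A report like this gives space owners a concrete to-do list before the production run.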

Tip 11: Create an FAQ support page

In your customized Velocity email template, include a link to an archiving support page with FAQs and contact information.

Tip 12: Empty the trash

After a space had been archived successfully, LinkedIn waited 2 months and then emptied the space's trash. Keeping only the archived pages (not the trashed copies) is sufficient for page restoration. With the "move" strategy, which is now the default, this step is no longer necessary, because pages are moved directly to their archive space rather than copied and trashed.

Cheers! We hope you enjoyed this Showcase. If you have any questions for Midori, makers of Archiving Plugin, please comment below. 
