Automating Confluence PDF Exports for Offline Access and Archiving on External Storage Systems

November 22, 2025

Dear Atlassian-Community,

Many teams rely on Confluence Cloud as their single source of truth. But what happens when Confluence is down, or when users need that content on other platforms such as SharePoint or cloud file systems?

In this post, I’ll share a pattern we use to automatically export selected Confluence Cloud pages as PDFs using Scroll PDF Exporter, and then sync those files to external storage such as Microsoft SharePoint, AWS EFS, Azure Storage, or Google Cloud Filestore. The goal: keep critical documentation available and readable, even when Confluence isn’t.

Core idea: a governed “Master Page”

The solution is driven by a single Master Page in Confluence:

- Only a small group of owners can edit it.
- They maintain links to the pages that should be exported.
- The automation reads this list and takes care of exporting and syncing.

This keeps control with content owners while avoiding ad‑hoc, manual exports.
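
Reading the export list from the Master Page could be sketched roughly as follows. This is only an illustration, not the exact script from this setup: it assumes the Master Page body has already been fetched as rendered HTML and that links to pages contain the usual Confluence Cloud path segment `/pages/<numeric id>`. The function name `extract_page_ids` and the sample markup are made up for the example.

```python
import re

# Assumption: page links appear as hrefs containing "/pages/<numeric id>",
# as in Confluence Cloud URLs like /wiki/spaces/OPS/pages/1001/Runbook.
PAGE_LINK = re.compile(r'href="[^"]*/pages/(\d+)(?:/[^"]*)?"')


def extract_page_ids(master_page_html: str) -> list[str]:
    """Return linked page IDs in document order, without duplicates."""
    seen: dict[str, None] = {}  # dict keeps insertion order
    for match in PAGE_LINK.finditer(master_page_html):
        seen.setdefault(match.group(1), None)
    return list(seen)


# Hypothetical Master Page content, for illustration only.
sample = (
    '<p><a href="/wiki/spaces/OPS/pages/1001/Runbook">Runbook</a></p>'
    '<p><a href="https://example.atlassian.net/wiki/spaces/OPS/pages/1002/Escalation">Escalation</a></p>'
    '<p><a href="/wiki/spaces/OPS/pages/1001/Runbook">Runbook again</a></p>'
    '<p><a href="https://example.com/other">External</a></p>'
)
print(extract_page_ids(sample))  # ['1001', '1002']
```

External links and repeated links are ignored, so the owners can format the Master Page freely without affecting what gets exported.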

How the automation works

On a Linux VM, a scheduled job (cron) runs a small toolchain:

Confluence REST API
- Resolve the Master Page.
- Collect linked/child pages and their IDs.
Scroll PDF Exporter REST API
- Trigger a PDF export job for the selected pages.
- Store the resulting PDFs locally on the VM.
Sync script (Python)
- Scan the export directory.
- Push new/updated files to your targets, for example:
  - SharePoint libraries
  - AWS EFS
  - Azure Storage
  - Google Cloud Filestore
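
The "push new/updated files" step of the sync script can be sketched like this. It is a minimal example under stated assumptions, not the actual script: change detection is done with a SHA-256 manifest stored next to the exports, and `upload` is a hypothetical callable that stands in for whatever target-specific client you use (SharePoint, a mounted EFS or Filestore path, Azure Storage). The names `sync`, `file_digest`, and `load_manifest` are illustrative.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large PDFs are not loaded at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_manifest(manifest_path: Path) -> dict:
    """Digests recorded by the previous run (empty on the first run)."""
    return json.loads(manifest_path.read_text()) if manifest_path.exists() else {}


def sync(export_dir: Path, manifest_path: Path,
         upload: Callable[[Path], None]) -> list[Path]:
    """Upload PDFs that are new or changed since the last run."""
    manifest = load_manifest(manifest_path)
    pushed = []
    for pdf in sorted(export_dir.glob("*.pdf")):
        digest = file_digest(pdf)
        if manifest.get(pdf.name) == digest:
            continue  # unchanged since last successful sync
        upload(pdf)  # hypothetical target-specific upload
        manifest[pdf.name] = digest
        pushed.append(pdf)
    # Written only after the loop: if an upload raises, nothing is recorded
    # and the next run retries all pending files.
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return pushed
```

One caveat: a re-exported PDF may differ byte-for-byte even when the page did not change (for example because of embedded timestamps), in which case comparing the page's version number from Confluence would be a better trigger than a content hash.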

You can control the schedule (e.g., hourly/daily) depending on how “fresh” the exported content needs to be.

What problems this solves

Offline access during Confluence downtime
Runbooks, procedures, and escalation guides are available as PDFs in external storage if Confluence is under maintenance or unavailable.
Long‑term archiving
Spaces or specific pages can be exported on a schedule, creating a consistent, auditable PDF archive in external repositories.
Multi‑platform availability
Teams working primarily in Microsoft 365, AWS, or GCP can access key Confluence content in their own ecosystem, without always jumping into Confluence.

Governance and next steps

Because exports are driven by the Master Page and a service account, you can:

- Define exactly which content is allowed to leave Confluence.
- Align schedules and retention with compliance or business needs.
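
As one possible way to enforce retention on the archive side, a small pruning step could run after each sync. This is a sketch under assumptions: it presumes the archive is a local or mounted directory (for example an EFS or Filestore mount), uses the file modification time as the age of an export, and the function name `prune_expired` and the 30-day value in the usage line are invented for the example. On SharePoint or object storage, the platform's own retention policies would usually be the better tool.

```python
import time
from pathlib import Path
from typing import Optional


def prune_expired(archive_dir: Path, retention_days: int,
                  now: Optional[float] = None) -> list[Path]:
    """Delete PDFs whose modification time is older than the retention window."""
    current = time.time() if now is None else now
    cutoff = current - retention_days * 86400  # seconds per day
    removed = []
    for pdf in sorted(archive_dir.glob("*.pdf")):
        if pdf.stat().st_mtime < cutoff:
            pdf.unlink()
            removed.append(pdf)
    return removed
```

Usage would be a single call such as `prune_expired(Path("/mnt/archive"), retention_days=30)` at the end of the cron job; the path is a placeholder.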

Greetings,

Alex

6 comments

Really appreciate you sharing this pattern, @Alexander Nilsson, especially the governed Master Page approach! What you described toward the end, the model of defining exactly what content is allowed to leave Confluence, is spot-on for keeping things compliant and predictable while still allowing offline access and archiving.

If you ever want to layer in stronger governance around what gets exported (age of pages, classifications, required reviews, retention rules, etc.), apps like Content Retention Manager for Confluence (Opus Guard) and Better Archiving (Midori) can handle that automatically before anything ever leaves the system. It’s a great complement that addresses the last nuance you brought up.

Great post, thanks for sharing it with the community!

Hi Alexander,

Thanks a lot for sharing this pattern. The approach with a governed Master Page plus automated Scroll PDF exports is really elegant. It strikes a very good balance between governance, automation, and simplicity, especially for teams that need offline availability or external backups of critical Confluence content.

I’m wondering if you’ve also explored a direction that goes one step further in terms of portability and long-term maintainability: Docs-as-Code.

In this model, documentation is written and maintained in text formats such as Markdown or AsciiDoc, stored in Git, and then rendered into whichever output formats or platforms teams need — HTML, PDF, Confluence Cloud/DC, SharePoint, static websites, etc. With tools like MkDocs, Docusaurus, Antora, Asciidoctor, or GitHub Pages, you can get very nice, professional-looking results with relatively little effort.

Some advantages this can bring on top of your current approach:

Strong governance via Git: reviews, approvals, and change history are built-in through pull requests.
Versioning & branching: documentation versions align naturally with software versions or release trains.
Multi-platform publishing: once you have the docs in Markdown/AsciiDoc, different publishers can generate outputs for Confluence, SharePoint, PDF, AWS/GCP storage, static sites, etc.
Separation of content & presentation: the same source can be styled differently depending on the target platform.
Easy CI/CD integration: automated pipelines can publish changes in seconds.

Of course, the downside is that Git and the associated tooling introduce more complexity for less technical contributors, so it’s not always a universal fit. But for many engineering or DevOps-heavy teams, Docs-as-Code has proven to be extremely powerful.

Your approach seems like a great foundation — especially for teams that prefer Confluence as their source of truth — and combining it with Docs-as-Code concepts could create a very robust documentation ecosystem.

Curious whether you see this as useful and whether it might be a fit for you.

BR,

Jonas

@Jonas Pencke

Docs-as-code may work for some types of documentation in specific circumstances, but the fact is that Git was designed for code, not to handle natural language and complex text structures.

In my previous gigs, we found that with Confluence and Marketplace apps we were saving about 30% of our time compared to docs-as-code solutions. And that included multiple sites, multiple versions, languages, conditional content, and even a site that combined semantic versioning with continuous delivery.

Mind you, that 30% time saving only concerns actual product docs work in what amounts to a continuous delivery scenario.

Had I included the tooling setup and the short- and long-term maintenance of the entire git-based pipeline (never mind the integration of things like Elasticsearch or AI), the difference in time and resource costs would be even greater.

Setting up the entire pipeline, from the authoring environment, through the conditional content setup, to building TWO websites (one public, one with SSO) on our own domain, plus the SSO setup, took 30 minutes. As a result, I was able to edit a single page (in any doc, any version, any variant, any language) at any moment in time and publish it to a live website in seconds without dependencies, branching, editing the master, or, heaven forbid, the site directly.

All without writing a single line of code.

As writers, we don't need most of the functionality that's designed specifically for code. In fact these redundant features make our lives harder - the need to create branches, create merge requests, solve conflicts. It's a waste of time.
If the docs live in the same repo as the code, the merge request will have to pass all the checks that exist for the code but may not make sense for the docs, meaning your doc merge requests will keep failing. Is there the will and the resources to customize processes specifically for the doc folders? From my experience, it's rare. And then... what's the point of using git in the first place if you disable half of its functionality for the docs?
Getting doc updates to the master will simply take more time. Where in Confluence you simply open a page for editing and then click Save, git requires so much more time and effort, even if you are git-fluent.
What if I need to change docs independently of changes in the code? I wanna change docs where the code did not change. Chunk and dechunk pages, rework the entire doc structure... As writers, we do it all the time.
Limited formatting capabilities - markdown does not work. Editing text in WYSIWYG is so much easier; adding a pic to a Confluence article is literally a matter of 1.5 seconds (in git, you need to put the pic in a special place and then add a link to it in the article). Try adding a table... A writer who asks 'how to code an expand box' is, by definition, using the wrong tool for the job.
If the main reason is "we need the docs to live next to the code because this way we will for sure remember to update it"... You can do this with other tools as well. It's a function of process, not a tool.
That you can technically maintain docs in git does not mean you should. The same way developers use git to code, writers use writing-specific software to write; it's that simple. We're not asking devs to do code-as-docs ;)
Also, it's 2025 and typing code just to create bold text or a list is like using a typewriter to send an email.

Now, for those who NEED git integration, I recommend Archbee. Why? Because while it's git underneath, for writers it essentially mimics how Confluence works (with limits imposed by git limitations).


Hello @Alexander Nilsson

Thanks for sharing your advanced use case and even more advanced programmatic solution.

And of course, thanks for mentioning and using our app, especially in this creative way.

A single source of truth is crucial, and I'd love to bundle it with the concept of everyone, from any team, being on the same page at any moment, at any stage of the documentation life cycle.

I'm intrigued by the notion of 'Confluence down'. Not that it never happens, but I associate it mainly with my Confluence Server experience.

However... :)

There might be a smoother solution available for you. I actually deployed a version of it during my previous gig at Emplifi.

The goal is: keep critical documentation available and readable, even when Confluence isn’t.

So....

On the editing side, you wouldn't change a thing. However, instead of publishing PDFs to SharePoint etc., you would publish your space (or spaces) to a website that retains the navigation, structure, and convenience of your Confluence content.

We have an app called Scroll Sites.

What it does is create a static website from a Confluence space (no anonymous access required). There are a couple of cool things that you will find useful:

- The website is totally independent of Confluence; it doesn't live on Confluence.
- You can put that site behind your SSO or a password (you can also make your Scroll site public; exhibit A: https://docs.emplifi.io/ ).
- You can set up Scroll Sites in such a way that the moment you save a Confluence page, it gets synced to that Scroll site.

This ensures that you control who has access - independently of Confluence, and that any saved content is available at any time to anyone who has access to the site.

Now, if the goal is to be able to EDIT and CREATE new content while your Confluence is down, there is an app for that. Space Sync for Confluence can synchronize spaces, bi- and multi-directionally among several Confluence sites. I actually recommended that solution to a couple of people who wanted this multi-site collaboration (company > client etc.).

Here, the assumption would be that one site of the two would be still operational if the other is down (yes, it does sound like Geographically Dispersed Parallel Sysplex from Mainframes :) ).

You can, of course, combine this Geographically Dispersed Parallel Confluence with Scroll Sites, but that's the realm of Confluence metaphysics :)

Clarification: I joined K15t a couple of months ago after being a customer for many years. So the advice I give here comes from the experience of actually deploying the apps and solutions.

We use Scroll Sites for our documentation management, it's amazing. @Kris Klima _K15t_


@Darin - Opus Guard As a Scroll Sites (Viewport) user of 10 years, I can confirm :)
