Do you archive/delete old Confluence pages or keep everything for Rovo to have it "in context"?

As more teams start using Rovo, I've been wondering whether our approach to Confluence content should change.

In the past, many of us tried to keep Confluence clean by archiving or deleting outdated pages. It made search results more relevant and reduced clutter.

Now I'm not so sure.

On one hand, outdated content can confuse users and AI if it isn't clearly marked. On the other hand, keeping historical documentation might give Rovo more context and help it provide better answers by understanding the evolution of projects, decisions, and processes.

I'm curious how other teams are approaching this.

Do you regularly archive or delete old Confluence pages?
Do you keep everything unless there is a compliance reason to remove it?
Have you changed your content strategy since adopting Rovo?
Have you noticed any difference in the quality of Rovo's answers depending on how much historical content is available?

I'd love to hear about your experience, best practices, or lessons learned.

8 comments

I don't have a solution yet, but I can say with some confidence that not archiving and deleting is problematic, and doing something about it is high on my list. Rovo finds more stuff, spends more time (and tokens) trying to make sense of it and often pays attention to the wrong things. I believe too much of the wrong content is frequently more of an issue than the occasional regret of archived/deleted context being missed.

My hunch is the solution is multi-pronged, including:

Rovo getting smarter about what to pay attention to. This could take the form of improvements to baked-in skills/prompts, better signals in the teamwork graph, or more ways for organisations and teams to provide hints. I'm confident this will happen with product improvements; meanwhile I'm encouraging folks to prompt more explicitly about where to look and which rudimentary signals to consider for the task at hand (e.g. updated recently, is it referenced, is it in a shared space).
Pragmatic knowledge management
- Doing pretty much what you're doing, archiving and deleting outdated content. The pragmatic part is the tasteful judgement applied to deciding what's useful historical context vs outdated, potentially misleading, or just expensive to chew on
- Distillation and curation. Agents should be able to help compress knowledge over time, updating new knowledge, making what's still relevant more accessible, archiving old source material, keeping links to archived content for when it's needed, and keeping humans in the loop. The broader space of knowledge and context management with AI is fast moving but still messy.

I look forward to hearing more about what people are doing and what's working for them!

I completely agree. I don't think the challenge is the amount of content, it's knowing which content can be trusted.

Historical pages are still valuable, but they shouldn't compete with the current source of truth. Simple signals like when a page was last reviewed, whether it has an owner, or if it's been superseded could help both people and Rovo make much better decisions.

I think content trust is going to become just as important as content search in the AI era.
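
To make those signals concrete, here is a minimal Python sketch that labels pages by trust rather than by age alone. The `Page` fields (`last_reviewed`, `owner`, `superseded_by`) and the 180-day review window are assumptions made up for this example; they are not built-in Confluence or Rovo fields.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Page:
    title: str
    last_reviewed: Optional[date] = None   # hypothetical "last reviewed" stamp
    owner: Optional[str] = None            # hypothetical page owner
    superseded_by: Optional[str] = None    # title of the replacing page, if any


def trust_label(page, today, review_window_days=180):
    """Return a coarse trust label from the three signals (all assumed fields)."""
    if page.superseded_by:
        return "superseded"
    if page.owner is None or page.last_reviewed is None:
        return "unverified"
    days_since_review = (today - page.last_reviewed).days
    return "trusted" if days_since_review <= review_window_days else "needs-review"


today = date(2025, 6, 1)
pages = [
    Page("Deploy process (current)", date(2025, 5, 1), "ops-team"),
    Page("Deploy process (2022)", date(2022, 3, 1), "ops-team",
         superseded_by="Deploy process (current)"),
    Page("VPN setup", date(2023, 1, 10), "it-team"),
    Page("Misc notes"),
]
labels = {p.title: trust_label(p, today) for p in pages}
for title, label in labels.items():
    print(f"{label:12} {title}")
```

The superseded check comes first on purpose: a historical page can stay in the space without competing with the current source of truth.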

I think it depends on the content of the page. For example, if the content is no longer true, accurate or relevant, then archiving makes sense. However, if the content is more like lessons learned for projects or PIRs for post-incident management, then as long as it's still relevant, I think keeping it, even if it's on the older side, will give Rovo the context and content it needs to have the full picture.

I do feel that the more historical content is available, the longer Rovo takes. For example, today I asked a question about the Assets REST APIs for getting a single object, and I had a guide on using the import REST APIs in my internal knowledge base. Rovo read the internal pages first because they were related, although it realized after a few seconds that they were not relevant at all. I guess that content was not historical, just related, but the point is the same: the more pages you have, the longer it will take.

I don't think the question is whether we should keep or delete old pages anymore. It's whether people (and AI) can tell which content they should trust.

We've found that older pages still have a lot of value because they explain why decisions were made, even if they're no longer the current way of doing things. That context can be really helpful for Rovo.

The real problem starts when an old page looks just as relevant as one that was reviewed last week. Without any indication of freshness, it's hard for both people and AI to know which one should be treated as the source of truth.

For me, it's becoming less about cleaning up Confluence and more about making the status of content obvious. Knowing when a page was last reviewed, who owns it, and whether it's still trusted feels much more important in an AI-driven world.

My philosophy is the past is the past, focus on the present and the future.
This means up-to-date documentation reflecting the current state of the projects.
Historical information is contained in the issues in Jira if Rovo needs to access it. Comments inside issues explaining decisions taken etc. remain.

Obsolete documentation is archived, then deleted. I consider obsolete any page that holds no valuable information about the project (whether its past state or its current state) and only creates noise.

I actually ask Rovo to audit Confluence spaces and give me a full plan for cleaning them up. It does a wonderful job in this respect. It retrieves the entire hierarchy of the space and finds duplicates, empty pages, pages with no readers, and documentation which may be obsolete (I still need to check its findings myself to make sure it hasn't made mistakes, but it saves a lot of time).
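
For anyone who wants to run the same kind of checks outside Rovo, the sketch below groups cleanup candidates from a list of page records. It is not how Rovo performs the audit; the `PageRecord` fields and the idea of working from an exported page list are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PageRecord:
    title: str
    body: str
    views_last_year: int   # hypothetical analytics figure


def audit_space(pages):
    """Group cleanup candidates; a human still reviews every suggestion."""
    by_title = defaultdict(list)
    for page in pages:
        # Normalise titles so "Onboarding" and "onboarding " count as duplicates.
        by_title[page.title.strip().lower()].append(page)
    return {
        "duplicates": sorted(t for t, group in by_title.items() if len(group) > 1),
        "empty": [p.title for p in pages if not p.body.strip()],
        "unread": [p.title for p in pages if p.views_last_year == 0],
    }


pages = [
    PageRecord("Onboarding", "Welcome to the team.", 120),
    PageRecord("onboarding ", "Welcome to the team (copy).", 0),
    PageRecord("Meeting notes template", "   ", 3),
    PageRecord("Release checklist", "1. Tag the build.", 45),
]
report = audit_space(pages)
for category, titles in report.items():
    print(category, titles)
```

The output is only a candidate list. As noted above, the final call on what is obsolete still needs a person.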

Our company routinely faces document retention requirements, so it's baked in for us to create "archive" spaces where we file old documents. Any company working in a space that is heavily regulated should be keeping its documentation, even when the documentation is no longer relevant. You never know when you'll have to answer for why a situation was handled the way it was. OP brings an even greater reason to light - context! However much it costs to store data, commit to it and create a document retention plan. It's better to have it and not need it, than need it and not have it.

Interesting split in this thread between "archive aggressively" and "keep everything." What I keep seeing is that the archive-vs-keep decision usually gets made on the wrong signal: page age.

Last-updated dates and review checkmarks are proxies. A page untouched for two years can still be completely correct, and a page reviewed last month can already be wrong - because what actually invalidates technical docs isn't time passing, it's the system changing underneath them. The doc about service X goes stale the day someone ships a change to X, not on its 180-day anniversary.

So before archiving by age, the question I'd ask is: which of these pages still agree with what they describe? For docs that describe code or systems, the honest way to know is to check the claims against the source - same spirit as MeghnaP's point about trust signals, just taken one step further: not "was this reviewed recently" but "is this still true."

In practice most teams have no way to answer that at scale, so they fall back on age-based cleanup and hope. Curious how others here handle the "which pages are now wrong" part - manual review cycles, page owners, something tied to the dev workflow, or honestly nothing?
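
One way to picture a change-driven check, as opposed to an age-driven one, is the sketch below. It assumes each doc carries a tag naming the system it describes and that you can get a last-change date per system (for example from a deploy log); both the `describes` tag and the `last_change` mapping are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Doc:
    title: str
    describes: str        # hypothetical tag naming the system the doc covers
    last_verified: date   # when someone last confirmed the doc was still true


def possibly_invalid(docs, last_change):
    """Return titles of docs whose system changed after they were last verified."""
    return [
        doc.title
        for doc in docs
        if doc.describes in last_change
        and last_change[doc.describes] > doc.last_verified
    ]


docs = [
    Doc("Service X runbook", "service-x", date(2025, 5, 1)),      # recent, but stale
    Doc("Billing architecture", "billing", date(2023, 3, 1)),     # old, still valid
]
last_change = {
    "service-x": date(2025, 5, 20),   # shipped after the runbook was verified
    "billing": date(2022, 11, 2),     # untouched since before the doc was verified
}
flagged = possibly_invalid(docs, last_change)
print(flagged)
```

Page age never enters the comparison: the two-year-old billing page passes, while the runbook verified a month ago is flagged.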

I try to ensure that my content stays relevant by updating it when Atlassian and other SaaS providers send out their release notes and updates. I feel strongly that old or outdated content should be either updated or retired, but the how is interesting to me.

When a regulatory or contractual document retention limit is reached (such as for client information), deletion is fine (and maybe required). When something is no longer needed (e.g. documentation for a SaaS we no longer use) I archive.

We also use a custom status in Confluence to denote that content is Retired, which allows it to stay visible in a space but gives us a way to explicitly include or exclude it from Rovo's consideration. This one is handy when you have that sneaky feeling that something used to work one way and now doesn't seem to, or that a Client decision has done a 180.

Anne - "that sneaky feeling that something used to work one way and now doesn't" is the best description of this problem I've seen in this thread. And I think it points at the gap in every age- or status-based scheme: a page doesn't go wrong because time passed. It goes wrong because something else changed - a client decision does a 180, a system gets replaced, a setting flips. The page has no way of knowing that happened.
Your Retired status handles the pages you've already caught, and Michelle, a governance policy handles the cadence - both sensible. The part I haven't seen anyone solve with process alone is discovery: between two review cycles, which pages did last month's changes silently invalidate? The review finds them eventually, but "eventually" is exactly the window where a teammate (or an AI assistant) reads the stale version and acts on it.
Anne, your release-notes trigger is interesting because it's event-driven rather than calendar-driven - the vendor tells you something changed, and that prompts the review. Curious whether either of you has found an equivalent trigger for internal systems and decisions, where nobody sends you release notes. That's the case that keeps defeating us.

I created a policy for this. Hope this is helpful. I also have something similar for Jira projects (yes, I said projects).

We implemented these Guidelines BECAUSE of ROVO, to keep the content current. Confluence has also been billed as our company's internal Knowledge Base, so the need to keep things up to date and accurate is essential.

@michelle_bachmann this is the most concrete thing in the thread, and "we implemented these Guidelines BECAUSE of ROVO" is exactly the right instinct. A written, approved policy with definitions beats every abstract principle above it, mine included.

One dimension I would bolt onto a policy like yours, coming at it from the admin side: alongside the time-based triggers, a validity trigger. Time-based rules catch abandoned content well, but they miss the two edge cases that matter most for AI answers. A how-to written 3 months ago can go stale next month because the system it describes changed. A 5-year-old decision record can still be perfectly valid, and it is exactly the historical context that makes agent answers richer. Age and validity only partly overlap. A page goes invalid when the system changes, not when time passes.

The reason this matters more with Rovo than it did before: unless an agent is deliberately scoped, it reads all organizational knowledge. I was digging into a related issue this week (an agent that kept answering from a deleted Confluence source) and the likely root cause was exactly this. Giving an agent a new source does not restrict anything by itself. Scope only narrows when someone actively switches the agent from the org-wide default to Custom knowledge and selects sources. Until then, whatever survives an archive policy is not sitting quietly in a drawer. It is competing inside every default-scoped agent's context, every day. "Keep it for context" really means "let semantic search choose between the current process doc and the old version of it." Sometimes it picks the old one.

In practice that splits content into two buckets with different rules:

Decision records, post-incident reviews, lessons learned: keep them, close to forever. They answer "why is it like this," they rarely mislead, and they age well.
How-tos, process docs, configuration guides: these are the dangerous ones. They answer "how do I do this" incorrectly with full confidence, and an agent will happily serve them. These should be archived or updated the moment the system they describe changes, regardless of how recent they are.
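
Written down as a rule, the two buckets might look like the sketch below. The `Kind` names and the `system_changed` flag are assumptions for illustration, and how you detect that a system changed is left open.

```python
from enum import Enum


class Kind(Enum):
    DECISION_RECORD = "decision record"
    POST_INCIDENT_REVIEW = "post-incident review"
    LESSONS_LEARNED = "lessons learned"
    HOW_TO = "how-to"
    PROCESS_DOC = "process doc"
    CONFIG_GUIDE = "configuration guide"


# Bucket 1: pages that answer "why is it like this" and age well.
HISTORICAL = {Kind.DECISION_RECORD, Kind.POST_INCIDENT_REVIEW, Kind.LESSONS_LEARNED}


def action_for(kind, system_changed):
    """Pick an action from content kind and validity; page age is not an input."""
    if kind in HISTORICAL:
        return "keep"
    # Bucket 2: operational pages are only safe while the system matches them.
    return "update or archive" if system_changed else "keep"


print(action_for(Kind.DECISION_RECORD, system_changed=True))
print(action_for(Kind.HOW_TO, system_changed=True))
print(action_for(Kind.HOW_TO, system_changed=False))
```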

And one practical note for anyone who cannot win the curation battle org-wide: you do not have to. For agents that answer operational questions, switch them from the org-wide knowledge default to Custom knowledge scoped to the spaces you actively maintain. Curating a whole site is a marathon. Scoping one agent takes a minute and protects its answers today.

Your policy plus a validity trigger plus agent scoping is close to a complete governance story. Each one covers what the other two cannot.
