
backup to offline instance

reversed-hitech January 10, 2023

Hey,

Due to security demands, I want to implement an unusual Confluence setup:

Two air-gapped servers, where the content of one is updated from the other. Once a month, I plan to import the data from one Confluence to the other. I've started getting familiar with this setup, but I don't know of any good way to update the offline instance.

I tried to query the diff of the important tables (content, bodycontent, users, groups, etc.), but I can't manage to get it right.

Is there any good way to implement this?

Thanks in advance,
Yossi

Nic Brough -Adaptavist-
Community Leader
January 10, 2023

Welcome to the Atlassian Community!

There's no incremental change functionality for Confluence, unless you were to un-air-gap them and implement something that pushed changes from one to the other as they happen.  Even then, you would end up with different data sets.

The only way you're going to get a complete solution is to tear down the target system: take a database dump of the source and restore it into the target database. You'll then need to re-index, and sync any changed, added, or deleted attachments from the file system, but that's not too hard.

This will, of course, destroy anything you've done on the target system.

reversed-hitech January 11, 2023

Thank you Nic.

I realize I didn't mention it before: the second instance is read-only and is not meant to accept changes.

Sadly I cannot un-air-gap the instances due to the sensitivity of the information stored.
However, you mentioned that pushing the changes as they happen might do the job. Is there any built-in way to do this? Is it a DB trigger? I'm thinking I could log all those queries, transfer them to the air-gapped network, and replay them on the other instance.

I looked at Confluence's DB a bit, and I wonder whether something like this might work: select everything from confluence.CONTENT that was modified since the last time I synced the instances. Then I'll dump confluence.BODYCONTENT and the rest of the relevant tables for the contentid values returned by the first query, and import the results into the other, air-gapped instance.
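
For illustration, the incremental query described above might look roughly like the following sketch. It runs against a toy SQLite copy of two tables; the column names `CONTENTID`, `TITLE`, `LASTMODDATE`, `BODYCONTENTID` and `BODY` are assumptions modeled loosely on Confluence's schema, not a verified description of it, and the real database has many more related tables. Note that a time filter only finds rows that still exist: deleted pages, permission changes, and anything stored in other tables are not picked up.

```python
import sqlite3

# Toy stand-ins for Confluence's CONTENT and BODYCONTENT tables.
# ASSUMPTION: column names are illustrative only; the real schema has
# many more columns and many more related tables.
SCHEMA = """
CREATE TABLE CONTENT (
    CONTENTID   INTEGER PRIMARY KEY,
    TITLE       TEXT,
    LASTMODDATE TEXT  -- ISO 8601 strings compare correctly as text
);
CREATE TABLE BODYCONTENT (
    BODYCONTENTID INTEGER PRIMARY KEY,
    CONTENTID     INTEGER REFERENCES CONTENT(CONTENTID),
    BODY          TEXT
);
"""


def changed_since(conn: sqlite3.Connection, last_sync: str) -> list[tuple]:
    """Return (contentid, title, body) for content modified after last_sync.

    Deleted content leaves no row behind, so it can never show up here.
    """
    return conn.execute(
        """
        SELECT c.CONTENTID, c.TITLE, b.BODY
        FROM CONTENT c
        JOIN BODYCONTENT b ON b.CONTENTID = c.CONTENTID
        WHERE c.LASTMODDATE > ?
        ORDER BY c.CONTENTID
        """,
        (last_sync,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO CONTENT VALUES (?, ?, ?)", [
        (1, "Old page", "2022-11-02T09:00:00"),
        (2, "Edited page", "2022-12-20T14:30:00"),
    ])
    conn.executemany("INSERT INTO BODYCONTENT VALUES (?, ?, ?)", [
        (10, 1, "<p>unchanged</p>"),
        (11, 2, "<p>new text</p>"),
    ])
    print(changed_since(conn, "2022-12-10T00:00:00"))
```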

Do you think something like this might work? I'm trying to collect all the relevant tables; do you have a list of them?

Best regards,
Yossi

Nic Brough -Adaptavist-
Community Leader
January 11, 2023

That's a massive amount of work that won't capture everything, and it will leave your target Confluence in a different state. It will be close, as you say it's "read only", but still not quite right.

Forget the database; with Atlassian stuff, it's all-or-nothing (or a massive amount of coding).

As your target is read-only though, there is a simple solution - dump and restore the database from the source.  Sync the attachments and re-index after you've restarted the target.
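
Because the two servers are air-gapped, the attachment sync can't compare the two file systems directly. One possible approach, sketched below, is to carry a manifest (relative path → SHA-256) of the target's attachments across, compare it with the source's attachments on the source side, and then carry over only the files that need adding or replacing. The function names are hypothetical, and the attachment directory you point this at (usually somewhere under the Confluence home directory) is an assumption you should check against your own installation.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of one file, read in 1 MiB chunks so large attachments are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map every file under root (POSIX-style relative path) to its digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def save_manifest(root: Path, out: Path) -> None:
    """Write a snapshot to a JSON file that can be carried across the air gap."""
    out.write_text(json.dumps(snapshot(root), indent=2, sort_keys=True))


def load_manifest(path: Path) -> dict[str, str]:
    """Read a manifest written by save_manifest."""
    return json.loads(path.read_text())


def plan_sync(source: dict[str, str], target: dict[str, str]) -> dict[str, list[str]]:
    """Decide which files the target must add, overwrite, or delete."""
    return {
        "add": sorted(source.keys() - target.keys()),
        "update": sorted(k for k in source.keys() & target.keys() if source[k] != target[k]),
        "delete": sorted(target.keys() - source.keys()),
    }
```

In use: run `save_manifest` on the target, carry the JSON file to the source side, call `plan_sync(snapshot(source_dir), load_manifest(manifest_path))`, then copy the "add" and "update" files onto the transfer media and delete the "delete" entries on the target.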

reversed-hitech January 11, 2023

I can't do a full dump of the instance because of the size of the database (a few hundred gigabytes), and it would also be a pretty big hassle to do it over time.

Are there no scripts on your end that can do that? Or is there at least some documentation that explains how the database stores everything, so our development team can code this the right way without losing the important data? I mostly care about the content of the pages and the spaces; I don't really care about comments, inline comments, attachments, etc.

Thank you for the fast reply!
Yossi

Nic Brough -Adaptavist-
Community Leader
January 11, 2023

The content includes comments, so if you wanted to code for this, you'd have to make it even more complex if you wanted to drop them, and you'd still have the problem that the two Confluences would not be the same.

The only feasible way to do this is with a database dump.

reversed-hitech January 11, 2023

> The content includes comments, so if you wanted to code for this, you'd have to make it even more complex if you wanted to drop them

I understand. What I meant is that the comments are not important to me, so if it's easier to sync the pages without comments, that's fine with me. I didn't say I want no comments at all.

> and you'd still have the problem that the two Confluences would not be the same.

Why would they not be the same if you're copying the changes from A to B (considering B is going to be read-only)? Would you be able to link me to the documentation for how the database works so our development team can code something for us that would sync the changes if there are no Atlassian scripts available that perform the sync?

Yossi

Nic Brough -Adaptavist-
Community Leader
January 11, 2023

Again, you can't sync the databases. Two separate instances run separately. Even if one is "read only", it's going to be doing different things than the other, and hence its data is going to be different. Trying to jam data from the other one into it is going to be an absolute nightmare. You can't just copy a row from a table; you're going to have to emulate every edit action, i.e. looking through the target to see where the data can be added, and amending the incoming data so that it fits in the right place.

You do not really have any choice here.  You've got months to years of development to write this stuff.

Or you can do a space export/import and write something that works out the diffs, so you can see what to delete and replace or update manually. That's still going to take a while, and it's going to be a slow manual process (and the database is the worst place to try to work it out from).
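
The "works out the diffs" step could start as simply as the sketch below. It assumes each space export has already been reduced to a mapping of page title → page body text (parsing the export itself is not shown, and `space_report` is a hypothetical name). It produces a list of pages to delete, add, or update on the read-only copy, with a line diff for each update, using only Python's standard `difflib` module.

```python
import difflib


def space_report(target: dict[str, str], source: dict[str, str]) -> list[str]:
    """Compare two exports, each reduced to {page title: page body text}.

    ASSUMPTION: page titles identify pages; extracting them from a real
    space export is left out. Returns human-readable instructions for a
    manual update of the target (read-only) copy.
    """
    report: list[str] = []
    # Pages that exist only on the target were deleted on the source.
    for title in sorted(target.keys() - source.keys()):
        report.append(f"DELETE {title}")
    # Pages that exist only on the source are new.
    for title in sorted(source.keys() - target.keys()):
        report.append(f"ADD {title}")
    # Pages on both sides with different bodies need updating; show a diff.
    for title in sorted(target.keys() & source.keys()):
        if target[title] != source[title]:
            report.append(f"UPDATE {title}")
            report.extend(
                difflib.unified_diff(
                    target[title].splitlines(),
                    source[title].splitlines(),
                    fromfile=f"{title} (target)",
                    tofile=f"{title} (source)",
                    lineterm="",
                )
            )
    return report
```

Keeping the output as plain text lines fits the manual process described above: someone reviews the report and applies each change by hand.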

Or do it properly, by cloning the database.
