Atlassian high availability: cold failover?

We are trying to set up a Disaster Recovery solution for our Atlassian applications (so far, Crowd, JIRA, Confluence and Crucible). The production environment consists of the following servers:

  • Crowd: running on Windows, linked with AD to provide user authentication
  • JIRA, Confluence and Crucible: each of these runs on its own Ubuntu 10.04 server
  • MS SQL Server 2008: Database server shared by all the above

Our approach is to have every server replicated to a cold server (in a different geographical location), do an rsync to keep the different data folders up to date and have a secondary database server that we keep up to date with database replication.

The first issue is to filter which files are replicated through rsync, so we do not overwrite cold-server settings such as the database configuration (which should point to the failover DB server).
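The file filtering could be sketched along these lines in Python. This is only an illustration under assumptions: the excluded file names (dbconfig.xml for JIRA, confluence.cfg.xml for Confluence, config.xml for Crucible) are where those applications usually keep their database settings, but check them against your own versions, and the paths and host name are hypothetical. The function only builds the rsync command line; it does not run it.

```python
import shlex

# Assumed per-application files that hold local settings (for example the
# database connection). Check these against your own installation.
LOCAL_ONLY_FILES = {
    "jira": ["dbconfig.xml"],
    "confluence": ["confluence.cfg.xml"],
    "crucible": ["config.xml"],
}


def build_rsync_command(app, source_dir, target_host, target_dir):
    """Return the rsync argv that mirrors source_dir to the cold server,
    skipping the files listed for this application."""
    command = ["rsync", "--archive", "--delete"]
    for name in LOCAL_ONLY_FILES[app]:
        command.append("--exclude=" + name)
    # Trailing slash: copy the directory contents, not the directory itself.
    command.append(source_dir.rstrip("/") + "/")
    command.append(f"{target_host}:{target_dir.rstrip('/')}/")
    return command


if __name__ == "__main__":
    # Hypothetical paths and host name, for illustration only.
    argv = build_rsync_command(
        "jira",
        "/var/atlassian/jira-home",
        "jira-dr.example.com",
        "/var/atlassian/jira-home",
    )
    print(shlex.join(argv))
```

Note that with --delete, rsync leaves excluded files on the receiving side alone unless --delete-excluded is also given, so the cold server's own copy of the excluded file survives each sync.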

The second problem is to filter which tables get replicated for the databases. The latest releases of the Atlassian apps store the User Directory configuration in the DB. This means that if we do not filter these settings, the failover JIRA server would point to the production Crowd instead of the failover one.
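As a rough sketch of the table filtering, the list of tables handed to the replication setup could be computed like this. The pattern cwd_directory* is an assumption about where the User Directory configuration lives; verify it against your own schema and application version. The table list in the example is hypothetical.

```python
from fnmatch import fnmatch

# Assumed name patterns for the tables that hold the User Directory
# configuration. Verify them against your own schema before relying on them.
EXCLUDED_TABLE_PATTERNS = ["cwd_directory*"]


def tables_to_replicate(all_tables, excluded_patterns=EXCLUDED_TABLE_PATTERNS):
    """Return the tables that should be replicated, in their original order."""
    return [
        table
        for table in all_tables
        if not any(fnmatch(table.lower(), pattern) for pattern in excluded_patterns)
    ]


if __name__ == "__main__":
    # Hypothetical table list, for illustration only.
    tables = ["jiraissue", "cwd_user", "cwd_directory", "cwd_directory_attribute"]
    print(tables_to_replicate(tables))
```

Keeping the exclusion list in one place at least makes it easy to review after each upgrade, which is where this approach is most likely to break.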

We haven't completed this setup yet, but would like to hear any thoughts about it and other possible solutions to provide resilience to our Atlassian environment. I'm especially concerned about the administrative burden this will bring when upgrading the live environment. Also, any changes to the configuration files and/or the configuration settings stored in the DB in future releases would probably break our cold failover environment.

2 answers

1 accepted

Sounds unnecessarily complex... your JDBC URL should contain the DNS alias for the database server, so that if the database is failed over, the same URL automatically points to the DR database system. Unless you are a very small company, I would have thought your DBAs would provide this for you.
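To illustrate the idea, here is a minimal Python sketch. It assumes a jTDS-style JDBC URL for SQL Server, and the alias, database name and addresses are hypothetical. The point is that the URL stored in the application never changes; only the address behind the alias does.

```python
import socket


def jdbc_url(db_alias, database, port=1433):
    """Build a jTDS-style JDBC URL that names the DNS alias, not a host."""
    return f"jdbc:jtds:sqlserver://{db_alias}:{port}/{database}"


def current_target(db_alias, resolve=socket.gethostbyname):
    """Return the address the alias resolves to right now.

    The resolver is injectable so the failover can be simulated offline.
    """
    return resolve(db_alias)


if __name__ == "__main__":
    alias = "atlassian-db.example.com"  # hypothetical DNS alias
    url = jdbc_url(alias, "jiradb")     # hypothetical database name

    # Stand-in resolvers simulating DNS before and after a failover.
    before_failover = lambda name: "10.0.0.10"  # hypothetical production DB
    after_failover = lambda name: "10.1.0.10"   # hypothetical DR DB

    print(url, "->", current_target(alias, before_failover))
    print(url, "->", current_target(alias, after_failover))
```

How quickly the applications pick up the change depends on the DNS record's TTL and on any caching in between, so that is worth checking during a failover test.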

I don't use Crowd, but the same thing applies to LDAP servers. You point to one that gets round-robinned by DNS, and any that are down get dropped automatically. So I'd suggest you just set up DR for Crowd and use F5s or whatever to automatically direct the Crowd URL to the correct Crowd server.

We use a clustered filesystem so in the event of failover the filesystem is automatically mounted on the DR machine. If we had to change configuration files or ensure that they had not been synced that would just increase the chance of a problem in an already panicked situation.

In short, at least for the DB thing, try to leverage whatever your DBAs recommend.

Thanks for the tick, hopefully other people will chime in with more information. One final piece of advice: test it! And then again every 6 months or so.

Have to say your solution is embarrassingly simple :)

I agree it'd be good to hear about other people's implementations.

I'm thinking of creating static entries in the failover servers' hosts files to point to the LDAP and DB servers. This way we can test it without bringing the prod environment up, and there'll be fewer steps to follow in case of failover. We are thinking of doing this manually, no F5s ;)
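A minimal sketch of generating those static entries in Python, so that every failover server gets the same lines. The host names and addresses are hypothetical, and the script only renders the text; appending it to /etc/hosts is left as a manual step.

```python
import ipaddress


def hosts_entries(overrides):
    """Render hosts-file lines from a mapping of host name to IP address.

    Raises ValueError if an address is not a valid IP address.
    """
    lines = []
    for hostname, address in sorted(overrides.items()):
        ipaddress.ip_address(address)  # validate before writing anything
        lines.append(f"{address}\t{hostname}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical names and addresses for the failover site.
    failover_overrides = {
        "atlassian-db.example.com": "10.1.0.10",
        "crowd.example.com": "10.1.0.20",
    }
    print(hosts_entries(failover_overrides), end="")
```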

3 votes
Stefan Broda Atlassian Team Jun 05, 2012

On this topic: Atlassian has just released a dedicated best practice guide for High Availability. It covers a cold failover scenario and includes implementation details on reverse proxying, monitoring, replication and failover mechanisms:

https://confluence.atlassian.com/display/ATLAS/Failover+for+JIRA

How does one access this document? We're about to start a migration/combination, and this doc would really come in handy.

No... I can't access it anymore. Presumably, now that Data Center is available, this document has been retired?

It would be useful for the rest of us, as I need to test our cold standby environment, and it's been a few months since I last reviewed this doc!

Can someone at Atlassian free it up from its black hole?

Hi, you can find the newest version of the document here: https://confluence.atlassian.com/display/ENTERPRISE/Failover+for+JIRA+Data+Center

Hi Christine, I don't see any data other than a basic image.

Your previous doc had Heartbeat and DRBD information and a bit on database replication.

Cheers

Sadly, the new link doesn't have much information at all. There are many of us who are either not using Jira Data Center yet, or choose not to for various reasons. For example, my company has datacenters in different geographical regions, and Jira Data Center doesn't cluster between different geographic locations yet. So for us, the cold failover approach makes more sense. But I can't seem to find cold failover documents for Jira *anywhere* on Atlassian's site; the few pages that still exist appear to be restricted. I see stuff for Confluence, Bamboo, Stash... but not Jira. If I were a conspiracy theorist, it would appear that we are being heavily encouraged to use Jira Data Center. ;)
