Disaster Recovery deployment for JSM

dujas March 15, 2022

Guys,

This is purely a discussion about deploying DR for JSM.

Previously, I tried the solution described at https://confluence.atlassian.com/enterprise/disaster-recovery-guide-for-jira-692782022.html, but I found the whole process quite complicated. So I tried a different procedure tailored to my own environment:

We have a PostgreSQL cluster managed by Consul and Patroni (three PostgreSQL instances: one primary and two replicas, with only the primary being R/W). In the production site, JSM connects to PostgreSQL through HAProxy, and this has worked fine so far. In the DR site, I installed another instance of JSM without bringing it up. This is the initial state (and maybe the normal state in the future).

In this experiment, I copied dbconfig.xml to the DR site, shut down the production JSM manually, and then started JSM in DR; it came up and is running. Since the data is retrieved from the same database, all system-level configuration is identical, including the base URL, system ID and so on (for now I have to modify the base URL for further access, since there is no DNS or load balancer configured in front).
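
Before starting the DR instance, it can help to confirm that the copied dbconfig.xml really points at the same database as production. The following is only a minimal Python sketch of that check. It assumes the usual dbconfig.xml layout with a `<jdbc-datasource><url>` element; the host name, port and database name in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET


def jdbc_url(dbconfig_xml: str) -> str:
    """Return the JDBC URL found in the text of a dbconfig.xml file."""
    root = ET.fromstring(dbconfig_xml)
    url = root.findtext("jdbc-datasource/url")
    if url is None:
        raise ValueError("no <jdbc-datasource><url> element found")
    return url.strip()


def same_database(prod_xml: str, dr_xml: str) -> bool:
    """True when both configurations use the same JDBC URL."""
    return jdbc_url(prod_xml) == jdbc_url(dr_xml)


# Hypothetical sample; host, port and database name are assumptions.
SAMPLE = """<jira-database-config>
  <name>defaultDS</name>
  <database-type>postgres72</database-type>
  <schema-name>public</schema-name>
  <jdbc-datasource>
    <url>jdbc:postgresql://haproxy.example.internal:5000/jiradb</url>
    <driver-class>org.postgresql.Driver</driver-class>
  </jdbc-datasource>
</jira-database-config>
"""

if __name__ == "__main__":
    print(jdbc_url(SAMPLE))
```

In practice you would read the two files from the production and DR home directories and pass their contents to `same_database`.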

What I am concerned about now is how to sync the files generated or installed in the production site (such as attachments, avatars and installed plugins) to the DR site. I unchecked the index snapshots because I would like to run a full re-index once the failover is done. I tried the Replication function, and it does copy those files to the secondary site (I tried NFS; the permissions part almost killed me), but the files are too scattered to sync back to the expected directories in DR.

Any suggestions would be much appreciated, and I will try them out in my lab.
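
One way to picture the sync problem is a one-way mirror that keeps each file at the same relative path on both sides, so nothing has to be moved back into place afterwards. Below is a minimal Python sketch of that idea, not a tested DR tool. The subdirectory names in `SYNC_DIRS` are an assumption about where attachments, avatars and installed plugins live under the Jira home, and the function name `mirror` is purely illustrative.

```python
import filecmp
import shutil
from pathlib import Path

# Assumed layout under the Jira home; adjust to your installation.
SYNC_DIRS = ["data/attachments", "data/avatars", "plugins/installed-plugins"]


def mirror(src_home: Path, dst_home: Path, subdirs=SYNC_DIRS) -> list[str]:
    """Copy new or changed files from src_home to dst_home, keeping paths.

    Returns the relative paths that were copied. Files that exist only on
    the destination are left alone (nothing is deleted).
    """
    copied = []
    for sub in subdirs:
        src_root = src_home / sub
        if not src_root.is_dir():
            continue
        for src in sorted(src_root.rglob("*")):
            if not src.is_file():
                continue
            rel = src.relative_to(src_home)
            dst = dst_home / rel
            # Identical stat signature counts as unchanged; otherwise
            # filecmp falls back to comparing size and contents.
            if dst.is_file() and filecmp.cmp(src, dst, shallow=True):
                continue
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves the mtime
            copied.append(rel.as_posix())
    return copied
```

Running `mirror(Path("/prod/jira-home"), Path("/dr/jira-home"))` a second time with no changes on the source side should copy nothing. Both paths here are placeholders.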

Thanks.

Jason Du

2 comments

DG January 29, 2023

Hi Jason,

We also run JSM Data Center in an HA setup with a Patroni cluster. However, the application displays I/O errors. The DB is not corrupt, yet we still get an I/O error.

May I ask whether there was anything special about your setup?

We run the cluster across different data centers: our own data center and Azure.

Did you follow the officially recommended documentation from Atlassian?

https://confluence.atlassian.com/adminjiraserver/running-jira-data-center-in-a-cluster-993929598.html

 

Best wishes,

Dennis

Colin_McDermott February 8, 2023

"What I am concerning about now is how could I sync those files generated/installed in production site (such as attachments, avatars and installed plugins. I uncheck the index snapshots as I would like to run a full re-index once the failover is done) to the DR site, I tried the Replication function and indeed it would copy those files to the secondary site (I tried NFS, the permission part almost killed me) but the files are too scattered to sync back to the expected directory in DR."

Maybe you shouldn't. 

So your <atl-home>/shared directory should be where everything lives, bar installed plugins (and they should re-download). Attachments and avatars should be in /shared, and I believe a copy of the plugins should be there too.

This <atl-home>/shared should be a FAST NFS share. It should be accessible from all nodes as a common directory, so that any node can take over on failure. With EC2 we're talking an EFS share (EBS is block storage attached to an instance, not a shared file system); with Azure you can do an equivalent.

 

However, this is a DR site. Not production, DR. If I were doing DR, I would keep a replicated copy of the NFS share that is one month older than production, plus a monthly/yearly archive. Why?

RANSOMWARE. 

The biggest call on DR is most likely a hacker, particularly when you are talking Cloud (not cloud security as such, but someone hacking a workstation and uploading malware to your Jira tickets). When recovering to DR, I would want to know I am going to a clean environment and then restoring the latest data.
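
To make the "older copy first, then restore latest" idea concrete, here is a minimal Python sketch that picks the newest snapshot that is at least a given number of days old. The function name, the 30-day default and the sample dates are all illustrative assumptions; how snapshots are actually taken and stored is outside this sketch.

```python
from datetime import date, timedelta
from typing import Optional


def pick_clean_snapshot(
    snapshots: list[date], today: date, min_age_days: int = 30
) -> Optional[date]:
    """Return the newest snapshot at least min_age_days old, or None."""
    cutoff = today - timedelta(days=min_age_days)
    eligible = [s for s in snapshots if s <= cutoff]
    return max(eligible, default=None)


if __name__ == "__main__":
    # Hypothetical snapshot dates.
    snaps = [date(2023, 1, 1), date(2023, 1, 15), date(2023, 2, 1)]
    print(pick_clean_snapshot(snaps, today=date(2023, 2, 8)))
```

With these sample dates the cutoff is 2023-01-09, so only the 2023-01-01 snapshot qualifies as the starting point.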

Just food for thought. 
