Data Center Cache Replication problems

I have been trying to set up a demo Data Center instance. It's on Azure, not that I think that matters. I did NOT use the Azure Data Center Marketplace Template; I wanted to set everything up manually. Everything works except that the 'HealthCheck: Cluster Cache Replication' check is failing.

If I am logged into node1 it tells me that node2 isn't replicating. If I am logged into node2 it tells me that node1 isn't replicating.

The Shared Home is in /mnt/sharedhome

The Shared Home is on a different vm and shared through NFS.

Nodes are CentOS 7.4.

Both nodes can see that mount fine. I've already chown'd that directory to the jira user

root $ chown jira /mnt/sharedhome/
root $ chown -R jira /mnt/sharedhome/

I think I've given the jira user the right permissions.

root $ chmod -R u+rwx /mnt/sharedhome/

The jira user can create and delete files in the directory. If I make a change to the directory on one node the change is almost immediately reflected on the other.

jira $ touch /mnt/sharedhome/test.txt      (on node1)
jira $ rm /mnt/sharedhome/test.txt         (on node2)
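
The manual touch/rm test above could also be scripted so it is easy to repeat on each node. This is only a sketch: the function name check_shared_home is made up for illustration, and it assumes it is run as the jira user.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether the current user can create and
# delete a file in the given directory (run it as the jira user).
check_shared_home() {
  local dir="$1"
  local probe="${dir}/.write-test-$$"   # $$ is the shell PID, keeps the name unique
  if [ ! -d "$dir" ]; then
    echo "missing"
    return 1
  fi
  if touch "$probe" 2>/dev/null && rm -f "$probe"; then
    echo "writable"
  else
    echo "not-writable"
    return 1
  fi
}

# Example: check_shared_home /mnt/sharedhome
```

Note that this only shows the directory is writable from one node; it says nothing about cache replication, which does not go through the shared home.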

My cluster.properties file is as follows (with jira.node.id changed to node2 on node2, obviously):

# This ID must be unique across the cluster 
jira.node.id = node1
# The location of the shared home directory for all JIRA nodes
jira.shared.home = /mnt/sharedhome

 

On both I am seeing similar messages in catalina.out

2018-02-25 10:49:51,328 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Start replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, stacktrace: <only-in-trace>
2018-02-25 10:49:51,343 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Done replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, numberOfPeers: 1, numberOfSuccess: 1, timeMillis: 14, stacktrace: <only-in-trace>
2018-02-25 10:50:13,997 HealthCheck:thread-7 WARN taylor-local 627x61x2 c52hux 98.247.96.192 /rest/troubleshooting/1.0/check/process/ [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node node1 does not seem to replicate its cache

Specifically, both are saying that they are

Done replicating cache

while both are warning that the other node

does not seem to replicate its cache

EDIT: Based on 
https://community.atlassian.com/t5/Jira-questions/JIRA-DC-Node-ehcache-connection-refused/qaq-p/634262

https://jira.atlassian.com/browse/JRASERVER-64974

https://jira.atlassian.com/browse/JRASERVER-66608

https://community.atlassian.com/t5/Jira-questions/What-is-the-random-port-opened-by-Jira-Datacenter-used-for-and/qaq-p/346614

I've added this to my cluster.properties file

ehcache.object.port = 40011

I opened that port for both inbound and outbound traffic, as well as port 40001, but that did not fix the issue.

Again, it's just the Cluster Cache Replication health check that's failing. The Cluster Index Replication and Shared Home health checks are fine.

2 answers

Just in case anyone else is interested in this: we had the same issue on Azure.

Adding the IP of each node to its cluster.properties file fixed the issue. 

# This ID must be unique across the cluster
jira.node.id = node1

# The location of the shared home directory for all JIRA nodes
jira.shared.home = <path/to/shared/jirahome>

# IP Address used by this node for cache replication
ehcache.listener.hostName = <Node 1 Server IP>

# Ports used by this node for cache replication
ehcache.listener.port = 40001
ehcache.object.port = 40011
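
Since the file differs between nodes only in the node ID and the IP address, it could be generated rather than edited by hand. A minimal sketch follows; the function name print_cluster_properties and the example address are assumptions, while the property names and ports are the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a cluster.properties body for one node.
# Arguments: node id, this node's IP address, shared home path.
print_cluster_properties() {
  local node_id="$1" node_ip="$2" shared_home="$3"
  cat <<EOF
# This ID must be unique across the cluster
jira.node.id = ${node_id}

# The location of the shared home directory for all JIRA nodes
jira.shared.home = ${shared_home}

# IP Address used by this node for cache replication
ehcache.listener.hostName = ${node_ip}

# Ports used by this node for cache replication
ehcache.listener.port = 40001
ehcache.object.port = 40011
EOF
}

# Example with placeholder values (10.0.0.4 is an assumed address):
# print_cluster_properties node1 10.0.0.4 /mnt/sharedhome
```

Redirect the output into cluster.properties in each node's local Jira home, then restart that node.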

Also make sure both nodes can reach each other.

ping <node 1 ip>

ping <node 2 ip> 
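
Ping only shows that ICMP gets through, so a firewall or network security rule could still block the cache replication ports (40001 and 40011 in the configuration above) while ping works. As a rough sketch, the ports can be probed with bash's built-in /dev/tcp redirection; check_port is a made-up name, and the 3-second timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "open" if a TCP connection to host:port
# succeeds within 3 seconds, otherwise print "closed".
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Run from node 1 against node 2, and the other way round:
# check_port "<node 2 ip>" 40001
# check_port "<node 2 ip>" 40011
```

The peer Jira node has to be running for its ports to show as open, since nothing listens on them otherwise.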

If my shared home is on another server, how can I specify it?

1.1.1.1/data/jira/sharedhome? //1.1.1.1/data/jira/sharedhome?

Mount the folder on each server that Jira DC will be running on; for DC you cannot point jira.shared.home at a path on another server directly. Create a support ticket with Atlassian Support if in doubt.

For the mount part, check Stack Overflow or similar places, or search for "mount nfs". There are a few examples of how to map a directory from one server onto another, depending on the OS and share type. Make sure the mount path is the same on all nodes.
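
As a rough illustration for an NFS export on Linux, the share is addressed as server:/export/path and mounted at the same local path on every node. The helper below only builds the /etc/fstab line; its name is made up, the address and export path are the ones from the question above, /mnt/sharedhome is an assumed mount point, and "defaults" is a placeholder for whatever mount options suit your environment.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an /etc/fstab line for an NFS-backed shared home.
# Arguments: NFS server address, exported path, local mount point.
nfs_fstab_entry() {
  local server="$1" export_path="$2" mount_point="$3"
  printf '%s:%s %s nfs defaults 0 0\n' "$server" "$export_path" "$mount_point"
}

# Typical use, as root, on every node:
#   mkdir -p /mnt/sharedhome
#   mount -t nfs 1.1.1.1:/data/jira/sharedhome /mnt/sharedhome
#   nfs_fstab_entry 1.1.1.1 /data/jira/sharedhome /mnt/sharedhome >> /etc/fstab
```

With the share mounted, jira.shared.home is then set to the local mount point (here /mnt/sharedhome), not to the remote address.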

Got it, thanks @Ankit Dahiya. I already found the instructions to mount.

We are experiencing the same issue in AWS. Did you ever get this figured out?

Hi Jeff,

 

The comment above explains how it got fixed in our case.

It's better to create a support ticket with Atlassian if it's urgent.

One thing I found probably explains why it happened in the first place.

Jira DC picks up the hostname from the server, and if the DC node and the load balancer use different node names, you need to define the parameters explicitly (ehcache.listener.hostName, ehcache.listener.port and ehcache.object.port).
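
To see what a node's own hostname resolves to, something like the sketch below can be run on each node and the result compared with the IP the other node can actually reach. The function name resolved_ip is made up, and whether Jira uses exactly this system lookup is an assumption; it is only a quick way to spot a mismatch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first address the system resolver returns
# for a name, or "unresolved" if the lookup fails.
resolved_ip() {
  local name="$1" ip
  ip="$(getent hosts "$name" 2>/dev/null | awk 'NR==1 {print $1}')"
  echo "${ip:-unresolved}"
}

# On each node:
# resolved_ip "$(hostname)"
```

If the result is a loopback address or an address the peer cannot reach, setting ehcache.listener.hostName to the node's reachable IP, as in the answer above, avoids relying on that lookup.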
