Data Center Cache Replication problems

I have been trying to set up a demo Jira Data Center instance on Azure (not that I think that matters). I did NOT use the Azure Data Center Marketplace template; I wanted to set everything up manually. Everything works, except that the 'HealthCheck: Cluster Cache Replication' check fails.

If I am logged into node1 it tells me that node2 isn't replicating. If I am logged into node2 it tells me that node1 isn't replicating.

The shared home is at /mnt/sharedhome. It lives on a separate VM and is shared over NFS.

The nodes run CentOS 7.4.

Both nodes can see that mount. I've already chown'd the directory to the jira user:

root $ chown jira /mnt/sharedhome/
root $ chown -R jira /mnt/sharedhome/

I think I've given the jira user the right permissions:

root $ chmod -R u+rwx /mnt/sharedhome/

The jira user can create and delete files in the directory. If I make a change to the directory on one node the change is almost immediately reflected on the other.

jira $ touch /mnt/sharedhome/test.txt      (on node1)
jira $ rm /mnt/sharedhome/test.txt         (on node2)
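The manual touch/rm test above can also be scripted so it is easy to repeat on each node. A minimal Python sketch, assuming it runs as the jira user and that /mnt/sharedhome is the shared home; the helper names `write_marker`, `marker_visible` and `remove_marker` are hypothetical, not part of Jira.

```python
import os
import socket
import uuid


def write_marker(shared_home: str) -> str:
    """Create a uniquely named marker file in shared_home and return its path."""
    host = socket.gethostname()
    # Put the writing node's hostname in the file name so the other node can see who wrote it.
    path = os.path.join(shared_home, f"marker-{host}-{uuid.uuid4().hex}.txt")
    with open(path, "w") as fh:
        fh.write(f"written by {host}\n")
    return path


def marker_visible(path: str) -> bool:
    """Return True if the marker file is visible from this node."""
    return os.path.isfile(path)


def remove_marker(path: str) -> None:
    """Delete the marker file once the other node has seen it."""
    os.remove(path)
```

Call `write_marker("/mnt/sharedhome")` on node1, then `marker_visible(path)` and `remove_marker(path)` on node2. This only confirms the shared home works; it says nothing about cache replication, which (as the rest of this thread suggests) goes over the network between the nodes.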

My cluster.properties file is as follows (with node2 instead of node1 on node2, of course):

# This ID must be unique across the cluster 
jira.node.id = node1
# The location of the shared home directory for all JIRA nodes
jira.shared.home = /mnt/sharedhome
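With more than one node it can help to sanity-check each node's cluster.properties before restarting. A minimal Python sketch: it only understands the simple `key = value` lines used here (the full java.util.Properties format also allows `:` separators, line continuations and escapes), and the function names are hypothetical.

```python
REQUIRED_KEYS = ("jira.node.id", "jira.shared.home")


def parse_simple_properties(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and '#' or '!' comments.

    Simplification: no support for ':' separators, continuations or escapes.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; this sketch just skips it
        props[key.strip()] = value.strip()
    return props


def missing_keys(props: dict) -> list:
    """Return the required cluster keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not props.get(k)]


def duplicate_node_ids(configs: list) -> set:
    """Return node IDs that appear in more than one node's config."""
    seen, dupes = set(), set()
    for props in configs:
        node_id = props.get("jira.node.id")
        if node_id is None:
            continue
        if node_id in seen:
            dupes.add(node_id)
        seen.add(node_id)
    return dupes
```

Feeding it the node1 and node2 files should give no missing keys and no duplicate IDs; a copy-pasted file that still says `node1` on node2 would show up as a duplicate.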

On both nodes I see similar messages in catalina.out:

2018-02-25 10:49:51,328 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Start replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, stacktrace: <only-in-trace>
2018-02-25 10:49:51,343 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Done replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, numberOfPeers: 1, numberOfSuccess: 1, timeMillis: 14, stacktrace: <only-in-trace>
2018-02-25 10:50:13,997 HealthCheck:thread-7 WARN taylor-local 627x61x2 c52hux 98.247.96.192 /rest/troubleshooting/1.0/check/process/ [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node node1 does not seem to replicate its cache
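To compare what each node reports, a small Python sketch can pull the peer and success counts out of the "Done replicating cache" lines and the node IDs out of the health-check warnings. The regular expressions are written against the sample lines above and are assumptions about the log format, not a documented interface; `summarize` is a hypothetical name.

```python
import re

# Patterns based on the sample catalina.out lines; other Jira versions may log differently.
DONE_RE = re.compile(
    r"Done replicating cache: (?P<cache>[^,]+), .*?"
    r"numberOfPeers: (?P<peers>\d+), numberOfSuccess: (?P<success>\d+)"
)
WARN_RE = re.compile(r"Node (?P<node>\S+) does not seem to replicate its cache")


def summarize(lines):
    """Return (failed_puts, warned_nodes) for an iterable of log lines.

    failed_puts: (cache, peers, successes) where fewer peers succeeded than were contacted.
    warned_nodes: node IDs named in the ClusterReplicationHealthCheck warning.
    """
    failed_puts = []
    warned_nodes = set()
    for line in lines:
        done = DONE_RE.search(line)
        if done:
            peers = int(done.group("peers"))
            success = int(done.group("success"))
            if success < peers:
                failed_puts.append((done.group("cache"), peers, success))
            continue
        warn = WARN_RE.search(line)
        if warn:
            warned_nodes.add(warn.group("node"))
    return failed_puts, warned_nodes
```

Run against the three lines quoted above, it reports no failed puts (numberOfPeers and numberOfSuccess are both 1) but a warning about node1.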

Specifically, both nodes report "Done replicating cache", while each warns that the other node "does not seem to replicate its cache".

EDIT: Based on these threads and issues:
https://community.atlassian.com/t5/Jira-questions/JIRA-DC-Node-ehcache-connection-refused/qaq-p/634262

https://jira.atlassian.com/browse/JRASERVER-64974

https://jira.atlassian.com/browse/JRASERVER-66608

https://community.atlassian.com/t5/Jira-questions/What-is-the-random-port-opened-by-Jira-Datacenter-used-for-and/qaq-p/346614

I've added this to my cluster.properties file:

ehcache.object.port = 40011

I also opened that port, as well as port 40001, for both inbound and outbound traffic. That did not fix the issue.

Again, it's just the Cluster Cache Replication health check that's failing. The Cluster Index Replication and Shared Home health checks are fine.

In case anyone else runs into this: we had the same issue on Azure.

Adding each node's IP address to its own cluster.properties file fixed the issue:

# This ID must be unique across the cluster
jira.node.id = node1

# The location of the shared home directory for all JIRA nodes
jira.shared.home = <path/to/shared/jirahome>

# IP Address used by this node for cache replication
ehcache.listener.hostName = <Node 1 Server IP>

# Ports used by this node for cache replication
ehcache.listener.port = 40001
ehcache.object.port = 40011
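With more than two nodes, writing each file by hand gets error-prone. A minimal Python sketch that renders one cluster.properties per node using the keys from the answer above; the node IDs, the 192.0.2.x addresses (a documentation-only range) and the function name are placeholders, not values from this thread.

```python
def cluster_properties(node_id: str, shared_home: str, listener_ip: str,
                       listener_port: int = 40001, object_port: int = 40011) -> str:
    """Render one node's cluster.properties with an explicit cache-replication address."""
    lines = [
        "# This ID must be unique across the cluster",
        f"jira.node.id = {node_id}",
        "",
        "# The location of the shared home directory for all JIRA nodes",
        f"jira.shared.home = {shared_home}",
        "",
        "# IP Address used by this node for cache replication",
        f"ehcache.listener.hostName = {listener_ip}",
        "",
        "# Ports used by this node for cache replication",
        f"ehcache.listener.port = {listener_port}",
        f"ehcache.object.port = {object_port}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical node IDs and addresses; replace with each node's real IP.
NODES = {"node1": "192.0.2.11", "node2": "192.0.2.12"}

for node, ip in NODES.items():
    print(f"--- {node} ---")
    print(cluster_properties(node, "/mnt/sharedhome", ip))
```

Each node gets its own IP in ehcache.listener.hostName; the ports can be the same on every node as long as they are open between them.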

Also make sure both nodes can reach each other:

ping <node 1 ip>

ping <node 2 ip> 
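ping only proves ICMP gets through; cache replication in this thread uses TCP ports 40001 and 40011, which a firewall can still block. A minimal Python sketch of a TCP connect test; the function names are hypothetical and the port defaults are the ones used above.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def check_peer(host: str, ports=(40001, 40011)) -> dict:
    """Check the cache-replication ports used in this thread on one peer node."""
    return {port: tcp_reachable(host, port) for port in ports}
```

Run `check_peer("<node 2 ip>")` from node1 and the reverse from node2 while Jira is running on both. A `False` points either at something between the nodes (for example a firewall rule) or at Jira not listening on the expected address.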

If my shared home is on another server, how can I specify it?

As 1.1.1.1/data/jira/sharedhome? Or //1.1.1.1/data/jira/sharedhome?

Mount the folder on each server that Jira DC will run on; you cannot point Jira DC directly at a path on another server. Create a support ticket with Atlassian Support if in doubt.

For the mount itself, check Stack Overflow or similar sites, or search for "mount nfs"; there are several examples of how to mount a directory from one server on another, depending on the OS and share type. Make sure the mount path is the same on all nodes.
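Since every node must mount the share at the same path, a quick per-node check helps catch a node where the directory exists but nothing is mounted on it. A minimal Python sketch using `os.path.ismount`; the default path is the one from the question, and the function name is hypothetical.

```python
import os


def shared_home_status(path: str = "/mnt/sharedhome") -> str:
    """Classify path as 'missing', 'mounted', or 'not a mount point'."""
    if not os.path.exists(path):
        return "missing"
    if os.path.ismount(path):
        return "mounted"
    # The path may still sit inside a mounted filesystem (for example a
    # subdirectory of an NFS mount), so confirm with `mount` or `df` output.
    return "not a mount point"
```

Running it on every node should give the same answer; a node that reports "not a mount point" for a path that is supposed to be the NFS mount is writing to its local disk.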

Got it, thanks @Ankit Dahiya. I've already found the mount instructions.

Jeff Tillett Community Leader May 17, 2018

We are experiencing the same issue in AWS. Did you ever get this figured out?

Hi Jeff,

The comment above explains how we fixed it in our case.

If it's urgent, it's better to create a support ticket with Atlassian.

One thing I found that probably explains why it happened in the first place:

Jira DC picks up the hostname from the server, so if the DC nodes and the load balancer use different names for a node, you need to define the cache-replication parameters explicitly (ehcache.listener.hostName, ehcache.listener.port and ehcache.object.port).
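If Jira DC takes its cache-replication address from the server's hostname, as described above, it is worth checking what that hostname resolves to on each node. A minimal Python sketch under that assumption; it does not reproduce Jira's own lookup logic, and the function names are hypothetical.

```python
import ipaddress
import socket


def resolved_addresses(hostname: str) -> set:
    """Return the IPv4 addresses the OS resolver gives for hostname on this machine."""
    # Assumption: Jira's lookup behaves similarly; this only shows what the resolver returns.
    infos = socket.getaddrinfo(hostname, None, socket.AF_INET, socket.SOCK_STREAM)
    return {info[4][0] for info in infos}


def only_loopback(addresses: set) -> bool:
    """True if there is at least one address and every address is loopback (127.0.0.0/8)."""
    return bool(addresses) and all(ipaddress.ip_address(a).is_loopback for a in addresses)
```

On each node, `resolved_addresses(socket.gethostname())` shows what the node's own name maps to. If `only_loopback` returns True, or the addresses are not ones the other node can reach, setting ehcache.listener.hostName to the node's reachable IP is the fix described in this thread.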
