Data Center Cache Replication problems

Taylor Huston February 25, 2018

I have been trying to set up a demo Data Center instance. It's on Azure, not that I think that matters. I did NOT use the Azure Data Center Marketplace Template because I wanted to set everything up manually. Everything works except that the 'HealthCheck: Cluster Cache Replication' check fails.

If I am logged into node1 it tells me that node2 isn't replicating. If I am logged into node2 it tells me that node1 isn't replicating.

The Shared Home is in /mnt/sharedhome

The Shared Home is on a different vm and shared through NFS.

Nodes are CentOS 7.4.

Both nodes can see that mount fine. I've already chown'd that directory to the jira user

root $ chown jira /mnt/sharedhome/
root $ chown -R jira /mnt/sharedhome/

I think I've given the jira user the right permissions.

root $ chmod -R u+rwx /mnt/sharedhome/

The jira user can create and delete files in the directory. If I make a change to the directory on one node the change is almost immediately reflected on the other.

jira $ touch /mnt/sharedhome/test.txt      (on node1)
jira $ rm /mnt/sharedhome/test.txt         (on node2)

My cluster.properties file is  (changed to node2 on node2 obviously)

# This ID must be unique across the cluster 
jira.node.id = node1
# The location of the shared home directory for all JIRA nodes
jira.shared.home = /mnt/sharedhome

 

On both I am seeing similar messages in catalina.out

2018-02-25 10:49:51,328 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Start replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, stacktrace: <only-in-trace>
2018-02-25 10:49:51,343 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Done replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, numberOfPeers: 1, numberOfSuccess: 1, timeMillis: 14, stacktrace: <only-in-trace>
2018-02-25 10:50:13,997 HealthCheck:thread-7 WARN taylor-local 627x61x2 c52hux 98.247.96.192 /rest/troubleshooting/1.0/check/process/ [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node node1 does not seem to replicate its cache
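In the lines above the sending side reports numberOfPeers: 1 and numberOfSuccess: 1, so from its own point of view delivery succeeded. As a rough sketch, assuming the log format matches those lines exactly, a small script could scan catalina.out for "Done replicating cache" entries where fewer peers succeeded than were targeted. Treating that mismatch as a failure is an assumption made here, not something the log format documents.

```python
import re

# Field names (numberOfPeers, numberOfSuccess) are copied from the log lines
# quoted in this thread; the regex assumes that exact layout.
DONE_RE = re.compile(
    r"Done replicating cache: (?P<cache>\S+), operation: (?P<op>\w+),"
    r".*?numberOfPeers: (?P<peers>\d+), numberOfSuccess: (?P<ok>\d+)"
)


def failed_replications(lines):
    """Return (cache, peers, successes) for entries where not every peer succeeded."""
    failures = []
    for line in lines:
        match = DONE_RE.search(line)
        if match and int(match["ok"]) < int(match["peers"]):
            failures.append((match["cache"], int(match["peers"]), int(match["ok"])))
    return failures
```

For example, `failed_replications(open("catalina.out"))` would return an empty list for the log excerpt above, which suggests the problem is on the receiving or advertising side rather than in the send itself.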

Specifically, both nodes report "Done replicating cache" while each warns that the other node "does not seem to replicate its cache".

EDIT: Based on 
https://community.atlassian.com/t5/Jira-questions/JIRA-DC-Node-ehcache-connection-refused/qaq-p/634262

https://jira.atlassian.com/browse/JRASERVER-64974

https://jira.atlassian.com/browse/JRASERVER-66608

https://community.atlassian.com/t5/Jira-questions/What-is-the-random-port-opened-by-Jira-Datacenter-used-for-and/qaq-p/346614

I've added this to my cluster.properties file

ehcache.object.port = 40011

I also opened that port for both inbound and outbound traffic, as well as port 40001, but that did not fix the issue.

Again, it's just the Cluster Cache Replication health check that's failing. The Cluster Index Replication and Shared Home health checks are fine.

3 answers

4 votes
Ankit Dahiya April 24, 2018

Just in case anyone else is interested in this: we had the same issue on Azure.

Adding the IP of each node to its cluster.properties file fixed the issue. 

# This ID must be unique across the cluster
jira.node.id = node1

# The location of the shared home directory for all JIRA nodes
jira.shared.home = <path/to/shared/jirahome>

# IP Address used by this node for cache replication
ehcache.listener.hostName = <Node 1 Server IP>

# Ports used by this node for cache replication
ehcache.listener.port = 40001
ehcache.object.port = 40011
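Since the fix depends on every node having these keys set, a quick sanity check on each node can help. The following is a minimal sketch that parses cluster.properties and lists which of the replication keys named in this answer are missing. It handles only plain `key = value` lines and `#` comments, not the full Java properties syntax, and the function names are made up for illustration.

```python
# Keys taken from the cluster.properties example in this answer.
REPLICATION_KEYS = (
    "ehcache.listener.hostName",
    "ehcache.listener.port",
    "ehcache.object.port",
)


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_replication_keys(props):
    """Return the replication keys that are absent from the parsed properties."""
    return [key for key in REPLICATION_KEYS if key not in props]
```

Run against the file from the original question (node ID and shared home only), `missing_replication_keys` would report all three keys as missing.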

Also make sure both nodes can reach each other.

ping <node 1 ip>

ping <node 2 ip> 
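A successful ping only shows ICMP reachability; it does not show that the TCP ports used for cache replication are open. As a sketch, the check below attempts a TCP connection to ports 40001 and 40011 (the values used in this thread) on the other node. The helper names are illustrative, and a refused connection can also simply mean Jira is not running on the target.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_peer(host, ports=(40001, 40011)):
    """Map each replication port to whether it accepted a TCP connection."""
    return {port: port_open(host, port) for port in ports}
```

Running `check_peer("<node 2 ip>")` from node1, and the reverse from node2, shows whether a firewall or network security group is blocking either port.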

Daniel Alonso September 27, 2018

If my sharedhome is in another server how can I specify it?

1.1.1.1/data/jira/sharedhome? //1.1.1.1/data/jira/sharedhome?

Ankit Dahiya September 28, 2018

Mount the folder on each server that Jira DC will run on; Data Center cannot reference the shared home on another server directly. Create a support ticket with Atlassian Support if in doubt.

For the mount itself, search Stack Overflow or similar sites for "mount nfs". There are examples of how to map a directory from one server onto another for each OS. Make sure the mount path is the same on all nodes.

Daniel Alonso September 28, 2018

Got it, thanks @Ankit Dahiya. I already found the instructions to mount.

Medhat Ahmed June 2, 2020

Thanks @Ankit Dahiya this worked perfectly for me.

0 votes
Adam Theis June 29, 2022

All nodes in the cluster thought that all the other nodes were not replicating their cache.
I spent a day on it, then removed the /etc/hosts entry suggested for 127.0.1.1.
cluster.properties contains just the node ID and home path.
Hostnames are resolvable.
Cache is replicating again.
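To check for this situation on a node, one option is to see what the machine's own hostname resolves to. The sketch below assumes, as this thread suggests but does not confirm, that Jira advertises whatever address the server hostname resolves to; if that is a loopback address such as 127.0.1.1, other nodes would not be able to connect back to it.

```python
import ipaddress
import socket


def is_loopback(address):
    """Return True for addresses in the loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def local_hostname_resolution():
    """Return (hostname, resolved IPv4 address or None if it does not resolve)."""
    name = socket.gethostname()
    try:
        return name, socket.gethostbyname(name)
    except OSError:
        return name, None


if __name__ == "__main__":
    name, address = local_hostname_resolution()
    if address is None:
        print(f"{name} does not resolve; check DNS or /etc/hosts")
    elif is_loopback(address):
        print(f"{name} resolves to loopback address {address}; peers cannot reach it")
    else:
        print(f"{name} resolves to {address}")
```

If the script reports a loopback address, either remove the offending /etc/hosts entry or set ehcache.listener.hostName explicitly in cluster.properties.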

0 votes
Jeff Tillett
May 17, 2018

We are experiencing the same issue in AWS. Did you ever get this figured out?

Ankit Dahiya May 17, 2018

Hi Jeff,

 

The comment above explains how it got fixed in our case.

It is better to create a support ticket with Atlassian if it's urgent.

One thing I found probably explains why it happened in the first place.

Jira DC picks up the hostname from the server, so if the DC node and the load balancer use different node names you need to define the parameters explicitly (ehcache.listener.hostName, ehcache.listener.port and ehcache.object.port).
