Data Center Cache Replication problems

I have been trying to set up a demo Data Center instance. It's on Azure, not that I think that matters. I did NOT use the Azure Data Center Marketplace Template, because I wanted to set everything up manually. Everything works except that the 'HealthCheck: Cluster Cache Replication' check is failing.

If I am logged into node1 it tells me that node2 isn't replicating. If I am logged into node2 it tells me that node1 isn't replicating.

The Shared Home is in /mnt/sharedhome

The Shared Home is on a different vm and shared through NFS.

Nodes are CentOS 7.4.

Both nodes can see that mount fine. I've already chown'd that directory to the jira user

root $ chown jira /mnt/sharedhome/
root $ chown -R jira /mnt/sharedhome/

I think I've given the jira user the right permissions.

root $ chmod -R u+rwx /mnt/sharedhome/

The jira user can create and delete files in the directory. If I make a change to the directory on one node the change is almost immediately reflected on the other.

jira $ touch /mnt/sharedhome/test.txt      (on node1)
jira $ rm /mnt/sharedhome/test.txt         (on node2)
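
The manual touch/rm test above could also be scripted so it can be repeated on each node. This is only a rough sketch: the function name check_shared_home is made up for illustration, and it only proves that the current user can write, read and delete in the directory, nothing about Jira itself.

```python
import os
import socket


def check_shared_home(path):
    """Create, read back, and delete a small marker file in `path`.

    Returns True if all three steps succeed, False on any OS-level error.
    """
    # Include hostname and PID so two nodes running this at once do not collide.
    marker = os.path.join(
        path, f"write-test-{socket.gethostname()}-{os.getpid()}.txt"
    )
    try:
        with open(marker, "w") as handle:
            handle.write("ok")
        with open(marker) as handle:
            readable = handle.read() == "ok"
        os.remove(marker)
        return readable
    except OSError:
        return False
```

Running check_shared_home("/mnt/sharedhome") as the jira user on each node should return True if the mount and permissions are set up as described.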

My cluster.properties file is as follows (changed to node2 on node2, obviously):

# This ID must be unique across the cluster 
jira.node.id = node1
# The location of the shared home directory for all JIRA nodes
jira.shared.home = /mnt/sharedhome

 

On both I am seeing similar messages in catalina.out

2018-02-25 10:49:51,328 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Start replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, stacktrace: <only-in-trace>
2018-02-25 10:49:51,343 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Done replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, numberOfPeers: 1, numberOfSuccess: 1, timeMillis: 14, stacktrace: <only-in-trace>
2018-02-25 10:50:13,997 HealthCheck:thread-7 WARN taylor-local 627x61x2 c52hux 98.247.96.192 /rest/troubleshooting/1.0/check/process/ [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node node1 does not seem to replicate its cache

Specifically, both nodes are saying that they are "Done replicating cache" while both are warning that the other node "does not seem to replicate its cache".

EDIT: Based on 
https://community.atlassian.com/t5/Jira-questions/JIRA-DC-Node-ehcache-connection-refused/qaq-p/634262

https://jira.atlassian.com/browse/JRASERVER-64974

https://jira.atlassian.com/browse/JRASERVER-66608

https://community.atlassian.com/t5/Jira-questions/What-is-the-random-port-opened-by-Jira-Datacenter-used-for-and/qaq-p/346614

I've added this to my cluster.properties file

ehcache.object.port = 40011

I also opened that port for both inbound and outbound traffic, as well as port 40001, but that did not fix the issue.

Again, it's just the Cluster Cache Replication health check that's failing. The Cluster Index Replication and Shared Home health checks are fine.

3 answers

In case anyone else is interested: we had the same issue on Azure.

Adding the IP of each node to its cluster.properties file fixed the issue. 

# This ID must be unique across the cluster
jira.node.id = node1

# The location of the shared home directory for all JIRA nodes
jira.shared.home = <path/to/shared/jirahome>

# IP Address used by this node for cache replication
ehcache.listener.hostName = <Node 1 Server IP>

# Ports used by this node for cache replication
ehcache.listener.port = 40001
ehcache.object.port = 40011
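
If it helps, here is a small sketch for checking that a cluster.properties file actually contains the keys mentioned above, each on its own line. The parser is deliberately simplified (only `key = value` lines and `#` comments, not the full Java properties format), and the names parse_properties and missing_keys are invented for this example.

```python
# Keys that appear in the cluster.properties shown in this thread.
REQUIRED_KEYS = ("jira.node.id", "jira.shared.home")
REPLICATION_KEYS = (
    "ehcache.listener.hostName",
    "ehcache.listener.port",
    "ehcache.object.port",
)


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def missing_keys(props, keys):
    """Return the keys from `keys` that are absent or empty in `props`."""
    return [key for key in keys if not props.get(key)]
```

For example, missing_keys(parse_properties(open("cluster.properties").read()), REPLICATION_KEYS) lists any replication settings that are not set. A setting that ended up on the same line as a `#` comment is treated as part of the comment, so it would show up as missing.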

Also make sure both nodes can reach each other.

ping <node 1 ip>

ping <node 2 ip> 
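
Ping only shows that the host answers ICMP. It does not show that the cache replication ports are reachable, so a TCP check may be worth adding. A minimal sketch, assuming ports 40001 and 40011 as configured above (port_is_open is a name made up for this example):

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

From node1 you would call port_is_open("<node 2 ip>", 40001) and port_is_open("<node 2 ip>", 40011), and the same from node2 towards node1. A False result while Jira is running on the other node points at a firewall or security group rule, or at the listener being bound to a different address.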

If my sharedhome is in another server how can I specify it?

1.1.1.1/data/jira/sharedhome? //1.1.1.1/data/jira/sharedhome?

Mount the folder on each server that Jira DC will be running on. For DC you cannot refer to the shared home by another server's address. If in doubt, create a support ticket with Atlassian Support.

For the mount part, check Stack Overflow or similar places, or search for "mount nfs". There are a few examples of how to map a directory from one server onto another, depending on the OS and type. Make sure the mount path is the same on all nodes.

Got it, Thanks @Ankit Dahiya I already found the instructions to mount.

Thanks @Ankit Dahiya this worked perfectly for me.

All nodes in the cluster thought that all the other nodes in the cluster were not replicating their cache.
I spent a day on it, and removed the /etc/hosts entry suggested for 127.0.1.1.
cluster.properties is just the node ID and home path.
Hostnames are resolvable.
The cache is replicating again.
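
A quick way to spot this kind of /etc/hosts problem might be to check whether the node's own hostname resolves to a loopback address such as 127.0.1.1. This is only a sketch under that assumption, and resolves_to_loopback is a hypothetical helper name, not anything Jira provides.

```python
import ipaddress
import socket


def resolves_to_loopback(hostname):
    """Return True if any IPv4 address `hostname` resolves to is in 127.0.0.0/8."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return any(ipaddress.ip_address(info[4][0]).is_loopback for info in infos)
```

Calling resolves_to_loopback(socket.gethostname()) on each node and getting True would suggest that the hostname maps to a loopback entry in /etc/hosts, which the other node cannot reach. If the hostname does not resolve at all, getaddrinfo raises socket.gaierror instead.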

We are experiencing the same issue in AWS. Did you ever get this figured out?

Hi Jeff,

 

The comment above explains how it got fixed in our case.

It's better to create a support ticket with Atlassian if it's urgent.

One thing I found that probably explains why it happened in the first place:

Jira DC picks up the hostname from the server, and if DC and the load balancer are using different node names then you need to define the parameters (ehcache.listener.hostName, plus the listener and object ports).
