I have been trying to set up a demo Data Center instance. It's on Azure, though I don't think that matters. I did NOT use the Azure Data Center Marketplace Template; I wanted to set everything up manually. Everything works except that I am failing the 'HealthCheck: Cluster Cache Replication'.
If I am logged into node1 it tells me that node2 isn't replicating. If I am logged into node2 it tells me that node1 isn't replicating.
The Shared Home is mounted at /mnt/sharedhome. It lives on a different VM and is shared through NFS.
Nodes are CentOS 7.4.
Both nodes can see that mount fine. I've already chown'd that directory to the jira user:
root $ chown jira /mnt/sharedhome/
root $ chown -R jira /mnt/sharedhome/
I think I've given the jira user the right permissions:
root $ chmod -R u+rwx /mnt/sharedhome/
The jira user can create and delete files in the directory. If I make a change to the directory on one node, the change is almost immediately reflected on the other:
jira $ touch /mnt/sharedhome/test.txt (on node1)
jira $ rm /mnt/sharedhome/test.txt (on node2)
My cluster.properties file is (with node1 changed to node2 on node2, obviously):
# This ID must be unique across the cluster
jira.node.id = node1
# The location of the shared home directory for all JIRA nodes
jira.shared.home = /mnt/sharedhome
On both nodes I am seeing similar messages in catalina.out:
2018-02-25 10:49:51,328 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Start replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, stacktrace: <only-in-trace>
2018-02-25 10:49:51,343 Caesium-1-2 INFO ServiceRunner [c.a.j.c.cache.ehcache.BlockingParallelCacheReplicator] Done replicating cache: com.atlassian.jira.plugins.healthcheck.service.HeartBeatService.heartbeat, operation: put, key: <only-in-debug>, numberOfPeers: 1, numberOfSuccess: 1, timeMillis: 14, stacktrace: <only-in-trace>
2018-02-25 10:50:13,997 HealthCheck:thread-7 WARN taylor-local 627x61x2 c52hux 98.247.96.192 /rest/troubleshooting/1.0/check/process/ [c.a.t.j.healthcheck.cluster.ClusterReplicationHealthCheck] Node node1 does not seem to replicate its cache
Specifically, both nodes are saying that they are
Done replicating cache
while both warn that the other node
does not seem to replicate its cache
EDIT: Based on
https://community.atlassian.com/t5/Jira-questions/JIRA-DC-Node-ehcache-connection-refused/qaq-p/634262
https://jira.atlassian.com/browse/JRASERVER-64974
https://jira.atlassian.com/browse/JRASERVER-66608
https://community.atlassian.com/t5/Jira-questions/What-is-the-random-port-opened-by-Jira-Datacenter-used-for-and/qaq-p/346614
I've added this to my cluster.properties file:
ehcache.object.port = 40011
I also opened that port for both inbound and outbound traffic, as well as port 40001, but that did not fix the issue.
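For reference, on CentOS 7 the OS firewall also has to allow these ports, in addition to the Azure NSG rules; a minimal sketch, assuming firewalld is the active firewall:
root $ firewall-cmd --permanent --add-port=40001/tcp    # Ehcache listener port
root $ firewall-cmd --permanent --add-port=40011/tcp    # Ehcache object port
root $ firewall-cmd --reload                            # apply the new rules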
Again, it's just the Cluster Cache Replication health check that's failing. The Cluster Index Replication and Shared Home health checks are fine.
Just in case anyone else is interested in this: we had the same issue on Azure.
Adding the IP of each node to its cluster.properties file fixed the issue:
# This ID must be unique across the cluster
jira.node.id = node1
# The location of the shared home directory for all JIRA nodes
jira.shared.home = <path/to/shared/jirahome>
# IP address used by this node for cache replication
ehcache.listener.hostName = <Node 1 Server IP>
# Ports used by this node for cache replication
ehcache.listener.port = 40001
ehcache.object.port = 40011
Also make sure that both nodes can reach each other:
ping <node 1 ip>
ping <node 2 ip>
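Beyond ping, it's worth confirming the Ehcache ports themselves are reachable from each node; a minimal sketch using bash's built-in /dev/tcp (same placeholders as above):
jira $ timeout 3 bash -c '</dev/tcp/<node 2 ip>/40001' && echo open || echo closed
jira $ timeout 3 bash -c '</dev/tcp/<node 2 ip>/40011' && echo open || echo closed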
If my shared home is on another server, how can I specify it?
As 1.1.1.1/data/jira/sharedhome, or //1.1.1.1/data/jira/sharedhome?
Mount the folder on each server that Jira DC will be running on; you cannot refer to it on another server for DC. Create a support ticket with Atlassian Support if in doubt.
For the mount part, check Stack Overflow or similar places, or search for how to mount NFS; there are a few examples of how to map directories from one server onto another depending on the OS/type. Make sure the mount path is the same on all nodes.
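For example, a minimal sketch of mounting the NFS export at the same path on every node, using the server IP and export path from the question above (adjust to your environment):
root $ yum install -y nfs-utils
root $ mkdir -p /mnt/sharedhome
root $ mount -t nfs 1.1.1.1:/data/jira/sharedhome /mnt/sharedhome
# to make it permanent, add a line like this to /etc/fstab:
# 1.1.1.1:/data/jira/sharedhome  /mnt/sharedhome  nfs  defaults  0  0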
Got it, thanks @Ankit Dahiya. I already found the instructions to mount.
Thanks @Ankit Dahiya, this worked perfectly for me.
All nodes in the cluster thought that all the other nodes in the cluster were not replicating their cache.
I spent a day on it, and removed the /etc/hosts entry suggested for 127.0.1.1.
cluster.properties is just the node id and the shared home path.
Hostnames are resolvable.
Cache is replicating again.
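For anyone hitting the same thing, the offending mapping looks something like this (hypothetical hostname jira-node1); with it in place, Ehcache can end up advertising 127.0.1.1 to the other nodes:
root $ grep 127.0.1.1 /etc/hosts
127.0.1.1   jira-node1           # remove or comment out this line
root $ getent hosts $(hostname -f)   # should now return the node's real IP (via DNS or a proper hosts entry)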
We are experiencing the same issue in AWS. Did you ever get this figured out?
Hi Jeff,
The comment above explains how it got fixed in our case.
It's better to create a support ticket with Atlassian if it's urgent.
One thing I found that probably explains why it happened in the first place:
Jira DC picks up the hostname from the server, and if DC and the load balancer are using different node names, then you need to define the parameters explicitly (ehcache.listener.hostName, plus the listener and object ports).
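A quick way to see what each node will pick up, and whether it matches an address the other nodes can actually reach (a sketch, assuming standard CentOS tooling):
root $ hostname -f                    # the name the node will advertise by default
root $ getent hosts $(hostname -f)    # should resolve to this node's reachable IP, not 127.0.0.1/127.0.1.1
If it doesn't, set ehcache.listener.hostName to the node's real IP in cluster.properties, as in the example above.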