How to set up a Confluence Data Center cluster across different IP segments?

Raw Main January 30, 2020

I have servers in two different IP segments:

Region 1:

- 10.2.100.1
- 10.2.100.2

Region 2:

- 10.2.200.1
- 10.2.200.2

These addresses belong to 4 servers.

I want to use Confluence Data Center to join them together in one cluster, so I set these properties on each server:

<property name="confluence.cluster">true</property>
<property name="confluence.cluster.home">/home/confluence/sharedhome</property>
<property name="confluence.cluster.interface">bond0</property>
<property name="confluence.cluster.join.type">tcp_ip</property>
<property name="confluence.cluster.name">confluence</property>
<property name="confluence.cluster.peers">10.2.100.1,10.2.100.2,10.2.200.1,10.2.200.2</property>

But in the Confluence dashboard on a Region 1 server I can see only the two nodes in the `10.2.100` segment. If I switch to a Region 2 server, I see only the two nodes in the `10.2.200` segment.

They didn't join together.

According to the official guide, using a `multicast IP` might give a different result:

https://confluence.atlassian.com/doc/change-node-discovery-from-multicast-to-tcp-ip-or-aws-792297728.html

But how do I set a multicast IP in this case?

1 answer

Max Malygin January 30, 2020

Greetings!


You have an interesting cluster configuration. Looking at your settings, everything appears correct:

  1. The cluster join type is set to tcp_ip rather than multicast, since the machines are in different network segments.
  2. The IP addresses of all the nodes in the cluster are listed so that they can find each other.

The two halves do each form a cluster on their own, so the settings themselves work.

Since the nodes from different network segments do not see each other, it can be assumed that there is a firewall between the segments that does not allow connections on the required ports.

To verify this, you can follow the article "How to run a TCP network test between Confluence or Jira Data Center Nodes".
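As a rough illustration of such a check (not the tool from the article), the sketch below tries to open a plain TCP connection from one node to each peer. The peer addresses come from `confluence.cluster.peers` in the question; the port to test is an assumption and should be whatever port your nodes actually listen on. The function names are made up for this example.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_peers(peers, ports, timeout: float = 3.0) -> dict:
    """Map each (host, port) pair to whether a TCP connection could be opened."""
    return {
        (host, port): can_connect(host, port, timeout)
        for host in peers
        for port in ports
    }
```

For example, running `check_peers(["10.2.200.1", "10.2.200.2"], [5801])` on a Region 1 node (with 5801 as an assumed cluster port) and getting `False` for every pair would suggest that the problem is in the network path between the segments rather than in the Confluence settings.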

Raw Main January 30, 2020

Hi Max,

Thank you for your reply.

Here is the IPv4 network config for bond0:

IPADDR0=10.2.100.1
PREFIX0=25
GATEWAY0=10.2.100.126

This is from node1. The other nodes (2, 3, 4) use the same format.

I checked the firewall rules on these hosts and didn't find any input rules, so I'm no longer sure it's a firewall issue.

By the way, I got this error on node1 after node3 and node4 joined the cluster:

[Origin node: d62bd1bf listening on /10.2.100.1:5801] Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic. 
Max Malygin January 31, 2020

Hello!

Yes, these errors indicate that the nodes are not clustered together but are using the same database.

This is more likely a consequence of the problem than its cause. You should look at earlier log messages about one node's attempts to connect to the other nodes in the cluster.

Additionally, I recommend reading the articles on the page "Confluence Data Center Cluster Troubleshooting".

Raw Main January 31, 2020

Thank you for your answer again.

I tried again and got this error on one of the Region 2 servers:

File: confluence-home/logs/atlassian-confluence.log

...

--------------------------
Parameters
--------------------------
caused by: com.atlassian.util.concurrent.LazyReference$InitializationException: com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
at com.atlassian.util.concurrent.LazyReference.getInterruptibly(LazyReference.java:149)
caused by: com.hazelcast.core.HazelcastInstanceNotActiveException: Hazelcast instance is not active!
at com.hazelcast.instance.HazelcastInstanceProxy.getOriginal(HazelcastInstanceProxy.java:315)

2020-01-31 19:30:19,121 WARN [http-nio-8090-exec-2] [confluence.impl.vcache.SynchronousExternalCache] lambda$get$11 Failed to read entry from cache 'com.atlassian.bandana.BandanaPersister': Failed due to UNCLASSIFIED_FAILURE
-- traceId: 90949ba99d46e6e4
2020-01-31 19:30:19,125 WARN [http-nio-8090-exec-2] [confluence.impl.vcache.SynchronousExternalCache] lambda$get$11 Failed to read entry from cache 'com.atlassian.bandana.BandanaPersister': Failed due to UNCLASSIFIED_FAILURE
-- traceId: 90949ba99d46e6e4
2020-01-31 19:30:19,127 WARN [http-nio-8090-exec-2] [confluence.impl.vcache.SynchronousExternalCache] lambda$get$11 Failed to read entry from cache 'com.atlassian.bandana.BandanaPersister': Failed due to UNCLASSIFIED_FAILURE
-- traceId: 90949ba99d46e6e4

Max Malygin February 2, 2020

Hi!

Before this, there should be messages explaining why the Hazelcast service did not start or stopped working. Perhaps the value specified in the confluence.cluster.interface parameter is incorrect, or there is some other reason.
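One quick way to rule out a wrong interface name is to check it against the interfaces the operating system reports. This is only a minimal sketch: `bond0` is the value from the question, and `interface_exists` is a hypothetical helper name, not part of Confluence.

```python
import socket


def interface_exists(name: str) -> bool:
    """Return True if the OS reports a network interface with this exact name."""
    return any(if_name == name for _, if_name in socket.if_nameindex())


# Example: the value of confluence.cluster.interface from the question.
print("bond0 present:", interface_exists("bond0"))
```

If this prints `False` on a node, the name in confluence.cluster.interface does not match any interface on that host.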

Raw Main February 2, 2020

Hi,

Thank you for answering again.

I restarted these 2 servers and they work now.

But the result is the same as before: the servers in the other region went down with this message:

Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic. 

I tried telnet and ping between the servers in the two regions. They can't communicate with each other over IPv4, but they can ping each other over IPv6.

We don't have a firewall between them, but I don't know why they can't communicate.

In our /etc/sysconfig/network-scripts/ifcfg-bond0:

IPADDR0=10.2.100.1
PREFIX0=25
GATEWAY0=10.2.100.126

I think this is the key point. Is some configuration missing?
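One thing that can be read directly from these values, assuming the Region 2 nodes also use a /25 prefix (the "same format" mentioned earlier): with PREFIX0=25, only 10.2.100.0 to 10.2.100.127 is on the local link, so any traffic to 10.2.200.x has to be routed through the gateway. A small sketch with Python's standard `ipaddress` module shows the arithmetic; `same_subnet` is a helper name made up for this example.

```python
import ipaddress


def same_subnet(local_cidr: str, remote_ip: str) -> bool:
    """Return True if remote_ip lies inside the subnet of local_cidr."""
    network = ipaddress.ip_interface(local_cidr).network
    return ipaddress.ip_address(remote_ip) in network


local = "10.2.100.1/25"  # IPADDR0 and PREFIX0 from ifcfg-bond0
print(ipaddress.ip_interface(local).network)  # 10.2.100.0/25
print(same_subnet(local, "10.2.100.2"))       # True: same segment
print(same_subnet(local, "10.2.100.126"))     # True: the gateway is on-link
print(same_subnet(local, "10.2.200.1"))       # False: must go through the gateway
```

So the Region 1 nodes reach each other directly, while reaching Region 2 depends on the gateway and the routes between the two segments, which would be the place to look if IPv4 traffic does not get through.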

Max Malygin February 5, 2020

Hi, I apologize for the delay.

Unfortunately, I can't help with the network settings, but the fact that the ports required by Data Center are unreachable between the segments is the root of the problem.

I suggest first getting IPv4 networking to work between the segments and then coming back to starting Confluence Data Center.

I wish you success!

Raw Main February 5, 2020

Yes, you are right. That's the key problem.
