Confluence Data Center setup with Docker in AWS

radhika.punchepady September 18, 2020

I would really appreciate help from anyone who has successfully set up Confluence Data Center in an AWS environment using Docker. (Due to organization requirements, we cannot use the Atlassian-provided AWS quick implementation ASI.)

Details:

Our environment is set up as follows:
1. The Confluence Docker image from artifactory.cd-tech26.de/docker/atlassian/confluence-server is used to install Confluence.
2. Docker is installed on 3 different EC2 machines, and the cluster configuration is given as AWS:
<property name="confluence.cluster">true</property>
<property name="confluence.cluster.aws.access.key"></property>
<property name="confluence.cluster.aws.host.header"></property>
<property name="confluence.cluster.aws.iam.role">XXAWS_IAM_ROLEXX</property>
<property name="confluence.cluster.aws.region">eu-XXXX</property>
<property name="confluence.cluster.aws.secret.key"></property>
<property name="confluence.cluster.aws.security.group.name">XX-sec-default-XXX</property>
<property name="confluence.cluster.aws.tag.key">key</property>
<property name="confluence.cluster.aws.tag.value">value</property>
<property name="confluence.cluster.home">XX/shared_confluence/XX</property>
<property name="confluence.cluster.join.type">aws</property>
<property name="confluence.cluster.name">confluence_cluster</property>

(XX replaces the actual values we have configured.)
3. Node 1 is brought up, and we set up the license and an admin account. Then the other 2 nodes (with the same Docker configuration) are brought up and run without any issues.
4. Now, I can restart the containers one by one and they rejoin the cluster. But if I shut down the cluster completely (all nodes down), I cannot start it again by starting the containers. It always ends with a "Node failed to start" error.

The error starts with Hazelcast retrying the connection to the other servers on port 5801 (which was verified to be working):
"closed. Reason: Exception in Connection[id=344, /172.17.0.2:59510->/10.51.110.199:5801, endpoint=[xx.xx.xx.xx]:5801, alive=true, type=NONE], thread=hz.confluence.IO.thread-in-1 java.io.IOException: Connection reset by peer"
This warning appears repeatedly for a while and then ends with the error below:
2020-08-27 15:24:44,201 ERROR [Catalina-utility-1] [com.hazelcast.instance.Node] log [xx.xx.xx.xx]]:5801 [confluence_cluster] [3.11.6] Could not join cluster. Shutting down now!
2020-08-27 15:24:44,207 WARN [Catalina-utility-1] [com.hazelcast.instance.Node] log [xx.xx.xx.xx]]:5801 [confluence_cluster] [3.11.6] Terminating forcefully...
2020-08-27 15:24:44,274 WARN [Catalina-utility-1] [com.hazelcast.util.PhoneHome] log [xx.xx.xx.xx]]:5801 [confluence_cluster] [3.11.6] Could not schedule phone home task! Most probably Hazelcast failed to start.
2020-08-27 15:24:44,280 ERROR [Catalina-utility-1] [atlassian.confluence.setup.ConfluenceConfigurationListener] contextInitialized An error was encountered while bootstrapping Confluence (see below):
Node failed to start!
java.lang.IllegalStateException: Node failed to start!
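
One detail in the connection warning quoted above is worth checking: the local side of the connection is 172.17.0.2, which falls inside the range Docker's default bridge network normally uses (172.17.0.0/16), while the remote side is a 10.x address. The following is a minimal sketch, not part of the original setup, that extracts both endpoints from such a line and reports whether the local one is a default-bridge address. The function names are hypothetical, and the bridge subnet is an assumption about the hosts (`docker network inspect bridge` shows the real value). Whether this is the actual cause of the failed join is not confirmed by the logs alone.

```python
import ipaddress
import re

# Assumption: the hosts use Docker's default bridge subnet (docker0).
DEFAULT_BRIDGE = ipaddress.ip_network("172.17.0.0/16")

# Matches "/<ipv4>:<port>" as printed inside Hazelcast's Connection[...] text.
ENDPOINT = re.compile(r"/(\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def connection_endpoints(log_line):
    """Return ((local_ip, local_port), (remote_ip, remote_port)) from a log line."""
    found = ENDPOINT.findall(log_line)
    if len(found) < 2:
        raise ValueError("no local/remote endpoint pair found")
    (lip, lport), (rip, rport) = found[0], found[1]
    local = (ipaddress.ip_address(lip), int(lport))
    remote = (ipaddress.ip_address(rip), int(rport))
    return local, remote


def uses_default_bridge(log_line):
    """True if the local endpoint lies in the assumed Docker bridge subnet."""
    local, _ = connection_endpoints(log_line)
    return local[0] in DEFAULT_BRIDGE


line = (
    "Connection[id=344, /172.17.0.2:59510->/10.51.110.199:5801, "
    "endpoint=[xx.xx.xx.xx]:5801, alive=true, type=NONE]"
)
print(uses_default_bridge(line))  # True
```

If this prints True on every node, the containers are talking to each other from container-internal addresses rather than from the EC2 instances' own addresses, which is a reasonable thing to rule out before digging further into the Hazelcast settings.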

This error appears on all nodes even if we clear the caches, logs, and other directories and attempt a fresh restart. Effectively, none of the nodes will start, and we have to rebuild the servers and clear the DB to make it work. Is there something we are missing, or is this expected behaviour for a Confluence cluster?

1 answer

Answer accepted
radhika.punchepady September 28, 2020

We switched Docker to use host networking and this issue no longer occurs. The cluster can be restarted/redeployed without any issues.
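
For readers who want to try the same change, here is a rough sketch of what starting a node with host networking could look like. It only assembles the `docker run` arguments and prints them. The helper name, the container name, and both host paths are hypothetical, and the container-side home directory is an assumption about the image's default; only the image name and the use of `--network host` come from this thread.

```python
import shlex


def host_network_run_command(image, volumes, name="confluence-node"):
    """Build a `docker run` argv that places the container on the host's network."""
    cmd = ["docker", "run", "--detach", "--name", name, "--network", "host"]
    for host_path, container_path in volumes.items():
        cmd += ["--volume", f"{host_path}:{container_path}"]
    cmd.append(image)
    return cmd


cmd = host_network_run_command(
    "artifactory.cd-tech26.de/docker/atlassian/confluence-server",
    {
        # Hypothetical host paths; the container path is an assumed default.
        "/data/confluence/local": "/var/atlassian/application-data/confluence",
        "/data/confluence/shared": "/data/confluence/shared",
    },
)
print(shlex.join(cmd))
```

With `--network host` there are no `-p` port mappings, since Docker discards published ports in that mode. The container's ports, including the Hazelcast port 5801 seen in the logs, are bound directly on the EC2 instance, so the security group has to allow that traffic between nodes.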
