Hi, we are hosting our Confluence inside GKE.
It has been running smoothly so far, but we have run into an issue when scaling Confluence (running it in multiple pods).
A cluster panic event occurs every time we scale the StatefulSet to more than 1 replica. It may be caused by multicast connectivity between the pods (Confluence nodes).
We looked at the Confluence discovery strategies (multicast, TCP/IP), but neither of them works:
- VPCs in GCP do not support multicast.
- TCP/IP requires fixed pod IPs, which we don't know until the StatefulSet is scaled.
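On the TCP/IP point: pod IPs change, but pods in a StatefulSet governed by a headless Service get stable DNS names of the form `<statefulset>-<ordinal>.<service>.<namespace>.svc.<cluster-domain>`. The sketch below builds such a peer list. The StatefulSet, Service and namespace names are hypothetical, `cluster.local` is only the default cluster domain, and whether Confluence's TCP/IP join type accepts hostnames instead of IP addresses is an assumption you would need to verify.

```python
# Sketch: stable peer hostnames for StatefulSet pods behind a headless Service.
# Hypothetical names: StatefulSet "confluence", Service "confluence-headless",
# namespace "confluence"; "cluster.local" is assumed to be the cluster domain.
from typing import List


def statefulset_peer_hosts(statefulset: str, service: str, namespace: str,
                           replicas: int,
                           cluster_domain: str = "cluster.local") -> List[str]:
    """Return the stable DNS name of each pod ordinal 0..replicas-1."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


if __name__ == "__main__":
    peers = statefulset_peer_hosts("confluence", "confluence-headless", "confluence", 2)
    # Comma-separated, the assumed shape of a TCP/IP-style member list.
    print(",".join(peers))
```

Because the names depend only on the ordinal, the list is known before scaling, which is exactly what fixed IPs cannot give you.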
Could anyone please suggest a workaround for this problem?
Please let me know if I should provide more detailed information.
We don't really need autoscaling; just being able to run multiple Confluence instances would be fine.
Thank you guys very much!
Hello Phong, I was wondering whether you were able to resolve this issue, since I'm facing exactly the same problem and still haven't been able to resolve it.
If any tips or articles helped you, I would much appreciate it if you shared them with me, since we're currently stuck at this point in our new setup.
Thanks,
Qusai Atoon.
Hey @Phong Vũ Quốc,
I also deployed Confluence DC in Kubernetes, and I found that instead of multicast or TCP/IP you can use “kubernetes” as the join type for discovering the other pods in the cluster.
You can change this setting in the confluence.cfg.xml file inside the pod and then do a rolling restart of the StatefulSet.
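A minimal sketch of that edit in Python, assuming the join type is stored as `<property name="confluence.cluster.join.type">…</property>` under `<properties>` in confluence.cfg.xml. That property name is an assumption here, so check it against your own file (and back the file up) before changing anything.

```python
# Sketch: set the cluster join type inside a confluence.cfg.xml document.
# Assumption: the setting is <property name="confluence.cluster.join.type">
# under <properties>; verify this against your own file first.
import xml.etree.ElementTree as ET
from typing import Optional

JOIN_TYPE_KEY = "confluence.cluster.join.type"  # assumed property name


def set_join_type(cfg_xml: str, join_type: str) -> str:
    """Return cfg_xml with the join-type property set to join_type."""
    root = ET.fromstring(cfg_xml)
    properties = root.find("properties")
    if properties is None:
        raise ValueError("no <properties> element found")
    for prop in properties.findall("property"):
        if prop.get("name") == JOIN_TYPE_KEY:
            prop.text = join_type
            break
    else:
        # Property missing entirely: add it instead of failing silently.
        new_prop = ET.SubElement(properties, "property", name=JOIN_TYPE_KEY)
        new_prop.text = join_type
    return ET.tostring(root, encoding="unicode")


def get_join_type(cfg_xml: str) -> Optional[str]:
    """Return the current join type, or None if the property is absent."""
    for prop in ET.fromstring(cfg_xml).iter("property"):
        if prop.get("name") == JOIN_TYPE_KEY:
            return prop.text
    return None


if __name__ == "__main__":
    sample = """<confluence-configuration>
  <setupStep>complete</setupStep>
  <properties>
    <property name="confluence.cluster">true</property>
    <property name="confluence.cluster.join.type">multicast</property>
  </properties>
</confluence-configuration>"""
    print(get_join_type(set_join_type(sample, "kubernetes")))
```

Note that `ET.tostring` does not write an XML declaration, so re-add the original `<?xml ...?>` header if your file has one.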
Hello Ariel,
Thanks for sharing the "kubernetes" join type. It allowed me to start the cluster with one node, but after scaling up to 2 replicas I still get the same ClusterPanicEvent.
Could you please share where you got the information about this join type? We weren't able to find any documentation for this kind of setup.
Thanks,
Qusai Atoon.
@Phong Vũ Quốc Confluence DC uses the Hazelcast Kubernetes discovery method. How did you deploy your Confluence DC? With the official Helm charts? The Kubernetes discovery method is configured there by default.
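Hazelcast's Kubernetes discovery can work either through the Kubernetes API or by a DNS lookup of a headless Service, whose records list the pod IPs. The sketch below approximates only the DNS-lookup idea, so you can check from inside a pod which peers such a lookup would return. It is not Hazelcast's implementation; the Service name is hypothetical, and 5701 is simply the port seen in the logs in this thread.

```python
# Sketch of the DNS-lookup idea behind Hazelcast's Kubernetes discovery:
# resolve a headless Service name and treat every returned address as a peer.
# Not Hazelcast's code; the Service name in __main__ is hypothetical.
import socket
from typing import Callable, List

HAZELCAST_PORT = 5701  # port seen in the Confluence logs in this thread


def discover_peers(service_dns: str, port: int = HAZELCAST_PORT,
                   resolver: Callable = socket.getaddrinfo) -> List[str]:
    """Return sorted, de-duplicated "ip:port" strings for service_dns."""
    results = resolver(service_dns, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (ip, port).
    ips = {entry[4][0] for entry in results}
    return sorted(f"{ip}:{port}" for ip in ips)


if __name__ == "__main__":
    # Run inside a pod; replace with your own headless Service name.
    try:
        print(discover_peers("confluence-headless.confluence.svc.cluster.local"))
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        print(f"lookup failed: {exc}")
```

If a lookup from one pod does not return the other pod's IP, discovery cannot work regardless of the join type.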
Hi @Yevhen,
Thank you so much for the quick response.
Yes, I used the Helm chart (latest version) to deploy Confluence and left the clustering-related values at their defaults (clustering enabled, pod name used as the node name).
I deployed the Confluence chart with 1 replica initially -> completed the initial setup -> increased the replicas to 2.
After a while I got com.atlassian.confluence.cluster.safety.ClusterPanicEvent, and both pods became NotReady:
confluence 2022-03-10 07:51:32,286 ERROR [hz.confluence.cached.thread-5] [confluence.cluster.safety.ClusterPanicListener] onClusterPanicEvent Received a panic event, stopping processing on the node: [Origin node: d5f116a0 listening on /172.16.8.38:5701] Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.
confluence -- event: com.atlassian.confluence.cluster.safety.ClusterPanicEvent[source=null] | originatingMemberUuid: cd76c42c-95ad-42cd-834b-c75c65030f82
confluence 2022-03-10 07:51:32,288 WARN [hz.confluence.cached.thread-5] [confluence.cluster.safety.ClusterPanicListener] onClusterPanicEvent Shutting down scheduler
confluence -- event: com.atlassian.confluence.cluster.safety.ClusterPanicEvent[source=null] | originatingMemberUuid: cd76c42c-95ad-42cd-834b-c75c65030f82
confluence 2022-03-10 07:51:34,289 WARN [hz.confluence.cached.thread-4] [internal.cluster.impl.MembershipManager] log [172.16.11.63]:5701 [confluence-test] [3.12.11] Member [172.16.8.38]:5701 - cd76c42c-95ad-42cd-834b-c75c65030f82 is suspected to be dead for reason: No connection
confluence 2022-03-10 07:51:34,296 INFO [hz.confluence.event-2] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] memberRemoved [172.16.8.38]:5701 left the cluster
confluence 2022-03-10 07:51:34,296 INFO [hz.confluence.event-2] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] logClusterMembers Cluster now has 1 members: [[172.16.11.63]:5701]
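The panic message says the database is being updated by an instance that is not part of the current cluster. As I understand Confluence's cluster-safety check, each node periodically compares a random "safety number" stored in the database with the copy held in the shared Hazelcast cache, and panics when they differ. The toy model below is a simplification, not Confluence's code (all class and function names are hypothetical, and details such as startup handling may differ). It shows why two nodes that share one database but cannot reach each other over Hazelcast, as in the "No connection" line above, trip that check.

```python
# Toy model of a cluster-safety check (hypothetical names, NOT Confluence's code).
# Each node compares a "safety number" in the shared database with the copy in
# its Hazelcast cache; a mismatch means someone outside its cluster wrote to the DB.
import random
from typing import List, Optional


class Database:
    def __init__(self) -> None:
        self.safety_number: Optional[int] = None  # one value shared by every node


class Node:
    def __init__(self, db: Database, cache: dict) -> None:
        self.db = db
        self.cache = cache  # nodes in the same Hazelcast cluster share one dict

    def safety_check(self) -> bool:
        """Return False ("cluster panic") if the DB no longer matches our cache."""
        db_value = self.db.safety_number
        cached = self.cache.get("safety_number")
        if cached is not None and db_value != cached:
            return False
        # Write a fresh random number (different from the old one) to DB and cache.
        new_value = random.getrandbits(32)
        while new_value == db_value:
            new_value = random.getrandbits(32)
        self.db.safety_number = new_value
        self.cache["safety_number"] = new_value
        return True


def simulate(shared_cache: bool) -> List[bool]:
    """Two nodes, one database; the caches are shared only if Hazelcast connects."""
    db = Database()
    cache_a: dict = {}
    cache_b = cache_a if shared_cache else {}
    node_a, node_b = Node(db, cache_a), Node(db, cache_b)
    return [node_a.safety_check(), node_b.safety_check(),
            node_a.safety_check(), node_b.safety_check()]


if __name__ == "__main__":
    print("nodes joined:", simulate(shared_cache=True))
    print("nodes split: ", simulate(shared_cache=False))
```

With the nodes joined every check passes (`[True, True, True, True]`); with the caches split, node A panics on its second check (`[True, True, False, True]`) because node B overwrote the database value without updating A's cache.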
Below is some additional information (I am not sure whether it's related).
@Phong Vũ Quốc To me it looks like the database PV wasn't flushed and you are somehow using an existing database. It's just a theory, though. This KB may help a bit. I wasn't able to reproduce it in my lab cluster. Perhaps it's worth running a test deployment with an in-memory database, just to see whether it makes a difference?
@Yevhen Hi, thank you very much for the advice.
I've tried cleaning up everything (including the PVs provisioned by the Confluence Helm chart), but the ClusterPanicEvent still happens.
I haven't figured out how to deploy a Confluence cluster with an in-memory database using the Helm chart (currently, the only values supported under ```database``` are JDBC database types).
Could you share the ```values.yaml``` file and any other configuration files you used in your lab environment? It would be very helpful.
Thanks again, Yevhen.
@Phong Vũ Quốc I deployed with pretty much standard values.
Can you share your Confluence StatefulSet YAML and your values file?
Also, there's another article that may help: https://confluence.atlassian.com/confkb/confluence-will-not-start-due-to-fatal-error-in-confluence-cluster-179439771.html
@Yevhen May I also ask how you scaled your Confluence replicas?
Was it done after you had completed the initial setup, or at the time you applied the Helm chart?
Here is my values file (I redacted the hostname/URL-related values, by the way).
I have checked your values YAML. Nothing special in there. And yes, the typical way to scale Confluence is to deploy with 1 replica and then either scale the StatefulSet directly or update the replica count in the values and run helm upgrade.
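The two scaling routes can be scripted. This sketch only builds and prints the command lines by default (it executes them only with `dry_run=False`). The release name, namespace, StatefulSet name, the chart reference `atlassian-data-center/confluence` and the `replicaCount` values key are assumptions based on a typical install of the official chart, so check them against your own `helm list` output and values file.

```python
# Sketch: the two ways of scaling the Confluence StatefulSet described above.
# Hypothetical defaults: release/StatefulSet "confluence", namespace "confluence",
# chart "atlassian-data-center/confluence", values key "replicaCount".
import subprocess
from typing import List


def kubectl_scale_cmd(statefulset: str, namespace: str, replicas: int) -> List[str]:
    """Scale the StatefulSet directly (keep the values file in sync afterwards)."""
    return ["kubectl", "scale", f"statefulset/{statefulset}",
            f"--replicas={replicas}", "-n", namespace]


def helm_upgrade_cmd(release: str, chart: str, namespace: str,
                     replicas: int) -> List[str]:
    """Change the replica count through Helm, reusing all other values."""
    return ["helm", "upgrade", release, chart, "-n", namespace,
            "--reuse-values", "--set", f"replicaCount={replicas}"]


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Scale only after the initial setup has been completed on the single node.
    run(kubectl_scale_cmd("confluence", "confluence", 2))
    run(helm_upgrade_cmd("confluence", "atlassian-data-center/confluence",
                         "confluence", 2))
```

Pick one route per change; mixing them lets the live replica count drift from what the values file says.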
Have you checked the link I shared in a previous comment? I wonder whether its diagnosis and troubleshooting section is of any help.
It would also be great to have the complete logs from both Confluence pods.
Just to confirm: you are using Confluence Data Center? You mentioned Server, not DC, so I want to check.