
Confluence clustering in Kubernetes (GKE)


Hi, we are hosting our Confluence inside GKE.

It has been running smoothly so far, but we ran into an issue when scaling Confluence to run in multiple pods.

A cluster panic event occurs every time we scale the StatefulSet to more than 1 replica. It may be caused by the lack of multicast connectivity between pods (Confluence nodes).

We took a look at the Confluence discovery strategies (multicast, TCP/IP), but neither of them works:

- VPCs in GCP do not support multicast

- TCP/IP requires fixed pod IPs, which we don't know until the StatefulSet is scaled.

Could anyone please suggest a workaround for this problem?

Please let me know if I should provide any more detailed information.

We don't really need auto scaling; just being able to run multiple Confluence instances would be OK.

Thank you very much!

3 answers

0 votes
Ariel eli I'm New Here Sep 15, 2022

Hey @Phong Vũ Quốc ,

I also deployed Confluence DC in Kubernetes, and I found out that instead of multicast or tcp/ip you can use “kubernetes” as your join type for discovering the other pods in the cluster.

You can change this configuration in your confluence.cfg.xml file inside the pod and do a rolling restart of the StatefulSet.
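
For example, the relevant part of confluence.cfg.xml might look roughly like this. It is only a sketch: the property names are from memory and can differ between Confluence versions, so verify them against an existing pod's confluence.cfg.xml or the chart documentation before relying on them.

```xml
<!-- Sketch only: property names assumed, verify against your Confluence version -->
<property name="confluence.cluster">true</property>
<property name="confluence.cluster.name">confluence-test</property>
<!-- The "kubernetes" join type makes Hazelcast discover peers through the Kubernetes API
     instead of multicast or a fixed tcp/ip node list -->
<property name="confluence.cluster.join.type">kubernetes</property>
<!-- Hypothetical value: the (headless) service whose endpoints list the Confluence pods -->
<property name="confluence.hazelcast.kubernetes.service.name">confluence</property>
```

After the change, a rolling restart (e.g. kubectl rollout restart statefulset/confluence, with whatever StatefulSet name your release uses) brings the nodes back up with the new join type.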

0 votes
Yevhen Atlassian Team Mar 09, 2022

@Phong Vũ Quốc Confluence DC uses the Hazelcast K8s discovery method. How did you deploy your Confluence DC? Official Helm charts? The K8s discovery method is configured there by default.
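
For reference, the clustering section of the chart's values.yaml looks roughly like this. It is an excerpt from memory, assuming the official atlassian-data-center/confluence chart, so check the exact keys for your chart version:

```yaml
# Excerpt (from memory) of the official Confluence DC Helm chart values; verify against your chart version
replicaCount: 1

confluence:
  clustering:
    # Turns on Data Center clustering and Kubernetes-based node discovery
    enabled: true
    # Uses each pod's name as its Confluence cluster node name
    usePodNameAsClusterNodeName: true
```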

Hi @Yevhen ,

Thank you so much for the quick response.

Yes, I used the Helm chart (latest version) to deploy Confluence and left the clustering-related values at their defaults (clustering enabled, pod name used as node name).

I tried to deploy the Confluence chart with 1 replica initially -> complete the initial setup -> increase the replicas to 2.

After a while I got com.atlassian.confluence.cluster.safety.ClusterPanicEvent, and both pods became NotReady.

confluence 2022-03-10 07:51:32,286 ERROR [hz.confluence.cached.thread-5] [confluence.cluster.safety.ClusterPanicListener] onClusterPanicEvent Received a panic event, stopping processing on the node: [Origin node: d5f116a0 listening on /172.16.8.38:5701] Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.
confluence -- event: com.atlassian.confluence.cluster.safety.ClusterPanicEvent[source=null] | originatingMemberUuid: cd76c42c-95ad-42cd-834b-c75c65030f82
confluence 2022-03-10 07:51:32,288 WARN [hz.confluence.cached.thread-5] [confluence.cluster.safety.ClusterPanicListener] onClusterPanicEvent Shutting down scheduler
confluence -- event: com.atlassian.confluence.cluster.safety.ClusterPanicEvent[source=null] | originatingMemberUuid: cd76c42c-95ad-42cd-834b-c75c65030f82
confluence 2022-03-10 07:51:34,289 WARN [hz.confluence.cached.thread-4] [internal.cluster.impl.MembershipManager] log [172.16.11.63]:5701 [confluence-test] [3.12.11] Member [172.16.8.38]:5701 - cd76c42c-95ad-42cd-834b-c75c65030f82 is suspected to be dead for reason: No connection
confluence 2022-03-10 07:51:34,296 INFO [hz.confluence.event-2] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] memberRemoved [172.16.8.38]:5701 left the cluster
confluence 2022-03-10 07:51:34,296 INFO [hz.confluence.event-2] [confluence.cluster.hazelcast.LoggingClusterMembershipListener] logClusterMembers Cluster now has 1 members: [[172.16.11.63]:5701]

 

Below is some additional information (which I am not sure is related):

  • I deployed Confluence and its DB from scratch (everything from the previous installation was purged)

Yevhen Atlassian Team Mar 14, 2022

@Phong Vũ Quốc To me it looks like the database PV wasn't flushed and somehow you are using an existing database. It's just a theory though. This KB may help a bit. I failed to reproduce it in my lab cluster. Perhaps it's worth trying to run a test deployment with an in-memory database, just to see if it makes a difference?

@Yevhen Hi, thank you very much for the advice.

I've tried to clean everything (including the PVs provisioned by the Confluence Helm chart), but the ClusterPanicEvent still happens.

I haven't figured out how to deploy a Confluence cluster with an in-memory database using the Helm chart (currently, the ```database``` values only support a JDBC database type).

May I see the ```values.yaml``` file and other configuration files that you used in your lab environment? It would be very helpful for me.

Thanks again, Yevhen.

Yevhen Atlassian Team Mar 23, 2022

@Phong Vũ Quốc  I deployed with pretty much standard values. 

Can you share your confluence statefulset yaml and your values file?

@Yevhen May I also ask how you scaled your Confluence replicas?

Was it done after you completed the initial setup, or at the time you applied the Helm chart?

Here is my values file; I redacted the hostname/URL-related values, btw.

https://pastebin.com/hzatQJpU

Yevhen Atlassian Team Mar 23, 2022

I have checked your values yaml. Nothing special in there. And yes, the typical way to scale Confluence is to deploy with 1 replica and then either scale the StatefulSet directly or update the replicas in the values and helm upgrade it.

Have you checked the link I shared in a previous comment? I wonder if the diagnosis and troubleshooting section is of any help.

It'd also be great to have the complete logs from the two Confluence pods.
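
For example, assuming the standard replicaCount value from the official chart (release and namespace names below are placeholders):

```yaml
# Either scale the StatefulSet directly, e.g.
#   kubectl scale statefulset confluence --replicas=2 -n confluence
# or bump the replica count in values.yaml and upgrade the release, e.g.
#   helm upgrade confluence atlassian-data-center/confluence -f values.yaml -n confluence
replicaCount: 2
```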

0 votes

Just to confirm, you are using Confluence Data Center? You've mentioned Server, not DC, so I want to check.

Yes, that's correct.

Mine is Confluence Data Center.
