What are the suggestions or best practices for using MySQL Master/Slave replication with clustered Confluence?

Logan B April 14, 2015

Basically, we are setting up MySQL replication with one Master and 1 to N Slaves.

Using NetScaler, we will distribute the queries across the various databases: all updates go to the Master, and reads (SELECTs) get routed to the Slaves. This part works fine.

The issue comes in when we initiate a failover of the Master and one of the Slave databases becomes the new Master. When we tried this, we hit the error below. We are looking to see whether this is even possible and what the best practice may be.
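For readers unfamiliar with this kind of read/write split, here is a minimal sketch of the routing rule described above. It is not NetScaler configuration; the endpoint names and the `route` function are hypothetical and exist only for illustration.

```python
import itertools

# Hypothetical endpoints; the post does not name the real hosts.
MASTER = "mysql-master:3306"
SLAVES = ["mysql-slave-1:3306", "mysql-slave-2:3306"]

# Statements treated as reads in this sketch.
_READ_KEYWORDS = ("SELECT", "SHOW")
_slave_cycle = itertools.cycle(SLAVES)


def route(statement: str) -> str:
    """Return the endpoint a SQL statement would be sent to."""
    words = statement.split(None, 1)
    first_word = words[0].upper() if words else ""
    if first_word in _READ_KEYWORDS:
        return next(_slave_cycle)  # reads rotate across the Slaves
    return MASTER                  # everything else goes to the Master


if __name__ == "__main__":
    print(route("SELECT * FROM CLUSTERSAFETY"))
    print(route("INSERT INTO some_table VALUES (1)"))
```

With a split like this, a read that lands on a Slave can return an older value than the Master holds if replication lags, which is worth keeping in mind for any table that is written and re-read on a short cycle.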

Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.

Our current knowledge of Confluence and this issue: 
Confluence has a CLUSTERSAFETY table (located in the database). This table exists even for non clustered environments. Every 30 seconds, Confluence checks this table and compares its value with the one it has in memory. If the new value differs from the one in memory, this error appears, and Confluence cannot proceed. This is the cluster safety mechanism.
How the cluster safety mechanism works...

The cluster safety mechanism is designed to ensure that your wiki cannot become inconsistent because updates by one user are not visible to another. A failure of this mechanism is a fatal error in Confluence and is called cluster panic. Because the cluster safety mechanism helps prevent data inconsistency whenever any two copies of Confluence are running against the same database, it is enabled in all instances of Confluence, not just clusters.

A scheduled task, ClusterSafetyJob, runs every 30 seconds in Confluence. In a cluster, this job is run only on one of the nodes. The scheduled task operates on a safety number – a randomly generated number that is stored both in the database and in the distributed cache used across a cluster. It does the following:

1. Generate a new random number.
2. Compare the existing safety numbers, if there is already a safety number in both the database and the cache.
3. If the numbers differ, publish a ClusterPanicEvent. Currently in Confluence, this causes the following to happen on each node in the cluster:
   - disable all access to the application
   - disable all scheduled tasks
   - In Confluence 5.5 and earlier, update the database safety number to a new value, which will cause all nodes accessing the database to fail. From Confluence 5.6 onwards, the database safety number is not updated, to allow the other Confluence node/s to continue processing requests.
4. If the numbers are the same or aren't set yet, update the safety numbers:
   - set the safety number in the database to the new random number
   - set the safety number in the cache to the new random number.
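
The steps above can be sketched in a few lines of Python. This is only a simulation of the described logic, not Confluence's actual code: `SafetyStore`, `ClusterPanic`, `run_cluster_safety_job` and the `update_db_on_panic` flag are hypothetical names, and two in-memory objects stand in for the database row and the distributed cache.

```python
import random
from typing import Optional


class ClusterPanic(Exception):
    """Stands in for the ClusterPanicEvent in this sketch."""


class SafetyStore:
    """Holds one safety number; used for both the 'database' and the 'cache'."""

    def __init__(self) -> None:
        self.number: Optional[int] = None


def run_cluster_safety_job(
    db: SafetyStore,
    cache: SafetyStore,
    rng: random.Random,
    update_db_on_panic: bool = False,  # True models the 5.5-and-earlier behaviour
) -> int:
    """One run of the job; the real one is scheduled every 30 seconds."""
    new_number = rng.randrange(1, 2**31)

    # Compare only when both sides already hold a safety number.
    if db.number is not None and cache.number is not None:
        if db.number != cache.number:
            if update_db_on_panic:
                db.number = new_number  # makes other nodes on this database fail too
            raise ClusterPanic("database and cache safety numbers differ")

    # Numbers match or are not set yet: store the new number in both places.
    db.number = new_number
    cache.number = new_number
    return new_number


if __name__ == "__main__":
    db, cache = SafetyStore(), SafetyStore()
    rng = random.Random()
    run_cluster_safety_job(db, cache, rng)   # first run: nothing to compare yet
    run_cluster_safety_job(db, cache, rng)   # numbers match, both are refreshed
    db.number += 1                           # something else changed the database value
    try:
        run_cluster_safety_job(db, cache, rng)
    except ClusterPanic as exc:
        print("cluster panic:", exc)
```

In this model, any situation where the value read from the database is not the one the job last wrote ends in a panic on the next run, whether the cause is a second writer or a read served from a different copy of the data.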

2 answers

0 votes
Logan B April 17, 2015

Data center

0 votes
Nic Brough -Adaptavist-
April 16, 2015

I'm a little confused on this one.

Confluence clustering was removed in 5.6, so what you're writing about cluster safety numbers and so on is not relevant.

Could you clarify: are you using Confluence (single node) or Confluence Data Center?

Logan B April 23, 2015

Yes, we are using Confluence Data Center. Sorry about that.
