What are suggestions or best practices for using MySQL Master Server replication with clustered Confluence?

April 14, 2015

Basically, we are setting up MySQL replication with one Master and 1 to N Slaves.

Using NetScaler, we will distribute the database traffic across the various databases: all updates go to the Master, and reads (SELECTs) get routed to the Slaves. This part works fine.
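To make the routing rule concrete, here is a minimal Python sketch of the same read/write split. It is only an illustration, not NetScaler configuration: the function `make_router` and the backend names `master`, `slave1` and `slave2` are hypothetical, and treating any statement that starts with SELECT as a read is a simplifying assumption.

```python
import itertools


def make_router(master, replicas):
    """Return a function that picks a backend name for a SQL statement."""
    replica_cycle = itertools.cycle(replicas)  # simple round-robin over the Slaves

    def route(statement):
        words = statement.split()
        first_word = words[0].upper() if words else ""
        # Only statements starting with SELECT are treated as reads (an assumption);
        # everything else, including empty input, goes to the Master.
        if first_word == "SELECT" and replicas:
            return next(replica_cycle)
        return master

    return route


# Hypothetical backend names, for illustration only.
route = make_router("master", ["slave1", "slave2"])
print(route("SELECT 1"))                # a read, sent to one of the Slaves
print(route("UPDATE t SET x = 1"))      # a write, sent to the Master
```

With a split like this, any value Confluence writes to the Master and then reads back from a Slave is only as fresh as replication allows.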

The issue comes in when we initiate a failover of the Master and one of the Slave databases becomes the new Master. When we tried this, we hit the error below. We would like to know whether this setup is even possible, and what the best practice may be.

Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.

Our current knowledge of Confluence and this issue:
Confluence has a CLUSTERSAFETY table (located in the database). This table exists even in non-clustered environments. Every 30 seconds, Confluence checks this table and compares its value with the one it holds in memory. If the value in the table differs from the one in memory, this error appears and Confluence cannot proceed. This is the cluster safety mechanism.
How the cluster safety mechanism works...

The cluster safety mechanism is designed to ensure that your wiki cannot become inconsistent because updates by one user are not visible to another. A failure of this mechanism is a fatal error in Confluence and is called cluster panic. Because the cluster safety mechanism helps prevent data inconsistency whenever any two copies of Confluence are running against the same database, it is enabled in all instances of Confluence, not just clusters.

A scheduled task, ClusterSafetyJob, runs every 30 seconds in Confluence. In a cluster, this job is run only on one of the nodes. The scheduled task operates on a safety number – a randomly generated number that is stored both in the database and in the distributed cache used across a cluster. It does the following:

1. Generate a new random number.

2. Compare the existing safety numbers, if there is already a safety number in both the database and the cache.

3. If the numbers differ, publish a ClusterPanicEvent. Currently in Confluence, this causes the following to happen on each node in the cluster:

   - disable all access to the application

   - disable all scheduled tasks

   - In Confluence 5.5 and earlier, update the database safety number to a new value, which will cause all nodes accessing the database to fail. From Confluence 5.6 onwards, the database safety number is not updated, to allow the other Confluence node/s to continue processing requests.

4. If the numbers are the same or aren't set yet, update the safety numbers:

   - set the safety number in the database to the new random number

   - set the safety number in the cache to the new random number.
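
The steps above can be sketched in a few lines of Python. This is not Confluence's actual code: the database and the distributed cache are stood in for by plain dictionaries, and the names `cluster_safety_check`, `ClusterPanic` and the key `safety_number` are made up for the illustration. It follows the 5.6+ behaviour, where the database number is left untouched on a mismatch.

```python
import secrets


class ClusterPanic(Exception):
    """Stands in for Confluence's ClusterPanicEvent in this sketch."""


def cluster_safety_check(database, cache):
    """Run one pass of the safety check over two dict-like stores."""
    # Generate a new random number (non-negative in this sketch).
    new_number = secrets.randbelow(2**31)

    db_number = database.get("safety_number")
    cache_number = cache.get("safety_number")

    # Compare only when both stores already hold a safety number.
    if db_number is not None and cache_number is not None:
        if db_number != cache_number:
            # Mismatch: signal a panic and leave the database number alone.
            raise ClusterPanic(f"database={db_number}, cache={cache_number}")

    # Numbers match or are not set yet: store the new number in both places.
    database["safety_number"] = new_number
    cache["safety_number"] = new_number
    return new_number


database, cache = {}, {}
cluster_safety_check(database, cache)  # first run: nothing set, numbers written
cluster_safety_check(database, cache)  # numbers match, both refreshed

# Simulate the database holding a value this node did not write,
# for example after another instance or a different database took over.
database["safety_number"] = -1  # cannot collide with a generated number
try:
    cluster_safety_check(database, cache)
except ClusterPanic as exc:
    print("cluster panic:", exc)
```

The last part of the sketch shows why a database whose safety number does not match the value held by the running node is enough to trigger the panic, regardless of how that mismatch came about.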
