Basically, we are setting up MySQL replication with one Master and 1 to N Slaves.
Using NetScaler, we will distribute the load across the various databases so that all updates go to the Master and reads (selects) get routed to the Slaves. This part works fine.
The issue comes in when we initiate a failover of the Master and one of the Slave databases becomes the new Master. When we tried this, we hit the error below. We would like to know whether this setup is even possible and what the best practice may be.
Clustered Confluence: Database is being updated by an instance which is not part of the current cluster. You should check network connections between cluster nodes, especially multicast traffic.
Our current knowledge of Confluence and this issue:
Confluence has a CLUSTERSAFETY table (located in the database). This table exists even in non-clustered environments. Every 30 seconds, Confluence reads the value in this table and compares it with the one it holds in memory. If the database value differs from the one in memory, this error appears and Confluence cannot proceed. This is the cluster safety mechanism.
How the cluster safety mechanism works...
The cluster safety mechanism is designed to ensure that your wiki cannot become inconsistent because updates by one user are not visible to another. A failure of this mechanism is a fatal error in Confluence and is called cluster panic. Because the cluster safety mechanism helps prevent data inconsistency whenever any two copies of Confluence are running against the same database, it is enabled in all instances of Confluence, not just clusters.
A scheduled task, ClusterSafetyJob, runs every 30 seconds in Confluence. In a cluster, this job is run only on one of the nodes. The scheduled task operates on a safety number – a randomly generated number that is stored both in the database and in the distributed cache used across a cluster. It does the following:
1. Generate a new random number.
2. Compare the existing safety numbers, if there is already a safety number in both the database and the cache.
3. If the numbers differ, publish a ClusterPanicEvent. Currently in Confluence, this causes the following to happen on each node in the cluster:
   - disable all access to the application
   - disable all scheduled tasks
   - In Confluence 5.5 and earlier, update the database safety number to a new value, which will cause all nodes accessing the database to fail. From Confluence 5.6 onwards, the database safety number is not updated, to allow the other Confluence node/s to continue processing requests.
4. If the numbers are the same or aren't set yet, update the safety numbers:
   - set the safety number in the database to the new random number
   - set the safety number in the cache to the new random number.
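To make the steps above concrete, here is a minimal Python sketch of the check as we understand it. It is not Confluence's actual code: SafetyStore, ClusterPanic and run_safety_check are hypothetical names made up for illustration, and the two stores simply stand in for the CLUSTERSAFETY row and the in-memory/cache value. The demo at the bottom shows one scenario that we suspect, but have not confirmed, matches our failover: the promoted Slave is one replicated write behind, so its safety number no longer matches the one Confluence holds.

```python
import random
from typing import Optional


class ClusterPanic(Exception):
    """Raised when the database and cache safety numbers disagree."""


class SafetyStore:
    """Hypothetical stand-in for one place a safety number is kept."""

    def __init__(self, value: Optional[int] = None) -> None:
        self.value = value


def run_safety_check(database: SafetyStore, cache: SafetyStore,
                     rng: Optional[random.Random] = None) -> int:
    """Run one pass of the check and return the new safety number."""
    rng = rng or random.Random()
    new_number = rng.randrange(1, 2**31)  # step 1: new random number

    # Step 2: only compare when both places already hold a number.
    both_set = database.value is not None and cache.value is not None
    if both_set and database.value != cache.value:
        # Step 3: the numbers differ, so signal a cluster panic.
        raise ClusterPanic(f"database={database.value} cache={cache.value}")

    # Step 4: numbers match or are not set yet, so store the new number.
    database.value = new_number
    cache.value = new_number
    return new_number


if __name__ == "__main__":
    master = SafetyStore()
    cache = SafetyStore()
    run_safety_check(master, cache)        # first run just sets both numbers
    replica = SafetyStore(master.value)    # Slave has replicated up to here
    run_safety_check(master, cache)        # assume this write has not replicated
    try:
        run_safety_check(replica, cache)   # failover to the stale Slave
    except ClusterPanic as exc:
        print("cluster panic:", exc)
```

If this model is right, any failover where the new Master has not yet received the latest safety number write would trigger the error, regardless of whether Confluence itself is clustered.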
I'm a little confused on this one.
Confluence clustering was removed in 5.6, so what you're writing about cluster safety numbers and so on is not relevant.
Could you clarify - are you using Confluence (single node) or Confluence Data Center?