
2 Crucible Instances on 1 Database

SomnathK April 24, 2014
We are trying to explore resiliency options for Crucible. We are looking at the following configuration: two instances in hot/warm mode. The 2nd Crucible instance will not be open for users. We will keep it running, as we want the FishEye cache to be updated. The 2nd instance will be configured with the same data. In case of primary node failure, we would fail over to the 2nd node. Thoughts?

1 answer

0 votes
Piotr Swiecicki
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
April 24, 2014

Hi Somnath,

I am assuming you want to configure both instances to connect to a single shared database, right? Otherwise they would not share reviews data etc.

I don't think it is a good idea to keep a hot failover server connected to the same database. That would result in duplicate emails being sent, for example review reminders. Also, some of the sequences, like the review perm id generator, are kept in memory, so keeping two instances connected to the same database would result in review perm id collisions. What about keeping it as a cold failover server though? You can keep it fully configured and ready to start, but not actually started until the primary server crashes.
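The perm id collision can be illustrated with a toy model. This is only a sketch of the general problem, not Crucible's actual implementation: the class `InMemoryIdGenerator`, the `CR-` prefix, and the starting value are all hypothetical. It assumes each instance reads the last used id from the shared database once at startup and then counts in memory.

```python
class InMemoryIdGenerator:
    """Toy stand-in for a sequence that is seeded once and then kept in memory."""

    def __init__(self, last_id_in_db: int) -> None:
        self._next = last_id_in_db + 1

    def next_perm_id(self) -> str:
        perm_id = f"CR-{self._next}"
        self._next += 1
        return perm_id


# Both instances start against the same database state (hypothetical last id = 41).
primary = InMemoryIdGenerator(last_id_in_db=41)
secondary = InMemoryIdGenerator(last_id_in_db=41)

# Each instance hands out ids without knowing about the other one.
primary_ids = [primary.next_perm_id() for _ in range(3)]
secondary_ids = [secondary.next_perm_id() for _ in range(3)]

# Every id issued by one instance has also been issued by the other.
collisions = sorted(set(primary_ids) & set(secondary_ids))
```

With a cold failover server only one generator is ever running, so this situation does not arise.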

Finally, you also need to ensure the file system is synchronised between the two servers. Bear in mind NFS is not supported, so you may want to set up an rsync job to run periodically from the primary node to the failover one. And obviously that synchronisation would need to be run again if the primary server crashed, assuming you can still access its file system.
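A minimal sketch of such a periodic sync, meant to be run from cron or a similar scheduler on the primary node. The paths, the host name, and the function names are assumptions for illustration; the rsync options used are `-a` (archive mode) and `--delete` (remove files on the target that no longer exist on the source).

```python
import subprocess
from typing import List


def build_rsync_command(data_dir: str, failover_host: str, remote_dir: str) -> List[str]:
    """Build the rsync invocation that mirrors data_dir to the failover node."""
    # A trailing slash on the source copies the directory's contents,
    # not the directory itself.
    source = data_dir.rstrip("/") + "/"
    destination = f"{failover_host}:{remote_dir.rstrip('/')}/"
    return ["rsync", "-a", "--delete", source, destination]


def sync_to_failover(data_dir: str, failover_host: str, remote_dir: str) -> None:
    """Run the sync and raise CalledProcessError if rsync reports a failure."""
    subprocess.run(build_rsync_command(data_dir, failover_host, remote_dir), check=True)


# Hypothetical values; nothing is executed here, the command is only built.
example_command = build_rsync_command(
    "/opt/fecru-data", "failover.example.com", "/opt/fecru-data/"
)
```

Building the command separately from running it makes it easy to log or review exactly what will be executed before scheduling it.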

Hope that helps,
Piotr

SomnathK April 28, 2014

Hi Piotr,

Thanks for your thoughts. We currently have a setup similar to the one you describe.

The secondary instance is cold, and we run rsync between the two file systems to keep the DATADIR in sync. But for the last few months we have seen the data on the primary getting corrupted (see: https://support.atlassian.com/servicedesk/customer#fsh/problem-report-13977)

Once we stopped the rsync, the error did not return.

Theoretically, though, we know that rsync should not change, lock, or corrupt the source at all.

Moreover, rsync-ing the cache of a running FishEye instance may give us a cache in an unstable state on the secondary, and we may not be able to fully recover, because the cache copied over while FishEye was still running on the primary may be in an "in-process" kind of state.

So, considering the above, we are looking for an alternative way to handle resiliency.

Your thoughts?

Foong
Atlassian Team
April 28, 2014

Hi Somnath,

Please check the FishEye repository indexing state with the REST API: https://docs.atlassian.com/fisheye-crucible/latest/wadl/fecru.html#d2e40

If the repository is not indexing, then rsync the cache.
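The "only sync when idle" rule could be sketched as below. The response field names `fullIndexingInProgress` and `incrementalIndexingInProgress` are assumptions, so please check them against the REST documentation linked above before relying on them. Fetching the status is left out; the sketch only covers the decision once the status for each repository is available as a dictionary.

```python
from typing import Dict, List

# Assumed field names in the indexing-status response; verify against the REST docs.
INDEXING_FLAGS = ("fullIndexingInProgress", "incrementalIndexingInProgress")


def is_indexing(status: Dict[str, object]) -> bool:
    """Treat a repository as indexing if any flag is set or missing."""
    # A missing flag is treated as "indexing" so an unexpected response
    # never leads to syncing a cache that might be mid-write.
    return any(bool(status.get(flag, True)) for flag in INDEXING_FLAGS)


def repos_safe_to_sync(statuses: Dict[str, Dict[str, object]]) -> List[str]:
    """Return the names of repositories whose cache can be rsynced now."""
    return sorted(name for name, status in statuses.items() if not is_indexing(status))


# Hypothetical status data for three repositories.
example_statuses = {
    "repoA": {"fullIndexingInProgress": False, "incrementalIndexingInProgress": False},
    "repoB": {"fullIndexingInProgress": False, "incrementalIndexingInProgress": True},
    "repoC": {},  # unexpected response shape
}
safe_repos = repos_safe_to_sync(example_statuses)
```

Only the repositories returned by `repos_safe_to_sync` would then have their var/cache/<REPO> folder synced in that run; the others are picked up on a later run.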

SomnathK April 28, 2014

Hi Foong,

Yeah, that is one possibility for syncing the var/cache/<REPO> folders.

But what about the global cache indexes (INST_FOLDER/cache/glo*)?

Regards,

Somnath

Foong
Atlassian Team
April 28, 2014

INST_FOLDER/cache/globalfe is generated automatically from the existing repository cache files during startup. You can delete INST_FOLDER/cache/globalfe before starting up the 2nd FishEye/Crucible instance to let it regenerate.
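A small sketch of that pre-start cleanup on the failover node. The function name is hypothetical, and the demonstration runs against a throwaway temporary directory rather than a real instance folder. It should only be run while FishEye/Crucible is stopped on that node.

```python
import shutil
import tempfile
from pathlib import Path


def clear_global_cache(inst_folder: str) -> bool:
    """Delete INST_FOLDER/cache/globalfe if present; return True if it was removed."""
    global_cache = Path(inst_folder) / "cache" / "globalfe"
    if not global_cache.is_dir():
        return False
    shutil.rmtree(global_cache)
    return True


# Demonstration against a temporary directory standing in for INST_FOLDER.
with tempfile.TemporaryDirectory() as demo_inst:
    (Path(demo_inst) / "cache" / "globalfe").mkdir(parents=True)
    removed_first = clear_global_cache(demo_inst)   # directory existed
    removed_again = clear_global_cache(demo_inst)   # already gone
```

Returning a boolean instead of raising when the directory is absent keeps the step safe to repeat in a failover script.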

INST_FOLDER/cache/cruidx is the cache for Crucible reviews. It is faster to index than a FishEye repository. You can just re-index it at Administration > Global Settings > Crucible after the 2nd FishEye/Crucible instance has started.

By the way, we can use this REST API to check FishEye Repository indexing state too: https://docs.atlassian.com/fisheye-crucible/latest/wadl/fecru.html#rest-service-fecru:indexing-status-v1:status:repoName
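For scripting, the status URL for a given repository could be built as sketched below. The path is inferred from the link anchor above (`rest-service-fecru:indexing-status-v1:status:repoName`) and is an assumption, as are the base URL and the function name; authentication is not covered.

```python
from urllib.parse import quote


def indexing_status_url(base_url: str, repo_name: str) -> str:
    """Build the indexing-status URL for one repository."""
    # Path inferred from the documentation anchor; verify before use.
    path = "/rest-service-fecru/indexing-status-v1/status/"
    # Percent-encode the repository name so spaces and slashes are safe in the path.
    return base_url.rstrip("/") + path + quote(repo_name, safe="")


# Hypothetical server and repository name.
example_url = indexing_status_url("https://fisheye.example.com/", "my repo")
```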

Piotr Swiecicki
Atlassian Team
April 28, 2014

Also, you could consider taking a snapshot at the file system level. Assuming you are using Linux, you could use LVM or NILFS to snapshot the whole file system atomically and sync from the snapshot instead of the live data. These are just the first two I googled quickly; I am sure there are other alternatives available too.
