Database connection conundrum

Steven Mustari
Rising Star
Rising Stars are recognized for providing high-quality answers to other users. Rising Stars receive a certificate of achievement and are on the path to becoming Community Leaders.
May 17, 2021

I understand that an accurate answer to this question would need much, much more information, but I think someone could likely provide some insight into this situation, and any additional insight would be appreciated.

Contextual information:

We have a relatively complex server instance of JIRA/JIRA Service Management, running on Windows Server as a service with an MSSQL database. To give a bit of context: ~700 custom fields, 30-40 actively managed workflows, many scripted fields, jobs, listeners and behaviours, and a few fragments. We have many mail handlers consuming email from various locations, as well as Service Desk mail handling. We generate around 75k-100k issues a year across all projects. We have about 25 apps installed, the big ones being:

Tempo
Structure
Scriptrunner
JMWE
Zephyr
Deviniti Extensions (Bundled fields, Queues)
JIRA Misc Custom Fields
In-Mail Handler
The Scheduler

There are many others, but I'm guessing those are the most affected.

A bit more before the questions...

We have some custom PowerShell scripts, run weekly, that copy the production instance and database. They copy the production JIRA data and the production database over to a test environment, insert Dev license keys for everything before boot, and the test instance then boots up with Dev licenses and all production data intact.

For the second time now, this process has failed while copying the dbconfig configuration, and we inadvertently ended up with both Prod and Test JIRA pointed at the same database for a time. The first time, this lasted ~12 hours. The most recent occurrence lasted closer to 72 hours because it spanned a weekend.
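A pre-boot check on the test side could catch this failure before the test instance ever starts. The following is a minimal Python sketch, assuming the usual Jira `dbconfig.xml` layout (`<jira-database-config>` containing `<jdbc-datasource><url>`) and a SQL Server JDBC URL. The host names in `PROD_DB_HOSTS` and the `guard` helper are hypothetical placeholders, not part of the existing PowerShell scripts.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical placeholders: the production SQL Server host name(s) and IP(s).
PROD_DB_HOSTS = {"prod-sql01", "prod-sql01.example.com", "10.0.0.15"}


def jdbc_url(root: ET.Element) -> str:
    """Return the JDBC URL from a parsed dbconfig.xml (<jdbc-datasource><url>)."""
    url = root.findtext("jdbc-datasource/url")
    if not url:
        raise ValueError("dbconfig.xml has no <jdbc-datasource><url> value")
    return url.strip()


def jdbc_host(url: str) -> str:
    """Extract the server host from a SQL Server JDBC URL.

    Accepts both the Microsoft driver form (jdbc:sqlserver://host:1433;...)
    and the older jTDS form (jdbc:jtds:sqlserver://host:1433/db).
    """
    match = re.match(r"jdbc:(?:jtds:)?sqlserver://([^:;/\\]+)", url, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised SQL Server JDBC URL: {url!r}")
    return match.group(1).lower()


def points_at_production(root: ET.Element, prod_hosts=PROD_DB_HOSTS) -> bool:
    """True if the configured database host is one of the production hosts."""
    return jdbc_host(jdbc_url(root)) in {h.lower() for h in prod_hosts}


def guard(dbconfig_path: str) -> int:
    """Exit code for a pre-boot check: 1 (abort) if test points at production."""
    root = ET.parse(dbconfig_path).getroot()
    if points_at_production(root):
        print(f"ABORT: {dbconfig_path} points at a production database host")
        return 1
    print(f"OK: {dbconfig_path} does not point at a known production host")
    return 0
```

Calling `sys.exit(guard(path_to_test_dbconfig))` from the copy job, before the test service is started, would stop the boot whenever the copied file still names a production host. Blocking a list of known production hosts is the simpler choice; allowing only the test database host would be stricter.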

The Question(s)

Luckily, not much activity takes place in the test instance, so I'm not very worried about data integrity from changes made there. My big question is the long-term impact of something like this happening.

The noticeable effects I've seen both times, before disconnecting and rebooting, are: mail handlers fail to work properly, anything dependent on cron expressions tends to fail, and configuration settings from test at times seem to override production configuration settings.

I've spoken with Atlassian support about this, and obviously the recommendation was to roll back; they could give little insight into the impact on third-party app configurations. Does anything stick out glaringly as a long-term problem once the initial cause is remediated and everything is pointed back where it belongs and restarted?


Thank you for any insight on this topic. Again, I know a lot of specifics would be needed to truly understand the full impact.


2 answers

2 accepted

2 votes
Answer accepted
Daniel Ebers
Rising Star
May 23, 2021

Hi @Steven Mustari

For the unfortunate current situation, you have already received a lot of valid information from Nic.

For the future, I wonder whether it makes sense to set up some kind of firewalling between the prod and non-prod environments.

Any other solution will work as well: basically anything that prevents the non-production environment from accessing the live database, even in the case of an erroneous configuration.
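Whatever form the isolation takes, it can be verified from the test server itself. A minimal Python sketch, assuming SQL Server listens on its default TCP port 1433; the production host name in the usage comment is a hypothetical placeholder. It only checks whether a TCP connection can be opened, not whether Jira could actually log in.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures,
        # which are all OSError subclasses in Python 3.
        return False


def assert_prod_db_blocked(prod_host: str, port: int = 1433) -> None:
    """Raise if this machine can open a TCP connection to the production DB port."""
    if tcp_reachable(prod_host, port):
        raise RuntimeError(
            f"{prod_host}:{port} is reachable from this host; "
            "the prod/non-prod firewall rule is not in effect"
        )


# Hypothetical usage on the test server, e.g. as a scheduled check:
# assert_prod_db_blocked("prod-sql01.example.com")
```

Running this as a scheduled check on the test server turns "the firewall should block it" into something that fails visibly if a rule is ever removed.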

Regards,
Daniel

Steven Mustari
Rising Star
May 24, 2021

Ultimately, after reviewing the situation, this is what we decided to do. Thank you for the additional suggestion.

2 votes
Answer accepted
Nic Brough -Adaptavist-
Community Leader
Community Leaders are connectors, ambassadors, and mentors. On the online community, they serve as thought leaders, product experts, and moderators.
May 17, 2021

There's one very simple thing that stands out to me:

>which resulted in dbconfig config copy failures and inadvertently ended up with both Prod and Test JIRA pointed to the same DB for a time.

Why in heck's name are you copying production settings to a test system?

By all means, copy the database to a test system, and the attachments, and do all the things to isolate test from everywhere else, but why are you copying the database connection to production over to a test system?

The thing that sticks out as a long-term problem is: "stop connecting test systems to your production database".

I know that's quite a harsh and bloody-minded attitude, but I really can't see why you would do this, or think it's a useful way to get a test system. This is not as complex as you seem to think: just stop connecting test to live and you are OK.

Steven Mustari
Rising Star
May 18, 2021

Thank you for the blunt answer. I 100% and wholeheartedly agree; maybe I did a bad job explaining.

We never did this on purpose; a process error on the team here has inadvertently caused it to happen, twice now. Their scripting was supposed to exclude all configuration settings files, but somehow those files were copied over anyway (this is outside my scope).
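One way to make an exclusion like that harder to break silently is to pass it explicitly to the copy call and then check the result. A minimal Python sketch, assuming the copy goes from a production Jira home into an existing test Jira home. `shutil.copytree` with `shutil.ignore_patterns` is a swapped-in technique for illustration, not what the team's PowerShell scripts actually do, and an `EXCLUDED` list containing only `dbconfig.xml` is an assumption.

```python
import filecmp
import shutil
from pathlib import Path

# Assumption: file names that must never travel from prod home to test home.
# dbconfig.xml holds Jira's database connection; extend as needed.
EXCLUDED = ("dbconfig.xml",)


def copy_home_excluding_config(src: Path, dst: Path, excluded=EXCLUDED) -> None:
    """Copy src into dst, skipping the excluded file names at any depth.

    dirs_exist_ok=True (Python 3.8+) copies over an existing test home,
    so the test home's own copies of the excluded files are left in place.
    """
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(*excluded), dirs_exist_ok=True)

    # Fail loudly if an excluded file at the top level of dst is
    # byte-identical to production's copy (e.g. another step copied it).
    for name in excluded:
        prod_file, test_file = src / name, dst / name
        if prod_file.exists() and test_file.exists() and filecmp.cmp(prod_file, test_file, shallow=False):
            raise RuntimeError(f"{test_file} is identical to the production copy")
```

Because the copy merges into the existing test home, the test instance keeps its own `dbconfig.xml`, and the post-copy comparison raises instead of letting the instance boot if production's file got there anyway.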

My question above is specifically about the potential impact inside JIRA of this happening. I think the feeling is mutual that this is kind of insane.

Nic Brough -Adaptavist-
Community Leader
May 18, 2021

OK, the potential impacts are very simple: the worst case is a totally corrupt data set that initially appears to be OK but fails later.

You absolutely need to go back to the backup you took before this error was made, or you'll never know whether your data was damaged in ways that are going to bite you later. I've seen something like this in Jira break an attempted upgrade a year later, by which time, of course, it's far too late to roll back to a backup.

Steven Mustari
Rising Star
May 18, 2021

This was my fear, assumption, initial suggestion, and what I'll be trying to convey to management.

Thank you @Nic Brough -Adaptavist- 
