Error when shutting down Confluence before backup

Andrew_Hall October 4, 2018

Hi Everyone,

 

When shutting down Confluence prior to application backup I often receive the following error:

 

Using CATALINA_BASE:   /opt/atlassian/confluence 
Using CATALINA_HOME:   /opt/atlassian/confluence
Using CATALINA_TMPDIR: /opt/atlassian/confluence/temp
Using JRE_HOME:        /opt/atlassian/confluence/jre/
Using CLASSPATH:       /opt/atlassian/confluence/bin/bootstrap.jar:/opt/atlassian/confluence/bin/tomcat-juli.jar
Using CATALINA_PID:    /opt/atlassian/confluence/work/catalina.pid
Tomcat did not stop in time.
To aid diagnostics a thread dump has been written to standard out.
Killing Tomcat with the PID: 1359
The Tomcat process has been killed.

I have encountered this issue before, and Atlassian support previously advised me to modify /opt/atlassian/confluence/bin/stop-confluence.sh to raise the shutdown timeout from the default 20 seconds to 120 seconds:

 

exec $PRGDIR/shutdown.sh 120 -force $@

and

$sucmd -m $CONF_USER -c "$PRGDIR/shutdown.sh 120 -force $@"

 

This does not happen all the time but seems to be largely dependent on how long Confluence has been running. The above result was after 7 days of operation.

Additionally, I have previously been advised that it is best practice to stop the Confluence service while taking a production backup, i.e. stop the Confluence service, run mysqldump, tar the home and application paths, then start the Confluence service. Are there any other options for doing this? Forcibly killing Tomcat every time seems less than ideal, and I'm a little afraid that data will not have been flushed to disk by the time the stop-confluence.sh script forcibly kills Tomcat.

I have also tried setting a 180 second timeout in the shutdown script but the issue still occurs. Are there any additional diagnostic steps that can be done to figure out why Tomcat is not exiting cleanly? The server only receives very light traffic and there were zero people connected/editing at the time of the above shutdown.

 

Any advice you could provide would be much appreciated.


Regards

 

Andrew Hall

 

 

1 answer

1 vote
Daniel Eads
Atlassian Team
Atlassian Team members are employees working across the company in a wide variety of roles.
October 5, 2018

Hey Andrew,

Firstly, regarding the shutdown message: this happens when the plugin subsystem is still running, and it's really nothing to worry about. The main application has stopped serving requests at this point, and we kill the PID so that the process doesn't run in perpetuity waiting for non-responsive plugins to report that they've stopped. You can even lower the kill timeout if you want. We default to 20 seconds, as you initially noted.
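For illustration, the wait-then-kill behaviour described above can be sketched roughly as follows. This is not the actual stop-confluence.sh or Tomcat code: the function name stop_with_timeout is made up for this example, and it sends SIGTERM as a stand-in for Tomcat's own shutdown request.

```shell
#!/bin/sh
# Hypothetical helper (not part of Confluence or Tomcat).
# Usage: stop_with_timeout PID SECONDS
# Returns 0 if the process exited on its own, 1 if it had to be killed.
stop_with_timeout() {
    pid=$1
    remaining=$2

    # Ask the process to stop (assumption: SIGTERM stands in for the
    # shutdown request that Tomcat would normally receive).
    kill -TERM "$pid" 2>/dev/null

    # Poll once per second while the PID still exists.
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$remaining" -le 0 ]; then
            # Timeout reached: force-kill so we do not wait forever.
            kill -KILL "$pid" 2>/dev/null
            return 1
        fi
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}
```

A return value of 1 here corresponds to the "Tomcat did not stop in time" case in the output above.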

Now for backups! mysqldump can take a hot backup with InnoDB (the storage engine we use for Confluence). For the most part, you can also take hot backups of the home and install directories. So for daily backup purposes, you don't need to bring Confluence down to take a backup.

Restoring can get a little hairy though - there is a lock file that will be written to your backup (unless you take specific steps to omit it). When restoring and trying to bring Confluence up from a full backup, you'll need to delete the lock file before Confluence will start. It's at:

<confluence-home>/lock

Confluence will throw an error if you try and start it up and the lock file is still in place, so you will get some help if you forget to remove it. But you could also use flags in your tar command to omit it from your backup as well. In either case, no need to fully shut down the application to take a backup!
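As a rough sketch of what such a hot backup could look like (the function names, and any paths or database names you pass in, are assumptions for illustration and not an official Atlassian script):

```shell
#!/bin/sh
# Sketch of a hot (no-downtime) backup. Function names are hypothetical.

# Archive a directory while leaving out its top-level "lock" file.
# $1 = directory to archive (e.g. the Confluence home)
# $2 = output tarball
backup_dir() {
    src_dir=$1
    out_file=$2
    # -C switches to the parent directory so the archive stores relative
    # paths; --exclude drops <dir>/lock so that a restored copy can start
    # without deleting the lock file by hand.
    tar -czf "$out_file" \
        -C "$(dirname "$src_dir")" \
        --exclude="$(basename "$src_dir")/lock" \
        "$(basename "$src_dir")"
}

# Dump an InnoDB database from a consistent snapshot while it stays online.
# $1 = database name, $2 = output file
# Assumption: credentials come from ~/.my.cnf or similar.
backup_db() {
    mysqldump --single-transaction "$1" > "$2"
}
```

You would call these with your own values, for example backup_db confluence /backup/confluence.sql followed by backup_dir on your Confluence home directory (the database name and backup path here are placeholders).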

Cheers,
Daniel 

Andrew_Hall October 6, 2018

Hi Daniel,

Thanks heaps for the response!

I know it's a bit out of scope for the question above, but does what you say about not needing to be too concerned about the forced shutdown, and not needing to shut down the application for backups, apply to Bitbucket and Jira as well?

Regards,

Andrew

Daniel Eads
Atlassian Team
October 8, 2018

Hey Andrew!

I was sure it is "expected" on Jira, but I just double-checked with a Bitbucket Server-specific engineer. It's expected across all three (although the default wait period on Bitbucket is 30 seconds).

Cheers,
Daniel

Andrew_Hall October 10, 2018

So it's safe to take a tar/file-level backup of the home and application folders for Bitbucket and Jira without shutting down the app first?

Daniel Eads
Atlassian Team
October 11, 2018

I've personally done this for Jira and it's not a problem. What you will see if you restore the backup is that Jira will complain the index is in an inconsistent state with the database and it will want you to do a locking reindex of all your issues. Depending on how many issues/plugins you've got in your Jira, this could take minutes to hours.

Given the tradeoff between reindexing and downtime every time you take a backup, it's preferable to reindex when restoring from a live backup.

Andrew_Hall October 15, 2018

And Bitbucket is somewhat the same?

Daniel Eads
Atlassian Team
October 16, 2018

Yes.
