My service start/stop script stops the application process but doesn't release sockets held by a child process. Help?!

Sam Caldwell Atlassian Team Aug 26, 2016

Background

The question I am going to pose tonight comes straight from the actual problems that cross my desk here at Atlassian. I figured I would share it with the community to save someone else the pain of running into it.

(1) There is a start/stop script for an application. It starts the application process under a service user account, and the stop command stops that process when needed.

(2) However, recently, when the application was stopped and started without a shutdown and startup of the underlying machine, it could not be started again after it was stopped.

(3) Log analysis revealed the following:

SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-bio-127.0.0.1-8080"]

java.net.BindException: Address already in use /127.0.0.1:8080

(4) After stopping the application, netstat showed: 

tcp6       0      0 127.0.0.1:8080          :::*          LISTEN      23681/java      
tcp6       0      0 127.0.0.1:9080          :::*          LISTEN      23681/java      
tcp6       0      0 127.0.0.1:8005          :::*          LISTEN      23681/java      
tcp6       0      0 :::8009                 :::*          LISTEN      23681/java

(5) The problem is that, before the stop, the PID_FILE for the application process contained a different process ID:

# cat $PID_FILE
23680

(6) After shutdown, process ID 23680 had been stopped successfully, but process ID 23681 from the netstat output above was still running.

(7) If process id 23681 is killed manually, the application will start without issue.
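
The mismatch between steps (4) and (5) can be checked mechanically. Below is a minimal sketch, assuming Linux "netstat -tlnp" output in the same column layout as shown above. The function names listener_pids and orphaned_listeners are made up for illustration and are not part of the actual init script.

```shell
#!/bin/sh
# Hypothetical helpers (not from the original init script).

# listener_pids: read `netstat -tlnp` style lines on stdin and print the
# unique PIDs that own sockets in the LISTEN state.
# Assumes column 6 is the state and column 7 is "PID/program".
listener_pids() {
    awk '$6 == "LISTEN" { split($7, owner, "/"); print owner[1] }' | sort -u
}

# orphaned_listeners EXPECTED_PID: same input, but drop the PID we expect
# (the one recorded in the PID file). Anything printed is a process that
# holds a listening socket without being tracked by the PID file.
orphaned_listeners() {
    listener_pids | grep -vxF "$1"
}
```

Something like netstat -tlnp | orphaned_listeners "$(cat "$PID_FILE")" would then print 23681 for the situation described above. Note that netstat usually needs root to show the PID column for processes owned by other users.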

Problem Statement

The application in question does not shut down properly, leaving a child process running as a daemon with TCP sockets open.  This prevents the application from being restarted without manually terminating that process or restarting the underlying machine.

Accepted answer
Sam Caldwell Atlassian Team Aug 26, 2016

Root Cause:

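The init script that starts/stops the application terminates the parent process with "kill -9 <PID>" rather than the more graceful "kill <PID>". "kill -9" sends SIGKILL, which a process cannot catch or handle; a plain "kill" sends SIGTERM, which it can.

With SIGKILL, the application does not get a chance to clean up after itself, so the child process and its open sockets are left behind.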
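
To see the difference between the two signals in isolation, here is a small sketch. It is not the real application: a throwaway shell "parent" stands in for it, starts a long-running child, and stops that child from a SIGTERM handler. The helper name child_survives is hypothetical.

```shell
#!/bin/sh
# Illustrative stand-in only. The real application is a JVM, not a shell
# script; this just models "parent cleans up its child when asked nicely".

# child_survives SIGNAL
# Prints "yes" if the child is still running after the parent receives
# SIGNAL, "no" otherwise.
child_survives() {
    sig=$1
    dir=$(mktemp -d)

    # The stand-in parent: install a TERM handler, start a child, publish
    # the child's PID, then block until the child exits or a signal arrives.
    sh -c '
        trap "kill \$child 2>/dev/null; wait \$child; exit 0" TERM
        sleep 300 &
        child=$!
        echo "$child" > "$1/child.pid"
        wait "$child"
    ' parent "$dir" >/dev/null 2>&1 &
    parent=$!

    # Wait until the parent has published its child PID.
    while [ ! -s "$dir/child.pid" ]; do sleep 1; done
    child=$(cat "$dir/child.pid")

    kill -s "$sig" "$parent"
    sleep 1    # give the handler (if it gets to run at all) time to finish

    if kill -0 "$child" 2>/dev/null; then
        echo yes
        kill "$child" 2>/dev/null    # tidy up the orphan we just created
    else
        echo no
    fi
    rm -rf "$dir"
}

echo "after SIGTERM, child survives: $(child_survives TERM)"
echo "after SIGKILL, child survives: $(child_survives KILL)"
```

With SIGTERM the handler runs and the child is gone; with SIGKILL the parent dies on the spot, the handler never runs, and the child is orphaned. That is the same pattern as process 23681 outliving 23680 above.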

Solution:

Change the init script to use 

kill $(cat $PID_FILE)

rather than 

kill -9 $(cat $PID_FILE)
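
If the init script still needs a last resort for a hung process, one option is to send SIGTERM first and only escalate after a grace period. This is a rough sketch, not the actual init script: the function name stop_app and the default 30-second timeout are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical stop routine (stop_app and the default timeout are
# illustrative, not taken from the original init script).

# stop_app PID_FILE [TIMEOUT_SECONDS]
stop_app() {
    pid_file=$1
    timeout=${2:-30}

    if [ ! -f "$pid_file" ]; then
        echo "no PID file at $pid_file; nothing to stop" >&2
        return 0
    fi
    pid=$(cat "$pid_file")

    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"    # SIGTERM: lets the application clean up after itself

        waited=0
        while kill -0 "$pid" 2>/dev/null; do
            if [ "$waited" -ge "$timeout" ]; then
                echo "PID $pid ignored SIGTERM for ${timeout}s; sending SIGKILL" >&2
                kill -9 "$pid"
                break
            fi
            sleep 1
            waited=$((waited + 1))
        done
    fi

    rm -f "$pid_file"
}
```

It would be called as stop_app "$PID_FILE" 30. The SIGKILL fallback has the same drawback described above (no cleanup, so a child may be left holding sockets), which is why it only fires after the graceful attempt has timed out and why it logs a message when it does.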
