My service start/stop script stops the application process but doesn't release sockets held by a child process. Help?!

no_longer_in_sudoers_file
Atlassian Team
August 26, 2016

Background

Ok. So the question I am going to pose tonight comes straight from the actual problems that cross my desk here within the walls of Atlassian. I figured I would share it with the community to help save someone else the pain of experiencing it.

(1) There is a start/stop script for an application. It starts the application process under a service user account, and its stop command stops that process when appropriate.

(2) Recently, however, when a stop and start did not coincide with a shutdown and startup of the underlying machine, the application could not be started again after it was stopped.

(3) Log analysis revealed the following:

SEVERE: Failed to initialize end point associated with ProtocolHandler ["http-bio-127.0.0.1-8080"]

java.net.BindException: Address already in use /127.0.0.1:8080

(4) After stopping the application, netstat showed: 

tcp6       0      0 127.0.0.1:8080          :::*          LISTEN      23681/java      
tcp6       0      0 127.0.0.1:9080          :::*          LISTEN      23681/java      
tcp6       0      0 127.0.0.1:8005          :::*          LISTEN      23681/java      
tcp6       0      0 :::8009                 :::*          LISTEN      23681/java

(5) The problem is that, before the stop, the PID_FILE for the application process had contained a different process ID:

# cat $PID_FILE
23680

(6) After the stop command ran, process ID 23680 had been stopped successfully.

(7) If process ID 23681 is killed manually, the application starts without issue.

Problem Statement

The application in question does not shut down properly: it leaves a child process running as a daemon with its TCP sockets open. This prevents the application from being restarted without manually terminating that process or restarting the underlying machine.

Answer accepted
no_longer_in_sudoers_file
Atlassian Team
August 26, 2016

Root Cause:

The init script that starts and stops the application terminates the parent process with "kill -9 <PID>" (SIGKILL) rather than the more graceful "kill <PID>" (which sends SIGTERM by default).

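Because SIGKILL cannot be caught or ignored, the application does not get a chance to clean up after itself. Its child process (PID 23681 in the netstat output above) is never stopped; it is orphaned and keeps the listening sockets open.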
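To see the mechanism in isolation, here is a small, self-contained demonstration (a sketch, not the actual init script). A stand-in parent shell starts a long-running child, the parent is killed with "kill -9", and the child keeps running. The inner "sh -c" process and "sleep 300" are hypothetical stand-ins for the application's PID-file process and its socket-holding child.

```shell
#!/bin/sh
# Demo only: SIGKILL on a parent process does not stop its children.
# "sh -c ..." is a hypothetical stand-in for the PID-file process;
# "sleep 300" stands in for the child that holds the listening sockets.
pidfile=$(mktemp)

sh -c 'sleep 300 & echo $! > "$1"; wait' sh "$pidfile" &
parent=$!

# Wait until the stand-in parent has written its child's PID.
while [ ! -s "$pidfile" ]; do sleep 1; done
child=$(cat "$pidfile")

kill -9 "$parent"                   # SIGKILL cannot be caught: no cleanup runs
wait "$parent" 2>/dev/null || true  # reap the killed parent

survived=no
if kill -0 "$child" 2>/dev/null; then
  survived=yes
  echo "child $child is still running after its parent was killed"
fi

kill "$child" 2>/dev/null           # tidy up the demo process
rm -f "$pidfile"
```

In the real incident the orphaned child was the java process (PID 23681) that still had ports 8080, 9080, 8005 and 8009 open.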

Solution:

Change the init script to use 

kill "$(cat "$PID_FILE")"

rather than 

kill -9 "$(cat "$PID_FILE")"
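Note that "kill" returns as soon as the signal is sent; it does not wait for the process to exit, so a start issued immediately afterwards could still hit the BindException while shutdown is in progress. A minimal sketch of a stop routine that sends SIGTERM, waits, and escalates to SIGKILL only as a last resort is shown here. The function name stop_app and the STOP_TIMEOUT variable are hypothetical, not taken from the original script.

```shell
#!/bin/sh
# Sketch of a graceful stop routine for an init script.
# stop_app and STOP_TIMEOUT are hypothetical names; tune the timeout to
# how long the application normally needs to shut down.
STOP_TIMEOUT=${STOP_TIMEOUT:-30}   # seconds to wait after SIGTERM

stop_app() {
  pidfile=$1
  if [ ! -s "$pidfile" ]; then
    echo "no PID file: $pidfile" >&2
    return 1
  fi
  pid=$(cat "$pidfile")

  # Plain kill sends SIGTERM, which the application can handle by
  # shutting down cleanly, including its child process and sockets.
  if ! kill "$pid" 2>/dev/null; then
    echo "could not signal PID $pid (not running, or no permission)" >&2
    return 1
  fi

  # Poll until the process exits or the timeout expires.
  waited=0
  while kill -0 "$pid" 2>/dev/null && [ "$waited" -lt "$STOP_TIMEOUT" ]; do
    sleep 1
    waited=$((waited + 1))
  done

  # Last resort only: SIGKILL skips all cleanup, so a child may be left behind.
  if kill -0 "$pid" 2>/dev/null; then
    echo "PID $pid still running after ${STOP_TIMEOUT}s; sending SIGKILL" >&2
    kill -9 "$pid" 2>/dev/null || true   # it may have exited in the meantime
  fi

  rm -f "$pidfile"
}
```

Usage in the stop branch would be something like stop_app "$PID_FILE". The SIGKILL fallback reintroduces the orphaned-child risk, which is why it logs a warning rather than running silently.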
Steven F Behnke
Rising Star
August 26, 2016

[Image: neat.png]
