Starting and stopping Java services remotely via ssh is simple; however, when you need to inspect or manipulate the active processes on a remote host, things become more difficult.
Script Base
This problem arose for me when a newly written Tomcat UI suite was failing to shut down gracefully, which meant specific processes had to be killed remotely in order to bounce the UI services. The Tomcat instances were spread across two separate LPARs:
- HOST1:7510
- HOST1:7520
- HOST2:7510
- HOST2:7520
On each LPAR, the Tomcat instances all conformed to the same folder structure, /opt/UI/tomcat-PORT/ (where PORT is the port number). As such, a comma-separated list of HOST:PORT pairs (in effect, a two-dimensional list of hosts and ports) was deemed the easiest way to control turning specific ports on and off:
```bash
_connections="HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520"
```
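If you would rather work with a real bash array than a delimited string, the same list can be split on commas with read -a. This is a minimal sketch of an alternative, not what the script below uses; the _pairs name is hypothetical.

```bash
#!/bin/bash
_connections="HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520"

# Split the comma-separated string into an indexed array (hypothetical name _pairs).
# IFS is changed only for the duration of this read command.
IFS=',' read -r -a _pairs <<< "${_connections}"

echo "Found ${#_pairs[@]} connections"
for _pair in "${_pairs[@]}"; do
  echo "${_pair}"
done
```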
Dissecting this _connections variable therefore forms the main portion of our script, which we'll call manipulateTomcatServices.sh (note that the _connections and _action variables can later become inputs):
```bash
#!/bin/bash

_connections="HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520"
_action="stop"
_failure=0

echo "Manipulate Tomcat Service for connections '${_connections}' and action '${_action}' STARTED"

# Replace commas with spaces so the loop iterates over each HOST:PORT pair
for _connection in ${_connections//,/ }
do
  # The first colon-separated part is the host, the second is the port
  _newconnection=TRUE
  for _part in ${_connection//:/ }
  do
    if [ "${_newconnection}" == "TRUE" ]
    then
      _host=${_part}
    else
      _port=${_part}
    fi
    _newconnection=FALSE
  done

  echo "Performing '${_action}' action on host '${_host}', port '${_port}'..."

  # SEE NEXT SECTION #

  # Record a failure if the remote step returned a non-zero exit status
  _response=$?
  if [ ${_response} != 0 ]; then
    _failure=1
  fi
done

_end="Manipulate Tomcat Service for connections '${_connections}' and action '${_action}'"
if [ ${_failure} = 0 ]; then
  echo "${_end} COMPLETE"
else
  echo "${_end} FAILED"
  exit 1
fi
```

So far, this script will produce the following output:

```
Manipulate Tomcat Service for connections 'HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520' and action 'stop' STARTED
Performing 'stop' action on host 'HOST1', port '7510'...
Performing 'stop' action on host 'HOST1', port '7520'...
Performing 'stop' action on host 'HOST2', port '7510'...
Performing 'stop' action on host 'HOST2', port '7520'...
Manipulate Tomcat Service for connections 'HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520' and action 'stop' COMPLETE
```
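The inner loop with the _newconnection flag works, but bash parameter expansion can also split each HOST:PORT pair directly. A minimal sketch, assuming every entry contains exactly one colon; split_connection is a hypothetical helper, not part of the original script:

```bash
#!/bin/bash
# Hypothetical helper: split "HOST:PORT" into the globals _host and _port.
split_connection() {
  local connection=$1
  _host=${connection%%:*}   # remove the longest ":*" suffix, leaving the host
  _port=${connection##*:}   # remove the longest "*:" prefix, leaving the port
}

_connections="HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520"
for _connection in ${_connections//,/ }; do
  split_connection "${_connection}"
  echo "host '${_host}', port '${_port}'"
done
```

This avoids the nested loop and the flag variable, at the cost of relying on the one-colon assumption.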
Creating the Connection and Finding the Process
Once we have our host and port, we can initiate our connection and find the ID of the process that we want to work with (where “USER” is the username on the remote host):
```bash
ssh -T USER@${_host} <<SSHUI
if [ ! -d /opt/UI/tomcat-${_port}/bin/ ]
then
  exit 1
else
  _processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')
  # SEE NEXT SECTION #
fi
SSHUI
```
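Once the ssh connection is made, everything between the two SSHUI delimiters is executed on the remote host. The key to handling variables is knowing when to escape the $ character. Because the SSHUI delimiter is unquoted, the local shell expands the here-document before sending it: any variable whose $ is left unescaped is replaced with its local value, and any unescaped command substitution is likewise run on the local host. Escaping the $ (as \$) passes it through literally, so the expansion happens on the remote host instead.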
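These rules can be tried out locally, without ssh, by feeding the same kind of here-document to cat, which simply prints the text the remote shell would receive. A minimal sketch; the function names are illustrative only:

```bash
#!/bin/bash
_port=7510

# Unquoted delimiter: ${_port} is expanded by the local shell,
# while \$ is passed through as a literal dollar sign.
render_unquoted() {
  cat <<SSHUI
echo "port is ${_port}"
_processID=\$(echo 1234)
SSHUI
}

# Quoted delimiter: the body is passed through with no local expansion at all.
render_quoted() {
  cat <<'SSHUI'
echo "port is ${_port}"
SSHUI
}

render_unquoted
render_quoted
```

Quoting the delimiter is an all-or-nothing switch, which is why the script keeps it unquoted and escapes individual $ characters instead.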
The main portion of the command is as follows:
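```bash
ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print $2}'
```

Here grep -v grep removes the grep process itself from the results, and awk prints the second column of the ps -ef output, which is the PID. This would run on an individual LPAR; however, we need to capture the result in a variable in order to work with the PID.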
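To see what each stage of the pipeline does without touching a live system, the same grep/awk chain can be run against a sample of ps -ef output. The sample lines below are invented for illustration, and find_pids is a hypothetical wrapper that reads ps-style text on standard input:

```bash
#!/bin/bash
# Hypothetical wrapper: read "ps -ef"-style lines on stdin and print the PIDs
# (column 2) of lines mentioning tomcat-PORT, excluding the grep process itself.
find_pids() {
  local port=$1
  grep "tomcat-${port}" | grep -v grep | awk '{print $2}'
}

# Invented sample of ps -ef output (UID PID PPID C STIME TTY TIME CMD)
sample='tomcat    4101     1  0 09:00 ?     00:01:02 java -Dcatalina.base=/opt/UI/tomcat-7510 org.apache.catalina.startup.Bootstrap start
tomcat    4202     1  0 09:00 ?     00:01:05 java -Dcatalina.base=/opt/UI/tomcat-7520 org.apache.catalina.startup.Bootstrap start
user      5303  5000  0 09:05 pts/0 00:00:00 grep tomcat-7510'

printf '%s\n' "${sample}" | find_pids 7510
```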
```bash
_processID=$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print $2}')
```

Again, when run on an individual LPAR, this command substitution populates the ${_processID} variable. If this code were placed within the SSHUI block as-is, however, the local shell would expand the command substitution before the connection is made, so it would only search the current (local) host, because the $ is not escaped.
```bash
_processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')
```

Here, the $ character is escaped in two places (the start of the command substitution and the awk print statement), but remains unescaped for the ${_port} variable, as that variable is set outside the SSHUI block and its local value should be substituted in.
The new _processID variable within the SSHUI block must now always be referred to with an escaped $ (as \${_processID}), as it exists only in the remote shell, not in the local script.
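A convenient way to check the escaping is a dry run: replace ssh -T USER@${_host} with cat, so the script prints exactly the text the remote shell would receive instead of running it. A minimal sketch; render_remote_script is a hypothetical name, and the echo line stands in for the actions that follow:

```bash
#!/bin/bash
# Hypothetical dry-run helper: print the remote script for one port, with
# local variables expanded and escaped ones left for the remote shell.
render_remote_script() {
  local _port=$1
  cat <<SSHUI
if [ ! -d /opt/UI/tomcat-${_port}/bin/ ]
then
  exit 1
else
  _processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')
  echo "PID(s): \${_processID}"
fi
SSHUI
}

render_remote_script 7510
```

Piping the output into bash -n additionally checks its syntax without running any of it.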
Actions
Now that the framework is in place, we just need to complete the actions. Once development of the script had started, the scope expanded beyond killing remote processes to include starting the remote UIs and checking whether the processes are running. The final piece of code, which replaces the # SEE NEXT SECTION # placeholder inside the SSHUI block, is as follows:
```bash
case ${_action} in
  start)
    # Only start Tomcat if no matching process is running
    if [ -z "\${_processID}" ]
    then
      /opt/UI/tomcat-${_port}/bin/startup.sh
    fi
    ;;
  stop)
    if [ -n "\${_processID}" ]
    then
      # Try a graceful shutdown first
      /opt/UI/tomcat-${_port}/bin/shutdown.sh
      sleep 10
      _processCounter=0
      _processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')
      # If the process survived, kill it, retrying up to 3 times
      while [ \${_processCounter} -lt 3 ] && [ -n "\${_processID}" ]
      do
        ((_processCounter++))
        kill \${_processID}   # unquoted so several PIDs become separate arguments
        sleep 10
        _processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')
      done
      # Still running after all retries: report failure
      if [ -n "\${_processID}" ]
      then
        exit 1
      fi
    fi
    ;;
  checkProcess)
    # Fail if no matching process is running
    if [ -z "\${_processID}" ]
    then
      exit 1
    fi
    ;;
  *)
    # Unknown action
    exit 2
    ;;
esac
```
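The retry loop in the stop branch is the part most worth testing on its own. The sketch below pulls it out into a hypothetical local function, kill_with_retries, which takes a "finder" command that prints the PIDs still running (in the script, that role is played by the ps/grep/awk pipeline). The attempt count and delay are parameters purely so the logic can be exercised quickly; treat it as an illustration under those assumptions, not a drop-in replacement for the remote block.

```bash
#!/bin/bash
# Hypothetical helper: send SIGTERM (kill's default signal) to whatever PIDs the
# finder command prints, up to max_attempts times, waiting wait_secs after each.
# Returns 0 once the finder prints nothing, 1 if PIDs remain after all attempts.
kill_with_retries() {
  local finder=$1 max_attempts=${2:-3} wait_secs=${3:-10}
  local attempt=0 pids
  pids=$("${finder}")
  while [ "${attempt}" -lt "${max_attempts}" ] && [ -n "${pids}" ]; do
    attempt=$((attempt + 1))
    # Unquoted on purpose so several PIDs become separate arguments;
    # "|| true" because a process may exit between the check and the kill
    kill ${pids} 2>/dev/null || true
    sleep "${wait_secs}"
    pids=$("${finder}")
  done
  [ -z "${pids}" ]
}

# Local demo: start a throwaway background process and let the helper stop it.
sleep 300 &
demo_pid=$!
demo_finder() {
  if kill -0 "${demo_pid}" 2>/dev/null; then
    echo "${demo_pid}"
  fi
}

if kill_with_retries demo_finder 3 1; then
  echo "stopped"
else
  echo "still running"
fi
```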
Note that any exit command within this block will not exit the script itself; it ends the remote shell session inside the SSHUI block, and ssh returns that exit status to the local script (ssh itself exits with status 255 if the connection fails). The base script captures this in _response=$? and records a failure. Errors are trapped this way so that if the first HOST:PORT combination fails but the second succeeds, the script still correctly reports a failure back to the shell once it has finished.
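Because ssh hands back the remote exit status, this failure bookkeeping can be rehearsed locally by substituting bash -s (which reads a script from standard input, much as the remote shell does) for the ssh call. A minimal sketch; run_remote and the simulated failure on port 7520 are hypothetical:

```bash
#!/bin/bash
# Hypothetical stand-in for "ssh -T USER@${_host}": run the here-document in a
# local bash instead. Port 7520 is made to fail to simulate a remote error.
run_remote() {
  local _port=$1
  bash -s <<SSHUI
if [ "${_port}" = "7520" ]
then
  exit 1
fi
exit 0
SSHUI
}

_failure=0
for _port in 7510 7520 7530; do
  run_remote "${_port}"
  _response=$?
  echo "port ${_port}: exit status ${_response}"
  if [ ${_response} != 0 ]; then
    _failure=1
  fi
done
echo "overall failure flag: ${_failure}"
```

Even though the last port succeeds, the failure flag stays set, which is exactly the behaviour described above.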
This can of course be expanded. For example, a further bug encountered in a production environment was that a Tomcat service was occasionally left active but disconnected, so performing a “start” action on it would do nothing; in this instance, we wanted the start action to exit 1 if a \${_processID} was found.
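One way to express that rule is to make the start branch fail whenever a PID is found, rather than silently skipping startup.sh. The sketch below models the branch as a hypothetical local function, start_branch, so it can be exercised without a remote host; inside the SSHUI block the same test would be written against \${_processID}, and the startup command argument stands in for /opt/UI/tomcat-PORT/bin/startup.sh.

```bash
#!/bin/bash
# Hypothetical local model of the modified "start" branch.
#   $1: PID(s) found for the port (empty if none)
#   $2: command that starts Tomcat (startup.sh in the real script)
start_branch() {
  local process_id=$1 startup_cmd=$2
  if [ -n "${process_id}" ]; then
    # A process already exists (possibly active but disconnected): fail loudly
    echo "process ${process_id} already running; not starting" >&2
    return 1
  fi
  "${startup_cmd}"
}

start_branch "" true && echo "started"
start_branch "4101" true || echo "refused: already running"
```

In the remote case block, return 1 becomes exit 1, so the failure flows back through ssh into _response like any other error.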
And that completes the script. It can of course take inputs instead of hard-coding the connections and action; ultimately, the best way to call it would be:

```bash
./manipulateTomcatServices.sh "HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520" "stop"
```

But that's easy enough to add.
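A minimal sketch of that argument handling, assuming the two positional parameters shown above and treating start, stop and checkProcess as the only valid actions; parse_args is a hypothetical helper:

```bash
#!/bin/bash
# Hypothetical helper: set _connections and _action from the command line,
# returning 2 with a usage message if they are missing or invalid.
parse_args() {
  if [ $# -ne 2 ]; then
    echo "Usage: $0 HOST:PORT[,HOST:PORT...] start|stop|checkProcess" >&2
    return 2
  fi
  _connections=$1
  _action=$2
  case ${_action} in
    start|stop|checkProcess) ;;
    *)
      echo "Unknown action '${_action}'" >&2
      return 2
      ;;
  esac
}

# In manipulateTomcatServices.sh this would replace the hard-coded values:
# parse_args "$@" || exit 2
```

Validating the action locally means a typo is caught once, before any ssh connection is opened, rather than producing exit 2 from every remote host.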