Remote Manipulation of Processes in a Bash Script

Starting and stopping Java services remotely via ssh is simple; however, when you need to inspect or manipulate active processes remotely, things become more difficult.

Script Base

This problem arose for me when a newly written Tomcat UI suite was failing to shut down gracefully, which meant specific processes had to be killed remotely in order to bounce the UI services. The Tomcat instances ran on the following host and port combinations across two separate LPARs:

  • HOST1:7510
  • HOST1:7520
  • HOST2:7510
  • HOST2:7520

On an individual LPAR basis, the Tomcat instances all conformed to the same folder structure, /opt/UI/tomcat-PORT/ (where PORT is the port number). As such, a comma-separated list of HOST:PORT pairs (in effect a two-dimensional array of hosts and ports) was deemed the easiest way to control turning specific ports on and off:

_connections="HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520"

Dissecting this _connections variable forms the main portion of our script, which we'll call manipulateTomcatServices.sh (note that the _connections and _action variables can later become inputs):

#!/bin/bash

_connections="HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520"
_action="stop"
_failure=0

echo "Manipulate Tomcat Service for connections '${_connections}' and action'${_action}' STARTED"

for _connection in ${_connections//,/ }
do
  _newconnection=TRUE
  for _part in ${_connection//:/ }
  do
    if [ "${_newconnection}" = "TRUE" ]
    then
      _host=${_part}
    else
      _port=${_part}
    fi
    _newconnection=FALSE
  done
  echo "Performing '${_action}' action on host '${_host}', port '${_port}'..."

# SEE NEXT SECTION #

  _response=$?
  if [ ${_response} != 0 ];then
    _failure=1
  fi
done

_end="Manipulate Tomcat Service for connections '${_connections}' and action'${_action}'"

if [ ${_failure} = 0 ];then
  echo "${_end} COMPLETE"
else
  echo "${_end} FAILED"
  exit 1
fi

So far, this script will produce the following output:

Manipulate Tomcat Service for connections 'HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520' and action'stop' STARTED
Performing 'stop' action on host 'HOST1', port '7510'...
Performing 'stop' action on host 'HOST1', port '7520'...
Performing 'stop' action on host 'HOST2', port '7510'...
Performing 'stop' action on host 'HOST2', port '7520'...
Manipulate Tomcat Service for connections 'HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520' and action'stop' COMPLETE
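
The host and port split performed by the inner loop can also be expressed with bash parameter expansion. The following is only a minimal sketch of that alternative, not the method used in the script above; the shortened _connections value is purely illustrative:

```bash
#!/bin/bash
# Illustrative value only; the real script uses the full list of connections.
_connections="HOST1:7510,HOST2:7520"

for _connection in ${_connections//,/ }
do
  _host=${_connection%%:*}   # everything before the first colon
  _port=${_connection##*:}   # everything after the last colon
  echo "host '${_host}', port '${_port}'"
done
```

This removes the need for the _newconnection flag, at the cost of being slightly less obvious to readers unfamiliar with the %% and ## operators.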

Creating the Connection and Finding the Process

Once we have our host and port, we can open the connection and find the ID of the process we want to work with (where “USER” is the username on the remote host):

    ssh -T USER@${_host} <<SSHUI
      if [ ! -d /opt/UI/tomcat-${_port}/bin/ ]
      then
        exit 1
      else
        _processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')

# SEE NEXT SECTION #

      fi
SSHUI

Once the ssh connection is made, everything between the two SSHUI delimiters is executed on the remote host. The key to variable manipulation is knowing when to escape the $ character. Because the SSHUI delimiter is unquoted, the local shell expands any unescaped $ (variables and command substitutions alike) before the text is sent, so those values come from outside the connection. An escaped \$ is passed through literally and is evaluated on the remote host.

The main portion of the command is as follows:

ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print $2}'

This runs as-is on an individual LPAR; however, we need to capture its output in a variable for PID manipulation.

_processID=$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print $2}')

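Again, when run directly on an individual LPAR, this command substitution populates the ${_processID} variable. If this code were placed within the SSHUI block as it stands, however, it would only look at the local host: because the $ is not escaped, the local shell would run the command substitution itself before anything is sent over the connection.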
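
To see what the pipeline extracts, it can be run against a captured process listing rather than live ps output. In the sketch below, the _sample text is invented for illustration (the user name, PIDs and command lines are assumptions, not real output from the hosts above):

```bash
#!/bin/bash
_port=7510

# Hypothetical "ps -ef" output: two Tomcat instances plus the grep itself.
_sample='uiuser  4242     1  0 09:00 ?     00:01:12 java -Dcatalina.base=/opt/UI/tomcat-7510 start
uiuser  4300     1  0 09:00 ?     00:01:10 java -Dcatalina.base=/opt/UI/tomcat-7520 start
uiuser  5150  5000  0 09:05 pts/0 00:00:00 grep tomcat-7510'

# Same filters as above: match the instance, drop the grep line, keep column 2 (the PID).
_processID=$(echo "${_sample}" | grep tomcat-${_port} | grep -v grep | awk '{print $2}')
echo "${_processID}"
```

With this sample, only the PID of the tomcat-7510 instance is left in ${_processID}.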

 

_processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')

Here, the $ reserved character is escaped in two places (the start of the command substitution and the print command), but remains unescaped when referring to the ${_port} variable, as this variable is referenced from outside the SSHUI block.

The new _processID variable within the SSHUI block must now always be referred to with an escaped $ (as \${_processID}), as it doesn’t exist outside the block.
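
The difference between an escaped and an unescaped $ can be tried out without a remote host. The sketch below uses "bash -s" as a local stand-in for the ssh connection (an assumption made only so the example runs anywhere); the _where variable and the DEMO delimiter are hypothetical names:

```bash
#!/bin/bash
# "bash -s" reads the here-document from standard input,
# much as the remote shell does under "ssh -T".
_port=7510
_where="outside"

_output=$(bash -s <<DEMO
_where="inside"
echo "unescaped: ${_where} ${_port}"
echo "escaped: \${_where}"
DEMO
)
echo "${_output}"
```

The first line printed uses the values set outside the block, while the second uses the value assigned inside it, because only the escaped reference survives to be evaluated by the inner shell.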

Actions

Now that we have our full framework in place, we just need to complete the actions. After development of the script started, the scope was expanded beyond killing remote processes to include starting the remote UIs and checking whether the processes are running, so the final piece of code is as follows:

        case ${_action} in
          start)
            # Only start Tomcat if no matching process is already running
            if [ -z "\${_processID}" ]
            then
              /opt/UI/tomcat-${_port}/bin/startup.sh
            fi
            ;;
          stop)
            if [ -n "\${_processID}" ]
            then
              # Try a graceful shutdown first, then look for the process again
              /opt/UI/tomcat-${_port}/bin/shutdown.sh
              sleep 10
              _processCounter=0
              _processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')
              # If it is still running, kill it, retrying up to three times
              while [ \${_processCounter} -lt 3 ] && [ -n "\${_processID}" ]
              do
                ((_processCounter++))
                kill \${_processID}
                sleep 10
                _processID=\$(ps -ef | grep tomcat-${_port} | grep -v grep | awk '{print \$2}')
              done
              # Report a failure if the process survived every attempt
              if [ -n "\${_processID}" ]
              then
                exit 1
              fi
            fi
            ;;
          checkProcess)
            # Fail if no matching process is running
            if [ -z "\${_processID}" ]
            then
              exit 1
            fi
            ;;
          *)
            # Unknown action
            exit 2
            ;;
        esac

Note that any exit command within this block will not exit the script itself; it ends the remote session, and ssh passes that exit code back to the calling script. The base code captures it in ${_response} and ultimately records a failure. The code is trapped in this manner so that if the first HOST:PORT combination fails but the second HOST:PORT combination succeeds, the script still correctly reports a failure back to the shell once completed.
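
This behaviour can be checked locally. The sketch below again substitutes "bash -s" for the ssh connection (an assumption for the sake of a runnable example) and feeds it two blocks, the first exiting 1 and the second exiting 0; the BLOCK delimiter and _code variable are hypothetical names:

```bash
#!/bin/bash
_failure=0

for _code in 1 0
do
  # The exit below ends only the inner shell, not this script.
  bash -s <<BLOCK
exit ${_code}
BLOCK
  _response=$?
  echo "Block exited with '${_response}'"
  if [ ${_response} -ne 0 ];then
    _failure=1
  fi
done

echo "Failure flag: ${_failure}"
```

Both blocks run, and the failure flag stays set even though the last block succeeded.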

This can obviously be expanded. For example, a further bug encountered in a production environment was that a Tomcat service was occasionally left active but disconnected, meaning that performing a “start” action on it would do nothing; in this instance, we wanted the start action to exit 1 if a \${_processID} was found.

And that completes the script. Obviously it can be given inputs instead of hard-coding the connections/action; ultimately the best way to call this would be:

./manipulateTomcatServices.sh "HOST1:7510,HOST1:7520,HOST2:7510,HOST2:7520" "stop"

But that’s easy enough to write in.
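
One way those inputs might be wired in is sketched below. The parseArguments function and its usage message are hypothetical additions rather than part of the original script; the action itself is still validated by the case statement inside the SSHUI block.

```bash
#!/bin/bash
# Hypothetical helper: copies the two positional arguments into the
# variables the rest of the script already uses.
parseArguments() {
  if [ $# -ne 2 ]
  then
    echo "Usage: manipulateTomcatServices.sh HOST:PORT[,HOST:PORT...] start|stop|checkProcess" >&2
    return 2
  fi
  _connections=$1
  _action=$2
}

# In the real script this call would be: parseArguments "$@" || exit 2
parseArguments "HOST1:7510,HOST2:7520" "checkProcess"
echo "connections '${_connections}', action '${_action}'"
```

The two hard-coded assignments at the top of the script would then be replaced by the call to parseArguments.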
