Re: Monit believes process failed when it didn't

Eric Montellese
Got an odd one for ya...

I have a (legacy) shell script that I need to call from monit.  This shell script runs an infinite loop.  The platform is a busybox-based openwrt platform (so, the script is running 'ash').  

On this platform, it appears that the timing of background processes is not quite as expected.  I'd like to understand the expected methodology.  The method from https://mmonit.com/wiki/Monit/FAQ#pidfile would seem to be foolproof to avoid the issue I'm seeing (below).  However, this method fails outright (see below).

I'm currently running Monit version 5.26.0

The monit config is pretty simple:

check process myprocess with pidfile /tmp/myprocess.pid
     start program = "/etc/monit.rc/myprocess.init start"
     stop program  = "/etc/monit.rc/myprocess.init stop"
     depends on other_process

myprocess.init is also quite simple (just showing the 'start' method).  Here are three different things I've tried:

1.  Following the example in the monit docs:
start() {
    echo $$ > /tmp/myprocess.pid
    exec /usr/bin/myprocess.sh
}

In this case, monit says that the "process never returned" and tries to restart it.  Of course the process didn't return, so why is this the documented method?  Is this a difference in versions of monit (vs the documentation I'm using)?

2. Jam that sucker into the background
start() {
    /usr/bin/myprocess.sh &
    echo $! > /tmp/myprocess.pid
}

Surprisingly, this also does not work.  In this case, the pid file is created as expected, but monit does *not* think that the process is running.

3. Try something silly?
start() {
    /usr/bin/myprocess.sh &
    echo $! > /tmp/myprocess.pid
    sleep 1
}

Adding a 'sleep' fixes the issue... but why?

For debug, instead of the 'sleep' I've also tried putting 
'ps | grep myprocess > /tmp/output'

In this case, I *do* see the process listed in the /tmp/output file -- but in this case, monit also returns happily. (So it's a heisenbug)

Questions:
1.  What is the "normal" way to do this?
2.  Anyone seen this sort of behavior on an embedded system?

Best Regards,
Eric


fusillator
Hi,

Because the process ID is handled by the background process, a timeout gives it time to finish and create the pid. Usually the init script or a wrapper process waits for initialization to end before daemonizing itself. Some scripts use a loop with sleep and a bounded limit, analogous to your idea (see the JBoss startup script). Others use async messages (see the systemd service type description for some ideas).
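To make the "loop with sleep and a bounded limit" idea concrete, here is a minimal sketch in plain POSIX sh (so it should also run under BusyBox ash). The function name wait_for_pidfile and its arguments are made up for illustration. It is not taken from the JBoss scripts or from Monit, and I have not tested it on OpenWrt.

```shell
#!/bin/sh
# Hypothetical helper: wait until a pid file exists and names a live
# process, giving up after a bounded number of one-second tries.
#   $1 = path to the pid file
#   $2 = maximum number of tries (defaults to 10)
wait_for_pidfile() {
    wfp_file=$1
    wfp_max=${2:-10}
    wfp_tries=0
    while [ "$wfp_tries" -lt "$wfp_max" ]; do
        # -s: the file exists and is not empty.
        # kill -0 sends no signal; it only checks that the PID exists.
        if [ -s "$wfp_file" ] && kill -0 "$(cat "$wfp_file")" 2>/dev/null; then
            return 0
        fi
        wfp_tries=$((wfp_tries + 1))
        sleep 1
    done
    return 1
}
```

A start function would call something like `wait_for_pidfile /tmp/myprocess.pid 5` as its last step, so it only returns to Monit once the pid is visible or the limit has been reached.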

On Thu, Dec 3, 2020, 10:47 PM Eric Montellese <[hidden email]> wrote:
> [...]


Lutz Mader
In reply to this post by Eric Montellese
Hello Eric,
you cannot use scripts that run in an infinite loop.
The script must spawn, or your wrapper script should do this.
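As a rough illustration of a wrapper that does the spawning, here is a sketch in plain sh. The variable names SCRIPT and PIDFILE are placeholders (their defaults are the paths from Eric's post), and the stdio redirection is my assumption about what "detached" should mean here. Whether this avoids the timing issue on a BusyBox/OpenWrt system is untested.

```shell
#!/bin/sh
# Hypothetical wrapper settings; adjust to the real script and pid file.
SCRIPT="${SCRIPT:-/usr/bin/myprocess.sh}"
PIDFILE="${PIDFILE:-/tmp/myprocess.pid}"

start() {
    # Redirect stdin/stdout/stderr so the child keeps no descriptors
    # inherited from the caller, then run it in the background.
    # nohup execs the command, so $! is the PID of the script itself.
    nohup "$SCRIPT" </dev/null >/dev/null 2>&1 &
    echo $! > "$PIDFILE"
}
```

The wrapper returns right away while the looping script keeps running, which is the split between "start program" and "monitored process" that Monit expects.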

> I have a (legacy) shell script that I need to call from monit.  This shell
> script runs an infinite loop.

A good source is the scripts used with systemd or init.d.

Unfortunately, there is no general way to create a useful script, but
the script should not quit immediately. To start Jetty I use something
like the following, without any sleep.

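# DIR, PRG, ONLINE, LOGFILE and PIDFILE are assumed to be set earlier in the script.
case "$1" in
    'start')
        cd "${DIR}/jetty"
        $PRG status
        # Start Jetty only if the status check does not report $ONLINE.
        if [ $? -ne $ONLINE ]; then
#          export TZ=Europe/Berlin
          export TZ=MEZ-1MESZ,M3.5.0,M10.5.0
          export PATH="${DIR}/Java/jre8/bin:$PATH"
          export JAVA_HOME="${DIR}/Java/jre8"
#          export JETTY_BASE=${DIR}/jetty
          export JETTY_HOME="${DIR}/jetty"

          # Run in the background and record the PID for Monit.
          nohup java -jar "$JETTY_HOME/start.jar" >> "$LOGFILE" 2>&1 &
          echo $! > "$PIDFILE"
        fi
        RC=0
        ;;
esac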

But for some scripts that spawn themselves I use a sleep too, and for
others I do not. The answer to your problem is: it depends.

Sorry,
with regards,
Lutz