Monit shows "statistic error"

Ani A
Hello,

I am running Monit 5.17.1 on Ubuntu 14.04. On rare occasions I see the
following error in the log:

2020-11-17 18:47:22.347 monit[2954]: system statistic error -- cannot read /proc/3560/stat

Monit then concludes that my daemon is not running and restarts it!
This restart triggers other, unwanted actions, and I want to avoid it.
Can anyone explain why this can occur and how it can be prevented?

Thanks.
--
Ani

Re: Monit shows "statistic error"

Ani A
Sorry, small correction:

Monit version 5.25.1 on Ubuntu 18.04.4.

--
Ani

Re: Monit shows "statistic error"

Lutz Mader
Hello Ani,
are you able to check this behaviour with Monit 5.27.1?
The way Monit gathers process information was changed in Monit 5.26.0.

> Monit version 5.25.1 on Ubuntu 18.04.4.
>> I am running Monit 5.17.1 on Ubuntu 14.04, in some rare occasions
>> I see that following error in the log:
>>
>> 2020-11-17 18:47:22.347 monit[2954]: system statistic error -- cannot read /proc/3560/stat

With regards,
Lutz

Re: Monit shows "statistic error"

martinp@tildeslash.com
In reply to this post by Ani A
Hello Ani,

This can happen if the process exits while Monit is collecting its data. There is no need to worry about it.

Best regards,
Martin
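
Martin's explanation can be illustrated with a short Python sketch. It reads /proc/<pid>/stat roughly the way a process monitor would, but treats a vanished process as "no data" instead of an error. The helper names are hypothetical and this is not Monit's code; the live read only returns data on Linux. Once a child process has exited and been reaped, its /proc entry is gone, which is the kind of failure that can produce "cannot read /proc/<pid>/stat".

```python
import os
import subprocess
import sys
from typing import Optional


def parse_stat_line(line: str) -> dict:
    """Parse the first fields of a /proc/<pid>/stat line.

    The command name (field 2) is wrapped in parentheses and may itself
    contain spaces or ')', so split on the LAST ')'.
    """
    head, _, rest = line.rpartition(")")
    pid_str, _, comm = head.partition(" (")
    fields = rest.split()
    return {"pid": int(pid_str), "comm": comm,
            "state": fields[0], "ppid": int(fields[1])}


def read_proc_stat(pid: int) -> Optional[dict]:
    """Hypothetical helper: parsed stat data, or None if the process is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            line = f.read()
    except (FileNotFoundError, ProcessLookupError):
        # The process exited between being listed and being read (or /proc
        # does not exist on this OS): report "no data", not a hard error.
        return None
    return parse_stat_line(line)


if __name__ == "__main__":
    # Start a short-lived child; once it has exited and been reaped,
    # its /proc/<pid> directory no longer exists.
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()
    print(read_proc_stat(child.pid))    # None (barring immediate PID reuse)
    print(read_proc_stat(os.getpid()))  # dict on Linux, None elsewhere
```

The point of the sketch is that a single failed read is a normal outcome of a race, not proof that the daemon is down.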

Re: Monit shows "statistic error"

Lutz Mader
In reply to this post by Ani A
Hello Ani,
I checked some of my logs and found a similar problem whenever the
workload is very high (on an AIX system).

[MESZ May  8 05:29:14] error    : 'D100SPUABC00' mem usage of 95.5% matches resource limit [mem usage > 95.0%]
[MESZ May  8 05:31:14] error    : 'Manager' failed to get process data

>> I am running Monit 5.17.1 on Ubuntu 14.04, in some rare occasions
>> I see that following error in the log:
>>
>> 2020-11-17 18:47:22.347 monit[2954]: system statistic error -- cannot read /proc/3560/stat

If this is a workload problem, you can configure Monit to delay the
restart. With an additional "not exist" rule

  if not exist for 5 cycles then start

in the "check process" service, Monit will start/restart the service
only after 5 consecutive failed checks. If Monit fails to get the process
data just once, nothing will happen (a sample is appended below).
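
The "for 5 cycles" behaviour amounts to a debounce counter. The Python sketch below is a simplified model, not Monit's implementation, and the class name is hypothetical. It assumes the start action fires on the fifth consecutive failed check and that the counter then resets; any successful check clears the streak.

```python
from dataclasses import dataclass


@dataclass
class NotExistForCycles:
    """Simplified model of 'if not exist for N cycles then start'.

    Assumption: the action fires on the N-th consecutive failed check and
    the counter then resets; Monit's real event handling is more involved.
    """
    cycles: int = 5
    misses: int = 0

    def check(self, process_found: bool) -> bool:
        """Record one monitoring cycle; return True if 'start' should run."""
        if process_found:
            self.misses = 0  # any successful check clears the streak
            return False
        self.misses += 1
        if self.misses >= self.cycles:
            self.misses = 0  # start attempted; begin counting afresh
            return True
        return False


if __name__ == "__main__":
    rule = NotExistForCycles(cycles=5)
    # One transient "cannot read /proc/<pid>/stat" among successful checks:
    print([rule.check(ok) for ok in [True, False, True, True]])
    # → [False, False, False, False]
    # The process is really gone: only the fifth consecutive miss triggers a start.
    print([rule.check(False) for _ in range(5)])
    # → [False, False, False, False, True]
```

A single transient miss, like the one Martin describes, never reaches the threshold, so no start is triggered.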

A suggestion only,
Lutz

Appendix:
A sample of one of the service definitions I use:

check process Serv_server1 with pidfile "/usr/local/var/wlp/servers/.pid/server1.pid"
  start program "/usr/local/etc/monit/scripts/wlpserv.sh start" with timeout 180 seconds
  stop program "/usr/local/etc/monit/scripts/wlpserv.sh stop" with timeout 120 seconds
  restart program "/usr/local/etc/monit/scripts/wlpserv.sh restart" with timeout 300 seconds
#  if failed host hostname.local port 8901 then alert
#  if failed host hostname.local port 9901 then alert
  if not exist for 5 cycles then start
  if 5 restarts within 50 cycles then unmonitor

The "not exist" rule delays the start until five checks in a row have
failed, and the "restarts" rule prevents endless recovery attempts.
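
The "5 restarts within 50 cycles" limit is a sliding-window count. Below is a simplified Python model, assuming each monitoring cycle reports whether the service was (re)started in that cycle. The class and method names are hypothetical and this is not Monit's implementation.

```python
from collections import deque


class RestartLimit:
    """Simplified model of 'if N restarts within M cycles then unmonitor'."""

    def __init__(self, restarts: int = 5, within_cycles: int = 50) -> None:
        self.restarts = restarts
        self.within = within_cycles
        self.cycle = 0
        self.history: deque = deque()  # cycle numbers at which (re)starts happened

    def tick(self, restarted: bool) -> bool:
        """Advance one cycle; return True when the service should be unmonitored."""
        self.cycle += 1
        if restarted:
            self.history.append(self.cycle)
        # Keep only restarts from the last `within` cycles (current one included).
        while self.history and self.history[0] <= self.cycle - self.within:
            self.history.popleft()
        return len(self.history) >= self.restarts


if __name__ == "__main__":
    limit = RestartLimit(restarts=5, within_cycles=50)
    restart_cycles = {1, 11, 21, 31, 41}  # hypothetical: a restart every 10 cycles
    for c in range(1, 60):
        if limit.tick(c in restart_cycles):
            print(f"unmonitor at cycle {c}")  # → unmonitor at cycle 41
            break
```

Restarts that are spread far enough apart (for example one every 20 cycles) never accumulate to five inside the window, so only a service that keeps failing in quick succession gets unmonitored.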