Good default values for disk monitoring

Teresa e Junior
Hello! I'm in the process of discovering Monit, and I am happy with the
results so far!

I've set up a couple of rules already, but disk I/O is something I find
very difficult to understand. Based on the examples I found in the wiki,
I have the following rules:

check filesystem root with path /
   if read  rate > 1 MB/s for 30 cycles then alert
   if write rate > 1 MB/s for 30 cycles then alert
   if service time > 10 milliseconds for 3 times within 5 cycles then alert

I would like to know what good default values would be. For reference,
my personal server is a VMware VPS with an Intel Xeon E5-2650L v4 CPU,
1 GB of RAM and 20 GB of allocated SSD storage.

Thank you for your attention!

--
To unsubscribe:
https://lists.nongnu.org/mailman/listinfo/monit-general

Re: Good default values for disk monitoring

Paul Theodoropoulos
It primarily depends upon your workload. What sorts of applications are
you running on the server? Websites? Database? Email?  Is usage
generally steady, or bursty?

You'll need to gather some metrics in order to find the best alerting
point. For disk I/O, I'd recommend leaving iostat running long enough
that you're confident it has seen the machine under peak, or at least
higher-than-usual, load. For example:

iostat -hmy 60

prints the read/write throughput in MB/s for each 60-second interval,
repeatedly (-y skips the first report, which covers the time since boot).
Watch those values for a while (or go have a coffee and examine the
scrollback when you come back :) and note the highest values you see.

To get the 'service time' data, you'd run iostat in 'extended
statistics' mode:

iostat -hmxy 60
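A minimal sketch for reading the peaks back out of a saved log instead of the scrollback, assuming the output was redirected to a file and iostat was run without -h (newer sysstat versions move the device name to the last column when -h is used). The col_max function is a hypothetical helper; it locates the column by its header name, so the same code should cope with both the basic and the extended report, and with sysstat versions that order the columns differently.

```shell
# col_max DEVICE COLUMN  (hypothetical helper)
# Reads saved iostat output on stdin and prints the largest value of the
# column whose header is COLUMN (e.g. "MB_wrtn/s", "rMB/s", "wMB/s",
# "await") for DEVICE. Exits non-zero if nothing matched.
col_max() {
    awk -v dev="$1" -v col="$2" '
        # Header lines start with "Device" ("Device:" in older sysstat);
        # remember which field holds the requested column.
        $1 ~ /^Device:?$/ {
            idx = 0
            for (i = 1; i <= NF; i++)
                if ($i == col) idx = i
            next
        }
        # Data lines for the requested device (device name is field 1).
        idx && $1 == dev {
            v = $idx
            gsub(",", ".", v)    # accept decimal commas, e.g. 0,68
            v += 0
            if (!seen || v > peak) { peak = v; seen = 1 }
        }
        END {
            if (!seen) exit 1
            printf "%.2f\n", peak
        }'
}
```

Typical use would be something like `iostat -dmxy 60 1440 > iostat.log` followed later by `col_max dm-0 wMB/s < iostat.log`; looking columns up by name avoids hard-coding field numbers such as $6 or $13.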

What you might then do is set the monit tests to 1.5x the highest value
you saw, as a starting point. Obviously you don't want to be alerted all
the time for otherwise-normal activity, so you can tune it higher/lower
from there.
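To turn an observed peak into a first-draft rule following the 1.5x rule of thumb, something like the hypothetical suggest_rule helper below could be used. The "for 3 times within 5 cycles" clause is only a placeholder borrowed from the original service-time rule, and not every Monit version may accept a fractional limit, so it is worth checking the generated line with `monit -t` before relying on it.

```shell
# suggest_rule TEST PEAK UNIT  (hypothetical helper)
# Prints a starting-point Monit rule whose limit is 1.5x the observed PEAK.
# TEST is the Monit test ("read rate", "write rate", "service time"),
# UNIT its unit ("MB/s", "milliseconds"). The cycle clause is a placeholder.
suggest_rule() {
    awk -v name="$1" -v peak="$2" -v unit="$3" 'BEGIN {
        gsub(",", ".", peak)    # accept decimal commas, e.g. 0,68
        printf "if %s > %.2f %s for 3 times within 5 cycles then alert\n",
               name, peak * 1.5, unit
    }'
}
```

For example, `suggest_rule "read rate" 2,12 MB/s` prints `if read rate > 3.18 MB/s for 3 times within 5 cycles then alert`, which can then be tuned higher or lower.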

On 6/1/2018 22:42 PM, Teresa e Junior wrote:

> Hello! I'm in the process of discovering Monit, and I am happy with
> the results so far!
>
> I've set up a couple of rules already, but disk I/O is something I
> find very difficult to understand. Based on the examples I found in
> the wiki, I have the following rules:
>
> check filesystem root with path /
>   if read  rate > 1 MB/s for 30 cycles then alert
>   if write rate > 1 MB/s for 30 cycles then alert
>   if service time > 10 milliseconds for 3 times within 5 cycles then alert
>
> I would like to know, please, what could be a recommended default
> value! For instance, I have a personal server in a VMware VPS, with an
> Intel Xeon E5-2650L v4 CPU, 1 GB RAM, and 20 GB of allocated storage
> on SSD.
>
> Thank you for your attention!
>

--
Paul Theodoropoulos
www.anastrophe.com

Re: Good default values for disk monitoring

Teresa e Junior
On 03/06/2018 21:10, Paul Theodoropoulos wrote:
> You'll need to generate some metrics in order to find the best alerting
> point.  For disk IO, I'd recommend using iostat, and leave it running
> for an interval during which you'd feel comfortable that the machine has
> been under what might be peak or high(er) load. e.g.
>
> iostat -hmxy 60
>
> What you might then do is set the monit tests to 1.5x the highest value
> you saw, as a starting point. Obviously you don't want to be alerted all
> the time for otherwise-normal activity, so you can tune it higher/lower
> from there.

First of all, thank you for your help! I left iostat running for 24
hours at 60-second intervals, and the results are below. My server is
mostly idle, with occasional bursts of usage when I use it myself.

So I guess, based on the results below, the following would be more than
enough:
   if read rate > 0.1 MB/s
     for 20 times within 30 cycles then alert
   if write rate > 0.1 MB/s
     for 20 times within 30 cycles then alert
   if service time > 0.1 milliseconds
     for 20 times within 30 cycles then alert
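Before committing to rules like these, the proposed limits can be replayed against the logged samples. A minimal sketch, assuming that Monit's "X times within Y cycles" means "at least X of the last Y checks exceeded the limit", and that one iostat sample corresponds to one Monit cycle, which only holds if Monit's `set daemon` interval is also 60 seconds. would_alert is a hypothetical helper that reads one value per line.

```shell
# would_alert LIMIT TIMES CYCLES  (hypothetical helper)
# Reads one sample per line on stdin and prints "alert" if, at any point,
# at least TIMES of the last CYCLES samples were above LIMIT, else "quiet".
would_alert() {
    awk -v limit="$1" -v times="$2" -v cycles="$3" '
        BEGIN { gsub(",", ".", limit); limit += 0; times += 0; cycles += 0 }
        {
            v = $1
            gsub(",", ".", v)          # accept decimal commas, e.g. 0,10
            # Ring buffer holding 1/0 for each of the last CYCLES samples.
            hit[NR % cycles] = (v + 0 > limit)
            count = 0
            for (k in hit) count += hit[k]
            if (count >= times) { fired = 1; exit }
        }
        END { print (fired ? "alert" : "quiet") }'
}
```

For instance, the write-rate rule could be checked with `iostat -mx 60 1440 | awk '/^dm-0/{print $7}' | would_alert 0.1 20 30`; if the Monit cycle length differs from the sampling interval, the window covers a different span of time and the result no longer maps one-to-one.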

Regarding the service time, though, `man iostat` has the following to say:

    The average service time (in milliseconds) for I/O requests that were
    issued to the device. Warning! Do not trust this field any more. This
    field will be removed in a future sysstat version.

    The average service time (svctm field) value is meaningless, as I/O
    statistics are now calculated at block level, and we don't know when the
    disk driver starts to process a request. For this reason, this field
    will be removed in a future sysstat version.

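The first column is the number of samples; the other three are rMB/s,
wMB/s and svctm (ms), printed with decimal commas by my locale:
$ iostat -mx 60 1440 | awk '/^dm-0/{print $6, $7, $13}' | sort | uniq -c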
     159 0,00 0,00 0,00
       6 0,00 0,00 0,05
      42 0,00 0,00 0,06
      14 0,00 0,00 0,07
       1 0,00 0,00 0,08
       1 0,00 0,00 0,09
       5 0,00 0,00 0,11
       1 0,00 0,00 0,12
       1 0,00 0,00 0,13
       2 0,00 0,00 0,15
       2 0,00 0,00 0,16
       1 0,00 0,00 0,18
       1 0,00 0,00 0,28
       1 0,00 0,00 0,30
       1 0,00 0,00 0,35
     583 0,00 0,01 0,00
       1 0,00 0,01 0,02
      86 0,00 0,01 0,03
     165 0,00 0,01 0,04
     134 0,00 0,01 0,05
      23 0,00 0,01 0,06
      34 0,00 0,01 0,07
      26 0,00 0,01 0,08
      20 0,00 0,01 0,09
      19 0,00 0,01 0,10
      13 0,00 0,01 0,11
       5 0,00 0,01 0,12
       9 0,00 0,01 0,13
       6 0,00 0,01 0,14
       8 0,00 0,01 0,15
       4 0,00 0,01 0,16
       6 0,00 0,01 0,17
       2 0,00 0,01 0,18
       5 0,00 0,01 0,20
       1 0,00 0,01 0,21
       2 0,00 0,01 0,22
       2 0,00 0,01 0,24
       1 0,00 0,01 0,25
       1 0,00 0,01 0,27
       1 0,00 0,01 0,28
       1 0,00 0,01 0,29
       1 0,00 0,01 0,31
       3 0,00 0,01 0,35
       1 0,00 0,01 0,38
       1 0,00 0,01 0,41
       1 0,00 0,01 0,44
       1 0,00 0,01 0,51
       1 0,00 0,01 0,53
       1 0,00 0,01 0,55
       1 0,00 0,01 0,56
       1 0,00 0,01 0,65
       1 0,00 0,01 0,68
       1 0,00 0,02 0,00
       1 0,00 0,03 0,12
       1 0,00 0,10 0,21
       1 0,00 0,11 0,08
       1 0,00 0,26 0,18
       1 0,00 0,51 0,11
       1 0,01 0,01 0,06
       1 0,01 0,01 0,07
       1 0,01 0,01 0,08
       1 0,01 0,01 0,11
       1 0,02 0,01 0,06
       1 0,02 0,01 0,08
       1 0,02 0,01 0,12
       1 0,03 0,00 0,08
       1 0,04 0,01 0,09
       1 0,05 0,01 0,19
       1 0,06 0,01 0,14
       1 0,06 0,02 0,10
       1 0,07 0,01 0,10
       1 0,07 0,01 0,13
       1 0,10 0,01 0,23
       1 0,11 0,04 0,14
       1 0,14 0,11 0,16
       1 0,22 0,02 0,17
       1 0,31 0,01 0,22
       1 0,88 0,02 0,20
       1 0,97 0,14 0,09
       1 2,12 1,49 0,11
       1 29,79 0,02 0,28
       1 3,59 3,20 0,23
       1 65,48 1,25 0,29
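Because plain `sort` orders these lines as text (29,79 comes before 3,59), the last line is not necessarily the peak for every column. A small sketch, using a hypothetical hist_max helper, that reads the `uniq -c` lines above (a count followed by three decimal-comma values) and prints the largest value in each metric column; on the full list above it should report 65.48 MB/s read, 3.20 MB/s write and 0.68 ms service time.

```shell
# hist_max  (hypothetical helper)
# Reads "count read write svctm" lines, as printed by the uniq -c pipeline,
# and prints the largest value seen in each of the three metric columns.
hist_max() {
    awk '
        NF == 4 {
            for (i = 2; i <= 4; i++) {
                v = $i
                gsub(",", ".", v)     # decimal comma -> decimal point
                v += 0
                # All metrics are non-negative, so the implicit 0 start is safe.
                if (v > peak[i]) peak[i] = v
            }
        }
        END {
            printf "read %.2f MB/s, write %.2f MB/s, svctm %.2f ms\n",
                   peak[2], peak[3], peak[4]
        }'
}
```

It can be appended directly to the pipeline, e.g. `... | sort | uniq -c | hist_max`; alternatively `sort -n` on a single column would also work, but only once the decimal commas are dealt with.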
