SKS Memory pattern anomaly

SKS Memory pattern anomaly

Jeremy T. Bouse
Has anyone else been monitoring the memory pattern for SKS and noticed exceedingly high usage? My secondary nodes generally show less than 11% of the instance memory in use, but for some reason my primary node is sitting at nearly 100% of memory, and CPU for that matter. The primary is the only node peering outside my network and has a limited number of peers, while the secondaries peer only with each other and the primary. I've put the NGINX short-circuit hack in place for the bad keys that have been mentioned, which has lowered CPU usage overall, but nothing seems to help the primary. It spends much of its time at 100% CPU and 50-90% memory while it's in recon mode, and only appears to dip when it recalculates its stats.
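
The hack itself is just an NGINX rule that answers lookups for the problem keys before they ever reach SKS. A rough sketch of the shape, not the exact rule that was circulated here; the key ID is a placeholder rather than one of the actual bad keys, and the upstream assumes SKS listening on its default HKP port:

    # Sketch only: placeholder key ID, default SKS HKP listener assumed.
    location /pks/lookup {
        if ($args ~* "0xDEADBEEFDEADBEEF") {
            return 444;    # close the connection before it reaches sks
        }
        proxy_pass http://127.0.0.1:11371;
    }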


Re: SKS Memory pattern anomaly

Jeremy T. Bouse

So I have all my nodes synced at around 5445343 keys after disabling all external peering and letting the 4 nodes catch up with each other. I then added a single external peer, and my SKS db process goes into a funky state as soon as it begins peering externally. I installed strace, and running 'strace -q -c' against the 'sks db' process on the 3 secondary nodes gives something along the lines of:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.004000         500         8           select
  0.00    0.000000           0         5           read
  0.00    0.000000           0         7           write
  0.00    0.000000           0         5           close
  0.00    0.000000           0         2           stat
  0.00    0.000000           0        10        10 lseek
  0.00    0.000000           0         3           brk
  0.00    0.000000           0        16           alarm
  0.00    0.000000           0         5           accept
------ ----------- ----------- --------- --------- ----------------
100.00    0.004000                    61        10 total
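
For reference, that summary comes from attaching strace to the running process, letting it sit for a while, and hitting Ctrl-C to print the table; roughly like this (the pgrep pattern is only a guess at how the db process shows up in your process table, and you'll likely need root):

    # Attach to the running 'sks db' process and count syscalls;
    # Ctrl-C prints the summary table.
    strace -q -c -p "$(pgrep -f 'sks db' | head -n1)"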

These appear to be running normally, with under 10% CPU and 10% MEM according to ps. My primary node, on the other hand, is another story entirely: for almost half an hour now it has been showing 80% MEM, its CPU usage varies, and strace shows a totally different pattern.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.77    0.026483           9      2865           pread64
  0.23    0.000060           0      2563           pwrite64
  0.00    0.000000           0       141           ftruncate
------ ----------- ----------- --------- --------- ----------------
100.00    0.026543                  5569           total
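
The CPU and MEM percentages in both cases are just from ps, something along the lines of this (the exact field list is only one way to pull them):

    # Per-process CPU and memory for the sks db and recon processes.
    ps -C sks -o pid,pcpu,pmem,etime,args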

I don't know enough about the internal operations to say exactly what's going on, but given the high memory usage, and the fact that one CPU core is 100% idle while the other sits in nearly 100% I/O wait, it looks like the process is caught in some sort of loop it can't get out of. If I look under /var/lib/sks/DB, I've got multiple 100 MB log files starting to build up, but no other files have their timestamps updating to show any sign of modification except the __db.00[123] files.
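
If it helps with diagnosis, the Berkeley DB utilities can report on that environment directly; a sketch, with the caveat that the binary names vary by packaging (db_stat may be installed as db5.3_stat, for example) and the -h path assumes the DB directory above:

    # Cache / memory-pool statistics for the key database environment.
    db_stat -m -h /var/lib/sks/DB

    # List log files the environment no longer references (i.e. safe to
    # archive once you have a backup).
    db_archive -h /var/lib/sks/DB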

Anyone have any thoughts for next steps?

Re: SKS Memory pattern anomaly

Jonathon Weiss
Jeremy,

When I applied the recommended configuration (especially: "command_timeout: 600") to allow sufficient time to merge some of the really large keys that became an issue earlier this year, I noticed a significant memory spike (I think SKS was actively merging one of those large keys at the time).  I suspect that SKS is memory inefficient in that operation, but my experiences are probably only worth a little more than any random anecdata.  I ended up throwing some more RAM at my server, and yes, the merge literally took a few minutes.
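
For anyone who hasn't applied it yet, the directive goes in sksconf and takes effect after restarting sks. A sketch, noting that only command_timeout comes from the recommendation above and that the file location varies by packaging (often /etc/sks/sksconf, or alongside the KDB/PTree directories):

    # Allow key-merge commands up to 10 minutes before timing out.
    command_timeout: 600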

  Jonathon

  Jonathon Weiss <[hidden email]>
  MIT/IS&T/Cloud Platforms


_______________________________________________
Sks-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/sks-devel