Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Todd Fleisher
Hi All,
I recently joined the pool and started having an issue after adding a second external peer to my membership file. The symptom is abnormally high IO load on the disk whenever my server tries to reconcile with the second peer (149.28.198.86), ending with the failure message "add_keys_merge failed: Eventloop.SigAlarm". It consistently appears to try to reconcile a large number of keys (100) when this happens. I've read previous list threads about this message (e.g. https://lists.nongnu.org/archive/html/sks-devel/2018-06/msg00051.html & https://lists.nongnu.org/archive/html/sks-devel/2011-06/msg00077.html) which mention the cause potentially being a large key that fails while being processed. I tried increasing my client_max_body_size from 8m to 32m in NGINX, but the issue persisted. For now, I have removed the second peer from my membership file to keep from over-taxing my server for no apparent benefit. I have included an excerpt of my logs showing the behavior. Can someone please advise what might be causing this issue and what can be done to resolve it? Thanks in advance.
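For context, the NGINX change described amounts to something like the following (a minimal sketch of a typical SKS reverse-proxy setup; the hostname and upstream port are illustrative assumptions, not the actual config):

```nginx
# Illustrative reverse-proxy fragment for an SKS keyserver.
# client_max_body_size caps the request body NGINX accepts before
# returning 413; raising it lets larger submitted keys through to SKS.
server {
    listen 11371;
    server_name keyserver.example.com;      # hypothetical hostname

    location / {
        client_max_body_size 32m;           # raised from the earlier 8m
        proxy_pass http://127.0.0.1:11372;  # SKS hkp port behind the proxy (assumed)
    }
}
```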



-T


_______________________________________________
Sks-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/sks-devel

Attachments: skslog2.txt (18K), signature.asc (849 bytes)
Re: Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Todd Fleisher
Hi All,
I wanted to follow up on this and add some new data points. I tried building some new SKS instances based on a more recent dump (specifically 2018-10-07 from https://keyserver.mattrude.com/dump/) and found those instances were plagued by the same issue when I began peering with my existing instances. When I re-built the new instances from an older dump (specifically 2018-10-01 from the same source), the issues went away. This seems to imply some problematic data was introduced into the pool during the first week of October that is causing the issues.

I found an existing issue logged about this behavior @ https://bitbucket.org/skskeyserver/sks-keyserver/issues/61/key-addition-failed-blocks-web-interface

For now, I'm able to keep my instances stable by building them from the earlier 2018-10-01 dump and not adding the second peer to my membership file. I would like to better understand why this is happening and figure out how to go about fixing it, in part so I can begin peering with more servers to improve the mesh.
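Leaving a peer out of the membership file looks like this (a hypothetical example; the hostnames are placeholders, not the actual peers involved):

```
# SKS membership file -- one "hostname recon_port" pair per line.
# Lines starting with # are ignored, so a problem peer can be paused
# by commenting it out rather than deleting it.
keyserver1.example.com 11370
#keyserver2.example.com 11370   # second peer, disabled while debugging
```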

-T

On Oct 8, 2018, at 1:54 PM, Todd Fleisher <[hidden email]> wrote:

[...]



Re: Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Paul M Furley
Hi Todd, for what it's worth, I've been experiencing this too since March.

The hangs are so severe my keyserver would fail to respond to requests. In order not to provide a poor experience to users of the pool, I removed myself from it.

Anecdotally it appears other keyservers still in the pool are similarly affected: I experience high rates of timeout and failure when using the pool these days.

I installed Hockeypuck on another server and peered it with my SKS instance. It syncs successfully, but Hockeypuck *also* goes nuts periodically while syncing: its memory and CPU usage rocket, often pushing into gigabytes of swap space, so that server is pretty unresponsive too.

I'm about to arrive at the OpenPGP Email summit in Brussels, I'm sure this will come up as a topic, I shall report back...

Paul

On Wed, 10 Oct 2018, at 19:00, Todd Fleisher wrote:

[...]

Re: Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Todd Fleisher
I had gotten things under control after sending this, but starting yesterday it came back when reconciling with a different peer. I commented that peer out for now and things are back to normal.

Is anyone else seeing similar behavior? Is there anything that can be done other than pausing reconciliation with peers that bring on the issue?

Here is a graph of my IO during the issue. You can see it drop back to normal immediately after I commented out the problem peer.


-T

On Oct 19, 2018, at 11:38 PM, Paul Fawkesley <[hidden email]> wrote:

[...]
Re: Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Todd Fleisher
Looks like the issue has spread to other peers, as the behavior returned when reconciling with yet another peer (pgpkeys.co.uk), so I've uncommented the previous peer (sks.infcs.de) and will wait for someone to advise if there's anything that can be done to reduce this extra IO load (https://imgur.com/a/wHPYGsK).

-T

On Dec 11, 2018, at 10:10 AM, Todd Fleisher <[hidden email]> wrote:

[...]
Re: Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Steffen Kaiser

On Wed, 12 Dec 2018, Todd Fleisher wrote:

> Looks like the issue has spread to other peers as the behavior returned

When I started my server initially, it was hit by the same problem, with heavy I/O and CPU consumption. I had to throw more and more RAM at it: building up the database from scratch required giving it 4GB or 8GB of RAM. I later lowered the limit, and it looks like some keys require the same amount. For my server "the problem suddenly stopped"; I don't know why, except to assume the bad keys are included now.

Wasn't there a thread some months back saying that some index would be read into memory, or something like that?

Well, it's a poor user experience that users cannot query the server while it is adding keys and stalling like that.

> [...]
--
Steffen Kaiser
Re: Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Moritz Wirth

Hi,

This issue has already been known for several months now; see [0] and [1].

The keys used for this are very large (around 30-60MB). Syncing them takes some bandwidth, and indexing/writing them to the disk consumes a lot of CPU and I/O resources. If the addition to the database fails, the key is added again when another peer synchronizes it, causing the same load (which results in the large I/O spikes that you see).

SKS is single-threaded, so any other action is blocked while the key addition takes place. This has also been known for years, and the fact that it can easily be used for attacks has been ignored.

If I remember correctly, the behaviour above is intended, so I would not expect any fixes in the next months. There have been some fixes which exclude some of the bad keys [2] (these might be included in the Ubuntu/Debian sks packages, which may be why the problem stopped over the last months); however, this only works as long as nobody generates and uploads a new key.

Best Regards,

Moritz

[0]: https://bitbucket.org/skskeyserver/sks-keyserver/issues/61/key-addition-failed-blocks-web-interface
[1]: https://bitbucket.org/skskeyserver/sks-keyserver/issues/60/denial-of-service-via-large-uid-packets
[2]: https://lists.nongnu.org/archive/html/sks-devel/2018-07/msg00053.html

On 13.12.18 at 07:28, Steffen Kaiser wrote:

[...]

Re: Peering Issues - High IO ending with Eventloop.SigAlarm always occur with 1 peer

Kim Minh Kaplan
Moritz Wirth wrote:

> [...]

Another solution is to tune settings so that the key is downloaded and saved.

https://lists.nongnu.org/archive/html/sks-devel/2018-06/msg00072.html
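For anyone trying this, the tuning in question lives in sksconf. A hedged sketch follows: the option names are believed to be standard sksconf settings (Eventloop.SigAlarm is raised when an operation exceeds its timeout, so giving key addition more time can let a huge key finally persist), but the values here are illustrative guesses, not the specific ones recommended in the linked message:

```
# sksconf -- illustrative fragment; values are guesses, not the
# settings from the linked message.
command_timeout: 600   # seconds allowed per command before SIGALRM fires
wserver_timeout: 300   # seconds allowed per web-server request
```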
