> On 5 May 2018, at 09:38, Andrew Gallagher <[hidden email]> wrote:
>
>> On 5 May 2018, at 09:03, Phil Pennock <[hidden email]> wrote:
>>
>> While you could modify the protocol to do something like announce a
>> key-count first, that's still only protection against accidental
>> misconfiguration

Sorry for the double. We don’t need to modify the protocol to enable such
checks. Whenever a server tries to recon with us, we can perform a callback
against its status page and run whatever sanity tests we want before
deciding whether to allow recon to proceed. This could be rolled out
without any need for coordination.

A
_______________________________________________
Sks-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/sks-devel
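The callback idea above is easy to sketch. The following Python fragment illustrates the decision logic such a sanity check might apply once a peer's key count has been obtained from its status page; the threshold value and the assumption of an exact machine-readable key count are both hypothetical, since stock SKS only publishes its stats as an HTML page:

```python
# Sketch of the "sanity callback" idea: before allowing recon, inspect the
# prospective peer's advertised key count and refuse if it looks implausible.
# The max_delta default below is an invented illustration, not an SKS setting.

def allow_recon(own_keys: int, peer_keys: int, max_delta: int = 100_000) -> bool:
    """Permit recon only when the key-count delta is within bounds."""
    if peer_keys == 0:
        # Empty database: almost certainly a misconfigured turnup.
        return False
    return abs(own_keys - peer_keys) <= max_delta

# Example decisions:
# allow_recon(5_054_255, 5_050_000) -> True  (small delta, proceed)
# allow_recon(5_054_255, 0)         -> False (empty db)
# allow_recon(5_054_255, 1_000_000) -> False (delta too large)
```

Because the check runs before recon starts, a misconfigured peer is refused cheaply instead of dragging both servers into an expensive reconciliation.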
In reply to this post by Andrew Gallagher

>> Requests may be "iterative" or "recursive" (words are stolen from DNS).
>> Users send recursive request: "I don't care how many peers
>> you ask, but tell me the key with all signatures."
>
> The DNS has a hierarchical structure that allows the authoritative source
> for data to be found within a small number of requests that depends on
> the number of components in the fqdn. There is no such structure in sks,
> and no way of knowing that all info has been found, so the *best* case
> scenario is that every server has to be polled for every request.

Suboptimal solutions are also acceptable.
I don't think we always need the best (and most expensive) way.
"Almost the best" is good enough in most practical cases.

However some simulation of spreading keys and signatures would
be really useful.

Gabor
In reply to this post by Phil Pennock-17

On 05/05/2018 03:48 AM, Phil Pennock wrote:
> On 2018-05-04 at 17:13 +0100, Andrew Gallagher wrote:
>> AFAICT, the limitation that SKS servers should only recon with known
>> peers was introduced as a measure against abuse. But it's a pretty
>> flimsy anti-abuse system considering that anyone can submit or search
>> for anything over the HKP interface without restriction.
>>
>> I think all SKS servers should attempt to recon with as many other
>> servers as they can find.
>
> The SKS reconciliation algorithm scales with the count of the
> differences in key-counts. If you peer with someone with no keys
> loaded, it will render your server nearly inoperable.
>
> We've seen this failure mode before. Repeatedly. It's part of why I
> wrote the initial Peering wiki document. It's why I walked people
> through showing how many keys they have loaded, and is why peering is so
> much easier these days: most people who post to sks-devel follow the
> guidance and take the hints, and get things sorted out before they post.

turnup, which is why i offer keydumps[0] myself (available via both http
and rsync, compressed - maybe FTP someday as well), and offer instructions
in that section. and why i wrote this query tool[1]. and this dumping
script[2]. and packaged this[3].

(thanks, phil, by the way for those instructions. i found them super
helpful when i first turned up. and thanks to whomever it was on IRC(?)
that gave me the brilliant idea of running a modified second SKS instance
locally for no-downtime dumps!)

one of the key (no pun intended) criteria i have for peering is their
delta for # of keys off from mine. (i should add in a delta/comparison
function to [1] at some point. hrmmm...) it is SO IMPORTANT for both ends
of the peering to have a relatively recent keyset. i don't see how we can
"fix" this without entirely restructuring how HKP recon behaves, which is
no easy task from my understanding (should it be even necessary first - i
don't believe it requires "fixing", personally).

> This is why we only peer with people we whitelist, and why most people
> look for as much demonstration of Clue as they can get before peering,
> and it's a large part of why we do see de-peering when actions
> demonstrate a lack of trustworthiness.

relevant to this point, i'm still relatively new to keyserver
administration and this list - is there a sort of established procedure or
policy for "announcing" a peer that individuals should de-peer with
(should they be peering with said peer)? what incident response policy
should one follow? what criteria/actions would lead to suggested
de-peering?

i diverted the thread because i feel we're crossing into off-topic with
those questions i had and i don't want to hijack the original topic, since
it seems to still be under consideration.

[0] http://mirror.square-r00t.net/#dumps
[1] https://git.square-r00t.net/OpTools/tree/gpg/keystats.py
[2] https://git.square-r00t.net/OpTools/tree/gpg/sksdump.py
[3] https://aur.archlinux.org/packages/sks-local/

--
brent "i said 'peer(ing|ed|)' too many times in this email" saner
https://square-r00t.net/
GPG info: https://square-r00t.net/gpg-info
In reply to this post by Gabor Kiss

> On 5 May 2018, at 10:55, Kiss Gabor (Bitman) <[hidden email]> wrote:
>
> Suboptimal solutions are also acceptable.
> I don't think we always need the best (and most expensive) way.
> "Almost the best" is good enough in most practical cases.

We need to define our metric to determine what particular degrees of
“suboptimal” are acceptable. Timeliness has never been a strong feature of
sks and so probably shouldn’t be prioritised now. Accessibility however is
crucial, and anything that could result in updated data being uploaded
somewhere it is unlikely to be found is a deal breaker imo.

DNS uses a hierarchical structure to ensure accessibility, sks uses
gossip. If we get rid of gossip but don’t impose a hierarchy we could go
far beyond “suboptimal”.

> However some simulation of spreading keys and signatures would
> be really useful.

Agreed. Anything that changes the behaviour of a distributed system needs
realistic performance testing in virtuo.

A
In reply to this post by brent s.

> On 5 May 2018, at 11:31, brent s. <[hidden email]> wrote:
>
> it is SO IMPORTANT for both ends of the peering to have a relatively
> recent keyset. i don't see how we can "fix" this without entirely
> restructuring how HKP recon behaves,

Yes. Perhaps it would be a good idea to systematise the dump/restore
process so that instead of a human being following written instructions, a
new peer of server A will attempt to:

a) probe server A to find the key difference
b) if the difference is large, download a dump from some standard place
c) reinitialise itself before trying again

Removing human error from such processes is A Good Thing in any case...

A
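The a/b/c turnup loop proposed here can be sketched as a tiny decision function. The `recon_limit` value and the idea of a machine-readable probe are assumptions for illustration; SKS exposes no such interface today:

```python
# Sketch of the automated turnup logic: (a) probe gives us a key delta,
# then we either recon or fall back to (b) fetching a dump and
# (c) reinitialising before trying again. recon_limit is invented here.

def next_step(key_delta: int, recon_limit: int = 10_000) -> str:
    """Decide whether recon can proceed or a dump reload is needed."""
    if key_delta <= recon_limit:
        return "recon"           # delta small enough for recon to converge
    return "download_dump"       # fetch a full dump, reinitialise, retry

# next_step(500)       -> "recon"
# next_step(5_000_000) -> "download_dump"
```

A real implementation would loop: after the dump import finishes, probe again, and only begin recon once the delta drops under the limit.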
On 05/05/2018 08:30 AM, Andrew Gallagher wrote:
>> On 5 May 2018, at 11:31, brent s. <[hidden email]> wrote:
>>
>> it is SO IMPORTANT for both ends of the peering to have a relatively
>> recent keyset. i don't see how we can "fix" this without entirely
>> restructuring how HKP recon behaves,
>
> Yes. Perhaps it would be a good idea to systematise the dump/restore
> process so that instead of a human being following written instructions,
> a new peer of server A will attempt to a) probe server A to find the key
> difference b) if the difference is large, download a dump from some
> standard place c) reinitialise itself before trying again.
>
> Removing human error from such processes is A Good Thing in any case...
>
> A

(a) is taken care of by recon already (in a way), but the problem for (b)
is the "standard place" - SKS/recon/HKP/peering is, by nature,
unfederated/decentralized. sure, there's the SKS pool, but that certainly
isn't required for peering (even with keyservers that ARE in the pool) nor
running sks. how does one decide the "canonical" dump to be downloaded in
(b)?

i WOULD say that removing human error is good, and normally i'd totally
agree - but i think this should instead be solved in documentation, as
implementing it in the software itself seems like a lot of work that even
breaks part of SKS/peering philosophy (to me, at least) with low payoff. i
can't speak to it, but i'd be curious if anyone could anecdotally recall
how often peering requests are made to this list without them first
importing a dump.

i instead propose that:

- in the default membership file, a note should be added to the comments
  at the beginning saying that importing a dump should be done first when
  peering with "public(?)" peers (and link to one or both of [0])
- in the man page for sks, under "FILES..membership", a note be added
  saying the same/similar
- in <src>/README.md, under "Setup and Configuration..### Membership
  file", the same note be added

This way, there is *no possible way* a new keyserver administrator will
even know HOW to peer WITHOUT first knowing that they should use a keydump
import beforehand.

Adding in an optional refusal threshold directive (max_key_delta or
something?) for a keycount delta of more than /n/ to sks.conf (optionally
perhaps with the ability to override that value per-peer in membership?),
however, would absolutely hold value, I think.

[0] https://bitbucket.org/skskeyserver/sks-keyserver/wiki/Peering
    https://bitbucket.org/skskeyserver/sks-keyserver/wiki/KeydumpSources

--
brent saner
https://square-r00t.net/
GPG info: https://square-r00t.net/gpg-info
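For concreteness, a threshold directive of the kind suggested might look like the fragment below. Note that neither `max_key_delta` nor a per-peer override exists in SKS today; the directive name and syntax are invented purely for illustration:

```
# sks.conf -- hypothetical directive, not implemented in SKS
max_key_delta: 10000          # refuse recon when the peer's key delta exceeds this

# membership -- hypothetical per-peer override
keyserver.example.com 11370   # max_key_delta=50000
```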
> On 5 May 2018, at 15:00, brent s. <[hidden email]> wrote:
>
> (a) is taken care of by recon already (in a way),

According to a list message from earlier today it is not. If the delta is
small, recon proceeds. If it is large, it breaks catastrophically. There is
no (current) way to test nicely.

> but the problem for
> (b) is the "standard place" - SKS/recon/HKP/peering is, by nature,
> unfederated/decentralized. sure, there's the SKS pool, but that
> certainly isn't required for peering (even with keyservers that ARE in
> the pool) nor running sks. how does one decide the "canonical" dump to
> be downloaded in (b)?

There can be no canonical dump of course. Each peer can provide its own
dump at a well-known local URL. This is even more important if and when we
allow divergent policy.

A
On 05/05/2018 10:22 AM, Andrew Gallagher wrote:
>> On 5 May 2018, at 15:00, brent s. <[hidden email]> wrote:
>>
>> (a) is taken care of by recon already (in a way),
>
> According to a list message from earlier today it is not. If the delta
> is small, recon proceeds. If it is large, it breaks catastrophically.
> There is no (current) way to test nicely.

sorry, should have clarified - i mean the "generating deltas" part of (a).

>> but the problem for
>> (b) is the "standard place" - SKS/recon/HKP/peering is, by nature,
>> unfederated/decentralized. sure, there's the SKS pool, but that
>> certainly isn't required for peering (even with keyservers that ARE in
>> the pool) nor running sks. how does one decide the "canonical" dump to
>> be downloaded in (b)?
>
> There can be no canonical dump of course. Each peer can provide its own
> dump at a well-known local URL. This is even more important if and when
> we allow divergent policy.

hrm. i suppose, but i'm under the impression not many keyserver admins run
their own dumps? (which i don't fault them for; the current dump i have in
its uncompressed form is 11 GB (5054255 keys). granted, you don't see new
keyserver turnups often, but still -- that can be a lengthy download, plus
the fairly sizeable chunk of time it takes for the initial import.)

--
brent saner
https://square-r00t.net/
GPG info: https://square-r00t.net/gpg-info
In reply to this post by Phil Pennock-17

The underlying recon algorithm can be stopped at any time and only the
differences discovered so far processed. In other words, it should be
possible to put an explicit timeout on recon time - you will get a partial
synchronization, but that might be good enough as long as you reconcile
faster than new differences accumulate.
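The bounded-recon idea above amounts to draining a stream of discovered differences until a deadline passes. A minimal Python sketch, assuming the differences arrive as an iterator (SKS itself exposes no such hook):

```python
import time

def bounded_recon(difference_iter, budget_seconds=30.0):
    """Collect recon differences for at most budget_seconds, then stop.

    Returns the differences discovered so far -- a partial but usable
    synchronization; the next recon round picks up where this one stopped.
    """
    found = []
    deadline = time.monotonic() + budget_seconds
    for diff in difference_iter:
        found.append(diff)
        if time.monotonic() >= deadline:
            break  # budget exhausted; settle for partial sync
    return found
```

As the message notes, this converges overall only if rounds complete faster than new differences appear between peers.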
In reply to this post by Andrew Gallagher

On 2018-05-05 at 10:27 +0100, Andrew Gallagher wrote:
> Sorry for the double. We don’t need to modify the protocol to enable
> such checks. Whenever a server tries to recon with us, we can perform
> a callback against its status page and run whatever sanity tests we
> want before deciding whether to allow recon to proceed. This could be
> rolled out without any need for coordination.

You'll need to ensure that initial_stat defaults to true and so forth
then, since by default keyservers don't calculate the stats at startup, so
such a keyserver won't be able to start peering for up to a day (3am by
default). It's probably reasonable to change the default, but you'll want
to make this explicit when you draw up your full workflow.

Note though that the status pages are intended for humans, and SKS the
keyserver can speak SKS and reconcile with other codebases, such as
Hockeypuck, which uses a different output format. You'll probably want to
look into standardizing on something like a JSON output format, with
fallback to heuristic matching upon the output formats used by the two
current codebases.

But still my point stands: the moment you change to defaulting to
recon-open-to-everyone, the scope of what counts as a security
vulnerability changes, and being open to anyone causing unbounded
computation will be a DoS security vulnerability. With enough other issues
being tackled, I nudge once more to reconsider such a change.

-Phil
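To make the standardization suggestion concrete, here is one possible shape for such a machine-readable status document. Every field name is illustrative; no schema like this is specified anywhere, and SKS and Hockeypuck would both have to agree on (and emit) whatever is chosen:

```python
import json

# A hypothetical JSON status document for sanity callbacks. Field names
# are invented for illustration -- today both SKS and Hockeypuck serve
# human-oriented HTML status pages in differing formats.
status = {
    "software": "SKS",
    "version": "1.1.6",
    "numkeys": 5054255,       # enables key-delta sanity checks before recon
    "initial_stat": True,     # stats computed at startup, not only at 3am
    "peers": ["keyserver.example.org 11370"],
}

print(json.dumps(status, indent=2))
```

A callback would fetch this document, check `numkeys` against its own count, and refuse recon on a mismatch, exactly the workflow Phil says needs to be drawn up explicitly.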
In reply to this post by brent s.

On 05/05/18 17:28, brent s. wrote:
>>> but the problem for
>>> (b) is the "standard place" - SKS/recon/HKP/peering is, by nature,
>>> unfederated/decentralized. sure, there's the SKS pool, but that
>>> certainly isn't required for peering (even with keyservers that ARE in
>>> the pool) nor running sks. how does one decide the "canonical" dump to
>>> be downloaded in (b)?
>>
>> There can be no canonical dump of course. Each peer can provide its own
>> dump at a well known local URL. This is even more important if and when
>> we allow divergent policy.
>
> hrm. i suppose, but i'm under the impression not many keyserver admins
> run their own dumps? (which i don't fault them for; the current dump i
> have in its uncompressed form is 11 GB (5054255 keys). granted, you
> don't see new keyserver turnups often, but still -- that can be a
> lengthy download, plus the fairly sizeable chunk of time it takes for
> the initial import.)

I've thought about this a bit more, and the bootstrapping issue can be
solved without requiring every keyserver to produce a unique dump. We just
need one more database [table]...!

Let us call it Limbo. It contains the hashes of objects that the local
server does not have and has never seen (so has never had the chance to
test against policy), but knows must exist because they were in another
server's blacklist.

When bootstrapping, all that the new server needs to know is a reasonably
complete list of hashes. If it knows the real data as well, all the
better. But for recon to get started, given that we can perform fake
recon, the hashes are sufficient.

When performing a dump, a reference server also dumps its local blacklist.
When loading that dump, the blacklist of the reference is used to populate
the fresh server's Limbo. Now the fresh server can generate a low-delta
fake recon immediately, by merging the DB, Local-BL (initially empty) and
Limbo hash lists. Recon then proceeds as discussed before, and so long as
the peer graph is well-connected, new peers can be added without having to
reference their dumps.

Limbo entries will return 404, just like missing entries (and unlike
blacklist entries). But the server will request a proportion of the Limbo
entries from its peers during each catchup. This would happen at a much
higher rate than the blacklist cache refresh, but still low enough that
its peers shouldn't suffer from the extra load.

Let's say that at each recon, the number of missing keys is found to be N.
The local server will then request these N keys from its peer. If at the
same time it were to also request M = a*N limbo entries thus:

```
SELECT hash FROM limbo
  WHERE hash NOT IN (SELECT hash FROM peer_bl_cache WHERE peer = $PEER)
  LIMIT $M;
```

the extra load on the peer should not be excessive, and Limbo should be
drained at a rate roughly proportional to the parameter `a` and the rate
of new keys. (This would also be a good place to perform the peer_bl_cache
refresh.)

When calculating key deltas for pool membership purposes, the fresh server
should not include its Limbo database in the count. This will ensure that
servers do not get added to the pool until their Limbo is well drained.
Alternatively, we could make an explicitly drained Limbo a condition for
pool membership.

This still leaves the issue of eventual consistency as an open problem,
but it can be addressed manually by encouraging good graph connectivity.

--
Andrew Gallagher
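The limbo-draining query in the message above can be exercised directly against sqlite. Table layouts and contents here are invented to match the proposal's names, purely for demonstration:

```python
import sqlite3

# Sketch of the limbo-draining query: select up to M limbo hashes that the
# given peer has NOT already refused to serve us (per our peer_bl_cache).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE limbo (hash TEXT PRIMARY KEY);
    CREATE TABLE peer_bl_cache (hash TEXT, peer TEXT, ts INTEGER);
""")
con.executemany("INSERT INTO limbo VALUES (?)", [("h1",), ("h2",), ("h3",)])
con.execute("INSERT INTO peer_bl_cache VALUES ('h2', 'peer-a', 0)")

M, PEER = 2, "peer-a"
rows = con.execute(
    "SELECT hash FROM limbo WHERE hash NOT IN "
    "(SELECT hash FROM peer_bl_cache WHERE peer = ?) LIMIT ?",
    (PEER, M),
).fetchall()
print(sorted(rows))  # -> [('h1',), ('h3',)]: h2 is skipped for this peer
```

Keeping the `NOT IN` filter per-peer matters: a hash that one peer has blacklisted may still be obtainable from another, so limbo entries are only excluded for the peer that refused them.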
In reply to this post by Phil Pennock-17
Hi, all.
There has been a lot of chatter re possible improvements to SKS on the list lately, and lots of ideas thrown around. So I thought I'd summarise the proposals here, and try to separate them out into digestible chunks. I've ordered them from less to more controversial. My personal preference is for the first two sections (resiliency, type filters) to be implemented, and for the rest to be parked.

This has turned out to be a much longer document than I expected. I don't intend to spend any further time or energy on local blacklisting, as its technical complexity increases every time I think about it, and its politics and effectiveness are questionable.

A. Concrete proposals
==================

Version 1.X: Resiliency
-----------------------

These are ideas that fell out of the other discussions, but are applicable independently. If we want to make backwards-incompatible changes, then automatic verification of status, versions etc. will probably be necessary to prevent recon failure.

### JSON status

A standardised JSON status page could be served by all SKS-speaking services. This would ease fault detection and pool management, and is a prerequisite for reliable sanity callbacks.

### Default initial_stat=true

Also a prerequisite for sanity callbacks. Otherwise useful for debugging and fault detection.

### Sanity callbacks

Currently, a server has no way to determine if its peers are correctly set up or have key deltas within the recon limit. If each host served a JSON status page, peers could perform a sanity check against it before allowing recon to continue. This would help contain the effects of some of the more common failure modes.

### Empty db protection

If the number of keys in the local database is less than a configured threshold, an sks server should disable recon and throw a warning. The particular threshold could be set in the conf file, and a sensible default provided in the distro. This should prevent new servers from attempting recon until a reasonable dump is loaded.

Version 2.0: Type filters with version ratchet
----------------------------------------------

This proposal seems to have the most support in principle. It is relatively easy to implement, and directly addresses both illegal content and database bloat. It does however require precise choreography.

It should be possible to alter the sks code during a version bump so that:

1. All objects of an expanded but hardcoded set of types (private keys, localsigs, photo IDs, ...) are silently dropped if submitted
2. Any existing objects in the database of these types are treated as nonexistent for all operations (queries, recon, dumps, ...)
3. The above rules are only enabled on a future flag day, say 180 days after the release date of the new version
4. The version criterion for pool membership is bumped a few days in advance of the flag day
5. A crufty database could be cleaned by dumping and reloading the db locally, or a database cleaner could be run on a schedule from within SKS itself

This would purge the pool of the most obviously objectionable content (child porn, copyrighted material), with minimal collateral damage. The disadvantage is that any noncompliant peer would fail recon after flag day due to excessive delta, and thus would need to be either depeered manually, or have its recon attempts denied by a sanity callback. Other implementations (i.e. hockeypuck) would have to move in lockstep or be depeered.

Future speculation
==================

Future A: Policy blacklisting
-----------------------------

Pay attention, kid. This is where it gets complicated.

Version ratchets may not be flexible or responsive enough to deal with specific legal issues. Policy-based blacklisting gives server operators a fine-grained tool to clean their databases of all sorts of content without having to move in lockstep with their peers.
These proposals are more controversial, given that individual operators will have hands-on responsibility for managing policy, and thereby potentially be more exposed legally. It should be noted however that technical debt may not be a valid defence against legal liability. IANAL.

All of the changes in this section must be made simultaneously, otherwise various forms of recon failure are inevitable. This will involve a major rewrite of the code, which may not be considered a good use of time.

If type filters have been implemented (see above), the need for local policy would be considerably reduced. If however type filters were not used, then policy blacklists would be the main method for filtering objectionable content, which might be prohibitive.

Note that locally-divergent blacklist policies have the potential to break eventual consistency across the graph (see below).

### Local blacklist

An SKS server may maintain a local blacklist of hashes that it does not want to store. At submission time, any object found in the blacklist is silently dropped. Any requests for objects in the blacklist should return `410 Gone`.

### Local dumps

When an SKS server is making a dump, it should dump all of its databases, including blacklist, peer_bl_cache and limbo (see below). This is useful for a) restoring state locally after a disaster, but also b) helping new servers bootstrap themselves to a low-delta state.

### Bootstrap limbo

When restoring from a dump, a server may simply restore the dumped blacklist and continue. But if the new server has a different policy than the source, this is not sufficient. Hashes that were added to the original blacklist for violating policies that the new server does not enforce should not be blacklisted on the new server. But they cannot be added to the local database either, because the actual data will not be found in the dump. Instead, these hashes are added to a `limbo` database that will be progressively drained as and when the hashes are encountered again during submission or catchup. This is important to ensure that recon can start immediately with a complete set of hashes.

Any requests for objects in limbo should return `404 Not Found`. If an object is successfully submitted or fetched that matches a hash in limbo, then the hash will be removed from limbo before the object is processed by policy.

### Peer blacklist cache

When fetching new objects from a peer during catchup, the peer may throw `410 Gone` - if this happens then we know that the peer has blacklisted it and we should not request it again from that peer for some time. We store the triple `(hash, peer, timestamp)` in the database `peer_bl_cache`.

Similarly, if we receive `404 Not Found` during catchup, then this object is in the remote server's limbo. We add it to `peer_bl_cache` as if it were a `410`. Cache invalidation should reap it eventually.

### Fake recon

The recon algorithm is modified to operate against the set of unique hashes:

```
(SELECT hash FROM local_db)
UNION (SELECT hash FROM local_bl)
UNION (SELECT hash FROM limbo)
UNION (SELECT hash FROM peer_bl_cache WHERE peer="$PEER");
```

This ensures that deltas are kept to a minimum. Note that this may cause the remote server to request items that it does not have but are in our blacklist or our limbo. This should only happen once, after which the offending hash should be stored in the peer's blacklist cache against our hostname.

If the remote server requests an object that we have stored in our `peer_bl_cache` against its name, then our cache is obviously invalid and we should remove that entry from the cache and respond with our copy of the object, if we have one.
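The merged hash set described under "Fake recon" is a plain set union, which can be demonstrated against sqlite. The table names follow the proposal; their contents are invented single-letter hashes for illustration:

```python
import sqlite3

# The "fake recon" hash set: everything we hold, everything we've
# blacklisted, everything in limbo, plus everything this specific peer
# has told us it refuses to serve.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE local_db (hash TEXT);
    CREATE TABLE local_bl (hash TEXT);
    CREATE TABLE limbo (hash TEXT);
    CREATE TABLE peer_bl_cache (hash TEXT, peer TEXT);
""")
con.execute("INSERT INTO local_db VALUES ('a')")
con.execute("INSERT INTO local_bl VALUES ('b')")
con.execute("INSERT INTO limbo VALUES ('c')")
con.execute("INSERT INTO peer_bl_cache VALUES ('d', 'peer-x')")

recon_set = {row[0] for row in con.execute("""
    SELECT hash FROM local_db
    UNION SELECT hash FROM local_bl
    UNION SELECT hash FROM limbo
    UNION SELECT hash FROM peer_bl_cache WHERE peer = 'peer-x'
""")}
print(sorted(recon_set))  # -> ['a', 'b', 'c', 'd']
```

Presenting this union, rather than just `local_db`, is what keeps the apparent delta small even though the blacklisted and limbo objects are not actually stored.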
### Conditional catchup

Instead of requesting the N missing hashes from the delta, the server will request the following hashes:

```
(SELECT hash FROM missing_hashes)
UNION (SELECT hash FROM peer_bl_cache WHERE peer="$PEER" ORDER BY timestamp LIMIT a)
UNION (SELECT hash FROM limbo LIMIT b*N);
```

where `a` is small, perhaps even a weighted random integer from (0,1), and `b` is O(1). These parameters will be adjusted so that a balance is maintained between (on one hand) timely cache invalidation and limbo draining; and (on the other) the impact upon the remote peer of excessive requests.

### Policy enforcement

Each server would be able to define its own policy. The simplest policy would be one that bans certain packet types (e.g. photo IDs). During both catchup and submission (but after limbo draining), the new object is compared with local policy. If it offends then its hash is added to the local blacklist with a reference to the offending policy, and the data is silently dropped.

Policy should be defined in a canonical form, so that a) local policy can be reported on the status pages and b) remote dumps can be compared with local policy to minimise the number of hashes that need to be placed in limbo during bootstrap.

### Local database cleaner

If policy changes, there will in general be objects left behind in the db that violate the new policy. A cleaner routine should periodically walk the database and remove any offending objects, adding their hashes to the local blacklist as if they had been submitted. This could be implemented as an extension of the type-filter database cleaner above.

Open problem: Eventual Consistency
----------------------------------

Any introduction of blacklists opens the possibility of "policy firewalls", where servers with permissive policies may be effectively isolated from each other if all of the recon pathways between them pass through servers with more restrictive policies. Policy would therefore not only prevent the storage of violating objects locally, but prevent their propagation across the network. The only way to break this firewall is to create a new recon pathway that bypasses it. This could be done manually, but this places responsibility on operators to understand the policies of all other servers on the graph.

### Recon for all

It might be possible to move from a recon whitelist to a recon blacklist model. Servers would spider the graph to find peers and automatically try to peer with them. This would ensure that eventual consistency is obtained quickly, by maximising the core graph of servers that are mutually directly connected (and thus immune to firewalling). The main objection is that moving from a whitelist to a blacklist recon model opens up a significant attack surface. Sanity callbacks could be used to mitigate against human error, but not sabotage.

### Hard core

Alternatively, a group of servers that do not intend to introduce any policy restrictions could agree to remain mutually well-connected, and stay open to peering requests from all comers (subject to good behaviour). This would effectively operate as a clearing house for objects. The main objections are a) these servers must all operate in jurisdictions where the universality of their databases is legally sound (e.g. no right to be forgotten), and b) some animals would be more equal than others.

Future B: Austerity
-------------------

In an extreme scenario, handling of any user IDs may be impossible due to data protection regulations. On the same grounds, it may not even be possible to store third-party signatures, as these leak relationship data. In such a case, it may still be possible to run an austere keyserver network for self-signatures (i.e. expiry dates) and revocations only. This would require a further version ratchet with a type filter permitting a minimum of packet types, shorn of all personal identifying information.
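The austerity type filter sketched above amounts to an allow-list over OpenPGP packet tags. The tag numbers below are from RFC 4880, but exactly which tags an austere server would permit is an open policy question that the proposal does not pin down; this fragment is illustrative only:

```python
# Sketch of the "austerity" filter: keep only packet types needed for key
# material, self-signatures and revocations. Tag numbers per RFC 4880;
# the chosen allow-list is an assumption, not part of the proposal.
ALLOWED_TAGS = {
    6,   # Public-Key packet
    14,  # Public-Subkey packet
    2,   # Signature packet (would still need narrowing to self-sigs/revocations)
}

def filter_packets(packets):
    """Drop every packet whose tag is not in the austere allow-list
    (e.g. tag 13 User ID and tag 17 User Attribute would be dropped)."""
    return [p for p in packets if p["tag"] in ALLOWED_TAGS]
```

Note that tag 2 covers all signature classes, so a real implementation would also have to inspect each signature to distinguish self-signatures and revocations from the third-party certifications that austerity forbids.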