Seemingly corrupted DB, not syncing

Gunnar Wolf
Hi,

I noticed today my keyserver has been failing to sync for several days
(to say the least – it still reports knowing only about 3920177 keys,
while I see 4005128 in other servers I supposedly peer with).

Looking at the SKS logs, I see entries such as:

==> db.log <==
2015-08-05 18:01:27 <mail transmit keys> error in callback.: Bdb.DBError("BDB0060 PANIC: fatal region error detected; run recovery")

(but I don't know what recovery it talks about — BDB's?)

Or:

==> recon.log <==
2015-08-05 18:02:40 Raising Sys.Break -- PTree may be corrupted: Failure("remove_from_node: attempt to delete non-existant element from prefix tree")
2015-08-05 18:02:40 DB closed

So... In your experience, what should be done? Is my best bet to just
drop my DB and download a set of dumps again?

Thanks,

_______________________________________________
Sks-devel mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/sks-devel

Re: Seemingly corrupted DB, not syncing

Daniel Kahn Gillmor-7
On Wed 2015-08-05 19:06:36 -0400, Gunnar Wolf wrote:

> Hi,
>
> I noticed today my keyserver has been failing to sync for several days
> (to say the least – it still reports knowing only about 3920177 keys,
> while I see 4005128 in other servers I supposedly peer with).
>
> Looking at the SKS logs, I see entries such as:
>
> ==> db.log <==
> 2015-08-05 18:01:27 <mail transmit keys> error in callback.: Bdb.DBError("BDB0060 PANIC: fatal region error detected; run recovery")
>
> (but I don't know what recovery it talks about — BDB's?)
>
> Or:
>
> ==> recon.log <==
> 2015-08-05 18:02:40 Raising Sys.Break -- PTree may be corrupted: Failure("remove_from_node: attempt to delete non-existant element from prefix tree")
> 2015-08-05 18:02:40 DB closed
>
> So... In your experience, what should be done? Is my best bet to just
> drop my DB and download a set of dumps again?

You might be able to just drop your PTree (not the DB) and have sks
rebuild it, without needing a new set of dumps.  I'm sorry to not have
any more sophisticated suggestions.  bdb's failure modes continue to
perplex me. :/

hopefully someone else has better answers,

        --dkg

Re: Seemingly corrupted DB, not syncing

malte@wk3.org
In reply to this post by Gunnar Wolf
On Wed, 5 Aug 2015 18:06:36 -0500
Gunnar Wolf <[hidden email]> wrote:

> (but I don't know what recovery it talks about — BDB's?)

I don't know either, but in my experience it does not hurt to stop the service and run "db5.x_recover" in the folders containing the databases. There is also a file you can delete which will be regenerated (without a performance impact like reimporting everything), but I don't remember which one it was...

Re: Seemingly corrupted DB, not syncing

Daniel Kahn Gillmor-7
On Thu 2015-08-06 02:49:24 -0400, [hidden email] wrote:
> On Wed, 5 Aug 2015 18:06:36 -0500
> Gunnar Wolf <[hidden email]> wrote:
>
>> (but I don't know what recovery it talks about — BDB's?)
>
> I don't know either, but in my experience it does not hurt to stop the
> service and run "db5.x_recover" in the folders containing the
> databases.

Make sure you do this as the non-privileged user who owns the databases,
though.  If you do this as root, it's possible to make the databases
unusable by the normal non-priv user.

         --dkg
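
A minimal sketch of the two pieces of advice above (stop the service, then run recovery as the user who owns the databases). The function names `owns_dir` and `safe_recover` are made up for illustration, and the default binary name `db5.3_recover` is an assumption that depends on which Berkeley DB version is installed; override it through `DB_RECOVER`. The `-ev` flags are the ones used elsewhere in this thread.

```shell
#!/bin/sh
# Succeed only if the current user owns the given directory.
# Compares numeric uids so it also works without a passwd entry.
owns_dir() {
    owner_uid=$(ls -ldn "$1" 2>/dev/null | awk '{print $3}')
    [ "$owner_uid" = "$(id -u)" ]
}

# Run Berkeley DB recovery inside one database directory, refusing to run
# as any user other than the directory's owner (for example as root).
# DB_RECOVER is an assumption: set it to the db*_recover binary you have.
safe_recover() {
    dir="$1"
    if ! owns_dir "$dir"; then
        echo "refusing: $dir is not owned by uid $(id -u)" >&2
        return 1
    fi
    (cd "$dir" && "${DB_RECOVER:-db5.3_recover}" -ev)
}
```

With sks stopped, this would be called once per database directory while logged in as the sks user, for example `safe_recover /var/lib/sks/PTree`.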

Re: Seemingly corrupted DB, not syncing

Gunnar Wolf
In reply to this post by Daniel Kahn Gillmor-7
Daniel Kahn Gillmor dijo [Wed, Aug 05, 2015 at 07:22:26PM -0400]:
> > So... In your experience, what should be done? Is my best bet to just
> > drop my DB and download a set of dumps again?
>
> You might be able to just drop your PTree (not the DB) and have sks
> rebuild it, without needing a new set of dumps.  I'm sorry to not have
> any more sophisticated suggestions.  bdb's failure modes continue to
> perplex me. :/

OK, this *seems* to have worked: After removing and creating a new,
empty /var/lib/sks/PTree directory, I started sks, and am getting
several such log messages:

==> /var/log/sks/db.log <==
2015-08-06 10:49:16 Sending LogResp size 5000

==> /var/log/sks/recon.log <==
2015-08-06 10:49:16 Added 5000 hash-updates. Caught up to 1425051646.231266

PTree is already at 137M (and growing), so... Time will tell. But it
feels as if it were working \o/

Re: Seemingly corrupted DB, not syncing

Gunnar Wolf
Gunnar Wolf dijo [Thu, Aug 06, 2015 at 10:50:59AM -0500]:
> OK, this *seems* to have worked: After removing and creating a new,
> empty /var/lib/sks/PTree directory, I started sks, and am getting
> several such log messages:

Umh, spoke too fast:

2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
2015-08-06 10:57:45 <recon as client> error in callback.: Sys_error("Connection reset by peer")

Ok, one more 'service sks restart', and found that:

2015-08-06 10:59:49 Malformed entry  EF5

So... Without further ado, I'll stop wasting time and fetch a new dump :-P

Re: Seemingly corrupted DB, not syncing

Daniel Kahn Gillmor-7
On Thu 2015-08-06 12:03:28 -0400, Gunnar Wolf wrote:

> Gunnar Wolf dijo [Thu, Aug 06, 2015 at 10:50:59AM -0500]:
>> OK, this *seems* to have worked: After removing and creating a new,
>> empty /var/lib/sks/PTree directory, I started sks, and am getting
>> several such log messages:
>
> Umh, spoke too fast:
>
> 2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
> 2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
> 2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
> 2015-08-06 10:57:45 <recon as client> error in callback.: Sys_error("Connection reset by peer")

fwiw, the above is usually just one of your sks peers (or the network
between you) being flaky.  It's not a symptom of a failed sks
installation.

> Ok, one more 'service sks restart', and found that:
>
> 2015-08-06 10:59:49 Malformed entry  EF5

This i've never seen before, and have no idea what it represents.

     --dkg

Re: Seemingly corrupted DB, not syncing

Arnold-27
In reply to this post by Gunnar Wolf
On 06-08-15 17:50, Gunnar Wolf wrote:

> Daniel Kahn Gillmor dijo [Wed, Aug 05, 2015 at 07:22:26PM -0400]:
>>> So... In your experience, what should be done? Is my best bet to just
>>> drop my DB and download a set of dumps again?
>>
>> You might be able to just drop your PTree (not the DB) and have sks
>> rebuild it, without needing a new set of dumps.  I'm sorry to not have
>> any more sophisticated suggestions.  bdb's failure modes continue to
>> perplex me. :/
>
> OK, this *seems* to have worked: After removing and creating a new,
> *empty* /var/lib/sks/PTree directory, I started sks,

You did run something like
$ if ! /usr/sbin/sks pbuild -cache 20 -ptree_cache 70; then echo "fail"; else echo "OK"; fi; tail /var/log/sks/pbuild.log

to rebuild the PTree database, didn't you? I would be surprised if it works without
that (but it might just as well :-) ).

As a last resort, instead of downloading a full set of keys, you can stop all sks
processes and generate a key dump yourself. It only needs the DB for that (PTree is
for the recon process).

Good luck!
  Arnold
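
A rough sketch of the "generate a key dump yourself" option. It assumes the `sks dump <keys-per-file> <dumpdir>` subcommand as described in the sks man page; please check `man sks` on your system before relying on it. The helper name `dump_keys`, the `SKS_BIN` override, the `dump/` directory name and the default of 15000 keys per file are all illustrative choices, not something taken from this thread.

```shell
#!/bin/sh
# Dump every key from the existing key database into numbered files under
# <basedir>/dump. Stop all sks processes first; only the DB is read, the
# PTree is not needed for this step.
# SKS_BIN and the 15000 default are assumptions for illustration.
dump_keys() {
    basedir="$1"
    keys_per_file="${2:-15000}"
    mkdir -p "$basedir/dump" || return 1
    # sks looks for its databases relative to the current directory.
    (cd "$basedir" && "${SKS_BIN:-/usr/sbin/sks}" dump "$keys_per_file" dump/)
}
```

Run it as the sks user, for example `dump_keys /var/lib/sks`; the resulting files can then be used for a fresh import in place of a downloaded dump.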

Re: Seemingly corrupted DB, not syncing

Gunnar Wolf
In reply to this post by Daniel Kahn Gillmor-7
Daniel Kahn Gillmor dijo [Thu, Aug 06, 2015 at 03:29:05PM -0400]:

> On Thu 2015-08-06 12:03:28 -0400, Gunnar Wolf wrote:
> > Gunnar Wolf dijo [Thu, Aug 06, 2015 at 10:50:59AM -0500]:
> >> OK, this *seems* to have worked: After removing and creating a new,
> >> empty /var/lib/sks/PTree directory, I started sks, and am getting
> >> several such log messages:
> >
> > Umh, spoke too fast:
> >
> > 2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
> > 2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
> > 2015-08-06 10:56:46 <reconciliation handler> error in callback.: End_of_file
> > 2015-08-06 10:57:45 <recon as client> error in callback.: Sys_error("Connection reset by peer")
>
> fwiw, the above is usually just one of your sks peers (or the network
> between you) being flaky.  It's not a symptom of a failed sks
> installation.

Right. I gave it some time while I was working on other stuff, and it
synced correctly. I have an up-to-date SKS server again! \o/

> > Ok, one more 'service sks restart', and found that:
> >
> > 2015-08-06 10:59:49 Malformed entry  EF5
>
> This i've never seen before, and have no idea what it represents.

/me silently crosses fingers...

Re: Seemingly corrupted DB, not syncing

Gunnar Wolf
In reply to this post by Arnold-27
Arnold dijo [Fri, Aug 07, 2015 at 12:16:32AM +0200]:
> > OK, this *seems* to have worked: After removing and creating a new,
> > *empty* /var/lib/sks/PTree directory, I started sks,
>
> You did run something like
> $ if ! /usr/sbin/sks pbuild -cache 20 -ptree_cache 70; then echo "fail"; else echo "OK"; fi; tail /var/log/sks/pbuild.log
>
> to rebuild the PTree database, didn't you? I would be surprised if it works without
> that (but it might just as well :-) ).

I tried, first, to no avail. I just did a 'rm -r PTree;mkdir PTree; chown...;chmod...'
as everything was lost already, and it worked :-)
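
The PTree reset described above could be sketched roughly as follows. This variant moves the damaged directory aside instead of running `rm -r`, so the step can be undone; that is a deliberate change from what was actually done here. The function name `reset_ptree`, the backup naming and the default mode `700` are assumptions; use whatever ownership and mode your installation expects. Running it as the sks user avoids the need for a separate chown.

```shell
#!/bin/sh
# Move a damaged PTree out of the way and create an empty replacement.
# Stop sks first and run this as the user that owns the SKS data.
reset_ptree() {
    base="$1"            # e.g. /var/lib/sks, the path used in this thread
    mode="${2:-700}"     # assumption: adjust to the mode your install uses
    backup="$base/PTree.broken.$(date +%Y%m%d%H%M%S)"
    if [ -d "$base/PTree" ]; then
        mv "$base/PTree" "$backup" || return 1
    fi
    # New, empty directory; sks repopulates it after the next start.
    mkdir -m "$mode" "$base/PTree"
}
```

After `reset_ptree /var/lib/sks`, start sks again and watch recon.log; once the server has caught up, the `PTree.broken.*` copy can be deleted.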

Re: Seemingly corrupted DB, not syncing

Jeffrey Johnson-5
In reply to this post by Gunnar Wolf

> On Aug 5, 2015, at 7:06 PM, Gunnar Wolf <[hidden email]> wrote:
>
> Hi,
>
> I noticed today my keyserver has been failing to sync for several days
> (to say the least – it still reports knowing only about 3920177 keys,
> while I see 4005128 in other servers I supposedly peer with).
>
> Looking at the SKS logs, I see entries such as:
>
> ==> db.log <==
> 2015-08-05 18:01:27 <mail transmit keys> error in callback.: Bdb.DBError("BDB0060 PANIC: fatal region error detected; run recovery")
>
> (but I don't know what recovery it talks about — BDB's?)
>

Recovery is basically
        cd /var/lib/sks/PTree
        dbXYrecover -ev
where XY is often the version of Berkeley DB you are running.

(aside)
running dbrecover automagically on next startup isn’t impossibly hard: it’s just re-opening
the database file with an additional flag dependent on an error condition.

> Or:
>
> ==> recon.log <==
> 2015-08-05 18:02:40 Raising Sys.Break -- PTree may be corrupted: Failure("remove_from_node: attempt to delete non-existant element from prefix tree")
> 2015-08-05 18:02:40 DB closed
>
> So... In your experience, what should be done? Is my best bet to just
> drop my DB and download a set of dumps again?
>

Spend a little time figuring out how to automate the fixup. I have posted a simple
script in the archives (and can repost again) that I run when a machine crashes and
the sks databases need “fixing”.

But recreating from a dump will “work”  as well.

73 de Jeff
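
One way the "automate it" idea might look, as a hedged sketch only: this is not the script referred to above, just an illustration of keying recovery off the panic message quoted at the start of the thread. The function names `needs_recovery` and `recover_if_needed` are hypothetical.

```shell
#!/bin/sh
# Succeed if the log contains Berkeley DB's "run recovery" panic message.
needs_recovery() {
    grep -q 'PANIC: fatal region error detected; run recovery' "$1" 2>/dev/null
}

# Run the given command (for example a db*_recover invocation) only when
# the log shows the panic; otherwise report that nothing was done.
recover_if_needed() {
    log="$1"
    shift
    if needs_recovery "$log"; then
        "$@"
    else
        echo "no recovery needed"
    fi
}
```

It would be called before starting sks, with the log path and the recovery command for your Berkeley DB version as arguments. Note that an old panic line stays in the log after a successful recovery, so a real script would need to rotate the log or only look at recent entries.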