corrupt PTree?


corrupt PTree?

Jonathon Weiss

Whenever I try to start the reconciliation server, it dies:

2009-09-24 15:05:53 Opening log
2009-09-24 15:05:53 sks_recon, SKS version 1.1.0
2009-09-24 15:05:53 Copyright Yaron Minsky 2002-2003
2009-09-24 15:05:53 Licensed under GPL.  See COPYING file for details
2009-09-24 15:05:53 Opening PTree database
2009-09-24 15:05:53 Setting up PTree data structure
2009-09-24 15:05:53 PTree setup complete
2009-09-24 15:05:53 Initiating catchup
2009-09-24 15:06:09 Raising Sys.Break -- PTree may be corrupted: Failure("remove_from_node: attempt to delete non-existant element from prefix tree")
2009-09-24 15:06:09 DB closed


Even after I rebuilt the PTree from scratch (admittedly with the sks
server running the whole time; that should be safe, right?), I
continue to get the same error.  Any thoughts or recommendations?

        Jonathon

        Jonathon Weiss <[hidden email]>
        MIT/IS&T/OIS  Server Operations


_______________________________________________
Sks-devel mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/sks-devel

Re: corrupt PTree?

Jason Harris
On Thu, Sep 24, 2009 at 03:10:02PM -0400, Jonathon Weiss wrote:
>
> Whenever I try to start the reconciliation server, it dies:

> 2009-09-24 15:06:09 Raising Sys.Break -- PTree may be corrupted: Failure("remove_from_node: attempt to delete non-existant element from prefix tree")
> 2009-09-24 15:06:09 DB closed
 
> Even after I rebuilt the PTree from scratch (admittedly with the sks
> server running the whole time (that should be safe, right?)) I
> continue to get the same error.  Any thoughts or recommendations?

This can be made into a nonfatal error, but at the expense of
losing new and updated keys from the pTree, thus breaking gossip
synchronization.  Because the key DBs are separate from the pTree
DB, a transaction can't span both, so the pTree DB cannot be kept
fully synchronized with the key DBs.  The ./KDB/time DB is used as
a workaround.  Unless/until the DBs are consolidated into a single
BDB instance, you'll need to either run "sks db" in read-only mode
(not currently supported) or shut it down during the "sks pbuild".
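
To illustrate why a pbuild that races with a running "sks db" could
end in these failures, here is a toy Python model.  It is not SKS code:
the set-backed StrictTree class, the replay helper, and the hash
strings are all made up for illustration, and the real catchup logic
may differ.  Only the wording of the two Failure messages is taken
from the logs in this thread.

```python
class StrictTree:
    """Toy stand-in for the prefix tree (hypothetical, not SKS code):
    a set that refuses duplicate inserts and deletes of missing elements."""

    def __init__(self, hashes=()):
        self.elements = set(hashes)

    def add(self, h):
        if h in self.elements:
            raise ValueError(
                "add_to_node: attempt to reinsert element into prefix tree")
        self.elements.add(h)

    def remove(self, h):
        if h not in self.elements:
            raise ValueError(
                "remove_from_node: attempt to delete non-existant element "
                "from prefix tree")
        self.elements.remove(h)


def replay(tree, log, since):
    """Apply every (timestamp, op, hash) log entry newer than `since`."""
    for ts, op, h in log:
        if ts > since:
            if op == "add":
                tree.add(h)
            else:
                tree.remove(h)


# Hypothetical update log, ordered by timestamp.
log = [(1.0, "add", "aa"), (2.0, "add", "bb"), (3.0, "remove", "aa")]

# Consistent case: the tree holds exactly the state as of t=1.0,
# so replaying everything after t=1.0 succeeds.
ok = StrictTree({"aa"})
replay(ok, log, since=1.0)

# Skewed case: the snapshot already contains "bb" (written at t=2.0),
# but the recorded catch-up point is still t=1.0, so "bb" is added twice.
skewed = StrictTree({"aa", "bb"})
try:
    replay(skewed, log, since=1.0)
    error = None
except ValueError as exc:
    error = str(exc)
```

In this model the tree and the replay starting point have to describe
the same instant; any skew between them surfaces as one of the two
errors, depending on whether the first mismatched entry is an add or
a remove.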

--
Jason Harris           |  NIC:  JH329, PGP:  This _is_ PGP-signed, isn't it?
[hidden email] _|_ web:  http://keyserver.kjsl.com/~jharris/
          Got photons?   (TM), (C) 2004


Re: corrupt PTree?

Jonathon Weiss

> > Whenever I try to start the reconciliation server, it dies:
>
> > 2009-09-24 15:06:09 Raising Sys.Break -- PTree may be corrupted: Failure("remove_from_node: attempt to delete non-existant element from prefix tree")
> > 2009-09-24 15:06:09 DB closed
>
> > Even after I rebuilt the PTree from scratch (admittedly with the sks
> > server running the whole time (that should be safe, right?)) I
> > continue to get the same error.  Any thoughts or recommendations?
>
> This can be made into a nonfatal error, but at the expense of
> losing new and updated keys from the pTree, thus breaking gossip
> synchronization.  Because the key DBs are separate from the pTree
> DB, the pTree DB cannot be kept fully synchronized with the key DBs
> via transactions that can't span both DBs.  Thus the ./KDB/time DB
> is used as a workaround.  Unless/until the DBs are consolidated into
> a single BDB instance, you'll need to either run "sks db" in read-only
> mode (not currently supported) or shut it down during the "sks pbuild."

Well, I tried shutting down sksd and rebuilding the PTree db, and am
still losing.  The recon server log says:

2009-10-03 04:36:16 Opening log
2009-10-03 04:36:16 sks_recon, SKS version 1.1.0
2009-10-03 04:36:16 Copyright Yaron Minsky 2002-2003
2009-10-03 04:36:16 Licensed under GPL.  See COPYING file for details
2009-10-03 04:36:16 Opening PTree database
2009-10-03 04:36:16 Setting up PTree data structure
2009-10-03 04:36:16 PTree setup complete
2009-10-03 04:36:16 Initiating catchup
2009-10-03 04:36:16 Added 5000 hash-updates. Caught up to 1245856578.239066
2009-10-03 04:36:16 Added 5000 hash-updates. Caught up to 1245856578.424982
2009-10-03 04:36:16 Added 5000 hash-updates. Caught up to 1245856578.608924
2009-10-03 04:36:16 Added 5000 hash-updates. Caught up to 1245856579.369999
2009-10-03 04:36:17 Added 5000 hash-updates. Caught up to 1245856579.570681
2009-10-03 04:36:17 Added 5000 hash-updates. Caught up to 1245856579.757335
2009-10-03 04:36:35 Added 5000 hash-updates. Caught up to 1245856580.592294
2009-10-03 04:36:36 Added 5000 hash-updates. Caught up to 1245856580.785099
2009-10-03 04:36:38 Added 5000 hash-updates. Caught up to 1245856580.973482
2009-10-03 04:36:39 Added 5000 hash-updates. Caught up to 1245856581.867670
2009-10-03 04:36:45 Added 5000 hash-updates. Caught up to 1245856582.054365
2009-10-03 04:36:46 Added 5000 hash-updates. Caught up to 1245856582.240482
2009-10-03 04:36:46 Added 5000 hash-updates. Caught up to 1245856583.526407
2009-10-03 04:36:47 Raising Sys.Break -- PTree may be corrupted: Failure("add_to_node: attempt to reinsert element into prefix tree")
2009-10-03 04:36:47 DB closed
2009-10-05 16:39:10 Opening log
2009-10-05 16:39:10 sks_recon, SKS version 1.1.0
2009-10-05 16:39:10 Copyright Yaron Minsky 2002-2003
2009-10-05 16:39:10 Licensed under GPL.  See COPYING file for details
2009-10-05 16:39:10 Opening PTree database
2009-10-05 16:39:14 Setting up PTree data structure
2009-10-05 16:39:14 PTree setup complete
2009-10-05 16:39:14 Initiating catchup
2009-10-05 16:39:15 Raising Sys.Break -- PTree may be corrupted: Failure("add_to_node: attempt to reinsert element into prefix tree")
2009-10-05 16:39:15 DB closed

Which means it isn't getting very far before finding corruption, even
on a brand new DB.  Any thoughts?  The only things I can even think of
to try are different arguments to "sks pbuild" or running "sks
cleandb" in case that's where the problem really is.
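
One cheap next step is to check what dates the "Caught up to" values
correspond to.  The sketch below assumes those numbers are Unix
timestamps in seconds; that is a guess from their magnitude, not
something confirmed in this thread.  The regular expression and the
catchup_times helper are written only against the log lines quoted
above.

```python
import re
from datetime import datetime, timezone

# Matches the numeric value at the end of a "Caught up to" log line.
CAUGHT_UP = re.compile(r"Caught up to (\d+\.\d+)")


def catchup_times(lines):
    """Return each 'Caught up to' value as a timezone-aware UTC datetime."""
    times = []
    for line in lines:
        match = CAUGHT_UP.search(line)
        if match:
            seconds = float(match.group(1))
            times.append(datetime.fromtimestamp(seconds, tz=timezone.utc))
    return times


# First and last catch-up lines from the 2009-10-03 run above.
sample = [
    "2009-10-03 04:36:16 Added 5000 hash-updates. Caught up to 1245856578.239066",
    "2009-10-03 04:36:46 Added 5000 hash-updates. Caught up to 1245856583.526407",
]

times = catchup_times(sample)
for t in times:
    print(t.isoformat())
```

If the assumption holds, every update replayed in that run carries a
timestamp from 2009-06-24, all within about five seconds of each other,
which may be worth comparing against when the key DB was originally
built.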

        Jonathon

        Jonathon Weiss <[hidden email]>
        MIT/IS&T/OIS  Server Operations




Re: corrupt PTree?

Jonathon Weiss
> Which means it isn't getting very far before finding corruption, even
> on a brand new DB.  Any thoughts?  The only things I can even think of
> to try are different arguments to "sks pbuild" or running "sks
> cleandb" in case that's where the problem really is.

I eventually got around to trying the sks cleandb, but that was
apparently a no-op:

        2009-10-14 18:38:43 Opening log
        2009-10-14 18:38:43 Opening KeyDB database
        2009-10-14 18:38:43 Keydb opened
        2009-10-14 18:38:43 Database already deduped
        2009-10-14 18:38:43 Database already merged


Anyone have any thoughts on this?  Either solutions, or next steps in
debugging that might garner some additional information?

        Jonathon

        Jonathon Weiss <[hidden email]>
        MIT/IS&T/OIS  Server Operations

