Server won't start after copy of DB


Arnold-27
Hello,

My Debian SKS-server was running very well, until this weekend.

I moved the database to an SSD hard disk and since that moment it
immediately exits after it starts.

These are the last lines of my db.log and recon.log, with debug level 100
(same result at 10 or 9):
$ tail -n 3 /var/log/sks/db.log
2010-06-22 22:06:00 Membership: <ADDR_INET
76.185.38.113:11370>(keyserver.gingerbear.net 11370 ), <ADDR_INET
76.191.185.172:11370>(www.mainframe.cx 11370 ), <ADDR_INET
83.169.43.165:11370>(keyserver.ccc-hanau.de 11370 ), <ADDR_INET
195.111.98.30:11370>(keys.niif.hu 11370 ), <ADDR_INET
69.134.24.120:11370>(keys.rpm5.org 11370 ), <ADDR_INET
198.49.244.154:11370>(keys.n3npq.net 11370 ), <ADDR_INET
85.113.243.78:11370>(keyserver.nijkamp.net 11370 ), <ADDR_INET
140.186.70.102:11370>(keys.sugarlabs.org 11370  ), <ADDR_INET
84.16.235.61:11370>(keyserver.pki.scientia.net 11370 ), <ADDR_INET
66.109.111.12:11370>(keyserver.kjsl.org 11370  ), <ADDR_INET
203.33.246.146:11370>(keyserver.oeg.com.au 11370 ), <ADDR_INET
208.84.222.190:11370>(ice.mudshark.org 11370 )
2010-06-22 22:06:00 Opening KeyDB database
2010-06-22 22:06:00 Shutting down database

$ tail  /var/log/sks/recon.log
2010-06-22 22:06:00 sks_recon, SKS version 1.1.0
2010-06-22 22:06:00 Copyright Yaron Minsky 2002-2003
2010-06-22 22:06:00 Licensed under GPL.  See COPYING file for details
2010-06-22 22:06:00 recon port: 11370
2010-06-22 22:06:00 Opening PTree database
2010-06-22 22:06:00 Setting up PTree data structure
2010-06-22 22:06:00 PTree setup complete
2010-06-22 22:06:00 Initiating catchup
2010-06-22 22:06:00 Marshalling: LogQuery: (5000,1276993585.849555)
2010-06-22 22:06:00 DB closed


The system is running
- Debian Stable 'Lenny' with
- Linux 2.6.32-bpo.4-686 (from the back ported archive)
- SKS-server v 1.1.0

The database is now on this partition:
- /dev/sdb13 on /srv type ext4 (rw,nosuid,nodev,user_xattr)

With the change of HD, I also moved from ext3 to ext4, which is better for
an SSD. Ext4 is also the reason to use the newer, back ported Linux kernel
(running this Linux kernel for several weeks already). The mount option
'user_xattr' was already in use for one or two weeks to accommodate
'calendar server' (Apple's open source CalDav server).

Also this weekend several packages were updated. I think I did not reboot
after installing these updates. So the move of the database to the SSD was
the first restart of the SKS-server.

  The following packages will be upgraded:
    bind9 bind9-doc bind9-host bind9utils dnsutils libbind9-50 libdns55
    libisc52 libisccc50 libisccfg50 liblwres50 libsmbclient libwbclient0
    samba samba-common samba-doc smbclient smbfs sudo swat


Here are the log lines (default debug level 4) from the last working run and
the first failing one, after the copy to the new SSD.

2010-06-20 02:26:15 Hashes recovered from <ADDR_INET 83.169.43.165:11371>
2010-06-20 02:26:15     419EEB36F639BD5CB7FB89D684566FBF
2010-06-20 02:26:15     91FB304214BE8D083426E37480435103
2010-06-20 02:26:15     C165660D7A7D0AA0EFA1198BAC1811DD
2010-06-20 02:26:15     D2F3FCAE96B06E766E891F24997F76C4
2010-06-20 02:26:25 Requesting 4 missing keys from <ADDR_INET
83.169.43.165:11371>, starting with 419EEB36F639BD5CB7FB89D684566FBF
2010-06-20 02:26:25 4 keys received
2010-06-20 02:26:26 Added 7 hash-updates. Caught up to 1276993585.849555
2010-06-20 02:26:28 DB closed
2010-06-20 05:10:58 Opening log
2010-06-20 05:10:58 sks_recon, SKS version 1.1.0
2010-06-20 05:10:58 Copyright Yaron Minsky 2002-2003
2010-06-20 05:10:58 Licensed under GPL.  See COPYING file for details
2010-06-20 05:10:58 Opening PTree database
2010-06-20 05:10:58 Setting up PTree data structure
2010-06-20 05:10:58 PTree setup complete
2010-06-20 05:10:58 Initiating catchup
2010-06-20 05:11:00 DB closed

From 2:26 to 5:10, I copied several partitions.


So, does anybody have a clue why the SKS server refuses to run? The log lines
(at debug level 10 or 100) don't help me any further. Are there any known issues
with SKS and ext4, or with the latest Debian updates to bind, samba, etc.? Is
there anything else I can try? (Oh, I copied the /srv/pub/sks directories again,
now with "rsync -xaX" instead of "cp -a", but with the same result...)

Please, help. TIA!

   Arnold


_______________________________________________
Sks-devel mailing list
[hidden email]
http://lists.nongnu.org/mailman/listinfo/sks-devel


Re: Server won't start after copy of DB

C.J. Adams-Collier KF7BMP
Yowza.  There could be many things causing this.  I can't tell you how
to fix this, but I can recommend using kvm or qemu or xen or something
to split off part of your server so that you can do tasks related only
to your sks server on a separate system.

Sorry I can't help more.


Re: Server won't start after copy of DB

C.J. Adams-Collier KF7BMP
In reply to this post by Arnold-27
oh.  I just thought of something: You will likely get this done more
quickly if you grab the snapshot and start over.

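$ sudo dpkg --purge sks
$ sudo rm -rf /var/lib/sks
$ sudo apt-get install sks
$ sudo -u debian-sks -s
$ screen -S fetchdump
$ mkdir -p /var/lib/sks/dump && cd /var/lib/sks/dump
$ for i in {0..113}
do
  wget "ftp://ftp.pramberger.at/services/keyserver/keydump/$(printf 'keydump-sks-%04d.pgp.bz2' "$i")"
done
$ bunzip2 keydump-sks-*.pgp.bz2   # the merge below expects uncompressed *.pgp files
$ cd ..
$ /usr/sbin/sks merge dump/*.pgp  # on an empty installation, 'sks build' may be needed instead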

YMMV.


Re: Server won't start after copy of DB

Arnold-27
In reply to this post by Arnold-27
Hello Peter, C.J.,

On 06/23/2010 10:04 AM, Peter Pramberger wrote:
> Am Mi, 23.06.2010, 00:49, schrieb Arnold:
>> I moved the database to an SSD hard disk and since that moment it
>> immediately exits after it starts.
>
> Silly question, but have you done a "db_recover" in the new location?

Not a silly question! ;-)  I am not so familiar with all the db_xxx tools.
(I did run db_verify, though.) Now I have run db_recover (without options), but
the result is the same. Note that the 'old' server was stopped in a controlled
way; it was not a crash.

> I'm not sure if the region files somehow reference to the original
> database location (or maybe inodes)...

Thanks, it must be the inodes...

Both the old and the new path are /srv/pub/sks/.... (with a sym-link from
/var/lib/sks to that location). So, it's not the path. This raises the
question, how does one restore a database to a new system after a crash?

Now, I mounted the old partition somewhere else and sym-linked /srv/pub/sks
to that new location. SKS is running happily again!!!
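
For the archive, the workaround above could be sketched roughly as follows. This is only an illustration under assumptions: the device name, the mount point and the helper name `point_sks_at_old_disk` are hypothetical (the thread does not give them), the location `pub/sks` on the old partition is inferred from the old /srv/pub/sks path, and the Debian init script /etc/init.d/sks is assumed to be how the server is stopped and started. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: OLD_DEV and OLD_MNT are hypothetical placeholders.
OLD_DEV="${OLD_DEV:-/dev/sdXN}"        # old partition holding the working database
OLD_MNT="${OLD_MNT:-/mnt/olddisk}"     # where the old partition gets mounted
SKS_LINK="${SKS_LINK:-/srv/pub/sks}"   # path that SKS (via /var/lib/sks) expects
DRY_RUN="${DRY_RUN:-1}"                # 1 = print commands, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run as root. Moves the non-working copy aside and links the expected
# path to the database on the old partition.
point_sks_at_old_disk() {
    run /etc/init.d/sks stop
    run mkdir -p "$OLD_MNT"
    run mount "$OLD_DEV" "$OLD_MNT"
    run mv "$SKS_LINK" "$SKS_LINK.copied"
    run ln -s "$OLD_MNT/pub/sks" "$SKS_LINK"
    run /etc/init.d/sks start
}
```

Calling `point_sks_at_old_disk` after sourcing the file prints the six commands; setting DRY_RUN=0 would execute them.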


On 06/23/2010 05:40 PM, C.J. Adams-Collier wrote:
> oh.  I just thought of something: You will likely get this done more
> quickly if you grab the snapshot and start over.

I am afraid you are right... Now that the server is running again, I can
create my own dump instead of downloading it and rebuild from that. This
will be some job for the weekend.

If, in the meantime, anybody comes up with a solution to reuse an existing
database on a new partition with new inodes, I'd be more than happy to learn
about it! :-)

Thanks for the help!

    Arnold



Re: Server won't start after copy of DB

Arnold-27
On 06/24/2010 12:18 AM, Arnold wrote:
> On 06/23/2010 10:04 AM, Peter Pramberger wrote:
>> Am Mi, 23.06.2010, 00:49, schrieb Arnold:
>>> I moved the database to an SSD hard disk and since that moment it
>>> immediately exits after it starts.
>> I'm not sure if the region files somehow reference to the original
>> database location (or maybe inodes)...
> Thanks, it must be the inodes...

> On 06/23/2010 05:40 PM, C.J. Adams-Collier wrote:
>> oh.  I just thought of something: You will likely get this done more
>> quickly if you grab the snapshot and start over.
> I am afraid you are right... Now that the server is running again, I can
> create my own dump instead of downloading it and rebuild from that. This
> will be some job for the weekend.

So, this is what I did. I kept the server running with the database
physically located at the old hard disk. From there I made a full key dump
(totally up to date and much faster than downloading several GB). Then, I
rebuilt (full build) the whole database from scratch on the new hard disk
using the key dump.
Now it is working on the new partition! :-)
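
A rough sketch of the dump-and-rebuild procedure described above, for the archive. Everything specific in it is an assumption rather than what was actually typed: the directory names, the helper names `dump_keys` and `rebuild_db`, the figure of 15000 keys per dump file, and the build options, which follow the values commonly seen in the sks_build.sh script shipped with SKS. It may also be necessary to stop the running server before `sks dump` can open the database. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: paths and option values are assumptions, see the text above.
OLD_BASE="${OLD_BASE:-/mnt/olddisk/pub/sks}"  # hypothetical: working DB on the old disk
NEW_BASE="${NEW_BASE:-/srv/pub/sks}"           # new, still empty SKS base directory
DRY_RUN="${DRY_RUN:-1}"                        # 1 = print commands, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Write a full key dump from the old database into NEW_BASE/dump.
# Run as the SKS user (debian-sks on Debian).
dump_keys() {
    run mkdir -p "$NEW_BASE/dump"
    run cd "$OLD_BASE"
    run sks dump 15000 "$NEW_BASE/dump/"
}

# Build the key database and the PTree from scratch out of that dump.
rebuild_db() {
    run cd "$NEW_BASE"
    run sks build dump/*.pgp -n 10 -cache 100
    run sks cleandb
    run sks pbuild -cache 20 -ptree_cache 70
}
```

The intended order is `dump_keys` first, then `rebuild_db`, and only after that pointing the server at the new directory.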

The question now is, how to do a backup that can be restored to a new disk?

Best regards,
   Arnold



Re: Server won't start after copy of DB

C.J. Adams-Collier KF7BMP
On Sun, 2010-06-27 at 14:44 +0200, Arnold wrote:

> > On 06/23/2010 05:40 PM, C.J. Adams-Collier wrote:
> >> oh.  I just thought of something: You will likely get this done more
> >> quickly if you grab the snapshot and start over.
> > I am afraid you are right... Now that the server is running again, I can
> > create my own dump instead of downloading it and rebuild from that. This
> > will be some job for the weekend.
>
> So, this is what I did. I kept the server running with the database
> physically located at the old hard disk. From there I made a full key dump
> (totally up to date and much faster than downloading several GB). Then, I
> rebuilt (full build) the whole database from scratch on the new hard disk
> using the key dump.
> Now it is working on the new partition! :-)
>
> The question now is, how to do a backup that can be restored to a new disk?
>
> Best regards,
>    Arnold
Isn't that the question you just answered? :)



Re: Server won't start after copy of DB

Jeff Johnson-12

On Jun 27, 2010, at 2:25 PM, C.J. Adams-Collier wrote:

>>
>> The question now is, how to do a backup that can be restored to a new disk?
>>
>> Best regards,
>>   Arnold
>
> Isn't that the question you just answered? :)
>

Well, not quite ...

Procedures for backup of a Berkeley DB (with logs that are path/record sensitive)
are described in the doco @oracle.com. An SKS keyserver database is no different.

73 de Jeff



Backup/restore DB (Re: Server won't start after copy of DB)

Arnold-27
On 06/27/2010 08:28 PM, Jeff Johnson wrote:
> On Jun 27, 2010, at 2:25 PM, C.J. Adams-Collier wrote:
>>> The question now is, how to do a backup that can be restored to a new disk?
>>
>> Isn't that the question you just answered? :)
>
> Well, not quite ...

Indeed, I had my old database in good working order available (I just had to
mount the partition) to do the dump. From that I "restored" (= rebuilt) the
new database.
This is very different from recovery from backup after the database is
destroyed.


> Procedures for backup of a Berkeley DB (with logs that are path/record sensitive)
> are described in the doco @oracle.com. An SKS keyserver database is no different.

OK, I've looked it up and here it is for the archive. The procedures are
described in chapter 5 of the docs:
http://www.oracle.com/technology/documentation/berkeley-db/db/gsg_txn/c/index.html

Important steps:
- Before backing up the files (offline), do a checkpoint (db_checkpoint
command line utility).
- Use catastrophic recovery when you are recovering your databases from a
previously created backup to an empty directory (db_recover command line
utility with the -c option).

Although I did run db_recover, I did not use the '-c' option. It would have
saved me some time...
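
The two steps above might look roughly like this for a single Berkeley DB environment. It is a minimal sketch, not a tested procedure: the helper names `backup_env` and `restore_env` and all paths are made up for illustration, SKS is assumed to be stopped, and on some Debian releases the utilities carry a version in their name instead of plain db_checkpoint and db_recover. SKS keeps more than one environment (the key database and the PTree), so each would need the same treatment. With DRY_RUN=1 (the default here) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: helper names and paths are illustrative assumptions.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands, anything else = run them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Offline backup of one environment directory: checkpoint, then copy.
backup_env() {
    src="$1"; dst="$2"
    run db_checkpoint -1 -h "$src"   # -1: force one checkpoint and exit
    run mkdir -p "$dst"
    run cp -a "$src/." "$dst/"       # copy database and log files
}

# Restore a backup into a directory, then run catastrophic recovery there.
restore_env() {
    bak="$1"; dst="$2"
    run mkdir -p "$dst"
    run cp -a "$bak/." "$dst/"
    run db_recover -c -v -h "$dst"   # -c: catastrophic recovery, -v: verbose
}
```

For example, `backup_env /var/lib/sks/KDB /backup/sks/KDB` followed later by `restore_env /backup/sks/KDB /srv/pub/sks/KDB`, where the directory name KDB and the backup location are again assumptions.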

Hope this helps others! :-)

   Arnold

