Problem with Detection of Multiple rdiff-backup instances


Problem with Detection of Multiple rdiff-backup instances

dean (Bugzilla)
I've come across an issue with the way that rdiff-backup ensures that only one
server is accessing a backup dataset.

When rdiff-backup starts it checks the metadata to see if another instance of
rdiff-backup is performing a backup.  If it finds one then it checks the PID to
see if the other instance is still running and if not, assumes the other
instance has crashed so it regresses the previous incomplete backup.

I am running rdiff-backup on Amazon's cloud computing resource.  Whenever I
want to back up, I start a new virtual server, run rdiff-backup and then shut it
down.

Recently I had a backup fail, probably because of a network outage.  All
subsequent backups refuse to run because rdiff-backup believes the failed rdiff-
backup instance is still running - even though this is clearly impossible
because it is a totally different instance of the virtual server.

This had me stumped for a while but I finally figured out what is happening.

Because I start a new virtual server instance each time and I run the backup
from a script, everything happens in a consistent order.  As a result the
instance of rdiff-backup running on the server for each backup session almost
always has the same PID.  So when a backup fails, the subsequent backup looks
at the metadata, finds the PID of the failed backup and sees that that PID is
still running - not realising that the other instance is actually itself.

I'm not sure of a way of working around this problem as the virtual machine is
always started from a known state and hasn't been running long enough to build
up any entropy to generate unique random numbers between different sessions.


_______________________________________________
rdiff-backup-users mailing list at [hidden email]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki
Re: Problem with Detection of Multiple rdiff-backup instances

Steven Willoughby
Dean Cording wrote:
> I've come across an issue with the way that rdiff-backup ensures that only one
> server is accessing a backup dataset.
...

> Recently I had a backup fail, probably because of a network outage.  All
> subsequent backups refuse to run because rdiff-backup believes the failed rdiff-
> backup instance is still running - even though this is clearly impossible
> because it is a totally different instance of the virtual server.
>
> This had me stumped for a while but I finally figured out what is happening.
>
> Because I start a new virtual server instance each time and I run the backup
> from a script, everything happens in a consistent order.  As a result the
> instance of rdiff-backup running on the server for each backup session almost
> always has the same PID.  So when a backup fails, the subsequent backup looks
> at the metadata, finds the PID of the failed backup and sees that that PID is
> still running - not realising that the other instance is actually itself.

A cursory look at regress.py seems to confirm this behavior:
Specifically in check_pids() it says:

     if pid is not None and pid_running(pid):

This could say:

     if pid is not None and pid != os.getpid() and pid_running(pid):
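
To make the idea concrete, here is a minimal, self-contained sketch of such a check. It is not the actual regress.py code: `other_instance_running` is a hypothetical name, and this `pid_running` is my own POSIX-only stand-in that probes the process with signal 0. Note that the PIDs are compared by value (`==`), because an identity test on integers is not reliable in Python.

```python
import os


def pid_running(pid):
    """Return True if a process with this PID exists on this machine (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def other_instance_running(recorded_pid):
    """True only if the recorded PID is alive and is not this very process."""
    if recorded_pid is None:
        return False
    if recorded_pid == os.getpid():
        # A PID left behind by a crashed run on a freshly started VM can equal our own.
        return False
    return pid_running(recorded_pid)
```

With this shape, a stale PID that happens to match the current process is treated as a crashed run rather than as a live competitor. It does not help if some unrelated process on the new machine holds the recorded PID.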

>
> I'm not sure of a way of working around this problem as the virtual machine is
> always started from a known state and hasn't been running long enough to build
> up any entropy to generate unique random numbers between different sessions.
>

The current time adds a little randomness.  A silly workaround would be
to call the following Perl script before running rdiff-backup:

#!/usr/bin/perl
`/bin/true` for 0..int(rand(100));

This will increase the pid and should stop your job from failing
continuously.

Steven



Re: Problem with Detection of Multiple rdiff-backup instances

Jakob Unterwurzacher-2
Steven Willoughby schrieb:
> #!/usr/bin/perl
> `/bin/true` for 0..int(rand(100));
>
> This will increase the pid and should stop your job from failing
> continuously.
>
> Steven

Exactly.

Same thing in bash:

#!/bin/bash
for i in `seq 1 $((RANDOM%100))`; do /bin/true; done


But you should make sure that $RANDOM assumes different values after a
reboot (same for perl's rand(100), which is likely more advanced and
less likely to suffer from that problem).

Jakob



Re: Problem with Detection of Multiple rdiff-backup instances

Dominic Raferd-2
Jakob Unterwurzacher wrote:

> in bash:
>
> #!/bin/bash
> for i in `seq 1 $((RANDOM%100))`; do /bin/true; done
>
>
> But you should make sure that $RANDOM assumes different values after a
> reboot (same for perl's rand(100), which is likely more advanced and
> less likely to suffer from that problem).
>
> Jakob
>  
This can be done by prefixing your 'for' line with:

RANDOM=$(($(date +%s) % 32768))

This seeds the random number generator with the number of seconds
since 1 January 1970 (the Unix epoch), so is likely to avoid repeats.
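
For anyone who would rather keep the wrapper in Python, the same workaround can be sketched as below. This is only an illustration under the assumption made throughout this thread, namely that the kernel hands out PIDs sequentially, so each short-lived child moves the next PID along by one. The names `pick_count`, `burn_pids` and `main` are made up for this example.

```python
import random
import subprocess
import sys
import time


def pick_count(seed, upper=100):
    """Derive a count in [0, upper) from an explicit seed such as the current time."""
    return random.Random(seed).randrange(upper)


def burn_pids(count):
    """Start and wait for `count` short-lived child processes; each one uses up a PID."""
    for _ in range(count):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return count


def main():
    # Seed from the wall clock so that two boots from the same image differ.
    burn_pids(pick_count(int(time.time())))
```

Call `main()` from the script that launches the backup, before rdiff-backup starts. Seeding explicitly from the clock avoids relying on whatever entropy the fresh VM has, but it only helps if the clock itself differs between sessions.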

Dominic

