State of the rdiff-backup project

State of the rdiff-backup project

Claus-Justus Heine

Hi there,

So, this is not an email from some fancy package maintainer complaining
about a missing upstream contact.

So what is this? I have been using rdiff-backup -- successfully,
including critical restores -- for more than five years now, maybe much
longer; I just don't remember any more.

There are some shortcomings, and some missing distro support (why the
heck is the whole Debian-related world still using v1.2.8?).

One other point is the reason for this email:

- the efficiency of rdiff-backup sometimes just sucks. No flames
intended. And first: oh well, efficiency. But hey: it does its job. It's
slow, but data safety comes first -- at least I have never experienced
any data loss caused by a failing backup.

- still, performance sucks: if a previous backup failed, then the
"regression" regularly takes ages (I am not talking about hours, but
several days for large backup sets)

- if a remote directory is large, meaning it has many entries, say some
multiple of 1e3, then backup speed sucks, especially when the backup set
carries a long history (i.e. one reaching back half a year or so)

AFAIK, these problems are not new. And so what. Still, slightly improved
performance for the necessary "regression" after a failed backup, or
some slight speedup in the case of massively changed backup sets, would
really be nice (actually, the regression case is the thing which really
is a pain in the ass).

My question here: is there any work going on in this direction? I know
that there is no official "movement", but any hints about existing
trials, analyses, experience, or work-arounds would be really
appreciated.

OTOH, I just started to do some hacking. I did some "porting" to PyPy,
which needed a rewrite of the CPython add-on module. Unfortunately, all
this coincides with a larger hardware update of my backup server, so
things are moving slowly. My current impression is that rdiff-backup
does an extremely bad job on directories with many entries and with a
large history. Maybe it calls readdir() just too often (and maybe some
associated sorting algorithm is involved).
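The "many entries" suspicion can be checked independently of
rdiff-backup. Below is a minimal sketch (assuming nothing about
rdiff-backup's actual code paths) comparing one scandir() sweep with
re-listing the directory once per entry:

```python
# Sketch of the "crowded directory" cost: one readdir() sweep versus
# re-listing the directory per entry. Purely illustrative -- this is
# NOT rdiff-backup code, just a guess at a pathological access pattern.
import os
import tempfile
import timeit

with tempfile.TemporaryDirectory() as d:
    # create a directory with "some multiple of 1e3" entries
    for i in range(2000):
        open(os.path.join(d, "f%04d" % i), "w").close()

    def one_pass():
        # a single readdir() sweep over the directory
        return sum(1 for e in os.scandir(d) if e.is_file())

    def per_entry():
        # re-list the whole directory for every membership test: O(n^2)
        names = os.listdir(d)
        return sum(1 for n in names if n in os.listdir(d))

    n_fast, n_slow = one_pass(), per_entry()
    t_fast = timeit.timeit(one_pass, number=1)
    t_slow = timeit.timeit(per_entry, number=1)
    print("one pass: %.3fs, per-entry re-list: %.3fs" % (t_fast, t_slow))
```

If profiling a real backup showed a listdir-per-entry pattern like the
second function, that alone would explain the blow-up on crowded
directories.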

I do not volunteer as maintainer. But is anyone else's experience
available -- even some trials, partial insights, analyzed test cases --
which could shed some light on rdiff-backup's obscure internals?

Some heretical last question: is there a real -- non-profit --
alternative that is as featureful, as solid, and as heavily incremental?
To my knowledge: no. (Once upon a time, when Gentoo killed rdiff-backup,
there were some comments from the Gentoo maintainer, but those somehow
seemed low-featured to me.)

Thanks (not least to the original authors of this nice "beast"), best,

Claus

--
Claus-Justus Heine                      [hidden email]

http://www.claus-justus-heine.de/

_______________________________________________
rdiff-backup-users mailing list at [hidden email]
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Re: State of the rdiff-backup project

Tobias Leupold
> (Once upon a time, when Gentoo killed rdiff-backup, there were some
> comments from the Gentoo maintainer, but those somehow seemed
> low-featured to me.)

Meanwhile, it's back in Gentoo. Probably too many people complained about
rdiff-backup being gone. I'm a Gentoo user and I was really searching for
alternatives. I didn't find any. There's a program called rsnapshot which
looked very promising, easy to use and solid. It also creates incremental
backups, and thanks to hard links, all states are accessible as-is. Sadly
(of course, due to the hard link concept), it can't increment changed
files the way rdiff-backup does. So if you back up large files, a full
new copy will be created as soon as a file changes.

I would really love to see rdiff-backup's development going on. It's the
backup tool I have been using forever, and it's really fine.

Probably, a first step would be to port it to Python 3? Just saying ...

Cheers, Tobias


Re: State of the rdiff-backup project

Claus-Justus Heine

Am 13.08.2015 um 21:06 schrieb Tobias Leupold:
>> (Once upon a time, when Gentoo killed rdiff-backup, there were some
>> comments from the Gentoo maintainer, but those somehow seemed
>> low-featured to me.)
>
> Meanwhile, it's back in Gentoo. Probably too many people complained
> about rdiff-backup being gone. I'm a Gentoo user and I was really
> searching for alternatives. I didn't find any. There's a program called
> rsnapshot which looked very promising, easy to use and solid. It also
> creates incremental backups, and thanks to hard links, all states are
> accessible as-is. Sadly (of course, due to the hard link concept), it
> can't increment changed files the way rdiff-backup does. So if you back
> up large files, a full new copy will be created as soon as a file
> changes.
>
> I would really love to see rdiff-backup's development going on. It's
> the backup tool I have been using forever, and it's really fine.
>
> Probably, a first step would be to port it to Python 3? Just saying ...
>

Yeah. I looked at it. But there are also sayings like: Python is just
fine, except for the bug of Python 3 (that it exists at all).

Not going that far myself, I'm really no Python guy. But: I really would
like to have the "regression takes ages" and "crowded directories take
ages" issues resolved. OR OR OR: I would like to understand the issue. I
cannot claim that I will be able to fix it. But I would very much like
to understand what the heck is going on there. It just doesn't feel
sane.

Very kind thanks for the feed-back,

Claus


--
Claus-Justus Heine                      [hidden email]

http://www.claus-justus-heine.de/


Re: State of the rdiff-backup project

Robert Nichols-2
On 08/13/2015 02:16 PM, Claus-Justus Heine wrote:
>  I'm really no python guy. But: I really
> would like to have the "regression takes ages" and "crowded directories
> take ages" being resolved. OR OR OR: would like to understand this
> issue. Cannot claim that I will be able to fix it. But I would very much
> like to understand what the heck is going on there. Just doesn't feel
> sane.

I doubt that the "regression takes ages" problem can be fixed within
rdiff-backup. It's inherently a complex operation that requires
searching throughout the archive for things that aren't consistent with
the previous state. Remember that you can't trust that the latest
metadata files are consistent with the current state of the mirror and
increments. In large part it's due to the use of the filesystem as a
database, with bits of information scattered in file names in the
increments directory and various metadata files. You're not going to
change that without a major rewrite.
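The "filesystem as a database" point can be made concrete with a small
parser. In an increments directory, each file encodes its base name,
timestamp and type in the file name itself. The suffix set below is my
assumption about a typical rdiff-backup-data layout, not taken from the
source:

```python
# Illustration only: split increment file names of the hypothetical form
# report.txt.2015-08-13T20:43:53+02:00.diff.gz into their parts.
import re

INC_RE = re.compile(
    r"^(?P<base>.+)\."
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}[-+]\d{2}:\d{2})\."
    r"(?P<type>diff\.gz|snapshot\.gz|missing|dir)$"
)

def parse_increment(name):
    """Return (base, timestamp, type) for an increment name, else None."""
    m = INC_RE.match(name)
    return m and (m.group("base"), m.group("ts"), m.group("type"))

print(parse_increment("report.txt.2015-08-13T20:43:53+02:00.diff.gz"))
```

Recovering a consistent state after a crash means re-deriving history
from thousands of such names plus the metadata files, which is exactly
why regression is expensive.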

I suppose one solution to the regression issue is to store the archive
in a filesystem or LVM volume that supports snapshots.  Rather than let
rdiff-backup do the regression, stop it and restore the snapshot. I
suspect the penalty in space (transient, until the snapshot is deleted)
and performance for the backup would be serious. And that still leaves
the issue of regressing more than the last level.
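That snapshot idea can be sketched in miniature. Here a plain directory
copy stands in for a real (cheap, copy-on-write) LVM or btrfs snapshot;
the repository layout and the simulated failure are made up for
illustration:

```python
# Toy model of "snapshot the repo, roll back instead of regressing".
# shutil.copytree stands in for a real snapshot; paths are fabricated.
import os
import shutil
import tempfile

def backup_with_snapshot(repo, run_backup):
    snap = repo + ".snap"
    shutil.copytree(repo, snap)      # "take snapshot" before the session
    try:
        run_backup(repo)
    except Exception:
        shutil.rmtree(repo)          # failed session: discard the mess
        os.rename(snap, repo)        # "restore snapshot" -- no regression
        return False
    shutil.rmtree(snap)              # success: drop the snapshot
    return True

root = tempfile.mkdtemp()
repo = os.path.join(root, "repo")
os.makedirs(repo)
with open(os.path.join(repo, "mirror_marker"), "w") as f:
    f.write("state-1")

def failing_backup(path):
    with open(os.path.join(path, "half-written"), "w") as f:
        f.write("junk")
    raise IOError("backup session interrupted")

ok = backup_with_snapshot(repo, failing_backup)
print(ok, sorted(os.listdir(repo)))  # False ['mirror_marker']
```

With a real snapshot the "take snapshot" step would be nearly free, but
as noted this still only covers rolling back the most recent session.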

Like you, I'm no Python guy. Every time I try to study it, I end up as a
lump in a snake's belly. I think it's because there are some things
about the language that I hate (starting with the use of whitespace as a
syntax element) and the incompatibility of major versions. And then
there is the tendency of Python programmers to believe that stack
backtraces are an acceptable substitute for meaningful error messages.
It all leaves a bad taste that I just can't get around.

--
Bob Nichols     "NOSPAM" is really part of my email address.
                 Do NOT delete it.



Re: State of the rdiff-backup project

Tobias Leupold
> You're not going to change that without a major rewrite.

> Like you, I'm no Python guy.

I like Python. And I like rdiff-backup. But perhaps some skilled
programmer has the energy to take the questionable state of rdiff-backup
as a "chance": carrying on the concept of rdiff-backup while rewriting a
similar program from scratch, e.g. in C++. Perhaps with some metadata
database to solve the regression issue (which really sucks). And so on.

Not that I would be remotely skilled enough in C++ to do so ... I'm only
thinking about what could be done ...


Re: State of the rdiff-backup project

Claus-Justus Heine

Am 14.08.2015 um 19:30 schrieb Tobias Leupold:
>> You're not going to change that without a major rewrite.
>
>> Like you, I'm no Python guy.

I do not have any serious objections against Python. It's widely used,
and the limiting factor is I/O efficiency, not the question of the
fastest for-loop ever possible.

Concerning a "major rewrite": for me this is out of scope. I was rather
thinking about a code review. I first want to figure out whether there
are places which can be optimized significantly, e.g. excessive
readdir() calls or something like that, or any other duplicate file I/O
which doesn't hurt on small backup sets. Also, before a major rewrite
could be tackled at all, one would first have to understand what
rdiff-backup is doing beneath its skin. Otherwise it would not be clear
what the precise goal of such a rewrite should be. OK: to produce
something not worse than rdiff-backup. But to clarify this, you first
have to understand the existing program.

Cheers,

Claus


--
Claus-Justus Heine                      [hidden email]

http://www.claus-justus-heine.de/


Re: State of the rdiff-backup project

Tobias Leupold
> But to clarify this, you first have to understand the existing program.

Of course! And speaking of me, I don't ... I was just thinking about the whole
situation.


Re: State of the rdiff-backup project

Dominic Raferd-3
In reply to this post by Tobias Leupold
My 2p...

Rdiff-backup has limitations and it would be *really* good if someone
who understood Python could step up and maintain the code, but I don't
see the problem with regressions. Yes, they are slow, but they should be
emergency operations reserved for rare circumstances. If you are
frequently experiencing broken backup sessions then I think you should
look at why that is happening. We only use rdiff-backup within our LAN
and backups always complete. I have only needed regressions when we have
inadvertently backed up a lot of extraneous data, when I use them to
overcome bloating of the repository. For offsite backup (of the entire
set of repositories) we use rsync. I don't find backup speeds with
rdiff-backup particularly slow BTW, but we run the backups at a quiet
time when speed is not critical.

v1.2.8 is the official stable version, that's why Debian uses it. v1.3.3
is officially 'unstable' although I haven't heard of any problems with
it (and haven't used it).

I keep repositories on ext4 - I tried btrfs but found it very slow (on
our vm) and the big wins that I sought (compression, deduplication) made
it slower still - and btrfs deduplication is still buggy.

Dominic
http://www.timedicer.co.uk


Re: State of the rdiff-backup project

Claus-Justus Heine
In reply to this post by Tobias Leupold

Am 14.08.2015 um 20:50 schrieb Tobias Leupold:
>> But to clarify this, you first have to understand the existing
>> program.
>
> Of course! And speaking of me, I don't ... I was just thinking about
> the whole situation.

I tried to give it a go a couple of years ago, and gave up for lack of
time. And I do not know how things will go this time. We will see.

Cheers,

Claus



--
Claus-Justus Heine                      [hidden email]

http://www.claus-justus-heine.de/


Re: State of the rdiff-backup project

Claus-Justus Heine
In reply to this post by Dominic Raferd-3

Am 14.08.2015 um 20:53 schrieb Dominic Raferd:
> My 2p...
>
> Rdiff-backup has limitations and it would be *really* good if someone
> who understood Python could step up and maintain the code, but I don't
> see the problem with regressions. Yes, they are slow, but they should
> be emergency operations reserved for rare circumstances. If you are
> frequently experiencing broken backup sessions then I think you should
> look at why that is happening. We only use rdiff-backup within our LAN
> and backups always complete. I have only needed regressions when we
> have inadvertently backed up a lot of extraneous data, when I use them
> to overcome bloating of the repository. For offsite backup (of the
> entire set of repositories) we use rsync. I don't find backup speeds
> with rdiff-backup particularly slow BTW, but we run the backups at a
> quiet time when speed is not critical.

Well, I'm actually in the process of figuring things out on my side.
However, I experience slowness at times. Also: even if regressions are
extraordinary events, it still does not feel right that they take so
long. At least I want to try to understand what is going on there.

>
> v1.2.8 is the official stable version, that's why Debian uses it. v1.3.3
> is officially 'unstable' although I haven't heard of any problems with
> it (and haven't used it).

Silly me. You are right. Maybe I should volunteer as maintainer and as a
first act copy v1.3.3 to v1.4.0 and declare it stable ;)

> I keep repositories on ext4 - I tried btrfs but found it very slow (on
> our vm) and the big wins that I sought (compression, deduplication) made
> it slower still - and btrfs deduplication is still buggy.

For my "slowness" experience this may very well be the problem; at the
moment I'm running btrfs. However, as I'm right now replacing the backup
hardware, this would be the time to change something. We will see.

Many thanks for your feed-back,

Claus


--
Claus-Justus Heine                      [hidden email]

http://www.claus-justus-heine.de/


Re: State of the rdiff-backup project

Claus-Justus Heine
In reply to this post by Robert Nichols-2

Am 14.08.2015 um 19:12 schrieb Robert Nichols:
> On 08/13/2015 02:16 PM, Claus-Justus Heine wrote:
> increments.  In large part it's due to use of the filesystem as a
> database, with bits of information scattered in file names in the
> increments directory and various metadata files. You're not going to
> change that without a major rewrite.

Mmmh. Still I think it would be a good idea to first understand what's
going on there.

> I suppose one solution to the regression issue is to store the archive
> in a filesystem or LVM volume that supports snapshots. Rather than let
> rdiff-backup do the regression, stop it and restore the snapshot. I
> suspect the penalty in space (transient, until the snapshot is deleted)
> and performance for the backup would be serious. And that still leaves
> the issue of regressing more than the last level.

Well, that would be a nice solution. But I do not understand this "more
than the last level". If we are only talking about a failed backup, then
one level is the worst that can happen: the last backup succeeded, this
backup fails, discard the file-system snapshot, fix the underlying error
and retry.

> Like you, I'm no Python guy. Every time I try to study it, I end up as
> a lump in a snake's belly. I think it's because there are some things
> about the language that I hate (starting with the use of whitespace as
> a syntax element) and the incompatibility of major versions. And then
> there is the tendency of Python programmers to believe that stack
> backtraces are an acceptable substitute for meaningful error messages.
> It all leaves a bad taste that I just can't get around.

OK, so: for myself, my largest programming experience is in C and C++,
more or less. Still: Python is widely used, it seems to be convenient
for beginners, and rdiff-backup is written in Python, so there is no
need to start something new. Seemingly, there is no real pressure to
migrate to v3 ...

Concerning the Python language, personally I rather miss real access
control (private, protected, etc.). However, so what. C++ is far from
perfect itself and still carries tons of historical ballast.

Many thanks for the feedback, best,

Claus

--
Claus-Justus Heine                      [hidden email]

http://www.claus-justus-heine.de/


Re: State of the rdiff-backup project

Janne Peltonen-2
In reply to this post by Claus-Justus Heine
Hi!

On Thu, Aug 13, 2015 at 08:43:53PM +0200, Claus-Justus Heine wrote:
> - - still, performance sucks: if a previous backup failed, then the
> "regression" regularly takes ages (I am not talking about hours, but
> several days for large backup sets)

How large are your backup sets? Mine (on ext4 on LUKS on an mdadm RAID1
mirror on a pair of USB3 disks) recover from a failed backup in well
under an hour (it actually took a lot less than an hour even when I had
them on ext3 on an mdadm RAID1 mirror on a pair of USB2 disks). But they
are small by modern standards; the current mirrors are less than 100 GB
each.

Would it be possible to simply split extremely large directory trees
into their subtree components? Or is the problem that your backup sets
just have horribly large directories?

Another point: if the backups succeed, my scripts rsync all the backups to
another file system, so that if the main backup fails, I can just rsync
everything back from the "metabackup" (since the backup failed, there will be a
consistent previous version there). Maybe slightly faster than rdiff-backup
failure recovery. Or I can just swap the roles of backup and metabackup (by
remounting the block devices differently). Disk space is cheap. :)

(I'm not actually answering your questions, just offering my 2 cents... ;) )


BR
--
Janne Peltonen <[hidden email]> PGP Key ID: 0x9CFAC88B


Re: State of the rdiff-backup project

Frank Crawford
In reply to this post by Claus-Justus Heine
On Fri, 2015-08-14 at 21:00 +0200, Claus-Justus Heine wrote:

> Am 14.08.2015 um 20:53 schrieb Dominic Raferd:
> > My 2p...
> >
> > Rdiff-backup has limitations and it would be *really* good if someone
> > who understood Python could step up and maintain the code, but I
> > don't see the problem with regressions. Yes, they are slow, but they
> > should be emergency operations reserved for rare circumstances. If
> > you are frequently experiencing broken backup sessions then I think
> > you should look at why that is happening. We only use rdiff-backup
> > within our LAN and backups always complete. I have only needed
> > regressions when we have inadvertently backed up a lot of extraneous
> > data, when I use them to overcome bloating of the repository. For
> > offsite backup (of the entire set of repositories) we use rsync. I
> > don't find backup speeds with rdiff-backup particularly slow BTW, but
> > we run the backups at a quiet time when speed is not critical.
>
> Well, I'm actually in the process of figuring things out on my side.
> However, I experience slowness at times. Also: even if regressions are
> extraordinary events, it still does not feel right that they take so
> long. At least I want to try to understand what is going on there.
As said elsewhere, the slow regression may be due to the I/O that
rdiff-backup has to go through.

A different but related issue I just hit: a massive change to the
metadata of all files (resetting selinux contexts across the system)
caused the backup time to go from around 1 hour to over 10 hours, with
no indication of what the problem was. The actual backup size wasn't
much, but having to read and compare all the metadata just sucked, and
it wasn't until later that I realised why. Of course, it was also a
once-off issue.

> >
> > v1.2.8 is the official stable version, that's why Debian uses it.
> > v1.3.3 is officially 'unstable' although I haven't heard of any
> > problems with it (and haven't used it).
>
> Silly me. You are right. Maybe I should volunteer as maintainer and as
> a first act copy v1.3.3 to v1.4.0 and declare it stable ;)
That would be good, but it would also help to gather all the patches
that various distributions etc. have applied. I'm pretty sure that the
recent one to handle the upgrade to librsync isn't in the official repo,
but everyone needs it to work.

Also, given the slow pace of change, python3 will come around to bite
us soon enough.  It may be a few years before leading distributions
drop python2 entirely, but it will happen and then rdiff-backup has to
either adapt or be dropped.

> > I keep repositories on ext4 - I tried btrfs but found it very slow
> > (on our vm) and the big wins that I sought (compression,
> > deduplication) made it slower still - and btrfs deduplication is
> > still buggy.
>
> For my "slowness" experience this may very well be the problem; at the
> moment I'm running btrfs. However, as I'm right now replacing the
> backup hardware, this would be the time to change something. We will
> see.
Even these sorts of comments need to be tracked, so people realise that
it isn't always rdiff-backup that is the issue. The underlying system
can have big effects.

> Many thanks for your feed-back,
>
> Claus

Regards
Frank

Re: State of the rdiff-backup project

Yves Martin
  Hello,

I just read "regress.py" and I am concerned by this comment:
<<
Currently this does not recover hard links.  This may make the
regressed directory take up more disk space, but hard links can still
be recovered.
>>

As I use rdiff-backup for my own laptop backups, it "often" fails and
regresses, for many reasons (no more disk space, machine went into a
sleep state, ...).

So do I understand correctly that there is a potential disk space leak
in my repository after multiple regressions leaving "non-recovered hard
links"?

If so, how should I evaluate this disk space loss and possibly recover
it "by hand"?

Regards
--
Yves Martin




Re: State of the rdiff-backup project

Frank Crawford
Yves,

I don't think it is a major issue for you, unless you are regularly linking and unlinking files.  While I haven't studied the code, what I believe it is talking about is that when an archive is first made, it will duplicate hard links if they exist in the source, unless --no-hard-links is specified.

However, during regression, if the change was the deletion of one of the linked files, it will not relink it, but will create the file as a separate file.  This is not surprising, as finding related linked files is a very hard problem, involving searching the entire archive each time.

However, these days, hard links are not that common, as most people prefer symbolic links.

Also, ultimately, when you finally expire old archives, both copies of the files will be removed, so it is not that you will lose space long-term.

If you do want to see how common links are on your computer you can run something like:

find / -type f -links +1 -ls

and then try it on the areas that you usually have changes in, e.g. /home.
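Building on that one-liner, here is a rough sketch (my own, assuming GNU findutils/coreutils; the function name and example paths are placeholders) that summarises how many multiply-linked inodes a tree contains and how much data they reference. Comparing the numbers for the source tree and the repository mirror gives a crude hint of how many links a regression failed to recover:

```shell
# Sketch, not part of rdiff-backup: summarise hard-link usage in a tree.
# Counting per inode (via sort -u) avoids counting a file once per link.
count_hardlinks() {
    find "$1" -xdev -type f -links +1 -printf '%i %s\n' | sort -u |
        awk '{ n++; bytes += $2 }
             END { printf "%d multiply-linked inodes, %d bytes\n", n, bytes }'
}

# Example (paths are placeholders):
# count_hardlinks /home
# count_hardlinks /srv/backups/mirror
```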

Regards
Frank

On Sat, 2015-08-15 at 10:30 +0200, Yves Martin wrote:
> Hello,
>
> I just read "regress.py" and I am concerned by this comment:
> <<
> Currently this does not recover hard links.  This may make the
> regressed directory take up more disk space, but hard links can still
> be recovered.
> >>
>
> As I use rdiff-backup for my own laptop backups, it "often" fails and
> regress for many reasons (no more disk space, got into sleep state...)
>
> So do I understand well that there is a potential disk space leak in my
> repository after multiple regressions leaving "non recovered hard
> links" ?
>
> If so, how should I evaluate this disk space loss and eventually recover
> it "by hand" ?
>
> Regards

Re: State of the rdiff-backup project

Robert Nichols-2
On 08/15/2015 05:17 AM, Frank Crawford wrote:
> However, these days, hard links are not that common, as most people
> prefer symbolic links.

You might want to look in the directory trees under /usr/share/zoneinfo
and /var/lib/yum/yumdb and reconsider that statement. Files in that
latter tree can have over 1000 hard links, and there are changes there
whenever you install/update/uninstall packages. rdiff-backup does a
pretty horrible job of keeping the archive in sync with the source
when there are changes, and regressing a backup makes it worse. I have
a very complex audit that I run after every backup to put the archive
back in sync with the source tree.
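As a rough illustration only (this is my own minimal sketch, not the audit script mentioned above; it assumes GNU coreutils and filenames without newlines): byte-identical regular files in a mirror can be merged back into hard links after a regression has split them. Test it on a copy of the repository first.

```shell
# Sketch: merge byte-identical files under a directory into hard links.
# md5sum output is sorted so duplicate checksums become adjacent lines;
# checksum collisions are ignored here.
merge_duplicates() {
    find "$1" -type f -exec md5sum {} + | sort |
    while read -r sum file; do
        if [ "$sum" = "$prev_sum" ]; then
            # skip pairs that are already the same inode ([ -ef ])
            [ "$prev_file" -ef "$file" ] || ln -f "$prev_file" "$file"
        else
            prev_sum=$sum
            prev_file=$file
        fi
    done
}

# Example (placeholder path; try a copy of the mirror first):
# merge_duplicates /srv/backups/mirror
```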

--
Bob Nichols     "NOSPAM" is really part of my email address.
                 Do NOT delete it.



Re: State of the rdiff-backup project

Frank Crawford
On Sat, 2015-08-15 at 09:12 -0500, Robert Nichols wrote:

> On 08/15/2015 05:17 AM, Frank Crawford wrote:
> > However, these days, hard links are not that common, as most people
> > prefer symbolic links.
>
> You might want to look in the directory trees under /usr/share/zoneinfo
> and /var/lib/yum/yumdb and reconsider that statement. Files in that
> latter tree can have over 1000 hard links, and there are changes there
> whenever you install/update/uninstall packages. rdiff-backup does a
> pretty horrible job of keeping the archive in sync with the source
> when there are changes, and regressing a backup makes it worse. I have
> an very complex audit that I run after every backup to put the archive
> back in sync with the source tree.

You are quite likely right, although I doubt that you will have much
activity going on under /usr/share/zoneinfo.

However, /var/lib/yum/yumdb is a slightly different issue.  A quick
scan on my system shows that there are quite a few hard links, but how
often you do updates will determine how much changes there.  Also, the
files are usually smaller than a single block, so, yes, you will chew
up space, but not a lot.  Personally, if I were concerned about space,
I'd exclude /var/lib/yum/yumdb anyway, and rebuild it if I ever did a
full restore.
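A hedged example of that suggestion (the destination host and paths are placeholders; --exclude is a standard rdiff-backup option): skip the yum database during backup and rebuild it after a full restore.

```shell
# Exclude the yum database; its many small hard-linked files churn on
# every package operation and can be regenerated after a restore.
rdiff-backup --exclude /var/lib/yum/yumdb / backuphost::/srv/backups/root
```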

Regards
Frank


Re: State of the rdiff-backup project

Marc Haber-36
In reply to this post by Claus-Justus Heine
On Thu, Aug 13, 2015 at 08:43:53PM +0200, Claus-Justus Heine wrote:
> There are some shortcomings, some missing distro support (why the heck
> is using all of the Debian related world still V1.2.8? Hey)

The debian package tracker refers to http://rdiff-backup.nongnu.org/
as the upstream home page, which lists 1.2.8 as current stable release
and 1.3.3, from 2009, as development/unstable version.

There is no bug report advising about a new project homepage or about
a new stable release in the Debian bug tracker system.

Generally, if a project changes its home page, it's a good idea to at
least have the old address point to the new one so that people pick up
the new development. In the case of rdiff-backup, neither of these
happened.

Greetings
Marc, who stopped using rdiff-backup after a new upstream version broke
backwards compatibility with existing history and the proposed solution
was to throw away the history and start over from scratch

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany    |  lose things."    Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421
