Activity

Felix Rios
Hi,

I'm wondering whether anyone is developing rdiff-backup at the moment. I have
not seen much activity in the CVS repository. I'm asking because I'm
thinking of using it on a larger scale and want to know whether there is any
activity going on.

/

Felix




_______________________________________________
rdiff-backup-users mailing list at [hidden email]
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Re: Activity

D. Kriesel
Felix,

> I'm wondering whether anyone is developing rdiff-backup at the moment.

As I see it, you are asking the crucial question concerning rdiff-backup.

There has not been much development activity on rdiff-backup
recently, and in addition there are some fatal bugs in rdiff, causing
f*cked-up repositories, especially when backing up to Windows-hosted
targets.

Some time ago I mailed the current maintainer (Andrew Ferguson, I
believe) about the maintenance state of rdiff-backup, but I got no
answer.

> I'm asking because I'm
> thinking of using it on a larger scale and want to know whether there is any
> activity going on.

I would also like to use it on a larger scale, as it is, to the best of my
knowledge, the only free and flexible 4D backup solution. However, I
have found that there are currently some caveats, which I want to state here
(along with some thoughts on what would make rdiff-backup more
useful).
  * The repository format. When recovering older files, rdiff-backup
really needs every single reverse delta, which is not only slow but
also extremely fragile (if only one of those files is corrupted,
recovery will fail). A solution might be some additional,
larger-granularity reverse deltas that help speed up recovery as well
as preserving the integrity of "most of the timeline" even if some deltas
are corrupted.
  * Missing operators on an existing repository. To use rdiff-backup
on a larger scale it should be possible to, e.g.,
    - merge time steps
    - delete time steps and correct deltas accordingly
    - remove subtrees (sometimes one backs up large data sets by accident)
    - lots more.
  * Some bugs, especially concerning operating system independence. For example,
even though the issue was investigated some time ago, it's still difficult to
use Windows machines as targets due to the "read-only attribute on
folders" problem. Multiple users report mismatching hashes, and so on.
  * Maybe a dedicated network protocol would be nice (inspired by rsync),
but I think this is less important.

Overall, I am unsure whether it is more appropriate
  - to learn from the experience the great rdiff-backup project gave us and use
the base operators from rdiff-backup to maybe rewrite a whole new thing
with the above issues fixed (especially with a less fragile repository)
or
  - to continue fixing bugs in small steps in a project that seems
unmaintained (unfortunately, I lack the pro-grade Python skills needed
to do it right).

I hope to start a discussion here on these thoughts, please contribute :)

Cheers
David.





Re: Activity

covici
In reply to this post by Felix Rios
I don't think so, yet what alternative is there which does the same job
as well -- particularly between two linux systems or to a hard drive on
the same system?

Felix Rios <[hidden email]> wrote:

> Hi,
>
> I'm wondering whether anyone is developing rdiff-backup at the moment. I have
> not seen much activity in the CVS repository. I'm asking because I'm
> thinking of using it on a larger scale and want to know whether there is any
> activity going on.
>
> /
>
> Felix

--
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici
         [hidden email]


Re: Activity

D. Kriesel
In reply to this post by D. Kriesel
A note I forgot in my first mail: I recently switched back to a
python-controlled rsync backup because of the issues mentioned in my first
mail ...




Re: Activity

Dominic Raferd-3
In reply to this post by D. Kriesel
David,

On 01/08/2011 10:14, [hidden email] wrote:

> Felix,
>
>> I'm wondering whether anyone is developing rdiff-backup at the moment.
> As I see it, you are asking the crucial question concerning rdiff-backup.
>
> There has not been much development activity on rdiff-backup
> recently, and in addition there are some fatal bugs in rdiff, causing
> f*cked-up repositories, especially when backing up to Windows-hosted
> targets.
>
> Some time ago I mailed the current maintainer (Andrew Ferguson, I
> believe) about the maintenance state of rdiff-backup, but I got no
> answer.

Yes I think rdiff-backup is currently unmaintained. Anyone who wants to
take it forward (and has the skills to do so, which unfortunately I have
not) might need to make a fork (which in due course could become
rdiff-backup2?)

>
>> I'm asking because I'm
>> thinking of using it on a larger scale and want to know whether there is any
>> activity going on.
> I would also like to use it on a larger scale, as it is, to the best of my
> knowledge, the only free and flexible 4D backup solution. However, I
> have found that there are currently some caveats, which I want to state here
> (along with some thoughts on what would make rdiff-backup more
> useful).
>    * The repository format. When recovering older files, rdiff-backup
> really needs every single reverse delta, which is not only slow but
> also extremely fragile (if only one of those files is corrupted,
> recovery will fail). A solution might be some additional,
> larger-granularity reverse deltas that help speed up recovery as well
> as preserving the integrity of "most of the timeline" even if some deltas
> are corrupted.

Using multiple delta files is slow if you are regressing back
through many previous backup runs (which is very rare in practice,
though of course very valuable when you need it). Even so, I don't see how
creating larger-granularity reverse-deltas would really make it more
robust; it would just make the archives bigger. Under normal circumstances I
expect a reverse-delta file covering 10 backups would be not much less
than 10x the size of each separate reverse-delta file. (It is different
if files have been backed up accidentally once and then removed from the
archive; accidents like this can certainly bloat an rdiff-backup
repository.) Although a single damaged reverse-delta file will 'break'
backup recovery, this only applies to backups earlier than the date
of the reverse-delta file.
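
To make the last point concrete, here is a minimal sketch of a reverse-delta chain. It is a toy model, not rdiff-backup's actual on-disk format: a "delta" here is simply a dict of older file contents, and the names `restore`, `restorable_steps` and `ABSENT` are made up for illustration.

```python
# Toy model of a reverse-delta repository: a full current mirror plus one
# reverse delta per earlier backup. Real rdiff-backup stores binary deltas
# per file; the dict-based deltas below are an assumption for illustration.

ABSENT = object()  # marks "this file did not exist in the older backup"


def restore(mirror, reverse_deltas, steps_back):
    """Return the state `steps_back` backups before the current mirror.

    reverse_deltas[0] turns the mirror into the previous backup,
    reverse_deltas[1] turns that into the one before it, and so on.
    A delta of None stands for a corrupted increment file.
    """
    state = dict(mirror)
    for delta in reverse_deltas[:steps_back]:
        if delta is None:
            raise ValueError("corrupted reverse delta, cannot go further back")
        for name, old_content in delta.items():
            if old_content is ABSENT:
                state.pop(name, None)
            else:
                state[name] = old_content
    return state


def restorable_steps(reverse_deltas):
    """Count how many steps back can be restored before hitting damage."""
    for index, delta in enumerate(reverse_deltas):
        if delta is None:
            return index
    return len(reverse_deltas)


mirror = {"a.txt": "v4", "b.txt": "new"}
deltas = [{"a.txt": "v3"}, None, {"a.txt": "v1", "b.txt": ABSENT}]

print(restore(mirror, deltas, 0))  # the current mirror needs no deltas
print(restore(mirror, deltas, 1))  # one step back is unaffected by the damage
print(restorable_steps(deltas))    # everything older than the damage is lost
```

In this model the damaged second delta leaves the mirror and the most recent increment usable, while every state behind it becomes unreachable.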

>    * Missing operators on an existing repository. To use rdiff-backup
> on a larger scale it should be possible to, e.g.,
>      - merge time steps
>      - delete time steps and correct deltas accordingly
>      - remove subtrees (sometimes one backs up large data sets by accident)
>      - lots more.

Yes, these would be helpful, especially to correct backup mistakes, which
can permanently bloat a repository.

>    * Some bugs, especially concerning operating system independence. For example,
> even though the issue was investigated some time ago, it's still difficult to
> use Windows machines as targets due to the "read-only attribute on
> folders" problem. Multiple users report mismatching hashes, and so on.

Yes, the best advice re a Windows target seems to be: don't. I think you
can reliably use rdiff-backup.exe to back up Windows data to a Linux
target, though.

>    * Maybe a dedicated network protocol would be nice (inspired by rsync),
> but I think this is less important.

and I would add:

  * Ability to run a thorough verification of an rdiff-backup archive.
    The current verification process is flawed, as has been discussed in
    earlier threads here. The best strategy at the moment is to run a
    verification for a date at or earlier than the earliest backup run
    date, and then to run one or two verifications for dates between the
    earliest date and the current date. Although this provides 'high
    confidence' about the integrity of the overall archive, it does not,
    at least from a theoretical point of view, guarantee that the full
    history of all files, whether currently present or deleted, can be
    recovered. The only way to get this at present is to run a separate
    verification for every previous backup run, which is not realistic
    for a long-standing repository.
  * Add a switch to enable 'forced' regression of an archive. At present
    rdiff-backup will only regress an archive that it considers to be
    broken. (However, you can work around this limitation.)
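
For a sense of what "a separate verification for every previous backup run" involves, here is a small sketch that only builds the command lines and does not run them. It assumes the `--verify-at-time` option and the "nB" session time format as described in the rdiff-backup man page; the repository path and the helper name `verify_commands` are hypothetical.

```python
import shlex


def verify_commands(repo, session_count):
    """Build one verification command per backup session (hypothetical helper).

    Assumes the "nB" time format from the rdiff-backup man page, where
    "0B" is the current mirror and "1B" the session before it.
    """
    return [
        ["rdiff-backup", "--verify-at-time", f"{n}B", repo]
        for n in range(session_count)
    ]


# /backups/host1 is a made-up repository path.
for argv in verify_commands("/backups/host1", 3):
    print(shlex.join(argv))
```

Each of these commands re-reads the data for its session, so the total cost grows with the number of sessions, which is why this is impractical for a repository holding years of daily runs.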

>
> Overall, I am unsure whether it is more appropriate
>    - to learn from the experience the great rdiff-backup project gave us and use
> the base operators from rdiff-backup to maybe rewrite a whole new thing
> with the above issues fixed (especially with a less fragile repository)
> or
>    - to continue fixing bugs in small steps in a project that seems
> unmaintained (unfortunately, I lack the pro-grade Python skills needed
> to do it right).
>
> I hope to start a discussion here on these thoughts, please contribute :)

There was a discussion a while ago here and there was a strong view that
the existing project should be fixed rather than a new one started, I
suppose because rdiff-backup as it stands is 99.5% perfect and any
project, even if it fixed the 0.5%, is likely to introduce new bugs and
failings. But in either case it needs someone to take on the
responsibility and workload. I think Daniel Miller began some work on a
replacement for rdiff-backup but I don't know where his project stands.

AFAIK the only other open source project like rdiff-backup is duplicity.
It has slightly different objectives, uses forward-deltas and has
different maintainers; maybe it is more actively maintained? But I value
the reverse-diff approach of rdiff-backup because it means the most
recent data is the most reliable and fastest to retrieve, and you can
continue to build up data history (for years even) without having to
start over at regular intervals. I would feel nervous if I had a 3 year
backup history but needed to use an original dataset and then 1000 daily
forward-diff files in order to get the latest backup of a file (which is
usually what you need). With rdiff-backup, if you do start to run out of
space, you can easily delete the older data without endangering more
recent backups.
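
The arithmetic behind this can be sketched as follows. It is a simplified model with one delta per backup run and no periodic full backups, and the function name `deltas_to_apply` is made up for illustration.

```python
def deltas_to_apply(scheme, total_backups, target):
    """Number of delta files needed to rebuild backup number `target`.

    Backups are numbered 0 (oldest) to total_backups - 1 (newest).
    "forward": a full copy of backup 0 plus one forward delta per later run.
    "reverse": a full copy of the newest backup plus one reverse delta per
    earlier run.
    """
    if not 0 <= target < total_backups:
        raise ValueError("target out of range")
    if scheme == "forward":
        return target
    if scheme == "reverse":
        return total_backups - 1 - target
    raise ValueError("unknown scheme")


total = 1001  # an original dataset plus 1000 daily runs
newest = total - 1
print(deltas_to_apply("forward", total, newest))
print(deltas_to_apply("reverse", total, newest))
print(deltas_to_apply("reverse", total, 0))
```

In this model the newest backup depends on all 1000 deltas in the forward scheme and on none in the reverse scheme; the oldest backup is the expensive one for reverse deltas, which is also the part that can be dropped without touching anything newer.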

Two other possibilities (neither of which I have tried) are:

  * Use rsync (or scripts based on it such as rsnapshot) but store the
    backup datasets on a deduplication file system such as lessfs.
  * Put the filesystem on top of LVM and just take and keep regular LVM
    snapshots; these can then be the backups. Recent Linux kernels allow
    you to revert a volume to an earlier snapshot if required. I don't
    think this was an intended use of LVM snapshots, but it should work
    and be quick'n'easy too, though I don't think it could or should be
    used over a prolonged period because of space issues (and perhaps
    speed). Of course the backups remain in the same volume as the
    original data; they can be copied to another location but then they
    will each take up the full space of the data.
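
As an illustration of the rsync-based option, here is a sketch of the hard-link snapshot technique using rsync's `--link-dest` option. Note that this is file-level space saving through hard links, which is a different mechanism from a deduplicating file system such as lessfs, and it is not meant to describe what rsnapshot does internally. The paths, the snapshot names and the helper `snapshot_command` are hypothetical; the sketch only builds the command and does not run it.

```python
import os


def snapshot_command(source, backup_root, new_name, previous_name=None):
    """Build an rsync command for a dated snapshot (hypothetical helper).

    With --link-dest, files unchanged since the previous snapshot are
    hard-linked to it instead of being copied again. backup_root should be
    absolute, because a relative --link-dest is resolved against the
    destination directory.
    """
    argv = ["rsync", "-a", "--delete"]
    if previous_name is not None:
        argv.append("--link-dest=" + os.path.join(backup_root, previous_name))
    # Trailing slashes make rsync copy the directory contents.
    argv.append(source.rstrip("/") + "/")
    argv.append(os.path.join(backup_root, new_name) + "/")
    return argv


print(snapshot_command("/home", "/backups", "2011-08-02", "2011-08-01"))
```

Every snapshot directory looks like a full copy, so any single one can be deleted without affecting the others, at the cost of storing a whole new copy of each file that changed at all.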


Dominic
http://www.timedicer.co.uk




Re: Activity

D. Kriesel
> Yes I think rdiff-backup is currently unmaintained.
> Anyone who wants to take it forward (and has the skills to do so
> which unfortunately I have not) might need to make a fork
> (which in due course could become rdiff-backup2?)
> [...]
> There was a discussion a while ago here and there was a
> strong view that the existing project should be fixed rather than
> a new one started, I suppose because rdiff-backup as it stands
> is 99.5% perfect and any project, even if it fixed the 0.5%,
> is likely to introduce new bugs and failings.

I second that. I believe that the optimal solution might be to fork a
project like rdiff-backup2, not only to "fix" the existing rdiff-backup
but also to reform some architectural things (like the repository format
and a few of the other things named).

> I don't see how creating larger-granularity reverse-deltas
> would really make it more robust,
> it would just make the archives bigger.

Let us define the degree of robustness as the percentage of the backup
timeline a file can be regressed to. Assume one has monthly reverse
deltas in addition to daily ones. Without the monthly deltas, one broken
daily delta makes regression fail for the entire part of the backup timeline
that lies earlier than the point at which the delta was created. With
additional monthly deltas, the corruption of some daily deltas just causes
"gaps" in the regressable timeline. This is just to point out the principle;
optimization is needed ;). The archive would still be a lot smaller than a
multi-generation full backup.
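
A minimal sketch of this robustness measure, under the assumptions stated in the paragraph above: one daily reverse delta per day, plus hypothetical coarse deltas that jump back a fixed number of days. No such coarse deltas exist in rdiff-backup; the function name and parameters are made up for illustration.

```python
def regressable_fraction(days, broken_daily, coarse_step=None):
    """Fraction of backup states (day 0 .. day `days`) that can be restored.

    State `days` is the current mirror. Daily delta i rebuilds day i from
    day i + 1. If coarse_step is given, an extra (hypothetical) delta
    rebuilds day k * coarse_step from day (k + 1) * coarse_step.
    """
    # edges[newer] lists the older states reachable with one intact delta.
    edges = {}
    for i in range(days):
        if i not in broken_daily:
            edges.setdefault(i + 1, []).append(i)
    if coarse_step:
        for low in range(0, days - coarse_step + 1, coarse_step):
            edges.setdefault(low + coarse_step, []).append(low)

    # Walk backwards from the mirror along intact deltas.
    reachable, todo = {days}, [days]
    while todo:
        for older in edges.get(todo.pop(), []):
            if older not in reachable:
                reachable.add(older)
                todo.append(older)
    return len(reachable) / (days + 1)


# 90 days of history; the daily delta that rebuilds day 75 is corrupted.
print(regressable_fraction(90, {75}))
print(regressable_fraction(90, {75}, coarse_step=30))
```

With daily deltas only, just the 15 states from day 76 to day 90 remain out of 91. With an extra delta every 30 days the loss shrinks to the gap between day 61 and day 75, so 76 of the 91 states stay reachable.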

However, maybe you are right and multi-granularity reverse deltas are not the
way to go. Still, I see the problem of the "touchiness" of the backup
repository, with its lots and lots of single files that all need to be
correct, without a single file changed.

> >    * Missing operators on an existing repository. To use
> > rdiff-backup on a larger scale it should be possible to, e.g.,
> >      - merge time steps
> >      - delete time steps and correct deltas accordingly
> >      - remove subtrees (sometimes one backs up large data sets by accident)
> >      - lots more.
>
> Yes, these would be helpful, especially to correct backup mistakes, which can
> permanently bloat a repository.

And in cases where you want to thin out the time resolution of earlier backups
(e.g. reducing the coverage of ten-year-old data from daily to
monthly), and in lots and lots of other cases ... :-)

> Yes, the best advice re a Windows target seems to be: don't. I think you
> can reliably use rdiff-backup.exe to back up Windows data to a Linux target,
> though.
Yep, both of your statements are true, in my opinion. However, the hash
mismatches are not only reported by Windows users. I believe that the main
problem with Windows targets is indeed the read-only attribute on folders,
which Windows uses differently from what developers are used to on Unix-based
systems. In particular, the read-only attribute on folders is *not* used
in Windows to prevent accidental alteration of a folder (as you would expect
for files). It is used to mark a folder as "special" for Windows
Explorer, like the fonts folder or similar; see "Cause" in
http://support.microsoft.com/kb/326549/en-us. The read-only attribute on
folders is therefore managed by the system itself (personally, I think this
way of marking special folders is complete bullshit).
 
> and I would add:
> ability to run a thorough verification of an rdiff-backup archive.
> add a switch to enable 'forced' regression of an archive.

Agreed ;)

> AFAIK the only other open source project like rdiff-backup is duplicity.

I would wish for rdiff-backup2 to stick with the reverse delta concept for
the reasons you mentioned.





Re: Activity

Robert Nichols-2
In reply to this post by Dominic Raferd-3
On 08/01/2011 08:02 AM, [hidden email] wrote:

> and I would add:
>
> * ability to run a thorough verification of an rdiff-backup archive.
> The current verification process is flawed as has been discussed in
> earlier threads here. The best strategy at the moment is to run a
> verification for a date at or earlier than the earliest backup run
> date, and then to run one or two verifications for dates between the
> earliest date and the current date, but although this provides 'high
> confidence' about the integrity of the overall archive it does not,
> at least from a theoretical point of view, guarantee that the full
> history of all files, whether currently present or deleted, can be
> recovered. The only way to get this at present is to run a separate
> verification for every previous backup run, which is not realistic
> for a long-standing repository.
> * add a switch to enable 'forced' regression of an archive. At present
> rdiff-backup will only regress an archive that it considers to be
> broken. (However you can work around this limitation.)

and I would add:

* Correct handling of hard-linked files.  This is currently broken in
two places.  (1) During a verify operation, rdiff-backup will
complain about a missing checksum for each link other than the one
that appears first in the mirror_metadata file.  (2) If you add more
hard links to a file that already had multiple hard links, then a
restore operation may result in those links being divided into two
or more groups, each with its own, independent copy of the file.

Auditing the database to detect and correct item (2) is decidedly
non-trivial since the reverse-diffs of the metadata file are also
affected.
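
One way to check whether a restore has split a set of hard links, as in item (2), is to compare how files group by inode in the source tree and in the restored tree. The following is a sketch of that check only; it does not read or repair rdiff-backup metadata, and the function names are made up for illustration.

```python
import os
import stat
from collections import defaultdict


def hardlink_groups(root):
    """Return sorted groups of relative paths under `root` sharing an inode."""
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)
            if stat.S_ISREG(info.st_mode):
                key = (info.st_dev, info.st_ino)
                by_inode[key].append(os.path.relpath(path, root))
    # Keep only files that are linked to each other inside the tree.
    return sorted(sorted(paths) for paths in by_inode.values() if len(paths) > 1)


def links_preserved(source_root, restored_root):
    """True if both trees group their hard-linked files identically."""
    return hardlink_groups(source_root) == hardlink_groups(restored_root)
```

If a source tree links `a`, `b` and `c` together but the restored tree contains `a` and `b` as one file and `c` as an independent copy, the two group lists differ and `links_preserved` returns False. Links to files outside the tree are deliberately ignored.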

--
Bob Nichols     "NOSPAM" is really part of my email address.
                 Do NOT delete it.



Re: Activity

Wojciech Stryjewski
> I also would like to use it in larger scale, as it is - to  the best of my
> knowledge - the only free and flexible 4D-Backup-solution. However, I

If you don't mind using a free but non open source program, then there
is www.crashplan.com. Although their business model is providing
online space for backup storage, I believe you can still use their
backup application for free to do backups between your own machines.
And their software works on Windows, Linux, Mac, and Sun.

> * The repository format. When recovering older files, rdiff-backup
> really needs every single reverse delta, which is not only slow, but
> also extremely fragile (if only one of those files is corrupted,
> recovery will fail). A solution might be some additional,
> larger-granularity reverse deltas that help speeding up recovery as well
> as preserving integrity of "most of the timeline" even if some deltas
> are corrupted.

Git solves this problem with a pack.depth config option with a default
value of 50. Once the max delta depth is reached, the next version of
the file would be a separate copy.
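
The idea of a depth cap can be sketched as below. This is a simplified model of the principle only, not a description of how Git builds its packs or of anything rdiff-backup currently does; `plan_storage` is a made-up name.

```python
def plan_storage(version_count, max_depth):
    """Decide for each version in a chain whether to store it in full.

    Returns a list of (kind, depth) pairs, where depth is the number of
    deltas that must be applied on top of the nearest full copy.
    """
    plan, depth = [], 0
    for _ in range(version_count):
        if not plan or depth == max_depth:
            plan.append(("full", 0))   # start a fresh chain
            depth = 0
        else:
            depth += 1
            plan.append(("delta", depth))
    return plan


print([kind for kind, _depth in plan_storage(7, 2)])
```

With a cap in place, no version ever needs more than `max_depth` deltas, and a corrupted delta can only affect the versions between it and the next full copy. The price is one extra full copy per `max_depth + 1` versions.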

One new feature I would personally like to see is a push/pull between
repositories that can move all or some snapshots in a network and disk
space efficient manner (e.g. without having to temporarily restore all
the data just to compute a different set of deltas for the 2nd
repository).


Re: Activity

D. Kriesel
Thanks for the CrashPlan hint - I'm definitely going to give this one a try :)



Wojciech Stryjewski <[hidden email]> wrote:

>> I also would like to use it in larger scale, as it is - to  the best of my
>> knowledge - the only free and flexible 4D-Backup-solution. However, I
>
>If you don't mind using a free but non open source program, then there
>is www.crashplan.com. Although their business model is providing
>online space for backup storage, I believe you can still use their
>backup application for free to do backups between your own machines.
>And their software works on Windows, Linux, Mac, and Sun.
>

--
D. Kriesel / dkriesel.com


Re: Activity

D. Kriesel
In reply to this post by Wojciech Stryjewski
CrashPlan looks nice judging by its features alone, but I am not sure whether
its technical quality and stability match those of, for example, rsync. Any
experience?

--
D. Kriesel / dkriesel.com

Re: Activity

Wojciech Stryjewski
On Mon, Aug 1, 2011 at 2:40 PM, D. Kriesel <[hidden email]> wrote:
> CrashPlan looks nice judging by its features alone, but I am not sure whether
> its technical quality and stability match those of, for example, rsync. Any
> experience? -- D. Kriesel / dkriesel.com

I haven't used it personally, but a friend of mine does daily backups
with it on a multi-terabyte data set, and it seems to work well for
him.


Re: Activity

Joe Steele-2
In reply to this post by Robert Nichols-2
On 8/1/2011 10:29 AM, Robert Nichols wrote:

> and I would add:
>
> * Correct handling of hard-linked files. This is currently broken in
> two places. (1) During a verify operation, rdiff-backup will
> complain about a missing checksum for each link other than the one
> that appears first in the mirror_metadata file. (2) If you add more
> hard links to a file that already had multiple hard links, then a
> restore operation may result in those links being divided into two
> or more groups, each with its own, independent copy of the file.
>

FYI, I had submitted some patches which I think fix the problems
with hard-linked files:

http://savannah.nongnu.org/bugs/?26848

> Auditing the database to detect and correct item (2) is decidedly
> non-trivial since the reverse-diffs of the metadata file are also
> affected.
>

The patches won't correct any issues with previous backups in a
repository, but they should correct the repository going forward.

I've been using the patches for some time now without problems.
I would be interested in having some other people test them out.

--Joe


Re: Activity

Alexander Samad
All I can say is: me too. I have asked this question a few times over the
last couple of years. I like the idea and the premise.

But nobody with the skills has wanted to take up the job.

Alex

On Tue, Aug 2, 2011 at 7:24 AM, Joe Steele <[hidden email]> wrote:

> On 8/1/2011 10:29 AM, Robert Nichols wrote:
>>
>> and I would add:
>>
>> * Correct handling of hard-linked files. This is currently broken in
>> two places. (1) During a verify operation, rdiff-backup will
>> complain about a missing checksum for each link other than the one
>> that appears first in the mirror_metadata file. (2) If you add more
>> hard links to a file that already had multiple hard links, then a
>> restore operation may result in those links being divided into two
>> or more groups, each with its own, independent copy of the file.
>>
>
> FYI, I had submitted some patches which I think fix the problems with
> hard-linked files:
>
> http://savannah.nongnu.org/bugs/?26848
>
>> Auditing the database to detect and correct item (2) is decidedly
>> non-trivial since the reverse-diffs of the metadata file are also
>> affected.
>>
>
> The patches won't correct any issues with previous backups in a repository,
> but they should correct the repository going forward.
>
> I've been using the patches for some time now without problems. I would be
> interested in having some other people test them out.
>
> --Joe

Re: Activity

Piotr Karbowski
In reply to this post by Joe Steele-2
On 08/01/2011 11:24 PM, Joe Steele wrote:

>
> FYI, I had submitted some patches which I think fix the problems with
> hard-linked files:
>
> http://savannah.nongnu.org/bugs/?26848
>
>
> The patches won't correct any issues with previous backups in a
> repository, but they should correct the repository going forward.
>
> I've been using the patches for some time now without problems. I would
> be interested in having some other people test them out.

I will test this patch, thanks.

Also, can you let me know what other critical bugs rdiff-backup has?

-- Piotr.
