Discussion of poll winning feature: repository editing

Discussion of poll winning feature: repository editing

Ben Escoto
For the people that voted for repository editing, what did you want
exactly?  Here's the first thing that crossed my mind:  you have a
repository at /repo and maybe a directory like /repo/dir that is
taking up too much space.  You could run something like:

        rdiff-backup-editor delete /repo/dir

that would delete /repo/dir and remove all history of it from the
repository.  It might also be possible to do something like:

        rdiff-backup-editor move /repo/dir /repo/newname

which would move /repo/dir to /repo/newname, and alter all the history
so that all the changes that took place in /repo/dir now seem like
they took place in /repo/newname.  Or what else did you have in mind?

Also more generally, why do you want to edit an existing repository?
Is it to save disk space?


--
Ben Escoto

_______________________________________________
rdiff-backup-users mailing list at [hidden email]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki


Re: Discussion of poll winning feature: repository editing

Charles Duffy-6
Ben Escoto wrote:
> For the people that voted for repository editing, what did you want
> exactly?
As for me, I want the ability to merge old increments, such that beyond
a given point I can abandon daily increments and store only weekly
increments; and beyond that only monthly. As the dataset I work with has
frequent enough changes that the same regions will be modified multiple
times during the course of a week, such combination will indeed decrease
(in some cases nontrivially) the amount of data being archived.

> Also more generally, why do you want to edit an existing repository?
> Is it to save disk space?
>  
Yes. I'm storing backups from potentially a very large number of remote
systems, and space is very much a potential issue.



Re: Discussion of poll winning feature: repository editing

Hans F. Nordhaug
* Charles Duffy <[hidden email]> [2006-01-01]:

> Ben Escoto wrote:
> >For the people that voted for repository editing, what did you want
> >exactly?
> As for me, I want the ability to merge old increments, such that beyond
> a given point I can abandon daily increments and store only weekly
> increments; and beyond that only monthly. As the dataset I work with has
> frequent enough changes that the same regions will be modified multiple
> times during the course of a week, such combination will indeed decrease
> (in some cases nontrivially) the amount of data being archived.
>
> >Also more generally, why do you want to edit an existing repository?
> >Is it to save disk space?
> >  
> Yes. I'm storing backups from potentially a very large number of remote
> systems, and space is very much a potential issue.

Ditto - to both points above. (I didn't vote though...)

Hans



Re: Discussion of poll winning feature: repository editing

Randall Nortman-3
In reply to this post by Ben Escoto
On Sun, Jan 01, 2006 at 12:25:57AM -0600, Ben Escoto wrote:
> For the people that voted for repository editing, what did you want
> exactly?  Here's the first thing that crossed my mind:  you have a
> repository at /repo and maybe a directory like /repo/dir that is
> taking up too much space.  You could run something like:
>
> rdiff-backup-editor delete /repo/dir

That would be very useful.  I have had a need of this on many
occasions when I decide (for space or performance reasons) to exclude
something from the backup.  But adding it to the exclude list doesn't
remove the history of it from the archive.  On archives with long
history (and I would ideally like some of my archives to keep history
forever), this means that you can never really reclaim that space.
Make sure it works on individual files as well as directories -- I
would suggest allowing the same syntax currently used in
include/exclude lists, including globbing.

>
> that would delete /repo/dir and remove all history of it from the
> repository.  It might also be possible to do something like:
>
> rdiff-backup-editor move /repo/dir /repo/newname
>
> which would move /repo/dir to /repo/newname, and alter all the history
> so that all the changes that took place in /repo/dir now seem like
> they took place in /repo/newname.  Or what else did you have in mind?

You would only want to do that to mirror a similar move in the source
filesystem, but rdiff-backup will pick up that move anyway.  The one
advantage of doing it as above would be to save the space taken up by
the gzipped copy of the file in its old location.  I suppose that if
you're moving big files or directory trees around, that could be a lot
of space, but you also have to remember to coordinate the repository
change with the filesystem change.  But you lose the record of the
rename, which means you will no longer be able to restore to the
original state -- unless the repository remembers when the move took
place and restores appropriately, as the more modern version control
systems do.  Overall, I'd rank this a much lower priority than the
delete feature.

I'd like to echo the requests to delete particular increments (i.e.,
merge increments).  This could be used if some part of the source
filesystem temporarily grew very large but you don't really want to
save that data.  You could then just delete/merge all the increments
that were made during that period.  But more importantly, it can be
used to implement the traditional daily/weekly/monthly backup
retention schemes.  For example, you may want to keep daily increments
for a month, then weekly increments for 6 months, then monthly
increments indefinitely.  This could be implemented as a script
(outside of rdiff-backup) which deletes/merges the daily increments
into weeklies once they are more than a month old, then merges
weeklies into monthlies.  For filesystems with lots of changes, this
could save a lot of space.
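
As a rough illustration of the daily/weekly/monthly scheme described
above, here is a minimal Python sketch that only decides which
increment times such a script would retain.  The function name
increments_to_keep and its parameters are hypothetical, "6 months" is
approximated as 180 days, and nothing here calls rdiff-backup; the
actual merging of the dropped increments is the feature being
requested, not something the sketch performs.

```python
from datetime import datetime, timedelta


def increments_to_keep(times, now, daily_days=30, weekly_days=180):
    """Return the increment timestamps a daily/weekly/monthly policy retains.

    Hypothetical helper for illustration; not part of rdiff-backup.
    """
    keep = []
    seen_weeks = set()
    seen_months = set()
    # Walk newest-first so the most recent increment in each bucket wins.
    for t in sorted(times, reverse=True):
        age = now - t
        if age <= timedelta(days=daily_days):
            # Recent history: keep every increment.
            keep.append(t)
        elif age <= timedelta(days=weekly_days):
            # Middle history: keep one increment per ISO week.
            week = tuple(t.isocalendar())[:2]  # (ISO year, ISO week)
            if week not in seen_weeks:
                seen_weeks.add(week)
                keep.append(t)
        else:
            # Old history: keep one increment per calendar month.
            month = (t.year, t.month)
            if month not in seen_months:
                seen_months.add(month)
                keep.append(t)
    return sorted(keep)


if __name__ == "__main__":
    now = datetime(2006, 1, 1)
    daily = [now - timedelta(days=i) for i in range(400)]
    kept = increments_to_keep(daily, now)
    print(len(daily), "increments before,", len(kept), "after")
```

Everything not returned by the function would be a candidate for
merging into its nearest retained neighbour.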

And one more idea -- splitting/merging repositories.  I currently back
up different parts of my filesystems into different repositories,
mostly because I want different backup frequencies and retention
policies.  For example, I back up mail every 15 minutes and keep
increments only for 30 days, but I back up /usr only once a day and
keep it for at least a year.  I split up the filesystem using a set of
fairly complex include/exclude rules that define each repository.
Sometimes, I might want to change my mind about how things are divided
up, so it would be useful to be able to merge two repositories
together, or split one repository into two, presumably by using
include/exclude lists to define what should remain in the repository
and what should be split out into another repository.  This ranks as a
"nice to have" feature for me, but once the other repository editing
features are in place, this might turn out to be easy to implement.

> Also more generally, why do you want to edit an existing repository?
> Is it to save disk space?

Yup, mostly to save disk space, but also to fix mistakes (such as
backing up something you didn't mean to, which might include sensitive
data -- passwords, etc.)

Randall



Re: Discussion of poll winning feature: repository editing

Errol Siegel
The example you gave would apply to me also,
particularly if it can work with individual files as
well as directories.  I would want this both to save
space and to reorganize things.  I have a few sources
I am backing up that have many large files.  There
have been times when I wanted to reorganize things a
bit but opted not to do so because of the penalty I
would pay with the backup.


       
               

Re: Discussion of poll winning feature: repository editing

Blair Zajac
In reply to this post by Ben Escoto
Ben Escoto wrote:

> For the people that voted for repository editing, what did you want
> exactly?  Here's the first thing that crossed my mind:  you have a
> repository at /repo and maybe a directory like /repo/dir that is
> taking up too much space.  You could run something like:
>
> rdiff-backup-editor delete /repo/dir
>
> that would delete /repo/dir and remove all history of it from the
> repository.  It might also be possible to do something like:
>
> rdiff-backup-editor move /repo/dir /repo/newname
>
> which would move /repo/dir to /repo/newname, and alter all the history
> so that all the changes that took place in /repo/dir now seem like
> they took place in /repo/newname.  Or what else did you have in mind?

Definitely the ability to delete older increments (files and directories) and to
merge older backups.

Not that interested in handling moves, but I think the better way would be to
use inode tracking instead and let rdiff-backup handle it automatically.

I would like to see a stable version of 1.1.x released and then work on these
features in a new 1.2.x, as these are potentially destructive changes to a
backup repository.  BTW, how is numbering done?  Is 1.(even number) stable and
1.(odd number) unstable?

> Also more generally, why do you want to edit an existing repository?
> Is it to save disk space?

Yes.

Regards,
Blair

--
Blair Zajac, Ph.D.
<[hidden email]>
Subversion and Orca training and consulting
http://www.orcaware.com/svn/



Re: Discussion of poll winning feature: repository editing

Chris Wilson-3
Hi all,

On Mon, 2006-01-02 at 15:14 -0800, Blair Zajac wrote:

> Not that interested in handling moves, but I think the better way would be to use
> inode tracking instead and let rdiff-backup handle it automatically.

I would really like to see rdiff-backup track files by strong checksum
(e.g. SHA1).

If a new file is created which has the same checksum as an existing file
in the repository, rdiff-backup should treat that as a copy operation,
which can be executed (and checked) very efficiently by the remote side.

If a new file is created which has the same checksum as a file that was
just deleted from the repository, rdiff-backup should treat that as a
move operation, which can also be executed (and checked) very
efficiently by the remote side.

Since restoring is an occasional operation, performance might not matter
too much, in which case the metadata stored for a move or copy operation
could be equivalent to the ("delete" and) "create" used at present.

However, to make this really efficient, it would be nice to store the
copy or move as such in the metadata, so that restoring a backup through
such a point would be equivalent to deleting the copy or moving the file
back to its original location.
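
A minimal Python sketch of the checksum idea described above, under
stated assumptions: the two sessions are modelled as plain dicts
mapping path to file contents, and the function name
classify_new_files is hypothetical.  It is not how rdiff-backup works
today; it only shows how a new path could be classified as a copy, a
move, or a plain create by comparing SHA1 digests.

```python
import hashlib


def sha1_of(data: bytes) -> str:
    """Hex SHA1 digest of a file's contents."""
    return hashlib.sha1(data).hexdigest()


def classify_new_files(previous, current):
    """Classify paths that are new in `current` relative to `previous`.

    Both arguments map path -> contents (bytes); this layout is an
    assumption made for the example. Returns path -> (kind, source),
    where kind is "copy", "move" or "create".
    """
    # Index the previous session by digest.
    prev_by_digest = {}
    for path, data in previous.items():
        prev_by_digest.setdefault(sha1_of(data), []).append(path)

    result = {}
    for path, data in current.items():
        if path in previous:
            continue  # not a new path; ordinary diffing applies
        sources = sorted(prev_by_digest.get(sha1_of(data), []))
        surviving = [p for p in sources if p in current]
        deleted = [p for p in sources if p not in current]
        if surviving:
            # Same contents as a file that still exists: a copy.
            result[path] = ("copy", surviving[0])
        elif deleted:
            # Same contents as a file that just disappeared: a move.
            result[path] = ("move", deleted[0])
        else:
            result[path] = ("create", None)
    return result


if __name__ == "__main__":
    before = {"a": b"x", "b": b"y"}
    after = {"a": b"x", "c": b"x", "d": b"y", "e": b"z"}
    for path, verdict in sorted(classify_new_files(before, after).items()):
        print(path, verdict)
```

Note that the digest index holds one entry per file, so its size grows
linearly with the number of files in the repository.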

That's just my two cents, I will still love rdiff-backup even if it
never implements these features :-)

> > Also more generally, why do you want to edit an existing repository?
> > Is it to save disk space?

Yes, me too.

Cheers, Chris.
--
  ___ __     _
 / __/ / ,__(_)_  | Chris Wilson <0000 at qwirx.com> - Cambs UK |
/ (_/ ,\/ _/ /_ \ | Security/C/C++/Java/Perl/SQL/HTML Developer |
\ _/_/_/_//_/___/ | We are GNU-free your mind-and your software |




Re: Discussion of poll winning feature: repository editing

Ben Escoto
In reply to this post by Blair Zajac
>>>>> Blair Zajac <[hidden email]>
>>>>> wrote the following on Mon, 02 Jan 2006 15:14:27 -0800
>
> I would like to see a stable version of 1.1.x released and then work
> on these features in a new 1.2.x, as these are potentially
> destructive changes to a backup repository.

I was planning on getting to these in 1.1.x eventually, but I would
probably package the editing stuff in a separate executable.  The risk
of introducing new bugs into the existing functions should be pretty
small.

>  BTW, how is numbering done?  Is 1.(even number) stable and 1.(odd
>  number) unstable?

Yes, like the old Linux kernel numbering.


--
Ben Escoto


Re: Discussion of poll winning feature: repository editing

Ben Escoto
In reply to this post by Chris Wilson-3
>>>>> Chris Wilson <[hidden email]>
>>>>> wrote the following on Mon, 02 Jan 2006 23:57:23 +0000
>
> I would really like to see rdiff-backup track files by strong
> checksum (e.g. SHA1).
>
> If a new file is created which has the same checksum as an existing
> file in the repository, rdiff-backup should treat that as a copy
> operation, which can be executed (and checked) very efficiently by
> the remote side.

Right now rdiff-backup's memory requirements are basically log(n)
where n is the number of files, but this would bump it up to n.  I'm
unsure whether it would consume too much memory in practical terms.
If it takes an extra 100 bytes per file, and there are 10M files,
that's ~1GB.
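
The estimate above can be checked with a few lines of Python.  The 100
bytes per file and the 10M files are the figures from the paragraph;
the function name index_bytes is made up for the example.

```python
def index_bytes(num_files, bytes_per_file=100):
    """Memory needed for a per-file index that grows linearly with n."""
    return num_files * bytes_per_file


total = index_bytes(10_000_000)
print(total / 10**9, "GB")             # 1.0 GB
print(round(total / 2**30, 2), "GiB")  # 0.93 GiB
```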

Anyway, this would only improve speed and/or disk space, and from the
survey I get the impression that few people care about that.


--
Ben Escoto


Re: Discussion of poll winning feature: repository editing

Charles Duffy-6
Ben Escoto wrote:
> Anyway, this would only improve speed and/or disk space, and from the
> survey I get the impression that few people care about that.
>  
I'd argue that a nontrivial subset of those voting for repository
editing did so to allow them to better optimize disk space usage -- at
least, that's my excuse, and I've seen it elsewhere as well.

