fuzzy match - moved/renamed


David-2
Hi All,

Are there plans to implement fuzzy matching or similar algorithms to
match moved/renamed files?

When large files are renamed or moved between folders, rdiff-backup
treats them as new files. As a result it transfers large amounts of
data and uses a lot of space to store the diffs (for 12 weeks or so),
even though the data is in fact identical.

What I would like to suggest is:
when a new file is discovered, calculate its checksum and check whether
that checksum already exists in the destination folder (under any
subfolder):
a) if the file exists in the destination folder (under a different file
name/path) and that file name/path no longer exists in the source,
simply rename/move the file
b) if the file exists in the destination folder (under a different file
name/path) and that file name/path _does_ still exist in the source, we
have a duplicate of the file and can either hardlink it or create a
local copy.

The above approach would avoid transmitting and storing a lot of data
for files that have merely been moved between folders.

Deletions should be done at the very end of the process, so that files
already stored can be reused.

The diffs between backups would then again store only differences, not
full copies of the files.
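
A minimal sketch of the decision logic described above, in Python. The
function names, the `dest_index` mapping and the choice of SHA-256 are
assumptions made for illustration; none of this is existing rdiff-backup
code.

```python
import hashlib


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def plan_new_file(new_path, checksum, dest_index, source_paths):
    """Decide how to handle a file that is new in the source.

    dest_index   -- hypothetical dict: checksum -> path already in the backup
    source_paths -- set of paths that currently exist in the source
    Returns a tuple (action, old_path, new_path).
    """
    old_path = dest_index.get(checksum)
    if old_path is None:
        # Content not seen before: the file has to be transferred.
        return ("transfer", None, new_path)
    if old_path not in source_paths:
        # Case a): the old name is gone from the source, so this is a move.
        return ("rename", old_path, new_path)
    # Case b): the old name still exists, so this is a duplicate.
    return ("hardlink", old_path, new_path)
```

Deletions would then be applied only after every planned rename or
hardlink has been carried out, so that stored files stay available for
reuse.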

Does this sound like something which could be implemented in the near
future?

This might not be the best place to post this question, but if there is
a better backup solution that handles situations like this, please let
me know too. I want to keep all the goodies of rdiff-backup, hence
rsync with its fuzzy option is not the way to go for me.

Thanks,
David

_______________________________________________
rdiff-backup-users mailing list at [hidden email]
https://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Re: fuzzy match - moved/renamed

Greg Troxel

David <[hidden email]> writes:

> Are there plans to implement fuzzy matching or similar algorithms to
> match moved/renamed files?

In the last few years there have not been any updates, not even the
usual security and portability nits.

> What I would like to suggest is:
> in case of discovering new file, calculate checksum, check if the
> checksum exists already in destination folder (under any subfolder):
> a) if the file does exist in destination folder (different file
> name/path) and the file name/path does not exist anymore in the
> source, simply rename/move file

You may want to look at bup or attic, or similar tools.

If anyone is actually working on rdiff-backup, I'm sure most readers of
this list would appreciate a "note from the maintainer" describing
status.


Re: fuzzy match - moved/renamed

David-2
Greg, thanks for the swift reply.

I've had the same feeling that there's not much happening with
rdiff-backup, even though it is such a good tool.

I'll have a look at the projects you've mentioned.

Many thanks!
David




Poppins: rsync based backup tool (was: Re: fuzzy match - moved/renamed)

rhkramer
In reply to this post by Greg Troxel
On Sunday, February 07, 2016 08:28:04 AM you wrote:
> If anyone is actually working on rdiff-backup, I'm sure most readers of
> this list would appreciate a "note from the maintainer" describing
> status.

Recently (the last day or so) I've come across a backup tool named poppins
(link below).  I mention it here because it uses rsync.  It also uses a client
/ server approach (but, perhaps it can be adapted to have the client and
server on the same computer).

I have not used it, and can make no endorsement.  I would be interested in
feedback from anyone who does try it.  (I don't even know if it is completely
open source or if there are some proprietary portions or any fees...)

The web site (the installation page for Debian):

https://poppinsbackups.wordpress.com/2015/12/13/install-poppins-on-debian/


<quote from the web page (maybe the home page)>
Poppins: Rotating backup script based on rsync with support for BTRFS/ZFS
snapshots
</quote>




Re: fuzzy match - moved/renamed

Dave Kempe
In reply to this post by David-2

From: "David" <[hidden email]>

Are there plans to implement fuzzy matching or similar algorithms to
match moved/renamed files?

Tracking moves has been discussed at length in the past, and we haven't been able to see a way past the very expensive calculation of checksums needed to match files.
I think that disk is cheaper than time or CPU usage, so it hasn't been possible to do this easily.
If you dig through the archives, you will find those earlier discussions.
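
One commonly used way to limit that checksum cost is to hash only files whose size matches a file that has vanished from the source. The sketch below illustrates this pre-filter; the function name and the path-to-size dictionaries are assumptions for illustration, and rdiff-backup is not known to work this way.

```python
from collections import defaultdict


def move_candidates(new_sizes, vanished_sizes):
    """Pair each new file with vanished files of exactly the same size.

    new_sizes      -- hypothetical dict: new path -> size in bytes
    vanished_sizes -- hypothetical dict: vanished path -> size in bytes
    Only the pairs returned here would need a checksum comparison.
    """
    by_size = defaultdict(list)
    for path, size in vanished_sizes.items():
        by_size[size].append(path)
    return {
        path: sorted(by_size[size])
        for path, size in new_sizes.items()
        if size in by_size
    }
```

A new file with no size match is skipped without being read, so the expensive hashing is confined to plausible move candidates.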

As for other products and rdiff-backup maintainership, it's working for me just fine. We intend to use it until we run into something that requires fixing.
If there is a body of work that needs to be done to get it up to some modern standard (say Python 3), I'm even happy to consider sponsoring it.

thanks
Dave





Re: fuzzy match - moved/renamed

Patrik Dufresne-2
In reply to this post by David-2
Hello Dave,

About the maintainership: I don't plan to maintain rdiff-backup. I've already taken over rdiffweb. Still, I do have some interest in getting bugs fixed, since I offer professional services related to backup and rdiff-backup is at the center of the service provided.

In the long term, unless someone takes over rdiff-backup, I will need to port it to Python 3 if I am to continue providing my service.

--
Patrik Dufresne Service Logiciel inc.
http://www.patrikdufresne.com/
514-971-6442
1-114 rue des Hautbois,
St-Colomban, QC J5K 2H6

Date: Wed, 10 Feb 2016 11:57:12 +1100 (EST)
From: Dave Kempe <[hidden email]>
To: David <[hidden email]>
Cc: Rdiff Backup <[hidden email]>
Subject: Re: [rdiff-backup-users] fuzzy match - moved/renamed
Message-ID:
        <[hidden email]>
Content-Type: text/plain; charset="utf-8"

> From: "David" <[hidden email]>

> Are there plans to implement fuzzy matching or similar algorithms to
> match moved/renamed files?

Tracking moves has been discussed at length in the past, and we haven't been able to see a way past the very expensive calculation of checksums needed to match files.
I think that disk is cheaper than time or CPU usage, so it hasn't been possible to do this easily.
If you dig through the archives, you will find those earlier discussions.

As for other products and rdiff-backup maintainership, it's working for me just fine. We intend to use it until we run into something that requires fixing.
If there is a body of work that needs to be done to get it up to some modern standard (say Python 3), I'm even happy to consider sponsoring it.

thanks
Dave


Re: fuzzy match - moved/renamed

David-2
In reply to this post by Dave Kempe
Hi Dave,

I have to admit that I would generally agree with "disk is cheaper than time/CPU", as long as the destination is "close enough" that bandwidth doesn't play a role.

It becomes a major headache when the scenario forces one to pump an unnecessary xxx GB of data every day and the disaster recovery site is only reachable through a 20 Mbps link (rdiff-backup does not finish within 24 h).

Then the fact that local CPU time is cheap changes the story completely, and it is worth doing the calculation locally to discover moves.
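
To put the 20 Mbps figure in perspective, here is a rough back-of-the-envelope calculation in Python. It assumes decimal units and, by default, a fully utilized link with no protocol overhead, which is optimistic; the function name and the `efficiency` parameter are made up for illustration.

```python
def max_daily_transfer_gb(link_mbps, hours=24.0, efficiency=1.0):
    """Upper bound on data (in GB) a link can carry in the given time.

    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of the link actually usable (0..1)
    """
    bits = link_mbps * 1_000_000 * hours * 3600 * efficiency
    return bits / 8 / 1_000_000_000


# A 20 Mbps link moves at most 216 GB in 24 hours.
print(max_daily_transfer_gb(20))
```

Anything beyond roughly 216 GB of changed or "new" data per day therefore cannot finish within 24 h over such a link, however fast the machines at either end are.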

This is what drove me towards posting this question.

Thanks,
Dawid







Re: fuzzy match - moved/renamed

David-2
In reply to this post by Patrik Dufresne-2
Hi Patrik,

Nice job! I hadn't heard about rdiffweb (I feel ignorant now, apologies); it seems to be a really great piece of software!

For some reason it reminded me of Apple Time Machine :)

It would be great if some "movement"/fuzzy logic were implemented, but I get your point.
For my part, I'm operating within an NGO, hence the limits above, and can't afford to purchase support to fund this development. I offer my services as a volunteer alongside my professional life.

David

On 2016-02-10 18:29, Patrik Dufresne wrote:
Hello Dave,

About the maintainership: I don't plan to maintain rdiff-backup. I've already taken over rdiffweb. Still, I do have some interest in getting bugs fixed, since I offer professional services related to backup and rdiff-backup is at the center of the service provided.

In the long term, unless someone takes over rdiff-backup, I will need to port it to Python 3 if I am to continue providing my service.

