Use alternative diff backend


duplicity-talk mailing list
Next-generation file systems like ZFS and Btrfs allow storing snapshots at no additional cost. Calculating diffs is also extremely cheap.

Is it possible to use duplicity with an alternative diff backend?

Instead of maintaining a file signature cache or downloading this information, one could create a snapshot and easily get this information from these file systems. The rest (create tar, sign, transfer and even restore) would be the same.

_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk

Re: Use alternative diff backend

duplicity-talk mailing list
On 7/28/2018 19:53, darkdragon via Duplicity-talk wrote:
> Next-generation file systems like ZFS and Btrfs allow storing snapshots at no additional cost. Calculating diffs is also extremely cheap.

can you give a command line or native python example to retrieve a diff by file?
 
> Is it possible to use duplicity with an alternative diff backend?

generally, of course. but as the project is currently driven by portable compatibility and your suggestion is limited to two specific source file systems, i don't see it in the realm of duplicity as it is used today.
 
> Instead of maintaining a file signature cache or downloading this information, one could create a snapshot and easily get this information from these file systems.

how? a snapshot is merely a fixed state in time. how would you retrieve only changes (diffs) from that?

also, at least btrfs already has the functionality to send changes to another remote btrfs filesystem, e.g. over ssh. so why reinvent the wheel?

>The rest (create tar, sign, transfer and even restore) would be the same.

..ede/duply.net


Re: Use alternative diff backend

duplicity-talk mailing list
On 7/29/2018 16:48, darkdragon wrote:
>>> Next-generation file systems like ZFS and Btrfs allow storing snapshots at no additional cost. Calculating diffs is also extremely cheap.
>>
>> can you give a command line or native python example to retrieve a diff by file?
>
> Currently only a native format is supported (see btrfs-send:
> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-send). There are
> some projects for decoding this stream and outputting it in various
> formats (see https://github.com/sysnux/btrfs-snapshots-diff including
> issues).

ok. this kind of decodes changes, but it has no routines to restore those.

>>> Is it possible to use duplicity with an alternative diff backend?
>>
>> generally, of course. but as the project is currently driven by portable compatibility and your suggestion is limited to two specific source file systems, i don't see it in the realm of duplicity as it is used today.
>
> Would be nice to have a standard interface to exchange diff
> calculation similar to backends though. Then everyone can just plug-in
> a suitable backend.

well, yes of course. currently we use librsync, which generates some kind of rsync diff stream (never dug into rsync specifics here) that librsync can apply to the previous version, et voilà.

so, no abstraction currently but a pretty hardcoded librsync implementation.
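
To make the signature / delta / patch cycle mentioned above concrete, here is a deliberately simplified Python sketch. It is not librsync and not duplicity's code: it only compares blocks at fixed boundaries, whereas rsync-style algorithms use a rolling checksum to find matches at any byte offset. The function names and the tiny block size are assumptions chosen for illustration.

```python
import hashlib
from typing import Dict, List, Tuple, Union

BLOCK = 4  # tiny block size so the example is easy to follow

Op = Union[Tuple[str, int], Tuple[str, bytes]]


def signature(old: bytes, block: int = BLOCK) -> Dict[bytes, int]:
    """Map the hash of each fixed-size block of `old` to its block index."""
    sig: Dict[bytes, int] = {}
    for start in range(0, len(old), block):
        digest = hashlib.sha256(old[start:start + block]).digest()
        sig.setdefault(digest, start // block)
    return sig


def delta(sig: Dict[bytes, int], new: bytes, block: int = BLOCK) -> List[Op]:
    """Describe `new` using only the signature, without access to `old`."""
    ops: List[Op] = []
    for start in range(0, len(new), block):
        chunk = new[start:start + block]
        index = sig.get(hashlib.sha256(chunk).digest())
        if index is not None:
            ops.append(("copy", index))   # block already exists in the old version
        else:
            ops.append(("data", chunk))   # literal bytes that must be stored
    return ops


def patch(old: bytes, ops: List[Op], block: int = BLOCK) -> bytes:
    """Rebuild the new version from the old version plus the delta."""
    out = bytearray()
    for kind, arg in ops:
        if kind == "copy":
            out += old[arg * block:(arg + 1) * block]
        else:
            out += arg
    return bytes(out)
```

The point of the split is that the delta is computed from the signature alone, so the previous version does not have to be available locally; only the restore step needs it.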

>>> Instead of maintaining a file signature cache or downloading this information, one could create a snapshot and easily get this information from these file systems.
>>
>> how? a snapshot is merely a fixed state in time. how would you retrieve only changes (diffs) from that?
>
> Of course I mean the difference between two snapshots: Full backup ->
> create snapshot, Incremental backup -> create another snapshot,
> calculate diff between these snapshots (see "-p <parent>" and "-c
> <clone-src>" in btrfs-send linked above).

understood
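
As a rough illustration of the two-snapshot approach quoted above, the following Python sketch builds a `btrfs send` invocation that emits only the changes between a parent snapshot and a newer one. The `-p <parent>` option is the one named in the quoted mail and `-f <outfile>` is the option for writing the stream to a file; the helper names and all paths are hypothetical. Running the command requires root privileges and read-only snapshots, so the sketch separates building the command from executing it.

```python
import subprocess
from typing import List, Optional


def btrfs_send_command(snapshot: str,
                       parent: Optional[str] = None,
                       outfile: Optional[str] = None) -> List[str]:
    """Build the argv for `btrfs send`, optionally incremental against a parent."""
    cmd = ["btrfs", "send"]
    if parent is not None:
        cmd += ["-p", parent]    # send only changes relative to this snapshot
    if outfile is not None:
        cmd += ["-f", outfile]   # write the stream to a file instead of stdout
    cmd.append(snapshot)
    return cmd


def dump_incremental(parent: str, snapshot: str, outfile: str) -> None:
    """Run the send (hypothetical helper; needs root and read-only snapshots)."""
    subprocess.run(btrfs_send_command(snapshot, parent, outfile), check=True)
```

The resulting file is in the native send-stream format, so it still has to be decoded by a separate tool before a per-file change list is available.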
 
>> also, at least btrfs already has the functionality to send changes to another remote btrfs filesystem, e.g. over ssh. so why reinvent the wheel?
>
> Because the remote also needs to be btrfs. Also it is not very stable
> when the network connection is lost etc. There are also no checks if
> data was transferred correctly. It's just a stream.
>
> I hope that this clarifies my use case a bit.
>

well, it does.

that reminds me that i read somewhere that you can dump the btrfs-send result and apply it remotely after transferring it by hand.
also, i am pretty sure you could insert something into the pipe that
- wraps the data periodically w/ a checksum
- unwraps remote and compares
- requests the chunk again if the comparison failed
. not sure that it exists, but sounds like a functionality that might have been implemented already somewhere.
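
The three steps listed above could look roughly like this in Python. This is only a sketch of the idea, not an existing tool: the frame layout (4-byte length, SHA-256 digest, payload), the function names and the `resend` callback are all assumptions, and a damaged chunk is retried just once.

```python
import hashlib
import struct
from typing import Callable, Iterable, Iterator, Tuple

DIGEST_LEN = hashlib.sha256().digest_size
HEADER_LEN = 4 + DIGEST_LEN


def wrap(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Sender side: yield frames of length + SHA-256 digest + payload."""
    for start in range(0, len(data), chunk_size):
        payload = data[start:start + chunk_size]
        yield struct.pack(">I", len(payload)) + hashlib.sha256(payload).digest() + payload


def unwrap(frame: bytes) -> Tuple[bytes, bool]:
    """Receiver side: return (payload, ok), where ok means the checksum matched."""
    if len(frame) < HEADER_LEN:
        return b"", False
    (length,) = struct.unpack(">I", frame[:4])
    digest = frame[4:HEADER_LEN]
    payload = frame[HEADER_LEN:]
    ok = len(payload) == length and hashlib.sha256(payload).digest() == digest
    return payload, ok


def receive(frames: Iterable[bytes], resend: Callable[[int], bytes]) -> bytes:
    """Reassemble the stream, asking for a frame again if verification fails."""
    out = bytearray()
    for position, frame in enumerate(frames):
        payload, ok = unwrap(frame)
        if not ok:
            payload, ok = unwrap(resend(position))  # single retry in this sketch
            if not ok:
                raise IOError(f"chunk {position} failed verification twice")
        out += payload
    return bytes(out)
```

Here `resend` stands in for whatever back channel the transport offers; over a plain one-way pipe the receiver could only detect the damage, not repair it.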

tl;dr
because of these instabilities and limitations to/by specific fs's i still prefer rsync over these features, which might not be the fastest but is a rock-solid solution.
e.g. i combine btrfs w/ rsync like
- make a btrfs snapshot of the last backup folder on the target
- rsync differences over ssh from source machine (being btrfs as well, but could be anything) to the snapshot
- start over
.
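
The rotation described above could be scripted along these lines. This is a sketch under assumptions, not the actual setup from the mail: the paths and remote are hypothetical, and the rsync options (`-a` and `--delete`) are a guess at reasonable defaults rather than something stated in the thread. The function only builds the two commands so they can be inspected before anything is run.

```python
from typing import List


def backup_plan(last_backup: str, new_backup: str, remote_source: str) -> List[List[str]]:
    """Return the commands for one backup round on the target machine."""
    return [
        # 1. writable btrfs snapshot of the previous backup (cheap, copy-on-write)
        ["btrfs", "subvolume", "snapshot", last_backup, new_backup],
        # 2. pull only the differences from the source machine into that snapshot
        ["rsync", "-a", "--delete", "-e", "ssh", remote_source, new_backup + "/"],
    ]


if __name__ == "__main__":
    # Hypothetical paths and host, printed rather than executed.
    for command in backup_plan("/backup/2018-07-28", "/backup/2018-07-29",
                               "user@source:/data/"):
        print(" ".join(command))
```

Each round then starts from the snapshot created in the previous one, which is the "start over" step.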

for smaller web spaces and backups to third party remotes i use duplicity (librsync).

..ede/duply.net


Re: Use alternative diff backend

duplicity-talk mailing list
Sorry that I missed this at the time.

On Jul 29 2018, at 4:19 pm, edgar.soldin--- via Duplicity-talk <[hidden email]> wrote:

> On 7/29/2018 16:48, darkdragon wrote:
>>>> Next-generation file systems like ZFS and Btrfs allow storing snapshots at no additional cost. Calculating diffs is also extremely cheap.
>>>
>>> can you give a command line or native python example to retrieve a diff by file?
>>
>> Currently only a native format is supported (see btrfs-send:
>> https://btrfs.wiki.kernel.org/index.php/Manpage/btrfs-send). There are
>> some projects for decoding this stream and outputting it in various
>> formats (see https://github.com/sysnux/btrfs-snapshots-diff including
>> issues).

I think that a nice intermediate ground here would be to leverage these COW filesystem capabilities to avoid a full filesystem scan.

I have been toying with the idea of adding a way to specify the changed file or files for duplicity to scan, rather than duplicity scanning all files looking for changes.

My initial use case was an inotify-based implementation that always kept everything backed up, but it could similarly be used for this COW filesystem support if you could pull the list of changed files from the filesystem without a scan.

That has the advantage that it would essentially just need some tweaks to the filesystem walk, without changing any of the diff logic etc. It would also mean that you could run occasional 'normal' scans to wash-up anything that wasn't caught for whatever reason.
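
A minimal Python sketch of that idea, assuming a hypothetical helper that is not part of duplicity: when a list of changed paths is supplied (from inotify, a snapshot diff, or anything else), only those paths are yielded; when no list is given, it falls back to a normal full walk.

```python
import os
from typing import Iterable, Iterator, Optional


def paths_to_scan(root: str, changed: Optional[Iterable[str]] = None) -> Iterator[str]:
    """Yield the file paths under `root` that should be examined.

    changed=None means a normal full scan; otherwise only the listed paths
    (relative to root) are yielded, and anything outside root is ignored.
    """
    root = os.path.abspath(root)
    if changed is None:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                yield os.path.join(dirpath, name)
        return
    for path in changed:
        full = os.path.abspath(os.path.join(root, path))
        if os.path.commonpath([root, full]) == root:
            # The path may no longer exist; a deletion is also a change.
            yield full
```

The existing diff logic would then run unchanged on whatever this yields, and passing `changed=None` now and then gives the occasional full scan mentioned above.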

Kind regards,

Aaron
