support xz compression

duplicity-talk mailing list
Hi,
I'd like to add XZ (LZMA) compression to Duplicity. I'd like to discuss
how to go about it so that the solution is aligned with your high-level
plan for the project.

It would be based on the lzma module in Python 3 and its backport for Python 2.

The feature could be activated with a new option,
--compression-xz=<on|off|<preset>>
on - turn on, use default preset 6
off - turn off; gives the user a chance to override an earlier setting if
the option is specified multiple times (e.g. in scripts)
<preset> - 0-9[e], turn on with the given preset, with meaning and effect as
documented in the lzma package and the xz(1) man page
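
A minimal sketch of how the option values listed above could map onto lzma presets. The helper name parse_xz_option is hypothetical and not part of Duplicity; only lzma.PRESET_DEFAULT and lzma.PRESET_EXTREME come from the standard library.

```python
import lzma


def parse_xz_option(value):
    """Map an option value to an lzma preset, or None when XZ is off.

    Hypothetical helper: accepts "on", "off", or a preset 0-9 with an
    optional trailing "e" (extreme), as in the proposal.
    """
    value = value.strip().lower()
    if value == "off":
        return None
    if value == "on":
        return lzma.PRESET_DEFAULT  # preset 6
    extreme = value.endswith("e")
    digit = value[:-1] if extreme else value
    if len(digit) != 1 or digit not in "0123456789":
        raise ValueError("expected on, off, or a preset 0-9[e]: %r" % value)
    preset = int(digit)
    if extreme:
        # The extreme variant is expressed by OR-ing in a flag bit.
        preset |= lzma.PRESET_EXTREME
    return preset
```

The returned value can be passed directly as the preset argument of lzma.open() or lzma.LZMACompressor().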

Upon archiving, if active with encryption on:
- adjust archive files' extensions in file_naming to have the .xz.gpg extension
- in gpg.GPGWriteFile, turn off the default gpg compression and run the data
through the lzma compressor before feeding it to gpg
If not encrypting:
- adjust archive files' extensions in file_naming to have the .xz extension
- write the output to a file obtained from lzma.open() in gpg.GzipWriteFile()
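
The archiving steps above could be sketched roughly as follows. This is an illustration under assumptions, not Duplicity code: xz_suffix and write_xz are hypothetical names, and sink stands for whatever receives the compressed bytes (the pipe into gpg when encrypting, a plain file otherwise).

```python
import io
import lzma


def xz_suffix(encrypted):
    """Extension from the proposal: .xz.gpg when encrypting, else .xz."""
    return ".xz.gpg" if encrypted else ".xz"


def write_xz(blocks, sink, preset=6):
    """Compress an iterable of byte blocks and write one XZ stream to sink.

    sink is any object with a write() method. When encrypting, it would be
    the input side of gpg, with gpg's own compression turned off.
    """
    compressor = lzma.LZMACompressor(format=lzma.FORMAT_XZ, preset=preset)
    for block in blocks:
        # compress() may return b"" while data is still buffered internally.
        sink.write(compressor.compress(block))
    # flush() emits everything still buffered plus the stream footer.
    sink.write(compressor.flush())


if __name__ == "__main__":
    buf = io.BytesIO()
    write_xz([b"a" * 1000, b"b" * 1000], buf)
    print(len(buf.getvalue()), "compressed bytes", xz_suffix(False))
```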

Upon restoring (the option need not be present; the feature could be
autodetected):
- detect XZ compression in file_naming.parse (the
set_encryption_or_compression function) and perhaps set a new flag on the
ParseResults object
- activate the XZ decompressor in path.DupPath.filtered_open() based on
the above flag
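
A rough sketch of the autodetection idea, assuming detection purely by file name suffix. The functions detect_compression and open_for_restore are hypothetical stand-ins; they do not reproduce file_naming.parse or DupPath.filtered_open(), and the file names are illustrative only.

```python
import gzip
import lzma


def detect_compression(filename):
    """Return (compression, encrypted) guessed from the file name suffixes."""
    encrypted = filename.endswith(".gpg")
    stem = filename[:-len(".gpg")] if encrypted else filename
    if stem.endswith(".xz"):
        return "xz", encrypted
    if stem.endswith(".gz"):
        return "gzip", encrypted
    return None, encrypted


def open_for_restore(path):
    """Open an unencrypted archive file with the matching decompressor."""
    compression, encrypted = detect_compression(path)
    if encrypted:
        # Decryption is out of scope for this sketch.
        raise ValueError("decrypt first; this sketch only handles plain files")
    if compression == "xz":
        return lzma.open(path, "rb")
    if compression == "gzip":
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Because the suffix carries the information, no command-line option is needed at restore time in this scheme.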

There may be issues with the accuracy of the --volsize feature, because lzma
uses larger buffers. Let's see during testing.
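
One way to observe the buffering concern is to record how many bytes each compress() call actually returns. The sketch below is only a measurement aid with a hypothetical function name; how much output is held back depends on the preset and the liblzma version, so no particular numbers are claimed.

```python
import lzma
import os


def bytes_emitted_per_block(block_size=64 * 1024, blocks=8, preset=6):
    """Feed random blocks to an XZ compressor and record the output sizes.

    Returns (per_call_sizes, flush_size). Output that only appears at
    flush() is data that a size check based on bytes written so far
    would not have seen yet.
    """
    compressor = lzma.LZMACompressor(preset=preset)
    sizes = []
    for _ in range(blocks):
        sizes.append(len(compressor.compress(os.urandom(block_size))))
    flush_size = len(compressor.flush())
    return sizes, flush_size


if __name__ == "__main__":
    sizes, flush_size = bytes_emitted_per_block()
    print("per call:", sizes)
    print("at flush:", flush_size)
```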

I've been successfully running a PoC along these lines for a few months now,
albeit by piping to an external xz process.

And of course, I'll add a battery of tests and an entry in the man page.

Any ideas, comments, feedback?

Best,
Radim



_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk

Re: support xz compression

duplicity-talk mailing list
On 10/21/2018 18:52, Radim Tobolka via Duplicity-talk wrote:
> Hi,
> I'd like to add XZ (LZMA) compression to Duplicity. I'd like to discuss how to go about it so that the solution is aligned with your high-level plan for the project.
>
> It would be based on the lzma module in Python 3 and its backport for Python 2.
>
> The feature could be activated with a new option, --compression-xz=<on|off|<preset>>
> on - turn on, use default preset 6
> off - turn off; gives the user a chance to override an earlier setting if the option is specified multiple times (e.g. in scripts)
> <preset> - 0-9[e], turn on with the given preset, with meaning and effect as documented in the lzma package and the xz(1) man page

ideally this syntax should be carried over to gzip compression as well. but while at it, why not use something that would be easily extendable in the future with different algos, like

--compression=<algo> and
--compression-params="" or --compression-level=<level>

not sure what to do w/ the current default to compress via gzip unless '--no-compression' is given. could be kept, meaning --no-compression would translate to --compression="", but maybe we should deactivate the automatic compression?
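
A small argparse sketch of the option layout suggested above, keeping gzip as the default and treating --no-compression as shorthand for --compression="". The parser below is an assumption for illustration and does not reflect Duplicity's actual command-line handling.

```python
import argparse

# "" means no compression, as suggested in the message above.
ALGORITHMS = ("gzip", "xz", "")


def build_parser():
    parser = argparse.ArgumentParser(prog="sketch")
    # Added first so that its default ("gzip") is the one that applies.
    parser.add_argument("--compression", choices=ALGORITHMS, default="gzip")
    parser.add_argument("--compression-level", type=int, default=None)
    # Legacy spelling: stores "" into the same destination.
    parser.add_argument("--no-compression", dest="compression",
                        action="store_const", const="")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--compression", "xz",
                                      "--compression-level", "9"])
    print(args.compression, args.compression_level)
```

Sharing one destination keeps the rest of the code looking at a single setting, regardless of which spelling the user chose.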
 
> Upon archiving, if active with encryption on:
> - adjust archive files' extensions in file_naming to have the .xz.gpg extension
> - in gpg.GPGWriteFile, turn off the default gpg compression and run the data through the lzma compressor before feeding it to gpg
> If not encrypting:
> - adjust archive files' extensions in file_naming to have the .xz extension
> - write the output to a file obtained from lzma.open() in gpg.GzipWriteFile()

don't like reusing anything named Gzip** for handling xz. how about eventually merging

PlainWriteFile()
GzipWriteFile()

to a clean

WriteFile() supporting different compressions via a parameter (e.g. derived from '--compression=<algo>')?
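
The merged writer could look roughly like the sketch below, where one function picks the opener from the algorithm name. The registry and the name open_write_file are hypothetical; the real PlainWriteFile() and GzipWriteFile() do more than open a file, and none of that is reproduced here.

```python
import gzip
import lzma

# Hypothetical registry: algorithm name -> (opener, file name suffix).
OPENERS = {
    "": (open, ""),
    "gzip": (gzip.open, ".gz"),
    "xz": (lzma.open, ".xz"),
}


def open_write_file(path, algo="gzip"):
    """Open path plus the matching suffix for binary writing."""
    try:
        opener, suffix = OPENERS[algo]
    except KeyError:
        raise ValueError("unknown compression algorithm: %r" % algo) from None
    return opener(path + suffix, "wb")
```

Adding another algorithm later would then only mean adding one entry to the registry.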

> Upon restoring (the option need not be present, the feature could be autodetected):

as that should be the case already with gzip compression, i see no obstacle there.

> - detect XZ compression in file_naming.parse (the set_encryption_or_compression function) and perhaps set a new flag on the ParseResults object
> - activate the XZ decompressor in path.DupPath.filtered_open() based on the above flag

same here
 
> There may be issues with the accuracy of the --volsize feature, because lzma uses larger buffers. Let's see during testing.
>
> I've been successfully running a PoC along these lines for a few months now, albeit by piping to an external xz process.
>
> And of course, I'll add a battery of tests and an entry in the man page.

always a good idea!

> Any ideas, comments, feedback?

above ;) and please document everything in the man page.
 
> Best,
> Radim

ditto and regards.. ede/duply.net


Re: support xz compression

duplicity-talk mailing list
Hi Edgar,

thanks for the feedback. I'd rather retain the existing behavior and keep
compression on by default. I don't think users would appreciate it
if their backups suddenly and silently grew by 10-20% after an
upgrade because we disabled it.

I like the --compression=<algo> approach. What about --compression=off being
synonymous with --no-compression? I think it's slightly more user
friendly than ''. Minor detail anyway.

And I have one general question: can I use pytest-style test cases and
fixtures? I find them more practical than what the unittest module offers,
for example test case parametrization and test fixtures. Do you have any
plans with regard to that?

Best regards,

Radim

On 10/21/18 7:33 PM, [hidden email] wrote:

> [...]


Re: support xz compression

duplicity-talk mailing list
Also, do I need to worry about short filenames or, more specifically,
shortened file extensions?

On 10/27/18 11:19 AM, Radim Tobolka wrote:

> [...]


Re: support xz compression

duplicity-talk mailing list
OK, after studying the code in more detail: given that short options are
deprecated, it doesn't make much sense to implement them for new
features like xz.

Could you please advise on some more points?

- Should the compression settings apply to the local cache too? Right now,
sigtars are always gzipped, even with the --no-compression option.
- Why are local manifests stored uncompressed?
- What is the purpose of the top_off function in gpg.GPGWriteFile, and why is
it not implemented in GzipWriteFile as well? It seems to me that when
performing a backup over the network, it is more desirable not to top off the
volumes.
- Why aren't remote manifests compressed when using the --no-encryption option?
- Does anybody have experience with the lz4 or lzo Python packages? I think
it would be useful to have one of those super fast, so-so compression
schemes available, but I'm reluctant to introduce an additional
dependency from a source I don't know...

Cheers,

Radim


