Following up on an extended attributes question


Following up on an extended attributes question

duplicity-talk mailing list
Hi

So I'd asked about extended attributes support in duplicity a while ago. I was told that the missing
support is the fault of the tarfile module. I did follow up on this, and it seems that tarfile does support
extended attributes: one simply creates a tarfile in the POSIX.1-2001 pax format. However,
the default tarinfo object created by tarfile doesn't include the extended attributes, only
the output of os.stat(). It is easy, though, to read the xattrs using the xattrs module via xattrs.getattrs
and add them to the pax_header.

If I create a file with extended attributes, and *open* it using tarfile, it reads the 
headers just fine.
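
To make the idea concrete, here is a minimal sketch of storing attributes in a member's pax header and reading them back with tarfile. It works on in-memory data rather than reading xattrs from disk, and it assumes the `SCHILY.xattr.` key prefix (the convention GNU tar uses); the function names are made up for illustration and are not part of duplicity.

```python
import io
import tarfile

# Assumption: follow GNU tar's convention of storing xattrs under this key prefix.
XATTR_PREFIX = "SCHILY.xattr."


def add_with_xattrs(tar, name, data, xattrs):
    """Add `data` as member `name`, copying `xattrs` (str -> bytes) into its pax header."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    for key, value in xattrs.items():
        # pax header values are str in Python 3; surrogateescape keeps
        # non-UTF-8 bytes representable instead of raising an error.
        info.pax_headers[XATTR_PREFIX + key] = value.decode("utf-8", "surrogateescape")
    tar.addfile(info, io.BytesIO(data))


def write_and_read_back():
    """Write one member with an xattr to an in-memory pax archive, then reopen it."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.PAX_FORMAT) as tar:
        add_with_xattrs(tar, "hello.txt", b"hello\n",
                        {"user.comment": b"from duplicity-talk"})
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return dict(tar.getmember("hello.txt").pax_headers)


if __name__ == "__main__":
    print(write_and_read_back())
```

On Linux the attribute dictionary could presumably be filled from `os.listxattr()` and `os.getxattr()` instead of a literal, but that part is left out here because it depends on the filesystem.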

I'd like to help code support for extended attributes, and it would be great to get any help getting started.

- A




_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk

Re: Following up on an extended attributes question

duplicity-talk mailing list
Hi Arjun,

Thanks for volunteering to do this.  It would be a major enhancement to duplicity and would be greatly appreciated.

See https://docs.python.org/2/library/tarfile.html for docs on what we use.  

Currently we use the GNU_FORMAT, which is the default format, so there is no pax_header in that format.  For the sake of backwards compatibility, we'd need to be able to read GNU_FORMAT, and the PAX_FORMAT would need to be readable by GNU tar for manual recovery of a corrupted backup.  Perhaps it's time to change formats?  I don't know all the tradeoffs we'll need to make and what the complexity would be.
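
As a rough illustration of the read side of that compatibility question, the sketch below writes the same member in both formats and reads each archive back without telling tarfile which format it is. The helper names are hypothetical, and it says nothing about whether GNU tar itself can read the result.

```python
import io
import tarfile


def make_archive(fmt):
    """Build a one-member archive in memory using the given tarfile format constant."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=fmt) as tar:
        data = b"payload"
        info = tarfile.TarInfo("dir/file.txt")
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def read_members(raw):
    """Read an archive without naming its format; tarfile detects it on open."""
    with tarfile.open(fileobj=io.BytesIO(raw), mode="r") as tar:
        return [(m.name, tar.extractfile(m).read()) for m in tar.getmembers()]


gnu_members = read_members(make_archive(tarfile.GNU_FORMAT))
pax_members = read_members(make_archive(tarfile.PAX_FORMAT))
```

Both calls go through the same `tarfile.open(..., mode="r")`, so old GNU_FORMAT backups should stay readable by the same code path if new archives were written as PAX_FORMAT.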

Since pax_headers are key/value pairs, we need to track their size and be prepared for a value that may exceed our max block size.  I've read of cases where key/value pairs exceed the size of the file itself, performing the functional equivalent of a dictionary file.  We would have to allow for cases where a large value crosses a volume boundary.
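
A small sketch of how that size could be tracked, assuming UTF-8 text values: in the pax format each record is written as `<length> <key>=<value>\n`, where the length counts the whole record including its own digits, and the records are stored in 512-byte tar blocks behind one extra header block. The function names are invented for illustration, and how this interacts with duplicity's own block and volume sizes is not covered.

```python
BLOCK = 512  # tar archives are organised in 512-byte blocks


def pax_record(key, value):
    """Encode one pax record as bytes: b"<length> <key>=<value>\\n"."""
    body = b" " + key.encode("utf-8") + b"=" + value.encode("utf-8") + b"\n"
    length = len(body)
    # The length prefix counts its own digits, so iterate until it is stable.
    while len(str(length)) + len(body) != length:
        length = len(str(length)) + len(body)
    return str(length).encode("ascii") + body


def pax_overhead(pax_headers):
    """Bytes an extended header adds: one header block plus the padded records."""
    if not pax_headers:
        return 0
    size = sum(len(pax_record(k, v)) for k, v in pax_headers.items())
    padded = -(-size // BLOCK) * BLOCK  # round up to a whole number of blocks
    return BLOCK + padded
```

Even a one-byte attribute costs two blocks under this accounting, and a value longer than a block simply spills into further blocks, which is the case that would have to be handled at a volume boundary.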

We have a fair number of test cases that may need to be revamped to handle the change in format, and we'll need to maintain older test cases for backwards compatibility testing.

All that, and probably more complexity, awaits.  If you are still interested, we'll need to coordinate and decide when is best to release your changes.

I look forward to hearing from you.

...Thanks,
...Ken


  


Re: Following up on an extended attributes question

duplicity-talk mailing list
Hi Ken

I'd be happy to work on it, with a couple of caveats: I'm a research mathematician by day, and have little professional programming experience.
So as long as people are willing to indulge slow progress and trivial questions as I get familiar with the codebase, I'd be willing to give it a shot!


> Currently we use the GNU_FORMAT, which is the default format, so there is no pax_header in that format.  For sake of backwards compatibility, we'd need to be able to read GNU_FORMAT, and the PAX_FORMAT would need to be readable by GNU tar for manual recovery of a corrupted backup.  Perhaps it's time to change formats?  I don't know all the tradeoffs we'll need to make and what the complexity would be.

GNU tar appears to support the pax format together with the `--xattrs` flag. From what I've seen so far, tarfile happily reads both pax and GNU format
files, but it neither stores extended attributes when creating archives, nor sets them when extracting files.
Maybe they didn't want to add a dependency on the xattr module, or maybe this is for cross-platform compatibility.
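
For the extraction half, something along these lines might work as a starting point. It is only a sketch: it assumes the attributes were stored under the `SCHILY.xattr.` pax key prefix, the function names are hypothetical, and the default setter is `os.setxattr`, which Python only provides on Linux.

```python
import os

# Assumption: xattrs were stored in pax headers under GNU tar's key prefix.
XATTR_PREFIX = "SCHILY.xattr."


def xattrs_from_pax(pax_headers):
    """Return {attribute name: bytes value} for the xattr entries of a pax header dict."""
    return {
        key[len(XATTR_PREFIX):]: value.encode("utf-8", "surrogateescape")
        for key, value in pax_headers.items()
        if key.startswith(XATTR_PREFIX)
    }


def restore_xattrs(path, pax_headers, setter=None):
    """Apply the stored xattrs to `path` and return the attribute names that were set.

    `setter` is called as setter(path, name, value); it defaults to os.setxattr,
    which is looked up lazily because it does not exist on every platform.
    """
    if setter is None:
        setter = os.setxattr
    applied = []
    for name, value in sorted(xattrs_from_pax(pax_headers).items()):
        setter(path, name, value)
        applied.append(name)
    return applied
```

After `tar.extract(member, dest)` one would call `restore_xattrs(extracted_path, member.pax_headers)`. Passing a different `setter` makes it possible to try the logic on systems or filesystems without xattr support.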

> Since pax_headers are key/value pairs, we need to track their size and be prepared for a value that may exceed our max block size.  I've read of cases where key/value pairs exceed the size of the file itself, performing the functional equivalent of a dictionary file.  We would have to allow for cases where a large value crosses a volume boundary.

I can see that the blocksize is a diff process parameter,
and that volsize affects the difftar size. I'm still figuring out what's being compared during the diff process,
and how the diffs are being created, so I don't understand how the xattrs fit into this process.

Is there a brief technical overview of what duplicity is doing?

Arjun



Re: Following up on an extended attributes question

duplicity-talk mailing list
hey Arjun,

we are a quite friendly and patient bunch on this list. i'd say go for it, every contribution is welcome, time is not a factor :)

..ede/duply.net


Re: Following up on an extended attributes question

duplicity-talk mailing list
Hi Arjun,

Glad you're willing to help.  The best (only) formal docs we have are at readthedocs, or in the docs directory, but those are incomplete.  I started working on them a long while back and other things just got in the way.  Thankfully, the docs are part of the source code, so they can be added to as you work on the module.  We'd appreciate you doing that as you go.  It'll help all of us in the long run.

librsync is the 'diff' tool we use.  It's the core part of the rsync algorithm and how we keep network bandwidth low.  This is worth a good read.
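
To give a feel for the signature/delta idea, here is a deliberately simplified sketch. It is not librsync's API or its actual algorithm: real rsync uses a rolling weak checksum so that it can test every byte offset cheaply, whereas this toy version just hashes each window with SHA-256. The block size and function names are invented for illustration.

```python
import hashlib

BLOCK = 4  # tiny block size, for illustration only


def signature(old, block=BLOCK):
    """Map the digest of each full block of `old` to that block's index."""
    sig = {}
    for start in range(0, len(old) - block + 1, block):
        digest = hashlib.sha256(old[start:start + block]).digest()
        sig.setdefault(digest, start // block)
    return sig


def delta(sig, new, block=BLOCK):
    """Encode `new` as ("copy", block_index) and ("data", bytes) instructions."""
    ops, literal, pos = [], bytearray(), 0
    while pos < len(new):
        chunk = new[pos:pos + block]
        index = sig.get(hashlib.sha256(chunk).digest()) if len(chunk) == block else None
        if index is None:
            literal.append(new[pos])  # no known block starts here; emit one literal byte
            pos += 1
        else:
            if literal:
                ops.append(("data", bytes(literal)))
                literal.clear()
            ops.append(("copy", index))
            pos += block
    if literal:
        ops.append(("data", bytes(literal)))
    return ops


def patch(old, ops, block=BLOCK):
    """Rebuild the new data from the old data plus the delta instructions."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * block:(arg + 1) * block] if kind == "copy" else arg
    return bytes(out)
```

The point is that only the signature of the old data is needed to compute the delta, so the old file contents never have to be fetched back over the network.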

As to code style, we use PEP8 style, and use PyLint to lint check the code.  These are all part of the testing we do before release, so just be aware of them.

...Ken



Re: Following up on an extended attributes question

duplicity-talk mailing list

Hello Arjun,

On 30/05/17 12:27, Kenneth Loafman via Duplicity-talk wrote:
> Hi Arjun,
>
> Glad you're willing to help.  The best (only) formal docs we have are at readthedocs, or in the docs directory, but those are incomplete.

In the hope of saving you a little time figuring out where to dive in, here is an extremely quick overview of the code base, largely from memory (so don't take it as gospel) and assuming a full backup to keep it simple:
1. Start in bin/duplicity main()
2. Process commandline arguments in duplicity/commandline ProcessCommandLine
2.a. parse_cmdline_options() does what you would expect, though watch the globals.
2.b. Note the set_selection(), which doesn't really belong here, but triggers all the selection code in duplicity/selection (walking over the filesystem to see if files match the glob patterns etc that people have passed as --include/--exclude options).
3. bin/duplicity do_backup() to decide what type of backup (full, restore etc)
4. I believe bin/duplicity full_backup() then kicks off all the writing to signature and archive files, using the various backends.
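
To illustrate the kind of decision the selection code in step 2.b makes, here is a toy sketch, not duplicity's implementation. It assumes that the first matching --include/--exclude condition decides and that unmatched files are included, and it uses Python's fnmatch as a stand-in for duplicity's own glob syntax, which differs in details. The function name and the paths are made up.

```python
import fnmatch


def is_selected(path, rules):
    """Return True if `path` should be backed up.

    `rules` is an ordered list of ("include" | "exclude", pattern) pairs.
    The first rule whose pattern matches decides; unmatched paths are included.
    """
    for action, pattern in rules:
        if fnmatch.fnmatchcase(path, pattern):
            return action == "include"
    return True


rules = [
    ("include", "/home/me/docs/*"),
    ("exclude", "/home/me/*"),
]
```

With these rules, `/home/me/docs/a.txt` is kept because the include matches first, `/home/me/cache/x` is dropped by the exclude, and `/etc/hosts` is kept because nothing matches it.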

Hopefully that helps. Anyone else feel free to jump in if I have anything wrong or misleading.

Kind regards,

Aaron


