Restore single file from incremental, retrieving full

duplicity-talk mailing list
Hi,

I'm testing retrieving a single file from a backup in Duplicity (v0.7.17).

The backup set consists of 1 full and 1 incremental - and the requested
file only exists in the incremental. (This is a test scenario I put
together, so I'm certain this is the case).

When I retrieve the single file (with --verbosity debug) I can see it is
retrieving the full backup volumes.

Am I misunderstanding the duplicity model, or is my restore command
wrong? It seems to me the metadata should know the file only exists in
the incremental and just get that?

I've tried specifying `--time` both before the incremental and after it.
Is this affected by the timestamps on the files originally, or only
their presence for a backup?

duplicity -vd --file-to-restore example.pdf s3://s3.eu-west-1.amazonaws.com/redacted/ restore

Thanks,
Oli

_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk

Re: Restore single file from incremental, retrieving full

duplicity-talk mailing list
On 04.04.2018 22:50, Oliver Cole via Duplicity-talk wrote:
> Hi,

hey Oli,
 
> I'm testing retrieving a single file from a backup in Duplicity (v0.7.17).
>
> The backup set consists of 1 full and 1 incremental - and the requested file only exists in the incremental. (This is a test scenario I put together, so I'm certain this is the case).

could you show that by posting a collection-status?
 
> When I retrieve the single file (with --verbosity debug) I can see it is retrieving the full backup volumes.

right, it shouldn't.
 
> Am I misunderstanding the duplicity model, or is my restore command wrong? It seems to me the metadata should know the file only exists in the incremental and just get that?

can you post a list of the files on the remote?
 
> I've tried specifying `--time` both before the incremental and after it. Is this affected by the timestamps on the files originally, or only their presence for a backup?
>
> duplicity -vd --file-to-restore example.pdf s3://s3.eu-west-1.amazonaws.com/redacted/ restore

the man page could be more specific. --time is given with respect to the backup date, not the timestamps of the files themselves. it's only useful during restore or verify, e.g.

restore from april fools day
 duplicity --time 2018/4/1 file://test/bkp /restore

restore from a week ago
 duplicity --time 1W file://test/bkp /restore

but generally it is not needed in your case, as the default when it is not given is 'now', meaning the latest backup available.
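
To illustrate "time is given wrt. the backup date", here is a small sketch. It assumes a restore considers every backup set in the chain that was taken at or before the requested time. The dates and the helper name sets_for_restore are invented for illustration, and this is a model of the idea, not duplicity's code:

```python
from datetime import datetime, timedelta

# Hypothetical chain of (type, backup time); the times are invented.
chain = [
    ("full", datetime(2018, 4, 1, 12, 0)),
    ("inc", datetime(2018, 4, 4, 12, 0)),
]

def sets_for_restore(chain, when):
    """Backup sets taken at or before `when` (assumed selection rule)."""
    return [kind for kind, taken in chain if taken <= when]

now = datetime(2018, 4, 5, 12, 0)  # stand-in for the default, 'now'

print(sets_for_restore(chain, now))                       # latest state
print(sets_for_restore(chain, datetime(2018, 4, 2)))      # between full and inc
print(sets_for_restore(chain, now - timedelta(weeks=1)))  # before any backup
```

Under that assumption, a time that falls before the incremental leaves only the full in play, so a file that first appears in the incremental could not be restored from that point in time.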

the man page holds some details on the formatting of time
  http://duplicity.nongnu.org/duplicity.1.html#sect8

..ede/duply.net





Re: Restore single file from incremental, retrieving full

duplicity-talk mailing list
Thanks for replying!

On 05/04/2018 10:33, [hidden email] wrote:
>> The backup set consists of 1 full and 1 incremental - and the requested file only exists in the incremental. (This is a test scenario I put together, so I'm certain this is the case).
> could show that by posting a collection-status?


Local and Remote metadata are synchronized, no sync needed.
Last full backup date: Sun Apr  1 14:18:00 2018
Collection Status
-----------------
Connecting with backend: BackendWrapper
Archive dir: /backup/duplicitycache/e3b115cb91b96e04a62ff54cd95bdbac

Found 0 secondary backup chains.

Found primary backup chain with matching signature chain:
-------------------------
Chain start time: Sun Apr  1 14:18:00 2018
Chain end time: Wed Apr  4 19:38:45 2018
Number of contained backup sets: 2
Total number of contained volumes: 3
  Type of backup set:                            Time:      Num volumes:
                 Full         Sun Apr  1 14:18:00 2018                 2
          Incremental         Wed Apr  4 19:38:45 2018                 1
-------------------------
No orphaned or incomplete backup sets found.



>> When I retrieve the single file (with --verbosity debug) I can see it is retrieving the full backup volumes.
>
> right, it shouldn't.
>  
>> Am I misunderstanding the duplicity model, or is my restore command wrong? It seems to me the metadata should know the file only exists in the incremental and just get that?
>
> can you post a list of the files on the remote?

dataduplicity-full.20180401T141800Z.vol1.difftar.gpg
dataduplicity-full.20180401T141800Z.vol2.difftar.gpg
dataduplicity-inc.20180401T141800Z.to.20180404T193845Z.vol1.difftar.gpg
duplicity-full-signatures.20180401T141800Z.sigtar.gpg
duplicity-full.20180401T141800Z.manifest.gpg
duplicity-inc.20180401T141800Z.to.20180404T193845Z.manifest.gpg
duplicity-new-signatures.20180401T141800Z.to.20180404T193845Z.sigtar.gpg
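
The names in this listing carry the backup type and the timestamps, which can be pulled apart with a small sketch like the one below. The pattern is inferred only from the seven names above (any leading prefix, then the type, then one or two timestamps), so treat it as an assumption rather than a description of every name duplicity can produce:

```python
import re

# Pattern inferred from the listing above only; other setups may differ.
NAME = re.compile(
    r"^(?P<prefix>.*?)duplicity-"
    r"(?P<kind>full-signatures|new-signatures|full|inc)"
    r"\.(?P<start>\d{8}T\d{6}Z)"
    r"(?:\.to\.(?P<end>\d{8}T\d{6}Z))?"
    r"\.(?P<rest>.+)$"
)

def describe(filename):
    """Return (kind, start, end, rest), or None if the name does not match."""
    m = NAME.match(filename)
    if m is None:
        return None
    return m.group("kind"), m.group("start"), m.group("end"), m.group("rest")

print(describe("dataduplicity-full.20180401T141800Z.vol1.difftar.gpg"))
print(describe("duplicity-inc.20180401T141800Z.to.20180404T193845Z.manifest.gpg"))
```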

Thanks again!

Oli


Re: Restore single file from incremental, retrieving full

duplicity-talk mailing list
We're looking at an edge case here without enough info.  I have a thought.

The way duplicity finds which volume to search for the restore file is to look at the start path and ending path in the volume info in the manifest.  So, if the test is something like:

full backup has only the file 'foo', the range in the manifest will be ['.'..'foo']
inc backup has only the file 'bar', the range in the manifest will be ['.'..'bar']

So, when a recovery of 'bar' is requested, duplicity will download the full volume because 'bar' falls within its range ['.'..'foo'].  Hmmm, no 'bar' there, so it will download the inc volume, and there it is.
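
A rough sketch of that range check, for illustration only. It assumes paths are compared as tuples of components, with '.' (the backup root) as the empty tuple. The function names are made up, and this is a simplified model rather than duplicity's actual code:

```python
# Simplified model of a per-volume path range check (not duplicity's code).
# A path is a tuple of components; '.' (the backup root) is the empty tuple.

def to_index(path):
    """Turn 'dir/file' into ('dir', 'file'); '.' becomes ()."""
    return () if path == "." else tuple(path.split("/"))

def may_contain(start, end, target):
    """True if target sorts within [start, end], so the volume gets fetched."""
    return to_index(start) <= to_index(target) <= to_index(end)

# Ranges from the foo/bar example: one volume per backup set.
volumes = [
    ("full", ".", "foo"),
    ("inc", ".", "bar"),
]

candidates = [name for name, start, end in volumes
              if may_contain(start, end, "bar")]
print(candidates)
```

The range only says where a path would sort, not whether it is present, which is why a volume can be fetched and then turn out not to hold the file.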

Does that match your test case and observations?  If so, then duplicity is working correctly.

...Thanks,
...Ken



Re: Restore single file from incremental, retrieving full

duplicity-talk mailing list
On 06/04/2018 11:52, Kenneth Loafman wrote:

> We're looking at an edge case here without enough info.  I have a thought.
>
> The way duplicity finds which volume to search for the restore file is
> to look at the start path and ending path in the volume info in the
> manifest.  So, if the test is something like:
>
> full backup has only the file 'foo', the range in the manifest will be
> ['.'..'foo']
> inc backup has only the file 'bar', the range in the manifest will be
> ['.'..'bar']
>
> So, when a recovery is requested, duplicity will download the full
> volume because 'bar' is in the range.  Hmmm, no 'bar' there, so it will
> download the inc volume, and there it is.
>
> Does that match your test case and observations?  If so, then duplicity
> is working correctly.

I think it does match my test case. Would you mind confirming with my
(lightly edited) manifests? (I'm aware the hostnames differ - I've just
been testing this out in Docker). The --file-to-restore was example.pdf.


duplicity-full.20180401T141800Z.manifest
---
Hostname 460110c45ce2
Localdir /data
Volume 1:
     StartingPath   .
     EndingPath     example.mkv 3186
     Hash SHA1 c4bc29c33779e0369116008ba3b9e767435fa00e
Volume 2:
     StartingPath   example.mkv 3187
     EndingPath     example.mkv
     Hash SHA1 034022e2f997c09010852d009e8c69b0f4ce12b2
Filelist 1
     new      example.mkv


duplicity-inc.20180401T141800Z.to.20180404T193845Z.manifest
---
Hostname 332d275ac92e
Localdir /data
Volume 1:
     StartingPath   .
     EndingPath     example4.log
     Hash SHA1 40ed25b5c60c1112dd3d984767fdd64f67868d78
Filelist 5
     new      example.pdf
     new      example1.log
     new      example2.log
     new      example3.log
     new      example4.log


Can you explain a little more about why duplicity wouldn't look through
all the manifests, find example.pdf absent from the full and present in
the incremental for the first time, and not just download the
incremental? I'm sure I'm just missing something in my understanding.
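
The lookup being asked about can be sketched as follows. This only models the question: it reads the Filelist sections of the two manifests above, and the filelist helper is hypothetical, not something duplicity provides:

```python
# Manifest excerpts copied from above (hostname and hash lines omitted).
FULL = """\
Volume 1:
    StartingPath   .
    EndingPath     example.mkv 3186
Volume 2:
    StartingPath   example.mkv 3187
    EndingPath     example.mkv
Filelist 1
    new      example.mkv
"""

INC = """\
Volume 1:
    StartingPath   .
    EndingPath     example4.log
Filelist 5
    new      example.pdf
    new      example1.log
    new      example2.log
    new      example3.log
    new      example4.log
"""

def filelist(manifest_text):
    """Return {path: status} from the lines after a 'Filelist N' header."""
    entries = {}
    in_list = False
    for line in manifest_text.splitlines():
        if line.startswith("Filelist"):
            in_list = True
        elif in_list and line.strip():
            status, path = line.split(None, 1)
            entries[path] = status
    return entries

manifests = {"full": FULL, "inc": INC}
holders = [name for name, text in manifests.items()
           if "example.pdf" in filelist(text)]
print(holders)
```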

Thanks again,
Oli


Re: Restore single file from incremental, retrieving full

duplicity-talk mailing list
OK, what we're looking at is a change to the manifest that has been backed out of the recent versions because it ate too much memory.  The Filelist was added to support collection-status --file-changed="somefilename" and was never integrated further, so the old behavior I described is the behavior that duplicity still uses.  It's kind of crude, but it works well in the normal use case.  So, remove "Filelist 5" and everything below it and you are left with the normal manifest.

I am working on series 8.x, where we will use SQLite for volume-level manifest and signature files.  By moving these huge lists to disk, memory use will be much less and we should be more robust.  It will also reimplement the memory-eating feature above, without the memory-eating part, plus it will give us access to know which volumes contain which files so we download only those files needed for recovery.
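
As a purely hypothetical sketch of the "which volumes contain which files" idea, the lookup could be a single indexed query. The table name, columns, and rows below are invented for illustration and say nothing about the schema the new series will actually use:

```python
import sqlite3

# Hypothetical schema: one row per (backup set, volume, path).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE volume_files (backup_set TEXT, volume INTEGER, path TEXT)"
)
conn.executemany(
    "INSERT INTO volume_files VALUES (?, ?, ?)",
    [
        ("full.20180401T141800Z", 1, "example.mkv"),
        ("full.20180401T141800Z", 2, "example.mkv"),
        ("inc.20180404T193845Z", 1, "example.pdf"),
        ("inc.20180404T193845Z", 1, "example1.log"),
    ],
)

def volumes_for(path):
    """Return the (backup_set, volume) pairs recorded as holding `path`."""
    rows = conn.execute(
        "SELECT DISTINCT backup_set, volume FROM volume_files "
        "WHERE path = ? ORDER BY backup_set, volume",
        (path,),
    )
    return rows.fetchall()

print(volumes_for("example.pdf"))
print(volumes_for("example.mkv"))
```

Because the rows live on disk in a real database file rather than in memory, the list can grow with the backup without the memory cost described above.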
