Restoring from Glacier Cold Storage

duplicity-talk mailing list
Hello,

I am using duplicity 0.8.14 with the s3 backend to write to a Scaleway C14
Glacier storage (similar to Amazon S3 Glacier). Scaleway provides an
s3-compatible API and this is actually working very well.

I just noticed one glitch: when restoring from an archive in Glacier, duplicity
seems to always process one chunk after the other and only unfreezes a chunk
immediately before downloading it. What I mean is: duplicity wants to download
vol1.difftar.gpg (which is stored in Glacier) and needs to unfreeze it. After
unfreezing, it downloads it, then processes vol2.difftar.gpg, and so on. As
unfreezing can take multiple hours, the restore process would become extremely
long for bigger backups.

I would have expected duplicity to start an unfreeze process for /all/ chunks
it needs for the restore at the same time, so they become available for
download at roughly the same time. The waiting cost for unfreezing would then
have to be "paid" only once.

Is my observation correct or am I misinterpreting something?

Is this the expected behaviour? The Glacier integration would be quite useless
in this case, as a restore of a large backup would take extraordinary amounts
of time. To avoid that I would have to manually unfreeze each file needed for
the restore (which is also not easy, since I don't know which files are really
necessary for the restore).


Best regards
Marco

_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
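The sequential unfreeze described above can be worked around outside duplicity by requesting a restore for every backup volume before starting the restore, so the multi-hour wait is paid only once. A minimal sketch, not duplicity's own code, assuming boto3 and using placeholder bucket/prefix names:

```python
def glacier_keys(objects):
    """Return the keys of listed S3 objects that are still in Glacier."""
    return [o["Key"] for o in objects if o.get("StorageClass") == "GLACIER"]

def request_restores(s3, bucket, keys, days=2):
    """Kick off a restore for every key without waiting on any of them."""
    for key in keys:
        try:
            s3.restore_object(Bucket=bucket, Key=key,
                              RestoreRequest={"Days": days})
        except Exception as err:
            # A restore may already be running for this key; anything
            # else is a real error and should surface.
            if "RestoreAlreadyInProgress" not in str(err):
                raise

# Usage (requires boto3 and credentials; placeholders, not run here):
#   import boto3
#   s3 = boto3.client("s3")  # for Scaleway, also pass endpoint_url=...
#   paginator = s3.get_paginator("list_objects_v2")
#   for page in paginator.paginate(Bucket="my-backup-bucket",
#                                  Prefix="duplicity/"):
#       request_restores(s3, "my-backup-bucket",
#                        glacier_keys(page.get("Contents", [])))
```

Once all restore requests are issued, the duplicity restore can be started; each download then only waits for its own object to thaw, which happens roughly in parallel.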

Re: Restoring from Glacier Cold Storage

No ideas about it?

Just to clarify (and provide a TL;DR), what I want to know is:
is it possible (and how) to unfreeze all files related to a restore at once?

Best regards
Marco


Re: Restoring from Glacier Cold Storage

Hi Marco,

Plenty of ideas, just no time to implement them.  A bit of help would be great!

I imagine it would not be difficult to implement; however, it would mean access to mainline functions and data outside the normal scope of the backends.  So far we've avoided "if this backend then do that" in the mainline code and I intend to keep it that way: no preferential treatment for any single backend.  If we could do some sort of universal hook for all backends (say prepare_recovery()), that might work out.

That said, there are probably web tools from Amazon that would allow you to unfreeze files en masse for recovery.  I'd investigate those first.

You also might check out the rclonebackend.  It has a lot of functionality that augments duplicity.

...Thanks,
...Ken


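A universal hook along the lines Ken sketches could look like this; the class and method names (Backend, prepare_recovery()) follow the mail and are hypothetical, not duplicity's actual backend layout:

```python
class Backend:
    """Stand-in for the common backend base class."""

    def prepare_recovery(self, remote_filenames):
        """Called once before a restore with all volumes to be fetched.

        Default: do nothing, so ordinary backends are unaffected.
        """

class ColdStorageBackend(Backend):
    """A cold-storage backend overrides the hook to thaw everything."""

    def __init__(self):
        self.restore_requests = []

    def prepare_recovery(self, remote_filenames):
        # Kick off one unfreeze request per volume without blocking on
        # any single one; the later per-file download still waits for
        # its own object as before.
        for name in remote_filenames:
            self.restore_requests.append(name)  # stand-in for the API call
```

Because the default implementation is a no-op, the mainline restore path could call prepare_recovery() unconditionally without any backend-specific branching.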

Re: Restoring from Glacier Cold Storage

Hi Kenneth,

many thanks for your answer.

I have looked into the source code of duplicity and tried a few changes.
It seems that boto's recover() method is already blocking, so I implemented a
'pre_process_download_batch' method that calls 'pre_process_download' for
each file to restore in a separate thread.

It is working well for my use case. If there are files to unfreeze from
Glacier, the unfreezing process is started for all of them.
The actual 'pre_process_download' still blocks until the file it tries to
preprocess is available in S3.
If all files are already in S3, the restore works normally without additional
waiting time.

Due to my poor Python knowledge and the unfamiliar codebase this might not be
the best solution and I may have made some stupid mistakes, but it is a nicely
isolated addition.

Since I don't have a GitLab account, I attach the diff of my changes to this
mail. Or you can view it in my GitHub account:
https://github.com/hupfdule/duplicity/commit/35a562605f816c6d95b744b2c093d95b0374e0c9

Best regards
Marco
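The batch approach described above can be sketched with a stand-in backend (not duplicity's real S3 backend): pre_process_download_batch() runs the blocking per-file pre_process_download() in one thread per file, so all unfreeze requests start at the same time and the method returns once every file is available.

```python
import threading

class DemoBackend:
    """Stand-in backend illustrating the batch-preprocess idea."""

    def __init__(self):
        self._lock = threading.Lock()
        self.available = set()

    def pre_process_download(self, remote_filename):
        # In the real backend this blocks until the object is back in S3.
        with self._lock:
            self.available.add(remote_filename)

    def pre_process_download_batch(self, remote_filenames):
        # One thread per file: each blocks on its own restore, but the
        # restore requests themselves are all in flight concurrently.
        threads = [
            threading.Thread(target=self.pre_process_download, args=(name,))
            for name in remote_filenames
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # every file is available once all threads return
```

If no file needs thawing, each thread returns immediately, so the batch call adds no waiting time to a warm restore.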


Attachment: duplicity-unfreeze-s3-at-once.patch (2K)

Re: Restoring from Glacier Cold Storage

Thanks, I'll take a look. 

...Ken



Re: Restoring from Glacier Cold Storage

Just for reference:

To make the review easier, I have now created a merge request on GitLab:

  https://gitlab.com/duplicity/duplicity/-/merge_requests/20


Best regards
Marco





Re: Restoring from Glacier Cold Storage

Thanks! 
