Deletion


Deletion

duplicity-talk mailing list

I'm wondering how this scenario works:

Let's say I use Duplicity for incremental, daily backups (the default being 200MB parts). I don't use any deletion scheme on a day-to-day basis. Every six months, I want to delete redundant data that is older than one year.

I'm expecting it to keep adding files until I reach the "cleanup" stage (twice per year). At that point, it should remove old data that isn't relevant anymore (deleted or changed). Since there are volumes, I imagine some might be deleted and others, updated (or deleted and reconstructed).
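
To make it concrete, roughly what I had in mind (paths and target URL are only examples):

  # daily cron job: plain incremental backup
  duplicity /data sftp://backup@host//backups

  # every six months: drop anything older than one year
  duplicity remove-older-than 1Y --force sftp://backup@host//backups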

Does this approach make sense with Duplicity? Am I on the right path?

Thanks!



Re: Deletion

duplicity-talk mailing list
hi Foust,

On 24.01.2020 05:12, Fogust via Duplicity-talk wrote:
> I'm wondering how this scenario works:
>
> Let's say I use Duplicity for incremental, daily backups (the default
> being 200MB parts). I don't use any deletion scheme on a day-to-day
> basis. Every six months, I want to delete redundant data that is older
> than one year.

sounds like you never started another backup chain. hence your recent backups depend on the old ones, which probably date back into your deletion time frame.

> I'm expecting it to keep adding files until I reach the "cleanup" stage
> (twice per year). At that point, it should remove old data that isn't
> relevant anymore (deleted or changed). Since there are volumes, I
> imagine some might be deleted and others, updated (or deleted and
> reconstructed).
>
> Does this approach make sense with Duplicity? Am I on the right path?

duplicity treats the backend as dumb file storage. it merely writes or reads data there. and because there is no duplicity running on the target, a "reconstruction" approach would need to re-transfer the whole data set.
that is in turn equivalent to a simple new full, which is the suggested methodology here. simply use '--full-if-older-than 6M' during backup and you will end up with a new independent chain every 6 months that can be safely discarded once it is not needed anymore.
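
for illustration, roughly like this (source path and target url are just placeholders, adjust to your setup):

  # daily run: incremental by default, but start a fresh full chain every 6 months
  duplicity --full-if-older-than 6M /data sftp://backup@host//backups/chain

  # housekeeping twice a year: keep only the 2 newest full chains, drop the rest
  duplicity remove-all-but-n-full 2 --force sftp://backup@host//backups/chain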

personally i keep monthly chains* and verify after every backup, just to be sure.

*if one volume gets corrupted the whole chain might not be restorable by default means, so regular full backups are advised.
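
the verify mentioned above is e.g. just (url and path again only placeholders):

  duplicity verify sftp://backup@host//backups/chain /data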

..ede/duply.net


Re: Deletion

duplicity-talk mailing list


 

On 2020-01-24 05:02, [hidden email] wrote:

hi Foust,
 
--cut--

duplicity treats the backend as dumb file storage. it merely writes or reads data there. and because there is no duplicity running on the target, a "reconstruction" approach would need to re-transfer the whole data set.
that is in turn equivalent to a simple new full, which is the suggested methodology here. simply use '--full-if-older-than 6M' during backup and you will end up with a new independent chain every 6 months that can be safely discarded once it is not needed anymore.

personally i keep monthly chains* and verify after every backup, just to be sure.

*if one volume gets corrupted the whole chain might not be restorable by default means, so regular full backups are advised.

 

Thank you. I have a large amount of data and there's a good chance that a significant portion will not change (or will rarely change). So, if I have 10TB of data and 6TB doesn't change between backups, is there a way to intelligently update the existing full backup, i.e. fold the accumulated incremental changes into it?




Re: Deletion

duplicity-talk mailing list
On Jan 24, 2020, at 4:53 PM, Fogust via Duplicity-talk <[hidden email]> wrote:

On 2020-01-24 05:02, [hidden email] wrote:

--cut--
Thank you. I have a large amount of data and there's a good chance that a significant portion will not change (or will rarely change). So, if I have 10TB of data and 6TB doesn't change between backups, is there a way to intelligently update the existing full backup, i.e. fold the accumulated incremental changes into it?

Not on the server side! As Edgar pointed out, the server is considered compromised and does not have any keys or access to data. Thus it cannot coalesce incrementals with the initial full backup. To do so, all the data would have to be downloaded to the local machine, coalesced, and then uploaded back to the server. This is more CPU and network bandwidth than just uploading a new full backup.
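
Rough numbers for the 10TB example above: coalescing on the client would mean downloading the existing full plus the incrementals (10+ TB) and uploading the merged result (~10TB) again, i.e. roughly 20TB of transfer, whereas a fresh full backup is only the ~10TB upload.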

Other non-secure backups can do this. There are several commercial systems, or you can do it with a set of scripts and rsync. Duplicity, by its name and design, includes top-shelf security with as much efficiency as possible.

Also note the issue of corruption with long chains and not making fresh full backups.

-Scott




Re: Deletion

duplicity-talk mailing list
On Friday, 24.01.2020, at 17:05 -0500, Scott Hannahs via Duplicity-talk wrote:

> > Thank you. I have a large amount of data and there's a good chance that a significant portion will not change (or will rarely change). So,
> > if I have 10TB of data and 6TB doesn't change between backups, is there a way to intelligently update the existing full backup, i.e. fold
> > the accumulated incremental changes into it?
> >
>
> Not on the server side! As Edgar pointed out, the server is considered compromised and does not have any keys or access to data. Thus it
> cannot coalesce incrementals with the initial full backup. To do so, all the data would have to be downloaded to the local machine,
> coalesced, and then uploaded back to the server. This is more CPU and network bandwidth than just uploading a new full backup.
>
> Other non-secure backups can do this.

restic can do this - and I see no reason why it should be less secure than duplicity.

restic aggressively deduplicates data within its backup medium (called a repository).
So if a repository is corrupted in just one block, ALL copies of ALL files
referencing that block are gone.

ALL backups of that file are gone. So with restic it is more important than ever
to have several independent repositories.
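
For example, roughly like this (repository locations are only placeholders, and each repository needs a one-time 'restic -r ... init' first):

  # back up to two independent repositories
  restic -r sftp:backup@host1:/srv/restic-repo backup /data
  restic -r /mnt/usb/restic-repo backup /data

  # check repository integrity (add --read-data to also verify the data blobs)
  restic -r sftp:backup@host1:/srv/restic-repo check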

--
Wolfgang



Re: Deletion

duplicity-talk mailing list
On 24.01.2020 23:05, Scott Hannahs via Duplicity-talk wrote:

>> On Jan 24, 2020, at 4:53 PM, Fogust via Duplicity-talk <[hidden email]> wrote:
>>
>> --cut--
>>
>> Thank you. I have a large amount of data and there's a good chance that a significant portion will not change (or will rarely change). So, if I have 10TB of data and 6TB doesn't change between backups, is there a way to intelligently update the existing full backup, i.e. fold the accumulated incremental changes into it?
>>
> Not on the server side! As Edgar pointed out, the server is considered compromised and does not have any keys or access to data.

well. yes and no :). actually the files of one backup point in time are identifiable by name and timestamp. so as a _workaround_ you could base your new chain on the full that is already available on the remote backend.

steps (a rough command sketch follows below)
1. remote: create a new folder
2. remote: copy the files whose names start with 'duplicity-full' from the old target folder into the new one
3. local: run an incremental backup against the new location
4. local: verify, to make sure everything worked out
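
a minimal sketch of the above, assuming an ssh-reachable backend and made-up paths (adjust to your setup):

  # remote: new folder, seeded with the old full (volumes, manifest, signatures)
  ssh backup@host 'mkdir /backups/new-chain && cp /backups/old-chain/duplicity-full* /backups/new-chain/'

  # local: incremental backup against the new location
  duplicity /data sftp://backup@host//backups/new-chain

  # local: verify the new chain
  duplicity verify sftp://backup@host//backups/new-chain /data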

> Thus it cannot coalesce incrementals with the initial full backup. To do so, all the data would have to be downloaded to the local machine, coalesced, and then uploaded back to the server. This is more CPU and network bandwidth than just uploading a new full backup.

couldn't have written it better.

> Other non-secure backups can do this. There are several commercial systems, or you can do it with a set of scripts and rsync. Duplicity, by its name and design, includes top-shelf security with as much efficiency as possible.

well. it's as secure as gpg is. if that's top shelf, then yes. ;)

> Also note the issue of corruption with long chains and not making fresh full backups.

consider reasonably short chains, and use the multi backend to place copies in several locations. there is also a par2 wrapper backend to protect against minor damage.
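
rough illustration, same made-up target as above (check the manpage for details, especially the multi backend's json config format):

  # par2 wrapper: just prefix the target url scheme, redundancy files are stored next to the volumes
  duplicity /data par2+sftp://backup@host//backups/chain

  # multi backend: point duplicity at a json config listing several target urls
  duplicity /data multi:///etc/duplicity/multi.json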

have fun.. ede/duply.net

_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk