Big bandwidth bill from Rackspace


Big bandwidth bill from Rackspace

James Patterson
Hello,

I recently got hit with a $200 bill from Rackspace for bandwidth
while using duply against their Cloud Files backend.

I'd like to find out why - the backup source didn't change much, and
the daily e-mail reports that duply generates don't show any big
numbers.

Does anyone know if perhaps the "verify" part of backup_verify_purge is
doing something traffic intensive?

If not, what else could it be?

Thanks in advance.

--
  James Patterson
  [hidden email]

--
http://www.fastmail.fm - Same, same, but different...


_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk

Re: Big bandwidth bill from Rackspace

Lou
On Mon, 10 Jun 2013 11:54:08 -0700
James Patterson <[hidden email]> wrote:

> Hello,
>
> I recently got hit with a 200 usd bill from Rackspace for bandwidth
> while using duply against their cloudfiles backend.
>
> I'd like to find out why - the backup source didn't change much, and
> reading the random daily e-mail reports that duply generates don't show
> any big numbers.
>
> Does anyone know if perhaps the "verify" part of backup_verify_purge is
> doing something traffic intensive?
>
> If not, what else could it be?
>
> Thanks in advance.

This might be a mistake on their part. I've had a similar experience
with Rackspace which ended up with them giving me a refund.

My usual monthly bill was ~$20/month, but my July 10 - August 10 billing
cycle was $358.85. The category in question was "Cloud Files Transfer
In Fees" which indicated that 4209 GB were transferred into Rackspace
Cloud Files, although the server that connected to Rackspace had a
total combined outbound transfer of only 256.64 GB for July and August.

I opened a support ticket through the web interface. They didn't
respond after a few weeks so I ended up calling them.

Lou


Re: Big bandwidth bill from Rackspace

edgar.soldin
On 10.06.2013 21:10, Lou wrote:

> I recently got hit with a 200 usd bill from Rackspace for bandwidth
>> while using duply against their cloudfiles backend.
>>
>> I'd like to find out why - the backup source didn't change much, and
>> reading the random daily e-mail reports that duply generates don't show
>> any big numbers.
>>
>> Does anyone know if perhaps the "verify" part of backup_verify_purge is
>> doing something traffic intensive?
>>

yes. essentially it restores every file of the latest backup and compares it to the original path in the local file system. therefore pretty much all of the last full and the incremental volumes have to be downloaded again.

verify less regularly, or do fulls more often. but however you turn this, it's going to cost with cloud space. this is the main reason i stick with cheap hosted FTP backup space.

..ede/duply.net
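One way around the cost ede describes is simply to run verify on a longer cycle than the backup itself. A hypothetical crontab sketch (profile name and times are made up; it assumes duply's underscore-chained batch commands):

```
# nightly backup and purge; monthly verify, since verify re-downloads
# roughly the whole chain of full + incremental volumes
30 2 * * *  duply myprofile backup_purge
30 4 1 * *  duply myprofile verify
```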


Re: Big bandwidth bill from Rackspace

James Patterson
> >> Does anyone know if perhaps the "verify" part of backup_verify_purge is
> >> doing something traffic intensive?
> >>
>
> yes. essentially it restores any file of the latest backup and compares
> it to the original path in the local file system. therefore pretty much
> all the last full and the incremental volumes have to be downloaded
> again.
>
> verify less regularly or do full's more often. but however you turn this,
> it's gonna cost with cloud space. this is the main reason for me to stick
> with cheap hoster ftp backup space.

Ah. Any chance the man page could explain this?

--
http://www.fastmail.fm - Faster than the air-speed velocity of an
                          unladen european swallow



Re: Big bandwidth bill from Rackspace

edgar.soldin
On 10.06.2013 21:29, James Patterson wrote:

>>>> Does anyone know if perhaps the "verify" part of backup_verify_purge is
>>>> doing something traffic intensive?
>>>>
>>
>> yes. essentially it restores any file of the latest backup and compares
>> it to the original path in the local file system. therefore pretty much
>> all the last full and the incremental volumes have to be downloaded
>> again.
>>
>> verify less regularly or do full's more often. but however you turn this,
>> it's gonna cost with cloud space. this is the main reason for me to stick
>> with cheap hoster ftp backup space.
>
> Ah. Any chance the man page could explain this?
>

looks like it does
http://duplicity.nongnu.org/duplicity.1.html

..ede


Re: Big bandwidth bill from Rackspace

James Patterson
I'm not seeing it... can you help me out?

--
  James Patterson
  [hidden email]

On Mon, Jun 10, 2013, at 12:44 PM, [hidden email] wrote:

> On 10.06.2013 21:29, James Patterson wrote:
> >>>> Does anyone know if perhaps the "verify" part of backup_verify_purge is
> >>>> doing something traffic intensive?
> >>>>
> >>
> >> yes. essentially it restores any file of the latest backup and compares
> >> it to the original path in the local file system. therefore pretty much
> >> all the last full and the incremental volumes have to be downloaded
> >> again.
> >>
> >> verify less regularly or do full's more often. but however you turn this,
> >> it's gonna cost with cloud space. this is the main reason for me to stick
> >> with cheap hoster ftp backup space.
> >
> > Ah. Any chance the man page could explain this?
> >
>
> looks like it does
> http://duplicity.nongnu.org/duplicity.1.html
>
> ..ede
>

--
http://www.fastmail.fm - Email service worth paying for. Try it for free



Re: Big bandwidth bill from Rackspace

edgar.soldin
On 10.06.2013 21:56, James Patterson wrote:
> I'm not seeing it... can you help me out?
>

"
verify [--file-to-restore <relpath>]
    Enter verify mode instead of restore. If the --file-to-restore option is given, restrict verify to that file or directory. duplicity will exit with a non-zero error level if any files are different. On verbosity level 4 or higher, log a message for each file that has changed.
"

if you think that's too unclear, feel free to provide a modified text. i could wrangle it into a patch then.

..ede/duply.net


Re: Big bandwidth bill from Rackspace

James Patterson
On Mon, Jun 10, 2013, at 01:36 PM, [hidden email] wrote:

> On 10.06.2013 21:56, James Patterson wrote:
> > I'm not seeing it... can you help me out?
> >
>
> "
> verify [--file-to-restore <relpath>]
>     Enter verify mode instead of restore. If the --file-to-restore option
>     is given, restrict verify to that file or directory. duplicity will
>     exit with a non-zero error level if any files are different. On
>     verbosity level 4 or higher, log a message for each file that has
>     changed.
> "
>
> if you think that's to unclear feel free to provide a modified text. i
> could wrangle it into a patch then.

Thanks. I would suggest:

Enter verify mode instead of restore. This will restore each file from
the latest backup and compare it to the local copy.
If the --file-to-restore option is given, restrict verify to that file
or directory. duplicity will exit with a non-zero error level if any
files are different. On verbosity level 4 or higher, log a message for
each file that has changed.

But one moment: that man page is from duplicity, not from duply. If
duply is aimed at being the friendlier, more convenient front-end to
duplicity, then there really needs to be a mention of this in duply's
man page.

--
http://www.fastmail.fm - IMAP accessible web-mail



Re: Big bandwidth bill from Rackspace

edgar.soldin
On 10.06.2013 23:13, James Patterson wrote:

> On Mon, Jun 10, 2013, at 01:36 PM, [hidden email] wrote:
>> On 10.06.2013 21:56, James Patterson wrote:
>>> I'm not seeing it... can you help me out?
>>>
>>
>> "
>> verify [--file-to-restore <relpath>]
>>     Enter verify mode instead of restore. If the --file-to-restore option
>>     is given, restrict verify to that file or directory. duplicity will
>>     exit with a non-zero error level if any files are different. On
>>     verbosity level 4 or higher, log a message for each file that has
>>     changed.
>> "
>>
>> if you think that's to unclear feel free to provide a modified text. i
>> could wrangle it into a patch then.
>
> Thanks. I would suggest:
>
> Enter verify mode instead of restore. This will restore each file from
> the latest backup and compare it to the local copy.
> If the --file-to-restore option is given, restrict verify to that file
> or directory. duplicity will exit with a non-zero error level if any
> files are different. On verbosity level 4 or higher, log a message for
> each file that has changed.
>
> But one moment: that man page is from duplicity, not from duply. If
> duply is aimed at being the friendlier more convenient front-end to
> duplicity, then there really needs to be a mention of it in duply's man
> page.
>

true. duply manpage currently says

 verify     list files changed since latest backup

gotta think of something. feel free to provide something short and sweet ;) current manpage is online here
http://www.duply.net/?title=Duply-documentation#Manpage

..ede/duply.net


Re: Big bandwidth bill from Rackspace

Joseph D. Wagner
On 06/10/2013 12:20 pm, [hidden email] wrote:

> On 10.06.2013 21:10, Lou wrote:
>
>> I recently got hit with a 200 usd bill from Rackspace for bandwidth
>>
>>> while using duply against their cloudfiles backend. I'd like to find
>>> out why - the backup source didn't change much, and reading the
>>> random daily e-mail reports that duply generates don't show any big
>>> numbers. Does anyone know if perhaps the "verify" part of
>>> backup_verify_purge is doing something traffic intensive?
>
> yes. essentially it restores any file of the latest backup and
> compares
> it to the original path in the local file system. therefore pretty
> much
> all the last full and the incremental volumes have to be downloaded
> again.

Why does it need to download the entire file?  Why not just download a
list of files and their checksums (CRC32/64 or if paranoid
SHA1/256/512)?  This would minimize bandwidth usage for verify
operations and be just as effective as comparing the whole file.
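Joseph's idea can be sketched with plain shell tools - a toy illustration, not duplicity code, with made-up file names. The backup side stores a small checksum manifest; the verify side downloads only that manifest and checks the local files against it:

```shell
# Toy sketch of checksum-based verification (not duplicity itself).
# "Backup" side: record checksums of the source files in a manifest.
mkdir -p demo
echo "some data" > demo/file.txt
( cd demo && sha256sum file.txt > MANIFEST )

# "Verify" side: fetch only the small MANIFEST and check the local
# files against it - no backup volumes need to be downloaded.
( cd demo && sha256sum -c MANIFEST )   # prints "file.txt: OK"
```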

Joseph D. Wagner


Re: Big bandwidth bill from Rackspace

edgar.soldin
On 10.06.2013 23:47, Joseph D. Wagner wrote:

> On 06/10/2013 12:20 pm, [hidden email] wrote:
>
>> On 10.06.2013 21:10, Lou wrote:
>>
>>> I recently got hit with a 200 usd bill from Rackspace for bandwidth
>>>
>>>> while using duply against their cloudfiles backend. I'd like to find
>>>> out why - the backup source didn't change much, and reading the
>>>> random daily e-mail reports that duply generates don't show any big
>>>> numbers. Does anyone know if perhaps the "verify" part of
>>>> backup_verify_purge is doing something traffic intensive?
>>
>> yes. essentially it restores any file of the latest backup and compares
>> it to the original path in the local file system. therefore pretty much
>> all the last full and the incremental volumes have to be downloaded
>> again.
>
> Why does it need to download the entire file?  Why not just download a list of files and their checksums (CRC32/64 or if paranoid SHA1/256/512)?  This would minimize bandwidth usage for verify operations and be just as effective as comparing the whole file.
>

that's just the way it is currently implemented. you are totally right: the signatures contain enough information for incrementing without having to restore the previous state of a file, so why does verify need it? because nobody has hacked it in yet ;)

feel free to step up.. ede/duply.net


Re: Big bandwidth bill from Rackspace

Nate Eldredge
On Mon, 10 Jun 2013, Joseph D. Wagner wrote:

> On 06/10/2013 12:20 pm, [hidden email] wrote:
>> yes. essentially it restores any file of the latest backup and compares
>> it to the original path in the local file system. therefore pretty much
>> all the last full and the incremental volumes have to be downloaded
>> again.
>
> Why does it need to download the entire file?  Why not just download a list
> of files and their checksums (CRC32/64 or if paranoid SHA1/256/512)?  This
> would minimize bandwidth usage for verify operations and be just as effective
> as comparing the whole file.

This would tell you whether your local file had changed, but it wouldn't
actually verify the backup.  Suppose some of the backup volumes are
corrupted, but the signature / checksum files are intact.  Your proposal
would not detect this.

I don't think there's any way to truly verify the backup other than
fetching all the volumes, since we don't assume that we can run code or
store encryption keys on the backend.

--
Nate Eldredge
[hidden email]



Re: Big bandwidth bill from Rackspace

edgar.soldin
On 11.06.2013 00:56, Nate Eldredge wrote:

> On Mon, 10 Jun 2013, Joseph D. Wagner wrote:
>
>> On 06/10/2013 12:20 pm, [hidden email] wrote:
>>> yes. essentially it restores any file of the latest backup and compares
>>> it to the original path in the local file system. therefore pretty much
>>> all the last full and the incremental volumes have to be downloaded
>>> again.
>>
>> Why does it need to download the entire file?  Why not just download a list of files and their checksums (CRC32/64 or if paranoid SHA1/256/512)?  This would minimize bandwidth usage for verify operations and be just as effective as comparing the whole file.
>
> This would tell you whether your local file had changed, but it wouldn't actually verify the backup.  Suppose some of the backup volumes are corrupted, but the signature / checksum files are intact.  Your proposal would not detect this.
>
> I don't think there's any way to truly verify the backup other than fetching all the volumes, since we don't assume that we can run code or store encryption keys on the backend.
>

valid point. btw. the next release will have a --compare-data switch, which was not exposed before, to actually compare the local path with the backed-up version bit by bit.

that's the essential difference between a
verification - which actually makes sure the backed-up data itself is fine
vs.
comparison - which simply looks for differences, without restoring anything

duplicity's current verify is a combination of both. we should probably make that clearer, or clean it up by separating the actions properly:
compare - actually compare the local path with a restored copy
verify - simply restore and compare against the saved checksum

but that would still download the volumes. giving compare a parameter to compare against the manifest files, saving traffic, would allow checking for changes. but that is essentially what --dry-run does already, so why bother?

to come back to the original problem: 'duply profile backup_verify_purge' was probably run from a cron job. the periodic verification only makes sense if you want to make sure the remote backup is not corrupted. therefore i'd say Nate's evaluation is correct for this scenario: no way around the traffic.

..ede/duply.net
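The verification/comparison split can be illustrated with plain shell tools - a toy stand-in for duplicity with made-up paths, not its actual implementation:

```shell
# Toy illustration (not duplicity): verify vs. compare.
mkdir -p live archive restored
echo "hello" > live/file.txt

# "backup": copy the data into the archive and record its checksum
cp live/file.txt archive/file.txt
( cd archive && sha256sum file.txt > MANIFEST )

# verify: check the archived data against its stored checksum -
# catches corrupted volumes, says nothing about the live tree
( cd archive && sha256sum -c MANIFEST )

# compare: restore a copy, then diff it against the live path -
# catches local changes made since the backup
cp archive/file.txt restored/file.txt
diff -r live restored && echo "compare: no differences"
```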



Re: Big bandwidth bill from Rackspace

Aaron Whitehouse
> On 11.06.2013 00:56, Nate Eldredge wrote:
> valid point. btw. the next release will have a --compare-data switch which
> was not exposed before to actually compare local path with the backed up
> version bit by bit.
>
[...]
> duplicity's current verify is a combination of both. we should probably
> make that clearer, or clean it up by separating the actions properly
> compare - actually compare local path with a restored copy
> verify - simply restore and compare against the saved checksum
>
> but that still would download the volumes. giving compare a parameter to
> compare against the manifest files, saving traffic, would allow to check
> for changes. but that is essentially what --dry-run is doing already. so
> why bother?
[...]
> ..ede/duply.net

For what it is worth, I completely agree that this should be better split.

This issue came up in a question that I raised a few years ago:
https://answers.launchpad.net/duplicity/+question/116587
and my proposed change to the man page to make this clearer:
https://bugs.launchpad.net/duplicity/+bug/644816
(which didn't happen). We also discussed a "test-restore" option that
sounds to match the new comparison that you mention.

I am currently hitting issues because verify throws an error when a
local log file changes between the backup and the verify, which in this
instance I don't care about so long as the backed-up version can
successfully restore and matches the checksum. Based on the theoretical
idea of verify (remote archive tested against checksum), I don't think
there should be any comparison to the filesystem. That feature is
useful, but belongs more in a separate function like the --compare-data
option you refer to.

In case it helps, the way I address the bandwidth issue is to run duply
against the local filesystem and then rsync the result off to the
cloud - that way you can run verify each time without downloading the
files. Of course, there is still the risk that the files haven't
rsynced properly.

Hope that helps,

Aaron
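Aaron's workaround can be sketched roughly as follows - a hypothetical setup in which duply's TARGET points at a local directory (the profile name, paths, and remote host are all made up):

```shell
# Hypothetical setup: duply backs up to local disk, e.g. with
# TARGET=file:///var/backups/duply in the profile, so verify reads
# from local disk instead of the cloud:
#   duply myprofile backup_verify_purge
# and the archive directory is then mirrored to remote storage:
#   rsync -av --delete /var/backups/duply/ user@remote:duply-mirror/

# Runnable demonstration of the mirroring step with local directories
# (falls back to cp if rsync is not installed):
mkdir -p local-archive mirror
echo "volume data" > local-archive/duplicity-full.vol1.difftar.gpg
if command -v rsync >/dev/null; then
    rsync -a --delete local-archive/ mirror/
else
    cp -a local-archive/. mirror/
fi
diff -r local-archive mirror && echo "mirror in sync"
```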





Re: Big bandwidth bill from Rackspace

edgar.soldin
it's taken a year, but good things etc. - here is the change's branch:
https://code.launchpad.net/~ed.so/duplicity/manpage.verify

..ede


On 10.06.2013 23:13, James Patterson wrote:

> On Mon, Jun 10, 2013, at 01:36 PM, [hidden email] wrote:
>> On 10.06.2013 21:56, James Patterson wrote:
>>> I'm not seeing it... can you help me out?
>>>
>>
>> "
>> verify [--file-to-restore <relpath>]
>>     Enter verify mode instead of restore. If the --file-to-restore option
>>     is given, restrict verify to that file or directory. duplicity will
>>     exit with a non-zero error level if any files are different. On
>>     verbosity level 4 or higher, log a message for each file that has
>>     changed.
>> "
>>
>> if you think that's to unclear feel free to provide a modified text. i
>> could wrangle it into a patch then.
>
> Thanks. I would suggest:
>
> Enter verify mode instead of restore. This will restore each file from
> the latest backup and compare it to the local copy.
> If the --file-to-restore option is given, restrict verify to that file
> or directory. duplicity will exit with a non-zero error level if any
> files are different. On verbosity level 4 or higher, log a message for
> each file that has changed.
>
> But one moment: that man page is from duplicity, not from duply. If
> duply is aimed at being the friendlier more convenient front-end to
> duplicity, then there really needs to be a mention of it in duply's man
> page.
>

