File change detection using hashes


halfgaar
Ben,

A while back I suggested a feature to include hash-checks to determine
if files have changed or not, instead of the mtimes+size combo. I trust
you remember. I've been away from the list a while, and I see there has
been a poll. I didn't see this feature in the poll, but I'd like to
emphasise that it is quite important. I've just discovered another
situation in which mtime+size is not reliable. I'll explain:

If you have two files of equal size but different contents, and copy
one over the other, the mtime of the target file is preserved. Because
the size and mtime haven't changed, rdiff-backup doesn't see it as a
change and doesn't back it up. This, of course, would be very
undesirable.

Here is an example of what I mean:

================================
# lh
total 8.0K
-rw-------  1 [owner] users 16 Feb 10 15:46 a
-rw-------  1 [owner] users 16 Feb 10 15:47 b

# cat a;cat b
this is a file.
this is b file.

# cp a b

# lh
total 8.0K
-rw-------  1 [owner] users 16 Feb 10 15:46 a
-rw-------  1 [owner] users 16 Feb 10 15:47 b

# cat a;cat b
this is a file.
this is a file.
================================

I really think this feature should be part of the next stable release,
because now I can't fully trust my backup to be accurate.

On a side note, the wiki at
http://rdiff-backup.solutionsfirst.com.au/?SuggestedFeatures doesn't seem
to work; the articles are not accessible.

Regards,

Wiebe Cazemier

_______________________________________________
rdiff-backup-users mailing list at [hidden email]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki


Re: File change detection using hashes

Vadim Kouzmine
On Fri, 2006-02-10 at 15:53 +0100, Wiebe Cazemier wrote:

> Ben,
>
> A while back I suggested a feature to include hash-checks to determine
> if files have changed or not, instead of the mtimes+size combo. I trust
> you remember. I've been away from the list a while, and I see there has
> been a poll. I didn't see this feature in the poll, but I'd like to
> emphasise that it is quite important. I've just discovered another
> situation in which mtime+size is not reliable. I'll explain:
>
> if you have two files of equal size, but different contents, and copy
> one over the other, the mtime of the target file is preserved. Now,
> because the size and mtime haven't changed, rdiff-backup doesn't see it
> as a change and doesn't back it up. This, of course, would be very
> undesirable.
>
> Here is an example of what I mean:
>
> ================================
> # lh
> total 8.0K
> -rw-------  1 [owner] users 16 Feb 10 15:46 a
> -rw-------  1 [owner] users 16 Feb 10 15:47 b
>
> # cat a;cat b
> this is a file.
> this is b file.
>
> # cp a b
>
> # lh
> total 8.0K
> -rw-------  1 [owner] users 16 Feb 10 15:46 a
> -rw-------  1 [owner] users 16 Feb 10 15:47 b
>
> # cat a;cat b
> this is a file.
> this is a file.
> ================================

I did the same and mtime has changed for me:

dd if=/dev/urandom of=a count=1
dd if=/dev/urandom of=b count=1

stat a b
  File: `a'
  Size: 512             Blocks: 8          IO Block: 131072 regular file
Device: 806h/2054d      Inode: 2094563     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
Access: 2006-02-10 11:49:56.000000000 -0500
Modify: 2006-02-10 11:49:56.000000000 -0500
Change: 2006-02-10 11:49:56.000000000 -0500
  File: `b'
  Size: 512             Blocks: 8          IO Block: 131072 regular file
Device: 806h/2054d      Inode: 2094711     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
Access: 2006-02-10 11:50:01.000000000 -0500
Modify: 2006-02-10 11:50:01.000000000 -0500
Change: 2006-02-10 11:50:01.000000000 -0500

cp a b

stat a b
  File: `a'
  Size: 512             Blocks: 8          IO Block: 131072 regular file
Device: 806h/2054d      Inode: 2094563     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
Access: 2006-02-10 11:51:20.000000000 -0500
Modify: 2006-02-10 11:49:56.000000000 -0500
Change: 2006-02-10 11:49:56.000000000 -0500
  File: `b'
  Size: 512             Blocks: 8          IO Block: 131072 regular file
Device: 806h/2054d      Inode: 2094711     Links: 1
Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
Access: 2006-02-10 11:50:01.000000000 -0500
Modify: 2006-02-10 11:51:20.000000000 -0500
Change: 2006-02-10 11:51:20.000000000 -0500

Notice *** Modify: 2006-02-10 11:51:20.000000000 -0500 ***

I think your lh listing is missing the SECONDS. It looks like "b" was
created at 15:47 and then you copied a -> b, also at 15:47.

Vadim

>
> I really think this feature should be part of the next stable release,
> because now I can't fully trust my backup to be accurate.
>
> On a side note, the wiki at
> http://rdiff-backup.solutionsfirst.com.au/?SuggestedFeatures doesn't seem
> to work; the articles are not accessible.
>
> Regards,
>
> Wiebe Cazemier
--
Vadim Kouzmine <[hidden email]>




Re: File change detection using hashes

Dave Kempe
In reply to this post by halfgaar
Wiebe Cazemier wrote:
> I really think this feature should be part of the next stable release,
> because now I can't fully trust my backup to be accurate.
>

the feature is in the development version... have you tried it?


> On a sidenote, the wiki at
> http://rdiff-backup.solutionsfirst.com.au/?SuggestedFeatures doesnt seem
> to work, the articles are not accessible.
>
I will get someone to fix it.
hopefully by this arvo

dave



Re: File change detection using hashes

halfgaar
In reply to this post by halfgaar
(this is a reply to a message sent to me, but not the list. Press
"reply-all", Gregory :) )

On 02/10/06 19:14, Gregory Benjamin wrote:

>A good argument in favor of this is the case where a hacker
>replaces files on a machine with altered ones that have
>been fixed to appear to have the same mtime and size as the
>original. I've run into this problem a couple of times over
>the last few years. A cracker/script-kiddie gets into the
>machine and installs a "root-kit". This root-kit contains
>scripts and utilities that replace commands like ps, ls,
>login, etc. with altered copies. To cover their tracks, the
>root-kit changes the mtimes of these infected commands to
>match the originals. The sizes are also often adjusted to
>exactly match the original.
>
>Only by computing a md5sum or equivalent is it possible to
>detect that these files ARE NOT the original ones.
>
>- Greg Benjamin
>
Actually, this can be detected, because the ctime has changed. There is
no way an application can set a ctime. Any alteration to the file or
its metadata results in a new ctime.

But keeping track of this is, of course, not rdiff-backup's job. There is
security software which checks for changed ctimes.
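
To make the ctime point concrete, here is a rough Python sketch (not anything taken from rdiff-backup) that overwrites a file in place, puts the atime and mtime back with os.utime, and then compares the stat fields. The function name tamper_and_compare and the pause parameter are made up for this illustration. It assumes a POSIX system, where st_ctime is the inode change time; on Windows st_ctime means something else.

```python
import os
import tempfile
import time


def tamper_and_compare(path, pause=1.1):
    """Overwrite a file in place, restore its mtime, and report which stat fields moved.

    Hypothetical helper for illustration only. Assumes POSIX ctime semantics.
    """
    before = os.stat(path)
    # Wait long enough to cross even a one-second timestamp granularity.
    time.sleep(pause)
    with open(path, "r+b") as f:
        data = f.read()
        f.seek(0)
        # Flip every byte: same length, different contents.
        f.write(bytes(b ^ 0xFF for b in data))
    # Put atime and mtime back to their old values, as a root-kit might.
    os.utime(path, ns=(before.st_atime_ns, before.st_mtime_ns))
    after = os.stat(path)
    return {
        "size_same": before.st_size == after.st_size,
        "mtime_same": before.st_mtime_ns == after.st_mtime_ns,
        "ctime_changed": before.st_ctime_ns != after.st_ctime_ns,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "ls")
        with open(target, "wb") as f:
            f.write(b"original contents\n")
        print(tamper_and_compare(target))
```

Size and mtime should come out the same while the ctime differs, since both the write and the os.utime call update the inode.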


Re: File change detection using hashes

halfgaar
In reply to this post by Vadim Kouzmine
On 02/10/06 18:00, Vadim Kouzmine wrote:

>
>I did the same and mtime has changed for me:
>
>dd if=/dev/urandom of=a count=1
>dd if=/dev/urandom of=b count=1
>
>stat a b
>  File: `a'
>  Size: 512             Blocks: 8          IO Block: 131072 regular file
>Device: 806h/2054d      Inode: 2094563     Links: 1
>Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
>Access: 2006-02-10 11:49:56.000000000 -0500
>Modify: 2006-02-10 11:49:56.000000000 -0500
>Change: 2006-02-10 11:49:56.000000000 -0500
>  File: `b'
>  Size: 512             Blocks: 8          IO Block: 131072 regular file
>Device: 806h/2054d      Inode: 2094711     Links: 1
>Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
>Access: 2006-02-10 11:50:01.000000000 -0500
>Modify: 2006-02-10 11:50:01.000000000 -0500
>Change: 2006-02-10 11:50:01.000000000 -0500
>
>cp a b
>
>stat a b
>  File: `a'
>  Size: 512             Blocks: 8          IO Block: 131072 regular file
>Device: 806h/2054d      Inode: 2094563     Links: 1
>Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
>Access: 2006-02-10 11:51:20.000000000 -0500
>Modify: 2006-02-10 11:49:56.000000000 -0500
>Change: 2006-02-10 11:49:56.000000000 -0500
>  File: `b'
>  Size: 512             Blocks: 8          IO Block: 131072 regular file
>Device: 806h/2054d      Inode: 2094711     Links: 1
>Access: (0664/-rw-rw-r--)  Uid: ( 1001/kouzminv)   Gid: (  440/  vadiki)
>Access: 2006-02-10 11:50:01.000000000 -0500
>Modify: 2006-02-10 11:51:20.000000000 -0500
>Change: 2006-02-10 11:51:20.000000000 -0500
>
>Notice *** Modify: 2006-02-10 11:51:20.000000000 -0500 ***
>
>I think you miss SECONDS in your lh listing. Looks like "b" was created
>15:47 and then you copied a -> b at 15:47.
>
>Vadim
>
Sorry, me stupid :)

But the problem still exists if you use "mv" instead of "cp". I
remembered incorrectly which command it was.
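
A minimal Python sketch of the "mv" case, for anyone who wants to reproduce it. It is only an illustration: the function name replace_keeps_stat is invented here, os.replace stands in for "mv" within one filesystem, and the two files are deliberately given the same mtime (as could happen when both are unpacked from one archive), because that is the situation in which a size+mtime comparison has nothing left to notice.

```python
import os
import tempfile


def replace_keeps_stat(dirpath):
    """Rename a over b and report whether b's size and mtime look unchanged.

    Hypothetical helper for illustration only.
    """
    a = os.path.join(dirpath, "a")
    b = os.path.join(dirpath, "b")
    with open(a, "w") as f:
        f.write("this is a file.\n")
    with open(b, "w") as f:
        f.write("this is b file.\n")
    # Assumption: both files carry the same mtime (arbitrary value).
    os.utime(a, (1139582760, 1139582760))
    os.utime(b, (1139582760, 1139582760))

    before = os.stat(b)
    os.replace(a, b)  # like "mv a b": a rename does not touch the file's mtime
    after = os.stat(b)

    with open(b) as f:
        content = f.read()
    same = (before.st_size, before.st_mtime) == (after.st_size, after.st_mtime)
    return same, content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(replace_keeps_stat(d))
```

The path "b" ends up with different contents while its size and mtime compare equal to what they were before the rename.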


Re: File change detection using hashes

halfgaar
In reply to this post by Dave Kempe
On 02/10/06 21:29, dave kempe wrote:

> Wiebe Cazemier wrote:
>
>> I really think this feature should be part of the next stable release,
>> because now I can't fully trust my backup to be accurate.
>>
>
> the feature is in the development version... have you tried it?

I have read the changelogs, and can only find mention of compare
abilities for comparing the archive to the current disk status, not
a way to detect changes in files while doing a backup. Are you sure
it's in? If so, I'll try the development version.

Should anyone want to read my feature request, I sent a mail to the list
on October 21st, 2005 about it. You should be able to find it in the
archive. The feature I'm referring to now, I described back then as
"--checksum-diffs".


Re: File change detection using hashes

halfgaar
On 02/11/06 00:29, Wiebe Cazemier wrote:

>
>I have read the changelogs, and can only find mention of compare
>abilities for comparing the archive to the current disk-status. But not
>as a way to detect changes in files while doing a backup. Are you sure
>it's in? If so, I'll try the development version.
>
>Should anyone wanna read my feature request, I sent a mail to the list
>on oktober 21st 2005 about it. You should be able to find it in the
>archive. The feature I'm referring to now, I described back then with
>"--checksum-diffs".
>
I've read the man page of the development version. The feature I'm
talking about is not present yet.


Re: File change detection using hashes

Dave Kempe
Wiebe Cazemier wrote:
> I've read the man page of the development version. The feature I'm
> talking about is not present yet.

I think I misread your request. I believe the current method to work
with changing files during the backup is to take an LVM snapshot. Is
that going to be possible for you?

dave



Re: File change detection using hashes

halfgaar
(feature-discussion summary at the end of this mail)

On 02/13/06 21:30, dave kempe wrote:

>
> I think I misread your request. I believe the current method to work
> with changing files during the backup is to take an LVM snapshot. Is
> that going to be possible for you?

I think there still is some misunderstanding.

The problem is in the "diff" part of rdiff-backup. It reads the source
dir, and any file whose mtime or size differs from the most recent
backup of it is selected for backup.

But, as I showed with my "mv" example (originally a "cp" example, but it
should be "mv"), the contents of a file can change without the mtime or
size changing. The example shown is not the only way files can change
without their mtime or size changing. In my original feature request, I
wrote down an elaborate explanation of how this happens when a system's
package management is involved.

But it comes down to this: using only mtime and size to determine if a
file should be selected for backup is unreliable. The data of the file
must be checked, hence the checksum comparison.

But, now that you mention it, detection of changes in files which are
processed _during_ a backup (which produces the well-known
update-errors) also relies on mtime, which is not 100% reliable. But I
don't think that's too bad. If it is, the ctime could be used. For this
feature, the ctime may very well be an excellent choice, because we're
only talking about the time-span it takes to do a backup. Using ctimes
for the file-selection-for-backup situation is not practical, because
very little is needed to change a ctime; doing a backup of your disk
with "dar" with default settings is one example. That would cause
rdiff-backup to think every file on disk has changed.

You know, come to think of it, perhaps it's not that bad a choice to use
ctimes to detect if a file has changed for determining if it should be
backed up or not. It could be available as an option. When the ctime has
changed, _some_thing must have happened to the file. Then, when
rdiff-backup starts to diff the files and the contents turn out to be
unchanged, the resulting diff increment for the new session is almost
0 bytes (there is some header info or something, but not much). This
could be a lot faster than my --checksum-diffs option, because it only
reads the contents of the files of which the ctime has changed.

A possible problem with this is that some filesystems don't have
ctimes. But vfat's mtime, for example, acts like a ctime, and when you
request ctime, you get mtime, it would seem. So that's not really a
problem. And as for ctimes changing because of dar, dar can be run with
the "--alter=atime" option, to avoid dar resetting the atime (restoring
the atime is what results in a change in ctime). Ben tried to implement
ctime checking, but there was a problem, and he forgot what it was :(


======
OK, to sum this up, mostly for Ben (and I do hope you're reading this,
because it's kind of critical IMO):

Something _has_ to be devised to detect changes in files properly, to
avoid files not being backed up. Perhaps you could try to implement an
option for ctime checking, and possibly discover again why that's not
possible. If it _is_ impossible, my --checksum-diffs should be
implemented, IMO.

And would it be possible to check for changes that occurred in files
_during_ a backup with ctimes? That would be more reliable than mtimes.


Re: File change detection using hashes

dean gaudet-4
On Mon, 13 Feb 2006, Wiebe Cazemier wrote:

> Something _has_ to be devised to detect changes in files properly, to
> avoid files not being backed up. Perhaps you could try to implement an
> option for ctime checking, and possibly discover again why that's not
> possible. If it _is_ impossible, my --checksum-diffs should be
> implemented, IMO.

what you really want is something akin to rsync -c ... which forces
checksumming of every file, regardless of mtime/size.

throwing more stat elements into the comparison is only approximately an
improvement (you could also throw in dev:inode -- which would handle your
mv situation most of the time, but breaks in some cases like snapshotting
which preserves inode without preserving dev).
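
as a rough illustration of the dev:inode idea (just a sketch in Python, with made-up helper names file_identity and stat_signature, not anything rdiff-backup does today), the identity of a path changes when another file is renamed over it, even if size and mtime look the same:

```python
import os
import tempfile


def file_identity(path):
    """Return the (device, inode) pair that identifies the file behind a path."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def stat_signature(path):
    """Hypothetical widened comparison key: size and mtime plus dev:inode."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_dev, st.st_ino)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        b = os.path.join(d, "b")
        for path, text in ((a, "this is a file.\n"), (b, "this is b file.\n")):
            with open(path, "w") as f:
                f.write(text)
        old = file_identity(b)
        os.replace(a, b)  # equivalent of "mv a b" on one filesystem
        print(old != file_identity(b))
```

this only tells you that the path now points at a different file, not whether the bytes differ, which is why it stays an approximation.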


> And would it be possible to check for changes that occurred in files
> _during_ a backup with ctimes? That would be more reliable than mtimes.

ctime doesn't change when you modify the file, only mtime does.

-dean



Re: File change detection using hashes

halfgaar
On 02/15/06 20:34, dean gaudet wrote:

>
>what you really want is something akin to rsync -c ... which forces
>checksumming of every file, regardless of mtime/size.
>  
>
Well, that's basically what my --checksum-diffs suggestion, which I
made in that feature request in October, would do: calculate the
checksum of a file, compare it to what rdiff-backup has stored in its
metafiles, and decide to back up the file if the checksums are different.
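
Roughly, the comparison I have in mind could look like the Python sketch below. This is only an illustration of the idea, not rdiff-backup code: the names file_digest and needs_backup are invented, the stored_digests dict stands in for whatever rdiff-backup would keep in its metafiles, and SHA-1 is an arbitrary choice of hash.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Hash a file's contents in chunks so large files need not fit in memory."""
    h = hashlib.sha1()  # arbitrary choice of hash for this sketch
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def needs_backup(path, stored_digests):
    """Decide on content alone, ignoring mtime and size.

    stored_digests is a hypothetical mapping of path -> hex digest recorded
    during the previous backup session. Unknown paths always need a backup.
    """
    return stored_digests.get(path) != file_digest(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "b")
        with open(p, "wb") as f:
            f.write(b"this is b file.\n")
        stored = {p: file_digest(p)}
        print(needs_backup(p, stored))
```

The cost is that every file gets read on every run, which is the same trade-off rsync -c makes.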

>throwing more stat elements into the comparison is only approximately an
>improvement (you could also throw in dev:inode -- which would handle your
>mv situation most of the time, but breaks in some cases like snapshotting
>which preserves inode without preserving dev).
>  
>
>>And would it be possible to check for changes that occurred in files
>>_during_ a backup with ctimes? That would be more reliable than mtimes.
>>    
>>
>
>ctime doesn't change when you modify the file, only mtime does.
>
The ctime (the change time of the inode) changes *always*. And, it
cannot be set by any application, so it would be an accurate detection.

But I do agree that I would much rather have checksum comparisons than
another stat comparison. It's the most robust approach, and it behaves
well when backing up a system which has just been restored _from_ a
backup (since that would have a new ctime for every file).

And of course, checking for changes in special files like block devices
still has to happen the old-fashioned way, which shouldn't be a problem...
