Encountering the NFS "Directory not empty" error


Encountering the NFS "Directory not empty" error

Nick Parker-2
After using rdiff-backup for daily backups for the past couple of weeks, I've run
into the problem described at
http://cvs.lp.se/doc/rdiff-backup/FAQ.html#dir_not_empty

It appears to be triggered by removal of a directory in the production
copy, which creates problems when the backup copy tries to duplicate the
removal.

I am not able to perform the backup outside of NFS, as we are doing this
in case of a local server failure while also conforming to existing
internal network infrastructure. Is there a known workaround for this
issue? Has it been resolved in the 1.1 series? If the answer to these is
"no", are there alternative backup systems similar in functionality to
rdiff-backup? (preferably with support for diffs of binary files, as
rdiff-backup provides)

Thanks!

Nick

This is the error itself:

Traceback (most recent call last):
   File "/usr/bin/rdiff-backup", line 23, in ?
     rdiff_backup.Main.Main(sys.argv[1:])
   File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line
283, in Main
     take_action(rps)
   File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line
253, in take_action
     elif action == "backup": Backup(rps[0], rps[1])
   File "/usr/lib/python2.4/site-packages/rdiff_backup/Main.py", line
303, in Backup
     backup.Mirror_and_increment(rpin, rpout, incdir)
   File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line
51, in Mirror_and_increment
     DestS.patch_and_increment(dest_rpath, source_diffiter, inc_rpath)
   File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line
229, in patch_and_increment
     ITR(diff.index, diff)
   File "/usr/lib/python2.4/site-packages/rdiff_backup/rorpiter.py",
line 281, in __call__
     if self.finish_branches(index) is None:
   File "/usr/lib/python2.4/site-packages/rdiff_backup/rorpiter.py",
line 233, in finish_branches
     to_be_finished.end_process()
   File "/usr/lib/python2.4/site-packages/rdiff_backup/backup.py", line
574, in end_process
     self.base_rp.rmdir()
   File "/usr/lib/python2.4/site-packages/rdiff_backup/rpath.py", line
806, in rmdir
     self.conn.os.rmdir(self.path)
OSError: [Errno 39] Directory not empty:
'/backup/trantor/project_local/old/trac_data/templates/0.8'
Exception exceptions.TypeError: "'NoneType' object is not callable" in
<bound method GzipFile.__del__ of <gzip open file
'/backup/trantor/project_local/rdiff-backup-data/file_statistics.2005-10-29T16:50:49-04:00.data.gz',
mode 'wb' at 0xb7bb6f50 -0x4842b234>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in
<bound method GzipFile.__del__ of <gzip open file
'/backup/trantor/project_local/rdiff-backup-data/error_log.2005-10-29T16:50:49-04:00.data.gz',
mode 'wb' at 0xb7ea1f08 -0x4842b074>> ignored
Exception exceptions.TypeError: "'NoneType' object is not callable" in
<bound method GzipFile.__del__ of <gzip open file
'/backup/trantor/project_local/rdiff-backup-data/mirror_metadata.2005-10-29T16:50:49-04:00.snapshot.gz',
mode 'wb' at 0xb7bb6f98 -0x48427a14>> ignored


_______________________________________________
rdiff-backup-users mailing list at [hidden email]
http://lists.nongnu.org/mailman/listinfo/rdiff-backup-users
Wiki URL: http://rdiff-backup.solutionsfirst.com.au/index.php/RdiffBackupWiki

Re: Encountering the NFS "Directory not empty" error

Bernd Schubert-6
Hi Nick,

On Saturday 29 October 2005 23:21, Nick Parker wrote:

> After using rdiff for daily backups for the last couple weeks, I've run
> into the problem described at
> http://cvs.lp.se/doc/rdiff-backup/FAQ.html#dir_not_empty
>
> It appears to be triggered by removal of a directory in the production
> copy, which creates problems when the backup copy tries to duplicate the
> removal.
>
> I am not able to perform the backup outside of NFS, as we are doing this
> in case of a local server failure while also conforming to existing
> internal network infrastructure. Is there a known workaround for this
> issue? Has it been resolved in the 1.1 series? If the answer to these is
> "no", are there alternative backup systems similar in functionality to
> rdiff-backup? (preferably with support for diffs of binary files, like
> what rdiff-backup uses)
>
> Thanks!

I'm pretty sure the NFS problem is actually a general problem, one that is
hidden by most kernels/filesystems. rdiff-backup does not close some file
handles before it tries to delete the files. On most filesystems the files,
and also their parent directories, can still be deleted, though to the user
they only *seem* to be deleted - the data is still on disk; no *new* file
descriptors can be opened, but old file descriptors still have full access.
On NFS this is different, since everything goes over the network and the NFS
protocol up to v3 has no native file locking (NFSv4 does, but I haven't had
time to test it yet). So in NFS directories the client kernel can't hide a
deleted file the way other filesystems do; instead it renames each deleted
file that still has open file descriptors (fds) to a .nfs* file. Those .nfs*
files can't be deleted until the last fd with access to the file has been
closed - and of course the parent directory can't be deleted until then
either... *That is the actual problem you are seeing.*
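
A minimal sketch of that behaviour (all paths here are made up for the
demo): on a local filesystem you can unlink an open file and then rmdir its
parent, and the old descriptor keeps working; on an NFS mount the same
rmdir would fail because the client leaves a .nfs* file behind.

```python
import os
import tempfile

# Create a directory with a single file in it.
base = tempfile.mkdtemp()
path = os.path.join(base, "somefile")
with open(path, "w") as f:
    f.write("data\n")

# Hold an open descriptor on the file (as rdiff-backup apparently does)
# and delete the file and its parent directory anyway.
fd = os.open(path, os.O_RDONLY)
os.unlink(path)  # local fs: the name disappears, the inode lives on
os.rmdir(base)   # local fs: succeeds; on NFS this is where you would
                 # get OSError [Errno 39] due to the leftover .nfs* file
data = os.read(fd, 5)  # the old descriptor can still read the data
os.close(fd)
print(data)
```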

A few months ago I tried to fix this, but did it wrong (due to my lack of
Python knowledge), and unfortunately my lack of time (I need to finish my
Ph.D. thesis as soon as possible, but there are still unsolved problems)
prevented me from looking deeper into it :( I will make another attempt this
weekend or the next.
As for a workaround: yes, there is one - mount your NFS directory without
locking support (-o nolock on Linux). But be careful: other applications
that need locking support will then have problems of their own.

Cheers,
        Bernd



--
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg




Re: Encountering the NFS "Directory not empty" error

Ben Escoto
>>>>> Bernd Schubert <[hidden email]>
>>>>> wrote the following on Sun, 30 Oct 2005 01:23:21 +0200

> I'm pretty sure the NFS problem is actually a general problem, one
> that is hidden by most kernels/filesystems. rdiff-backup does not
> close some file handles before it tries to delete the files. On
> most filesystems the files, and also their parent directories, can
> still be deleted, though to the user they only *seem* to be deleted
> - the data is still on disk; no *new* file descriptors can be
> opened, but old file descriptors still have full access. On NFS
> this is different, since everything goes over the network and the
> NFS protocol up to v3 has no native file locking (NFSv4 does, but I
> haven't had time to test it yet). So in NFS directories the client
> kernel can't hide a deleted file the way other filesystems do;
> instead it renames each deleted file that still has open file
> descriptors (fds) to a .nfs* file. Those .nfs* files can't be
> deleted until the last fd with access to the file has been closed -
> and of course the parent directory can't be deleted until then
> either... *That is the actual problem you are seeing.*
I had considered the possibility that rdiff-backup was failing to
close files, but I spent some time looking through the code and
couldn't find any loose files.  Also from the reports I've gotten it
seems this error occurs inconsistently, and at different places, even
though rdiff-backup is single-threaded and purely deterministic.

But your paragraph explains the mechanism well, and suggests that it's
an rdiff-backup problem after all.

> A few months ago I tried to fix this, but did it wrong (due to my
> lack of Python knowledge), and unfortunately my lack of time (I need
> to finish my Ph.D. thesis as soon as possible, but there are still
> unsolved problems) prevented me from looking deeper into it :( I
> will make another attempt this weekend or the next.

From my perspective, the hard part seems to be replicating the problem
consistently, and figuring out exactly which file(s) are not getting
closed.  If you can do this (which should be doable with no Python
knowledge), then it may be easy for me or someone else to fix it.

Good luck with your thesis :-)


--
Ben Escoto


Re: Encountering the NFS "Directory not empty" error

Bernd Schubert-6
Hello Ben,

> I had considered the possibility that rdiff-backup was failing to
> close files, but I spent some time looking through the code and
> couldn't find any loose files.  Also from the reports I've gotten it
> seems this error occurs inconsistently, and at different places, even
> though rdiff-backup is single-threaded and purely deterministic.
>
> But your paragraph explains the mechanism well, and suggests that it's
> an rdiff-backup problem after all.

I already checked my theory some time ago by putting a long sleep before the
rmdir exception. Then I looked into the directory, and it contained .nfs
files. Furthermore, /proc/{PID of rdiff-backup}/fd showed open file
descriptors pointing to those files.
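
That /proc inspection can be scripted too. A small sketch (Linux-specific,
since it relies on /proc/&lt;pid&gt;/fd; the demo filename is made up):

```python
import os

def open_fd_targets(pid="self"):
    # List what a process's open file descriptors point at, by reading
    # the symlinks in Linux's /proc/<pid>/fd directory.
    fd_dir = os.path.join("/proc", str(pid), "fd")
    targets = []
    if os.path.isdir(fd_dir):  # absent on non-Linux systems
        for fd in os.listdir(fd_dir):
            try:
                targets.append((fd, os.readlink(os.path.join(fd_dir, fd))))
            except OSError:
                pass  # the fd may have been closed in the meantime
    return targets

# Hold a file open and list our own descriptors; during a stuck
# rdiff-backup run, the leftover .nfs* paths would show up the same way.
demo = open("/tmp/fd_demo.txt", "w")
targets = open_fd_targets("self")
demo.close()
for fd, target in targets:
    print(fd, "->", target)
```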

>
> > A few months ago I tried to fix this, but did it wrong (due to my
> > lack of Python knowledge), and unfortunately my lack of time (I
> > need to finish my Ph.D. thesis as soon as possible, but there are
> > still unsolved problems) prevented me from looking deeper into
> > it :( I will make another attempt this weekend or the next.
>
> From my perspective, the hard part seems to be replicating the problem
> consistently, and figuring out exactly which file(s) are not getting
> closed.  If you can do this (which should be doable with no Python
> knowledge), then it may be easy for me or someone else to fix it.

Well, replicating is very easy - I think I can do it within seconds; I only
need one file in one directory. So I think it's also easy to figure out
which file it is. From my point of view, the difficult part is finding the
corresponding open() in rdiff-backup. I already tried a Python debugger, but
it always fails with another exception (I seem to remember it's an open() of
a non-existent file); I also still do not understand why it doesn't raise
this exception without a debugger. If you are also interested in this issue,
I can tell you the exact problem tomorrow or on Tuesday.
Another possibility would be to monitor all open() and close() calls, but I
think there are rather many of them, and all of them would need a print
statement or something like that.

Another point I still don't understand is how the deleted files reappear in
the directory on the next rdiff-backup run. In principle the .nfs files are
deleted immediately after rdiff-backup has raised its exception and exited
(that's why one usually can't see the .nfs files), so I would expect
rdiff-backup to succeed on the next run - but instead those already-deleted
files reappear again :(

Sorry, today I again had no time to look into it further (it's time to go to
sleep now), but we have a public holiday on Tuesday and I will try to look
into it again tomorrow evening or on Tuesday.

>
> Good luck with your thesis :-)

Thanks, I really need it (we need to find a numerical workaround for a
mathematical singularity, and now it seems only a small step is missing).

Cheers,
        Bernd

--
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg




Re: Encountering the NFS "Directory not empty" error

Ben Escoto
>>>>> Bernd Schubert <[hidden email]>
>>>>> wrote the following on Mon, 31 Oct 2005 00:18:19 +0100
>
> I already checked my theory some time ago by putting a long sleep
> before the rmdir exception. Then I looked into the directory, and it
> contained .nfs files. Furthermore, /proc/{PID of rdiff-backup}/fd
> showed open file descriptors pointing to those files.
...
> Well, replicating is very easy, I think I can do this within seconds,
> I only need one file in one dir.

Ahh, interesting.  So can you just look at the file and tell me which
one it is?  Like if the backup directory "backup" looks like

backup/foo/somefile

and the new source directory "source" doesn't have the /foo directory
in it, and then you run

rdiff-backup source backup

does that cause the error all the time?  What is the file that's
hanging around, is it "somefile" itself?
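
That layout is easy to script. A sketch of the setup (directory names taken
from Ben's example; the rdiff-backup invocation itself is left as a comment,
since it needs the backup directory to live on an NFS mount and to already
be an initialized rdiff-backup destination):

```python
import os
import tempfile

# Ben's minimal layout: the existing backup contains foo/somefile,
# while the new source tree no longer has a foo directory at all.
root = tempfile.mkdtemp()
source = os.path.join(root, "source")
foo = os.path.join(root, "backup", "foo")
os.makedirs(source)
os.makedirs(foo)
with open(os.path.join(foo, "somefile"), "w") as f:
    f.write("old content\n")

# With root/backup on NFS, the next run is what should trigger the
# "Directory not empty" error while it tries to remove foo/:
#
#   rdiff-backup source backup
```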

> I already tried a Python debugger, but it always fails with another
> exception (I seem to remember it's an open() of a non-existent
> file); I also still do not understand why it doesn't raise this
> exception without a debugger.

Dunno, I don't use debuggers.

> Another possibility would be to monitor all open() and close()
> calls, but I think there are rather many of them, and all of them
> would need a print statement or something like that.

Yes, that's definitely doable if you can reproduce the problem with
such a small data set.


--
Ben Escoto
