CloseConnections doesn't wait for child processes to die

Trevor Cordes
If I run rdiff-backup with a complicated ssh connection string that
does some housekeeping stuff afterwards, and it's talking to a slow remote
box, rdiff-backup will exit while the ssh process is still running for 0.5
to 1.5 seconds.  If I check with the ps command at the right time I can see
that the ssh process has been taken over by pid 1 as its ppid.

CloseConnections() I guess isn't wait()ing on the children.

This is a problem for me as I like to capture the output from my
cleanup stuff in my script that is handling all of this.  That output
is lost to nowheresville if it isn't spit out in a short enough time
(before pid 1 takes over ssh).

Took me forever to figure out why some runs I got this output and some
I didn't.  Depends on how fast and loaded the remote computer is.

I managed to fix it by hacking in 5 lines of code to remember the
process handle along with the pipe handles, and in connection._close
doing a wait() on the handle.  Now it works perfectly, rdiff-backup
properly waits for its children before exiting, not stranding any
processes (or output!) with pid 1.
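
For anyone wanting to see the shape of the idea, here is a rough sketch in
Python.  It is not the actual rdiff-backup code or my patch: the names
PipeConnection and open_connection are made up for illustration, and it only
assumes the child is started with subprocess.Popen and talked to over
stdin/stdout pipes.

```python
import subprocess
import sys


class PipeConnection:
    """Hypothetical stand-in for a connection that talks to a child over pipes."""

    def __init__(self, inpipe, outpipe, process=None):
        self.inpipe = inpipe      # child's stdout, read by us
        self.outpipe = outpipe    # child's stdin, written by us
        self.process = process    # Popen handle, kept so close() can wait on it

    def close(self):
        # Closing our ends first lets the child see EOF on its stdin.
        self.outpipe.close()
        self.inpipe.close()
        if self.process is not None:
            # Block until the child has exited, so it is never left
            # running (and reparented to pid 1) after we exit.
            return self.process.wait()
        return None


def open_connection(argv):
    """Start argv as a child and wrap its pipes together with its handle."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    return PipeConnection(proc.stdout, proc.stdin, proc)
```

With something like this, close() only returns once the child is gone, so
anything the child prints to the inherited stderr during its cleanup still
lands in the calling script's output.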

It would be great to get this tweak into the official version!  I can't
see any downside.

The Python docs on Popen say wait() could deadlock, but since the
rdiff-backup protocol is so precise, I doubt there could ever be
remaining data on the pipe at that point.  Otherwise they say to use
communicate(), but that seems more complicated than what I'm after.  (And
Python is not my language of choice nor maximum aptitude.)
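
For comparison, a minimal sketch of the communicate() route the docs suggest.
The deadlock they warn about happens when the child writes more to a pipe
than the OS pipe buffer holds while the parent sits in wait() without reading;
communicate() reads until end-of-file and only then waits.  The helper name
run_and_collect is made up for this example and is not part of rdiff-backup.

```python
import subprocess
import sys


def run_and_collect(argv, input_bytes=b""):
    """Run argv, feed it input_bytes, and return (exit code, stdout bytes)."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # communicate() sends the input, closes stdin, drains stdout until EOF,
    # and then waits for the child, so a full pipe cannot block the child.
    out, _ = proc.communicate(input_bytes)
    return proc.returncode, out
```

This is only a fit when nothing else needs to read from the pipes, which is
why a plain wait() after the pipes are closed looked simpler for my case.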

rdiff-backup-users mailing list at [hidden email]