Duplicity hangs

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
3 messages Options
Reply | Threaded
Open this post in threaded view
|

Duplicity hangs

duplicity-talk mailing list

This is a follow-on to a previous submission where duplicity (under duply) hangs. The duplicity hangs occurred on 7/11, 9/20 and 10/30. That is 71 days between #1 and #2, and 40 days between #2 and #3.


You can see the hang in this ps command:


$ ps wwaux|grep duply
transfe+  7764  0.0  0.0   4508   852 ?        Ss   Oct30   0:00 /bin/sh -c duply db_backup pre+bkp+post >> /var/log/duply
transfe+  7766  0.0  0.0  13056  3420 ?        S    Oct30   0:00 bash /usr/bin/duply db_backup pre+bkp+post
transfe+  8776  0.0  0.0 231436 31732 ?        Sl   Oct30   0:58 python2 /usr/bin/duplicity --name duply_db_backup --encrypt-key 11D39D41 --sign-key 11D39D41 --verbosity 2 --full-if-older-than 1W --exclude-filelist /home/transfers/.duply/db_backup/exclude /var/tmp/pg_dumps scp://[hidden email]//mnt/backup/backups/db3.labmed.uw.edu/db_backup

Duply and Duplicity have been running since yesterday morning - over 33 hours ago.

Next I noted that the disk space at the remote partition is at 37% - no problem there. The local partition has 11TB free at 2% usage. One responder suggested running strace. When I did that on the parent duplicity process I got this:


sudo strace -p8776
strace: Process 8776 attached
futex(0x19ce500, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff


and it just sat there. Then I did the same on the child process (LWP in ps) and got this:


sudo strace -p8781
strace: Process 8781 attached
restart_syscall(<... resuming interrupted poll ...>) = 0
poll([{fd=4, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=4, events=POLLIN}], 1, 100)   = 0 (Timeout)


repeating the poll timeout commands quickly. Occasionally, after about 150 poll timeouts, a sendto command would appear like this:


poll([{fd=4, events=POLLOUT}], 1, 100)  = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "It\372\261\306\277{V\256\374\351'f\231t\201\245Y\311\343\251`\333\246\224\314\366Q\200\327\331]"..., 64, 0, NULL, 0) = 64

and the repeating poll timeout commands would continue. The second argument to sendto, if you drop the high bit, starts with Itz1F?{V.|i'f^Y^A%YIc)`. Not very enlightening. I did this - sudo strace -f -p8776 - and it outputted the same kind of output as above, only it combined them and identified the PID on each line.

Eventually I would kill the strace command and it would detach normally.

So it looks like the duplicity process is waiting for its thread to complete, and the thread is polling on fd 4 and it keeps timing out. I don't know what file descriptor 4 is for. The exception is calling sendto which causes the poll command to not be a timeout. Does this make sense to anyone?

Tom Ekberg
Senior Computer Specialist, Lab Medicine
University of Washington Medical Center
work: (206) 598-8544
email: tekberg@...



_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
Reply | Threaded
Open this post in threaded view
|

Re: Duplicity hangs

duplicity-talk mailing list
Do you have any encryption or sign keys that need a password?  Is it supplied via the environment?

Do you have a log file?

...Thanks,
...Ken


On Tue, Oct 31, 2017 at 12:24 PM, Thomas W. Ekberg via Duplicity-talk <[hidden email]> wrote:

This is a follow-on to a previous submission where duplicity (under duply) hangs. The duplicity hangs occurred on 7/11, 9/20 and 10/30. That is 71 days between #1 and #2, and 40 days between #2 and #3.


You can see the hang in this ps command:


$ ps wwaux|grep duply
transfe+  7764  0.0  0.0   4508   852 ?        Ss   Oct30   0:00 /bin/sh -c duply db_backup pre+bkp+post >> /var/log/duply
transfe+  7766  0.0  0.0  13056  3420 ?        S    Oct30   0:00 bash /usr/bin/duply db_backup pre+bkp+post
transfe+  8776  0.0  0.0 231436 31732 ?        Sl   Oct30   0:58 python2 /usr/bin/duplicity --name duply_db_backup --encrypt-key 11D39D41 --sign-key 11D39D41 --verbosity 2 --full-if-older-than 1W --exclude-filelist /home/transfers/.duply/db_backup/exclude /var/tmp/pg_dumps scp://lmbackup@lmbackup.labmed.uw.edu//mnt/backup/backups/db3.labmed.uw.edu/db_backup

Duply and Duplicity have been running since yesterday morning - over 33 hours ago.

Next I noted that the disk space at the remote partition is at 37% - no problem there. The local partition has 11TB free at 2% usage. One responder suggested running strace. When I did that on the parent duplicity process I got this:


sudo strace -p8776
strace: Process 8776 attached
futex(0x19ce500, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff


and it just sat there. Then I did the same on the child process (LWP in ps) and got this:


sudo strace -p8781
strace: Process 8781 attached
restart_syscall(<... resuming interrupted poll ...>) = 0
poll([{fd=4, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=4, events=POLLIN}], 1, 100)   = 0 (Timeout)


repeating the poll timeout commands quickly. Occasionally, after about 150 poll timeouts, a sendto command would appear like this:


poll([{fd=4, events=POLLOUT}], 1, 100)  = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "It\372\261\306\277{V\256\374\351'f\231t\201\245Y\311\343\251`\333\246\224\314\366Q\200\327\331]"..., 64, 0, NULL, 0) = 64

and the repeating poll timeout commands would continue. The second argument to sendto, if you drop the high bit, starts with Itz1F?{V.|i'f^Y^A%YIc)`. Not very enlightening. I did this - sudo strace -f -p8776 - and it outputted the same kind of output as above, only it combined them and identified the PID on each line.

Eventually I would kill the strace command and it would detach normally.

So it looks like the duplicity process is waiting for its thread to complete, and the thread is polling on fd 4 and it keeps timing out. I don't know what file descriptor 4 is for. The exception is calling sendto which causes the poll command to not be a timeout. Does this make sense to anyone?

Tom Ekberg
Senior Computer Specialist, Lab Medicine
University of Washington Medical Center
work: <a href="tel:(206)%20598-8544" value="+12065988544" target="_blank">(206) 598-8544
email: tekberg@...



_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk



_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk
Reply | Threaded
Open this post in threaded view
|

Re: Duplicity hangs

duplicity-talk mailing list

Ken,


Unfortunately I don't have a useful log file. I unfortunately I have 3 duplys running via cron at the same time, so the log lines are intermixed. I changed the cron entries to stagger the runs to fix this problem.


Also, the duply logs (cron writes >> to /var/log/duply) have a lot of timestamps but few have the Y/M/D date. I changed our duply to use datefull_from_nsec instead of date_from_nsec to output the full date more often. The duply development version has the same problem.


Tom Ekberg
Senior Computer Specialist, Lab Medicine
University of Washington Medical Center
work: (206) 598-8544
email: tekberg@...





From: [hidden email] <[hidden email]> on behalf of Kenneth Loafman <[hidden email]>
Sent: Wednesday, November 1, 2017 5:14 AM
To: Discussion about duplicity backup
Cc: Thomas W. Ekberg
Subject: Re: [Duplicity-talk] Duplicity hangs
 
Do you have any encryption or sign keys that need a password?  Is it supplied via the environment?

Do you have a log file?

...Thanks,
...Ken


On Tue, Oct 31, 2017 at 12:24 PM, Thomas W. Ekberg via Duplicity-talk <[hidden email]> wrote:

This is a follow-on to a previous submission where duplicity (under duply) hangs. The duplicity hangs occurred on 7/11, 9/20 and 10/30. That is 71 days between #1 and #2, and 40 days between #2 and #3.


You can see the hang in this ps command:


$ ps wwaux|grep duply
transfe+  7764  0.0  0.0   4508   852 ?        Ss   Oct30   0:00 /bin/sh -c duply db_backup pre+bkp+post >> /var/log/duply
transfe+  7766  0.0  0.0  13056  3420 ?        S    Oct30   0:00 bash /usr/bin/duply db_backup pre+bkp+post
transfe+  8776  0.0  0.0 231436 31732 ?        Sl   Oct30   0:58 python2 /usr/bin/duplicity --name duply_db_backup --encrypt-key 11D39D41 --sign-key 11D39D41 --verbosity 2 --full-if-older-than 1W --exclude-filelist /home/transfers/.duply/db_backup/exclude /var/tmp/pg_dumps scp://lmbackup@lmbackup.labmed.uw.edu//mnt/backup/backups/db3.labmed.uw.edu/db_backup

Duply and Duplicity have been running since yesterday morning - over 33 hours ago.

Next I noted that the disk space at the remote partition is at 37% - no problem there. The local partition has 11TB free at 2% usage. One responder suggested running strace. When I did that on the parent duplicity process I got this:


sudo strace -p8776
strace: Process 8776 attached
futex(0x19ce500, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 0, NULL, ffffffff


and it just sat there. Then I did the same on the child process (LWP in ps) and got this:


sudo strace -p8781
strace: Process 8781 attached
restart_syscall(<... resuming interrupted poll ...>) = 0
poll([{fd=4, events=POLLIN}], 1, 100)   = 0 (Timeout)
poll([{fd=4, events=POLLIN}], 1, 100)   = 0 (Timeout)


repeating the poll timeout commands quickly. Occasionally, after about 150 poll timeouts, a sendto command would appear like this:


poll([{fd=4, events=POLLOUT}], 1, 100)  = 1 ([{fd=4, revents=POLLOUT}])
sendto(4, "It\372\261\306\277{V\256\374\351'f\231t\201\245Y\311\343\251`\333\246\224\314\366Q\200\327\331]"..., 64, 0, NULL, 0) = 64

and the repeating poll timeout commands would continue. The second argument to sendto, if you drop the high bit, starts with Itz1F?{V.|i'f^Y^A%YIc)`. Not very enlightening. I did this - sudo strace -f -p8776 - and it outputted the same kind of output as above, only it combined them and identified the PID on each line.

Eventually I would kill the strace command and it would detach normally.

So it looks like the duplicity process is waiting for its thread to complete, and the thread is polling on fd 4 and it keeps timing out. I don't know what file descriptor 4 is for. The exception is calling sendto which causes the poll command to not be a timeout. Does this make sense to anyone?

Tom Ekberg
Senior Computer Specialist, Lab Medicine
University of Washington Medical Center
work: <a href="tel:(206)%20598-8544" value="&#43;12065988544" target="_blank">(206) 598-8544
email: tekberg@...



_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk



_______________________________________________
Duplicity-talk mailing list
[hidden email]
https://lists.nongnu.org/mailman/listinfo/duplicity-talk