Thread 'Heartbeat problem again ...'

Ananas

Joined: 27 Jun 06
Posts: 305
Germany
Message 15300 - Posted: 9 Feb 2008, 0:13:37 UTC
Last modified: 9 Feb 2008, 0:20:24 UTC

It seems that ACTIVE_TASK::copy_output_files() can, for whatever reason, take quite a while now and then.

This can be more than 30 seconds, so it might be a good idea to call poll() inside the loop.

I haven't checked for side effects; I guess there are people who know that better than I do :-)
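To make the idea concrete, here is a rough sketch of a rename loop that calls a poll hook after every file. It is not the actual ACTIVE_TASK::copy_output_files() code: the function name copy_output_files_sketch and the rename_one and poll callbacks are hypothetical stand-ins, and the real client may need more care around side effects.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the client's output-file loop; NOT the real
// ACTIVE_TASK::copy_output_files() implementation.
// rename_one: tries to move one output file and returns 0 on success
//             (it may block for a while if it retries internally).
// poll:       assumed hook that lets the client look after its other tasks
//             between two renames.
int copy_output_files_sketch(
    const std::vector<std::string>& files,
    const std::function<int(const std::string&)>& rename_one,
    const std::function<void()>& poll
) {
    assert(rename_one && poll);
    int failures = 0;
    for (const std::string& name : files) {
        if (rename_one(name) != 0) {
            ++failures;  // corresponds to a "Can't rename output file" message
        }
        poll();          // give other tasks a turn before the next, possibly slow, rename
    }
    return failures;
}
```

With this shape, the longest stretch without a poll is one rename attempt rather than the whole list of output files.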

I got this impression from the following situation:

09-Feb-2008 [b]00:29:33[/b] [Cosmology@Home] ... [i]Assuming that copy_output_files() started here[/i]
09-Feb-2008 [b]00:29:39[/b] [Cosmology@Home] [error] Can't rename output file wu_020608_013241_1_1_0
09-Feb-2008 00:29:44 [Cosmology@Home] [error] Can't rename output file wu_020608_013241_1_1_1
09-Feb-2008 00:29:50 [Cosmology@Home] [error] Can't rename output file wu_020608_013241_1_1_2
09-Feb-2008 00:29:56 [Cosmology@Home] [error] Can't rename output file wu_020608_013241_1_1_3
09-Feb-2008 00:30:01 [Cosmology@Home] [error] Can't rename output file wu_020608_013241_1_1_4
09-Feb-2008 00:30:07 [Cosmology@Home] [error] Can't rename output file wu_020608_013241_1_1_5
09-Feb-2008 [b]00:30:07[/b] [boincsimap] Task 8020203.072949_0 exited with zero status but no 'finished' file


This happened on a machine with 2 CPUs, so the tasks should be quite independent, but the two problems occurred within about 30 seconds of each other (counting the 6 seconds before the first rename error).
Ananas

Joined: 27 Jun 06
Posts: 305
Germany
Message 15302 - Posted: 9 Feb 2008, 0:45:37 UTC
Last modified: 9 Feb 2008, 0:51:46 UTC

Reason for copy_output_files() taking some time:

boinc_rename() uses boinc_delete_file(), which calls boinc_sleep(drand()*2) after a failed rename.

::Sleep() with a value of 0 hands the rest of the time slice to any other thread that is ready to run, and returns immediately only if nothing else is waiting to run.

What about using boinc_sleep(drand()*1.5+0.2) instead (or similar)?

Or just always add 0.1 in boinc_sleep() to make sure that the value is always > 0.
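A small sketch of the difference between the two expressions, assuming drand() returns a uniform value between 0 and 1. The helper names retry_delay_current and retry_delay_suggested are made up for illustration and are not BOINC functions.

```cpp
#include <cassert>

// Hypothetical helpers, NOT part of BOINC. Each maps a random value r in
// [0, 1] (what drand() is assumed to return) to a sleep time in seconds.

// Expression described above: the result can be 0 or arbitrarily close to it.
double retry_delay_current(double r) {
    assert(r >= 0.0 && r <= 1.0);
    return r * 2.0;
}

// Suggested alternative: the result always lies between 0.2 s and 1.7 s.
double retry_delay_suggested(double r) {
    assert(r >= 0.0 && r <= 1.0);
    return r * 1.5 + 0.2;
}
```

The lower bound of 0.2 seconds is what keeps the value away from the Sleep(0) case, while the average delay (0.95 s instead of 1.0 s) stays nearly the same.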


poll() should still be used in this output file rename loop, though; changing boinc_sleep() alone will not change the behaviour described above.

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.