Message boards : Questions and problems : Stop switching between WU
Author | Message |
---|---|
Joined: 26 Jun 09 Posts: 8 |
I re-downloaded 6.6.36 and reset the project, and it's running normally, i.e., 2 CPU WUs and just 1 GPU task. Three hours have passed and 3 CUDA WUs have already finished. |
Joined: 26 Jun 09 Posts: 8 |
24 hours in and still running well (only 1 CUDA task) after re-downloading 6.6.36 and resetting the SETI project. I set the contact/stock preferences to 1/3 and it has already fetched more work for both CPU and GPU, and this hasn't disturbed the processing sequence so far. The numbers are back on a rising curve. My setup includes a 1-hour outage every day, and it got through the outage without any trouble. |
Joined: 8 Aug 08 Posts: 570 |
I got more problems, not fewer. Now the problem showed up on a computer with less than 1/2 day of WUs left (<400). It had about 30 CUDA WUs waiting and 7 in memory, which caused 1 of them to go into fallback mode. I had to reboot. The WUs that are still in waiting mode finish and report OK, when they eventually get done, that is. I am going to fight this. I have 2 tactics: 1) Check the GPU temperature: if it goes below 58C, the system reboots, solving the problem. 2) Check for running CUDA programs: if there are more than allowed, in this case 2, the system reboots, solving the problem. I have written all of this into a program; hopefully this solves things. This is only for unattended systems, of course. For the other system I send an email instead of rebooting, so I can decide what to do, because once the system reboots and runs again, everything is fine for a long, long time. |
Joined: 26 Jun 09 Posts: 8 |
And then, without any interference, BOINC jumped from the first GPU task to a second one, leaving the first at 20s from finishing. Now I have 2 in progress, which means BOINC is already in the failure mode. It happened 2 seconds after BOINC started a new CPU task, and it left a message about starting the second task BUT NO MESSAGE about the interruption of the first. This second WU was NOT recently loaded; it had already been in the list for 6 hours, with a deadline of Jul 05th, 5:48:31 PM, versus Jul 05th, 6:20:00 PM for the first one. AND right now, while I was observing it, when the second one reached 14s from finishing, BOINC jumped back to the first one, leaving NO MESSAGE AT ALL (my hands were off, I was just watching, and no other operation was in progress). On restart of the first one, the remaining time jumped back to 1'30" from finish. And 20s after this, just 1s after the CPU had started a new CPU task, BOINC left the first one at 1'10" from finish and started a third one, leaving a message about the start of this third one (which had already been in the list for hours and has a deadline of Jul 21st!) and no message about the interruption. It's as if the GPU followed the CPU. NOW I HAVE 3 GPU tasks in progress. The first and second CUDA units are as short as 20 minutes and the third is 1h07. The third finished and BOINC started a fourth one, also with a Jul 21st deadline, so I still have 3 in progress. This is the failure mode, or at least one of the failure modes. Hope the description helps, Eduardo |
Joined: 26 Jun 09 Posts: 8 |
Left alone all day long, BOINC has somehow recovered: it is running just 1 CUDA WU at a time and the day's performance was just fine. |
Joined: 8 Aug 08 Posts: 570 |
Whatever it does, it does. Occasionally poking around in the check-in list, you find that the GPU scheduling modifications are not finished yet. TThrottle: for those interested, TThrottle can alert you, send an email, or restart the system when this happens. In Rules add: if gpu number > 2 email. In the Programs tab, Active must be checked! |
Joined: 23 Apr 07 Posts: 1112 |
There's an Echo in here. Claggy |
Joined: 26 Jun 09 Posts: 8 |
After a week, BOINC ran into problems again, running more than one CUDA unit. It has been happening for more than a week now, and it's not just an impression: I spent many hours observing BOINC's behaviour and found that this behaviour starts when processing a "BAD UNIT", i.e., a unit BOINC is not capable of properly unwinding for some reason. This causes BOINC to start jumping from one unit to another, starting a new one, and so on. As soon as you abort the unit, it returns to normal behaviour (sometimes there is more than one bad unit that needs to be killed). An additional side effect is that while the problem with the bad unit persists on the GPU, it almost paralyzes the computer, reducing CPU performance on CPU units and on other user requests as well. The last CUDA unit I killed was 14dc08ab.29925.2526.8.8.212_2. It had reached just 1.4% after more than an hour of processing and it had paralyzed BOINC. |
Joined: 26 Jun 09 Posts: 8 |
One missing piece of information: during these periods of near paralysis, Explorer uses up to 50% of CPU time. |
Joined: 8 Aug 08 Posts: 570 |
At the risk of echoing... I think they did something with 6.6.37 and GPU crunching. It's alpha... well, 6.6.36 was really alpha too, but that's probably an echoed statement as well ;>) I'm testing that one now, but there is not enough work yet to get these kinds of problems. |
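
The two watchdog tactics described earlier in the thread (reboot when the GPU temperature drops below 58C, or when more than 2 CUDA programs are running, and send an email instead of rebooting on an attended machine) can be sketched roughly as below. This is only an illustration of the decision rule, not the poster's actual program and not how TThrottle works internally. The names `Reading` and `decide_action` are hypothetical, and how the temperature and task count are actually read from the system is deliberately left out.

```python
from dataclasses import dataclass

# Thresholds quoted in the thread; everything else here is hypothetical.
MIN_GPU_TEMP_C = 58.0   # below this the GPU is assumed to be idle or stuck
MAX_CUDA_TASKS = 2      # more running CUDA tasks than this counts as a failure


@dataclass
class Reading:
    gpu_temp_c: float   # current GPU temperature in degrees Celsius
    cuda_tasks: int     # number of CUDA tasks currently running


def decide_action(reading: Reading, unattended: bool) -> str:
    """Return "ok", "reboot", or "email" for a single reading."""
    too_cold = reading.gpu_temp_c < MIN_GPU_TEMP_C
    too_many = reading.cuda_tasks > MAX_CUDA_TASKS
    if not (too_cold or too_many):
        return "ok"
    # Unattended machines reboot themselves; attended ones only notify.
    return "reboot" if unattended else "email"


if __name__ == "__main__":
    print(decide_action(Reading(65.0, 1), unattended=True))   # ok
    print(decide_action(Reading(55.0, 1), unattended=True))   # reboot
    print(decide_action(Reading(65.0, 3), unattended=False))  # email
```

Keeping the decision separate from the sensor reading and from the reboot or email action makes the rule easy to check by hand before trusting it on an unattended machine.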
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.