Thread 'Work fetch Scheduling feature'

Gary Charpentier
Joined: 23 Feb 08
Posts: 2516
United States
Message 53526 - Posted: 5 Apr 2014, 22:35:49 UTC

A feature as reported by the marketing department.

Core i7, eight cores, with cores sitting idle. Here is what I surmise happened.
I had an Einstein GPU task running and more in the queue, a Test4Theory VM task running at high priority, and several Collatz multi-core tasks in the queue. Even with cores idle, BOINC was not requesting work. Obviously it "knew" it had enough work, even though it could have squeezed in several single-core tasks while it waited to run the Collatz tasks. In this case the multi-core tasks blocked progress that could have been made on other tasks. That should never happen! Scheduling is far too well researched a field to allow such a feature into the code.

I suspect the entire work fetch scheduling part needs a rewrite from a blank slate, perhaps with a "request work from project for up to x cores" mechanism. This would also help with the overcommitment issue when GPU work takes a fraction of a core. I don't think bandaids on top of patches will work this time.
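
(A side note on the fraction-of-a-core case: that much can at least be declared to the client today through a per-project app_config.xml. A rough sketch follows; the app name is only a placeholder for whatever the project calls it in client_state.xml.)

<app_config>
 <app>
   <name>einstein_gpu_app</name>        <!-- placeholder; take the real name from client_state.xml -->
   <gpu_versions>
     <gpu_usage>1.0</gpu_usage>         <!-- one task per GPU -->
     <cpu_usage>0.2</cpu_usage>         <!-- budget a fifth of a CPU core per GPU task -->
   </gpu_versions>
 </app>
</app_config>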
ID: 53526
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 53527 - Posted: 5 Apr 2014, 23:29:31 UTC - in response to Message 53526.  

Obviously it "knew" it had enough work even though it could have squeezed in several single core tasks while it waited to run the collatz tasks. In this case the multi-core tasks blocked progress that could have been made on other tasks. Should never happen! Scheduling is far too well researched to allow such a feature into the code.

I suspect the entire work fetch scheduling part needs a rewrite from a blank slate. Perhaps a request work from project of up to x cores. This would also help with the overcommitment issue when GPU work takes a fraction of a core. I don't think bandaids on top of parches will work this time.

I am glad you asked. This has been discussed before, by people who know a lot more about it than I do, except that I know when things don't work.
http://boinc.berkeley.edu/dev/forum_thread.php?id=8511
http://boinc.berkeley.edu/dev/forum_thread.php?id=8579

It would seem to the uninitiated that, with the relatively large number of CPU cores available these days, you could just assign a given project to each core, with one or more backup projects in case the primary is out of work. Then you could ditch the whole scheduling scheme and everyone, researchers and users alike, would be more productive.
ID: 53527
Gary Charpentier
Joined: 23 Feb 08
Posts: 2516
United States
Message 53529 - Posted: 6 Apr 2014, 2:32:13 UTC - in response to Message 53527.  

Thank you, but I believe this is a different feature from the one addressed in those threads.

Here BOINC saw that it had enough total work in the queue, namely the 8-core Collatz jobs, and calculated that no more was required. However, its calculation did not take into account that cores were free because the Test4Theory job was in EDF mode and the Collatz jobs could not start until it had finished. Thus a resource was being wasted.

Scheduling algorithms are far too well understood for this to be an issue. I suspect the scheduler was originally written on the assumption that all jobs are single-core, and a hack was added to allow for multi-core jobs. The hack needs to come out and a solid, well-researched algorithm should replace it.

I know the scheduler is the most complex part of BOINC, and I am sure fixing it will require changes to both the client and the server. However, as more and more projects use parallelism, this will become a major issue if it is left in as a "feature".
ID: 53529
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 53531 - Posted: 6 Apr 2014, 19:09:15 UTC - in response to Message 53529.  
Last modified: 6 Apr 2014, 19:11:39 UTC

Thank you, but I believe this is a different feature from the one addressed in those threads.

Here BOINC saw that it had enough total work in the queue, namely the 8-core Collatz jobs, and calculated that no more was required. However, its calculation did not take into account that cores were free because the Test4Theory job was in EDF mode and the Collatz jobs could not start until it had finished. Thus a resource was being wasted.

I probably did not understand your particular issue well enough, but as far as I am concerned they are all variations on the same theme. Another version of it is that when you have a cc_config.xml file to assign work from different projects to different GPUs, the scheduling does not take that into account properly, and you end up with a mess. I am facing that now, running GPUGrid and Einstein and trying to keep them both fed, but not overfed, so as to qualify for the bonus points on GPUGrid. If I had a separate scheduler for each card, I could adjust it properly. But the BOINC scheduler wants to do it all for you. I think it is a result of trying to be "fair" with each project, allowing you to assign 49.873% of your time to one and 23.874% to another, while who-knows-what takes the rest. It is all over-engineered, so that nothing works right.
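
(For the record, the per-GPU assignment I am talking about is done with <exclude_gpu> entries in cc_config.xml, roughly as below; the URLs and device numbers here are only an illustration of the idea.)

<cc_config>
 <options>
   <exclude_gpu>
     <url>http://www.gpugrid.net/</url>        <!-- keep GPUGrid off device 1 -->
     <device_num>1</device_num>
   </exclude_gpu>
   <exclude_gpu>
     <url>http://einstein.phys.uwm.edu/</url>  <!-- keep Einstein off device 0 -->
     <device_num>0</device_num>
   </exclude_gpu>
 </options>
</cc_config>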

The only solution I have found is to keep it all very simple and assign only one CPU project and one GPU project per machine, for example, which is probably the opposite of what they were trying to achieve with all of the options in BOINC.
ID: 53531
Jim1348
Joined: 8 Nov 10
Posts: 310
United States
Message 53533 - Posted: 7 Apr 2014, 0:18:40 UTC - in response to Message 53532.  
Last modified: 7 Apr 2014, 0:19:55 UTC

The only solution I have found is to keep it all very simple and assign only one CPU project and one GPU project per machine, for example, which is probably the opposite of what they were trying to achieve with all of the options in BOINC.


OK, this may not be the best, but you could fire up more than one BOINC client and attach a mix of projects to each client, using app_config.xml and various settings in cc_config.xml.

Thanks, but I have looked into starting up more than one BOINC client, and the cure seems worse than the disease. For me it was not that big a deal once I recognized the problem, since I can just do WCG on the CPU and let them take care of the scheduling for me. But I have come to grief when I attached to other CPU projects (e.g., ClimatePrediction.net) that have work unit times widely different from anything else that is running. BOINC just does not handle that the way I want it to, but the way it wants to (frequently leaving some projects without work, unfortunately). And I have managed to reach a rough compromise juggling GPU projects (GPUGrid and Einstein) on the same machine, though I may end up putting them on separate machines eventually.
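
(To give an idea of what the multi-client route involves: each extra instance needs its own data directory and its own cc_config.xml, and you would typically cap the cores it sees with something like the sketch below; the value 4 is just an example.)

<cc_config>
 <options>
   <ncpus>4</ncpus>   <!-- this instance schedules only 4 cores, leaving the rest to the other client -->
 </options>
</cc_config>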

It should not be that much of a problem, but it seems that BOINC just tries to do too much and takes too much control out of your hands. They are trying to make it easy on me; that is the bane of my life.
ID: 53533
Gary Charpentier
Joined: 23 Feb 08
Posts: 2516
United States
Message 53535 - Posted: 7 Apr 2014, 5:29:35 UTC - in response to Message 53533.  

Scheduling is a black art. If it looks like a human designed it, then you can be assured it is wasteful. If it looks totally insane, then it likely is close to an optimum solution.

Take an example everyone can relate to: the time quantum, say an instruction to switch tasks every hour. Task A has another 2 minutes to go when that hour is up. A human would let it finish; a computer will switch to another task. Is that right? As task A runs, its project's run priority is dropping compared to all the other projects, so at the switch time another project's task now has the higher priority. A human would let task A finish and then switch, an emotional choice. But the human can actually screw up the works by doing that: suppose that, by not switching, the now-higher-priority project doesn't get enough run time before its deadline? Oops.
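
(That hourly quantum is the "switch between tasks" computing preference; for reference, in a local global_prefs_override.xml it looks something like this, 60 minutes being just the usual default.)

<global_preferences>
 <cpu_scheduling_period_minutes>60</cpu_scheduling_period_minutes>   <!-- the time quantum discussed above -->
</global_preferences>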

BOINC does a good job on the CPU as long as all the projects are single-core projects. It seems to have issues when there are multi-core jobs for the CPU, and those issues seem to arise when there is a mix of jobs needing differing numbers of cores; BOINC doesn't seem able to calculate that. Of course there is the degenerate case where, say, only 3-core jobs are available on an 8-core machine, so two cores are left idle no matter what. However, when attached to at least one project with single-core jobs, it should never leave a core idle.

I suspect the client and server need to exchange more information during work fetch, such as the number of cores available, the available run time, and deadlines, so BOINC can calculate an optimal flow and fetch the right tasks.
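
(Purely as a thought experiment, and not the actual sched_request format, such a richer work request might spell out idle capacity per resource. Every tag below is invented for illustration only.)

<work_request>                                  <!-- hypothetical sketch, not the real protocol -->
 <idle_cpu_cores>5</idle_cpu_cores>             <!-- cores with nothing runnable right now -->
 <reserved_cpu_cores>3</reserved_cpu_cores>     <!-- cores already spoken for by queued multi-core jobs -->
 <seconds_until_reserved_cores_free>3600</seconds_until_reserved_cores_free>
 <req_seconds>86400</req_seconds>               <!-- total CPU seconds of work wanted -->
</work_request>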
ID: 53535
Plaque FCC
Joined: 5 May 14
Posts: 1
Message 53964 - Posted: 5 May 2014, 22:00:22 UTC

Hi!

I run projects with different priorities and completely different task durations, but all of them run single-threaded. Fetching of shorter tasks from WCG stopped once there were 3 tasks (on a 4-core CPU + 1 GPU system) from the high-priority Collatz project.

I tried adding this to the cc_config.xml file:
<cc_config>
 <options>
   <rec_half_life_days>20</rec_half_life_days>   <!-- the default half-life is 10 days -->
 </options>
</cc_config>


and it seems to have helped, given that by default the scheduler uses a 10-day half-life, which the Collatz tasks rather exceed.

What do you think: am I doing something wrong here, or is this an adequate solution for this kind of case?
ID: 53964
