Message boards : BOINC Manager : BOINC scheduler explanation
Author | Message |
---|---|
Joined: 30 Oct 05 Posts: 1239 |
Why does it seem that 5.8.x fetches less work than previous versions of BOINC? Read on for the explanation from John McLeod VII... source OK, the primary goal is to not report work late. The goal of keeping the CPU busy is of much lower priority. Therefore, the CPU scheduler is working correctly as designed. Work has to be finished before the last connection that precedes its report deadline. That connection might be as much as "connect every X" days before the report deadline. The scheduler also has to ensure that the task is not actually processing at the time of the connection - hence the slop factors of one day and the project switch interval. The RR (round-robin) simulator also takes into account the estimated processing time remaining on all tasks. It is only projects that have a task that is in deadline trouble that are barred from downloading work. If you are attached to several projects and there are some projects that are not in deadline trouble, those projects will fetch work until the queue is filled or they have a task in deadline trouble. As was stated earlier, the computation deadline for all work is: Report deadline - (connect every + project switch time + 1 day). If you are really disconnected most of the time, this is required in order to ensure that the report is made on time. If you are really disconnected for most of the time, projects with short deadlines may not be appropriate. If you are really connected all of the time, then a short queue is probably sufficient. For example, 3 days of work + 3 days of disconnected + 1 day + 1 hour > 7 days, so a 3 day queue does not fit a 7 day deadline. You ought to be able to use a 3 day queue for a project with an 8 day deadline, but not a 4 day queue (4 days of work + 4 days of disconnected + 1 day + 1 hour > 9 days). Work fetch and the CPU scheduler have to assume that the worst case can happen - i.e. that the host will be disconnected for the entire "connect every X days" period immediately before the report deadline, so the work has to be completed before that gap begins. 
The notice that a project has N deadline misses is an indication that the host is entering EDF mode for N CPUs for that project. If you get the same notice for more than one project, then sum all of the numbers to get the number of CPUs that are being dedicated to EDF. Kathryn :o) |
Joined: 19 Mar 07 Posts: 7 |
"It is only projects that have a task that is in deadline trouble that are barred from downloading work." NOT SO. A computer here is only running one project, and is on the last workunit, not due until 3/23. It should finish in about 3 hours. I have to babysit the computer to make sure it is connected when the queue empties, so it will download more workunits and not sit around doing nothing. |
Joined: 30 Oct 05 Posts: 1239 |
I'm only quoting what John has said.... There is an option that you can set in the cc_config.xml file... You want <options> <work_request_factor>n</work_request_factor> </options> where n is from 1 to 10. The page I linked to says it only works with 5.10+ but I know of at least one person who's using it with a 5.8.x client. I've got an email out to Rom to clarify what version of the core client is needed. Kathryn :o) |
Joined: 19 Mar 07 Posts: 7 |
"OK, the primary goal is to not report work late." Let's look at that, also. Downloading all of the workunits at once can put the last workunit in jeopardy, since they all have the same due date. With 5.4.X, a few were downloaded at a time, and the due date also showed a progression. By the time the same number of workunits was downloaded, some had already run and one or several days had passed. The nth workunit downloaded had a later due date. On average, each downloaded workunit had nearly its full deadline window in which to finish, rather than a window that shrinks as the earlier ones run and needs increasing scrutiny until they are all done. Chances of blowing the due date were slim unless there were other factors like power outages, etc. There was still more chance to complete all workunits, since the built-in buffer time remained about the same. Now, the extra time ranges from a lot to not much, causing the new need for an advanced "keep things from being late" algorithm. |
Joined: 19 Jan 07 Posts: 1179 |
"OK, the primary goal is to not report work late." "Let's look at that, also. Downloading all of the workunits at once can put the last workunit in jeopardy, since they all have the same due date. With 5.4.X, a few were downloaded at a time, and the due date also showed a progression. By the time the same number of workunits was downloaded, some had already run and one or several days had passed. The nth workunit downloaded had a later due date." The due date has nothing to do with the client; it's set by the project on the server. |
Joined: 30 Oct 05 Posts: 1239 |
".... The page I linked to says it only works with 5.10+ but I know of at least one person who's using it with a 5.8.x client." Well.... here's Rom's answer: "That feature hasn’t made it out into a build released by Berkeley yet. If he is using it, he probably built the client himself." So I dunno. Kathryn :o) |
Joined: 30 Oct 05 Posts: 1239 |
Yup. That's the flag we're talking about. I dunno... I just relay messages. Kathryn :o) |
Joined: 19 Mar 07 Posts: 7 |
"The due date has nothing to do with the client, it's set by the project on the server." Well, yes. I know. Things with this scheduler are so much different. I was running a project, and only one project, with a due date of 7 days (set by the project). 5.5 days had a lot in the queue, but allowed me to go away or ignore it for a few days. The 5.8 scheduler had the empty-the-queue problem. This one seems to be counterintuitive, but may need some tweaking. I changed the network value to 1.0 days. One computer only got 6 hours of work and it stayed that way for a couple of days... no more than 6 hours of work. I changed the network value to 2.5 days and will go with that for a while. I do seem to get another workunit now and then. That's what I'm trying to get to... enough work in the queue so that I don't have to connect at specific times to keep 4 computers busy. At least three computers have 28 hours of estimated work. It's not 2.5 days, but it's starting to be a reasonable amount. When mixing projects, they seem to work better when the due dates are about the same. One-week projects just sit there if some fast-running due-in-a-day projects come in. The climate simulation projects really can't be mixed with anything, since the due date can be over a year out. |
Joined: 29 Aug 05 Posts: 304 |
Work Buffer in the Wiki. With a 7 day deadline the max queue size is 2.82 days. If you set it any higher you will run into deadline problems and normally have less work on the host waiting to run. The mixture of projects does not really make much difference; however, you will need to base your figure on the shortest deadline. BOINC will take the percentages into account and maintain approximately your desired amount of work on hand in total. Thank god it is no longer a full queue per project, I can actually set a desired queue now instead of having to set it to nothing. One thing that can cause confusion is if your computer is not on all the time or there are high CPU usage programs running on the host. BOINC does take this into account when figuring your queue. If your host is not available more than 50% of the time it will look like it is only getting half of the work you set. BOINC WIKI BOINCing since 2002/12/8 |
Joined: 2 Mar 06 Posts: 12 |
Unfortunately, the behavior I am seeing on my multi-core system is not what is described above. I believe that that is what is intended, but I think there is something wrong. The symptoms I see are described in this post: http://boinc.berkeley.edu/dev/forum_thread.php?id=2140 In rereading my original post, I realize that some of the factors I list as being potential causes cannot be. For example, both my single-core system (not always on) and my dual-core system (always on) are running CPDN and LHC, but my single-core system fetches work just fine. Currently my dual-core system has the following tasks, both of them running: a CPDN workunit with a deadline of 2/3/08 and 800 hours 'to completion'; and a Rosetta workunit with a deadline of 10/2/07 and 3 hours remaining. In this case, I have significantly less than two days' work on my second CPU, and according to the calculation above (3 hours work + 1 day disconnected + 2 hours project switch time + 1 day = 2 days 5 hours, which is less than the shortest deadline of any project I connect to, 7 days) I should be requesting work, yet no work requests will occur until the RAH WU is finished. As you may know, the length of RAH workunits is very stable. Also, the CPDN model will have no trouble completing in the allotted time. |
Joined: 2 Mar 06 Posts: 12 |
Please disregard my previous post, I have isolated the issue to another problem unrelated to the scheduling algorithm. http://boinc.berkeley.edu/dev/forum_thread.php?id=2154 |
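
For anyone who wants to check the arithmetic in the first post, here is a minimal Python sketch of the quoted rule "Report deadline - (connect every + project switch time + 1 day)". It is not the BOINC client's code. The function names `computation_deadline_margin` and `queue_fits` are made up for illustration, the 1 hour default switch interval is taken from the examples in the thread, and treating the work queue as the same length as the "connect every" setting is an assumption based on how the quoted examples add things up. Whether the real client uses a strict or non-strict comparison is not stated in the thread; `<=` is a guess.

```python
from datetime import timedelta

# Fixed one-day slop factor mentioned in the quoted explanation.
ONE_DAY_SLOP = timedelta(days=1)


def computation_deadline_margin(connect_every, switch_interval=timedelta(hours=1)):
    """Time subtracted from the report deadline (hypothetical helper).

    Mirrors the quoted rule: connect every + project switch time + 1 day.
    """
    return connect_every + switch_interval + ONE_DAY_SLOP


def queue_fits(queue, connect_every, report_deadline,
               switch_interval=timedelta(hours=1)):
    """Return True if `queue` worth of work can finish before the
    computation deadline, i.e. report deadline minus the margin.

    The `<=` comparison is an assumption, not taken from the client.
    """
    margin = computation_deadline_margin(connect_every, switch_interval)
    return queue + margin <= report_deadline


if __name__ == "__main__":
    # Queue length and "connect every" are set to the same value here,
    # as in the examples from the first post (an assumption).
    for queue_days, deadline_days in [(3, 7), (3, 8), (4, 8)]:
        fits = queue_fits(timedelta(days=queue_days),
                          timedelta(days=queue_days),
                          timedelta(days=deadline_days))
        print(f"{queue_days} day queue, {deadline_days} day deadline: fits={fits}")
```

Under those assumptions the sketch agrees with the thread: a 3 day queue fits an 8 day deadline but not a 7 day one, and a 4 day queue does not fit an 8 day deadline. The dual-core example also checks out: 3 hours of work plus a margin built from a 1 day connect interval and a 2 hour switch interval comes to 2 days 5 hours.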
Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License,
Version 1.2 or any later version published by the Free Software Foundation.