Thread 'BOINC scheduler explanation'

KSMarksPsych Joined: 30 Oct 05 Posts: 1239	Message 8867 - Posted: 19 Mar 2007, 3:10:42 UTC

Why does it seem that 5.8.x fetches less work than previous versions of BOINC? Read on for the explanation from John McLeod VII...

OK, the primary goal is to not report work late. The goal of keeping the CPU busy is of much lower priority. Therefore, the CPU scheduler is working correctly as designed.

Work has to be finished by the last connection before it is actually due. That connection might be as long as "connect every X" before the report deadline. The scheduler also has to ensure that the task is not actually processing at the time of the connection, hence the slop factors of one day and the project switch interval. The round-robin (RR) simulator also takes into account the estimated processing time remaining on all tasks.

It is only projects that have a task that is in deadline trouble that are barred from downloading work. If you are attached to several projects and there are some projects that are not in deadline trouble, those projects will fetch work until the queue is filled or they have a task in deadline trouble.

As was stated earlier, the computation deadline for all work is:

Report deadline - (connect every + project switch time + 1 day)

If you are really disconnected most of the time, this is required in order to ensure that the report is made on time. If you are really disconnected for most of the time, projects with short deadlines may not be appropriate. If you are really connected all of the time, then a short queue is probably sufficient.

3 days of work + 3 days of disconnected + 1 day + 1 hour > 7 days. You ought to be able to use a 3 day queue for a project with an 8 day deadline, but not a 4 day queue (4 days of work + 4 days of disconnected + 1 day + 1 hour > 9 days).

Work fetch and the CPU scheduler have to assume that the worst case can happen, i.e. that the host will be disconnected for the entire "connect every X days" period immediately prior to the report deadline, so the work has to be completed before that.

The notice that a project has N deadline misses is an indication that the host is entering EDF mode for N CPUs for that project. If you get the same notice for more than one project, then sum all of the numbers to get the number of CPUs that are being dedicated to EDF.

Kathryn :o)
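The computation-deadline rule quoted above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in the post, not the BOINC client's code: the function names are made up here, and the 1 hour project switch time is assumed from the "+ 1 hour" in the worked example.

```python
from datetime import datetime, timedelta

ONE_DAY_SLOP = timedelta(days=1)  # the fixed one-day slop factor from the post


def computation_deadline(report_deadline, connect_every, switch_interval):
    """Report deadline - (connect every + project switch time + 1 day)."""
    return report_deadline - (connect_every + switch_interval + ONE_DAY_SLOP)


def queue_fits(queue, connect_every, switch_interval, time_to_deadline):
    """True if a full queue can still be finished in the worst case.

    Worst case (as described in the post): the whole queue must be worked
    off, then the host is offline for a full connect interval before the
    report deadline.
    """
    worst_case = queue + connect_every + switch_interval + ONE_DAY_SLOP
    return worst_case <= time_to_deadline


hour = timedelta(hours=1)       # assumed project switch time
eight_days = timedelta(days=8)  # project deadline from the example

# 3 days + 3 days + 1 day + 1 hour = 7 days 1 hour, which fits in 8 days.
print(queue_fits(timedelta(days=3), timedelta(days=3), hour, eight_days))  # True
# 4 days + 4 days + 1 day + 1 hour = 9 days 1 hour, which does not.
print(queue_fits(timedelta(days=4), timedelta(days=4), hour, eight_days))  # False
```

The queue length and the connect interval are passed separately here, although in the example both take the same value.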

retsof Joined: 19 Mar 07 Posts: 7	Message 8895 - Posted: 19 Mar 2007, 17:25:42 UTC Last modified: 19 Mar 2007, 17:26:59 UTC

[quote]It is only projects that have a task that is in deadline trouble that are barred from downloading work.[/quote]

NOT SO. A computer here is only running one project, and is on the last workunit, not due until 3/23. It should finish in about 3 hours. I have to babysit the computer to make sure it is connected when the queue empties, so it will download more workunits and not sit around doing nothing.

KSMarksPsych Joined: 30 Oct 05 Posts: 1239	Message 8896 - Posted: 19 Mar 2007, 17:50:56 UTC

ly quoting what John has said....

There is an option that you can set in the cc_config.xml file... You want

[pre]<options>
    <work_request_factor>n</work_request_factor>
</options>[/pre]

where n is from 1 to 10.

The page I linked to says it only works with 5.10+ but I know of at least one person who's using it with a 5.8.x client. I've got an email out to Rom to clarify what version of the core client is needed.

Kathryn :o)

retsof Joined: 19 Mar 07 Posts: 7	Message 8902 - Posted: 19 Mar 2007, 21:12:18 UTC Last modified: 19 Mar 2007, 21:17:24 UTC

[quote]OK, the primary goal is to not report work late.[/quote]

Let's look at that, also. Downloading all of the workunits at once can put the last workunit in jeopardy, since they all have the same due date.

With 5.4.X, a few were downloaded at a time, and the due date also showed a progression. By the time the same number of workunits was downloaded, some had already run and one or several days had passed. The nth workunit downloaded had a later due date. On average, every downloaded workunit had nearly the full time range in which to finish, rather than a range that shrinks as earlier workunits run and requires increased scrutiny until they have all run. Chances of blowing the due date were slim unless there were other factors like power outages, etc. There was still more chance to complete all workunits, since the built-in buffer time remained about the same.

Now, the extra time ranges from a lot to not much, causing the new need for an advanced "keep things from being late" algorithm.
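The difference retsof describes can be illustrated with a toy model. This is a sketch under simplifying assumptions that are not from the thread: one CPU, tasks run back to back, every task takes the same time, and each task's deadline is a fixed number of days after it is downloaded. The function names are hypothetical.

```python
def slack_batch(n, run_days, deadline_days):
    """Slack at completion when all n tasks are downloaded at day 0.

    Every task shares the same due date, so each later task finishes
    closer to it than the one before.
    """
    return [deadline_days - (k + 1) * run_days for k in range(n)]


def slack_staggered(n, run_days, deadline_days):
    """Slack at completion when each task is downloaded just as it starts.

    Each task gets its own due date, so the slack stays constant.
    """
    return [deadline_days - run_days for _ in range(n)]


# Four tasks of 1.5 days each against a 7 day deadline.
print(slack_batch(4, 1.5, 7.0))      # [5.5, 4.0, 2.5, 1.0]
print(slack_staggered(4, 1.5, 7.0))  # [5.5, 5.5, 5.5, 5.5]
```

In the batch case the last task has only one day of slack left, which is the "descending range" the post refers to.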

Nicolas Joined: 19 Jan 07 Posts: 1179	Message 8919 - Posted: 20 Mar 2007, 14:37:24 UTC - in response to Message 8902.

[quote]OK, the primary goal is to not report work late. Let's look at that, also. Downloading all of the workunits at once can put the last workunit in jeopardy, since they all have the same due date. With 5.4.X, a few were downloaded at a time, and the due date also showed a progression. By the time the same number of workunits was downloaded, some had already run and one or several days had passed. The nth workunit downloaded had a later due date. On the average, all downloaded workunits had nearly a complete range of time to finish, not a descending range as they run, requiring increased scrutiny until they all run. Chances of blowing the due date were slim unless there were other factors like power outages, etc. There was still more chance to complete all workunits, since built in buffer time remained about the same. Now, the extra time ranges from a lot to not much, causing the new need for an advanced "keep things from being late" algorithm.[/quote]

The due date has nothing to do with the client, it's set by the project on the server.

KSMarksPsych Joined: 30 Oct 05 Posts: 1239	Message 8932 - Posted: 20 Mar 2007, 20:18:56 UTC - in response to Message 8903.

[quote][quote].... The page I linked to says it only works with 5.10+ but I know of at least one person who's using it with a 5.8.x client. I've got an email out to Rom to clarify what version of the core client is needed.[/quote]

Make it 2 you 'know' and it works on 5.8.15/16 and it can use fractions unless it's rounding it to whole numbers when reading the value. Using BOINCview 1.4.2, there is now a full palette of all the options that can be set in the cc_config.xml. It suggests 6 decimals. If you hover over the various options it actually tells you from which BOINC version and point release the options are effective... a little-known other one is the checkpoint_debug option that logs the savepoints in the message tab.... a great addition.[/quote]

Well.... here's Rom's answer.

[quote]That feature hasn’t made it out into a build released by Berkeley yet. If he is using it, he probably built the client himself.

----- Rom[/quote]

So I dunno.

Kathryn :o)

KSMarksPsych Joined: 30 Oct 05 Posts: 1239	Message 8939 - Posted: 21 Mar 2007, 2:24:56 UTC

Yup. That's the flag we're talking about. I dunno... I just relay messages.

Kathryn :o)

retsof Joined: 19 Mar 07 Posts: 7	Message 8971 - Posted: 21 Mar 2007, 21:15:14 UTC - in response to Message 8919. Last modified: 21 Mar 2007, 21:17:55 UTC

[quote]The due date has nothing to do with the client, it's set by the project on the server.[/quote]

Well, yes. I know. Things with this scheduler are so different. I was running a project, and only one project, with a due date of 7 days (set by the project.) 5.5 days had a lot in the queue, but allowed me to go away or ignore it for a few days. The 5.8 scheduler had the empty-queue problem.

This one seems to be counterintuitive, but may need some tweaking. I changed the network value to 1.0 days. One computer only got 6 hours of work and it stayed that way for a couple of days...no more than 6 hours of work. I changed the network value to 2.5 days and will go with that for a while. I do seem to get another workunit now and then. That's what I'm trying to get to...enough work in the queue so that I don't have to connect at specific times to keep 4 computers busy. At least three computers have 28 hours of estimated work. It's not 2.5 days, but it's starting to be a reasonable amount.

When mixing projects, they seem to work better when the due dates are about the same. One-week projects just sit there if some fast-running due-in-a-day projects come in. The climate simulation projects really can't be mixed with anything, since the due date can be over a year out.

Keck_Komputers Joined: 29 Aug 05 Posts: 304	Message 8974 - Posted: 21 Mar 2007, 22:40:44 UTC

Work Buffer in the Wiki. With a 7 day deadline the max queue size is 2.82 days. If you set it any higher you will run into deadline problems and normally have less work on the host waiting to run.

The mixture of projects does not really make much difference, however you will need to figure based on the shortest deadline. BOINC will take the percentages into account and maintain approximately your desired amount of work on hand in total. Thank god it is no longer a full queue per project, I can actually set a desired queue now instead of having to set it to nothing.

One thing that can cause confusion is if your computer is not on all the time or there are high CPU usage programs running on the host. BOINC does take this into account when figuring your queue. If your host is not available more than 50% of the time it will look like it is only getting half of the work you set.

BOINC WIKI
BOINCing since 2002/12/8
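The last point can be pictured with a rough sketch. The simple proportional scaling below is an assumption made for illustration, not BOINC's documented formula, and the function name and the `on_fraction` parameter are hypothetical.

```python
def apparent_queue_days(queue_days, on_fraction):
    """Estimate how much work a host appears to hold.

    Assumes (for illustration only) that the queued work scales linearly
    with the fraction of time the host is available to BOINC.
    """
    if not 0.0 <= on_fraction <= 1.0:
        raise ValueError("on_fraction must be between 0 and 1")
    return queue_days * on_fraction


# A 2.5 day setting on a host that is available half of the time.
print(apparent_queue_days(2.5, 0.5))  # 1.25
```

Under this assumption, a host that is on half the time would show roughly half of the configured queue, matching the behavior described in the post.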

Uioped1 Joined: 2 Mar 06 Posts: 12	Message 12696 - Posted: 22 Sep 2007, 19:05:11 UTC

Unfortunately, the behavior I am seeing on my multi-core system is not what is described above. I believe that that is what is intended, but I think there is something wrong. The symptoms I see are described in this post: http://boinc.berkeley.edu/dev/forum_thread.php?id=2140

In rereading my original post, I realize that some of the factors I list as being potential causes cannot be. For example, both my single-core system (not always on) and my dual-core system (always on) are running CPDN and LHC, but my single-core system fetches work just fine.

Currently my dual-core system has the following tasks, both of them running: a CPDN workunit with a deadline of 2/3/08 and 800 hours 'to completion'; and a Rosetta workunit with a deadline of 10/2/07 and 3 hours remaining.

In this case, I have significantly less than two days' work on my second CPU, and according to the calculation above (3 hours work + 1 day disconnected + 2 hours project switch time + 1 day = 2 days 5 hours, or less than the shortest deadline of any project I connect to, 7 days) I should be requesting work, yet no work requests will occur until the RAH WU is finished.

As you may know, the length of RAH workunits is very stable. Also, the CPDN model will have no trouble completing in the allotted time.

Uioped1 Joined: 2 Mar 06 Posts: 12	Message 12737 - Posted: 25 Sep 2007, 16:57:58 UTC - in response to Message 12696.

Please disregard my previous post, I have isolated the issue to another problem unrelated to the scheduling algorithm. http://boinc.berkeley.edu/dev/forum_thread.php?id=2154

Copyright © 2025 University of California.
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.