World Community Grid - View Thread - BOINC: Deadlines for Standard & Rush Tasks and What Clients Get These

World Community Grid requires work sent to volunteered computers to be returned within a set number of days. The reason is to ensure that overall project batches do not get delayed. At the same time, this allows devices that are on only a few hours a day to participate. E.g. home computers can process a project in the background whilst doing email, web browsing and other housekeeping chores. The principle is that every cycle counts and a work unit eventually does get completed. Frequent checkpoint saving lets these jobs resume very near to where they were shut down the previous time.

The deadlines for work generated, updated as of Sep. 06, 2013 are:

3 days
- Pending Verification Tasks, those sent out to check if a single distribution result is correct

4 days
- Beta tasks (repairs of Beta tasks are allowed 1.6 days as standard)
- Repairs replacing tasks that got no reply, failed, were aborted, or were lost in device detachments, and that originally had a 10 day deadline. ++
- Second No Reply result substitutions that exceeded the original deadlines below

10 days, including first No Reply replacements for:
- Say NO to Schistosoma (SN2S)
- Computing for Clean Water (C4CW)
- The Clean Energy Project, Phase 2 (CEP2)
- Help Fight Childhood Cancer (HFCC)
- FightAIDS@Home (FAAH)

When the feeders get clogged [due to backlog of rush jobs not finding a wingman in their own homogeneity class], one of the ways to clear it is to drop the 'reliability' feature temporarily so that the workunits get assigned faster. In that case the repair units are given the original deadline period as per above.

Rush jobs, issued to make up for tasks missing ** deadlines, usually get a fraction of the original project deadline, 1-5 days, which the server computes for each device (host). They are generally sent to those computers known to WCG for quick, valid returns and a high current uptime history ***. The Rush deadline algorithm is:

1. Take the original deadline of the project in days and multiply by 0.4. E.g. if the original deadline was 10 days, then the rush job deadline would be 4 days, plus a small host specific allowance. Batch test tasks to find out how long new sets will run also get the 4 day deadline. They can be recognized by a date/time prefix such as this sample 20091203144517_ faah9423_ ZINC12021275_ xJ1_ xtal_ 01_ 0--
2. Compute the estimated turnaround time for the job on the client. If this time is more than the value in #1, then use this value, up to a maximum of twice #1.
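
The two steps of the Rush deadline rule can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not the actual server code: the function and parameter names are made up for the example, and the "small host specific allowance" is left out because its size is not stated.

```python
def rush_deadline_days(original_deadline_days, estimated_turnaround_days,
                       rush_factor=0.4):
    """Illustrative sketch of the Rush deadline rule (names are hypothetical).

    Step 1: base value = original deadline * 0.4.
    Step 2: if the host's estimated turnaround exceeds the base value,
            use the estimate instead, capped at twice the base value.
    The host specific allowance mentioned in the text is NOT modelled.
    """
    base = original_deadline_days * rush_factor
    if estimated_turnaround_days > base:
        return min(estimated_turnaround_days, 2 * base)
    return base


# A fast host on a 10 day project gets the plain 40% deadline (about 4 days).
fast_host = rush_deadline_days(10, estimated_turnaround_days=1)
# A slower host gets its own estimate, here 6 days.
slow_host = rush_deadline_days(10, estimated_turnaround_days=6)
# A very slow host is capped at twice the 40% value (about 8 days).
very_slow_host = rush_deadline_days(10, estimated_turnaround_days=12)
```

With a 10 day original deadline, the result therefore always falls between 4 and 8 days before any allowance is added.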

Beta jobs have varying deadlines depending on criteria like pre-launch testing or bug resolution. Again they have short return time requirements.

Tasks that were lost by the client and are resent to the same client get a new sent time equal to the date/time of resending. Their deadlines are then recalculated as a function of past device return history and original deadline. In most cases this works out to be the original deadline, sometimes later, sometimes earlier.

The maximum total work queue a BOINC client allows is 10 days. It can be set either in My Grid > Device Manager > Device Profile > Profile of Choice > Custom Options > Cache, or, in versions 5.10 and higher of the BOINC client, using the Advanced menu > Preferences > Network Usage > "Additional Buffer Days" option.

BOINC will automatically manage the order of buffered work through its scheduler function so that all tasks can meet their individually assigned deadlines. Rush jobs ++ will show messages like "Running - High Priority", and others like "Waiting to Run" if paused or preempted.

It is not advisable to exceed 7 days of work buffer [recommended maximum: 4] if e.g. running the 10 day deadline projects, alone or in a mix, simply because of potential outages, particularly if work is only returned periodically. Take computer down time into consideration! Secondly, do not fill up the buffer with brand new projects, as they often suffer from teething problems and bugs in the new science software. These may only surface once full-size production starts, causing a client to process large volumes of work that might in fact be bad. WCG can remotely cancel that work when aware of a high failure rate, but only if the host is on-line and has a scheduled contact with the project servers.

For information on the number of copies circulated to arrive at proper quorum validation, visit the FAQ BOINC: Minimum Quorum &am... Work Units (Tasks)

** Late Tasks generally do not get credit granted unless completed and sent back before the Extra Copy is returned. If not in time, they get marked as "Too Late" on the Result Status page and get zero credit, except where WCG stopped the redistribution of additional quorum copies due to work unit problems. In that case credit will be granted manually: no need to alert staff!

*** Fast / Reliable clients are determined by the following:

The recent average of the length of time between when a result is assigned and the time that the result is reported as done by the client. This time must be less than 48 hours on average. This measurement includes the impact of queue size because a larger queue size will lengthen the 'turnaround time' for a task and push the computer out of the 'reliable' metric.
The host must have the maximum daily quota, which as at March 7, 2012 is 120 per day, per active thread. Errors/Invalids reduce this quota value.

No messages are presently sent to hosts to confirm their reliability state. Moving "In" and "Out" is automatic. No device status page reflects the rating.
Known/regularly contributing hosts must have had "last" 5 returned results validated "at science app level". A new science or an error/invalid requires renewed proofing for that science. ++.

NB: ++ For database efficiency reasons, deadlines for "Rush" jobs (Repair & Make up for bad jobs) use the 40% rule (40% of initial quorum deadline). A request is outstanding to give these particular "Rush" jobs the same deadline as the original, with a minimum of the 40% rule. These Rush jobs are only sent to "reliable" clients that are known to have a very high "valid" rate and usually return results within 48 hours from submission to a device. 40% was chosen because occasionally a very long running job goes out, and a shorter deadline could then not be met.

+++ Because the number of error results [those aborted, invalid, error] can cause a device to move in and out of the reliable class, the matrix below provides an easy lookup of how many valid results are needed to get or return into this group and how many errors are allowed before the device is moved out (the range within the braces indicating a maximum leeway of 12 errors). http://bit.ly/ch2ZYc (REDUNDANT since March 6, 2012, when WCG servers were upgraded to BOINC version 700.) Full host reliability requires a continuous series of:

A) Last 5 (five) checked results as valid. [Zero redundant sciences require >= 20 (twenty) consecutive validations before a host is allowed to run them alone.]
B) An average return time of less than or equal to 2 days (48 hours)
C) A maximum quota permission at time of any work assignment (120 per device core as at March 6, 2012)
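
Conditions A to C can be combined into a single check, sketched below in Python. This is a hedged illustration only: the class, field, and constant names are invented for the example, the thresholds are the ones quoted above, and the real server logic may differ in detail.

```python
from dataclasses import dataclass
from typing import Sequence

# Thresholds as quoted in the text above (names are hypothetical).
REQUIRED_VALID_STREAK = 5          # A) last 5 checked results valid
MAX_AVG_TURNAROUND_HOURS = 48.0    # B) average return time <= 2 days
MAX_DAILY_QUOTA_PER_CORE = 120     # C) maximum quota per device core


@dataclass
class HostStats:
    recent_results_valid: Sequence[bool]  # oldest first, newest last
    avg_turnaround_hours: float
    daily_quota_per_core: int


def is_reliable(host: HostStats) -> bool:
    """Return True only if all three conditions A, B and C hold."""
    last = list(host.recent_results_valid)[-REQUIRED_VALID_STREAK:]
    has_valid_streak = len(last) == REQUIRED_VALID_STREAK and all(last)
    is_fast = host.avg_turnaround_hours <= MAX_AVG_TURNAROUND_HOURS
    has_full_quota = host.daily_quota_per_core >= MAX_DAILY_QUOTA_PER_CORE
    return has_valid_streak and is_fast and has_full_quota


# One invalid result among the last five is enough to drop out of the class.
good_host = HostStats([True] * 6, avg_turnaround_hours=30.0, daily_quota_per_core=120)
bad_host = HostStats([True, True, False, True, True], 30.0, 120)
```

Because a larger work buffer lengthens the average turnaround, a host with otherwise perfect results can still fail condition B.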

Results that require a second result copy for verification go to any host.

++ As of March 2013, user aborted tasks that have no computing time on them are also reissued to regular clients that are not fully repair rated. This is because, after large abort events, the priority of these 'repair' jobs in the feeder shared memory led to work temporarily going only to 'reliable' clients, while 'regular' clients got messages that there was temporarily no work.
