World Community Grid Forums
Thread Status: Active Total posts in this thread: 179
Psalm103
Cruncher Joined: Jan 6, 2007 Post Count: 24 Status: Offline
I realize that run-time projections can be very wrong. What was unusual here is that, after the reboot, the time between steps nearly doubled on all 6 beta WUs I had at the time. They reached 50% in about 5.5-6 hours, but then took around 17-19 hours each to complete.
Thanks, Ed
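The slowdown Ed describes can be checked with a little arithmetic: if 50% took 5.5-6 hours and the whole task took 17-19 hours, the second half took 11.5-13 hours. A minimal sketch, assuming the low ends and high ends of the two ranges belong together (the post does not pair them):

```python
def slowdown_ratio(half_time_h: float, total_time_h: float) -> float:
    """Second-half duration divided by first-half duration, for a task that
    reached 50% after half_time_h hours and finished after total_time_h hours."""
    second_half_h = total_time_h - half_time_h
    return second_half_h / half_time_h


# Low and high ends of the figures in the post, paired up (an assumption).
for half_h, total_h in [(5.5, 17.0), (6.0, 19.0)]:
    print(f"50% at {half_h} h, done at {total_h} h: ratio {slowdown_ratio(half_h, total_h):.2f}")
```

Both pairings give a ratio of about 2.1, consistent with the steps taking roughly twice as long after the reboot.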
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Quote:
Interesting ones In Progress:

Quote:
Yeah, interesting ones.
BETA_avx101118-031_r6_1_wcgfahb00060000_0
BETA_avx101118-075_r8_1_wcgfahb00070000_0
This numbering was expected to show up sooner or later. The parent tasks must be from the 1st batch, where the user either aborted the task or (more probably) exceeded the deadline when the tasks were at 60% or 70% completion. It would be nice to know, when a deadline is missed, whether the task will be server-aborted or run to 100%. I think Keith is the one to shed light on that.

Quote:
This is normal. There are a few instances where this happens. One is that the user returned a trickle message that was determined invalid. When that happens, the next work unit is generated from the last valid point of the work unit. This allows the, say, 3 million steps that are needed to proceed without having to restart a given work unit from the beginning, which cuts down on wasted CPU cycles.
Thanks,
-Uplinger

Quote:
That doesn't fully answer my question. What happens to the task on the client's machine?
- When a trickle is found to be invalid, will the rest of the client task be aborted by the server?

A few things happen when a result is determined invalid based on the trickle message and the intermediate upload (both are required to validate a 10k-step section):
1. The result will be marked as returned and ready for validation. This allows the validator to make sure that everything is in order and to create the next work unit based on the step it completed. If a machine has completed 40k valid steps, then the result is marked valid and the next result is sent starting at 40k. If a machine returns trickle messages and the first one is invalid, then the whole result is considered invalid and a fresh copy is sent out.
2. The user will be sent a trickle up message to "hard stop", which means: just stop what you're doing now and check in with the server. This gives the client a heads-up on its next scheduler request: "Hey, you only completed 40% good, we don't want you to continue." Some CPU time is lost here because it depends on the next scheduler request, but the same would be true of a server abort.
3. Once the result has been assimilated up to the x% valid, and it has been assimilated for a full day, the result will be purged from the database, which also tells the client to stop (server abort).

Quote:
- When the task is at, e.g., 60% when the deadline is reached, will the task be server-aborted with the remaining 40% still to do?

Quote:
Another interesting point: the trickle message is requested before the corresponding upload files are received by the server. Would it be better to upload the files successfully first, before sending/requesting a trickle message?
29 Aug 18:32:06 [trickle] read trickle file projects/www.worldcommunitygrid.org/trickle_up_BETA_avx101118-002_r0_1_wcgfahb00200000_0_1440865872.xml
29 Aug 18:32:06 Sending scheduler request: To send trickle-up message.
29 Aug 18:34:40 Started upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_3
29 Aug 18:34:40 Started upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_13
29 Aug 18:34:44 Finished upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_3
29 Aug 18:34:44 Finished upload of BETA_avx101118-002_r0_1_wcgfahb00200000_0_13

We have a check in the trickle message handler that checks again for X minutes. If an intermediate upload is not sent back within X minutes (currently set to 30), then the result is marked as ready for validation, as step 1 above says.

Hope this helps better answer your questions.
Thanks,
-Uplinger
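The rule in step 1 (only the leading run of validated 10k-step sections counts; an invalid first section means a fresh copy) can be sketched as below. This is a minimal illustration, not World Community Grid's validator: `Outcome`, `evaluate_result` and the `start_step` parameter are hypothetical names, and the "hard stop" message and database purge from steps 2 and 3 are not modeled.

```python
from dataclasses import dataclass
from typing import List

STEPS_PER_SECTION = 10_000  # each trickle + intermediate upload covers 10k steps (per the post)


@dataclass
class Outcome:
    valid_steps: int       # steps credited from this result
    next_start_step: int   # step at which the follow-up work unit starts
    fresh_copy: bool       # True if the whole result is discarded and resent


def evaluate_result(section_valid: List[bool], start_step: int = 0) -> Outcome:
    """section_valid[i] says whether the i-th 10k-step section validated.
    Only the leading run of valid sections counts; everything from the
    first invalid section onward is thrown away."""
    valid_sections = 0
    for ok in section_valid:
        if not ok:
            break
        valid_sections += 1

    if valid_sections == 0:
        # No valid section (first trickle invalid): send out a fresh copy.
        return Outcome(valid_steps=0, next_start_step=start_step, fresh_copy=True)

    valid_steps = valid_sections * STEPS_PER_SECTION
    return Outcome(valid_steps, start_step + valid_steps, fresh_copy=False)


# Four valid sections, then an invalid one: the next work unit starts at 40k.
print(evaluate_result([True, True, True, True, False]))
# Outcome(valid_steps=40000, next_start_step=40000, fresh_copy=False)
```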
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1297 Status: Offline
Interesting reading. It raises a question for me.

Quote:
1. The result will be marked as returned and ready for validation. This allows the validator to make sure that everything is in order and to create the next work unit based on the step it completed. If a machine has completed 40k valid steps, then the result is marked valid and the next result is sent starting at 40k. If a machine returns trickle messages and the first one is invalid, then the whole result is considered invalid and a fresh copy is sent out.
We have a check in the trickle message handler that checks again for X minutes. If an intermediate upload is not sent back within X minutes (currently set to 30), then the result is marked as ready for validation, as step 1 above says.

In the event a user decides to have networking disabled for longer than 30 minutes, what will the server do? Would it mark the job invalid?
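The 30-minute rule quoted above can be sketched as a small decision function. This is a hypothetical illustration of the rule as described, not the actual trickle handler: `handle_section` and its return strings are made-up names, and what the validator then does with the rest of the result is exactly what the question above asks.

```python
from datetime import datetime, timedelta
from typing import Optional

UPLOAD_WAIT = timedelta(minutes=30)  # "X minutes", currently 30 per the post


def handle_section(trickle_at: datetime, upload_at: Optional[datetime], now: datetime) -> str:
    """Hypothetical server-side decision for one 10k-step section.
    upload_at is when the intermediate upload arrived (None if not yet)."""
    deadline = trickle_at + UPLOAD_WAIT
    if upload_at is not None and upload_at <= deadline:
        return "validate section"  # trickle and upload both arrived in time
    if now < deadline:
        return "keep waiting"  # still inside the X-minute window
    return "mark result ready for validation"  # window expired: step 1 applies


# Times from the client log quoted earlier in the thread (year assumed 2015).
trickle = datetime(2015, 8, 29, 18, 32, 6)
uploaded = datetime(2015, 8, 29, 18, 34, 44)
print(handle_section(trickle, uploaded, now=uploaded))  # validate section
```

With networking off for more than 30 minutes after the trickle, the sketch falls through to the last branch, which matches the quoted description of step 1.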
Mathilde2006
Senior Cruncher Germany Joined: Sep 30, 2006 Post Count: 269 Status: Offline
Quote:
Does it surprise you?
BETA_avx101118-033_r18_1_1 - No Reply Sent Time 8/29/15 18:01:10 Due Time 8/29/15 18:01:10 0.00 0.0 / 0.0

I got two (no reply) betas on two clients. Both betas are still running, but one client was in the future, powered by a flux capacitor, and returned the result exactly at the return time:
BETA_avx101118-008_r12_1_1 - In Progress 29.08.15 18:00:30 31.08.15 03:36:29 3.24 38.8 / 0.0
wingman: BETA_avx101118-008_r12_1_0 - No Reply 25.08.15 18:00:23 29.08.15 18:00:23 32.82 310.8 / 0.0
The client is still crunching here: 13% at 4.5 hours. A 36-hour return time is very short. It could be too late on my 1.5 GHz AMD Turion.

Update: In the last seven hours my second beta WU has also claimed time, but it is still running without a wingman:
BETA_avx101118-072_r19_1_wcgfahb00100000_0 - In Progress 29.08.15 19:07:36 02.09.15 19:07:36 6.22 155.4 / 0.0
Crunching at 46% after 7 hours. Core 2 Quad 2.66 GHz
[Edited 4 times, last edit by Mathilde2006 at Aug 30, 2015 7:13:27 AM]
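Whether the Turion can make it is a matter of comparing the return window with a projected run time. A rough sketch, assuming progress advances at a constant rate (real tasks can speed up or slow down) and measuring the window from the sent time to the due time listed for BETA_avx101118-008_r12_1_1; the function names are illustrative only:

```python
from datetime import datetime

FMT = "%d.%m.%y %H:%M:%S"  # day.month.year format used in the result lines above


def window_hours(sent: str, due: str) -> float:
    """Hours between a result's sent time and its due time."""
    delta = datetime.strptime(due, FMT) - datetime.strptime(sent, FMT)
    return delta.total_seconds() / 3600


def projected_total_hours(elapsed_h: float, fraction_done: float) -> float:
    """Linear extrapolation: assumes progress advances at a constant rate."""
    return elapsed_h / fraction_done


window = window_hours("29.08.15 18:00:30", "31.08.15 03:36:29")
projection = projected_total_hours(4.5, 0.13)
print(f"window {window:.1f} h, projected run {projection:.1f} h")  # window 33.6 h, projected run 34.6 h
```

On these numbers the projected run slightly exceeds the window, which supports the worry that it could be too late. The second WU (46% after 7 hours, about 15.2 hours projected) has a 96-hour window and no such problem.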
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Quote:
Interesting reading. It raises a question for me.
1. The result will be marked as returned and ready for validation. This allows the validator to make sure that everything is in order and to create the next work unit based on the step it completed. If a machine has completed 40k valid steps, then the result is marked valid and the next result is sent starting at 40k. If a machine returns trickle messages and the first one is invalid, then the whole result is considered invalid and a fresh copy is sent out.
We have a check in the trickle message handler that checks again for X minutes. If an intermediate upload is not sent back within X minutes (currently set to 30), then the result is marked as ready for validation, as step 1 above says.
In the event a user decides to have networking disabled for longer than 30 minutes, what will the server do? Would it mark the job invalid?

It's a harder read, but my take is:
1) A trickle message is sent for the 10K-step upload.
2) The server waits at most 30 minutes for the upload to arrive.
3) No upload, the end. (Better get your BOINC version up to scratch.) ;)

This can very well happen for those on, e.g., a scheduled connection period. If that is set to, say, 08:00 to 18:00 and the trickle message is sent at 17:59:58, the client will cut off networking at 18:00 and the upload is in limbo. Perhaps WCG could ask Dr. Anderson to modify the scheduling, which is already designed to report immediately in the 30 minutes before cut-off time: if an upload/trickle cycle is in progress, hold off flipping the network switch until the upload is finished and none are queued.

A 10K block takes 2.1 to 5.5 hours on my computers, going by this test. Is that swept aside in the 'unfortunate' event? I can see cries of frustration coming if this is the correct interpretation. There's a rush, it appears, but if the deadline is 7-10 days for the whole work unit, I wonder. Also, we have regular work that takes less than an hour.
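The scheduling change Rob proposes (keep the network on past the cut-off until in-flight or queued trickle/upload transfers have drained) can be sketched as follows. Both function names are hypothetical; this is not BOINC client code, and BOINC's actual network-scheduling settings may behave differently.

```python
from datetime import datetime
from typing import List


def should_cut_off(now: datetime, window_end: datetime, transfers_pending: int) -> bool:
    """Proposed rule: after the scheduled end of the connection window, switch
    networking off only once no upload/trickle transfer is in progress or queued."""
    return now >= window_end and transfers_pending == 0


def effective_cut_off(window_end: datetime, finish_times: List[datetime]) -> datetime:
    """Time networking actually goes off under that rule: the scheduled end,
    or the finish of the last in-flight transfer, whichever is later."""
    return max([window_end, *finish_times])
```

The cost of the rule is a few extra minutes of network time after the window closes, while the benefit is that a trickle sent just before the cut-off is not left waiting for its upload.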
A trickle is 1 to 5.5 hours of computing; maybe some had even longer times. The actual mean times of the beta for the past 4 days, in hours: 13.7383, 19.1683, 17.5747, 19.6399, i.e. 1.37 to 1.96 hours between uploads [assuming all were 100K results].

Plan B: maybe just get rid of the trickle notion. Yes, it's fun playing with features, but this whole complexity may not be needed when we already have production tasks of under 1 hour. I just wonder: if instead of 20-22 computers being involved in the 2M steps there were 200-220, how close would the last step be compared to the whole 2M being computed on one computer? It'd be interesting to understand how QC is maintained on such a jumble of concatenated computer results; not my problem, but interesting nonetheless.
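The interval and machine-count figures follow from simple division. A sketch, assuming (as the post does) 100K steps per result and one trickle/upload per 10K-step block, for a 2M-step trajectory:

```python
STEPS_PER_RESULT = 100_000     # the post's assumption: all results were 100K steps
STEPS_PER_BLOCK = 10_000       # one trickle + upload per 10K-step block
TRAJECTORY_STEPS = 2_000_000   # "the 2M steps"

uploads_per_result = STEPS_PER_RESULT // STEPS_PER_BLOCK  # 10

# Daily mean run times of the beta (hours), as quoted in the post.
daily_mean_hours = [13.7383, 19.1683, 17.5747, 19.6399]
intervals = [h / uploads_per_result for h in daily_mean_hours]
print([round(x, 2) for x in intervals])  # [1.37, 1.92, 1.76, 1.96]

# How many pieces one 2M-step trajectory is cut into (before any resends).
results_per_trajectory = TRAJECTORY_STEPS // STEPS_PER_RESULT  # 20
blocks_per_trajectory = TRAJECTORY_STEPS // STEPS_PER_BLOCK    # 200
print(results_per_trajectory, blocks_per_trajectory)  # 20 200
```

The 20 and 200 are the lower ends of the "20-22" and "200-220" in the post.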
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1323 Status: Offline
Quote:
text text text Hope this helps better answer your questions.
Thanks,
-Uplinger

Keith, thanks for your enlightenment.
Speedy51
Veteran Cruncher New Zealand Joined: Nov 4, 2005 Post Count: 1297 Status: Offline
Thanks, Rob, for the explanation. It will certainly be interesting to see what happens when this phase (if that is the correct terminology) goes live.
vepaul
Senior Cruncher Belgium Joined: Nov 17, 2004 Post Count: 261 Status: Offline
One error, others OK
Beta Test

| Result Name | Device Name | Status | Sent Time | Due Time / Return Time | CPU Time (hours) | BOINC Credit Claimed / Granted |
|---|---|---|---|---|---|---|
| BETA_avx101118-004_r10_1_wcgfahb00400000_0 | Bureau2-HP | In Progress | 29/08/15 19:07:05 | 02/09/15 19:07:05 | 10.69 / 0.00 | 310.8 / 0.0 |
| BETA_avx101118-009_r15_1_wcgfahb100000_0 | Bureau2-HP | Valid | 27/08/15 10:37:45 | 28/08/15 13:18:28 | 20.76 / 22.70 | 455.0 / 455.0 |
| BETA_avx101118-036_r2_1_0 | paul-HP2 | Error | 25/08/15 18:03:18 | 29/08/15 18:03:24 | 0.00 / 0.00 | 484.1 / 0.0 |
| BETA_avx101118-074_r14_1_0 | Bureau2-HP | Valid | 25/08/15 17:59:33 | 26/08/15 14:07:43 | 13.17 / 16.48 | 457.1 / 457.1 |
| BETA_avx101118-005_r6_1af_3 | paul-HP2 | Valid | 04/08/15 05:42:28 | 04/08/15 19:57:36 | 11.81 / 11.88 | 353.2 / 443.9 |
| BETA_avx101118-005_r0_1ar_0 | paul-HP2 | Valid | 29/07/15 18:19:55 | 30/07/15 09:14:26 | 12.86 / 12.94 | 384.9 / 453.8 |
| BETA_avx101118-005_r1_1cs_1 | Bureau2-HP | Valid | 29/07/15 18:15:59 | 30/07/15 15:25:48 | 14.46 / 17.72 | 489.6 / 506.1 |

[Edited 1 time, last edit by vep at Aug 30, 2015 9:02:40 AM]
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7697 Status: Offline
I was curious about the figures for the latest beta, so I made a little spreadsheet just for this beta.
[Chart: spreadsheet of figures for this beta]
I have one completed at 27.18 hours on Linux with 99.92% efficiency on a Xeon E5410. I have two more in progress on other machines, one of them looking to take about 52 hours on an AMD 9150e at 1.8 GHz with Win 7. I hope they allow at least a 10-day window for completion once the project is up and running. Based on the chart, the bulk of the units have been completed by some pretty fast machines, but some units have been done by some pretty slow machines. If individual results were available, it would be possible to calculate the median value, which may be a better indicator of the machine speed distribution. The overall mean time is 18.39 hours.
Cheers
Sgt. Joe
----------------------------------------
*Minnesota Crunchers*
[Edited 1 time, last edit by Sgt.Joe at Aug 30, 2015 11:24:26 AM]
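The mean-versus-median point can be illustrated with Python's standard `statistics` module: a few very slow hosts pull the mean up while barely moving the median. A minimal sketch; its input would be per-result run times, which the thread does not provide, so any sample fed to it here would be hypothetical.

```python
from statistics import mean, median
from typing import Sequence, Tuple


def summarize(run_hours: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, median) of per-result run times in hours.
    The median is less sensitive than the mean to a handful of very slow
    (or very fast) machines in the sample."""
    if not run_hours:
        raise ValueError("need at least one run time")
    return mean(run_hours), median(run_hours)


# Hypothetical sample: three typical hosts and one slow one.
print(summarize([10.0, 12.0, 14.0, 52.0]))  # (22.0, 13.0)
```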
fablefox
Senior Cruncher Joined: May 31, 2010 Post Count: 161 Status: Offline
Deadline date: is it international (IBM time?) or local (my time?)
I may have to abort it since it has 21 hours remaining; better to give it to those with better hardware. But I thought the client would not run a task if it thinks it won't have enough time, right?
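Both questions come down to comparing "now plus the remaining estimate" with the deadline as absolute instants. A naive sketch, not BOINC's actual scheduling logic (which the thread does not describe); `can_finish` is a hypothetical name, and the example times are made up. Using timezone-aware datetimes means the comparison does not depend on which time zone the deadline is written in.

```python
from datetime import datetime, timedelta, timezone


def can_finish(now: datetime, deadline: datetime, remaining_hours: float) -> bool:
    """Naive feasibility check: does the estimated remaining run time fit
    before the deadline, assuming the task runs without interruption?
    Aware datetimes compare by absolute instant, so each one may be
    expressed in any time zone."""
    if now.tzinfo is None or deadline.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    return now + timedelta(hours=remaining_hours) <= deadline


# Made-up example: "now" in UTC, deadline written in UTC+2.
now = datetime(2015, 8, 30, 12, 0, tzinfo=timezone.utc)
deadline = datetime(2015, 8, 31, 10, 0, tzinfo=timezone(timedelta(hours=2)))
print(can_finish(now, deadline, remaining_hours=21.0))  # False
```

Here the deadline is 08:00 UTC on Aug 31, while 21 more hours from 12:00 UTC on Aug 30 ends at 09:00 UTC, so the naive check fails.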