World Community Grid Forums
Thread Status: Active Total posts in this thread: 140
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
Beta WUs with the subject name BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx seem to be extremely difficult to compute. Several occurrences of these WUs have already been reported in different threads. I am opening this thread to consolidate this reporting and to make their name visible in the forum index.

See:
https://secure.worldcommunitygrid.org/forums/wcg/printpost?post=226241 posted by me in thread "BETA 8, version 6.10, April 24, 2008, ANONYMOUS"
https://secure.worldcommunitygrid.org/forums/wcg/printpost?post=226261 posted by sprigo in thread "Beta 6.10 CMD2 Time to completion getting longer and longer."
and possibly https://secure.worldcommunitygrid.org/forums/wcg/printpost?post=226246 , the opening post of the same thread, posted by RaulBonegio.

After reporting 6 % after 2.5 hours in my own post (i.e. a projected runtime of 42 hours), I am worried to report that six hours later the same WU is showing a runtime of 8:27:00 for only 8.556 % done, i.e. a new projected total runtime of close to 100 hours on this rather fast machine (Q6600 at 2.88 GHz).

I am asking the techs: what should we do?

Jean.

Edit: Added the processor type of my machine to help figure out the case.
Edit2: Added the names of the authors of the three referenced posts.

[Edit 2 times, last edit by JmBoullier at Apr 25, 2009 4:49:04 PM] |
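For anyone who wants to repeat the projection above on their own WU, here is a minimal sketch of the arithmetic. It only assumes that progress is linear in time, which is exactly the assumption these WUs seem to break. The function names `parse_hms` and `extrapolated_total_hours` are made up for this example and are not part of Boinc.

```python
def parse_hms(text):
    """Convert an 'H:MM:SS' runtime string to hours."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours + minutes / 60 + seconds / 3600


def extrapolated_total_hours(elapsed_hours, percent_done):
    """Linear extrapolation: total runtime = elapsed time / fraction done."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours / (percent_done / 100.0)


# The two readings quoted in the post above.
first = extrapolated_total_hours(2.5, 6.0)
second = extrapolated_total_hours(parse_hms("8:27:00"), 8.556)
print(f"first estimate:  {first:.1f} h")
print(f"second estimate: {second:.1f} h")
```

The two estimates come out at roughly 42 and 99 hours, in line with the figures quoted in the post.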
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I can report similar problems on 2 WUs. One is reporting 7 hours elapsed, 26% progress and 24 hours to completion. The 2nd reports 16 hours elapsed, 11% progress and 51 hours to completion!!!! The progress is going up, but so is the time to completion.
|
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
Raul,

In cases of underestimated WUs like these ones you can definitely ignore the time to completion (TTC) shown by Boinc. Boinc is already bad at this for moderately underestimated WUs, so it is completely out of the game in the current circumstances. You can only trust the projected runtime that you can extrapolate from the runtime and percentage already done.

My beta started at about 4 hours to completion, and after almost 10 hours the TTC is still only 15.5 hours, while a simple extrapolation gives a total of about 107 hours, i.e. 97 hours to go! It is the fact that this extrapolated total time is continuously increasing which is unusual, not the increase of Boinc's TTC.

I would prefer that you confirm that one of your WUs has a name similar to the one listed in the title, and that you tell us which processor is used and/or possibly how long an HCC WU usually needs on your machine.

By the way, I think that your WU reporting 7 hours elapsed and 26% progress is not part of this monster series; it is "only" a big one, as we see from time to time. Please compare its name to the title of this thread. In the current circumstances I would like to see 26 % progress after "only" 7 hours of runtime!

Cheers. Jean.

[Edit 1 times, last edit by JmBoullier at Apr 25, 2009 3:09:44 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi Jean, and thanks for the info - I'll watch and calculate and see if we are getting closer to or further from completion.

The WU that seems 'normal' is BETA_CMD2_0001-DHRS3.clustersOccur-MYH2A.clustersOccur-137_1 and the WU that seems to be a 'monster' is BETA_CMD2_0001-1RKC_A.clustersOccur-1YDI_A.clustersOccur_86_2. They seem to be from the same 'family'.

The 'monster' above is now running at high priority, and is showing 16 hours elapsed, 13% progress and 51 hours to completion.

The 'normal' WU is running on an Intel Pentium M 1700MHz in an IBM Thinkpad, and the 'monster' is running on an Intel Celeron 3.06GHz in a Packard Bell desktop system.

[Edit 2 times, last edit by Former Member at Apr 25, 2009 3:29:34 PM] |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
Raul, thank you for the details you have provided.

I have tried to catch the attention of the techs off stage, but during the weekend I am not sure it will succeed.

Keep making your own estimate of the total runtime of these WUs from time to time and tell us if it is stable or continuously increasing. I hope you will say "stable" for the first WU, and I wish you can say it too for the second one, because otherwise that would mean another series of such monster WUs (its name is not the same as mine and sprigo's).

Thanks for your help. Jean. |
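One possible way to answer the "stable or continuously increasing" question is to compare successive extrapolated totals instead of looking at Boinc's TTC. The sketch below is only an illustration: the helper names and the 10 % tolerance are arbitrary choices made for this example, and the sample readings are the two quoted in the opening post.

```python
def extrapolated_totals(samples):
    """Return the linear total-runtime estimate (hours) for each
    (elapsed_hours, percent_done) reading."""
    return [elapsed / (percent / 100.0) for elapsed, percent in samples]


def classify(totals, tolerance=0.10):
    """Compare the latest estimate with the first one.

    The 10 % tolerance is an arbitrary threshold chosen for this sketch.
    """
    change = (totals[-1] - totals[0]) / totals[0]
    if change > tolerance:
        return "increasing"
    if change < -tolerance:
        return "decreasing"
    return "stable"


# (elapsed hours, percent done) readings from the opening post.
monster = [(2.5, 6.0), (8.45, 8.556)]
totals = extrapolated_totals(monster)
print([round(t, 1) for t in totals], classify(totals))
```

A WU that is merely big should keep giving roughly the same total from one reading to the next, while the monsters discussed here keep drifting upward.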
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Processing BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_4_1

Currently 10.191% after 6:40:26. Estimated 16:50:52 remaining, but simple math takes me to 52 hours remaining.

Xeon R5404 (2 GHz), also running another Beta (BETA_CMD2_0001-PP1BA.clustersOccur-VINCA.clustersOccur_5_1, 21% after 2 hours, set to finish in 10) and 2 energy work units; not sure if it's affecting anything at all. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Quoting JmBoullier: "It is the fact that this extrapolated total time is continuously increasing which is unusual, not the increase of Boinc's TTC. I would prefer that you confirm that one of your WUs has a name similar to the one listed in the title and that you tell us which processor is used and/or possibly how long a HCC WU is needing usually in your machine."

I don't think the WU I was concerned about is part of the 'monster' family, then.

name: BETA_CMD2_0001-1I7X_C.clustersOccur-1I7X_C.clustersOccur_38_1

It's going about 50% over the initial estimated time (on a 2.4GHz P4 CPU), but its extrapolated time's not increasing exponentially as the 'monster' WUs seem to be doing.

The closest matches to the title WUs I've gotten have been BETA_CMD2_0001-PP1BA.clustersOccur-UGPA2A.clustersOccur_0_1 and BETA_CMD2_0001-PP1BA.clustersOccur-PYGM.clustersOccur_60_2 - the former finished in about 12 hours, and the latter is @ 72% after 10 hours... both on the same 2.0 GHz Athlon. |
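Since several posts compare WU names against the thread title, here is a rough sketch for splitting a name into its parts so that names can be compared field by field. The pattern is inferred only from the names quoted in this thread, not from any official WCG naming specification, and the field names (`batch`, `first`, `second`, `suffix`) are labels invented for this example.

```python
import re

# Pattern guessed from the names quoted in this thread (an assumption).
WU_NAME = re.compile(
    r"(?P<batch>BETA_CMD2_\d+)-"
    r"(?P<first>.+?)\.clustersOccur-"
    r"(?P<second>.+?)\.clustersOccur[_-]"
    r"(?P<suffix>.+)"
)


def parse_wu_name(name):
    """Split a WU name into its parts, ignoring stray spaces from copy/paste."""
    compact = re.sub(r"\s+", "", name)
    match = WU_NAME.fullmatch(compact)
    if match is None:
        raise ValueError(f"unrecognised WU name: {name!r}")
    return match.groupdict()


title = parse_wu_name("BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx")
mine = parse_wu_name("BETA_ CMD2_ 0001-PP1BA.clustersOccur-UGPA2A.clustersOccur_ 0_ 1")

# Same first identifier as the title WU, but a different second one.
print(mine["first"] == title["first"], mine["second"] == title["second"])
```

Comparing the `first` and `second` fields separately makes it easier to see which names share only one half with the title WU.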
||
|
GIBA
Ace Cruncher Joined: Apr 25, 2005 Post Count: 5374 Status: Offline |
Hi all,

I picked up a monster one too: BETA_CMD2_0001-1RKC_A.clustersOccur-1YDI_A.clustersOccur_10_0

This WU has been running at high priority for the last 26 hours. The time to completion is now more than 34 hours and keeps increasing. Progress is very slow, but it is still growing (I have never seen this kind of speed for any WU in more than 4 years of WCG, including the many Beta WUs I have crunched before).

So far there is no unusual drain on resources on my machine (an Intel Quad Extreme QX9770 running at 3.2 GHz, Windows Vista 32 Ultimate, 4 GB RAM, crunching 3 more HFCC WUs at the same time, with 100% of the processor dedicated to WCG). I monitored it for around half an hour and saw that this monster WU is using around 5,400 kB of memory (very little compared with the average of about 115,000 kB for the HFCC WUs that I am crunching at the same time, for instance).

One member of my WU's quorum gave up and aborted their copy after 19 hours. None has finished yet.

I am checking this one every 3 hours to make sure it is still crunching without issues, despite the lack of speed, and to avoid bad surprises. I hope it finishes in good shape in one or two days, although I am a little concerned about the completion deadline of 4/27 12:08 PM.
Cheers ! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy ! http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1 |
||
|
sprigo
Cruncher England Joined: Apr 30, 2007 Post Count: 37 Status: Offline |
I've got one of the monsters as well.

BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_31

It has currently been running for 12 hrs and is at 9.4% complete! This is running on a Core i7 Extreme (4 GHz), hyperthreading disabled, 100% usage, 12 GB RAM.

[Edit 1 times, last edit by sprigo at Apr 25, 2009 5:50:18 PM] |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
Checkpoint at 10 % for my BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_34_0

[Table: Runtime | % done | Extrapolated total runtime]

Extrapolated total runtime is computed from runtime and % already done. Boinc's time to completion is useless in these circumstances (~17 hours right now!).

It seems more and more obvious that this exponential WU might never reach 100 %. I will let it live a little longer in case I have some good idea or one of the techs tells us what to do. And then I will probably have to kill it. I'll let you know.

Besides this seriously annoying problem, everything looks fine and in line with other WUs for this new project:
RAM use: 5,400 kB
Peak memory use: 5,572 kB
VM size: 30,212 kB
Total page faults: 18,336 (for 12.5 hours!)

Other WUs running in the other cores are not affected at all. And checkpointing occurs as frequently as usual for this project. Biggest checkpoint file in the slot directory: 169 kB. All others at 1 kB.

Cheers. Jean. |
||
|