Total posts in this thread: 140
This topic has been viewed 144525 times and has 139 replies
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Beta WUs with the subject name
BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx
seem to be extremely difficult to compute.

Several occurrences of these WUs have already been reported in different threads. I am opening this thread to consolidate this reporting and to make their name visible in the forum index.

See
https://secure.worldcommunitygrid.org/forums/wcg/printpost?post=226241
posted by me in thread "BETA 8, version 6.10, April 24, 2008, ANONYMOUS"
https://secure.worldcommunitygrid.org/forums/wcg/printpost?post=226261
posted by sprigo in thread "Beta 6.10 CMD2 Time to completion getting longer and longer."
and possibly
https://secure.worldcommunitygrid.org/forums/wcg/printpost?post=226246 ,
the opening post of the same thread, posted by RaulBonegio.

After reporting 6 % after 2.5 hours in my own post (i.e. a projected runtime of 42 hours) I am worried to report that six hours later the same WU is showing a runtime of 8:27:00 for only 8.556 % done, i.e. a new projected total runtime of close to 100 hours on this rather fast machine (Q6600 at 2.88 GHz).

I am asking the techs, what should we do? Jean.

Edit: Added the processor type of my machine to help figure out the case.
Edit2: Added the names of the authors of the three referenced posts.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edit 2 times, last edit by JmBoullier at Apr 25, 2009 4:49:04 PM]
[Apr 25, 2009 1:47:38 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

I can report similar problems on 2 WUs. One is reporting 7 hours elapsed, 26% progress and 24 hours to completion. The second reports 16 hours elapsed, 11% progress and 51 hours to completion! The progress is going up, but so is the time to completion.
[Apr 25, 2009 2:24:12 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Raul,
For underestimated WUs like these you can safely ignore the time to completion (TTC) shown by Boinc. Boinc is already bad at this for moderately underestimated WUs, so it is completely out of its depth in the current circumstances. The only figure you can trust is the projected total runtime extrapolated from the runtime and the percentage already done. My beta started at about 4 hours to completion, and after almost 10 hours the TTC is still only 15.5 hours, while a simple extrapolation gives a total of about 107 hours, i.e. 97 hours to go!

It is the fact that this extrapolated total time is continuously increasing which is unusual, not the increase of Boinc's TTC.
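For anyone who wants to redo this extrapolation, here is a minimal sketch in Python. It assumes progress is linear in elapsed time, which is exactly the assumption these WUs seem to break, so a result that keeps growing is the warning sign. The function names are made up for illustration and are not part of Boinc; the figures are the two readings from the opening post.

```python
def hms_to_hours(hms: str) -> float:
    """Convert an 'H:MM:SS' runtime string to fractional hours."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours + minutes / 60 + seconds / 3600


def extrapolate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Project the total runtime, assuming progress stays linear in time."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed_hours * 100.0 / percent_done


# Readings quoted in the opening post of this thread.
first = extrapolate_total_hours(2.5, 6.0)
second = extrapolate_total_hours(hms_to_hours("8:27:00"), 8.556)

print(f"first estimate:  {first:.1f} hours")
print(f"second estimate: {second:.1f} hours")
print(f"still to go after the second reading: {second - hms_to_hours('8:27:00'):.1f} hours")
```

For a well-behaved WU the two estimates should be roughly equal. Here the second one is more than twice the first.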

Could you confirm that one of your WUs has a name similar to the one in the title, and tell us which processor is used and/or how long an HCC WU usually takes on your machine?

By the way I think that your WU reporting 7 hours elapsed and 26% progress is not part of this monster series, it is "only" a big one as we can see from time to time. Please compare its name to the title of this thread.
In the current circumstances I would like to see 26 % progress after "only" 7 hours of runtime! :-)

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
----------------------------------------
[Edit 1 times, last edit by JmBoullier at Apr 25, 2009 3:09:44 PM]
[Apr 25, 2009 3:07:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi Jean, and thanks for the info - I'll watch and calculate and see if we are getting closer to or further from completion.
The WU that seems 'normal' is BETA_CMD2_0001-DHRS3.clustersOccur-MYH2A.clustersOccur-137_1 and the WU that seems to be a 'monster' is BETA_CMD2_0001-1RKC_A.clustersOccur-1YDI_A.clustersOccur_86_2. They seem to be from the same 'family'.
The 'monster' above is now running at high priority, and is showing 16 hours elapsed, 13% progress and 51 hours to completion.
The 'normal' WU is running on an Intel Pentium M 1700MHz in an IBM Thinkpad, and the 'monster' is running on an Intel Celeron 3.06GHz in a Packard Bell desktop system.
----------------------------------------
[Edit 2 times, last edit by Former Member at Apr 25, 2009 3:29:34 PM]
[Apr 25, 2009 3:24:18 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Raul, thank you for the details you have provided.
I have tried to catch the attention of the techs off stage, but over the weekend I am not sure it will succeed.
Keep making your own estimate of the total runtime of these WUs from time to time and tell us whether it is stable or continuously increasing. I expect you will say "stable" for the first WU, and I hope you can say the same for the second one, because otherwise that would mean another series of such monster WUs (its name is not the same as mine and sprigo's).

Thanks for your help. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Apr 25, 2009 3:42:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Processing BETA_CMD2_0001-PP1BA.clustersOccur-TPM3A.clustersOccur_4_1
Currently 10.191% after 6:40:26
Estimated 16:50:52 remaining, but simple math takes me to 52 hours remaining

Xeon R5404 (2 GHz), also running another Beta (BETA_CMD2_0001-PP1BA.clustersOccur-VINCA.clustersOccur_5_1, 21% after 2 hours, set to finish in 10) and 2 energy work units; not sure if it's affecting anything at all.
[Apr 25, 2009 4:40:56 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Quote (JmBoullier):
"It is the fact that this extrapolated total time is continuously increasing which is unusual, not the increase of Boinc's TTC.

Could you confirm that one of your WUs has a name similar to the one in the title, and tell us which processor is used and/or how long an HCC WU usually takes on your machine?"

I don't think the WU I was concerned about is part of the 'monster' family, then.
name:
BETA_CMD2_0001-1I7X_C.clustersOccur-1I7X_C.clustersOccur_38_1

It's going about 50% over the initial estimated time (on a 2.4GHz P4 CPU), but its extrapolated time's not increasing exponentially as the 'monster' WUs seem to be doing.

The closest matches to the title WUs I've gotten have been
BETA_CMD2_0001-PP1BA.clustersOccur-UGPA2A.clustersOccur_0_1 and
BETA_CMD2_0001-PP1BA.clustersOccur-PYGM.clustersOccur_60_2 - the former finished in about 12 hours, and the latter is @ 72% after 10 hours... both on the same 2.0 GHz Athlon.
[Apr 25, 2009 4:58:21 PM]
GIBA
Ace Cruncher
Joined: Apr 25, 2005
Post Count: 5374
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Hi all,
I picked a monster one too:

BETA_CMD2_0001-1RKC_A.clustersOccur-1YDI_A.clustersOccur_10_0--


This WU has been running at high priority for the last 26 hours.

The time to completion is now more than 34 hours, and it keeps increasing.

Progress is advancing very slowly, but it is advancing (I have never seen this kind of speed for any WU in more than 4 years of WCG, including the many beta WUs I crunched before...).

No unusual resource drain has occurred so far on my machine (an Intel Quad Extreme QX9770 running at 3.2 GHz, Windows Vista 32 Ultimate, 4 GB RAM, crunching 3 more HFCC WUs at the same time, with 100% of the processor dedicated to WCG). I monitored it for about half an hour and saw that this monster WU is using around 5,400 kB of memory (very little compared with the average of about 115,000 kB for any of the HFCC WUs I am crunching at the same time, for instance).

One member of my WU quorum gave up and aborted their copy after 19 hours. None has finished yet.

I am checking on this one every 3 hours to avoid bad surprises; so far it is crunching without issues, despite the lack of speed.

I hope it finishes in good shape in one or two days, although I am a little concerned about the completion deadline, scheduled for 4/27 12:08 PM.

----------------------------------------
Cheers ! GIB@
Join BRASIL - BRAZIL@GRID team and be very happy !
http://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=DF99KT5DN1

[Apr 25, 2009 5:07:39 PM]
sprigo
Cruncher
England
Joined: Apr 30, 2007
Post Count: 37
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

I've got one of the monsters as well.

BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_31

It has been running for 12 hrs and is at 9.4% complete! This is running on a Core i7 Extreme (4 GHz), hyperthreading disabled, 100% usage, 12 GB RAM.
----------------------------------------
[Edit 1 times, last edit by sprigo at Apr 25, 2009 5:50:18 PM]
[Apr 25, 2009 5:48:26 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Re: BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_xx monster WUs

Checkpoint at 10 % for my BETA_CMD2_0001-PP1BA.clustersOccur-TPM1A.clustersOccur_34_0
Runtime        % done      Extrapolated total runtime
2.5 hours      6.00 %      42 hours
8.5 hours      8.56 %      100 hours
~10 hours      9.25 %      110 hours
11.89 hours    10.002 %    118.87 hours

Extrapolated total runtime is computed from runtime and % already done. Boinc's time to completion is useless in these circumstances (~17 hours right now!).
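The table can be recomputed with a few lines of Python. This is only a sketch: the readings are copied from the table, "~10 hours" is taken as exactly 10.0 (an assumption), and the helper names are invented for illustration. Besides the extrapolated totals it also computes the percent gained per hour between consecutive readings, which shows the slowdown more directly.

```python
# (runtime in hours, percent done) copied from the checkpoint table.
# "~10 hours" is treated as exactly 10.0 here, which is an assumption.
checkpoints = [(2.5, 6.00), (8.5, 8.56), (10.0, 9.25), (11.89, 10.002)]


def extrapolated_totals(points):
    """Total runtime projected from each reading, assuming linear progress."""
    return [hours * 100.0 / percent for hours, percent in points]


def marginal_rates(points):
    """Percent gained per hour between each pair of consecutive readings."""
    return [
        (p1 - p0) / (h1 - h0)
        for (h0, p0), (h1, p1) in zip(points, points[1:])
    ]


totals = extrapolated_totals(checkpoints)
rates = marginal_rates(checkpoints)

# True when every new projection is larger than the previous one.
keeps_growing = all(a < b for a, b in zip(totals, totals[1:]))

for (hours, percent), total in zip(checkpoints, totals):
    print(f"{hours:6.2f} h  {percent:7.3f} %  -> {total:7.2f} h total")
print("percent per hour between readings:", [round(r, 3) for r in rates])
print("projection keeps growing:", keeps_growing)
```

With these readings every projection is larger than the previous one, and the rate between readings stays below 0.5 % per hour, against 2.4 % per hour over the first 2.5 hours.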

It seems more and more obvious that this exponential WU might never reach 100 %. I will let it live a little longer in case I have some good idea or one of the techs tells us what to do. And then I will probably have to kill it. I'll let you know.

Beside this seriously annoying problem everything looks fine and in line with other WUs for this new project:
RAM use: 5,400 kB
Peak memory use: 5,572 kB
VM size: 30,212 kB
Total page faults: 18,336 (for 12.5 hours!)
Other WUs running in the other cores are not affected at all.
And checkpointing occurs as frequently as usual for this project.
Biggest checkpoint file in the slot directory: 169 kB. All others at 1 kB.

Cheers. Jean.
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Apr 25, 2009 6:11:41 PM]