World Community Grid Forums
Thread Status: Active Total posts in this thread: 109
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
The techs have one WU where they're actually able to replicate a failure on their lab machines, sometimes. If they had been able to determine the root cause, and fix it without affecting the science result, they would have done so long ago.
Anyway, this was the last official reply: http://www.worldcommunitygrid.org/forums/wcg/...ead,27739_offset,0#258519
----------------------------------------
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Jan 31, 2010 9:36:14 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello WCG.
Attention: uplinger -- WCG Tech
Cross-reference: my [Dec 28, 2009 6:34:05 PM] post

From the latest batch of WCG WUs I have uploaded to WCG (about 12.5 hrs earlier), there was only one WU with an error (the other 129 mixed-project WCG WUs were trouble-free): an HPF2 project WU, "nb947_00022_4", with details as follows:

Boinc_v6.2.28 display
----------------------
Name: nb947_00022_4
CPU time: 03:15:41
Progress: 100%
Report deadline: 2010.02.01.Mon 23:27:37
Status: Computation error

Snippets from "stdoutdae.txt":
------------------------------
28-Jan-2010 23:38:56 [World Community Grid] Starting nb947_00022_4
28-Jan-2010 23:38:56 [World Community Grid] Starting task nb947_00022_4 using hpf2 version 603
29-Jan-2010 03:02:23 [World Community Grid] Computation for task nb947_00022_4 finished
29-Jan-2010 03:02:23 [World Community Grid] Output file nb947_00022_4_0 for task nb947_00022_4 absent

Others:
--------
OS: 32-bit Vista Ultimate; SP2

My earlier post (Dec 28, 2009) was about some HPF2 WUs whose progress was stuck/frozen, or which otherwise consumed unusually long crunch times (~30 hrs, which, at that time, I opted to abort). Would the codebase of those HPF2 WUs be the same as that of the HPF2 WUs that also (sometimes) exhibit the above-mentioned error? If so, would there possibly be some connection between a suspect HPF2 WU 'getting lost' in a non-convergence zone on one extreme (exhibiting stuck/frozen progress), and on the other extreme, the WU somehow being 'trapped' (which triggers an error) in the non-convergence?

Good day. |
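For anyone who wants to check their own "stdoutdae.txt" for the same symptom, here is a minimal Python sketch that scans log text for the "Output file ... absent" message. The line layout is taken only from the snippet quoted in the post above; the function name `find_absent_outputs` and the two regular expressions are illustrative assumptions, not part of BOINC, and other BOINC versions may format their messages differently.

```python
import re

# Line shape assumed from the stdoutdae.txt snippet quoted in the post:
#   28-Jan-2010 23:38:56 [World Community Grid] <message>
LINE_RE = re.compile(
    r"^(\d{2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2}) \[([^\]]+)\] (.*)$"
)
# Message shape assumed from the same snippet.
ABSENT_RE = re.compile(r"^Output file (\S+) for task (\S+) absent$")


def find_absent_outputs(log_text):
    """Return (timestamp, task, output_file) for every 'absent' message."""
    hits = []
    for raw in log_text.splitlines():
        line_match = LINE_RE.match(raw.strip())
        if not line_match:
            continue  # skip lines that do not follow the assumed layout
        timestamp, _project, message = line_match.groups()
        absent_match = ABSENT_RE.match(message)
        if absent_match:
            output_file, task = absent_match.groups()
            hits.append((timestamp, task, output_file))
    return hits


# The four log lines quoted in the post, used as sample input.
SAMPLE = """\
28-Jan-2010 23:38:56 [World Community Grid] Starting nb947_00022_4
28-Jan-2010 23:38:56 [World Community Grid] Starting task nb947_00022_4 using hpf2 version 603
29-Jan-2010 03:02:23 [World Community Grid] Computation for task nb947_00022_4 finished
29-Jan-2010 03:02:23 [World Community Grid] Output file nb947_00022_4_0 for task nb947_00022_4 absent
"""

if __name__ == "__main__":
    for timestamp, task, output_file in find_absent_outputs(SAMPLE):
        print(f"{timestamp}: task {task} is missing output file {output_file}")
```

To run it against a real log, read the file into a string and pass that to `find_absent_outputs` in place of `SAMPLE`.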
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
I have yet to hear of an HPF2 job stuck in an endless loop that failed to complete successfully after a restart [see FAQs]. How this and the error phenomenon could somehow be connected is hard to see:
1. The outright failures occur within minutes of start, long before the first checkpoint.
2. The endless looping happens at any point in time, very possibly near the end zones just before the checkpoints.
No one has as yet made a checkpoint connection that I can recollect. It's an "I just happened to look" discovery; I run the RosettaView ** utility on the side since it monitors % progress on jobs and gives off alerts.
** No current source known for download.
----------------------------------------
WCG
Please help to make the Forums an enjoyable experience for All! |
||
|
joeperry39@gmail.com
Advanced Cruncher USA Joined: Nov 22, 2006 Post Count: 140 Status: Offline |
Runs OK on my Vista32 and on my XP32 but not so good on my Vista64. I have not tried Win7 yet so have nothing to offer.

My (osugrad) original reply on this subject:

I'm currently running HPF2 on an AMD Athlon II X2 235e Processor running at 2.70 GHz with 6 GB RAM and 64-bit Windows 7. So far, no problems. Life is good! BTW: I'm running BOINC 6.10.18 if that makes a difference.

I'm also running HPF2 on my older machine with an AMD Athlon XP 2400+ processor running at 2 GHz with 2 GB of RAM and 32-bit XP Home, SP-3. HPF2 has been running exclusively on both machines for quite some time now with absolutely No Errors returned. I also have BOINC rel 6.10.18 on the older computer.

I'm no expert on such matters, but can't help but wonder if it's somehow a combination of OS (and perhaps the version thereof), processor make and model, version of BOINC running the jobs and possibly other software that may be running on the various machines at the same time HPF2 is running.

"Everything in moderation, including moderation" -- Mark Twain |
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline |
"Anyway, I seem to recall having that problem with HCMD2 on vista and 7 ..."

Zoso, I don't know if you are misremembering or miskeying, but HCMD2 has never shown any problem similar to these naughty HPF2 problems.

"It happens with more than just the HCMD2 task on vista/7? Or, am I misremembering that, also?"

There were a few teething problems at the beginning of the project and, later, several WUs looked like they were stuck while computing very tough positions, but we have not had any real case of looping yet. And no cases of failures right at the beginning of WUs either. In fact, HCMD2 is a rather quiet project if you except the high variability of durations, which may sometimes disturb BOINC's ability to schedule jobs properly.

Cheers. Jean. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello WCG.
The 'error phenomenon' surrounding WCG's HPF2 WUs seems to have, as I see it, affected a number of WCG crunchers. I thought I would provide some feedback to the WCG community regarding the crunching of HPF2 WUs on my machine, in the hope that some useful data may be extracted from it to serve as signposts in the search for a solution. Thus:

I just had one HPF2 WU (nc519_00084_13) whose progress was stuck for some time (since when, I don't know), and when I restarted my BOINC_v6.2.28, the said WU resumed incrementing its progress.

My image capture of the said WU in BOINC shows:
-- 04:54:42 and counting up (CPU time)
-- 46.125% and stuck (Progress)
-- 04:42:02 and counting up unevenly/irregularly (To completion)

After waiting for about 30 minutes, with the progress still stuck at 46.125%, I opted to restart BOINC. Some minutes after that BOINC restart, the results were as follows:
-- 02:01:49 and counting up (CPU time)
-- 51.67% and counting up (Progress)
-- 02:39:43 and counting down (To completion)

Finally, the said HPF2 WU completed error-free with a BOINC-indicated CPU time of 04:07:56.

Good day. |
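The manual check described in this post (watch the Progress figure and restart BOINC if it has not moved for about 30 minutes) can be sketched as a small Python function. This is only an illustration under stated assumptions: `find_stall` is a hypothetical name, the readings are supplied by hand rather than read from BOINC, and the only values taken from the post are the 46.125% figure and the roughly 30-minute wait. It is not how BOINC or RosettaView work internally.

```python
def find_stall(samples, stall_minutes=30.0):
    """Detect a stalled work unit from periodic progress readings.

    samples: list of (minutes_since_first_reading, progress_percent),
             oldest first. How the readings are collected is left open.
    Returns the time (in minutes) of the first reading at which progress
    had been unchanged for at least stall_minutes, or None if no stall.
    """
    last_progress = None
    last_change_at = None
    for minute, progress in samples:
        if last_progress is None or progress != last_progress:
            # Progress moved (or this is the first reading): reset the clock.
            last_progress = progress
            last_change_at = minute
        elif minute - last_change_at >= stall_minutes:
            return minute
    return None


# Hypothetical readings every 10 minutes; only 46.125% comes from the post.
STUCK = [(0, 45.0), (10, 46.125), (20, 46.125), (30, 46.125), (40, 46.125)]
MOVING = [(0, 45.0), (10, 46.125), (20, 47.5), (30, 49.0), (40, 51.67)]

if __name__ == "__main__":
    print("stuck WU flagged at minute:", find_stall(STUCK))
    print("moving WU flagged at minute:", find_stall(MOVING))
```

A reading is compared with the previous one for exact equality, which suits a progress figure that is frozen at one value; a tolerance could be added if tiny fluctuations should also count as "stuck".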
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
[snip] I'm no expert on such matters, but can't help but wonder if it's somehow a combination of OS (and perhaps the version thereof), processor make and model, version of BOINC running the jobs and possibly other software that may be running on the various machines at the same time HPF2 is running.

That's why I posted the beginning string of Messages from when BOINC last started up before the error. If reports don't include those data, it just takes longer for the pattern (if there even is one) to reveal itself. It would take 1500 or so computers to test all the permutations from just 6 types of CPU, 5 manufacturers of motherboard/chipsets, 5 types of video and 10 different OS's (there are certainly more of all those variables), so it would be difficult, at best, to test every possible hardware combination in alpha OR beta before releasing the task WUs for public crunching.

Unless I see another one that errors out by the end of this month, I'll have to agree that it was an AV issue. That's the only Win7 box I have and I'm not planning on paying $300 for Win7 Ultimate when it starts shutting down a month from now... I spent just over $300 assembling that machine (mostly used, off ebay; keyboard and USB hub new from amazon).

6.2.28 is still the 'official' windows version (with WCG customizations)... as of this second, anyway |
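The "1500 or so computers" figure is simply the product of the four counts given in the post (6 x 5 x 5 x 10). A short Python check, with dictionary keys chosen here purely for illustration:

```python
from math import prod

# Counts as stated in the post; the key names are illustrative labels.
OPTION_COUNTS = {
    "cpu_types": 6,
    "motherboard_chipset_makers": 5,
    "video_types": 5,
    "operating_systems": 10,
}


def count_combinations(option_counts):
    """Number of distinct machines needed to cover every combination."""
    return prod(option_counts.values())


if __name__ == "__main__":
    print(count_combinations(OPTION_COUNTS))  # 6 * 5 * 5 * 10
```

Because the counts multiply, every extra variable (BOINC version, antivirus product, and so on) scales the total by its number of options, which is why exhaustive pre-release testing becomes impractical so quickly.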
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
ZoSo, we've been battling this with a blunt axe for longer... add the number of concurrent HPF2 jobs to the variations, and vice versa multicores :-|
Quite a few would like to see an option "Never send me this science" in combination with the "Send me something else if you don't have my favourite project" option. BUT, the techs are working on a process that will remember if a client has continuous problems with a specific science and will then only send one task periodically to check whether the issue has been fixed, if asked for of course.
edit: inserted continuous
WCG
----------------------------------------Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Feb 2, 2010 10:03:11 AM] |
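The process mentioned in this post (remember clients with continuous problems on one science, then send only one task periodically as a check) could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the class name `ProbePolicy`, the threshold of 3 consecutive errors, and the 24-hour probe interval are invented here, and nothing is known from the thread about how the WCG techs actually implement it.

```python
class ProbePolicy:
    """Hypothetical sketch: throttle a science on hosts that keep failing it."""

    def __init__(self, error_threshold=3, probe_interval_hours=24.0):
        # Both defaults are illustrative guesses, not WCG settings.
        self.error_threshold = error_threshold
        self.probe_interval_hours = probe_interval_hours
        self.consecutive_errors = {}  # (host, science) -> error streak
        self.last_probe_at = {}       # (host, science) -> time of last probe

    def record_result(self, host, science, ok):
        """Update the error streak after a result is returned."""
        key = (host, science)
        if ok:
            # A good result means the issue looks fixed: lift the throttle.
            self.consecutive_errors[key] = 0
            self.last_probe_at.pop(key, None)
        else:
            self.consecutive_errors[key] = self.consecutive_errors.get(key, 0) + 1

    def may_send(self, host, science, now_hours):
        """Decide whether a task of this science may go to this host now."""
        key = (host, science)
        if self.consecutive_errors.get(key, 0) < self.error_threshold:
            return True  # no continuous problem recorded
        last = self.last_probe_at.get(key)
        if last is None or now_hours - last >= self.probe_interval_hours:
            self.last_probe_at[key] = now_hours  # this task is the probe
            return True
        return False


if __name__ == "__main__":
    policy = ProbePolicy()
    for _ in range(3):
        policy.record_result("host-1", "HPF2", ok=False)
    print(policy.may_send("host-1", "HPF2", now_hours=0.0))   # probe allowed
    print(policy.may_send("host-1", "HPF2", now_hours=1.0))   # throttled
    print(policy.may_send("host-1", "HCMD2", now_hours=1.0))  # other science unaffected
```

Tracking the streak per (host, science) pair keeps one troublesome science from affecting the other projects a host crunches, which matches the "specific science" wording in the post.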
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline |
Hello all,
The error rate for the Windows platform on this application does have our attention. We are currently working very hard to bring two more science applications online before we are able to dedicate more of our time to fixing this issue. This error is different than most we have seen in the past and requires more dedicated time debugging it than usual. Please be patient with us and we will fix this issue.
Thanks,
-Uplinger |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello WCG.
Reference:
-- uplinger's [Feb 1, 2010 9:04:35 PM] post
-- Sekerob's [Feb 1, 2010 10:59:09 AM] post

Gentlemen: I commend your efforts in dealing with the HPF2 issue. Sekerob's idea proposes to address the concerns of those who emphasize getting points from crunching WUs, while uplinger's idea addresses the nuts and bolts of HPF2 WU processing itself, with a view to hunting down the source of the issue. With that, it proposes to address the concerns of those who emphasize the importance of HPF2 WUs; that is, crunchers who decided to stick with HPF2 WUs (because of the importance of the underlying science) despite the relatively few errors that may arise crunching them.

P.S. To this hour, I have finished crunching 14 WCG HPF2 WUs, averaging 243 minutes per HPF2 WU. No HPF2 problems thus far (since my last report).

Good day. |
||