Thread Status: Active
Total posts in this thread: 109
This topic has been viewed 734980 times and has 108 replies
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Unrelated, though my 64-bit client again produced a perfect result: the oddity with the negative credits apparently has something to do with mostly 64-bit clients generating corrupt output files. As the techs noted before, the 2-minute issue has their attention, but the focus remains first on getting 2 projects launched.

Personally, I'd go for a project mix, also because HPF2 is actually among the more CPU-heavy projects. [ot]I currently have 3 badge upgrades lined up close together (now within 10 days: 10, 8 and 4 days to be exact) and am trying to time them so they fall on the same day (with an outside chance that I even get a fourth, virtual one, for an anticipated level, 11 CPU days out). Why do it like that... because I can :D [/ot]
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
[Feb 28, 2010 9:06:19 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

As I had foreseen, now that I am crunching HPF2 again as in the past, I get errors in large quantities. All these errors have CPU times of 0.02 hours with credit claims of 0.6 points or similar.
No big impact, except that these error WUs occupy bandwidth, memory and a little processing time for nothing.
If we consider all the machines that crunch this project on WCG, it probably does drag the overall efficiency down.
But I understand there is no solution. So I'll keep crunching until Sapphire.
----------------------------------------

[Mar 2, 2010 7:44:21 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Hypernova,

If the 2-minute failures grow too numerous within 24 hours (UTC timekeeping), you'll be facing quota dry-up, so be sure to keep returning valid results in between.
----------------------------------------
[Mar 2, 2010 7:55:46 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Here is the status as seen with Result Status for HPF2:

Total WU = 738

split as:

In Progress 414
Valid 9
PV 149
Error 166

Hope this will be ok. I would hate stopping this project without getting to Sapphire.
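
As a rough cross-check of the tally above, here is a minimal Python sketch. The dictionary layout and the helper name `error_share` are purely illustrative; the counts are the ones quoted in this post. It confirms that the categories add up to the stated total and shows what share of the results returned so far (everything except In Progress) ended in error.

```python
# Counts copied from the Result Status summary in this post.
status = {"In Progress": 414, "Valid": 9, "PV": 149, "Error": 166}


def error_share(counts):
    """Share of returned results (everything except In Progress) that errored."""
    returned = sum(n for name, n in counts.items() if name != "In Progress")
    return counts["Error"] / returned


total = sum(status.values())
share = error_share(status)

print(f"total WU: {total}")
print(f"error share of returned results: {share:.1%}")
```

With these numbers the total comes to 738 and the error share of returned work is just over 51%.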
----------------------------------------

[Mar 2, 2010 11:29:52 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Tonnes of errors indeed, but what you have there already looks like one-third of the way towards Ruby.

The rule works like this: each error reduces the quota by 1, and even though good results will eventually double the quota again, there is still the hard daily limit of 80 per core, or whatever the number is. You're at an error rate of more than 50%. Having a buffer in this particular case helps to ensure you keep crunching.
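
To make the quota mechanics a bit more concrete, here is a small Python sketch of the rule as described in this post: every error lowers the daily quota by 1, every good result doubles it, and the quota never exceeds a hard cap. The function name `simulate_quota`, the floor of 1, and the cap of 80 (the figure quoted above with "or whatever the number is") are assumptions for illustration only, not the actual server logic.

```python
def simulate_quota(outcomes, cap=80, floor=1):
    """Replay a sequence of results against the quota rule described above.

    outcomes: iterable of booleans, True for a good result, False for an error.
    cap:      ASSUMED hard daily limit per core (80 is the figure quoted above).
    floor:    ASSUMED minimum quota, so the value never drops to zero.
    """
    quota = cap
    history = []
    for ok in outcomes:
        if ok:
            quota = min(cap, quota * 2)    # a good result doubles the quota, up to the cap
        else:
            quota = max(floor, quota - 1)  # each error costs one unit of quota
        history.append(quota)
    return history


# Ten errors in a row, followed by a single good result.
print(simulate_quota([False] * 10 + [True]))
```

Under these assumptions a single valid result is enough to push the quota back to the cap after a short run of errors, which is why returning good work between the failures matters.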
----------------------------------------
[Mar 2, 2010 11:37:01 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Here is an update of Result Status for HPF2:

Total WU : 1384

Errors: 422

Valid: 116

PV: 374

In Progress : 472

The ratio Valid/Error is 27.5%

It is improving compared to yesterday's 5.4%. I suppose this is due to the lessening effect of the very high quorum on these WUs.

But still, 422 errors (over two days) at 2 minutes of CPU each add up to about 14 hours of CPU time lost. This is equivalent to 7 hours per day, and across 10 machines that is 42 minutes/machine/day. This should not worsen, as the ratios should stabilize over the next two days.
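
The arithmetic behind that estimate can be sketched in a few lines of Python. The function name `cpu_loss` is made up for this illustration; the inputs are the figures from this post (422 errors, roughly 2 CPU minutes each, two days, 10 machines).

```python
def cpu_loss(errors, minutes_per_error, days, machines):
    """Return (total hours lost, hours lost per day, minutes lost per machine per day)."""
    total_hours = errors * minutes_per_error / 60
    hours_per_day = total_hours / days
    minutes_per_machine_per_day = hours_per_day * 60 / machines
    return total_hours, hours_per_day, minutes_per_machine_per_day


total_hours, hours_per_day, per_machine = cpu_loss(
    errors=422, minutes_per_error=2, days=2, machines=10
)

print(f"total lost:           {total_hours:.1f} h")
print(f"lost per day:         {hours_per_day:.1f} h")
print(f"lost per machine/day: {per_machine:.0f} min")
```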

This CPU time loss is still an acceptable price to pay for beautiful gemstones like Ruby, Emerald and Sapphire. :D
----------------------------------------

[Mar 3, 2010 9:09:45 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

Hypernova, count the PV as valid; it is near guaranteed. Then 490:422 presently makes a ratio of 54:46, better than 50% valid. Good enough to keep running.
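
For anyone who wants to redo that sum with their own numbers, here is a minimal Python sketch. The variable names are illustrative; the counts are the Valid, PV and Error figures from the update this reply answers.

```python
valid, pv, errors = 116, 374, 422

good = valid + pv                    # PV counted as valid, as suggested above
good_share = good / (good + errors)  # share of finished work that is (or will be) valid

print(f"good:error = {good}:{errors}")
print(f"split      = {good_share:.0%} : {1 - good_share:.0%}")
```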

Watch out for those notorious and rare HPF2 jobs that loop endlessly. If you see a task with an abnormal run time / % progress ratio, suspend the client with LAIM (Leave Application In Memory) OFF and resume after 30 seconds, so that the job unloads from memory and resumes from its last checkpoint. 99.9999999% of the time it then finishes properly and faultlessly. BOINCview used to have a warning mechanism for low-progress jobs, and I think BOINCTasks has one too, but I have not toyed with it.
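
As a rough illustration of the "run time / % progress" check, here is a Python sketch that flags tasks whose ratio is far above what is typical for the rest. Everything in it is hypothetical: the `Task` record, the function names, the factor of 3 and the sample data are made up for this example, and it does not talk to the BOINC client or reproduce how BOINCview or BOINCTasks do it. You would have to feed it run times and progress values read from your own client.

```python
from statistics import median
from typing import NamedTuple


class Task(NamedTuple):
    name: str
    elapsed_hours: float   # run time so far
    progress_pct: float    # reported progress, 0-100


def hours_per_percent(task):
    """Run time spent per percent of progress; infinite if no progress is reported."""
    if task.progress_pct <= 0:
        return float("inf")
    return task.elapsed_hours / task.progress_pct


def flag_slow_tasks(tasks, factor=3.0):
    """Return names of tasks whose ratio exceeds `factor` times the median ratio.

    The factor of 3 is an arbitrary threshold chosen for this sketch.
    """
    ratios = [hours_per_percent(t) for t in tasks]
    finite = [r for r in ratios if r != float("inf")]
    if not finite:
        return []  # nothing to compare against
    typical = median(finite)
    return [t.name for t, r in zip(tasks, ratios) if r > factor * typical]


# Made-up sample data: wu_c has run 9 hours for only 12% progress.
tasks = [
    Task("wu_a", 2.0, 40.0),
    Task("wu_b", 3.0, 55.0),
    Task("wu_c", 9.0, 12.0),
    Task("wu_d", 1.0, 22.0),
]
print(flag_slow_tasks(tasks))
```

A task flagged this way is only a candidate for the suspend-and-resume treatment described above; a legitimately long job can trip the same threshold.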
----------------------------------------
[Edit 1 times, last edit by Sekerob at Mar 3, 2010 9:19:49 AM]
[Mar 3, 2010 9:17:26 AM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

"count the PV as valid, near guaranteed"

Excellent!!! La vita è bella (life is beautiful). ;)
----------------------------------------

[Mar 3, 2010 10:49:25 AM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

I am wondering whether we have ever seen such errors on Linux or Mac machines. Indeed, my question hints at my answer: I have the feeling that all these errors have been reported by Windows users (all flavors).

I have run about 50 CPU days of HPF2 WUs recently in order to get the emerald badge, most under Ubuntu 64 and the rest (<20%) under XP 32, and the only error I have seen was under XP 32 on the quad.

Could this be just another general Windows flaw, or a not-so-good library used only by Windows jobs?
----------------------------------------
Team--> Decrypthon -->Statistics/Join -->Thread
[Mar 4, 2010 3:25:52 PM]
Hypernova
Master Cruncher
Audaces Fortuna Juvat ! Vaud - Switzerland
Joined: Dec 16, 2008
Post Count: 1908
Status: Offline
Re: anyone else seeing these kinds of errors? I'm getting tons of them.

When I check across my machines, this is what I see:
All machines run Windows 7 64-bit. All motherboards are the same, and so is the memory. The CPUs differ: i7 920, 950 or 975EE. All CPUs are overclocked to similar levels.
Nearly all errors come from the i7 920 and 950. The i7 975EE runs at 3.6 GHz and generates one error per day. The 920 produces 26 errors/day and the 950 between 21 and 53 errors/day, depending on the machine.

So it is really impossible to see a specific pattern. It's a maddening story.
----------------------------------------

[Mar 4, 2010 6:20:48 PM]