Thread Status: Active
Total posts in this thread: 18
Posts: 18   Pages: 2   [ Previous Page | 1 2 ]
This topic has been viewed 1805 times and has 17 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in trouble - RC = 0xc0000005

There are 16 jobs per molecule. If 3 or more jobs complete correctly, the result is useful. Once the algorithm fails, the task ends and returns the results of the jobs that did complete.
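The job structure described above can be sketched as a loop: up to 16 jobs run in order, the task stops at the first failure and reports whatever finished. The 16-job count and the threshold of 3 come from the post; `run_task`, `TaskReport` and `run_job` are hypothetical names, and this is not how the WCG validator actually decides between Valid and Error (which is exactly the open question in this thread).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

JOBS_PER_MOLECULE = 16  # from the post: 16 jobs per molecule
USEFUL_THRESHOLD = 3    # from the post: 3 or more correct jobs is useful


@dataclass
class TaskReport:
    completed_jobs: List[int] = field(default_factory=list)
    failed_at: Optional[int] = None  # number of the job that failed, if any

    @property
    def useful(self) -> bool:
        return len(self.completed_jobs) >= USEFUL_THRESHOLD


def run_task(run_job: Callable[[int], bool]) -> TaskReport:
    """Run jobs 1..16 in order, stop at the first failure, keep what finished.

    `run_job` is a hypothetical stand-in for one job of the science app:
    it returns True on success and False on failure.
    """
    report = TaskReport()
    for job in range(1, JOBS_PER_MOLECULE + 1):
        if not run_job(job):
            report.failed_at = job
            break
        report.completed_jobs.append(job)
    return report


# Example: job 13 fails, so jobs 1-12 are returned.
report = run_task(lambda job: job < 13)
print(len(report.completed_jobs), report.failed_at, report.useful)  # 12 13 True
```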
[Aug 17, 2013 11:38:57 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in trouble - RC = 0xc0000005

Hi Lawrence,

Thanks for the post and I understand what you say. However, I fail to understand the circumstances that dictate when the same error in the same step returns valid in one case but error in another. There must be something else I don't see. (Or, at least, I hope there is ...).
[Aug 18, 2013 12:12:10 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: WU in trouble - RC = 0xc0000005

Wingman _9 of the WU that I reported above (E214922_872_A.35.C29H17NS3SeSi.38.4.set1d06) has done a similar trick.
He is Pending Validation despite getting RC = 0xc0000005 after job 12, while I got Error with the same job log, as far as I can see.
I agree with Apis Tintinnambulator - what's going on?
----------------------------------------
[Edit 1 times, last edit by Rickjb at Aug 20, 2013 4:26:26 AM]
[Aug 19, 2013 5:41:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in trouble - RC = 0xc0000005

I just did a search on 0xc0000005 and found a 2011 post by cleanenergy who is also puzzled by this problem: https://secure.worldcommunitygrid.org/forums/...ead,30835_offset,0#315941

Unless the answer has been found in the last 2 years, this is an official mystery with no official answer.

confused
Lawrence
[Aug 19, 2013 8:39:34 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in trouble - RC = 0xc0000005

Hi Lawrence,

Again, thanks for looking into this.

I rather assumed that there must be some information in the returned files that is not reflected in what we see in the logs and it was just an annoying "feechure". But other than to say that I hope it will be made clearer in future science apps, I'm content enough just to live with a minor annoyance in this one (so long as its frequency doesn't increase). If many others report it, however, I think it would justify a little closer attention from the techs, even if only to keep us crunchers happy: I for one certainly like to know why my machines are producing bad results.
[Aug 19, 2013 9:36:52 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in trouble - RC = 0xc0000005

I just had another look and I see that E213621_137_A.33.C23H11N3S4Se2Si.93.1.set1d06 now has one "no reply", SEVEN "error"s, and still two "in progress". Surely the system should have given up on it by now?
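BOINC's server-side workunit records carry limits named `max_error_results` and `max_total_results`; once too many copies have errored out, the workunit is marked as failed and no further copies are sent. The sketch below mimics that idea with made-up limit values (the settings WCG uses for CEP2 are not given in this thread), and the exact comparisons are an assumption rather than BOINC's actual transitioner logic.

```python
from collections import Counter
from typing import Iterable

# Hypothetical limits; the real per-workunit values used by WCG are not known here.
MAX_ERROR_RESULTS = 6
MAX_TOTAL_RESULTS = 10


def should_give_up(statuses: Iterable[str],
                   max_errors: int = MAX_ERROR_RESULTS,
                   max_total: int = MAX_TOTAL_RESULTS) -> bool:
    """Return True if no further repair copies should be sent (illustrative rule)."""
    counts = Counter(s.lower() for s in statuses)
    total = sum(counts.values())
    return counts["error"] > max_errors or total >= max_total


# The workunit described above: 1 "no reply", 7 "error", 2 "in progress".
wu = ["no reply"] + ["error"] * 7 + ["in progress"] * 2
print(should_give_up(wu))  # True under these made-up limits
```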
[Aug 19, 2013 9:43:01 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: WU in trouble - RC = 0xc0000005

I just had a look at the Results Status here @ WCG.org for 2 CEP2 WUs in my cache on this device that have short Deadlines and are still Ready to Start.
They have both returned Error status to all other wingmen.
Most got RC = 0x100, a few got RC = 0x01, but none got RC = 0xc0000005.
There were a few quirky ones too.
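For reference, 0xc0000005 is the Windows NTSTATUS code STATUS_ACCESS_VIOLATION. A minimal sketch that splits a return code into the standard NTSTATUS fields (2-bit severity, customer bit, 12-bit facility, 16-bit code); the lookup table deliberately holds only that one name. Small values such as 0x01 and 0x100 decode with "success" severity, which most likely just means they are ordinary process exit codes from the app rather than NTSTATUS values, so their meaning is app-specific and not decoded here.

```python
# Only the one code this thread is about; other names live in the Windows SDK's ntstatus.h.
KNOWN_NTSTATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}

SEVERITY = {0: "success", 1: "informational", 2: "warning", 3: "error"}


def decode_ntstatus(code: int) -> dict:
    """Split a 32-bit value into NTSTATUS fields."""
    code &= 0xFFFFFFFF
    return {
        "severity": SEVERITY[code >> 30],          # bits 30-31
        "customer": bool((code >> 29) & 1),        # bit 29
        "facility": (code >> 16) & 0x0FFF,         # bits 16-27
        "code": code & 0xFFFF,                     # bits 0-15
        "name": KNOWN_NTSTATUS.get(code, "unknown"),
    }


print(decode_ntstatus(0xC0000005))
print(decode_ntstatus(0x100)["severity"])  # success
```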

E214649_089_C.35.C31H16S3Se.00985909.1.set1d06_4--
E214903_644_C.36.C33H20N2S.00932953.3.set1d06_2--

I intend to periodically go through all of "my" CEP2 WUs that haven't yet been returned and abort any for which 3 or 4 (or more) wingmen have already returned an Error. That may* advance them one step closer to the system ceasing to send out repair copies, without wasting my CPU time ...
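The abort rule above can be sketched as a filter over the cache. The input format (a map from task name to the statuses of the other copies, as read off the Results Status page), the task names, and `tasks_to_abort` are all hypothetical, and whether aborting actually helps depends on how the server counts "Aborted by User", which the post itself leaves open.

```python
from typing import Dict, List

ABORT_THRESHOLD = 3  # "3 or 4 (or more)" wingmen with Error, per the post


def tasks_to_abort(wingmen: Dict[str, List[str]],
                   threshold: int = ABORT_THRESHOLD) -> List[str]:
    """Return unstarted tasks whose wingmen already reported `threshold` or more Errors.

    `wingmen` maps a task name to the statuses of the *other* copies of the
    same workunit (hypothetical input format).
    """
    return [task for task, statuses in wingmen.items()
            if sum(s == "Error" for s in statuses) >= threshold]


# Hypothetical cache contents, not real WUs.
cache = {
    "task_A": ["Error", "Error", "Error", "Error"],
    "task_B": ["Error", "Valid", "Pending Validation"],
}
print(tasks_to_abort(cache))  # ['task_A']
```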

There must be something indeterminate in the science program's data space that is causing different results on different machines with the same input files.
It could depend on the CPU hardware, and the BOINC system may or may not return enough CPU details to the server to discern this.
Or, if the CEP2 program does not explicitly initialise one or more of its variables, it could be reading whatever "random" data was in the RAM before that chunk of memory was allocated to the CEP2 process.
Or the program may explicitly generate and use random numbers, e.g. for Monte Carlo simulations, and these sometimes send things out of bounds.
Or, ...
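To illustrate the random-number possibility: a hypothetical Monte Carlo step that fails when a random walk leaves a bound always gives the same answer for the same seed, but runs seeded from different sources can diverge on identical input. The step count, bound, and `monte_carlo_step` are invented for illustration; nothing here is taken from the CEP2/Q-Chem code.

```python
import random
from typing import Optional


def monte_carlo_step(seed: Optional[int], steps: int = 1000, bound: float = 40.0) -> bool:
    """Return True if a Gaussian random walk stays within +/-bound for all steps.

    seed=None seeds from OS entropy, so repeated calls can disagree even though
    the 'input' (steps, bound) is identical.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        x += rng.gauss(0.0, 1.0)
        if abs(x) > bound:
            return False  # "out of bounds": the job would fail here
    return True


# Same seed, same outcome every time.
print(monte_carlo_step(42) == monte_carlo_step(42))  # True
# Different seeds: both outcomes typically show up.
print({monte_carlo_step(s) for s in range(50)})
```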

Roll on the rumoured * sparkling all-new improved * version of the Q-CHEM/CEP2 program!
(Any ideas on its time of arrival?)

[Edit:] * Does "Aborted by User" count towards the maximum number of repair copies of a WU that are sent out? [/Edit]
----------------------------------------
[Edit 1 times, last edit by Rickjb at Aug 20, 2013 10:57:31 AM]
[Aug 20, 2013 5:08:00 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: WU in trouble - RC = 0xc0000005

Quote:
Hi Lawrence,

Thanks for the post and I understand what you say. However, I fail to understand the circumstances that dictate when the same error in the same step returns valid in one case but error in another. There must be something else I don't see. (Or, at least, I hope there is ...).


Might have just been recoverable in one instance and fatal in the other. Windows is throwing that exception (0xc0000005 is an access violation); maybe it's rethrown during setup and swallowed (but logged) during teardown. Who knows...
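The speculation above can be sketched: the same access-violation-style error ends the task when it is raised during setup, but is only logged when it is caught during teardown, so identical job logs could end in different outcomes. The phase names, the `AccessViolation` class, and the "Error"/"Success" strings are hypothetical; this is not how the CEP2 app or BOINC actually reports results.

```python
import logging
from typing import Callable

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("task")


class AccessViolation(Exception):
    """Hypothetical stand-in for the Windows 0xc0000005 exception."""


def run_phases(setup: Callable[[], None], work: Callable[[], None],
               teardown: Callable[[], None]) -> str:
    try:
        setup()
        work()
    except AccessViolation:
        return "Error"  # raised before results exist: the task fails
    try:
        teardown()
    except AccessViolation as exc:
        # Swallowed but logged: the message shows up in the log,
        # yet the task still returns its results.
        log.warning("ignored during teardown: %s", exc)
    return "Success"


def crash() -> None:
    raise AccessViolation("RC = 0xc0000005")


def noop() -> None:
    pass


print(run_phases(crash, noop, noop))  # Error
print(run_phases(noop, noop, crash))  # Success (plus a logged warning)
```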
[Aug 20, 2013 6:08:49 AM]