World Community Grid Forums
Total posts in this thread: 5
Rickjb
Veteran Cruncher, Australia. Joined: Sep 17, 2006. Post Count: 666
It's known that, in the project's current state, there are occasionally bad workunits (WUs).
Some people have started threads about particular bad WUs, but maybe we could put reports of all our new bad or strange WUs in one place. The most recent example of a thread about an individual WU: "issue with dg05_c083_pqa001 ???". If this turns out to be useful, perhaps a forum mod will sticky the thread to stop it scrolling down and out of easy reach. HTH
Rickjb
Veteran Cruncher, Australia. Joined: Sep 17, 2006. Post Count: 666
Here is my report of a strange WU: dg05_d145_pr67a1.
This one is unusual because the first 4 copies issued all failed after about 9 minutes, and then the last 2 copies ran OK and validated. The result logs are all short, as normal, and make no mention of failure:

> <stderr_txt>
> INFO: No state to restore. Start from the beginning.
> called boinc_finish
> </stderr_txt>

I expect that the techs occasionally check for WUs that fail for all quorum members, but they might not detect ones like this.

Result Name | App Version | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
--- | --- | --- | --- | --- | --- | ---
dg05_d145_pr67a1_5-- | 640 | Valid | 26/06/12 15:36:21 | 27/06/12 18:36:07 | 4.33 | 66.7 / 69.7
dg05_d145_pr67a1_4-- | 640 | Valid | 26/06/12 15:36:18 | 27/06/12 02:43:26 | 2.94 | 72.7 / 69.7
dg05_d145_pr67a1_2-- | 640 | Error | 25/06/12 23:56:39 | 26/06/12 12:13:31 | 0.15 | 3.5 / 0.0
dg05_d145_pr67a1_3-- | 640 | Error | 25/06/12 23:56:35 | 26/06/12 15:36:01 | 0.15 | 4.5 / 0.0
dg05_d145_pr67a1_1-- (mine) | 640 | Error | 25/06/12 04:56:39 | 25/06/12 21:48:45 | 0.13 | 3.9 / 0.0
dg05_d145_pr67a1_0-- | 640 | Error | 25/06/12 04:56:37 | 25/06/12 23:56:12 | 0.13 | 3.3 / 0.0

I had one other bad WU around the same time, dg05_c037_pr02a1, but in that case all 6 copies got Error status. Again the failures came within the first 6 to 12 minutes, and the result logs show nothing. This suggests that the same error condition occurred in both WUs.
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0
Was reading an article yesterday about the dark rims that *cloud* has, and would not be surprised if it had something to do with this. ;O)
--//--
EZ123
Cruncher, USA. Joined: Nov 23, 2007. Post Count: 10
Had a similar WU show up. The results may still be available at https://secure.worldcommunitygrid.org/ms/devi...s.do?workunitId=480981349. Of 8 attempts, 5 errored out (including mine), one was invalid, and two were valid. The valid results received an unusually high number of points, perhaps suggesting that a powerful computer was needed to run them?
Project Name: Discovering Dengue Drugs - Together - Phase 2
Created: 06/25/2012 14:09:06
Name: dg05_d460_pdb000
Minimum Quorum: 2
Replication: 3

Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed / Granted BOINC Credit
--- | --- | --- | --- | --- | --- | ---
dg05_d460_pdb000_7-- | 640 | Valid | 6/28/12 01:16:20 | 6/28/12 19:24:19 | 2.00 | 69.6 / 70.1
dg05_d460_pdb000_6-- | 640 | Error | 6/28/12 01:14:29 | 6/28/12 01:16:10 | 0.00 | 0.0 / 0.0
dg05_d460_pdb000_5-- | 640 | Error | 6/28/12 01:12:46 | 6/28/12 01:14:26 | 0.00 | 0.0 / 0.0
dg05_d460_pdb000_4-- | 640 | Error | 6/28/12 01:11:06 | 6/28/12 01:12:38 | 0.00 | 0.0 / 0.0
dg05_d460_pdb000_3-- | 640 | Valid | 6/26/12 10:21:25 | 6/26/12 18:09:03 | 2.79 | 70.7 / 70.1
dg05_d460_pdb000_2-- | 640 | Error | 6/26/12 10:19:45 | 6/26/12 10:21:16 | 0.00 | 0.0 / 0.0
dg05_d460_pdb000_1-- | 640 | Error | 6/26/12 10:18:01 | 6/26/12 10:19:36 | 0.00 | 51.7 / 0.0
dg05_d460_pdb000_0-- | 640 | Invalid | 6/26/12 10:17:57 | 6/27/12 21:15:43 | 1.15 | 39.4 / 35.1
Former Member
Cruncher. Joined: May 22, 2018. Post Count: 0
I captured the states of each replication of one WU I noticed in my Results Status...
This is not the only one I noticed like this; I seriously wonder how the scientists get any useful data out of these results. To wit:

[screenshot of the replication states, not preserved]

Note that the first copy was marked Valid, yet it exited with SIGABRT. The next 2 both exited with:

> Calling gridPlatform.init()
> INFO: No state to restore. Start from the beginning.
> called boinc_finish

yet one was given the status Invalid and the other Valid. What??? I have more like that in my Results Status, if you want or need them.