World Community Grid Forums
Category: Completed Research | Forum: The Clean Energy Project - Phase 2 | Thread: Computation Error
Thread Status: Active Total posts in this thread: 8
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline Project Badges: |
My desktop keeps getting computation errors on the CEP2 WUs. Is anyone else seeing this, or does anyone have an idea what could be causing it?
Q6600 with 4 GB of RAM; 2x GTX 660 running Folding@home (each using one of the 4 Q6600 cores). The CPU core temperature has stayed below 60C during the computations, and there is no overclocking or overvolting on anything.
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3265 Status: Offline Project Badges: |
Hi,
Please post a result log from an errored work unit (go to the Results Status page and click "Error" next to a WU) and post the messages from your BOINC Manager.
AMD Ryzen 5 1600AF 4C/8T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
Intel Z3740 4C/4T 1.8 GHz - 6W
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline Project Badges: |
First:
Result Name: E214359_ 854_ C.35.C28H17N3S2Si2.02178101.0.set1d06_ 0--
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The pipe is being closed. (0xe8) - exit code 232 (0xe8)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
INFO: No state to restore. Start from the beginning.
[08:53:52] Number of jobs = 16
[08:53:52] Starting job 0,CPU time has been restored to 0.000000.
[08:59:23] Finished Job #0
[08:59:23] Starting job 1,CPU time has been restored to 313.172008.
[09:14:41] Finished Job #1
[09:14:41] Starting job 2,CPU time has been restored to 1202.393308.
[15:42:38] Finished Job #2
[15:42:38] Starting job 3,CPU time has been restored to 23685.491429.
[15:58:33] Finished Job #3
[15:58:33] Starting job 4,CPU time has been restored to 24609.578953.
[16:11:35] Finished Job #4
[16:11:35] Starting job 5,CPU time has been restored to 25366.683006.
[16:24:50] Finished Job #5
[16:24:50] Starting job 6,CPU time has been restored to 26139.480760.
[16:37:37] Finished Job #6
[16:37:37] Starting job 7,CPU time has been restored to 26884.385535.
[16:54:40] Finished Job #7
[16:54:40] Starting job 8,CPU time has been restored to 27844.992892.
[17:07:28] Finished Job #8
[17:07:28] Starting job 9,CPU time has been restored to 28590.802473.
[17:22:04] Finished Job #9
[17:22:04] Starting job 10,CPU time has been restored to 29440.524320.
[17:49:40] Finished Job #10
[17:49:40] Starting job 11,CPU time has been restored to 31059.050295.
[18:07:02] Finished Job #11
[18:07:02] Starting job 12,CPU time has been restored to 32074.991208.
[00:24:03] Number of jobs = 16
[00:24:03] Starting job 12,CPU time has been restored to 32074.991208.
Application exited with RC = 0xc0000005
[03:59:41] Finished Job #12
[03:59:41] Starting job 13,CPU time has been restored to 39343.483401.
[03:59:41] Skipping Job #13
[03:59:41] Starting job 14,CPU time has been restored to 39343.483401.
[03:59:41] Skipping Job #14
[03:59:41] Starting job 15,CPU time has been restored to 39343.483401.
[03:59:41] Skipping Job #15
Unable to open result file C.35.C28H17N3S2Si2.02178101.0.bp86.svp.n.bp86.svp.n.sp.out
</stderr_txt>
]]>

Second:
Result Name: E214357_ 512_ C.34.C28H17NOSSeSi2.01541912.3.set1d06_ 0--
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The pipe is being closed. (0xe8) - exit code 232 (0xe8)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[19:07:34] Number of jobs = 16
[19:07:34] Starting job 0,CPU time has been restored to 0.000000.
[19:12:44] Finished Job #0
[19:12:44] Starting job 1,CPU time has been restored to 294.280286.
[00:24:03] Number of jobs = 16
[00:24:03] Starting job 1,CPU time has been restored to 294.280286.
[04:17:33] Finished Job #1
[04:17:33] Starting job 2,CPU time has been restored to 1318.130449.
[10:56:11] Finished Job #2
[10:56:11] Starting job 3,CPU time has been restored to 24663.914101.
[11:13:53] Finished Job #3
[11:13:53] Starting job 4,CPU time has been restored to 25693.037098.
[11:27:43] Finished Job #4
[11:27:43] Starting job 5,CPU time has been restored to 26498.470261.
[11:41:44] Finished Job #5
[11:41:44] Starting job 6,CPU time has been restored to 27316.336703.
[11:55:27] Finished Job #6
[11:55:27] Starting job 7,CPU time has been restored to 28116.777834.
[12:12:58] Finished Job #7
[12:12:58] Starting job 8,CPU time has been restored to 29142.359608.
[12:26:39] Finished Job #8
[12:26:39] Starting job 9,CPU time has been restored to 29942.566738.
[12:42:12] Finished Job #9
[12:42:12] Starting job 10,CPU time has been restored to 30854.938586.
[13:11:50] Finished Job #10
[13:11:50] Starting job 11,CPU time has been restored to 32589.139303.
[13:30:07] Finished Job #11
[13:30:07] Starting job 12,CPU time has been restored to 33658.338957.
Killing job because cpu time has been exceeded.
Subjob start time = -658772391, Subjob current time = 1088450378
[16:13:39] Finished Job #12
Unable to open result file C.34.C28H17NOSSeSi2.01541912.3.noopt.bp86.sto6g.n.sp.out
</stderr_txt>
]]>

Third:
Result Name: E214359_ 864_ C.35.C28H17N3S2Si2.01932130.1.set1d06_ 0--
<core_client_version>7.0.64</core_client_version>
<![CDATA[
<message>
The pipe is being closed. (0xe8) - exit code 232 (0xe8)
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
INFO: No state to restore. Start from the beginning.
[08:53:52] Number of jobs = 16
[08:53:52] Starting job 0,CPU time has been restored to 0.000000.
[08:59:36] Finished Job #0
[08:59:36] Starting job 1,CPU time has been restored to 321.018858.
[09:16:18] Finished Job #1
[09:16:18] Starting job 2,CPU time has been restored to 1287.023850.
[16:20:02] Finished Job #2
[16:20:02] Starting job 3,CPU time has been restored to 25746.420640.
[16:37:53] Finished Job #3
[16:37:53] Starting job 4,CPU time has been restored to 26771.113209.
[16:51:38] Finished Job #4
[16:51:38] Starting job 5,CPU time has been restored to 27564.721496.
[17:06:21] Finished Job #5
[17:06:21] Starting job 6,CPU time has been restored to 28387.267968.
[17:20:21] Finished Job #6
[17:20:21] Starting job 7,CPU time has been restored to 29195.852352.
[17:38:11] Finished Job #7
[17:38:11] Starting job 8,CPU time has been restored to 30232.603797.
[17:51:50] Finished Job #8
[17:51:50] Starting job 9,CPU time has been restored to 31024.932876.
[18:07:28] Finished Job #9
[18:07:28] Starting job 10,CPU time has been restored to 31936.789922.
[18:37:22] Finished Job #10
[18:37:22] Starting job 11,CPU time has been restored to 33683.314717.
[18:56:22] Finished Job #11
[18:56:22] Starting job 12,CPU time has been restored to 34790.375814.
[00:24:03] Number of jobs = 16
[00:24:03] Starting job 12,CPU time has been restored to 34790.375814.
Killing job because cpu time has been exceeded.
Subjob start time = 111875308, Subjob current time = 1088486604
[04:19:32] Finished Job #12
Unable to open result file C.35.C28H17N3S2Si2.01932130.1.noopt.bp86.sto6g.n.sp.out
</stderr_txt>
]]>

(The second and third appear to have been "killed" because "cpu time has been exceeded", while the first shows "Application exited with RC = 0xc0000005". I did not know there was a log for this; thank you for pointing it out to me.)
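For anyone comparing several errored results, the failure line can be pulled out of each stderr excerpt rather than read by eye. The sketch below is only an illustration: the two patterns are copied from the logs above, while the function name `classify_stderr` and the category labels are made up for this example. 0xc0000005 is the Windows access-violation status code, so it is a different failure from the CPU-time kill.

```python
import re

# Patterns copied from the stderr excerpts in this thread.
# The labels on the left are illustrative names, not BOINC or CEP2 terms.
FAILURE_PATTERNS = {
    "access_violation": re.compile(r"Application exited with RC = 0xc0000005"),
    "cpu_time_exceeded": re.compile(r"Killing job because cpu time has been exceeded"),
}
JOB_START = re.compile(r"Starting job (\d+),")


def classify_stderr(stderr_text):
    """Return (label, job_number) for the first known failure line.

    job_number is the last job started before the failure line,
    or None if no job start precedes it.
    """
    for label, pattern in FAILURE_PATTERNS.items():
        match = pattern.search(stderr_text)
        if match:
            started = JOB_START.findall(stderr_text[:match.start()])
            job = int(started[-1]) if started else None
            return label, job
    return "unknown", None


if __name__ == "__main__":
    sample = (
        "[13:30:07] Starting job 12,CPU time has been restored to 33658.338957. "
        "Killing job because cpu time has been exceeded. "
        "[16:13:39] Finished Job #12"
    )
    print(classify_stderr(sample))
```

Run against the three results above, this kind of check would separate the first result (access violation in job 12) from the other two (CPU-time kill in job 12).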
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3265 Status: Offline Project Badges: |
Could you please post the messages from the BOINC Manager?
To do this, open BOINC, go to "View" --> "Advanced view" --> "Advanced" --> "Event Log", and copy the messages there (the first 40 or so). Are you by any chance using Linux?
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline Project Badges: |
I am unable to copy the messages at present, but I do notice several entries stating that "Output file E[...] for task E[...] absent".
I'm on Windows 7 Home Premium 64-bit.
Falconet
Master Cruncher Portugal Joined: Mar 9, 2009 Post Count: 3265 Status: Offline Project Badges: |
I'm not sure, but having two GPUs working plus 4 CEP2 work units all at the same time might be too much for your computer, particularly the hard drive. Try running fewer CEP2 work units, swapping some of them for another project like FightAIDS@Home (the only other project with work available at WCG), and see if the errors continue.
[Edit 1 times, last edit by Falconet at Jul 10, 2013 10:04:37 PM]
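One way to cap how many CEP2 tasks run at once is an app_config.xml file placed in the World Community Grid project folder; as far as I know, BOINC clients from 7.0.40 onward read this file, so the 7.0.64 client in the logs should qualify. The sketch below just builds the XML text. The application short name "cep2" is an assumption (check the name your own client reports), and the helper `build_app_config` is hypothetical.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name, max_concurrent):
    """Build the text of a minimal app_config.xml limiting one application.

    app_name is the project's short application name; "cep2" below is an
    assumption for The Clean Energy Project - Phase 2, not taken from this thread.
    """
    root = ET.Element("app_config")
    app = ET.SubElement(root, "app")
    ET.SubElement(app, "name").text = app_name
    ET.SubElement(app, "max_concurrent").text = str(max_concurrent)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Allow at most one CEP2 task at a time; other projects fill the remaining cores.
    print(build_app_config("cep2", 1))
```

The printed text would be saved as app_config.xml in the project's data directory, after which the client has to re-read its config files (or be restarted) for the limit to apply.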
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline Project Badges: |
I agree with your assessment. When I installed the two GPUs, I limited WCG to using only 2 cores, leaving the other 2 cores to feed the GPUs.
Mgruben
Advanced Cruncher Joined: May 26, 2013 Post Count: 94 Status: Offline Project Badges: |
I had to restart my computer and it cleared the log. My other suspicion is the BIOS clock: I had not set it properly since installing the second GPU (it thought it was sometime in 2006), and perhaps that had something to do with the errors thrown.
That, or my 4-core CPU and 4 GB of RAM can't handle 2 CEP2 WUs (1 core each, BOINC capped at 2 cores) alongside 2 P8900 Folding@home work units (1 core each, no F@H-native CPU-usage restrictions). Thank you for your help, by the way.
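The BIOS-clock suspicion is at least plausible in principle. How the CEP2 wrapper actually measures time is not shown in this thread, so the following is only a sketch of the general failure mode: if a time limit is checked by subtracting a stored wall-clock start from the current wall-clock time, correcting the system clock mid-task makes the elapsed time look enormous. The limit, the dates, and the function name are all hypothetical.

```python
from datetime import datetime

# Hypothetical limit for one sub-job; not a real CEP2 value.
LIMIT_SECONDS = 12 * 3600


def wall_clock_limit_exceeded(start, now, limit_seconds):
    """Return True if the wall-clock difference between start and now exceeds the limit."""
    return (now - start).total_seconds() > limit_seconds


if __name__ == "__main__":
    # Start recorded while the BIOS clock believed it was 2006 (date is illustrative).
    stale_start = datetime(2006, 1, 1, 18, 7, 2)
    # Check made after the clock was corrected to 2013.
    corrected_now = datetime(2013, 7, 10, 0, 24, 3)
    print(wall_clock_limit_exceeded(stale_start, corrected_now, LIMIT_SECONDS))

    # Same task with a consistent clock: roughly 6 hours elapsed, under the limit.
    consistent_start = datetime(2013, 7, 9, 18, 7, 2)
    print(wall_clock_limit_exceeded(consistent_start, corrected_now, LIMIT_SECONDS))
```

Timers based on a monotonic source (for example Python's `time.monotonic()`) are unaffected by system clock changes, which is why they are usually preferred for elapsed-time checks.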