World Community Grid Forums
Category: Completed Research Forum: The Clean Energy Project - Phase 2 Forum Thread: CEP2 Crashing |
Thread Status: Active Total posts in this thread: 25
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 11818 Status: Offline Project Badges: |
I have had the following happen twice in 2 days:
02/07/2013 19:11:17 | World Community Grid | Task E214114_340_C.36.C29H13N3S4.00446664.3.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Task E214164_855_C.35.C27H16N4S2Si2.01505869.1.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Task E214162_747_C.37.C28H12N6OS2.01183399.2.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Task E214164_770_C.34.C29H18S3Si2.01160521.1.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Task E214163_998_C.36.C29H14N4S3.01369567.1.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Task E214164_567_C.36.C27H14N6S2Si.01351458.2.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Task E214163_853_C.36.C30H14N2OS3.01541700.1.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Computation for task E214114_167_C.35.C26H13N5S2SeSi.00998821.2.set1d06_0 finished
02/07/2013 19:11:17 | World Community Grid | Restarting task E214114_340_C.36.C29H13N3S4.00446664.3.set1d06_0 using cep2 version 640 in slot 5
02/07/2013 19:11:48 | World Community Grid | Restarting task E214164_855_C.35.C27H16N4S2Si2.01505869.1.set1d06_0 using cep2 version 640 in slot 7
02/07/2013 19:11:48 | World Community Grid | Restarting task E214162_747_C.37.C28H12N6OS2.01183399.2.set1d06_0 using cep2 version 640 in slot 3
02/07/2013 19:11:48 | World Community Grid | Restarting task E214164_770_C.34.C29H18S3Si2.01160521.1.set1d06_0 using cep2 version 640 in slot 4
02/07/2013 19:11:48 | World Community Grid | Restarting task E214163_998_C.36.C29H14N4S3.01369567.1.set1d06_0 using cep2 version 640 in slot 1
02/07/2013 19:11:48 | World Community Grid | Restarting task E214164_567_C.36.C27H14N6S2Si.01351458.2.set1d06_0 using cep2 version 640 in slot 2
02/07/2013 19:11:48 | World Community Grid | Restarting task E214163_853_C.36.C30H14N2OS3.01541700.1.set1d06_0 using cep2 version 640 in slot 0
02/07/2013 19:11:48 | World Community Grid | Starting task E214164_221_C.36.C30H14N2OS3.01297820.1.set1d06_0 using cep2 version 640 in slot 6
On both occasions, more than half the work had been completed for most of the units when it happened. Other units have finished correctly, as did one in the listing above.
Does anyone have any suggestions to stop it, please?
I have an Intel i7-3770 CPU @ 3.40GHz with hyperthreading.
Mike |
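For anyone wanting to see how many tasks were hit in an event log like the one above, here is a minimal Python sketch. It assumes the log has been saved as plain text with one entry per line in the `date time | project | message` layout shown in the post; the function name and the sample text are only for illustration and are not part of BOINC.

```python
import re
from collections import Counter

# Layout assumed from the post: "DD/MM/YYYY HH:MM:SS | project | message".
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\s*\|\s*([^|]*?)\s*\|\s*(.*)$"
)
# Message text copied from the log in the post.
NO_FINISHED_RE = re.compile(
    r"^Task (\S+) exited with zero status but no 'finished' file$"
)


def tasks_without_finished_file(log_text):
    """Count, per task name, how often the 'no finished file' exit was logged."""
    counts = Counter()
    for raw in log_text.splitlines():
        entry = LINE_RE.match(raw.strip())
        if entry is None:
            continue  # not an event-log line, skip it
        hit = NO_FINISHED_RE.match(entry.group(3).strip())
        if hit:
            counts[hit.group(1)] += 1
    return counts


# Three lines taken from the log in the post.
SAMPLE = """\
02/07/2013 19:11:17 | World Community Grid | Task E214114_340_C.36.C29H13N3S4.00446664.3.set1d06_0 exited with zero status but no 'finished' file
02/07/2013 19:11:17 | World Community Grid | If this happens repeatedly you may need to reset the project.
02/07/2013 19:11:17 | World Community Grid | Computation for task E214114_167_C.35.C26H13N5S2SeSi.00998821.2.set1d06_0 finished
"""

for task, n in tasks_without_finished_file(SAMPLE).items():
    print(task, n)
```

Running it over the full log from both days would show whether the same tasks or the same slots are affected each time.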
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
That is a lot of CEP2 tasks to run simultaneously. What are your system specs, as shown during BOINC startup? Are your profile settings big enough to keep your system running? You can always check that by cutting your thread count in half.
Lawrence |
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 11818 Status: Offline Project Badges: |
Lawrence
02/07/2013 03:43:00 | | Processor: 8 GenuineIntel Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz [Family 6 Model 58 Stepping 9]
02/07/2013 03:43:00 | | Processor features: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss htt tm pni ssse3 cx16 sse4_1 sse4_2 popcnt aes syscall nx lm vmx smx tm2 pbe
02/07/2013 03:43:00 | | OS: Microsoft Windows 7: Professional x64 Edition, Service Pack 1, (06.01.7601.00)
02/07/2013 03:43:00 | | Memory: 3.81 GB physical, 7.62 GB virtual
02/07/2013 03:43:00 | | Disk: 916.37 GB total, 852.04 GB free
Thanks
Mike |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
You could try reading this
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Also look at the system requirements page at https://secure.worldcommunitygrid.org/help/viewTopic.do?shortName=minimumreq which asks for 1 GB RAM per CEP2 thread. As I suspected, your system has a lot less than 8 GB memory. CEP2 is very memory-hungry. Try reducing the number of CEP2 threads.
Lawrence |
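The arithmetic behind that advice can be sketched in a few lines of Python. This is only a rough illustration: the 1 GB per CEP2 thread figure comes from the requirements page cited above, while the function name and the optional `reserved_for_os_gb` parameter are assumptions made for the example.

```python
import math

# Guideline quoted in the thread: 1 GB RAM per CEP2 thread.
GB_PER_CEP2_THREAD = 1.0


def max_cep2_threads(physical_ram_gb, reserved_for_os_gb=0.0):
    """Rough upper bound on simultaneous CEP2 threads for a given amount of RAM.

    reserved_for_os_gb is a hypothetical allowance for the OS and other
    programs; it is not a figure from the requirements page.
    """
    usable_gb = max(physical_ram_gb - reserved_for_os_gb, 0.0)
    return math.floor(usable_gb / GB_PER_CEP2_THREAD)


# 3.81 GB physical memory, as reported in the BOINC startup log above.
print(max_cep2_threads(3.81))       # 3
print(max_cep2_threads(3.81, 1.0))  # 2
```

By this estimate the machine has room for two or three CEP2 tasks at a time, well short of the eight that were running.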
||
|
candid
Cruncher Joined: Apr 3, 2013 Post Count: 2 Status: Offline |
I have had several CEP2 tasks crash too.
But in my case the message I got is different:
Output file E214253_630_C.36.C31H16N2O2S.01246556.1.set1d06_0_0 for task E214253_630_C.36.C31H16N2O2S.01246556.1.set1d06_0 is absent.
I also have this Result Log:
Result Name: E214253_630_C.36.C31H16N2O2S.01246556.1.set1d06_0--
<core_client_version>7.0.27</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
[07:33:39] Number of jobs = 16
[07:33:39] Starting job 0,CPU time has been restored to 0.000000.
[07:34:13] Starting new Job
[07:34:14] Qink name = fldman
[07:34:15] Qink name = gesman
[07:34:15] Qink name = scfman
[07:57:02] Qink name = anlman
[07:58:34] End of Job
[07:58:35] Finished Job #0
[07:58:35] Starting job 1,CPU time has been restored to 1233.709102.
[07:58:40] Starting new Job
[07:58:40] Qink name = fldman
[07:59:24] Qink name = gesman
[07:59:26] Qink name = scfman
[09:23:52] Qink name = anlman
</stderr_txt>
]]>
I am running two WUs simultaneously, and have 2 GB RAM and 3 GB of hard drive space available for BOINC. Could someone help?
Thanks
[Edit 1 times, last edit by candid at Jul 3, 2013 11:50:26 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
With each WU requiring 1 GB of RAM and you only having 2 GB, it may be cutting it fine. Either give more RAM to BOINC or run only one CEP2 task and see what happens.
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I also found this
[Edit 1 times, last edit by Former Member at Jul 3, 2013 12:08:03 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello candid,
This does not ring any bells with me. Here is a typical SIGNAL 11 error: https://secure.worldcommunitygrid.org/forums/...ead,30167_offset,0#300160
The FAQ at http://boincfaq.mundayweb.com/index.php?view=165 shows that signal 11 (a segmentation fault) can have many different causes.
I am interested in why you only have 3 GB of hard disk available for BOINC. The system requirements at https://secure.worldcommunitygrid.org/help/viewTopic.do?shortName=minimumreq ask for 2 GB RAM and 4 GB hard disk for 2 threads of CEP2. These requirements are generally much bigger than what is actually needed, but if you start going below them you should carefully check your profile settings to make sure that everything is at 100% and not artificially lowering already small allotments of memory and hard disk.
Lawrence |
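To see how a profile percentage below 100% eats into a small allotment, here is a short Python sketch. It assumes the guideline scales linearly, so 2 GB RAM and 4 GB disk for 2 threads becomes 1 GB RAM and 2 GB disk per thread; the function and parameter names are hypothetical and do not correspond to actual BOINC or World Community Grid settings.

```python
# Per-thread figures derived from the guideline quoted above
# (2 GB RAM and 4 GB disk for 2 CEP2 threads), assuming linear scaling.
RAM_GB_PER_THREAD = 1.0
DISK_GB_PER_THREAD = 2.0


def effective_allotment(total_gb, profile_percent):
    """Amount actually usable once a profile percentage cap is applied."""
    return total_gb * profile_percent / 100.0


def meets_cep2_guideline(ram_gb, disk_gb, threads,
                         ram_percent=100.0, disk_percent=100.0):
    """Return (ram_ok, disk_ok) for the given number of CEP2 threads."""
    ram_ok = effective_allotment(ram_gb, ram_percent) >= RAM_GB_PER_THREAD * threads
    disk_ok = effective_allotment(disk_gb, disk_percent) >= DISK_GB_PER_THREAD * threads
    return ram_ok, disk_ok


# candid's figures from this thread: 2 GB RAM, 3 GB disk for BOINC.
print(meets_cep2_guideline(2.0, 3.0, threads=2))
print(meets_cep2_guideline(2.0, 3.0, threads=1))
print(meets_cep2_guideline(2.0, 3.0, threads=1, ram_percent=35.0))
```

The third call illustrates the point about profile settings: a memory cap well below 100% can push even a single thread under the guideline.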
||
|
candid
Cruncher Joined: Apr 3, 2013 Post Count: 2 Status: Offline |
Many thanks everyone for the quick replies.
I am in fact running low on HD space, but in my experience the 2 GB per WU was more than necessary on my i7 with 8 GB RAM. In this case, though, I am running an Atom from a USB stick with no real HD, which is why I have only 3 GB of 'HD' space for BOINC. The software I used (LiLi USB Creator) does not allow more than 4 GB, and some has to be left for the system (Ubuntu 12).
I am now going to try running only one CEP2 task alongside some other type of WU (faah). I lowered the usage limit to 35%. I always check the option to leave applications in memory while suspended.
While writing this I got an error message from the OS: compiz failed. The OS is asking me whether to continue or relaunch. I remember seeing this before, and I never relaunched. Could this be (part of) the problem? Other WUs such as faah and dsl did not fail because of this compiz failure.
Thanks
[Edit 2 times, last edit by candid at Jul 3, 2013 12:47:07 PM] |
||
|
|