World Community Grid Forums
Thread Status: Active Total posts in this thread: 254
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Assuming it's a 24/7, 100% CPU time machine, estimates occasionally go wonky, but BOINC adjusts for that over time by requesting less work (no work, actually), so it will get back to buffering 0.5 days per core as your comp crunches on. --//--

Thanks for the replies. It has been this way for over a week. I'll see if it starts getting better soon. I gave it a few days without requesting new work. I just set it back to get 0.5 days again, and it immediately took 100+ WUs before I told it to stop. It just seems way off.

Let me know if my math is reasonable: 0.5 days * 12 'cores' (after hyperthreading) = 6 days of work; 6 days * 24 = 144 hours of cache requested. My setting takes just under 5 hours per WU quite consistently. 144 / 5 = ~29 WUs. Why does it keep taking several times this amount? I have been running just c4sw for about 2-3 weeks now.

ETA: The 5 hour run-time is consistent, and the expected run-time is also consistently correct, so that is not the problem.

[Edit 1 times, last edit by Former Member at May 11, 2012 7:14:18 PM] |
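As a quick sanity check of the arithmetic in this post, here is a minimal Python sketch. The function name `expected_work_units` is made up for illustration, and it deliberately ignores anything the BOINC client itself factors into a work request, so treat it as a back-of-envelope check and not as how the client computes its request.

```python
def expected_work_units(buffer_days, logical_cores, hours_per_wu):
    """Back-of-envelope cache size: core-hours requested and WUs needed to fill it."""
    cache_hours = buffer_days * logical_cores * 24  # days per core -> total core-hours
    return cache_hours, cache_hours / hours_per_wu


# Values taken from the post: 0.5 days, 12 logical cores, ~5 hours per WU.
cache_hours, wus = expected_work_units(0.5, 12, 5.0)
print(f"{cache_hours:.0f} core-hours of cache, about {wus:.0f} WUs")
```

With the numbers from the post this agrees with the poster's own estimate, so 100+ WUs is indeed several times what a 0.5 day buffer should need on this reasoning.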
||
|
astroWX
Advanced Cruncher USA Joined: Sep 1, 2007 Post Count: 56 Status: Offline |
Re: Task cfsw_1933_01933020_0_0
Task completed and has been "Uploading" with "0" progress for 189+ hours. Any suggestions as to how I can dislodge this critter (short of dynamite) from its love affair with the Transfers queue? (Other tasks come and go normally.) Mercy killing, perhaps? |
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
Suspend networking and re-enable it. This gets the transfer to start up again and often unclogs the drain for me.
----------------------------------------
Distributed computing volunteer since September 27, 2000 |
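The same toggle can be scripted instead of clicked in BOINC Manager. This is a minimal sketch, assuming the `boinccmd` command-line tool that ships with BOINC is on the PATH and is allowed to talk to the local client; the helper names and the 10 second pause are illustrative choices, not anything from the post.

```python
import subprocess
import time


def network_toggle_commands(boinccmd="boinccmd"):
    """Return the two boinccmd calls: networking off, then back to preferences."""
    return [
        [boinccmd, "--set_network_mode", "never"],  # suspend all network activity
        [boinccmd, "--set_network_mode", "auto"],   # resume according to preferences
    ]


def toggle_network(pause_seconds=10, run=subprocess.run):
    """Suspend networking, wait briefly, then re-enable it."""
    suspend, resume = network_toggle_commands()
    run(suspend, check=True)
    time.sleep(pause_seconds)  # pause length is an arbitrary choice
    run(resume, check=True)
```

Calling `toggle_network()` runs both commands against the local client. Note that it restores the "auto" mode, so a machine that was set to "always" beforehand would need that put back by hand.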
||
|
astroWX
Advanced Cruncher USA Joined: Sep 1, 2007 Post Count: 56 Status: Offline |
Suspend networking and re-enable it. This gets the transfer to start up again and often unclogs the drain for me.

Thanks for the reply; appreciated. That was done a few times and there was also a reboot. No joy. (The Task times out on the 30th; if it hasn't uploaded by then, I'll pull the plug on CFSW and return to Clean Water and Clean Energy, which are closer to my heart anyway.)

From client_state -- name vs. simulation?

<result>
    <name>cfsw_1933_01933020_0</name>
    <final_cpu_time>12405.390000</final_cpu_time>
    <exit_status>0</exit_status>
    <state>4</state>
    <platform>windows_intelx86</platform>
    <version_num>605</version_num>
    <stderr_out>
<![CDATA[
<stderr_txt>
[12:14:52] INFO:Beginning simulation: 1990:240:128601652
[12:20:31] INFO: Finished tick number 4
[12:24:53] INFO: Finished tick number 9
[12:28:40] INFO: Finished tick number 14
[12:33:29] INFO: Finished tick number 19
[12:37:07] INFO: Finished tick number 24
[12:41:53] INFO: Finished tick number 29
[12:46:00] INFO: Finished tick number 34
[12:50:10] INFO: Finished tick number 39
[12:54:48] INFO: Finished tick number 44
[12:58:18] INFO: Finished tick number 49
[13:03:09] INFO: Finished tick number 54
[13:07:01] INFO: Finished tick number 59
[13:11:41] INFO: Finished tick number 64
[13:16:08] INFO: Finished tick number 69
[13:20:01] INFO: Finished tick number 74
[13:24:58] INFO: Finished tick number 79
[13:28:42] INFO: Finished tick number 84
[13:33:35] INFO: Finished tick number 89
[13:37:50] INFO: Finished tick number 94
[13:42:06] INFO: Finished tick number 99
[13:46:52] INFO: Finished tick number 104
[13:50:30] INFO: Finished tick number 109
[13:55:30] INFO: Finished tick number 114
[13:59:29] INFO: Finished tick number 119
[14:04:11] INFO: Finished tick number 124
[14:08:38] INFO: Finished tick number 129
[14:12:31] INFO: Finished tick number 134
[14:17:23] INFO: Finished tick number 139
[14:21:04] INFO: Finished tick number 144
[14:25:49] INFO: Finished tick number 149
[14:29:57] INFO: Finished tick number 154
[14:34:06] INFO: Finished tick number 159
[14:38:46] INFO: Finished tick number 164
[14:42:17] INFO: Finished tick number 169
[14:47:13] INFO: Finished tick number 174
[14:51:13] INFO: Finished tick number 179
[14:55:54] INFO: Finished tick number 184
[15:00:22] INFO: Finished tick number 189
[15:04:14] INFO: Finished tick number 194
[15:09:07] INFO: Finished tick number 199
[15:12:44] INFO: Finished tick number 204
[15:17:26] INFO: Finished tick number 209
[15:21:32] INFO: Finished tick number 214
[15:25:40] INFO: Finished tick number 219
[15:30:17] INFO: Finished tick number 224
[15:33:47] INFO: Finished tick number 229
[15:38:37] INFO: Finished tick number 234
[15:42:28] INFO: Finished tick number 239
15:42:28 (2952): called boinc_finish
</stderr_txt>
]]>
    </stderr_out>
    <wu_name>cfsw_1933_01933020</wu_name>
    <report_deadline>1338382657.000000</report_deadline>
    <file_ref>
        <file_name>cfsw_1933_01933020_0_0</file_name>
        <open_name>result.out</open_name>
    </file_ref>
</result> |
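The `<report_deadline>` in the entry above can be turned into a readable date to confirm the "times out on the 30th" remark. This is a minimal sketch, assuming the value is seconds since the Unix epoch; the helper name `report_deadline_utc` is hypothetical and the XML is trimmed to the two fields needed.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Trimmed copy of the <result> entry quoted above.
RESULT_XML = """
<result>
    <name>cfsw_1933_01933020_0</name>
    <report_deadline>1338382657.000000</report_deadline>
</result>
"""


def report_deadline_utc(result_xml):
    """Read <report_deadline> and return it as a timezone-aware UTC datetime."""
    root = ET.fromstring(result_xml.strip())
    seconds = float(root.findtext("report_deadline"))
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


deadline = report_deadline_utc(RESULT_XML)
print(deadline.isoformat())
```

The result falls on May 30, 2012 (UTC), which matches the deadline mentioned in the post.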
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Assuming it's a 24/7, 100% CPU time machine, estimates occasionally go wonky, but BOINC adjusts for that over time by requesting less work (no work, actually), so it will get back to buffering 0.5 days per core as your comp crunches on. --//--

Thanks for the replies. It has been this way for over a week. I'll see if it starts getting better soon. I gave it a few days without requesting new work. I just set it back to get 0.5 days again, and it immediately took 100+ WUs before I told it to stop. It just seems way off.

Let me know if my math is reasonable: 0.5 days * 12 'cores' (after hyperthreading) = 6 days of work; 6 days * 24 = 144 hours of cache requested. My setting takes just under 5 hours per WU quite consistently. 144 / 5 = ~29 WUs. Why does it keep taking several times this amount? I have been running just c4sw for about 2-3 weeks now.

ETA: The 5 hour run-time is consistent, and the expected run-time is also consistently correct, so that is not the problem.

I missed it in the prior posts and forgot to ask: what client version is/was this with? --//-- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
I successfully ran a repair job overnight where the erroring job gave this:

Result Name: cfsw_ 2791_ 02791650_ 1--
<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
too many normally harmless exit(s)
</message>
]]>

I don't recall seeing that error before, and I thought it was quite amusing. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
This is covered in the FAQs under the "too many...". It indicates there were 100 of these "zero status..." exits on the task, i.e. 100 reverts to the prior checkpoint. Nothing new there: the usual cause is an obstruction of some form, or a system too busy to let boinc.exe communicate with the science app and back at least once per 30 seconds on a continuous basis.
--//-- |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Assuming it's a 24/7, 100% CPU time machine, estimates occasionally go wonky, but BOINC adjusts for that over time by requesting less work (no work, actually), so it will get back to buffering 0.5 days per core as your comp crunches on. --//--

Thanks for the replies. It has been this way for over a week. I'll see if it starts getting better soon. I gave it a few days without requesting new work. I just set it back to get 0.5 days again, and it immediately took 100+ WUs before I told it to stop. It just seems way off.

Let me know if my math is reasonable: 0.5 days * 12 'cores' (after hyperthreading) = 6 days of work; 6 days * 24 = 144 hours of cache requested. My setting takes just under 5 hours per WU quite consistently. 144 / 5 = ~29 WUs. Why does it keep taking several times this amount? I have been running just c4sw for about 2-3 weeks now.

ETA: The 5 hour run-time is consistent, and the expected run-time is also consistently correct, so that is not the problem.

I missed it in the prior posts and forgot to ask: what client version is/was this with? --//--

Thanks again for replying. I am using 6.10.58 downloaded from here, on win7x64. The problem is persisting, and seems fairly consistent. I used to run with a two day cache, but I have been needing to pull the reins in tight. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
1) Post the following lines from the client_state.xml file in the BOINC data dir.

A) Near the top:

<p_fpops>2617901756.945803</p_fpops>
<p_iops>9478292194.844500</p_iops>

and the section:

<time_stats>
    <on_frac>0.880827</on_frac>
    <connected_frac>0.712379</connected_frac>
    <active_frac>0.999630</active_frac>
    <gpu_active_frac>0.999630</gpu_active_frac>
    <last_update>1338816175.107945</last_update>
</time_stats>

Under normal 24/7 operation, each of these fractions would have a value near 1.000000.

From the same file, way down after the first line mentioning World Community Grid, the entry that looks like the one below (about 15 lines below the <project> line):

<duration_correction_factor>1.148989</duration_correction_factor>

A normally running device would have a value in the range of 0.500000 to 2.000000; it varies due to the non-constant run times of the various WCG projects.

2) Do a benchmark via the BOINC Manager advanced menu, then post the benchmark values from the message/event log (Whetstone/Dhrystone).

The sum of this info may tell us a bit more. 5 hours for a CFSW is about the time my laptop takes, and that benchmarks at about 2600 / 9400.

TTYL
--//-- |
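The checks described in this post can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the XML is a cut-down stand-in for client_state.xml (a `<client_state>` root with `<time_stats>` and one `<project>` block), the function name `check_client_state` is hypothetical, and the 0.95 floor is an arbitrary reading of "near 1.000000". Only the 0.5 to 2.0 range for the duration correction factor comes from the post.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for client_state.xml, using the example values from the post.
SAMPLE = """
<client_state>
  <time_stats>
    <on_frac>0.880827</on_frac>
    <connected_frac>0.712379</connected_frac>
    <active_frac>0.999630</active_frac>
  </time_stats>
  <project>
    <duration_correction_factor>1.148989</duration_correction_factor>
  </project>
</client_state>
"""


def check_client_state(xml_text, frac_floor=0.95):
    """Return notes for any value outside the ranges described in the post."""
    root = ET.fromstring(xml_text.strip())
    notes = []
    for tag in ("on_frac", "connected_frac", "active_frac"):
        value = float(root.findtext(f"time_stats/{tag}"))
        if value < frac_floor:  # floor is an assumed threshold, not a BOINC constant
            notes.append(f"{tag}={value:.6f} is below {frac_floor}")
    dcf = float(root.findtext("project/duration_correction_factor"))
    if not 0.5 <= dcf <= 2.0:  # range quoted in the post
        notes.append(f"duration_correction_factor={dcf:.6f} is outside 0.5-2.0")
    return notes


notes = check_client_state(SAMPLE)
for note in notes:
    print(note)
```

To try it on a real machine, read the actual client_state.xml into a string and pass it in; a real file holds one `<project>` block per attached project, so the lookup above would only see the first one.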
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
1) Post the following lines from the client_state.xml file in the BOINC data dir.

A) Near the top:

<p_fpops>2617901756.945803</p_fpops>
<p_iops>9478292194.844500</p_iops>

and the section:

<time_stats>
    <on_frac>0.880827</on_frac>
    <connected_frac>0.712379</connected_frac>
    <active_frac>0.999630</active_frac>
    <gpu_active_frac>0.999630</gpu_active_frac>
    <last_update>1338816175.107945</last_update>
</time_stats>

Under normal 24/7 operation, each of these fractions would have a value near 1.000000.

From the same file, way down after the first line mentioning World Community Grid, the entry that looks like the one below (about 15 lines below the <project> line):

<duration_correction_factor>1.148989</duration_correction_factor>

A normally running device would have a value in the range of 0.500000 to 2.000000; it varies due to the non-constant run times of the various WCG projects.

2) Do a benchmark via the BOINC Manager advanced menu, then post the benchmark values from the message/event log (Whetstone/Dhrystone).

The sum of this info may tell us a bit more. 5 hours for a CFSW is about the time my laptop takes, and that benchmarks at about 2600 / 9400.

TTYL
--//--

<p_fpops>2742759499.188092</p_fpops>
<p_iops>6923576075.204710</p_iops>

<time_stats>
    <on_frac>0.990472</on_frac>
    <connected_frac>0.999643</connected_frac>
    <active_frac>0.957818</active_frac>
    <last_update>1338823342.361286</last_update>
</time_stats>

<duration_correction_factor>1.011737</duration_correction_factor>

Benchmarks:
Number of CPUs: 12
2748 Whetstone
6322 Dhrystone

Everything looks clean to me from this. Maybe you have more insight. |
||
|