World Community Grid - View Thread - Computing for Sustainable Water Problems Thread

Thread Status: Active
Total posts in this thread: 254

This topic has been viewed 739124 times and has 253 replies

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Computing for Sustainable Water Problems Thread

Assuming it's a 24/7 machine with 100% CPU time, estimates occasionally go wonky, but BOINC adjusts for that over time by requesting less work (or no work at all), so it will get back to buffering 0.5 days per core as your comp crunches on.

--//--

Thanks for the replies.
It has been this way for over a week. I'll see if it starts getting better soon.

I gave it a few days with not requesting new work.
I just set it back to get .5 days again, and it immediately took 100+ WU's before I told it to stop.
It just seems way off.
Let me know if my math is reasonable:
0.5 days * 12 'cores' (after hyperthreading) = 6 core-days of work; 6 * 24 = 144 hours of cache requested.

My setting takes just under 5 hours per WU quite consistently.
144/5 = ~29 WU's
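
The estimate above can be checked with a few lines of Python. This is only a back-of-envelope sketch: the function name and parameters are made up for illustration, and it does not reproduce how BOINC itself decides how much work to fetch.

```python
def expected_cached_wus(cache_days, n_cores, hours_per_wu):
    """Rough count of work units needed to fill the requested cache."""
    cache_hours = cache_days * n_cores * 24  # core-hours of work requested
    return cache_hours / hours_per_wu


# Values from the post: 0.5 day cache, 12 logical cores, ~5 hours per WU
print(expected_cached_wus(0.5, 12, 5))  # 28.8
```

So a cache of roughly 29 WUs is what the settings imply; 100+ is well outside that.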

Why does it keep taking several times this amount? I have been running just c4sw for about 2-3 weeks now.

ETA: The 5 hour run-time is consistent, and the expected run-time is also consistently correct, so that is not the problem.

----------------------------------------
[Edit 1 times, last edit by Former Member at May 11, 2012 7:14:18 PM]

[May 11, 2012 6:05:48 PM]

astroWX
Advanced Cruncher
USA
Joined: Sep 1, 2007
Post Count: 56
Status: Offline

Re: Computing for Sustainable Water Problems Thread

Re: Task cfsw_1933_01933020_0_0
Task completed and has been "Uploading" with "0" progress for 189+ hours. Any suggestions as to how I can dislodge this critter (short of dynamite) from its love affair with the Transfers queue? (Other tasks come and go normally.) Mercy killing, perhaps?

[May 28, 2012 8:46:21 PM]

KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline


Re: Computing for Sustainable Water Problems Thread

Suspend networking and re-enable it. This gets the transfer to start up again and often unclogs the drain for me.

----------------------------------------

Distributed computing volunteer since September 27, 2000

[May 29, 2012 1:03:42 AM]

astroWX
Advanced Cruncher
USA
Joined: Sep 1, 2007
Post Count: 56
Status: Offline

Re: Computing for Sustainable Water Problems Thread

Suspend networking and re-enable it. This gets the transfer to start up again and often unclogs the drain for me.

Thanks for the reply; appreciated. That was done a few times and there was also a reboot. No joy.

(The Task times out on the 30th; if it hasn't uploaded by then, I'll pull the plug on CFSW and return to Clean Water and Clean Energy, which are closer to my heart anyway.)

From client_state -- name vs. simulation?

<result>
<name>cfsw_1933_01933020_0</name>
<final_cpu_time>12405.390000</final_cpu_time>
<exit_status>0</exit_status>
<state>4</state>
<platform>windows_intelx86</platform>
<version_num>605</version_num>
<stderr_out>
<![CDATA[
<stderr_txt>
[12:14:52] INFO:Beginning simulation: 1990:240:128601652
[12:20:31] INFO: Finished tick number 4
[12:24:53] INFO: Finished tick number 9
[12:28:40] INFO: Finished tick number 14
[12:33:29] INFO: Finished tick number 19
[12:37:07] INFO: Finished tick number 24
[12:41:53] INFO: Finished tick number 29
[12:46:00] INFO: Finished tick number 34
[12:50:10] INFO: Finished tick number 39
[12:54:48] INFO: Finished tick number 44
[12:58:18] INFO: Finished tick number 49
[13:03:09] INFO: Finished tick number 54
[13:07:01] INFO: Finished tick number 59
[13:11:41] INFO: Finished tick number 64
[13:16:08] INFO: Finished tick number 69
[13:20:01] INFO: Finished tick number 74
[13:24:58] INFO: Finished tick number 79
[13:28:42] INFO: Finished tick number 84
[13:33:35] INFO: Finished tick number 89
[13:37:50] INFO: Finished tick number 94
[13:42:06] INFO: Finished tick number 99
[13:46:52] INFO: Finished tick number 104
[13:50:30] INFO: Finished tick number 109
[13:55:30] INFO: Finished tick number 114
[13:59:29] INFO: Finished tick number 119
[14:04:11] INFO: Finished tick number 124
[14:08:38] INFO: Finished tick number 129
[14:12:31] INFO: Finished tick number 134
[14:17:23] INFO: Finished tick number 139
[14:21:04] INFO: Finished tick number 144
[14:25:49] INFO: Finished tick number 149
[14:29:57] INFO: Finished tick number 154
[14:34:06] INFO: Finished tick number 159
[14:38:46] INFO: Finished tick number 164
[14:42:17] INFO: Finished tick number 169
[14:47:13] INFO: Finished tick number 174
[14:51:13] INFO: Finished tick number 179
[14:55:54] INFO: Finished tick number 184
[15:00:22] INFO: Finished tick number 189
[15:04:14] INFO: Finished tick number 194
[15:09:07] INFO: Finished tick number 199
[15:12:44] INFO: Finished tick number 204
[15:17:26] INFO: Finished tick number 209
[15:21:32] INFO: Finished tick number 214
[15:25:40] INFO: Finished tick number 219
[15:30:17] INFO: Finished tick number 224
[15:33:47] INFO: Finished tick number 229
[15:38:37] INFO: Finished tick number 234
[15:42:28] INFO: Finished tick number 239
15:42:28 (2952): called boinc_finish

</stderr_txt>
]]>
</stderr_out>
<wu_name>cfsw_1933_01933020</wu_name>
<report_deadline>1338382657.000000</report_deadline>
<file_ref>
<file_name>cfsw_1933_01933020_0_0</file_name>
<open_name>result.out</open_name>
</file_ref>
</result>
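
For anyone reading a similar entry, here is a minimal Python sketch for two of the fields above. It assumes `<report_deadline>` is Unix epoch seconds and that the bracketed stamps in `stderr_txt` are local wall-clock times on a single day; the helper names are illustrative and not part of BOINC.

```python
from datetime import datetime, timezone


def deadline_utc(report_deadline):
    """Convert a <report_deadline> value (assumed Unix epoch seconds) to UTC."""
    return datetime.fromtimestamp(float(report_deadline), tz=timezone.utc)


def wall_seconds(start_hms, end_hms):
    """Elapsed seconds between two HH:MM:SS stamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end_hms, fmt) - datetime.strptime(start_hms, fmt)
    return int(delta.total_seconds())


print(deadline_utc("1338382657.000000").isoformat())  # 2012-05-30T12:57:37+00:00
print(wall_seconds("12:14:52", "15:42:28"))           # 12456
```

The deadline lands on May 30 (UTC), and the 12456 seconds of wall time between the first and last log stamps is close to the 12405.39 seconds in `<final_cpu_time>`, which is consistent with the task having run to completion before it got stuck in the upload.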

[May 29, 2012 7:46:07 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Computing for Sustainable Water Problems Thread

Thanks for the replies.
It has been this way for over a week. I'll see if it starts getting better soon.

I missed it in the prior posts and forgot to ask: what client version is/was this with?

--//--

[May 29, 2012 7:56:08 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Computing for Sustainable Water Problems Thread

I successfully ran a repair job overnight where the erroring job gave this

Result Name: cfsw_2791_02791650_1--

<core_client_version>5.10.45</core_client_version>
<![CDATA[
<message>
too many normally harmless exit(s)
</message>
]]>

I don't recall seeing that error before, and I thought it was quite amusing. It hasn't been mentioned in the forums for many months.

[Jun 2, 2012 7:05:46 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Computing for Sustainable Water Problems Thread

Covered in the FAQs under "too many...". It indicates there were 100 of these "zero status..." exits on the task, i.e. 100 reverts to the prior checkpoint. Nothing new there: the usual cause is an obstruction of some form or a too-busy system that does not allow boinc.exe to communicate with the science app and back at least once per 30 seconds on a continuous basis.
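
As a rough illustration of the rule described above, here is a toy Python model. It is not the BOINC client's actual logic: the function name, the idea of feeding it a list of gaps between heartbeats, and the exact comparison operators are all assumptions; only the 100-exit limit and the 30-second interval come from the explanation above.

```python
MAX_HARMLESS_EXITS = 100   # limit quoted in the post
HEARTBEAT_TIMEOUT_S = 30   # client/app contact interval quoted in the post


def run_task(heartbeat_gaps):
    """Toy model: each gap longer than the timeout counts as one harmless exit.

    heartbeat_gaps is a hypothetical list of seconds between successful
    contacts. Returns (status, restarts).
    """
    restarts = 0
    for gap in heartbeat_gaps:
        if gap > HEARTBEAT_TIMEOUT_S:
            restarts += 1  # task reverts to its prior checkpoint
            if restarts >= MAX_HARMLESS_EXITS:
                return "error: too many normally harmless exit(s)", restarts
    return "ok", restarts


print(run_task([5, 12, 45, 8]))  # ('ok', 1)
```

A handful of long gaps costs some repeated work but is otherwise harmless; in this model the task only errors out once the restart count reaches the limit.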

--//--

[Jun 2, 2012 7:56:42 AM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Computing for Sustainable Water Problems Thread

Thanks for the replies.
It has been this way for over a week. I'll see if it starts getting better soon.

I missed it in the prior posts and forgot to ask: what client version is/was this with?

--//--

Thanks again for replying. I am using 6.10.58 downloaded from here. win7x64
The problem is persisting, and seems fairly consistent. I used to run with a two day cache, but I have been needing to pull the reins in tight.

[Jun 4, 2012 2:53:33 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Computing for Sustainable Water Problems Thread

1) Post the following lines from the client_state.xml file in the BOINC data dir

A) near the top:

<p_fpops>2617901756.945803</p_fpops>
<p_iops>9478292194.844500</p_iops>

and the section:

<time_stats>
<on_frac>0.880827</on_frac>
<connected_frac>0.712379</connected_frac>
<active_frac>0.999630</active_frac>
<gpu_active_frac>0.999630</gpu_active_frac>
<last_update>1338816175.107945</last_update>
</time_stats>

Under normal 24/7 operation, each of these fractions would have a value near 1.000000.

From the same file, well below the first line mentioning World Community Grid, post the entry that looks like the one below (about 15 lines below the <project> line):

<duration_correction_factor>1.148989</duration_correction_factor>

A normally running device would have a value in the range of 0.500000 to 2.000000; it varies due to the non-constant run times of the various WCG projects.

2) Do a benchmark via the BOINC manager advanced menu, then post the benchmark values from the message/event log (Whetstone/Dhrystone)

The sum of this info may tell us a bit more. 5 hours for a CFSW is about the time my laptop takes and that benchmarks about 2600 / 9400.
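
If you would rather not dig through the file by hand, here is a minimal Python sketch that pulls out the values listed above and flags anything outside the ranges mentioned. The helper names and the 0.9 cut-off for "near 1.0" are assumptions for illustration. It also takes the first match for each tag, so with several projects attached the `duration_correction_factor` it finds may not be the WCG one.

```python
import re

FRACTION_TAGS = ("on_frac", "connected_frac", "active_frac")


def read_values(xml_text, tags):
    """Return the first numeric value found for each tag in the given text."""
    values = {}
    for tag in tags:
        match = re.search(rf"<{tag}>([0-9.]+)</{tag}>", xml_text)
        if match:
            values[tag] = float(match.group(1))
    return values


def flag_unusual(values, min_frac=0.9, dcf_range=(0.5, 2.0)):
    """List tags whose values fall outside the ranges described in the post."""
    flagged = [t for t in FRACTION_TAGS if t in values and values[t] < min_frac]
    dcf = values.get("duration_correction_factor")
    if dcf is not None and not (dcf_range[0] <= dcf <= dcf_range[1]):
        flagged.append("duration_correction_factor")
    return flagged


# Sample text built from the example values quoted above
SAMPLE = """
<p_fpops>2617901756.945803</p_fpops>
<p_iops>9478292194.844500</p_iops>
<on_frac>0.880827</on_frac>
<connected_frac>0.712379</connected_frac>
<active_frac>0.999630</active_frac>
<duration_correction_factor>1.148989</duration_correction_factor>
"""

vals = read_values(SAMPLE, ("p_fpops", "p_iops", "duration_correction_factor") + FRACTION_TAGS)
print(round(vals["p_fpops"] / 1e6))  # 2618
print(flag_unusual(vals))            # ['on_frac', 'connected_frac']
```

Dividing `p_fpops` and `p_iops` by one million gives figures in the same ballpark as the Whetstone and Dhrystone numbers the benchmark reports, which is how the 2600 / 9400 pair lines up with the example values.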

TTYL

--//--

[Jun 4, 2012 3:09:52 PM]

Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline


Re: Computing for Sustainable Water Problems Thread

<p_fpops>2742759499.188092</p_fpops>
<p_iops>6923576075.204710</p_iops>

<time_stats>
<on_frac>0.990472</on_frac>
<connected_frac>0.999643</connected_frac>
<active_frac>0.957818</active_frac>
<last_update>1338823342.361286</last_update>
</time_stats>

<duration_correction_factor>1.011737</duration_correction_factor>

Benchmarks:
Number of CPUs: 12
2748 Whetstone
6322 Dhrystone

Everything looks clean to me from this. Maybe you have more insight.

[Jun 4, 2012 3:41:26 PM]
