Thread Status: Active
Total posts in this thread: 3268
Posts: 3268   Pages: 327   [ Previous Page | 254 255 256 257 258 259 260 261 262 263 | Next Page ]
[ Jump to Last Post ]
Post new Thread
This topic has been viewed 3153425 times and has 3267 replies
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 973
Status: Offline
Re: Work Available

They gave out not only ARPs but also OPNGs, which also cause HTTP errors. I'm hoping that once caches are full the HTTP errors will go away, and that it is only the filling of massive caches that is causing the issues. We shall see. I have my 4 ARPs downloaded, and will only be asking for 1 at a time now.
[Apr 17, 2023 10:30:01 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Status: Offline
Re: Work Available

All 6 tasks failed due to HTTP transient errors followed by WU download error: couldn't get input files

ARP1_0022351_139_1 Error 8e7ecffcadcbf6aa6f6f338ab86adcfc. md5 checksum failed for file
ARP1_0030537_139_0 Error c6d66aed2474ceeee1340458237ef91d.7z md5 checksum failed for file
ARP1_0030908_139_0 Error 3fd15f18d919f3a713d1c4fb5f19205a. md5 checksum failed for file
ARP1_0002005_139_1 Error 35bd8aba27b6511cf80a5937eca59361. md5 checksum failed for file
ARP1_0022828_139_0 Error 94ec243793909862224ef1022311ed1b.7z md5 checksum failed for file
ARP1_0032468_139_0 Error 4a83b4e78e1c7199090b4a3d1de24667. md5 checksum failed for file

The failing HTTP downloads created zero-byte files, which probably caused those checksum errors.
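A minimal sketch of why a zero-byte download cannot pass an MD5 check, assuming the client compares each downloaded file against an expected digest. The MD5 of empty input is a fixed, well-known value, so an empty file only "matches" if the expected file was itself empty. The helper names `md5_of_file` and `looks_like_failed_download` are hypothetical and not part of BOINC.

```python
import hashlib
import os
import tempfile

# MD5 digest of zero bytes of input (a well-known constant).
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_of_file(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def looks_like_failed_download(path):
    """Hypothetical check: True when the file is empty."""
    return os.path.getsize(path) == 0 and md5_of_file(path) == EMPTY_MD5


def demo():
    """Simulate a failed download with an empty temporary file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        return md5_of_file(path), looks_like_failed_download(path)
    finally:
        os.remove(path)
```

Running a check like this over the project directory before the tasks start would be one way to spot input files that need to be fetched again.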

I was able to save a 7th task by manually downloading the needed files.
----------------------------------------
[Edit 1 times, last edit by Crystal Pellet at Apr 18, 2023 9:59:00 AM]
[Apr 18, 2023 9:56:30 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2167
Status: Offline
Re: Work Available

I have a question.

This is my situation:
CPUtime   Remaining  Est.Total  Percentage  Name
5:30:03   27:35:10   10:14:31   53.7083%    ARP1_0025248_138_0
5:11:15   28:18:22    9:52:51   52.5000%    ARP1_0029800_138_0
4:59:51   29:47:46    9:59:42   50.0000%    ARP1_0004984_138_0
4:57:52   32:56:58   11:06:14   44.7083%    ARP1_0019266_138_0
4:39:11   36:21:49   11:56:16   38.9792%    ARP1_0016414_139_0
4:38:32   33:24:32   10:33:55   43.9375%    ARP1_0005472_138_0
4:22:56   37:59:24   12:05:22   36.2500%    ARP1_0011898_138_0
--------  59:35:32               0.0000%    ARP1_0010532_138_1
--------  59:35:32               0.0000%    ARP1_0003820_139_0
--------  59:35:32               0.0000%    ARP1_0018509_139_1

BOINC Manager shows a guesstimate of about 60 hours for ARP1 tasks when they arrive. The machine has been running for more than 100 days without rebooting, and still each ARP1 task is estimated to take several days on arrival, whereas only 10-12 hours are needed in reality. What can I do to get more 'normal' behaviour?
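The Est.Total figures in the listing appear consistent with CPU time divided by the fraction done. The sketch below checks that arithmetic on three of the rows; it is an observation about the listed numbers, not a description of how wcgresults computes the column, and the function names are hypothetical.

```python
def to_seconds(hms):
    """Convert an 'H:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Format seconds as 'H:MM:SS', truncating fractional seconds."""
    total = int(total_seconds)
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{seconds:02d}"


def projected_total(cpu_time, percent_done):
    """Project total runtime from CPU time so far and percent complete."""
    return to_hms(to_seconds(cpu_time) / (percent_done / 100.0))


# Rows taken from the listing: (CPUtime, Percentage)
rows = [("5:30:03", 53.7083), ("5:11:15", 52.5000), ("4:59:51", 50.0000)]
projections = [projected_total(cpu, pct) for cpu, pct in rows]
```

By this measure the running tasks need roughly 10 to 12 hours in total, far below the 59:35:32 shown for tasks that have not started.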

Adri
PS The output is from 'wcgresults -NCREP1'
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Apr 18, 2023 10:24:29 AM]
[Apr 18, 2023 10:16:06 AM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Status: Offline
Re: Work Available

What can I do to accomplish more 'normal' behaviour?
The first thing you could try is to rerun the BOINC CPU benchmarks.
Before you do, you could check which floating-point and integer speed values BOINC is using now.
[Apr 18, 2023 12:07:37 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2167
Status: Offline
Re: Work Available

What can I do to accomplish more 'normal' behaviour?
The first thing you could try is to rerun BOINC CPU benchmark.

Okay, I'll do that when I get home.
Before you do, you could check what floating point and integer BOINC is using now.

What do you mean, Crystal Pellet, and check how? The Linux machine is only running WCG.

Adri
[Apr 18, 2023 12:29:14 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Status: Offline
Re: Work Available

What do you mean, Crystal Pellet, and check how? The Linux machine is only running WCG.
WCG is not 100% BOINC-compatible, so this information is not visible to us on the WCG server side. You have to dig into the client_state.xml file in BOINC's data directory (as you will understand, I'm talking Windows, not Linux :-) ).
Just after the 10th line you'll find something like this example:

<p_fpops>3883111331.054186</p_fpops>
<p_iops>10034742180.348236</p_iops>

which means

Measured floating point speed 3.88 billion ops/sec
Measured integer speed 10.03 billion ops/sec
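A small sketch of reading those two values and converting them to billions of operations per second. It uses a regular expression on the text rather than assuming anything about the rest of the client_state.xml layout; `SAMPLE` simply repeats the two example lines, and `read_benchmarks` and `in_billions` are hypothetical helper names.

```python
import re

# The two example lines quoted from client_state.xml.
SAMPLE = """<p_fpops>3883111331.054186</p_fpops>
<p_iops>10034742180.348236</p_iops>"""


def read_benchmarks(xml_text):
    """Extract p_fpops and p_iops (ops/sec) from client_state.xml text."""
    values = {}
    for tag in ("p_fpops", "p_iops"):
        match = re.search(rf"<{tag}>([0-9.]+)</{tag}>", xml_text)
        if match:
            values[tag] = float(match.group(1))
    return values


def in_billions(ops_per_sec):
    """Express an ops/sec figure in billions, rounded to two decimals."""
    return round(ops_per_sec / 1e9, 2)


bench = read_benchmarks(SAMPLE)
fp_speed = in_billions(bench["p_fpops"])   # floating point speed
int_speed = in_billions(bench["p_iops"])   # integer speed
```

To use it on a real host, pass the contents of client_state.xml from the BOINC data directory instead of `SAMPLE`.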

These values are used together with an fpops estimate for each task, sent by the server, to calculate the expected duration.
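A simplified sketch of that calculation, assuming the estimate is just the task's fpops estimate (the `rsc_fpops_est` workunit field) divided by the host's measured `p_fpops`. The real client applies further corrections, and the task size used here is a hypothetical number chosen only for illustration.

```python
def estimated_runtime_hours(rsc_fpops_est, p_fpops):
    """Simplified estimate: task size in fpops / host speed in fpops/sec."""
    return rsc_fpops_est / p_fpops / 3600.0


# Hypothetical task size; not taken from any real WCG workunit.
TASK_FPOPS = 1.4e14

# Host speed from the example above.
measured = estimated_runtime_hours(TASK_FPOPS, 3883111331.054186)

# The same task on a host whose p_fpops is far too low (1e9 here)
# gets a proportionally longer estimate.
understated = estimated_runtime_hours(TASK_FPOPS, 1e9)
ratio = understated / measured
```

Under this simplification the estimate scales inversely with `p_fpops`, so a benchmark value that is several times too low inflates every runtime estimate by the same factor.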
[Apr 18, 2023 1:04:51 PM]
Unixchick
Veteran Cruncher
Joined: Apr 16, 2020
Post Count: 973
Status: Offline
Re: Work Available

I'm no longer getting any ARP WUs. I'm guessing people panicked and built up large queues. I expect I'll get resends in a week; until then I'll keep working on my MCM year badge.
[Apr 18, 2023 3:12:43 PM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12398
Status: Offline
Re: Work Available

CP

My problem is that I can't get enough work downloaded on one of my machines, because the Event Log says "don't need". Both machines have the same CPU (i7-3770), and the estimated remaining times are similar and approximately correct.

The one I am having the problem with has an active_frac of 0.999980, p_fpops of 4,113,965,686 and p_iops of 8,650,244,149, ignoring decimals.

The figures for the other machine are 0.999990, 3,989,170,155 & 10,037,514,072 respectively.

Both machines take about 24 hours for ARP1 and have the same settings for cache (12 ARP1, 8 MCM1 & 8 OPN1) and the same app_config, which restricts ARP1, MCM1 & OPN1 to 4, 2 & 2 threads respectively out of 8 on each machine. That would mean a cache of 3 days for ARP1, 9 hours for MCM1 and 16 hours for OPN1.
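The cache figures can be checked with simple arithmetic, assuming the thread limits apply in the order ARP1, MCM1, OPN1. Only the 24-hour ARP1 runtime is stated; the per-task runtimes for MCM1 and OPN1 below are back-calculated from the quoted cache depths and are therefore implied values, not measurements. The function names are hypothetical.

```python
def cache_hours(tasks, hours_per_task, threads):
    """Hours of work held when `tasks` run `threads` at a time."""
    return tasks * hours_per_task / threads


def implied_hours_per_task(tasks, cache_hours_total, threads):
    """Per-task runtime implied by a quoted cache depth."""
    return cache_hours_total * threads / tasks


# ARP1: 12 tasks, about 24 hours each, 4 threads.
arp1_cache_days = cache_hours(12, 24, 4) / 24

# MCM1 and OPN1: 8 tasks each on 2 threads, cache depth as quoted.
mcm1_hours_each = implied_hours_per_task(8, 9, 2)
opn1_hours_each = implied_hours_per_task(8, 16, 2)
```

The quoted 3 days for ARP1 follows directly, and the MCM1 and OPN1 depths imply runtimes of a little over 2 hours and about 4 hours per task.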

Any suggestions as to fixing the problem would be appreciated.

Mike
[Apr 18, 2023 5:03:20 PM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2167
Status: Offline
Re: Work Available

so you have to dig into the client_state.xml-file in BOINC's data directory
Just after the 10th line you'll find something like this example

<p_fpops>3883111331.054186</p_fpops>
<p_iops>10034742180.348236</p_iops>


OK. I've recorded some screenshots from before and after running the benchmark.
The old values were:
$ grep 'p_[a-z]*ops' client_state.xml
<p_fpops>1000000000.000000</p_fpops>
<p_iops>1000000000.000000</p_iops>

The new values are:
$ grep 'p_[a-z]*ops' client_state.xml
<p_fpops>7678583050.955497</p_fpops>
<p_iops>90953821890.227875</p_iops>

That matches what the Event Log reports after running the benchmark:
Tue 18 Apr 2023 20:02:27 | | 7679 floating point MIPS (Whetstone) per CPU
Tue 18 Apr 2023 20:02:27 | | 90954 integer MIPS (Dhrystone) per CPU
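A quick check that the two sources agree, assuming the Event Log figures are simply the client_state.xml values divided by one million and rounded. `to_mips` is a hypothetical helper.

```python
def to_mips(ops_per_sec):
    """Convert an ops/sec value to millions of ops/sec, rounded."""
    return round(ops_per_sec / 1e6)


# New values from client_state.xml after the benchmark.
whetstone_mips = to_mips(7678583050.955497)
dhrystone_mips = to_mips(90953821890.227875)
```

Both results line up with the MIPS figures in the Event Log, while the old values of 1000000000.000000 would correspond to 1000 MIPS each.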


In BOINC Manager there were many MCM1 tasks with an expected runtime of 9 hours.
After running the benchmark, their expected runtimes dropped to 1½ hours.
I don't have any unstarted ARP1 tasks on the machine at the moment,
so I can't check whether their runtimes have dropped as sharply.

Luckily, running the BOINC benchmark honours the LAIM(*1) setting.

[*1] Leave applications in memory
(actually it is saying: "Leave non-GPU tasks in memory while suspended")

Adri
PS In this message from 2010 ("Running CPU benchmarks") it is claimed (by Gundolf Jahn) that "BOINC runs those benchmarks every fifth day." This isn't true anymore, I guess?
[Apr 18, 2023 6:46:29 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1322
Status: Offline
Re: Work Available

@AdriVerhoef:
That's great!
PS In this message from 2010 ("Running CPU benchmarks") it is claimed (by Gundolf Jahn) that "BOINC runs those benchmarks every fifth day." This isn't true anymore, I guess?
I'm not sure, because I suppressed the benchmarks with a setting in cc_config.xml (I don't want to lose CPU cycles, and the machine doesn't change), but I think they run by default on new machines, after a new BOINC version, etc.
[Apr 18, 2023 7:21:18 PM]