World Community Grid Forums
Thread Status: Active Total posts in this thread: 3268
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 973 Status: Offline
They gave out not only ARPs but also OPNGs, which cause HTTP errors as well. I'm hoping that once caches are full the HTTP errors will go away, and that it is only the filling of massive caches that is causing the issues. We shall see. I have my 4 ARPs downloaded, and will only be asking for 1 at a time now.
|
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline
All 6 tasks failed due to transient HTTP errors, followed by "WU download error: couldn't get input files":

ARP1_0022351_139_1 Error 8e7ecffcadcbf6aa6f6f338ab86adcfc. md5 checksum failed for file
ARP1_0030537_139_0 Error c6d66aed2474ceeee1340458237ef91d.7z md5 checksum failed for file
ARP1_0030908_139_0 Error 3fd15f18d919f3a713d1c4fb5f19205a. md5 checksum failed for file
ARP1_0002005_139_1 Error 35bd8aba27b6511cf80a5937eca59361. md5 checksum failed for file
ARP1_0022828_139_0 Error 94ec243793909862224ef1022311ed1b.7z md5 checksum failed for file
ARP1_0032468_139_0 Error 4a83b4e78e1c7199090b4a3d1de24667. md5 checksum failed for file

The failing HTTP downloads created zero-byte files, which probably caused those checksum errors.
I could save a 7th task by manually downloading the needed files.

[Edit 1 times, last edit by Crystal Pellet at Apr 18, 2023 9:59:00 AM]
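For anyone who wants to check a manually downloaded input file before BOINC picks it up, here is a minimal Python sketch of the idea above: a zero-byte file can never match the expected checksum, because an empty file always hashes to d41d8cd98f00b204e9800998ecf8427e. The helper names (`md5_of_file`, `check_download`) and the demo files are hypothetical and are not part of BOINC or WCG.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_download(path, expected_md5):
    """Classify a downloaded file (hypothetical helper, not a BOINC API)."""
    if os.path.getsize(path) == 0:
        return "zero-byte file"
    if md5_of_file(path) != expected_md5:
        return "md5 mismatch"
    return "ok"


# Demo with temporary files standing in for real input files.
with tempfile.TemporaryDirectory() as tmp:
    empty = os.path.join(tmp, "empty.bin")
    open(empty, "wb").close()  # simulates a failed HTTP download

    good = os.path.join(tmp, "good.bin")
    with open(good, "wb") as fh:
        fh.write(b"input data")

    expected = hashlib.md5(b"input data").hexdigest()
    empty_result = check_download(empty, expected)
    good_result = check_download(good, expected)
    print(empty_result, "/", good_result)
```

The size check comes first only to give a clearer message; the MD5 comparison alone would also reject the empty file.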
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2167 Status: Offline
I have a question.
This is my situation:

CPUtime Remaining Est.Total Percentage Name
----------------

BOINC Manager shows a guesstimate of about 60 hours for ARP1 tasks when they arrive. The machine has been running for more than 100 days without rebooting, and still the expected runtime of each ARP1 task is estimated at several days upon arrival, whereas only 10-12 hours are needed in reality.

What can I do to accomplish more 'normal' behaviour?

Adri

PS The output is from 'wcgresults -NCREP1'

[Edit 1 times, last edit by adriverhoef at Apr 18, 2023 10:24:29 AM]
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline
> What can I do to accomplish more 'normal' behaviour?

The first thing you could try is to rerun the BOINC CPU benchmark. Before you do, you could check what floating point and integer speeds BOINC is using now.
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2167 Status: Offline
> What can I do to accomplish more 'normal' behaviour?
> The first thing you could try is to rerun BOINC CPU benchmark.

Okay, I'll do that when I get home.

> Before you do, you could check what floating point and integer BOINC is using now.

What do you mean, Crystal Pellet, and check how? The Linux machine is only running WCG.

Adri
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline
> What do you mean, Crystal Pellet, and check how? The Linux machine is only running WCG.

WCG is not 100% BOINC compatible, so this info is not visible to us on the WCG server side. You therefore have to dig into the client_state.xml file in BOINC's data directory (you will understand that I'm talking Windows, not Linux).

Just after the 10th line you'll find something like this example:

<p_fpops>3883111331.054186</p_fpops>
<p_iops>10034742180.348236</p_iops>

which means:

Measured floating point speed 3.88 billion ops/sec
Measured integer speed 10.03 billion ops/sec

These values are used together with an fpops estimate for a task, which comes from the server, to calculate the duration.
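To make the calculation above concrete, here is a rough Python sketch that pulls the two benchmark values out of a client_state.xml-style snippet and turns a task's fpops estimate into hours. It is a simplification under stated assumptions: the estimate is taken as task fpops divided by `p_fpops`, and any correction factors the client may apply are ignored. The function names and the example task size are made up for illustration.

```python
import re

# The two lines quoted in the post above, used as sample input.
SAMPLE = """
<p_fpops>3883111331.054186</p_fpops>
<p_iops>10034742180.348236</p_iops>
"""


def read_benchmarks(text):
    """Extract p_fpops and p_iops (ops/sec) from client_state.xml text."""
    values = {}
    for tag in ("p_fpops", "p_iops"):
        match = re.search(rf"<{tag}>([0-9.eE+-]+)</{tag}>", text)
        if match:
            values[tag] = float(match.group(1))
    return values


def estimated_hours(task_fpops_est, p_fpops):
    """Simplified estimate: floating point operations / speed, in hours.

    Assumption: correction factors used by the real client are ignored.
    """
    return task_fpops_est / p_fpops / 3600.0


bench = read_benchmarks(SAMPLE)
print(f"Measured floating point speed {bench['p_fpops'] / 1e9:.2f} billion ops/sec")
print(f"Measured integer speed {bench['p_iops'] / 1e9:.2f} billion ops/sec")

# Hypothetical task sized so that it takes exactly one hour at this speed.
one_hour_task = bench["p_fpops"] * 3600
print(estimated_hours(one_hour_task, bench["p_fpops"]))
```

The point of the sketch is the inverse relationship: if the stored `p_fpops` is too low, every estimate derived from it is too long by the same factor.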
||
|
Unixchick
Veteran Cruncher Joined: Apr 16, 2020 Post Count: 973 Status: Offline
I'm no longer getting any ARP WUs. I'm guessing people panicked and built up a large queue. I'm guessing I'll get resends in a week; until then I'll keep working on my MCM year badge.
|
||
|
Mike.Gibson
Ace Cruncher England Joined: Aug 23, 2007 Post Count: 12398 Status: Offline
CP

My problem is that I cannot get enough work downloaded on one of my machines, because the Event Log says "don't need". Both machines have the same CPU (i7-3770), and the estimated remaining times are similar and approximately correct.

The one I am having the problem with has an active_frac of 0.999980, a p_fpops of 4,113,965,686 and a p_iops of 8,650,244,149, ignoring decimals. The figures for the other machine are 0.999990, 3,989,170,155 & 10,037,514,072 respectively.

Both machines take about 24 hours for ARP1 and have the same settings for cache (12 ARP1, 8 MCM1 & 8 OPN1) and for app_config, which restricts ARP1, MCM1 & OPN1 to 4, 2 & 2 threads respectively out of 8 on each machine. That would mean a cache of 3 days for ARP1, 9 hours for MCM1 and 16 hours for OPN1.

Any suggestions for fixing the problem would be appreciated.

Mike
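The cache figures above follow from simple arithmetic, sketched here in Python. The assumption is that tasks of one project run in equal-sized batches limited by the thread count, so cache duration is tasks divided by concurrent threads, times hours per task. Only the 24 hours for ARP1 is stated in the post; the per-task times for MCM1 and OPN1 are back-calculated from the quoted cache durations and are therefore implied values, not measurements.

```python
def cache_hours(tasks, concurrent, hours_per_task):
    """Hours of work held in cache when `concurrent` tasks run at a time."""
    return tasks / concurrent * hours_per_task


def implied_hours_per_task(cache_duration_hours, tasks, concurrent):
    """Per-task runtime implied by a quoted cache duration."""
    return cache_duration_hours * concurrent / tasks


# ARP1: 12 tasks cached, 4 threads, about 24 hours each.
arp1_cache = cache_hours(12, 4, 24)
print(arp1_cache / 24, "days of ARP1")

# MCM1 and OPN1: 8 tasks cached, 2 threads each, quoted as 9 h and 16 h of cache.
mcm1_task = implied_hours_per_task(9, 8, 2)
opn1_task = implied_hours_per_task(16, 8, 2)
print(mcm1_task, "hours per MCM1 task (implied)")
print(opn1_task, "hours per OPN1 task (implied)")
```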
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2167 Status: Offline
> so you have to dig into the client_state.xml-file in BOINC's data directory
> Just after the 10th line you'll find something like this example <p_fpops>3883111331.054186</p_fpops>

OK. I've recorded some screenshots from before and after running the benchmark.

The old values were:

$ grep 'p_[a-z]*ops' client_state.xml

The new values are:

$ grep 'p_[a-z]*ops' client_state.xml

That's about the same as what the Event Log reports after running the benchmark:

Tue 18 Apr 2023 20:02:27 | | 7679 floating point MIPS (Whetstone) per CPU

In BOINC Manager there were many MCM1 tasks with an expected runtime of 9 hours. After running the benchmark, their expected runtimes dropped to 1½ hours. I don't have any uninitialized ARP1 tasks on the machine at the moment, so I can't check whether their runtimes have also been substantially reduced.

Luckily, running the BOINC benchmark honours the LAIM(*1) setting.

[*1] Leave applications in memory (the actual wording is: "Leave non-GPU tasks in memory while suspended")

Adri

PS In this message from 2010 ("Running CPU benchmarks") it is claimed (by Gundolf Jahn) that "BOINC runs those benchmarks every fifth day." This isn't true anymore, I guess?
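As a rough cross-check of the numbers above, the Event Log figure can be converted to ops/sec and compared with the drop in estimated runtime. This Python sketch assumes that the estimate scales inversely with `p_fpops` and nothing else changed; under that assumption the old benchmark value would have been about six times lower. The implied old value is an inference for illustration only, since the actual old figures are not shown in the post.

```python
def mips_to_ops_per_sec(mips):
    """Convert millions of operations per second to operations per second."""
    return mips * 1e6


new_p_fpops = mips_to_ops_per_sec(7679)  # from the Event Log line

# MCM1 estimates went from 9 hours to 1.5 hours after the benchmark.
speedup = 9 / 1.5

# Assumption: estimated runtime is inversely proportional to p_fpops.
implied_old_p_fpops = new_p_fpops / speedup

print(f"new p_fpops: {new_p_fpops / 1e9:.3f} billion ops/sec")
print(f"estimate shrank by a factor of {speedup:.1f}")
print(f"implied old p_fpops: {implied_old_p_fpops / 1e9:.2f} billion ops/sec")
```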
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline
@AdriVerhoef:

That's great!

> PS In this message from 2010 ("Running CPU benchmarks") it is claimed (by Gundolf Jahn) that "BOINC runs those benchmarks every fifth day." This isn't true anymore, I guess?

I'm not sure, because I suppressed running the benchmark with a setting in cc_config.xml (I don't want to lose CPU cycles and the machine doesn't change), but I think it runs by default on new machines, after a new BOINC version, etc.
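The post does not name the cc_config.xml setting. Assuming it is the client option `skip_cpu_benchmarks` inside `<options>`, a minimal Python sketch that generates such a file's content could look like this. The function name is hypothetical; check the BOINC client configuration documentation before relying on the option name.

```python
import xml.etree.ElementTree as ET


def build_cc_config(skip_benchmarks=True):
    """Build cc_config.xml content with the assumed skip_cpu_benchmarks option."""
    root = ET.Element("cc_config")
    options = ET.SubElement(root, "options")
    flag = ET.SubElement(options, "skip_cpu_benchmarks")
    flag.text = "1" if skip_benchmarks else "0"
    return ET.tostring(root, encoding="unicode")


config_text = build_cc_config()
print(config_text)
```

The sketch only builds the text; writing it into the BOINC data directory and having the client reread its config files are left out on purpose. With benchmarks suppressed this way, the stored speed values stay as they are until a benchmark is run manually.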
||