Total posts in this thread: 315
This topic has been viewed 71046 times and has 314 replies
Michael Goetz
Cruncher
United States
Joined: Dec 11, 2017
Post Count: 35
Re: New Beta Test – May 29, 2019 [Issues Thread]

There was no such file in /home/boinc/projects/www.worldcommunitygrid.org, so I put this in there.


This is an optional file that doesn't exist until the user creates it.

I then stopped the boinc_manager, then shut down boinc, waited 15 seconds and restarted it. I then restarted the boinc_manager. The result was the elapsed time for the two running beta processes dropped to exactly 25% and the times to complete dropped quite a bit.


It appears that these tasks only checkpoint at 12.5% increments. In a worst case scenario, you could lose just under 12.5% of the task's progress when you restart from the most recent checkpoint.
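To put a number on that, here is a rough sketch in Python. It assumes checkpoints land at exact 12.5% multiples, which is only what the progress readings suggest, and the function names are made up for illustration.

```python
import math

def last_checkpoint(progress_pct, step_pct=12.5):
    """Progress (in percent) recorded at the most recent checkpoint."""
    return math.floor(progress_pct / step_pct) * step_pct

def lost_on_restart(progress_pct, step_pct=12.5):
    """Progress (in percent) that has to be redone after a restart."""
    return progress_pct - last_checkpoint(progress_pct, step_pct)

# Illustrative value: a task showing 29.958% restarts from the 25% checkpoint.
print(last_checkpoint(29.958))            # 25.0
print(round(lost_on_restart(29.958), 3))  # 4.958
```

The loss is largest just before the next checkpoint, which is where the "just under 12.5%" worst case comes from.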

So that file was noticed and changed something. On the other hand, the time to complete is still increasing slowly.
My boinc_manager is 7.2.33, which is the latest one in EPEL for my distro.


You'll need at least BOINC v7.3.18 for fraction_done_exact to be recognized by the BOINC client. Maybe even more recent than that. I'm not 100% certain. But certainly anything earlier than 7.3.18 won't be able to recognize that tag. That release was over 4 years ago; a lot of useful features have been added to the BOINC client since then.

(7.3.18 is the first BOINC client to recognize fraction_done_exact when it's sent from the server. I'm not sure when it started accepting it inside app_config, but it wasn't earlier than that.)
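If you want to check a client version against that minimum, a small Python sketch follows. The 7.3.18 threshold is the one stated above and may be too low for app_config use; the helper names are hypothetical. Versions are compared as integer tuples, because a plain string comparison would rank "7.14.2" below "7.3.18".

```python
def parse_version(text):
    """Turn a dotted version string such as '7.3.18' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))

# Assumed minimum, taken from the post above; the real cutoff for
# app_config support may be later.
MIN_VERSION = parse_version("7.3.18")

def may_support_fraction_done_exact(client_version):
    """True if the client is at least as new as the assumed minimum."""
    return parse_version(client_version) >= MIN_VERSION

print(may_support_fraction_done_exact("7.2.33"))  # False
print(may_support_fraction_done_exact("7.14.2"))  # True
```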
[May 31, 2019 8:27:53 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New Beta Test – May 29, 2019 [Issues Thread]

Has anyone timed a work unit from when it hits 100% to when it actually finishes?

Qualitatively, once the WU reaches 100%, it takes about 90 to 100 seconds more before the WU finishes and is ready for upload; presumably the delay is to prepare/organize the files for upload. It seems to upload a separate file for each checkpoint (every 6 hrs), but cumulatively: the largest I saw was about 26 MB, with each of the subsequent 5 files about 6-7 MB smaller than the one before.
[May 31, 2019 8:51:44 PM]
Michael Goetz
Cruncher
United States
Joined: Dec 11, 2017
Post Count: 35
Re: New Beta Test – May 29, 2019 [Issues Thread]

Has anyone timed a work unit from when it hits 100% to when it actually finishes?

Qualitatively, once the WU reaches 100%, it takes about 90 to 100 seconds more before the WU finishes and is ready for upload; presumably the delay is to prepare/organize the files for upload. It seems to upload a separate file for each checkpoint (every 6 hrs), but cumulatively: the largest I saw was about 26 MB, with each of the subsequent 5 files about 6-7 MB smaller than the one before.


I'm pretty sure that delay is the BOINC client (not the app itself) gzipping the output files before sending them to the server. The app itself is done; that's why it shows 100%. Once the app completes, it's up to the BOINC client to send the output files to the server. Having the client compress the files before transmission is a configuration option. With files this large, that can take a while.
[May 31, 2019 9:27:59 PM]
Jean-David Beyer
Senior Cruncher
USA
Joined: Oct 2, 2007
Post Count: 337
Re: New Beta Test – May 29, 2019 [Issues Thread]

It appears that these tasks only checkpoint at 12.5% increments. In a worst case scenario, you could lose just under 12.5% of the task's progress when you restart from the most recent checkpoint.

Now it is indicating 29.958% complete for the two running processes. So it must have noticed something, and 29.958% is certainly not a multiple of 12.5%.

So that file was noticed and changed something. On the other hand, the time to complete is still increasing slowly. My boinc_manager is 7.2.33, which is the latest one in EPEL for my distro.
You'll need at least BOINC v7.3.18 for fraction_done_exact to be recognized by the BOINC client. Maybe even more recent than that. I'm not 100% certain. But certainly anything earlier than 7.3.18 won't be able to recognize that tag. That release was over 4 years ago; a lot of useful features have been added to the BOINC client since then. (7.3.18 is the first BOINC client to recognize fraction_done_exact when it's sent from the server. I'm not sure when it started accepting it inside app_config, but it wasn't earlier than that.)


Well, I cannot change it unless you get the EPEL team to change it for the RHEL 6.10 distribution. OTOH, it seems to have noticed the file I put in there; otherwise the readings shown by boinc_manager would not have changed, would they?
[May 31, 2019 9:29:54 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1323
Re: New Beta Test – May 29, 2019 [Issues Thread]

4 errored BETAs from my Linux VM, where I did not extend the rsc_fpops_bound manually,
which caused "exceeded elapsed time limit 163905.46 (547904.24G/3.34G)" after 45.5 hours of run time:

BETA_ ARP1_ 0000088_ 000_ 2-- rekendoos3 Error 5/29/19 21:25:28 6/1/19 03:04:09 45.55 / 45.55 31.7 / 0.0
BETA_ ARP1_ 0001780_ 000_ 0-- rekendoos3 Error 5/29/19 18:08:07 5/31/19 16:47:04 45.52 / 45.53 31.7 / 0.0
BETA_ ARP1_ 0001650_ 000_ 0-- rekendoos3 Error 5/29/19 18:05:42 5/31/19 16:47:04 45.52 / 45.53 31.7 / 0.0
BETA_ ARP1_ 0000634_ 000_ 1-- rekendoos3 Error 5/29/19 17:50:59 5/31/19 16:47:04 45.51 / 45.53 31.7 / 0.0
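For anyone wondering where the 45.5 hours comes from, here is a rough Python sketch of the arithmetic. It reads the two numbers in parentheses as the task's rsc_fpops_bound and the host's estimated speed, both in units of 1e9; that reading and the function name are assumptions. The results come out slightly above the limits in the messages, presumably because the speeds shown there are rounded.

```python
def elapsed_limit_hours(fpops_bound_g, speed_g):
    """Elapsed time limit in hours: operation bound divided by host speed."""
    return fpops_bound_g / speed_g / 3600.0

# Values taken from the error messages reported in this thread.
print(f"{elapsed_limit_hours(547904.24, 3.34):.2f}")  # 45.57
print(f"{elapsed_limit_hours(547904.24, 2.55):.2f}")  # 59.68

# Raising the bound tenfold raises the limit tenfold on the same host.
print(f"{elapsed_limit_hours(10 * 547904.24, 3.34):.2f}")
```

A slower host gets a longer limit, but the limit scales with the same estimate as the run time, so a task that needs more operations than the bound allows will hit it on any host.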

Jonathan did not mention these unnecessary errors or what to do about them for future batches.

I also have 3 tasks running on an Intel Atom CPU Z3735F @ 1.33GHz Windows 10 machine.
To finish, they will need a total elapsed time of 96 hours.
I've increased the rsc_fpops_bound tenfold and restarted the tasks after a checkpoint, so they should survive.
----------------------------------------
[Edit 2 times, last edit by Crystal Pellet at Jun 1, 2019 6:29:23 AM]
[Jun 1, 2019 6:16:38 AM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 772
Re: New Beta Test – May 29, 2019 [Issues Thread]

This one appears to have been timed out. It was on a slow old AMD running XP, but it would have finished by the due time.
I assume the checkpoint dates are simulation dates; I suggest adding "simulation date/time" to the message.

01/06/2019 06:54:05 Aborting task BETA_ARP1_0000814_000_0: exceeded elapsed time limit 214635.82 (547904.24G/2.55G)

BETA_ ARP1_ 0000814_ 000_ 0-- unknown2 Error 5/29/19 17:53:07 6/1/19 06:53:59 48.01 / 59.64 31.7 / 0.0

Result Name: BETA_ ARP1_ 0000814_ 000_ 0--
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
exceeded elapsed time limit 214635.82 (547904.24G/2.55G)</message>
<stderr_txt>
INFO: No state to restore. Start from the beginning.
Starting WRFMain
[02:38:08] INFO: Checkpoint taken at 2018-04-01_06:00:00
[13:26:50] INFO: Checkpoint taken at 2018-04-01_12:00:00
[01:31:30] INFO: Checkpoint taken at 2018-04-01_18:00:00
[10:15:22] INFO: Checkpoint taken at 2018-04-02_00:00:00
[18:50:23] INFO: Checkpoint taken at 2018-04-02_06:00:00
Unhandled Exception Detected...
- Unhandled Exception Record -
Reason: Breakpoint Encountered (0x80000003) at address 0x7C90120E
Engaging BOINC Windows Runtime Debugger...
********************
BOINC Windows Runtime Debugger Version 7.15.0
Dump Timestamp : 06/01/19 06:54:05
Install Directory : C:\Program Files\BOINC\
Data Directory : C:\Documents and Settings\All Users\Application Data\BOINC
Project Symstore :
LoadLibraryA( C:\Documents and Settings\All Users\Application Data\BOINC\dbghelp.dll ): GetLastError = 126
Loaded Library : dbghelp.dll
LoadLibraryA( C:\Documents and Settings\All Users\Application Data\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\Documents and Settings\All Users\Application Data\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\Documents and Settings\All Users\Application Data\BOINC\version.dll ): GetLastError = 126
Loaded Library : version.dll
</stderr_txt>
]]>
Computer unknown2,
CPID 357f10668df851c55ad76e7584489268
CPU Processors: 2, AuthenticAMD, AMD Athlon(tm) 64 X2 Dual Core Processor 5000+ [Family 15 Model 107 Stepping 2]
CPU fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 htt pni cx16 syscall nx lm svm rdtscp 3dnowext 3dnow
Os Microsoft Windows XP, Home x86 Edition, Service Pack 3, (05.01.2600.00)
Memory 1.87 Gb, Virtual: 5.63 Gb
Disk Used: 29.29 Gb, Free: 8.71 Gb
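To illustrate the point about simulation dates, here is a short Python sketch that parses the checkpoint lines from the stderr above. The line format is taken from this log only; the regular expression and function name are illustrative and not part of any BOINC tool.

```python
import re
from datetime import datetime

# Checkpoint lines copied from the stderr output above.
STDERR = """\
[02:38:08] INFO: Checkpoint taken at 2018-04-01_06:00:00
[13:26:50] INFO: Checkpoint taken at 2018-04-01_12:00:00
[01:31:30] INFO: Checkpoint taken at 2018-04-01_18:00:00
[10:15:22] INFO: Checkpoint taken at 2018-04-02_00:00:00
[18:50:23] INFO: Checkpoint taken at 2018-04-02_06:00:00
"""

PATTERN = re.compile(r"\[(\d{2}:\d{2}:\d{2})\] INFO: Checkpoint taken at (\S+)")

def parse_checkpoints(text):
    """Return (bracketed time, simulation datetime) pairs for each checkpoint line."""
    return [
        (stamp, datetime.strptime(sim, "%Y-%m-%d_%H:%M:%S"))
        for stamp, sim in PATTERN.findall(text)
    ]

checkpoints = parse_checkpoints(STDERR)
sim_times = [sim for _, sim in checkpoints]

# Hours of simulated time between consecutive checkpoints.
intervals = [(b - a).total_seconds() / 3600 for a, b in zip(sim_times, sim_times[1:])]
print(intervals)  # [6.0, 6.0, 6.0, 6.0]
```

The dates after "Checkpoint taken at" advance in even 6-hour steps in April 2018, while the bracketed times are irregular, which fits the dates being simulation time rather than the time the checkpoint was written.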

Paul.
[Jun 1, 2019 11:02:49 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: New Beta Test – May 29, 2019 [Issues Thread]

Hi Paul,

Yes, I agree about the message text. It took me far too long to twig that the dates weren't recent and the times weren't clock times! It's fine for those who know, but the first time you see them it's very misleading.
[Jun 1, 2019 11:35:43 AM]
retsof
Former Community Advisor
USA
Joined: Jul 31, 2005
Post Count: 6824
Re: New Beta Test – May 29, 2019 [Issues Thread]

I have one beta that has been going at it for 23 hours with 14 hours left to go. It is an older AMD 3.3 GHz computer.

I also noticed that I have one in the queue on a 4 core military laptop. It will get to the beta eventually. I have this computer set to run these things at half speed to keep the heat manageable. It will probably take three days to complete. The deadline is only in 4-5 days. I cleared some things from the queue but noticed that the other beta popped in when something finished.

June 2 --- The laptop has been running a beta for 12 hours at half speed, meaning 24 hours of time in the core. The remaining time is at 3 hours and still slowly increasing. The original estimate of just a few hours was silly. The program only has a few checkpoints so I am keeping it going 24/7 to avoid any restart penalty.

The other beta finished and is valid. 31.7 points? Whoopee. That was about 1 point per hour for 31.37 hours of CPU time. Each Help Stop TB unit gives me over 440.
----------------------------------------
SUPPORT ADVISOR
Work+GPU i7 8700 12threads
School i7 4770 8threads
Default+GPU Ryzen 7 3700X 16threads
Ryzen 7 3800X 16 threads
Ryzen 9 3900X 24threads
Home i7 3540M 4threads50%
----------------------------------------
[Edit 9 times, last edit by retsof at Jun 2, 2019 5:06:56 PM]
[Jun 1, 2019 1:21:55 PM]
PowerFactor
Ace Cruncher
Joined: Dec 9, 2016
Post Count: 4027
Re: New Beta Test – May 29, 2019 [Issues Thread]

I received one beta work unit for each of my machines on May 29, 2019. One has completed after 15.7 hours and is waiting for validation. The Beta work unit on the older machine is still crunching!
----------------------------------------
[Edit 1 times, last edit by thepeacemaker7 at Jun 1, 2019 1:28:35 PM]
[Jun 1, 2019 1:27:57 PM]
BladeD
Ace Cruncher
USA
Joined: Nov 17, 2004
Post Count: 28976
Re: New Beta Test – May 29, 2019 [Issues Thread]

I received one beta work unit for each of my machines on May 29, 2019. One has completed after 15.7 hours and is waiting for validation. The Beta work unit on the older machine is still crunching!

What PC was that?
[Jun 1, 2019 1:31:16 PM]