This topic has been viewed 24932 times and has 170 replies.
Falconet
Master Cruncher
Portugal
Joined: Mar 9, 2009
Post Count: 3295
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

Is checkpointing working?

Edit: Okay... after 20 minutes of CPU time, there's no checkpoint, and progress went from nearly 10% to 2%... and seems to be stuck there.
Edit 2: After 5 CPU minutes stuck at 2%, progress increased to 4%. Still no checkpoint...
64-bit Linux

I do not run Linux, but this sounds like normal behaviour; checkpoints are not at regular intervals.


Just seems way too long. Almost 40 CPU minutes, no checkpoint and still at 4%.
----------------------------------------


AMD Ryzen 5 1600AF 6C/12T 3.2 GHz - 85W
AMD Ryzen 5 2500U 4C/8T 2.0 GHz - 28W
AMD Ryzen 7 7730U 8C/16T 3.0 GHz
[Jul 28, 2017 10:35:53 PM]
marist_college
Advanced Cruncher
USA
Joined: Mar 30, 2005
Post Count: 107

Hi Uplinger,


Yes, it downloaded OK, but it isn't viewable as an image (picture)... I'm assuming that's OK?

If you have access to a Linux machine on the same network, can you try this command?
curl -Ov http://swift.worldcommunitygrid.org/v1/AUTH_0...6/beta26_image05_7.08.tga

I used the same machine, via the Ubuntu on Windows feature of Windows 10. Here's the output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0*   Trying 173.192.119.113...
* Connected to swift.worldcommunitygrid.org (173.192.119.113) port 80 (#0)
> GET /v1/AUTH_02593dc3-da28-4635-a1c8-8cc5e6e3772a/beta26/beta26_image05_7.08.tga HTTP/1.1
> Host: swift.worldcommunitygrid.org
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 200 OK
< Content-Length: 66708
< Accept-Ranges: bytes
< Last-Modified: Fri, 07 Jul 2017 18:50:59 GMT
< Etag: 0a6dcb92d8ef4615cae514388e5bbd46
< X-Timestamp: 1499453458.51358
< Content-Type: application/x-www-form-urlencoded
< X-Trans-Id: tx5838c127bff3487586323-00597bbe4a
< Date: Fri, 28 Jul 2017 22:44:26 GMT
<
{ [11973 bytes data]
100 66708  100 66708    0     0   254k      0 --:--:-- --:--:-- --:--:--  254k
* Connection #0 to host swift.worldcommunitygrid.org left intact


We are seeing some new errors today: files download, but their MD5 hash doesn't match.

Here's an example of that:
<core_client_version>7.6.33</core_client_version>
<![CDATA[
<message>
WU download error: couldn't get input files:
<file_xfer_error>
<file_name>0befbaafffda7df7d78f1412272f4147.2</file_name>
<error_code>-119 (md5 checksum failed for file)</error_code>
</file_xfer_error>

</message>
]]>

Edit 1: replaced the curl output with the correct output, obtained using the full URL rather than the truncated one.
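Error -119 means the downloaded file's MD5 didn't match what the client expected. One way to check a file by hand is to hash it locally and compare against a known-good value. A minimal Python sketch, assuming you already have a trustworthy expected MD5; for an ordinary (non-segmented) Swift object the Etag header in the curl output above is the object's MD5, so it is one candidate. The helper names are illustrative only.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_md5: str) -> bool:
    """Compare with an expected digest, tolerating surrounding quotes and upper case."""
    return md5_of_file(path) == expected_md5.strip().strip('"').lower()
```

For example, `matches_expected(Path("beta26_image05_7.08.tga"), "0a6dcb92d8ef4615cae514388e5bbd46")` checks the image file against the Etag shown above.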
----------------------------------------
[Edited 1 time, last edit by marist_college at Jul 28, 2017 10:46:09 PM]
[Jul 28, 2017 10:41:43 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1677

I received 3 WUs on 3 different hosts; all were valid.
However, based on this limited number of computed WUs, the following can be observed regarding the granted credit per hour:
  • Windows 7 Pro x64, i7 4770K: 26.8 vs. average 28.0 (OET1) - CPU time: 3.54
  • Ubuntu 16.04 x64, Phenom IIx6: 27.6 vs. average 50.0 (OET1) - CPU time: 3.21
  • Ubuntu 14.04 x64, Athlon II x4: 26.8 vs. average 41.0 (OET1) - CPU time: 8.07

Since I have no more WUs available, I cannot extend these observations to identify a pattern (Windows vs. Linux, or i7 vs. Phenom II/Athlon II). In any case, the consistency of the granted credit should be improved for this new science.
Cheers,
Yves
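Expressing each host's beta credit rate as a fraction of its OET1 average makes the comparison above a little easier to read. A small Python sketch using only the three figures listed; the ratio is simply observed / average and says nothing about the cause.

```python
# Granted credit per hour from the list above: (beta26 result, host's OET1 average).
OBSERVATIONS = {
    "Windows 7 Pro x64, i7 4770K": (26.8, 28.0),
    "Ubuntu 16.04 x64, Phenom IIx6": (27.6, 50.0),
    "Ubuntu 14.04 x64, Athlon II x4": (26.8, 41.0),
}


def relative_rate(beta: float, oet1_average: float) -> float:
    """Beta credit per hour as a fraction of the host's OET1 average."""
    return beta / oet1_average


for host, (beta, oet1) in OBSERVATIONS.items():
    print(f"{host}: {relative_rate(beta, oet1):.0%} of OET1 average")
```

With these numbers the Windows host lands close to its OET1 average, while both Linux hosts fall to roughly half to two-thirds of theirs.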
----------------------------------------
[Jul 29, 2017 5:43:51 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

Yves, you CANNOT compare VINA science production on Linux with any non-VINA science. I think we know that by now: OET and its brethren process twice as fast ON LINUX, and thus yield much more credit per hour.
----------------------------------------
[Edited 1 time, last edit by SekeRob* at Jul 29, 2017 6:33:29 AM]
[Jul 29, 2017 6:32:23 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

Hmmm, not for the faint-hearted. I got 8, and after 8:26 hours the first has checkpointed 8 times, with 12 minutes elapsed since the last checkpoint (BOINCTasks is great at monitoring checkpoint counts and keeping tabs on 'when last'). Restarting (cycling) an 8-core machine would come at a steep price in lost progress. Of course, setting them to suspend at their next checkpoint is just as costly: no crunching happens until the last one suspends.

After 1 hour of processing it said it was heading for 15 hours, but this morning it said it would complete at about 9:30: a CEP2-like prediction.
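The restart cost described above can be put in rough numbers. A back-of-the-envelope Python sketch, assuming checkpoints are evenly spaced, that the other tasks checkpoint at the same rate as the first, and that an interruption lands at a random point between checkpoints, so each task loses half an interval on average. The figures (8 checkpoints in 8:26 hours, 8 concurrent tasks) come from the post; the function name is illustrative.

```python
def checkpoint_cost(elapsed_minutes: float, checkpoints: int, tasks: int):
    """Return (average checkpoint interval, expected CPU minutes lost across all tasks)
    for an interruption at a random moment, assuming evenly spaced checkpoints."""
    interval = elapsed_minutes / checkpoints
    # A random interruption falls, on average, halfway through an interval.
    expected_loss_per_task = interval / 2
    return interval, expected_loss_per_task * tasks


# Figures from the post: 8 checkpoints in 8:26 hours, 8 tasks running at once.
interval, total_loss = checkpoint_cost(8 * 60 + 26, 8, 8)
print(f"average interval {interval:.1f} min, expected loss {total_loss:.0f} CPU minutes")
```

With those figures the average interval is about 63 minutes, so a restart would throw away roughly 250 CPU minutes across the 8 tasks on average.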
----------------------------------------
[Edited 1 time, last edit by SekeRob* at Jul 29, 2017 7:15:58 AM]
[Jul 29, 2017 7:14:53 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

There are only 10 structures in batch 53, but they must be large ones; with checkpointing only at the completion of each structure, that makes for a long interval between checkpoints. The tasks on my 8-core machine have 10 hours of CPU time done, with 2 to 3 hours estimated remaining. However, the batch 17 tasks just loaded on another machine also have 10 structures, but only 3 hours estimated to completion. (All with <fraction_done_exact/>.)
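When progress only moves as structures complete, a time-remaining estimate based on fraction done is essentially a linear extrapolation. A minimal Python sketch of that idea, assuming fraction done equals completed structures divided by the total and that the remaining structures take as long on average as the finished ones. This is not BOINC's actual estimator, and the 8-of-10 figure is hypothetical, chosen to be consistent with the 10 hours done and 2 to 3 hours remaining quoted above.

```python
def estimated_remaining_hours(elapsed_hours: float, completed: int, total: int) -> float:
    """Linear extrapolation: each remaining structure is assumed to take the
    average time of the completed ones."""
    if not 0 < completed <= total:
        raise ValueError("completed must be between 1 and total")
    return elapsed_hours * (total - completed) / completed


# Hypothetical: 8 of 10 structures finished after 10 CPU hours.
print(estimated_remaining_hours(10.0, 8, 10))
```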
[Jul 29, 2017 7:27:55 AM]
duanebong
Advanced Cruncher
Singapore
Joined: Apr 25, 2009
Post Count: 134

Just seems way too long. Almost 40 CPU minutes, no checkpoint and still at 4%.


I experienced the same long intervals between checkpoints. After 4%, it jumps to 6% after about an hour of run time; you just need to be patient and give it time to run. Some of the WUs took up to 16 hours to run, but completed successfully in the end.
----------------------------------------

[Jul 29, 2017 7:45:39 AM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741

The whole 2% incrementing could be artificial, just to give a feel of progress... something like TTC 10 hours and 10 structures, so 1 structure is 10%, no matter how long each structure takes.

This so has the odor of HPF3 (though I'd vote for a different name, as the old HPF has some legacy).
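The guess above, that each structure is worth a fixed share of the task regardless of its length, could look something like this. A hypothetical Python sketch, not Rosetta's or WCG's actual progress code: each of nstruct structures counts for 1/nstruct, and progress inside the current structure is interpolated from an assumed per-structure time estimate, capped so it cannot pass the next boundary before that structure really finishes.

```python
def reported_progress(completed: int, nstruct: int,
                      seconds_into_current: float,
                      est_seconds_per_structure: float) -> float:
    """Hypothetical progress model: every structure is worth 1/nstruct of the task,
    regardless of how long it actually takes."""
    if completed >= nstruct:
        return 1.0
    share = 1.0 / nstruct
    # Interpolate within the current structure from the time estimate, but stop
    # just short of the next boundary until the structure really finishes.
    within = min(seconds_into_current / est_seconds_per_structure, 0.99)
    return completed * share + within * share


# 10 structures with an assumed estimate of one hour each.
for done, secs in [(0, 0), (0, 1800), (0, 7200), (1, 0)]:
    print(done, secs, f"{reported_progress(done, 10, secs, 3600):.1%}")
```

In this model a structure that overruns its estimate holds progress just below the next 10% mark until it finishes.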
[Jul 29, 2017 8:21:17 AM]
TonyEllis
Senior Cruncher
Australia
Joined: Jul 9, 2008
Post Count: 261

duanebong commented...
Some of the WUs took up to 16 hours to run, but completed successfully in the end.

That's quick by comparison to the old atoms here... :)

Result Name | Device | Status | Sent | Returned | CPU / Elapsed Time (hours) | Claimed / Granted Credit
BETA_beta26_00000026_0347_0-- | violetta.sraellis.com | Valid | 7/25/17 19:13:06 | 7/27/17 20:37:28 | 46.43 / 49.35 | 132.7 / 132.7
BETA_beta26_00000026_0346_0-- | violetta.sraellis.com | Valid | 7/25/17 19:13:05 | 7/27/17 19:57:25 | 45.82 / 48.71 | 130.9 / 130.9
BETA_beta26_00000026_0345_0-- | violetta.sraellis.com | Valid | 7/25/17 19:13:05 | 7/27/17 17:32:17 | 43.55 / 46.29 | 124.4 / 124.4

Naturally, quite some time passes between checkpoints...
----------------------------------------
[Edited 1 time, last edit by TonyEllis at Jul 29, 2017 9:00:18 AM]
[Jul 29, 2017 8:55:50 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0

Here's a strange problem with a wingman (noticed when I received the repair job). The only oddity I can see is "INFO: Could not determine result number" (so it was set to 15?); it should have been 0. FWIW, I think 06.02.9200.00 denotes Windows 8.

Result Name | Operating System | Device | Status | Sent | Due / Returned | CPU Time (hours) | Claimed / Granted Credit
BETA_beta26_00000053_0308_2-- | Microsoft Windows 10 Professional x64 Edition, (10.00.14393.00) | - | In Progress | 7/29/17 10:21:35 | 8/2/17 10:21:35 | 0.00 | 0.0 / 0.0
BETA_beta26_00000053_0308_1-- | Microsoft Windows 8.1 x64 Edition, (06.03.9600.00) | - | In Progress | 7/29/17 10:21:34 | 8/2/17 10:21:34 | 0.00 | 0.0 / 0.0
BETA_beta26_00000053_0308_0-- | Microsoft x86 Edition, (06.02.9200.00) | 708 | Invalid | 7/28/17 21:15:38 | 7/29/17 10:21:25 | 6.89 | 172.9 / 0.0

Result Log

Result Name: BETA_beta26_00000053_0308_0--
<core_client_version>6.10.58</core_client_version>
<![CDATA[
<stderr_txt>
[2017- 7-29 3:56: 6:] :: BOINC:: Initializing ... ok.
[2017- 7-29 3:56: 6:] :: BOINC :: boinc_init()
INFO: Could not determine result number
INFO: result number = 15

BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: projects/www.worldcommunitygrid.org/wcgrid_beta26_rosetta_7.08_windows_intelx86 -in::file::zip beta26_databasev2.zip @./beta26_00000053.flags -out::file::silent result_silent.out -run:jran 1092152105 -nstruct 10 -out::level 100 -run::no_scorefile true
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/beta26.beta26_databasev2.zip
Setting database description ...
Setting up checkpointing ...
abrelax ...
abrelax.run
Setting up folding (abrelax) ...
Beginning folding (abrelax) ...
BOINC:: Worker startup.
Starting work on structure: _0001
Finished _0001 in 2722.14 seconds.
Starting work on structure: _0002
Finished _0002 in 2856.33 seconds.
Starting work on structure: _0003
Finished _0003 in 2528.64 seconds.
Starting work on structure: _0004
Finished _0004 in 2357.17 seconds.
Starting work on structure: _0005
Finished _0005 in 2604.7 seconds.
Starting work on structure: _0006
Finished _0006 in 1323.58 seconds.
Starting work on structure: _0007
Finished _0007 in 2596.16 seconds.
Starting work on structure: _0008
Finished _0008 in 2829.11 seconds.
Starting work on structure: _0009
Finished _0009 in 2282.36 seconds.
Starting work on structure: _0010
Finished _0010 in 2644.81 seconds.
======================================================
DONE :: 10 structures in 24794.9 cpu seconds
======================================================
BOINC :: BOINC support services shutting down cleanly ...
10:52:01 (8840): called boinc_finish(0)

</stderr_txt>
]]>
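Since, as noted earlier in the thread, these tasks checkpoint only when a structure finishes, the per-structure times in this log are a fair proxy for the checkpoint interval on that host. A small Python sketch that pulls them out, assuming the `Finished _NNNN in X seconds.` line format shown above; it is an illustration, not a WCG tool.

```python
import re
from statistics import mean
from typing import Dict

# "Finished" lines copied from the result log above.
LOG = """\
Finished _0001 in 2722.14 seconds.
Finished _0002 in 2856.33 seconds.
Finished _0003 in 2528.64 seconds.
Finished _0004 in 2357.17 seconds.
Finished _0005 in 2604.7 seconds.
Finished _0006 in 1323.58 seconds.
Finished _0007 in 2596.16 seconds.
Finished _0008 in 2829.11 seconds.
Finished _0009 in 2282.36 seconds.
Finished _0010 in 2644.81 seconds.
"""

FINISHED = re.compile(r"^Finished (_\d+) in ([\d.]+) seconds\.$", re.MULTILINE)


def structure_times(log: str) -> Dict[str, float]:
    """Map each structure name (e.g. "_0001") to its reported CPU seconds."""
    return {name: float(secs) for name, secs in FINISHED.findall(log)}


times = structure_times(LOG)
print(f"{len(times)} structures, {sum(times.values()):.1f} s total, "
      f"mean {mean(times.values()) / 60:.1f} min, longest {max(times, key=times.get)}")
```

For this log the ten structures add up to 24745 s, slightly under the 24794.9 s on the DONE line, and average a little over 41 minutes each, with _0006 (about 22 minutes) as the outlier.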
[Jul 29, 2017 10:40:14 AM]