World Community Grid Forums
Thread Status: Active Total posts in this thread: 171
slakin
Advanced Cruncher Joined: Jul 4, 2008 Post Count: 79 Status: Offline |
I had a work unit error out on a Windows XP machine. I'm not trying to fuel the debate, as I can always run another project on this old machine. Here is the log.
Result Log
Result Name: BETA_ beta26_ 00000049_ 0025_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>wcgrid_beta26_gfx_7.08_windows_intelx86</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
  <file_name>beta26_image01_7.08.tga</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]> |
||
|
duanebong
Advanced Cruncher Singapore Joined: Apr 25, 2009 Post Count: 134 Status: Offline |
I had 3 beta WUs give errors on an Asus Ultrabook running Windows 7 SP1 x64. So at least for this instance it looks like the (permanent HTTP error) is not related to whether it is running on Windows XP.
<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>wcgrid_beta26_rosetta_7.08_windows_intelx86</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]>
[Edit 1 times, last edit by duanebong at Aug 5, 2017 9:54:00 AM] |
||
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline |
A batch of work is being made available now for the second type of workunit that will be run for this project. If it runs well, we will load more batches of that type later. Again, if you are experiencing any file transfer or download errors in beta, please turn http_debug on in your client and post the event log when the issue occurs. https://boinc.berkeley.edu/wiki/Client_configuration
Thanks, armstrdj |
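For anyone unsure how to enable the flag: http_debug is one of the log flags in the client's cc_config.xml, as described on the wiki page linked above. Below is a minimal sketch in Python that only builds the text of that file. The helper name build_cc_config is made up for illustration, and where the file belongs depends on the location of your BOINC data directory.

```python
def build_cc_config(flags=("http_debug",)):
    """Return the text of a cc_config.xml that switches on the given log flags.

    build_cc_config is a hypothetical helper, not part of BOINC.
    """
    lines = ["<cc_config>", "  <log_flags>"]
    # Each log flag is an element whose value 1 means "enabled".
    lines += [f"    <{name}>1</{name}>" for name in flags]
    lines += ["  </log_flags>", "</cc_config>"]
    return "\n".join(lines) + "\n"


config_text = build_cc_config()
print(config_text)
```

Save the printed text as cc_config.xml in the BOINC data directory, then restart the client or have it re-read its config files so the extra HTTP detail shows up in the event log.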
||
|
slakin
Advanced Cruncher Joined: Jul 4, 2008 Post Count: 79 Status: Offline |
I had a WU download failure, this time on a Windows 10 machine. Here is the log.
Result Name: BETA_ beta26_ 00000056_ 1563_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
  <file_name>wcgrid_beta26_rosetta_7.10_windows_intelx86</file_name>
  <error_code>-224 (permanent HTTP error)</error_code>
  <error_message>permanent HTTP error</error_message>
</file_xfer_error>
</message>
]]>
Per your update above, if I can figure out how :-), I will turn on http_debug and see if I can capture an error. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
With the current units, the first checkpoint has occurred when the second structure completed, not the first as previously. Is that the intended behaviour?
With one checkpoint having occurred, the stderr file ends with:
Setting up checkpointing ...
BOINC:: Worker startup.
Starting job S_0001
Finished job S_0001 in 1465.83 seconds
Starting job S_0002
Finished job S_0002 in 1439.67 seconds
Starting job S_0003 |
||
|
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline |
A checkpoint should be attempted after every structure is completed but whether or not it takes one will depend on your value for write to disk. I checked the results that are back and saw several examples of runs that took a checkpoint after the first structure was computed and restarted on the second structure. I will take a look at the code to make sure the right calls are being made to signal the checkpoint was taken.
Thanks, armstrdj |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Jonathan, that machine has checkpoint to disk at most every 300 sec, so much less than the 1465.83 sec to complete structure 1. For checkpointing on this occasion, I was going by the task properties and the content of the boinc_checkpoint_count.txt file in the slots folder.
|
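The expectation in the posts above can be sketched as a small simulation. It assumes a simplified rule: a checkpoint is requested at the end of every structure and granted only if at least the "write to disk at most every" interval has passed since the last one. This is not the actual Rosetta or BOINC code, and simulate_checkpoints is a hypothetical name; the timings are the ones quoted from stderr.

```python
def simulate_checkpoints(structure_seconds, disk_interval):
    """Return the 1-based structure numbers after which a checkpoint is taken.

    Simplified model: a checkpoint is requested after each structure and is
    granted only when disk_interval seconds have passed since the last one.
    """
    taken = []
    elapsed = 0.0
    last_checkpoint = 0.0
    for number, seconds in enumerate(structure_seconds, start=1):
        elapsed += seconds
        if elapsed - last_checkpoint >= disk_interval:
            taken.append(number)
            last_checkpoint = elapsed
    return taken


timings = [1465.83, 1439.67]  # seconds for S_0001 and S_0002, from stderr
print(simulate_checkpoints(timings, 300))   # interval set on this machine
print(simulate_checkpoints(timings, 1500))  # a longer interval, for contrast
```

Under this simplified rule a 300 sec interval grants a checkpoint after structure 1, so a first checkpoint that only appears after structure 2 would not be explained by the write-to-disk setting alone.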
||
|
SekeRob
Master Cruncher Joined: Jan 7, 2013 Post Count: 2741 Status: Offline |
Quote (armstrdj): A checkpoint should be attempted after every structure is completed but whether or not it takes one will depend on your value for write to disk. I checked the results that are back and saw several examples of runs that took a checkpoint after the first structure was computed and restarted on the second structure. I will take a look at the code to make sure the right calls are being made to signal the checkpoint was taken. Thanks, armstrdj

It also depends on whether your app is flagged to honor the client's preference to write to disk more or less often than the science app is compiled to do. So every structure is the logical checkpoint, but 1500 seconds seems a long time. The slower the device, the longer it takes, and I've seen 1 hour+ on a 3 GHz device. If you run such a science app on an 8 / 16 / 32 threaded device, that's a lot of lost time every time the client is restarted. Better to tell people to switch on 'keep in memory when suspended' if they actually 'use' their computer and do not wish to crunch during that 'use' time, or you'll encounter another walk-away uproar. |
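A back-of-the-envelope version of the lost-time argument, as a sketch only. It assumes every thread runs one of these tasks, checkpoints happen only at structure boundaries, and a restart catches each task on average halfway through a structure. The 1500 sec figure is from this thread; the function name and the halfway assumption are illustrative.

```python
def lost_cpu_hours(threads, seconds_per_structure, fraction_done=0.5):
    """Estimate the CPU hours discarded by one client restart.

    fraction_done is the assumed average progress through the current
    structure at the moment of the restart (hypothetical default: half).
    """
    return threads * seconds_per_structure * fraction_done / 3600.0


for threads in (8, 16, 32):
    hours = lost_cpu_hours(threads, 1500)
    print(f"{threads} threads: about {hours:.2f} CPU hours lost per restart")
```

On slower devices where a structure takes an hour or more, the same estimate scales up in proportion.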
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1320 Status: Offline |
I got 14 betas on a Linux machine. All of them errored out with signal 11 after ~10 seconds.
Example: BETA_ beta26_ 00000057_ 0129_ 0--
<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
[2017- 8- 9 17:36:22:] :: BOINC:: Initializing ... ok.
[2017- 8- 9 17:36:22:] :: BOINC :: boinc_init()
INFO: result number = 0
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/www.worldcommunitygrid.org/wcgrid_beta26_rosetta_7.10_x86_64-pc-linux-gnu -in::file::zip beta26_databasev2.zip @./beta26_00000057.flags -out::file::silent result_silent.out -run:jran 2069786245 -nstruct 10 -out::level 100 -run::no_scorefile true
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/beta26.beta26_databasev2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
set_shared_memory_fully_initialized ...
BOINC:: Worker startup.
Starting job S_0001
</stderr_txt>
On a Windows 7 machine they're starting well. |
||
|
gb009761
Master Cruncher Scotland Joined: Apr 6, 2005 Post Count: 2982 Status: Offline |
Hurray, this afternoon I received 2 WUs, one on each machine. As soon as I arrive home, I'll force them to the front of the queue.
[Edit 1 times, last edit by gb009761 at Aug 9, 2017 5:51:09 PM] |
||