Thread Status: Active
Total posts in this thread: 171
This topic has been viewed 23213 times and has 170 replies
slakin
Advanced Cruncher
Joined: Jul 4, 2008
Post Count: 79
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

I had a work unit error out on a Windows XP machine. I'm not trying to fuel the debate, as I can always run another project on this old machine. Here is the log.


Result Log

Result Name: BETA_ beta26_ 00000049_ 0025_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta26_gfx_7.08_windows_intelx86</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>
<file_xfer_error>
<file_name>beta26_image01_7.08.tga</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>
[Aug 4, 2017 12:57:36 AM]
duanebong
Advanced Cruncher
Singapore
Joined: Apr 25, 2009
Post Count: 134
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

I had 3 beta WUs give errors on an Asus Ultrabook running Windows 7 SP1 x64. So, at least in this instance, it looks like the "permanent HTTP error" is not related to whether the machine is running Windows XP.

<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta26_rosetta_7.08_windows_intelx86</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>
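
For anyone comparing several of these reports side by side, here is a minimal Python sketch that pulls the failed file names and error codes out of a pasted result log. The function name `failed_downloads` and the regular expression are illustrative assumptions based only on the log layout shown above; this is not a BOINC or World Community Grid tool.

```python
import re

# Matches one <file_xfer_error> block as it appears in the pasted logs and
# captures the file name, the numeric error code, and the text in parentheses.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>.*?)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>.*?)\)</error_code>",
    re.DOTALL,
)


def failed_downloads(result_log: str) -> list[tuple[str, int, str]]:
    """Return (file_name, error_code, reason) for each transfer error in the log."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in XFER_ERROR.finditer(result_log)
    ]


# Sample copied from the Windows 7 report above.
SAMPLE_LOG = """\
<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta26_rosetta_7.08_windows_intelx86</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>
"""

if __name__ == "__main__":
    for name, code, reason in failed_downloads(SAMPLE_LOG):
        print(f"{name}: {code} ({reason})")
```

Running the same function over each pasted log makes it easy to see which application files (gfx, rosetta, image) and versions are failing to download.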
[Edit 1 times, last edit by duanebong at Aug 5, 2017 9:54:00 AM]
[Aug 5, 2017 8:20:24 AM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

A batch of the second type of workunit that will be run for this project is being made available now. If it runs well, we will load more batches of that type later. Again, if you are experiencing any file transfer or download errors in beta, please turn http_debug on in your client and post the event log when the issue occurs. https://boinc.berkeley.edu/wiki/Client_configuration
Thanks,
armstrdj
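
For readers unsure how to turn on http_debug: the page linked above describes a cc_config.xml file in the BOINC data directory, where logging flags sit inside a log_flags element. As a minimal sketch, the Python below only builds that XML text. The helper name `build_cc_config` is hypothetical, and nothing is written to disk.

```python
import xml.etree.ElementTree as ET


def build_cc_config(log_flags: dict[str, bool]) -> str:
    """Return cc_config.xml text with each given log flag set to 1 or 0."""
    root = ET.Element("cc_config")
    flags = ET.SubElement(root, "log_flags")
    for name, enabled in log_flags.items():
        ET.SubElement(flags, name).text = "1" if enabled else "0"
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Prints a cc_config.xml body with only http_debug enabled.
    print(build_cc_config({"http_debug": True}))
```

If you already have a cc_config.xml, add the http_debug line to its existing log_flags section instead of replacing the file, then restart the client or have it re-read its config files so the flag takes effect.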
[Aug 8, 2017 7:15:45 PM]
slakin
Advanced Cruncher
Joined: Jul 4, 2008
Post Count: 79
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

I had a WU download failure, this time on a Windows 10 machine. Here is the log.

Result Name: BETA_ beta26_ 00000056_ 1563_ 0--
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_beta26_rosetta_7.10_windows_intelx86</file_name>
<error_code>-224 (permanent HTTP error)</error_code>
<error_message>permanent HTTP error</error_message>
</file_xfer_error>

</message>
]]>

Per your update above, if I can figure out how :-), I will turn on http_debug and see if I can capture an error.
[Aug 8, 2017 7:22:10 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

With the current units, the first checkpoint has occurred when the second structure completed, not the first as previously. Is that the intended behaviour?

With one checkpoint having occurred, the stderr file ends with:

Setting up checkpointing ...
BOINC:: Worker startup.
Starting job S_0001
Finished job S_0001 in 1465.83 seconds
Starting job S_0002
Finished job S_0002 in 1439.67 seconds
Starting job S_0003
[Aug 8, 2017 9:09:00 PM]
armstrdj
Former World Community Grid Tech
Joined: Oct 21, 2004
Post Count: 695
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

A checkpoint should be attempted after every structure is completed but whether or not it takes one will depend on your value for write to disk. I checked the results that are back and saw several examples of runs that took a checkpoint after the first structure was computed and restarted on the second structure. I will take a look at the code to make sure the right calls are being made to signal the checkpoint was taken.

Thanks,
armstrdj
[Aug 9, 2017 2:38:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

Jonathan, that machine is set to checkpoint to disk at most every 300 sec, which is much less than the 1465.83 sec it took to complete structure 1. To tell whether a checkpoint had been taken on this occasion, I was going by the task properties and the content of the boinc_checkpoint_count.txt file in the slots folder.
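
To make the expectation concrete, here is a small Python model of the rule described in this exchange: a checkpoint is attempted after every structure, but only taken if the "write to disk at most every" interval has elapsed since the last one. This is a simplified sketch with a hypothetical function name, not the application's actual code. In a real BOINC application that decision is normally made through the API calls `boinc_time_to_checkpoint()` and `boinc_checkpoint_completed()`.

```python
def checkpoints_taken(structure_seconds, min_interval_seconds):
    """Return the 1-based structure numbers after which a checkpoint is written.

    A checkpoint is attempted after every structure, but it is only taken
    when at least min_interval_seconds have passed since the previous
    checkpoint (or since the start of the run).
    """
    taken = []
    since_last = 0.0
    for number, seconds in enumerate(structure_seconds, start=1):
        since_last += seconds
        if since_last >= min_interval_seconds:
            taken.append(number)
            since_last = 0.0
    return taken


if __name__ == "__main__":
    # Run times of S_0001 and S_0002 from the stderr output quoted earlier.
    durations = [1465.83, 1439.67]
    print(checkpoints_taken(durations, 300))
```

Under this model, a 300 sec setting yields a checkpoint after the first structure as well as the second, which is why a first checkpoint appearing only after S_0002 looked unexpected.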
[Aug 9, 2017 3:07:44 PM]
SekeRob
Master Cruncher
Joined: Jan 7, 2013
Post Count: 2741
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

armstrdj wrote:
"A checkpoint should be attempted after every structure is completed but whether or not it takes one will depend on your value for write to disk. I checked the results that are back and saw several examples of runs that took a checkpoint after the first structure was computed and restarted on the second structure. I will take a look at the code to make sure the right calls are being made to signal the checkpoint was taken."

It also depends on whether your app is flagged to listen to the client's "write to disk" setting, which may ask for checkpoints more or less often than the science app is compiled to take them. Every structure is the logical checkpoint, but 1500 seconds seems to be a long time. The slower the device, the longer it takes, and I've seen 1 hour+ on a 3 GHz device. If you run such a science app on an 8 / 16 / 32 threaded device, that's a lot of lost time every time the client is restarted. Better tell people who actually 'use' their computer, and do not wish to crunch during that 'use' time, to switch on 'keep in memory when suspended', or you'll encounter another walk-away uproar.
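
As a rough worked example of that lost time, the Python below computes an upper bound. It assumes one task per thread, checkpoints only at structure boundaries, and a restart that interrupts every task just before its current structure finishes. The 1500 second figure comes from the post above; the function name is hypothetical.

```python
def worst_case_lost_seconds(threads, seconds_per_structure):
    """Upper bound on CPU time lost by one client restart.

    Assumes every thread runs one task, each task checkpoints only when a
    structure completes, and all tasks are interrupted just before
    finishing a structure.
    """
    return threads * seconds_per_structure


if __name__ == "__main__":
    for threads in (8, 16, 32):
        lost = worst_case_lost_seconds(threads, 1500)
        print(f"{threads} threads: up to {lost / 3600:.1f} CPU hours lost")
```

On average the loss would be closer to half of that bound, since tasks are interrupted at random points within a structure.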
[Aug 9, 2017 3:20:15 PM]
Crystal Pellet
Veteran Cruncher
Joined: May 21, 2008
Post Count: 1320
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

I got 14 betas on a Linux machine. All of them errored out with signal 11 after ~10 seconds.

Example:

BETA_ beta26_ 00000057_ 0129_ 0--
<core_client_version>7.4.22</core_client_version>
<![CDATA[
<message>
process got signal 11
</message>
<stderr_txt>
[2017- 8- 9 17:36:22:] :: BOINC:: Initializing ... ok.
[2017- 8- 9 17:36:22:] :: BOINC :: boinc_init()
INFO: result number = 0
BOINC:: Setting up shared resources ... ok.
BOINC:: Setting up semaphores ... ok.
BOINC:: Updating status ... ok.
BOINC:: Registering timer callback... ok.
BOINC:: Worker initialized successfully.
command: ../../projects/www.worldcommunitygrid.org/wcgrid_beta26_rosetta_7.10_x86_64-pc-linux-gnu -in::file::zip beta26_databasev2.zip @./beta26_00000057.flags -out::file::silent result_silent.out -run:jran 2069786245 -nstruct 10 -out::level 100 -run::no_scorefile true
Registering options..
Registered extra options.
Initializing broker options ...
Registered extra options.
Initializing core...
Initializing options.... ok
Options::initialize()
Options::adding_options()
Options::initialize() Check specs.
Options::initialize() End reached
Loaded options.... ok
Processed options.... ok
Initializing random generators... ok
Initialization complete.
Setting WU description ...
Unpacking zip data: ../../projects/www.worldcommunitygrid.org/beta26.beta26_databasev2.zip
Setting database description ...
Setting up checkpointing ...
Setting up graphics native ...
set_shared_memory_fully_initialized ...
BOINC:: Worker startup.
Starting job S_0001

</stderr_txt>


On a Windows 7 machine they're starting well.
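
For context, signal 11 on Linux is SIGSEGV, a segmentation fault. The stderr above stops right after "Starting job S_0001", so the crash happened during the first structure. Here is a minimal Python sketch for spotting that in a pasted stderr. The helper name `unfinished_job` is hypothetical, and it relies only on the "Starting job" / "Finished job" lines seen in this thread.

```python
import re
import signal


def unfinished_job(stderr_text):
    """Return the name of the last job that started but never finished, or None."""
    started = re.findall(r"^Starting job (\S+)", stderr_text, flags=re.MULTILINE)
    finished = set(re.findall(r"^Finished job (\S+)", stderr_text, flags=re.MULTILINE))
    pending = [job for job in started if job not in finished]
    return pending[-1] if pending else None


# Tail of the stderr from the failed Linux task above.
SAMPLE_STDERR = """\
set_shared_memory_fully_initialized ...
BOINC:: Worker startup.
Starting job S_0001
"""

if __name__ == "__main__":
    print(signal.Signals(11).name)        # symbolic name for signal number 11
    print(unfinished_job(SAMPLE_STDERR))  # job that was running at the crash
```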
[Aug 9, 2017 4:01:49 PM]
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2982
Status: Offline
Re: New Beta Test - July 21, 2017 [ Issues Thread ]

Hurray, this afternoon I received 2 WUs, one on each machine. As soon as I arrive home, I'll force them to the front of the queue.
[Edit 1 times, last edit by gb009761 at Aug 9, 2017 5:51:09 PM]
[Aug 9, 2017 5:50:33 PM]