Thread Status: Active
Total posts in this thread: 30
Posts: 30   Pages: 3   [ Previous Page | 1 2 3 ]
This topic has been viewed 170216 times and has 29 replies
gb009761
Master Cruncher
Scotland
Joined: Apr 6, 2005
Post Count: 2987
Status: Offline
Re: High proportion of repair jobs

orangepeel13, if all of your SCC jobs have failed so far, please can you post the error message(s), i.e., the logs and your BOINC Event Log: both the excerpt at/around when the SCC jobs try to run/abort and the first couple of dozen lines from the top (i.e., your configuration). Then we may be able to find out why you're having a zero success rate with this project.
[Jan 29, 2017 6:22:41 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High proportion of repair jobs

One of the key suspects is security software blocking the loading and/or saving of files for a new science application.
[Jan 29, 2017 10:30:46 AM]
TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1957
Status: Offline
Re: High proportion of repair jobs

There is/are also Win7 machine(s) returning errored WU's...

ETA: e.g., one of them, Microsoft Windows 7, Enterprise x64 Edition, Service Pack 1
I am seeing this too with almost all WUs that I have currently in PVa jail...

Also quite a few where the wingmen are running older versions of Linux...

Ralf
----------------------------------------
[Edit 1 times, last edit by TPCBF at Jan 30, 2017 9:14:00 PM]
[Jan 29, 2017 10:52:43 PM]
orangepeel13
Cruncher
USA
Joined: Jul 22, 2014
Post Count: 11
Status: Offline
Re: High proportion of repair jobs

orangepeel13, if all of your SCC jobs have failed so far, please can you post the error message(s) - i.e., the logs and your BOINC Event Log - both the excerpt at/around when the SCC jobs try to run/abort and also the first couple of dozen lines from the top (i.e., your configuration), as then, we may be able to find out as to why you're having a zero-success rate with this project.


Well, after failing 6 - 10 tasks on each of the machines, the tasks started to complete and validate. Don't know what happened, but I am glad they are working now.

It may have been a file download problem; there are several error messages in the BOINC event log:
Sat 28 Jan 2017 02:21:27 AM EST | | Project communication failed: attempting access to reference site
Sat 28 Jan 2017 02:21:27 AM EST | World Community Grid | Temporarily failed download of scc1_image04_7.08.tga: transient HTTP error
Sat 28 Jan 2017 02:21:27 AM EST | World Community Grid | Started download of scc1_image05_7.08.tga
Sat 28 Jan 2017 02:21:28 AM EST | | Internet access OK - project servers may be temporarily down.

The errors all looked like:
<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_scc1_vina_7.08_x86_64-pc-linux-gnu</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>
]]>
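
For anyone comparing a lot of failed results, here is a minimal sketch of pulling the file name and error code out of a result log like the one above. It treats the log as plain text and uses a regular expression, since the output is not a complete XML document. The helper name `extract_fields` is made up for illustration and is not part of BOINC.

```python
import re

# Matches simple <tag>value</tag> pairs that sit on a single line.
# Multi-line elements such as <message> are deliberately ignored.
FIELD_RE = re.compile(r"<(\w+)>([^<>\n]+)</\1>")


def extract_fields(result_log: str) -> dict:
    """Return a {tag: value} dict for the single-line elements in a result log."""
    return {tag: value.strip() for tag, value in FIELD_RE.findall(result_log)}


# The result log quoted in the post above, copied as-is.
SAMPLE = """<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcgrid_scc1_vina_7.08_x86_64-pc-linux-gnu</file_name>
<error_code>-120 (RSA key check failed for file)</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>
]]>"""

fields = extract_fields(SAMPLE)
print(fields["file_name"], "->", fields["error_code"])
```

Running the same helper over several logs makes it easy to see whether they all fail on the same file with the same code.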
[Jan 30, 2017 8:06:49 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High proportion of repair jobs

When a new project is launched, each device gets a new chance to prove itself for the new project. For devices that return nothing but errors, it will still take a bit before they reach the point where they are no longer sent work units at all. This can skew the proportion of repair jobs in the beginning hours of a new project, but it should reach a more normal state fairly quickly.
Hmm, still happening 9 days later to a wingman with the same device "signature":

SCC1_ 0000009_ Bct-A_ 19363_ 0-- Microsoft Windows 8.1 Enterprise x64 Edition, (06.03.9600.00) 708 Error 2/4/17 12:47:42 2/4/17 12:50:07 0.00 71.2 / 0.0

<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
(0x522)
[Feb 5, 2017 6:08:49 PM]
seippel
Former World Community Grid Tech
Joined: Apr 16, 2009
Post Count: 392
Status: Offline
Re: High proportion of repair jobs

When a new project is launched, each device gets a new chance to prove itself for the new project. For devices that return nothing but errors, it will still take a bit before they reach the point where they are no longer sent work units at all. This can skew the proportion of repair jobs in the beginning hours of a new project, but it should reach a more normal state fairly quickly.


Hmm, still happening 9 days later to a wingman with the same device "signature":



My apologies, the part of the post where I say they wouldn't be sent work at all anymore wasn't accurate. Even machines that return nothing but errors will still get a very small number of work units a day in order to test if they are fixed.

Also, as your machines become trusted for this project, they won't need a wingman to verify them as often, although they will still sometimes get a wingman from random verification, and they may get selected as a wingman for someone else who needs verification.

Seippel
[Feb 8, 2017 5:02:21 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High proportion of repair jobs

When a new project is launched, each device gets a new chance to prove itself for the new project. For devices that return nothing but errors, it will still take a bit before they reach the point where they are no longer sent work units at all. This can skew the proportion of repair jobs in the beginning hours of a new project, but it should reach a more normal state fairly quickly.
Hmm, still happening 9 days later to a wingman with the same device "signature":
My apologies, the part of the post where I say they wouldn't be sent work at all anymore wasn't accurate. Even machines that return nothing but errors will still get a very small number of work units a day in order to test if they are fixed.

Thanks, Al, that explains why I keep on noticing the occasional case.
[Feb 8, 2017 8:54:55 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High proportion of repair jobs

It's a very old rule... The quota goes down to 1 per day if a device keeps on returning errors for an app. With non-error results it goes back up (2, 4, 8, etc.), back and forth. I guess it is in some old FAQ by the undersigned.
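
The rule described above can be sketched roughly like this. It is only an illustration of the description in this post, not the actual scheduler code: the floor of 1 per day and the doubling on a good result come from the post, while halving on an error and the ceiling of 8 are assumptions made up for the example.

```python
def next_quota(quota: int, had_error: bool, max_quota: int = 8) -> int:
    """Return the next daily quota for one device and one app.

    Assumptions (not confirmed in this thread): an error halves the quota,
    and max_quota is a made-up ceiling. The floor of 1 per day and the
    doubling on a non-error result follow the description in the post.
    """
    if had_error:
        return max(1, quota // 2)
    return min(max_quota, quota * 2)


def simulate(outcomes, start: int = 8, max_quota: int = 8):
    """Apply a sequence of outcomes (True = error) and return the quota history."""
    history = [start]
    for had_error in outcomes:
        history.append(next_quota(history[-1], had_error, max_quota))
    return history


# Four errors in a row, then three clean results.
print(simulate([True, True, True, True, False, False, False]))
# prints [8, 4, 2, 1, 1, 2, 4, 8]
```

The point of the sketch is that a device stuck on errors never drops to zero, so it keeps getting the occasional work unit and can climb back once it is fixed.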
[Feb 8, 2017 9:37:34 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High proportion of repair jobs

Raising this issue again, because I'm sad to notice that 10 wingman machines, out of a recent download of 21 units, had errored with either
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
(0x522)
or
couldn't start app: Can't get shared memory segment name: shmget() failed
In all these cases, the OS type was Microsoft Windows 8.1, Enterprise x64 Edition, (06.03.9600.00). Again, I suspect that one or two machine farms are causing these errors (the wingman Sent Times were all one of two times, identical to the second), and they must be large groups of machines for me to keep noticing them. It's sad for 2 reasons: they're erroring instead of producing useful work, and they are Replication 2, thereby causing 2 trusted machines to work unnecessarily on the same unit.
[Jun 2, 2017 5:50:23 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: High proportion of repair jobs

I have to agree with tony: 90% of all repair jobs I get, over a dozen today, are for this same reason. This is across all projects, not just SCC1.
----------------------------------------
[Edit 1 times, last edit by Former Member at Jun 5, 2017 10:25:38 PM]
[Jun 5, 2017 10:08:31 PM]