Total posts in this thread: 126
This topic has been viewed 18269 times and has 125 replies
Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7219
Re: Starting OET1 "OM" type workunits

I thought I was done with these 39.4 point work units, but I see I got another rash of them. This is on a Dell 1950 with two Xeon L5420s. There seems to be no correlation between the run time and the low point scores. Maybe the techs could figure it out. I am not that concerned with the points, so I am going to continue to run these anyway.
Result Name | Device Name | Status | Sent Time | Time Due / Return Time | CPU Time / Elapsed Time (hours) | Claimed / Granted BOINC Credit
OET1_ 0000643_ xSDGP-OM_ rig_ 77785_ 0-- | PowerEdge-1950 | Valid | 4/20/18 04:16:10 | 4/21/18 21:04:26 | 7.02 / 7.03 | 239.4 / 239.4
OET1_ 0000502_ xEBGP-OM_ rig_ 30058_ 0-- | PowerEdge-1950 | Valid | 4/20/18 04:16:10 | 4/21/18 21:04:26 | 6.47 / 6.47 | 220.3 / 220.3
OET1_ 0000618_ xEBGP-OM_ rig_ 51727_ 0-- | PowerEdge-1950 | Valid | 4/20/18 04:16:10 | 4/21/18 21:04:26 | 7.45 / 7.46 | 254.0 / 254.0
OET1_ 0000618_ xEBGP-OM_ rig_ 51694_ 0-- | PowerEdge-1950 | Valid | 4/20/18 04:16:10 | 4/21/18 21:04:26 | 8.25 / 8.27 | 281.6 / 281.6
OET1_ 0000502_ xEBGP-OM_ rig_ 30012_ 0-- | PowerEdge-1950 | Valid | 4/20/18 04:16:10 | 4/21/18 21:04:26 | 6.06 / 6.07 | 206.7 / 206.7
OET1_ 0000618_ xEBGP-OM_ rig_ 90076_ 0-- | PowerEdge-1950 | Valid | 4/20/18 18:38:37 | 4/21/18 21:04:26 | 8.36 / 8.37 | 285.0 / 285.0
OET1_ 0000526_ xZAGP-OM_ rig_ 96393_ 0-- | PowerEdge-1950 | Valid | 4/20/18 18:38:37 | 4/21/18 21:04:26 | 6.78 / 6.79 | 231.1 / 231.1
OET1_ 0000618_ xEBGP-OM_ rig_ 89872_ 0-- | PowerEdge-1950 | Valid | 4/20/18 18:38:37 | 4/21/18 21:04:26 | 7.29 / 7.29 | 248.2 / 248.2
OET1_ 0000618_ xEBGP-OM_ rig_ 89893_ 0-- | PowerEdge-1950 | Valid | 4/20/18 18:38:37 | 4/21/18 21:04:26 | 6.92 / 6.94 | 236.1 / 236.1
OET1_ 0000617_ xEBGP-OM_ rig_ 15997_ 0-- | PowerEdge-1950 | Valid | 4/20/18 18:38:37 | 4/21/18 21:04:26 | 5.85 / 5.86 | 199.6 / 199.6
OET1_ 0000617_ xEBGP-OM_ rig_ 7286_ 1-- | PowerEdge-1950 | Pending Validation | 4/20/18 15:39:22 | 4/21/18 21:04:26 | 8.21 / 8.22 | 39.4 / 0.0
OET1_ 0000617_ xEBGP-OM_ rig_ 7466_ 1-- | PowerEdge-1950 | Pending Validation | 4/20/18 15:39:22 | 4/21/18 21:04:26 | 7.88 / 7.90 | 39.4 / 0.0
OET1_ 0000502_ xEBGP-OM_ rig_ 30670_ 2-- | PowerEdge-1950 | Pending Validation | 4/20/18 15:39:22 | 4/21/18 21:04:26 | 8.68 / 8.70 | 39.4 / 0.0
OET1_ 0000620_ xEBGP-OM_ rig_ 93523_ 2-- | PowerEdge-1950 | Pending Validation | 4/20/18 15:39:22 | 4/21/18 21:04:26 | 10.43 / 10.46 | 242.4 / 0.0
OET1_ 0000502_ xEBGP-OM_ rig_ 30702_ 1-- | PowerEdge-1950 | Pending Validation | 4/20/18 15:39:22 | 4/21/18 21:04:26 | 7.06 / 7.07 | 39.4 / 0.0
OET1_ 0000617_ xEBGP-OM_ rig_ 7405_ 2-- | PowerEdge-1950 | Pending Validation | 4/20/18 15:39:22 | 4/21/18 21:04:26 | 7.38 / 7.40 | 39.4 / 0.0
OET1_ 0000526_ xZAGP-OM_ rig_ 25484_ 3-- | PowerEdge-1950 | Valid | 4/20/18 15:39:22 | 4/21/18 21:04:26 | 7.36 / 7.38 | 39.4 / 162.2
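One way to check the "no correlation with run time" observation is to compute claimed credit per CPU hour for each row. The sketch below is only illustrative: it assumes the first number pair in each row is CPU/elapsed time in hours and the second is claimed/granted credit, and the 10 credits-per-hour cutoff is a made-up threshold, not anything WCG uses.

```python
# (cpu_hours, claimed_credit) pairs transcribed from the rows in the post.
RESULTS = [
    (7.02, 239.4), (6.47, 220.3), (7.45, 254.0), (8.25, 281.6),
    (6.06, 206.7), (8.36, 285.0), (6.78, 231.1), (7.29, 248.2),
    (6.92, 236.1), (5.85, 199.6), (8.21, 39.4), (7.88, 39.4),
    (8.68, 39.4), (10.43, 242.4), (7.06, 39.4), (7.38, 39.4),
    (7.36, 39.4),
]

# Hypothetical cutoff (credits per CPU hour) used only to split the rows.
LOW_RATE_THRESHOLD = 10.0


def credit_per_hour(cpu_hours, claimed_credit):
    """Claimed credit divided by CPU time in hours."""
    return claimed_credit / cpu_hours


def split_by_rate(results, threshold=LOW_RATE_THRESHOLD):
    """Return (normal, low) lists of rows, split on credit per hour."""
    normal, low = [], []
    for hours, claimed in results:
        if credit_per_hour(hours, claimed) < threshold:
            low.append((hours, claimed))
        else:
            normal.append((hours, claimed))
    return normal, low


normal, low = split_by_rate(RESULTS)
for label, group in (("normal", normal), ("low", low)):
    hours = [h for h, _ in group]
    rates = [credit_per_hour(h, c) for h, c in group]
    print(f"{label}: n={len(group)}, "
          f"hours {min(hours):.2f}-{max(hours):.2f}, "
          f"credit/hour {min(rates):.1f}-{max(rates):.1f}")
```

With these rows, the run times of the low-credit results fall inside the range of the other results, which is consistent with the observation that run time alone does not explain the 39.4 claims.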
Cheers
----------------------------------------
Sgt. Joe
*Minnesota Crunchers*
[Apr 21, 2018 9:17:58 PM]
KerSamson
Master Cruncher
Switzerland
Joined: Jan 29, 2007
Post Count: 1664
Re: Starting OET1 "OM" type workunits

Since I currently run only OET1, I look only at the average granted credit per day and per week. At the moment, I don't have much time for a more detailed analysis.
My feedback above about 10% differences is based on an average across 6 hosts (5 Linux, 1 Win7 Pro x64): 4 AMD-based (no Ryzen yet) and 2 Intel-based.
Maybe some "inadequate" credits are granted from time to time?
Cheers,
Yves
----------------------------------------
[Apr 22, 2018 7:02:34 AM]
Procyon Lotor One
Cruncher
Joined: Jun 28, 2008
Post Count: 14
Re: Starting OET1 "OM" type workunits

There is 37% of the project remaining, so I think you will be okay in terms of run time. How many days do you average per calendar day?

I get about 14 to 15 per calendar day with my three hosts.


[Apr 23, 2018 3:54:22 AM]
Speedy51
Veteran Cruncher
New Zealand
Joined: Nov 4, 2005
Post Count: 1220
Re: Starting OET1 "OM" type workunits

There is 37% of the project remaining, so I think you will be okay in terms of run time. How many days do you average per calendar day?

I get about 14 to 15 per calendar day with my three hosts.


At 14.9 days per calendar day, it will take you about 24.5 calendar days to accumulate one year of run time: 365 divided by 14.9 ≈ 24.5. How many months will it take you to reach your goal?
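The division in this post can be checked, and reused for other targets, with a short sketch. The function name and parameters are made up for illustration; the only inputs taken from the thread are the 14.9 days of run time per calendar day and the one-year (365-day) target.

```python
def calendar_days_needed(target_runtime_days, runtime_days_per_calendar_day):
    """Calendar days required to accumulate a given amount of run time.

    target_runtime_days: run time goal, in days (365 for one year).
    runtime_days_per_calendar_day: run time earned per calendar day
        across all hosts combined.
    """
    if runtime_days_per_calendar_day <= 0:
        raise ValueError("run time per calendar day must be positive")
    return target_runtime_days / runtime_days_per_calendar_day


one_year = calendar_days_needed(365, 14.9)
print(f"One year of run time takes about {one_year:.1f} calendar days")
```

The same call with a larger target, for example `calendar_days_needed(5 * 365, 14.9)`, gives the time needed for a multi-year run-time goal at the same rate.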
----------------------------------------

[Apr 23, 2018 4:56:40 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 735
Re: Starting OET1 "OM" type workunits

Couple questions:

1. Does anyone know what causes the 32-bit Vina to run instead of 64-bit Vina when computing OET1 work units? I'm running nothing but 64-bit systems -- as most of us are -- and quite often many of the work units will launch 32-bit Vina. Why?

2. Last I checked, OET1 is a Quorum 1, Replication 1 project. Yet whenever a WCG volunteer screws up and their machine Errors out, it causes the work unit to change into a Quorum 2, Replication 2, which sets the whole project back unnecessarily and gives everyone twice the work. I started a thread in the Feedback subforum, entitled "Is there a way to disable absentee clie...nstantly error out?", but not a single WCG employee has bothered to comment on 1) root cause, or 2) what they're going to do to fix it.

Essentially, 100% of my Ebola work units now are repair work units caused by Windows 8.1 clients running BOINC 7.2.47, which is over 4 years old and obsolete.

The failures show one of two error messages, with this being the most prevalent:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
(0x522)
</message>
]]>

or less commonly:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: Can't get shared memory segment name: shmget() failed
</message>
]]>
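For anyone tallying these failures across many repair units, the snippets above can be picked apart with a few regular expressions. This is a rough sketch rather than a BOINC tool: the function name is invented, and it assumes the stderr text always has the shape shown in this post. The one outside fact used is that 0x522 is decimal 1314, the Windows error code ERROR_PRIVILEGE_NOT_HELD.

```python
import re

# Error text copied from the post above.
STDERR_SNIPPET = """<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
(0x522)
</message>
]]>"""


def parse_boinc_error(text):
    """Extract client version, message text, and numeric error code.

    Any field that is not present in the text comes back as None.
    """
    version = re.search(
        r"<core_client_version>([^<]+)</core_client_version>", text)
    message = re.search(r"<message>\s*(.*?)\s*</message>", text, re.DOTALL)
    code = re.search(r"\(0x([0-9a-fA-F]+)\)", text)
    return {
        "client_version": version.group(1) if version else None,
        "message": message.group(1) if message else None,
        # Convert the hexadecimal code to an integer (0x522 -> 1314).
        "error_code": int(code.group(1), 16) if code else None,
    }


info = parse_boinc_error(STDERR_SNIPPET)
print(info["client_version"], info["error_code"])
```

The second, less common message has no hexadecimal code, so `error_code` would be `None` for it while the version and message fields are still filled in.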


Literally 100% of my Ebola work units are repair work units _2, _3, _4, etc., and 99% of the time, it's because a Windows 8.1 client running BOINC 7.2.47 has dropped the ball, literally duplicating the amount of work the project has left to do. If these clients had not screwed up in the first place, we'd be at Quorum 1, Replication 1 work units and the project would wrap up in an efficient, timely manner.

What gives? Why are WCG staff silent on this issue? I've asked time and time again, but it seems like IBM must be understaffing the team (relative to the workload, assuming they're drowning in work?), underpaying them, or there is a lack of motivation or drive. I've worked in companies where I was drowning in work and even working 80+ hours a week wasn't enough to keep up, since people kept quitting or getting fired.

I just wish I had any clue why nothing is being done to address the Windows 8.1 clients who keep screwing up -- what's the root cause that leads to the errors referenced above? And why are they still peddling BOINC 7.2.47, which is 4+ years old and insecure, when 7.8.3 for Windows has been out for months, and 7.9.x for other operating systems is out, and 7.10.x is right around the corner?

Look, I want to be excited to volunteer my money and electricity for a good cause, but it feels like nobody gives a darn and feels like nobody who actually works at IBM/WCG is actively addressing issues that are years old.
----------------------------------------
  • i3-8100 (Coffee Lake, 4C/4T) @ 3.6 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • E5800 (Wolfdale, 2C/2T) @ 3.2 GHz

[Apr 23, 2018 5:07:16 AM]
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 440
Re: Starting OET1 "OM" type workunits

I share your frustration hchc.
These threats to throughput are dispiriting.

Ensuring that the calculations are identical between versions is probably a difficult task. Hopefully when these older experiments drop off (OM tasks are holdovers from years ago), we'll only be running optimized projects.

Hopefully the preview pane on the project status page is getting tripped up by the non-linear progress we're making through these remaining batches.
[Apr 23, 2018 7:28:38 AM]
wolfman1360
Senior Cruncher
Canada
Joined: Jan 17, 2016
Post Count: 176
Re: Starting OET1 "OM" type workunits

hchc said:
Couple questions:

1. Does anyone know what causes the 32-bit Vina to run instead of 64-bit Vina when computing OET1 work units? I'm running nothing but 64-bit systems -- as most of us are -- and quite often many of the work units will launch 32-bit Vina. Why?

2. Last I checked, OET1 is a Quorum 1, Replication 1 project. Yet whenever a WCG volunteer screws up and their machine Errors out, it causes the work unit to change into a Quorum 2, Replication 2, which sets the whole project back unnecessarily and gives everyone twice the work. I started a thread in the Feedback subforum, entitled "Is there a way to disable absentee clie...nstantly error out?", but not a single WCG employee has bothered to comment on 1) root cause, or 2) what they're going to do to fix it.

Essentially, 100% of my Ebola work units now are repair work units caused by Windows 8.1 clients running BOINC 7.2.47, which is over 4 years old and obsolete.

The failures show one of two error messages, with this being the most prevalent:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
(0x522)
</message>
]]>

or less commonly:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: Can't get shared memory segment name: shmget() failed
</message>
]]>


Literally 100% of my Ebola work units are repair work units _2, _3, _4, etc., and 99% of the time, it's because a Windows 8.1 client running BOINC 7.2.47 has dropped the ball, literally duplicating the amount of work the project has left to do. If these clients had not screwed up in the first place, we'd be at Quorum 1, Replication 1 work units and the project would wrap up in an efficient, timely manner.

What gives? Why are WCG staff silent on this issue? I've asked time and time again, but it seems like IBM must be understaffing the team (relative to the workload, assuming they're drowning in work?), underpaying them, or there is a lack of motivation or drive. I've worked in companies where I was drowning in work and even working 80+ hours a week wasn't enough to keep up, since people kept quitting or getting fired.

I just wish I had any clue why nothing is being done to address the Windows 8.1 clients who keep screwing up -- what's the root cause that leads to the errors referenced above? And why are they still peddling BOINC 7.2.47, which is 4+ years old and insecure, when 7.8.3 for Windows has been out for months, and 7.9.x for other operating systems is out, and 7.10.x is right around the corner?

Look, I want to be excited to volunteer my money and electricity for a good cause, but it feels like nobody gives a darn and feels like nobody who actually works at IBM/WCG is actively addressing issues that are years old.


I am so happy that someone other than me has seen these exact things happen.
You would think that a task would be validated at the *beginning*, not the end.
I'm stuck with 6 of these on a machine that is struggling along with these as it is.

While this isn't FAAH2, BOINC is underestimating the time remaining and I'm getting a ton of tasks that I'm probably going to have to babysit.

I can only imagine how many of these are going to fail because people who don't really look at the boinc or WCG window just let it do its thing when their computer is powered on.
----------------------------------------
Crunching for the betterment of human kind and the canines who will always be our best friends.
AWOU!
[Apr 23, 2018 7:49:05 AM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Re: Starting OET1 "OM" type workunits

That's something I noticed too. Almost all repair jobs I get are caused by some Windows 8.1 machines with client 7.2.47 that fail with:
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
The number of such failures is significant. I believe the techs should have a look at this and stop issuing such work to those machines.
----------------------------------------

[Apr 23, 2018 9:33:01 AM]
Steve W
Advanced Cruncher
Joined: Dec 9, 2005
Post Count: 110
Re: Starting OET1 "OM" type workunits

...caused by some Windows 8.1 machines with client 7.2.47...

Those clients might not actually be running on Windows 8.1. The 7.2.47 client doesn't identify Windows 10 correctly and reports it as Windows 8.1. A newer version of the client would fix this, but the tech guys need time to check the client (known bugs, security, etc.) before swapping it in as the client to download from the WCG site. They probably have this on their "to do" list, but have other things that take priority.


CreateProcess() failed - A required privilege is not held by the client.

I think I may have caused one or two of those in the dim and distant past when an anti-virus program started incorrectly blocking one of the project applications. I constantly get repair jobs with this error from all the projects. I agree that something should be done about these clients, just a matter of what and how. At least it could re-issue the work unit without a reduced deadline, but again it is all extra work for the tech guys to squeeze into their workload. The vast majority of their work I imagine we never know about or only find out about when there are issues or a new project is started.
[Apr 23, 2018 10:50:27 AM]
hchc
Veteran Cruncher
USA
Joined: Aug 15, 2006
Post Count: 735
Re: Starting OET1 "OM" type workunits

Steve W. said:
A newer version of the client would fix this but the tech guys need to have time to check the client (known bugs, security...etc) before swapping it out as the client to download on the WCG site. They probably have this on their "to do" list, but have other things that take priority.

Newer BOINC versions fix bugs as well as security issues and include newer versions of OpenSSL, for example. I'm not so sure I'm buying the IBM "Security" angle and why it would take 4+ years. I noticed that WCG has a custom client: that adds unnecessary complexity. All WCG needs to do is use the official BOINC client instead of messing with it: that's what I use instead of the customized nonsense. I'm also not sure I'm buying the "priority" angle, as the BOINC client is user-facing, and that's pretty darn high priority. All they have to do is use the official client and either link to the Berkeley site for WCG users to download, or copy the exact same binaries (EXE/MSI, whatever) and host them on WCG. Very little work necessary.

And since the client is all open source, if there's a security audit necessary, the IBM team already have access to all the code.

Steve W. said:
I think I may have caused one or two of those in the dim and distant past when an anti-virus program started incorrectly blocking one of the project applications. I constantly get repair jobs with this error from all the projects. I agree that something should be done about these clients, just a matter of what and how. At least it could re-issue the work unit without a reduced deadline, but again it is all extra work for the tech guys to squeeze into their workload. The vast majority of their work I imagine we never know about or only find out about when there are issues or a new project is started.

My beef is that whenever these errors happen, it makes the work unit Quorum 2 instead of 1, so it duplicates the amount of work for the project. And repair work units, instead of having 10 day deadlines, are cut down to like 3.5 days or less, quickly snowballing into _3, _4, and _5 work units as more clients either "Error" out or "No Reply."
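To make the snowballing concrete, here is a toy model of the behavior described in this thread. It is not WCG's scheduler logic: the rule that the first failed result raises the quorum from 1 to 2 is taken from the posts here, and the function and outcome names are invented for illustration.

```python
def tasks_until_quorum(outcomes, initial_quorum=1, repair_quorum=2):
    """Count how many tasks are sent before the work unit is complete.

    outcomes: the result of each task in the order it comes back,
        one of "valid", "error", or "no_reply".
    Assumption from the thread: the work unit starts at quorum 1, and
    the first failure switches it to quorum 2 for the rest of its life.
    Returns the number of tasks sent, or None if the listed outcomes
    are not enough to reach the quorum.
    """
    quorum = initial_quorum
    valid = 0
    for sent, outcome in enumerate(outcomes, start=1):
        if outcome == "valid":
            valid += 1
        else:
            quorum = repair_quorum  # a failure triggers repair mode
        if valid >= quorum:
            return sent
    return None


# No failures: one task finishes the work unit.
print(tasks_until_quorum(["valid"]))
# One failing client: the _0 task errors, then two valid results are needed.
print(tasks_until_quorum(["error", "valid", "valid"]))
```

Under this model a single failing client turns a one-task work unit into a three-task one, and each further "Error" or "No Reply" adds another task, which matches the _3, _4, and _5 suffixes mentioned above.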

Right now my main Windows 10 box has 100% Ebola repair units, and they're all due to client version 7.2.47, mostly on Windows 8.1 but some Windows 7.
----------------------------------------
  • i3-8100 (Coffee Lake, 4C/4T) @ 3.6 GHz
  • i5-4590 (Haswell, 4C/4T) @ 3.3 GHz
  • E5800 (Wolfdale, 2C/2T) @ 3.2 GHz

----------------------------------------
[Edit 2 times, last edit by hchc at Apr 23, 2018 11:19:20 AM]
[Apr 23, 2018 11:12:26 AM]