World Community Grid Forums
Category: Completed Research | Forum: Outsmart Ebola Together | Thread: Starting OET1 "OM" type workunits
Thread Status: Active | Total posts in this thread: 126
Sgt.Joe
Ace Cruncher | USA | Joined: Jul 4, 2006 | Post Count: 7219
I thought I was done with these 39.4-point work units, but I see I got another rash of them. This is on a Dell PowerEdge 1950 with two Xeon L5420s. There seems to be no correlation between the run time and the low point scores. Maybe the techs could figure it out. I am not that concerned with the points, so I am going to continue to run these anyway.
Result name                        Device          Status              Sent              Returned          CPU / elapsed (h)  Claimed / granted credit
OET1_0000643_xSDGP-OM_rig_77785_0  PowerEdge-1950  Valid               4/20/18 04:16:10  4/21/18 21:04:26  7.02 / 7.03        239.4 / 239.4
OET1_0000502_xEBGP-OM_rig_30058_0  PowerEdge-1950  Valid               4/20/18 04:16:10  4/21/18 21:04:26  6.47 / 6.47        220.3 / 220.3
OET1_0000618_xEBGP-OM_rig_51727_0  PowerEdge-1950  Valid               4/20/18 04:16:10  4/21/18 21:04:26  7.45 / 7.46        254.0 / 254.0
OET1_0000618_xEBGP-OM_rig_51694_0  PowerEdge-1950  Valid               4/20/18 04:16:10  4/21/18 21:04:26  8.25 / 8.27        281.6 / 281.6
OET1_0000502_xEBGP-OM_rig_30012_0  PowerEdge-1950  Valid               4/20/18 04:16:10  4/21/18 21:04:26  6.06 / 6.07        206.7 / 206.7
OET1_0000618_xEBGP-OM_rig_90076_0  PowerEdge-1950  Valid               4/20/18 18:38:37  4/21/18 21:04:26  8.36 / 8.37        285.0 / 285.0
OET1_0000526_xZAGP-OM_rig_96393_0  PowerEdge-1950  Valid               4/20/18 18:38:37  4/21/18 21:04:26  6.78 / 6.79        231.1 / 231.1
OET1_0000618_xEBGP-OM_rig_89872_0  PowerEdge-1950  Valid               4/20/18 18:38:37  4/21/18 21:04:26  7.29 / 7.29        248.2 / 248.2
OET1_0000618_xEBGP-OM_rig_89893_0  PowerEdge-1950  Valid               4/20/18 18:38:37  4/21/18 21:04:26  6.92 / 6.94        236.1 / 236.1
OET1_0000617_xEBGP-OM_rig_15997_0  PowerEdge-1950  Valid               4/20/18 18:38:37  4/21/18 21:04:26  5.85 / 5.86        199.6 / 199.6
OET1_0000617_xEBGP-OM_rig_7286_1   PowerEdge-1950  Pending Validation  4/20/18 15:39:22  4/21/18 21:04:26  8.21 / 8.22        39.4 / 0.0
OET1_0000617_xEBGP-OM_rig_7466_1   PowerEdge-1950  Pending Validation  4/20/18 15:39:22  4/21/18 21:04:26  7.88 / 7.90        39.4 / 0.0
OET1_0000502_xEBGP-OM_rig_30670_2  PowerEdge-1950  Pending Validation  4/20/18 15:39:22  4/21/18 21:04:26  8.68 / 8.70        39.4 / 0.0
OET1_0000620_xEBGP-OM_rig_93523_2  PowerEdge-1950  Pending Validation  4/20/18 15:39:22  4/21/18 21:04:26  10.43 / 10.46      242.4 / 0.0
OET1_0000502_xEBGP-OM_rig_30702_1  PowerEdge-1950  Pending Validation  4/20/18 15:39:22  4/21/18 21:04:26  7.06 / 7.07        39.4 / 0.0
OET1_0000617_xEBGP-OM_rig_7405_2   PowerEdge-1950  Pending Validation  4/20/18 15:39:22  4/21/18 21:04:26  7.38 / 7.40        39.4 / 0.0
OET1_0000526_xZAGP-OM_rig_25484_3  PowerEdge-1950  Valid               4/20/18 15:39:22  4/21/18 21:04:26  7.36 / 7.38        39.4 / 162.2
Cheers
Sgt. Joe
*Minnesota Crunchers*
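As a rough check on the observation that run time and the low scores don't line up, here is a minimal Python sketch that divides each result's claimed credit by its elapsed hours (values transcribed from Sgt.Joe's results list) and flags rows claiming less than half the median rate. The half-median threshold and the helper names are illustrative assumptions, not anything WCG uses.

```python
from statistics import median

# (result name, status, elapsed hours, claimed credit, granted credit),
# transcribed from the PowerEdge-1950 results listed in the post.
RESULTS = [
    ("OET1_0000643_xSDGP-OM_rig_77785_0", "Valid", 7.03, 239.4, 239.4),
    ("OET1_0000502_xEBGP-OM_rig_30058_0", "Valid", 6.47, 220.3, 220.3),
    ("OET1_0000618_xEBGP-OM_rig_51727_0", "Valid", 7.46, 254.0, 254.0),
    ("OET1_0000618_xEBGP-OM_rig_51694_0", "Valid", 8.27, 281.6, 281.6),
    ("OET1_0000502_xEBGP-OM_rig_30012_0", "Valid", 6.07, 206.7, 206.7),
    ("OET1_0000618_xEBGP-OM_rig_90076_0", "Valid", 8.37, 285.0, 285.0),
    ("OET1_0000526_xZAGP-OM_rig_96393_0", "Valid", 6.79, 231.1, 231.1),
    ("OET1_0000618_xEBGP-OM_rig_89872_0", "Valid", 7.29, 248.2, 248.2),
    ("OET1_0000618_xEBGP-OM_rig_89893_0", "Valid", 6.94, 236.1, 236.1),
    ("OET1_0000617_xEBGP-OM_rig_15997_0", "Valid", 5.86, 199.6, 199.6),
    ("OET1_0000617_xEBGP-OM_rig_7286_1", "Pending Validation", 8.22, 39.4, 0.0),
    ("OET1_0000617_xEBGP-OM_rig_7466_1", "Pending Validation", 7.90, 39.4, 0.0),
    ("OET1_0000502_xEBGP-OM_rig_30670_2", "Pending Validation", 8.70, 39.4, 0.0),
    ("OET1_0000620_xEBGP-OM_rig_93523_2", "Pending Validation", 10.46, 242.4, 0.0),
    ("OET1_0000502_xEBGP-OM_rig_30702_1", "Pending Validation", 7.07, 39.4, 0.0),
    ("OET1_0000617_xEBGP-OM_rig_7405_2", "Pending Validation", 7.40, 39.4, 0.0),
    ("OET1_0000526_xZAGP-OM_rig_25484_3", "Valid", 7.38, 39.4, 162.2),
]


def claimed_per_hour(row):
    """Claimed credit divided by elapsed (wall-clock) hours."""
    _name, _status, elapsed_h, claimed, _granted = row
    return claimed / elapsed_h


def flag_low_claims(results, fraction=0.5):
    """Names whose claimed credit per hour is below `fraction` of the median rate.

    The 0.5 default is an arbitrary illustrative cutoff.
    """
    cutoff = fraction * median(claimed_per_hour(r) for r in results)
    return [r[0] for r in results if claimed_per_hour(r) < cutoff]


if __name__ == "__main__":
    for row in RESULTS:
        print(f"{row[0]:<35} {row[2]:>6.2f} h  {claimed_per_hour(row):6.2f} credit/h")
    print("Low claims:", flag_low_claims(RESULTS))
```

With these rows, the six 39.4 claims work out to roughly 4.5 to 5.6 credit per elapsed hour against about 34 for the rest, so those six are flagged; the 242.4 claim on the 10.46-hour result (about 23 per hour) falls between the two groups and is not.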
KerSamson
Master Cruncher | Switzerland | Joined: Jan 29, 2007 | Post Count: 1664
Since I am currently running only OET1, I only look at the average granted credit per day and per week. Right now I don't have time for a more detailed analysis.
My earlier feedback about the 10% differences is based on the average over 6 hosts (5 Linux, 1 Win7 Pro x64): 4 AMD (still no Ryzen) and 2 Intel. Maybe some "inadequate" credits turn up from time to time? ... Cheers, Yves
Procyon Lotor One
Cruncher | Joined: Jun 28, 2008 | Post Count: 14
There is 37% of the project remaining, so I think you will be okay in terms of run time. How many days of run time do you average per calendar day? I get about 14 to 15 per calendar day with my three hosts.
Speedy51
Veteran Cruncher | New Zealand | Joined: Nov 4, 2005 | Post Count: 1220
Procyon Lotor One said:
There is a 37% of the project remaining so I think you will be okay in terms of run-time. How many days do you average per calendar day? I get about 14 to 15 per calendar day with my three host.
At 14.9 days per calendar day, it will take you about 24.5 calendar days to accumulate one year of run time: 365 / 14.9 ≈ 24.5. How many months will it take you to reach your goal?
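Speedy51's arithmetic generalizes to any goal: divide the run-time target by the run-time days earned per calendar day. A minimal Python sketch, assuming a 365-day year and an average month of 365/12 days (both conventions chosen here for illustration; the function names are hypothetical):

```python
DAYS_PER_YEAR = 365                   # assumption: leap years ignored
DAYS_PER_MONTH = DAYS_PER_YEAR / 12   # assumption: average month length


def calendar_days_to_goal(goal_years, runtime_days_per_calendar_day):
    """Calendar days needed to accumulate `goal_years` of run time."""
    if runtime_days_per_calendar_day <= 0:
        raise ValueError("run-time rate must be positive")
    return goal_years * DAYS_PER_YEAR / runtime_days_per_calendar_day


def calendar_months_to_goal(goal_years, runtime_days_per_calendar_day):
    """Same as calendar_days_to_goal, expressed in average-length months."""
    return calendar_days_to_goal(goal_years, runtime_days_per_calendar_day) / DAYS_PER_MONTH


if __name__ == "__main__":
    # One year of run time at 14.9 run-time days per calendar day.
    print(round(calendar_days_to_goal(1, 14.9), 2))
    print(round(calendar_months_to_goal(1, 14.9), 2))
```

For a goal of several years, pass that number as `goal_years`; under these conventions the month figure reduces to 12 times the goal in years divided by the rate.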
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 735
Couple questions:
1. Does anyone know what causes the 32-bit Vina to run instead of the 64-bit Vina when computing OET1 work units? I'm running nothing but 64-bit systems (as most of us are), and quite often many of the work units launch the 32-bit Vina. Why?
2. Last I checked, OET1 is a Quorum 1, Replication 1 project. Yet whenever a WCG volunteer screws up and their machine errors out, the work unit changes to Quorum 2, Replication 2, which sets the whole project back unnecessarily and gives everyone twice the work. I started a thread in the Feedback subforum, entitled "Is there a way to disable absentee clie...nstantly error out?", but not a single WCG employee has bothered to comment on 1) the root cause or 2) what they're going to do to fix it.
Essentially, 100% of my Ebola work units now are repair work units caused by Windows 8.1 clients running BOINC 7.2.47, which is over 4 years old and obsolete. There is one of two error messages, the most prevalent being:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: CreateProcess() failed - A required privilege is not held by the client. (0x522)
</message>
]]>
or, less commonly:
<core_client_version>7.2.47</core_client_version>
<![CDATA[
<message>
couldn't start app: Can't get shared memory segment name: shmget() failed
</message>
]]>
Literally 100% of my Ebola work units are repair work units (_2, _3, _4, etc.), and 99% of the time it's because a Windows 8.1 client running BOINC 7.2.47 has dropped the ball, literally doubling the amount of work the project has left to do. If these clients had not screwed up in the first place, we'd still be on Quorum 1, Replication 1 work units and the project would wrap up in an efficient, timely manner. What gives? Why are WCG staff silent on this issue?
I've asked time and time again, but it seems like either IBM is understaffing the team (relative to the workload, assuming they're drowning in work?) or underpaying them, or there is a lack of motivation or drive. I've worked at companies where I was drowning in work and even working 80+ hours a week wasn't enough to keep up, since people kept quitting or getting fired. I just wish I had some clue why nothing is being done about the Windows 8.1 clients that keep screwing up. What's the root cause of the errors above? And why are they still peddling BOINC 7.2.47, which is 4+ years old and insecure, when 7.8.3 for Windows has been out for months, 7.9.x for other operating systems is out, and 7.10.x is right around the corner? Look, I want to be excited to volunteer my money and electricity for a good cause, but it feels like nobody gives a darn and nobody who actually works at IBM/WCG is actively addressing issues that are years old.
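To put a number on the "twice the work" point, here is a toy model in Python. It assumes, following the description in the post, that every OET1 work unit starts at quorum 1, that a task failing at startup (like the CreateProcess() and shmget() errors) does no useful work, and that the work unit is then re-issued at quorum 2 with every copy succeeding. The function names and the `p_fail` parameter are hypothetical, and this is not WCG's actual scheduler logic.

```python
def expected_results_per_workunit(p_fail, quorum_after_error=2):
    """Expected number of *computed* (useful) results per work unit.

    Toy model, assumptions only:
    - each work unit starts at quorum 1 with a single task;
    - with probability p_fail that task fails at startup and computes nothing;
    - the work unit is then re-issued at `quorum_after_error`, and all copies succeed.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return (1.0 - p_fail) * 1 + p_fail * quorum_after_error


def tasks_issued_per_workunit(p_fail, quorum_after_error=2):
    """Expected number of tasks sent out, counting the failed original."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    return 1 + p_fail * quorum_after_error


if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 1.0):
        print(f"p_fail={p:.1f}  computed={expected_results_per_workunit(p):.2f}  "
              f"issued={tasks_issued_per_workunit(p):.2f}")
```

Under these assumptions an affected work unit costs twice the computation of a clean one, so the overall load grows as 1 + p_fail. Repair copies that themselves fail or miss their deadline would push it higher still; the model ignores them.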
Dayle Diamond
Senior Cruncher | Joined: Jan 31, 2013 | Post Count: 440
I share your frustration, hchc.
These threats to throughput are dispiriting. Ensuring that the calculations are identical between versions is probably a difficult task. Hopefully when these older experiments drop off (OM tasks are holdovers from years ago), we'll only be running optimized projects. Hopefully the preview pane on the project status page is getting tripped up by the non-linear progress we're making through these remaining batches.
wolfman1360
Senior Cruncher | Canada | Joined: Jan 17, 2016 | Post Count: 176
hchc said:
... Literally 100% of my Ebola work units are repair work units _2, _3, _4, etc., and 99% of the time, it's because a Windows 8.1 client running BOINC 7.2.47 has dropped the ball ...
I am so happy that someone other than me has seen one of these exact things happen. You would think that a task would be validated at the *beginning*, not the end. I'm stuck with 6 of these on a machine that is already struggling along with them. While this isn't faah2, BOINC is underestimating the time remaining, and I'm getting a ton of tasks that I'm probably going to have to babysit. I can only imagine how many of these are going to fail because people who don't really look at the BOINC or WCG window just let it do its thing whenever their computer is powered on.
Crunching for the betterment of humankind and the canines who will always be our best friends.
AWOU!
Mumak
Senior Cruncher | Joined: Dec 7, 2012 | Post Count: 477
That's something I noticed too. Almost all repair jobs I get are caused by some Windows 8.1 machines with client 7.2.47 that fail with:
couldn't start app: CreateProcess() failed - A required privilege is not held by the client.
The number of such failures is significant. I believe the techs should have a look at this and stop issuing such work to those machines.
Steve W
Advanced Cruncher | Joined: Dec 9, 2005 | Post Count: 110
Mumak said:
...caused by some Windows 8.1 machines with client 7.2.47...
Those clients might not actually be running Windows 8.1. The 7.2.47 client doesn't identify Windows 10 correctly and reports it as Windows 8.1. A newer version of the client would fix this, but the tech guys need time to check the client (known bugs, security, etc.) before swapping it in as the client to download on the WCG site. They probably have this on their "to do" list but have other things that take priority.
Mumak said:
CreateProcess() failed - A required privilege is not held by the client.
I think I may have caused one or two of those in the dim and distant past, when an anti-virus program started incorrectly blocking one of the project applications. I constantly get repair jobs with this error from all the projects. I agree that something should be done about these clients; it's just a matter of what and how. At the very least it could re-issue the work unit without a reduced deadline, but again, that is all extra work for the tech guys to squeeze into their workload. I imagine we never hear about the vast majority of their work, or only find out about it when there are issues or a new project is started.
hchc
Veteran Cruncher | USA | Joined: Aug 15, 2006 | Post Count: 735
Steve W. said:
A newer version of the client would fix this but the tech guys need to have time to check the client (known bugs, security...etc) before swapping it out as the client to download on the WCG site. They probably have this on their "to do" list, but have other things that take priority.
Newer BOINC versions fix bugs as well as security issues and include newer versions of OpenSSL, for example. I'm not so sure I'm buying the IBM "security" angle, or why it would take 4+ years. I noticed that WCG has a custom client, which adds unnecessary complexity. All WCG needs to do is use the official BOINC client instead of messing with it; that's what I use instead of the customized nonsense. I'm also not sure I'm buying the "priority" angle, since the BOINC client is user-facing, and that's pretty darn high priority. All they have to do is use the official client and either link to the Berkeley site for WCG users to download it, or copy the exact same binaries (EXE/MSI, whatever) and host them on WCG. Very little work is necessary. And since the client is all open source, if a security audit is necessary, the IBM team already has access to all the code.
Steve W. said:
I think I may have caused one or two of those in the dim and distant past when an anti-virus program started incorrectly blocking one of the project applications. I constantly get repair jobs with this error from all the projects. I agree that something should be done about these clients, just a matter of what and how. At least it could re-issue the work unit without a reduced deadline, but again it is all extra work for the tech guys to squeeze into their workload. The vast majority of their work I imagine we never know about or only find out about when there are issues or a new project is started.
My beef is that whenever these errors happen, the work unit becomes Quorum 2 instead of 1, so it doubles the amount of work for the project. And repair work units, instead of having 10-day deadlines, are cut down to 3.5 days or less, quickly snowballing into _3, _4, and _5 work units as more clients either "Error" out or "No Reply." Right now my main Windows 10 box has 100% Ebola repair units, and they're all due to client version 7.2.47, mostly on Windows 8.1 but some on Windows 7.
[Edit 2 times, last edit by hchc at Apr 23, 2018 11:19:20 AM]