World Community Grid Forums
Thread Status: Active Total posts in this thread: 7
Ardruin
Cruncher Germany Joined: Dec 3, 2007 Post Count: 21 Status: Offline
Hi,

I have a problem with a Type A work unit. After 45 hours and 20 minutes, the progress stands at 2.84%. My computer is an Intel Q9450 @ 3.1 GHz with 4 GB DDR2 RAM and Windows Vista. All other Type A work units are running normally. What should I do? At the same computing speed, I calculate about 65 days to complete the WU. The deadline is 10 days.

WU details:
Project Name: Discovering Dengue Drugs - Together - Phase 2 (Type A)
Created: 05.04.10
Name: erlc_b099_ps0000
Minimum Quorum: 2
Replication: 2

gb009761: Hi Markus, have you tried shutting down/restarting BOINC? Also, what is your wingman reporting?
-> I haven't tried restarting yet. After the next checkpoint (4%) I will try it. All other WUs (HCC, HFCC) are running normally.
-> My wingman's unit is also still in progress.

Regards,
Markus
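The runtime estimate in the post above can be reproduced with a simple linear extrapolation. The following is only a sketch: it assumes progress keeps advancing at the rate seen so far, which a non-converging work unit may well not do, and the function name `estimate_runtime` is made up for illustration (it is not part of BOINC).

```python
def estimate_runtime(elapsed_hours, progress_fraction):
    """Linearly extrapolate total and remaining runtime, in hours.

    Assumes progress keeps the same average rate as so far
    (an assumption, not something BOINC guarantees).
    """
    if not 0 < progress_fraction <= 1:
        raise ValueError("progress_fraction must be in (0, 1]")
    total_hours = elapsed_hours / progress_fraction
    remaining_hours = total_hours - elapsed_hours
    return total_hours, remaining_hours


# Figures from the post: 45 h 20 min elapsed, 2.84% progress.
elapsed = 45 + 20 / 60
total, remaining = estimate_runtime(elapsed, 0.0284)

print(f"Estimated total runtime: {total / 24:.1f} days")
print(f"Estimated time remaining: {remaining / 24:.1f} days")
```

With these figures the remaining time comes out at roughly 65 days, far beyond the 10-day deadline, which matches the estimate given in the post.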
||
|
JmBoullier
Former Community Advisor Normandy - France Joined: Jan 26, 2007 Post Count: 3715 Status: Offline
Markus,

This WU is definitely not running as it should. In case uplinger or another tech wants to check anything, I suggest that you:

1. Make sure you have enough work in the queue to last until the next time you can monitor this device. If necessary, increase your cache size as appropriate and wait for the WUs to be downloaded before going to the next step.
2. Suspend this Type A WU until you get new instructions or you need to refill the cache (there is no work fetch while there are any suspended tasks).
3. If you need to refill the cache before you get feedback from the techs, resume the suspended task while WUs are downloaded, then suspend it again.

If you don't like this suggestion, you can of course decide to kill this WU, because it will most probably never reach a happy end.

Sorry for the trouble.
Jean.
||
|
Ardruin
Cruncher Germany Joined: Dec 3, 2007 Post Count: 21 Status: Offline
Ok, thx JmBoullier.

Progress is now at 4.00%, and I have stopped computing the WU until the WCG techs give new instructions. It's no trouble for me :)

Bye,
Markus
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
Markus,

I am sorry for my slow response. From my initial look at what you have described, it appears that the work unit is not converging as it should. We encountered one of these in BETA, but we are not able to prevent the very few that squeak through. We are going to set the status of the work unit to unloaded, and you should get a server abort for that work unit.

When I say very few, this is the 2nd one we have seen other than the beta work unit referred to above.

Thanks, Jean, for bringing this to my attention.

Thanks,
-Uplinger
||
|
rembertw
Senior Cruncher Belgium Joined: Nov 21, 2005 Post Count: 275 Status: Offline
For what it's worth, this work unit is also problematic:

ts01_a293_pe0000

It has errored on 5 different boxes now (including one of mine), and replications 6 and 7 are now out somewhere.
||
|
uplinger
Former World Community Grid Tech Joined: May 23, 2005 Post Count: 3952 Status: Offline
rembertw,

If you post the stderr from the results status page, you may see why it has errored out. I believe what you'll see is either a memory access error or an "ENERGY TOLERANCE" error. Both mean the same thing, and it is a positive negative for the researchers.

-Uplinger
||
|
Ardruin
Cruncher Germany Joined: Dec 3, 2007 Post Count: 21 Status: Offline
Thx for the help, uplinger!

The WU is now server aborted :) Great support here!

Bye,
Markus
||