Thread Status: Active
Total posts in this thread: 7
This topic has been viewed 1811 times and has 6 replies
Ardruin
Cruncher
Germany
Joined: Dec 3, 2007
Post Count: 21
DDDT2-TypA // erlc_b099_ps0000 - possible bad work unit

Hi,

I have a problem with a Type A work unit.
After 45 hours and 20 minutes the progress stands at 2.84%.
My computer is an Intel Q9450 @ 3.1 GHz with 4 GB DDR2 RAM and Windows Vista. All other Type A work units are running normally.

What should I do? I calculate (at the same computing speed) ~65 more days to complete the WU. The deadline is 10 days.
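
As a rough check of that estimate, here is a minimal sketch of the arithmetic in Python. It assumes progress stays linear in time, which a work unit that is not behaving normally may well not do, and the function name is only illustrative.

```python
def estimate_total_hours(elapsed_hours: float, progress_percent: float) -> float:
    """Linearly extrapolate the total run time from elapsed time and progress.

    Assumption: progress grows at a constant rate, so this is only a
    back-of-the-envelope figure.
    """
    if progress_percent <= 0:
        raise ValueError("progress must be greater than 0% to extrapolate")
    return elapsed_hours * 100.0 / progress_percent


# Figures reported in the post: 45 h 20 min elapsed, 2.84% done.
elapsed_hours = 45 + 20 / 60
total_hours = estimate_total_hours(elapsed_hours, 2.84)

total_days = total_hours / 24
remaining_days = (total_hours - elapsed_hours) / 24

print(f"total: {total_days:.1f} days, remaining: {remaining_days:.1f} days")
```

Under that assumption the total comes to a little over 66 days and the remaining time to a little under 65 days, far beyond the 10-day deadline either way.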

WU-Details:
Project Name: Discovering Dengue Drugs - Together - Phase 2 (Type A)
Created: 05.04.10
Name: erlc_b099_ps0000
Minimum Quorum: 2
Replication: 2

gb009761:

Hi Markus, have you tried shutting down/restarting BOINC? Also, what is your wingman reporting?


-> I haven't tried restarting. After the next checkpoint (4%) I will try it. All other WUs (HCC, HFCC) are running normally.
-> My wingman's unit is also still in progress.



Regards.
Markus
----------------------------------------
[Apr 7, 2010 5:43:24 PM]
JmBoullier
Former Community Advisor
Normandy - France
Joined: Jan 26, 2007
Post Count: 3715
Re: DDDT2-TypA // erlc_b099_ps0000 - possible bad work unit

Markus,
This WU is definitely not running as it should.
In case uplinger or another tech wants to check anything, I suggest that you:
1. Make sure you have enough work in queue until the next time you can monitor this device; if necessary, increase your cache size as appropriate and wait for WUs to be downloaded before going to the next step.
2. Suspend this type A WU until you get new instructions or you need to refill the cache (there is no work fetch while there are any suspended tasks).
3. If you need to refill the cache before you get feedback from the techs, resume the suspended task while WUs are downloaded, then suspend it again.
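
For anyone who prefers the command line to the BOINC Manager, the suspend and resume steps can be sketched with `boinccmd --task <project URL> <task name> <operation>`. This is only a sketch under assumptions: the project URL and the task name in the usage comment are placeholders (the real task name is the one shown in your own task list, which is not given in this thread), and `boinccmd` is assumed to be on the PATH.

```python
import subprocess

# Operations this sketch passes to `boinccmd --task`.
ALLOWED_OPS = {"suspend", "resume", "abort"}


def boinccmd_task_args(project_url: str, task_name: str, op: str) -> list[str]:
    """Build the argument list for `boinccmd --task URL task_name op`."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported task operation: {op!r}")
    return ["boinccmd", "--task", project_url, task_name, op]


def run_task_op(project_url: str, task_name: str, op: str) -> None:
    """Run the command; raises CalledProcessError if boinccmd reports failure."""
    subprocess.run(boinccmd_task_args(project_url, task_name, op), check=True)


# Hypothetical usage; both values are placeholders, not taken from the thread:
#   run_task_op("<project URL>", "<task name>", "suspend")   # step 2
#   run_task_op("<project URL>", "<task name>", "resume")    # step 3, while refilling
```

Building the argument list separately from running it makes it easy to print and check the command before anything is actually sent to the client.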

If you don't like this suggestion you can obviously decide to kill this WU, because most likely it will never reach a happy end. :(

Sorry for the trouble.
Jean.
----------------------------------------
[Apr 7, 2010 6:29:07 PM]
Ardruin
Cruncher
Germany
Joined: Dec 3, 2007
Post Count: 21
Re: DDDT2-TypA // erlc_b099_ps0000 - possible bad work unit

OK, thanks JmBoullier.
Progress is now at 4.00%, and I have stopped computing the WU until the WCG techs give new instructions.

It's no trouble for me :)

Bye
Markus
----------------------------------------
[Apr 8, 2010 3:40:01 PM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: DDDT2-TypA // erlc_b099_ps0000 - possible bad work unit

Markus,

I am sorry for my slow response. From my initial look at what you have described, it appears that the work unit is not converging as it should. We encountered one of these in BETA but are not able to prevent the very few that squeak through. We are going to set the status of the work unit to unloaded, and you should get a server abort for that work unit.

When I say very few, this is the 2nd one we have seen other than the beta work unit referred to above.

Thanks Jean for bringing this to my attention.

Thanks,
-Uplinger
[Apr 8, 2010 9:17:04 PM]
rembertw
Senior Cruncher
Belgium
Joined: Nov 21, 2005
Post Count: 275
Re: DDDT2-TypA // erlc_b099_ps0000 - possible bad work unit

For what it's worth, this work unit is also problematic:

ts01_a293_pe0000

It has errored on 5 different boxes now (including one of mine) and is now at replication 6 and 7 somewhere.
[Apr 10, 2010 7:04:48 AM]
uplinger
Former World Community Grid Tech
Joined: May 23, 2005
Post Count: 3952
Re: DDDT2-TypA // erlc_b099_ps0000 - possible bad work unit

rembertw,

If you post the stderr from the results status page, you may see why it has errored out. I believe what you'll see is either a memory access error or an "ENERGY TOLERANCE" error. Both are the same thing and are a positive negative for the researchers.

-Uplinger
[Apr 10, 2010 3:09:04 PM]
Ardruin
Cruncher
Germany
Joined: Dec 3, 2007
Post Count: 21
Re: DDDT2-TypA // erlc_b099_ps0000 - possible bad work unit

Thanks for the help, uplinger!
The WU has now been server aborted :)

Great support here!


Bye
Markus
----------------------------------------
[Apr 10, 2010 3:24:27 PM]