Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
The Bad Type A WUs Thread

This looks like a bad Type A WU: ts05_b001_ps0000
Details:
First 3 copies encountered Error 29 (0x1d) and quit after many hours.
Copy _3 is still "In Progress". [Edit #3]: Copy _3 has run to completion and is PV (!)
I suspended my copy after 6h07m at 27.1% Progress. This would have prevented a new copy being despatched until the techs intervened, or until it timed out on 21 Apr.
[Edit #1]: I aborted the WU. uplinger (Changes to distribution of error work units) said that they are changing the number of errors after which no more copies of a WU will be sent, from 5 to 3. This one has had 3 errors so far, so I've tested the changes.
[Edit #2]: Copy _5 has now been sent out to some poor unsuspecting cruncher. The decreased max error count isn't working for WUs that were started before uplinger's changes. [end edits]
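One possible reading of what Edit #2 describes, as a minimal Python sketch. This is a guess at the behaviour, not the actual WCG server logic: it assumes each WU keeps the error limit it was created with, so lowering the default only affects WUs created afterwards. `WorkUnit`, `error_limit` and `may_send_another_copy` are hypothetical names, and the `<` comparison is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass
class WorkUnit:
    name: str
    error_limit: int      # assumption: fixed when the WU is created
    error_count: int = 0  # copies returned with an error so far


def may_send_another_copy(wu: WorkUnit) -> bool:
    """Hypothetical rule: keep sending copies while errors are under the limit."""
    return wu.error_count < wu.error_limit


# A WU created before the change keeps the old limit of 5.
old_wu = WorkUnit("ts05_b001_ps0000", error_limit=5, error_count=3)
# A WU created after the change gets the new limit of 3 (made-up name).
new_wu = WorkUnit("example_wu_created_later", error_limit=3, error_count=3)

print(may_send_another_copy(old_wu))
print(may_send_another_copy(new_wu))
```

Under that assumption, an older WU with 3 errors is still under its limit of 5 and gets another copy, while a WU created after the change would stop at 3.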
The result logs from the 3 copies that completed seem normal up to the point where the exit happened, except for the error message in the header. For example, here are the head and tail of the log for copy _0:
------
Result Name: ts05_ b001_ ps0000_ 0--
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
The system cannot write to the specified device. (0x1d) - exit code 29 (0x1d)
</message>
<stderr_txt>
.416000
wcgStepsDone = 4100 wcgSteps1 = 5000 wcgCyclesDone = 20 wcgCycles = 50 pctComplete = 0.416400
... <omitted section here> ...
wcgStepsDone = 2000 wcgSteps1 = 5000 wcgCyclesDone = 34 wcgCycles = 50 pctComplete = 0.688000
Encountered error. Exiting.
</stderr_txt>
]]>
---------
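For anyone who wants to sanity-check the progress figures in a log like the one above, here is a minimal Python sketch. It pulls the progress lines out of the stderr text and recomputes `pctComplete` from the counters. The formula (cycles done plus the fraction of the current cycle, divided by total cycles) is only inferred from the two lines quoted above, which it reproduces; it is not taken from the application source. The names `parse_progress` and `recomputed_pct` are made up for this example.

```python
import re

# Matches progress lines such as:
# wcgStepsDone = 2000 wcgSteps1 = 5000 wcgCyclesDone = 34 wcgCycles = 50 pctComplete = 0.688000
PROGRESS_RE = re.compile(
    r"wcgStepsDone = (\d+) wcgSteps1 = (\d+) "
    r"wcgCyclesDone = (\d+) wcgCycles = (\d+) "
    r"pctComplete = (\d*\.\d+)"
)


def parse_progress(stderr_text):
    """Return one dict per progress line found, in log order."""
    rows = []
    for m in PROGRESS_RE.finditer(stderr_text):
        steps_done, steps_total, cycles_done, cycles_total = map(int, m.groups()[:4])
        rows.append({
            "steps_done": steps_done,
            "steps_total": steps_total,
            "cycles_done": cycles_done,
            "cycles_total": cycles_total,
            "pct_complete": float(m.group(5)),
        })
    return rows


def recomputed_pct(row):
    # Assumption inferred from the quoted log lines, not from the application source.
    fraction_of_cycle = row["steps_done"] / row["steps_total"]
    return (row["cycles_done"] + fraction_of_cycle) / row["cycles_total"]


sample = """.416000
wcgStepsDone = 4100 wcgSteps1 = 5000 wcgCyclesDone = 20 wcgCycles = 50 pctComplete = 0.416400
wcgStepsDone = 2000 wcgSteps1 = 5000 wcgCyclesDone = 34 wcgCycles = 50 pctComplete = 0.688000
Encountered error. Exiting.
"""

rows = parse_progress(sample)
last = rows[-1]
print(last["pct_complete"], round(recomputed_pct(last), 6))
```

The last progress line before "Encountered error. Exiting." is the value worth comparing across copies of the same WU.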
Here are extracted details of copies _0, _1 and _2. Columns are:
Name | Extracts_from_result_log | CPU_Time | Claimed/Awarded
ts05_b001_ps0000_0 | exit code 29, wcgCycles = 50 pctComplete = 0.688000; Encountered error. Exiting. | 22.10 | 384.3 / 0.0
ts05_b001_ps0000_1 | exit code 29, wcgCycles = 50 pctComplete = 0.447600; Encountered error. Exiting. | 43.14 | 347.3 / 0.0
ts05_b001_ps0000_2 | exit code 29, wcgCycles = 50 pctComplete = 0.688000; Encountered error. Exiting. | 21.13 | 387.3 / 0.0
[OT] Wish this <adjective omitted> forum software could do tables or at least allowed table stops (aka tabs)!
----------------------------------------
[Edit 3 times, last edit by Rickjb at Apr 18, 2010 9:13:33 AM]
[Apr 14, 2010 7:05:31 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: The Bad Type A WUs Thread

[Edit #2]: Copy _5 has now been sent out to some poor unsuspecting cruncher. The decreased max error count isn't working for WUs that were started before uplinger's changes. [end edits]
[snip]
[OT] Wish this <adjective omitted> forum software could do tables or at least allowed table stops (aka tabs)!



To the former, see https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,28881#276494
All of those were sent out after Uplinger's post.

To the latter, use the 'Code' tags - that preserves tabs.

one tab
two tabs
three tabs
four tabs
two tabs
five tabs


Of course, you have to paste them in. But click and drag on them... they're still tabs. :)

Some board software makes it clear when code tags are used, offsetting it in a 'quote' type box or inset, and/or actually putting the word "Code" before it; mvnForum makes the use of 'code' tags practically seamless.
[Apr 17, 2010 2:04:46 PM]
I need a bath
Senior Cruncher
USA
Joined: Apr 12, 2007
Post Count: 347
Re: The Bad Type A WUs Thread

We have already established that these exit code 29 WUs are not "bad". The so-called error is actually a scientifically useful result. Furthermore, credit will be granted, so don't abort them.
[Apr 17, 2010 4:06:43 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Re: The Bad Type A WUs Thread

We have already established that these exit code 29 WUs are not "bad". The so-called error is actually a scientifically useful result.

I don't think this is always valid, if someone did something like what mweisensee has done here: https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,28891#276541

Edit: Removed part of the quote to eliminate confusion.
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 18, 2010 5:11:56 AM]
[Apr 18, 2010 5:09:37 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: The Bad Type A WUs Thread

What mweisensee did may not have been the genuine article [the tolerances for CPDN are designed to do that stuff]. I've had several cases where the resume was from zero percent but the run time was not lost; the result ended up showing nearly double the normal run time, with the credit cut in half, and was still valid.

At least the one set I had with 4 error returns and 1 server abort showed 3 with the same error at 0.2384 in the log and dramatically different run times. The 4th could have been a restart and had 2.5 times the run time of the other 3 when it went down at 0.5016.

Anyway, the announced drought for 3 weeks is not for nothing.

Good morning world.

edit: added progress value for 4th result.
----------------------------------------
WCG Global & Research > Make Proposal Help: Start Here!
Please help to make the Forums an enjoyable experience for All!
----------------------------------------
[Edit 2 times, last edit by Sekerob at Apr 18, 2010 10:48:38 AM]
[Apr 18, 2010 6:17:04 AM]
sk..
Master Cruncher
Joined: Mar 22, 2007
Post Count: 2324
Re: The Bad Type A WUs Thread

I suspect this project is taking up a disproportionately high amount of technical & scientific time by the WCG team.
3 weeks to think it over perhaps? It is now defined as an Intermittent project; no more semi status. So perhaps it has already been decided that it will not be allowed to interfere with other existing and planned projects?
[Apr 18, 2010 10:46:23 AM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: The Bad Type A WUs Thread

I think the thinking is over at UTMB... anyway, we got your message long ago, and extensively. You're not forced to contribute time towards this science, and the intermittent status was stated up front, even with the active label on it... maybe because the scientists need to work on the output? And if all this effort results in 1 or 2 strongly promising compounds, then the mission will have been a resounding success no matter how we got there.

Now, hold on to your boots... science takes time.
----------------------------------------
[Edit 1 times, last edit by Sekerob at Apr 18, 2010 10:56:00 AM]
[Apr 18, 2010 10:54:51 AM]
ThreadRipper
Veteran Cruncher
Sweden
Joined: Apr 26, 2007
Post Count: 1321
Re: The Bad Type A WUs Thread

I don't know if the WU is bad or what happened, but the WU ts05_b179_ps0000 finished after about 30 hours, yet the results page says "Too late". I got it on the 14th of April and it said the deadline was the 26th. Today is the 22nd.

Very strange. Also when I look at the quorum every other copy of this one was marked with "Error"... maybe someone should have a look at it?
----------------------------------------

Join The International Team: https://www.worldcommunitygrid.org/team/viewTeamInfo.do?teamId=CK9RP1BKX1

AMD TR2990WX @ PBO, 64GB Quad 3200MHz 14-17-17-17-1T, RX6900XT @ Stock
AMD 3800X @ PBO
AMD 2700X @ 4GHz
[Apr 22, 2010 5:01:42 PM]
boulmontjj
Senior Cruncher
France
Joined: Nov 17, 2004
Post Count: 317
Re: The Bad Type A WUs Thread

I don't know if the WU is bad or what happened, but the WU ts05_b179_ps0000 finished after about 30 hours, yet the results page says "Too late". I got it on the 14th of April and it said the deadline was the 26th. Today is the 22nd.

Very strange. Also when I look at the quorum every other copy of this one was marked with "Error"... maybe someone should have a look at it?


Same for mine.
It finished OK and was returned before the deadline, but it has been marked "Too late". And all the other copies finished in error.
----------------------------------------

Join us and visit the France team's site here: http://www.grid-france.fr
[Apr 22, 2010 7:06:33 PM]
Sekerob
Ace Cruncher
Joined: Jul 24, 2005
Post Count: 20043
Re: The Bad Type A WUs Thread

From past observation, if the distribution is stopped [for reasons such as, here, too many errors], results returned later that end normally get marked Too Late [maybe lost in froglation http://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,27775 ]. Yes, they will convert to being credited for all the parts, the same as a regular unit. See the "trop tard" (Too Late) result that boulmontjj posted in another thread:
Result Name | App Version Number | Status | Sent Time | Time Due / Return Time | CPU Time (hours) | Claimed/Granted BOINC Credit
ts05_ b039_ ps0000_ 4-- | 612 | Error | 18/04/10 22:21:11 | 22/04/10 00:56:35 | 25.44 | 417.7 / 417.7
ts05_ b039_ ps0000_ 3-- | 612 | Error | 17/04/10 19:04:35 | 18/04/10 22:21:09 | 16.73 | 282.0 / 282.0
ts05_ b039_ ps0000_ 2-- | 612 | Too Late | 15/04/10 18:09:02 | 21/04/10 14:33:32 | 49.72 | 757.2 / 757.2 < This one
ts05_ b039_ ps0000_ 1-- | 612 | Error | 14/04/10 20:14:58 | 15/04/10 18:08:53 | 10.56 | 183.5 / 183.5
ts05_ b039_ ps0000_ 0-- | 612 | Error | 14/04/10 20:14:56 | 17/04/10 18:41:13 | 22.73 | 388.3 / 388.3

----------------------------------------
[Edit 1 times, last edit by Sekerob at Apr 22, 2010 7:21:27 PM]
[Apr 22, 2010 7:20:53 PM]