World Community Grid Forums
Thread Status: Active Total posts in this thread: 98
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"I've aborted ALL unfinished/Unstarted beta tasks"
If you did that about 20 mins before you posted, a dual-core of mine picked two of them up. (If not, someone else aborted a bunch of them about that time.) They were short 00010 ones that completed fine. |
||
|
tmedve
Senior Cruncher USA Joined: Nov 16, 2004 Post Count: 182 Status: Offline |
I was running BOINC 7.2.28 when I had my problem of 4 BETAs stuck at 0.000% with 5 or 6 hrs of run time. I downloaded BOINC 7.2.47 and restarted everything and now the % complete is counting. Thought you might like to know.
Run time was reset to 0 on restart, so I may have lost about 22 hrs of run time.
----------------------------------------
[Edit 1 times, last edit by tmedve at Sep 19, 2014 11:54:30 AM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
On the ones I've got, there are no checkpoint files being created or updated. Indeed, there is no file access in the slot directory except for the file "boinc_ugm1_2".
|
||
|
jonnieb-uk
Ace Cruncher England Joined: Nov 30, 2011 Post Count: 6105 Status: Offline |
"I was running BOINC 7.2.28 when I had my problem of 4 BETAs stuck at 0.000% with 5 or 6 hrs of run time. I downloaded BOINC 7.2.47 and restarted everything and now the % complete is counting. Thought you might like to know. Run time was reset to 0 on restart, so I may have lost about 22 hrs of run time."
Don't be fooled! The only reports of completed WUs I have seen in this thread relate to a bare handful of jobs that have completed in minutes rather than hours. I would suggest:
BETA work is never "wasted". Errors are to be expected prior to a project going into production. In theory (I think) the RunTime of user-aborted units will be added to a user's stats when the WU is eventually validated. Not sure what happens if they never validate. This has not been the best of BETA tests. |
||
|
deltavee
Ace Cruncher Texas Hill Country Joined: Nov 17, 2004 Post Count: 4884 Status: Recently Active |
I just aborted 74 Beta WUs. I'm leaving town for the weekend and will not have access to the machines. If I had received prior word from the techs I might have hung in there, but I didn't want to risk sitting idle.
----------------------------------------
4858
|
||
|
Worf_VX
Cruncher Joined: Feb 8, 2012 Post Count: 1 Status: Offline |
All 4 of my WUs errored out :-( after more than 16h of run time....
|
||
|
KWSN - A Shrubbery
Master Cruncher Joined: Jan 8, 2006 Post Count: 1585 Status: Offline |
The Windows machine self-aborted at 15 hours, maximum run-time exceeded. Several Linux boxes are still going at 18+ hours.
Edit: And yes, they were re-issued and not returned yet. If they were checkpointing they would likely have been returned by the third user by now.
----------------------------------------
Distributed computing volunteer since September 27, 2000
[Edit 1 times, last edit by KWSN - A Shrubbery at Sep 19, 2014 1:20:58 PM] |
||
|
Jason1478963
Senior Cruncher United States Joined: Sep 18, 2005 Post Count: 295 Status: Offline |
I had around 52 tasks, some of them short ones that validated. A few have now failed with a computation error, with this message in the log:
16998 World Community Grid 9/19/2014 9:07:09 AM Output file BETA_ugm1_ugm1_00010_0300_1_0 for task BETA_ugm1_ugm1_00010_0300_1 absent
I also have twenty with over 11 hours of run time, advancing very slowly, all above 99.9XX percent complete and with no estimated time to completion. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Here's a more interesting one. I picked up a replacement task for one that bombed out after almost 12 hours with "process exited with code 239 (0xef, -17)". It's a 00010 series, so it should be short:
BETA_ugm1_ugm1_00010_0980
The stderr.txt file has entries like this:
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !SGGELFAGLQSDDFYVY [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !AMEMAMWSLLGERPVQM [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !SVQWLVLAGLPAMQLAF [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !MVTLAFDITKFCYHKSY [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !QLPLFVPCLFGGILLTN [maxn:36000/maxt3:35850]
I won't try to put them all in here, as it wrote 17855 such lines in the first minute of processing! Since then, it's been running without any further error messages and without any checkpoint files being written either. |
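With thousands of near-identical lines in stderr.txt, a quick tally can be easier to post than the raw log. The following is only a rough sketch: the regular expression is modelled on the lines quoted above, the group names (`src`, `tag`, `maxn`, `maxt3`) are my own labels rather than anything the application defines, and `SAMPLE` stands in for the real file contents.

```python
import re
from collections import Counter

# Pattern modelled on the quoted stderr lines; the field names are assumptions.
ERROR_RE = re.compile(
    r"\*\*\* error \[(?P<src>[^\]]+)\] .*? at !(?P<tag>\S+) "
    r"\[maxn:(?P<maxn>\d+)/maxt3:(?P<maxt3>\d+)\]"
)

def summarize(lines):
    """Return (matching line count, Counter of source locations, set of tags)."""
    total = 0
    by_src = Counter()
    tags = set()
    for line in lines:
        m = ERROR_RE.search(line)
        if m is None:
            continue  # ignore anything that does not look like the quoted errors
        total += 1
        by_src[m.group("src")] += 1
        tags.add(m.group("tag"))
    return total, by_src, tags

# Two of the lines quoted in the post, used here as stand-in input.
SAMPLE = [
    "*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() "
    "at !SGGELFAGLQSDDFYVY [maxn:36000/maxt3:35850]",
    "*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() "
    "at !AMEMAMWSLLGERPVQM [maxn:36000/maxt3:35850]",
]

if __name__ == "__main__":
    total, by_src, tags = summarize(SAMPLE)
    print(total, dict(by_src), sorted(tags))
```

To run it against a real log, pass the open file instead of `SAMPLE`, for example `summarize(open("stderr.txt"))` from inside the slot directory.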
||
|
Jason1478963
Senior Cruncher United States Joined: Sep 18, 2005 Post Count: 295 Status: Offline |
That was my machine that bombed out on the BETA_ugm1_ugm1_00010_0980. I also had BETA_ugm1_ugm1_00010_0300 with the same error, "process exited with code 239 (0xef, -17)". In the messages window I received the output file absent message on these work units.
I thought I saw one of these with over 180 checkpoints, but I cannot confirm, as I didn't catch which work unit had the high number of checkpoints. |
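For anyone puzzled by the three numbers in "code 239 (0xef, -17)": they appear to be one value shown three ways, namely decimal, hexadecimal, and the same byte read as a signed number (239 - 256 = -17). A small sketch of that arithmetic follows; the function names are made up for illustration, and it says nothing about what -17 means inside the application.

```python
def as_signed_byte(code: int) -> int:
    """Interpret the low 8 bits of an exit code as a signed (two's complement) byte."""
    low = code & 0xFF
    return low - 256 if low >= 128 else low

def describe(code: int) -> str:
    """Format a code in the same 'decimal (hex, signed)' shape as the quoted message."""
    return f"{code} (0x{code & 0xFF:02x}, {as_signed_byte(code)})"

if __name__ == "__main__":
    print(describe(239))  # prints: 239 (0xef, -17)
```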
||
|