Total posts in this thread: 98
This topic has been viewed 10640 times and has 97 replies.
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

Quote: "I've aborted ALL unfinished/Unstarted beta tasks"

If you did that about 20 minutes before you posted, a dual-core of mine picked up two of them. (If not, someone else aborted a bunch of them at about that time.) They were short 00010 ones that completed fine.
[Sep 19, 2014 11:46:51 AM]
tmedve
Senior Cruncher
USA
Joined: Nov 16, 2004
Post Count: 182
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

I was running BOINC 7.2.28 when I had my problem of 4 BETA tasks stuck at 0.000% after 5 or 6 hours of run time. I downloaded BOINC 7.2.47, restarted everything, and now the percent complete is counting up. Thought you might like to know.

Run time was reset to 0 on restart, so I may have lost about 22 hours of run time. :-(
----------------------------------------
[Edit 1 times, last edit by tmedve at Sep 19, 2014 11:54:30 AM]
[Sep 19, 2014 11:49:49 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

On the tasks I've got, no checkpoint files are being created or updated. In fact, nothing in the slot directory is being accessed except the file "boinc_ugm1_2".
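If you want to check the same thing on your own machine, one rough approach is to list which files in a task's slot directory were written recently. This is a minimal Python sketch, not a BOINC tool: the function names are hypothetical, and the example slot path is an assumption (the BOINC data directory differs by platform and install).

```python
import os
import time

def files_modified_since(entries, cutoff):
    """Given (name, mtime) pairs, return the names modified at or after `cutoff`."""
    return sorted(name for name, mtime in entries if mtime >= cutoff)

def scan_slot(slot_dir):
    """List (name, mtime) for the regular files directly inside slot_dir."""
    with os.scandir(slot_dir) as it:
        return [(e.name, e.stat().st_mtime) for e in it if e.is_file()]

def recently_modified(slot_dir, window_seconds=600):
    """Names of files in slot_dir written within the last `window_seconds`."""
    return files_modified_since(scan_slot(slot_dir), time.time() - window_seconds)

# Hypothetical usage (adjust the path to your own BOINC data directory):
# print(recently_modified("/var/lib/boinc-client/slots/0"))
```

Running it a few times over an hour would show whether anything besides the one file mentioned above is being written.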
[Sep 19, 2014 12:19:24 PM]
jonnieb-uk
Ace Cruncher
England
Joined: Nov 30, 2011
Post Count: 6105
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

Quoting tmedve: "I was running BOINC 7.2.28 when I had my problem of 4 BETAs stuck at 0.000% with 5 or 6 hrs of run time. I downloaded BOINC 7.2.47 and restarted everything and now the % complete is counting. Thought you might like to know.

Run time was reset to 0 on restart, so I may have lost about 22 hrs of run time. :-("


Don't be fooled!

The only reports of completed WUs I have seen in this thread relate to a bare handful of jobs that have completed in minutes rather than hours. I would suggest:

  • Use Result Status to check what's happening with the wingmen
  • A few of the short-running jobs have been reported as checkpointing. These are probably OK to leave running.
  • Many posts in this thread report normal behaviour until progress reaches >99%, after which the WU never completes.
  • If you're concerned, suspend processing of other BETA work until the techs come up with "official" advice.


BETA work is never "wasted". Errors are to be expected before a project goes into production. In theory (I think) the run time of user-aborted units will be added to a user's stats when the WU is eventually validated. I'm not sure what happens if they never validate.

This has not been the best of BETA tests.
----------------------------------------

To Join follow this link: Join the UK Team All Welcome! UK Team thread
[Sep 19, 2014 12:24:29 PM]
deltavee
Ace Cruncher
Texas Hill Country
Joined: Nov 17, 2004
Post Count: 4884
Status: Recently Active
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

I just aborted 74 Beta WUs. I'm leaving town for the weekend and won't have access to the machines. If I had received word from the techs beforehand I might have hung in there, but I didn't want to risk the machines sitting idle.
----------------------------------------
4858
[Sep 19, 2014 12:25:36 PM]
Worf_VX
Cruncher
Joined: Feb 8, 2012
Post Count: 1
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

All 4 of my WUs errored out :-( after more than 16 hours of run time.
[Sep 19, 2014 1:12:20 PM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

The Windows machine self-aborted at 15 hours because the maximum run time was exceeded. Several Linux boxes are still going at 18+ hours.

Edit: And yes, they were reissued and haven't been returned yet. If they were checkpointing, they would likely have been returned by the third user by now.
----------------------------------------

Distributed computing volunteer since September 27, 2000
----------------------------------------
[Edit 1 times, last edit by KWSN - A Shrubbery at Sep 19, 2014 1:20:58 PM]
[Sep 19, 2014 1:15:37 PM]
Jason1478963
Senior Cruncher
United States
Joined: Sep 18, 2005
Post Count: 295
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

I had around 52 tasks, some of them short ones that validated. A few have now ended in a computation error with the message:
16998 World Community Grid 9/19/2014 9:07:09 AM Output file BETA_ugm1_ugm1_00010_0300_1_0 for task BETA_ugm1_ugm1_00010_0300_1 absent

I also have twenty tasks with over 11 hours of run time, all above 99.9XX percent complete, advancing very slowly and showing no estimated time to completion.
[Sep 19, 2014 1:27:01 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

Here's a more interesting one. I picked up a replacement task for one that bombed out after almost 12 hours with "process exited with code 239 (0xef, -17)". It's a 00010 series, so it should be short:
BETA_ugm1_ugm1_00010_0980

The stderr.txt file has entries like this:
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !SGGELFAGLQSDDFYVY [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !AMEMAMWSLLGERPVQM [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !SVQWLVLAGLPAMQLAF [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !MVTLAFDITKFCYHKSY [maxn:36000/maxt3:35850]
*** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at !QLPLFVPCLFGGILLTN [maxn:36000/maxt3:35850]

I won't try to put them all in here, as it wrote 17855 such lines in the first minute of processing!
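With that many lines, a quick tally by source location is easier to read than the raw file. Below is a minimal Python sketch that counts the "*** error [file:line]" entries in a task's stderr.txt. The line format is assumed from the excerpt above, and the function names are hypothetical; this is not part of BOINC or the ugm1 app.

```python
import re
from collections import Counter

# Assumed line format, taken from the stderr.txt excerpt above:
#   *** error [compacc2.c:996] n1[0/0] != n1[55] from re_getlib() at ...
ERROR_RE = re.compile(r"^\*\*\* error \[([^\]]+)\]")

def count_errors(lines):
    """Tally '*** error [file:line]' entries by the source location in brackets."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def count_errors_in_file(path="stderr.txt"):
    """Read a task's stderr file (path is an assumption) and tally its error lines."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_errors(f)
```

Something like `print(count_errors_in_file().most_common())` run from the slot directory would show whether all the errors come from the same place (compacc2.c:996 in the excerpt).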

Since then, it's been running without any further error messages and without any checkpoint files being written either.
[Sep 19, 2014 1:53:07 PM]
Jason1478963
Senior Cruncher
United States
Joined: Sep 18, 2005
Post Count: 295
Status: Offline
Re: New BETA test - Sept 18, 2014 [ Issues Thread ]

That was my machine that bombed out on BETA_ugm1_ugm1_00010_0980. I also had BETA_ugm1_ugm1_00010_0300 fail with the same error, "process exited with code 239 (0xef, -17)", and in the Messages window I received the "output file absent" message for these work units.
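For anyone puzzled by the three numbers in "process exited with code 239 (0xef, -17)": they are the same byte shown as unsigned decimal, hex, and a signed 8-bit (two's-complement) value. The small Python sketch below reproduces that formatting; the function name is hypothetical, and it only decodes the number. It says nothing about what -17 means to the ugm1 app.

```python
def describe_exit_code(code: int) -> str:
    """Format an exit status as 'unsigned (hex, signed 8-bit)'."""
    low = code & 0xFF  # keep the low byte; on POSIX, exit(-17) is reported as 239
    signed = low - 256 if low > 127 else low  # two's-complement reinterpretation
    return f"{low} ({hex(low)}, {signed})"

print(describe_exit_code(239))  # 239 (0xef, -17)
```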

I thought I saw one of these with over 180 checkpoints, but I can't confirm it because I didn't note which work unit had the high checkpoint count.
[Sep 19, 2014 2:06:36 PM]