Thread Status: Active
Total posts in this thread: 22
Posts: 22   Pages: 3   [ Previous Page | 1 2 3 ]
This topic has been viewed 1577 times and has 21 replies
Alther
Former World Community Grid Tech
United States of America
Joined: Sep 30, 2004
Post Count: 414
Status: Offline
Re: Something going on with WU ex379_2A ??

Okay, thanks for the info. So if there is another error, this WU will be rejected.

The error occurred just when the benchmarks started to run:
2006-06-11 21:23:39 [---] Suspending computation - running CPU benchmarks
2006-06-11 21:23:39 [World Community Grid] Pausing task ex379_2A_3 (removed from memory)
2006-06-11 21:23:39 [---] Suspending network activity - running CPU benchmarks
2006-06-11 21:23:40 [World Community Grid] Unrecoverable error for result ex379_2A_3 ( - exit code -1073741819 (0xc0000005))
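
For reference, the exit code in the last log line can be decoded by hand. The signed value -1073741819 and the hexadecimal 0xc0000005 are the same 32-bit number, and on Windows that value is the NTSTATUS code for an access violation. A minimal sketch of the conversion follows; the function names and the one-entry lookup table are illustrative assumptions, not part of BOINC:

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


# Deliberately tiny lookup table (hypothetical helper, not a BOINC API).
# 0xC0000005 is the Windows NTSTATUS value for an access violation.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
}


def describe_exit_code(code: int) -> str:
    """Format an exit code as hex plus a name if the code is known."""
    unsigned = to_unsigned32(code)
    name = KNOWN_NTSTATUS.get(unsigned, "unknown")
    return f"0x{unsigned:08x} ({name})"


print(describe_exit_code(-1073741819))  # prints "0xc0000005 (STATUS_ACCESS_VIOLATION)"
```

An access violation means the process touched memory it was not allowed to use, which says how the process died but not which code was at fault.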

We see these errors from time to time in BOINC. The app crashes when BOINC suspends it and removes it from memory. Based on my investigations (and in my opinion), a race condition in the BOINC code is causing these crashes. We have seen them since we first launched BOINC with the 5.2 client, and we are still seeing them with 5.4. Unlike the "bad" batches of a couple of weeks ago, this problem can hit any workunit.

When a crash like this occurs, we are sent a stack trace, and in every case the stack trace leads to code in BOINC. The problem with race conditions is that they are notoriously hard to reproduce. In fact, in our internal testing we almost never see them. But turn 100,000 machines loose on it in the wild and the problem starts to show up.

Also note that not all WU errors are due to this problem, only those where the app exits right after it gets suspended.
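
To check whether an error in your own message log fits this pattern, you can look for a task that was paused with "removed from memory" and then reported an unrecoverable error moments later. The sketch below assumes the log line format shown in the excerpt earlier in this thread; the function name and the 5-second window are assumptions chosen for illustration, not values taken from BOINC:

```python
import re
from datetime import datetime, timedelta

# Patterns based on the log lines quoted in this thread.
PAUSE_RE = re.compile(r"Pausing task (\S+) \(removed from memory\)")
ERROR_RE = re.compile(r"Unrecoverable error for result (\S+)")
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def errors_after_suspend(lines, window_seconds=5):
    """Return tasks that errored shortly after being removed from memory."""
    window = timedelta(seconds=window_seconds)  # assumed threshold
    paused_at = {}
    hits = []
    for line in lines:
        try:
            # The timestamp occupies the first 19 characters of each line.
            stamp = datetime.strptime(line[:19], STAMP_FORMAT)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        pause = PAUSE_RE.search(line)
        if pause:
            paused_at[pause.group(1)] = stamp
            continue
        error = ERROR_RE.search(line)
        if error:
            task = error.group(1)
            if task in paused_at and stamp - paused_at[task] <= window:
                hits.append(task)
    return hits


log = [
    "2006-06-11 21:23:39 [---] Suspending computation - running CPU benchmarks",
    "2006-06-11 21:23:39 [World Community Grid] Pausing task ex379_2A_3 (removed from memory)",
    "2006-06-11 21:23:40 [World Community Grid] Unrecoverable error for result ex379_2A_3 ( - exit code -1073741819 (0xc0000005))",
]
print(errors_after_suspend(log))  # prints ['ex379_2A_3']
```

A match only suggests the suspend-time crash described here. Errors that occur without a preceding pause are not flagged and most likely have a different cause.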

FYI, if you leave the process suspended in memory, this problem should not occur.
----------------------------------------
Rick Alther
Former World Community Grid Developer
[Jun 15, 2006 12:54:51 PM]
depriens
Senior Cruncher
The Netherlands
Joined: Jul 29, 2005
Post Count: 350
Status: Offline
Re: Something going on with WU ex379_2A ??

Many thanks for all the information!
[Jun 15, 2006 2:19:27 PM]