World Community Grid Forums
Thread Status: Active Total posts in this thread: 20
gj82854
Advanced Cruncher Joined: Sep 26, 2022 Post Count: 104 Status: Offline |
Are there any errors showing up on version 6 of the kernel or later releases of libc? All the errors here seem to be on Version 5 of the Linux kernel and older libc releases.
|
||
|
MarkH
Advanced Cruncher United States of America Joined: May 16, 2020 Post Count: 56 Status: Offline |
Hello all. For what it's worth, I've had at least two ARP units end with "Computation Error" in the last day or so. MCM units remain unaffected. I only run 1 ARP unit at a time, so I can't show a long-term trend myself. Maybe the logs would tell someone with more knowledge something.
----------------------------------------
"That science of the people, by the people, for the people, shall not perish from the Earth."
|
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2160 Status: Offline |
What did your wingmen do with their ARP1 tasks, Mark? Did they turn in a valid result? And which generation were the workunits?
Adri |
||
|
MarkH
Advanced Cruncher United States of America Joined: May 16, 2020 Post Count: 56 Status: Offline |
Hello, Adri.
I'm sorry, I don't understand what you mean by "wingmen", so I cannot answer your first question. (I'm not technically adept in the WCG infrastructure.) Are those the multiple copies being run (xxx_1, xxx_2, etc.)?
Here are the ARP WUs that failed with "Computation Error":
https://www.worldcommunitygrid.org/contribution/workunit/660721703 ARP1_0018981_136_1
https://www.worldcommunitygrid.org/contribution/workunit/655572704 ARP1_0029596_133_3
----------------------------------------
"That science of the people, by the people, for the people, shall not perish from the Earth."
|
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2160 Status: Offline |
Hi Mark, I will explain.
Thanks for supplying the links to the workunits; this makes it so much easier!
If you look at the URL www.worldcommunitygrid.org/contribution/workunit/660721703, you'll notice the last three parts: contribution / workunit / 660721703, where 660721703 is the numerical ID of a workunit.
If you inspect a workunit, you'll see (in the first case, workunit 660721703) essentially this:
ARP1_0018981_136 [*1]
Result name [*2] OS type Status Sent time Due / Return time CPUtime/Elapsed
(*1) This is the name of the workunit.
(*2) This lists the tasks (or results) that make up the workunit.
You can see that this workunit consists of three tasks, each of whose names begins with the name of the workunit (see *1). Each task has its own suffix: _0, _1 and _2. Your computer (or client) ran the task with suffix _1 (ARP1_0018981_136_1 in full); the other tasks are each running on somebody else's client. Those other clients are your wingmen.
Since neither of your wingmen in workunit 660721703 (the ones not running the task with suffix _1) has returned a result at this moment, we can only wait and see what they deliver.
Adri |
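The naming scheme described above can be sketched in a few lines of Python. This is only an illustration based on the URL and task names quoted in this thread; the helper names `workunit_id_from_url` and `split_task_name` are made up for the example and are not part of any WCG or BOINC tool.

```python
from urllib.parse import urlparse


def workunit_id_from_url(url: str) -> int:
    """Return the numerical workunit ID from a .../contribution/workunit/<id> URL."""
    parts = urlparse(url).path.rstrip("/").split("/")
    # The last three path parts are expected to be: contribution / workunit / <id>
    if parts[-3:-1] != ["contribution", "workunit"]:
        raise ValueError(f"not a workunit URL: {url}")
    return int(parts[-1])


def split_task_name(task_name: str) -> tuple[str, int]:
    """Split a task (result) name into the workunit name and the task suffix."""
    workunit, _, suffix = task_name.rpartition("_")
    if not workunit or not suffix.isdigit():
        raise ValueError(f"unexpected task name: {task_name}")
    return workunit, int(suffix)


url = "https://www.worldcommunitygrid.org/contribution/workunit/660721703"
print(workunit_id_from_url(url))              # workunit ID taken from the URL
print(split_task_name("ARP1_0018981_136_1"))  # (workunit name, task suffix)
```

Two tasks whose names split to the same workunit name but different suffixes belong to the same workunit, so the clients running them are wingmen of each other.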
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2160 Status: Offline |
Hi Mark,
Looking at the error log of your task, the error seems to revolve around the words "Access Violation (0xc0000005) at address 0x02759D59". I've found this post (#619575) in which "Access Violation" is 'translated' into the Linux universe wording "Segmentation violation".
We'll see what the other clients (your wingmen) are saying …
Adri |
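For anyone who wants to see the two wordings side by side, here is a rough Python sketch of how a crashed child process is typically reported on each platform. It assumes the conventions of Python's own `subprocess` module (a negative return code means "killed by that signal" on POSIX; Windows reports the raw status code) and says nothing about how the BOINC client formats its logs. `describe_exit` is a hypothetical helper name.

```python
import signal

# Windows status code quoted in the error log above.
STATUS_ACCESS_VIOLATION = 0xC0000005


def describe_exit(returncode: int) -> str:
    """Give a human-readable description of a child process return code."""
    # Windows: the code may show up unsigned or as a negative 32-bit value,
    # so compare after masking to 32 bits.
    if returncode & 0xFFFFFFFF == STATUS_ACCESS_VIOLATION:
        return "Access Violation (0xc0000005)"
    # POSIX (as reported by subprocess): -N means the process died from signal N.
    if returncode < 0:
        try:
            return f"killed by {signal.Signals(-returncode).name}"
        except ValueError:
            return f"killed by signal {-returncode}"
    return f"exit code {returncode}"


print(describe_exit(0xC0000005))       # the Windows wording
print(describe_exit(-signal.SIGSEGV))  # the Linux wording (segmentation violation)
```

Either way the message only says that the process touched memory it was not allowed to touch; it does not say why.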
||
|
Sgt.Joe
Ace Cruncher USA Joined: Jul 4, 2006 Post Count: 7665 Status: Offline |
A "segmentation violation" is an indication of some type of memory problem. It is not very specific.
----------------------------------------
Cheers
Sgt. Joe
*Minnesota Crunchers* |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2160 Status: Offline |
Alright!
Mark, we have a Pending Validation now:
Result name OS type Status Sent time Due / Return time CPUtime/Elapsed Claimed/Granted
In general, this means your Error has nothing to do with the workunit (ARP1_0018981_136) itself: one of your wingmen (the one with suffix _3) returned a result that wasn't in error, so there's a strong indication that it has something to do with (the configuration of) your computer, I'm afraid.
Adri
PS For the sake of completeness, three minutes ago three tasks from the workunit were marked 'Valid':
Result name OS type Status Sent time Due / Return time CPUtime/Elapsed Claimed/Granted
----------------------------------------
[Edit 1 times, last edit by adriverhoef at Feb 11, 2025 3:36:48 PM] |
||
|
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1320 Status: Offline |
It also happens to me, very rarely, that an ARP1 task runs into an error during runtime.
Examples:
https://www.worldcommunitygrid.org/contribution/workunit/657952170
https://www.worldcommunitygrid.org/contribution/workunit/657429818
I also suppose this is caused by memory access failures. The software was probably developed and tested on a standalone system running one single task. As long as it doesn't happen too often, it's nothing to worry about.
----------------------------------------
[Edit 1 times, last edit by Crystal Pellet at Feb 10, 2025 3:33:02 PM] |
||
|
adriverhoef
Master Cruncher The Netherlands Joined: Apr 3, 2009 Post Count: 2160 Status: Offline |
Hi Crystal Pellet,
Just noticed that one of your wingmen returned their result much too late, and I suspect they will get a nasty surprise when they look at their Results:
Result name OS type Status Sent time Due / Return time CPUtime/Elapsed Claimed/Granted
Adri |
||
|