World Community Grid Forums
Thread Status: Active Total posts in this thread: 52
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Thanks for putting it in perspective, eso. Glad I have BOINCview to filter on apps and versions in the completed tab, and its work history to show the error codes it has recorded. I never knew I had a '197' case weeks ago.
Added: No one is in the league of Uplinger's "Badge Show-Off" department though. Black market item or eBay... with a green fringe they have to talk to BP.
WCG
Please help to make the Forums an enjoyable experience for All! [Edit 1 times, last edit by Sekerob at Jan 24, 2008 8:17:47 PM] |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"hi h.hett, this thread is/was reporting jobs that went out with a bang and errors. so is yours the one of the 'too late' case or the fail?"

Hello Sekerob, thank you. Someone from my team, Science and Research Hessen, will translate for me; after that I can answer you. For now I can tell you that it was not my first WU from AC@H. The other 5 or 6 WUs all completed fine; only this WU is incorrect. I am not the only one marked "Too Late" on this WU, ach1_14_72_27: no one was "in time" on this WU. Sorry for my English; I can answer you in a better form next time. Maybe this is the wrong thread for this. If so, delete my post.

With friendly regards,
Horst
[Edit 3 times, last edit by Former Member at Feb 15, 2008 9:53:16 AM]
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Okay. If you like, you could also say it in German. Unfortunately I am a bit out of practice and the spell checker is not working at the moment, hence sometimes 'ue' or 'oe' instead of an umlaut (that does not work so well on an Italian keyboard). See you later.

@Eso, I don't know, but it is Murphy ruling again. Just when I looked afterwards, one job showed as invalid on the Result Status page, and the log said it had a heartbeat issue. The actual client log (of the headless machine) printed the 'misreported' DLL load error. It could be a matter of robustness, as the job stupidly continued for another 10 hours after that event; I'd have preferred it to go sailing into the abort bin. I have had more of these DLL load errors, but with the result turning out fine. Anyway, a subsequent job validated, and now that WCG has found the box with a < 24 hour return, it keeps offering these jobs up even before the current one finishes. Must be my lucky week... #9 is being readied. ttyl

PS: the event was reconstructible. I had a bad LogMeIn session to that machine that had to be killed, right when it recorded the heartbeat issue. Root cause? BOINC RPC and other comms not running on separate threads?
[Edit 1 times, last edit by Sekerob at Jan 25, 2008 11:37:35 AM] |
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
Following up on my previous post: the last job also had a heartbeat issue and happily ran on for 20:48 hours. I wonder why this machine has heartbeat issues... 5.10.38 maybe? Anyway, I'd prefer the job to die if it can't recover properly from the event. My ordeal will last until validation is attempted, which will be a few days from now.
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/7/19::0:0:0 1
No heartbeat from core client for 31 sec - exiting
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/7/22::6:0:0 1
</stderr_txt>
]]>
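The "No heartbeat from core client for 31 sec - exiting" line suggests the science application gives up once it has heard nothing from the core client for roughly half a minute. A minimal Python sketch of that kind of watchdog follows. It is not BOINC's actual code: the `HeartbeatMonitor` class is hypothetical, and the 30-second limit is an assumption inferred from the "31 sec" in the log.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat and reports when the limit is exceeded.

    Hypothetical illustration only. The 30 s default is an assumption
    inferred from the "31 sec" in the log, not a documented BOINC value.
    """

    def __init__(self, timeout_sec=30.0, now=None):
        self.timeout_sec = timeout_sec
        # Allow an explicit clock value so the logic can be tested.
        self.last_beat = time.monotonic() if now is None else now

    def beat(self, now=None):
        """Record that a heartbeat arrived."""
        self.last_beat = time.monotonic() if now is None else now

    def silence(self, now=None):
        """Seconds elapsed since the last heartbeat."""
        current = time.monotonic() if now is None else now
        return current - self.last_beat

    def expired(self, now=None):
        """True once the silence exceeds the timeout."""
        return self.silence(now) > self.timeout_sec
```

If whatever delivers the heartbeat is stuck behind a blocked remote session, `beat()` is never called and `expired()` turns true even though the client itself is still alive, which would fit the LogMeIn scenario guessed at earlier in the thread.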
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
"Unfortunately I am a bit out of practice"

Thank you very much, Sekerob. That is actually very well written. I am 59 years old and do not know a word of Italian.
||
|
darth_vader
Veteran Cruncher A galaxy far, far away... Joined: Jul 13, 2005 Post Count: 514 Status: Offline |
It looks like this problem is still around.
Boinc Log:
2/13/2008 6:52:02 PM|World Community Grid|Computation for task ach1_21_76_4 finished
2/13/2008 6:52:02 PM|World Community Grid|Output file ach1_21_76_4_0 for task ach1_21_76_4 absent
2/13/2008 6:52:02 PM|World Community Grid|Output file ach1_21_76_4_1 for task ach1_21_76_4 absent
2/13/2008 6:52:02 PM|World Community Grid|Output file ach1_21_76_4_2 for task ach1_21_76_4 absent
2/13/2008 6:52:02 PM|World Community Grid|Output file ach1_21_76_4_3 for task ach1_21_76_4 absent

Result Log:
<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
 - exit code 95 (0x5f)
</message>
<stderr_txt>
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/24::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/24::12:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/29::6:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/5::6:0:0 1
Exception: Access Violation
At line 296 of file wrf_io.f
Traceback: not available, compile with -ftrace=frame or -ftrace=full
</stderr_txt>

Unlike some of the other instances reported, there is no failure reported in the BOINC log other than the missing output files.
-D
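For anyone wanting to spot these cases without reading the log by eye, here is a rough Python sketch that collects the "Output file ... absent" messages per task. It assumes the `timestamp|project|message` layout seen in the log above; the function name `absent_outputs` is made up for this example and is not part of BOINC or BOINCview.

```python
import re

# Matches the message text seen in the log above, e.g.
# "Output file ach1_21_76_4_0 for task ach1_21_76_4 absent"
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def absent_outputs(log_lines):
    """Return {task_name: [missing output files]} from pipe-delimited log lines."""
    missing = {}
    for line in log_lines:
        # Assumed layout: timestamp|project|message
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not a log line in the expected layout
        match = ABSENT_RE.search(parts[2])
        if match:
            output_file, task = match.groups()
            missing.setdefault(task, []).append(output_file)
    return missing
```

Feeding it the five log lines above would group the four absent files under task ach1_21_76_4, while the "Computation ... finished" line is ignored.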
||
|
stwainer
Advanced Cruncher Joined: Nov 21, 2005 Post Count: 128 Status: Offline |
Looks like this is the place to report these errors:
ach1_19_67
Result Log
<core_client_version>5.8.16</core_client_version>
<![CDATA[
<stderr_txt>
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/8/27::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/8/30::6:0:0 1
</stderr_txt>

and ach1_21_54
<core_client_version>5.8.16</core_client_version>
<![CDATA[
<stderr_txt>
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/24::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/26::18:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/30::6:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/30::18:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/1::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/1::0:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/1::6:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/3::12:0:0 1
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
Restarting WRF
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/10/7::0:0:0 1
</stderr_txt>
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Here are a couple more:
ach1_ 21_ 70_ 13-- 8.72 hours
ach1_ 21_ 47_ 10-- 6.99 hours

<core_client_version>5.10.30</core_client_version>
<![CDATA[
<message>
 - exit code 95 (0x5f)
</message>
<stderr_txt>
Failed to get VersionInfo size: 1812
World Community Grid ACAH (projects/www.worldcommunitygrid.org/wcg_acah_wrf_5.14_windows_intelx86) version
INFO: No state to restore from. Starting from beginning
Start_year/Start_Month/Start_Day::Start_Hour:Start_Minute:Start_Second Restart2003/9/24::0:0:0 1
Exception: Access Violation
At line 296 of file wrf_io.f
Traceback: not available, compile with -ftrace=frame or -ftrace=full
</stderr_txt>
]]>
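Since these result logs all follow the same pattern, a small Python sketch can pull out the pieces that matter when comparing reports: the exit code, the restart (checkpoint) dates, and the exception location. The regular expressions are written against the text pasted in this thread only; `summarize_stderr` is a hypothetical helper, not a WCG or BOINC tool.

```python
import re

# "Restart2003/9/24::0:0:0" -> year, month, day, hour, minute, second.
# "Restarting WRF" does not match because digits must follow "Restart".
RESTART_RE = re.compile(
    r"Restart(\d{4})/(\d{1,2})/(\d{1,2})::(\d{1,2}):(\d{1,2}):(\d{1,2})"
)
# "exit code 95 (0x5f)"
EXIT_RE = re.compile(r"exit code (\d+) \((0x[0-9a-fA-F]+)\)")
# "Exception: Access Violation At line 296 of file wrf_io.f"
EXCEPTION_RE = re.compile(r"Exception:\s*(.+?)\s+At line (\d+) of file (\S+)")


def summarize_stderr(text):
    """Extract exit code, restart points and exception info from a result log."""
    restarts = [tuple(int(x) for x in m.groups())
                for m in RESTART_RE.finditer(text)]
    exit_match = EXIT_RE.search(text)
    exc = EXCEPTION_RE.search(text)
    return {
        "exit_code": int(exit_match.group(1)) if exit_match else None,
        "restart_points": restarts,
        "exception": (
            {"kind": exc.group(1), "line": int(exc.group(2)), "file": exc.group(3)}
            if exc else None
        ),
    }
```

The patterns tolerate the log being pasted either on separate lines or flattened onto one line, because they only rely on whitespace between the fields.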
||
|
Sekerob
Ace Cruncher Joined: Jul 24, 2005 Post Count: 20043 Status: Offline |
esoteric17, this

Exception: Access Violation

would make me dig into file/directory ownership and into security software permissions and exceptions: a fresh dig, setting aside what we did and discussed in the past, to discover why you are having them. How low is the percentage now of these failings?
||
|
darth_vader
Veteran Cruncher A galaxy far, far away... Joined: Jul 13, 2005 Post Count: 514 Status: Offline |
"How low is the percentage now of these failings?"

In my case (see earlier post), since I don't see ACAH work units very often, this failure is 1 out of 4 so far this year, and all the more annoying because it appears the WU was just about done when this happened:

World Community Grid|Computation for task ach1_21_76_4 finished

At a minimum, the code should be modified to indicate which file had the access violation. The user account under which the WU was running has full access, so it's hard to see what would cause the violation unless the code was trying to write to a read-only file.
-D
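To illustrate the kind of reporting being asked for, here is a short Python sketch that names the file in every failure it raises. The real application is Fortran (wrf_io.f) and its I/O code is not shown in this thread, so this only illustrates the idea; `open_for_write` is a hypothetical helper.

```python
import os


def open_for_write(path):
    """Open `path` for writing, naming the file in any error raised."""
    # Catch the read-only case up front so the message says exactly why.
    if os.path.exists(path) and not os.access(path, os.W_OK):
        raise PermissionError(f"cannot write to read-only file: {path}")
    try:
        return open(path, "wb")
    except OSError as err:
        # Re-raise with the path included so the log identifies the file.
        raise OSError(f"failed to open {path} for writing: {err}") from err
```

With something along these lines in the output routine, the result log would carry the file name next to the error rather than only a source line number.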
||
|