Thread Status: Active
Total posts in this thread: 52
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

I also had a bunch of errors on several machines. It seems to have happened when the WU in progress finished and the cached WUs were for the old version.

One machine (my slowest) finished a longer running HPF2 WU at 11:14 that validated OK. 3 cached HCC WUs errored out immediately with this error:
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcg_hcc1_img_6.08_windows_intelx86</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>
]]>
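
For anyone comparing many of these result logs, here is a minimal sketch of pulling the failing file name and error code out of a log like the one above. It assumes the log text has been saved as a plain string; the function name `extract_xfer_errors` is made up for illustration and is not part of BOINC. It uses regular expressions rather than an XML parser because the log as shown is not a single well-formed XML document.

```python
import re

# Sample text copied from the result log quoted above.
RESULT_LOG = """<core_client_version>6.2.28</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcg_hcc1_img_6.08_windows_intelx86</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>
]]>"""


def extract_xfer_errors(log_text):
    """Return one dict per <file_xfer_error> block found in the log text."""
    errors = []
    blocks = re.findall(
        r"<file_xfer_error>(.*?)</file_xfer_error>", log_text, re.DOTALL
    )
    for block in blocks:
        fields = {}
        for tag in ("file_name", "error_code", "error_message"):
            match = re.search(rf"<{tag}>(.*?)</{tag}>", block, re.DOTALL)
            # A missing tag is recorded as None instead of raising.
            fields[tag] = match.group(1).strip() if match else None
        errors.append(fields)
    return errors


if __name__ == "__main__":
    for err in extract_xfer_errors(RESULT_LOG):
        print(err["file_name"], err["error_code"], err["error_message"])
```

Run over the logs from several machines, this makes it easy to see whether every failure names an old application version.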

Cache was filled with new WUs (4 HCC), but I don't have access to that machine atm, and there are no wingman results yet to tell about the app version.

My faster machine (2 cores) reported 3 WUs (1 HCC, 2 HPF2) at 03:19; two are valid (1 HPF2 and the HCC) and one is in PV (1 HPF2). All of these were old versions. At the same time the machine received 1 HCC 6.40 unit.
At 06:18 another old-version HCC was reported, which is now in PV. At the same time 18 WUs (3 C4CW, 8 HPF2, 7 HCC) were dumped with errors. The cache was refilled at that time with 11 HPF2 and 3 HCC (all version 6.40). Of the fresh downloads, 2 HCC and 2 HPF2 are already in PV (i.e. they finished without error).

One other machine (also a slow one) is still crunching on a long running HPF2 WU, and still has 1 HPF2, 1 HCC and 1 C4CW with old versions in cache. I will watch what happens when the current WU finishes. I expect that the cached units are then dumped and the cache refilled.

All machines (except the first one listed above) are running Boinc 6.10.58 on Windows 32-bit (XP for the first 2 listed above, Win7 for the last one)

I hope this helps in understanding the cause.

Greetings
Thorsten
[Mar 1, 2011 5:56:40 PM]
PMH_UK
Veteran Cruncher
UK
Joined: Apr 26, 2007
Post Count: 769
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

All appears OK now, not sure why my 3 Linux 64 and other Win PCs did not get hit (yet).


Spoke too soon - another 2 windows PCs have errored.
Caches re-filled.

Linux OK so far...

Paul.
----------------------------------------
Paul.
[Mar 1, 2011 8:49:38 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

I also lost several WUs on 5 different machines. 3 of them only run DDDT2 and the other 2 only run HCMD2. Most WUs ended in error with 0 CPU time, but some of them finished, for example:

CMD2_ 1544-1JWY_ A.clustersOccur-3BRW_ B.clustersOccur_ 38_ 91610_ 92351_ 0-- 615 Error 2/27/11 23:07:58 3/1/11 16:36:50 9.63 48.9 / 0.0 worried

Log:

Result Name: CMD2_ 1544-1JWY_ A.clustersOccur-3BRW_ B.clustersOccur_ 38_ 91610_ 92351_ 0--
<core_client_version>6.2.28</core_client_version>
<![CDATA[
<stderr_txt>
INFO: No state to restore. Start from the beginning.
called boinc_finish

</stderr_txt>
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcg_hcmd2_maxdo_6.15_windows_intelx86</file_name>
<error_code>-120</error_code>
<error_message>signature verification failed</error_message>
</file_xfer_error>

</message>
]]>

[Mar 1, 2011 10:36:28 PM]
keithhenry
Ace Cruncher
Senile old farts of the world ....uh.....uh..... nevermind
Joined: Nov 18, 2004
Post Count: 18665
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

I just reset the max_results_day setting for users who likely experienced this issue. You should be able to force an update and rebuild your cache.

We apologize that this has impacted some users. We did not see this in the test cases we ran prior to performing this change. The client should have completed the old workunits with the old application versions without disruption (the client checks the signature when the application is downloaded).

At this point the most critical thing that we need to know is if there is anyone who experienced this issue who is not able to start running the new 6.40 versions as they are downloaded. You should be able to use the 'update project' button to fetch new work. Please try this now if you were limited after the workunits crashed.

Then please post if you experienced the issue and then let us know if
A) You cannot run the 6.40 versions
B) You are able to run the 6.40 versions correctly

thanks - and we apologize for the issues.


Kevin, had this hit me overnight as well. The signature verification error was the first thing I saw in Boinc's messages. Then the messages about files missing and everything after that got an immediate computation error. Since this started about 2AM on my machine, I suspect that, while you could have replaced the key file late in the day yesterday, it probably wasn't at 1AM or 2AM in the morning. I'm wondering if that just happened to be the first time my machine asked to download/upload files and happened to get the new key file then. Is this in a separate file by itself or in another file with other data? Pure speculation on my part is that it's in a file that doesn't have a unique-per-version-name and it got overlaid at that time. Anything after that would fail on the signature verification until you had gone through your pre-6.40 cache. When I found this this morning, I could only download one WU per core at first. Once those initial WUs completed and were returned, I was able to refill my cache. Just some thoughts that hopefully help.
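
To illustrate the speculation above (and it is only speculation), here is a toy sketch of why a signature made under an old key stops verifying once a shared key file is replaced. BOINC uses public-key signatures; the HMAC below is a stand-in chosen only so the example runs with the Python standard library, and the key values and file contents are invented.

```python
import hashlib
import hmac


def sign(data: bytes, key: bytes) -> str:
    """Toy 'signature': an HMAC-SHA256 of the data under the given key."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(data: bytes, signature: str, key: bytes) -> bool:
    """Check a toy signature against whichever key is currently on disk."""
    return hmac.compare_digest(sign(data, key), signature)


# Hypothetical keys and application files, for illustration only.
old_key = b"old-project-key"
new_key = b"new-project-key"
old_app = b"contents of an old application file"
new_app = b"contents of a 6.40 application file"

old_sig = sign(old_app, old_key)  # signed before the key change
new_sig = sign(new_app, new_key)  # signed after the key change

# Suppose the single shared key file has been overwritten with the new key.
current_key = new_key

print(verify(old_app, old_sig, current_key))  # False: cached old app is rejected
print(verify(new_app, new_sig, current_key))  # True: the 6.40 app verifies
```

If the key really does live in one file shared by all versions, this would match what I saw: everything cached from before the change fails, and new downloads run fine.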
----------------------------------------
Join/Website/IMODB



[Mar 2, 2011 1:29:43 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

... did you have anything unusual set up on your machines other than a large cache?

Just to note that it happens with any non-zero cache, it's just less obvious. In the short cache scenario, the older cached WU fails and is replaced with the newer cached 6.40 WU in less than a second, so I'm guessing most users with a short cache haven't even noticed.

e.g.
01-Mar-2011 20:41:41 [World Community Grid] Starting CMD2_1551-2QZU_A.clustersOccur-2RC4_A.clustersOccur_7_1
01-Mar-2011 20:41:41 [World Community Grid] [error] Signature verification failed for wcg_hcmd2_maxdo_6.15_i686-pc-linux-gnu
01-Mar-2011 20:41:41 [World Community Grid] Starting CMD2_1549-2QZU_A.clustersOccur-2Q7N_A.clustersOccur_15_64357_65107_64618_65107_0
01-Mar-2011 20:41:41 [World Community Grid] Starting task CMD2_1549-2QZU_A.clustersOccur-2Q7N_A.clustersOccur_15_64357_65107_64618_65107_0 using hcmd2 version 640
01-Mar-2011 20:41:42 [World Community Grid] Computation for task CMD2_1551-2QZU_A.clustersOccur-2RC4_A.clustersOccur_7_1 finished
01-Mar-2011 20:41:42 [World Community Grid] Output file CMD2_1551-2QZU_A.clustersOccur-2RC4_A.clustersOccur_7_1_0 for task CMD2_1551-2QZU_A.clustersOccur-2RC4_A.clustersOccur_7_1 absent

If you check how many WUs have dumped with 0 elapsed time and error -120 or -163, I guess you'll find huge numbers.
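
For those who want to check their own machines, here is a rough sketch that counts signature failures per application file in a saved copy of the client messages. It assumes the lines look like the excerpt above; the function name `count_signature_failures` is just an illustration, not a BOINC tool.

```python
import re
from collections import Counter

# Sample lines copied from the message log excerpt above.
CLIENT_LOG = """\
01-Mar-2011 20:41:41 [World Community Grid] Starting CMD2_1551-2QZU_A.clustersOccur-2RC4_A.clustersOccur_7_1
01-Mar-2011 20:41:41 [World Community Grid] [error] Signature verification failed for wcg_hcmd2_maxdo_6.15_i686-pc-linux-gnu
01-Mar-2011 20:41:41 [World Community Grid] Starting CMD2_1549-2QZU_A.clustersOccur-2Q7N_A.clustersOccur_15_64357_65107_64618_65107_0
01-Mar-2011 20:41:42 [World Community Grid] Computation for task CMD2_1551-2QZU_A.clustersOccur-2RC4_A.clustersOccur_7_1 finished
"""

# Matches the error line format shown in the excerpt and captures the file name.
FAILURE = re.compile(r"\[error\] Signature verification failed for (\S+)")


def count_signature_failures(log_text):
    """Count 'Signature verification failed' lines per application file."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILURE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    for file_name, n in count_signature_failures(CLIENT_LOG).items():
        print(n, file_name)
```

Note that this counts log lines naming a file, not dumped WUs; one failed application file can take a whole cache of tasks with it.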
[Mar 2, 2011 2:41:52 AM]
KWSN - A Shrubbery
Master Cruncher
Joined: Jan 8, 2006
Post Count: 1585
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

I haven't seen this on any of my machines. I've got 8 running and they all have different OS, processors, cache settings, and speeds.

Yes, I've been seeing a lot of repair units, but I would expect more if this were a universal problem.
----------------------------------------

Distributed computing volunteer since September 27, 2000
[Mar 2, 2011 5:22:21 AM]
anhhai
Veteran Cruncher
Joined: Mar 22, 2005
Post Count: 839
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

I agree with KWSN - A Shrubbery. I have lots of machines running and none of them have this problem. Is this problem only with DDDT2? I ask because I am not running that at this time. Is this going to be a continuing problem until all of the old versions get cleared out? Or is it more of a time thing?
----------------------------------------

[Mar 2, 2011 5:42:50 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

I run BOINC 6.10.58 x64 across all my machines.

Machine #4 errored out its queue early this morning (16 hours ago). This one was crunching HFCC. This problem seems to be completely random and independent of hardware or BOINC version. I have 2 machines left in the "farm" that haven't had any problems (yet). Both are crunching HFCC and have exactly the same hardware, BOINC client, OS and internet connection as the one described above. The machines described in my initial post have different processors but everything else is the same.

My 2 home computers seem to be unaffected crunching DDDT2 with 6.10.58 and Win 7 x64.
[Mar 2, 2011 5:51:33 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Just lost my entire cache of DDDT2 on 2 machines!

I run BOINC 6.10.56 on all my machines.

I just noticed that the WUs on one machine were all terminating with Error, and it's not restricted to DDDT2. For example, Results Status shows the following:

Result Name              Device Name      Status  Sent Time         Time Due / Return Time  CPU Time (hours)  Claimed / Granted BOINC Credit
ts01_ b137_ pr02a0_ 1--  DAVETHOMPSON-PC  Error   2/27/11 02:21:06  3/2/11 03:15:53         0.00              0.0 / 0.0
ts01_ b129_ pr45b1_ 1--  DAVETHOMPSON-PC  Error   2/26/11 23:15:27  3/2/11 03:15:53         0.00              0.0 / 0.0
ts01_ b124_ pca009_ 0--  DAVETHOMPSON-PC  Error   2/26/11 21:55:59  3/2/11 03:15:53         0.00              0.0 / 0.0


This is on four pages of the Results Status for a total of 60 WUs in Error. All of the WUs are for DDDT2 except one which is for CEP2. One of the first Error returns is the following:

Result Log

Result Name: ts01_ b096_ pr45b1_ 0--
<core_client_version>6.10.18</core_client_version>
<![CDATA[
<message>
app_version download error: couldn't get input files:
<file_xfer_error>
<file_name>wcg_dddt2_charmm_6.17_windows_intelx86</file_name>
<error_code>-120</error_code>
<error_message>signature verification error</error_message>
</file_xfer_error>

</message>
]]>


This does not appear for most of the other Errors. I exited BOINC on that one machine and restarted; things seem to be running normally.
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 2, 2011 6:58:29 AM]
[Mar 2, 2011 6:46:38 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Just lost my entire cache on 2 machines!

I suggest editing the OP title to remove the DDDT2 part, since this is proving not to be science-specific, and having an admin move this thread to the BOINC Support forum, where it is likely to be seen by more members. One member was correlating this with having run Betas recently.

thanks.
[Mar 2, 2011 7:05:51 AM]