Thread Status: Active
Total posts in this thread: 117
This topic has been viewed 212227 times and has 116 replies
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of Pending Verification Results

verheyde, what Alan is pointing out is that if LAIM (Leave Applications In Memory) is not on and BOINC is paused for whatever reason [e.g. user input or high system load], your tasks are unloaded from memory... so when they resume, it's like a restart.
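
For anyone who wants to check the LAIM setting without clicking through the preferences dialog, here is a minimal sketch. It assumes the setting is stored as a `leave_apps_in_memory` tag inside a `global_preferences` block, as in BOINC's preference files; the sample text below is made up for illustration, and treating a missing tag as "off" is my assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical preferences text; a real file would contain many more tags.
SAMPLE_PREFS = """<global_preferences>
    <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>"""


def laim_enabled(prefs_xml: str) -> bool:
    """Return True if the preferences text switches LAIM on."""
    root = ET.fromstring(prefs_xml)
    node = root.find("leave_apps_in_memory")
    if node is None or not (node.text or "").strip():
        # Assumption: a missing or empty tag is treated as "off".
        return False
    try:
        return float(node.text) != 0.0
    except ValueError:
        return False


print("LAIM on:", laim_enabled(SAMPLE_PREFS))
```

With LAIM off, as in the sample, a paused task is removed from memory and has to pick up from its last checkpoint when it resumes.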

Impressions are hard to reproduce in a test environment. We need samples of results that got the invalid, but do not show a restart in the result log. I've yet to see a single MCM go invalid that only has a single wcg_learn_limit = nnnnnnnn log line... over 500 returned.
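
The check described above (more than one `wcg_learn_limit` line in the result log means the task was restarted) can be sketched as follows. The function names and the sample log text are hypothetical; only the `wcg_learn_limit` marker comes from the result logs themselves.

```python
def count_starts(stderr_text: str) -> int:
    """Count application starts, judged by 'wcg_learn_limit' log lines."""
    return sum(
        1
        for line in stderr_text.splitlines()
        if line.strip().startswith("wcg_learn_limit")
    )


def was_restarted(stderr_text: str) -> bool:
    """A log with more than one start marker indicates at least one restart."""
    return count_starts(stderr_text) > 1


# Hypothetical log excerpts with placeholder values.
single_run = "Initializing\nwcg_learn_limit = nnnnnnnn\nRunning\n"
restarted_run = single_run + single_run

print(was_restarted(single_run), was_restarted(restarted_run))
```

Running this over saved result logs would separate the "restart-less" invalids we need samples of from the ones that resumed from a checkpoint.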
[Dec 5, 2013 8:05:33 AM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Lots of Pending Verification Results

This thread seems to have drifted off the topic "Re: Lots of Pending Verification Results", onto a discussion of the recent (still current?) Invalid MCM results that sometimes happen when an MCM WU is re-started.

There is another thread on that topic: MCM: Seeing interspersed Invalid results with...a PVal > PVerification
It contains what AFAIK is the only response on this so far by a WCG tech (armstrdj) here.

It's hard to follow what's happening on this when information is in several threads, so I would prefer it if further discussion on it is continued in that other thread.

-- Hope they sort it out soon, and notify us ASAP so that we don't have to use trial and error to find out whether we can restart MCM WUs without them being declared Invalid. --
[Dec 5, 2013 2:18:50 PM]
Mumak
Senior Cruncher
Joined: Dec 7, 2012
Post Count: 477
Status: Offline
Re: Lots of Pending Verification Results

You're not correct about that. The Pending Verification status is in most cases the first step (after a BOINC restart/checkpoint reinit), which then further results in Invalid status.
[Dec 5, 2013 2:24:02 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of Pending Verification Results

The question of "when this happens with v7.26" is settled in my mind [all possible permutations have been discussed]... but there's always someone able to introduce doubt [for themselves, of course]. We are now waiting patiently for the next Beta, so we can try to break it every which way possible.
----------------------------------------
[Edit 1 times, last edit by Former Member at Dec 5, 2013 2:32:13 PM]
[Dec 5, 2013 2:30:51 PM]
BobCat13
Senior Cruncher
Joined: Oct 29, 2005
Post Count: 295
Status: Offline
Re: Lots of Pending Verification Results

"Impressions are hard to reproduce in a test environment. We need samples of results that got the invalid, but do not show a restart in the result log. I've yet to see a single MCM go invalid that only has a single wcg_learn_limit = nnnnnnnn log line... over 500 returned."

All of the invalids are mine.

MCM1_ 0000355_ 8466_ 2-- 726 Valid 12/4/13 04:28:13 12/4/13 12:58:29 2.38 85.0 / 105.9
MCM1_ 0000355_ 8466_ 1-- 726 Valid 12/3/13 17:38:51 12/4/13 04:27:54 5.71 126.8 / 105.9
MCM1_ 0000355_ 8466_ 0-- 726 Invalid 12/3/13 17:38:47 12/4/13 04:01:20 3.32 95.2 / 53.0

Result Name: MCM1_ 0000355_ 8466_ 0--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_windows_x86_64 -SettingsFile MCM1_0000355_8466.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Initializing
wcg_learn_limit = 500000
Running
[16:56:48]: Computing pass 0
Result.out = 1921942.000000
Run complete, CPU time: 11949.458199
20:15:57 (1260): called boinc_finish

</stderr_txt>
]]>


MCM1_ 0000234_ 6561_ 3-- 726 Valid 12/3/13 23:57:03 12/4/13 13:57:01 2.25 80.6 / 81.6
MCM1_ 0000234_ 6561_ 2-- 726 Invalid 12/3/13 14:18:15 12/3/13 23:56:56 2.47 70.1 / 40.8
MCM1_ 0000234_ 6561_ 1-- 726 Error 11/23/13 14:18:02 12/3/13 14:21:26 0.00 0.0 / 0.0
MCM1_ 0000234_ 6561_ 0-- 726 Valid 11/23/13 14:17:57 11/25/13 04:18:15 2.19 82.5 / 81.6

Result Name: MCM1_ 0000234_ 6561_ 2--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_windows_x86_64 -SettingsFile MCM1_0000234_6561.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Initializing
wcg_learn_limit = 500000
Running
[16:26:59]: Computing pass 0
Result.out = 2087478.000000
Run complete, CPU time: 8886.971367
18:55:02 (340): called boinc_finish

</stderr_txt>
]]>


MCM1_ 0000330_ 7438_ 2-- 726 Valid 12/2/13 20:10:32 12/4/13 11:25:24 2.73 69.6 / 83.3
MCM1_ 0000330_ 7438_ 0-- 726 Valid 12/1/13 23:36:45 12/2/13 20:10:18 5.16 96.9 / 83.3
MCM1_ 0000330_ 7438_ 1-- 726 Invalid 12/1/13 23:36:37 12/2/13 15:01:24 3.20 76.7 / 41.6

Result Name: MCM1_ 0000330_ 7438_ 1--
<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.26_windows_x86_64 -SettingsFile MCM1_0000330_7438.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Initializing
wcg_learn_limit = 500000
Running
[05:51:45]: Computing pass 0
Result.out = 4029535.000000
Run complete, CPU time: 11528.941903
09:03:51 (2444): called boinc_finish

</stderr_txt>
]]>

Those are just 3 of the 8 that have gone invalid in the last 3 days. This machine runs 24/7, so the tasks never have to restart from a checkpoint. For the invalids my machine produced, some of the valid wingmen have restarts and some don't. All of the Result.out files are the same size as those of the valids. There doesn't seem to be a pattern. It would be nice to see the OS and CPU type of the wingmen.
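
To make comparisons like this easier, the pasted result rows can be turned into records. This is a rough sketch: the column meanings (application version, status, sent time, return time, CPU hours, claimed / granted credit) are my reading of the rows, not something the Results Status page confirms here, and the names `ResultRow` and `parse_row` are made up.

```python
import re
from typing import NamedTuple, Optional


class ResultRow(NamedTuple):
    name: str
    version: str
    status: str
    sent: str
    returned: str
    cpu_hours: float
    claimed: float
    granted: float


# Assumed layout: name, version, status, sent, returned, CPU hours, claimed / granted
ROW_PATTERN = re.compile(
    r"^(?P<name>MCM1_ \d+_ \d+_ \d+--)\s+(?P<version>\d+)\s+(?P<status>\w+)\s+"
    r"(?P<sent>\S+ \S+)\s+(?P<returned>\S+ \S+)\s+(?P<cpu_hours>[\d.]+)\s+"
    r"(?P<claimed>[\d.]+)\s*/\s*(?P<granted>[\d.]+)$"
)


def parse_row(line: str) -> Optional[ResultRow]:
    """Parse one pasted result row; return None if it does not match."""
    m = ROW_PATTERN.match(line.strip())
    if m is None:
        return None
    return ResultRow(
        m["name"], m["version"], m["status"], m["sent"], m["returned"],
        float(m["cpu_hours"]), float(m["claimed"]), float(m["granted"]),
    )


# Rows copied from the first work unit quoted in this post.
SAMPLE_ROWS = [
    "MCM1_ 0000355_ 8466_ 2-- 726 Valid 12/4/13 04:28:13 12/4/13 12:58:29 2.38 85.0 / 105.9",
    "MCM1_ 0000355_ 8466_ 1-- 726 Valid 12/3/13 17:38:51 12/4/13 04:27:54 5.71 126.8 / 105.9",
    "MCM1_ 0000355_ 8466_ 0-- 726 Invalid 12/3/13 17:38:47 12/4/13 04:01:20 3.32 95.2 / 53.0",
]

rows = [parse_row(line) for line in SAMPLE_ROWS]
for row in rows:
    print(row.name, row.status, row.cpu_hours, row.granted)
```

Once the rows are records, it is straightforward to line up CPU time and credit for the invalid result against its wingmen.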
[Dec 5, 2013 5:15:56 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Lots of Pending Verification Results

Thanks for that. I think the coders have their work cut out for them... a large cross-correlation analysis of the work we have computed so far... tasks running through smoothly, no hiccups.

Overclocked (OC)? I've got over 550 done, yet no such restart-less invalids for me on W8-64, W7-64, Linux-64, W7-32, all Intel at stock speed.
[Dec 5, 2013 5:22:28 PM]
verheyde
Cruncher
Belgium
Joined: Dec 7, 2004
Post Count: 25
Status: Offline
Re: Lots of Pending Verification Results

Yes, LAIM is on (and has been for so long that I forget when I activated it). And I know impressions are not a scientific way of reporting...
In the meantime, I haven't seen any invalids today...
[Dec 5, 2013 6:32:16 PM]
Hans-Martin
Cruncher
Germany
Joined: Nov 29, 2013
Post Count: 10
Status: Offline
Re: Lots of Pending Verification Results

I've also experienced one invalid result (and one currently pending verification, which I fear will turn out to be invalid, too). These are very few data points; as I'm a newbie with a lowly i3/2100 processor, no more can be expected at this time.
However, I'll follow this closely. To me it seems plausible that interrupted WUs might occasionally get invalid results; it probably depends on how intermediate results are stored.
I haven't looked at how BOINC WUs are interrupted and restarted; if the storage of intermediate results involves some manual code it's quite conceivable that there can be cases in which it does not work correctly.
At the moment I wouldn't blame the CPU - mine is running at standard clock speed, no messing around with clock or voltage...
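
To illustrate the kind of hand-written checkpoint code I have in mind, here is a generic sketch of one common safe pattern: write the intermediate state to a temporary file, then rename it over the old checkpoint. This is not how BOINC or the MCM application actually does it (I haven't looked); the file layout and the state fields `pass` and `partial_sum` are invented for the example.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state so that an interruption never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(state, tmp)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic swap with the old checkpoint
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"pass": 0, "partial_sum": 0.0}  # hypothetical initial state
    with open(path) as f:
        return json.load(f)
```

If an application instead overwrote its checkpoint in place, or saved only part of its state, a restart could resume from inconsistent data and produce a result that differs from an uninterrupted run. That is the sort of bug I would consider conceivable here.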

Hans-Martin
[Dec 5, 2013 8:07:58 PM]
Rickjb
Veteran Cruncher
Australia
Joined: Sep 17, 2006
Post Count: 666
Status: Offline
Re: Lots of Pending Verification Results

@Mumak, who says: "You're not correct about that. The Pending Verification status is in most cases the first step (after a BOINC restart/checkpoint reinit), which then further results in Invalid status."
Facts:
1. In my Results Status pages I currently have 106 MCM results that are Pending Validation and 0 (zero) that have been declared Invalid.
2. I previously had some MCMs declared Invalid, but they have now scrolled out of the WCG website system. AFAIK, they all came from computer restarts and their result logs had multiple "commandline" entries. The total number of these invalids was more like 10, and was only a small fraction of my MCM throughput that went through the PV stage at the time.
3. I can't see the word "invalid" in the title of this thread ("Re: Lots of Pending Verification Results"). Can you, or anyone else?
4. I can see the word "invalid" in the title of the other thread to which I attempted to redirect the current discussion here of some MCM results being declared invalid when it would seem that they should have been valid ("Re: MCM: Seeing interspersed Invalid results with 7.26, passing via Pending Validation ").
5. WCG tech armstrdj posted his acknowledgement of the matter to that other thread, not to here.
6. You posted to the "interspersed Invalid results" thread: "I believe this is still the same issue with restoring checkpoints (after a machine restart, or BOINC exit) discussed in other threads."
---
Opinions:
1. I do not expect any of my current 106 MCM WUs that are PV to be declared invalid, because AFAIK none of the MCM WUs that I have crunched recently have undergone a BOINC restart, except for 1 that I tested by suspending and resuming it with LAIM temporarily disabled. That test WU was declared valid. If my expectation is realised, the odds will be at least 106-0 against your assertion.
2. It seems that my suggestion of transferring this discussion to the "interspersed Invalid results" thread has been overruled, so I think it would be a good idea if you were to edit your post in that thread and make it more specific and convenient for readers to follow discussion on the problem by inserting links to this thread and any others where the problem is being discussed. Thanks.
----------------------------------------
[Edit 2 times, last edit by Rickjb at Dec 6, 2013 11:37:30 AM]
[Dec 6, 2013 11:29:41 AM]
cjslman
Master Cruncher
Mexico
Joined: Nov 23, 2004
Post Count: 2082
Status: Offline
Re: Lots of Pending Verification Results

Over the last 2 days I've had 3 invalids (which is frustrating because I don't churn out a lot of MCM WUs). What I did notice about these invalid MCM WUs is that all of them are quorum 2, replication 3. As for what caused them to go invalid, I'm not sure (I do have to turn off the computer at least once a day).
Question: has there been an official statement from the techs that this is being investigated? (I did a quick scan of this thread and didn't see anything.) I'm sure it is, but it is always more reassuring if it's in print :D

Thanks,

Crunching for a better world...
----------------------------------------
I follow the Gimli philosophy: "Keep breathing. That's the key. Breathe."
Join The Cahuamos Team


[Dec 6, 2013 12:16:04 PM]