World Community Grid Forums
Thread Status: Active Total posts in this thread: 20
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
This happened sometime last night. I haven't seen it before, nobody has reported it, and a web search turned up nothing. I only noticed it because of a red log entry in the BoincTasks history with the status note "Reported: Computation error (194,)". Is there a trend here, given that FAHV results have also been failing because their output files are too big?
Result Name: MCM1_0005617_6841_3
The (194,) error is not logged in the result output:
<core_client_version>7.3.18</core_client_version>
<![CDATA[
<message>
finish file present too long
</message>
<stderr_txt>
Commandline = projects/www.worldcommunitygrid.org/wcgrid_mcm1_7.32_windows_x86_64 -SettingsFile MCM1_0005617_6841.txt -DatabaseFile dataset-17_72_SDG_v1.txt
Settings File
DateOfDesign = 08/05/2014
Designer = PMCC_OCI_0.1
WorkOrderID = 0005617_6841
DatasetID = 17_72_SDG_v1
NumberOfGenesInStartingSignature = 25
NumberOfGenesInSignatureMin = 25
NumberOfGenesInSignatureMax = 25
GroupVectorValues = {A}{B}{C}{D}{E}{F}
ExplicitStartingGeneSignatures = A B D F
StartingGeneSignatureAlgorithm = randomFixedLengthSearch
SearchAlgorithmNumberToCreate = 10330
SearchAlgorithmSequentialStartPosition = 5
RunPermutationAlgorithm = 0
PermutationGroups = A
PermutationGroupsForReplacement = G
PermutationAlgorithm = replaceFromRandomlyToRandomlyGreedy
PermutationsNumIterations = 0
OptimizationAlgorithmFrequency = 0 0 1
FBeta = 1.5
SimAnnealIMax = 20000
SimAnnealAlpha = 0.9996
FitnessFn = 0
MinFitness = 0.37
NReps = 10
TrainFrac = 0.7
NFolds = 10
VMethod = LOO
ModelType = SVM
SvmArgs = "-v 0 -c 0.1 -t 1 -d 2 -r 0"
SvmLearnLimit = 400000
RSeed = 12486842
[03:21:17] Initializing
[03:21:28] Running
[03:21:28] EvaluateFitnessOfStartingGeneSignatures 10330
[10:28:56] Writing final output
[10:28:56] Closing Output Stream
[10:28:56] Cleaning up
Result.out = 359360.000000
Run complete, CPU time: 24955.838772
10:28:57 (8104): called boinc_finish
</stderr_txt>
]]>
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline
I am not sure what is going on with this one, but I am investigating. Let me know if you see any more of these.
Thanks, armstrdj
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline
I tested the workunit standalone and found no issues. Looking through the client code, this condition appears to occur when the client finds that the BOINC finish file has been written to disk but the science application process is still running. Since the finish file was written, there must be a hang somewhere in boinc_finish. Alternatively, a bug or race condition in the client could be causing a false positive.
Have you upgraded your client version recently? Let me know if you continue to see these errors. Thanks, armstrdj
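To make the condition armstrdj describes concrete, here is a minimal Python sketch of that kind of watchdog check. It is not BOINC's client code (which is C++). The names `TaskState`, `poll_task`, `ABORT_STATUS` and `FINISH_FILE_TIMEOUT`, and the 10-second grace period, are hypothetical; only the status 194 comes from this thread. The idea: when the finish file exists but the process is still alive, remember when that was first seen, and give up once the process outlives the grace period.

```python
from dataclasses import dataclass
from typing import Optional

# Status reported in this thread as "Computation error (194,)".
ABORT_STATUS = 194
# Hypothetical grace period in seconds; BOINC's real value may differ.
FINISH_FILE_TIMEOUT = 10.0


@dataclass
class TaskState:
    finish_file_present: bool  # app has written its finish file
    process_running: bool      # app process has not exited yet
    finish_seen_at: Optional[float] = None  # first poll that saw both


def poll_task(task: TaskState, now: float,
              timeout: float = FINISH_FILE_TIMEOUT) -> Optional[int]:
    """Return ABORT_STATUS if the task should be killed, else None."""
    if not (task.finish_file_present and task.process_running):
        # Normal case: no finish file yet, or the process already exited.
        task.finish_seen_at = None
        return None
    if task.finish_seen_at is None:
        # First time we see "finished but still running": start the clock.
        task.finish_seen_at = now
        return None
    if now - task.finish_seen_at > timeout:
        # boinc_finish() was called but the process never exited.
        return ABORT_STATUS
    return None


if __name__ == "__main__":
    hung = TaskState(finish_file_present=True, process_running=True)
    print([poll_task(hung, t) for t in (0.0, 5.0, 11.0)])  # [None, None, 194]
```

In this sketch, a client-side race of the kind suspected above would correspond to `process_running` still reading true after the app has actually exited. The code only shows where such a false positive would trip the check.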
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
As the result log shows, the client is 7.3.18, and it has been on this machine for as long as that version has been available for testing. Since installing it, a couple of thousand of these MCM tasks must have passed through, as the machine runs MCM exclusively. Since this error, many hundreds more have validated, including 354 from just yesterday, on an i7-2670 mobile running Windows 7. I take it your code analysis indicates this is not really a 'too long' or 'too big' situation. Given it is one in 120 million results so far, it's not a concern, yet.
armstrdj
Former World Community Grid Tech Joined: Oct 21, 2004 Post Count: 695 Status: Offline
We analyzed the results coming back, and this is a very rare occurrence. It is likely a timing bug in the client. If the rate increases, we will investigate further.
Thanks, armstrdj
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
This check-in may address the situation (not sure):
David Anderson [Wed, 17 Jun 2015 21:14:54 +0000]
client: fix bug that caused delay in job cleanup
If a job has an output file with <copy_file> and <optional>, and it doesn't create the file, then the call to boinc_rename() (to move it to the project dir) fails, and we back off and retry.
Solution: in boinc_rename(), if the rename fails, check if the file exists, and if it doesn't then don't retry.
Also:
- when writing client messages, use the actual current time (dtime()) rather than client_state.now.
- write log msgs when output file renames fail
This would then be in 7.6 or even 7.8, depending on whether 7.6 is already locked down, with 7.6.2 close to the public testing phase. Anyway, yoro42 apparently had a whole series of these and recommended a reboot. See https://secure.worldcommunitygrid.org/forums/wcg/viewthread_thread,38132
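The retry decision in that check-in can be illustrated with a minimal Python sketch. It is not BOINC's `boinc_rename()` (which is C++): the function name `rename_output` and its three return values are hypothetical, and `os.replace` stands in for the platform rename. It shows only the logic the commit message describes: when a rename fails, check whether the source file exists at all; if it does not (for example, an optional output the app never created), stop instead of backing off and retrying.

```python
import os
import tempfile


def rename_output(src: str, dst: str) -> str:
    """Move an output file into the project directory (hypothetical helper).

    Returns "ok" on success, "missing" if the source file does not exist
    (no point retrying), or "retry" if it exists but could not be moved.
    """
    try:
        os.replace(src, dst)  # overwrites dst if it already exists
        return "ok"
    except OSError:
        if not os.path.exists(src):
            # e.g. an <optional> output file the app never created
            return "missing"
        # The file is there but could not be moved (locked, bad path, ...):
        # the caller may back off and try again later.
        return "retry"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "result_0")
        open(src, "w").close()
        dst = os.path.join(d, "project_result_0")
        print(rename_output(src, dst))  # ok
        print(rename_output(src, dst))  # missing (already moved)
```

Returning a status instead of retrying internally leaves the back-off policy to the caller, which mirrors the commit's "don't retry" outcome for files that were never created.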
Crystal Pellet
Veteran Cruncher Joined: May 21, 2008 Post Count: 1322 Status: Offline
Quote: "This would then be in 7.6 or even 7.8, depending on whether 7.6 is already locked down, with 7.6.2 close to the public testing phase."
It is implemented in BOINC client test executable version 7.7.0.
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Yeah, but I was not going to advertise alpha releases (the odd-numbered minor versions are alpha).
Another funny check-in made it in today, in case you ever pulled your hair out over not getting any work at the start of a new install:
David Anderson [Sun, 21 Jun 2015 07:40:01 +0000]
client: allow initial scheduler request to request N instances.
I made a change on 27 Feb 2009 that set the initial request to 0 instances. I'm not sure what the rationale was - the checkin note didn't say.
Document, document, document; that way you don't have to remember ;)
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline
Well, the following message suggests it will be a while longer before a 7.8 comes to fruition, as the NSF has ended the funding for the three core developers.
from: David Anderson
to: Boinc Projects <boinc_projects@ssl.berkeley.edu>, BOINC Developers Mailing List <boinc_dev@ssl.berkeley.edu>, BOINC Alpha list <boinc_alpha@ssl.berkeley.edu>, boinc_admin@googlegroups.com
date: Fri, 03 Jul 2015 10:42:13 -0700
subject: [boinc_alpha] BOINC governance changes

BOINC's funding from the U.S. National Science Foundation has ended, at least for the time being. This funding supported me, Rom Walton, and Charlie Fenton. We're now working on other things, although we'll stay involved in BOINC at some level.

The BOINC project will continue, and will be run according to a community-based model rather than centrally. In essence, the people who contribute to BOINC now make the decisions about it. This model is summarized here: http://boinc.berkeley.edu/trac/wiki/ProjectGovernance and described in detail here: https://docs.google.com/document/d/1C6pU5RqidYBxk9oyAevm1yH1tn4Hw27oM8YpvsnR-gg

There will probably be little visible change. The BOINC software will continue to work. The translation system, Alpha testing project, BOINC web site, message boards, and email lists will continue to operate. However, any new development and major bug fixes to BOINC will need to be done by volunteer programmers. I'm confident that the BOINC community will meet the challenge.

I welcome your feedback. Please post it to boinc_admin@googlegroups.com, a new email list for discussions about the BOINC project as a whole.

-- David
cjslman
Master Cruncher Mexico Joined: Nov 23, 2004 Post Count: 2082 Status: Offline
Thanks, Rob, for the info above... Interesting: the USA has money to sustain an army that monkeys around and makes messes in other countries, but doesn't have the money to sustain three salaries for software that saves lives.
----------------------------------------
CJSL
Crunching for a brighter future...