World Community Grid - View Thread - no work units, no info here, nothing on Facebook page

World Community Grid Forums

Category: Support

Forum: Website Support

Thread: no work units, no info here, nothing on Facebook page - AGAIN


Thread Status: Active
Total posts in this thread: 36


This topic has been viewed 17264 times and has 35 replies

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 986
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

2023-01-16:
From 07:40:04 UTC to 13:32:32 UTC I've only been getting re-sends (_2 and _3) of MCM1, 121 tasks in total.

Hmmm... Now, let's see: about a week ago the amount of MCM1 work being generated was increased, and about 6 days later there's a substantial tranche of retries. I wonder why that might be :-)

[Edit: I notice Unixchick referenced the relevant News post before I got my reply in!]

Of course, it won't all be down to users with queue sizes that may seem excessive (by accident or by design), but when there's more work about those systems may well collect more of it... When the amount of work available is lower, I see far fewer deadline-related retries (my own or those going to wingmen), but that may be coincidental. And, of course, if there are download issues there will be more retries because of download errors, but I've not seen any of those amongst my wingmen for quite a while now...

Cheers - Al.

P.S. imagine what the retry scenario is going to be like when ARP1 starts up properly again :-)

----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jan 16, 2023 4:02:29 PM]

[Jan 16, 2023 3:59:04 PM]

Sgt.Joe
Ace Cruncher
USA
Joined: Jul 4, 2006
Post Count: 7697
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

I see I got a rash of _2 and _3's. I looked through them and most of them were the result of an error. There were a couple of "no reply" and a couple of "server aborted" ones. The "server aborted" ones were when somebody ran past the time limit but did finish the unit before my system had a chance to start the unit. So on these, there was no time accrued by my system, but somebody else probably has a queue which is just full enough to miss a deadline, but not by much.
With the disparities in the times of the units between the LOO and NFCV types, I can understand why BOINC would overload a queue. If the system got mostly the shorter units for a while, the queue would accumulate enough units to satisfy the parameters set by the user. When a group of the longer running units then came into the queue, the amount of work to be done could almost double, resulting in some missed deadlines.
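The effect described above can be put into rough numbers. The following is only a sketch in Python, not BOINC's actual scheduling logic: it assumes a client that fills its buffer using the run time it has recently seen, and the run times (2 and 4 hours), core count and buffer size are hypothetical values chosen for illustration, as the post gives no figures.

```python
def tasks_fetched(buffer_hours: float, estimated_hours_per_task: float, cores: int) -> int:
    """Tasks needed to keep `cores` busy for `buffer_hours`, given the estimated run time."""
    return int(buffer_hours * cores / estimated_hours_per_task)


def real_queue_hours(tasks: int, real_hours_per_task: float, cores: int) -> float:
    """Wall-clock hours needed to clear `tasks` if each one really takes `real_hours_per_task`."""
    return tasks * real_hours_per_task / cores


# All four values below are assumptions for illustration only.
SHORT_HOURS = 2.0   # assumed run time of the shorter unit type
LONG_HOURS = 4.0    # assumed run time of the longer unit type
CORES = 4
BUFFER_HOURS = 24.0

# The client has mostly seen short units, so it sizes the queue for those.
fetched = tasks_fetched(BUFFER_HOURS, SHORT_HOURS, CORES)

print(fetched)                                        # 48 tasks
print(real_queue_hours(fetched, SHORT_HOURS, CORES))  # 24.0 hours, as intended
print(real_queue_hours(fetched, LONG_HOURS, CORES))   # 48.0 hours if they are all long units
```

With these assumed figures, a queue sized for one day of short units turns into two days of work if the units turn out to be the longer type, which is the "almost double" case.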
A possible solution to this could be to separate the two types and allow the user to specify which type they would like to have. This would provide more uniformity in the run times and virtually eliminate the problem of inadvertent overloaded queues.
Another possible solution would be to tweak the units to have more uniformity in runtimes between the two types. But that would probably involve re-writing some code so it might not be worth it.
At any rate, as long as the systems don't waste any time on running some extra units when they are unnecessary, maybe nothing further needs to be done.

Cheers

----------------------------------------

Sgt. Joe
*Minnesota Crunchers*

[Jan 16, 2023 4:39:24 PM]

BobbyB
Veteran Cruncher
Canada
Joined: Apr 25, 2020
Post Count: 609
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

users with queue sizes that may seem excessive

define excessive.

My slower 4-core CPU has about 15 hours in the queue. I determined 15 hours by 16 WUs at 3.5 hours each. I presume this is NOT excessive.
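As a quick check of that estimate, here is a minimal Python sketch. It assumes all four cores run WCG tasks at the same time; the function name is made up for illustration.

```python
def queue_hours(tasks: int, hours_per_task: float, cores: int) -> float:
    """Wall-clock hours needed to work through the queue with every core busy."""
    return tasks * hours_per_task / cores


# Figures from the post: 16 work units, 3.5 hours each, 4 cores.
print(queue_hours(16, 3.5, 4))  # 14.0
```

That comes out at 14 hours, in line with the "about 15 hours" quoted.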

And for my own info, where would I see re-send data?

[Jan 16, 2023 4:44:49 PM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 986
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

users with queue sizes that may seem excessive

If you always return work well before deadlines your queue sizes aren't excessive :-) -- I was thinking of systems that (for one reason or another) have queue sizes that don't leave much scope for issues such as (unplanned?) client down-time, tasks that take a lot longer than expected, or other reasons that a client struggles with deadlines.

For what it's worth, I also run about 16 hours on my largest machines (which aren't exactly powerhouses...) and much lower on my laptop and other "smaller" systems which don't ever run bigger tasks such as WCG's ARP1. Limits are tuned to try to ensure I always return WCG stuff within 24 hours of receipt (unless there's a system problem), so the ability to tune download quantities via the profile is a blessing!

As for re-send data, if one only wants the occasional information it's all in the results section of the web site, but there's a lot of [tedious] clicking involved! So some users make use of a WCG-provided API to collect information about individual tasks they have processed.

There was a version of the API from before the web-site overhaul, but it was not very flexible. With the web-site changes came a newer API which allows collection of work-unit data as well as information about one's own results (akin to what the web site provides), so it's possible to see which wingmen failed and why...

I've got some Python scripts that do this, specific to my needs. User adriverhoef has a useful general application (which may or may not be Linux only - not sure about that). There are various other ways of getting at the information, including a web-page based trial interface to the new API. I found out what I needed to know by forum searches (much to my surprise!), but I suspect that trying to describe it in full detail would run over some limit on message size :-)

Cheers - Al.

[Jan 16, 2023 5:21:38 PM]

BobbyB
Veteran Cruncher
Canada
Joined: Apr 25, 2020
Post Count: 609
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

AH! then not guilty.

adriverhoef has a useful general application

Got those in the past. Still have them.

Not really checking stats anymore. I just let the machines run themselves. Check up on them 2-3 times a day. TN-Grid fills the holes when they run out of WCG. I still have not been able to turn off other projects without running dry. Tried 3 times so far.

----------------------------------------
[Edit 1 times, last edit by BobbyB at Jan 16, 2023 5:40:51 PM]

[Jan 16, 2023 5:38:56 PM]

bfmorse
Senior Cruncher
US
Joined: Jul 26, 2009
Post Count: 303
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

As to queue sizes:

If you always return work well before deadlines your queue sizes aren't excessive :-)

YOU are absolutely correct!

But when the WU is returned MORE than six (6) days after it was originally received by that volunteer's system and THEIR processing time is LESS than two (2) hours, wouldn't you agree that their queue size IS EXCESSIVE?
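To put a number on that case, a minimal Python sketch follows. The timestamps are hypothetical, and it assumes the reported processing time is close to the elapsed time the task actually spent running.

```python
from datetime import datetime, timedelta


def idle_fraction(received: datetime, returned: datetime, processing_hours: float) -> float:
    """Share of the turnaround time during which the task was not being processed."""
    turnaround = returned - received
    busy = timedelta(hours=processing_hours)
    return 1.0 - busy / turnaround


# Hypothetical example: returned 6 days and 3 hours after receipt, 2 hours of processing.
received = datetime(2023, 1, 10, 12, 0, 0)
returned = received + timedelta(days=6, hours=3)

fraction = idle_fraction(received, returned, processing_hours=2.0)
print(f"{fraction:.1%}")  # 98.6%
```

In this made-up example the task spent well over 98% of its turnaround time doing nothing, although the figures cannot show whether it was waiting in a queue or waiting for a network connection.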

[Jan 16, 2023 5:45:19 PM]

alanb1951
Veteran Cruncher
Joined: Jan 20, 2006
Post Count: 986
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

As to queue sizes:

If you always return work well before deadlines your queue sizes aren't excessive :-)

If they are "persistent offenders" then yes, I'd agree that there's an issue [1] -- there are certainly users out there whose work-load mixes regularly see results returned close to deadlines, which is extremely bad for the project(s) concerned if (like ARP1 and HST1) they can't advance a specific data item until the previous unit of work is completed. A lack of "always on" networking may, however, be a factor for a few of these late returners, as we only get to see when the task was returned, not when it was actually finished.

Personally, I'd like to see a reduction in the default deadlines for shorter-running applications such as MCM1, OPN1 and SCC1 (if/when it returns), combined with the return of the grace day for those sub-projects -- that would result in active clients killing off tasks earlier, which might shift the balance away from "No Reply" retries to "Not Started by Deadline" retries (which should show up as Errors a day earlier than No Reply), and would help reduce the irritation associated with having tasks Server Aborted because a No Reply task actually turned up a few hours late, as the client would assume the shorter deadline applied...

Cheers - Al.

[1] As (unlike the case with a lot of other BOINC projects) we [as users] can't actually see a user's recent work returns in fine detail, and can't determine what other projects they might be running, we can't tell whether it's excessive queues, inter-project clashes or who knows what else. We might be fairly sure, but... :-)

----------------------------------------
[Edit 1 times, last edit by alanb1951 at Jan 16, 2023 6:15:54 PM]

[Jan 16, 2023 6:14:46 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2172
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

BobbyB wrote:

And for my own info where would I see re-send data.

One way is to go to Results, press the End-key, set the 'Items per page' to a large enough number, scroll through the results, find the beginning and end of a certain pattern that you are looking for, select all those lines, copy and paste them to a line counter program (such as 'wc' on Linux) and presto!

Another way, as alanb1951 replied:

So some users make use of a WCG-provided API to collect information about individual tasks they have processed.

I've got some Python scripts that do this, specific to my needs. User adriverhoef has a useful general application (which may or may not be Linux only - not sure about that).

It is Linux only: its homepage begins with "This is software for Linux" :-) That also makes it easier to maintain.

Using wcgstats from that software package to carry out this task involves some knowledge of the command line and of the files it uses. (...) I did this:

$ wcgstats -w*MCM1_......._...._3 -S
* Let's try to locate the workunit.
There are 2283 pages of results available.
>>> Too many matches.  Only one result allowed.

* Showing page 1/2283 of all tasks with status ’0’ on all of your devices:
 <1>   MCM1_0194946_4842_0  Linux Ubuntu  In Progress           2023-01-16T14:13:01  2023-01-22T14:13:01
 <1> * MCM1_0194946_4842_1  Fedora Linux  In Progress           2023-01-16T14:13:06  2023-01-22T14:13:06


- Did this show the desired workunit? [Y (yes)/n (no, next)/p (previous)/l (last)/q (quit)/c (change)/= (match)/PAGENUMBER]

At this point I typed ^Z to suspend wcgstats temporarily. Then I located the temporary files used by wcgstats in this session:

$ ls -t /tmp/wcgstats.* | head -6
/tmp/wcgstats.225955.wingmen.2023-01-16T20:15:03.247450540
/tmp/wcgstats.225955.results.2023-01-16T20:15:03
/tmp/wcgstats.225955.elected.2023-01-16T20:15:03
/tmp/wcgstats.225955.entries.2023-01-16T20:15:03
/tmp/wcgstats.225955.webpage.2023-01-16T20:15:03
/tmp/wcgstats.225955.cookies.2023-01-16T20:15:03

The file needed is the one with ".entries." in its name. I used 'less -N' to scroll through the file with line numbers, searched for "_2<TAB>" (three characters) to find the first occurrence and noted the line number, then searched for "_[01]<TAB>" (6 characters) and noted that line number; subtracting the two line numbers gave the answer: 121. 8-)
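The same count can be scripted instead of done by hand in 'less'. This is a minimal Python sketch, not part of wcgstats: it assumes a text file with one task per line and the task name as the first tab-separated field, which is what the "_2<TAB>" search suggests but is not confirmed here. Apart from the two task names shown in the output above, the sample names are made up. Replicas _0 and _1 are treated as the initial copies and _2 or higher as re-sends.

```python
import re
from collections import Counter

# Task names end in _<replica number>, e.g. MCM1_0194946_4842_0.
REPLICA_RE = re.compile(r"_(\d+)$")


def count_replicas(lines):
    """Count tasks per replica number, reading the name from the first tab-separated field."""
    counts = Counter()
    for line in lines:
        name = line.split("\t", 1)[0].strip()
        match = REPLICA_RE.search(name)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def count_resends(lines):
    """Number of tasks whose replica number is 2 or higher."""
    return sum(n for replica, n in count_replicas(lines).items() if replica >= 2)


sample = [
    "MCM1_0194946_4842_0\tIn Progress",
    "MCM1_0194946_4842_1\tIn Progress",
    "MCM1_0000001_0001_2\tIn Progress",  # made-up name for illustration
    "MCM1_0000002_0002_3\tIn Progress",  # made-up name for illustration
]
print(count_resends(sample))  # 2
```

For a real file, pass the open file object instead of the sample list, e.g. `with open(path) as f: print(count_resends(f))`, where `path` is the ".entries." file.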

[Jan 16, 2023 7:36:35 PM]

TPCBF
Master Cruncher
USA
Joined: Jan 2, 2011
Post Count: 1957
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

Going to have to start looking at other distributed computing projects.

Good luck....

I get plenty of MCM1 WUs on all my hosts, Windows, Linux, Android (don't have a macOS machine running at this time though), with no issues whatsoever.

I did notice a lot of resends the last couple of days, _2 and today even a bunch of _3, but there is certainly no general problem at WCG at the moment that would warrant an info post (well, Cyclops posted a general news update on 1/13) here in the forum or on the Facebook page...

Ralf

----------------------------------------

[Jan 17, 2023 6:25:56 AM]

BobbyB
Veteran Cruncher
Canada
Joined: Apr 25, 2020
Post Count: 609
Status: Offline


Re: no work units, no info here, nothing on Facebook page - AGAIN

Going to have to start looking at other distributed computing projects

TN-Grid

Follow the instructions on their front page: create an account first.

[Jan 18, 2023 4:21:13 PM]
