World Community Grid - View Thread

World Community Grid Forums

Category: Active Research

Forum: Africa Rainfall Project

Thread: Work Available

Thread Status: Active
Total posts in this thread: 3317

This topic has been viewed 3308980 times and has 3316 replies

Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Status: Offline
Project Badges:

1 year badge for The Clean Energy Project - Phase 2

14 day badge for Drug Search for Leishmaniasis

100 year badge for Mapping Cancer Markers

5 year badge for Uncovering Genome Mysteries

20 year badge for Outsmart Ebola Together

20 year badge for FightAIDS@Home - Phase 2

10 year badge for Smash Childhood Cancer

10 year badge for Microbiome Immunity Project

2 year badge for Africa Rainfall Project

20 year badge for OpenPandemics - COVID-19


Re: Work Available

Dayle

"1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded. However, sometimes it takes a bit longer to get one, so the spare keeps you crunching fully."

This is only true if somebody has dropped all other projects and is badge hunting.

A cruncher getting a representative mix of projects does not stop crunching if an ARP task is not available.

Keeping a work unit idle for six hours, when somebody else could have started it and already be six hours into crunching, is nowhere near ideal. Folding@home, for example, won't let you download a new project until the current one is in its last minutes.

Simple settings to keep the project moving along: Set an ARP maximum that your system can handle, opt into all projects, and keep the queue at 0.1 days min + 0.1 days max.
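For anyone who prefers to set the queue on the client rather than on the website, here is a minimal sketch. It assumes that "0.1 days min + 0.1 days max" corresponds to the BOINC preference tags `work_buf_min_days` and `work_buf_additional_days`; the function name and the draft file name are made up for illustration. The ARP maximum itself is set in the device profile on the website, so it is not covered here.

```shell
#!/bin/sh
# Sketch: write a BOINC preferences override with a small work buffer.
# Assumption: "min" maps to <work_buf_min_days> and "max" (additional)
# maps to <work_buf_additional_days>.

write_queue_override() {
    min_days=$1      # store at least this many days of work
    extra_days=$2    # store up to this many additional days
    out_file=$3      # where to write the XML
    cat > "$out_file" <<EOF
<global_preferences>
   <work_buf_min_days>$min_days</work_buf_min_days>
   <work_buf_additional_days>$extra_days</work_buf_additional_days>
</global_preferences>
EOF
}

# Write a draft into the current directory, not into the live data directory,
# so that an existing override file is not clobbered by accident.
write_queue_override 0.1 0.1 ./global_prefs_override.draft.xml
cat ./global_prefs_override.draft.xml
```

If the draft looks right, it would be copied into the BOINC data directory as global_prefs_override.xml (the location depends on the installation) and loaded with `boinccmd --read_global_prefs_override`.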

----------------------------------------
[Edit 1 times, last edit by Dayle Diamond at Aug 15, 2021 2:31:48 AM]

[Aug 15, 2021 2:24:02 AM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Status: Offline
Project Badges:

1 year badge for Human Proteome Folding - Phase 2

45 day badge for Discovering Dengue Drugs - Together

14 day badge for Nutritious Rice for the World

180 day badge for Help Fight Childhood Cancer

90 day badge for Help Cure Muscular Dystrophy - Phase 2

14 day badge for Discovering Dengue Drugs - Together - Phase 2

5 year badge for The Clean Energy Project - Phase 2

90 day badge for Computing for Clean Water

1 year badge for Drug Search for Leishmaniasis

180 day badge for GO Fight Against Malaria

45 day badge for Computing for Sustainable Water

20 year badge for Mapping Cancer Markers

5 year badge for Outsmart Ebola Together

5 year badge for FightAIDS@Home - Phase 2

2 year badge for Microbiome Immunity Project

10 year badge for Africa Rainfall Project

10 year badge for OpenPandemics - COVID-19


Re: Work Available

It is also true for those who have a preference for a particular project. The queues we are trying to cut are those that are several days long rather than the singletons.

We are trying to get rid of the 'No Reply' results, which seem to be all too prevalent.

Mike

[Aug 15, 2021 10:51:07 AM]

leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Status: Offline
Project Badges:

180 day badge for Outsmart Ebola Together

1 year badge for FightAIDS@Home - Phase 2

1 year badge for Microbiome Immunity Project

5 year badge for Africa Rainfall Project

1 year badge for OpenPandemics - COVID-19


Re: Work Available

Simple settings to keep the project moving along: Set an ARP maximum that your system can handle, opt into all projects, and keep the queue at 0.1 days min + 0.1 days max.

I have restricted ARP to two machines, set the cache to one more than app_config allows, but because the workcache is overloaded, no ARP units were available. As a consequence, I have just received >1000 MCM/OPN units because I followed this advice and checked the 'if no work available send me work from other projects'. This means that the ARP units are going to take >60h to process rather than the <40h they took before this happened.
Is there any way to abort these 1000 OPN/MCM units in bulk? It'll take over 16 hours at 1 per minute via boinccmd.

[Aug 15, 2021 11:25:27 AM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2171
Status: Offline
Project Badges:

5 year badge for Human Proteome Folding - Phase 2

90 day badge for Nutritious Rice for the World

2 year badge for Help Fight Childhood Cancer

2 year badge for Help Cure Muscular Dystrophy - Phase 2

180 day badge for The Clean Energy Project - Phase 2

1 year badge for Computing for Clean Water

1 year badge for GO Fight Against Malaria

1 year badge for Uncovering Genome Mysteries

2 year badge for FightAIDS@Home - Phase 2

20 year badge for Smash Childhood Cancer

5 year badge for Microbiome Immunity Project

50 year badge for OpenPandemics - COVID-19


Re: Work Available

Is there any way to abort these 1000 OPN/MCM units in bulk? It'll take over 16 hours at 1 per minute via boinccmd.

If you can't select multiple tasks via Boinc Manager, then use this technique:

- Select all the tasknames you want to abort.
Example:

wcgresults -HN0 | grep -E '^OPN|^MCM' > tasks_to_abort # 0 = all uninitialized tasks

- Feed these names to this little UNIX shell snippet:

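URL=http://www.worldcommunitygrid.org/
BOINCCMD="boinccmd"
TASKCMD="$BOINCCMD --task $URL"
WCGRESULTS=wcgresults

# Abort one task by name, unless it has been started in the meantime.
abort () {
	local wu=$1
	# A task that shows up in the -HN+ listing has already started: leave it alone.
	if $WCGRESULTS -HN+ | grep -qxF -- "$wu"; then
		echo "Task $wu has been started in the meantime:"
		$WCGRESULTS -ND+TWOPLACES | grep -E "^Deadline|$wu"
		return
	fi
	if [ -n "$wu" ]; then $TASKCMD "$wu" abort && echo "$wu aborted"; fi
}

# The "Task: " prompt only appears when stdin is a terminal, i.e. when the
# redirect from tasks_to_abort is dropped and names are typed in by hand.
while [ -t 0 ] && printf "Task: "; read -r task; do abort "$task"; done < tasks_to_abort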
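Before aborting a thousand tasks for real, a dry run may be worth the extra minute. The sketch below only prints the boinccmd commands that would be issued for a task list, using the same project URL as above. The function name, the sample file names and the three task names are hypothetical and only there so the sketch runs on its own.

```shell
#!/bin/sh
# Dry run: print the abort commands for a task list without executing them.
URL=http://www.worldcommunitygrid.org/

preview_aborts() {
    # $1 = file with one task name per line
    while read -r task; do
        [ -n "$task" ] && echo "boinccmd --task $URL $task abort"
    done < "$1"
    return 0
}

# Hypothetical task names, for illustration only.
printf '%s\n' OPN1_0000001_00001_0 MCM1_0000002_0002_1 ARP1_0000003_003_0 > sample_tasks

# Same prefix filter as in the post above: keep OPN and MCM, leave ARP alone.
grep -E '^OPN|^MCM' sample_tasks > sample_to_abort

preview_aborts sample_to_abort
echo "$(grep -c . sample_to_abort) task(s) would be aborted"
```

With a real list, `preview_aborts tasks_to_abort | less` shows what is about to happen before the actual loop is run.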

[Aug 15, 2021 12:34:12 PM]

Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Status: Offline
Project Badges:


Re: Work Available

You should be able to abort multiple tasks under the task manager. Select the first to be aborted, hold down your shift key and then click on the last to be aborted and then click 'Abort'.

Mike

[Aug 15, 2021 1:52:12 PM]

leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Status: Offline
Project Badges:


Re: Work Available

Is there any way to abort these 1000 OPN/MCM units in bulk?

If you can't select multiple tasks via Boinc Manager, then use this technique:

Worked like a charm. Thank you very much indeed. For completeness I've added the additional commands (as nested quotes) as the machine in question is headless and 20km away.

$ mkdir ~/wcgresults && cd ~/wcgresults
$ wget https://a3a3.home.xs4all.nl/wcg/wcgresults
$ chmod 744 ./wcgresults

$ ./wcgresults -HN0 | egrep '^OPN|^MCM' > tasks_to_abort # 0 = all uninitialized tasks
$ URL=http://www.worldcommunitygrid.org/
$ BOINCCMD="boinccmd"
$ TASKCMD="$BOINCCMD --task $URL"
$ WCGRESULTS=./wcgresults # the copy downloaded above (plain wcgresults if it is on your PATH)

$ abort () {
	local wu=$1
	if $WCGRESULTS -HN+ | grep -q "^$wu$"; then
		echo "Task $wu has been started in the meantime:"
		$WCGRESULTS -ND+TWOPLACES | egrep "^Deadline|$wu"
		return
	fi
	if [ -n "$wu" ]; then $TASKCMD $wu abort && echo "$wu aborted"; fi
}

$ while [ -t 0 ] && printf "Task: "; read task; do abort $task; done < tasks_to_abort

$ boinccmd --project www.worldcommunitygrid.org update

Thanks again.

[Aug 15, 2021 3:35:25 PM]

adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2171
Status: Offline
Project Badges:


Re: Work Available

Is there any way to abort these 1000 OPN/MCM units in bulk?

If you can't select multiple tasks via Boinc Manager, then use this technique:

Worked like a charm. Thank you very much indeed. For completeness I've added the additional commands (as nested quotes) as the machine in question is headless and 20km away.

My pleasure, it's very nice to read that it worked out. smile

Since you wrote that you would have to use boinccmd, I gathered you probably needed some kind of list of commands, leloft.

Hopefully it didn't take too much time. blushing

Adri

[Aug 15, 2021 4:32:39 PM]

leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Status: Offline
Project Badges:


Re: Work Available

What generates these 'No Reply' issues? I know that the maximum recommended number of concurrent ARP units is half the number of cores; the advice is that the other half can be given over to other projects. However, if I set device profiles such that boinc can use 100% cores, ARP 50% and actively exclude other projects, the time taken to process an ARP unit appears to be reduced by about 40-50%. In effect, it seems that if ARP is free to use 2 cores per unit, it will. 'top -1' bears this out, with a system load of >14/24 while processing 12 units on a 24 core machine, all 24 cores showing an activity of 70-100%. This strategy appears to reduce the processing time of an estimated 65h unit to an actual 30-38h. Can anyone explain/confirm this? To tackle the backlog, would a more rapid turnover of units be highly desirable?
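The "half the number of cores" limit can also be enforced on the client through app_config.xml. The following is only a sketch: it assumes the ARP application is called `arp1` in the BOINC client (worth checking against the client's own state before relying on it), and the function names and the draft file name are invented for the example.

```shell
#!/bin/sh
# Sketch: cap concurrent ARP tasks at half the logical CPUs via app_config.xml.
# Assumption: the ARP application name in the client is "arp1".

half_of() {
    # Integer half of $1, never below 1.
    n=$(( $1 / 2 ))
    [ "$n" -lt 1 ] && n=1
    echo "$n"
}

make_app_config() {
    # $1 = maximum number of ARP tasks allowed to run at once
    cat <<EOF
<app_config>
   <app>
      <name>arp1</name>
      <max_concurrent>$1</max_concurrent>
   </app>
</app_config>
EOF
}

# Number of logical CPUs currently online; fall back to 2 if unknown.
cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# Write a draft into the current directory for inspection.
make_app_config "$(half_of "$cpus")" > ./app_config.draft.xml
cat ./app_config.draft.xml
```

The draft would go into the World Community Grid project directory as app_config.xml, after which the client has to re-read its config files (for example with `boinccmd --read_cc_config`).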

[Aug 16, 2021 9:16:26 AM]

Acibant
Advanced Cruncher
USA
Joined: Apr 15, 2020
Post Count: 126
Status: Offline
Project Badges:

50 year badge for Mapping Cancer Markers

5 year badge for OpenPandemics - COVID-19


Re: Work Available

However, if I set device profiles such that boinc can use 100% cores, ARP 50% and actively exclude other projects, the time taken to process an ARP unit appears to be reduced by about 40-50%.

What you've done there is basically turn off what Intel would call "hyperthreading" or more generically simultaneous multithreading, only through manual control of threads rather than in BIOS. The usual situation is two threads per core so you could have 24 logical processors on a 12 physical processor machine. You state that you were doing 12 units on a 24 core machine but I suspect that's 24 logical cores not 24 physical cores. I don't believe any of the WCG projects are multithreaded.

That will indeed reduce the time taken to process the work units but would lead to fewer processed per day in theory. Intel claims a 30% overall performance increase. To give a hypothetical on that, only using one thread (or logical core) per physical core a task might take 10 hours, but using SMT to run two threads per core each task might take 14 hours. So it does take longer but instead of taking 20 hours to do two units sequentially you only take 14 hours to do two simultaneously. These values are just for illustration and may not hold true for all architectures or tasks.
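To put numbers on that hypothetical, the arithmetic can be sketched as throughput per physical core. The 10 h and 14 h figures are the illustrative values from the paragraph above, not measurements, and the function name is made up. With those inputs the two-thread case comes out roughly 43% ahead (20 h of sequential work done in 14 h).

```shell
#!/bin/sh
# Tasks completed per physical core per day, given the hours one task takes
# and how many tasks share the core at the same time.
tasks_per_day() {
    awk -v hours="$1" -v parallel="$2" \
        'BEGIN { printf "%.2f\n", parallel * 24 / hours }'
}

one_thread=$(tasks_per_day 10 1)    # one task per core, 10 h each
two_threads=$(tasks_per_day 14 2)   # two tasks per core, 14 h each

echo "1 thread per core:  $one_thread tasks/day"
echo "2 threads per core: $two_threads tasks/day"

# Relative gain of running two threads per core over one.
awk -v a="$one_thread" -v b="$two_threads" \
    'BEGIN { printf "throughput gain: %.0f%%\n", (b / a - 1) * 100 }'
```

Swapping in measured run times for the two arguments gives the same comparison for a real machine.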

[Aug 16, 2021 11:59:12 AM]

leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Status: Offline
Project Badges:


Re: Work Available

You state that you were doing 12 units on a 24 core machine but I suspect that's 24 logical cores not 24 physical cores.

You are correct: the machine (Thinkstation C30) has 2 x Xeon E5-2667, 6 physical +6 logical each.
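For anyone unsure which of the two numbers their own machine reports, here is a small sketch for Linux that counts physical cores from the parseable output of `lscpu` and compares it with the number of logical CPUs. The function name is made up; on a box like the one described above it should report 12 physical and 24 logical.

```shell
#!/bin/sh
# Count physical cores from `lscpu -p=CORE,SOCKET` output read on stdin:
# each logical CPU is one "core,socket" line, so the number of unique
# lines is the number of physical cores.
count_physical_cores() {
    grep -v '^#' | sort -u | grep -c .
}

# Only query the hardware if lscpu is actually installed.
if command -v lscpu >/dev/null 2>&1; then
    physical=$(lscpu -p=CORE,SOCKET | count_physical_cores)
    logical=$(getconf _NPROCESSORS_ONLN)
    echo "physical cores: $physical, logical CPUs: $logical"
fi
```

If the two numbers differ by a factor of two, the machine is running two hardware threads per core.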

That will indeed reduce the time taken to process the work units but would lead to fewer processed per day in theory.

My point is that might be a price worth paying to clear the backlog more effectively.

I don't believe any of the WCG projects are multithreaded.

I cannot comment on this, but I'd be very surprised if the WRF model wasn't. What I can say with some degree of confidence, is that running 12 ARP units in the absence of any other wcg projects uses all 24 cores dynamically; it significantly reduces the processing time of the 12 ARP units compared to when other projects are concurrently using the other 12 cores.

So it does take longer but instead of taking 20 hours to do two units sequentially you only take 14 hours to do two simultaneously.

As the recommendation for ARP is to use only half the number of cores, there is no gain to the project by processing that second unit. From an ARP perspective, all it is doing is slowing down the ARP unit. I am currently trying to compare how many ARP points are generated per day (as a measure of ARP work done) following a 12/24 ARP strategy vs a 12/24 ARP + 6/24 OPN + 6/24 MCM strategy. This is going to take a couple of weeks to organise and I'll post the results when I'm done.

[Aug 16, 2021 2:33:53 PM]
