Posts: 19   Pages: 2   [ 1 2 | Next Page ]
This topic has been viewed 11598 times and has 18 replies
vlankhaar
Cruncher
Joined: Oct 31, 2008
Post Count: 6
Status: Offline
Cache-sharing / Work-unit pooling server?

Does anybody know of a way to share a workunit cache between multiple systems?

I have a small cluster of diskless virtual machines on a private network that I would love to run WCG on, but I have two problems:
(1) the virtual machines do not have direct access to the Internet, not even via NAT or proxy
(2) these virtual machines may be created and destroyed, booted and halted at random

The virtual machines do have access to a shared filesystem on the cluster head.

Given these two restrictions, I am hoping that there is some way I can create a single BOINC master installation on the cluster head, with the sole responsibility of downloading work units from WCG and uploading completed results, storing the work units in a pool on the cluster's shared filesystem.

I would like, then, to be able to have virtual machines on the cluster run a boinc client to grab a work unit from the cluster-head, compute on it for a while, then return the result to the cluster-head. The individual, randomly created/destroyed cluster members would not have to be attached/detached from the WCG project.

Is there something like this out there?
Is this something that could be done with a bit of hacking on the boinc client code?
Am I thinking about this the wrong way?
[Mar 26, 2012 1:12:42 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

Hello vlankhaar,
What you are asking for is a variation of using a single Internet-connected computer to feed BOINC instances on computers that reach it only over a LAN. The BOINC architecture is not designed to allow this. The closest you can get is connecting over an ICS (Internet Connection Sharing) link to a computer that has Internet access, but that is not close enough.

Sorry,
Lawrence
[Mar 26, 2012 1:54:15 AM]
Laura_Stevens
Cruncher
Joined: Mar 12, 2012
Post Count: 3
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

Dotsch/UX will sort of do this: you install the OS on a main machine, and it allows diskless clients to boot from its image, each then able to pull its own BOINC WUs.

I don't think the WUs will be pooled; however, the server can handle all communication with the project servers via a proxy, downloading and reporting all WUs for the diskless clients. The images for the WUs in progress on each client are mirrored on the server, so if a client is restarted, it will continue the WU it was working on before it rebooted.

You may be able to push the Dotsch/UX image into the VMs and use the shared storage to host the image for the machines; however, to use the upload/download feature as-is, you need the main system running Dotsch/UX to dish out the images for the VMs and to act as the proxy for the machines to access the Internet.

Somewhere in the mix there needs to be a machine with Internet access, even if it's just a diskless client booting the client from USB, acting as the server, and mapping the network storage area as its drive to serve images from and to store the WUs in progress. Without knowing the exact makeup of the network it's difficult to guess which way would be best, but Dotsch/UX will very likely make the task easier, since it was designed for such work.

http://www.dotsch.de/boinc/Dotsch_UX.html

Dotsch did a lot of hard work modifying an Ubuntu image to streamline Dotsch/UX and to automate much of what you are talking about. It even supports CUDA on the diskless clients. A great distro by a hardcore BOINC'er.

And just to note: you won't find support for it on most projects' forums, as it's well outside the realm of typical crunching; however, it's the easiest (a very relative term here) way to achieve what you are looking for.
----------------------------------------
[Edit 3 times, last edit by Laura_Stevens at Mar 26, 2012 3:36:58 AM]
[Mar 26, 2012 2:53:17 AM]
mikey
Veteran Cruncher
Joined: May 10, 2009
Post Count: 824
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

vlankhaar wrote:
Does anybody know of a way to share a workunit cache between multiple systems?

I have a small cluster of diskless virtual machines on a private network that I would love to run WCG on, but I have two problems:
(1) the virtual machines do not have direct access to the Internet, not even via NAT or proxy
(2) these virtual machines may be created and destroyed, booted and halted at random

The virtual machines do have access to a shared filesystems on the cluster head.

Given these two restrictions, I am hoping that there is some way I can create a single BOINC master installation on the cluster head, with the sole responsibility of download work units from WCG and uploading completed work units to WCG, storing the work units in a pool on the cluster's shared filesystem.

I would like, then, to be able to have virtual machines on the cluster run a boinc client to grab a work unit from the cluster-head, compute on it for a while, then return the result to the cluster-head. The individual, randomly created/destroyed cluster members would not have to be attached/detached from the WCG project.

Is there something like this out there?
Is this something that could be done with a bit of hacking on the boinc client code?
Am I thinking about this the wrong way?


One of the ways people used to cheat was to download a workunit, crunch it until it was around 95% complete, stop and save it, then finish and report it. They would then copy that almost-finished workunit to multiple PCs and finish it there too, getting credit for it multiple times over.

One of the things BOINC has tried hard to do is stop the rampant cheating of old, and one of the ways it does that is that each workunit is downloaded to a PC and should be returned by that same PC. The current system is not perfect, there are ways around it, and some cheating still continues, BUT it has been reduced and they are still working on it. I am NOT saying that you would ever consider cheating; I am just trying to show why they do not allow what you would like to do.

Back in the pre-BOINC days they even had websites around the world where workunits were staged, so people could grab them from there instead of having to connect to California every time, and I am talking about the dial-up days now! Back then it was SETI only; now there are many BOINC projects and mostly no per-connection costs.
[Mar 26, 2012 2:03:17 PM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

vlankhaar,

What you want, a little central pool aggregation dispensing units to diskless nodes, is not in the compendium of configurables. Even IBM, with their many thousands of hosts, has each host fetch its own work from the WCG servers. You would have to develop your own script that exchanges clients on a central host: fetch work, then load it onto a node. Theoretically it could be done with all nodes + 1 extra instance that fetches the work; you then load 'that' client onto a node and start it there. It would require lots of scheduling and stopping, copying back, copying forth, and restarting. The stopping of clients on rotation would cost progress, but still, if you had say a dozen, each holding several days of work, it could be made to work [lots of trial and error and mapping out the scheme].
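For a flavor of what that rotation script might look like: the sketch below moves whole BOINC data directories between a cluster head (which has Internet access) and the nodes. Every path, timing, and invocation here is an assumption for illustration, not a tested setup; `boinc --dir` and `boinccmd --quit` are the standard client binary and control tool.

```shell
#!/bin/sh
# Sketch only: rotate complete BOINC client instances between a head node
# and diskless nodes. All paths are hypothetical.
POOL=/shared/boinc-pool     # one data directory per instance on the shared FS
RUNDIR=/BOINC               # where a node runs its adopted instance

# On the head: briefly run an instance so it can fetch/report work,
# then stop it cleanly so its data dir can be handed to a node.
refresh_instance() {
    boinc --dir "$POOL/$1" &
    sleep 600                        # long enough to sync with the servers
    boinccmd --quit
}

# On a node: copy an instance out of the pool and start crunching offline.
adopt_instance() {
    cp -a "$POOL/$1/." "$RUNDIR/"
    boinc --dir "$RUNDIR" &
}

# On planned node shutdown: stop cleanly and copy the instance back so the
# head can upload the finished results on its next rotation.
return_instance() {
    boinccmd --quit
    cp -a "$RUNDIR/." "$POOL/$1/"
}
```

As SekeRob says, the hard part is not the copying but the scheduling around it, and the progress lost while instances are stopped in transit.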

As for the concerns voiced above... not in this century. The integrity of results, their uniqueness and non-duplicability, is absolute. A result reported and taken in correctly will not be accepted a second time; it never has been at WCG. BOINC server release 700, just implemented, took the last bit away from the clients: the credit claiming and performance tracking. That is all computed on the server now.

--//--
----------------------------------------
[Edit 1 times, last edit by Former Member at Mar 26, 2012 5:13:47 PM]
[Mar 26, 2012 5:11:19 PM]
vlankhaar
Cruncher
Joined: Oct 31, 2008
Post Count: 6
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

I'm disappointed, surely. I have a business reason to put CPU load (any CPU load, frankly) on the cluster, and I was really hoping I could make it useful (i.e. WCG) load. Unfortunately it looks like I'm out of luck on this one.

ICS (Internet Connection Sharing) is Windows-ese for NAT, which I cannot do, as much as I'd love to. Dotsch/UX is out because my choice of OS in the virtual machine cluster / cluster head is fixed. Cheating is an academically amusing consideration, but I crunch for science much more than for the pretty 28x28 icons.

Given the complexity mentioned by SekeRob, I doubt I could justify the time. I am completely unconcerned about lost progress. Since the alternative to WCG on the cluster is to run "for core in $(cat /proc/cpuinfo | grep ^processor | cut -d: -f2); do (while true; do true; done) & done" to drive some CPU load, I feel like any forward progress is good, even if I lose, say, 50% of the computation time.

In fact, thinking about it, it sounds like I might almost be better off installing a full BOINC server on the cluster head, then hacking a BOINC client to dump its downloaded WUs into the feeder and upload WU results triggered by the validator. That doesn't sound like any less work, but it's interesting to consider nonetheless.
[Mar 27, 2012 2:48:00 AM]
vlankhaar
Cruncher
Joined: Oct 31, 2008
Post Count: 6
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

I think I've managed to convince the security operations folks that allowing the cluster to initiate a connection out to WCG via an IP:port whitelisting proxy will not result in unmitigated doom, and I think I can probably handle the new auditing burden.

If I can get the network operations folks to agree that the bandwidth requirement is small enough, I might actually be able to get a direct connection from a cluster VM to WCG. (Is there somewhere that lists the approximate average transfer size per point or per work unit for each project?)

Given that these systems are going to be diskless (no USB, most of the filesystem is read-only, with a tmpfs filesystem mounted on /BOINC for World Community Grid to run in), are there any recommendations on BOINC configurations to use?

I'm guessing that the following would make sense:
<abort_jobs_on_exit>1</abort_jobs_on_exit>
<dont_contact_ref_site>1</dont_contact_ref_site>
<report_results_immediately>1</report_results_immediately>
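(For context, and as I understand the client configuration layout, unverified for the client version in question, those three flags would sit inside an <options> block in a cc_config.xml in the data directory:

```xml
<!-- Sketch of a cc_config.xml for a throwaway tmpfs install;
     only the placement is assumed, the options are the ones above. -->
<cc_config>
  <options>
    <abort_jobs_on_exit>1</abort_jobs_on_exit>
    <dont_contact_ref_site>1</dont_contact_ref_site>
    <report_results_immediately>1</report_results_immediately>
  </options>
</cc_config>
```

That file would also land in the tmpfs, so it would need to be baked into whatever populates /BOINC at boot.)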

Is there a "detach on exit" command, or should I wrap that into the planned-shutdown script? Does detaching serve any purpose on the WCG server side, or is it just to clean up the local BOINC client's filesystem (which I don't care about, because the moment the kernel unmounts /BOINC, its contents disappear)?

Is there anything I can do to minimize the number of devices registered with WorldCommunityGrid? I'm guessing that I'll just have to give in and accept that I will have a whole bunch of defunct devices listed.
http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16521 suggests that I keep backups of a bunch of files, but I am not convinced I want to attempt to (a) collect backups from VMs on planned poweroff, and (b) dole them out to newly created VMs on boot. It also suggests that there is (as of 5 years ago) a host-merging feature, but that will likely fail in my configuration, given that the VMs are DHCP-driven and memory allocation is likely to vary between VM boots.
[Apr 3, 2012 12:18:03 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

vlankhaar wrote:
(is there somewhere that lists the approximate average transfer-size-per-point or transfer-size-per-work-unit for each project?)

https://secure.worldcommunitygrid.org/help/vi...?shortName=minimumreq#413
----------------------------------------
[Edit 1 times, last edit by Former Member at Apr 3, 2012 3:19:13 AM]
[Apr 3, 2012 3:17:36 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

vlankhaar wrote:
I think I've managed to convince the security operations folks that allowing the cluster to initiate a connection out to WCG via a IP:Port whitelisting proxy will not result in unmitigated doom, and I think I can probably handle the new auditing burden.

If I can get network operations folks to agree that the bandwidth requirement is small enough, I might actually be able to get direct connection from a cluster vm to WCG. (is there somewhere that lists the approximate average transfer-size-per-point or transfer-size-per-work-unit for each project?)

Given that these systems are going to be diskless (no USB, most of the filesystem is read-only, with a tmpfs filesystem mounted on /BOINC for World Community Grid to run in), are there any recommendations on BOINC configurations to use?

I'm guessing that the following would make sense:
<abort_jobs_on_exit>1</abort_jobs_on_exit>
<dont_contact_ref_site>1</dont_contact_ref_site>
<report_results_immediately>1</report_results_immediately>

Is there a "detach on exit" command, or should I wrap that into the planned-shutdown script? Does detaching serve any purpose on the WCG server side? Or is that just to cleanup the local boinc client's filesystem (which, I don't care about because the moment kernel unmount /BOINC, its contents disappear.)

Is there anything I can do to minimize the number of devices registered with WorldCommunityGrid? I'm guessing that I'll just have to give in and accept that I will have a whole bunch of defunct devices listed.
http://www.worldcommunitygrid.org/forums/wcg/viewthread?thread=16521 suggests that I have backups of a bunch of files, but I am not convinced that I want to attempt to (a) collect backups from VMs on planned poweroff, and (b) dole them out to newly created VMs on boot. It also suggests that there is (as of 5 years ago), a host merging feature, but that will likely fail in my configuration given the VMs are likely to be DHCP driven and memory allocation is likely to vary between VM boots.

On bandwidth use, http://bit.ly/WCGDSH has a line showing 24-hour use per science based on current runtime averages. FAAH and HFCC use the least, followed by SN2S, DSFL, and GFAM, all under 2 MB per core per day. The weighted average over all sciences is 8.8 MB per core per day.
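Taking that 8.8 MB per core per day weighted average at face value, a back-of-envelope daily total is easy to work out (the core count below is just a placeholder, not a figure from this thread):

```shell
# Rough daily bandwidth from the per-core figure quoted above.
CORES=16                       # placeholder core count
# 8.8 MB/core/day, scaled by 10 because plain sh arithmetic is integer-only
TOTAL_TENTHS=$(( CORES * 88 ))
echo "~$(( TOTAL_TENTHS / 10 )).$(( TOTAL_TENTHS % 10 )) MB/day"   # -> ~140.8 MB/day
```

So even a 16-core box stays well under a couple hundred MB a day on the average science mix.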

On shutdown (losing the install), those options look fine. Detaching in fact does not help [the boinccmd tool has an option for it, I think]: the server will interpret it as "oops, but maybe the device gets attached again, so let's see if we can resend the lost tasks" and keeps the tasks that were in progress on hold until they expire by deadline.

Devices are recognized on some 5 criteria. If they are the same on the next device install, there is a good chance the automated system will recognize the device, but if the complete install is lost, including the data directory, it is probably not going to work.

A few others who run a PXE setup actually store the BOINC instances away from their diskless systems when they have commercial work for their systems, then refetch them and continue running where they left off. This is DIY.
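That DIY stash-and-refetch can be as simple as copying the whole data directory while the client is stopped; a sketch, with made-up paths (the tmpfs mount point /BOINC comes from earlier in this thread, the stash location is invented):

```shell
#!/bin/sh
# Sketch of the stash/refetch idea described above; paths are assumptions.
STASH=/shared/boinc-stash/$(hostname)

stash_boinc() {
    boinccmd --quit                  # stop the client cleanly first
    mkdir -p "$STASH"
    cp -a /BOINC/. "$STASH/"         # whole data dir, incl. client_state.xml
}

restore_boinc() {
    cp -a "$STASH/." /BOINC/
    boinc --dir /BOINC &             # picks up tasks where they left off
}
```

The key point is that client_state.xml and the task slots travel together; restoring a partial directory is what breaks device recognition.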

Proxy IP addresses: see this post by knreed for the latest http://www.worldcommunitygrid.org/forums/wcg/printpost_post,338077 . I would not recommend running Clean Energy for Harvard, unless you want to use 118 MB per core per day (for average systems).

--//--
[Apr 3, 2012 6:07:34 AM]
Former Member
Cruncher
Joined: May 22, 2018
Post Count: 0
Status: Offline
Re: Cache-sharing / Work-unit pooling server?

I have similar issue.

I personally use my laptop when I'm at home or at the office. My office has 6 servers with 16 cores each, always up day and night, and they do have their own hard drives, so my case is simpler than vlankhaar's.

Yes, we use those servers during work hours, and they just idle during the night.
I surely can put BOINC on all of them in minutes, since they're all connected through the LAN.
The only problem is that none of them are connected to the Internet, so they cannot fetch their jobs.

I imagine that my computer would be the WU pool: at 4 PM every day all WUs would synchronize to the pool, and as soon as I get home I'd just upload all the finished WUs.
That doesn't violate anything, does it?
The WUs I upload would be marked as valid because they were downloaded by the same host (my computer).


Do you think a lot of people also have this kind of problem?
[May 25, 2012 5:34:41 PM]