Total posts in this thread: 3317 (332 pages)
This topic has been viewed 3308980 times and has 3316 replies
Dayle Diamond
Senior Cruncher
Joined: Jan 31, 2013
Post Count: 452
Re: Work Available

Dayle

"1 spare on a multicore machine is not a queue. It just tides you over from when one finishes to when the next is downloaded. However, sometimes it takes a bit longer to get one, so the spare keeps you crunching fully."


This is only true if somebody has dropped all other projects and is badge hunting.

A cruncher getting a representative mix of projects does not stop crunching if an ARP task is not available.

Keeping a work unit idle for six hours when somebody else could have started it and been six hours into crunching is nowhere near ideal. Folding@home, for example, won't let you download a new work unit until the current one is in its last minutes.

Simple settings to keep the project moving along: Set an ARP maximum that your system can handle, opt into all projects, and keep the queue at 0.1 days min + 0.1 days max.
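
As a rough sketch of the queue part of these settings: when preferences are managed locally rather than through the website's device profiles, the BOINC client reads a `global_prefs_override.xml` file from its data directory. The snippet below writes one with a 0.1-day minimum buffer plus 0.1 days of additional work. Mapping "0.1 days max" onto `work_buf_additional_days` is an assumption, and the helper name `write_small_buffer` and the example path are hypothetical. The ARP limit is not covered here.

```shell
#!/bin/sh
# Sketch: write a local override that keeps the work buffer small.
# Assumption: "$1" is the BOINC data directory (its location varies by install).
# Note: this overwrites any existing override file in that directory.
write_small_buffer() {
    dir=$1
    cat > "$dir/global_prefs_override.xml" <<'EOF'
<global_preferences>
    <work_buf_min_days>0.1</work_buf_min_days>
    <work_buf_additional_days>0.1</work_buf_additional_days>
</global_preferences>
EOF
}

# Example use (hypothetical path), then ask the running client to re-read it:
#   write_small_buffer /var/lib/boinc-client
#   boinccmd --read_global_prefs_override
```

The function only writes the file; the client picks it up after `boinccmd --read_global_prefs_override` or a restart.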
----------------------------------------
[Edit 1 times, last edit by Dayle Diamond at Aug 15, 2021 2:31:48 AM]
[Aug 15, 2021 2:24:02 AM]
Mike.Gibson
Ace Cruncher
England
Joined: Aug 23, 2007
Post Count: 12436
Re: Work Available

It is also those who have a preference for a particular project. The queues we are trying to cut are those that are several days rather than the singletons.

We are trying to get rid of the 'No Reply' results, which seem to be all too prevalent.

Mike
[Aug 15, 2021 10:51:07 AM]
leloft
Cruncher
Joined: Jun 8, 2017
Post Count: 23
Re: Work Available

"Simple settings to keep the project moving along: Set an ARP maximum that your system can handle, opt into all projects, and keep the queue at 0.1 days min + 0.1 days max."
I have restricted ARP to two machines, set the cache to one more than app_config allows, but because the workcache is overloaded, no ARP units were available. As a consequence, I have just received >1000 MCM/OPN units because I followed this advice and checked the 'if no work available send me work from other projects'. This means that the ARP units are going to take >60h to process and not the <40h before this happened.
Is there any way to abort these 1000 OPN/MCM units in bulk? It'll take over 16 hours at 1 per minute via boinccmd.
[Aug 15, 2021 11:25:27 AM]
adriverhoef
Master Cruncher
The Netherlands
Joined: Apr 3, 2009
Post Count: 2171
Re: Work Available

"Is there any way to abort these 1000 OPN/MCM units in bulk? It'll take over 16 hours at 1 per minute via boinccmd."

If you can't select multiple tasks via BOINC Manager, then use this technique:

- Select all the task names you want to abort.
  Example:
    wcgresults -HN0 | egrep '^OPN|^MCM' > tasks_to_abort # 0 = all uninitialized tasks

- Feed these names to this little UNIX shell snippet:
    URL=http://www.worldcommunitygrid.org/
    BOINCCMD="boinccmd"
    TASKCMD="$BOINCCMD --task $URL"
    WCGRESULTS=wcgresults

    # Abort one task by name, unless it has been started in the meantime.
    abort () {
        local wu=$1
        if $WCGRESULTS -HN+ | grep -q "^$wu$"; then
            echo "Task $wu has been started in the meantime:"
            $WCGRESULTS -ND+TWOPLACES | egrep "^Deadline|$wu"
            return
        fi
        if [ -n "$wu" ]; then $TASKCMD "$wu" abort && echo "$wu aborted"; fi
    }

    # Read task names from the file; the prompt is printed only when stdin is a terminal.
    while [ -t 0 ] && printf "Task: "; read -r task; do abort "$task"; done < tasks_to_abort
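
For readers without the `wcgresults` helper, the selection step above could be sketched with `boinccmd --get_tasks` alone. This is a sketch under stated assumptions, not the method used in the thread: it assumes each task block has a `name:` line followed later by a `scheduler state:` line, and that unstarted tasks report `uninitialized`. The exact layout may differ between client versions, so check the raw output first. The function name `list_unstarted` is hypothetical.

```shell
#!/bin/sh
# Sketch: print names of OPN/MCM tasks that have not started yet.
# Reads `boinccmd --get_tasks` style text on stdin.
# Assumption: a "name:" line precedes a "scheduler state:" line in each
# task block, and unstarted tasks show "scheduler state: uninitialized".
list_unstarted() {
    awk '
        $1 == "name:" { name = $2 }
        $1 == "scheduler" && $2 == "state:" {
            if ($3 == "uninitialized" && name ~ /^(OPN|MCM)/) print name
        }
    '
}

# Example use:
#   boinccmd --get_tasks | list_unstarted > tasks_to_abort
```

The resulting file has one task name per line, which is the format the abort loop reads.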

[Aug 15, 2021 12:34:12 PM]
    Mike.Gibson
    Ace Cruncher
    England
    Joined: Aug 23, 2007
    Post Count: 12436
    Re: Work Available

    You should be able to abort multiple tasks under the task manager. Select the first to be aborted, hold down your shift key and then click on the last to be aborted and then click 'Abort'.

    Mike
    [Aug 15, 2021 1:52:12 PM]
    leloft
    Cruncher
    Joined: Jun 8, 2017
    Post Count: 23
    Re: Work Available

    "Is there any way to abort these 1000 OPN/MCM units in bulk?"

    "If you can't select multiple tasks via BOINC Manager, then use this technique:"

    Worked like a charm. Thank you very much indeed. For completeness I've added the additional commands (as nested quotes) as the machine in question is headless and 20km away.
    $ mkdir ~/wcgresults && cd ~/wcgresults
    $ wget https://a3a3.home.xs4all.nl/wcg/wcgresults
    $ chmod 744 ./wcgresults

    $ ./wcgresults -HN0 | egrep '^OPN|^MCM' > tasks_to_abort # 0 = all uninitialized tasks
    $ URL=http://www.worldcommunitygrid.org/
    $ BOINCCMD="boinccmd"
    $ TASKCMD="$BOINCCMD --task $URL"
    $ WCGRESULTS=./wcgresults # script is in the current directory, not on PATH

    $ abort () {
          local wu=$1
          if $WCGRESULTS -HN+ | grep -q "^$wu$"; then
              echo "Task $wu has been started in the meantime:"
              $WCGRESULTS -ND+TWOPLACES | egrep "^Deadline|$wu"
              return
          fi
          if [ -n "$wu" ]; then $TASKCMD "$wu" abort && echo "$wu aborted"; fi
      }

    $ while [ -t 0 ] && printf "Task: "; read -r task; do abort "$task"; done < tasks_to_abort
    $ boinccmd --project www.worldcommunitygrid.org update

    Thanks again.
    [Aug 15, 2021 3:35:25 PM]
    adriverhoef
    Master Cruncher
    The Netherlands
    Joined: Apr 3, 2009
    Post Count: 2171
    Re: Work Available

    "Is there any way to abort these 1000 OPN/MCM units in bulk?"

    "If you can't select multiple tasks via BOINC Manager, then use this technique:"

    "Worked like a charm. Thank you very much indeed. For completeness I've added the additional commands (as nested quotes) as the machine in question is headless and 20km away."

    My pleasure, it's very nice to read that it worked out.

    Since you wrote that you would have to use boinccmd, I gathered you probably needed some kind of list of commands, leloft.

    Hopefully it didn't take too much time.

    Adri
    [Aug 15, 2021 4:32:39 PM]
    leloft
    Cruncher
    Joined: Jun 8, 2017
    Post Count: 23
    Re: Work Available

    "It is also those who have a preference for a particular project. The queues we are trying to cut are those that are several days rather than the singletons.

    "We are trying to get rid of the 'No Reply' which seem to be all too prevalent.

    "Mike"

    What generates these 'No Reply' issues? I know that the maximum recommended number of concurrent ARP units is half the number of cores; the advice is that the other half can be given over to other projects. However, if I set device profiles such that boinc can use 100% cores, ARP 50% and actively exclude other projects, the time taken to process an ARP unit appears to be reduced by about 40-50%. In effect, it seems that if ARP is free to use 2 cores per unit, it will. 'top -1' bears this out, with a system load of >14/24 while processing 12 units on a 24-core machine, all 24 cores showing an activity of 70-100%. This strategy appears to reduce the processing of an estimated 65h unit to an actual 30-38h. Can anyone explain/confirm this? To tackle the backlog, would a more rapid turnover of units be highly desirable?
    [Aug 16, 2021 9:16:26 AM]
    Acibant
    Advanced Cruncher
    USA
    Joined: Apr 15, 2020
    Post Count: 126
    Re: Work Available

    "However, if I set device profiles such that boinc can use 100% cores, ARP 50% and actively exclude other projects, the time taken to process an ARP unit appears to be reduced by about 40-50%."
    What you've done there is basically turn off what Intel would call "Hyper-Threading", or more generically simultaneous multithreading (SMT), only through manual control of threads rather than in the BIOS. The usual situation is two threads per core, so you could have 24 logical processors on a machine with 12 physical cores. You state that you were doing 12 units on a 24 core machine, but I suspect that's 24 logical cores, not 24 physical cores. I don't believe any of the WCG projects are multithreaded.

    That will indeed reduce the time taken to process the work units but would, in theory, lead to fewer processed per day. Intel claims a 30% overall performance increase from hyperthreading. To give a hypothetical on that: using only one thread (or logical core) per physical core, a task might take 10 hours, but using SMT to run two threads per core, each task might take 14 hours. So each task does take longer, but instead of taking 20 hours to do two units sequentially you only take 14 hours to do two simultaneously. These values are just for illustration and may not hold true for all architectures or tasks.
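
The arithmetic in that hypothetical can be sketched as tasks completed per physical core per day. The 10-hour and 14-hour figures are the illustrative values from the paragraph above, not measurements, and the function name `tasks_per_day` is made up for this example.

```shell
#!/bin/sh
# tasks_per_day HOURS_PER_TASK THREADS_PER_CORE
# Prints how many tasks one physical core completes in 24 hours.
tasks_per_day() {
    awk -v h="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 24 / h * t }'
}

tasks_per_day 10 1   # one thread per core, 10 h per task
tasks_per_day 14 2   # two threads per core, 14 h per task
```

With these numbers it prints 2.40 and 3.43: each task is slower with two threads per core, but the core finishes roughly 43% more tasks per day.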
    [Aug 16, 2021 11:59:12 AM]
    leloft
    Cruncher
    Joined: Jun 8, 2017
    Post Count: 23
    Re: Work Available

    "You state that you were doing 12 units on a 24 core machine but I suspect that's 24 logical cores not 24 physical cores."
    You are correct: the machine (Thinkstation C30) has 2 x Xeon E5-2667, 6 physical + 6 logical each.
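
On a Linux machine, the physical versus logical counts can be read from `lscpu`. The sketch below assumes the output contains the `Thread(s) per core`, `Core(s) per socket` and `Socket(s)` fields, which is the usual layout but may vary between versions; the function name `count_cores` is hypothetical.

```shell
#!/bin/sh
# Sketch: derive physical and logical core counts from lscpu-style text on stdin.
# Assumption: the three fields matched below are present, one per line.
count_cores() {
    awk -F: '
        /Thread\(s\) per core/ { t = $2 + 0 }
        /Core\(s\) per socket/ { c = $2 + 0 }
        /Socket\(s\)/          { s = $2 + 0 }
        END { printf "physical=%d logical=%d\n", c * s, c * s * t }
    '
}

# Example use:
#   lscpu | count_cores
```

For a dual-socket machine with 6 cores per socket and 2 threads per core, as described here, this would report 12 physical and 24 logical cores.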
    "That will indeed reduce the time taken to process the work units but would lead to fewer processed per day in theory."
    My point is that this might be a price worth paying to clear the backlog more effectively.
    "I don't believe any of the WCG projects are multithreaded."
    I cannot comment on this, but I'd be very surprised if the WRF model wasn't. What I can say with some degree of confidence is that running 12 ARP units in the absence of any other WCG projects uses all 24 cores dynamically; it significantly reduces the processing time of the 12 ARP units compared to when other projects are concurrently using the other 12 cores.
    "So it does take longer but instead of taking 20 hours to do two units sequentially you only take 14 hours to do two simultaneously."
    As the recommendation for ARP is to use only half the number of cores, there is no gain to the project by processing that second unit. From an ARP perspective, all it is doing is slowing down the ARP unit. I am currently trying to compare how many ARP points are generated per day (as a measure of ARP work done) following a 12/24 ARP strategy vs a 12/24 ARP + 6/24 OPN +6/24 MCM strategy. This is going to take a couple of weeks to organise and I'll post the results when I'm done.
    [Aug 16, 2021 2:33:53 PM]