World Community Grid Forums
Thread Status: Active | Total posts in this thread: 23
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Left alone, the machine did 8 results of the last beta at 98.5 to 99.1 percent efficiency. The first dozen UGM are barely hitting 96-97. Not running the graphics. What could that be, the toughness of the varying sequences being compared?
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
It's not me nor the sandman. The situation has deteriorated: the compared txt files have now grown to 13 MB and efficiency is dropping to 92-93 percent under Windows. Per Task Manager, with the machine running UGM exclusively, the largest CPU-time competitor was the System Idle Process at 6 minutes. Switched all cores to MCM and they quickly showed the normal figure for this node, 99 percent plus. Switched to Linux and ran UGM on all cores: got 99.8 percent. What's up with that? Have we got another science that favors a particular platform?
OldChap
Veteran Cruncher | UK | Joined: Jun 5, 2009 | Post Count: 978 | Status: Offline
The first batch on each rig ran at ~96% for me on Linux. Mid-run I changed the <no_priority_change> setting to 1 in cc_config and it improved the CPU %.
----------------------------------------
Came home from work to find everything running in the high 98% or low 99% range. Thinking of limiting write-to-disk somewhat to see if that helps too, because even my rig running @ 3.1 GHz is getting low points compared to claim, suggesting that over 90% of the results so far are from much faster beasts, so in this respect every little helps.
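For anyone wanting to try the same tweak, here is a minimal cc_config.xml sketch (assuming the file sits in the BOINC data directory; re-read config files or restart the client for it to take effect):

    <cc_config>
      <options>
        <!-- run science apps at normal OS priority instead of idle priority -->
        <no_priority_change>1</no_priority_change>
      </options>
    </cc_config>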
deltavee
Ace Cruncher | Texas Hill Country | Joined: Nov 17, 2004 | Post Count: 4890 | Status: Offline
"Have we got another science that favors a particular platform?"

Getting .99+ efficiency on my Windows machines.
seippel
Former World Community Grid Tech | Joined: Apr 16, 2009 | Post Count: 392 | Status: Offline
Not sure on the Linux/Windows question, although there will be some variation between work units, which will be especially true between batches. As for the beta vs. production question, there are a few key differences between what was run in beta and what's currently being run in production. During beta, we ran a sampling of work units from across the entire project (or at least from what the researchers have provided so far). The researchers also requested that we run some reference sequences through first. Early indications are that these reference sequences are generating larger output files (and more IO) than the average of what we saw in beta (which should be more in line with the project as a whole). Starting with batch 45, non-reference sequences will be worked into the mix and we may see the IO being generated (on average) start to drop.
Also, with the exception of a subset of the last beta run, the beta sequences were grouped from similar sources. This resulted in a few work units generating very large output files, but most generating smaller-than-average output files and less IO. In production, the order of the sequences in the sequence files is randomized, so we don't see such wide variations in the sizes of the output files generated. Each work unit's sequence file still comes from a single source though, so there are likely to still be variations between batches in the sizes of the result files.

Seippel
seippel
Former World Community Grid Tech | Joined: Apr 16, 2009 | Post Count: 392 | Status: Offline
"Have we got another science that favors a particular platform? Getting .99+ on my windows machines."

I just did some quick database checks, and to expand on what I mentioned above: the average size of the result files does vary quite a bit by batch. For example, based on a large number of results, batch 00000 has generated nearly twice as much output as batch 00001. This isn't too surprising, since batch 00000 compares sequences from reference source "A" with other sequences from reference source "A", while batch 00001 compares sequences from reference source "A" with sequences from reference source "B". When comparing stats from one machine to the next, you'll want to make sure you are also comparing work units from the same batch, to control for those other factors.

Seippel
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Well, let me retry:

1) On the same device, UGM on all cores had been running in the 92-93 percent efficiency range for batches 17, 18 and 33. I suspended UGM and switched all cores to MCM to ascertain whether something was awry with the machine. After 1:43 hours, all of batch 8304, started simultaneously, remained at 99 percent and better.

2) Booted to Linux and got 99.8 percent running UGM on all cores.

Now I have increased write-to-disk under Windows from 120 seconds to 300 seconds and restarted the UGM tasks with LAIM (leave applications in memory) off, so they pick up the new write-interval setting, i.e. checkpointing no sooner than 5 minutes apart. Just to see if efficiency climbs back up again.
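For anyone reproducing this: the same write-to-disk interval can also be set outside the manager via the local preferences override file; a minimal sketch, assuming the usual global_prefs_override.xml in the BOINC data directory:

    <global_preferences>
      <!-- "write to disk at most every N seconds": tasks checkpoint no more often than this -->
      <disk_interval>300</disk_interval>
    </global_preferences>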
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Watching it in boincstats, I see the efficiency drop right at the time of checkpointing. The tasks all checkpointed simultaneously, in the same second, which may point at an IO bottleneck. The second checkpoints were also written exactly simultaneously, 5:28 minutes after the previous, and the third, again all simultaneous, 5:28 minutes after that. Metronomic, it appears, but efficiency is creeping toward 94 percent. I will increase write-to-disk to 600 seconds; if the tasks eventually start saving checkpoints asynchronously, things may improve. Lol, we could be needing staggered starting, just as with CEP2, to optimize utilization and performance.
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
So when suspending all the ready-to-start UGM tasks and letting MCM take over gradually, BoincTasks logged ever-increasing efficiency as fewer UGM were running concurrently, the last one up to 95.78 percent. The 600-second write interval did add a percent to UGM performance, not much.

Now testing 1 UGM next to only MCM for the duration. UGM needs about 3:35 hours on this node.
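For anyone repeating this test without babysitting the queue: recent BOINC clients read an app_config.xml from the project folder that can cap concurrent tasks per application. A sketch, assuming the app name ugm1 taken from the task names in this thread:

    <app_config>
      <app>
        <name>ugm1</name>
        <!-- run at most one UGM task at a time; the remaining cores stay on MCM -->
        <max_concurrent>1</max_concurrent>
      </app>
    </app_config>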
Former Member
Cruncher | Joined: May 22, 2018 | Post Count: 0 | Status: Offline
Well, the data speaks for itself: 1 UGM alongside all others MCM, which run at 99+ percent efficiency, gives 98 percent for UGM. That's 4-5 percent better than running UGM on all cores.

7.22 ugm1 ugm1_ugm1_00033_0259_0 03:21:59 (03:18:00) 10/16/2014 12:41:47 PM 10/16/2014 12:42:18 PM 98.03 Reported: OK + 33.53 MB 67.50 MB

Next test: 2 quasi-synchronous, still with 600 seconds write-to-disk. I'll have them start with a 30-second delay though, so the checkpoints will initially be out of sync.
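(Sanity check on the logged line above: if the parenthesized figure is CPU time and the other elapsed time, the 98.03 is simply their ratio: 03:18:00 of CPU is 11,880 s, 03:21:59 elapsed is 12,119 s, and 11880 / 12119 ≈ 0.9803, i.e. 98.03 percent efficiency.)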