World Community Grid Forums
Thread Status: Active Total posts in this thread: 34
Author |
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline |
It was a year ago today when I received that thrashing in thread 847.
So the good book's right: there is nothing new under the sun.
Dave |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Sorry David, I know you're trying to bring good news, so I'll say this... very... quietly...
If I remember my processor architecture courses correctly, one flop (floating point operation) is performed over several clock cycles of the CPU, which are taken up by fetching the data and moving it through various logic gates in sequence to perform the operation. If an Athlon 3200+ runs at 1.8 GHz (roughly, can't be bothered to look it up), the CPU clock cycles 1.8 billion times per second. Let's say the average flop takes three clock cycles. That would mean the processor performs 0.6 billion flops per second, i.e. 0.6 GFlops. There's absolutely no way the CPU could perform more flops than the number of clock cycles in one second... unless it has more than one core. Even RISC (Reduced Instruction Set Computer) CPUs can only perform up to a theoretical limit of one flop per clock cycle.
Shhhhhhh... hope that was quiet enough...
By the way, those courses I mentioned? I took them years ago, so I may be completely wrong!
[Edit 1 times, last edit by Former Member at Dec 6, 2005 2:01:44 AM] |
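For anyone who wants to check the arithmetic in the post above, here is a minimal sketch. The function name `flops_per_second` is made up for illustration, and the 1.8 GHz clock and three cycles per flop are the poster's rough guesses, not measured values.

```python
def flops_per_second(clock_hz: float, cycles_per_flop: float) -> float:
    """Flop rate under the simple model that every flop costs a fixed
    number of clock cycles and nothing runs in parallel."""
    if cycles_per_flop <= 0:
        raise ValueError("cycles_per_flop must be positive")
    return clock_hz / cycles_per_flop


# The poster's assumed figures: 1.8 GHz clock, 3 cycles per flop.
rate = flops_per_second(1.8e9, 3)
print(f"{rate / 1e9:.1f} GFlops")  # prints "0.6 GFlops"
```

The model assumes one operation in flight at a time, which is exactly the assumption that later posts in the thread question.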
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
The Rosetta@home grid has 39,729 computers running at 12.5 teraflops.
This is an interesting number because there is no redundancy on work units. Everyone does their work unit, that becomes the result, and the work unit is closed out. From what I can see of WCG there is, I believe, a redundancy of 3 and maybe as much as 6. I have seen both figures tossed around the forums. An educated guess? I believe our effective teraflops (taking redundancy into account) is probably about the same as Rosetta@home's 12.5 teraflops.
Have a nice day everyone... Maybe someone can give some reasons for the high redundancy rates on this grid? Is the average member so keen on points that they would overclock to the point of screwing up their calculations? With six projects eventually coming into the fold, maybe the redundancy will have to go down in order to process data in a reasonable amount of time?
Just some thoughts to stir things up a little... got to keep things lively.
Cheers |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hello bruce boytler,
We are using the quorum of 4 method, the strictest validation method preprogrammed in BOINC. The number of copies sent out is variable, but Viktors has posted that we normally start by sending out 5 copies to try to assemble the quorum. Rosetta@home does not need any redundancy because they are working on algorithmic development. Anything important (a lowest energy point, for example) they rerun on their own computers to validate. |
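As a rough illustration of the quorum idea described above, here is a minimal sketch. It assumes "quorum of 4" means that four returned copies must agree, and it compares results by exact equality. The function name `validate` is hypothetical and is not BOINC's actual validator code or API.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def validate(results: Sequence[Hashable], quorum: int = 4) -> Optional[Hashable]:
    """Return the value that at least `quorum` returned copies agree on,
    or None if no value has reached the quorum yet.

    Assumption: results are compared by exact equality.
    """
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Five copies sent out, as in the post above.
print(validate(["A", "A", "A", "B", "A"]))  # four copies agree: prints A
print(validate(["A", "A", "B", "B", "A"]))  # only three agree: prints None
```

In the second case a scheduler would presumably send out further copies until some value reaches the quorum.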
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
It may be a little off topic, but:
I've read somewhere that WCG usually sends one work unit to 5 devices and waits for the calculations. If user statistics are known, WCG knows which users send back calculations regularly. So it could send one work unit to two such reliable users (to compare results) and wait, for example, 48 hours for the calculations (if a user doesn't send it back, WCG can then send the work unit to another user). If WCG worked that way, I think the productivity of the grid would be better than 20%. Am I right? Could the grid work that way? |
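The 20% figure in the post above follows from one useful result per five copies. A minimal sketch of that arithmetic, assuming every copy costs the same amount of computing and every copy is returned (the function name `grid_efficiency` is made up for illustration):

```python
def grid_efficiency(copies_per_work_unit: int) -> float:
    """Fraction of total computing that ends up as unique results,
    assuming each copy of a work unit costs the same to compute."""
    if copies_per_work_unit < 1:
        raise ValueError("need at least one copy per work unit")
    return 1.0 / copies_per_work_unit


for copies in (5, 2):
    print(f"{copies} copies -> {grid_efficiency(copies):.0%}")
# prints "5 copies -> 20%" then "2 copies -> 50%"
```

Under these assumptions, sending each work unit to two reliable users would raise productivity from 20% to 50%, before counting any copies that have to be resent.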
||
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline |
Hi All
Take a look at this: http://www.overclock.net/faqs/18840-detailed-...intel-amd-processors.html
The Athlon XP can do up to 9 operations per clock cycle (not all floating point) and has 3 floating point crunchers on board. Intel's have only 2. That's how a 2.2 GHz Athlon is equivalent to a 3.2 GHz P4.
I'm not making up the MFLOPS story on the 3200+. My 3.447 GFlops is a little on the low side, as at the time I could only reach a 386 MHz FSB; now I'm running a 400 MHz FSB.
This is how it benchmarks. |
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
Hi all,
I'll weigh in with some more information. The basic FPU for handling 64-bit floating point (actually 80-bit internally, with some underflow protection) has sped up tremendously over the last 4 or 5 years. In point of fact, both AMD and Intel can deliver short bursts of better than 0.9 flops per cycle. But this is very unrealistic. The LINPACK benchmarks are what people rely on, and they show that the true sustained speed is much slower. Even so, in some ways a current CPU is the equivalent of a supercomputer from the early 1970s.
The streaming instructions add much more complexity. In particular, it is possible to stream 32-bit floating point instructions, with two 32-bit floats per 64-bit word. Finally, there are some very fast 12-bit streaming instructions that use a table lookup rather than any sort of ALU. These are not true floating point instructions, but are intended for screen graphics calculations. 4096 * 4096 allows calculating the pixel placement for a 2048 * 2048 display without introducing moire patterns (by Nyquist's theorem).
If 32-bit floating point is good enough, it can make sense to use these instructions, as long as you are aware that:
1) Not all CPUs will support all these streaming instructions.
2) Therefore, some results will return with slightly different values, computed using 64-bit double floats.
Note that the standard LINPACK benchmarks use the 64-bit FPU.
Just tossing in my 2 cents.
mycrofth
[Edit 1 times, last edit by Former Member at Dec 7, 2005 10:12:40 AM] |
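To make the two points about 32-bit floats concrete, here is a small sketch using only Python's standard `struct` module. It shows two 32-bit floats fitting in one 64-bit word, and a value changing slightly when rounded to 32 bits. The helper `to_float32` is a made-up name, and this illustrates the number formats only, not any particular CPU's streaming instructions.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Two 32-bit floats packed side by side occupy exactly one 64-bit word.
packed = struct.pack("<2f", 1.5, 2.5)
print(len(packed) * 8)  # prints 64

# 0.1 has no exact binary representation, so the 32-bit and 64-bit
# versions are different numbers; 1.5 is exact in both formats.
print(to_float32(0.1) == 0.1)  # prints False
print(to_float32(1.5) == 1.5)  # prints True
```

This is why a work unit computed in 32-bit on one machine and 64-bit on another can come back with slightly different values.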
||
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline |
So my 3.473 GFlops, as measured by a Whetstone benchmark, which is designed to test the floating point maths processing of a CPU, is complete nonsense? As the most it could possibly be is 1.98 GFlops (2.2 GHz x 0.9 flops per cycle).
This test is not testing any of the SSE2 or SSE3 optimisations, as they don't exist on the XP Barton, so it's just what in the good old days would be the maths co-processor.
Now we all know that a 3.2 GHz Intel is as fast as a 2.2 GHz Athlon XP, hence its 3200+ moniker. So if it is not possible to do more than 0.9 flops per clock cycle, the question is: how come a 3.2 GHz Intel isn't over 45% faster at number crunching than its 2.2 GHz-clocked AMD buddy?
Dave |
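The numbers in the post above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the thread (2.2 GHz, 3.2 GHz, 0.9 flops per cycle, 3.473 GFlops). The function names are made up for illustration.

```python
def ceiling_gflops(clock_ghz: float, flops_per_cycle: float) -> float:
    """Upper bound on GFlops if at most `flops_per_cycle` flops retire per cycle."""
    return clock_ghz * flops_per_cycle


def implied_flops_per_cycle(measured_gflops: float, clock_ghz: float) -> float:
    """Flops per cycle the CPU must sustain to reach the measured rate."""
    return measured_gflops / clock_ghz


def clock_advantage(fast_ghz: float, slow_ghz: float) -> float:
    """Fractional clock-rate advantage of the faster-clocked chip."""
    return fast_ghz / slow_ghz - 1.0


print(f"{ceiling_gflops(2.2, 0.9):.2f} GFlops")               # prints "1.98 GFlops"
print(f"{implied_flops_per_cycle(3.473, 2.2):.2f} per cycle")  # prints "1.58 per cycle"
print(f"{clock_advantage(3.2, 2.2):.0%}")                      # prints "45%"
```

So the benchmark figure implies roughly 1.58 operations per clock cycle, which is only possible if more than one floating point operation completes per cycle.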
||
|
David Autumns
Ace Cruncher UK Joined: Nov 16, 2004 Post Count: 11062 Status: Offline |
|
||
|
Former Member
Cruncher Joined: May 22, 2018 Post Count: 0 Status: Offline |
An interesting read. I had forgotten that the AMD architecture can do a floating point add and a multiply simultaneously, unlike the Intel, which can do one or the other. This means that the AMD will come closer than the Intel chip to the maximum bandwidth allowed by its clock rate on a sustained basis. So an AMD chip can match an Intel chip with a higher clock rate. Of course, the key is sustained speed rather than burst speed. Either manufacturer can choose the right burst timing for its architecture and seem to beat the other manufacturer.
The AMD White Paper manages to avoid discussing sustained speed and concentrates on how much can be accomplished in a single clock cycle. Even a 64-bit processor will have bandwidth problems running two 64-bit adds and multiplies per clock cycle. [Setup time, 2 operands per operation? Arrgh! But that one clock cycle is very impressive at the far end of the pipeline.]
mycrofth |
||