The University at Odense (in Denmark) has just set up a new supercomputing cluster built entirely on commodity hardware.
It is worthwhile to compare this to the supercomputers built with 'rocket-grade' technology, i.e. the fast Crays and the IBM petaflop machine.
The price/performance ratio of the commodity system cannot be beaten, of course. But the Horseshoe, as the commodity system is dubbed, is close to the maximum conceivable on commodity hardware, with 512 PCs in parallel.
By comparison, the top Crays will be 20 times as fast (servicing 10K PCs would not be fun) and the IBM machine 50 times faster again (servicing 500K PCs is decidedly non-funny). As to power consumption, the PC solution is on a level with the Crays: based on the reported price of power, 1.7 MDKK for three years of continuous use at 1.2 DKK per kWh, the full cluster draws approximately 54 kW. The IBM solution comes in at a much lower rate: 2 MW per petaflop, i.e. 4 kW for 2 TeraFlops. The heat from the PCs will take quite a cooling system, and it is hard to see how the cooling would scale without doing something like the Cray or IBM solution.
So the IBM technology would save 0.5 MDKK per year on power alone.
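The power arithmetic can be checked in a few lines. This is only a rough sketch using the figures quoted above (1.7 MDKK over three years, 1.2 DKK per kWh, 2 MW per petaflop); the function and variable names are mine, and it assumes a constant load around the clock and no cooling overhead. With these inputs the cluster averages about 54 kW, and the yearly saving lands a little above 0.5 MDKK.

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def average_power_kw(total_cost_dkk, price_per_kwh_dkk, years):
    """Average draw implied by a power bill, assuming constant load."""
    energy_kwh = total_cost_dkk / price_per_kwh_dkk
    return energy_kwh / (years * HOURS_PER_YEAR)


def annual_cost_dkk(power_kw, price_per_kwh_dkk):
    """Yearly power bill for a constant draw of power_kw."""
    return power_kw * HOURS_PER_YEAR * price_per_kwh_dkk


PRICE_PER_KWH_DKK = 1.2      # quoted price of power
CLUSTER_BILL_DKK = 1.7e6     # quoted bill for three years
YEARS = 3

# Commodity cluster: back out the average draw from the bill.
cluster_kw = average_power_kw(CLUSTER_BILL_DKK, PRICE_PER_KWH_DKK, YEARS)

# IBM: 2 MW per petaflop is 2 kW per teraflop, scaled to 2 TFlops.
ibm_kw = 2.0 * 2.0

cluster_annual = CLUSTER_BILL_DKK / YEARS
ibm_annual = annual_cost_dkk(ibm_kw, PRICE_PER_KWH_DKK)
savings = cluster_annual - ibm_annual

print(f"cluster draw: {cluster_kw:.1f} kW")
print(f"IBM draw at 2 TFlops: {ibm_kw:.1f} kW")
print(f"yearly saving: {savings / 1e6:.2f} MDKK")
```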
In this article, a member of the Danish system team declares the supercomputer "dead". But given the industrial-scale environment you would need to invest in to scale the commodity solution, I'm guessing that if you really need top speed, supercomputers are far from dead.
Supercomputers are also clustered systems, but one would guess that the power use and sheer physical size of a large-scale clustered system should ensure a place for custom hardware.
A recent visit to top500.org and links found there gives some nice price and performance quotes for fast systems.
First off, the possibility of the Horseshoe actually reaching 2 TeraFlops on real problems is hypothetical. This figure is the pure processor speed; memory bandwidth and the cost of parallelization are not considered. Given the design of the Horseshoe (commodity networking, fast ethernet only), one would guess that you need an algorithm that works well with very coarse-grained parallelism to get anywhere near the theoretical top speed. Indeed, only the top 23 systems on the worldwide list are rated above 1 TFlops at top500. But that rating is only a LINPACK test, so for search problems or simulations it may not be very significant.
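To get a feel for how coarse-grained the parallelism has to be, here is a back-of-the-envelope sketch. It assumes the 2 TeraFlops peak is spread evenly over the 512 PCs and that each PC has a single fast ethernet link at the nominal 100 Mbit/s; latency, protocol overhead and switch contention are ignored, so the real picture is worse. The names are mine, not from any Horseshoe documentation.

```python
PEAK_FLOPS = 2e12                 # quoted theoretical peak of the cluster
NODES = 512                       # number of PCs
FAST_ETHERNET_BITS_PER_S = 100e6  # nominal fast ethernet line rate

# Peak share of one PC, assuming an even split across nodes.
flops_per_node = PEAK_FLOPS / NODES

# Best-case bytes per second one node can send or receive.
link_bytes_per_s = FAST_ETHERNET_BITS_PER_S / 8

# Floating point operations a node must do per byte it communicates
# just to keep the processor, not the network, as the bottleneck.
flops_per_byte = flops_per_node / link_bytes_per_s

print(f"peak per node: {flops_per_node / 1e9:.2f} GFlops")
print(f"link bandwidth: {link_bytes_per_s / 1e6:.1f} MB/s")
print(f"work needed per byte sent: {flops_per_byte:.1f} flops")
```

Under these assumptions each node has to do a few hundred floating point operations for every byte it puts on the wire, which is why only loosely coupled problems are likely to come near the peak.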
Secondly, pricing: ASCI White reportedly cost 110 M$ two years ago. Assuming it could be done at half that price today, we're still way above the price of a commodity system. On the other hand, IBM plans to build Blue Gene (or at least Blue Gene/L) for 100 M$ by 2004. This machine will be rated at 200 TFlops, so the price per TFlop will be 0.5 M$. Even assuming price/performance also improves by a factor of 2 over the next couple of years, this is still comparable to the price of a commodity system today, and the x100 scale factor will be hard to do for the commodity cluster.
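The Blue Gene numbers work out as follows. This is a sketch of my reading of the comparison, not an official price model: the 2004 price per TFlop is doubled to get a rough "today" equivalent (the assumed factor of 2 in price/performance), and that is then applied to a 2 TFlops machine to see what a commodity cluster of Horseshoe's size may cost and still be on par. The variable names are made up for the example.

```python
BLUE_GENE_PRICE_MUSD = 100.0   # planned price, M$
BLUE_GENE_TFLOPS = 200.0       # planned rating
HORSESHOE_PEAK_TFLOPS = 2.0    # theoretical peak of the commodity cluster
PRICE_DROP_FACTOR = 2.0        # assumed price/performance gain until 2004

# Price per TFlop for Blue Gene when delivered.
price_2004 = BLUE_GENE_PRICE_MUSD / BLUE_GENE_TFLOPS

# Same technology curve expressed in today's prices.
price_today_equiv = price_2004 * PRICE_DROP_FACTOR

# How much bigger Blue Gene is than the commodity cluster.
scale_factor = BLUE_GENE_TFLOPS / HORSESHOE_PEAK_TFLOPS

# Price at which a 2 TFlops cluster bought today matches that curve.
break_even_musd = price_today_equiv * HORSESHOE_PEAK_TFLOPS

print(f"Blue Gene in 2004: {price_2004:.2f} M$ per TFlop")
print(f"today's equivalent: {price_today_equiv:.2f} M$ per TFlop")
print(f"scale factor vs. Horseshoe: x{scale_factor:.0f}")
print(f"break-even price for 2 TFlops today: {break_even_musd:.1f} M$")
```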
So commodity systems can't beat custom systems, even if the processors of almost all the supercomputers are commodity processors.
Cray is developing a 50 GFlop computer (the SV2 mentioned earlier) using a more traditional supercomputing approach. There are some price announcements for Govt. orders for two of these systems, but it is unclear whether the reported 19 M$ includes a full-sized system.
Posted by Claus at July 22, 2002 10:49 AM