China's NSCS [National Supercomputing Center in Shenzhen] is enthusiastically adopting Dawning's TC3600 blade servers, equipped with Intel's six-core X5650 processors and Nvidia's C2050 GPUs. The exact configuration of the Nebulae machine at NSCS was not available at press time, but the TC3600 blade server is a 10U chassis that holds ten two-socket blades. The C2050s are PCI-Express GPU co-processors with 448 cores and 3 GB of their own GDDR5 memory, rated at 515 gigaflops doing double-precision floating point math and 1.03 teraflops doing single-precision.
The Top 500 ranking for Nebulae does not provide a blade or GPU count, but the word on the street is that it has 4,700 nodes. What the Top 500 does say is that the machine has 120,640 cores in total, for a peak theoretical performance of 2.98 petaflops and 1.27 petaflops sustained running the Linpack test. All of the nodes in the Dawning blade cluster are linked by quad data rate (40 Gb/sec) InfiniBand switches.
The first thing to notice about the Jaguar and Nebulae supers is the difference between peak and sustained performance. For the Cray Jaguar Opteron cluster, 75.5 per cent of the flops contained in the box end up doing real Linpack work, while on the Dawning Xeon-Tesla hybrid, only 42.6 per cent of the peak performance embodied in the CPUs and GPUs actually pushes Linpack math. So it would seem that the all-X64 machine has the edge, right? Wrong. Jaguar cost around $200m to build and burns around 7 megawatts of juice, while the Nebulae machine probably cost on the order of $50m (that's an El Reg estimate) and burns only 2.55 megawatts of juice.
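For anyone who wants to check the arithmetic behind that comparison, here is a rough Python sketch. The Nebulae figures, the cost estimates, and the power draws come from the text above. The Jaguar peak and sustained numbers (2.331 and 1.759 petaflops) are not stated in this passage and are assumed here from the June 2010 Top 500 list. The `Machine` class and its method names are made up for illustration, and the $50m Nebulae price is an estimate, so treat the per-dollar result as ballpark only.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    peak_pflops: float      # peak theoretical performance, petaflops
    linpack_pflops: float   # sustained Linpack performance, petaflops
    cost_musd: float        # approximate build cost, millions of US dollars
    power_mw: float         # approximate power draw, megawatts

    def efficiency(self) -> float:
        """Fraction of peak flops that does real Linpack work."""
        return self.linpack_pflops / self.peak_pflops

    def pflops_per_mw(self) -> float:
        """Sustained petaflops delivered per megawatt."""
        return self.linpack_pflops / self.power_mw

    def tflops_per_musd(self) -> float:
        """Sustained teraflops delivered per million dollars."""
        return 1000.0 * self.linpack_pflops / self.cost_musd


# Jaguar peak/sustained are ASSUMED from the June 2010 Top 500 list,
# not taken from the passage; cost and power are the article's figures.
jaguar = Machine("Jaguar", 2.331, 1.759, 200.0, 7.0)

# Nebulae figures are the article's; the cost is an El Reg estimate.
nebulae = Machine("Nebulae", 2.98, 1.27, 50.0, 2.55)

if __name__ == "__main__":
    for m in (jaguar, nebulae):
        print(
            f"{m.name}: {m.efficiency():.1%} of peak, "
            f"{m.pflops_per_mw():.2f} PF/MW, "
            f"{m.tflops_per_musd():.1f} TF per $m"
        )
```

Under those assumptions, Nebulae loses on Linpack efficiency but delivers roughly twice the sustained flops per megawatt and close to three times the sustained flops per dollar, which is the point the paragraph is making.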