SAN FRANCISCO (09/26/2003) - Until now, most high-performance computing clusters were hand-built affairs found mainly in large universities, lovingly designed by enthusiastic professors and tended by sleep-deprived graduate students. But now Dell Inc. -- which, appropriately enough, was started in a dorm room -- has begun to commercialize the HPC (high-performance computing) market, offering pre-configured supercomputer clusters that combine everything from servers and racks to software integration.
Now you can buy an eight-node, 16-processor HPC cluster off the shelf from Dell for only US$130,000 plus shipping and applicable sales tax. You'll have to phone in the order, though, because Dell doesn't yet offer HPC clusters from its Web site. Cables are included, but on-site "rack and stack" service costs extra.
As I found during a hands-on review of an eight-node, 16-processor cluster at Dell's Palmer campus in Austin, Texas, the individual servers (Dell's new Itanium 2-based PowerEdge 3250) are nothing exceptional; they're little more than dual Intel processors and an Intel motherboard slapped into a Dell case. However, the company's integration and total clustering solution is excellent.
The core of a Dell supercomputer cluster is an array of identical server nodes, which do the heavy lifting. Dell offers both the Itanium 2-based PowerEdge 3250 server and the dual Xeon-based PowerEdge 1750 server. Most real-world clusters would have far more nodes than I tested at Dell's lab, by the way; a starting point of 32 or 64 nodes (that is, 64 to 128 processors) would be more realistic for a genuine supercomputer. Dell also provides separate servers for functions such as job scheduling, task distribution, and access to shared data.
Nearly as important as the servers are the network interconnects that bind the array together. Most Dell high-performance clusters contain two parallel networks: Gigabit Ethernet, for linking the individual nodes with the management servers and shared storage, and a proprietary low-latency, low-overhead network such as Myrinet for interprocess communication between the nodes.
On the software side, the server foundation is Red Hat's Advanced Server 2.1 version of Linux. (Dell also offers Windows-based HPC clusters but only on an as-requested basis, according to the company.) On top of Linux run two products from MPI Software Technology: the MPI/Pro cluster software, which provides the protocol stack for MPI (Message Passing Interface), the messaging middleware between the nodes; and Felix, a management tool that pushes software images from a management server out to the computing nodes. Felix also runs remote Linux commands in parallel across the cluster. Also vital is Ganglia, an open source distributed monitoring application for HPC clusters, plus a gaggle of other tools and utilities.
A Single Node
Before working on the cluster, I examined one of the PowerEdge 3250 servers. Dell has carefully marketed this device, its second 64-bit server, as being designed for scientific computing and HPC clusters; in fact, the company refused to send InfoWorld a single stand-alone server for review, insisting that it could only be properly evaluated within an HPC environment. (The previous model, the PowerEdge 7150, used the original Itanium chip and was introduced in May 2001.)
The specifications on this 2U-high (3.5-inch) server are impressive. The 1.3GHz Itanium 2 processor has 3MB of L3 cache, which reduces the need for memory I/O and thereby speeds performance. Dell also offers the PowerEdge 3250 with 1.4GHz and 1.5GHz processors, with 4MB and 6MB of cache, respectively. In early September, Dell also introduced lower-cost versions that use Intel's new 1GHz and 1.4GHz Itanium 2 processors with 1.5MB of cache.
My test server also had a 400MHz front-side bus, an onboard Ultra320 SCSI RAID controller, dual Gigabit Ethernet NICs (network interface cards), and 4GB of RAM. For storage, the system had one 18GB drive; there are two drive bays available. There are also dual hot-swap power supplies and an onboard management processor that allows remote diagnostics via modem or Ethernet -- even if the server is powered down. The biggest weaknesses: The three PCI-X expansion slots aren't hot-swap, and there are only two drive bays, whereas most 2U servers sport four or six. Those limitations aren't a problem in an HPC cluster, but they would be significant issues if the PowerEdge 3250 were deployed as a general enterprise server.
Dell includes firmware- and software-based diagnostic tools with the server, some from Intel and others from Dell. The Dell offerings are disappointing, far more primitive than the management tools for Dell's 32-bit Xeon servers. The reason given: Limited demand for 64-bit systems hasn't allowed Dell to focus development resources on optimizing its server management tools.
What's striking about the PowerEdge 3250 server is how few of the components actually come from Dell. It's an Intel processor and Intel chipset on an Intel motherboard, with Intel network controllers and management hardware. Dell's primary contribution to the server appears to be the cabinet, power supply, and RAID controller, plus its minimalist management tools. Simply put, it's little more than Intel parts stuffed inside a Dell box. There's not much more to say than that -- until we get to the cluster; see the "How I Tested" sidebar to this article at infoworld.com.
Running the Cluster
My attention then shifted to the cluster itself. Its hardware was already built before my testing; all I had to do was install the software and begin the calculations that made up my test workload. I performed these tasks using GUI and command-line utilities installed on the management server, including Felix and Ganglia. These applications are intimidating at first, but they are well documented and no more complex than other common Linux utilities.
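Felix's parallel remote execution amounts to fanning a single command out to every node at once. A minimal sketch of that pattern follows; this is not Felix's actual interface, the node names are hypothetical, and a local echo stands in for ssh so the sketch is self-contained.

```python
# Sketch of running one command across many cluster nodes in parallel,
# in the spirit of Felix's remote-execution feature. NOT Felix's real
# interface; node names are hypothetical and echo stands in for ssh.
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = [f"node{i:02d}" for i in range(1, 9)]  # node01 .. node08


def run_on_node(node, command):
    # A real cluster tool would invoke something like: ssh <node> <command>
    # Here we run a local echo so the example works anywhere.
    out = subprocess.run(
        ["echo", f"{node}: {command}"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
    results = list(pool.map(lambda n: run_on_node(n, "uptime"), NODES))

for line in results:
    print(line)
```

The thread pool issues all eight commands concurrently, which is why a tool like Felix scales to large clusters far better than a sequential shell loop would.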
During the testing, I deployed and executed several applications on the eight-node cluster, including Fluent, a commercial computational fluid dynamics package, as well as a proprietary in-house app used in the oil-and-gas industry, which I can't name for this article. This deployment demonstrated the completeness of Dell's integration, as well as the ease of operation of the HPC cluster.
The critical test was running the Linpack benchmark suite, a standard test used to determine the floating-point performance and efficiency of a server or cluster. The theoretical maximum of an individual 1.3GHz dual-processor PowerEdge 3250 server was 5.2 gigaflops; of the eight-server cluster, 41.6 gigaflops.
The Dell solution, using the Myrinet interconnects and the freeware Goto BLAS (basic linear algebra subprograms) library, was impressive: 4.5 gigaflops for one server (86 percent efficiency) and 37 gigaflops for the cluster (89 percent efficiency). Anything more than 75 percent is considered excellent for high-performance clusters. (According to Dell engineers, the extra cache within the Itanium 2 processor helps it achieve those numbers; clusters built on Intel's Xeon processor are typically around 60 percent efficient -- and of course, the Xeon can't natively handle 64-bit math in hardware.)
Dell has demonstrated innovation that, frankly, surprised this reviewer. By creating a packaged HPC bundle, based on its rather ho-hum Itanium 2-based PowerEdge 3250, the company has broken out of its traditional mass-market volume-based role. Dell has successfully applied the direct model to supercomputing, creating a system that's designed for rocket scientists, but doesn't require a rocket scientist to design, assemble, and use it.