The rapid expansion of high-performance computing (HPC) installations within government agencies, universities and the private sector is bringing more of the systems under the control of ICT departments in an effort to improve how they’re managed and reduce costs.
But the mainstreaming of HPC technology is causing a culture clash between ICT staffers and the researchers who use supercomputing systems and have typically run them independently.
For instance, Matthew LeGendre, who develops performance measurement tools on a high-performance server cluster for academic research use at the University of Wisconsin, says management of HPC systems by ICT is an issue of “convenience vs control”.
LeGendre says some of the HPC systems at the university are being supported by its ICT department, but the help comes with strings attached. For instance, if ICT managed the cluster that LeGendre uses, he wouldn’t be able to install new operating systems as he sees fit. “It’s one reason why we haven’t used our IT department to help us [with] support,” he says.
However, Sharan Kalwani, HPC infrastructure manager at General Motors, says supercomputing users and ICT staffers will have to learn to work together. “HPC, now that it has become mainstream, [should] also start acting like it’s mainstream,” he says. He adds that the benefits of adopting ICT processes in HPC environments include improved quality, lower costs “and actually more wide acceptance” of the technology within companies.
Goran Pocina, a technical adviser at a large pharmaceuticals maker, says his company has installed supercomputers at several sites worldwide. The systems are managed locally by groups of researchers that don’t share applications or processes with one another.
“The cost of maintaining this is tremendous,” Pocina says, adding that he thinks his company could improve researcher productivity and cut costs if ICT played a role in managing the supercomputers.
But the problem with putting ICT managers in charge of HPC systems is figuring out how to apply ICT disciplines and measurements “to a research community of users where quality isn’t measured by how stable the environment is but on how quickly it can adapt and change,” Pocina says.
Many of those who build and use supercomputers and HPC clusters live in a different world than mainstream ICT does, says Pocina.
Micah Nerren, a consultant at HPC services provider Mach1 Computing, says he often works as a go-between to bring together ICT managers and HPC groups that lack the management skills needed to run ICT operations and that may not know how best to integrate their machines with business systems. “You have to educate them a bit about how to coexist peacefully ... and educate IT [about] why this is a unique user,” Nerren says.
At the recent SC06 Supercomputing conference in Florida, Kalwani conducted a four-hour tutorial intended to give HPC users an idea of what to expect when working with their ICT departments. He reviewed ICT management basics, such as return on investment, service-level agreements and portfolio management. Kalwani also tried to prepare users in the audience for the cultural changes that can result from working with ICT.
ICT managers typically “want the lowest-cost solution, and that’s a battle you find starting from day one,” he says. ICT officials may also have trouble understanding some of the goals of researchers who use HPC systems, he noted. Many ICT managers, “despite the ‘T’ in IT, surprisingly are not technical,” Kalwani warned. “They’re almost bureaucratic.”
Irving Wladawsky-Berger, vice president of technical strategy and innovation at IBM, says that as HPC installations expand further and supercomputing technologies are increasingly used for commercial applications, CIOs will have to learn more about the systems.
“Traditional CIOs need more of the kinds of skills that before were only found in the HPC world,” such as an understanding of the mathematical approaches used in high-performance systems, Wladawsky-Berger says. He believes visualisation capabilities and other functionality used in research settings will increasingly migrate into e-commerce systems and other mainstream applications.
Vendors are also having to adapt to the mainstreaming of HPC. For many users, building high-performance computing systems has been largely a do-it-yourself operation. But now, HPC vendors are paying more attention to delivering out-of-the-box clusters in an effort to encourage wider adoption, especially among new users.
Longtime HPC users said at the SC06 conference that turnkey systems have always been available but that the increasing use of blade servers and other systems that can be easily integrated by vendors is facilitating the out-of-the-box trend.
Sun Microsystems, Silicon Graphics and Linux Networx are among the vendors offering turnkey systems. SGI says it will ship an integrated system with four quad-core Xeon processors in a single chassis in next year’s first quarter. Linux Networx introduced a series of ready-to-run HPC systems tuned for applications such as computational fluid dynamics and crash and impact analysis.
“From a cutting-edge perspective, it’s unclear whether or not any in-the-box solutions will maintain speed with the innovations,” says Terry McLaren, a programme manager for the cyber environments group at the US National Centre for Supercomputing Applications (NCSA). Nonetheless, the NCSA, which is located at the University of Illinois, is evaluating the turnkey systems because of their ease of use, McLaren says.
Roger Smith, a senior systems administrator at Mississippi State University’s High Performance Computing Collaboratory, recently installed a system consisting of 500 Sun Fire x2200 servers equipped with a total of 1,024 Opteron dual-core processors. Smith says the university opted for a prebuilt system developed through Sun’s Customer Ready Systems programme as part of a joint demonstration project.
The system was set up in a single day, Smith says. All that had to be added were some networking hookups that weren’t ready when it was delivered. Smith’s major concern was whether Sun would configure the system exactly as the university wanted it, but he says he visited a Sun facility in Oregon “to assure ourselves that they were going to do a good job”.
Hassan Assiri, director of high-performance computing at Seneca College in Toronto, says he expects that turnkey cluster users will have to pay extra for the systems. But, he adds, that might make economic sense compared with having to deal with multiple vendors or hire new staffers to do an installation.