FRAMINGHAM (10/02/2003) - John Sawyer manages data centers for corporate clients every day. His company, Johnson Controls Inc., has plenty of experience in data center design and management. Nonetheless, the Milwaukee-based company's carefully designed and planned data center recently experienced overheating problems after blade servers were installed.
Many data center managers are just beginning to contemplate large-scale deployments with multiple racks of ultracompact blade servers. These new systems take up far less space than traditional rack-mounted servers, but they dramatically increase heat density. Throwing multiple racks of them into a data center can result in problems ranging from outright failures to unexplained slowdowns and shortened equipment life.
"Today, because of the way the air handlers are configured, we can't handle more than 2 kilowatts per rack," says Sawyer, head of critical facility management services at Johnson Controls. Sawyer says new air-handling equipment can boost that figure into the 3-to-4-kw range. But new blade servers could consume 15 kw per hour or more when fully loaded. That equates to more British thermal units per square foot than a typical household oven and requires a cooling capacity sufficient to air-condition two homes, facilities engineers say. So Sawyer can spread out the racks or partially fill each one to reduce overall wattage per square foot, or he can add localized, spot-cooling systems.
Although most data centers don't have many high-density racks today, data center managers are beginning to replace server racks with more compact designs, some of which accommodate more than 300 servers in a single 42U rack. (1U is 1.75 in.) "You can see a train wreck coming," says Kenneth Brill, executive director at The Uptime Institute Inc. in Santa Fe, N.M.
And while vendors say their systems are designed to run efficiently in fully loaded racks, they don't necessarily take into account the broader impact that large numbers of such racks will have on the rest of the data center.
"We can deal with one or two of these things, but we don't know how to deal with lots of them," says Brill.
The problem is compounded by two facts: Every data center is designed differently, and the industry has yet to agree on a standard for designing data center cooling systems that can handle 15 to 20 kw per rack.
The current guidelines from the American Society of Heating, Refrigerating and Air-Conditioning Engineers Inc. (ASHRAE) are outdated, says Edward Koplin, president of Jack Dale Associates PC, an engineering consulting firm in Baltimore. "Design engineers are using standards from the days of punch cards and water-cooled mainframes," he says. Atlanta-based ASHRAE is working hard on new thermal guidelines, says Don Beaty, chair of the group's High-density Electronic Equipment Facility Cooling Committee. He expects a published standard by year's end.
But rack cooling is a big concern right now for Ron Richardson, staff manager in the IT operations center at Qualcomm Inc. in San Diego. The company's chip division wants to upgrade a large engineering compute farm to make the best use of its design software. "If our business unit had its way, they would want 100 racks full of (blade) servers, but I don't think we'll be able to do that. Fifteen kilowatts is incredible. You have to do extraordinary things to cool that unless you want to burn them up," he says. And the idea of leaving racks partially filled is a nonstarter. "Who wants to put servers in (part of) the rack and leave the rest empty?" he says.
Some heat-related problems come from inadequately designed environments, say engineers. "We find that people consistently have trouble cooling 2- or 3-kw racks. It's common to find a 10-degree difference between the floor and the top," says Pitt Turner, principal at ComputerSite Engineering Inc., also in Santa Fe, and a consultant at The Uptime Institute. The problem is usually airflow. "Normally, we find excess (cooling) capacity is installed in the floor, and it's poorly used," he says.
Avoiding a Blowout
When rack-top temperatures exceed 75 degrees Fahrenheit, heat-related problems, such as failures or shortened equipment life, may begin to crop up. In addition, many high-end blade processors are designed to reduce clock speed as temperatures rise. This protects components, but administrators who aren't monitoring air-intake temperature at the top of the racks might misinterpret the cause and add more blades to try to increase performance, adding fuel to the not-so-proverbial fire, Brill says.
Richardson decided to design a new data center to accommodate blade servers. "We believe we can do 10 kw per rack," he says. The plan uses a hot aisle/cold aisle design, in which rows of racks are arranged so that the fronts of the racks face one another across a cold aisle and the backs face one another across a hot aisle. Chilled air from the cold-aisle floor passes into each rack and out the back into hot-air aisles, where it's removed and cooled again.
Going beyond that would require adding spot cooling devices to individual racks. Liebert Corp., a Columbus, Ohio-based division of Emerson Network Power Systems Inc., offers devices designed to pull hot air out of racks and cool it quickly, including ceiling-mounted air-conditioning units and bolt-on exhaust fans that suck hot air directly off the back of racks. The problem with that approach is scalability, says Brian Benson, senior mechanical engineer at consulting firm Mazzetti & Associates Inc. in San Francisco. "If you put 500 of those racks in, your scalability and the maintenance (requirement) become incredible," he says.
Consultants disagree on how many kilowatts can be cooled effectively with a traditional, raised-floor computer room air-conditioning system. Most say 4 to 5 kw is the upper limit, but the real-world answer depends on each data center's design.
Each kilowatt of load requires passing 140 cubic feet per minute of air through the rack for proper cooling, says Fred Stack, a vice president at Liebert. "In tomorrow's rack, you're looking at (more than) 1,000 cubic feet of air per minute," he says.
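Stack's rule of thumb can be turned into a quick airflow estimate. The sketch below uses his 140-cubic-feet-per-minute-per-kilowatt figure directly. The implied temperature rise is an added assumption: it relies on the standard-air sensible heat relation (BTU/hr = 1.08 x CFM x temperature rise in degrees Fahrenheit), which is not stated in the article, and the function names are hypothetical.

```python
CFM_PER_KW = 140.0             # Stack's rule of thumb, from the article

# Assumptions not taken from the article:
BTU_PER_HOUR_PER_KW = 3412.14  # standard kW-to-BTU/hr conversion
SENSIBLE_HEAT_FACTOR = 1.08    # standard-air constant in BTU/hr = 1.08 * CFM * delta_T


def required_cfm(rack_kw):
    """Airflow through the rack, in cubic feet per minute, at 140 CFM per kW."""
    return rack_kw * CFM_PER_KW


def implied_temp_rise_f(cfm_per_kw=CFM_PER_KW):
    """Front-to-back air temperature rise (deg F) that the rule of thumb implies."""
    return BTU_PER_HOUR_PER_KW / (SENSIBLE_HEAT_FACTOR * cfm_per_kw)


for kw in (2, 4, 10, 15):
    print(f"{kw:>2} kw rack: {required_cfm(kw):,.0f} CFM")

print(f"Implied air temperature rise: {implied_temp_rise_f():.1f} deg F")
```

By this estimate, any rack drawing more than about 7 kw passes the 1,000-cubic-feet-per-minute mark Stack mentions, and a 15-kw rack needs roughly twice that.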
That's solvable, says Brill, but there's no margin for error. "You absolutely have to do about a dozen things correctly, and most sites aren't going to do that," he says. That includes adequate under-floor space that's sealed and clear of cabling, pipes and other obstructions as well as racks that are meticulously sealed to control airflow.
Buying extra cooling capacity can be just as disastrous as an undersized system if it pushes the air under the floors too quickly, says Neil Rasmussen, chief technology officer at American Power Conversion Corp. in Kingston, R.I. "The fact that it's flowing quickly under the floor actually causes a Venturi effect that sucks air down into the floor instead of pushing it up into the cold aisle," he says. Air from the hot aisle then flows over the top of the racks and recirculates through the equipment, overheating it.
Bob Sullivan, a consultant at ComputerSite Engineering, says he has cooled 7-kw racks. But, he says, "it requires so much air coming out from under the floor that you're limited to cooling about six racks with a large cooling unit." Overall load across the data center still needs to stay under 100 watts per square foot, he says. "Beyond that, you're losing more space to cooling than you're saving by compressing the racks," he says.
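Sullivan's two limits lend themselves to a back-of-the-envelope floor-space check. The sketch below is only an illustration: it treats the 100-watts-per-square-foot figure as a hard ceiling averaged over the whole room, applies the six-racks-per-cooling-unit figure only to the 7-kw racks he describes, and uses hypothetical function names.

```python
import math

MAX_WATTS_PER_SQ_FT = 100.0       # Sullivan's overall ceiling, from the article
RACKS_PER_LARGE_COOLING_UNIT = 6  # Sullivan's figure for 7-kw racks, from the article


def floor_area_needed_sq_ft(rack_count, rack_kw):
    """Minimum room area that keeps average load at or under the ceiling."""
    total_watts = rack_count * rack_kw * 1000.0
    return total_watts / MAX_WATTS_PER_SQ_FT


def cooling_units_needed(rack_count):
    """Large cooling units needed for 7-kw racks at about six racks per unit."""
    return math.ceil(rack_count / RACKS_PER_LARGE_COOLING_UNIT)


print(floor_area_needed_sq_ft(1, 7))   # area one 7-kw rack accounts for
print(floor_area_needed_sq_ft(20, 7))  # area for twenty 7-kw racks
print(cooling_units_needed(20))        # cooling units for those twenty racks
```

On these numbers, each 7-kw rack accounts for about 70 square feet of room area, which illustrates Sullivan's point that denser racks do not necessarily shrink the data center.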
Meanwhile, blade vendors Hewlett-Packard Co., Sun Microsystems Inc. and IBM say that blades will get smaller and more powerful. IBM, which has been down this road before with mainframes, says cooling may be going back to the future. "We don't want to move to water cooling, but it feels like it's inevitable. By the end of next year, we'll be on the brink of that," says Jeff Benck, vice president of IBM's eServer BladeCenter. And by 2005, Benck says, external water systems that cool at the individual blade or blade chassis level will be required equipment.
HP's current designs use blade-mounted fans, and the company says it should be able to cool individual racks at up to 18 kw. However, Sally Stevens, director of marketing for blades at HP, doesn't recommend deploying blades in fully loaded racks.
Sun owns the Sparc CPU used in its blades and therefore has more control over the design, but Chief Technologist Subodh Bapat says the company is also looking toward liquid cooling.
As server density increases, Johnson Controls' Sawyer worries that more complex cooling designs will increase costs and create ongoing maintenance headaches. "We need an alternative that's going to be feasible and at the same time not elevate risk," he says.
Qualcomm's Richardson says he has discussed the issue with several vendors. "Every one has some little tidbit that you can learn, but nobody seems to have every answer."
Hot Tips For Keeping Racks Cool
Consider spreading out hot racks to reduce average heat density per square foot.
Map the vertical profile of intake air temperatures in the computer room at least once a quarter. Keep temperatures at or below 75 degrees.
Use a hot aisle/cold aisle configuration and follow best practices.
Install internal blanking panels to fill openings within the face of the rack or cabinet to prevent internal back-to-front hot-air recirculation.
Keep bypass airflow under raised floors at 10 percent or below by sealing raised floor openings.
Install the proper quantity of perforated tiles in the cold aisle. Use 40 percent or 60 percent open grates for densities above 3 kw per cabinet.
Remove turning vanes on cooling-unit discharge air ducts. These increase vertical cold-air velocity and starve equipment closest to the cooling unit.
Raise cooling unit return-air set points to 72 degrees to reduce dehumidification and increase available cooling capacity. (Do this only after bypass airflow is at or below 20 percent.)
Consider relocating cooling units if hot spots persist.
Keep that old mainframe water-cooling system. Chilled water will be coming back within 12 to 24 months for high-end blade-server products.
Source: The Uptime Institute Inc., Santa Fe, N.M.
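The quarterly temperature mapping recommended in the tips above could be recorded and checked with something as simple as the following sketch. The 75-degree limit comes from the tips; the data layout, the function name and the sample readings are hypothetical and only illustrate the idea.

```python
MAX_INTAKE_F = 75.0  # intake-air limit from the tips above


def check_rack_profile(readings_f):
    """Summarize one rack's vertical intake-air profile.

    readings_f maps a position label to an intake temperature in deg F,
    listed from the bottom of the rack to the top.
    """
    temps = list(readings_f.values())
    hot_positions = [pos for pos, temp in readings_f.items() if temp > MAX_INTAKE_F]
    return {
        "hot_positions": hot_positions,
        "bottom_to_top_rise_f": temps[-1] - temps[0],
    }


# Hypothetical readings for one rack, bottom to top.
sample = {"bottom": 66.0, "middle": 70.0, "top": 77.0}
print(check_rack_profile(sample))
```

A large bottom-to-top rise, such as the 10-degree difference Turner describes, suggests an airflow problem rather than a shortage of cooling capacity.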