The business benefits arising from Moore’s Law, which says the number of transistors on a chip will double about every two years, are being turned on their head by the cost of providing power, cooling and other facility support for servers. Those costs now exceed the price of the computing hardware, according to Ken Brill, founder and executive director of The Uptime Institute. In a recent interview, he talked about those escalating costs and outlined what ICT managers can do to improve datacentre energy efficiency, including the elimination of dead servers and more efficient cooling.
What’s the biggest threat facing datacentres?
The economic breakdown of Moore’s Law.
What do you mean by that?
Historically, facilities costs have been 3% of ICT’s total budget, but the economic breakdown of Moore’s Law means that facilities costs (including power consumption) are going to be climbing to 5%, 10% and higher. And that will change the economics of ICT. The business question becomes: will ICT get more money so the increasing portion of the budget that facilities represents doesn’t crowd out other ICT initiatives? Or will the increasing facilities percentage result in curtailing other things that ICT is doing? That’s the economic truncation of Moore’s Law.
Can you illustrate that?
A company [citing an unnamed client as an example] is going to implement a blade server application, for instance, that requires US$22 million (NZ$33 million) of hardware. The business justification, based on US$22 million of hardware, is that you expense it over three years and there is a positive cash flow. What’s missing from the justification is the US$54 million in facility costs [over three years].
It was not a US$22 million decision; it was a US$76 million decision. Infrastructure upgrades include datacentre build-out, cooling and electric capacity to support the hardware and network. It’s an invisible price. Those expenses don’t typically show up in ICT. They show up indirectly or they show up after the fact. When the decision was made to implement the blade servers, the facility people were not at the table.
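The arithmetic behind that example can be sketched in a few lines. This is only an illustration using the two figures quoted above; the function and variable names are made up for the sketch and do not come from The Uptime Institute.

```python
# Figures quoted in the interview, in millions of US dollars over three years.
HARDWARE_COST_M = 22
FACILITY_COST_M = 54


def full_decision_cost(hardware_m, facility_m):
    """Return the total cost the business case should be judged against."""
    return hardware_m + facility_m


total_m = full_decision_cost(HARDWARE_COST_M, FACILITY_COST_M)
facility_share = FACILITY_COST_M / total_m

print(f"Total decision cost: US${total_m} million")
print(f"Facility share of total: {facility_share:.0%}")
```

On these numbers the facility costs are the larger part of the decision, which is why a justification built on hardware alone overstates the return.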
What’s the business cost?
The business cost is that the return on investment that people think they are going to get is not going to be there.
Is there a way to get business and facilities representatives involved in this?
The application justification process needs to change so it includes all the costs. Typically, you are looking at just the ICT cost of the hardware and the cost of running that hardware.
The larger and denser servers aren’t going away, so how do companies change the economics of this?
First, when buying equipment [don’t] look only at performance-per-dollar but look at performance-per-watt. Be sharper on buying. ICT has to become conscious of this energy efficiency and put pressure on the manufacturers to be more energy aware. That’s going to benefit everybody in the long term.
Second, kill dead servers — servers that are still running but not actively doing anything.
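To make the first point concrete, here is a minimal sketch of how ranking by performance-per-watt can differ from ranking by performance-per-dollar. The two server models, their prices, throughput scores and power draws are entirely hypothetical numbers chosen for illustration; they are not from the interview or any vendor.

```python
from dataclasses import dataclass


@dataclass
class ServerOption:
    name: str
    price_usd: float
    throughput: float   # benchmark score in arbitrary units (hypothetical)
    power_watts: float  # typical power draw under load (hypothetical)

    @property
    def perf_per_dollar(self):
        return self.throughput / self.price_usd

    @property
    def perf_per_watt(self):
        return self.throughput / self.power_watts


# Hypothetical candidates: A is cheaper, B draws less power.
options = [
    ServerOption("Model A", price_usd=4000, throughput=1000, power_watts=500),
    ServerOption("Model B", price_usd=5000, throughput=1100, power_watts=350),
]

best_by_dollar = max(options, key=lambda s: s.perf_per_dollar).name
best_by_watt = max(options, key=lambda s: s.perf_per_watt).name

print("Best performance-per-dollar:", best_by_dollar)
print("Best performance-per-watt:  ", best_by_watt)
```

With these made-up inputs the two metrics pick different machines, which is the situation a buyer looking only at purchase price would miss.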
Are dead servers really an issue?
From 10% to 30% of the load in a datacentre is represented by servers that aren’t doing anything. By turning off those servers, you can cut your energy consumption. The problem is that there is risk, but no incentive, in turning those servers off.
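As a rough sketch of what that range could mean in energy terms, the snippet below applies the 10% and 30% figures to a hypothetical 1,000 kW server load. The load figure is an assumption for illustration only, and the calculation assumes the dead servers run around the clock and ignores any knock-on cooling savings.

```python
HOURS_PER_YEAR = 24 * 365


def annual_energy_saved_kwh(it_load_kw, dead_fraction):
    """Energy saved per year by switching off the idle share of the load."""
    return it_load_kw * dead_fraction * HOURS_PER_YEAR


it_load_kw = 1000.0  # hypothetical datacentre server load, not from the interview

low = annual_energy_saved_kwh(it_load_kw, 0.10)   # 10% of load is dead servers
high = annual_energy_saved_kwh(it_load_kw, 0.30)  # 30% of load is dead servers

print(f"Estimated saving: {low:,.0f} to {high:,.0f} kWh per year")
```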
The incentive to turn off unused servers would seem apparent. These costs aren’t linked. Who has to turn the server off?
The datacentre manager. He’s measured on availability; he’s not measured on costs. You discover the 10% to 30% of dead servers whenever you move a datacentre because that’s the only time you have to turn stuff off.
Other things that users can do include consolidating multiple servers onto a bigger platform, which will be more energy efficient. [And] ICT can enable the power-saving features that are now built into many new servers.
For instance, a laptop comes set [to] not take advantage of power-saving. If you are not using the laptop you turn off the screen, then you turn off the disk. The chip manufacturers, AMD and Intel, have these features built into their chip set, but the default is off. However, this again involves risk, because someone has to evaluate whether the server/chip will come back up to full speed fast enough to meet the service level agreement for the application. That evaluation falls to the technical group.
Finally, ICT managers can reduce bloatware: software with inefficient code that requires a bigger processor to get through it.
And what about cooling?
Most datacentres are consuming from 20% to 40% more energy than they should because the cooling systems are not well optimised. For instance, here is a common issue in a computing room with multiple cooling units: if you go up to the face plate of the cooling unit, you may see that one unit is dehumidifying and the unit immediately adjacent to it is humidifying, so you have duelling cooling units.
In terms of cooling, what issues do users have with vendors? Are there standards problems? How mature is the technology?
In 2000, at 500 watts to 1,000 watts per cabinet, you [could] do anything and successfully cool it. You could be totally incompetent in your engineering and you could successfully cool it.
You may not have done it energy efficiently, but that was never measured, so nobody knew how badly it was done.
As the density-per-cabinet increases, the mask is ripped off and a user’s responsibility for [dealing with] the engineering in the computer room becomes apparent.
For computer rooms with raised floors, the institute has promoted hot aisles and cold aisles for over ten years. It’s accepted as an optimal solution for up to 3kW to 4kW in a cabinet.
But you go into computer room after computer room and you see that the equipment is lined up facing one direction. As a result, people have hot spots. And if you have hot spots, you go out and buy more air-conditioning.