AMD can't say it, but Istanbul, its six-core, 45-nanometer processor, is ready. (Officially, it's set to launch in the second half of this year.)
Istanbul is a drop-in replacement for the quad-core Shanghai, meaning that existing AMD servers can get a 50% boost in workload capacity per square foot, per BTU, per decibel, per kilowatt. Istanbul is destined to become the immediate darling of high-performance computing clusters, science and technology, research, entertainment, and high-end shops that want an affordable, secure, broadly scalable, future-proof architecture. In the current climate, where smaller IT budgets must be allocated with far greater care, wise strategy requires taking cues from shops that impose the highest standards on their equipment.
Istanbul scales up, compactly and efficiently, to an astonishing 48 cores in a single chassis. The interconnect logic is built into all Opteron processors, so a 48-core, eight-socket system design is barely more complex than a 24-core, four-socket server. This has a direct bearing on IT: Fewer parts mean lower acquisition cost, faster availability of new systems, reduced likelihood of failure, and much lower parts inventory costs. What the most demanding server customers want should be a template for IT evaluation criteria. The size of the organisation is irrelevant if the cost scales with the capacity of the server.
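The core counts quoted above follow from simple arithmetic, sketched here in Python. The function names are illustrative, and the sketch assumes that capacity tracks core count, which real workloads may not do exactly.

```python
# Illustrative arithmetic only; the per-socket core counts come from the text.
SHANGHAI_CORES = 4   # quad-core Shanghai
ISTANBUL_CORES = 6   # six-core Istanbul


def total_cores(sockets: int, cores_per_socket: int) -> int:
    """Total cores in a chassis with the given number of populated sockets."""
    return sockets * cores_per_socket


def upgrade_gain(sockets: int) -> float:
    """Fractional increase in core count from a Shanghai-to-Istanbul swap."""
    before = total_cores(sockets, SHANGHAI_CORES)
    after = total_cores(sockets, ISTANBUL_CORES)
    return (after - before) / before


if __name__ == "__main__":
    for sockets in (2, 4, 8):
        before = total_cores(sockets, SHANGHAI_CORES)
        after = total_cores(sockets, ISTANBUL_CORES)
        print(f"{sockets}P: {before} -> {after} cores (+{upgrade_gain(sockets):.0%})")
```

The gain is the same 50% at every socket count, because the swap changes only the cores per socket.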
In IT, Istanbul is more likely to be seen in two- and four-socket (2P and 4P) rack servers and workstations. I think that the sweet spot for Istanbul servers will be 4P. There, Opteron system design innovation becomes highly relevant. For example, the boosted HyperTransport 3 (HT3) bus will create an enormous amount of headroom even in 8P servers.
HyperTransport is every Opteron server's nervous system; all communications among processors and with peripherals travel over the HT bus. Multiple HT bus controllers on each processor create direct connections between processors, vital in a NUMA (non-uniform memory access) architecture where each processor has its own RAM. HT3 multiplies the bandwidth of the on-processor buses and of the buses between cores within a processor, so that requests for other processors' memory are fulfilled at very little additional cost. HyperTransport has been implemented consistently, according to published and licensable specifications, since the first Opteron in 2003. HT3 boosts not only speed but also scalability and power utilisation control.
Even with direct connections between processors, AMD saw an opportunity to speed up the architecture and vastly reduce bus traffic. My favorite new Istanbul feature is dubbed HT Assist. Each Istanbul processor has two levels of cache for each core, plus one level of cache shared by all of the cores in a processor. The purpose of cache is to avoid trips to comparatively slow RAM when the same data is requested more than once. Trouble enters the picture on a multiprocessing system where all processors have the freedom to change data anywhere in the system's memory. It's possible that processor A could change data that resides in processor B's cache. Processor B has no way to know about the change, so it ends up reading invalid data from its cache. In Opteron systems, processors work cooperatively to read and ensure the validity of all processors' caches.
Because AMD's HyperTransport bus is so fast, this constant cache checking imposes very little overhead. But the cache checking ("probing") traffic rises with the number of cores in the system. An 8P system running six-core Istanbul has to track 48, 96, or 144 caches, depending on how you count them. Istanbul's solution, HT Assist, uses the recently added Level 3 cache to maintain a map of the data held in all of the cores' caches in a given processor.
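To see why a map of cached data cuts probe traffic, here is a toy Python model that counts probes under two policies: broadcasting to every other socket on each write, or consulting a directory and probing only the sockets known to hold the line. This is a sketch of the general idea only, not AMD's implementation. The class `ToySystem`, the function `run`, and the workload are all hypothetical, and the model ignores timing, evictions, and the per-core cache levels.

```python
from collections import defaultdict

SOCKETS = 8  # an 8P system, as in the text


class ToySystem:
    """Toy model: sockets cache whole lines; no timing, no evictions."""

    def __init__(self, sockets, use_directory):
        self.sockets = sockets
        self.use_directory = use_directory
        # line address -> set of sockets currently holding a cached copy
        self.holders = defaultdict(set)
        self.probes_sent = 0

    def read(self, socket, line):
        """A socket reads a line and keeps a cached copy."""
        self.holders[line].add(socket)

    def write(self, socket, line):
        """A socket writes a line; copies held elsewhere become stale."""
        if self.use_directory:
            # Directory policy: probe only sockets recorded as holders.
            targets = self.holders[line] - {socket}
        else:
            # Broadcast policy: probe every other socket, holder or not.
            targets = set(range(self.sockets)) - {socket}
        self.probes_sent += len(targets)
        # After the write, only the writer holds a valid copy.
        self.holders[line] = {socket}


def run(use_directory):
    """Run a made-up workload and return the number of probes sent."""
    system = ToySystem(SOCKETS, use_directory)
    # Each socket reads and writes 100 lines that nobody else touches.
    for s in range(SOCKETS):
        for i in range(100):
            system.read(s, (s, i))
            system.write(s, (s, i))
    # One genuinely shared line: socket 0 reads it, then socket 1 writes it.
    system.read(0, (1, 0))
    system.write(1, (1, 0))
    return system.probes_sent


if __name__ == "__main__":
    print("broadcast probes:", run(use_directory=False))
    print("directory probes:", run(use_directory=True))
```

In this contrived workload nearly all data is private to one socket, so the directory policy sends a probe only for the single shared line, while the broadcast policy probes seven sockets on every write. Real savings depend on how much data is actually shared.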
I can't estimate the performance implications of HT Assist in systems with two processors. The HyperTransport bus is so fast that HT Assist might go unnoticed in a 2P machine with 16GB of memory. By contrast, 4P and 8P Istanbul servers with huge amounts of memory and heavy workloads could see dramatic improvements in memory performance, which will be experienced as an overall increase in throughput. In all cases, if HT Assist works, the throughput increase will exceed the expectations set by a 50% increase in the number of cores. I can imagine certain scenarios, such as OS reservation of free memory as disk cache or the creation of a RAM disk for acceleration, where HT Assist could make an Istanbul server blast off.
Another element that becomes more complex as the number of cores increases is parametric monitoring and control of each core, especially with regard to power control. A new bus, managed by a command engine that runs in microcode, streamlines the reading of commonly requested system information, such as thermal sensors and processor power states, and provides a standardised means of transmitting commands for things like power capping.
Istanbul continues the Opteron tradition of processor-managed power conservation, using mechanisms that are much more precise than those in Windows. This is because Windows has only a macro view of system load, while each Opteron processor knows exactly how much work is being done by each core. One of the most important lessons I learned during my meetings with AMD was simple, yet vital. To make hardware-managed power conservation work, you need to go into Windows' power settings and select "Minimal power management." The same is true of late-model desktops, especially those based on the Phenom II processor, which has quite remarkable power management capabilities. The ultimately green Istanbul server will be built with HE CPUs, which use only 55 watts of power per processor.
AMD consistently achieves substantially higher per-clock performance than Intel, so the clock speed drop required to get to 55 watts might not hurt as much as you think. Opterons at a range of power levels were available before IT painted itself green. Now would be a good time to look into them, and remember that you can upgrade or downgrade AMD Opteron CPUs by buying retail boxed processors and installing them yourself. If you can change a lightbulb and brush your own teeth, you can swap Opteron processors. Or pay to have it done. It's a cheap way to turn an eight-core Shanghai server into a twelve-core Istanbul machine, without migrating data, ripping out cables, or undergoing, as they say, the "forklift upgrade." This is one of the key lessons to take from high-performance computing and sci/tech shops. Why buy new servers when you can buy new CPUs?