I should be writing about AMD’s Quad-Core Opteron, which was formally launched on September 10, but I feel the need to take a brief detour into a point-by-point contrast between AMD’s latest offering and Intel’s new quad-core Xeon MP 7300-series CPU. The MP designates the CPU for use in four-socket servers, which brings up the first difference between Opteron and Xeon MP: Opteron scales up to eight sockets.
Intel certainly grabs your attention with its boast that quad-core Xeon 7300 performs at a nice, round 2X the speed of “prior generation” Xeon MP, yet reduces power consumption. The prior generation turns out to be Xeon 7100 MP, a dual-core CPU built with fatter transistors. Quad-core and process shrink brought Intel to the finish line. This muddy messaging doesn’t go over the heads of IT buyers, but X factors do get us press types excited.
It turns out that Quad-Core Opteron is more than two times faster than its dual-core predecessor, and Quad-Core Opteron saves power not through process shrink, but by turning off or dimming the lights on walkways and in individual rooms that aren’t being used.
The walkway is the memory bus on each CPU, and each CPU is a suite with many rooms: four cores, the floating-point unit within each core, the top 64 bits of a floating-point unit within a core, and so on. Quad-Core Opteron checks which rooms are occupied roughly two billion times per second without any help from the OS.
Intel’s new DHSI (Dedicated High Speed Interconnect) shifts the bottleneck from the connection between the CPUs and the north bridge (the external memory and I/O controller shared by all four CPUs) to the inside of the north bridge chip; picture four hoses feeding into a $10 lawn sprinkler. Intel’s designation for this topology is point-to-point.
AMD’s Direct Connect offers a different take on point-to-point. Print out a picture of any eight-socket Opteron motherboard. Now draw a line between any two sockets, then from any socket to any of four banks of memory adjacent to each socket, and repeat this exercise until your hand gets tired. All of those lines can carry on simultaneous conversations because Opteron has no north bridge that forces the least pleasant variety of convergence.
How can I make you care about the difference between Intel and AMD bus architectures? It really depends on your workload. If you run four or fewer processes of 8MB or less on your Xeon MP server, then you’re good to go. Otherwise, Opteron will prove more scalable.
Intel has raised the 7300 chipset’s maximum memory capacity to 256GB: a quarter terabyte, 16 cores, and one chip handling all the transfers.
Finally, with regard to the virtualisation enhancements in quad-core Xeon 7300, Intel’s contribution is external to the CPU. A chipset facility, VT-d (Virtualisation Technology for Directed I/O), transparently routes DMA (direct memory access) traffic between peripherals, such as disk and network controllers, and virtual machines. VT-d lets I/O bypass the CPU, OS, and virtualisation software.
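To make the routing idea concrete, here is a toy model of DMA remapping in Python. It is a sketch only: the names `DmaRemapper`, `assign`, and `translate` are hypothetical and do not correspond to any Intel interface, and the single contiguous window per device is an assumption made to keep the example short.

```python
class DmaRemapper:
    """Toy model: maps (device, guest address) to a host physical address."""

    def __init__(self):
        # device name -> (host base address, window size in bytes)
        self._windows = {}

    def assign(self, device, host_base, size):
        """Give a device a window of host memory that belongs to one VM."""
        self._windows[device] = (host_base, size)

    def translate(self, device, guest_addr):
        """Translate a DMA address issued by a device; reject anything out of range."""
        if device not in self._windows:
            raise PermissionError(f"{device} is not assigned to any VM")
        host_base, size = self._windows[device]
        if not 0 <= guest_addr < size:
            raise PermissionError(f"{device} attempted DMA outside its window")
        return host_base + guest_addr


remapper = DmaRemapper()
# Hypothetical assignment: nic0 serves a VM whose memory starts at 0x40000000.
remapper.assign("nic0", host_base=0x4000_0000, size=0x1000_0000)
host_addr = remapper.translate("nic0", 0x2000)
```

The point of the sketch is that the lookup sits between the device and memory, so neither the OS nor the virtualisation software has to touch each transfer.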
AMD has published its spec for an IOMMU (I/O memory management unit) that will serve an identical purpose, but AMD’s IOMMU was not realised in Quad-Core Opteron. Intel scores a legitimate win with VT-d, and I can attest that it is the one enhancement that all virtualisation solution vendors wanted most.
Quad-Core Opteron does implement a DEV (Device Exclusion Vector) facility that blocks access to a peripheral if a virtual machine is not authorised to use it. Seen another way, the DEV could be used to grant safe, exclusive peripheral access to a single VM. That’s short of an IOMMU, but it would relieve some of the software burden of linking devices to virtual machines.
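The exclusion idea can be sketched as a bit vector per virtual machine. This is a simplified model that follows the description above rather than AMD’s specification; the class `DeviceExclusionVector` and its methods are hypothetical names invented for the illustration.

```python
class DeviceExclusionVector:
    """Toy model: one bitmask per VM, one bit per device; a set bit means 'excluded'."""

    def __init__(self, devices):
        self._bit = {name: 1 << i for i, name in enumerate(devices)}
        self._all = (1 << len(devices)) - 1  # every device excluded
        self._excluded = {}                  # vm id -> bitmask of excluded devices

    def grant_exclusive(self, vm_id, device):
        """Let one VM use a device and exclude that device for every other known VM."""
        for other in self._excluded:
            self._excluded[other] |= self._bit[device]
        self._excluded.setdefault(vm_id, self._all)
        self._excluded[vm_id] &= ~self._bit[device]

    def may_access(self, vm_id, device):
        """Unknown VMs are treated as excluded from everything."""
        mask = self._excluded.get(vm_id, self._all)
        return not (mask & self._bit[device])


dev = DeviceExclusionVector(["nic0", "disk0"])
dev.grant_exclusive("vm1", "nic0")
dev.grant_exclusive("vm2", "disk0")
```

In this model a VM starts with every device blocked, so access is something that has to be granted rather than something that has to be revoked.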
Intel’s VT-d holds the promise of virtual machines that have near-native I/O performance for asynchronous devices, but it has a practical limit to its scalability. If you pack a Xeon 7300 with its maximum 256GB of memory, then carve that into 32 spacious virtual machines, you can’t give each VM its own 10 Gigabit Ethernet card. I can see VT-d consolidating two Oracle servers into a single Xeon server, giving each VM the physical peripherals it had as a discrete server, without a drop in I/O performance.
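The arithmetic behind that limit is easy to check. In the sketch below the memory and VM counts come from the example above, while the number of expansion slots is purely an assumed figure for illustration; substitute the count for whatever server you have in mind.

```python
TOTAL_MEMORY_GB = 256   # Xeon 7300 chipset maximum, as stated above
VM_COUNT = 32           # virtual machines in the example

# Assumption for illustration only: slots free for dedicated network cards.
ASSUMED_EXPANSION_SLOTS = 7

memory_per_vm_gb = TOTAL_MEMORY_GB // VM_COUNT
vms_with_dedicated_nic = min(VM_COUNT, ASSUMED_EXPANSION_SLOTS)
vms_without_dedicated_nic = VM_COUNT - vms_with_dedicated_nic

print(f"{memory_per_vm_gb} GB per VM")
print(f"{vms_without_dedicated_nic} of {VM_COUNT} VMs must share a card")
```

Memory divides evenly among the virtual machines; physical peripherals do not, and that is where directed I/O stops scaling.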
If you want maximum virtual machine density, or you’re running a server that’s virtualising desktops or performing streaming operations that call for low-latency switching between virtual machines, that’s AMD’s forte.
AMD’s Rapid Virtualisation Indexing restructures a server’s virtual-to-physical memory map with every “world switch” from one virtual machine to another. This is a trick that virtualisation solutions must currently do in software, adding significant latency to the period between virtual machine switches.
Rapid Virtualisation Indexing reduces the process of loading and saving virtual memory maps to one step handled directly by the CPU. All the virtualisation software needs to do is tell Quad-Core Opteron which virtual machine ID is about to take control, and it’s done.
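The difference between the two approaches can be sketched with a toy model. Neither class below reflects how a real hypervisor or the CPU is implemented; `SoftwareSwitcher` and `IndexedSwitcher` are hypothetical names, and Python dictionaries stand in for page tables. The software path rebuilds the active map entry by entry on each world switch, while the indexed path only selects an existing map by virtual machine ID.

```python
class SoftwareSwitcher:
    """Rebuilds the active virtual-to-physical map on every world switch."""

    def __init__(self, vm_maps):
        self._vm_maps = vm_maps
        self.active = {}
        self.entries_copied = 0

    def world_switch(self, vm_id):
        self.active = {}
        for virt, phys in self._vm_maps[vm_id].items():
            self.active[virt] = phys
            self.entries_copied += 1


class IndexedSwitcher:
    """Selects a ready-made map by VM ID; nothing is copied."""

    def __init__(self, vm_maps):
        self._vm_maps = vm_maps
        self.active = {}
        self.entries_copied = 0

    def world_switch(self, vm_id):
        self.active = self._vm_maps[vm_id]


# Two hypothetical VMs with 256 mapped pages each.
vm_maps = {
    1: {page: page + 1000 for page in range(256)},
    2: {page: page + 5000 for page in range(256)},
}
software = SoftwareSwitcher(vm_maps)
indexed = IndexedSwitcher(vm_maps)
for vm_id in (1, 2, 1):
    software.world_switch(vm_id)
    indexed.world_switch(vm_id)
```

Both switchers end up with the same active map; the difference is the work done to get there, which in the software case grows with the size of each virtual machine’s map and with every switch.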