Parallelism: where the x86 hits the wall

Each CPU core should be an execution unit, says Tom Yager

Your desktop computer is fast. It’s faster than you can type, faster than you can browse, and unlike you, it can do many things at once. Sure, you multitask. You can be on a conference call with your boss while you’re buffing your nails, but when you’re asked a hard question, what happens? You stop buffing your nails until you come up with the answer. Humans are not wired for parallel execution.

At the lowest level, computers are — or they were. The term “superscalar” was coined to describe the CPU’s ability to do at least two things in parallel. An x86, for example, can load and store memory; perform integer operations such as add, compare and subroutine call; and do floating-point math, all at the same time. It predicts what your software will do next and optimises your code inside the CPU so that it can keep all of its internal execution and load/store units constantly busy.

That science is more than enough for your desk or your lap. In fact, it’s overkill; humans may seem impatient, but we’re willing to wait between tens of milliseconds and several seconds for answers from our computers. Interactive use of a desktop computer is a best-case scenario that plays to the strengths of superscalar designs, large on-chip caches and ultra-fast buses.

But the weaknesses of the x86 approach to superscalar operation are starting to show. Professional workstation and server buyers who look to x86 systems to replace RISC machines have high expectations that include true parallel operation. In science and technology, creative professions and software development, for example, high-end client systems should be able to parallelise their way through heavy-lifting tasks while leaving enough power for real-time foreground interaction. Likewise, buyers at the high end expect to be able to mix compute-intensive and I/O-intensive server applications, along with multiple virtual machines, without sacrificing smooth and balanced operation of all tasks. When these buyers double the number of server CPUs, they expect a server’s total performance to rise on a near-linear scale.

If RISC users came to PCs with those expectations, they’d walk away disappointed. While modern x86 server and workstation CPUs are outfitted for parallelisation at the core level, PCs’ intra-CPU communication, processor support components, memory, peripherals, the host operating system, the VMM (virtual machine monitor), the guest operating system, device drivers and applications spin a web of interdependencies. That web requires, at times, that execution or I/O follow a specific path, even if sticking to that path means standing idle for cycles at a time. The result: You buy more high-end x86 systems than you should have to.

Although I am impressed by multicore x86 efforts, I wish Intel and AMD would put as much sweat into holistic platforms that take architecture up a notch and make each CPU core an execution unit. It’s a target that, to this point, only proprietary systems can hit. x86 chipmakers face a challenge that IBM and Sun do not, namely zero control over software and hardware. An x86 CPU and its surrounding architecture must be ready to run system software (OS, drivers, and VMMs) coded for the least capable platform and every peripheral on the market.

There is a fix. The very technology that breaks x86’s parallelism will empower total system designs built for parallel execution: virtualisation. The catch is that it needs to be done in hardware, and for that, all my money’s on AMD.
