What are you personally most proud of with the launch of Woodcrest?
The Core microarchitecture that’s at the heart of Woodcrest is not the performance king like the 486 was in its day [and] it’s not the platform king like the Pentium Pro was in its day — this is the energy-efficient king. It really is this incredibly well-tuned machine of trade-offs of power and performance.
What could you have done better with Woodcrest?
The FB-DIMMs’ [fully buffered dual inline memory module] power was over budget, and that was disappointing. So this tremendously good processor is making up for a bit of weakness in the power of the [FB-DIMMs’] subsystem. We’ll get it fixed in subsequent revisions ... but that was disappointing this time around that we didn’t do a bit better job there.
What makes you so sure you’ll gain back the market share lost to AMD over the past couple of years?
Since the beginning of the year, we have been aggressively seeding the platform with customers. We have 3,000 of these things out in the marketplace today, and the responses from OEMs, ISVs, SIs and end-users have been nothing but spectacular ... Fundamentally, I think there’s pent-up demand [and] we expect to see a very rapid product ramp-up as a result.
You’ve said in the past that AMD’s integrated memory controller is over-hyped, yet you’ve also said you plan to add an integrated memory controller to your own future chips. Can you clarify that?
We’ve never said the integrated memory controller is bad, but it is severely over-hyped today ... Our cache is twice as effective as theirs, so that means I go to memory half as often. So, independent of anything else, if I’m going to memory half as much, who cares how long it takes to get to memory? Plus, the other aspect of their design is they have this view of local memory and remote memory. So, if you’re running an operating system, half the time you are local and half the time you are remote. Guess what, when you are remote I have to go here and then go there. The time to get over there is actually equal to the time for us to get to our memory.
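The trade-off described here is the classic average-memory-access-time (AMAT) calculation: a more effective cache can offset a slower path to memory. The sketch below illustrates the arithmetic only; every latency and hit rate in it is an invented round number, not a real figure for either vendor's part.

```python
# Illustrative AMAT arithmetic; all numbers below are invented for
# illustration and are not real Intel or AMD latencies.

def amat(hit_rate, cache_ns, memory_ns):
    """Average memory access time: hits served by cache, misses go to DRAM."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

# Chip A: more effective cache, slower path to memory (no on-die controller).
a = amat(hit_rate=0.98, cache_ns=5.0, memory_ns=120.0)

# Chip B: twice the miss rate, faster local memory, but a NUMA design where
# roughly half of an OS's accesses land on the remote node.
b_memory_ns = 0.5 * 80.0 + 0.5 * 140.0   # average of local and remote latency
b = amat(hit_rate=0.96, cache_ns=5.0, memory_ns=b_memory_ns)

print(f"Chip A AMAT: {a:.2f} ns")   # the better cache wins despite slower DRAM
print(f"Chip B AMAT: {b:.2f} ns")
```

With these made-up numbers, halving the miss rate more than compensates for the longer trip to memory, which is the shape of the argument being made.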
So are you still planning to add the integrated memory controller to future chips?
Eventually, we’re looking at that as an engineering trade-off, and we’ll probably make that part of our product line in the future. Why? Because we can. Not that it’s bad, but it’s not some big deal, or big architectural delta — it’s simply an engineering trade-off that, particularly as the cache continues to get larger and larger, is probably a good thing to add, as it doesn’t hurt.
Could you talk a little bit about the overall multicore processing movement and some of the hurdles you are facing as you get to more and more cores?
For the near-term, at 90nm, we were mostly single core and a little bit of dual core. At 65nm, we’re almost all dual core and a little bit of quad core. At 45nm, mostly quad core and a little bit of eight-core. It just follows Moore’s Law. That’s what we’re expecting on the immediate horizon. The problem is, as you keep going to higher and higher core counts, you need more and more things operating in parallel. Since the beginning of computing, people have been trying to solve the parallel programming problem.
Today, most multicore machines are actually running multi-tasking, where there are not a lot of multithreaded applications, but within each task there is a little bit of threading going on. ... If you follow the progression I described, if I’m here in 2012, my desktop is going to have 16 cores on it. It’s this huge programming challenge, and that will be the big barrier for us to fully realise the benefits of multicore designs as we go forward. It’s not a solved problem by any means. There are some promising areas for breakthroughs.
One of them we would call the area of domain-specific programming: solving the general-purpose parallel programming problem is really hard, but if you think about certain domains, you can make some big breakthroughs. There are also some characteristics of certain problems that look to be what we call embarrassingly parallel. One example of that might be an application area called ray tracing. Instead of trying to render coarse approximations of light onto the display, which is typically done with polygon shading today, what they’ll do is model every photon of light and all of its reflections as it bounces around, and each photon then becomes a thread of execution. ... In those cases, we’ve seen parallelism of up to 100 or 200 parallel threads, still resulting in very, very high degrees of performance improvement.
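The "each photon becomes a thread" idea can be sketched as a parallel map over independent rays. This is a toy stand-in, not a real renderer: the per-ray function below is an invented placeholder, and threads are used only to show the structure.

```python
# A minimal sketch (not Intel's code) of an "embarrassingly parallel" workload
# in the ray-tracing spirit: every ray is independent of every other ray, so
# the whole render is just a parallel map with no locks or shared state.
from concurrent.futures import ThreadPoolExecutor

def trace_ray(ray_id):
    """Toy stand-in for per-ray work (intersection tests, bounce accumulation)."""
    return sum((ray_id * i) % 7 for i in range(1000))

def render(n_rays, workers=8):
    # Each ray maps to its own task; results come back in ray order.
    # Threads are shown for simplicity; a CPU-bound renderer would use
    # processes or native threads to actually occupy multiple cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(trace_ray, range(n_rays)))

print(len(render(64)))  # one result per ray
```

Because no ray depends on any other, the worker count can scale with the core count with essentially no coordination cost, which is why such workloads parallelise so well.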
We envision this world where it becomes impossible to tell the difference between what’s been rendered and what’s real ... Some of it you might dismiss as just making my kids’ games better, but other examples are very interesting: maybe we’ll model the real physics of a tumour and see it really grow, and see what characteristics it takes on as it touches different tissue types.
Could you talk about Itanium and how it fits into your server chip plans, with respect to your Xeon chips?
If you look at that marketplace segment today, you have four big players: Sparc, Power, PA and Itanium. All of PA is going to Itanium. You have Sparc, for which you have begun to see the sunset — McNealy’s resignation was more than symbolic. When you look at it today, the revenue from Itanium systems as we ended last year was approximately half the size of Power and Sparc, respectively. It’s clearly emerged as the third player today. Our goal is to make it the second player and, eventually, to make it the [main] player.
We think the characteristics that we’re building into Itanium for memory size, RAS, error detection and correction and non-stop capabilities really make it a serious long-term player. Right now, we are in production today on ‘Montecito’, the next generation of products, and we’ll see the announcements of those from system vendors next quarter.
Do you see Xeon sales cannibalising Itanium sales?
There are areas where it’s contested ... an example of one of the battle zones might be a high-performance computing installation. Neither one is wrong, but for some kinds of applications Itanium will win hands down. For the more parallel applications, [which] you want to run in a more distributed fashion, Xeon will have the better price/performance characteristic. Those are just some places we’re seeing different approaches played out in the marketplace, but a lot of the very high-end stuff — the banking systems, big transactions, big ERP systems — seems to be a pretty stable mainframe-type marketplace, despite my proclamations of its death 15 years ago.