Adding more processing cores is becoming the main way of boosting the performance of server and PC chips, but the benefits will be greatly diminished if the industry can't overcome certain hardware and programming challenges, said participants at the recent Multicore Expo in Santa Clara, California.
Most software today is still written for single-core chips and will need to be rewritten or updated to take advantage of the increasing number of cores that Intel, Sun Microsystems and other chip makers are adding to their products, said Linley Gwennap, president and principal analyst at The Linley Group.
Off-the-shelf applications will often run faster on CPUs with up to four processor cores, but beyond that, performance levels off and may even deteriorate as more cores are added, Gwennap said. A recent report from Gartner also highlights this problem.
Chip makers and system builders have begun efforts to educate developers and provide them with better tools for multicore programming. A year ago, Intel and Microsoft said they would invest US$20 million to open two research centres at US universities devoted to tackling the problem. The lack of multicore programming tools for mainstream developers is perhaps the biggest challenge the industry faces today, Gwennap said.
Writing applications in a way that lets different parts of a computing task - such as solving a maths problem or rendering an image - be divided up and executed simultaneously across multiple cores is not new. But this model, often called parallel computing, has been limited so far mainly to specialised, high-performance computing environments.
However, in recent years, Intel and AMD have been adding cores as a more power-efficient way to boost chip performance, a marked change from their traditional practice of increasing clock speed. Intel is building eight cores into its upcoming Nehalem-EX chips, and AMD is designing 12-core chips for servers. They are also adding multi-threading capabilities, which allow each of the cores to work on multiple threads of execution at the same time.
That means mainstream applications have to be written in a different way to take advantage of the additional cores available. The work is complex and creates the potential for new types of software bugs. One of the most common is the "race condition", where the output of a calculation depends on the various elements of a task being completed in a certain order. If they are completed in a different order, errors can result.
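A race condition of this kind can be illustrated with a minimal Python sketch. It is not taken from any tool mentioned at the conference, and the names `increment_steps`, `run_interleaved` and `locked_count` are hypothetical. Two workers each read a shared counter and write back the value plus one. The first part replays two fixed orderings of those steps so the lost update is reproducible; the second part shows one common remedy, a lock around the read-and-write.

```python
import threading


def increment_steps(counter):
    """One worker's increment, split into a read step and a write step."""
    value = counter["n"]        # step 1: read the shared value
    yield                       # another worker may run at this point
    counter["n"] = value + 1    # step 2: write back a possibly stale result


def run_interleaved(order):
    """Run workers A and B step by step in the given order, e.g. "ABAB"."""
    counter = {"n": 0}
    workers = {"A": increment_steps(counter), "B": increment_steps(counter)}
    for name in order:
        next(workers[name], None)   # advance that worker by one step
    return counter["n"]


def locked_count(workers=4, per_worker=1000):
    """Real threads incrementing a shared counter, protected by a lock."""
    counter = 0
    lock = threading.Lock()

    def work():
        nonlocal counter
        for _ in range(per_worker):
            with lock:              # only one thread may read and write at a time
                counter += 1

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_interleaved("AABB"))  # A finishes before B starts
    print(run_interleaved("ABAB"))  # both read before either writes
    print(locked_count())
```

With the ordering "AABB" both increments are counted, while "ABAB" loses one because both workers read the counter before either writes it back. On real hardware the ordering is decided by the scheduler rather than by the program, which is what makes such bugs hard to reproduce.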
A few parallel programming tools are available, such as Intel's Parallel Studio for C and C++. Other vendors in the space are Codeplay, Polycore Software and Cilk Arts. There is also a new C-based parallel programming model called OpenCL, being developed by The Khronos Group and backed by Apple, Intel, AMD, Nvidia and others.
But many of the tools available are still works in progress, participants at the Multicore Expo said. Software compilers need to be able to identify code that can be parallelised, and then do the job of parallelising it without manual intervention from programmers, said Shay Gal-on, director of software engineering at EEMBC, a non-profit organisation that develops benchmarks for embedded chips.
Despite the lack of tools, some software vendors have found it relatively easy to create parallel code for simple computing jobs, such as image and video processing, Gwennap said. For example, Adobe has rewritten Photoshop in a way that can assign duties like magnification and image filtering to specific x86 cores, improving performance by three to four times, he said.
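Image filtering is an easy case because each output pixel can be computed without looking at the result for any other pixel. The Python sketch below is only an illustration of that idea, not Adobe's implementation; the function names and the simple brightening filter are assumptions made for the example. It cuts an image (a list of rows of 0-255 values) into bands of rows, hands each band to a separate worker, and stitches the results back together in order.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten_rows(rows, amount):
    """Brighten a band of rows; every pixel is independent of the others."""
    return [[min(255, pixel + amount) for pixel in row] for row in rows]


def split_into_bands(image, bands):
    """Cut the image into at most `bands` contiguous groups of rows."""
    size = max(1, -(-len(image) // bands))  # ceiling division, at least 1 row
    return [image[i:i + size] for i in range(0, len(image), size)]


def brighten_parallel(image, amount, workers=4):
    """Process each band on its own worker, then reassemble in order."""
    bands = split_into_bands(image, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the input bands
        results = pool.map(brighten_rows, bands, [amount] * len(bands))
        return [row for band in results for row in band]


if __name__ == "__main__":
    image = [[0, 100, 250], [10, 20, 30], [255, 0, 5], [1, 2, 3], [4, 5, 6]]
    print(brighten_parallel(image, 10))
```

A thread pool is used here to keep the sketch short. In standard CPython, threads do not run Python bytecode on several cores at once, so a CPU-bound filter would need `ProcessPoolExecutor` or native code to see a real speed-up. The partitioning logic stays the same either way.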
"If you are doing video or graphics, you can take different sets of pixels and assign them to different CPUs. You can get a lot of parallelism that way," he said. But for more complex tasks, it is difficult to find a single approach for identifying a sequence of computations that can be parallelised and then dividing them up.
While the programming side may present the biggest challenge, hardware changes are also needed to overcome issues such as memory latency and slow bus speeds. "As you add more and more CPUs on the chip, you need the memory bandwidth to back it up," Gwennap said.