Intel recently published a paper with no fewer than 12 authors, "Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU", in which they criticize the huge performance discrepancies cited by researchers publishing on General-Purpose GPU (GPGPU) programming in the context of what they call "throughput computing".
We have also noticed bad science in this domain before. When we tried to reproduce the incredible results of one paper, with a view to entering this market ourselves, we discovered that its authors had benchmarked against the reference implementation of LAPACK instead of the vendor-tuned implementation for their CPU, which was 10× faster. Like Intel, we found that the performance advantage of a GPU was relatively modest (2.5×) given the enormous costs and liabilities of using a GPU for number crunching. We are fortunate enough to be able to simply dismiss such fantastical results as irrelevant propaganda, but Intel are presumably feeling the pinch as misinformed customers flock to buy nVidia's GPUs to attack problems for which a better Intel CPU would have been more appropriate.
On the other hand, the failure of manycore GPUs to attain competitive performance on all but a handful of tasks raises the question of how much general software stands to gain from the multicore revolution. Is this the end of the line for performance as we know it?
Intel's careful use of the phrase "throughput computing" is significant. Their point is that memory hierarchies are the shared headache. Applications that perform local computations in registers, without needing to fetch external data, can easily be made to scale extremely well. Computing Mandelbrot fractals is a classic case: GPUs should excel at rendering the complex Mandelbulb fractal that was discovered in recent years:
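To see why escape-time fractals are such a friendly case, here is a minimal Python sketch (an illustration, not tuned code): each pixel's iteration loop touches only its own coordinate, so every pixel can be computed independently with no memory traffic between them.

```python
def mandelbrot_iterations(c: complex, max_iter: int = 100) -> int:
    """Escape-time iteration count for one point of the Mandelbrot set.

    The loop state (z) lives entirely in local variables/registers:
    no external data is read, which is why this workload parallelizes
    so well on throughput-oriented hardware.
    """
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n  # escaped after n iterations
    return max_iter   # presumed inside the set

def render(width: int, height: int, max_iter: int = 100):
    """Render an iteration-count grid over [-2, 1] x [-1.5, 1.5].

    Every call to mandelbrot_iterations is independent of the others,
    so the pixels could be farmed out to any number of cores.
    """
    return [[mandelbrot_iterations(
                 complex(-2.0 + 3.0 * x / width, -1.5 + 3.0 * y / height),
                 max_iter)
             for x in range(width)]
            for y in range(height)]
```

The same per-point independence holds for the 3D Mandelbulb; only the iteration formula changes.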
Awesome stuff, but what is it useful for?