Wednesday, 4 September 2013

Efficient Parallel Stencil Convolution in Haskell

Until recently, all research on parallel Haskell programs neglected CPU caches and existing high-performance software solutions like BLAS, LAPACK, FFTW and OpenCV. We criticized the paper Regular, shape-polymorphic, parallel arrays in Haskell for comparing parallel Haskell programs only with poorly-written C code, particularly because this buries the real performance differences in unnecessary cache misses.

In 2011, Ben Lippmeier, Gabriele Keller and Simon Peyton-Jones published a paper entitled Efficient Parallel Stencil Convolution in Haskell that is the first to address caches and locality in the context of parallel Haskell programs and to compare parallel Haskell programs to existing high-performance solutions. The paper discusses an algorithm for edge detection in computer vision that was 10x slower using the old REPA approach and walks through the design and implementation of a more advanced solution using new approaches that take CPU caches into account and, ultimately, attains performance comparable to the high-performance OpenCV library. We highly recommend this paper for anyone interested in the subject. Hopefully future work will progress to more advanced topics like Frigo’s trapezoidal space-time decomposition.

This appears to have marked a turning point in Haskell research. Several other contributors are now discussing locality and cache oblivious algorithms written declaratively in a purely functional style. Best of luck to them!

No comments: