Haskell has a history of making easy problems difficult. Perhaps the most infamous example was the Sieve of Eratosthenes, which is easily implemented in any imperative language but was so difficult to write in Haskell that almost all of the solutions that had been taught in universities and used in research for the preceding 18 years had been wrong until Melissa O'Neill published a seminal paper *The Genuine Sieve of Eratosthenes* that gave a beautiful description of what they had been doing wrong and how it should be corrected. Melissa's solution was to use a priority queue to implement a rolling wheel of numbers. The correct solution turned out to be 10× longer than a much simpler F# solution and a whopping 100× longer than the original bastardized algorithm in Haskell.

Today, quicksort is the new Sieve of Eratosthenes. Again, the academics have addressed Haskell's failure by bastardizing the algorithm, trading orders of magnitude in performance for something that Haskell can express easily:

qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

This completely fails to capture the essence of the real quicksort algorithm (see Tony Hoare's original 1962 quicksort paper) that makes it so efficient. Specifically, the in-place partitioning using swaps.

Faced with the challenge of writing a parallel generic quicksort in Haskell, Jim Apple (who is doing a PhD on Haskell at UC Davis) kicked of proceedings with the following code:

import Data.HashTable as H
import Data.Array.IO
import Control.Parallel.Strategies
import Control.Monad
import System
exch a i r =
do tmpi <- readArray a i
tmpr <- readArray a r
writeArray a i tmpr
writeArray a i tmpi
bool a b c = if c then a else b
quicksort arr l r =
if r <= l then return () else do
i <- loop (l-1) r =<< readArray arr r
exch arr i r
withStrategy rpar $ quicksort arr l (i-1)
quicksort arr (i+1) r
where
loop i j v = do
(i', j') <- liftM2 (,) (find (>=v) (+1) (i+1)) (find (<=v) (subtract 1) (j-1))
if (i' < j') then exch arr i' j' >> loop i' j' v
else return i'
find p f i = if i == l then return i
else bool (return i) (find p f (f i)) . p =<< readArray arr i
main =
do [testSize] <- fmap (fmap read) getArgs
arr <- testPar testSize
ans <- readArray arr (testSize `div` 2)
print ans
testPar testSize =
do x <- testArray testSize
quicksort x 0 (testSize - 1)
return x
testArray :: Int -> IO (IOArray Int Double)
testArray testSize =
do ans <- newListArray (0,testSize-1) [fromIntegral $ H.hashString $ show i | i <- [1..testSize]]
return ans

This solution uses Haskell's parallel "strategies". This concept was introduced to give Haskell programmers more control over parallelization but the only available implementation was found to leak memory and nobody was able to get it to work in this case: Jim's solution contains a concurrency bug that causes it to return incorrect results almost every time it is called.

Peaker then posted his own Haskell solution:

import Data.Array.IO
import Control.Monad
import Control.Concurrent
bool t _f True = t
bool _t f False = f
swap arr i j = do
(iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j)
writeArray arr i jv
writeArray arr j iv
parallel fg bg = do
m <- newEmptyMVar
forkIO (bg >> putMVar m ())
fg >> takeMVar m
sort arr left right = when (left < right) $ do
pivot <- read right
loop pivot left (right - 1) (left - 1) right
where
read = readArray arr
sw = swap arr
find n pred i = bool (find n pred (n i)) (return i) . pred i =<< read i
move op d i pivot = bool (return op)
(sw (d op) i >> return (d op)) =<<
liftM (/=pivot) (read i)
loop pivot oi oj op oq = do
i <- find (+1) (const (<pivot)) oi
j <- find (subtract 1) (\idx cell -> cell>pivot && idx/=left) oj
if i < j
then do
sw i j
p <- move op (+1) i pivot
q <- move oq (subtract 1) j pivot
loop pivot (i + 1) (j - 1) p q
else do
sw i right
forM_ (zip [left..op-1] [i-1,i-2..]) $ uncurry sw
forM_ (zip [right-1,right-2..oq+1] [i+1..]) $ uncurry sw
let ni = if left >= op then i + 1 else right + i - oq
nj = if right-1 <= oq then i - 1 else left + i - op
let thresh = 1024
strat = if nj - left < thresh || right - ni < thresh
then (>>)
else parallel
sort arr left nj `strat` sort arr ni right
main = do
arr <- newListArray (0, 5) [3,1,7,2,4,8]
getElems arr >>= print
sort (arr :: IOArray Int Int) 0 5
getElems arr >>= print

This solution also turned out to be buggy. Firstly, it contains a more subtle concurrency bug causes it to return incorrect results only occassionally. Peaker corrected this bug to give the following code:

import System.Time
import System.Random
import Data.Array.IO
import Control.Monad
import Control.Concurrent
import Control.Exception
import qualified Data.List as L
bool t _ True = t
bool _ f False = f
swap arr i j = do
(iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j)
writeArray arr i jv
writeArray arr j iv
background task = do
m <- newEmptyMVar
forkIO (task >>= putMVar m)
return $ takeMVar m
parallel fg bg = do
wait <- background bg
fg >> wait
sort arr left right = when (left < right) $ do
pivot <- read right
loop pivot left (right - 1) (left - 1) right
where
read = readArray arr
sw = swap arr
find n pred i = bool (find n pred (n i)) (return i) . pred i =<< read i
move op d i pivot = bool (return op)
(sw (d op) i >> return (d op)) =<<
liftM (/=pivot) (read i)
swapRange px x nx y ny = if px x then sw x y >> swapRange px (nx x) nx (ny y) ny else return y
loop pivot oi oj op oq = do
i <- find (+1) (const (<pivot)) oi
j <- find (subtract 1) (\idx cell -> cell>pivot && idx/=left) oj
if i < j
then do
sw i j
p <- move op (+1) i pivot
q <- move oq (subtract 1) j pivot
loop pivot (i + 1) (j - 1) p q
else do
sw i right
nj <- swapRange (<op) left (+1) (i-1) (subtract 1)
ni <- swapRange (>oq) (right-1) (subtract 1) (i+1) (+1)
let thresh = 1024000
strat = if nj - left < thresh || right - ni < thresh
then (>>)
else parallel
sort arr left nj `strat` sort arr ni right
timed act = do
TOD beforeSec beforeUSec <- getClockTime
x <- act
TOD afterSec afterUSec <- getClockTime
return (fromIntegral (afterSec - beforeSec) +
fromIntegral (afterUSec - beforeUSec) / 1000000000000, x)
main = do
let n = 1000000
putStrLn "Making rands"
arr <- newListArray (0, n-1) =<< replicateM n (randomRIO (0, 1000000) >>= evaluate)
elems <- getElems arr
putStrLn "Now starting sort"
(timing, _) <- timed $ sort (arr :: IOArray Int Int) 0 (n-1)
print . (L.sort elems ==) =<< getElems arr
putStrLn $ "Sort took " ++ show timing ++ " seconds"

This solution does run correctly on small inputs but increasing the problem size to 1,000,000 elements results in a stack overflow. Two attempts were made to diagnose this bug, here and here, but both turned out to be wrong. The bug is actually in the getElems function of the Haskell standard library which stack overflows on long arrays.

Surprisingly, more bug fixing seems to have culminated in the world's first parallel generic quicksort written in Haskell. Furthermore, the resulting Haskell solution is only around 55% slower than the equivalent F# solution. Note that this requires the latest GHC that was only released in recent weeks.