Saturday, 2 February 2013

Curious about the hash table problem

An old post from Stack Overflow:

In short, even with the fix in the latest GHC, Haskell is still incapable of providing a dictionary (mutable or immutable) that is competitively efficient.
Haskell's hash tables were 32× slower than alternatives like C++ and .NET with GHC 6.10. That was partly due to a performance bug in the GHC garbage collector that was fixed for GHC 6.12.2. But Simon Marlow's results there show only a 5× performance improvement which still leaves Haskell's hash tables many times slower than most alternatives.
Purely functional alternatives are also much slower than a decent hash table. For example, Haskell'sIntMap is 10× slower than .NET's hash table.
Using F# 2010 and the latest Haskell Platform 2010.2.0.0 (released yesterday!) with GHC 6.12.3 on this 2.0GHz E5405 Xeon running 32-bit Windows Vista to insert 20M int->int bindings into an empty hash table we find that Haskell is still 29× slower than F# in real time and over 200× slower in terms of CPU time because the Haskell burns all cores:
GHC 6.12.3 Data.HashTable: 42.8s (new!)
.NET hash table:            1.47s
Provided you run only short-lived microbenchmarks you can disable the GHC garbage collector as Don Stewart suggests above. By asking for a nursery generation so large that this particular program will never fill it, he brought the time for the Haskell hash table down to only 1.5s here. However, this completely undermines the whole point of having a nursery generation and will massively degrade the performance of other code because freshly allocated values will now always be cold in the cache (which is why the nursery generation is typically the size of the L2 cache, orders of magnitude smaller than this).