Tuesday, 7 April 2009

More on Haskell's hash table problems

We recently proved that Haskell's defacto-standard hash table implementation is extremely slow. So slow that optimized native-code compiled Haskell is even slower than interpreted Python and 32× slower than a decent hash table implementation! If that result were not remarkable enough, we have now discovered that the history of this issue is even more surprising...

Haskell programmers have been complaining about the poor performance of its hash table implementation for at least 3 years. The cause of the problem, a bug in the garbage collector, was identified a long time ago but that bug was never fixed. Instead, it appears that the Haskell developers went on a crusade to persuade the world that decent hash table implementations are not advantageous. For example, the book "Real World Haskell" by Don Stewart et al. explicitly states:

"Compared to a hash table, a well-implemented purely functional tree data structure will perform competitively. You should not approach trees with the assumption that your code will pay a performance penalty."

Perhaps more surprising, given Haskell's uniquely bad support for hash tables, is the fact that "Real World Haskell" is one of only three books cited from the Wikipedia article on this subject (!).

Suffice to say, there is overwhelming evidence that trees are not competitively performant when compared to hash tables. Indeed, that is precisely why hash tables are such a ubiquitous data structure. In general, we have found that trees are typically 10× slower than hash tables in practice for construction and lookup with around a million elements. That is supported by our recent benchmark where a hash table in F# took 0.7 and the fastest tree-based implementation written in Haskell (which is extremely heavily optimized for trees) was still 11× slower, taking over 7s to complete the same task.

Also, the Haskell entry for the k-nucleotide benchmark on the shootout contains the comment:

-- Note: Hash tables are not generally used in functional languages, so there
-- are no available library implementations for Haskell. This benchmark
-- requires a hash table. This is why I have implemented the hash table here.

And, again, this is just completely factually incorrect. Hash tables are just as popular in functional languages (like OCaml, F#, Scala, Clojure, Mathematica) as they are in imperative and object oriented languages for the simple reason that they are a great data structure. For example, the hash table implementation bundled with OCaml is so efficient that even the uniprocessor OCaml implementation of knucleotide thrashes the parallel Haskell implementation.


5 comments:

namekuseijin said...

Gotta agree with you on this one.

If your purpose with such bashing is to make Haskell improve -- i.e, get that GC bug corrected -- you could be more successful by signing to the Haskell mailing lists and trumpeting it. Come on, don't be shy, barely more difficult than getting into a newsgroup:

http://haskell.org/mailman/listinfo/haskell-cafe

:)

Flying Frog Consultancy Ltd. said...

Several people, including me, have already complained about hash tables on the Haskell-Cafe mailing list without success.

However, I have been censored on that list in the past for questioning the accuracy of statements made by the Haskell developers so I virtually stopped posting there.

namekuseijin said...

I see.

Michel S. said...

Clojure does not have hashtables -- you can use Java hash tables with it, sure, but the Clojure Map is actually an immutable tree with 32-way branching.

Rich Hickey claimed it's fast enough, and it has the nice effect of being side-effect free: if you hold a reference to a Clojure map, you're guaranteed that it will point to the same nodes, no matter what other functions do to their reference to the same map.

Rich would be interested in your benchmark numbers, I'm sure.

jay paul said...

Excellent post and wonderful blog, I really like this type of interesting articles keep it u.

HP - ENVY TouchSmart Sleekbook 15.6" Touch-Screen Laptop - 6GB Memory - 750GB Hard Drive - Modern Silver

HP - ELITE 15.6" Laptop - 8GB Memory - 180GB Solid State Drive