Wednesday, 23 December 2009

Another serious perf bug uncovered in Haskell's garbage collector

The developers of Haskell at Microsoft Research may claim that "Haskell is the world’s finest imperative programming language" but the flagship Haskell implementation GHC still suffers from serious long-standing bugs in its garbage collector that make it impossible to solve many important problems efficiently in Haskell.

We previously rediscovered a serious bug in GHC's garbage collector that made every write to a mutable array O(n) instead of O(1): the GC marks the entire array as dirty and traverses every element of it at the next minor collection. In particular, this makes it impossible to implement hash tables and sorts efficiently in Haskell. Surprisingly, this bug turned out to be many years old and had been reported by many people before us. Failure to fix this bug in GHC has forced Haskell programmers to resort to elaborate workarounds such as manual memory management via the FFI, as seen in the Haskell implementation of the knucleotide benchmark from the Computer Language Shootout.
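The pathological pattern is easy to reproduce. The following sketch (a hypothetical microbenchmark, not code from any of the cited reports) performs n writes into a boxed mutable `IOArray`; under the bug described above, each `writeArray` dirties the whole array, so every minor GC rescans all n elements and total running time grows much faster than the O(n) one would expect:

```haskell
import Data.Array.IO
import Control.Monad (forM_)

-- Hypothetical microbenchmark: n sequential writes into one boxed
-- mutable array. Under the GC bug described above, each write marks
-- the entire array dirty, so the next minor collection traverses all
-- n elements again. An IOUArray of unboxed Ints contains no pointers
-- and is never scanned by the GC, which sidesteps the problem when
-- the element type permits unboxing.
main :: IO ()
main = do
  let n = 100000 :: Int
  arr <- newArray (0, n - 1) 0 :: IO (IOArray Int Int)
  forM_ [0 .. n - 1] $ \i -> writeArray arr i i
  s <- sum `fmap` getElems arr
  print s
```

Compiling with `-rtsopts` and running with `+RTS -s` shows the time spent in the collector, which is where the asymptotic blow-up appears for boxed arrays.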

Yesterday, Robert van Herk wrote on the GHC users mailing list that his attempts to optimize his Haskell code were also producing unexpectedly poor performance, even though his arrays had only eight elements each. Simon Marlow clarified that the current implementation of GHC's garbage collector does indeed contain another serious performance bug: every mutable array is kept on the GC's mutable list (its remembered set) and examined at every minor collection. Thus, Haskell programs that handle large numbers of small mutable arrays suffer a different kind of asymptotic performance degradation, with minor GC cost growing with the number of arrays rather than with the amount of mutated data.
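A sketch of that workload (a hypothetical reconstruction, not Robert van Herk's actual code): allocate many eight-element mutable arrays and touch each one once. Under the bug Simon Marlow described, every one of these arrays sits on the mutable list and is scanned at each minor collection, so GC cost scales with the number of arrays:

```haskell
import Data.Array.IO
import Control.Monad (forM, forM_)

-- Hypothetical reconstruction of the reported workload: a large
-- number of small (eight-element) boxed mutable arrays. Each array
-- lives on the GC's mutable list, so every minor collection scans
-- all of them, and collector time grows with the number of arrays
-- rather than with the amount of data actually mutated.
main :: IO ()
main = do
  arrays <- forM [1 .. 10000 :: Int] $ \_ ->
    newArray (0, 7) 0 :: IO (IOArray Int Int)
  forM_ arrays $ \a -> writeArray a 0 1  -- one write per tiny array
  total <- fmap sum (mapM (\a -> readArray a 0) arrays)
  print total
```

Note that the eight-element arrays here are cheap to scan individually; the problem is that all ten thousand of them are rescanned at every minor GC for the lifetime of the program.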

These bugs effectively render Haskell itself inefficient because imperative programming and mutable data structures are the only performant solution to many common problems.