Sunday, 10 October 2010

Towards concurrent garbage collection for GHC

Simon Marlow of Microsoft Research recently published a blog post entitled First results from GHC's new garbage collector. As his beautiful graphs show so clearly, this is a first step towards concurrent garbage collection. The blog post describes this advancement entirely from the perspective of throughput because the ability to collect per-thread nursery generations independently removes some of the blocking that was wasting mutator time in the previous version.

However, we believe that concurrent programming may become a killer application domain for Haskell and, in that context, latency can be critical. If GHC's garbage collector is made more concurrent, by allowing the old generation to be collected independently as well, then pause times could be greatly reduced and Haskell would have a considerable advantage over competing technologies like .NET.

We have found that even the best-behaved .NET programs that allocate still suffer GC pauses of around 20ms, over an order of magnitude longer than the 600┬Ás pause times indicated on the graphs for this new version of GHC. Real .NET applications that were not designed from the ground up to attain low latency suffer stalls lasting several seconds!


Jonathan Shore said...

20ms is serious for the kind of work I do with real-time models. What sort of test were you running and was it against the MS VM, mono 2.8, new GC old GC, etc?

Truth is that the vast majority of users of the .NET platform would not be concerned with that. The numerical / RT guys are a minority.

Michael Robin said...

>The numerical ... guys or a minority

RT I can understand, but why the the numerical guys care about GC delays? - I'd think overall performance would be the important measure.

Flying Frog Consultancy Ltd. said...

@Jonathan Shore: My test app is a trade matching engine written in F# and designed for low latency and maintainability. We ran tests on both the .NET 4 GC and the previous version with each of the latency options. The best case is 20ms pauses. Other cases are typically 300ms pauses.

@Michael: There is a significant overlap with many numerical people doing things like real-time signal processing or visualization. Latency can also adversely affect parallel programming. For example, 20ms is far longer than a network hop over TCP.

Tezka said...

@Michael people in quant finance, an area wherein functional programming has recently been gaining momentum, are often concerned with both, that is, real time numerical processing. 20ms delay for most HFT platforms is out of the question.