Thursday, 26 July 2007

Haskell on the ray tracer language comparison

We just updated our famous ray tracer language comparison with four progressively-optimized Haskell implementations of the ray tracer.

The first implementation is idiomatic Haskell: entirely lazy.

The other three implementations use eager evaluation extensively in an attempt to achieve competitive performance.

Monday, 2 July 2007


After years of reading articles complaining about the lack of concurrency in the OCaml programming language, and speculation that the language would die as a consequence once multi-core computers became ubiquitous, we were pleasantly surprised to find that OCaml programs can exploit our dual-core machine easily and efficiently.

As a test, we augmented the OCaml implementation of our ray tracer benchmark with a parallel higher-order map to exploit this embarrassingly parallel algorithm, at a cost of around 20 extra lines of code. The results were quite surprising: the concurrent OCaml implementation both outperforms and scales better than the concurrent F# implementation!
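The parallel map we added is not reproduced here, but the idea can be sketched in OCaml (the name `par_map` and the simple two-way split are our own, for illustration only): fork a child process, let it map the second half of the list while the parent maps the first half, and marshal the child's results back over a pipe. This requires linking against the `unix` library.

```ocaml
(* Minimal sketch of a fork-based parallel map (illustrative, not the
   benchmark's actual code). The child computes the back half of the
   list and sends its results to the parent via Marshal over a pipe. *)
let par_map (f : 'a -> 'b) (xs : 'a list) : 'b list =
  let n = List.length xs in
  (* Split the list into a front half and a back half. *)
  let rec split i = function
    | [] -> ([], [])
    | x :: rest ->
        if i = 0 then ([], x :: rest)
        else let (l, r) = split (i - 1) rest in (x :: l, r)
  in
  let (front, back) = split (n / 2) xs in
  let (rd, wr) = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: map the back half, marshal the result, and exit. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (List.map f back) [];
      close_out oc;
      exit 0
  | _pid ->
      (* Parent: map the front half while the child works in parallel,
         then read the child's results and reassemble the list. *)
      Unix.close wr;
      let mine = List.map f front in
      let ic = Unix.in_channel_of_descr rd in
      let theirs : 'b list = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.wait ());
      mine @ theirs
```

A real version would fork one child per core and split the work accordingly, but even this two-process sketch shows why the change is only a handful of lines.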

These results led us to study concurrency in more detail, and we were surprised to find that such a fundamental problem has no clear solution. Erlang probably has the best proven track record of any technology: it is used to handle massive concurrency on telephone networks, and it cleanly separates concurrent threads, giving each thread its own garbage collector. This is done deliberately to make each concurrent thread as independent as possible.

Surprisingly, the two big platforms for concurrent applications programming (Java and .NET) both opted for a different approach: concurrent garbage collection. A concurrent garbage collector consists of at least one extra thread that traverses the heap and deallocates unreferenced values while the program runs. This has several advantages:
  • Concurrent threads can share memory.
  • Communication between concurrent threads is cheap.
  • Even single-threaded applications exploit a second core, because the garbage collector runs concurrently with the main thread.
However, concurrent garbage collection also has disadvantages:
  • Allocation is slowed by synchronization.
  • Concurrent programs relying upon shared memory cannot be distributed, e.g. across a network or between CPU and GPU.
These distinctions are particularly interesting in the context of functional programming languages because of their unusual properties:
  • High allocation rates for small, short-lived values.
  • Large immutable data structures.
Idiomatic functional code is typically several times slower in Java or on .NET because of the overhead imposed by a concurrent GC.

The third approach to concurrency, used by OCaml and JoCaml, is to fork the process and run two processes concurrently. Forking logically duplicates the heap, but the pages are copied lazily (copy-on-write), so the large immutable data structures found in functional programs are physically shared even though explicit shared memory is not supported. This makes message passing much slower than with shared memory but allows each individual process to run at full speed.
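To illustrate the copy-on-write point with a small sketch (the helper `sum_in_child` is hypothetical, not from the benchmark): a large immutable array built before the fork is visible in the child through the lazily copied heap pages, so it never has to be re-sent; only the small result crosses the pipe.

```ocaml
(* Illustrative sketch: data allocated before Unix.fork is shared with
   the child via copy-on-write pages. The child reads [big] directly
   and sends back only its (small) answer. Requires the unix library. *)
let sum_in_child (big : int array) : int =
  let (rd, wr) = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      (* Child: [big] is readable here without any explicit transfer. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (Array.fold_left (+) 0 big) [];
      close_out oc;
      exit 0
  | _pid ->
      (* Parent: receive just the sum, then reap the child. *)
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      let sum : int = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.wait ());
      sum
```

The asymmetry is the point: reading shared immutable data is free, while sending data between processes goes through marshalling, which is why this model favours coarse-grained parallelism with little communication.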

So OCaml's fork-based approach to concurrency is preferable for computationally-intensive problems that parallelize easily (like our ray tracer) but the Java and .NET approaches are preferable when there will be a lot of communication.

Interestingly, the F# programming language will soon add support for asynchronous programming using a monadic style derived from research on Haskell. This should provide a robust, high-level approach to concurrency that alleviates the most important problems encountered when trying to write reliable concurrent programs.