As a test, we augmented the OCaml implementation of our ray tracer benchmark with a parallel higher-order map to exploit this embarrassingly parallel algorithm. This required around 20 lines of extra code. The results were quite surprising: the concurrent OCaml implementation both outperforms and scales better than the concurrent F# implementation!
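A fork-based parallel map of roughly this size can be sketched as follows. This is an illustration rather than our benchmark code: the name `par_map` is ours, the sketch assumes the `Unix` library, it forks one child process per element (a real implementation would chunk the work per core), and it uses the untyped `Marshal` module to send each result back over a pipe.

```ocaml
(* Sketch of a fork-based parallel higher-order map. Each element is
   handed to a forked child process; the child applies [f] and marshals
   its result back to the parent over a pipe. *)
let par_map (f : 'a -> 'b) (xs : 'a list) : 'b list =
  let spawn x =
    let rd, wr = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
        (* Child: compute, send the result to the parent and exit. *)
        Unix.close rd;
        let oc = Unix.out_channel_of_descr wr in
        Marshal.to_channel oc (f x) [];
        close_out oc;
        exit 0
    | pid ->
        (* Parent: keep the read end so we can collect the result later. *)
        Unix.close wr;
        (pid, rd)
  in
  (* Fork all children first so they run concurrently, then collect. *)
  let children = List.map spawn xs in
  List.map
    (fun (pid, rd) ->
       let ic = Unix.in_channel_of_descr rd in
       (* Note: Marshal is untyped, so this relies on the result
          actually having type ['b]. *)
       let result : 'b = Marshal.from_channel ic in
       close_in ic;
       ignore (Unix.waitpid [] pid);
       result)
    children
```

For example, `par_map (fun x -> x * x) [1; 2; 3]` evaluates to `[1; 4; 9]`, with each square computed in its own process.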
These results led us to study concurrency in more detail and we were surprised to find that such a fundamental problem has no clear solution. The best proven track record of any technology is probably Erlang, used to handle massive concurrency on telephone networks, which cleanly separates concurrent threads and gives each thread its own garbage collector. This is done deliberately to make each concurrent thread as independent as possible.
Surprisingly, the two big concurrent languages for applications programming (Java and .NET) both opted for a different approach using concurrent garbage collection. A concurrent garbage collector consists of at least one extra concurrent thread that traverses the heap and deallocates unreferenced values. This has several advantages:
- Concurrent threads can share memory.
- Communication between concurrent threads is cheap.
- Single-threaded applications exploit two cores because the garbage collector runs concurrently with the main thread.
But it also has significant disadvantages:

- Allocation is slowed by the synchronization required between the program's threads and the collector.
- Concurrent programs relying upon shared memory cannot easily be distributed, e.g. across a network or between CPU and GPU.

These costs hit functional programs particularly hard, because such programs typically have:

- High allocation rates for small, short-lived values.
- Large immutable data structures.
The third approach to concurrency, used by OCaml and JoCaml, is to fork the process and run several processes concurrently. Forking creates a mirror of the heap, but its pages are only copied lazily when written to (copy-on-write), so the large immutable data structures found in functional programs are in practice shared between processes, even though explicit shared memory is not supported. This makes message passing much slower but allows each individual process to run at full speed.
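The copy-on-write behaviour can be demonstrated with a small sketch, again assuming the `Unix` library (the function name `sum_in_child` is illustrative). The parent builds a large immutable array before forking; the child traverses it without any explicit copy, because the forked process shares the parent's pages, and only the small result travels back through the pipe.

```ocaml
(* Sketch: a forked child reads a large structure built by the parent.
   No explicit copy of [big] is made; fork's copy-on-write semantics
   share the pages, and only the summed result is marshalled back. *)
let sum_in_child (big : int array) : int =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
      Unix.close rd;
      (* Child: [big] is visible here without having been copied. *)
      let total = Array.fold_left ( + ) 0 big in
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc total [];
      close_out oc;
      exit 0
  | pid ->
      Unix.close wr;
      let ic = Unix.in_channel_of_descr rd in
      let total : int = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      total
```

Note that the message sent back is tiny even though the shared structure is large; sending the array itself through the pipe would incur the full marshalling cost, which is why message passing is the slow part of this model.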
So OCaml's fork-based approach to concurrency is preferable for computationally-intensive problems that parallelize easily (like our ray tracer) but the Java and .NET approaches are preferable when there will be a lot of communication.
Interestingly, the F# programming language will soon add support for asynchronous programming using a monadic style derived from the research done on Haskell. This will provide a robust, high-level approach to concurrency that should alleviate the most important problems encountered when trying to write reliable concurrent programs.