Saturday, 1 January 2011

The benefits of purely functional data structures

Purely functional data structures have the following advantages:

  • Persistence: old versions can be reused safe in the knowledge that they cannot have been changed.

  • Sharing: many versions of a data structure can be kept simultaneously with relatively modest memory requirements.

  • Thread safety: any mutation is hidden inside lazy thunks (if any) and, therefore, thread safety is handled by the language implementation.

  • Simplicity: not having to keep track of state changes often makes purely functional data structures simpler to use, particularly in the context of concurrency.

  • Incrementality: purely functional data structures are composed of many tiny parts, making them ideal for incremental garbage collection, leading to lower latencies.

Purely functional data structures also have the potential to be beneficial in the context of parallel programming for multicores. However, efficient multicore parallelism requires predictable locality in order to leverage caches and avoid getting bottlenecked on access to shared caches and main memory and purely functional data structures have, at best, unknown characteristics in this regard. Consequently, many programs that use purely functional data structures do not scale well when parallelized on a multicore because they spend all of their time in cache misses, contending for shared memory pathways.


3 comments:

snadric said...

I guess by "purely functional" you simply mean "immutable", right?

There are some key performance characterics to consider when using them:
- Most collections tend to be small, and immutable collections tend to scale better in this use case than mutable ones, esp. Maps.
- Mutable collections have their uses. I guess they are better to use for very large caches. But I would always hide them behind a wrapper class.

Because immutables are simpler to use, i would nearly always prefer them over the mutable ones.

Flying Frog Consultancy Ltd. said...

@snadric: "Most collections tend to be small, and immutable collections tend to scale better in this use case than mutable ones". I think that needs testing. Small Maps are certainly faster than small hash tables but what about association arrays or mutable balanced trees encoded in arrays?

Vincent said...

>Thread safety: any mutation is hidden inside lazy thunks (if any) and, therefore, thread safety is handled by the language implementation


I think it would be just as fair to say that purely functional data structures are thread-incompatible without some sort of language support.

Sure you won't get threading exceptions, but you'll still need a way to reconstruct a new data structure after two threads began operating on the same immutable structure.