Sunday, 11 April 2010

ML's powerful higher-order module system

For around 30 years, the ML family of languages have been characterised by several unusual features including a novel and powerful higher-order module system. This language feature was pioneered by Dave MacQueen in the early 1980s and found its way into the Standard ML and OCaml languages. Higher-order modules allow one module to be parameterized over another module in the form of functors that map modules to modules.

Higher-order modules turn out to be especially useful for parameterizing concrete data structures over more primitive data structures. Perhaps the most compelling application of higher-order modules comes from graph theory, where they allow algorithms over graphs to be abstracted over the concrete data structures used to represent the graphs. This application was discussed in our most recent OCaml Journal article "The A* algorithm" that describes a generalization of Dijkstra's shortest path algorithm that is widely used for route finding in game AI. The ocamlgraph library by Sylvain Conchon, Jean-Christophe FilliĆ¢tre and Julien Signoles is an excellent production-quality example of the application of higher-order modules to data structures and algorithms from graph theory.

Higher-order modules saw some significant visibility in more mainstream computer science literature, perhaps most famously in Chris Okasaki's excellent book "Purely functional data structures" where they were used to build towers of data structures, one upon another, such as a functor that creates catenable list implementations from queue implementations. But this language feature has since failed to gain traction among the wider community of ML programmers and two of the most common modern derivatives, F# and Haskell, have only the most rudimentary module systems with no support for higher-order modules at all. Few people seem to miss higher-order modules but Lennart Augustsson, author of the world's first Haskell compiler, did write a blog post "A somewhat failed adventure in Haskell abstraction" where he described the difficulty of translating some simple examples from ML to Haskell due to the lack of higher-order modules (and second-rate alternatives such as OOP). We have encountered similar problems when trying to abstract data structures over each other in F# although our book F# for Technical Computing does have a section about purely functional data structures that uses the object-oriented programming with interfaces to achieve a similar effect, albeit without the benefit of static checking and resolution that higher-order modules offer.

This raises the question: why did advanced module systems fail? The following reasons seem most likely:

  • The examples given in the literature (purely functional data structures and parsers) are not practically important enough to be sufficiently compelling.

  • The most commonly used ML implementation with higher-order modules is OCaml where this form of abstraction inhibits inlining and, consequently, can severely degrade performance. This makes it impractical to abstract over different forms of primitive arithmetic, for example.

These issues seem to have all but killed higher-order modules as a feature in modern languages.

1 comment:

GDS said...

In Ocaml the canonical way to represent a set is to apply a functor that produces a structure containing the set type and functions over it. This requires a lot more thinking (and typing!) than doing the canonical thing in Java: just instantiate an implementation of Set.

Similarly, writing a function that takes sets of any type is not possible in Ocaml - the function has to be a functor, which is awkward.

In SML the situation is even worse because functors are generative, so Set(Int).t from library 1 is not the same type as Set(Int).t from library 2!