Saturday, 24 April 2010

Variant types and pattern matching in HLVM

The compiler development series of articles in The OCaml Journal have culminated in an example front-end compiler targetting HLVM that now supports variants and pattern matching in addition to previous features.

These new language features have been implemented using only a few lines of code, bringing the front-end up to only 253 lines of OCaml code!

This new functionality allows substantially more complicated ML-style programs to be written, including the following symbolic simplifier:

# type expr = Int of int | Var of int | Add of expr * expr | Mul of expr * expr;; # let rec cmp_int((m, n): int * int) : int = if m<n then -1 else if m=n then 0 else 1;; # let rec cmp((f, g): expr * expr) : int = match f, g with Int m, Int n -> cmp_int(m, n) | Int _, Var _ -> -1 | Int _, Add _ -> -1 | Int _, Mul _ -> -1 | Var x, Var y -> cmp_int(x, y) | Var _, Add _ -> -1 | Var _, Mul _ -> -1 | Add(f0, g0), Add(f1, g1) -> let c = cmp(f0, f1) in if c<>0 then c else cmp(g0, g1) | Add _, Mul _ -> -1 | Mul(f0, g0), Mul(f1, g1) -> let c = cmp(f0, f1) in if c<>0 then c else cmp(g0, g1) | _ -> 1;; # let rec add((f, g): expr * expr) : expr = match f, g with Int m, Int n -> Int(m + n) | Int m, Add(Int n, f) -> Add(Int(m + n), f) | f, Int 0 -> f | Int 0, f -> f | Add(f, g), h -> add(f, add(g, h)) | f, Add(g, h) -> (match cmp(f, g) with -1 -> Add(f, Add(g, h)) | 0 -> Add(Mul(Int 2, f), h) | _ -> add(g, add(f, h))) | f, g -> match cmp(f, g) with -1 -> Add(f, g) | 0 -> Mul(Int 2, f) | _ -> Add(g, f);; # let rec mul((f, g): expr * expr) : expr = match f, g with Int m, Int n -> Int(m * n) | Int m, Mul(Int n, f) -> Mul(Int(m * n), f) | f, Int 0 -> Int 0 | Int 0, f -> Int 0 | f, Int 1 -> f | Int 1, f -> f | Mul(f, g), h -> mul(f, mul(g, h)) | f, Mul(g, h) -> (match cmp(f, g) with 1 -> mul(g, mul(f, h)) | _ -> Mul(f, Mul(g, h))) | f, g -> match cmp(f, g) with 1 -> mul(g, f) | _ -> Mul(f, g);; # let rec d((f, x) : expr * int) : expr = match f with Int n -> Int 0 | Var y -> Int(if x=y then 1 else 0) | Add(f, g) -> add(d(f, x), d(g, x)) | Mul(f, g) -> add(mul(f, d(g, x)), mul(g, d(f, x)));; # let x = Var 0 in let f = add(add(mul(mul(x, x), x), x), Int(-1)) in f, d(f, 0), d(d(f, 0), 0), d(d(d(f, 0), 0), 0);; - : `Struct[`Reference; `Reference; `Reference; `Reference] = (Add$1((Int$1(-1), Add$1((Var$1(0), Mul$1((Var$1(0), Mul$1((Var$1(0), Var$1(0))))))))), Add$1((Int$1(1), Add$1((Mul$1((Int$1(2), Mul$1((Var$1(0), Var$1(0))))), Mul$1((Var$1(0), Var$1(0))))))), Add$1((Mul$1((Int$1(2), Var$1(0))), Mul$1((Int$1(4), Var$1(0))))), Int$1(6)) Live: 95 58.9176s total; 0s suspend; 0s mark; 0s sweep

Moreover, the performance of the generated code is competitive with the OCaml top-level and F# interactive despite the fact that HLVM is optimized for high-performance parallel numerics and not this kind of symbolic and logic programming at all. However, this example also demonstrates that compilation times can be far too long with HLVM. Specifically, the 19-line cmp function takes 45 seconds to compile in the interactive session. Presumably this is due to a performance bug in HLVM but the problem has not yet been investigated.

This new version of HLVM will be made available as open source software on the HLVM page on OCamlCore.

Sunday, 11 April 2010

ML's powerful higher-order module system

For around 30 years, the ML family of languages have been characterised by several unusual features including a novel and powerful higher-order module system. This language feature was pioneered by Dave MacQueen in the early 1980s and found its way into the Standard ML and OCaml languages. Higher-order modules allow one module to be parameterized over another module in the form of functors that map modules to modules.

Higher-order modules turn out to be especially useful for parameterizing concrete data structures over more primitive data structures. Perhaps the most compelling application of higher-order modules comes from graph theory, where they allow algorithms over graphs to be abstracted over the concrete data structures used to represent the graphs. This application was discussed in our most recent OCaml Journal article "The A* algorithm" that describes a generalization of Dijkstra's shortest path algorithm that is widely used for route finding in game AI. The ocamlgraph library by Sylvain Conchon, Jean-Christophe FilliĆ¢tre and Julien Signoles is an excellent production-quality example of the application of higher-order modules to data structures and algorithms from graph theory.

Higher-order modules saw some significant visibility in more mainstream computer science literature, perhaps most famously in Chris Okasaki's excellent book "Purely functional data structures" where they were used to build towers of data structures, one upon another, such as a functor that creates catenable list implementations from queue implementations. But this language feature has since failed to gain traction among the wider community of ML programmers and two of the most common modern derivatives, F# and Haskell, have only the most rudimentary module systems with no support for higher-order modules at all. Few people seem to miss higher-order modules but Lennart Augustsson, author of the world's first Haskell compiler, did write a blog post "A somewhat failed adventure in Haskell abstraction" where he described the difficulty of translating some simple examples from ML to Haskell due to the lack of higher-order modules (and second-rate alternatives such as OOP). We have encountered similar problems when trying to abstract data structures over each other in F# although our book F# for Technical Computing does have a section about purely functional data structures that uses the object-oriented programming with interfaces to achieve a similar effect, albeit without the benefit of static checking and resolution that higher-order modules offer.

This raises the question: why did advanced module systems fail? The following reasons seem most likely:

  • The examples given in the literature (purely functional data structures and parsers) are not practically important enough to be sufficiently compelling.

  • The most commonly used ML implementation with higher-order modules is OCaml where this form of abstraction inhibits inlining and, consequently, can severely degrade performance. This makes it impractical to abstract over different forms of primitive arithmetic, for example.

These issues seem to have all but killed higher-order modules as a feature in modern languages.