Recent articles in the OCaml and F#.NET Journals derived high-level implementations of QR decomposition via Householder reductions. This numerical method has many applications, most notably in the computation of best fit parameters of linear sums.
Imperfections in OCaml
The OCaml programming language allows this algorithm to be expressed very elegantly in only 15 lines of code:
# let qr a = let m, n = Matrix.dim a in let rec qr_aux k q r qa = if k = n then q, Matrix.mul r a else let v = let u = Array.init m (fun i -> if i<k then 0.0 else Matrix.get qa i k) in u.(k) <- u.(k) -. norm u; Array.map (( *. ) (1.0 /. norm u)) u in let qr_aux2 m u v = Matrix.sub m (Matrix.outer u v) in qr_aux (k+1) (qr_aux2 q (Matrix.transform q (2. *| v)) v) (qr_aux2 r (2. *| v) (Matrix.transform (Matrix.transpose r) v)) (qr_aux2 qa v (Matrix.transform (Matrix.transpose qa) (2. *| v))) in let i = Matrix.init m m (fun i j -> if i=j then 1.0 else 0.0) in qr_aux 0 i i a;; val qr : Matrix.t -> Matrix.t * Matrix.t = <fun>
This is remarkably concise compared to the equivalent 2,077 lines of Fortran 77 code in LAPACK. However, the OCaml implementation suffers from two main drawbacks:
- Abstractions such as higher-order functions degrade performance. If core routines such as arithmetic operations are abstracted (which is necessary to generalize the algorithm over different numeric types such as reals, complexes and symbolic expressions) then the whole program can become over 10× slower.
- OCaml's serial run-time prohibits threads from running in parallel, leaving us with few options for parallel programming and none that are competitively performant. However, Unix fork can be used to coarsely parallelize the extraction of the Q and R matrices which can almost double performance for m×n input matrices when m≫n.
Consequently, the OCaml implementation is consistently slower than the specialized routines found in MATLAB.
Perfection in F#
The F# programming language includes a seemingly-innocuous feature called inline that allows functions marked inline to be inlined at compile time. However, inline in F# has some remarkable properties:
- Functions marked inline are structurally typed so this feature can be used to provide abstraction over different numeric types without any performance degradation at all.
- This feature even allows functional abstraction with optimal performance as well: by inlining higher-order functions and their function arguments in order to statically-expand completely specialized implementations.
Furthermore, F# inherits a concurrent garbage collector from .NET and Microsoft's Task Parallel Library provides easy-to-use parallel tasks based upon state-of-the-art wait-free work-stealing deques as seen in Cilk. This addresses the remaining imperfection in OCaml.
These features, that are unique to the F# programming language, allow the construction of parallel numerical code that combines the brevity of OCaml with the performance of Fortran. In fact, our generic implementation of QR decomposition in F# is up to 3× faster than the equivalent specialized function in MATLAB.