A quick update due to a surprise performance result for HLVM!
We knew that manipulating the shadow stack was a major slowdown in HLVM from the change in our benchmark results when the GC was first introduced a few weeks ago but we did not expect that a simple local optimization, unrolling, could go so far towards recovering performance. Moreover, this optimization was implemented in only a few lines of OCaml code.
The following results prove that HLVM now produces substantially faster programs than ocamlopt for numerical tasks on x86:
Interestingly, now that HLVM supports standalone compilation using LLVM's IR optimization passes as well as unoptimized JIT compilation, we see that LLVM's optimization passes only give small performance improvements in many cases and even substantially degrade performance on our 10-queens benchmark. This is a direct result of HLVM producing near optimal code directly (which greatly reduces compile times) and has eliminated our desire to add support for LLVM's optimization passes. We shall focus on inlining and then high-level optimization passes instead, most notably the elimination of closures.