Saturday, 28 March 2009

Performance: OCaml vs HLVM beta 0.4

A quick update due to a surprise performance result for HLVM!

We knew that manipulating the shadow stack was a major slowdown in HLVM from the change in our benchmark results when the GC was first introduced a few weeks ago but we did not expect that a simple local optimization, unrolling, could go so far towards recovering performance. Moreover, this optimization was implemented in only a few lines of OCaml code.

The following results prove that HLVM now produces substantially faster programs than ocamlopt for numerical tasks on x86:

Interestingly, now that HLVM supports standalone compilation using LLVM's IR optimization passes as well as unoptimized JIT compilation, we see that LLVM's optimization passes only give small performance improvements in many cases and even substantially degrade performance on our 10-queens benchmark. This is a direct result of HLVM producing near optimal code directly (which greatly reduces compile times) and has eliminated our desire to add support for LLVM's optimization passes. We shall focus on inlining and then high-level optimization passes instead, most notably the elimination of closures.

6 comments:

Jonathan Shore said...

I am a fan of Ocaml & F#, but cannot use the .NET CLR because I need to run high performance numerical algorithms on mac and linux.

Why not map a high performance implementation of Ocaml to the JVM instead of building your own VM. The JVM will be supporting tail optimization and would not surprise me if there are development versions that already do.

An excellent mapping to the JVM will win you much wider acceptance than a separate JVM. The appeal of F# is that it shares the same playground with other .NET languages.

Unless you are able to mobilize a massive community around your implementation, improving, maintaining. and mapping your VM to various architectures is going to forever be a game of catch-up.

I would love to see a HP Ocaml on the JVM. There is a window of opportunity for that while languages such as Scala are trying to gain acceptance. There is a lot of interest in FP languages now and at the moment Scala is the only one that provides performance on the JVM. Ocaml is a cleaner language and would well in comparison.

My opinions ...

Flying Frog Consultancy Ltd. said...

@Jonathan Shore:

I thoroughly investigated that possibility before I started my HLVM project. There are actually projects (see OCamlJava) trying to port languages from the ML family to the JVM.

Unfortunately, the JVM is just fundamentally incapable of expressing efficient solutions to a lot of problems, including most idiomatic ML code.

The JVM cannot express value types so it boxes everything from complex numbers to multiple return values. In particular, this makes it impossible to implement an efficient hash table on the JVM because it boxes every key-value entry, so Java's hash table is 32x slower than .NET's (!).

The JVM cannot express TCO so idiomatic functional code either stack overflows or requires a global change to the calling convention that runs 10x slower and is no longer directly interoperable with Java.

The JVMs FFI is incredibly slow, so calling out to native code wastes thousands of cycles per invocation rendering it useless for many applications (e.g. arbitrary-precision arithmetic).

So I concluded that I must solve these problems myself. However, I was able to build upon the superb LLVM project and that has allowed me to solve all of these problems with relatively little effort. So you can already use HLVM for high-performance parallel programming and should have no problem thrashing Java.

Jonathan Shore said...

Thanks, I see your points. It is a shame that Sun has been so backward on many language and VM issues. The JVM for better or worse is the benchmark to match / beat. Last I checked the JVM still pulls ahead of the CLR for many tests.

It may be too early for this comparison, but wondering how your VM benchmarks compare to the JVM.

For the java folks or those with a large base of java libraries, much in the same way F# provides Ocaml facilities and bridging into the .NET object system, any thoughts on writing a translator for java byte code (much in the way java byte code has been mapped into the CLR with ikvm)?

Looking forward to seeing where this goes.

Flying Frog Consultancy Ltd. said...

@Jonathan

Prompted by your request for more data on HLVM vs Java I have ported some cross-language benchmarks to HLVM and, so far, it has thrashed Java on all of them. I just blogged the first, a ray tracer, where HLVM is over 3× faster on large problems and consumes 6× less memory.

Jonathan Shore said...

Very nice. Which java and what flags? Judging from your c++ implementation, I am guessing that you are creating a new java object for each vector.

In truth the obj or tuple approach is the most appropriate, but a Java oriented benchmarker would just transform that into a more primitive implementation that avoided the excessive object creation.

I'll have to take some time and play with HLVM. I am curious as to its performance with regards to numerical operations on large arrays and matrices (without concerning the object overhead). This is what I deal with most of the time. I find the Java does very well there (faster that C# and near or sometimes faster than C++).

Much like Java, Ocaml guards array access. Later java VMs optimised the checks away in many cases, making array access as fast as C/C++.


What you have done is quite impressive. A few followup thoughts:

- how do you see developing in this env (ie debugging, etc)? (develop in .NET and deploy into this env?)
- any plans to integrate with either .NET or Java libraries. Obviously not for the collections, but for the system-level stuff, a large body of work available.

jay paul said...

Really nice post, you got great blog and Thank you for sharing This excellently written content. Waiting for next one.

HP - PRO 15.6" Laptop - 4GB Memory - 500GB Hard Drive

HP - PRO 15.6" Laptop - 4GB Memory - 500GB Hard Drive - Tungsten