Friday, 8 January 2010

HLVM on the ray tracer language comparison

We recently completed a translation of our famous ray tracer language comparison to HLVM. The translation is equivalent to the most highly optimized implementations written in other languages and this allows us to compare HLVM with a variety of competing languages for the first time. The results are astonishing.

Running the benchmark with the default settings (level=9, n=5 to render 87,381 spheres at 512×512) on 32-bit x86 gives the following times for different languages:

These results show that HLVM already provides competitive performance for a non-trivial benchmark. HLVM took 6.7s whereas C++ (compiled with g++ 4.3.3) took only 4.3s and Haskell (compiled with GHC 6.12) took 13.9s.

However, cranking up the level parameter to 12 in order to increase the complexity of the scene, rendering a whopping 5,592,405 spheres, we find that HLVM blows away the other garbage collected languages and is even able to keep up with C++:

This remarkable result is a consequence of HLVM's space-efficient data representation strategy that avoids almost all boxing and is ideal for numerical code. Moreover, these results vindicate HLVM's unique design that many people said would not be efficient.

The main factor separating these languages is their data representations. The slowest languages on this benchmark, Java and Haskell, rely upon uniform data representations that require many values to be boxed, massively increasing the burden on the garbage collector and damaging locality by spreading the same data across a much larger heap. Moreover, the inner loops of the renderer handle boxed values in these languages and, consequently, are constantly allocating and even incurring garbage collections that require them to traverse irrelevant parts of the heap. Conversely, C++ and HLVM are able to unbox virtually all values. In fact, HLVM is so successful at storing values unboxed that the only part of the benchmark to allocate is the initial creation of the scene tree and the renderer itself performs no allocations at all. Consequently, HLVM incurs no garbage collections during rendering and traverses no more heap than the C++.

Moreover, Java, Haskell and HLVM are the only languages on this benchmark providing multicore-capable garbage collectors. We have not yet parallelized this benchmark but a parallel implementation in HLVM will scale extremely well because it is not incurring garbage collections thanks to its unique data representation.

12 comments:

David Teller said...

Interesting. Is the source code for the HLVM version of the raytracer available?

ertai said...

Nice work!

Yes please provide an up to date archive of the source used to produce this benchmark.

Don't forget that benchmarking first rule should be reproductibilty of results.

Flying Frog Consultancy Ltd. said...

Yes, the source code is here in the SVN repo.

billitch said...

I'd really like to see where SBCL Common Lisp ranks. Usually it's between OCaml and Java...

Art said...

Congrats
GR8
Can't wait for parallel results
Who knew GC could be so interesting

bart said...

I found only the HLVM source code - where can I find the Haskell and others ?

Flying Frog Consultancy Ltd. said...

@Bart

The source code for the other implementations is available from here.

Daniel said...

I'd love to see a supercompiled Haskell version.

Laurent Le Brun said...

Hi,

Did you try this benchmark with F# too?

Curt Sampson said...

Art: GC is absolutely fascinating, and how programs use memory even more so. For a long, long time now main memory access has been non-uniform on even single-processor machines (due to caching), and being able to reduce cache-busting behaviour has been a key to a lot of great speed-ups.

This becomes particularly interesting when you're using a language like Haskell, where variables are immutable and data structures are persistent by default. This does not map well onto current CPU architectures, and you find yourself (if you were a compiler) looking for areas where you can be mutable anyway. The same force drives unboxed versus boxed values: access less memory. (And there's been a lot of work put into GHC related to this.)

Gabriel Claramunt said...

Imopressive... ont he other hand, benchmarks without mean and standard deviation are meaningless, IMHO

jay paul said...

Nice post! Can’t wait for the next one. Keep stuff like this coming.

HP - Split 2-in-1 13.3" Touch-Screen Laptop - 4GB Memory - 128GB Solid State Drive - Modern Silver

HP - ENVY Touchsmart 14" Refurbished Touch-Screen Laptop - 8GB Memory - 500GB Hard Drive - Midnight Black