We recently completed a translation of our famous ray tracer language comparison to HLVM. The translation is equivalent to the most highly optimized implementations written in other languages and this allows us to compare HLVM with a variety of competing languages for the first time. The results are astonishing.
Running the benchmark with the default settings (level=9, n=5 to render 87,381 spheres at 512×512) on 32-bit x86 gives the following times for different languages:
These results show that HLVM already provides competitive performance for a non-trivial benchmark. HLVM took 6.7s whereas C++ (compiled with g++ 4.3.3) took only 4.3s and Haskell (compiled with GHC 6.12) took 13.9s.
However, cranking up the level parameter to 12 in order to increase the complexity of the scene, rendering a whopping 5,592,405 spheres, we find that HLVM blows away the other garbage collected languages and is even able to keep up with C++:
This remarkable result is a consequence of HLVM's space-efficient data representation strategy that avoids almost all boxing and is ideal for numerical code. Moreover, these results vindicate HLVM's unique design that many people said would not be efficient.
The main factor separating these languages is their data representations. The slowest languages on this benchmark, Java and Haskell, rely upon uniform data representations that require many values to be boxed, massively increasing the burden on the garbage collector and damaging locality by spreading the same data across a much larger heap. Moreover, the inner loops of the renderer handle boxed values in these languages and, consequently, are constantly allocating and even incurring garbage collections that require them to traverse irrelevant parts of the heap. Conversely, C++ and HLVM are able to unbox virtually all values. In fact, HLVM is so successful at storing values unboxed that the only part of the benchmark to allocate is the initial creation of the scene tree and the renderer itself performs no allocations at all. Consequently, HLVM incurs no garbage collections during rendering and traverses no more heap than the C++.
Moreover, Java, Haskell and HLVM are the only languages on this benchmark providing multicore-capable garbage collectors. We have not yet parallelized this benchmark but a parallel implementation in HLVM will scale extremely well because it is not incurring garbage collections thanks to its unique data representation.