Saturday, 19 October 2019

Why does Java still feel bloated nowadays compared to other languages with garbage collection?

"We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp." -- Guy Steele.

The fundamental problem with Java is that Lisp sucks. Like Lisp, Java was poorly designed and then poorly implemented. McCarthy had the excuse that he didn't know any better. Whoever made Java was just ignorant.

A static type system can have many benefits. Types can detect large classes of bugs. For example, you can eliminate null reference exceptions. Types can be used to dramatically improve performance. For example, by eliminating almost all run-time type checks and by providing polymorphism with no performance impact. Whoever made Java apparently had a very bad understanding of type theory because the design of the JVM not only failed to leverage static types in any way (the JVM immediately discards almost all type information and heap allocated almost everything whether it makes sense to or not, just like Lisp) but they even defined it with fundamental bugs in the type system (Java Arrays Break Type Safety).

But bad design was only half of the story. Sun's JVM was also badly implemented. Faced with Lisp-like bad performance they began frantically building on their dodgy foundation by adding layers upon layers of progressively more sophisticated optimisations that were mostly intended to alleviate the GC due to the ridiculous heap allocation rate. Advanced parallel garbage collection algorithms to parallelize the handling of huge number of unnecessary heap allocations. Escape analysis to try to replace some of the heap allocated locals with stack allocations. And so on. Java ended up rediscovering the sufficiently smart compiler myth from Lisp: trying to optimise around a fundamentally inefficient foundation is like putting lipstick on a pig.

Overall, the result was very bad for JIT compilation performance and also particularly bad for latency and throughput isn't great. That's why Java is so laggy.

.NET had the second mover advantage and, with .NET 2.0 (2005), adopted a more sophisticated foundation with value types and reified generics. This goes a long way to leveraging static type information. Consequently, there is a lot less stress on and contention for the .NET GC and it can achieve good concurrent performance without the same sophisticated features. Simple Ahead-of-Time (AoT) compilation, no escape analysis and so on. Microsoft kept the implementation of .NET lean, stripping out any research ideas that proved to be dispensible. Overall, the result was good for compile times, throughput and latency even though most of the core Framework was written by people who did not understand how to leverage the core design (e.g. you cannot Parse without heap allocating and tuples are heap allocated).

Python (1991) and OCaml (1996) both have simple foundations but without advanced features like concurrent collection (both do memory management under a global lock). Consequently, they achieve reasonable performance on one core and, in particular, good latency.