Wednesday, 25 September 2013

How do reference counting and tracing garbage collection compare?

Reference counting works by counting the number of references to each object. When the reference count drops to zero the object is definitely unreachable and can be recycled. The first advantage is simplicity. The second advantage is that decrements can occur as locals fall out of scope so collection can be deterministic and predictable. The third advantage is that the asymptotic complexity of reference counting does not depend upon the total size of the heap. The first problem is that cycles keep reference counts above zero even when they are unreachable, so reference counting alone cannot handle the general case. The second problem with reference counting is that incrementing and decrementing a counter in an object every time a reference to it is copied or dropped is very computationally expensive because it incurs a cache miss even if nothing else in the object is read or written. The third problem is that multithreaded reference counting is non-deterministic because increments and decrements race. The fourth problem is that simple implementations like Boost's shared_ptr in C++ allow destructors to avalanche which incurs unbounded pause times.

The first problem is solved in some languages (e.g. Erlang, Mathematica) by imposing unidirectional heaps so cycles cannot be created by design and, therefore, reference counting is an accurate form of GC for them. The second problem can be addressed using a variety of techniques such as deferred decrements but they all add significant complexity and, therefore, undermine the first advantage. The third problem obviously negates the second advantage in the context of multithreaded programs and, in this era of multicore computing, most programs are multithreaded. The problem of avalanching destructors can be solved by adding a queue to defer destruction but this adds complexity and makes the solution unpredictable, i.e. negates both advantages.

Tracing garbage collection works by keeping track of references into the heap (the global roots) and then tracing through the heap to find unreachable objects eligible for collection. The first advantage of tracing collection is that it can be very simple (e.g. the mark-sweep collector in my HLVM project is ~20 lines of code). The second advantage is that it is always accurate (can collect cycles). The third advantages is that naive mark-sweep is much faster than naive reference counting and optimized tracing collectors (e.g. JVM and CLR) are significantly faster than the fastest reference counting collectors. The first disadvantage is that, although they can be deterministic (as OCaml is), they are unpredictable because it is not possible to predict when an object will be collected. The second disadvantage of naive tracing collectors is that mutator threads must all be paused for an unbounded amount of time during each collection cycle. The third disadvantage of tracing collection is that the time spent marking is proportional to the size of the entire heap.

The second disadvantage is easily reduced by adopting incremental mark-sweep and can be completely eliminated by more advanced techniques. The third disadvantage turns out to be practically irrelevant because the constant pre-factor is so small: even with the largest heaps in use today the mark phase is surprisingly quick.

There are some common myths surrounding this topic as well. Many people believe (incorrectly) that objects are always kept alive until the end of their scope in the source code and, therefore, that reference counting always collects at the earliest opportunity because it collects objects reachable from variables as they fall out of scope. In reality, this mental model is horribly broken. In reality, the GC acts at run-time when the code has been compiled so the concept of scope does not even exist. Not all locally-held references are spilled to the stack and even when a reference is spilled to the stack it is likely to be overwritten by another when it is no longer needed. Register allocators rely upon liveness analysis (not scope) to determine when references need to be kept.


David said...

Thanks very enlightening stuff this morning.

The article "Why mobile web apps are slow" also helped me put a few things in perspective as I am a back-end .NET developer and incidentally I never have do deal with the problem of GC pauses.

That other article seems to make some questionable claims one of which being that RAM will not continue improving on mobile devices. Already 2GB RAM devices are common, also, the article mentions that ARM has on-die RAM which puts an upper bound on how much RAM we can expect to have. But as far as I understand it, Atom does not which will inevitably force ARM to deal with that threat.

Intel is seemingly very effective at bringing Atom wherever they want it to be.

Taylan Bayırlı said...

David, the "Why mobile web apps are slow" article by Drew Crawford is, bluntly put, full of crap with regard to garbage collection (I don't know about the rest of it). I wrote a not-very-well-written page on that article which I'm afraid mostly boils down to "look into the cited paper for yourself to see how blatantly Crawford misinterpret it," another issue being that the papers compare reference implementations for garbage collection from roughly the 60s up to the 90s, to dlmalloc, i.e. an industry-level implementation of malloc/free if I'm not mistaken, not even to reference counting, and unsurprisingly other papers come to rather contradicting conclusions. (The consensus in research seems to be that tracing GC is generally superior to any form of refcounting.)

When Crawford tries to research a topic in-depth and gives you citations from papers, I'd always be careful to see whether he might have been making glaring mistakes in how he interprets them and represents their results to you.

Jon Harrop said...

You may like a recent blog post I wrote presenting my findings when comparing the characteristics (including performance) of reference counted Swift with OCaml and F#:

Basically, I cannot find any evidence nor logical reason to believe these common assertions about memory management. Drew dug up some academic research that, as you say, only covered the characteristics of strawman GC algorithms the likes of which haven't been seen in production for decades. Unless someone can provide some evidence to backup these claims I am unwilling to accept them at face value.