Wednesday, 29 December 2010

Why GC when you have reference counted smart pointers?

Reference counted smart pointers are a simple form of garbage collection usable from the C++ programming language. A recent question on Stack Exchange asks why anyone would want anything more when reference counted smart pointers are already available.

Other forms of garbage collection (most notably tracing GCs) have several advantages over reference counting:

  • Accuracy: Reference counting alone leaks cycles, so reference counted smart pointers will leak memory in general unless other techniques are added to catch cycles. Once those techniques are added, reference counting's benefit of simplicity has vanished.

  • Throughput: Smart pointers are one of the least efficient forms of garbage collection, particularly in the context of multi-threaded applications when reference counts are bumped atomically. There are advanced reference counting techniques designed to alleviate this but tracing GCs are still the algorithm of choice in production environments.

  • Latency: Typical smart pointer implementations allow destructors to avalanche, resulting in unbounded pause times. Other forms of garbage collection are much more incremental and can even be real time, e.g. Baker's treadmill.

Many of the answers given perpetuate myths about garbage collection:

  • Myth: scope-based reference counting guarantees that values are collected as soon as possible. In fact, tracing collectors can and do collect values before the end of their lexical scope if the value becomes unreachable sooner and a GC occurs.

  • Myth: garbage collected languages cannot release resources deterministically. In fact, this is done in exactly the same way as in unmanaged languages.

  • Myth: manual memory management minimizes latency. In fact, manual memory management often has poorer worst-case latency characteristics than garbage collection (this problem originally drove us from C++ to OCaml!) and optimizing latency in an unmanaged language is seriously hard work.


Veikko Eeva said...
This comment has been removed by the author.
Veikko Eeva said...

In some projects C++ or C may be the only viable options for specific reasons, but in the general case, even if performance is desirable, one has to ask whether using C++ over a natively garbage collected language is worth the trouble. If specific native libraries are needed, one has the option of using native interop.

(This is written better than the version that is now deleted.)

Tezka said...

Interesting analysis.

Do you also have any benchmark data comparing the OCaml GC and, say, Boost's shared_ptr?

Flying Frog Consultancy Ltd. said...

@Veikko: Yes. Moreover, quite a few people are interested in low-latency high-level languages for concurrent programming right now (e.g. in finance) but there are few solutions out there. Erlang is the nearest thing but without a static type system it won't be taken too seriously. Haskell has great potential here but the GC is not (yet) incremental enough.

@Tezka: I've been benchmarking allocators recently so I'll throw reference counting into the mix and see how it fares.

Tezka said...

Jon, I am also a practitioner in the field of quantitative finance, and I am familiar with Haskell, Standard ML and Lisp. We write most of our computationally intensive code in C++, but I have been entertaining moving some of our newer stuff to a functional language. What's more important for us than concurrency is the ability to target the SSE unit and to process very large binary files (100s of gigabytes of Monte Carlo simulation output is typical). Although Haskell offers nice expressiveness, I couldn't find a compelling library for SIMD, and the fastest Haskell IO type, i.e. ByteString, is way slower than even mundane fread/fwrite; it gets dusted by aio. Does OCaml/F# have any strong solution in this regard?

veikko_e said...
This comment has been removed by the author.
Flying Frog Consultancy Ltd. said...

@Tezka: I believe OCaml's IO is quite fast but I have no data quantifying it. Is it feasible to memmap the files on a 64-bit machine?

Floating point performance is often neglected in functional language implementations. OCaml is unusually good for a standalone functional language (iff you are on x64) and F# inherits decent capabilities from the .NET platform. However, neither are great in the grand scheme of things. Both the OCaml and GHC communities have been making noises about SSE but, AFAIK, nothing has come of it yet. Moreover, boxing can be an annoyance in both OCaml and Haskell.

My hobby HLVM project is designed to address exactly these kinds of shortcomings, albeit at a cost to the performance of purely functional code. I am happy to accept that cost because purely functional code is inherently slow anyway. I found it quite easy to get a 4× speedup on a microbenchmark by using LLVM to generate SIMD code. If you want to get great performance out of either OCaml or Haskell I would advise this route: instead of writing your numerical code in those languages directly, use them to generate efficient native code via LLVM. HLVM even beats C/C++ compiled with GCC 4 on several benchmarks...

veikko_e said...

Targeting the SSE unit in F#, I guess your options boil pretty much down to using the SIMD support in the Mono framework or possibly SlimGen from the SlimDX group. I'm not sure how the new LLVM backend affects optimisation possibilities. Be aware that the SIMD example by Miguel that is easily found on the Internet seems to be misleading (or, more likely, just plain broken), as has been refuted elsewhere.
