Saturday, 25 September 2010

The effectiveness of Boehm's GC

Many people still seem to be trying to use Boehm's garbage collector. This is surprising because that GC is conservative, meaning it is incapable of accurately distinguishing between integers and pointers and, consequently, it is prone to memory leaks due to false positives where integers are assumed to be pointers into an allocated heap block, preventing the block from being reclaimed. Consequently, Boehm's GC is a notorious source of memory leaks.

Moreover, 32-bit machines with 4Gb of RAM and programs that use a significant proportion of that RAM are still very common and the proportion of the address space in use is, therefore, high so the probability of false positives is very high.

Imagine a 32-bit program using 40Mb of heap contains a random integer. The probability of that random integer coincidentally pointing into an allocated heap block is approximately 1% because 1% of the 4Gb address space is covered by heap blocks. Now imagine a 32-bit program containing n random ints and using a proportion p of the address space. The probability that one or more of those ints point into allocated heap blocks is 1-(1-p)n. In our previous example, if there were 100 random ints then the probability of a false positive is a whopping 63%!

This is why 32-bit programs that use Boehm's GC are so prone to memory leaks. Hash tables are particularly susceptible to this problem because hashes are effectively random ints and the spines of hash tables are large heap blocks. Indeed, this appears to explain the memory leak we observed when using hash tables on Mono back in July.

No comments: