Wednesday, 16 November 2011

What is your garbage collector collecting?

As a garbage collector runs it recycles unreachable values. A well known result in GC theory is that values tend to die in "clumps" rather than individually. This post visualizes these clumps.

We instrumented a C++ program equipped with our own mark-region collector (using 64kB regions) that is equivalent to the following F# implementation of a list-based n-queens solver in order to gather information about the structures being collected:
let safe x1 y1 x2 y2 =
  x1 <> x2 && y1 <> y2 &&
    x2 - x1 <> y2 - y1 && x1 - y2 <> x2 - y1

let ps n =
  [ for i in 1 .. n do
      for j in 1 .. n do
        yield i, j ]

let rec filter x0 y0 = function
  | [] -> []
  | (x1, y1 as q)::xys as list ->
      let xys' = filter x0 y0 xys
      if safe x0 y0 x1 y1 then
        if System.Object.ReferenceEquals(xys', xys) then list else
          q::xys'
      else
        xys'

let rec search n nqs qs ps a =
  match ps with
  | [] -> if nqs=n then a+1 else a
  | (x, y as q)::ps ->
      search n nqs qs ps a
      |> search n (nqs+1) (q::qs) (filter x y ps)

let solve n =
  search n 0 [] (ps n) 0

Stopping this program on its 1,000th GC cycle when it is solving the 11-queens problem, extracting references between allocated but unmarked cons cells, filtering out connected components with 3 or more vertices and visualizing the result using Mathematica's GraphPlot function gives the following:

In this case, we find that just 6% of the cons cells die alone (i.e. singleton lists) and 84% die in a clump of two cons cells, 6.5% die in a clump of 3 cons cells and 3.4% die in clumps containing four or more cons cells. Only a handful of clumps had interesting structures.

Although these results suggest that it might be easy to obtain a speedup by adding a special case to the list type for lists containing two elements we have found that this is not the case because the required changes also add expensive operations elsewhere, e.g. destructuring the special two-element list now incurs an allocation. Therefore, the next best solution would be to optimize the garbage collector for this case but it would be prudent to analyze the clusters collected during the running of other programs first. Analyzing the garbage generated by a program using red-black trees would be interesting...

No comments: