HLVM injects code into managed user-code functions that keep two data structures up to date:
- The shadow stack lists all references held on the stack or in registers and acts as the set of global roots into the heap. If a function accepts an argument of a reference type or allocates then the value is pushed onto the shadow stack. On function exit, the shadow stack is reset back to the way it was when the function was entered.
- The allocated list gives all of the references that have been allocated by the program. When a function allocates, the value is appended onto the allocated list.
When the allocation quota is exceeeded a garbage collection occurs. This is composed of two phases:
- Mark. Starting from the global roots on the shadow stack, all reachable data in the heap are marked as reachable (and the rest left marked as unreachable).
- Sweep. The allocated list is traversed, unreachable values are freed and all values have their mark state set back to unreachable.
This functionality requires only one non-trivial feature: given a value of a reference type it must be possible to find out what it references. In OCaml, the uniform representation makes it possible to traverse any value using the same code. HLVM uses a different approach: all values contain a pointer to a function that knows how to traverse values of that type. I called this the "visit function".
A block of OCaml code defines the def_visit, visit and def_visit_array functions. The def_visit function takes an HLVM description of a type and creates an HLVM function that finds all references contained within a value of that type. There are two kinds of reference type in HLVM: unions and arrays. The visit function does this for a non-array type and the def_visit_array function does this for an array type by creating an auxiliary function to traverse the elements of the array.
A block of HLVM code defines gc_mark_all, gc_mark, gc_sweep and gc functions. The gc_mark function traverses the shadow stack and then calls the gc_mark_all function to recursively traverse the entire heap marking values as reachable. The gc_sweep function traverses the allocated list freeing unreachable values and resetting all mark states back to unreachable. Finally, the gc function simply calls gc_mark and then gc_sweep to perform a complete garbage collection.
This simple algorithm, implemented in only a few dozen lines of OCaml code, is all that is required for HLVM to reliably recycle unreachable values. The next stage in the evolution of HLVM's garbage collector will be support for parallel threads by stopping all mutator threads when performing a collection. Once this is complete, HLVM will finally be a competitive platform for high performance parallel numerics. A further improvement will be the implementation of a variant of the Very Concurrent Garbage Collector (VCGC) that allows mutator threads and garbage collection threads to run concurrently, offering both better pause times and overall performance.