Thursday, 12 November 2009

How does HLVM's GC work?

HLVM injects code into managed user-code functions to keep two data structures up to date:

  • The shadow stack lists all references held on the stack or in registers and acts as the set of global roots into the heap. If a function accepts an argument of a reference type or allocates, the reference is pushed onto the shadow stack. On function exit, the shadow stack is reset back to the way it was when the function was entered.
  • The allocated list gives all of the references that have been allocated by the program. When a function allocates, the value is appended onto the allocated list.
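The bookkeeping above can be sketched in a few lines of OCaml. This is a self-contained illustration, not HLVM's actual injected code: the `value` representation and the names `alloc` and `with_frame` are assumptions made for the example.

```ocaml
(* Illustrative value representation: just an id, no payload. *)
type value = { id : int }

(* The shadow stack: the set of global roots into the heap. *)
let shadow_stack : value Stack.t = Stack.create ()

(* The allocated list: every value the program has ever allocated. *)
let allocated : value list ref = ref []

(* An allocation pushes the new reference onto the shadow stack and
   appends it to the allocated list. *)
let alloc id =
  let v = { id } in
  Stack.push v shadow_stack;
  allocated := v :: !allocated;
  v

(* The code injected around a function body: remember the shadow stack
   depth on entry and pop back down to it on exit, so the stack again
   lists exactly the roots held by the callers. *)
let with_frame body =
  let depth = Stack.length shadow_stack in
  let result = body () in
  while Stack.length shadow_stack > depth do
    ignore (Stack.pop shadow_stack)
  done;
  result
```

Note that the shadow stack shrinks on function exit but the allocated list only shrinks when the sweep phase frees unreachable values.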

When the allocation quota is exceeded, a garbage collection occurs. This is composed of two phases:

  1. Mark. Starting from the global roots on the shadow stack, all reachable data in the heap are marked as reachable (and the rest left marked as unreachable).
  2. Sweep. The allocated list is traversed, unreachable values are freed and all values have their mark state set back to unreachable.
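The quota-based trigger can be sketched as follows. This is a self-contained illustration: the quota value, the counter, and the `on_alloc` hook are assumptions for the example, not HLVM's actual names.

```ocaml
(* Collections happen after this many allocations; the value is
   illustrative. *)
let quota = ref 256
let since_last_gc = ref 0

(* Hypothetical hook called at every allocation site: bump the counter
   and run a full collection (mark then sweep) when the quota is hit. *)
let on_alloc ~gc =
  incr since_last_gc;
  if !since_last_gc >= !quota then begin
    gc ();                 (* phase 1 (mark) followed by phase 2 (sweep) *)
    since_last_gc := 0
  end
```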

This functionality requires only one non-trivial feature: given a value of a reference type it must be possible to find out what it references. In OCaml, the uniform representation makes it possible to traverse any value using the same code. HLVM uses a different approach: all values contain a pointer to a function that knows how to traverse values of that type. I called this the "visit function".
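The visit-function idea can be illustrated in OCaml as follows. The record layout and the constructors `make_pair` and `make_int_leaf` are assumptions made for this sketch; in HLVM itself the visit function lives alongside the value's run-time type information.

```ocaml
(* Every value carries a function that enumerates the references it
   contains, so the collector never needs to know its layout. *)
type value = {
  mutable marked : bool;
  visit : unit -> value list;   (* type-specific traversal *)
}

(* The visit function for a pair of references returns both of them. *)
let make_pair a b =
  { marked = false; visit = (fun () -> [a; b]) }

(* The visit function for a value containing no references (e.g. an
   int array) returns nothing. *)
let make_int_leaf () =
  { marked = false; visit = (fun () -> []) }

(* The marker is completely generic: it just calls each value's own
   visit function to find its children. *)
let rec mark v =
  if not v.marked then begin
    v.marked <- true;
    List.iter mark (v.visit ())
  end
```

This is the key difference from OCaml's design: rather than one traversal routine over a uniform representation, there is one traversal routine per type, reached through the value itself.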

A block of OCaml code defines the def_visit, visit and def_visit_array functions. The def_visit function takes an HLVM description of a type and creates an HLVM function that finds all references contained within a value of that type. There are two kinds of reference type in HLVM: unions and arrays. The visit function handles non-array types, and the def_visit_array function handles array types by creating an auxiliary function that traverses the elements of the array.

A block of HLVM code defines gc_mark_all, gc_mark, gc_sweep and gc functions. The gc_mark function traverses the shadow stack and then calls the gc_mark_all function to recursively traverse the entire heap marking values as reachable. The gc_sweep function traverses the allocated list freeing unreachable values and resetting all mark states back to unreachable. Finally, the gc function simply calls gc_mark and then gc_sweep to perform a complete garbage collection.
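The four functions named above can be sketched in OCaml (HLVM emits the real versions in its own intermediate representation). This self-contained sketch uses an illustrative `value` record with an explicit `refs` field in place of the visit-function machinery.

```ocaml
type value = { mutable marked : bool; refs : value list }

let shadow_stack : value Stack.t = Stack.create ()
let allocated : value list ref = ref []

(* gc_mark_all: recursively mark everything reachable from v. *)
let rec gc_mark_all v =
  if not v.marked then begin
    v.marked <- true;
    List.iter gc_mark_all v.refs
  end

(* gc_mark: seed the traversal with the roots on the shadow stack. *)
let gc_mark () = Stack.iter gc_mark_all shadow_stack

(* gc_sweep: drop unreachable values from the allocated list and reset
   every survivor's mark state for the next collection. *)
let gc_sweep () =
  allocated :=
    List.filter
      (fun v ->
        let live = v.marked in
        v.marked <- false;   (* reset; the runtime frees dead values here *)
        live)
      !allocated

(* gc: a complete collection is simply mark followed by sweep. *)
let gc () = gc_mark (); gc_sweep ()
```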

This simple algorithm, implemented in only a few dozen lines of OCaml code, is all that is required for HLVM to reliably recycle unreachable values. The next stage in the evolution of HLVM's garbage collector will be support for parallel threads by stopping all mutator threads when performing a collection. Once this is complete, HLVM will finally be a competitive platform for high performance parallel numerics. A further improvement will be the implementation of a variant of the Very Concurrent Garbage Collector (VCGC) that allows mutator threads and garbage collection threads to run concurrently, offering both better pause times and overall performance.

The HLVM project is described in detail in the OCaml Journal.

5 comments:

strangerland said...

Is this for parallel computing on multicore desktops? Other machines? Cloud computing?

Flying Frog Consultancy Ltd. said...

On multicore desktops.

Mohamed Bana's Blog said...

The Flying Frog,

Can you please summarise what you don't like about the current VMs?

From what I've gathered:

a) JVM; fixnums in the VM ...
b) Mono; no tail calls, and leaks on the GC.

What else? I'm surely missing something.

And why not just modify the JVM to support those changes instead of writing HLVM?

Flying Frog Consultancy Ltd. said...

@Mohamed:

The JVM offers excellent performance for structless imperative monomorphic code (i.e. Java code) but the JVM lacks value types, tail calls and implements parametric polymorphism via type erasure. Lack of value types makes tuples and multiple return values slow. Lack of tail calls makes functional code either very slow or very fragile. Lack of efficient parametric polymorphism makes polymorphic code slow (and almost all ML code is polymorphic thanks to automatic generalization by the compiler).

Mono has a much more compelling feature list than the JVM: it is supposed to support value types, tail calls and real parametric polymorphism and they are even reimplementing Microsoft's mainstream libraries. However, Mono simply does not deliver. Firstly, Mono does not even have an accurate garbage collector. Secondly, Mono's code generator is poor (my F# code runs 3x slower under Mono than .NET). Thirdly, Mono is largely untested because it has a tiny user base (comparable to OCaml's). Fourthly, Mono only aspires to reimplement Microsoft's previous generation of libraries (i.e. WinForms and not WPF). Finally, Mono's VM is badly written and can only be fixed by a complete rewrite because the code generator must work in harmony with a real garbage collector (that does not exist yet).

Building HLVM from scratch is easier than trying to fix all of these deficiencies in JVM or Mono.
