Saturday, 14 November 2009

Google's Go programming language

Google recently released a "systems" programming language called Go (not to be confused with the existing programming language called Go!) that is designed to be simple, fast, safe and concurrent.

Google give an impressive feature list:

  • Accurate concurrent garbage collection.
  • Goroutines and channels for parallelism and concurrency.
  • Multiple return values.
  • Fast compilation times.
  • Structurally-typed object oriented programming.
  • First-class functions.

There are two different kinds of compiler available to Go programmers. The more common 8g and 6g compilers for the x86 and x86-64 architectures, respectively, are simple "splat" compilers that translate source directly into assembler. These compilers offer fast compilation but the generated code is not optimized and is generally quite slow, often several times slower than C. Moreover, in the absence of an accurate way to traverse the heap, these compilers use a GC that resorts to conservative collection (blindly traversing the entire heap looking for pointer-like ints) and the current allocator and collector implementations are extremely slow and never return memory to the OS. The gcc-go compiler is a Go front-end for the GCC compiler that offers optimization but currently has no garbage collection at all.

The use of goroutines and channels is really the defining characteristic of the Go language. This feature stems from previous work by Go's creator, Rob Pike, on the Limbo programming language that was released in 1995. Goroutines are similar to tasks in Cilk or the Microsoft TPL except that tasks return a value whereas goroutines do not. Channels are a mechanism for synchronized communication using message passing.

Go may become an interesting language in the future but, for now, the only available implementations are nowhere near competitive and the language itself is missing many common features (non-null references, generics, exceptions and union types). Our impression is that Google are developing the Go programming language largely for itself, first as a language for high-performance web programming but also potentially as a language for Google's proprietary NaCl Chrome plugin that allows x86 code to be downloaded over the web and executed by a browser.

Thursday, 12 November 2009

How does HLVM's GC work?

HLVM injects code into managed user-code functions that keep two data structures up to date:

  • The shadow stack lists all references held on the stack or in registers and acts as the set of global roots into the heap. If a function accepts an argument of a reference type or allocates then the value is pushed onto the shadow stack. On function exit, the shadow stack is reset back to the way it was when the function was entered.
  • The allocated list gives all of the references that have been allocated by the program. When a function allocates, the value is appended onto the allocated list.

When the allocation quota is exceeeded a garbage collection occurs. This is composed of two phases:

  1. Mark. Starting from the global roots on the shadow stack, all reachable data in the heap are marked as reachable (and the rest left marked as unreachable).
  2. Sweep. The allocated list is traversed, unreachable values are freed and all values have their mark state set back to unreachable.

This functionality requires only one non-trivial feature: given a value of a reference type it must be possible to find out what it references. In OCaml, the uniform representation makes it possible to traverse any value using the same code. HLVM uses a different approach: all values contain a pointer to a function that knows how to traverse values of that type. I called this the "visit function".

A block of OCaml code defines the def_visit, visit and def_visit_array functions. The def_visit function takes an HLVM description of a type and creates an HLVM function that finds all references contained within a value of that type. There are two kinds of reference type in HLVM: unions and arrays. The visit function does this for a non-array type and the def_visit_array function does this for an array type by creating an auxiliary function to traverse the elements of the array.

A block of HLVM code defines gc_mark_all, gc_mark, gc_sweep and gc functions. The gc_mark function traverses the shadow stack and then calls the gc_mark_all function to recursively traverse the entire heap marking values as reachable. The gc_sweep function traverses the allocated list freeing unreachable values and resetting all mark states back to unreachable. Finally, the gc function simply calls gc_mark and then gc_sweep to perform a complete garbage collection.

This simple algorithm, implemented in only a few dozen lines of OCaml code, is all that is required for HLVM to reliably recycle unreachable values. The next stage in the evolution of HLVM's garbage collector will be support for parallel threads by stopping all mutator threads when performing a collection. Once this is complete, HLVM will finally be a competitive platform for high performance parallel numerics. A further improvement will be the implementation of a variant of the Very Concurrent Garbage Collector (VCGC) that allows mutator threads and garbage collection threads to run concurrently, offering both better pause times and overall performance.

The HLVM project is described in detail in the OCaml Journal.