Sunday, 26 April 2009

When celebrity programmers attack: Guido on tail calls

Guido van Rossum, the celebrity programmer responsible for the Python programming language, published a horribly misinformed blog post last week that was meant to explain why tail calls are "unpythonic" and do not deserve a place in the Python language.

Given the simplicity of the subject matter and the widespread availability of accurate information from experts like Will Clinger, it seems remarkable that anyone would still be confused about tail calls. Although the presence or absence of tail calls in Python is unimportant, this is still serious because Python is the new BASIC and, consequently, Guido is abusing his celebrity status by misguiding a huge number of young programmers. Indeed, this is very visible on the ensuing Reddit discussions where the majority of comments are factually incorrect.


A tail call is a call that appears as the last thing a function does before it returns. In functional languages, this means the caller is returning the result of a function call (the tail call).

Tail call elimination, aka Tail Call Optimization (TCO), optimizes the stack space consumed by tail calls in order to allow the callee to be thought of by the programmer as a continuation of the caller, as if the body of the callee had been placed at the end of caller.

Dispelling the myths

Many people believe that tail calls exist as a workaround for the absence of loops in functional languages, allowing loops to be rewritten as tail recursive functions. This is a myth. In reality, many functional languages including both OCaml and F# provide loops as core language constructs and both loops and tail calls are widely used in such languages.

Many old programmers believe that tail call elimination is an academic curiosity. This is no longer true. In reality, tail call elimination has been implemented in .NET for eight years and will become a fully supported mainstream feature when Visual Studio 2010 is released next year. LLVM (used by Apple) also has full support for tail call elimination.

Some people misinterpret the "optimization" in TCO to mean that tail calls are supposed to be fast. In fact, TCO is an optimization in space to reduce memory consumption and tail calls can even be slower than non-tail calls, e.g. ILX tail calls on .NET.

Many people assume that the function called by a tail call must be statically known and some people even believe that tail calls may only call the function they are in. These are myths. In reality, all calls in tail position are tail calls and tail call elimination should handle all tail calls whether they are to statically known functions or not, i.e. given a variable containing a function you can tail call that function safe in the knowledge that the space consumption will be eliminated. Indeed, this is the essence of the revelation.

The revelation

Almost all non-functional programmers are unaware that tail calls facilitate a programming paradigm that they have never seen.

The ability to tail call to functions that are not statically known is the foundation that makes many combinators useful. This is a style of programming where functions are composed in order to create a pipeline for values to flow through. Without tail call elimination, every stage in the pipeline would leak stack space and the whole approach becomes unusably unreliable. The only workaround is tantamount to replacing the calling convention and destroys many valuable features including interoperability and the ability to dynamically load libraries.

This is the real reason why tail calls are valuable.

Friday, 17 April 2009

What O'Reilly don't want authors to know

Mike Hendrickson has been busy updating O'Reilly's analysis of the state of the computer book market by programming language. That means it is time for us to reiterate how authors of decent books can earn far more for their work by cutting out the middlemen including trade publishers like O'Reilly.

Traditional book publishers are a dying breed. Aside from e-books, they have been driven out by an increasing number of so-called "self-published" books. In the context of software development, this is particularly common around non-mainstream subjects and includes titles such as OCaml for Scientists and Programming in Scala. O'Reilly's analysis excluded all such books even though they are far more profitable for authors.

In order to make a case for self-publishing it is necessary to present some information about a variety of existing books:

As the OCaml and F# markets are similar sizes and the contents of OCaml for Scientists and F# for Scientists are similar, it is interesting to note that OCaml for Scientists was 22× more profitable for its author in Q4 2008 even though F# for Scientists was a new book published through a major publisher and advertised and sold on Amazon. The reason is that John Wiley & Sons have done virtually nothing to market this F# book besides placing it on Amazon and, in fact, Wiley only managed to sell one copy direct outside the US! This really highlights just how little value a trade publisher and Amazon add.

Haskell is currently receiving far more press than any other functional programming language and, consequently, Haskell books are among the best sellers with 1,491 copies sold in 2008. However, with prices like that of Real World Haskell, they would need to be selling 20,000 copies a year to compete with OCaml for Scientists in terms of the profit made by the author.

Despite being self-published, Programming in Scala is outselling Real World Haskell according to its Amazon sales rank.

Finally, companies like IdTechEx prove that self-publishing can be the foundation of a serious business on the scale of a major trade publisher like O'Reilly. This is a consequence of the characteristic V-curve of profit found in many markets where cheap and expensive luxury products give locally maximal profits. Self-publishing naturally lends itself to higher quality and higher value books.

The moral of this story is that self-publishing is the most rewarding way for good authors to reach their readers. If you are the author of a decent book, please consider publishing it yourself.

Friday, 10 April 2009

Sadek Drobi interviews Don Syme about the history of F#

Sadek Drobi has interviewed many interesting people in the past and his interviews are of particular interest because Sadek is not afraid to ask difficult questions.

Sadek recently published an interview with Don Syme, a lead developer of .NET generics and F#. In particular, Sadek asks why the experiments at Microsoft Research in writing SML and Haskell implementations on top of .NET led Don to build an eager language rather than a non-strict language like Haskell. Don describes the "dissonance" between non-strict evaluation and .NET that culminates in excessive use of monads making it unnecessarily tedious to reuse the framework, as well as the efficiency and debugging problems imposed by non-strict evaluation.

Don then goes on to discuss some of the other features of Haskell that F# does draw upon, such as type classes. In particular, Don explains how he carefully separated the proven foundations of such features, such as the use of type classes for overloading as they were originally intended, from the experimental aspects that bring a "surprising number of technical problems".

Overall this was an extremely interesting interview that touched upon many fascinating aspects of state-of-the-art language design in the context of industrial-strength programming language implementations.

Wednesday, 8 April 2009

Haskell in industry!!!

A dutch startup called Tupil just announced that their first commercial application written in Haskell has gone live.

Tupil describe themselves as:

Tupil is Chris Eidhof and Eelco Lempsink. We code for a living, using functional programming languages like Haskell. Technology is our passion and we just love building great software tools. After working for years as web developers at very nice companies, we decided to go solo, together.

Credible success stories of bleeding-edge functional languages like Haskell being used in industry are always great to see. Keep up the good work!

Tuesday, 7 April 2009

More on Haskell's hash table problems

We recently proved that Haskell's defacto-standard hash table implementation is extremely slow. So slow that optimized native-code compiled Haskell is even slower than interpreted Python and 32× slower than a decent hash table implementation! If that result were not remarkable enough, we have now discovered that the history of this issue is even more surprising...

Haskell programmers have been complaining about the poor performance of its hash table implementation for at least 3 years. The cause of the problem, a bug in the garbage collector, was identified a long time ago but that bug was never fixed. Instead, it appears that the Haskell developers went on a crusade to persuade the world that decent hash table implementations are not advantageous. For example, the book "Real World Haskell" by Don Stewart et al. explicitly states:

"Compared to a hash table, a well-implemented purely functional tree data structure will perform competitively. You should not approach trees with the assumption that your code will pay a performance penalty."

Perhaps more surprising, given Haskell's uniquely bad support for hash tables, is the fact that "Real World Haskell" is one of only three books cited from the Wikipedia article on this subject (!).

Suffice to say, there is overwhelming evidence that trees are not competitively performant when compared to hash tables. Indeed, that is precisely why hash tables are such a ubiquitous data structure. In general, we have found that trees are typically 10× slower than hash tables in practice for construction and lookup with around a million elements. That is supported by our recent benchmark where a hash table in F# took 0.7 and the fastest tree-based implementation written in Haskell (which is extremely heavily optimized for trees) was still 11× slower, taking over 7s to complete the same task.

Also, the Haskell entry for the k-nucleotide benchmark on the shootout contains the comment:

-- Note: Hash tables are not generally used in functional languages, so there
-- are no available library implementations for Haskell. This benchmark
-- requires a hash table. This is why I have implemented the hash table here.

And, again, this is just completely factually incorrect. Hash tables are just as popular in functional languages (like OCaml, F#, Scala, Clojure, Mathematica) as they are in imperative and object oriented languages for the simple reason that they are a great data structure. For example, the hash table implementation bundled with OCaml is so efficient that even the uniprocessor OCaml implementation of knucleotide thrashes the parallel Haskell implementation.

Saturday, 4 April 2009

Another serious bug in GHC

Peter Verswyvelen of Nazooka, one of the only industrial users of Haskell, uncovered another serious bug in GHC, the defacto-standard Haskell compiler. The bug afflicts floating point arithmetic in external libraries such as Qt, GTK, OpenGL, BLAS, LAPACK and FFTW on x86.

Peter said "It's insane this bug did not cause any more problems. Every call into every C function that uses floating point could have been affected". Given that this bug had gone unnoticed for so long, the only logical conclusion is that the GHC FFI is immature because it has very few users. This concurs with our previous findings that no mature software has ever been written in Haskell.

Moreover, this bug joins the list of serious FFI-related bugs in GHC along with the subtle FFI bug in GHC that helped to destroy Darcs and a memory alignment bug in GHC's FFI that caused segmentation faults.

Suffice to say, the moral of this story is that even the most advanced high-level programming language implementations have a real achilles heel when it comes to interoperability with software written in existing unsafe languages like C and C++. This is one of the major issues we are addressing in HLVM through the use of LLVM's mature and well-tested C FFI.

Friday, 3 April 2009

F# vs OCaml vs Haskell: hash table performance

I just stumbled upon an interesting benchmark. Put mappings i -> i for i in 1..10M in a hash table, print the entry for 100. Don Stewart's optimized Haskell:

import Control.Monad
import qualified Data.HashTable as H

main = do
m <- (==) H.hashInt
forM_ [1..10000000] $ \n -> H.insert m n n
v <- H.lookup m 100
print v

My OCaml translation:

let n = 10000000

let () =
let m = Hashtbl.create n in
for n=1 to n do
Hashtbl.add m n n
Printf.printf "%d\n%!" (Hashtbl.find m 100)

My F# translation:

let m = System.Collections.Generic.Dictionary()
for i = 1 to 10000000 do
m.[i] <- i
printf "%d\n" m.[100]


Haskell: 22.8s
OCaml: 2.82s
F#: 0.706s

I found it remarkable that the performance difference is so large: F# is 32× faster than the optimized Haskell here!

Thanks to everyone who has replied. Apparently this is a fundamental design flaw in GHC's garbage collector which is incapable of handling the mutation of arrays of boxed values (e.g. hash table buckets) efficiently. Unfortunately, this means that Haskell's defacto-standard compiler is incapable of handling any data structure that maps values onto values efficiently. The nearest alternative is a purely functional tree-based map but that is not only still 11× slower than F# but it culminates in a data structure that is then asymptotically slower to search as well:

import Data.IntMap
import Data.List

main = print $ m ! 100
where m = fromDistinctAscList [(i,i) | i <- [1..10000000]]

This program still takes 7.519s.

Perhaps the most remarkable find is not that OCaml and F# thrash Haskell on this benchmark (which is to be expected: OCaml and F# are not toys) but that even a Python one-liner thrashes the most heavily optimized Haskell code:

$ time python -c 'import psyco;psyco.full();print dict((i,i) for i in xrange(10000000))[100]'

real 0m5.738s
user 0m5.428s
sys 0m0.308s

If only Haskell programmers could take some time away from writing Fibonacci functions perhaps they could also build some kind of adequate compiler.