Friday, 30 January 2009

Mono does not support tail call elimination

Tail call elimination is an essential feature for functional programming languages. This feature simply refers to the ability for one function to end with a function call without leaking stack space. Tail call elimination can be essential in heavily functional code such as parser combinators and libraries written in continuation passing style.

Lack of tail calls on the defacto-standard JVM has raised concerns around languages like Scala and Clojure. Fortunately, work is underway to address this issue by implementing tail call elimination on MLVM/OpenJDK.

Microsoft's .NET has had tail call elimination for eight years and the Mono project is designed to replicate the Common Language Infrastructure (CLI) that is responsible for this feature. The Mono team have been advertising tail call elimination on their VM for many years.

However, when we tested the elimination of tail calls on several VMs we discovered that tail calls are seriously broken on Mono but work on other VMs including .NET, LLVM and, of course, OCaml. Specifically, we ran the following trivial F# program that relies upon a tail call to a dynamic location, a function odd that is passed as an argument to the higher-order even function:

let even odd n = odd(n+1)

let odd n =
printf "%d\n" n
even odd (n+1)

let (_: int) =
even odd 0

This program prints odd integers indefinitely in OCaml or F# on .NET but attempting to run the executable generated by F# on Mono 2.2 results in a stack overflow almost immediately, as Mono leaks stack space until the stack is exhausted. So Mono cannot be relied upon to execute any functional programs correctly including F# programs.

This is of particular interest to us because several people have asked us for customer support when trying to run our F# code, such as the code from our book F# for Scientists, under Mono. Our advice is to steer well clear of Mono until it becomes stable and feature complete.

Interestingly, we also tried the same test on LLVM 2.2, writing the program in LLVM IR, and the tail calls are handled flawlessly provided the fast calling convention is used. As we have shown before, LLVM also generates vastly faster code than Mono 2.2.


15 comments:

Alan said...

It was pointed out to you that mono has a testcase which implements this exact test and it works fine. You were also asked to post the generated IL for the testcase which is failing. You failed to do this. Would it be possible for you to post the IL of the application where tailcalls are not working for you as asked?

Alan said...

"So Mono cannot be relied upon to execute any functional programs correctly including all F# programs."

This statement is pure propaganda. You need solid facts, not pure speculation to make a statement like that.

Flying Frog Consultancy Ltd. said...

@Alan

Mono's test case does not implement this exact test and I already explained why on the Mono mailing list. Specifically, the branch location is dynamic in my code (it is passed in to the function as a function argument) and static in the Mono test.

My statement about functional programs is perfectly legitimate precisely because I backed it up with a simple test case that breaks Mono. What more could you want?

Flying Frog Consultancy Ltd. said...

I just tested this on LLVM and I can confirm that their implementation of tail call optimization is not broken.

Alan said...

"My statement about functional programs is perfectly legitimate precisely because I backed it up with a simple test case that breaks Mono. What more could you want?"
Do you know many F# programs with stackdepths in the order of 1000's? Enough to exceed a 2MB default stack size? That's why i say it's a solid fact and that's it's speculation.

I don't contest that mono doesn't have tailcalls for the specific case, but I am contesting that you claim this is an issue in *all* real world F# apps.

Flying Frog Consultancy Ltd. said...

@Alan

The programs should not have deep stacks. That is the whole point. This program only stack overflowed because Mono was leaking stack space. The same program run in any production-quality VM will not consume stack and will run indefinitely without issue.

Mono dies after ~400k calls. A modern machine can perform that computation in a fraction of a millisecond. So any programs that run for more than 1ms are likely to break on Mono. Suffice to say, that is most programs.

Alan said...

That's fine... as long as you ignore all the existing applications out there that run for more than 1 ms and don't leak. Of which, there is quite a lot.

Also, it's not true to call this a leak. It's an optimisation which hasn't been implemented. This is completely different to a leak.

Flying Frog Consultancy Ltd. said...

@Alan

I don't believe there are "quite a lot" of programs using tail calls running successfully on Mono for more than 1ms. You are probably thinking of C# programs that make no use of tail calls whatsoever.

Tail call elimination affects correctness so trying to pretend that it is just an unimplemented optimization in Mono is a triumph of hope over reality. This is another major bug in Mono that renders it unsuitable for a variety of serious uses, most notably anything written in Microsoft's next generation language F#.

Alan said...

http://en.wikipedia.org/wiki/Tail_call

Tail call elimination is an optimisation, hence the name "Tail call optimisation". It affects all programming languages, though as F# is a functional langauge, it can benefit more from tail call optimisation as it has higher performance.

It has nothing to do with code correctness (if i understand your usage of 'correctness' properly).

Though yes, this is an optimisation will have benefits to everything if it were to be implemented. Filing a bug report on the mono bugzilla would probably be the best way to ensure this optimisation is implemented. Attaching your testcase which demonstrates the issue would be perfect.

Flying Frog Consultancy Ltd. said...

@Alan

Clinger's formal description here is authoritative. Wikipedia is wrong.

By "correctness" I mean "returns the correct answer". Functional programs running on Mono will almost certainly overflow the stack before completing so they will not return the correct answer. So Mono's broken tail call elimination undermines the correctness of any program developed on .NET that uses ILX tail calls.

Responses from the Mono developers on their list make it clear that they do not understand this concept either so there is little reason to file a bug report. I already gave them code and an explanation so it is entirely up to them whether or not they want to learn how to do this properly. In the mean time, I have no choice but to explicitly state that we cannot support any customers who try to run our code on Mono.

Alan said...

"Functional programs running on Mono will almost certainly overflow the stack before completing so they will not return the correct answer"

*Some* functional programs will, such as ones designed explicitly to show the issue. A large amount of ones will not.

Has a bug been filed on bugzilla to ask for this optimisation to be implemented?

Flying Frog Consultancy Ltd. said...

@Alan

Combinators are one of the most fundamental constructs used in functional programming. Mono's inability to handle tail calls to function arguments correctly undermines the reliability of virtually all uses of combinators. Mono leaks on indirect tail calls and Clinger's paper shows that such calls make up 20% of all function calls in real code.

Indeed, my attempts to use and test Mono are solely because two of our customers complained about stack overflows only when trying to use our software on Mono.

Your on-going refusal to accept these facts is not constructive. Rather than posting falsehoods in pidgin English on the blog of someone who already knows better, you could be trying to fix the problem. For example, you could write an LLVM-based backend that would be vastly simpler and vastly faster than the current one with relatively little effort.

Alan said...

"you could write an LLVM-based backend that would be vastly simpler and vastly faster than the current one with relatively little effort"

Yes, you could.

Michel S. said...

@Alan

You are showing an incredible amount of ignorance. Sure, when you design a language / platform, "optimizing" tail calls is optional (though I note that the only functional language that does not guarantee TCO is Lisp, which also happens to be the most ancient).

But when you are implementing someone's specs, then if you do not deliver the required semantics, it *is* a bug. To wit, MSDN defines the tailcall operator as such:

"the current method's stack frame is removed before the actual call instruction is executed"

You should not be able to blow the stack. Full stop.

Russell Wallace said...

I tested this just now with Mono 2.4.4 (Ubuntu 10.04, F# 2.0.0.0) and it ran over 2 million iterations with no problem, so I'm happy to report that it seems the bug has now been fixed.