Saturday, 31 January 2009

Mono 2.2 still leaks memory

We have previously discussed the fact that Mono is still built upon a conservative garbage collector (Boehm's GC). This means that Mono is not capable of identifying exactly what data is reachable and, consequently, has to resort to conservative guesses that can fail to deallocate garbage, i.e. leaking memory.

Boehm's own literature describes situations where the GC might be expected to leak (lazy lists and queues) but claims that no case has ever been found in practice and that they could not even construct a contrived example where memory was actually leaked. Readers of our previous posts have stated that our claims of memory leaks are "bogus". So we decided to put this issue to rest.

The following trivial F# program creates a cyclic list representing a queue, adds one element and then repeatedly adds one element and removes it again:

type 'a cell = { content: 'a; mutable next: 'a cell option }

do
  let mutable tail = None
  if tail = None then
    let cell = { content = [||]; next = None }
    cell.next <- Some cell
    tail <- Some cell
  while true do
    let tail' = Option.get tail
    let cell = Some { content = [|1.;2.;3.;4.|]; next = tail'.next }
    tail'.next <- cell
    tail <- cell
    let tail' = Option.get tail
    tail'.next <- (Option.get tail'.next).next

This obviously requires only enough memory for at most two queue items, so any memory leaks will be obvious. Running this program on .NET, its memory consumption is steady at 11 MB. Running this program on Mono 2.2, the entire memory of the computer is leaked away in 60 seconds, the OS goes to swap and everything grinds to a halt.
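
The two-item bound can be checked independently of any garbage collector. The following is a minimal sketch, not part of the original program: the helpers cycleLength and maxLiveCells are hypothetical names introduced here, the option around tail is dropped for brevity, and the loop is bounded so that it terminates. It performs the same push and pop steps and records the largest number of cells ever linked into the cycle.

```fsharp
type 'a cell = { content: 'a; mutable next: 'a cell option }

// Hypothetical helper: number of cells on the cycle that passes through `tail`.
// Cells are compared by reference, because structural equality on a cyclic
// record would never terminate.
let cycleLength (tail: 'a cell) =
    let mutable n = 1
    let mutable cur = Option.get tail.next
    while not (LanguagePrimitives.PhysicalEquality cur tail) do
        n <- n + 1
        cur <- Option.get cur.next
    n

// Hypothetical driver: the same push/pop steps as the program above,
// run a bounded number of times instead of forever.
let maxLiveCells iterations =
    let first = { content = [||]; next = None }
    first.next <- Some first
    let mutable tail = first
    let mutable maxLen = 1
    for _i in 1 .. iterations do
        // push: link a new cell in after the current tail
        let cell = { content = [|1.;2.;3.;4.|]; next = tail.next }
        tail.next <- Some cell
        tail <- cell
        maxLen <- max maxLen (cycleLength tail)
        // pop: unlink the cell that follows the tail
        tail.next <- (Option.get tail.next).next
        maxLen <- max maxLen (cycleLength tail)
    maxLen

printfn "%d" (maxLiveCells 1000)
```

The cycle never holds more than two cells, so any growth in memory use beyond that comes from cells that have already been unlinked from the queue.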

We have also described situations where Mono 2.2 leaks stack space until the stack overflows. These results may be of interest to anyone else trying to find a usable VM to build upon.


25 comments:

Flying Frog Consultancy Ltd. said...

One of the Mono developers, Rodrigo Kumpera, has responded to our example with the assertion that "it is quite rare to cause pathological leaks such as this one". However, we have written dozens of different programs based upon this queue implementation and they all leak in Mono.

Alan said...

You misquote[0].

It is rare because this sort of leak can *only* happen if pointers are retained in the stack. In normal applications, this rarely (if ever) happens.

If this exact code were used in a production application, it would not 'leak'.

There was also a question as to whether or not the code shown does exactly what you think it does. You were asked to supply a C# version which does the exact same task so that it could be compared. Is there any chance you could provide that?[1]


[0] "This kind of leaks are usually caused by unused stack slots that retain the dead value. Regular code will overwrite those stack slots on method calls and let the GC collect."

[1] "If your description of the code is correct, it shouldn't "leak" even with the Boehm GC. Write the equivalent code in C#, for example. My guess is that either your code doesn't do what you describe or there is a bug triggered by F# in the runtime and what is actually leaking is not managed memory"

Flying Frog Consultancy Ltd. said...

@Alan

The example I presented here was actually taken from a much larger multithreaded application that we discovered was leaking on Mono. I boiled it down to the example you see here and Mono leaked on every intermediate program that I created in the process.

I don't have time to translate the code into other languages myself to see if they also break Mono. Moreover, it is obviously not feasible for us to completely rewrite our F# code bases just to work around design flaws in Mono. So the results of translating even this tiny example would not be very interesting to us.

If you would like to have a go at recreating these bugs from your favorite language I recommend you start by decompiling this trivial F# program using reflector and then boiling the code down in your own language.

Alan said...

"Moreover, it is obviously not feasible for us to completely rewrite our F# code bases just to work around design flaws in Mono"

If a conservative garbage collector was a design flaw, then they wouldn't exist. Interoping with C or C++ requires at least a partially conservative collector. It *cannot* be done with a precise garbage collector.

"So the results of translating even this tiny example would not be very interesting to us."
Yes it would, it would help find the actual cause of the bug. As was stated in your post to the mono list, that code should not leak managed memory. So that'd imply that it's not a garbage collection issue. It's something else.

If you attach a compiled version of your F# program to your original post to the mono list, that'd be perfect. I don't have an F# compiler, nor do I know if one is available for Linux.

Flying Frog Consultancy Ltd. said...

@Alan

The existence of conservative GCs does not make them good design. Your statement that C/C++ interoperability requires a conservative collector is complete nonsense. OCaml, Haskell, Erlang, SML, .NET and the JVM are all obvious counterexamples.

F# is freely available from here. Just install it and compile the code from this blog post.

Alan said...

As was stated in the bug report, the bug won't affect any real world applications (or at least very very very very few) because this bug is 100% entirely due to the fact that stack slots are not being overwritten, which is exactly what we said was the issue.

I've sent a new email on the mono list containing both the leaking and fixed versions and an explanation if you care to read it.

Flying Frog Consultancy Ltd. said...

@Alan

Actually the explanation given to me by Paolo "lupus" Molaro (author of Mono's JITs) turned out to be completely wrong:

"If your description of the code is correct, it shouldn't "leak" even with the Boehm GC. Write the equivalent code in C#, for example. My guess is that either your code doesn't do what you describe or there is a bug triggered by F# in the runtime and what is actually leaking is not managed memory. Post the equivalent C# code and we can easily check which case it is."

In reality, the code does exactly what I said it does, your C# repro proves that this is not an F#-specific problem and it really is leaking managed memory. The only possible conclusion is that this is another serious bug in Mono.

I can well believe your explanation but it leaves two serious problems. Firstly, there is no way for a programmer to tell what stack slots correspond to in F# source code even if they were willing to try to work around these bugs in Mono by hand. Secondly, the lavishness of your workaround really highlights the fact that this bug in Mono persists across many variations of this program. For example, if you make the queue global the bug still persists. If you split the push and pop operations into separate functions, the bug persists. Indeed, your workaround of injecting multiple redundant non-tail recursive calls interspersed with conditionals is the only alteration I have found that manages to evade the bug.

Alan said...

No, you've completely missed the point of the workaround.

Any normal program will either A) Overwrite the current stack frame or B) have more application logic in more than 1 function.

Your entire program exists inside that one stackframe which is why you're seeing this issue.

Alan said...

Regardless of any of that, it's not a mono bug. It's a boehm GC bug. File a bug report there.

Alternatively, from the very page where you read about this issue so you could construct a test which demonstrates the documented issue, you could apply the documented workaround.

Set the next pointer to null when you unhook from the list.

Flying Frog Consultancy Ltd. said...

@Alan

Take the code you presented on this list, make the queue a static variable in the class and factor the push and pop operations into separate functions and you will see that Mono still leaks memory even though there are now two additional functions with separate stack frames being called from the loop.

Boehm never pretended that his GC could be relied upon to reclaim memory automatically. So you cannot blame this leak on a bug in Boehm's GC because it is doing everything that it claims to do, i.e. nothing. This is precisely why choosing to use Boehm's GC was a fundamental design mistake for Mono.

I appreciate that you can set the pointer to null by hand if you know exactly what you are doing but that is not practical for anyone with an established code base that relies upon automatic memory management having been implemented correctly in the VM.

Alan said...

"Boehm never pretended that his GC could be relied upon to reclaim memory automatically"

If you read the doc page you linked to you should have noticed that this issue is not limited to conservative garbage collectors.

You are taking a very well documented issue and making it appear to be something huge when it's not.

I demonstrated how modifying the stack by adding recursion 'fixes' the issue. My method works. I still recommend 'nulling' as the correct solution, as per the Boehm documentation.

That's my position on the matter. If you disagree, you're free to submit patches to Mono, or implement your own bug and limitation free virtual machine.

Jules said...

Alan, are you saying that this is not a bug in Mono? Even if the bug is because of a library that Mono is using, then it's still a bug that should be fixed.

I have seen this attitude towards bugs before. Mono implements lambdas in C# by translating all lambdas in a method into one class. The class contains all the local variables that any of the lambdas need, and a method for the body of each lambda. Within a scope, one object is created for all the lambdas. So if you do this:

var y = 2;
var hugeStructure = MakeHugeThing();
Func<int, int> foo = x => y + x;
Action<int> bar = x => DoSomethingWith(x, hugeStructure);
// use bar here
return foo;

Then the hugeStructure is retained in memory, because the foo lambda (and therefore bar) isn't garbage collected.

In other words, if one lambda in a method isn't garbage collected, then none of the lambdas in that method are garbage collected (and neither are the captured local variables).

Someone argued that this is not a bug because it doesn't happen that often. If Mono wants to be a serious VM then these kinds of bugs have to be fixed.

Alan said...

@Jules: The exact same 'leak' was just demonstrated to exist in MS.NET as well. This issue is *not* limited to Mono. In the Boehm docs for "An Embarrassing Failure Scenario", MS.NET is hitting exactly the issue described in point 3.

As for the lambdas, yes, that could be an issue. I remember a discussion happening about this before, but I don't recall the outcome of it. Filing a bug report on it would be the best way to ensure that it is resolved.

Flying Frog Consultancy Ltd. said...

Note that I tested this code compiled with Microsoft Visual C# 2008 Express on .NET 3.5 x86 and it does not leak as Alan claims. I also tested the assembly generated by Mono's C# compiler and that also leaks on Mono and not on .NET.

Alan said...

I attached the leaking and not leaking assemblies. Anyone can reproduce my result, in fact, you independently reproduced my result after you posted the above comment.

It can and does leak on MS.NET if not compiled with just the right settings.

Flying Frog Consultancy Ltd. said...

I still cannot get any of these programs to leak under .NET when run directly (i.e. not within the debugger).

Flying Frog Consultancy Ltd. said...

@Alan

Also, your recommendation of nulling out references by hand as a workaround does not work in this case because the bogus pointer Mono is leaving in the stack frame of the "Main" function does not correspond to any variables in the source code of the C# program, i.e. it must be a temporary. Consequently, the programmer could not even nullify the pointer if they wanted to.

Alan said...

Then I can only assume you haven't run the two assemblies I attached to my email.

Alan said...

static void pop()
{
    var current = tail;
    var next = tail.next;
    var nextnext = tail.next.next;
    current.next = nextnext;
    next.next = null;
}

Memory leak stopped. Comment out the 'next.next = null' and you'll leak again. As per documented workaround, this results in a *single* object being retained unnecessarily rather than an entire chain of objects.

Flying Frog Consultancy Ltd. said...

@Alan

I cannot reproduce your assembly from my code using the MS toolchain.

Alan said...

Ensure you compile in Debug mode, and not Release mode. Release mode applies optimisations regardless of the 'Optimise' flag in the settings.

johnny said...

@Alan

Why don't you go shill somewhere else?
The Mono VM is a bad joke, that much is obvious. Either accept the criticism and fix the issues or go away and do nothing.

Your 'see nothing/hear nothing' approach contributes to nobody taking Mono seriously.

MasterP said...

Can you explain this Mr. Flying Frog?

http://img36.imageshack.us/img36/9480/memoryleakb.png

This is not running on Mono. This is running on .NET.

Flying Frog Consultancy Ltd. said...

@MasterP: You are running in debug mode with optimizations disabled? If you compile in release mode or enable optimizations it works.

Jay said...

Process: mono
cpu usage: constant 50%
Memory consumed within 10 minutes: 2.3 gigabytes.

There is obviously an issue.