Friday, 4 May 2012

Why are tuples not value types on .NET?

Microsoft made all tuple types reference types in the interests of simplicity.
Objectively, this was a mistake. Tuples with more than 4 fields are very unusual and should be replaced with a more typeful alternative anyway (such as a record type in F#) so only small tuples are of practical interest. My own benchmarks showed that unboxed tuples up to 512 bytes could still be faster than boxed tuples.
Although memory efficiency is one concern, the dominant issue is the overhead of the garbage collector. Allocation and collection are particularly expensive on .NET because its garbage collector has not been very heavily optimized (e.g. compared to the JVM). Moreover, the default .NET GC (workstation) has not yet been parallelized. Consequently, parallel programs that use tuples grind to a halt as all cores contend for the shared garbage collector, destroying scalability. This is not only the dominant concern but, AFAIK, was completely neglected by Microsoft when they examined this problem.
Another concern is virtual dispatch. Reference types support subtypes and, therefore, their members are typically invoked via virtual dispatch. In contrast, value types cannot support subtypes so member invocation is entirely unambiguous and can always be performed as a direct function call. Virtual dispatch is hugely expensive on modern hardware because the CPU cannot predict where the program counter will end up. The JVM goes to great lengths to optimize virtual dispatch but .NET does not. However, .NET does provide an escape from virtual dispatch in the form of value types. So representing tuples as value types could, again, have dramatically improved performance here. For example, callingGetHashCode on a 2-tuple a million times takes 0.17s but calling it on an equivalent struct takes only 0.008s, i.e. the value type is 20× faster than the reference type.
A real situation where these performance problems with tuples commonly arises is in the use of tuples as keys in dictionaries. For example, see the Stack Overflow question F# runs my algorithm slower than Python! where the author's F# program turned out to be slower than his Python precisely because he was using boxed tuples. Manually unboxing using a hand-written struct type makes his F# program several times faster, and faster than Python. These issues would never had arisen if tuples were represented by value types and not reference types to begin with...


Oleg Mihailik said...

Are you sure you are not mixing callvirt with virtual dispatch?

Whenever a non-virtual method (or a method of a sealed type) is invoked, callvirt does NOT use VMT, thus it s not virtual dispatch by any reasonable definition.

Mauricio Scheffer said...

Here's the story according to Microsoft:

TL;DR: they tried tuples as value types on the F# compiler and didn't see much improvement in performance. Still it's not clear why they kept tuples as reference types. All I can infer is that they did so just because the F# implementation was already there.

jsaul said...

Monomorphic virtual dispatch (indirect call) is not particularly expensive on modern CPUs due to the use of branch target buffers.

What's more important is that naive virtual dispatch inhibits operations such as inlining, which is generally required for eg. constant propagation, loop-invariant code motion and other macro-optimizations.

Dagdgsd Dffbd said...

The article posted was very informative and useful
thanks for sharing..
jaring futsal | jaring golf | jaring kassa / jaring polynet | jaring pengaman proyek | jaring pengaman bangunan | jaring pengaman gedung | jaring gawang | jaring paranet / jaring tanaman | jaring safety |
jaring proyek | jaring bangunan | jaring gedung | jaring outbound | jaring truk | tali tambang |
jaring pengaman | jaring golf murah | jaring futsal murah | jaring truk sawit | jaring pengaman nylon | cargo net |

Mac Topan said...

Thanks for Sharing That... Sucses for You