Thursday, 25 December 2008

Building a better future: the High-Level Virtual Machine

Microsoft's Common Language Run-time (CLR) was a fantastic idea. The ability to interoperate safely and at a high level between different languages, from managed C++ to F#, has greatly accelerated development on the Microsoft platform. The resulting libraries, like Windows Presentation Foundation, are already a generation ahead of anything available on any other platform.

Linux and Mac OS X do not currently have the luxury of a solid foundation like the CLR. Consequently, they are composed entirely of uninteroperable components written in independent languages, from unmanaged custom C++ dialects to Objective-C and Python. Some developers choose to restrict themselves to the lowest common denominator (e.g. writing GTK in C), which aids interoperability but only at a grave cost in productivity. Other developers gravitate to huge libraries written in custom dialects of particularly uninteroperable languages (e.g. Qt). Both approaches have a bleak future.

The situation is compounded by the fact that Linux has a far richer variety of programming languages than Windows. Linux is the platform of choice for academics, including programming language researchers, who develop and maintain a variety of state-of-the-art languages, libraries and tools on it. However, despite the benefits of languages like OCaml, Erlang, Haskell, Lisp, Scheme, ATS, Pure and others, these languages are almost entirely uninteroperable because they do not share a run-time and many do not even have easy foreign function interfaces (FFIs) to access existing unmanaged libraries.

If there were a high-level virtual machine (HLVM) for Linux that could act as a common language run-time for these kinds of languages, then it might be possible to build a better future for software development on these platforms. The impedance mismatch between different languages (including C) would be much smaller, and the ability to write and consume libraries from other languages would greatly improve productivity.

We believe this approach has a bright future and, consequently, we have begun developing a new HLVM that is designed to act as a common language run-time, initially for the ML family of languages, in the hope that others will build upon it and efforts can be combined between language communities. We are using the excellent LLVM library, which provides high-performance native code generation across a variety of architectures and platforms, including x86/x64 and Linux/OS X.

Although the project is still at a very early stage of development, we already have some promising results. We can compile a subset of ML including bool, int, float and array types. Calls between internal functions are full tail calls, and external functions use the C calling convention so they can be invoked directly. Our implementation is 2-4× faster than OCaml on x86 at several simple benchmarks, including the Sieve of Eratosthenes and a Mandelbrot renderer.
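
To give a feel for the kind of program such a subset covers, here is a minimal OCaml sketch of a Sieve of Eratosthenes that uses only bools, ints and arrays, with every loop written as a tail-recursive function. It is an illustrative assumption, not the actual HLVM benchmark source, and `sieve` and `count_primes` are hypothetical names.

```ocaml
(* Sieve of Eratosthenes using only bools, ints and arrays; assumes n >= 0.
   Every loop is a tail-recursive function rather than a for/while loop. *)
let sieve n =
  (* is_prime.(i) stays true until i is crossed out as a multiple *)
  let is_prime = Array.make (n + 1) true in
  is_prime.(0) <- false;
  if n >= 1 then is_prime.(1) <- false;
  (* Cross out j, j + i, j + 2i, ... up to n; the recursive call is a tail call *)
  let rec cross i j =
    if j <= n then begin
      is_prime.(j) <- false;
      cross i (j + i)
    end
  in
  (* Sieve with every i whose square is at most n *)
  let rec outer i =
    if i * i <= n then begin
      if is_prime.(i) then cross i (i * i);
      outer (i + 1)
    end
  in
  outer 2;
  is_prime

(* Count the primes up to n, accumulating in a tail-recursive loop *)
let count_primes n =
  let is_prime = sieve n in
  let rec count i acc =
    if i > n then acc
    else count (i + 1) (if is_prime.(i) then acc + 1 else acc)
  in
  count 0 0

let () = Printf.printf "%d\n" (count_primes 100)  (* prints 25 *)
```

Because every recursive call is in tail position, a compiler with full tail calls can run these loops in constant stack space.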

The main features that we have yet to implement are algebraic datatypes, pattern matching and garbage collection. Once those features have been completed we shall release a first version of our HLVM as an open source project and ask for contributors and developers to start improving and building upon this foundation. This will take time but hopefully we can work together to build a better future for high-level programming on the Linux and Mac OS X platforms.


Saturday, 8 November 2008

Sales of F# books

The book publisher O'Reilly wrote an interesting blog post entitled "State of the computer book market", which included a breakdown of programming book sales by language. We have criticized their study before because it does not count books like OCaml for Scientists, which has been the world's best-selling book about OCaml since its publication, and also because they consider only units sold and not cost.

Wiley have gathered sales information about our book F# for Scientists and it is interesting to see how well F# is doing compared to the figures for general programming book sales last year that were quoted by O'Reilly.

A direct comparison shows that the 1,225 copies of F# for Scientists sold in Q3 2008 alone place the F# programming language a long way ahead of all of the conventional functional languages (i.e. excluding C# and JavaScript). The best-selling conventional functional language in Q1 2007 was Lisp, with only 557 books sold. Moreover, the Amazon sales rank of the book Expert F# is consistently slightly higher than that of F# for Scientists. So the total number of F# books being sold is likely to be at least 2,500 per quarter.

According to these results, the F# programming language is already far more popular than Ada, OCaml, C, Haskell, Scheme, Lisp and Groovy.


Wednesday, 8 October 2008

Mono 2.0

The Mono Project was intended to bring the benefits of Microsoft's excellent .NET framework and, in particular, the Common Language Run-time (CLR) to other operating systems such as Linux and Mac OS X. The home page even boasts that Mono is "positioned to become the leading choice for development of Linux applications".

If Mono provided a solid foundation then it could well become the platform of choice on systems like Linux but the 1.x implementations have been plagued by reliability and performance problems.

Firstly, a performant concurrent garbage collector is the single most difficult of the platform's core features to design and implement, but it is of absolutely critical importance. The implementation of a real GC for Mono was originally postponed and the Boehm GC was used instead, described as an "interim measure" over 5 years ago. The Boehm GC is fundamentally flawed in this context: it is a conservative collector that cannot always distinguish pointers from other data, so it can keep unreachable memory alive, leaving programs to leak memory until they are killed by the OS.

Secondly, Mono's code generator produces native code that is slower than that of almost all other compiled languages.

More than five years later, Mono 2.0 has just been released but, contrary to previous announcements, this new version is still built upon Boehm's flawed garbage collector, and our F# implementation of the SciMark2 benchmark shows that Mono 2.0 is still 3× slower than .NET for numerical algorithms.

So Mono appears to be no closer to its goal of being the leading choice for development of Linux applications. This raises the question: how might Mono development be improved in the future? Failing to implement a working GC forced the Mono developers to spend a great deal of time and effort fixing bugs and addressing performance issues specific to the Boehm GC that, in the grand scheme of things, are worthless because that GC was only ever a stop-gap measure. Mono originally reinvented the code generator because mature reusable alternatives like LLVM were not yet available. However, the most difficult aspect of implementing a code generator for Mono is maintaining harmony with the GC, and Mono still lacks a working GC, so there is no harmony to be maintained. Consequently, it seems that the Mono developers would be wise to adopt the LLVM project to handle their code generation (because it already provides much better performance) and then continue trying to build a real GC for Mono in harmony with LLVM-generated code. In fact, other projects such as PyPy are progressing so much more rapidly than Mono that they may be the first to provide a complete backend.

Hopefully the Mono project will provide the solid foundation that it intends to in the future but, in the meantime, we shall restrict ourselves to the use of robust garbage collectors.

Friday, 22 August 2008

Hundreds of copies of F# for Scientists bought by Microsoft

In 2007, Microsoft commissioned us to translate our extremely popular book OCaml for Scientists from the open source OCaml language to F#, their .NET-based alternative, which was created partly due to the extensive success of OCaml within Microsoft.

The result, F# for Scientists, was published by John Wiley & Sons earlier this month and Microsoft are so impressed that they have already bought 570 copies for internal use and intend to buy hundreds more by the end of the year.

This indicates that F# for Scientists may even overtake OCaml for Scientists to become the world's most profitable book on functional programming.


Thursday, 14 August 2008

Haskell's virginity

The most popular open source project ever written in Haskell, the Darcs version control system, is being dropped by its only significant user base: the developers of GHC, the de facto standard Haskell compiler.

The developers of GHC cited poor stability and poor performance (their benchmark results found Darcs to be up to 50× slower than its competitors at core operations). They currently intend to migrate GHC to the Git version control software, which is written in C. Even Mercurial, which is written in Python, was considered because it is so much faster than Darcs.

This led us to revisit the subject of Haskell's popularity and track record. We had reviewed Haskell last year in order to ascertain its commercial viability when we were looking to diversify into other functional languages. Our preliminary results suggested that Haskell was one of the most suitable functional languages but this recent news has brought that into question.

Our latest research produced the following statistics regarding the number of installs on Ubuntu and Debian of the most popular programs written in OCaml and Haskell along with their source code size:

Name         Installs  Lines of code  Language
FFTW          184,574         14,298  OCaml
Unison         12,866         23,993  OCaml
MLDonkey        7,286        171,332  OCaml
Darcs           4,365         24,937  Haskell
Free Tennis     4,066          7,419  OCaml
Planets         4,057          3,296  OCaml
HPodder         3,465          2,225  Haskell
LEdit           2,965          2,048  OCaml
Hevea           2,822         11,596  OCaml
Polygen         2,657          1,331  OCaml

This equates to:

  • 221,293 installs of popular OCaml software compared to only 7,830 of Haskell.
  • 235,313 lines of well-tested OCaml code compared to only 27,162 lines of well-tested Haskell code.
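
The totals can be reproduced directly from the table. This short OCaml sketch sums the installs and lines-of-code columns by language; the rows are copied verbatim from the table and `packages` and `totals` are hypothetical names.

```ocaml
(* Rows copied from the table: (name, installs, lines of code, language) *)
let packages = [
  ("FFTW", 184_574, 14_298, "OCaml");
  ("Unison", 12_866, 23_993, "OCaml");
  ("MLDonkey", 7_286, 171_332, "OCaml");
  ("Darcs", 4_365, 24_937, "Haskell");
  ("Free Tennis", 4_066, 7_419, "OCaml");
  ("Planets", 4_057, 3_296, "OCaml");
  ("HPodder", 3_465, 2_225, "Haskell");
  ("LEdit", 2_965, 2_048, "OCaml");
  ("Hevea", 2_822, 11_596, "OCaml");
  ("Polygen", 2_657, 1_331, "OCaml");
]

(* Sum the installs and lines-of-code columns for one language *)
let totals lang =
  List.fold_left
    (fun (installs, lines) (_name, i, l, language) ->
       if language = lang then (installs + i, lines + l)
       else (installs, lines))
    (0, 0) packages

let () =
  List.iter
    (fun lang ->
       let installs, lines = totals lang in
       Printf.printf "%s: %d installs, %d lines of code\n" lang installs lines)
    [ "OCaml"; "Haskell" ]
```

This prints 221293 installs and 235313 lines of code for OCaml, and 7830 installs and 27162 lines of code for Haskell.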

Our OCaml products, particularly OCaml for Scientists and The OCaml Journal, have proven that OCaml is one of the few commercially viable functional programming languages. These remarkable new figures show that Haskell is still a virgin language: despite a huge number of open source projects being started in Haskell, virtually none reach maturity and the vast majority never garner a significant user base (i.e. they remain untested). Only Darcs and HPodder ever became popular, and the more popular of the two, Darcs, has turned out to be too difficult to fix and optimize even for expert Haskell programmers.

Our conclusion is, of course, that we are not going to consider diversifying into the Haskell market, at least not until it matures. Right now, Scala is looking much more viable.

Sunday, 20 April 2008

Who will use F#?

This post is in response to the comment by Fernando on the previous post. Many of the statements made by Fernando reflect commonly held views but I believe the foundation (e.g. FP vs OO) is too simplistic to be an accurate predictor of what is to come.

I take issue with several of the points that you have raised, Fernando. I'll start with the ones where I can provide objective evidence rather than just opinion.

You say that "F# will be adopted by long-time functional programmers, with LISP/Haskell heritage" but Lisp/Scheme and Haskell programmers account for only 5% of our F#.NET Journal registrants whereas C#/C++/Java programmers account for 53%. The reason is that functional programmers very rarely migrate between functional languages because they are so different (e.g. Lisp vs Haskell is like C++ vs Ruby). People learning any given functional language are always predominantly from mainstream backgrounds. Moreover, the prospect of commercialization makes F# alluring and that is irrelevant for academics happily using Lisp or Haskell. I believe F# will be adopted primarily by startup companies composed of small groups of talented programmers attacking hard problems who realise that the productivity of this language gives them serious advantages over the competition.

Your statements about C# adopting functional features are correct but then you say "Granted, the C# or VB implementations are not as elegant or pure as the F# counterparts, but the features are there". That is very misleading because the features responsible for F#'s awesome productivity are certainly not there in C# and VB. I'm talking about extensive type inference with automatic generalization and pattern matching over algebraic data types, both of which underpin the productivity of all MLs including OCaml and F#. Microsoft have not even begun trying to figure out how to add these features to C# and, until they do, C# will remain in a league below F# in terms of productivity and cost effectiveness.
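
To make these two features concrete, here is a small OCaml sketch; the same ideas carry over to F#. Neither function carries a type annotation, yet the compiler infers and automatically generalizes fully polymorphic types, and pattern matching over the algebraic data type `tree` lets it check that every case is handled. The `tree` type and the function names are illustrative only.

```ocaml
(* An algebraic data type: a binary tree is either empty or a node *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* No annotations: the compiler infers and generalizes
   map_tree : ('a -> 'b) -> 'a tree -> 'b tree *)
let rec map_tree f = function
  | Leaf -> Leaf
  | Node (l, x, r) -> Node (map_tree f l, f x, map_tree f r)

(* Also inferred and generalized: to_list : 'a tree -> 'a list *)
let rec to_list = function
  | Leaf -> []
  | Node (l, x, r) -> to_list l @ [ x ] @ to_list r

(* Because map_tree is polymorphic, it is reused at different types *)
let t = Node (Node (Leaf, 1, Leaf), 2, Node (Leaf, 3, Leaf))
let doubled = map_tree (fun x -> 2 * x) t   (* an int tree *)
let labels = map_tree string_of_int t       (* a string tree *)

let () = print_endline (String.concat " " (to_list labels))  (* prints "1 2 3" *)
```

The equivalent C# of that era would need explicit generic type parameters on each method and a class hierarchy with visitors or type tests in place of the `match`.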

Now for my subjective opinions. If it were possible to have a "one size fits all" language then I think programming language researchers would already have invented it. After all, they have complete freedom to do so: their results do not even have to be practically useful or adopted. However, I believe the different programming paradigms are at odds by design and, consequently, this is a strictly either-or situation. For example, using overloading undermines type inference. This is why overloading requires type annotations for disambiguation in F#. Many other languages lie at different points along the FP-OO curve. OCaml is closer to FP and Scala is closer to OO, but F# is the only language to have ever brought the productivity of OCaml to a mainstream industrial platform like .NET. Scala does a slightly better job with respect to OO but only at the cost of a catastrophic loss in terms of productivity due to its lack of automatic generalization.
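
OCaml illustrates the opposite trade-off from F#: it does not overload arithmetic at all, using `+` for ints and `+.` for floats, so the types of functions like those below are inferred with no annotations. This is a sketch in OCaml, not F#, and the function names are hypothetical.

```ocaml
(* No overloading: + works only on ints and +. only on floats,
   so each function's type is fixed by the operators it uses. *)
let add_ints x y = x + y        (* inferred: int -> int -> int *)
let add_floats x y = x +. y     (* inferred: float -> float -> float *)

(* Inference flows through higher-order code without annotations:
   sum_floats : float list -> float *)
let sum_floats xs = List.fold_left add_floats 0.0 xs

let () = Printf.printf "%d %g\n" (add_ints 2 3) (sum_floats [ 1.5; 2.5 ])
(* prints "5 4" *)
```

The price is a little extra notation (`+.` instead of `+`); the benefit is that no expression is ever ambiguous, so inference never needs help from annotations.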

In summary, I think you are overestimating the amount of cross-pollination that will occur between languages and underestimating the amount of programmer migration that will occur.

Wednesday, 16 April 2008

Is OOP good for technical computing?

During one of the more heated discussions on the moderated F# mailing list, Jeffrey Sax stated that technical computing benefits from object oriented programming:

"In the end, you usually need both object-oriented and functional concepts working together. This is particularly true of technical computing, where you have some meaningful 'object' abstractions (vectors, matrices, curves, probability distributions...) and lots of 'functions' you want to perform on them (integration, fit a curve, solve an equation...)." - Jeffrey Sax, Extreme Optimization

We used C++ for technical computing for many years and then migrated first to Mathematica, then to OCaml and now to F#, so I found this statement really surprising. From my point of view, object oriented programming has almost nothing to offer in the context of technical computing. OO languages are obviously widespread in technical computing, but only because they are common elsewhere: none of the dominant technical computing environments (e.g. Mathematica, MATLAB, Maple, MathCAD) emphasize OOP, and many numerical libraries for object oriented languages do not adopt an object oriented design (e.g. IMSL). Inheritance is what distinguishes OOP from the more conventional approaches to technical computing, such as procedural programming, functional programming and term rewriting. Jeffrey's examples of vectors, matrices, curves and probability distributions all seem like poor fits for OOP to me. Given the choice, can OOP really be preferable in this context?

Vectors and matrices are almost always represented internally by arrays and do have associated functions (such as arithmetic operations) but such encapsulation can be provided by many different approaches, not just OOP. One might argue that real/complex, non-symmetric/symmetric/hermitian and dense/sparse matrices could be brought together into a class hierarchy and inheritance could be used to factor out commonality but there is none: storage and almost all functions (e.g. matrix-matrix multiplication) are completely different in each case.
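
A minimal OCaml sketch of the storage point: a dense matrix stored as an array of rows and a sparse matrix stored as per-row lists of (column, value) pairs share a type only through a variant, and even matrix-vector multiplication needs a completely separate algorithm for each. The `matrix` type and `mul_vec` are hypothetical, and only real-valued matrices are covered.

```ocaml
(* Two storage schemes for real matrices *)
type matrix =
  | Dense of float array array           (* every entry, row by row *)
  | Sparse of (int * float) list array   (* per row: (column, value) of the non-zeros *)

(* Matrix-vector product: each representation needs its own algorithm *)
let mul_vec m v =
  match m with
  | Dense rows ->
      Array.map
        (fun row ->
           let s = ref 0.0 in
           Array.iteri (fun j a -> s := !s +. a *. v.(j)) row;
           !s)
        rows
  | Sparse rows ->
      Array.map
        (fun entries ->
           List.fold_left (fun s (j, a) -> s +. a *. v.(j)) 0.0 entries)
        rows

let () =
  let v = [| 1.0; 1.0 |] in
  let dense = Dense [| [| 2.0; 0.0 |]; [| 0.0; 3.0 |] |] in
  let sparse = Sparse [| [ (0, 2.0) ]; [ (1, 3.0) ] |] in
  (* The same matrix in both forms gives the same product *)
  Array.iter (Printf.printf "%g\n") (mul_vec dense v);
  Array.iter (Printf.printf "%g\n") (mul_vec sparse v)
```

Apart from the shared name, the two branches have nothing in common that a base class could usefully hold.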

A "curve" is just another name for a function and, therefore, is surely best represented by functional programming because that makes evaluation as easy and efficient as possible. One might argue that "curves" might have other associated functions beyond straightforward evaluation, such as a symbolic derivative. However, as soon as you step down that road, term rewriting becomes preferable to OOP because it facilitates symbolic processing, which is the only viable way to do computer algebra and compute general derivatives. OOP might let you encapsulate a single special case but inheritance buys you nothing.
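
OCaml's pattern matching gives a small taste of this rule-based style: each case of `d` below rewrites one form of expression into its derivative. ML pattern matching is a restricted cousin of the general term rewriting found in Mathematica, so this sketch only approximates the idea; the `expr` type and the functions `d` and `eval` are hypothetical names.

```ocaml
(* Symbolic expressions in a single variable x *)
type expr =
  | Num of float
  | Var                 (* the variable x *)
  | Add of expr * expr
  | Mul of expr * expr
  | Sin of expr
  | Cos of expr

(* Each case is a rewrite rule from an expression to its derivative *)
let rec d = function
  | Num _ -> Num 0.0
  | Var -> Num 1.0
  | Add (f, g) -> Add (d f, d g)
  | Mul (f, g) -> Add (Mul (d f, g), Mul (f, d g))   (* product rule *)
  | Sin f -> Mul (Cos f, d f)                         (* chain rule *)
  | Cos f -> Mul (Mul (Num (-1.0), Sin f), d f)

(* The curve is still just a function: evaluate an expression at x *)
let rec eval x = function
  | Num c -> c
  | Var -> x
  | Add (f, g) -> eval x f +. eval x g
  | Mul (f, g) -> eval x f *. eval x g
  | Sin f -> sin (eval x f)
  | Cos f -> cos (eval x f)

let () =
  (* d/dx (x sin x) = sin x + x cos x, evaluated at x = 1 *)
  let f = Mul (Var, Sin Var) in
  Printf.printf "%g\n" (eval 1.0 (d f))
```

A real computer algebra system would also need simplification rules (for example, rewriting `Mul (Num 1.0, e)` to `e`), which is where general term rewriting pulls further ahead.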

Probability distributions are perhaps less clear cut. There are an arbitrary number of such distributions (beta, Normal, exponential, Poisson...) and they have an arbitrary number of useful functions over them (mean, median, mode, variance, standard deviation, skew, kurtosis, inverse cumulative distribution function...). Although representing probability distributions using OOP allows the set of distributions to be extended easily, it makes it difficult to add new functions over distributions: you cannot retrofit a new member onto every class in a predefined hierarchy. ML-style functional programming would make it easy to add new functions but difficult to add new distributions. Term rewriting makes it easy to extend both the number of distributions and the number of functions but leads to a "rat's nest" style of unstructured programming, as such extensions may be placed anywhere.
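
The ML side of this trade-off can be sketched in OCaml. With a variant type, a new function such as `skewness` is one self-contained definition, whereas a new distribution would require a new case in every existing function. The type and function names are illustrative, and only a handful of distributions and properties are shown.

```ocaml
(* A closed set of distributions: adding a case means revisiting
   every function that matches on this type *)
type distribution =
  | Normal of float * float   (* mean mu, standard deviation sigma *)
  | Exponential of float      (* rate lambda *)
  | Poisson of float          (* rate lambda *)

let mean = function
  | Normal (mu, _) -> mu
  | Exponential lambda -> 1.0 /. lambda
  | Poisson lambda -> lambda

let variance = function
  | Normal (_, sigma) -> sigma *. sigma
  | Exponential lambda -> 1.0 /. (lambda *. lambda)
  | Poisson lambda -> lambda

(* Adding a new function is easy: it needs no changes elsewhere *)
let standard_deviation dist = sqrt (variance dist)

let skewness = function
  | Normal _ -> 0.0
  | Exponential _ -> 2.0
  | Poisson lambda -> 1.0 /. sqrt lambda

let () = Printf.printf "%g\n" (standard_deviation (Normal (1.0, 2.0)))  (* prints 2 *)
```

An OO design inverts this: a new distribution subclass is self-contained, but adding `skewness` means editing every class in the hierarchy.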

Of course, neither Jeffrey nor I is impartial in this. Jeffrey sells the excellent Extreme Optimization library for C#, which makes heavy use of object orientation, and my company sells a variety of products related to technical computing, such as the OCaml for Scientists and F# for Scientists books, Signal Processing .NET software for C# users, Time-Frequency analysis software for Mathematica users, and F# for Numerics and F# for Visualization for technical computing using Microsoft's new F# programming language. However, I do not believe I am alone in my view, not least because none of the world's foremost integrated technical computing environments are built around object oriented programming. Indeed, Mathematica is my personal favorite and it is primarily a functional language built around term rewriting.

Our best special offer ever

For a limited time only, buy OCaml for Scientists, 6-month subscriptions to the OCaml and F#.NET Journals, F# for Numerics and F# for Visualization and get over £100 off!

That's a saving of over 30%!

Thursday, 10 April 2008

Memory management in Sun's Java VM

In 2006 Sun released an interesting white paper, "Memory Management in the Java HotSpot Virtual Machine", describing the memory management and concurrent garbage collection strategies employed by their HotSpot Java virtual machine.

HotSpot contains one of the most advanced concurrent garbage collectors of any language implementation, rivalling Microsoft's .NET CLR implementation.


Friday, 22 February 2008

F# for Numerics now available!

Following our discovery that the .NET platform seems to have a considerable market for cost-effective libraries implementing numerical methods, we have launched F# for Numerics with 50% off for a limited time only.

Microsoft are aiming their new F# programming language directly at the technical computing market and our F# for Numerics library is the only choice for users wanting high-performance numerical methods with an elegant functional interface.

Moreover, F# for Numerics will be seamlessly interoperable with our existing F# for Visualization library. This constitutes the first integrated technical computing environment for .NET, with F# at its core.