Tuesday, 7 July 2009

Problems with fslex and fsyacc

Back in 2004, we wrote a mini Mathematica implementation in OCaml in only four days. For fun, we decided to port this OCaml program to F#. Incredibly, porting just the lexer and parser from OCaml to F# has taken longer than it took us to write the entire original OCaml code! The reason is simply that OCaml has an incredibly powerful and mature suite of compiler tools for generating lexers and parsers whereas F# does not.
Given that our original OCaml lexer and parser for Mathematica source code were written using ocamllex and ocamlyacc, it seemed obvious to translate the code into fslex and fsyacc. However, there are some serious problems with the F# versions of these tools:
  1. Incomplete: the F# tools were only ever intended for the F# compiler itself and not for wider use. Consequently, features are missing, e.g. in the regular expression implementation in fslex.
  2. Unintegrated: there is no Visual Studio support for these tools and, consequently, development is cumbersome compared to OCaml.
  3. Unreliable: the fslex and fsyacc tools are not officially part of the F# distribution and, although they are used in the F# compiler itself, they are buggy, reporting bogus errors where there are none.
  4. Slow: where OCaml takes 0.05s to generate and compile the lexer and parser, F# takes over 10s. That is over 200× slower! Suffice it to say, that seriously bogs down development.
One potential advantage of fslex is that simply adding --unicode on the command line generates a lexer for Unicode rather than ASCII text. However, we have no use for Unicode and, consequently, have not tested this. In contrast, the conventional OCaml solution is to drop the byte-only ocamllex tool in favour of a different lexer generator such as the ulex Unicode lexer-generating syntax extension.
There is another choice for lexing and parsing in F#: the FParsec monadic parser combinator library by Stephan Tolksdorf. The current release of this library does not compile with the F# May 2009 CTP but the latest version in the development repository does. The project is split into C# and F# code in order to write some performance-critical sections in C# but this means that it is distributed as two separate DLLs. We shall investigate this option in the future...
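For readers unfamiliar with the term, the following is a minimal hand-rolled sketch in OCaml of what a monadic parser combinator library looks like. It is not FParsec's API: every name here (`parser`, `return`, `>>=`, `<|>`, `satisfy`, `many`, `number`, `sum`, `parse`) is an assumption made for illustration only. The idea is that a parser is an ordinary function, and larger parsers are built by combining smaller ones with a bind operator, so no separate generator step is required.

```ocaml
(* A parser reads the string from position [pos] and either fails (None)
   or returns a value together with the next position.
   All names in this sketch are hypothetical, not taken from FParsec. *)
type 'a parser = string -> int -> ('a * int) option

(* Succeed with [x] without consuming any input. *)
let return (x : 'a) : 'a parser = fun _ pos -> Some (x, pos)

(* Monadic bind: run [p], then feed its result to [f] to get the next parser. *)
let ( >>= ) (p : 'a parser) (f : 'a -> 'b parser) : 'b parser =
  fun s pos ->
    match p s pos with
    | None -> None
    | Some (x, pos') -> f x s pos'

(* Ordered choice: try [p] and fall back to [q] if [p] fails. *)
let ( <|> ) (p : 'a parser) (q : 'a parser) : 'a parser =
  fun s pos ->
    match p s pos with
    | None -> q s pos
    | r -> r

(* Consume one character if it satisfies [pred]. *)
let satisfy (pred : char -> bool) : char parser =
  fun s pos ->
    if pos < String.length s && pred s.[pos] then Some (s.[pos], pos + 1)
    else None

let char_ (c : char) : char parser = satisfy (fun d -> d = c)

(* Zero or more repetitions of [p]. [p] must consume input when it
   succeeds, otherwise this would loop forever. *)
let rec many (p : 'a parser) : 'a list parser =
  fun s pos ->
    match p s pos with
    | None -> Some ([], pos)
    | Some (x, pos') ->
      (match many p s pos' with
       | Some (xs, pos'') -> Some (x :: xs, pos'')
       | None -> Some ([x], pos'))

let is_digit c = c >= '0' && c <= '9'

(* number ::= digit digit* *)
let number : int parser =
  satisfy is_digit >>= fun d ->
  many (satisfy is_digit) >>= fun ds ->
  let value acc c = 10 * acc + (Char.code c - Char.code '0') in
  return (List.fold_left value 0 (d :: ds))

(* sum ::= number ('+' number)* *)
let sum : int parser =
  number >>= fun n ->
  many (char_ '+' >>= fun _ -> number) >>= fun ns ->
  return (List.fold_left ( + ) n ns)

(* Run a parser and require that it consumes the whole input. *)
let parse (p : 'a parser) (s : string) : 'a option =
  match p s 0 with
  | Some (x, pos) when pos = String.length s -> Some x
  | _ -> None
```

With these definitions, `parse sum "1+20+300"` evaluates to `Some 321` and `parse sum "1+"` evaluates to `None`. The grammar lives in ordinary code, which is the main contrast with the generator-based approach of ocamllex/ocamlyacc and fslex/fsyacc described above.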

8 comments:

s9 said...

"The project is split into C# and F# code in order to write some performance-critical sections in C#..."

WTF?

"...but this means that it is distributed as two separate DLLs."

WTF!??!!?

Flying Frog Consultancy Ltd. said...

@ s9

Yes. Although you can interoperate between C# and F#, calling code written in the other language, it is not yet possible to write a mixed-language project that compiles to a single DLL.

This problem arises because Microsoft don't want programmers to be able to statically link .NET assemblies because it would make it much easier to steal commercial code, so they do not provide this functionality in Visual Studio. There is a program called ILMerge from Microsoft Research that should make this possible but it has been known to be seriously buggy in the past and, when we tried it on FParsec, it did not work correctly. However, we are successfully using ILMerge to recombine our commercial F# DLLs into a single DLL. The difference may be that FParsec contains inline IL assembly.

Although F# is a fantastic language on a decent platform, both F# and .NET are far from mature when compared with existing solutions like OCaml. This is quite surprising given how widely .NET is allegedly used. However, our own statistics indicate that .NET is nowhere near as common as Microsoft suggest. OCaml is currently earning us 17× more revenue from product sales than C#, for example. The main reason for that may be that OCaml is recession-proof whereas .NET is really suffering in the current financial climate, as industrial users migrate to more cost-effective solutions like rats from a sinking ship.

Hopefully the .NET market will recover over the next couple of years and the F# ecosystem will mature so that we can see some decent revenue from it.

Until then, it seems that OCaml is still the language of choice for many things including lexing and parsing.

Laurent Le Brun said...

Interesting article, with some good points.

"development is cumbersome compared to OCaml development with Emacs."

In F#, you can use either Emacs or Visual Studio (or both). In OCaml, you can use Emacs, but not Visual Studio. Why would development be more cumbersome?

I tried FParsec a few times, and I was quite happy with it, but I never used it for big grammars. I'm looking forward to reading your feedback on this library.

Flying Frog Consultancy Ltd. said...

@ Laurent

"In F#, you can use either Emacs or Visual Studio (or both). In OCaml, you can use Emacs, but not Visual Studio. Why would development be more cumbersome?"

OCaml's integrated support for ocamllex and ocamlyacc source code editing from within Emacs gives you features like type throwback.

In contrast, F# has no such integrated support for any development environment for either fslex or fsyacc so you cannot get any type throwback.

That makes it a lot more cumbersome to develop lexers and parsers in F# than OCaml.

MichaelGG said...

"This problem arises because Microsoft don't want programmers to be able to statically link .NET assemblies because it would make it much easier to steal commercial code, so they do not provide this functionality in Visual Studio."

What about the F# compiler's static linking and standalone support?

Flying Frog Consultancy Ltd. said...

We have since dropped our use of ILMerge in commercial products because we also found it to be buggy.

@MichaelGG

F#'s static linking and standalone support is only capable of statically linking F#'s own standard libraries so it cannot be used to link in your own C# code.

Robert Jeppesen said...

I've used --standalone with success on both F# and C# libraries. Both kinds my own. Specifically, I use it to only have one assembly to deploy to SqlClr. Works just fine.

jay paul said...

Really nice post, you got great blog and Thank you for sharing This excellently written content. Waiting for next one.
