Tuesday, 4 January 2011

IO throughput: Haskell vs F#

Many applications require the ability to churn through gigabytes of data. In such cases, high IO throughput is valuable. This article examines high-throughput IO in the Haskell and F# programming languages. The Haskell solutions use the ByteString library and the F# solutions use ordinary .NET 4 IO.

An eager Haskell solution that loads the entire file into memory may be written as follows:

import System import Bits import Data.List import Data.ByteString as B main = getArgs >>= B.readFile . Data.List.head >>= print . B.foldl xor 0

A lazy alternative that loads parts of the file on-demand may be written as follows:

import System import Bits import Data.List import Data.ByteString.Lazy as B main = getArgs >>= B.readFile . Data.List.head >>= print . B.foldl xor 0

An eager F# version can simply call the ReadAllBytes function to load the entire file:

System.IO.File.ReadAllBytes file |> Array.fold (^^^) 0uy

An on-demand alternative simply opens the file and calls ReadByte on a buffered stream in a loop until the end of the file is reached:

use stream = new System.IO.FileStream(file, System.IO.FileMode.Open) use stream = new System.IO.BufferedStream(stream) let rec loop n = let b = stream.ReadByte() if b >= 0 then loop (n ^^^ b) else n loop 0

Compiling the Haskell with GHC 6.12.3 using --make -O2 flags and running the F# interactively in Visual Studio 2010 on a Dell Precision T5400 running 32-bit Windows Vista with 4Gb of RAM gave the following results for the on-demand versions:

The eager F# failed due to lack of memory on all three files. The eager Haskell obtained the correct answer on the smallest file, crashed on the middle file and produced corrupted output on the largest file. These problems appear to stem from a signed 32-bit integer overflowing inside the ByteString library, causing it to either crash when apparently-negative ByteString lengths are encountered or to read only part of a file.

Several points of interest:

  • The variations between the speed of the Haskell and F# solutions is less than 5% in each case.
  • Wrapping the IO in an IEnumerable (seq in F#) severely degrades performance.
  • On 64-bit machines, memory mapping have different trade-offs.
  • Haskell's ByteString library includes some special functions (e.g. count) that use optimized kernels written in C and assembler to manipulate large quantities of data more efficiently than is currently possible from Haskell or F#.

For more information on high-performance F#, read Visual F# 2010 for Technical Computing.

7 comments:

mattiasw said...

Hi John,

I really appreciate that you want to spread the word around F#, but what is this sample trying to show?

Are you using 32 or 64-bit Windows? Since a process on Win32 cannot be bigger than 2GB, 1.58GB is huge.

What is the purpose och having the file as a byte[] in memory?

What alternatives are there? You mention memory mapping, why isn't this an option for 32-bit Windows?

rainmaker said...

How did the lazy versions fared ?

Flying Frog Consultancy Ltd. said...

@mattiasw: "what is this sample trying to show?" How to measure the performance of binary IO from Haskell and F#. We found the performance to be surprisingly good in both cases, even through Haskell's abstractions.

"Are you using 32 or 64-bit Windows?" 32-bit Windows Vista.

"What is the purpose och having the file as a byte[] in memory?" Simplicity. .NET 4 makes it easy to load a file in its entirety as bytes or a string of text but it is not so clear how to process large files. There is no standard library function in F# to load a large file on-demand as a sequence of bytes or chars. In fact, if there were such a function it would be uselessly slow: the only viable alternative in F# would be to return a sequence of arrays because folding over an array is much faster than folding over a sequence in F#.

"What alternatives are there?" You could expose an interface like the Haskell's and use a high-level feature like a fold but we found that was much slower and, therefore, should be avoided. A buffered file stream and calls to ReadByte in an imperative loop is ideal for F#.

"You mention memory mapping, why isn't this an option for 32-bit Windows?" You don't have enough address space to memory map large files on 32-bit. On 64-bit, you could memory map huge files and let the OS take care of loading them on-demand for you. That might well be simpler and more efficient.

@rainmaker: "How did the lazy versions fared?" The on-demand versions are called lazy in Haskell speak, although they are not technically lazy because they load (or memory map) 32kB chunks of file at a time.

Veikko Eeva said...

I wonder if I understood correctly... In F# it's not feasible to use asynchronous I/O or unbuffered reads due to unmutable data structures? I'm refering to your answer to mattiasw about the availability of standard I/O APIs.

Hmm, or that there's some extra work involved in using these APIs.

mattiasw said...

If you use memory-mapped files, you are working with a window into the file. If I remember correctly, even win32 can handle files larger than 4 GB, since most likely 64-bit indexes are used.

A simple MMF sample can be found at

http://visualstudiomagazine.com/Articles/2010/11/01/Using-Memory-Mapped-Files-in-the-NET-Framework-4.aspx?Page=2

The only new stuff in .NET 4.0 is that there is a .NET-wrapper

Arseny Kapoulkine said...

This is I/O bound, at 50 Mb/sec. If a language I/O abstraction delivers less than 50 Mb/sec performance, the language implementation is seriously broken. SSD testing will likely reveal more differences.

The fact that some versions crashed/failed to compute the result, though, is... sad.

Kevin Cantu said...

I've messed around with this on Mono a bit now. On Linux, GHC is beating Mono/F# at the moment. The Mono folks are going to include F# in their packaging soon, so I expect things to improve. (Debian also needs to upgrade from Mono 2.6.7, though...)

Some graphs, etc., here: http://log.kevincantu.org/2011/01/f-vs-haskell-vs-go-or-upgrade-mono.html

I also played around with Clojure and Go. Although I haven't been getting acceptable performance out of my Clojure program, Go was pretty good.

@rainmaker: the non-lazy F# on Mono fails by design (with a descriptive error) on files larger than 2 GB, but the non-lazy Haskell just blows up.

@arseny: SATA is supposedly 1, 3, or 6 gigabits per second, right? Anyway, an SSD is on my birthday gift list...