Friday, 3 April 2009

F# vs OCaml vs Haskell: hash table performance

I just stumbled upon an interesting benchmark. Put mappings i -> i for i in 1..10M in a hash table, print the entry for 100. Don Stewart's optimized Haskell:

import Control.Monad
import qualified Data.HashTable as H

main = do
  m <- H.new (==) H.hashInt
  forM_ [1..10000000] $ \n -> H.insert m n n
  v <- H.lookup m 100
  print v

My OCaml translation:

let n = 10000000

let () =
  let m = Hashtbl.create n in
  for i = 1 to n do
    Hashtbl.add m i i
  done;
  Printf.printf "%d\n%!" (Hashtbl.find m 100)

My F# translation:

do
  let m = System.Collections.Generic.Dictionary()
  for i = 1 to 10000000 do
    m.[i] <- i
  printf "%d\n" m.[100]

Performance:

Haskell: 22.8s
OCaml: 2.82s
F#: 0.706s

I found it remarkable that the performance difference is so large: F# is 32× faster than the optimized Haskell here!

Thanks to everyone who has replied. Apparently this is a fundamental design flaw in GHC's garbage collector, which cannot handle mutation of arrays of boxed values (e.g. hash table buckets) efficiently. Unfortunately, this means that Haskell's de facto standard compiler cannot handle any data structure that maps values onto values efficiently. The nearest alternative is a purely functional tree-based map, but that is not only still 11× slower than F#: it also yields a data structure that is asymptotically slower to search as well:

import Data.IntMap
import Data.List

main = print $ m ! 100
  where m = fromDistinctAscList [(i,i) | i <- [1..10000000]]

This program still takes 7.519s.
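The asymptotic gap is easy to sketch: a hash table answers a lookup in expected O(1) time, whereas a balanced tree (or any sorted structure) needs O(log n) comparisons. A minimal Python illustration, using a sorted array with binary search as a stand-in for the tree-based map (smaller n than the benchmark, just to show the contrast):

```python
import bisect

n = 1_000_000  # smaller than the benchmark's 10M, to keep this quick

# Hash table: expected O(1) lookup.
d = {i: i for i in range(1, n + 1)}
print(d[100])  # 100

# Sorted-structure analogue of a tree-based map: O(log n) binary search.
keys = list(range(1, n + 1))  # already sorted
j = bisect.bisect_left(keys, 100)
print(keys[j])  # 100, found after ~log2(n) ≈ 20 comparisons
```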

Perhaps the most remarkable finding is not that OCaml and F# thrash Haskell on this benchmark (which is to be expected: OCaml and F# are not toys) but that even a Python one-liner thrashes the most heavily optimized Haskell code:

$ time python -c 'import psyco;psyco.full();print dict((i,i) for i in xrange(10000000))[100]'
100

real 0m5.738s
user 0m5.428s
sys 0m0.308s

If only Haskell programmers could take some time away from writing Fibonacci functions perhaps they could also build some kind of adequate compiler.

16 comments:

Jean-Denis said...

{-# OPTIONS -XBangPatterns #-}

import qualified Data.HashTable as H (new, insert, lookup, hashInt)
import Control.Monad

main = do
    h <- H.new (==) H.hashInt
    forM_ [1..n] $ \ !x -> H.insert h x x
    print =<< H.lookup h 100
  where
    n = 1000000

Try this!
And I'm not even a real Haskell programmer!

Flying Frog Consultancy Ltd. said...

@Jean-Denis

That is 2× faster than the original Haskell but still 30× slower than my F#.

Paddy3118 said...

I thought I'd try it in Python, with the psyco accelerator:

bash$ date; time /cygdrive/c/python26/python -c 'import psyco;psyco.full();print
dict((i,i) for i in xrange(10000000))[100]'; date
Fri Apr 3 16:57:46 GMTDT 2009
100

real 0m14.155s
user 0m0.015s
sys 0m0.031s
Fri Apr 3 16:58:02 GMTDT 2009
bash$
bash$
bash$ date; time /cygdrive/c/python26/python -c 'import psyco;psyco.full();print
dict((i,i) for i in xrange(10000000))[100]'; date
Fri Apr 3 16:58:22 GMTDT 2009
100

real 0m10.558s
user 0m0.015s
sys 0m0.030s
Fri Apr 3 16:58:33 GMTDT 2009
bash$ date; time /cygdrive/c/python26/python -c 'import psyco;psyco.full();print dict((i,i) for i in xrange(10000000))[100]'; date
Fri Apr 3 16:59:53 GMTDT 2009
100

real 0m9.860s
user 0m0.015s
sys 0m0.031s
Fri Apr 3 17:00:03 GMTDT 2009
bash$

- Paddy.

how.gauche said...

I don't have F# but I can compare C++:

#include <tr1/unordered_map>
#include <iostream>

using namespace std;
using namespace std::tr1;

int
main()
{
    const int N = 1000000;
    unordered_map<int,int> h;

    for (int i=0; i < N; i++) {
        h[i] = i;
    }

    cout << h[100] << '\n';

    return 0;
}

and Haskell:

{-# LANGUAGE BangPatterns #-}
import qualified Data.HashTable as H
import Data.Maybe

eqInt :: Int -> Int -> Bool
eqInt a b = a == b

type HT = H.HashTable Int Int

n = 1000000

loadHashtable :: HT -> IO ()
loadHashtable ht = work n
  where
    work :: Int -> IO ()
    work !i = do
      if i < 0 then
          return ()
        else do
          H.insert ht i i
          work (i-1)

main = do
  h <- H.new eqInt H.hashInt
  loadHashtable h
  i <- (H.lookup h 100) >>= (return . show . fromJust)
  putStrLn i

The results:
~/tmp/haskell/hashtable/c++ $ g++ -O2 -o hashtable hashtable.cpp 
~/tmp/haskell/hashtable/c++ $ time ./hashtable 
100
0.342 secs

~/tmp/haskell/hashtable/haskell $ ghc -O2 --make HashTest.hs 
[1 of 1] Compiling Main             ( HashTest.hs, HashTest.o )
Linking HashTest ...
~/tmp/haskell/hashtable/haskell $ time ./HashTest 
100
0.733 secs

pumpkin said...

No one uses hash tables in Haskell (they're stateful by nature, and most people aren't willing to stick their otherwise-pure code into IO just to use a hash table), so I believe their code hasn't been touched in years and is thus quite slow.

Data.Map or Data.IntMap is a lot more common in Haskell, and both are quite fast. A benchmark like this is a bit like saying that C sucks because you get a stack overflow if you write your loops as recursive functions. You're applying structures from one language paradigm to another.

andre said...

Changing "Hashtbl.create 1" to "Hashtbl.create n" makes the OCaml version more than twice as fast here. Do you know the initial dictionary size in F#?

Rodrigo Barrientos said...

import Data.IntMap
import Data.List

main = print $ m ! 100
  where m = fromDistinctAscList [(i,i) | i <- [1..10000000]]

Very expressive, not that bad performance for an immutable data structure:

[15:40 alake:~/]% ghc -O2 --make map.hs
[1 of 1] Compiling Main ( map.hs, map.o )
Linking map ...
[15:40 alake:~/]% time ./map
100
./map 6.68s user 0.69s system 66% cpu 11.071 total

Xe said...

Hmm, here we are comparing the performance of .NET, a high-quality optimizing JIT runtime, against "something" for Haskell or OCaml.

Don Stewart said...

You're spending most of your time doing GC here, Jon. Also, I'm not sure why you're doing all that list allocation in the Haskell code. I'd have just written this:

import Control.Monad
import qualified Data.HashTable as H

main = do
  m <- H.new (==) H.hashInt
  forM_ [1..10000000] $ \n -> H.insert m n n
  v <- H.lookup m 100
  print v


$ ghc -O2 A.hs --make

Now, by default, slow as a dog:

$ time ./A
Just 100
./A 49.68s user 1.26s system 98% cpu 51.790 total

Bump the heap a bit:

$ time ./A +RTS -H1G
Just 100
./A +RTS -H1G 6.73s user 0.87s system 97% cpu 7.807 total

GC is traversing the boxed array structure needlessly.

Don Stewart said...

BTW, isn't this a rather strange hash table benchmark? After all, it is equivalent to, say, this, which is, what, 1000× faster.

import Data.Array.Vector

main = print (a `indexU` 100)
  where
    a :: UArr Int
    a = enumFromToU 0 10000000

$ time ./A
100
./A 0.00s user 0.00s system 83% cpu 0.004 total

Ah, perfect hashing :-)
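Since the keys are exactly the dense range 1..10M and each key maps to itself, the whole benchmark degenerates to indexing a flat array, which is Don's point. The same reduction, sketched in Python:

```python
# The key set is dense and v(i) = i, so no hashing is needed at all:
# the "table" is just an array indexed directly by the key.
a = list(range(10_000_001))  # a[i] == i for i in 0..10M
print(a[100])  # 100
```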

k4ntico said...

@paddy

The psyco Python accelerator in fact does not understand/optimize generators, and a quick test with Python 2.5 on Windows shows that removing it from your one-liner cuts the running time to 75%.

Rewriting the loop (with neither a generator nor acceleration) brings that factor to 43%, and using psyco on the rewritten loop brings it to 29%.

Which makes Python nearly twice as fast as *OCaml* on this test, and a bit more than twice as slow as F#.
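k4ntico's rewritten loop isn't shown in the comment; a plausible reconstruction (hypothetical, modernized to Python 3 syntax and without psyco) simply unrolls the generator expression into an explicit imperative loop:

```python
# Build the dict with an imperative loop instead of feeding dict() a
# generator expression; psyco could not optimize generators, so only
# the explicit-loop form could be accelerated.
d = {}
for i in range(1, 10_000_001):
    d[i] = i
print(d[100])  # 100
```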

Paddy3118 said...

@k4ntico, Thanks for the rework.
- Paddy.

Flying Frog Consultancy Ltd. said...

@k4ntico

I cannot reproduce your results using Python 2.5.4 and Psyco 1.6. I get 4.64s for your optimized Python compared to 5.8s for the old Python and 2.82s for my OCaml.

Flying Frog Consultancy Ltd. said...

Ah, I can reproduce your results if I interpret the OCaml rather than compile it. You need to compile the OCaml if you want it to run quickly, of course...

k4ntico said...

I don't have OCaml and do not intend to install it. I simply compared running times using Python 2.5.x's timer module, starting from the one-liner published here, on my laptop, calibrated against the value you gave. The laptop is a 2GHz Centrino 1 with 1GB RAM running Windows XP SP3. I can't tell you the Python 2.5 minor version as I don't have the machine at hand; assume the most recent. The code I used I published on comp.lang.functional.

Derek Cordeiro said...

Hello,

Just browsing through, I tested the same code (no psyco on the system):

derekjc@derek-SVS15125CNB:~$ time ./ghc-test
100

real 0m2.074s
user 0m1.862s
sys 0m0.211s

derekjc@derek-SVS15125CNB:~$ time ./ocaml-test
100

real 0m5.740s
user 0m5.630s
sys 0m0.112s

derekjc@derek-SVS15125CNB:~$ time python -c 'print dict((i,i) for i in xrange(10000000))[100]'
100

real 0m1.229s
user 0m1.072s
sys 0m0.157s


Is Python really that fast? Is F# (below) even faster?

derekjc@derek-SVS15125CNB:~$ time ./test.exe
100

real 0m0.527s
user 0m0.466s
sys 0m0.061s