Saturday, 31 July 2010

Mathematica 7 review: buggy but fun!

At only £195+VAT, the Mathematica 7 Home Edition is just too tempting as an executive toy, but it still seems far too buggy to be taken seriously. After just a few hours of playing around, a variety of bugs became apparent. Every Mathematica user fears the dreaded error box that marks the loss of all unsaved data:

Fortunately, a really serious bug in the FFT routines of Mathematica 7.0.0 was fixed for the 7.0.1 release. This was a showstopper for customers of our time-frequency analysis add-on. The severity and ubiquity of this bug really highlights just how little quality assurance goes into Wolfram's software which, in turn, goes to show how unimportant correctness is in the creation of commercially-successful software products, even though they are used in aerospace engineering!

The first bug is in Mathematica's new support for parallelism. Although it is only supposed to handle 4 cores, it produces pages of errors when run on a machine with more cores, such as our 8-core system:

Half of the spawned kernels die, and the errors describe failed communication between them. It looks as though this feature was never tested on anything beyond a quad-core machine.

Another bug appears when trying to load the example matrices as described in Mathematica's own documentation:

And there seems to be a nasty bug in the MinCut function, which is supposed to find partitions that break the fewest edges: it causes an access violation (aka a segmentation fault) every time we try to use it:

A question on Stack Overflow highlighted another bug, this time in Mathematica's regular expression engine which crashes on non-trivial inputs, again losing all unsaved data:

So, although we do not hesitate to recommend Mathematica as an executive toy or even fun educational software for children, we still cannot recommend Mathematica for serious use due to the pervasiveness of serious bugs in this software. On the other hand, Mathematica's commercial success really demonstrates just how unimportant quality is in the software world.

As an aside, it is interesting to note that Mathematica is one of the languages facing a bleak future in the face of multicore computing, due to fundamental design issues. Specifically, the Mathematica system is largely built around the Mathematica language, which is a term rewriting language evaluated by repeated substitution using symbolic replacement rules. The rules are, by definition, read and written via one giant global rule table and, consequently, direct concurrent use from multiple threads would introduce massive contention for that global table. The only viable solution is to duplicate the entire system on each core and communicate via message passing. This forgoes the benefits of shared memory, which every multicore computer offers in a highly-efficient hardware-accelerated form, and, consequently, parallelism in Mathematica is orders of magnitude slower than in other languages. For example, even ignoring the distribution of definitions required to use parallelism in Mathematica, its ParallelTable construct is over 250× slower than the equivalent Array.Parallel.init in F#.
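To make the contention argument concrete, the evaluation model can be caricatured in a few lines of Python. This is purely an illustrative sketch: the RULES table and rewrite function below are invented for this post and bear no relation to Wolfram's actual implementation, but they show why a single global rule table is hostile to threads.

```python
# Toy sketch of term rewriting via a single global rule table
# (a caricature of the evaluation model, NOT Wolfram's real implementation).

RULES = {}  # one global table: symbol -> replacement term

def define(lhs, rhs):
    """Install a rewrite rule lhs -> rhs in the global table (a write)."""
    RULES[lhs] = rhs

def rewrite(term):
    """Repeatedly substitute using the global rules until a fixed point."""
    while True:
        if isinstance(term, tuple):             # compound term: rewrite subterms
            new = tuple(rewrite(t) for t in term)
        else:
            new = RULES.get(term, term)         # atom: a read of the global table
        if new == term:
            return term
        term = new

define("x", ("Plus", "y", 1))
define("y", 2)
print(rewrite("x"))  # ('Plus', 2, 1)
```

Because every evaluation step reads RULES and every definition writes it, naive multithreading would serialize on that one table; duplicating the whole evaluator per core and message passing between the copies is the workaround described above.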

Still, Mathematica's accompanying literature alone justifies the price of just £195!

Tuesday, 27 July 2010

F#unctional Londoners meetup lecture (28th July 2010)

Zach Bray of Trayport and Jon Harrop of Flying Frog Consultancy Ltd. will be presenting lectures at Skills Matter eXchange London (UK) at 6:30pm on Wednesday 28th July 2010.

Many thanks to Carolyn Miller and Phil Trelford for organizing F#unctional Londoners Meetup Group, an excellent series of events!

Thursday, 22 July 2010

Haskell's hash tables revisited: part 2

Our previous blog post contained several benchmark results comparing the new GHC 6.12.3 with F#. We have since discovered some other points of interest regarding this benchmark.

Firstly, the results for Haskell relied on the use of a garbage collector that prevents parallelism. If the more modern multicore-friendly GC is used (by compiling with -threaded and running with +RTS -N8) then the time taken increases from 4.5s to 10.6s. This is over 2× slower than before and now over 13× slower than F#. Naturally, the F# code was already using the multicore-capable .NET garbage collector, so this was an unfair bias in favor of Haskell.

Secondly, the Haskell code exploits an algorithmic optimization on the assumption that the keys are unique. This is often not the case in practice and, again, the F# code did not exploit such an assumption, so this was another unfair bias in favor of Haskell. A fairer comparison may be obtained by changing from the insert function to the update function in the Haskell code. Again, this incurs a substantial performance penalty and increases the time taken to 20.6s.
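The algorithmic difference can be sketched with a toy chained hash table in Python (invented for this post, not Data.HashTable's code): insert blindly prepends a binding on the assumption that the key is absent, whereas update must first scan the bucket for an existing binding.

```python
# Toy chained hash table illustrating insert (assumes unique keys)
# vs update (handles duplicates). A sketch, not Data.HashTable's code.

class HashTable:
    def __init__(self, nbuckets=16):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        # O(1): blindly prepend, assuming the key is not already present.
        self._bucket(key).insert(0, (key, value))

    def update(self, key, value):
        # Must scan the whole bucket for an existing binding first.
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)
                return
        bucket.insert(0, (key, value))

    def lookup(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

ht = HashTable()
ht.insert(42, "a")
ht.insert(42, "b")    # duplicate binding silently accumulates
print(ht.lookup(42))  # prints "b": the newest binding shadows the old one
```

With duplicate keys, insert leaves stale bindings in the table (wasting space and scan time), which is why the unique-keys assumption buys speed and why update is the fair counterpart to the F# code's set-item semantics.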

With these alterations, Haskell becomes 26× slower than F# on this benchmark even with the latest GHC. Although these latest improvements are a step in the right direction, it seems that Haskell still has a long way to go before attaining competitive performance with mutable data structures.

Sunday, 18 July 2010

Haskell's hash tables revisited

Update: We have since discovered that these results were biased towards Haskell.

Mikhail Glushenkov recently announced the Haskell Platform 2010.2 RC for Windows. In particular, this is the first release to include a version of the Glasgow Haskell Compiler (6.12.3) that has the new garbage collector fix to address the performance problems Haskell programmers have been experiencing with mutable arrays of boxed values over the past 5 years, such as the spines of hash tables.

Naturally, we couldn't resist benchmarking the new release to see if it lives up to the promise of decent hash table performance. Even though this installer for the Haskell Platform is just a release candidate, we found that it installed smoothly and ran correctly first time.

First up, a repeat of our previous benchmark which inserted 10,000,000 bindings with int keys mapping to int values into an initially-empty hash table:

GHC 6.12.1: 19.2s
GHC 6.12.3: 4.48s
F# .NET 4: 0.8s

The new version of GHC is clearly a substantial improvement over previous versions in this context, completing this benchmark over 4× faster than its predecessor. The fastest Haskell is still 5.6× slower than F# but that is a more reasonable performance difference than before.

Altering the benchmark to build five hash tables containing 5,000,000 bindings each with float keys and values and using the floor or truncate functions as the hash function changes the relative performance considerably:

GHC 6.12.1: 131s
GHC 6.12.3: 30.2s
F# .NET 4: 2.98s

Here, the old GHC is a whopping 44× slower than F# and the latest GHC is still 10× slower.
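The shape of this float-keyed benchmark can be sketched in Python at a greatly reduced size. The FloatKey class is our own invention, mimicking the use of truncation as the hash function; the real runs used 5,000,000 bindings per table and the F# and Haskell code given below.

```python
# Illustrative sketch of the float-key benchmark (reduced size).
import math
import time

class FloatKey:
    """Float key hashed by truncating to an int, mirroring the use of
    truncate :: Double -> Int as the hash function in the Haskell code."""
    __slots__ = ("x",)
    def __init__(self, x): self.x = x
    def __hash__(self): return math.trunc(self.x)
    def __eq__(self, other): return self.x == other.x

def build_tables(tables=5, n=50_000):  # the real benchmark used 5,000,000
    for _ in range(tables):
        m = {}
        for i in range(n, 0, -1):
            m[FloatKey(float(i))] = float(i)  # float keys and values
    return m

t0 = time.time()
m = build_tables()
assert m[FloatKey(42.0)] == 42.0
print(f"Took {time.time() - t0:.2f}s")
```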

Surprisingly, this turns out to be a three-year-old performance bug, this time in the implementation of basic numeric functions. Fortunately, it can be fixed simply by annotating the type of the truncate function. This gives the following performance:

GHC 6.12.1: 43.8s
GHC 6.12.3: 17.1s
F# .NET 4: 2.98s

The latest Haskell is now back to being only 5.7× slower than F#.

Here's the new F# code:

let cmp =
  { new System.Object()
    interface System.Collections.Generic.IEqualityComparer<float> with
      member this.Equals(x, y) = x = y
      member this.GetHashCode x = int x }
for _ in 1..5 do
  let m = System.Collections.Generic.Dictionary(cmp)
  for i = 5000000 downto 1 do
    m.[float i] <- float i
  printfn "m[42] = %A" m.[42.0]

And here's the new Haskell code:

import qualified Data.HashTable as H
import GHC.Float

act 0 = return ()
act n = do
  ht <- H.new (==) (fromIntegral . (truncate :: Double -> Int))
          :: IO (H.HashTable Double Double)
  let loop 0 = return ()
      loop i = do
        H.insert ht (fromIntegral i) (fromIntegral $ i + n)
        loop (i - 1)
  loop (5 * 10 ^ 6)
  ans <- H.lookup ht 42.0
  print ans
  act (n - 1)

main :: IO ()
main = act 5

Monday, 12 July 2010

Mono 2.4 still leaking like a sieve

How fast are hash tables in Mono 2.4 compared to .NET 4? An excellent question but one which led us to another trivial program that leaks memory on Mono but not on .NET (we previously gave a simple stack implementation written in F# that leaked memory on Mono 2.2).

We tried to use the following benchmark program to measure Mono's performance when filling a hash table from empty with float keys and values:

for i in 1..10 do
  let t = System.Diagnostics.Stopwatch.StartNew()
  let m = System.Collections.Generic.Dictionary()
  let mutable x = 0.0
  for i = 1 to 10000000 do
    m.[x] <- x
    x <- x + 1.0
  printfn "m[42] = %g" m.[42.0]
  printfn "Took %gs\n" t.Elapsed.TotalSeconds

Here's the output of that program on Mono 2.4:

$ mono HashTableBenchmark.exe
m[42] = 42
Took 3.45099s
m[42] = 42
Took 3.2431s
m[42] = 42
Took 3.39288s
m[42] = 42
Took 27.2352s

Unhandled Exception: System.OutOfMemoryException: Out of memory
  at (wrapper managed-to-native) object:__icall_wrapper_mono_array_new_specific (intptr,int)
  at System.Collections.Generic.Dictionary`2[System.Double,System.Double].Resize () [0x00000]
  at System.Collections.Generic.Dictionary`2[System.Double,System.Double].set_Item (Double key, Double value) [0x00000]
  at <StartupCode$HashTableBenchmark>.$Program.main@ () [0x00000]

Even more surprisingly, this program still dies with an out-of-memory error even if we explicitly Clear the hash table after every iteration!

Animated Mathematica functions

Here's a fun web page from Wolfram Research that has animations for a bunch of Mathematica's built-in functions.

Book review: Mathematica Cookbook by Sal Mangano

O'Reilly just published a new book, the Mathematica Cookbook, about Wolfram Research's flagship product. This book contains many interesting examples from various different disciplines. Most of these are derived from freely available examples written by other people (primarily from Wolfram Research's original Mathematica Book and also the excellent Wolfram Demonstrations Project) but the author has simplified some of the programs to make them more accessible.

However, the density of the information in this book is incredibly low. Most pages are filled with superfluous Mathematica output that is often not even described in the text:

  • Dozens of triples on page 15.
  • Page 58 lists hundreds of numbers but the text does not even describe their significance.
  • Page 205 lists all of the words with a subset of the letters "thanksgiv".
  • Page 226 is raw XML data.
  • Pages 264-265 are solid code that renders a snowman with circles and some dots for snow (all in black and white).
  • Pages 303-304 are two more pages of code to plot a snowman in 3D.
  • Page 321 lists all of the properties in Mathematica's polyhedron database and page 322 draws 20 polyhedra before page 324 enumerates exactly the same polyhedron properties again.
  • Pages 334-335 enumerate image data numerically (three times).
  • Page 359 displays quadrant swapping with 25× more matrix elements than necessary.
  • Page 360 is the Fourier transform of an image and page 361 is an apparently identical one.
  • Page 507 lists every chemical element and page 571 lists every property of them.
  • Page 553 lists every property available for financial data and page 554 lists them all over again.
  • Page 572 contains an empty graph.
  • and so on...

Many pages are devoted to TreeForm plots of trees that convey no useful information, e.g. the trees drawn on page 129 in the section about red-black trees:

It would have been nice to see red and black colored nodes, or even to learn how to achieve that in Mathematica (assuming it is possible). In particular, half of page 115 is a TreeForm view so badly laid out by Mathematica that it is unreadable.

There are also some problems with the technical content itself. For example, the section on red-black trees is a simple translation from Okasaki's excellent book "Purely functional data structures". The only new functionality Sal Mangano added is a remove function but his implementation is completely wrong.

The book is chock-full of advice about optimization. This is both good and bad: good because Mathematica users need all the help they can get with performance, because Mathematica is so slow; bad because the best advice for a Mathematica user, when performance is even vaguely relevant, is to ditch Mathematica in favor of a more modern tool with better performance. For example, section 14.10 "Compiling an Implementation of Explicit Trinomial for Fast Pricing of American Options" states that "You need a very fast pricer for American options", but the fastest Mathematica code given can be a whopping 960× slower than a comparably-simple F# solution. Furthermore, it turns out that this section was not written by Sal Mangano (the author of the book) but by Andreas Lauschke (a Mathematica and Java consultant), and it is a translation of MATLAB code written by Ansgar Jüngel (an academic).

There are several multi-page program listings written as plain Mathematica code with embedded comments that are difficult to read (they are not even spaced out, let alone typeset properly!) instead of prose. Page 265 is a block of code, for example:

Although the book does contain a frustrating amount of blatant filler, it is very cheap and does contain several thought-provoking examples. Whether you use Mathematica or not, you'll probably learn something useful about the state of the art in the Mathematica world by skimming this book. On a higher level, you will probably learn a lot more about business if you consider what this book and the army of Mathematica devotees behind it represent. After all, there is a lot of money in starting a religion!

FWIW, here are the interesting references from the book to original sources:

Monday, 5 July 2010

Purely functional games

A recent blog post entitled "Follow up to functional programming doesn't work" caused a bit of a stir, encouraging Haskell programmers to dredge up the nearest things Haskell has to real computer games: a Super Mario clone, Frag (a reimplementation of part of Quake), a Gradius clone, 4Blocks and a game that blows all of the others away called Bloxors:

This beautiful little OpenGL-based puzzle game weighs in at just 613 lines of purely functional code (many of the other games use unsafe* functions to introduce uncontrolled side effects). This is also one of the few programs that cabal install actually works on.

Check it out here!

Friday, 2 July 2010

Debunking the 100× GPU vs. CPU myth

Intel recently published a paper with no fewer than 12 authors called Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU, in which they criticize the huge performance discrepancies cited by researchers publishing about General-Purpose GPU (GPGPU) programming in the context of what they call "throughput computing".

We have also noticed bad science in this domain before. We tried to reproduce the incredible results of one paper, with a view to entering this market ourselves, only to discover that the authors had used the reference implementation of LAPACK instead of the vendor-tuned implementation for their CPU, which was 10× faster. Like Intel, we found that the performance advantage of a GPU was relatively modest (2.5×) given the enormous costs and liabilities of using one for number crunching. However, while we are fortunate enough to be able to simply dismiss fantastical results as irrelevant propaganda, Intel are presumably feeling the pinch as misinformed customers flock to buy nVidia's GPUs to attack problems for which a better Intel CPU would have been more appropriate.

On the other hand, the failure of manycore GPUs to attain competitive performance on all but a handful of tasks raises the question of how much general software really stands to gain from the multicore revolution. Is this the end of the line for performance as we know it?

Intel's careful use of the phrase "throughput computing" is significant. Their point is that memory hierarchies are the shared headache. Applications that perform local computations in registers without need of external data can easily be made to scale extremely well. Computing Mandelbrot fractals is one example of this. For example, GPUs should excel at rendering the complex Mandelbulb fractal that was discovered in recent years:
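To illustrate why such workloads scale, here is a minimal Mandelbrot sketch in Python (our own toy code, not from Intel's paper): each pixel's escape count depends only on its own coordinates, so rows can be farmed out to cores with no shared mutable state at all.

```python
# Mandelbrot is embarrassingly parallel: each pixel is computed purely
# from its own coordinates, so there is no contention on shared data.
from multiprocessing import Pool

def escape(c, max_iter=100):
    """Iterate z -> z*z + c and return the iteration count at escape."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

def mandelbrot_row(args):
    """Compute one row of the image; independent of every other row."""
    y, width = args
    return [escape(complex(-2.0 + 3.0 * x / width, -1.5 + 3.0 * y / width))
            for x in range(width)]

if __name__ == "__main__":
    width = 40
    with Pool() as pool:  # rows scale across however many cores exist
        rows = pool.map(mandelbrot_row, [(y, width) for y in range(width)])
    for row in rows:      # crude ASCII rendering of the set
        print("".join("#" if n == 100 else "." for n in row))
```

Since every row touches only its own local values, this kind of code keeps the memory hierarchy out of the picture, which is exactly the regime in which both GPUs and multicore CPUs shine.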

Awesome stuff, but what is it useful for?