Tuesday, 10 August 2010

New Scala consultancy company

We have uncovered an interesting development since publishing our previous article about Martin Odersky's claim that Scala is "foremost an industrial language". Apparently, Martin Odersky stated at Scala Days 2010 that he intends to create a startup offering commercial Scala support. He also mentioned it in a comment here. Needless to say, this would be a huge step forwards for Scala!

As an interesting aside, the Scala in the Enterprise web page lists some industrial users of Scala including Électricité de France Trading, Twitter, Xebia, Xerox, FourSquare, Sony, Siemens, GridGain, AppJet, Reaktor, Nature, Managed Gaming Solutions, Tiental, Sygneca, AD Holdings, SAIC, Mimesis Republic and WattzOn.

Monday, 9 August 2010

"I think F# is very cool" - Rich Hickey

Very interesting video interview here with Rich Hickey of Clojure and Joe Pamer of F# side-by-side. When the interviewer asks Rich Hickey what he thinks of F#, he says "I think F# is very cool" and then explains why.

Rich Hickey is, of course, a visionary among programming language developers, having single-handedly built a self-sustaining business around his own Clojure language. Remarkably, there are already more Clojure jobs than OCaml jobs according to IT Jobs Watch, and there are now twice as many Google searches for Clojure as there are for OCaml:

Saturday, 7 August 2010

"Scala is foremost an industrial language"

In a recent interview about Scala and Clojure, Martin Odersky of Scala gave some interesting answers including the following:

Q: Rich Hickey is as well-read in the academic papers as anyone, but it’s Scala that has gained the perception as an “academic language”. Why do you think that has happened?

A: I think it’s mostly people who want to put Scala down making that comment. They take my EPFL affiliation and the papers we publish as an excuse to attach that label to the language. What’s funny is that often senior industrial Scala programmers get also accused as being academic. All this is rather ironical because Scala is foremost an industrial language with many well known companies using it. By contrast it’s much less taught at universities than Haskell or Lisp, let alone Java!

This raises the obvious question: in what sense is Scala "foremost an industrial language"?

As we understand it, Scala is developed by an academic team led by Professor Odersky at an academic institution with academic funding for the sole purpose of academic research, and this has produced a new academic paper every 7½ months on average over the past decade. That is not true of any industrial programming language. Indeed, in our experience with Scala this is reflected in a growth of esoteric language features while basic IDE support is neglected. Some people are trying to use Scala in industry, but that is true of many academic languages. Some funding has come from industry, including a surprising grant from Microsoft to update the .NET port of Scala, but that is presumably a tiny fraction of the total funding that has been spent on Scala. So this seems like rather an odd claim.

Friday, 6 August 2010

More OCaml trends

Paolo from Italy pointed out that the number of blog posts on OCaml has continued to increase in recent years (6,420, 10,500 and 12,100 in 2007/8/9 according to Google blog search) and referred to the success of this year's OCaml meeting with 80 delegates in France. These are certainly encouraging results but it may be worth bringing more data to the table.

Firstly, Google Trends can be used to graph the proportion of Google searches for different search terms over time. The following graph shows the trends for the keywords OCaml and F# since 2004:

As you can see, the proportion of searches for OCaml (blue) is in steady decline whereas the proportion of searches for F# (red) is on the increase and the two crossed over in 2007. In fact, we have found that Google Trends correlates very strongly with our revenue.

Secondly, we can examine statistics about the job market. The following bar chart illustrates the change in UK jobs from 2008 to 2010 for four functional programming languages (source IT Jobs Watch):
As you can see, every language saw tremendous growth in this boom period for functional programming except OCaml, which actually saw a decline in the number of jobs on offer.

The deal breaker for us was, of course, revenue. The proportion of our revenue coming from OCaml has fallen steadily from 80% in 2007 to just 10% today.

Thursday, 5 August 2010

Pure F# now only 2× slower than OCaml

One of the concerns expressed by some of the remaining OCaml users, such as Yaron Minsky of Jane Street Capital, is the relative performance of F# in the context of purely functional data structures. Specifically, language implementations like OCaml are heavily optimized for this use case due to their legacy and on-going use for applications such as theorem proving, which benefit greatly from the efficient handling of purely functional data structures. Historically, most users of imperative languages such as Java and C# have not required this kind of performance and, consequently, their implementations have not been optimized for this style of programming. However, the recently released .NET 4 includes a new garbage collector and we have found that it provides substantial performance improvements in the context of purely functional data structures, which is of particular benefit to F# developers.

We recently examined the performance of four different purely functional heap implementations across five different benchmarks written in both OCaml and F# (see Data structures: heaps). Where our previous findings in similar benchmarks showed F# to be 5× slower than OCaml, our new results indicate that F# on .NET 4 is now only 2× slower than OCaml on average. In one case, a leftist min heap with elements inserted in descending order, F# even ran 20% faster than OCaml.
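
To give a flavour of the workload being measured, here is a minimal leftist min-heap in OCaml, after Okasaki (this is only an illustrative sketch, not the benchmarked code; the identifiers are our own). Every insert and delete_min rebuilds a fresh spine of nodes, which is exactly the allocation pattern that makes these benchmarks so sensitive to the garbage collector:

```ocaml
(* Purely functional leftist min-heap (after Okasaki). merge keeps the
   shorter spine on the right, so merge, insert and delete_min are all
   O(log n), and every operation allocates fresh nodes. *)
type 'a heap = E | T of int * 'a * 'a heap * 'a heap

let rank = function E -> 0 | T (r, _, _, _) -> r

(* Rebuild a node, swapping children so the right spine stays short. *)
let make x l r =
  if rank l >= rank r then T (rank r + 1, x, l, r)
  else T (rank l + 1, x, r, l)

let rec merge h1 h2 =
  match h1, h2 with
  | E, h | h, E -> h
  | T (_, x, l1, r1), T (_, y, _, _) ->
    if x <= y then make x l1 (merge r1 h2) else merge h2 h1

let insert x h = merge (T (1, x, E, E)) h

let delete_min = function
  | E -> None
  | T (_, x, l, r) -> Some (x, merge l r)

(* Heapsort as a smoke test: insert everything, then drain in order. *)
let () =
  let h = List.fold_left (fun h x -> insert x h) E [5; 1; 4; 2; 3] in
  let rec drain h acc =
    match delete_min h with
    | None -> List.rev acc
    | Some (x, h') -> drain h' (x :: acc)
  in
  assert (drain h [] = [1; 2; 3; 4; 5])
```

Note how the hot path is nothing but allocation and pointer chasing: there is no mutation for a generational collector to exploit, which is why the choice of garbage collector dominates the running time.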

This is a surprising and encouraging result not only because it makes F# competitive for an even wider variety of tasks but because it also implies that Microsoft are taking F# so seriously that they are optimizing the .NET garbage collector for it!

Sunday, 1 August 2010

Parallel generic quicksort in Haskell

Haskell has a history of making easy problems difficult. Perhaps the most infamous example is the Sieve of Eratosthenes, which is easily implemented in any imperative language but proved so difficult to write in Haskell that almost all of the solutions taught in universities and used in research over the preceding 18 years were wrong. Melissa O'Neill's seminal paper The Genuine Sieve of Eratosthenes gave a beautiful description of what they had been doing wrong and how it should be corrected: her solution was to use a priority queue to implement a rolling wheel of numbers. The correct solution turned out to be 10× longer than a much simpler F# solution and a whopping 100× longer than the original bastardized algorithm in Haskell.
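
To show just how trivial the imperative algorithm is, here is a minimal sketch in OCaml (the function name and bounds are our own, for illustration only):

```ocaml
(* The classic imperative Sieve of Eratosthenes: cross off multiples
   in a mutable array of flags, then read off the surviving indices. *)
let sieve n =
  let is_prime = Array.make (n + 1) true in
  is_prime.(0) <- false;
  if n >= 1 then is_prime.(1) <- false;
  let i = ref 2 in
  while !i * !i <= n do
    if is_prime.(!i) then begin
      (* Start at i*i: smaller multiples were crossed off already. *)
      let j = ref (!i * !i) in
      while !j <= n do
        is_prime.(!j) <- false;
        j := !j + !i
      done
    end;
    incr i
  done;
  List.filter (fun k -> is_prime.(k)) (List.init (n + 1) (fun k -> k))

let () = assert (sieve 30 = [2; 3; 5; 7; 11; 13; 17; 19; 23; 29])
```

A dozen lines of mutation, and it really does run in the O(n log log n) time the algorithm promises, which is precisely what the lazy list-filtering versions fail to do.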

Today, quicksort is the new Sieve of Eratosthenes. Again, the academics have addressed Haskell's failure by bastardizing the algorithm, trading orders of magnitude in performance for something that Haskell can express easily:

qsort [] = []
qsort (x:xs) = qsort (filter (< x) xs) ++ [x] ++ qsort (filter (>= x) xs)

This completely fails to capture the essence of the real quicksort algorithm (see Tony Hoare's original 1962 quicksort paper) that makes it so efficient: the in-place partitioning of the array using swaps.
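
To make the contrast concrete, here is a sketch of an in-place quicksort in OCaml (for brevity it uses the simpler Lomuto partition scheme rather than Hoare's original two-index scan; the identifiers are our own). The point is that the partition step rearranges the array purely by swapping, allocating no intermediate lists at all:

```ocaml
(* In-place quicksort. The essence the two-line Haskell version loses
   is partitioning by swapping elements within the array. This sketch
   uses the Lomuto partition scheme, not Hoare's original scan. *)
let swap a i j =
  let t = a.(i) in
  a.(i) <- a.(j);
  a.(j) <- t

let rec qsort a l r =
  if l < r then begin
    let pivot = a.(r) in
    let p = ref l in
    (* Invariant: everything in a.(l .. !p-1) is < pivot. *)
    for i = l to r - 1 do
      if a.(i) < pivot then begin
        swap a i !p;
        incr p
      end
    done;
    swap a !p r;
    qsort a l (!p - 1);
    qsort a (!p + 1) r
  end

let () =
  let a = [| 3; 1; 7; 2; 4; 8 |] in
  qsort a 0 (Array.length a - 1);
  assert (a = [| 1; 2; 3; 4; 7; 8 |])
```

Because the only temporary storage is a single element during each swap, this version does O(n log n) work with O(log n) extra space on average, whereas the list-based version allocates fresh lists at every level of the recursion.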

Faced with the challenge of writing a parallel generic quicksort in Haskell, Jim Apple (who is doing a PhD on Haskell at UC Davis) kicked off proceedings with the following code:

import Data.HashTable as H
import Data.Array.IO
import Control.Parallel.Strategies
import Control.Monad
import System

exch a i r = do
  tmpi <- readArray a i
  tmpr <- readArray a r
  writeArray a i tmpr
  writeArray a i tmpi

bool a b c = if c then a else b

quicksort arr l r =
  if r <= l then return () else do
    i <- loop (l-1) r =<< readArray arr r
    exch arr i r
    withStrategy rpar $ quicksort arr l (i-1)
    quicksort arr (i+1) r
  where
    loop i j v = do
      (i', j') <- liftM2 (,) (find (>=v) (+1) (i+1)) (find (<=v) (subtract 1) (j-1))
      if (i' < j')
        then exch arr i' j' >> loop i' j' v
        else return i'
    find p f i =
      if i == l
        then return i
        else bool (return i) (find p f (f i)) . p =<< readArray arr i

main = do
  [testSize] <- fmap (fmap read) getArgs
  arr <- testPar testSize
  ans <- readArray arr (testSize `div` 2)
  print ans

testPar testSize = do
  x <- testArray testSize
  quicksort x 0 (testSize - 1)
  return x

testArray :: Int -> IO (IOArray Int Double)
testArray testSize = do
  ans <- newListArray (0,testSize-1)
           [fromIntegral $ H.hashString $ show i | i <- [1..testSize]]
  return ans

This solution uses Haskell's parallel "strategies". This concept was introduced to give Haskell programmers more control over parallelization but the only available implementation was found to leak memory and nobody was able to get it to work in this case: Jim's solution contains a concurrency bug that causes it to return incorrect results almost every time it is called.

Peaker then posted his own Haskell solution:

import Data.Array.IO
import Control.Monad
import Control.Concurrent

bool t _f True = t
bool _t f False = f

swap arr i j = do
  (iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j)
  writeArray arr i jv
  writeArray arr j iv

parallel fg bg = do
  m <- newEmptyMVar
  forkIO (bg >> putMVar m ())
  fg >> takeMVar m

sort arr left right = when (left < right) $ do
    pivot <- read right
    loop pivot left (right - 1) (left - 1) right
  where
    read = readArray arr
    sw = swap arr
    find n pred i = bool (find n pred (n i)) (return i) . pred i =<< read i
    move op d i pivot =
      bool (return op) (sw (d op) i >> return (d op)) =<< liftM (/=pivot) (read i)
    loop pivot oi oj op oq = do
      i <- find (+1) (const (<pivot)) oi
      j <- find (subtract 1) (\idx cell -> cell>pivot && idx/=left) oj
      if i < j
        then do
          sw i j
          p <- move op (+1) i pivot
          q <- move oq (subtract 1) j pivot
          loop pivot (i + 1) (j - 1) p q
        else do
          sw i right
          forM_ (zip [left..op-1] [i-1,i-2..]) $ uncurry sw
          forM_ (zip [right-1,right-2..oq+1] [i+1..]) $ uncurry sw
          let ni = if left >= op then i + 1 else right + i - oq
              nj = if right-1 <= oq then i - 1 else left + i - op
          let thresh = 1024
              strat = if nj - left < thresh || right - ni < thresh
                        then (>>)
                        else parallel
          sort arr left nj `strat` sort arr ni right

main = do
  arr <- newListArray (0, 5) [3,1,7,2,4,8]
  getElems arr >>= print
  sort (arr :: IOArray Int Int) 0 5
  getElems arr >>= print

This solution also turned out to be buggy: it contains a more subtle concurrency bug that causes it to return incorrect results only occasionally. Peaker corrected this bug to give the following code:

import System.Time
import System.Random
import Data.Array.IO
import Control.Monad
import Control.Concurrent
import Control.Exception
import qualified Data.List as L

bool t _ True = t
bool _ f False = f

swap arr i j = do
  (iv, jv) <- liftM2 (,) (readArray arr i) (readArray arr j)
  writeArray arr i jv
  writeArray arr j iv

background task = do
  m <- newEmptyMVar
  forkIO (task >>= putMVar m)
  return $ takeMVar m

parallel fg bg = do
  wait <- background bg
  fg >> wait

sort arr left right = when (left < right) $ do
    pivot <- read right
    loop pivot left (right - 1) (left - 1) right
  where
    read = readArray arr
    sw = swap arr
    find n pred i = bool (find n pred (n i)) (return i) . pred i =<< read i
    move op d i pivot =
      bool (return op) (sw (d op) i >> return (d op)) =<< liftM (/=pivot) (read i)
    swapRange px x nx y ny =
      if px x then sw x y >> swapRange px (nx x) nx (ny y) ny else return y
    loop pivot oi oj op oq = do
      i <- find (+1) (const (<pivot)) oi
      j <- find (subtract 1) (\idx cell -> cell>pivot && idx/=left) oj
      if i < j
        then do
          sw i j
          p <- move op (+1) i pivot
          q <- move oq (subtract 1) j pivot
          loop pivot (i + 1) (j - 1) p q
        else do
          sw i right
          nj <- swapRange (<op) left (+1) (i-1) (subtract 1)
          ni <- swapRange (>oq) (right-1) (subtract 1) (i+1) (+1)
          let thresh = 1024000
              strat = if nj - left < thresh || right - ni < thresh
                        then (>>)
                        else parallel
          sort arr left nj `strat` sort arr ni right

timed act = do
  TOD beforeSec beforeUSec <- getClockTime
  x <- act
  TOD afterSec afterUSec <- getClockTime
  return (fromIntegral (afterSec - beforeSec) +
          fromIntegral (afterUSec - beforeUSec) / 1000000000000, x)

main = do
  let n = 1000000
  putStrLn "Making rands"
  arr <- newListArray (0, n-1) =<< replicateM n (randomRIO (0, 1000000) >>= evaluate)
  elems <- getElems arr
  putStrLn "Now starting sort"
  (timing, _) <- timed $ sort (arr :: IOArray Int Int) 0 (n-1)
  print . (L.sort elems ==) =<< getElems arr
  putStrLn $ "Sort took " ++ show timing ++ " seconds"

This solution does run correctly on small inputs but increasing the problem size to 1,000,000 elements results in a stack overflow. Two attempts were made to diagnose this bug, here and here, but both turned out to be wrong. The bug is actually in the getElems function of the Haskell standard library which stack overflows on long arrays.

Surprisingly, more bug fixing seems to have culminated in the world's first parallel generic quicksort written in Haskell. Furthermore, the resulting Haskell solution is only around 55% slower than the equivalent F# solution. Note that this requires the latest GHC, which was only released in recent weeks.

The rise and fall of OCaml

After over 3 years, The OCaml Journal is finally closing down. New subscriptions offer access to the 75 existing articles and only a few more articles will be published, to honour outstanding subscriptions.

This marks Flying Frog Consultancy Ltd. officially pulling out of the OCaml market as our remaining OCaml products (such as the OCaml for Scientists book) require no on-going maintenance.

This change has, of course, come about due to a dramatic fall in our revenue from OCaml products since 2007. This is partly because we decided in 2007 to jump ship to Microsoft's new F# programming language. However, the same trend can be seen in the rate of posts to the OCaml mailing lists, which has fallen 60% since 2007 and has now reached its lowest level for a decade:

We believe the inability of OCaml's garbage collector to allow threads to run in parallel is the primary reason for this mass exodus. OCaml was an awesome tool in the late 1990s and, by 2004, many people were discovering the OCaml language from benchmark results that showed it to be one of the fastest languages available. Ironically, that was largely due to the superior performance of OCaml's garbage collector, but that same garbage collector is now a serious impediment to parallel programming on today's multicore machines. Coupled with advances such as the monomorphization of generics on .NET and new LLVM-based backends with vastly better code generation, such as GHC's, OCaml has been left in the dust. Consequently, the majority of the OCaml programmers who had chosen it for performance reasons (many of whom were research scientists) are leaving.

OCaml also suffers from several other problems including the cumbersome FFI that requires separate C stubs and the lack of a native-code REPL. However, all of these problems could be addressed by developing a new VM. Indeed, at only 2kLOC our own HLVM project has demonstrated just how easy LLVM makes it to build a new high-performance multicore-capable VM for OCaml-like languages. We believe a complete replacement could be developed in just a few months but, sadly, nobody has done any work on this beyond our own tentative attempts with HLVM.

On the other hand, there was never any onus on an academic institution to create such a useful tool in the first place so it is important to remember that OCaml was only valuable thanks to the hard work and dedication of its academic creators. Thanks INRIA!