Monday, 30 May 2016

Disadvantages of purely functional programming

I have been asked to elaborate on my answer on Quora so here goes:


1.       There is no efficient purely functional unsorted dictionary or set

Since the 1990s the use of dictionaries in software has gone through the roof. Dictionaries are now a stock collection type that every programmer expects to find in their standard library.

Purely functional or persistent data structures such as those found in Okasaki’s fabulous monograph on the subject can be a great tool. In particular, the persistence they offer means you can reuse old versions of collections without having to worry about mutation. In many cases (particularly for some kinds of problems such as logic programming and compiler writing) this can make solutions shorter and clearer, partly because it makes backtracking trivial. However, persistence comes at a great cost in terms of performance: purely functional dictionaries are typically 10x slower than a decent hash table and I have seen them run up to 40x slower. For some applications this is just too slow.

Furthermore, most functional programming languages (OCaml, Haskell, Scala) are incapable of expressing a fast generic mutable hash table because they lack the killer combo of: reified generics, value types and a fast GC write barrier.

BEWARE: people who try to claim that Haskell’s purely functional dictionaries are fast by comparing them with Haskell’s mutable hash tables. The correct conclusion is that Haskell’s mutable hash tables are slow.


2.       There is no purely functional weak hash table.

With a garbage collected imperative language, the relationships between the vertices and edges of a graph can be expressed using weak hash tables. The garbage collector will then collect unreachable subgraphs for you. There is no purely functional weak hash table so, in a pure language, you must write your own garbage collector.

Note that this is a really fringe disadvantage with most developers never having used a weak hash table!


3.       There are no purely functional concurrent collections.

By definition, immutable collections cannot support concurrent mutation. Consequently, if you want a shared mutable collection such as an in-memory database then there is no efficient purely functional solution.


4.       Most graph algorithms look worse and run much slower when written in an FP style.

Purely functional programming is a great tool for some kinds of problems but graph algorithms are one place where I have noticed that pure solutions are often worse both in terms of speed and clarity.

Compare Prim’s algorithm in 12 lines of Python with Prim’s algorithm in 20 lines of Haskell. And why does the Haskell use Prim’s algorithm? Probably because Kruskal’s algorithm is built upon the union-find collection and there is no known efficient purely functional union-find collection.


5.       The inertia of traditional imperative data structures and algorithms is huge.

Beyond graph algorithms, there are many parts of computer science where 65 years of published literature has focused almost entirely upon imperative solutions. Consequently, imperative programmers can easily build upon the backs of giants whereas purely functional programmers are often left starting from scratch. After all, just a few years ago memoization in Haskell was the topic of a PhD thesis!

I once challenged a group of Haskell programmers (several of whom had PhDs in Haskell) to write an efficient generic parallelised quicksort in Haskell and this is what happened.


6.       All existing implementations of functional programming languages, both pure and impure, happen to allocate far too much by design.

Around 1960, McCarthy invented Lisp. The core data structure was the singly-linked list. Each list node was a separate heap allocated block. All modern functional languages evolved from this. In the 1970s, Scheme used essentially the same data representation strategy as Lisp. In the 1980s, SML added a little unboxing with tuples heap allocated as a single block of memory. In the 1990s, OCaml added a little more with unboxed float arrays. Haskell added the ability to unbox some data. But to date no functional programming language has unboxed tuples by default. Even F#, which sits on .NET which provides arbitrary value types, still uses .NET’s boxed tuples. Consequently, all modern functional programming languages incur very high allocation rates for essentially no good reason. Consequently, they all stress their garbage collectors far more than necessary. This is a serious problem not just because it makes serial code slow but because the garbage collector is a shared resource and, therefore, stressing the GC impedes the scalability of parallel programs.

I should note that calling this a “disadvantage” is contentious. Xavier Leroy of OCaml fame regards OCaml’s Lisp-like data representation as a good thing because it is the backbone of OCaml’s excellent performance when running the Coq theorem prover. Here Xavier asserted that “OCaml's strategy is close to optimal for symbolic computing”. Functional languages are often optimised for high performance purely functional collections at the expense of low performance imperative collections. Given that imperative collections are generally faster, this puts an artificially-low ceiling on the performance of almost all functional languages.


7.       Purely functional programming is theoretically good for parallelism but bad for performance in practice, which is the sole purpose of parallelism.

There are two reasons to write parallel programs today. The first is to write objectively fast solutions. The second is to make a slow solution less slow. In most cases, parallel programming in functional languages is a form of the latter. Almost nobody in high performance computing circles (i.e. supercomputers) is running functional code directly. When most functional programmers employ parallel programming today they do so not to attain the best absolute performance but just to improve the performance they have.

Purely functional languages like Haskell are designed to abstract away space and time. This gives you a higher-level perspective of your solution but it makes it very hard to reason about the amount of memory or length of time a Haskell program will require to produce a result. In parallel programming it is always important to make sure that the gain from parallelisation outweighs the administrative overheads of running code in parallel. Haskell makes this very hard. So hard, in fact, that published research on parallel Haskell notoriously cherry picks the degree of parallelisation that maximises performance even though that degree could not be predicted before running the program many times. I have found that straightforward parallelization often yields reliable speedups in languages like C++ but not in Haskell where performance is unpredictable.

BEWARE: People who talk only about scalability and disregard absolute performance. You can improve the scalability of almost any parallel program by redundantly recomputing the Mandelbrot set after each line of code for no reason because most of the time will then be spent in embarrassingly parallel code. Scalability is necessary but insufficient. You must also look at absolute performance.


8.       It took 50 years for normal people to dilute the smug weenies to the point where you can get a useful answer about functional programming on social media.

I’ve been doing functional programming for over 20 years now. For decades there was a social chasm between functional programmers and people who had real problems to solve. Thankfully this problem is now starting to dissolve away with functional languages like Scala, Clojure and F# being used for real work but for many years the predominantly-smug-weenies dominated the functional scene, making it hard for people to get real solutions to their real problems. Part of the reason for this was that, having languished in obscurity for so many decades, some communities (most notably Lisp) had highly evolved (but wrong) arguments as to why Lisp was good. It took me many years to figure out what was wrong with these arguments.


9.       There is a huge amount of misinformation about functional programming in circulation.

If you criticise the performance of hash tables in Haskell (more recently here) then you get pure misinformation from the leading lights of the community such as someone advising people to effectively turn off garbage collection.

For years the functional programming community brandished beautifully short implementations of the Sieve of Eratosthenes and Quicksort algorithms. These were even taught to students for years. Only many years later did it emerge that their solutions did not implement those algorithms. Melissa O’Neill even published a paper correcting the Sieve of Eratosthenes in Haskell. In particular, her genuine sieve requires 100x more code than the original Haskell. Same for quicksort where Haskell’s elegant two-line sort is over 1,000x slower than Sedgewick’s Quicksort in C because the Haskell deep copies lists over and over again, completely blowing the asymptotic IO complexity of Hoare original algorithm.

See also “Why is Haskell used so little in industry?” for a thorough debunking of the Haskell in Industry page.



Panicz said...

I think that the point (1) on the lack of "efficient purely functional unsorded dictionary or set" misses a reference to Clojure's sets and maps and an explanation what is wrong with them.

Thanks for this article, by the way.

Anthony Lloyd said...

For point (3) FP has a completely different way of handling shared state. Shared/global state is represented by a reference to immutable data structures which is updated atomically. After seeing the simplicity and correctness of this I begin to question that OO in memory database and caching systems are valid without a large degree of cross collection locking. These complex dependencies lead to concurrency bugs that are very hard to reason about and solve.

Anthony Lloyd said...

On point (7) this looks to be more about lower level languages vs higher level. Supercomputers tend to be programmed using C, C++, Fortran etc without a GC. The code is very optimised low level end of the language spectrum.

FP languages are at a higher level. Its easier and quicker to solve more complex problems with them than low level languages. Most of the time CPU cycles are cheaper than developer cycles. Hence premature optimisation being an anti-pattern.

I use F# rather than Haskell but I don't recognise your observation that parallelism in FP being unpredictable and often not resulting in a speedup. FP makes parallelisation easier and safer. It can produce a significant speedup especially for embarrassingly parallel problems. This helps tip the cycles balance towards FP lanaguages.

Arturo Hernandez said...
This comment has been removed by the author.
Egg Syntax said...

Could you link to the original Quora question? I'd be curious to take a look. Thanks!

Nicolas said...

You are very much right about many points.

At the risk of adding more abstract nonsense, I think there is a hidden layer of monoidal structure which is absolutely crucial and vastly underappreciated to any program. We are seeing it in the ST suggestion from Peaker to have

forkSTArray :: STVector s a -> Int ->
(forall s1. STVector s1 a -> ST s1 o1) ->
(forall s2. STVector s2 a -> ST s2 o2) ->
ST s (o1, o2)

That is a crucial point which generalizes to other problems, but I have not seen it being taken that seriously

Jason Dusek said...

@"Sara Haberlin" If we remove the comments, whitespace, module preamble and imports the Haskell implementation is just a little bit shorter -- 20 lines instead of 22.

Jon Harrop said...
This comment has been removed by the author.
Jon Harrop said...

@Jason: Here is an exact comparison showing just how much longer the Haskell is.

The Python (275 chars after trimming whitespace from the ends of lines):

from sys import argv
import re

T = set();
X = set();


while len(X) != vertices:
crossing = set();
for x in X:
for k in range(vertices):
if k not in X and graph[x][k] != 0:
crossing.add((x, k))
edge = sorted(crossing, key=lambda e:graph[e[0]][e[1]])[0];

and here is the 1,123 chars of Haskell):

module Data.Graph.Inductive.Query.MST (
) where

import Data.Graph.Inductive.Graph
import Data.Graph.Inductive.Internal.RootPath
import qualified Data.Graph.Inductive.Internal.Heap as H

newEdges :: Ord b => LPath b -> Context a b -> [H.Heap b (LPath b)]
newEdges (LP p) (_,_,_,s) = map (\(l,v)->H.unit l (LP ((v,l):p))) s

prim :: (Graph gr,Real b) => H.Heap b (LPath b) -> gr a b -> LRTree b
prim h g | H.isEmpty h || isEmpty g = []
prim h g =
case match v g of
(Just c,g') -> p:prim (H.mergeAll (h':newEdges p c)) g'
(Nothing,g') -> prim h' g'
where (_,p@(LP ((v,_):_)),h') = H.splitMin h

msTreeAt :: (Graph gr,Real b) => Node -> gr a b -> LRTree b
msTreeAt v g = prim (H.unit 0 (LP [(v,0)])) g

msTree :: (Graph gr,Real b) => gr a b -> LRTree b
msTree g = msTreeAt v g where ((_,v,_,_),_) = matchAny g

msPath :: Real b => LRTree b -> Node -> Node -> Path
msPath t a b = joinPaths (getLPathNodes a t) (getLPathNodes b t)

joinPaths :: Path -> Path -> Path
joinPaths p q = joinAt (head p) p q

joinAt :: Node -> Path -> Path -> Path
joinAt _ (v:vs) (w:ws) | v==w = joinAt v vs ws
joinAt x p q = reverse p++(x:q)

Jason Dusek said...

Ah, so you're removing the stuff about reading the file and setting up the graph, the regex, &c from the Python. You have saved a few lines and I see how you get your 12 lines figure now. But we can also see the Python gains at least one line by not exporting a function; whereas the Haskell is a module intended to be re-used. To be fair you have to remove the module declaration, saving four or five lines, depending.

There are many more characters in the Haskell answer, I suppose; but one is little accustomed to counting length of code that way.

Jon Harrop said...

@Jason: Yes. The Haskell code is far more verbose for many reasons, one of which is that Haskell cannot express a solution to this problem succinctly. The type annotations are a major eyesore and repeating "joinAt" three times in the definition of that one function is absurd.

Andrew Webb said...

>> some communities (most notably Lisp) had highly evolved (but wrong)
>> arguments as to why Lisp was good. It took me many years to figure
>> out what was wrong with these arguments.

Can you elaborate on this? I, for one, am fascinated with this issue but haven't put in the years of thinking about it that you have.

Jules said...

Is OCaml's strategy really close to optimal for symbolic computing? I think it would be fairly easy to beat it with a type specialised representation. Algebraic data types are represented as a pointer to a tag (64 bit) + payload in OCaml. You can eliminate the tag by encoding the data constructor within the pointer (plenty of space on 64 bit). This eliminates 64 bits of space per value, and it gives you access to the constructor without a memory dereference, which would speed up pattern matching. This representation would not be possible with a uniform representation, as the GC wouldn't know how to scan such a pointer.

In addition you could unbox tuples, and you could unbox algebraic data types when the payload fits directly in the remaining bits of the pointer. For example Choice would then not need any pointers on a 64 bit architecture.

Jules said...

That should be Choice < Bool , Int32 >.

YouLoseBellyFat said...

csharp codes is great coding site

erik said...

I believe Chicken Scheme allocates conses on the stack (via a cheney-on-the-mta esque compilation strategy), which would essentially be an unboxed 2-tuple.

Ville said...

Even F#, which sits on .NET which provides arbitrary value types, still uses .NET’s boxed tuples.

F# will get value tuples soon.

Jason Dusek said...
This comment has been removed by the author.
Jason Dusek said...
This comment has been removed by the author.
Jason Dusek said...

@"Jon Harrop":

You write:

> Yes. The Haskell code is far more verbose for many reasons, one of which is that Haskell cannot express a solution to this problem succinctly.

and then:

> The type annotations are a major eyesore and repeating "joinAt" three times in the definition of that one function is absurd.

The first excerpt has to be considered a kind of assuming the conclusion: if Haskell is incapable of succinctly expressing a solution, the result must of course be verbose.

With regards to the types being an eyesore: Haskell has type annotations and Python until recently didn't; but if the type annotations were in Python, would they be better? If so, in what way?

Unknown said...

@nicholas - do you mean that many operations are associative?

Sean Pietz said...

This is confusing the concepts of purity and immutability. It's possible to use mutable data structures in purely functional code. Pure code has no "side-effects" but it can still have effects!

Sean Pietz said...

This is confusing the concepts of purity and immutability. It's possible to use mutable data structures in purely functional code. Pure code has no "side-effects" but it can still have effects!

Sean Pietz said...

And why is it this guy so bent on trashing haskell and pure fp. He seems bitter for some reason. Maybe tried to learn haskell, found it too difficult -- sour grapes?

Anonymous said...

The graph argument is false, if you base algo on a particular underlying implementation. e.g. someone has written a bunch of algos that all assume a Rose-Tree. then the argument is really 'you shouldn't use a rose tree as a generic graph representation' ... which I'd agree with. Abstract out the interfaces that you expect from a graph and I think this becomes simpler and potentially faster depending on what underlying representation is used.

Jon Harrop said...

@erik: "I believe Chicken Scheme allocates conses on the stack (via a cheney-on-the-mta esque compilation strategy), which would essentially be an unboxed 2-tuple."

No it isn't! The JVM made the same mistake with escape analysis. Stack allocating temporaries turns out to be a relatively unproductive special case to optimise. It is more important to unbox data in the heap.

@Sean Pietz: "This is confusing the concepts of purity and immutability. It's possible to use mutable data structures in purely functional code. Pure code has no "side-effects" but it can still have effects!"

I have restricted consideration to fully persistent data structures, true. There are grey areas like Conchon and Filliatre's somewhat persistent disjoint union and that collection does offer excellent performance. So YMMV.

@Michael Lewis: "Abstract out the interfaces that you expect from a graph and I think this becomes simpler and potentially faster depending on what underlying representation is used"

Can you simplify the Haskell code I posted above to the point that it is as short as the Python I posted?

pranjal rajput said...

interesting article.
what are your thoughts on future of Elixir?
does it have any promise of being used seriously in industry ?

Eirik Tsarpalis said...

Great article.

Re: point 3 a good compromise between immutable collections and concurrency is wrap around using some form of OCC, e.g. clojure-style atoms (an F# port It's a good replacement for ConcurrentDictionary if you don't mind the perf impact and need a good way of handing out persistent copies of the collection.

Teju Teju said...

It was really a nice article and i was really impressed by reading this Data Science online Training India

Unknown said...

Great post dear. It definitely has increased my knowledge on Spark. Please keep sharing similar write ups of yours. You can check this too for Spark tutrial as i have recorded this recently on Spark. and i'm sure it will be helpful to you.

Brain Carve said...

nice post..Abacus Classes in chennai said...

Thank you for sharing this Blog, It includes very well information.
iphone apps training in hyderabad
iphone job oriented course

kHAN BOOKS said...

Download computer Programming ebooks for free

Roja Priya said...

Thank you for sharing your article. Great efforts put it to find the list of articles which is very useful to know, Definitely will share the same to other forums.
Data Science Training in chennai at Credo Systemz | data science course fees in chennai | data science course in chennai quora | data science with python training in chennai

sumant kumar said...

thank you for sharing useful post.
web programming tutorial

BCL India said...

Nice post! A good Brain Development program for kids helps to their overall development effect on their human development later. So it is parents responsibility to give best brain training to your child.

Brain Development Exercise

kate technologies said...

Thank you for your post. This is excellent information. It is amazing and wonderful to visit your site.
php application development
php web application development company india

Brain Carve said...

Low Cost Franchise Opportunities in chennai
education franchise opportunities
franchise opportunities in chennai

Anjali Siva said...

The given information was excellent and useful. This is one of the excellent blog, I have come across. Do share more.
Data Science Training in Chennai
Data Analytics Training in Chennai
Data Science Certification in Chennai
Data Science Training in Velachery
R Training in Chennai
R Programming Training in Chennai
Machine Learning Training in Chennai
Machine Learning institute in Chennai
Data Science Course in Chennai

divya said...

Amazing experience on reading your article. It is really nice and informative.
Python Training in Chennai
Python Classes in Chennai
JAVA Training in Chennai
Hadoop Training in Chennai
Selenium Training in Chennai
Python Training in Chennai
Python Course in Chennai

Indonesian Training Courses said...

Durai Raj said...

Magnificent article!!! the blog which you have shared is informative...Thanks for sharing with us...
Digital Marketing Training in Coimbatore
Digital Marketing Course in Coimbatore
digital marketing courses in bangalore
best digital marketing courses in bangalore
RPA training in bangalore
Selenium Training in Bangalore
Java Training in Madurai
Oracle Training in Coimbatore
PHP Training in Coimbatore

Kerrthika K said...

It's Looks deeply awesome article!!which you have posted is useful.
IELTS Coaching in Anna Nagar
German Classes in Anna Nagar
Spoken English Class in Anna Nagar
French Classes in Anna Nagar
AWS Training in Anna Nagar

Riya Raj said...

Outstanding blog!!! Thanks for sharing with us...
IELTS Coaching in Coimbatore
best ielts coaching center in coimbatore
Software Testing Course in Coimbatore
Spoken English Class in Coimbatore
Web Designing Course in Coimbatore
Tally Course in Coimbatore

Aman CSE said...

Thanks for sharing such a wonderful blog on Machine learning.This blog contains so much data about Machine learning ,like if anyone who is searching for the Machine learning data will easily grab the knowledge of Machine learning from this .Requested you to please keep sharing these type of useful content so that other can get benefit from your shared content.
Thanks and Regards,
Top institutes for machine learning in chennai
best machine learning institute in chennai
artificial intelligence and machine learning course in chennai

Durai Raj said...

Thanks for this blog. It is more Interesting...
CCNA Course in Coimbatore
CCNA Course in Coimbatore With Placement
CCNA Course in Madurai
Best CCNA Institute in Madurai
Java Training in Bangalore
Python Training in Bangalore
IELTS Coaching in Madurai
IELTS Coaching in Coimbatore
Java Training in Coimbatore

kavithasathish said...

Great information to everyone. Great job done.
IELTS Coaching in Chennai
IELTS Training in Chennai
Spoken English Classes in Chennai
French Classes in Chennai
pearson vue
German Classes in Chennai
IELTS Coaching in anna nagar
ielts coaching in chennai anna nagar said...


lavithran said...

Good job and great informative blog.
Japanese Classes in Chennai
Japanese Course in Chennai
Best Spoken English Classes in Chennai
French Language Classes in Chennai
pearson vue exam centers in chennai
German Classes in Chennai
Japanese Classes in OMR
Japanese Classes in Porur
French Classes in anna nagar
Spoken English Class in Anna Nagar

Aparna said...

Great article..! I am very glad to visit your excellent post and Surely this content will use in my future. Please share more information with us...
Oracle Training in Chennai
Oracle Certification in Chennai
Tableau Training in Chennai
Oracle DBA Training in Chennai
Pega Training in Chennai
Linux Training in Chennai
Primavera Training in Chennai
Unix Training in Chennai
Power BI Training in Chennai
Oracle Training in Tambaram

Arya said...

Thanks for sharing such a nice Blog.I like it.
mcafee antivirus activation key
norton setup with product key
activate my norton product key
how to run mcafee unique product key
comcast support telephone number
AVG Antivirus toll free number
webroot antivirus support phone number
kaspersky contact number
Outlook support number
microsoft edge support number

htop said...

thanks for sharing this useful message
aws training center in chennai
aws training in chennai
aws training institute in chennai
best angularjs training in chennai
angular js training in sholinganallur
angularjs training in chennai
azure training in chennai
best devops training in chennai

Anbarasan14 said...

I really appreciate the author for taking some valuable time to share such a quality content with us.

IELTS Classes in Mumbai
IELTS Coaching in Mumbai
IELTS Mumbai
IELTS Center in Mumbai
Best IELTS Coaching in Mumbai
Spoken English Classes in Chennai
Spoken English Class in Chennai
Spoken English in Chennai

htop said...

azure training in chennai
best devops training in chennai
best hadoop training in chennai
best hadoop training in omr
hadoop training in sholinganallur