I have been asked to elaborate on my answer on Quora so here goes:
1. There is no efficient purely functional unsorted dictionary or set
Since the 1990s the use of dictionaries in software has gone through the roof. Dictionaries are now a stock collection type that every programmer expects to find in their standard library.
Purely functional or persistent data structures such as those found in Okasaki’s fabulous monograph on the subject can be a great tool. In particular, the persistence they offer means you can reuse old versions of collections without having to worry about mutation. In many cases (particularly for some kinds of problems such as logic programming and compiler writing) this can make solutions shorter and clearer, partly because it makes backtracking trivial. However, persistence comes at a great cost in terms of performance: purely functional dictionaries are typically 10x slower than a decent hash table and I have seen them run up to 40x slower. For some applications this is just too slow.
Furthermore, most functional programming languages (OCaml, Haskell, Scala) are incapable of expressing a fast generic mutable hash table because they lack the killer combo of: reified generics, value types and a fast GC write barrier.
BEWARE: people who try to claim that Haskell’s purely functional dictionaries are fast by comparing them with Haskell’s mutable hash tables. The correct conclusion is that Haskell’s mutable hash tables are slow.
2. There is no purely functional weak hash table.
With a garbage collected imperative language, the relationships between the vertices and edges of a graph can be expressed using weak hash tables. The garbage collector will then collect unreachable subgraphs for you. There is no purely functional weak hash table so, in a pure language, you must write your own garbage collector.
Note that this is a really fringe disadvantage with most developers never having used a weak hash table!
3. There are no purely functional concurrent collections.
By definition, immutable collections cannot support concurrent mutation. Consequently, if you want a shared mutable collection such as an in-memory database then there is no efficient purely functional solution.
4. Most graph algorithms look worse and run much slower when written in an FP style.
Purely functional programming is a great tool for some kinds of problems but graph algorithms are one place where I have noticed that pure solutions are often worse both in terms of speed and clarity.
Compare Prim’s algorithm in 12 lines of Python with Prim’s algorithm in 20 lines of Haskell. And why does the Haskell use Prim’s algorithm? Probably because Kruskal’s algorithm is built upon the union-find collection and there is no known efficient purely functional union-find collection.
5. The inertia of traditional imperative data structures and algorithms is huge.
Beyond graph algorithms, there are many parts of computer science where 65 years of published literature has focused almost entirely upon imperative solutions. Consequently, imperative programmers can easily build upon the backs of giants whereas purely functional programmers are often left starting from scratch. After all, just a few years ago memoization in Haskell was the topic of a PhD thesis!
I once challenged a group of Haskell programmers (several of whom had PhDs in Haskell) to write an efficient generic parallelised quicksort in Haskell and this is what happened.
6. All existing implementations of functional programming languages, both pure and impure, happen to allocate far too much by design.
Around 1960, McCarthy invented Lisp. The core data structure was the singly-linked list. Each list node was a separate heap allocated block. All modern functional languages evolved from this. In the 1970s, Scheme used essentially the same data representation strategy as Lisp. In the 1980s, SML added a little unboxing with tuples heap allocated as a single block of memory. In the 1990s, OCaml added a little more with unboxed float arrays. Haskell added the ability to unbox some data. But to date no functional programming language has unboxed tuples by default. Even F#, which sits on .NET which provides arbitrary value types, still uses .NET’s boxed tuples. Consequently, all modern functional programming languages incur very high allocation rates for essentially no good reason. Consequently, they all stress their garbage collectors far more than necessary. This is a serious problem not just because it makes serial code slow but because the garbage collector is a shared resource and, therefore, stressing the GC impedes the scalability of parallel programs.
I should note that calling this a “disadvantage” is contentious. Xavier Leroy of OCaml fame regards OCaml’s Lisp-like data representation as a good thing because it is the backbone of OCaml’s excellent performance when running the Coq theorem prover. Here Xavier asserted that “OCaml's strategy is close to optimal for symbolic computing”. Functional languages are often optimised for high performance purely functional collections at the expense of low performance imperative collections. Given that imperative collections are generally faster, this puts an artificially-low ceiling on the performance of almost all functional languages.
7. Purely functional programming is theoretically good for parallelism but bad for performance in practice, which is the sole purpose of parallelism.
There are two reasons to write parallel programs today. The first is to write objectively fast solutions. The second is to make a slow solution less slow. In most cases, parallel programming in functional languages is a form of the latter. Almost nobody in high performance computing circles (i.e. supercomputers) is running functional code directly. When most functional programmers employ parallel programming today they do so not to attain the best absolute performance but just to improve the performance they have.
Purely functional languages like Haskell are designed to abstract away space and time. This gives you a higher-level perspective of your solution but it makes it very hard to reason about the amount of memory or length of time a Haskell program will require to produce a result. In parallel programming it is always important to make sure that the gain from parallelisation outweighs the administrative overheads of running code in parallel. Haskell makes this very hard. So hard, in fact, that published research on parallel Haskell notoriously cherry picks the degree of parallelisation that maximises performance even though that degree could not be predicted before running the program many times. I have found that straightforward parallelization often yields reliable speedups in languages like C++ but not in Haskell where performance is unpredictable.
BEWARE: People who talk only about scalability and disregard absolute performance. You can improve the scalability of almost any parallel program by redundantly recomputing the Mandelbrot set after each line of code for no reason because most of the time will then be spent in embarrassingly parallel code. Scalability is necessary but insufficient. You must also look at absolute performance.
8. It took 50 years for normal people to dilute the smug weenies to the point where you can get a useful answer about functional programming on social media.
I’ve been doing functional programming for over 20 years now. For decades there was a social chasm between functional programmers and people who had real problems to solve. Thankfully this problem is now starting to dissolve away with functional languages like Scala, Clojure and F# being used for real work but for many years the predominantly-smug-weenies dominated the functional scene, making it hard for people to get real solutions to their real problems. Part of the reason for this was that, having languished in obscurity for so many decades, some communities (most notably Lisp) had highly evolved (but wrong) arguments as to why Lisp was good. It took me many years to figure out what was wrong with these arguments.
9. There is a huge amount of misinformation about functional programming in circulation.
If you criticise the performance of hash tables in Haskell (more recently here) then you get pure misinformation from the leading lights of the community such as someone advising people to effectively turn off garbage collection.
For years the functional programming community brandished beautifully short implementations of the Sieve of Eratosthenes and Quicksort algorithms. These were even taught to students for years. Only many years later did it emerge that their solutions did not implement those algorithms. Melissa O’Neill even published a paper correcting the Sieve of Eratosthenes in Haskell. In particular, her genuine sieve requires 100x more code than the original Haskell. Same for quicksort where Haskell’s elegant two-line sort is over 1,000x slower than Sedgewick’s Quicksort in C because the Haskell deep copies lists over and over again, completely blowing the asymptotic IO complexity of Hoare original algorithm.
See also “Why is Haskell used so little in industry?” for a thorough debunking of the Haskell in Industry page.