Saturday, 4 May 2019

Qi, Lisp and O'Caml compared - performance shootout

An old article by Mark Tarver reproduced here for historical interest:

This test was conducted to determine the relative efficiency and code size of hand-coded Lisp, O'Caml and Qi Turbo-E using the benchmark challenge provided by Jon Harrop.  Dr Harrop also provided the Lisp source written by Andre Thieme, Nathan Forster and Pascal Constanza.
Thanks to many people who corrected these tests and made them possible.
Turbo-E is an unreleased version of Qi which factorises its pattern-matching.
Jon Harrop Challenge Problem
Posted by Dr Harrop
The problem is to simplify symbolic expressions by applying the following rewrite rules from the leaves up:
rational n + rational m -> rational(n + m)
rational n * rational m -> rational(n * m)
symbol x -> symbol x
0+f -> f
f+0 -> f
0*f -> 0
f*0 -> 0
1*f -> f
f*1 -> f
a+(b+c) -> (a+b)+c
a*(b*c) -> (a*b)*c
Thanks to

Nathan Froyd
Andre Thieme
Pascal Constanza
Dan Bensen

for providing code

Jon Harrop (for both providing code and providing motivation to develop Turbo-E)
Marcus Breing (for correcting my first run)
The Results
The benchmark was to simplify [* x [+ [+ [* 12 0] [+ 23 8]] y]]. 10^7 iterations were used as the benchmark. A 2.6GHz, 500Mb RAM machine was used to drive the test. Windows XP was the OS. Times measure only processor time (not real time) and are rounded to 1/10th of a second.
A 2.6GHz, 500Mb RAM machine was used to drive the test. Windows XP was the OS. Times measure only processor time (not real time) and are rounded to 1/10th of a second.
SBCL 1.0 
Qi Turbo-E under SBCL 1.0 
O'Caml 3.09.3 
time vs OCaml1.8x1.7x7.5x4.1x2.6x1x
The timer used in Lisp was
(defun tt (N) (gc) (time (test N)))

(defun test (N)
(cond ((zerop N) 0)
(t (simplify *expr*) (test (1- N)))))
took .2 s for 10^7 iterations with (simplify *expr*) removed. All timings were adjusted by deducting .2s from each.
I could run Jon 's solution in interpreted mode, but could not get my O'Caml compiler to work. Based on the ratio of interpreted time to his compiled time, I could construct a projected time.
The Solutions
Here are the solutions; you can download the code here.
Language: Qi
Author: Mark Tarver
Length: 15 lines
(define simplify
  [Op A B] -> (s Op (simplify A) (simplify B))
  A -> A)
(define s
  + M N -> (+ M N)      where (and (number? M) (number? N))
  + 0 F -> F
  + F 0 -> F
  + A [+ B C] -> (simplify [+ [+ A B] C])
  * M N -> (* M N)    where (and (number? M) (number? N))
  * 0 F -> 0
  * F 0 -> 0
  * F 1 -> F
  * 1 F -> F
  * A [* B C] -> (simplify [* [* A B] C])
  Op A B -> [Op A B])
* Note I have used prefix notation for this problem though Dr Harrop's formulation uses infix. The first Lisp solutions followed prefix and so I chose to make my program comparable.*
The generated code is
(DEFUN simplify (V148)
(IF (CONSP V148)
(LET ((Cdr159 (CDR V148)))
(IF (CONSP Cdr159)
(LET ((Cdr158 (CDR Cdr159)))
(IF (CONSP Cdr158)
(IF (NULL (CDR Cdr158))
(s (CAR V148) (simplify (CAR Cdr159))
(simplify (CAR Cdr158))))
(GO tag154))
(GO tag154)))
(GO tag154))))
(RETURN V148))))

(DEFUN s (V149 V150 V151)
(IF (EQ '+ V149)
(RETURN (THE NUMBER (+ V150 V151)))
(IF (EQL 0 V150) (RETURN V151)
(IF (EQL 0 V151) (RETURN V150)
(IF (CONSP V151)
(LET ((Cdr170 (CDR V151)))
(IF (EQ '+ (CAR V151))
(IF (CONSP Cdr170)
(LET ((Cdr169 (CDR Cdr170)))
(IF (CONSP Cdr169)
(IF (NULL (CDR Cdr169))
(CONS '+
(LIST '+ V150
(CAR Cdr170))
(GO tag160))
(GO tag160)))
(GO tag160))
(GO tag160)))
(GO tag160))))))
(IF (EQ '* V149)
(RETURN (THE NUMBER (* V150 V151)))
(IF (EQL 0 V150) (RETURN 0)
(IF (EQL 0 V151) (RETURN 0)
(IF (EQL 1 V151) (RETURN V150)
(IF (EQL 1 V150) (RETURN V151)
(IF (CONSP V151)
(LET ((Cdr183 (CDR V151)))
(IF (EQ '* (CAR V151))
(IF (CONSP Cdr183)
(LET ((Cdr182 (CDR Cdr183)))
(IF (CONSP Cdr182)
(IF (NULL (CDR Cdr182))
(CONS '*
(LIST '* V150
(GO tag171))
(GO tag171)))
(GO tag171))
(GO tag171)))
(GO tag171))))))))
(RETURN (LIST V149 V150 V151))))))
Language: OCaml
Author: Jon Harrop
Length: 15 lines
let rec ( +: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n +/ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> e
  | f, `Add(g, h) -> f +: g +: h
  | f, g -> `Add(f, g)
let rec ( *: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n */ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> `Int (Int 0)
  | `Int (Int 1), e | e, `Int (Int 1) -> e
  | f, `Mul(g, h) -> f *: g *: h
  | f, g -> `Mul(f, g)
let rec simplify = function
  | `Int _ | `Var _ as f -> f
  | `Add (f, g) -> simplify f +: simplify g
  | `Mul (f, g) -> simplify f *: simplify g
Language: Lisp
Author: Andre Thieme
Length: 23 lines
(defun simplify (a)
   (if (atom a)
       (destructuring-bind (op x y) a
        (let* ((f (simplify x))
               (g (simplify y))
               (nf (numberp f))
               (ng (numberp g))
               (+? (eq '+ op))
               (*? (eq '* op)))
            ((and +? nf ng)                   (+ f g))
            ((and +? nf (zerop f))            g)
            ((and +? ng (zerop g))            f)
            ((and (listp g) (eq op (first g)))
             (destructuring-bind (op2 u v) g
               (simplify `(,op (,op ,f ,u) ,v))))
            ((and *? nf ng)                   (* f g))
            ((and *? (or (and nf (zerop f))
                         (and ng (zerop g)))) 0)
            ((and *? nf (= 1 f))              g)
            ((and *? ng (= 1 g))              f)
            (t                                `(,op ,f ,g)))))))
Language: Lisp
Author: Nathan Froyd
Length: 39 lines
(defun simplify-no-redundant-checks (xexpr)
    (if (atom xexpr)
      (let ((op (first xexpr))
            (z (second xexpr))
            (y (third xexpr)))
        (let* ((f (simplify-no-redundant-checks z))
               (g (simplify-no-redundant-checks y))
               (nf (numberp f))
               (ng (numberp g)))
             (if (eq '+ op) (go OPTIMIZE-PLUS) (go TEST-MULTIPLY))
             (when (and nf ng) (return-from simplify-no-redundant-checks (+ f g)))
             (when (eql f 0) (return-from simplify-no-redundant-checks g))
             (when (eql g 0) (return-from simplify-no-redundant-checks f))
             (go REARRANGE-EXPR)
             (unless (eq '* op) (go REARRANGE-EXPR))
             (when (and nf ng) (return-from simplify-no-redundant-checks (* f g)))
             (when (or (eql f 0) (eql g 0)) (return-from simplify-no-redundant-checks 0))
             (when (eql f 1) (return-from simplify-no-redundant-checks g))
             (when (eql g 1) (return-from simplify-no-redundant-checks f))
             (when (and (listp g) (eq op (first g)))
               (let ((op2 (first g))
                     (u (second g))
                     (v (third g)))
                 (declare (ignore op2))
                 (return-from simplify-no-redundant-checks
                   (simplify-no-redundant-checks (list op (list op f u) v)))))
             (if (and (eq f z) (eq g y))
                 (return-from simplify-no-redundant-checks xexpr)
                 (return-from simplify-no-redundant-checks (list op f g))))))))
Language: Lisp
Author: Pascal Constanza
Length: 25 lines
(defstruct add x y)
(defstruct mul x y)
(defgeneric simplify-add (x y)
   (:method ((x number) (y number)) (+ x y))
   (:method ((x (eql 0)) y) y)
   (:method (x (y (eql 0))) x)
   (:method (x (y add))
    (simplify-add (simplify-add x (add-x y)) (add-y y)))
   (:method (x y) (make-add :x x :y y)))
(defgeneric simplify-mul (x y)
    (:method ((x number) (y number)) (* x y))
   (:method ((x (eql 0)) y) 0)
   (:method (x (y (eql 0))) 0)
   (:method ((x (eql 1)) y) y)
   (:method (x (y (eql 1))) x)
   (:method (x (y mul))
    (simplify-mul (simplify-mul x (mul-x y)) (mul-y y)))
   (:method (x y) (make-mul :x x :y y)))
(defgeneric simplify (exp)
   (:method (exp) exp)
   (:method ((exp add))
    (simplify-add (simplify (add-x exp)) (simplify (add-y exp))))
   (:method ((exp mul))
    (simplify-mul (simplify (mul-x exp)) (simplify (mul-y exp)))))
Language: Lisp
Author: Dan Bensen
Length: 34 lines
(defmacro defop (func op op-symbl ident zero-case)
  (let ((e1 (gensym "E1"))
         (e2 (gensym "E2")))
     `(defun ,func (,e1 ,e2)
       (declare (optimize (speed 3)))
        ,(delete :no-case
          `(case ,e1
            (,ident ,e2)
            (t ,(delete :no-case
               `(case ,e2
                (,ident ,e1)
                (t (cond
                   ((and (rationalp ,e1)
                         (rationalp ,e2))
                    (,op ,e1 ,e2))
                   ((atom ,e2)
                    (list ,e1 ,op-symbl ,e2))
                   (t (case
                        (cadr ,e2)
                      (,op-symbl (,func (,func ,e1 (car ,e2))
                                        (caddr ,e2)))
                      (t  (list ,e1 ,op-symbl ,e2))))))))))))))
(defop apply+ + '+ 0 :no-case)
(defop apply* * '* 1 ( 0  0 ))
(defun simplify (expr)
   (if (atom expr)
     (let ((e1 (simplify (car   expr)))
           (e2 (simplify (caddr expr))))
       (case (cadr expr)
             ('+ (apply+ e1 e2))
             ('* (apply* e1 e2))))))

Monday, 29 April 2019

The “Blub Paradox” and C++

Here is another of my answers from Stack Overflow that is getting down votes:

is there some powerful language feature or idiom that you make use of in a language that would be hard to conceptualize or implement if you were writing only in c++?
Are there any useful concepts or techniques that you have encountered in other languages that you would have found difficult to conceptualize had you been writing or "thinking" in c++?
C++ makes many approaches intractable. I would go so far as to say that most of programming is hard to conceptualize if you limit yourself to C++. Here are some examples of problems that are much more easily solved in ways that C++ makes hard.

Register allocation and calling conventions

Many people think of C++ as a bare metal low level language but it really isn't. By abstracting away important details of the machine, C++ makes it hard to conceptualize practicalities like register allocation and calling conventions.
To learn about concepts like these I recommend having a go at some assembly language programming and check out this article about ARM code generation quality.

Run-time code generation

If you only know C++ then you probably think that templates are the be-all and end-all of metaprogramming. They aren't. In fact, they are an objectively bad tool for metaprogramming. Any program that manipulates another program is a metaprogram, including interpreters, compilers, computer algebra systems and theorem provers. Run-time code generation is a useful feature for this.
I recommend firing up a Scheme implementation and playing with EVAL to learn about metacircular evaluation.

Manipulating trees

Trees are everywhere in programming. In parsing you have abstract syntax trees. In compilers you have IRs that are trees. In graphics and GUI programming you have scene trees.
This "Ridiculously Simple JSON Parser for C++" weighs in at just 484 LOC which is very small for C++. Now compare it with my own simple JSON parser which weighs in at just 60 LOC of F#. The difference is primarily because ML's algebraic datatypes and pattern matching (including active patterns) make it vastly easier to manipulate trees.
Check out red-black trees in OCaml too.

Purely functional data structures

Lack of GC in C++ makes it practically impossible to adopt some useful approaches. Purely functional data structures are one such tool.
For example, check out this 47-line regular expression matcher in OCaml. The brevity is due largely to the extensive use of purely functional data structures. In particular, the use of dictionaries with keys that are sets. That is really hard to do in C++ because the stdlib dictionaries and sets are all mutable but you cannot mutate a dictionary's keys or you break the collection.
Logic programming and undo buffers are other practical examples where purely functional data structures make something that is hard in C++ really easy in other languages.

Tail calls

Not only does C++ not guarantee tail calls but RAII is fundamentally at odds with it because destructors get in the way of a call in tail position. Tail calls let you make an unbounded number of function calls using only a bounded amount of stack space. This is great for implementing state machines, including extensible state machines and it is a great "get out of jail free" card in many otherwise-awkward circumstances.
For example, check out this implementation of the 0-1 knapsack problem using continuation-passing style with memoization in F# from the finance industry. When you have tail calls, continuation passing style can be an obvious solution but C++ makes it intractable.


Another obvious example is concurrent programming. Although this is entirely possible in C++ it is extremely error prone compared to other tools, most notably communicating sequential processes as seen in languages like Erlang, Scala and F#.

Monday, 22 April 2019

In terms of performance speed, is Swift faster than Java?

This is an old 2017 answer of mine from Quora that has been deleted by moderators for violating their rules, kept here for historical interest:
Many people here are claiming "yes" that Swift is fast. However, all of the benchmark results I can find indicate that Swift is much slower than most other languages, up to 24x slower than C++:
  • n-body: swift 24.09s vs Java 22.6s.
  • fannuck-redux: swift 59.5s vs Java 17.41s. Swift is 3.4x slower than Java.
  • spectral-norm: swift 15.7s vs Java 4.28s. Swift is 3.7x slower than Java.
EDIT: The shootout appears to have been updated. The new Swift code is still memory unsafe and still compiled in memory unsafe -Ounchecked mode. Swift now beats Java on 3/7 tasks (fannkuch-redux, mandelbrot and binary-trees), roughly draws on two (nbody and fasta-redux) and loses on two (fasta and spectral norm). Shame there is no memory safe Swift for comparison. I’d also note that many of these benchmarks are apples vs oranges comparisons with, for example, the Java and Swift implementations of binary-trees using completely different algorithms and data structures.
Other (non-Java) benchmarks found:
My impression is that Swift was vastly slower than almost all other languages at many common tasks when it was first released but it has improved very rapidly. However, it still seems to be several times slower than C++ in many cases and, therefore, I expect it is still significantly slower than languages like F# and Scala.
I am also concerned about many of the benchmarks being used. The shootout is notoriously bad science with submissions subjectively "de-optimised" by its owner for being too fast rendering the results total garbage. Numerical benchmarks like DGEMM and FFTs are largely irrelevant. I am much more interested in symbolic performance, not just because that is more relevant to the code I write but because it will stress Swift's reference counting garbage collection which I suspect will be its Achilles heel. The JSON serialization benchmark was very interesting to me as a consequence. I am also disturbed by the use of optimisations that remove memory safety in Swift (-Ounsafe) and the use of inaccurate numerics in C++ (-ffast-math).
A post on Hacker News (Swift Performance: Too Slow for Production) says that short-lived objects in the JSON serializer are to blame for the poor performance. In other words, the reference counted memory management is the problem as I suspected.

Tuesday, 6 November 2018

Benchmarking in the web age

The TechEmpower website contains some fascinating benchmarks of servers. The results on this benchmark of multiple requests to servers provide some insight into the performance characteristics of .NET on a modern problem. Specifically, the C# on ASP.NET Core solutions range from 2.5-80× slower than fastest solution which is written in Rust. In fact, C# is beaten by the following programming languages in order:

  1. Rust
  2. Java
  3. Kotlin
  4. Go
  5. C
  6. Perl
  7. Clojure
  8. PHP
  9. C++

Furthermore, .NET Core is Microsoft's new improved and faster version of .NET aimed specifically at these kinds of tasks. So why is it beaten by all those languages? I suspect that a large part of this is the change in workload from the kind of number crunching .NET was designed for to a modern string-heavy workload and I suspect .NET's GC isn't as optimised for this as the JVM is. As we have found, .NET has really poor support for JSON compared to other languages and frameworks, with support fragmented across many non-standard libraries. Amusingly, OCaml has standardised on Yojson which we have found to be much faster and more reliable despite the fact that it is a hobby project.

Another interesting observation is that C and C++ are doing surprisingly badly. The fact that Perl and PHP are beating C++ is doubtless because they are calling out to C code but that seems less likely for Rust, Java, Kotlin and Go. One interesting hypothesis I read on the interwebs is that C++ developers tend to copy strings a lot in order to avoid memory management bugs whereas Rust protects its developers from such bugs so they can be more efficient without having to worry. If true, that would be quite amazing because the user-land code behind these benchmarks isn't that big and, yet, C++ has already failed to scale and has developers reaching for one-size-fits-all-badly solutions.

Whatever the case, one thing is clear: the time is ripe for new languages, VMs and frameworks.

Sunday, 4 November 2018

On "Quantifying the Performance of Garbage Collection vs. Explicit Memory Management"

The computer science research paper "Quantifying the Performance of Garbage Collection vs. Explicit Memory Management" by Emery Berger and Matthew Hertz contains an interesting study about memory management. However, the conclusions given in the paper were badly worded and are now being used to justify an anti-GC ideology.


That paper describes an experiment that analyzed the performance of a benchmark suite using:
  1. Tracing garbage collection.
  2. Oracular memory management (precomputing the earliest point free could have been inserted).
The experiment was performed:
  • On one VM, the Jikes Research Virtual Machine (RVM).
  • Using one programming language, Java, and consequently one programming paradigm, object oriented programming.
  • Using each of the five different garbage collection algorithms provided by that VM.
The five GC algorithms are:
  • GenMS - Appel-style generational collector (1988)
  • GenCopy - two generations with copying mature space
  • CopyMS - nursery with whole-heap collection
  • Semispace - Cheney semi-space (1970)
  • MarkSweep - non-relocating, non-copying single-generation
Note that none of these GC algorithms are in widespread use today. The only one that comes close is Semispace which is one of Erlang's GC algorithms, albeit used in a substantially different way. In fact, the algorithms studied are all at least 30 years old.

The Obvious Discussion

The experimental results presented in the paper are very interesting and immediately raise many questions. Perhaps the most obvious question is: do these results generalize to any other programming paradigms, languages, virtual machines or garbage collection algorithms?
In the absence of evidence let's just think about this first. Functional languages use different data structures (purely functional or persistent collections) in different ways (recursion via tail calls rather than loops and mutation) so I see no reason to assume that they would behave in the same way that Java does in this context. Indeed, lean functional languages like OCaml famously require far less memory than Java to solve the same problem and the relative bloat is usually attributed to object oriented programming.
Closer to Java is C# on .NET but even this is substantially different because reified generics and value types result in aggressive unboxing giving a flatter heap topology and relieving stress from the GC. This manifests particularly in the context of hash tables where .NET provides an efficient generic hash table implementation that the JVM is incapable of expressing. So it seems extremely unlikely that these results would even generalize to .NET languages like C#.
Still, the observations are interesting.

Their Conclusions

Sadly, rather than asking these obvious questions and suggesting further work the authors chose to draw several completely unjustified conclusions. Worse, this paper passed peer review and has been published containing these unjustified conclusions and is now being bandied around on the internet.
Firstly, the paper concludes:
"In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management"
The evidence described in the paper justifies the conclusion that at least for these benchmarks in Java on the Jikes RVM the GenMS GC algorithm requires five times more memory than oracular memory management to achieve the same level of performance. This extremely specific observation cannot possibly be generalized to all garbage collectors.
The paper goes on to conclude:
"Practitioners can use these results to guide their choice of explicitly-managed languages like C or C++, or garbage-collected languages like Java or C#"
Not only is there no evidence that these results have any bearing on C# whatsoever but there is also no evidence to substantiate the implication that the oracular memory management studied here is at all representative of hand-written C or C++ code. Remember: the oracle precomputes the optimal place to free every heap allocated block of memory. Is it reasonable to assume that human software developers will do the same? Absolutely not. In practice, software developers using such languages employ pool allocators that amortise the cost of allocation at the expense of dramatically increasing the amount of floating garbage. Furthermore, the program generated by the oracular method is not a general solution that will run correctly on different inputs: it is only correct for and is optimised for the one input it was given! This is akin to hoisting computation out of the benchmark and is completely unrepresentative of real software.
So, again, it seems extremely unlikely that the oracular memory management described here (although interesting) is at all representative of manual memory management in practice.
Finally, there are two major families of garbage collection: tracing and reference counting. All of the garbage collectors covered by this paper are tracing garbage collectors so the results say nothing about reference counting at all, much less the whole of garbage collection in general.

Our Conclusions

This paper is another piece of an interesting puzzle but it is a puzzle that will never be solved. No amount of objective quantitative factual evidence can ever be enough to justify a completely general conclusion about garbage collection. Hopefully someday someone will repeat this experiment using a different VM, like .NET Core, and a different language and paradigm, like F#. Until then, there is no evidence to support the hypothesis that garbage collection requires 5x more memory to achieve the same performance as manual memory management.

Tuesday, 16 October 2018

Some objective quantitative measurements I once posted on Usenet:
Jon Harrop wrote:
> Andreas Rossberg wrote:
>> That is a wild claim, and I doubt that you have any serious statistics
>> to back it up.
> Here are some statistics on the proportion of lines of code devoted to
> type annotations from 175kLOC of production OCaml and 5kLOC of production
> Haskell:
> OCaml:
> Hevea 9.0%
> ADVI 8.6%
> FFTW3 5.2%
> Unison 3.5%
> MLDonkey 2.5%
> LEdit 1.4%
> MTASC 0.0%
> HLVM 0.0%
> Haskell:
> XMonad 19%
> Darcs 12%

For further comparison, here are some statistics for compilers written in
OCaml and Standard ML:

OCaml: 6.3% of 217kLOC
MosML: 13% of 69kLOC

Monday, 3 September 2018

What languages did you learn, in what order?

I once compiled the following list of programming languages and which year I started learning them:
  • 1981: Sinclair BASIC (on a ZX81)
  • 1983: BBC BASIC (on a BBC Micro)
  • 1985: Pascal (installed via ROM)
  • 1987: Logo
  • 1987: 6502 Assembly (on a BBC Micro)
  • 1989: ARM assembly (on an Acorn Archimedes)
  • 1992: Casio fx-7700 BASIC
  • 1992: C (using the excellent Norcroft compiler for Acorn computers)
  • 1994: C++ (the awful Beebug Easy C++)
  • 1995: UFI (on a VAX running OpenVMS)
  • 1996: StrongARM assembly
  • 1996: Standard ML
  • 1997: Mathematica
  • 1998: Quake C
  • 2004: OCaml
  • 2006: Java
  • 2007: Common Lisp
  • 2007: Scheme
  • 2007: Scala
  • 2007: Haskell
  • 2007: F#
  • 2009: Clojure
  • 2010: HLVM
  • 2017: Elm
  • 2017: Javascript
I was recently concerned to hear from Socio-PLT: Quantitative and Social Theories for Programming Language Adoption that the number of languages a programmer knows stagnates after the age of just 20. The decade is nearing its end but I have only learned three languages. Now I'm wondering which languages I should learn next...