let rec test n =
    if n < 100000000 then
        let timer = System.Diagnostics.Stopwatch.StartNew()
        let mutable c = 0L
        for _ in 1..100000000 / n do
            c <- c + int64 (Array.init n byte).Length
        c <- c / int64 n
        printfn "%d, %g" n (timer.Elapsed.TotalSeconds / float c)
        test (n+1 + (n >>> 3))
The Flying Frog Blog
Sunday, 30 June 2019
Saturday, 4 May 2019
Qi, Lisp and O'Caml compared - performance shootout
This test was conducted to determine the relative efficiency and code size of hand-coded Lisp, O'Caml and Qi Turbo-E using the benchmark challenge provided by Jon Harrop. Dr Harrop also provided the Lisp source written by Andre Thieme, Nathan Froyd and Pascal Costanza.
rational n * rational m -> rational(n * m)
symbol x -> symbol x
0+f -> f
f+0 -> f
0*f -> 0
f*0 -> 0
1*f -> f
f*1 -> f
a+(b+c) -> (a+b)+c
a*(b*c) -> (a*b)*c
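These rewrite rules translate almost mechanically into pattern matching. The following is a minimal OCaml sketch of them, not one of the contest entries: it assumes plain machine ints in place of rationals and strings in place of symbols, and the type and the names `simplify`, `add` and `mul` are illustrative.

```ocaml
(* Illustrative sketch of the rules above, with ints for rationals. *)
type expr =
  | Int of int
  | Var of string
  | Add of expr * expr
  | Mul of expr * expr

let rec simplify = function
  | (Int _ | Var _) as e -> e
  | Add (f, g) -> add (simplify f) (simplify g)
  | Mul (f, g) -> mul (simplify f) (simplify g)

and add f g = match f, g with
  | Int n, Int m -> Int (n + m)        (* rational n + rational m *)
  | Int 0, e | e, Int 0 -> e           (* 0+f -> f and f+0 -> f *)
  | f, Add (g, h) -> add (add f g) h   (* a+(b+c) -> (a+b)+c *)
  | f, g -> Add (f, g)

and mul f g = match f, g with
  | Int n, Int m -> Int (n * m)        (* rational n * rational m *)
  | Int 0, _ | _, Int 0 -> Int 0       (* 0*f -> 0 and f*0 -> 0 *)
  | Int 1, e | e, Int 1 -> e           (* 1*f -> f and f*1 -> f *)
  | f, Mul (g, h) -> mul (mul f g) h   (* a*(b*c) -> (a*b)*c *)
  | f, g -> Mul (f, g)

(* Example: x * ((12*0 + (23+8)) + y) simplifies to x * (31 + y). *)
let () =
  let e = Mul (Var "x", Add (Add (Mul (Int 12, Int 0),
                                  Add (Int 23, Int 8)), Var "y")) in
  assert (simplify e = Mul (Var "x", Add (Int 31, Var "y")))
```

The bottom-up traversal in `simplify` plus the local smart constructors `add` and `mul` is the same overall shape as every entry below.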
Nathan Froyd
Andre Thieme
Pascal Costanza
Dan Bensen
for providing code
Jon Harrop (for both providing code and providing motivation to develop Turbo-E)
Marcus Breing (for correcting my first run)
SBCL 1.0  
Qi Turbo-E under SBCL 1.0
O'Caml 3.09.3 
              Mark   Nathan  Andre  Pascal  Dan    Jon
time          3.6s   3.4s    15s    8.2s    5.1s   2.0s
loc           15     39      23     24      34     15
time vs OCaml 1.8x   1.7x    7.5x   4.1x    2.6x   1x
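Timings like those in the table are typically gathered with a small wall-clock or CPU-time harness. A hypothetical OCaml sketch follows; it is not the harness actually used for this table, and the function `time` and its output format are illustrative only.

```ocaml
(* Hypothetical sketch: time a call using the stdlib's Sys.time
   (CPU seconds). Not the harness used to produce the table above. *)
let time label f x =
  let t0 = Sys.time () in
  let y = f x in
  Printf.printf "%s: %.3fs\n" label (Sys.time () -. t0);
  y

(* Example: time a toy workload. *)
let () =
  let sum =
    time "sum" (fun n ->
      let s = ref 0 in
      for i = 1 to n do s := !s + i done;
      !s) 1_000_000
  in
  assert (sum = 500_000_500_000)
```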
(defun test (N)
  (cond ((zerop N) 0)
        (t (simplify *expr*) (test (1- N)))))
Author: Mark Tarver
Length: 15 lines
(define simplify
  [Op A B] -> (s Op (simplify A) (simplify B))
  A -> A)

(define s
  + M N -> (+ M N) where (and (number? M) (number? N))
  + 0 F -> F
  + F 0 -> F
  + A [+ B C] -> (simplify [+ [+ A B] C])
  * M N -> (* M N) where (and (number? M) (number? N))
  * 0 F -> 0
  * F 0 -> 0
  * F 1 -> F
  * 1 F -> F
  * A [* B C] -> (simplify [* [* A B] C])
  Op A B -> [Op A B])
(DEFUN simplify (V148)
  (BLOCK NIL
    (TAGBODY
      (IF (CONSP V148)
          (LET ((Cdr159 (CDR V148)))
            (IF (CONSP Cdr159)
                (LET ((Cdr158 (CDR Cdr159)))
                  (IF (CONSP Cdr158)
                      (IF (NULL (CDR Cdr158))
                          (RETURN
                            (s (CAR V148) (simplify (CAR Cdr159))
                               (simplify (CAR Cdr158))))
                          (GO tag154))
                      (GO tag154)))
                (GO tag154))))
      tag154
      (RETURN V148))))
(DEFUN s (V149 V150 V151)
  (BLOCK NIL
    (TAGBODY
      (IF (EQ '+ V149)
          (IF (AND (NUMBERP V150) (NUMBERP V151))
              (RETURN (THE NUMBER (+ V150 V151)))
              (IF (EQL 0 V150) (RETURN V151)
                  (IF (EQL 0 V151) (RETURN V150)
                      (IF (CONSP V151)
                          (LET ((Cdr170 (CDR V151)))
                            (IF (EQ '+ (CAR V151))
                                (IF (CONSP Cdr170)
                                    (LET ((Cdr169 (CDR Cdr170)))
                                      (IF (CONSP Cdr169)
                                          (IF (NULL (CDR Cdr169))
                                              (RETURN
                                                (simplify
                                                  (CONS '+
                                                    (CONS
                                                      (LIST '+ V150
                                                            (CAR Cdr170))
                                                      Cdr169))))
                                              (GO tag160))
                                          (GO tag160)))
                                    (GO tag160))
                                (GO tag160)))
                          (GO tag160))))))
      tag160
      (TAGBODY
        (IF (EQ '* V149)
            (IF (AND (NUMBERP V150) (NUMBERP V151))
                (RETURN (THE NUMBER (* V150 V151)))
                (IF (EQL 0 V150) (RETURN 0)
                    (IF (EQL 0 V151) (RETURN 0)
                        (IF (EQL 1 V151) (RETURN V150)
                            (IF (EQL 1 V150) (RETURN V151)
                                (IF (CONSP V151)
                                    (LET ((Cdr183 (CDR V151)))
                                      (IF (EQ '* (CAR V151))
                                          (IF (CONSP Cdr183)
                                              (LET ((Cdr182 (CDR Cdr183)))
                                                (IF (CONSP Cdr182)
                                                    (IF (NULL (CDR Cdr182))
                                                        (RETURN
                                                          (simplify
                                                            (CONS '*
                                                              (CONS
                                                                (LIST '* V150
                                                                      (CAR
                                                                        Cdr183))
                                                                Cdr182))))
                                                        (GO tag171))
                                                    (GO tag171)))
                                              (GO tag171))
                                          (GO tag171)))
                                    (GO tag171))))))))
        tag171
        (RETURN (LIST V149 V150 V151))))))
Author: Jon Harrop
Length: 15 lines
let rec ( +: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n +/ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> e
  | f, `Add(g, h) -> f +: g +: h
  | f, g -> `Add(f, g)
let rec ( *: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n */ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> `Int (Int 0)
  | `Int (Int 1), e | e, `Int (Int 1) -> e
  | f, `Mul(g, h) -> f *: g *: h
  | f, g -> `Mul(f, g)
let rec simplify = function
  | `Int _ | `Var _ as f -> f
  | `Add (f, g) -> simplify f +: simplify g
  | `Mul (f, g) -> simplify f *: simplify g
Author: Andre Thieme
Length: 23 lines
(defun simplify (a)
  (if (atom a)
      a
      (destructuring-bind (op x y) a
        (let* ((f (simplify x))
               (g (simplify y))
               (nf (numberp f))
               (ng (numberp g))
               (+? (eq '+ op))
               (*? (eq '* op)))
          (cond
            ((and +? nf ng) (+ f g))
            ((and +? nf (zerop f)) g)
            ((and +? ng (zerop g)) f)
            ((and (listp g) (eq op (first g)))
             (destructuring-bind (op2 u v) g
               (simplify `(,op (,op ,f ,u) ,v))))
            ((and *? nf ng) (* f g))
            ((and *? (or (and nf (zerop f))
                         (and ng (zerop g)))) 0)
            ((and *? nf (= 1 f)) g)
            ((and *? ng (= 1 g)) f)
            (t `(,op ,f ,g)))))))
Author: Nathan Froyd
Length: 39 lines
(defun simplify-no-redundant-checks (xexpr)
  (if (atom xexpr)
      xexpr
      (let ((op (first xexpr))
            (z (second xexpr))
            (y (third xexpr)))
        (let* ((f (simplify-no-redundant-checks z))
               (g (simplify-no-redundant-checks y))
               (nf (numberp f))
               (ng (numberp g)))
          (tagbody
           START
             (if (eq '+ op) (go OPTIMIZE-PLUS) (go TEST-MULTIPLY))
           OPTIMIZE-PLUS
             (when (and nf ng) (return-from simplify-no-redundant-checks (+ f g)))
           TEST-PLUS-ZEROS
             (when (eql f 0) (return-from simplify-no-redundant-checks g))
             (when (eql g 0) (return-from simplify-no-redundant-checks f))
             (go REARRANGE-EXPR)
           TEST-MULTIPLY
             (unless (eq '* op) (go REARRANGE-EXPR))
           OPTIMIZE-MULTIPLY
             (when (and nf ng) (return-from simplify-no-redundant-checks (* f g)))
           TEST-MULTIPLY-ZEROS-AND-ONES
             (when (or (eql f 0) (eql g 0)) (return-from simplify-no-redundant-checks 0))
             (when (eql f 1) (return-from simplify-no-redundant-checks g))
             (when (eql g 1) (return-from simplify-no-redundant-checks f))
           REARRANGE-EXPR
             (when (and (listp g) (eq op (first g)))
               (let ((op2 (first g))
                     (u (second g))
                     (v (third g)))
                 (declare (ignore op2))
                 (return-from simplify-no-redundant-checks
                   (simplify-no-redundant-checks (list op (list op f u) v)))))
           MAYBE-CONS-EXPR
             (if (and (eq f z) (eq g y))
                 (return-from simplify-no-redundant-checks xexpr)
                 (return-from simplify-no-redundant-checks (list op f g))))))))
Author: Pascal Costanza
Length: 25 lines
(defstruct add x y)
(defstruct mul x y)

(defgeneric simplify-add (x y)
  (:method ((x number) (y number)) (+ x y))
  (:method ((x (eql 0)) y) y)
  (:method (x (y (eql 0))) x)
  (:method (x (y add))
    (simplify-add (simplify-add x (add-x y)) (add-y y)))
  (:method (x y) (make-add :x x :y y)))

(defgeneric simplify-mul (x y)
  (:method ((x number) (y number)) (* x y))
  (:method ((x (eql 0)) y) 0)
  (:method (x (y (eql 0))) 0)
  (:method ((x (eql 1)) y) y)
  (:method (x (y (eql 1))) x)
  (:method (x (y mul))
    (simplify-mul (simplify-mul x (mul-x y)) (mul-y y)))
  (:method (x y) (make-mul :x x :y y)))

(defgeneric simplify (exp)
  (:method (exp) exp)
  (:method ((exp add))
    (simplify-add (simplify (add-x exp)) (simplify (add-y exp))))
  (:method ((exp mul))
    (simplify-mul (simplify (mul-x exp)) (simplify (mul-y exp)))))
Author: Dan Bensen
Length: 34 lines
(defmacro defop (func op opsymbl ident zerocase)
  (let ((e1 (gensym "E1"))
        (e2 (gensym "E2")))
    `(defun ,func (,e1 ,e2)
       (declare (optimize (speed 3)))
       ,(delete :nocase
          `(case ,e1
             ,zerocase
             (,ident ,e2)
             (t ,(delete :nocase
                   `(case ,e2
                      ,zerocase
                      (,ident ,e1)
                      (t (cond
                           ((and (rationalp ,e1)
                                 (rationalp ,e2))
                            (,op ,e1 ,e2))
                           ((atom ,e2)
                            (list ,e1 ,opsymbl ,e2))
                           (t (case
                                  (cadr ,e2)
                                (,opsymbl (,func (,func ,e1 (car ,e2))
                                                 (caddr ,e2)))
                                (t (list ,e1 ,opsymbl ,e2))))))))))))))

(defop apply* * '* 1 (0 0))

(defun simplify (expr)
  (if (atom expr)
      expr
      (let ((e1 (simplify (car expr)))
            (e2 (simplify (caddr expr))))
        (case (cadr expr)
          ('+ (apply+ e1 e2))
          ('* (apply* e1 e2))))))
Monday, 29 April 2019
The “Blub Paradox” and C++
Is there some powerful language feature or idiom that you make use of in a language that would be hard to conceptualize or implement if you were writing only in C++? Are there any useful concepts or techniques that you have encountered in other languages that you would have found difficult to conceptualize had you been writing or "thinking" in C++?
Register allocation and calling conventions
Runtime code generation
EVAL
to learn about metacircular evaluation.
Manipulating trees
Purely functional data structures
Tail calls
Concurrency
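Two of the items above, tail calls and purely functional data structures, are easy to make concrete in a language like OCaml. A small illustrative sketch (the names are mine, not from the post):

```ocaml
(* Tail calls: the recursive call is the last thing the function does,
   so it compiles to a jump and runs in constant stack space. *)
let rec sum_to acc n =
  if n = 0 then acc else sum_to (acc + n) (n - 1)

(* Purely functional data structures: "updating" a persistent map
   returns a new version; the old version remains valid, and the two
   versions share structure. *)
module M = Map.Make (String)

let () =
  assert (sum_to 0 1_000_000 = 500_000_500_000);
  let m1 = M.add "a" 1 M.empty in
  let m2 = M.add "a" 2 m1 in
  assert (M.find "a" m1 = 1);  (* the old version is unchanged *)
  assert (M.find "a" m2 = 2)
```

Expressing either idiom in C++ requires deliberate effort (guaranteed tail-call elimination is absent from the standard, and persistent collections need custom libraries), which is the post's point.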
Monday, 22 April 2019
In terms of performance speed, is Swift faster than Java?
 nbody: Swift 24.09s vs Java 22.6s.
 fannkuch-redux: Swift 59.5s vs Java 17.41s. Swift is 3.4x slower than Java.
 spectral-norm: Swift 15.7s vs Java 4.28s. Swift is 3.7x slower than Java.
 Swift running 11x slower than C because "The dynamic memory allocation turned out to be a major performance killer" in C vs Swift 1.2 and Getting Killed
 "Swift is 4x slower than Objective C" Mobile App Performance,
 "Swift is 24x slower than C++" Swift, C++ Performance
 Swift serializing JSON 15x slower than Objective C in Swift Performance: Too Slow For Production
 Memory-safe Swift is running 22x slower than C in Swift v1.2 β2 Performance.
 Swift ran up to 10,000x slower than C++ in Swift performance: sorting arrays although memory-safe Swift is now (Xcode 6 beta 5) running as fast as C there with ordinary optimisations enabled.
Tuesday, 6 November 2018
Benchmarking in the web age
The TechEmpower website contains some fascinating benchmarks of servers. The results on this benchmark of multiple requests to servers provide some insight into the performance characteristics of .NET on a modern problem. Specifically, the C# on ASP.NET Core solutions range from 2.5-80x slower than the fastest solution, which is written in Rust. In fact, C# is beaten by the following programming languages, in order:
 Rust
 Java
 Kotlin
 Go
 C
 Perl
 Clojure
 PHP
 C++
Furthermore, .NET Core is Microsoft's new, improved and faster version of .NET aimed specifically at these kinds of tasks. So why is it beaten by all those languages? I suspect that a large part of this is the change in workload from the kind of number crunching .NET was designed for to a modern string-heavy workload, and I suspect .NET's GC isn't as optimised for this as the JVM's is. As we have found, .NET has really poor support for JSON compared to other languages and frameworks, with support fragmented across many non-standard libraries. Amusingly, OCaml has standardised on Yojson, which we have found to be much faster and more reliable despite the fact that it is a hobby project.
Another interesting observation is that C and C++ are doing surprisingly badly. The fact that Perl and PHP are beating C++ is doubtless because they are calling out to C code, but that seems less likely for Rust, Java, Kotlin and Go. One interesting hypothesis I read on the interwebs is that C++ developers tend to copy strings a lot in order to avoid memory management bugs, whereas Rust protects its developers from such bugs so they can be more efficient without having to worry. If true, that would be quite amazing because the userland code behind these benchmarks isn't that big and, yet, C++ has already failed to scale and has developers reaching for one-size-fits-all-badly solutions.
Whatever the case, one thing is clear: the time is ripe for new languages, VMs and frameworks.
Sunday, 4 November 2018
On "Quantifying the Performance of Garbage Collection vs. Explicit Memory Management"
Introduction
That paper describes an experiment that analyzed the performance of a benchmark suite using:
 Tracing garbage collection.
 Oracular memory management (precomputing the earliest point free could have been inserted).
 On one VM, the Jikes Research Virtual Machine (RVM).
 Using one programming language, Java, and consequently one programming paradigm, object oriented programming.
 Using each of the five different garbage collection algorithms provided by that VM.
 GenMS - Appel-style generational collector (1988)
 GenCopy - two generations with copying mature space
 CopyMS - nursery with whole-heap collection
 Semispace - Cheney semi-space (1970)
 MarkSweep - non-relocating, non-copying single-generation
The Obvious Discussion
The experimental results presented in the paper are very interesting and immediately raise many questions. Perhaps the most obvious question is: do these results generalize to any other programming paradigms, languages, virtual machines or garbage collection algorithms? In the absence of evidence, let's just think about this first. Functional languages use different data structures (purely functional or persistent collections) in different ways (recursion via tail calls rather than loops and mutation), so I see no reason to assume that they would behave in the same way that Java does in this context. Indeed, lean functional languages like OCaml famously require far less memory than Java to solve the same problem and the relative bloat is usually attributed to object oriented programming.
Closer to Java is C# on .NET but even this is substantially different because reified generics and value types result in aggressive unboxing giving a flatter heap topology and relieving stress from the GC. This manifests particularly in the context of hash tables where .NET provides an efficient generic hash table implementation that the JVM is incapable of expressing. So it seems extremely unlikely that these results would even generalize to .NET languages like C#.
Still, the observations are interesting.
Their Conclusions
Sadly, rather than asking these obvious questions and suggesting further work, the authors chose to draw several completely unjustified conclusions. Worse, this paper passed peer review and has been published containing these unjustified conclusions and is now being bandied around on the internet. Firstly, the paper concludes:
"In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management"
The evidence described in the paper justifies the conclusion that, at least for these benchmarks in Java on the Jikes RVM, the GenMS GC algorithm requires five times more memory than oracular memory management to achieve the same level of performance. This extremely specific observation cannot possibly be generalized to all garbage collectors.
"Practitioners can use these results to guide their choice of explicitly-managed languages like C or C++, or garbage-collected languages like Java or C#"
Not only is there no evidence that these results have any bearing on C# whatsoever, but there is also no evidence to substantiate the implication that the oracular memory management studied here is at all representative of hand-written C or C++ code. Remember: the oracle precomputes the optimal place to free every heap-allocated block of memory. Is it reasonable to assume that human software developers will do the same? Absolutely not. In practice, software developers using such languages employ pool allocators that amortise the cost of allocation at the expense of dramatically increasing the amount of floating garbage. Furthermore, the program generated by the oracular method is not a general solution that will run correctly on different inputs: it is only correct for, and is optimised for, the one input it was given! This is akin to hoisting computation out of the benchmark and is completely unrepresentative of real software.
So, again, it seems extremely unlikely that the oracular memory management described here (although interesting) is at all representative of manual memory management in practice.
Finally, there are two major families of garbage collection: tracing and reference counting. All of the garbage collectors covered by this paper are tracing garbage collectors so the results say nothing about reference counting at all, much less the whole of garbage collection in general.
Our Conclusions
This paper is another piece of an interesting puzzle but it is a puzzle that will never be solved. No amount of objective quantitative factual evidence can ever be enough to justify a completely general conclusion about garbage collection. Hopefully someday someone will repeat this experiment using a different VM, like .NET Core, and a different language and paradigm, like F#. Until then, there is no evidence to support the hypothesis that garbage collection requires 5x more memory to achieve the same performance as manual memory management.
Tuesday, 16 October 2018
Jon Harrop wrote:
> Andreas Rossberg wrote:
>> That is a wild claim, and I doubt that you have any serious statistics
>> to back it up.
>
> Here are some statistics on the proportion of lines of code devoted to
> type annotations from 175kLOC of production OCaml and 5kLOC of production
> Haskell:
>
> OCaml:
> Hevea 9.0%
> ADVI 8.6%
> FFTW3 5.2%
> Unison 3.5%
> MLDonkey 2.5%
> LEdit 1.4%
> MTASC 0.0%
> HLVM 0.0%
>
> Haskell:
> XMonad 19%
> Darcs 12%
For further comparison, here are some statistics for compilers written in
OCaml and Standard ML:
OCaml: 6.3% of 217kLOC
MosML: 13% of 69kLOC