The Flying Frog Blog
Saturday, 19 October 2019
Why does Java still feel bloated nowadays compared to other languages with garbage collection?
The fundamental problem with Java is that Lisp sucks. Like Lisp, Java was poorly designed and then poorly implemented. McCarthy had the excuse that he didn't know any better. Whoever made Java was just ignorant.
A static type system can have many benefits. Types can detect large classes of bugs: for example, you can eliminate null reference exceptions. Types can also be used to dramatically improve performance, for example by eliminating almost all runtime type checks and by providing polymorphism with no performance impact. Whoever made Java apparently had a very bad understanding of type theory because the design of the JVM not only failed to leverage static types in any way (the JVM immediately discards almost all type information and heap allocates almost everything whether it makes sense to or not, just like Lisp) but they even defined it with fundamental bugs in the type system (Java Arrays Break Type Safety).
But bad design was only half of the story. Sun's JVM was also badly implemented. Faced with Lisp-like bad performance, they began frantically building on their dodgy foundation, adding layers upon layers of progressively more sophisticated optimisations mostly intended to alleviate the pressure that the ridiculous heap allocation rate puts on the GC. Advanced parallel garbage collection algorithms to parallelize the handling of huge numbers of unnecessary heap allocations. Escape analysis to try to replace some of the heap-allocated locals with stack allocations. And so on. Java ended up rediscovering the sufficiently smart compiler myth from Lisp: trying to optimise around a fundamentally inefficient foundation is like putting lipstick on a pig.
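The allocation pressure is easy to see in user code: because the JVM erases generic type information, a generic container can only hold references, so every primitive stored in one is boxed into a heap-allocated object. A minimal sketch of my own (the class and method names are illustrative, not from the original post):

```java
import java.util.ArrayList;
import java.util.List;

public class Boxing {
    // One heap allocation per element added (outside the small-value Long cache),
    // because List<Long> stores boxed Long objects, not raw longs.
    static long sumBoxed(int n) {
        List<Long> xs = new ArrayList<>();
        for (long i = 0; i < n; i++) xs.add(i); // autoboxing allocates
        long s = 0;
        for (long x : xs) s += x;               // each read unboxes
        return s;
    }

    // A single flat allocation for all n elements.
    static long sumPrimitive(int n) {
        long[] xs = new long[n];
        for (int i = 0; i < n; i++) xs[i] = i;
        long s = 0;
        for (long x : xs) s += x;
        return s;
    }
}
```

Both compute the same sum; the boxed version simply routes every element through the GC.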
Overall, the result was very bad for JIT compilation performance, particularly bad for latency, and throughput isn't great either. That's why Java is so laggy.
.NET had the second mover advantage and, with .NET 2.0 (2005), adopted a more sophisticated foundation with value types and reified generics. This goes a long way to leveraging static type information. Consequently, there is a lot less stress on and contention for the .NET GC and it can achieve good concurrent performance without the same sophisticated features: simple Ahead-of-Time (AoT) compilation, no escape analysis and so on. Microsoft kept the implementation of .NET lean, stripping out any research ideas that proved to be dispensable. Overall, the result was good for compile times, throughput and latency even though most of the core Framework was written by people who did not understand how to leverage the core design (e.g. you cannot Parse without heap allocating and tuples are heap allocated).
Python (1991) and OCaml (1996) both have simple foundations without advanced features like concurrent collection (both do memory management under a global lock). Consequently, they achieve reasonable performance on one core and, in particular, good latency.
Thursday, 22 August 2019
Harrop's Tenth Rule
"Any sufficiently complicated Lisp program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of ML."
Specifically, I was amazed by the following quote from that essay:
"Dr. Mark Tarver — twice-quoted, above — wrote a dialect of Lisp called Qi. It is less than ten thousand lines of macros running atop Clisp. It implements most of the unique features of Haskell and OCaml. In some respects, Qi surpasses them. For instance, Qi's type inferencing engine is Turing complete. In a world where teams of talented academics were needed to write Haskell, one man, Dr. Tarver wrote Qi all by his lonesome.
Read that paragraph, again, and extrapolate."
Several misconceptions are packed into this one paragraph.
 Qi and Shen are nowhere near implementing "most of the unique features of Haskell and OCaml".
 As an integral part of the compilation process, the objective is to make type inference as efficient as possible in all cases of practical interest. Making it Turing complete is precisely the opposite so, far from "surpassing" conventional solutions, this is far worse in practice. Furthermore, C++ templates and F# type providers have already demonstrated how awful compile times become when compilation stages are made Turing complete.
 The "teams of talented academics" weren't just writing the compiler, they were inventing the languages.
Whereas Standard ML has a formal specification and significant parts of OCaml have been proven correct, Qi and Shen are ad hoc and informally specified.
Whereas Haskell and OCaml are mature languages (my company had 1,000 industrial clients who were using OCaml!), Qi and Shen boast ~100 tests in the compiler suite and no users at all (their subreddit has seen 14 posts over the past 9 years!), so we can assume they will be bug-ridden.
Back in the day, Mark Tarver ran a benchmark comparing his Qi code to my OCaml code and found that Qi was 1.8x slower.
So this is exactly what I was talking about when I first wrote my tenth rule.
Sunday, 30 June 2019
let rec test n =
  if n < 100000000 then
    let timer = System.Diagnostics.Stopwatch.StartNew()
    let mutable c = 0L
    for _ in 1..100000000 / n do
      c <- c + int64 (Array.init n byte).Length
    c <- c / int64 n
    printfn "%d, %g" n (timer.Elapsed.TotalSeconds / float c)
    test (n+1 + (n >>> 3))
Saturday, 4 May 2019
Qi, Lisp and O'Caml compared: performance shootout
This test was conducted to determine the relative efficiency and code size of hand-coded Lisp, O'Caml and Qi TurboE using the benchmark challenge provided by Jon Harrop. Dr Harrop also provided the Lisp source written by Andre Thieme, Nathan Froyd and Pascal Costanza.
rational n * rational m -> rational(n * m)
symbol x -> symbol x
0+f -> f
f+0 -> f
0*f -> 0
f*0 -> 0
1*f -> f
f*1 -> f
a+(b+c) -> (a+b)+c
a*(b*c) -> (a*b)*c
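The rewrite rules above can be rendered directly as a small bottom-up term rewriter. Here is a minimal Java sketch of the same rules (the `Expr` types and method names are my own, not from any of the benchmark entries, and rationals are narrowed to `long` for brevity):

```java
// Expression tree for the simplifier: numbers, symbols, sums and products.
sealed interface Expr permits Int, Var, Add, Mul {}
record Int(long n) implements Expr {}
record Var(String x) implements Expr {}
record Add(Expr f, Expr g) implements Expr {}
record Mul(Expr f, Expr g) implements Expr {}

final class Simplify {
    static boolean isInt(Expr e, long v) { return e instanceof Int i && i.n() == v; }

    // n+m -> (n+m), 0+f -> f, f+0 -> f, a+(b+c) -> (a+b)+c
    static Expr add(Expr f, Expr g) {
        if (f instanceof Int a && g instanceof Int b) return new Int(a.n() + b.n());
        if (isInt(f, 0)) return g;
        if (isInt(g, 0)) return f;
        if (g instanceof Add h) return add(add(f, h.f()), h.g());
        return new Add(f, g);
    }

    // n*m -> (n*m), 0*f -> 0, f*0 -> 0, 1*f -> f, f*1 -> f, a*(b*c) -> (a*b)*c
    static Expr mul(Expr f, Expr g) {
        if (f instanceof Int a && g instanceof Int b) return new Int(a.n() * b.n());
        if (isInt(f, 0) || isInt(g, 0)) return new Int(0);
        if (isInt(f, 1)) return g;
        if (isInt(g, 1)) return f;
        if (g instanceof Mul h) return mul(mul(f, h.f()), h.g());
        return new Mul(f, g);
    }

    // Simplify children first, then apply the rules at this node.
    static Expr simplify(Expr e) {
        if (e instanceof Add a) return add(simplify(a.f()), simplify(a.g()));
        if (e instanceof Mul m) return mul(simplify(m.f()), simplify(m.g()));
        return e; // Int and Var are already simple
    }
}
```

For example, `x * (1 * (0 + 3))` simplifies to `x * 3`.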
Nathan Froyd
Andre Thieme
Pascal Costanza
Dan Bensen
for providing code
Jon Harrop (for both providing code and providing motivation to develop TurboE)
Marcus Breing (for correcting my first run)
Environments: SBCL 1.0; Qi TurboE under SBCL 1.0; O'Caml 3.09.3.

                 Mark    Nathan   Andre   Pascal   Dan     Jon
time             3.6s    3.4s     15s     8.2s     5.1s    2.0s
loc              15      39       23      24       34      15
time vs OCaml    1.8x    1.7x     7.5x    4.1x     2.6x    1x
(defun test (N)
  (cond ((zerop N) 0)
        (t (simplify *expr*) (test (1- N)))))
Author: Mark Tarver
Length: 15 lines
(define simplify
  [Op A B] -> (s Op (simplify A) (simplify B))
  A -> A)

(define s
  + M N -> (+ M N) where (and (number? M) (number? N))
  + 0 F -> F
  + F 0 -> F
  + A [+ B C] -> (simplify [+ [+ A B] C])
  * M N -> (* M N) where (and (number? M) (number? N))
  * 0 F -> 0
  * F 0 -> 0
  * F 1 -> F
  * 1 F -> F
  * A [* B C] -> (simplify [* [* A B] C])
  Op A B -> [Op A B])
(DEFUN simplify (V148)
(BLOCK NIL
(TAGBODY
(IF (CONSP V148)
(LET ((Cdr159 (CDR V148)))
(IF (CONSP Cdr159)
(LET ((Cdr158 (CDR Cdr159)))
(IF (CONSP Cdr158)
(IF (NULL (CDR Cdr158))
(RETURN
(s (CAR V148) (simplify (CAR Cdr159))
(simplify (CAR Cdr158))))
(GO tag154))
(GO tag154)))
(GO tag154))))
tag154
(RETURN V148))))
(DEFUN s (V149 V150 V151)
(BLOCK NIL
(TAGBODY
(IF (EQ '+ V149)
(IF (AND (NUMBERP V150) (NUMBERP V151))
(RETURN (THE NUMBER (+ V150 V151)))
(IF (EQL 0 V150) (RETURN V151)
(IF (EQL 0 V151) (RETURN V150)
(IF (CONSP V151)
(LET ((Cdr170 (CDR V151)))
(IF (EQ '+ (CAR V151))
(IF (CONSP Cdr170)
(LET ((Cdr169 (CDR Cdr170)))
(IF (CONSP Cdr169)
(IF (NULL (CDR Cdr169))
(RETURN
(simplify
(CONS '+
(CONS
(LIST '+ V150
(CAR Cdr170))
Cdr169))))
(GO tag160))
(GO tag160)))
(GO tag160))
(GO tag160)))
(GO tag160))))))
tag160
(TAGBODY
(IF (EQ '* V149)
(IF (AND (NUMBERP V150) (NUMBERP V151))
(RETURN (THE NUMBER (* V150 V151)))
(IF (EQL 0 V150) (RETURN 0)
(IF (EQL 0 V151) (RETURN 0)
(IF (EQL 1 V151) (RETURN V150)
(IF (EQL 1 V150) (RETURN V151)
(IF (CONSP V151)
(LET ((Cdr183 (CDR V151)))
(IF (EQ '* (CAR V151))
(IF (CONSP Cdr183)
(LET ((Cdr182 (CDR Cdr183)))
(IF (CONSP Cdr182)
(IF (NULL (CDR Cdr182))
(RETURN
(simplify
(CONS '*
(CONS
(LIST '* V150
(CAR
Cdr183))
Cdr182))))
(GO tag171))
(GO tag171)))
(GO tag171))
(GO tag171)))
(GO tag171))))))))
tag171
(RETURN (LIST V149 V150 V151))))))
Author: Jon Harrop
Length: 15 lines
let rec ( +: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n +/ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> e
  | f, `Add(g, h) -> f +: g +: h
  | f, g -> `Add(f, g)

let rec ( *: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n */ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> `Int (Int 0)
  | `Int (Int 1), e | e, `Int (Int 1) -> e
  | f, `Mul(g, h) -> f *: g *: h
  | f, g -> `Mul(f, g)

let rec simplify = function
  | `Int _ | `Var _ as f -> f
  | `Add (f, g) -> simplify f +: simplify g
  | `Mul (f, g) -> simplify f *: simplify g
Author: Andre Thieme
Length: 23 lines
(defun simplify (a)
  (if (atom a)
      a
      (destructuring-bind (op x y) a
        (let* ((f (simplify x))
               (g (simplify y))
               (nf (numberp f))
               (ng (numberp g))
               (+? (eq '+ op))
               (*? (eq '* op)))
          (cond
            ((and +? nf ng) (+ f g))
            ((and +? nf (zerop f)) g)
            ((and +? ng (zerop g)) f)
            ((and (listp g) (eq op (first g)))
             (destructuring-bind (op2 u v) g
               (simplify `(,op (,op ,f ,u) ,v))))
            ((and *? nf ng) (* f g))
            ((and *? (or (and nf (zerop f))
                         (and ng (zerop g)))) 0)
            ((and *? nf (= 1 f)) g)
            ((and *? ng (= 1 g)) f)
            (t `(,op ,f ,g)))))))
Author: Nathan Froyd
Length: 39 lines
(defun simplify-no-redundant-checks (xexpr)
  (if (atom xexpr)
      xexpr
      (let ((op (first xexpr))
            (z (second xexpr))
            (y (third xexpr)))
        (let* ((f (simplify-no-redundant-checks z))
               (g (simplify-no-redundant-checks y))
               (nf (numberp f))
               (ng (numberp g)))
          (tagbody
           START
             (if (eq '+ op) (go OPTIMIZE-PLUS) (go TEST-MULTIPLY))
           OPTIMIZE-PLUS
             (when (and nf ng) (return-from simplify-no-redundant-checks (+ f g)))
           TEST-PLUS-ZEROS
             (when (eql f 0) (return-from simplify-no-redundant-checks g))
             (when (eql g 0) (return-from simplify-no-redundant-checks f))
             (go REARRANGE-EXPR)
           TEST-MULTIPLY
             (unless (eq '* op) (go REARRANGE-EXPR))
           OPTIMIZE-MULTIPLY
             (when (and nf ng) (return-from simplify-no-redundant-checks (* f g)))
           TEST-MULTIPLY-ZEROS-AND-ONES
             (when (or (eql f 0) (eql g 0)) (return-from simplify-no-redundant-checks 0))
             (when (eql f 1) (return-from simplify-no-redundant-checks g))
             (when (eql g 1) (return-from simplify-no-redundant-checks f))
           REARRANGE-EXPR
             (when (and (listp g) (eq op (first g)))
               (let ((op2 (first g))
                     (u (second g))
                     (v (third g)))
                 (declare (ignore op2))
                 (return-from simplify-no-redundant-checks
                   (simplify-no-redundant-checks (list op (list op f u) v)))))
           MAYBE-CONS-EXPR
             (if (and (eq f z) (eq g y))
                 (return-from simplify-no-redundant-checks xexpr)
                 (return-from simplify-no-redundant-checks (list op f g))))))))
Author: Pascal Costanza
Length: 25 lines
(defstruct add x y)
(defstruct mul x y)

(defgeneric simplify-add (x y)
  (:method ((x number) (y number)) (+ x y))
  (:method ((x (eql 0)) y) y)
  (:method (x (y (eql 0))) x)
  (:method (x (y add))
    (simplify-add (simplify-add x (add-x y)) (add-y y)))
  (:method (x y) (make-add :x x :y y)))

(defgeneric simplify-mul (x y)
  (:method ((x number) (y number)) (* x y))
  (:method ((x (eql 0)) y) 0)
  (:method (x (y (eql 0))) 0)
  (:method ((x (eql 1)) y) y)
  (:method (x (y (eql 1))) x)
  (:method (x (y mul))
    (simplify-mul (simplify-mul x (mul-x y)) (mul-y y)))
  (:method (x y) (make-mul :x x :y y)))

(defgeneric simplify (exp)
  (:method (exp) exp)
  (:method ((exp add))
    (simplify-add (simplify (add-x exp)) (simplify (add-y exp))))
  (:method ((exp mul))
    (simplify-mul (simplify (mul-x exp)) (simplify (mul-y exp)))))
Author: Dan Bensen
Length: 34 lines
(let ((e1 (gensym "E1"))
(e2 (gensym "E2")))
`(defun ,func (,e1 ,e2)
(declare (optimize (speed 3)))
,(delete :nocase
`(case ,e1
,zerocase
(,ident ,e2)
(t ,(delete :nocase
`(case ,e2
,zerocase
(,ident ,e1)
(t (cond
((and (rationalp ,e1)
(rationalp ,e2))
(,op ,e1 ,e2))
((atom ,e2)
(list ,e1 ,opsymbl ,e2))
(t (case
(cadr ,e2)
(,opsymbl (,func (,func ,e1 (car ,e2))
(caddr ,e2)))
(t (list ,e1 ,opsymbl ,e2))))))))))))))
(defop apply* * '* 1 ( 0 0 ))
(defun simplify (expr)
  (if (atom expr)
expr
(let ((e1 (simplify (car expr)))
(e2 (simplify (caddr expr))))
(case (cadr expr)
('+ (apply+ e1 e2))
('* (apply* e1 e2))))))
Monday, 29 April 2019
The “Blub Paradox” and C++
is there some powerful language feature or idiom that you make use of in a language that would be hard to conceptualize or implement if you were writing only in C++? Are there any useful concepts or techniques that you have encountered in other languages that you would have found difficult to conceptualize had you been writing or "thinking" in C++?
Register allocation and calling conventions
Runtime code generation
EVAL
to learn about metacircular evaluation.
Manipulating trees
Purely functional data structures
Tail calls
Concurrency
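To illustrate one item from the list above, a purely functional data structure can be sketched even in Java: "adding" to a persistent list allocates a new cell and shares the existing tail, so every old version remains valid. A hypothetical sketch of my own (the `PList` type is not from the original post):

```java
// A persistent (immutable) singly-linked list with structural sharing.
record PList<T>(T head, PList<T> tail) {
    static <T> PList<T> cons(T head, PList<T> tail) { return new PList<>(head, tail); }

    // Returns a NEW list; 'this' is untouched and shared as the tail.
    PList<T> push(T x) { return cons(x, this); }
}
```

Idiomatic C++ ownership (who deletes the shared tail?) makes this pattern awkward without a garbage collector or pervasive `shared_ptr`.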
Monday, 22 April 2019
In terms of performance speed, is Swift faster than Java?
 n-body: Swift 24.09s vs Java 22.6s.
 fannkuch-redux: Swift 59.5s vs Java 17.41s. Swift is 3.4x slower than Java.
 spectral-norm: Swift 15.7s vs Java 4.28s. Swift is 3.7x slower than Java.
 Swift running 11x slower than C because "The dynamic memory allocation turned out to be a major performance killer" in C vs Swift 1.2 and Getting Killed
 "Swift is 4x slower than Objective C" Mobile App Performance,
 "Swift is 24x slower than C++" Swift, C++ Performance
 Swift serializing JSON 15x slower than Objective C in Swift Performance: Too Slow For Production
 Memory-safe Swift is running 22x slower than C in Swift v1.2 β2 Performance.
 Swift ran up to 10,000x slower than C++ in Swift performance: sorting arrays although memory-safe Swift is now (Xcode 6 beta 5) running as fast as C there with ordinary optimisations enabled.
Tuesday, 6 November 2018
Benchmarking in the web age
The TechEmpower website contains some fascinating benchmarks of servers. The results on this benchmark of multiple requests to servers provide some insight into the performance characteristics of .NET on a modern problem. Specifically, the C# on ASP.NET Core solutions range from 2.5× to 80× slower than the fastest solution, which is written in Rust. In fact, C# is beaten by the following programming languages, in order:
 Rust
 Java
 Kotlin
 Go
 C
 Perl
 Clojure
 PHP
 C++
Furthermore, .NET Core is Microsoft's new, improved and faster version of .NET aimed specifically at these kinds of tasks. So why is it beaten by all those languages? I suspect that a large part of this is the change in workload from the kind of number crunching .NET was designed for to a modern string-heavy workload, and I suspect .NET's GC isn't as optimised for this as the JVM's is. As we have found, .NET has really poor support for JSON compared to other languages and frameworks, with support fragmented across many non-standard libraries. Amusingly, OCaml has standardised on Yojson, which we have found to be much faster and more reliable despite the fact that it is a hobby project.
Another interesting observation is that C and C++ are doing surprisingly badly. The fact that Perl and PHP are beating C++ is doubtless because they are calling out to C code, but that seems less likely for Rust, Java, Kotlin and Go. One interesting hypothesis I read on the interwebs is that C++ developers tend to copy strings a lot in order to avoid memory management bugs, whereas Rust protects its developers from such bugs so they can be more efficient without having to worry. If true, that would be quite amazing because the userland code behind these benchmarks isn't that big and, yet, C++ has already failed to scale and has developers reaching for one-size-fits-all-badly solutions.
Whatever the case, one thing is clear: the time is ripe for new languages, VMs and frameworks.