 FSharp.Json
 SpanJson.Fsharp.Formatter
 Utf8Json.FSharpExtensions
 FsPickler.Json
 Falanx.Proto.Codec.Json
 Fleece.FSharpData
 EdIlyin.FSharp.Elm.Core
 FSharpEnt.Common
 FsJson
 Giraffe.JsonTherapy
 Kephas.Serialization.Json
 Vertigo.Json
 FSharp.Azure.Storage
 FSharp.Data
 FSharp.SimpleJson
 Newtonsoft.Json.FSharp
 FSharpUtils.Newtonsoft.JsonValue
 Fable.Remoting.Json 2.2.0
 Fable.SimpleJson
 Thoth.Json 2.5.0
 Thoth.Json.Net 2.5.0
 JsonFSharp
 Jet.JsonNet.Converters
The Flying Frog Blog
Wednesday, 5 February 2020
On the importance of Serialization
Serialization is a core feature that all modern languages should not only provide but be designed for. By not doing this, .NET and F# gave rise to the cottage industry of JSON serializers listed above.
Saturday, 19 October 2019
Why does Java still feel bloated nowadays compared to other languages with garbage collection?
"We were after the C++ programmers. We managed to drag a lot of them about halfway to Lisp."  Guy Steele.
The fundamental problem with Java is that Lisp sucks. Like Lisp, Java was poorly designed and then poorly implemented. McCarthy had the excuse that he didn't know any better. Whoever made Java was just ignorant.
A static type system can have many benefits. Types can detect large classes of bugs. For example, you can eliminate null reference exceptions. Types can be used to dramatically improve performance. For example, by eliminating almost all runtime type checks and by providing polymorphism with no performance impact. Whoever made Java apparently had a very bad understanding of type theory because the design of the JVM not only failed to leverage static types in any way (the JVM immediately discards almost all type information and heap allocated almost everything whether it makes sense to or not, just like Lisp) but they even defined it with fundamental bugs in the type system (Java Arrays Break Type Safety).
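For instance, eliminating null reference exceptions is exactly what ML-style option types give you: a possibly-absent value must be wrapped in an option and the type checker forces both cases to be handled. A minimal OCaml sketch (the names are my own, for illustration only):

```ocaml
(* A lookup that can fail returns an option, never null. *)
let find_age (people : (string * int) list) (name : string) : int option =
  List.assoc_opt name people

let describe people name =
  (* The compiler rejects any use of the result that does not handle
     the None case, so a null dereference cannot even be written. *)
  match find_age people name with
  | Some age -> Printf.sprintf "%s is %d" name age
  | None -> Printf.sprintf "%s is unknown" name

let () =
  assert (describe [ ("alice", 30) ] "alice" = "alice is 30");
  assert (describe [] "bob" = "bob is unknown")
```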
But bad design was only half of the story. Sun's JVM was also badly implemented. Faced with Lisp-like bad performance, they began frantically building on their dodgy foundation by adding layers upon layers of progressively more sophisticated optimisations that were mostly intended to alleviate the GC due to the ridiculous heap allocation rate. Advanced parallel garbage collection algorithms to parallelize the handling of the huge number of unnecessary heap allocations. Escape analysis to try to replace some of the heap-allocated locals with stack allocations. And so on. Java ended up rediscovering the sufficiently smart compiler myth from Lisp: trying to optimise around a fundamentally inefficient foundation is like putting lipstick on a pig.
Overall, the result was very bad for JIT compilation performance, particularly bad for latency, and throughput isn't great either. That's why Java is so laggy.
.NET had the second mover advantage and, with .NET 2.0 (2005), adopted a more sophisticated foundation with value types and reified generics. This goes a long way to leveraging static type information. Consequently, there is a lot less stress on and contention for the .NET GC and it can achieve good concurrent performance without the same sophisticated features. Simple Ahead-of-Time (AoT) compilation, no escape analysis and so on. Microsoft kept the implementation of .NET lean, stripping out any research ideas that proved to be dispensable. Overall, the result was good for compile times, throughput and latency even though most of the core Framework was written by people who did not understand how to leverage the core design (e.g. you cannot Parse without heap allocating and tuples are heap allocated).
Python (1991) and OCaml (1996) both have simple foundations but without advanced features like concurrent collection (both do memory management under a global lock). Consequently, they achieve reasonable performance on one core and, in particular, good latency.
Thursday, 22 August 2019
Harrop's Tenth Rule
I was just reading one of the many self-soothing essays from the Lisp community and remembered my parody of Greenspun's Tenth Rule:
"Any sufficiently complicated Lisp program contains an ad hoc informallyspecified bugridden slow implementation of half of ML."Specifically, I was amazed by the following quote from that essay:
"Dr. Mark Tarver — twicequoted, above — wrote a dialect of Lisp called Qi. It is less than ten thousand lines of macros running atop Clisp. It implements most of the unique features of Haskell and OCaml. In some respects, Qi surpasses them. For instance, Qi's type inferencing engine is Turing complete. In a world where teams of talented academics were needed to write Haskell, one man, Dr. Tarver wrote Qi all by his lonesome.
Read that paragraph, again, and extrapolate."
Several misconceptions are packed into this one paragraph.
 Qi and Shen are nowhere near implementing "most of the unique features of Haskell and OCaml".
 As an integral part of the compilation process, the objective is to make type inference as efficient as possible in all cases of practical interest. Making it Turing complete is precisely the opposite: far from "surpassing" conventional solutions, it is far worse in practice. Furthermore, C++ templates and F# type providers have already demonstrated how awful compile times become when compilation stages are made Turing complete.
 The "teams of talented academics" weren't just writing the compiler, they were inventing the languages.
Whereas Standard ML had a formal specification and significant parts of OCaml have been proven correct, Qi and Shen are ad hoc and informally specified.
Whereas Haskell and OCaml are mature languages (my company had 1,000 industrial clients who were using OCaml!), Qi and Shen boast ~100 tests in the compiler suite and no users at all (their subreddit has seen 14 posts over the past 9 years!) so we can assume they will be bug-ridden.
Back in the day, Mark Tarver ran a benchmark comparing his Qi code to my OCaml code and found that Qi was 1.8x slower.
So this is exactly what I was talking about when I first wrote my tenth rule.
Sunday, 30 June 2019
An interesting comment buried in a discussion:
“in many garbage collected languages, it becomes orders of magnitude slower as the size of objects increases”
We can test your hypothesis. The following F# code allocates arrays of different lengths and measures the time taken:
 let rec test n =
   if n < 100000000 then
     let timer = System.Diagnostics.Stopwatch.StartNew()
     let mutable c = 0L
     for _ in 1..100000000 / n do
       c <- c + int64 (Array.init n byte).Length
     c <- c / int64 n
     printfn "%d, %g" n (timer.Elapsed.TotalSeconds / float c)
     test (n+1 + (n >>> 3))
The results show that the time taken to allocate objects up to 100MiB on 64-bit .NET 4.7.2 is linear in object size, as expected. No "orders of magnitude" slowdown.
I'm not surprised because I have never seen this "orders of magnitude" slowdown that you refer to. The only GC'd language you referred to is Java so I assume you are familiar with Java. Are you saying that the behaviour is different with Java? Can you port my 9-line F# program to Java and see what graph you get?
"Overall new+delete is always faster than GC for all sizes."
Are you really saying that the same program written with new+delete will always be faster than with new+GC?
If so, there are many counter examples. Here are implementations of Hans Boehm's binary trees memory allocation benchmark written in C++ using the default new+delete and in F# using the tracing GC:
Here are the results:
The tracing GC is 7x faster than new+delete on average. Furthermore, the first thing you would do to optimise the C++ is replace the individual allocations with a pool allocator precisely because the default new and delete are so slow.
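For reference, the allocation pattern that the binary trees benchmark stresses looks roughly like this in a GC'd ML (my own minimal sketch, not the benchmark code itself):

```ocaml
(* Hans Boehm's binary-trees benchmark allocates huge numbers of
   short-lived tree nodes; a tracing GC reclaims them in bulk rather
   than freeing each node individually as new+delete must. *)
type tree = Leaf | Node of tree * tree

let rec make depth =
  if depth = 0 then Leaf
  else Node (make (depth - 1), make (depth - 1))

(* Count the nodes so the allocations cannot be optimised away. *)
let rec check = function
  | Leaf -> 1
  | Node (l, r) -> 1 + check l + check r

let () =
  let total = ref 0 in
  for _ = 1 to 100 do
    (* Each tree becomes garbage immediately after being checked. *)
    total := !total + check (make 10)
  done;
  assert (!total = 100 * 2047)
```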
Here is one from Microsoft: Performance Quiz #6 — Chinese/English Dictionary reader
Raymond Chen was blogging about developing a Chinese/English dictionary in C++. Rico Mariani ported it to C# and noted that the unoptimised C# code was faster than the first few optimised C++ implementations. The C++ was eventually optimised to beat the C# by removing all OOP, all RAII and all new+delete precisely because they are so inefficient. The final C++ code was effectively just C.
"As the number of objects increases, GC becomes slower and slower making the system unusable, and its really a no competition."
That is theoretically true for tracing GCs but not reference counting GCs. However, the effect is so small that GCs are used with heap sizes up to 8TB (see Java Heap Size, Azul Systems, Inc.). For example, the GC'd language OCaml running on a supercomputer once held the record for the largest symbolic computation ever performed (Archives of the Caml mailing list > Message from Thomas Fischbacher). I've used OCaml on supercomputers myself.
"Most GC based systems managing large amount of memory/objects today use offheap memory for that reason."
Firstly, you've written "memory/objects" so note that your statement is true for objects and not for memory. Secondly, I've worked on large systems for decades (my background is in HPC) and have never seen anyone move to offheap memory for that reason. Were they using Java?
"With most compilers RAII injects exactly as much code as necessary for cleanup and nothing more."
Firstly, you don't need to inject any code at the end of scope for cleanup. After all, tracing GCs don't. Secondly, virtual destructors are an obvious counter example. To avoid undefined behaviour when delete'ing a derived class via a pointer to its base class, the base class is given a virtual destructor, culminating in many expensive dynamic jumps to no-ops.
"Most implementations use some form of DFA not code generation."
From the .NET docs: "If a Regex object is constructed with the RegexOptions.Compiled option, it compiles the regular expression to explicit MSIL code instead of high-level regular expression internal instructions. This allows .NET's just-in-time (JIT) compiler to convert the expression to native machine code for higher performance" (Compilation and Reuse in Regular Expressions).
"Also note that any kind of code generation implementation has a very high initialization cost (compilation + JIT etc)."
This quick test shows C++ running 17x slower: C++ vs .NET regex performance
"The boost/pcre2 regex implementations are faster than Java, for example."
I'm sure your observation is correct but the correct conclusion is that Java is slow.
"Resource leaks (including memory) in GC based systems are extremely common."
Not in my experience but I suspect you're talking specifically about Java.
"In general my experience has been that programmers who have had good experience in C++ are usually much better off even when dealing with GC languages because they are overall more careful."
Experienced people are generally better.
Labels:
c++,
f#,
garbage collection,
java,
memory management,
performance
Saturday, 4 May 2019
Qi, Lisp and O'Caml compared: performance shootout
An old article by Mark Tarver reproduced here for historical interest:
This test was conducted to determine the relative efficiency and code size of hand-coded Lisp, O'Caml and Qi Turbo-E using the benchmark challenge provided by Jon Harrop. Dr Harrop also provided the Lisp source written by Andre Thieme, Nathan Froyd and Pascal Costanza.
Thanks to many people who corrected these tests and made them possible.
Turbo-E is an unreleased version of Qi which factorises its pattern-matching.
Jon Harrop Challenge Problem
Posted by Dr Harrop
The problem is to simplify symbolic expressions by applying the following rewrite rules from the leaves up:
rational n + rational m -> rational(n + m)
rational n * rational m -> rational(n * m)
symbol x -> symbol x
0+f -> f
f+0 -> f
0*f -> 0
f*0 -> 0
1*f -> f
f*1 -> f
a+(b+c) -> (a+b)+c
a*(b*c) -> (a*b)*c
Thanks to
 Nathan Froyd
 Andre Thieme
 Pascal Costanza
 Dan Bensen
for providing code, and to
 Jon Harrop (for both providing code and providing motivation to develop Turbo-E)
 Marcus Breing (for correcting my first run)
The Results
The benchmark was to simplify [* x [+ [+ [* 12 0] [+ 23 8]] y]]. 10^7 iterations were used as the benchmark. A 2.6GHz, 500Mb RAM machine was used to drive the test. Windows XP was the OS. Times measure only processor time (not real time) and are rounded to 1/10th of a second.
Key:
 Lisp solutions under SBCL 1.0
 Qi Turbo-E under SBCL 1.0
 O'Caml 3.09.3

                 Mark    Nathan   Andre   Pascal   Dan    Jon
 time            3.6s    3.4s     15s     8.2s     5.1s   2.0s
 loc             15      39       23      24       34     15
 time vs OCaml   1.8x    1.7x     7.5x    4.1x     2.6x   1x
The timer used in Lisp was

(defun tt (N) (gc) (time (test N)))

(defun test (N)
  (cond ((zerop N) 0)
        (t (simplify *expr*) (test (1- N)))))

which took 0.2s for 10^7 iterations with (simplify *expr*) removed. All timings were adjusted by deducting 0.2s from each.
I could run Jon's solution in interpreted mode, but could not get my O'Caml compiler to work. Based on the ratio of interpreted time to his compiled time, I could construct a projected time.
The Solutions
Language: Qi
Author: Mark Tarver
Length: 15 lines

(define simplify
  [Op A B] -> (s Op (simplify A) (simplify B))
  A -> A)

(define s
  + M N -> (+ M N)  where (and (number? M) (number? N))
  + 0 F -> F
  + F 0 -> F
  + A [+ B C] -> (simplify [+ [+ A B] C])
  * M N -> (* M N)  where (and (number? M) (number? N))
  * 0 F -> 0
  * F 0 -> 0
  * F 1 -> F
  * 1 F -> F
  * A [* B C] -> (simplify [* [* A B] C])
  Op A B -> [Op A B])
Note: I have used prefix notation for this problem though Dr Harrop's formulation uses infix. The first Lisp solutions followed prefix and so I chose to make my program comparable.
The generated code is
(DEFUN simplify (V148)
(BLOCK NIL
(TAGBODY
(IF (CONSP V148)
(LET ((Cdr159 (CDR V148)))
(IF (CONSP Cdr159)
(LET ((Cdr158 (CDR Cdr159)))
(IF (CONSP Cdr158)
(IF (NULL (CDR Cdr158))
(RETURN
(s (CAR V148) (simplify (CAR Cdr159))
(simplify (CAR Cdr158))))
(GO tag154))
(GO tag154)))
(GO tag154))))
tag154
(RETURN V148))))
(DEFUN s (V149 V150 V151)
(BLOCK NIL
(TAGBODY
(IF (EQ '+ V149)
(IF (AND (NUMBERP V150) (NUMBERP V151))
(RETURN (THE NUMBER (+ V150 V151)))
(IF (EQL 0 V150) (RETURN V151)
(IF (EQL 0 V151) (RETURN V150)
(IF (CONSP V151)
(LET ((Cdr170 (CDR V151)))
(IF (EQ '+ (CAR V151))
(IF (CONSP Cdr170)
(LET ((Cdr169 (CDR Cdr170)))
(IF (CONSP Cdr169)
(IF (NULL (CDR Cdr169))
(RETURN
(simplify
(CONS '+
(CONS
(LIST '+ V150
(CAR Cdr170))
Cdr169))))
(GO tag160))
(GO tag160)))
(GO tag160))
(GO tag160)))
(GO tag160))))))
tag160
(TAGBODY
(IF (EQ '* V149)
(IF (AND (NUMBERP V150) (NUMBERP V151))
(RETURN (THE NUMBER (* V150 V151)))
(IF (EQL 0 V150) (RETURN 0)
(IF (EQL 0 V151) (RETURN 0)
(IF (EQL 1 V151) (RETURN V150)
(IF (EQL 1 V150) (RETURN V151)
(IF (CONSP V151)
(LET ((Cdr183 (CDR V151)))
(IF (EQ '* (CAR V151))
(IF (CONSP Cdr183)
(LET ((Cdr182 (CDR Cdr183)))
(IF (CONSP Cdr182)
(IF (NULL (CDR Cdr182))
(RETURN
(simplify
(CONS '*
(CONS
(LIST '* V150
(CAR
Cdr183))
Cdr182))))
(GO tag171))
(GO tag171)))
(GO tag171))
(GO tag171)))
(GO tag171))))))))
tag171
(RETURN (LIST V149 V150 V151))))))
Language: OCaml
Author: Jon Harrop
Length: 15 lines

let rec ( +: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n +/ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> e
  | f, `Add(g, h) -> f +: g +: h
  | f, g -> `Add(f, g)

let rec ( *: ) f g = match f, g with
  | `Int n, `Int m -> `Int (n */ m)
  | `Int (Int 0), e | e, `Int (Int 0) -> `Int (Int 0)
  | `Int (Int 1), e | e, `Int (Int 1) -> e
  | f, `Mul(g, h) -> f *: g *: h
  | f, g -> `Mul(f, g)

let rec simplify = function
  | `Int _ | `Var _ as f -> f
  | `Add (f, g) -> simplify f +: simplify g
  | `Mul (f, g) -> simplify f *: simplify g
Language: Lisp
Author: Andre Thieme
Length: 23 lines

(defun simplify (a)
  (if (atom a)
      a
      (destructuring-bind (op x y) a
        (let* ((f (simplify x))
               (g (simplify y))
               (nf (numberp f))
               (ng (numberp g))
               (+? (eq '+ op))
               (*? (eq '* op)))
          (cond
            ((and +? nf ng) (+ f g))
            ((and +? nf (zerop f)) g)
            ((and +? ng (zerop g)) f)
            ((and (listp g) (eq op (first g)))
             (destructuring-bind (op2 u v) g
               (simplify `(,op (,op ,f ,u) ,v))))
            ((and *? nf ng) (* f g))
            ((and *? (or (and nf (zerop f))
                         (and ng (zerop g)))) 0)
            ((and *? nf (= 1 f)) g)
            ((and *? ng (= 1 g)) f)
            (t `(,op ,f ,g)))))))
Language: Lisp
Author: Nathan Froyd
Length: 39 lines

(defun simplify-no-redundant-checks (xexpr)
  (if (atom xexpr)
      xexpr
      (let ((op (first xexpr))
            (z (second xexpr))
            (y (third xexpr)))
        (let* ((f (simplify-no-redundant-checks z))
               (g (simplify-no-redundant-checks y))
               (nf (numberp f))
               (ng (numberp g)))
          (tagbody
           START
             (if (eq '+ op) (go OPTIMIZE-PLUS) (go TEST-MULTIPLY))
           OPTIMIZE-PLUS
             (when (and nf ng) (return-from simplify-no-redundant-checks (+ f g)))
           TEST-PLUS-ZEROS
             (when (eql f 0) (return-from simplify-no-redundant-checks g))
             (when (eql g 0) (return-from simplify-no-redundant-checks f))
             (go REARRANGE-EXPR)
           TEST-MULTIPLY
             (unless (eq '* op) (go REARRANGE-EXPR))
           OPTIMIZE-MULTIPLY
             (when (and nf ng) (return-from simplify-no-redundant-checks (* f g)))
           TEST-MULTIPLY-ZEROS-AND-ONES
             (when (or (eql f 0) (eql g 0)) (return-from simplify-no-redundant-checks 0))
             (when (eql f 1) (return-from simplify-no-redundant-checks g))
             (when (eql g 1) (return-from simplify-no-redundant-checks f))
           REARRANGE-EXPR
             (when (and (listp g) (eq op (first g)))
               (let ((op2 (first g))
                     (u (second g))
                     (v (third g)))
                 (declare (ignore op2))
                 (return-from simplify-no-redundant-checks
                   (simplify-no-redundant-checks (list op (list op f u) v)))))
           MAYBE-CONS-EXPR
             (if (and (eq f z) (eq g y))
                 (return-from simplify-no-redundant-checks xexpr)
                 (return-from simplify-no-redundant-checks (list op f g))))))))
Language: Lisp
Author: Pascal Costanza
Length: 25 lines

(defstruct add x y)
(defstruct mul x y)

(defgeneric simplify-add (x y)
  (:method ((x number) (y number)) (+ x y))
  (:method ((x (eql 0)) y) y)
  (:method (x (y (eql 0))) x)
  (:method (x (y add))
    (simplify-add (simplify-add x (add-x y)) (add-y y)))
  (:method (x y) (make-add :x x :y y)))

(defgeneric simplify-mul (x y)
  (:method ((x number) (y number)) (* x y))
  (:method ((x (eql 0)) y) 0)
  (:method (x (y (eql 0))) 0)
  (:method ((x (eql 1)) y) y)
  (:method (x (y (eql 1))) x)
  (:method (x (y mul))
    (simplify-mul (simplify-mul x (mul-x y)) (mul-y y)))
  (:method (x y) (make-mul :x x :y y)))

(defgeneric simplify (exp)
  (:method (exp) exp)
  (:method ((exp add))
    (simplify-add (simplify (add-x exp)) (simplify (add-y exp))))
  (:method ((exp mul))
    (simplify-mul (simplify (mul-x exp)) (simplify (mul-y exp)))))
Language: Lisp
Author: Dan Bensen
Length: 34 lines

(defmacro defop (func op opsymbl ident zerocase)
  (let ((e1 (gensym "E1"))
        (e2 (gensym "E2")))
    `(defun ,func (,e1 ,e2)
       (declare (optimize (speed 3)))
       ,(delete :nocase
          `(case ,e1
             ,zerocase
             (,ident ,e2)
             (t ,(delete :nocase
                   `(case ,e2
                      ,zerocase
                      (,ident ,e1)
                      (t (cond
                           ((and (rationalp ,e1)
                                 (rationalp ,e2))
                            (,op ,e1 ,e2))
                           ((atom ,e2)
                            (list ,e1 ,opsymbl ,e2))
                           (t (case (cadr ,e2)
                                (,opsymbl (,func (,func ,e1 (car ,e2))
                                                 (caddr ,e2)))
                                (t (list ,e1 ,opsymbl ,e2))))))))))))))

(defop apply+ + '+ 0 :nocase)
(defop apply* * '* 1 ( 0 0 ))

(defun simplify (expr)
  (if (atom expr)
      expr
      (let ((e1 (simplify (car expr)))
            (e2 (simplify (caddr expr))))
        (case (cadr expr)
          ('+ (apply+ e1 e2))
          ('* (apply* e1 e2))))))
Monday, 29 April 2019
The “Blub Paradox” and C++
Here is another of my answers from Stack Overflow that is getting downvotes:
is there some powerful language feature or idiom that you make use of in a language that would be hard to conceptualize or implement if you were writing only in c++? Are there any useful concepts or techniques that you have encountered in other languages that you would have found difficult to conceptualize had you been writing or "thinking" in c++?
C++ makes many approaches intractable. I would go so far as to say that most of programming is hard to conceptualize if you limit yourself to C++. Here are some examples of problems that are much more easily solved in ways that C++ makes hard.
Register allocation and calling conventions
Many people think of C++ as a bare metal low level language but it really isn't. By abstracting away important details of the machine, C++ makes it hard to conceptualize practicalities like register allocation and calling conventions.
To learn about concepts like these I recommend having a go at some assembly language programming and check out this article about ARM code generation quality.
Runtime code generation
If you only know C++ then you probably think that templates are the be-all and end-all of metaprogramming. They aren't. In fact, they are an objectively bad tool for metaprogramming. Any program that manipulates another program is a metaprogram, including interpreters, compilers, computer algebra systems and theorem provers. Runtime code generation is a useful feature for this.
I recommend firing up a Scheme implementation and playing with EVAL to learn about metacircular evaluation.
Manipulating trees
Trees are everywhere in programming. In parsing you have abstract syntax trees. In compilers you have IRs that are trees. In graphics and GUI programming you have scene trees.
This "Ridiculously Simple JSON Parser for C++" weighs in at just 484 LOC which is very small for C++. Now compare it with my own simple JSON parser which weighs in at just 60 LOC of F#. The difference is primarily because ML's algebraic datatypes and pattern matching (including active patterns) make it vastly easier to manipulate trees.
Check out redblack trees in OCaml too.
Purely functional data structures
Lack of GC in C++ makes it practically impossible to adopt some useful approaches. Purely functional data structures are one such tool.
For example, check out this 47-line regular expression matcher in OCaml. The brevity is due largely to the extensive use of purely functional data structures, in particular the use of dictionaries with keys that are sets. That is really hard to do in C++ because the stdlib dictionaries and sets are all mutable, but you cannot mutate a dictionary's keys or you break the collection.
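A set can safely serve as a map key in OCaml precisely because both Set and Map are purely functional: inserting into a set returns a new set and can never mutate a key already inside a map. A minimal sketch with my own hypothetical names (the regex matcher maps sets of NFA states to DFA states; here only the data structure is shown):

```ocaml
(* Immutable integer sets, and maps whose keys are those sets.
   Set.Make's output module is itself ordered, so it qualifies as
   a key type for Map.Make. *)
module IntSet = Set.Make (Int)
module SetMap = Map.Make (IntSet)

let s1 = IntSet.of_list [ 1; 2; 3 ]
let s2 = IntSet.of_list [ 2; 3 ]

(* e.g. sets of NFA states mapped to labels of DFA states *)
let m = SetMap.empty |> SetMap.add s1 "A" |> SetMap.add s2 "B"

let () =
  assert (SetMap.find s1 m = "A");
  (* Keys compare structurally: the same elements find the same
     binding, and nothing can mutate a key after insertion. *)
  assert (SetMap.find (IntSet.of_list [ 3; 2 ]) m = "B")
```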
Logic programming and undo buffers are other practical examples where purely functional data structures make something that is hard in C++ really easy in other languages.
Tail calls
Not only does C++ not guarantee tail calls but RAII is fundamentally at odds with them because destructors get in the way of a call in tail position. Tail calls let you make an unbounded number of function calls using only a bounded amount of stack space. This is great for implementing state machines, including extensible state machines, and it is a great "get out of jail free" card in many otherwise-awkward circumstances.
For example, check out this implementation of the 0-1 knapsack problem using continuation-passing style with memoization in F# from the finance industry. When you have tail calls, continuation passing style can be an obvious solution but C++ makes it intractable.
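As a toy illustration of the technique (my own example, not the knapsack code): with guaranteed tail calls, mutually recursive functions encode a state machine that runs in constant stack space because every transition is a tail call. In OCaml:

```ocaml
(* A two-state machine encoded as mutually tail-recursive functions.
   Every transition is a call in tail position, so the compiler
   reuses the stack frame and millions of transitions fit in
   constant space. *)
let rec even_state n = if n = 0 then true else odd_state (n - 1)
and odd_state n = if n = 0 then false else even_state (n - 1)

let () =
  (* Ten million nested calls: this would overflow the stack in a
     language without guaranteed tail calls. *)
  assert (even_state 10_000_000)
```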
Concurrency
Another obvious example is concurrent programming. Although this is entirely possible in C++ it is extremely error prone compared to other tools, most notably communicating sequential processes as seen in languages like Erlang, Scala and F#.
Monday, 22 April 2019
In terms of performance speed, is Swift faster than Java?
This is an old 2017 answer of mine from Quora that has been deleted by moderators for violating their rules, kept here for historical interest:
Many people here are claiming "yes" that Swift is fast. However, all of the benchmark results I can find indicate that Swift is much slower than most other languages, up to 24x slower than C++:
 n-body: Swift 24.09s vs Java 22.6s.
 fannkuch-redux: Swift 59.5s vs Java 17.41s. Swift is 3.4x slower than Java.
 spectral-norm: Swift 15.7s vs Java 4.28s. Swift is 3.7x slower than Java.
EDIT: The shootout appears to have been updated. The new Swift code is still memory unsafe and still compiled in memory unsafe -Ounchecked mode. Swift now beats Java on 3/7 tasks (fannkuch-redux, mandelbrot and binary-trees), roughly draws on two (n-body and fasta-redux) and loses on two (fasta and spectral-norm). Shame there is no memory-safe Swift for comparison. I'd also note that many of these benchmarks are apples vs oranges comparisons with, for example, the Java and Swift implementations of binary-trees using completely different algorithms and data structures.
Other (non-Java) benchmarks found:
 Swift running 11x slower than C because "The dynamic memory allocation turned out to be a major performance killer" in C vs Swift 1.2 and Getting Killed
 "Swift is 4x slower than Objective C" Mobile App Performance
 "Swift is 24x slower than C++" Swift, C++ Performance
 Swift serializing JSON 15x slower than Objective C in Swift Performance: Too Slow For Production
 Memory-safe Swift running 22x slower than C in Swift v1.2 β2 Performance.
 Swift ran up to 10,000x slower than C++ in Swift performance: sorting arrays, although memory-safe Swift is now (Xcode 6 beta 5) running as fast as C there with ordinary optimisations enabled.
My impression is that Swift was vastly slower than almost all other languages at many common tasks when it was first released but it has improved very rapidly. However, it still seems to be several times slower than C++ in many cases and, therefore, I expect it is still significantly slower than languages like F# and Scala.
I am also concerned about many of the benchmarks being used. The shootout is notoriously bad science, with submissions subjectively "deoptimised" by its owner for being too fast, rendering the results total garbage. Numerical benchmarks like DGEMM and FFTs are largely irrelevant. I am much more interested in symbolic performance, not just because that is more relevant to the code I write but because it will stress Swift's reference counting garbage collection, which I suspect will be its Achilles heel. The JSON serialization benchmark was very interesting to me as a consequence. I am also disturbed by the use of optimisations that remove memory safety in Swift (-Ounchecked) and the use of inaccurate numerics in C++ (-ffast-math).
A post on Hacker News (Swift Performance: Too Slow for Production) says that shortlived objects in the JSON serializer are to blame for the poor performance. In other words, the reference counted memory management is the problem as I suspected.