## Friday, 4 May 2012

### Speeds of different kinds of "functions" in Mathematica

Note that you are never really "executing functions" in Mathematica. You are just applying rewrite rules to change one expression into another.
Consider mapping the `Sqrt` function over a packed array of floating point numbers. The fastest solution in Mathematica is to apply the `Sqrt` function directly to the packed array because it happens to implement exactly what we want and is optimized for this special case:
``````In := N@Range;
In := Sqrt[xs]; // AbsoluteTiming
Out = {0.0060000, Null}
``````
We might define a global rewrite rule that has terms of the form `sqrt[x]` rewritten to `Sqrt[x]` such that the square root will be calculated:
``````In := Clear[sqrt];
sqrt[x_] := Sqrt[x];
Map[sqrt, xs]; // AbsoluteTiming
Out = {0.4800007, Null}
``````
Note that this is ~100× slower than the previous solution.
Alternatively, we might define a global rewrite rule that replaces the symbol `sqrt` with a lambda function that invokes `Sqrt`:
``````In := Clear[sqrt];
sqrt = Function[{x}, Sqrt[x]];
Map[sqrt, xs]; // AbsoluteTiming
Out = {0.0500000, Null}
``````
Note that this is ~10× faster than the previous solution.
Why? Because the slow second solution is looking up the rewrite rule `sqrt[x_] :> Sqrt[x]` in the inner loop (for each element of the array) whereas the fast third solution looks up the value`Function[...]` of the symbol `sqrt` once and then applies that lambda function repeatedly. In contrast, the fastest first solution is a loop calling `sqrt` written in C. So searching the global rewrite rules is extremely expensive and term rewriting is expensive.
If so, why is `Sqrt` ever fast? You might expect a 2× slowdown instead of 10× because we've replaced one lookup for `Sqrt` with two lookups for `sqrt` and `Sqrt` in the inner loop but this is not so because`Sqrt` has the special status of being a built-in function that will be matched in the core of the Mathematica term rewriter itself rather than via the general-purpose global rewrite table.
Other people have described much smaller performance differences between similar functions. I believe the performance differences in those cases are just minor differences in the exact implementation of Mathematica's internals. The biggest issue with Mathematica is the global rewrite table. In particular, this is where Mathematica diverges from traditional term-level interpreters.
You can learn a lot about Mathematica's performance by writing mini Mathematica implementations. In this case, the above solutions might be compiled to (for example) F#. The array may be created like this:
``````> let xs = [|1.0..100000.0|];;
...
``````
The built-in `sqrt` function can be converted into a closure and given to the `map` function like this:
``````> Array.map sqrt xs;;
Real: 00:00:00.006, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
...
``````
This takes 6ms just like `Sqrt[xs]` in Mathematica. But that is to be expected because this code has been JIT compiled down to machine code by .NET for fast evaluation.
Looking up rewrite rules in Mathematica's global rewrite table is similar to looking up the closure in a dictionary keyed on its function name. Such a dictionary can be constructed like this in F#:
``````> open System.Collections.Generic;;
> let fns = Dictionary<string, (obj -> obj)>(dict["sqrt", unbox >> sqrt >> box]);;
``````
This is similar to the `DownValues` data structure in Mathematica, except that we aren't searching multiple resulting rules for the first to match on the function arguments.
The program then becomes:
``````> Array.map (fun x -> fns.["sqrt"] (box x)) xs;;
Real: 00:00:00.044, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
...
``````
Note that we get a similar 10× performance degradation due to the hash table lookup in the inner loop.
An alternative would be to store the `DownValues` associated with a symbol in the symbol itself in order to avoid the hash table lookup.
We can even write a complete term rewriter in just a few lines of code. Terms may be expressed as values of the following type:
``````> type expr =
| Float of float
| Symbol of string
| Packed of float []
| Apply of expr * expr [];;
``````
Note that `Packed` implements Mathematica's packed lists, i.e. unboxed arrays.
The following `init` function constructs a `List` with `n` elements using the function `f`, returning a`Packed` if every return value was a `Float` or a more general `Apply(Symbol "List", ...)`otherwise:
``````> let init n f =
let rec packed ys i =
if i=n then Packed ys else
match f i with
| Float y ->
ys.[i] <- y
packed ys (i+1)
| y ->
Apply(Symbol "List", Array.init n (fun j ->
if j<i then Float ys.[i]
elif j=i then y
else f j))
packed (Array.zeroCreate n) 0;;
val init : int -> (int -> expr) -> expr``````
The following `rule` function uses pattern matching to identify expressions that it can understand and replaces them with other expressions:
``````> let rec rule = function
| Apply(Symbol "Sqrt", [|Float x|]) ->
Float(sqrt x)
| Apply(Symbol "Map", [|f; Packed xs|]) ->
init xs.Length (fun i -> rule(Apply(f, [|Float xs.[i]|])))
| f -> f;;
val rule : expr -> expr``````
Note that the type of this function `expr -> expr` is characteristic of term rewriting: rewriting replaces expressions with other expressions rather than reducing them to values.
Our program can now be defined and executed by our custom term rewriter:
``````> rule (Apply(Symbol "Map", [|Symbol "Sqrt"; Packed xs|]));;
Real: 00:00:00.049, CPU: 00:00:00.046, GC gen0: 24, gen1: 0, gen2: 0
``````
We've recovered the performance of `Map[Sqrt, xs]` in Mathematica!
We can even recover the performance of `Sqrt[xs]` by adding an appropriate rule:
``````| Apply(Symbol "Sqrt", [|Packed xs|]) ->
Packed(Array.map sqrt xs)
``````
I wrote an article on term rewriting in F#.