Friday, 4 May 2012

Speeds of different kinds of "functions" in Mathematica


Note that you are never really "executing functions" in Mathematica. You are just applying rewrite rules to change one expression into another.
Consider mapping the Sqrt function over a packed array of floating point numbers. The fastest solution in Mathematica is to apply the Sqrt function directly to the packed array because it happens to implement exactly what we want and is optimized for this special case:
In[1] := N@Range[100000];
In[2] := Sqrt[xs]; // AbsoluteTiming
Out[2] = {0.0060000, Null}
We might define a global rewrite rule that has terms of the form sqrt[x] rewritten to Sqrt[x] such that the square root will be calculated:
In[3] := Clear[sqrt];
         sqrt[x_] := Sqrt[x];
         Map[sqrt, xs]; // AbsoluteTiming
Out[3] = {0.4800007, Null}
Note that this is ~100× slower than the previous solution.
Alternatively, we might define a global rewrite rule that replaces the symbol sqrt with a lambda function that invokes Sqrt:
In[4] := Clear[sqrt];
         sqrt = Function[{x}, Sqrt[x]];
         Map[sqrt, xs]; // AbsoluteTiming
Out[4] = {0.0500000, Null}
Note that this is ~10× faster than the previous solution.
Why? Because the slow second solution is looking up the rewrite rule sqrt[x_] :> Sqrt[x] in the inner loop (for each element of the array) whereas the fast third solution looks up the valueFunction[...] of the symbol sqrt once and then applies that lambda function repeatedly. In contrast, the fastest first solution is a loop calling sqrt written in C. So searching the global rewrite rules is extremely expensive and term rewriting is expensive.
If so, why is Sqrt ever fast? You might expect a 2× slowdown instead of 10× because we've replaced one lookup for Sqrt with two lookups for sqrt and Sqrt in the inner loop but this is not so becauseSqrt has the special status of being a built-in function that will be matched in the core of the Mathematica term rewriter itself rather than via the general-purpose global rewrite table.
Other people have described much smaller performance differences between similar functions. I believe the performance differences in those cases are just minor differences in the exact implementation of Mathematica's internals. The biggest issue with Mathematica is the global rewrite table. In particular, this is where Mathematica diverges from traditional term-level interpreters.
You can learn a lot about Mathematica's performance by writing mini Mathematica implementations. In this case, the above solutions might be compiled to (for example) F#. The array may be created like this:
> let xs = [|1.0..100000.0|];;
...
The built-in sqrt function can be converted into a closure and given to the map function like this:
> Array.map sqrt xs;;
Real: 00:00:00.006, CPU: 00:00:00.015, GC gen0: 0, gen1: 0, gen2: 0
...
This takes 6ms just like Sqrt[xs] in Mathematica. But that is to be expected because this code has been JIT compiled down to machine code by .NET for fast evaluation.
Looking up rewrite rules in Mathematica's global rewrite table is similar to looking up the closure in a dictionary keyed on its function name. Such a dictionary can be constructed like this in F#:
> open System.Collections.Generic;;
> let fns = Dictionary<string, (obj -> obj)>(dict["sqrt", unbox >> sqrt >> box]);;
This is similar to the DownValues data structure in Mathematica, except that we aren't searching multiple resulting rules for the first to match on the function arguments.
The program then becomes:
> Array.map (fun x -> fns.["sqrt"] (box x)) xs;;
Real: 00:00:00.044, CPU: 00:00:00.031, GC gen0: 0, gen1: 0, gen2: 0
...
Note that we get a similar 10× performance degradation due to the hash table lookup in the inner loop.
An alternative would be to store the DownValues associated with a symbol in the symbol itself in order to avoid the hash table lookup.
We can even write a complete term rewriter in just a few lines of code. Terms may be expressed as values of the following type:
> type expr =
    | Float of float
    | Symbol of string
    | Packed of float []
    | Apply of expr * expr [];;
Note that Packed implements Mathematica's packed lists, i.e. unboxed arrays.
The following init function constructs a List with n elements using the function f, returning aPacked if every return value was a Float or a more general Apply(Symbol "List", ...)otherwise:
> let init n f =
    let rec packed ys i =
      if i=n then Packed ys else
        match f i with
        | Float y ->
            ys.[i] <- y
            packed ys (i+1)
        | y ->
            Apply(Symbol "List", Array.init n (fun j ->
              if j<i then Float ys.[i]
              elif j=i then y
              else f j))
    packed (Array.zeroCreate n) 0;;
val init : int -> (int -> expr) -> expr
The following rule function uses pattern matching to identify expressions that it can understand and replaces them with other expressions:
> let rec rule = function
    | Apply(Symbol "Sqrt", [|Float x|]) ->
        Float(sqrt x)
    | Apply(Symbol "Map", [|f; Packed xs|]) ->
        init xs.Length (fun i -> rule(Apply(f, [|Float xs.[i]|])))
    | f -> f;;
val rule : expr -> expr
Note that the type of this function expr -> expr is characteristic of term rewriting: rewriting replaces expressions with other expressions rather than reducing them to values.
Our program can now be defined and executed by our custom term rewriter:
> rule (Apply(Symbol "Map", [|Symbol "Sqrt"; Packed xs|]));;
Real: 00:00:00.049, CPU: 00:00:00.046, GC gen0: 24, gen1: 0, gen2: 0
We've recovered the performance of Map[Sqrt, xs] in Mathematica!
We can even recover the performance of Sqrt[xs] by adding an appropriate rule:
| Apply(Symbol "Sqrt", [|Packed xs|]) ->
    Packed(Array.map sqrt xs)
I wrote an article on term rewriting in F#.

No comments:

Post a Comment