## Beautiful differentiation

Lately I’ve been playing again with parametric surfaces in Haskell. Surface rendering requires normals, which can be constructed from partial derivatives, which brings up automatic differentiation (AD). Playing with some refactoring, I’ve stumbled across a terser, lovelier formulation for the derivative rules than I’ve seen before.

Edits:

• 2008-05-08: Added source files: NumInstances.hs and Dif.hs.
• 2008-05-20: Changed some variable names for clarity and consistency. For instance, `x@(D x0 x')` instead of `p@(D x x')`.
• 2008-05-20: Removed extraneous `Fractional` constraint in the `Floating` instance of `Dif`.

### Automatic differentiation

The idea of AD is to simultaneously manipulate values and derivatives. Overloading of the standard numerical operations (and literals) makes this combined manipulation as simple and pretty as manipulating values without derivatives.

In Functional Differentiation of Computer Programs, Jerzy Karczmarczuk extended the usual trick to a “lazy tower of derivatives”. He exploited Haskell’s laziness to carry infinitely many derivatives, rather than just one. Lennart Augustsson’s AD post contains a summary of Jerzy’s idea and an application. I’ll use some of the details from Lennart’s version, for simplicity.

For some perspectives on the mathematical structuure under AD, see sigfpe’s AD post, and Non-standard analysis, automatic differentiation, Haskell, and other stories.

The tower of derivatives can be represented as an infinite list. Since we’ll use operator overloadings that are not meaningful for lists in general, let’s instead define a new data type:

``````data Dif a = D a (Dif a)
``````

Given a function `f :: a -> Dif b`, `f a` has the form `D x (D x' (D x'' ...))`, where `x` is the value at `a`, and `x'`, `x''` …, are the derivatives (first, second, …) at `a`.

Constant functions have all derivatives equal to zero.

``````dConst :: Num a => a -> Dif a
dConst x0 = D x0 dZero

dZero :: Num a => Dif a
dZero = D 0 dZero
``````

``````instance Num a => Num (Dif a) where
fromInteger               = dConst . fromInteger
D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
``````

In each of the right-hand sides of these last three definitions, the first argument to `D` is constructed using `Num a`, while the second argument is recursively constructed using `Num (Dif a)`.

Jerzy’s paper uses a function to provide all of the derivatives of a given function (called `dlift` from Section 3.3):

``````lift :: Num a => [a -> a] -> Dif a -> Dif a
lift (f : f') p@(D x x') = D (f x) (x' * lift f' p)
``````

The given list of functions are all of the derivatives of a given function. Then, derivative towers can be constructed by definitions like the following:

``````instance Floating a => Floating (Dif a) where
pi               = dConst pi
exp (D x x')     = r where r = D (exp x) (x' * r)
log p@(D x x')   = D (log x) (x' / p)
sqrt (D x x')    = r where r = D (sqrt x) (x' / (2 * r))
sin              = lift (cycle [sin, cos, negate . sin, negate . cos])
cos              = lift (cycle [cos, negate . sin, negate . cos, sin])
asin p@(D x x')  = D (asin x) ( x' / sqrt(1 - sqr p))
acos p@(D x x')  = D (acos x) (-x' / sqrt(1 - sqr p))
atan p@(D x x')  = D (atan x) ( x' / (sqr p - 1))

sqr :: Num a => a -> a
sqr x = x*x
``````

### Reintroducing the chain rule

The code above, which corresponds to section 3 of Jerzy’s paper, is fairly compact. It can be made prettier, however, which is the point of this blog post.

First, let’s simplify the `lift` so that it expresses the chain rule directly. In fact, this definition is just like `dlift` from Section 2 (not Section 3) of Jerzy’s paper. It’s the same code, but at a different type, here being used to manipulate infinite derivative towers instead of just value and derivative.

``````dlift :: Num a => (a -> a) -> (Dif a -> Dif a) -> Dif a -> Dif a
dlift f f' =  u@(D u0 u') -> D (f u0) (f' u * u')
``````

This operator lets us write simpler definitions.

``````instance Floating a => Floating (Dif a) where
pi    = dConst pi
exp   = dlift exp exp
log   = dlift log recip
sqrt  = dlift sqrt (recip . (2*) . sqrt)
sin   = dlift sin cos
cos   = dlift cos (negate . sin)
asin  = dlift asin ( x -> recip (sqrt (1 - sqr x)))
acos  = dlift acos ( x -> - recip (sqrt (1 - sqr x)))
atan  = dlift atan ( x -> recip (sqr x + 1))
sinh  = dlift sinh cosh
cosh  = dlift cosh sinh
asinh = dlift asinh ( x -> recip (sqrt (sqr x + 1)))
acosh = dlift acosh ( x -> - recip (sqrt (sqr x - 1)))
atanh = dlift atanh ( x -> recip (1 - sqr x))
``````

The necessary recursion has moved out of the lifting function into the class instance (second argument to `dlift`).

Notice that `dlift` and the `Floating` instance are the same code (with minor variations) as in Jerzy’s section two. In that section, however, the code computes only first derivatives, while here, we’re computing all of them.

The last steps are cosmetic. The goal is to make the derivative functions used with `lift` easier to read and write.

Just as we’ve overloaded numeric operations for derivative towers (`Dif`), let’s also overload them for functions. This trick is often used informally in math. For instance, given functions `f` and `g`, one might write `f + g` to mean `x -> f x + g x`. Using applicative functor notation makes these instances a breeze to define:

``````instance Num b => Num (a->b) where
fromInteger = pure . fromInteger
(+)         = liftA2 (+)
(*)         = liftA2 (*)
negate      = fmap negate
abs         = fmap abs
signum      = fmap signum
``````

The other numeric class instances are analogous. (Any applicative functor can be given these same instance definitions.)

As a final touch, define an infix operator to replace the name “`dlift`“:

``````infix 0 >-<
(>-<) = dlift
``````

Now the complete code:

``````instance Num a => Num (Dif a) where
fromInteger               = dConst . fromInteger
D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')

negate = negate >-< -1
abs    = abs    >-< signum
signum = signum >-< 0

instance Fractional a => Fractional (Dif a) where
fromRational = dConst . fromRational
recip        = recip >-< - sqr recip

instance Floating a => Floating (Dif a) where
pi    = dConst pi
exp   = exp   >-< exp
log   = log   >-< recip
sqrt  = sqrt  >-< recip (2 * sqrt)
sin   = sin   >-< cos
cos   = cos   >-< - sin
sinh  = sinh  >-< cosh
cosh  = cosh  >-< sinh
asin  = asin  >-< recip (sqrt (1-sqr))
acos  = acos  >-< recip (- sqrt (1-sqr))
atan  = atan  >-< recip (1+sqr)
asinh = asinh >-< recip (sqrt (1+sqr))
acosh = acosh >-< recip (- sqrt (sqr-1))
atanh = atanh >-< recip (1-sqr)
``````

The operators and literals on the right of the `(>-<)` are overloaded for the type `Dif a -> Dif a`. For instance, in the definition of `sqrt`,

``````2     :: Dif a -> Dif a
recip :: (Dif a -> Dif a) -> (Dif a -> Dif a)
(*)   :: (Dif a -> Dif a) -> (Dif a -> Dif a)
-> (Dif a -> Dif a)
``````

### Try it

You can try out this code yourself. Just grab the source files: NumInstances.hs and Dif.hs. Enjoy!

1. #### Punya:

Abelson and Sussman’s Structure and Interpretation of Classical Mechanics has a Scheme implementation of this idea, which they use to implement an environment for symbolically and numerically manipulating physical systems.

2. #### Jaak:

It’s pretty and all, but either you have something missing or something wrong. Because the code doesn’t simply work (lots of complaints along the lines of “Could not deduce…” and “Couldn’t match expected type…”), something funky seems to be going on with dlift. I’d appreciate full working code.

• Jaak
3. #### Perehene:

Very interesting post. I am a newbie in haskell and I wanted to give it a try, but I get a bunch of errors in ghci. Could you put the source code and the required options to use it in ghc? Thanks a lot!

4. #### conal:

Jaak & Perehene: I’ve added source files: NumInstances.hs and Dif.hs. Enjoy!

5. #### Nikolay:

multiplication may pull x’ and y’ while result is just requested for (D (x*y) _ _), I think

6. #### Conal Elliott » Blog Archive » Higher-dimensional, higher-order derivatives, functionally:

[…] post Beautiful differentiation showed some lovely code that makes it easy to compute not just the values of user-written […]

7. #### Samuel Bronson:

What’s with the funky `"’"` characters??? Why not normal `"'"`?

8. #### Conal Elliott » Blog Archive » What is automatic differentiation, and why does it work?:

[…] can also be made prettier, as in Beautiful differentiation. Add an operator that captures the chain rule, which is behind the differentiation laws listed […]

9. #### Conal Elliott » Blog Archive » Why program with continuous time?:

[…] has been integral to FRP since the first incarnation as ActiveVRML. I’ve written several things recently about denotational […]

10. #### Conal Elliott » Blog Archive » Is program proving viable and useful?:

[…] For more examples of formally simple specifications, see the papers Denotational design with type class morphisms and Beautiful differentiation. […]

11. #### Conal Elliott » Blog Archive » Garbage collecting the semantics of FRP:

[…] interactive behaviors are some sort of function with all of its derivatives. See Beautiful differentiation for an specification and derived implementation of numeric operations, and more generally of […]

12. #### Aleksandar Bakic:

I believe Calculus in Coinductive Form is relevant.