Higher-dimensional, higher-order derivatives, functionally

The post Beautiful differentiation showed some lovely code that makes it easy to compute not just the values of user-written functions, but also all of its derivatives (infinitely many). This elegant technique is limited, however, to functions over a scalar (one-dimensional) domain. Next, we explored what it means to transcend that limitation, asking and answering the question What is a derivative, really? The answer to that question is that derivative values are linear maps saying how small input changes result in output changes. This answer allows us to unify several different notions of derivatives and their corresponding chain rules into a single simple and powerful form.

This third post combines the ideas from the two previous posts, to easily compute infinitely many derivatives of functions over arbitrary-dimensional domains.

The code shown here is part of a new Haskell library, which you can download and play with or peruse on the web.

The general setting: vector spaces

Linear maps (transformations) lie at the heart of the generalized idea of derivative described earlier. Talking about linearity requires a few simple operations, which are encapsulated in the the abstract interface known from math as a vector space.

A vector space v has an associated type s of scalar values (a field) and a set of operations. In Haskell,

class VectorSpace v s | v -> s where
  zeroV   :: v              -- the zero vector
  (*^)    :: s -> v -> v    -- scale a vector
  (^+^)   :: v -> v -> v    -- add vectors
  negateV :: v -> v         -- additive inverse

In many cases, we’ll want to add inner (dot) products as well, to form an inner product space:

class VectorSpace v s => InnerSpace v s | v -> s where
  (<.>) :: v -> v -> s

Several other useful operations can be defined in terms of these five methods. For instance, vector subtraction and linear interpolation for vector spaces, and magnitude and normalization (rescaling to unit length) for inner product spaces. The vector-space library defines instances for Float, Double, and Complex, as well as pairs, triples, and quadruples of vectors, and functions with vector ranges. (By “vector” here, I mean any instance of VectorSpace, recursively).

It’s pretty easy to define new instances of your own. For instance, here is the library’s definition of functions as vector spaces, using the same techniques as before:

instance VectorSpace v s => VectorSpace (a->v) s where
  zeroV   = pure   zeroV
  (*^) s  = fmap   (s *^)
  (^+^)   = liftA2 (^+^)
  negateV = fmap   negateV

Linear transformations could perhaps be defined as an abstract data type, with primitives and a composition operator. I don’t know how to provide enough primitives for all possibly types of interest. I also played with linear maps as a type family, indexed on the domain or range type, but it didn’t quite work out for me. For now, I’ll simply represent a linear map as a function, define a type synonym as reminder of intention:

type a :-* b = a -> b       -- linear map

This definition makes some things quite convenient. Function composition, (.), implements linear map composition. The function VectorSpace instance (above) gives the customary meaning for linear maps as vector spaces. Like, (->), this new (:-*) operator is right-associative, so a :-* b :-* c means a :-* (b :-* c).

Derivative towers

A derivative tower contains a value and all derivatives of a function at a point. Previously, I’d suggested the following type for derivative towers.

data a :> b = D b (a :> (a :-* b))   -- old definition

The values in one of these towers have types b, a :-> b, a :-> a :-> b, …. So, for instance, a second derivative value is a linear map from a to linear maps from a to b. (Uncurrying a second derivative yields a bilinear map.)

Since making this suggestion, I’ve gotten simpler code using the following variation, which I’ll use instead:

data a :> b = D b (a :-* (a :> b))

Now a tower value is a regular value, plus a linear map that yields a tower for the derivative.

We can also write this second version more simply, without the linearity reminder:

data a :> b = D b (a :~> b)

where a :~> b is the type of infinitely differentiable functions, represented as a function that produces a derivative tower:

type a :~> b = a -> (a :> b)


As in Beautiful differentiation, constant functions have all derivatives equal to zero:

dConst :: VectorSpace b s => b -> a:>b
dConst b = b `D` const dZero
dZero :: VectorSpace b s => a:>b
dZero = dConst zeroV

Note the use of the standard Haskell function const, which makes constant functions (always returning the same value). Also, the use of the zero vector required me to use a VectorSpace constraint in the type signature. (I could have used 0 and Num instead, but Num requires more methods and so is less general than VectorSpace.)

The differentiable identity function plays a very important role. Its towers are sometimes called “the derivation variable” or similar, but it’s a not really a variable. The definition is quite terse:

dId :: VectorSpace u s => u :~> u
dId u = D u (\ du -> dConst du)

What’s going on here? The differentiable identity function, dId, takes an argument u and yields a tower. The regular value (the 0th derivative) is simply the argument u, as one would expect from an identity function. The derivative (a linear map) turns a tiny input offset, du, to a resulting output offset, which is also du (also as expected from an identity function). The higher derivatives are all zero, so our first derivative tower is dConst du.

Linear functions

Returning, for a few moments, to thinking of derivatives as numbers, let’s consider about the function f = \ x -> m * x + b for some values m and b. We’d usually say that the derivative of f is equal to m everywhere, and indeed f can be interpreted as a line with (constant) slope m and y-intercept b. In the language of linear algebra, the function f is affine in general, and is (more specifically) linear only when b == 0.

In the generalized view of derivatives as linear maps, we say instead that the derivative is \ x -> m * x. The derivative everywhere is almost the same as f itself. If we take b == 0 (so that f is linear and not just affine), then the derivative of f is exactly f, everywhere! Consequently, its higher derivatives are all zero.

In the generalized view of derivatives as linear maps, this relationship always holds. The derivative of a linear function f is f everywhere. We can encapsulate this general property as a utility function:

linearD :: VectorSpace v s => (u :-* v) -> (u :~> v)
linearD f u = D (f u) (\ du -> dConst (f du))

The dConst here sets up all of the higher derivatives to be zero. This definition can also be written more succinctly:

linearD f u = D (f u) (dConst . f)

You may have noticed a similarity between this discussion of linear functions and the identity function above. This similarity is more than coincidental, because the identity function is linear. With this insight, we can write a more compact definition for dId, replacing the one above:

dId = linearD id

As other examples of linear functions, here are differentiable versions of the functions fst and snd, which extract element from a pair.

fstD :: VectorSpace a s => (a,b) :~> a
fstD = linearD fst
sndD :: VectorSpace b s => (a,b) :~> b
sndD = linearD snd

Numeric operations

Numeric operations can be specified much as they were previously. First, those definition again (with variable names changed),

instance Num b => Num (Dif b) where
  fromInteger               = dConst . fromInteger
  D u0 u' + D v0 v'         = D (u0 + v0) (u' + v')
  D u0 u' - D v0 v'         = D (u0 - v0) (u' - v')
  u@(D u0 u') * v@(D v0 v') = D (u0 * v0) (u' * v + u * v')

Now the new definition:

instance (Num b, VectorSpace b b) => Num (a:>b) where
  fromInteger               = dConst . fromInteger
  D u0 u' + D v0 v'         = D (u0 + v0) (u' + v')
  D u0 u' - D v0 v'         = D (u0 - v0) (u' - v')
  u@(D u0 u') * v@(D v0 v') =
    D (u0 * v0) (\ da -> (u * v' da) + (u' da * v))

The main change shows up in multiplication. It is no longer meaningful to write something like u' * v, because u' :: b :-* (a :> b), while v :: a :> b. Instead, v' gets applied to the small change in input before multiplying by u. Likewise, u' gets applied to the small change in input before multiplying by v.

The same sort of change has happened silently in the sum and difference cases, but are hidden by the numeric overloadings provided for functions. Written more explicitly:

  D u0 u' + D v0 v' = D (u0 + v0) (\ da -> u' da + v' da)

By the way, a bit of magic can also hide the “\ da -> ...” in the definition of multiplication:

  u@(D u0 u') * v@(D v0 v') = D (u0 * v0) ((u *) . v' + (* v) . u')

The derivative part can be deciphered as follows: transform (the input change) by v' and then pre-multiply by u; transform (the input change) by u' and then post-multiply by v; and add the result. If this sort of wizardry isn’t your game, forget about it and use the more explicit form.

Composition — the chain rule

Here’s the chain rule we used earlier.

(>-<) :: (Num a) => (a -> a) -> (Dif a -> Dif a) -> (Dif a -> Dif a)
f >-< f' = \ u@(D u0 u') -> D (f u0) (f' u * u')

The new one differs just slightly:

(>-<) :: VectorSpace u s =>
         (u -> u) -> ((a :> u) -> (a :> s)) -> (a :> u) -> (a :> u)
f >-< f' = \ u@(D u0 u') -> D (f u0) (\ da -> f' u *^ u' da)

Or we can hide the da, as with multiplication:

f >-< f' = \ u@(D u0 u') -> D (f u0) ((f' u *^) . u')

With this change, all of the method definitions in Beautiful differentiation work as before, with only the For instance,

instance (Fractional b, VectorSpace b b) => Fractional (a:>b) where
  fromRational = dConst . fromRational
  recip        = recip >-< recip sqr

See the library for details.

The chain rule pure and simple

The (>-<) operator above is specialized form of the chain rule that is convenient for automatic differentiation. In its simplest and most general form, the chain rule says

deriv (f . g) x = deriv f (g x) . deriv g x

The composition on the right hand side is on linear maps (derivatives). You may be used to seeing the chain rule in one or more of its specialized forms, using some form of product (scalar/scalar, scalar/vector, vector/vector dot, matrix/vector) instead of composition. Those forms all mean the same as this general case, but are defined on various representations of linear maps, instead of linear maps themselves.

The chain rule above constructs only the first derivatives. Instead, we’ll construct all of the derivatives by using all of the derivatives of f and g.

(@.) :: (b :~> c) -> (a :~> b) -> (a :~> c)
(f @. g) a0 = D c0 (c' @. b')
    D b0 b' = g a0
    D c0 c' = f b0

Coming attractions

In this post, we’ve combined derivative towers with generalized derivatives (based on linear maps), for constructing infinitely many derivatives of functions over multi-dimensional (or scalar) domains. The inner workings are subtler than the previous code, but almost as simple to express and just as easy to use.

If you’re interested in learning more about generalized derivatives, I recommend the book Calculus on Manifolds.

Future posts will include:

  • A look at an efficiency issue and consider some solutions.
  • Elegant executable specifications of smooth surfaces, using derivatives for the surface normals used in shading.


  1. Dougal Stanton:

    Fabulous stuff! You also helped me to realise that something I’d been playing with recently was a vector space.

  2. Lemming:

    I guess the derivative tower is a power series when neglecting factorial factors.

  3. conal:

    Yes, the notion of derivative towers in Beautiful differentiation seems to correspond to the Taylor series expanded at that point, with coefficients multiplied by progressive factorials. See The Music of Streams by Doug McIlroy, especially the conversions between Horner and Maclaurin form.

    The towers in this current post (“Higher dimensional, …”) are much more general, and I imagine have an analogous relationship to a generalized notion of Taylor series and probably to section 2.4 in The Music of Streams.

  4. Conal Elliott » Blog Archive » Functional linear maps:

    [...] About « Higher-dimensional, higher-order derivatives, functionally [...]

  5. Functional Thinking » Performance problems with functional representation of derivatives:

    [...] some Haskell code for infinite Higher-dimensional, higher-order derivatives, discussed in his blog. Interesting stuff! Conal and I tried using the implementation of these ideas inside an animation [...]

  6. Mark Wassell:


    I am trying to write a Haskell function that will create a surface by extruding a (closed) curve f along a curve g. So given f: u -> (x,y) and g : v -> (x,y,z), I end up with h : (u,v) -> (x,y,z).

    As I understand it the calculation for a given (u,v) is to calculate the slope of g at v and then to translate and tilt f u accordingly.

    I think I need to have Double :~> (Double,Double) and Double :~> (Double,Double,Double) as types for f and g. Firstly I need the slope of g at v and secondly I need to pass on the means of obtaining the derivatives of the resulting surface (although thinking about the torus example, this can come for free?). I will be returning (Double,Double) :~> (Double,Double,Double).

    Does this sound as though I am on the right track?

  7. conal:

    Tracy Harms brought my attention to a discussion of this post on Jchat

    The initial response from John Randall was consistent with what I was describing, as was the follow-up from Tracy. The rest of the many responses seemed to miss the essence of the idea (the perspective of calculus on manifolds) by continuing to identifying derivatives with representations of linear maps (such as numbers, vectors, and matrices) rather than linear maps themselves. Correspondingly, they used various forms of multiplication of representations (e.g., scalar, dot, or matrix product) rather than the simpler and more general operation of composition of linear maps/functions.

    I suspect that some of the confusion came from the dual role of linear maps in differentiation:

    • The derivative functional is itself a linear map between functions.
    • A single derivative value (an element of the range of the derivative of a function) is also a linear map.

    The latter use of linear maps is what I described.

    About Raul Miller’s post, 0 represents the (only) constant linear map, and 1 (or an identity matrix, etc) represents the identity linear map.

  8. Conal Elliott » Blog Archive » Simpler, more efficient, functional linear maps:

    [...] Higher-dimensional, higher-order derivatives, functionally [...]

  9. Conal Elliott » Blog Archive » Comparing formulations of higher-dimensional, higher-order derivatives:

    [...] Jason’s IntMap-based implementation and my LinearMap-based implementation described in Higher-dimensional, higher-order derivatives, functionally and in Simpler, more efficient, functional linear maps. For the case of Rn, my formulation uses a [...]

  10. Conal Elliott » Blog Archive » Thoughts on semantics for 3D graphics:

    [...] a :> b contains all derivatives (including zeroth) at a point of a function of type a->b. See Higher-dimensional, higher-order derivatives, functionally. We could perhaps also include derivatives of material [...]

  11. John Salvatier:

    Your blog posts/article on automatic differentiation are proving very interesting and I think useful to me. I am writing a very general automatic differentiation package in python (which can fake a lot of functional programming), and this has been the most interesting resource I have found. It’s been slow going since I knew no Haskell before coming across your stuff, but this is all really elegant.


Leave a comment