Archive for 18th May 2008

What is a derivative, really?

The post Beautiful differentiation showed how easily and beautifully one can construct an infinite tower of derivative values in Haskell programs, while computing plain old values. The trick (from Jerzy Karczmarczuk) was to overload numeric operators to operate on the following (co)recursive type:

data Dif b = D b (Dif b)

This representation, however, works only when differentiating functions from a scalar (one-dimensional) domain, i.e., functions of type a -> b for a scalar type a. The reason for this limitation is that only in those cases can the type of derivative values be identified with the type of regular values.
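To make the scalar case concrete, here is a minimal sketch of how numeric operators might be overloaded on such towers. The helper names `dZero`, `dConst`, `dId`, and `takeD` are assumptions made for this illustration, and the instance covers only the basic arithmetic operators; it is not necessarily the code from the earlier post.

```haskell
-- A derivative tower: a value followed by the tower of its derivatives.
data Dif b = D b (Dif b)

-- The all-zero tower (hypothetical helper).
dZero :: Num b => Dif b
dZero = D 0 dZero

-- A constant: every derivative is zero (hypothetical helper).
dConst :: Num b => b -> Dif b
dConst x = D x dZero

-- The identity function sampled at x: first derivative 1, the rest 0.
dId :: Num b => b -> Dif b
dId x = D x (dConst 1)

instance Num b => Num (Dif b) where
  fromInteger             = dConst . fromInteger
  D x x' + D y y'         = D (x + y) (x' + y')
  negate (D x x')         = D (negate x) (negate x')
  -- Product rule, applied lazily to the whole tower.
  p@(D x x') * q@(D y y') = D (x * y) (x' * q + p * y')
  abs    = error "abs: not defined in this sketch"
  signum = error "signum: not defined in this sketch"

-- The first n entries of a tower (hypothetical helper).
takeD :: Int -> Dif b -> [b]
takeD n (D x x')
  | n <= 0    = []
  | otherwise = x : takeD (n - 1) x'

-- An ordinary numeric function, written with no mention of derivatives.
cubic :: Num a => a -> a
cubic x = x * x * x + 2 * x

-- Value and first four derivatives of cubic at 3.
example :: [Integer]
example = takeD 5 (cubic (dId 3))   -- [33,29,18,6,0]
```

Note that every entry of the tower has the same type `b` as the plain value, which is exactly what stops working once the domain has more than one dimension.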

Consider a function f :: (R,R) -> R, where R is, say, Double. The value of f at a domain value (x,y) has type R, but the derivative of f consists of two partial derivatives. Moreover, the second derivative consists of four partial second-order derivatives (or three, depending on how you count). A function f :: (R,R) -> (R,R,R) also has two partial derivatives at each point (x,y), each of which is a triple. That pair of triples is commonly written as a two-by-three matrix.

Each of these situations has its own derivative shape and its own chain rule (for the derivative of function compositions), using plain old multiplication, scalar-times-vector, vector-dot-vector, matrix-times-vector, or matrix-times-matrix. Second derivatives are more complex and varied.
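As a rough illustration of how the chain rule changes shape, the sketch below writes out two of these cases by hand. All of the functions (`g`, `f`, `h`, their derivatives, `dot`, and `scale`) are hypothetical examples chosen for this illustration, with derivatives supplied manually rather than computed.

```haskell
type R = Double

-- g :: R -> (R,R). Its derivative at a point is a pair (a vector).
g :: R -> (R, R)
g t = (cos t, sin t)

g' :: R -> (R, R)
g' t = (negate (sin t), cos t)

-- f :: (R,R) -> R. Its derivative at a point is a pair of partials.
f :: (R, R) -> R
f (x, y) = x * x * y

f' :: (R, R) -> (R, R)
f' (x, y) = (2 * x * y, x * x)

-- h :: R -> R. Its derivative at a point is a plain scalar.
h :: R -> R
h t = t * t

h' :: R -> R
h' t = 2 * t

dot :: (R, R) -> (R, R) -> R
dot (a, b) (c, d) = a * c + b * d

scale :: R -> (R, R) -> (R, R)
scale s (a, b) = (s * a, s * b)

-- Chain rule for f . g :: R -> R uses vector-dot-vector.
fg' :: R -> R
fg' t = f' (g t) `dot` g' t

-- Chain rule for g . h :: R -> (R,R) uses scalar-times-vector.
gh' :: R -> (R, R)
gh' t = scale (h' t) (g' (h t))
```

The two composite derivatives follow the same pattern, "derivative of the outer function at the inner value, combined with the derivative of the inner function", yet the combining operation differs in each case.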

How many forms of derivatives and chain rules are enough? Are we doomed to work with a plethora of increasingly complex types of derivatives, as well as the diverse chain rules needed to accommodate all compatible pairs of derivatives? Fortunately, no. There is a single, simple, unifying generalization. By reconsidering what we mean by a derivative value, we can see that these various forms are all representations of a single notion, and that all the chain rules say the same thing once we look at what those representations mean.

This blog post is about that unifying view of derivatives.


Edits:

  • 2008-05-20: There are several comments about this post on reddit.
  • 2008-05-20: Renamed derivative operator from D to deriv to avoid confusion with the data constructor for derivative towers.
  • 2008-05-20: Renamed linear map type from (:->) to (:-*) to make it visually closer to a standard notation.
