## Differentiation of higher-order types

A “one-hole context” is a data structure with one piece missing. Conor McBride pointed out that the derivative of a regular type is its type of one-hole contexts. When a data structure is assembled out of common functor combinators, a corresponding type of one-hole contexts can be derived mechanically by rules that mirror the standard derivative rules learned in beginning differential calculus.

I’ve been playing with functor combinators lately. I was delighted to find that the data-structure derivatives can be expressed directly using the standard functor combinators and type families.

The code in this blog post is taken from the Haskell library functor-combo.

See also the Haskell Wikibooks page on zippers, especially the section called “Differentiation of data types”.

I mean this post not as new research, but rather as a tidy, concrete presentation of some of Conor’s delightful insight.

### Functor combinators

Let’s use the same set of functor combinators as in *Elegant memoization with higher-order types* and *Memoizing higher-order functions*:

data Void a -- no constructors type Unit a = Const () a data Const x a = Const x newtype Id a = Id a data (f :+: g) a = InL (f a) | InR (g a) data (f :*: g) a = f a :*: g a newtype (g :. f) a = O (g (f a))

### Derivatives

The derivative of a functor is another functor. Since the shape of the derivative is non-uniform (depends on the shape of the functor being differentiated) define a higher-order type family:

type family Der (f :: (* → *)) :: (* → *)

The usual derivative rules can then be translated without applying much imagination.
That is, if we start with derivative rules in their *functional* form (e.g., as in the paper *Beautiful differentiation*, Section 2 and Figure 1).

For instance, the derivative of the constant function is the constant 0 function, and the derivative of the identity function is the constant 1 function.
If `der`

is the derivative functional mapping functions (of real numbers) to functions,

On the right-hand sides, I am exploiting the function instances of `Num`

from the library applicative-numbers.
To be more explicit, I could have written “`const 0`

” and “`const 1`

“.

Correspondingly,

type instance Der (Const x) = Void -- 0 type instance Der Id = Unit -- 1

Note that the types `Void a`

and `Unit a`

have 0 and 1 element, respectively, if we ignore ⊥.
Moreover, `Void`

is a sort of additive identity, and `Unit`

is a sort of multiplicative identity, again ignoring ⊥.
For these reasons, `Void`

and `Unit`

might be more aptly named “`Zero`

” and “`One`

“.

The first rule says that the a value of type `Const x a`

has no one-hole context (for type `a`

), which is true, since there is an `x`

but no `a`

.
The second rule says that there is exactly one possible context for `Id a`

, since the one and only `a`

value must be removed, and no information remains.

A (one-hole) context for a sum is a context for the left or the right possibility of the sum:

type instance Der (f :+: g) = Der f :+: Der g

Correspondingly, the derivative of a sum of functions is the sum of the functions’ derivatives::

der (f + g) ≡ der f + der g

Again I’m using the function `Num`

instance from applicative-numbers.

For a pair, the one hole of a context can be made somewhere in the first component or somewhere in the second component. So the pair context consists of a holey first component and a full second component or a full first component and a holey second component.

type instance Der (f :*: g) = Der f :*: g :+: f :*: Der g

Similarly, for functions:

der (f * g) ≡ der f * g + f * der g

Finally, consider functor composition.
If `g`

and `f`

are container types, then `(g :. f) a`

is the type of `g`

containers of `f`

containers of `a`

elements.
The `a`

-shaped hole must come from one of the contained `f a`

structures.

type instance Der (g :. f) = (Der g :. f) :*: Der f

Here’s one way to think of this derivative functor:
to make an `a`

-shaped hole in a `g (f a)`

, first remove an `f a`

structure, leaving an `(f a)`

-shaped hole, and then put back all but an `a`

value extracted from the removed `f a`

struture.
So the overall (one-hole) context can be assembled from two parts: a `g`

context of `f a`

structures, and an `f`

context of `a`

values.

The corresponding rule for function derivatives:

der (g ∘ f) ≡ (der g ∘ f) * der f

which again uses `Num`

on functions.
Written out more explicitly:

der (g ∘ f) a ≡ der g (f a) * der f a

which may look more like the form you’re used to.

### Summary of derivatives

To emphasize the correspondence between forms of differentiation, here are rules for *function* and *functor* derivatives:

der (const x) ≡ 0 Der (Const x) ≡ Void der id ≡ 1 Der Id ≡ Unit der (f + g) ≡ der f + der g Der (f :+: g) ≡ Der f :+: Der g der (f * g) ≡ der f * g + f * der g Der (f :*: g) ≡ Der f :*: g :+: f :*: Der g der (g ∘ f) ≡ (der g ∘ f) * der f Der (g :. f) ≡ (Der g :. f) :*: Der f

### Filling holes

Each derivative functor is a one-hole container. One useful operation on derivatives is filling that hole.

fillC :: Functor f ⇒ Der f a → a → f a

The specifics of how to fill in a hole will depend on the choice of functor `f`

, so let’s make the `fillC`

operation a method of a new type class.
This new class is also a handy place to stash the associated type of derivatives, as an alternative to the top-level declarations above.

class Functor f ⇒ Holey f where type Der f :: * → * fillC :: Der f a → a → f a

I’ll add one more method to this class in an upcoming post.

For `Const x`

, there are no cases to handle, since there are no holes.

instance Holey (Const x) where type Der (Const x) = Void fillC = error "fillC for Const x: no Der values"

I added a definition just to keep the compiler from complaining.
This particular `fillC`

can only be applied to a value of type `Void a`

, and there are no such values other than ⊥.

Is there a more elegant way to define functions over data types with no constructors? One idea is to provide a single, polymorphic function over void types:

voidF :: Void a → b voidF = error "voidF: no value of type Void"

And use whenever as needed, e.g.,

` fillC = voidF`

Next is our identity functor:

instance Holey Id where type Der Id = Unit fillC (Const ()) a = Id a

More succinctly,

fillC (Const ()) = Id

For sums,

instance (Holey f, Holey g) ⇒ Holey (f :+: g) where type Der (f :+: g) = Der f :+: Der g fillC (InL df) a = InL (fillC df a) fillC (InR df) a = InR (fillC df a)

or

fillC (InL df) = InL ∘ fillC df fillC (InR df) = InR ∘ fillC df

Products also have two cases, since the derivative of a product is a sum:

instance (Holey f, Holey g) ⇒ Holey (f :*: g) where type Der (f :*: g) = Der f :*: g :+: f :*: Der g fillC (InL (dfa :*: ga)) a = fillC dfa a :*: ga fillC (InR ( fa :*: dga)) a = fa :*: fillC dga a

Less pointfully,

fillC (InL (dfa :*: ga)) = (:*: ga) ∘ fillC dfa fillC (InR ( fa :*: dga)) = (fa :*:) ∘ fillC dga

Finally, functor composition:

instance (Holey f, Holey g) ⇒ Holey (g :. f) where type Der (g :. f) = (Der g :. f) :*: Der f fillC (O dgfa :*: dfa) a = O (fillC dgfa (fillC dfa a))

The less pointful form is more telling.

fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa

In words: filling of the derivative of a composition is a composition of filling of the derivatives.

### Thoughts on composition

Let’s return to the derivative rules for composition, i.e., the chain rule, on functions and on functors:

der (g ∘ f) ≡ (der g ∘ f) * der f Der (g :. f) ≡ (Der g :. f) :*: Der f

Written in this way, the functor rule looks quite compelling.
Something bothers me, however.
For functions, multiplication is a special case, not the general case, and is only meaningful and correct when differentiating functions from scalars to scalars.
In general, derivative values are *linear maps*, and the chain rule uses composition on linear maps rather than multiplication on scalars (that *represent* linear maps).
I’ve written several posts on derivatives and a paper *Beautiful differentiation*, describing this perspective, which comes from calculus on manifolds.

Look again at the less pointful formulation of `fillC`

for derivatives of compositions:

fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa

The product in this case is just structural.
The actual use in `fillC`

is indeed a composition of linear maps.
In this context, “linear” has a different meaning from before.
It’s another way of saying “fills a *one-hole* context” (as the linear patterns of term rewriting and of ML & Haskell).

So maybe there’s a more general/abstract view of *functor* derivatives, just as there is a more general/abstract view of *function* derivatives.
In that view, we might replace the functor chain rule’s product with a notion of composition.

## Paul Brauner:

Hi, nice post! I’ve tried to compile the last version of functor-combo (0.0.3) but one file seems to be missing:

cabal: can’t find source for FunctorCombo/StrictMemo in src

28 July 2010, 9:32 pm## Conal Elliott » Blog Archive » Another angle on zippers:

[...] About « Differentiation of higher-order types [...]

29 July 2010, 9:15 am## conal:

Thanks, Paul. Fixed now. Please try again.

29 July 2010, 9:45 am## monoidal:

I like the analogy between value-level and type-level differentation. In Omega programming language, level polymorphism is supported, and corresponding values and types can be defined simultaneously. (The definition gives also automatically kinds, sorts etc.). Unfortunately I am unable to define value functions and type functions with one definition, so I have to give analogous definitions. http://hpaste.org/40138/level_polymorphism_in_omega

26 September 2010, 5:21 am