Differentiation of higher-order types

A “one-hole context” is a data structure with one piece missing. Conor McBride pointed out that the derivative of a regular type is its type of one-hole contexts. When a data structure is assembled out of common functor combinators, a corresponding type of one-hole contexts can be derived mechanically by rules that mirror the standard derivative rules learned in beginning differential calculus.

I’ve been playing with functor combinators lately. I was delighted to find that the data-structure derivatives can be expressed directly using the standard functor combinators and type families.

The code in this blog post is taken from the Haskell library functor-combo.

See also the Haskell Wikibooks page on zippers, especially the section called “Differentiation of data types”.

I mean this post not as new research, but rather as a tidy, concrete presentation of some of Conor’s delightful insight.

Functor combinators

Let’s use the same set of functor combinators as in Elegant memoization with higher-order types and Memoizing higher-order functions:

data Void a   -- no constructors
 
type Unit a        = Const () a
 
data Const x a     = Const x
 
newtype Id a       = Id a
 
data (f :+: g) a   = InL (f a) | InR (g a)
 
data (f :*: g) a   = f a :*: g a
 
newtype (g :. f) a = O (g (f a))

Derivatives

The derivative of a functor is another functor. Since the shape of the derivative is non-uniform (depends on the shape of the functor being differentiated) define a higher-order type family:

type family Der (f :: (**)) :: (**)

The usual derivative rules can then be translated without applying much imagination. That is, if we start with derivative rules in their functional form (e.g., as in the paper Beautiful differentiation, Section 2 and Figure 1).

For instance, the derivative of the constant function is the constant 0 function, and the derivative of the identity function is the constant 1 function. If der is the derivative functional mapping functions (of real numbers) to functions,

der (const x)0
der id1

On the right-hand sides, I am exploiting the function instances of Num from the library applicative-numbers. To be more explicit, I could have written “const 0” and “const 1“.

Correspondingly,

type instance Der (Const x) = Void   -- 0
 
type instance Der Id        = Unit   -- 1

Note that the types Void a and Unit a have 0 and 1 element, respectively, if we ignore ⊥. Moreover, Void is a sort of additive identity, and Unit is a sort of multiplicative identity, again ignoring ⊥. For these reasons, Void and Unit might be more aptly named “Zero” and “One“.

The first rule says that the a value of type Const x a has no one-hole context (for type a), which is true, since there is an x but no a. The second rule says that there is exactly one possible context for Id a, since the one and only a value must be removed, and no information remains.

A (one-hole) context for a sum is a context for the left or the right possibility of the sum:

type instance Der (f :+: g) = Der f :+: Der g

Correspondingly, the derivative of a sum of functions is the sum of the functions’ derivatives::

der (f + g) ≡ der f + der g

Again I’m using the function Num instance from applicative-numbers.

For a pair, the one hole of a context can be made somewhere in the first component or somewhere in the second component. So the pair context consists of a holey first component and a full second component or a full first component and a holey second component.

type instance Der (f :*: g) = Der f :*: g  :+:  f :*: Der g

Similarly, for functions:

der (f * g) ≡ der f * g + f * der g

Finally, consider functor composition. If g and f are container types, then (g :. f) a is the type of g containers of f containers of a elements. The a-shaped hole must come from one of the contained f a structures.

type instance Der (g :. f) = (Der g :. f) :*: Der f

Here’s one way to think of this derivative functor: to make an a-shaped hole in a g (f a), first remove an f a structure, leaving an (f a)-shaped hole, and then put back all but an a value extracted from the removed f a struture. So the overall (one-hole) context can be assembled from two parts: a g context of f a structures, and an f context of a values.

The corresponding rule for function derivatives:

der (g ∘ f)(der g ∘ f) * der f

which again uses Num on functions. Written out more explicitly:

der (g ∘ f) a ≡ der g (f a)  * der f a

which may look more like the form you’re used to.

Summary of derivatives

To emphasize the correspondence between forms of differentiation, here are rules for function and functor derivatives:

der (const x)0
Der (Const x) ≡ Void
 
der id1
Der Id ≡ Unit
 
der (f  +  g) ≡ der f  +  der g
Der (f :+: g) ≡ Der f :+: Der g
 
der (f  *  g) ≡ der f  *  g  +  f  *  der g
Der (f :*: g) ≡ Der f :*: g :+: f :*: Der g
 
der (g  ∘ f)(der g  ∘ f)  *  der f
Der (g :. f)(Der g :. f) :*: Der f

Filling holes

Each derivative functor is a one-hole container. One useful operation on derivatives is filling that hole.

fillC :: Functor f ⇒ Der f a → a → f a

The specifics of how to fill in a hole will depend on the choice of functor f, so let’s make the fillC operation a method of a new type class. This new class is also a handy place to stash the associated type of derivatives, as an alternative to the top-level declarations above.

class Functor f ⇒ Holey f where
  type Der f :: **
  fillC :: Der f a → a → f a

I’ll add one more method to this class in an upcoming post.

For Const x, there are no cases to handle, since there are no holes.

instance Holey (Const x) where
  type Der (Const x) = Void
  fillC = error "fillC for Const x: no Der values"

I added a definition just to keep the compiler from complaining. This particular fillC can only be applied to a value of type Void a, and there are no such values other than ⊥.

Is there a more elegant way to define functions over data types with no constructors? One idea is to provide a single, polymorphic function over void types:

  voidF :: Void a → b
  voidF = error "voidF: no value of type Void"

And use whenever as needed, e.g.,

  fillC = voidF

Next is our identity functor:

instance Holey Id where
  type Der Id = Unit
  fillC (Const ()) a = Id a

More succinctly,

  fillC (Const ()) = Id

For sums,

instance (Holey f, Holey g) ⇒ Holey (f :+: g) where
  type Der (f :+: g) = Der f :+: Der g
  fillC (InL df) a = InL (fillC df a)
  fillC (InR df) a = InR (fillC df a)

or

  fillC (InL df) = InL ∘ fillC df
  fillC (InR df) = InR ∘ fillC df

Products also have two cases, since the derivative of a product is a sum:

instance (Holey f, Holey g) ⇒ Holey (f :*: g) where
  type Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
  fillC (InL (dfa :*:  ga)) a = fillC dfa a :*: ga
  fillC (InR ( fa :*: dga)) a = fa :*: fillC dga a

Less pointfully,

  fillC (InL (dfa :*:  ga)) = (:*: ga) ∘ fillC dfa
  fillC (InR ( fa :*: dga)) = (fa :*:) ∘ fillC dga

Finally, functor composition:

instance (Holey f, Holey g) ⇒ Holey (g :. f) where
  type Der (g :. f) = (Der g :. f) :*: Der f
  fillC (O dgfa :*: dfa) a = O (fillC dgfa (fillC dfa a))

The less pointful form is more telling.

  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa

In words: filling of the derivative of a composition is a composition of filling of the derivatives.

Thoughts on composition

Let’s return to the derivative rules for composition, i.e., the chain rule, on functions and on functors:

der (g  ∘ f)(der g  ∘ f)  *  der f
 
Der (g :. f)(Der g :. f) :*: Der f

Written in this way, the functor rule looks quite compelling. Something bothers me, however. For functions, multiplication is a special case, not the general case, and is only meaningful and correct when differentiating functions from scalars to scalars. In general, derivative values are linear maps, and the chain rule uses composition on linear maps rather than multiplication on scalars (that represent linear maps). I’ve written several posts on derivatives and a paper Beautiful differentiation, describing this perspective, which comes from calculus on manifolds.

Look again at the less pointful formulation of fillC for derivatives of compositions:

  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa

The product in this case is just structural. The actual use in fillC is indeed a composition of linear maps. In this context, “linear” has a different meaning from before. It’s another way of saying “fills a one-hole context” (as the linear patterns of term rewriting and of ML & Haskell).

So maybe there’s a more general/abstract view of functor derivatives, just as there is a more general/abstract view of function derivatives. In that view, we might replace the functor chain rule’s product with a notion of composition.

4 Comments

  1. Paul Brauner:

    Hi, nice post! I’ve tried to compile the last version of functor-combo (0.0.3) but one file seems to be missing:

    cabal: can’t find source for FunctorCombo/StrictMemo in src

  2. Conal Elliott » Blog Archive » Another angle on zippers:

    [...] About « Differentiation of higher-order types [...]

  3. conal:

    Thanks, Paul. Fixed now. Please try again.

  4. monoidal:

    I like the analogy between value-level and type-level differentation. In Omega programming language, level polymorphism is supported, and corresponding values and types can be defined simultaneously. (The definition gives also automatically kinds, sorts etc.). Unfortunately I am unable to define value functions and type functions with one definition, so I have to give analogous definitions. http://hpaste.org/40138/level_polymorphism_in_omega

Leave a comment