A “one-hole context” is a data structure with one piece missing. Conor McBride pointed out that the derivative of a regular type is its type of one-hole contexts. When a data structure is assembled out of common functor combinators, a corresponding type of one-hole contexts can be derived mechanically by rules that mirror the standard derivative rules learned in beginning differential calculus.
I’ve been playing with functor combinators lately. I was delighted to find that the data-structure derivatives can be expressed directly using the standard functor combinators and type families.
The code in this blog post is taken from the Haskell library functor-combo.
See also the Haskell Wikibooks page on zippers, especially the section called “Differentiation of data types”.
I mean this post not as new research, but rather as a tidy, concrete presentation of some of Conor’s delightful insight.
data Void a -- no constructors type Unit a = Const () a data Const x a = Const x newtype Id a = Id a data (f :+: g) a = InL (f a) | InR (g a) data (f :*: g) a = f a :*: g a newtype (g :. f) a = O (g (f a))
The derivative of a functor is another functor. Since the shape of the derivative is non-uniform (depends on the shape of the functor being differentiated) define a higher-order type family:
type family Der (f :: (* → *)) :: (* → *)
The usual derivative rules can then be translated without applying much imagination. That is, if we start with derivative rules in their functional form (e.g., as in the paper Beautiful differentiation, Section 2 and Figure 1).
For instance, the derivative of the constant function is the constant 0 function, and the derivative of the identity function is the constant 1 function.
der is the derivative functional mapping functions (of real numbers) to functions,
der (const x) ≡ 0 der id ≡ 1
On the right-hand sides, I am exploiting the function instances of
Num from the library applicative-numbers.
To be more explicit, I could have written “
const 0” and “
type instance Der (Const x) = Void -- 0 type instance Der Id = Unit -- 1
Note that the types
Void a and
Unit a have 0 and 1 element, respectively, if we ignore ⊥.
Void is a sort of additive identity, and
Unit is a sort of multiplicative identity, again ignoring ⊥.
For these reasons,
Unit might be more aptly named “
Zero” and “
The first rule says that the a value of type
Const x a has no one-hole context (for type
a), which is true, since there is an
x but no
The second rule says that there is exactly one possible context for
Id a, since the one and only
a value must be removed, and no information remains.
A (one-hole) context for a sum is a context for the left or the right possibility of the sum:
type instance Der (f :+: g) = Der f :+: Der g
Correspondingly, the derivative of a sum of functions is the sum of the functions’ derivatives::
der (f + g) ≡ der f + der g
Again I’m using the function
Num instance from applicative-numbers.
For a pair, the one hole of a context can be made somewhere in the first component or somewhere in the second component. So the pair context consists of a holey first component and a full second component or a full first component and a holey second component.
type instance Der (f :*: g) = Der f :*: g :+: f :*: Der g
Similarly, for functions:
der (f * g) ≡ der f * g + f * der g
Finally, consider functor composition.
f are container types, then
(g :. f) a is the type of
g containers of
f containers of
a-shaped hole must come from one of the contained
f a structures.
type instance Der (g :. f) = (Der g :. f) :*: Der f
Here’s one way to think of this derivative functor:
to make an
a-shaped hole in a
g (f a), first remove an
f a structure, leaving an
(f a)-shaped hole, and then put back all but an
a value extracted from the removed
f a struture.
So the overall (one-hole) context can be assembled from two parts: a
g context of
f a structures, and an
f context of
The corresponding rule for function derivatives:
der (g ∘ f) ≡ (der g ∘ f) * der f
which again uses
Num on functions.
Written out more explicitly:
der (g ∘ f) a ≡ der g (f a) * der f a
which may look more like the form you’re used to.
Summary of derivatives
To emphasize the correspondence between forms of differentiation, here are rules for function and functor derivatives:
der (const x) ≡ 0 Der (Const x) ≡ Void der id ≡ 1 Der Id ≡ Unit der (f + g) ≡ der f + der g Der (f :+: g) ≡ Der f :+: Der g der (f * g) ≡ der f * g + f * der g Der (f :*: g) ≡ Der f :*: g :+: f :*: Der g der (g ∘ f) ≡ (der g ∘ f) * der f Der (g :. f) ≡ (Der g :. f) :*: Der f
Each derivative functor is a one-hole container. One useful operation on derivatives is filling that hole.
fillC :: Functor f ⇒ Der f a → a → f a
The specifics of how to fill in a hole will depend on the choice of functor
f, so let’s make the
fillC operation a method of a new type class.
This new class is also a handy place to stash the associated type of derivatives, as an alternative to the top-level declarations above.
class Functor f ⇒ Holey f where type Der f :: * → * fillC :: Der f a → a → f a
I’ll add one more method to this class in an upcoming post.
Const x, there are no cases to handle, since there are no holes.
instance Holey (Const x) where type Der (Const x) = Void fillC = error "fillC for Const x: no Der values"
I added a definition just to keep the compiler from complaining.
fillC can only be applied to a value of type
Void a, and there are no such values other than ⊥.
Is there a more elegant way to define functions over data types with no constructors? One idea is to provide a single, polymorphic function over void types:
voidF :: Void a → b voidF = error "voidF: no value of type Void"
And use whenever as needed, e.g.,
fillC = voidF
Next is our identity functor:
instance Holey Id where type Der Id = Unit fillC (Const ()) a = Id a
fillC (Const ()) = Id
instance (Holey f, Holey g) ⇒ Holey (f :+: g) where type Der (f :+: g) = Der f :+: Der g fillC (InL df) a = InL (fillC df a) fillC (InR df) a = InR (fillC df a)
fillC (InL df) = InL ∘ fillC df fillC (InR df) = InR ∘ fillC df
Products also have two cases, since the derivative of a product is a sum:
instance (Holey f, Holey g) ⇒ Holey (f :*: g) where type Der (f :*: g) = Der f :*: g :+: f :*: Der g fillC (InL (dfa :*: ga)) a = fillC dfa a :*: ga fillC (InR ( fa :*: dga)) a = fa :*: fillC dga a
fillC (InL (dfa :*: ga)) = (:*: ga) ∘ fillC dfa fillC (InR ( fa :*: dga)) = (fa :*:) ∘ fillC dga
Finally, functor composition:
instance (Holey f, Holey g) ⇒ Holey (g :. f) where type Der (g :. f) = (Der g :. f) :*: Der f fillC (O dgfa :*: dfa) a = O (fillC dgfa (fillC dfa a))
The less pointful form is more telling.
fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
In words: filling of the derivative of a composition is a composition of filling of the derivatives.
Thoughts on composition
Let’s return to the derivative rules for composition, i.e., the chain rule, on functions and on functors:
der (g ∘ f) ≡ (der g ∘ f) * der f Der (g :. f) ≡ (Der g :. f) :*: Der f
Written in this way, the functor rule looks quite compelling. Something bothers me, however. For functions, multiplication is a special case, not the general case, and is only meaningful and correct when differentiating functions from scalars to scalars. In general, derivative values are linear maps, and the chain rule uses composition on linear maps rather than multiplication on scalars (that represent linear maps). I’ve written several posts on derivatives and a paper Beautiful differentiation, describing this perspective, which comes from calculus on manifolds.
Look again at the less pointful formulation of
fillC for derivatives of compositions:
fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
The product in this case is just structural.
The actual use in
fillC is indeed a composition of linear maps.
In this context, “linear” has a different meaning from before.
It’s another way of saying “fills a one-hole context” (as the linear patterns of term rewriting and of ML & Haskell).
So maybe there’s a more general/abstract view of functor derivatives, just as there is a more general/abstract view of function derivatives. In that view, we might replace the functor chain rule’s product with a notion of composition.