<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; derivative</title>
	<atom:link href="http://conal.net/blog/tag/derivative/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>Another angle on zippers</title>
		<link>http://conal.net/blog/posts/another-angle-on-zippers</link>
		<comments>http://conal.net/blog/posts/another-angle-on-zippers#comments</comments>
		<pubDate>Thu, 29 Jul 2010 17:06:10 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[zipper]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=154</guid>
		<description><![CDATA[The zipper is an efficient and elegant data structure for purely functional editing of tree-like data structures, first published by Gérard Huet. Zippers maintain a location of focus in a tree and support navigation operations (up, down, left, right) and editing (replace current focus). The original zipper type and operations are customized for a single [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Another angle on zippers

Tags: derivative, functor, zipper

URL: http://conal.net/blog/posts/another-angle-on-zippers/

-->

<!-- references -->

<!-- teaser -->

<p>The zipper is an efficient and elegant data structure for purely functional editing of tree-like data structures, <a href="http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf" title="paper by Gérard Huet">first published by Gérard Huet</a>.
Zippers maintain a location of focus in a tree and support navigation operations (up, down, left, right) and editing (replace current focus).</p>

<p>The original zipper type and operations are customized for a single type, but it&#8217;s not hard to see how to adapt to other tree-like types, and hence to regular data types.
There have been many follow-up papers to <em><a href="http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf" title="paper by Gérard Huet">The Zipper</a></em>, including a polytypic version in the paper <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.6342" title="paper by Ralf Hinze, Johan Jeuring, and Andres Löh">Type-indexed data types</a></em>.</p>

<p>All of the zipper adaptations and generalizations I&#8217;ve seen so far maintain the original navigation interface.
In this post, I propose an alternative interface that appears to significantly simplify matters.
There are only two navigation functions instead of four, and each of the two is specified and implemented via a fairly simple one-liner.</p>

<p>I haven&#8217;t used this new zipper formulation in an application yet, so I do not know whether some usefulness has been lost in simplifying the interface.</p>

<p>The code in this blog post is taken from the Haskell library <a href="http://hackage.haskell.org/package/functor-combo" title="Hackage entry: functor-combo">functor-combo</a> and completes the <code>Holey</code> type class introduced in <em><a href="http://conal.net/blog/posts/differentiation-of-higher-order-types/" title="blog post">Differentiation of higher-order types</a></em>.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2010-07-29: Removed some stray <code>Just</code> applications in <code>up</code> definitions.  (Thanks, illissius.)</li>
<li>2010-07-29: Augmented my complicated definition of <code>tweak2</code> with a much simpler version from Sjoerd Visscher.</li>
<li>2010-07-29: Replaced <code>fmap (first (:ds'))</code> with <code>(fmap.first) (:ds')</code> in <code>down</code> definitions.  (Thanks, Sjoerd.)</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-154"></span></p>

<h3>Extraction</h3>

<p>The post <em><a href="http://conal.net/blog/posts/differentiation-of-higher-order-types/" title="blog post">Differentiation of higher-order types</a></em> gave part of a type class for one-hole contexts (functor derivatives) and the filling of those contexts:</p>

<pre><code>class Functor f ⇒ Holey f where
  type Der f :: * → *
  fillC :: Der f a → a → f a
</code></pre>

<p>The arguments of <code>fillC</code> correspond roughly to the components of what Gérard Huet called a &#8220;location&#8221;, namely context and something to fill the context:</p>

<pre><code>type Loc f a = (Der f a, a)
</code></pre>

<p>So an alternative hole-filling interface is</p>

<pre><code>fill :: Holey f ⇒ Loc f a → f a
fill = uncurry fillC
</code></pre>

<p>Now consider a reverse operation, a kind of <em>extraction</em>:</p>

<pre><code>guess1 :: f a → Loc f a
</code></pre>

<p>There&#8217;s an awkward problem here.
What if <code>f a</code> has more than one possible hole, or has no hole at all?
If more than one, then which do we pick?
Perhaps the left-most.
If none, then we might want to have a failure representation, e.g.,</p>

<pre><code>guess2 :: f a → Maybe (Loc f a)
</code></pre>

<p>To handle the more-than-one possibility, we could add another method for traversing the various extractions, like the <code>go_right</code> operation in <em><a href="http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf" title="paper by Gérard Huet">The Zipper</a></em>, section 2.2.
I don&#8217;t know what changes we&#8217;d have to make to the <code>Loc</code> type.</p>

<p>We could instead use a list of possible extractions.</p>

<pre><code>guess3 :: f a → [Loc f a]
</code></pre>

<p>Why a <em>list</em>?
I guess because it&#8217;s in our habitual functional toolbox, and it covers any number of alternative extracted locations.
On the other hand, our toolbox is growing, and sometimes list isn&#8217;t the best functor for the job.
For instance, we might use a finger tree, which has better performance for some sequence operations.</p>

<p>Or we could use a functor closer at hand, namely <code>f</code> itself.</p>

<pre><code>class Functor f ⇒ Holey f where
  type Der f :: * → *
  fillC   :: Der f a → a → f a
  extract :: f a → f (Loc f a)
</code></pre>

<p>For instance, when <code>f ≡ []</code>, <code>extract</code> returns a list of extractions; and when <code>f ≡ Id :*: Id</code>, <code>extract</code> returns a pair of extractions.</p>
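<p>To make this concrete, here is a hand-rolled version for ordinary lists, outside of the <code>Holey</code> machinery (the names here are illustrative, not from functor-combo). A one-hole context for a list is the pair of elements before and after the hole, and extraction yields one location per element:</p>

<pre><code>type ListCtx a = ([a], [a])

-- All one-hole decompositions, one per element.
extractList :: [a] -> [(ListCtx a, a)]
extractList []       = []
extractList (x : xs) =
  (([], xs), x) : map (\((pre, suf), y) -> ((x : pre, suf), y)) (extractList xs)

-- Filling undoes extraction.
fillList :: (ListCtx a, a) -> [a]
fillList ((pre, suf), x) = pre ++ [x] ++ suf
</code></pre>

<p>For instance, <code>extractList [1,2,3]</code> yields three locations, and <code>fillList</code> maps each one back to <code>[1,2,3]</code>.</p>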

<h3>How to extract</h3>

<p>A constant functor has void derivative.
Extraction yields another constant structure, with the same data but a different type:</p>

<pre><code>instance Holey (Const x) where
  type Der (Const x) = Void
  fillC = voidF
  extract (Const x) = Const x
</code></pre>

<p>The identity functor has exactly one opportunity for a hole, leaving no information behind:</p>

<pre><code>instance Holey Id where
  type Der Id = Unit
  fillC (Const ()) = Id
  extract (Id a) = Id (Const (), a)
</code></pre>

<p>The definitions of <code>Der</code> and <code>fillC</code> above and below are lifted directly from <em><a href="http://conal.net/blog/posts/differentiation-of-higher-order-types/" title="blog post">Differentiation of higher-order types</a></em>.</p>

<p>For sums, there are two cases: <code>InL fa, InR ga :: (f :+: g) a</code>.
Starting with the first case:</p>

<pre><code>InL fa :: (f :+: g) a

fa :: f a

extract fa :: f (Loc f a)

           :: f (Der f a, a)

(fmap.first) InL (extract fa) :: f ((Der f :+: Der g) a, a)

                              :: f ((Der (f :+: g) a), a)
</code></pre>

<p>See <em><a href="http://conal.net/blog/posts/semantic-editor-combinators/" title="blog post">Semantic editor combinators</a></em> for an explanation of <code>(fmap.first)</code> and friends.
Continuing, apply the definition of <code>Der</code> on sums:</p>

<pre><code>InL ((fmap.first) InL (extract fa)) :: (f :+: g) ((Der (f :+: g) a), a)

                                    :: (f :+: g) (Loc (f :+: g) a)
</code></pre>

<p>The two steps that introduce <code>g</code> are motivated by the required type of <code>extract</code>.
Similarly, for the second case:</p>

<pre><code>InR ((fmap.first) InR (extract ga)) :: (f :+: g) (Loc (f :+: g) a)
</code></pre>

<p>So,</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :+: g) where
  type Der (f :+: g) = Der f :+: Der g
  fillC (InL df) = InL ∘ fillC df
  fillC (InR df) = InR ∘ fillC df
  extract (InL fa) = InL ((fmap.first) InL (extract fa))
  extract (InR ga) = InR ((fmap.first) InR (extract ga))
</code></pre>
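<p>For a familiar sum, note that <code>Maybe</code> is isomorphic to <code>Unit :+: Id</code>. A hand-rolled sketch of extraction for <code>Maybe</code> directly (illustrative, not library code):</p>

<pre><code>-- Nothing plays the Const-like role (no holes); Just plays the Id-like role
-- (one hole with a trivial context).
extractMaybe :: Maybe a -> Maybe ((), a)
extractMaybe Nothing  = Nothing
extractMaybe (Just a) = Just ((), a)

fillMaybe :: ((), a) -> Maybe a
fillMaybe ((), a) = Just a
</code></pre>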

<p>For products, recall the derivative type:</p>

<pre><code>  type Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
</code></pre>

<p>To extract from a product, we extract from either component and then pair with the other component.
The form of an argument to <code>extract</code> is</p>

<pre><code>fa :*: ga :: (f :*: g) a
</code></pre>

<p>Again, start with the left part:</p>

<pre><code>fa :: f a

extract fa :: f (Loc f a)
           :: f (Der f a, a)

(fmap.first) (:*: ga) (extract fa) :: f ((Der f :*: g) a, a)

(fmap.first) (InL ∘ (:*: ga)) (extract fa)
  :: f (((Der f :*: g) :+: (f :*: Der g)) a, a)

  :: f ((Der (f :*: g)) a, a)
</code></pre>

<p>Similarly, for the second component,</p>

<pre><code>(fmap.first) (InR ∘ (fa :*:)) (extract ga)
  :: g ((Der (f :*: g)) a, a)
</code></pre>

<p>Combining the two extraction routes:</p>

<pre><code>(fmap.first) (InL ∘ (:*: ga)) (extract fa) :*:
(fmap.first) (InR ∘ (fa :*:)) (extract ga)
  :: (f :*: g) (Der (f :*: g) a, a)
</code></pre>

<p>So,</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :*: g) where
  type Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
  fillC (InL (dfa :*:  ga)) = (:*: ga) ∘ fillC dfa
  fillC (InR ( fa :*: dga)) = (fa :*:) ∘ fillC dga
  extract (fa :*: ga) = 
    (fmap.first) (InL ∘ (:*: ga)) (extract fa) :*:
    (fmap.first) (InR ∘ (fa :*:)) (extract ga)
</code></pre>

<p>Finally, the chain rule, for functor composition:</p>

<pre><code>type Der (g :. f) = (Der g :. f) :*: Der f
</code></pre>

<p>A value of type <code>(g :. f) a</code> has form <code>O gfa</code>, where <code>gfa :: g (f a)</code>.
To extract:</p>

<ul>
<li>form all <code>g</code>-extractions, yielding values of type <code>fa :: f a</code> and their contexts of type <code>Der g (f a)</code>;</li>
<li>form all <code>f</code>-extractions of each such <code>fa</code>, yielding values of type <code>a</code> and their contexts of type <code>Der f a</code>; and</li>
<li>reassemble these pieces into the shape determined by <code>Der (g :. f)</code>.</li>
</ul>

<p>Let&#8217;s go:</p>

<pre><code>gfa :: g (f a)

extract gfa :: g (Loc g (f a))

            :: g (Der g (f a), f a)

fmap (second extract) (extract gfa)
  :: g (Der g (f a), f (Loc f a))
</code></pre>

<p>Continuing, the following lemmas come in handy.</p>

<pre><code>tweak2 :: Functor f ⇒
          (dg (f a), f (df a, a)) → f (((dg :. f) :*: df) a, a)
tweak2 = (fmap.first) chainRule ∘ tweak1

tweak1 :: Functor f ⇒
          (dg (fa), f (dfa, a)) → f ((dg (fa), dfa), a)
tweak1 = fmap lassoc ∘ squishP

squishP :: Functor f ⇒ (a, f b) → f (a,b)
squishP (a,fb) = fmap (a,) fb

chainRule :: (dg (f a), df a) → ((dg :. f) :*: df) a
chainRule (dgfa, dfa) = O dgfa :*: dfa

lassoc :: (p,(q,r)) → ((p,q),r)
lassoc    (p,(q,r)) =  ((p,q),r)
</code></pre>

<p><em>Edit:</em> Sjoerd Visscher found a much simpler form to replace the previous group of definitions:</p>

<pre><code>tweak2 (dgfa, fl) = (fmap.first) (O dgfa :*:) fl
</code></pre>

<p>More specifically,</p>

<pre><code>tweak2 :: Functor f =&gt; (Der g (f a), f (Loc f a))
                    -&gt; f (((Der g :. f) :*: Der f) a, a)
       :: Functor f =&gt; (Der g (f a), f (Loc f a))
                    -&gt; f (Der (g :. f) a, a)
       :: Functor f =&gt; (Der g (f a), f (Loc f a))
                    -&gt; f (Loc (g :. f) a)
</code></pre>

<p>This lemma gives just what we need to tweak the inner extraction:</p>

<pre><code>fmap (tweak2 ∘ second extract) (extract gfa) :: g (f (Loc (g :. f) a))
</code></pre>

<p>So</p>

<pre><code>extractGF :: (Holey f, Holey g) ⇒
             g (f a) → g (f (Loc (g :. f) a))
extractGF = fmap (tweak2 ∘ second extract) ∘ extract
</code></pre>

<p>and</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (g :. f) where
  type Der (g :. f) = (Der g :. f) :*: Der f
  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
  extract = inO extractGF
</code></pre>

<p>where <code>inO</code> is from <a href="http://hackage.haskell.org/packages/archive/TypeCompose/latest/doc/html/Control-Compose.html" title="module documentation">Control.Compose</a>, and is defined using the ideas from <em><a href="http://conal.net/blog/posts/prettier-functions-for-wrapping-and-wrapping/" title="blog post">Prettier functions for wrapping and wrapping</a></em> and the notational improvement from Matt Hellige&#8217;s <em><a href="http://matt.immute.net/content/pointless-fun" title="blog post by Matt Hellige">Pointless fun</a></em>.</p>

<pre><code>-- | Apply a unary function within the 'O' constructor.
inO :: (g (f a) → g' (f' a')) → ((g :. f) a → (g' :. f') a')
inO = unO ~&gt; O

infixr 1 ~&gt;
-- | Add pre- and post processing
(~&gt;) :: (a' → a) → (b → b') → ((a → b) → (a' → b'))
(f ~&gt; h) g = h ∘ g ∘ f
</code></pre>

<p>In case you&#8217;re wondering, these definitions did not come to me effortlessly.
I sweated through the derivation, guided always by my intuition and the necessary types, as determined by the shape of <code>Der (g :. f)</code>.
The type-checker helped me get from one step to the next.</p>

<p>I do a lot of type-directed derivations of this style while I program in Haskell, with the type-checker checking each step for me.
I&#8217;d love to have mechanized help in <em>creating</em> these derivations, not just <em>checking</em> them.</p>

<h3>Zippers</h3>

<p>How does the <code>Holey</code> class relate to zippers?
As in a few recent blog posts, let&#8217;s use the fact that regular data types are isomorphic to fixed-points of functors.</p>

<p>Functor fixed-points are like function fixed points</p>

<pre><code>fix f = f (fix f)

type Fix f = f (Fix f)
</code></pre>

<p>However, Haskell doesn&#8217;t support recursive type synonyms, so use a <code>newtype</code>:</p>

<pre><code>newtype Fix f = Fix { unFix :: f (Fix f) }
</code></pre>
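<p>As a quick aside (repeating <code>Fix</code> so the snippet stands alone), the Peano naturals arise as the fixed point of <code>Maybe</code>:</p>

<pre><code>newtype Fix f = Fix { unFix :: f (Fix f) }

-- Nothing is zero; Just is successor.
type Nat = Fix Maybe

zero :: Nat
zero = Fix Nothing

suc :: Nat -> Nat
suc = Fix . Just

toInt :: Nat -> Int
toInt (Fix Nothing)  = 0
toInt (Fix (Just n)) = 1 + toInt n
</code></pre>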

<p>A context for a functor fixed-point is either empty, if we&#8217;re at the very top of an &#8220;<code>f</code>-tree&#8221;, or it&#8217;s an <code>f</code>-context for <code>f (Fix f)</code>, and a parent context:</p>

<pre><code>data Context f = TopC | Context (Der f (Fix f)) (Context f)  -- first try
</code></pre>

<p>Hm.
On the outside, <code>Context f</code> looks like a list, so let&#8217;s use a list instead:</p>

<pre><code>type Context f = [Der f (Fix f)]
</code></pre>

<p>The location type we used above is</p>

<pre><code>type Loc f a = (Der f a, a)
</code></pre>

<p>Similarly, define a type of zippers (also called &#8220;locations&#8221;) for functor fixed-points:</p>

<pre><code>type Zipper f = (Context f, Fix f)
</code></pre>

<p>This <code>Zipper</code> type corresponds to Huet&#8217;s zipper and has operations <code>up</code> and <code>down</code>.
The <code>down</code> motion can yield multiple results.</p>

<pre><code>up   :: Holey f ⇒ Zipper f →    Zipper f

down :: Holey f ⇒ Zipper f → f (Zipper f)
</code></pre>

<p>Since <code>down</code> yields an <code>f</code>-collection of locations, we do not need sibling navigation functions (<code>left</code> &amp; <code>right</code>).</p>
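<p>To see the shape of this interface without the type-family machinery, here is a monomorphic sketch for binary leaf trees (all names hypothetical). The pattern functor is a sum of a constant and a product of two identities, so <code>down</code> yields zero or two child locations:</p>

<pre><code>data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Eq, Show)

-- One derivative layer of the Branch shape: the sibling we did not enter.
data TreeD a = WentL (Tree a)   -- descended left; right sibling saved
             | WentR (Tree a)   -- descended right; left sibling saved
  deriving (Eq, Show)

type TreeZipper a = ([TreeD a], Tree a)

upT :: TreeZipper a -> TreeZipper a
upT (WentL r : ds, t) = (ds, Branch t r)
upT (WentR l : ds, t) = (ds, Branch l t)
upT z                 = z   -- already at the top

downT :: TreeZipper a -> [TreeZipper a]
downT (_,  Leaf _)     = []
downT (ds, Branch l r) = [(WentL r : ds, l), (WentR l : ds, r)]
</code></pre>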

<p>To move up in <code>Zipper</code>, strip off a derivative (one-hole functor context) and fill the hole with the current tree, leaving the other derivatives as the remaining fixed-point context.
Like so:</p>

<pre><code>up   :: Holey f ⇒ Zipper f →    Zipper f
up (d:ds', t) = (ds', Fix (fill (d,t)))
</code></pre>

<p>To see how the typing works out:</p>

<pre><code>(d:ds', t) :: Zipper f
(d:ds', t) :: (Context f, Fix f)

d:ds' :: [Der f (Fix f)]

t :: Fix f

d   ::  Der f (Fix f)
ds' :: [Der f (Fix f)]

fill :: Loc f b → f b
fill :: (Der f b, b) → f b
fill :: (Der f (Fix f), Fix f) → f (Fix f)

fill (d,t) :: f (Fix f)

Fix (fill (d,t)) :: Fix f

(ds', Fix (fill (d,t))) :: (Context f, Fix f)
                        :: Zipper f
</code></pre>

<p>Note that the <code>up</code> motion fails when at the top of a zipper (empty context).
If desired, we can also provide an unfailing version (really, a version with explicitly typed failure):</p>

<pre><code>up' :: Holey f =&gt; Zipper f -&gt; Maybe (Zipper f)
up' ([]   , _) = Nothing
up' l          = Just (up l)
</code></pre>

<p>To move down in an <code>f</code>-tree <code>t</code>, form the extractions of <code>t</code>, each of which has a derivative and a sub-tree.
The derivative becomes part of an extended fixed-point context, and the sub-tree becomes the new sub-tree of focus.</p>

<pre><code>down :: Holey f ⇒ Zipper f → f (Zipper f)
down (ds', t) = (fmap.first) (:ds') (extract (unFix t))
</code></pre>

<p>The typing (in case you&#8217;re curious):</p>

<pre><code>(ds',t) :: Zipper f
        :: (Context f, Fix f)
        :: ([Der f (Fix f)], Fix f)

ds' :: [Der f (Fix f)]
t :: Fix f
unFix t :: f (Fix f)

extract (unFix t) :: f (Der f (Fix f), Fix f)

(fmap.first) (:ds') (extract (unFix t))
  :: f ([Der f (Fix f)], Fix f)
  :: f (Context f, Fix f)
  :: f (Zipper f)
</code></pre>

<h3>Zipping back to regular data types</h3>

<p>I like the (functor) fixed-point perspective on regular data types, for its austere formal simplicity.
It shows me the naked essence of regular data types, so I can more easily see and more deeply understand patterns like memoization, derivatives, and zippers.</p>

<p>For convenience and friendliness of <em>use</em>, I prefer working with regular types directly, rather than through the (nearly) isomorphic form of functor fixed-points.
While the fixed-point perspective is formalism-friendly, the <em>pattern functor</em> perspective is more user-friendly, allowing us to work with our familiar regular data as they are.</p>

<p>As in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em>, let&#8217;s use the following class:</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t :: * → *
  wrap   :: PF t t → t
  unwrap :: t → PF t t
</code></pre>

<p>The idea is that a type <code>t</code> is isomorphic to <code>Fix (PF t)</code>, although really there may be more points of undefinedness in the fixed-point representation, so rather than an isomorphism, we have an embedding/projection pair.</p>

<p>The notions of context and location are similar to the ones above:</p>

<pre><code>type Context t = [Der (PF t) t]

type Zipper t = (Context t, t)
</code></pre>

<p>So are the <code>up</code> and <code>down</code> motions, in which <code>wrap</code> and <code>unwrap</code> replace <code>Fix</code> and <code>unFix</code>:</p>

<pre><code>up   :: (Regular t, Holey (PF t)) ⇒ Zipper t →       Zipper t
down :: (Regular t, Holey (PF t)) ⇒ Zipper t → PF t (Zipper t)

up (d:ds', t) = (ds', wrap (fill (d,t)))

down (ds', t) = (fmap.first) (:ds') (extract (unwrap t))
</code></pre>
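<p>Specializing to ordinary Haskell lists (a hand-rolled sketch, not library code): the pattern functor of <code>[a]</code> is <code>Unit :+: Const a :*: Id</code>, whose derivative collapses to <code>Const a</code>, so a context is simply the elements already passed, and we recover the classic list zipper:</p>

<pre><code>type ListZipper a = ([a], [a])   -- (passed elements, remaining list)

upL :: ListZipper a -> ListZipper a
upL (d : ds, t) = (ds, d : t)
upL z           = z   -- already at the top

-- Each list layer has at most one hole, so down is Maybe-valued here.
downL :: ListZipper a -> Maybe (ListZipper a)
downL (ds, x : xs) = Just (x : ds, xs)
downL (_,  [])     = Nothing
</code></pre>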
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/another-angle-on-zippers/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fanother-angle-on-zippers&amp;language=en_GB&amp;category=text&amp;title=Another+angle+on+zippers&amp;description=The+zipper+is+an+efficient+and+elegant+data+structure+for+purely+functional+editing+of+tree-like+data+structures%2C+first+published+by+G%C3%A9rard+Huet.+Zippers+maintain+a+location+of+focus+in+a...&amp;tags=derivative%2Cfunctor%2Czipper%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Differentiation of higher-order types</title>
		<link>http://conal.net/blog/posts/differentiation-of-higher-order-types</link>
		<comments>http://conal.net/blog/posts/differentiation-of-higher-order-types#comments</comments>
		<pubDate>Thu, 29 Jul 2010 02:45:51 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[zipper]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=140</guid>
		<description><![CDATA[A &#8220;one-hole context&#8221; is a data structure with one piece missing. Conor McBride pointed out that the derivative of a regular type is its type of one-hole contexts. When a data structure is assembled out of common functor combinators, a corresponding type of one-hole contexts can be derived mechanically by rules that mirror the standard [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Differentiation of higher-order types

Tags: derivative, functor, zipper

URL: http://conal.net/blog/posts/differentiation-of-higher-order-types/

-->

<!-- references -->

<!-- teaser -->

<p>A &#8220;one-hole context&#8221; is a data structure with one piece missing.
Conor McBride pointed out that <a href="http://www.cs.nott.ac.uk/~ctm/diff.pdf" title="paper by Conor McBride">the derivative of a regular type is its type of one-hole contexts</a>.
When a data structure is assembled out of common functor combinators, a corresponding type of one-hole contexts can be derived mechanically by rules that mirror the standard derivative rules learned in beginning differential calculus.</p>

<p>I&#8217;ve been playing with functor combinators lately.
I was delighted to find that the data-structure derivatives can be expressed directly using the standard functor combinators and type families.</p>

<p>The code in this blog post is taken from the Haskell library <a href="http://hackage.haskell.org/package/functor-combo" title="Hackage entry: functor-combo">functor-combo</a>.</p>

<p>See also the <a href="http://en.wikibooks.org/wiki/Haskell/Zippers" title="Wikibooks entry">Haskell Wikibooks page on zippers</a>, especially the section called &#8220;Differentiation of data types&#8221;.</p>

<p>I mean this post not as new research, but rather as a tidy, concrete presentation of some of Conor&#8217;s delightful insight.</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-140"></span></p>

<h3>Functor combinators</h3>

<p>Let&#8217;s use the same set of functor combinators as in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em> and <em><a href="http://conal.net/blog/posts/memoizing-higher-order-functions/" title="blog post">Memoizing higher-order functions</a></em>:</p>

<pre><code>data Void a   -- no constructors

type Unit a        = Const () a

data Const x a     = Const x

newtype Id a       = Id a

data (f :+: g) a   = InL (f a) | InR (g a)

data (f :*: g) a   = f a :*: g a

newtype (g :. f) a = O (g (f a))
</code></pre>

<h3>Derivatives</h3>

<p>The derivative of a functor is another functor.
Since the shape of the derivative is non-uniform (it depends on the shape of the functor being differentiated), define a higher-order <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html" title="GHC documentation on type families">type family</a>:</p>

<pre><code>type family Der (f :: (* → *)) :: (* → *)
</code></pre>

<p>The usual derivative rules can then be translated without much imagination, provided we start with the rules in their <em>functional</em> form (e.g., as in the paper <em><a href="http://conal.net/blog/posts/paper-beautiful-differentiation/" title="blog post">Beautiful differentiation</a></em>, Section 2 and Figure 1).</p>

<p>For instance, the derivative of the constant function is the constant 0 function, and the derivative of the identity function is the constant 1 function.
If <code>der</code> is the derivative functional mapping functions (of real numbers) to functions,</p>

<pre><code>der (const x) ≡ 0
der id        ≡ 1
</code></pre>

<p>On the right-hand sides, I am exploiting the function instances of <code>Num</code> from the library <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/applicative-numbers" title="Haskell library">applicative-numbers</a>.
To be more explicit, I could have written &#8220;<code>const 0</code>&#8221; and &#8220;<code>const 1</code>&#8220;.</p>

<p>Correspondingly,</p>

<pre><code>type instance Der (Const x) = Void   -- 0

type instance Der Id        = Unit   -- 1
</code></pre>

<p>Note that the types <code>Void a</code> and <code>Unit a</code> have 0 and 1 element, respectively, if we ignore ⊥.
Moreover, <code>Void</code> is a sort of additive identity, and <code>Unit</code> is a sort of multiplicative identity, again ignoring ⊥.
For these reasons, <code>Void</code> and <code>Unit</code> might be more aptly named &#8220;<code>Zero</code>&#8221; and &#8220;<code>One</code>&#8220;.</p>

<p>The first rule says that a value of type <code>Const x a</code> has no one-hole context (for type <code>a</code>), which is true, since there is an <code>x</code> but no <code>a</code>.
The second rule says that there is exactly one possible context for <code>Id a</code>, since the one and only <code>a</code> value must be removed, and no information remains.</p>

<p>A (one-hole) context for a sum is a context for the left or the right possibility of the sum:</p>

<pre><code>type instance Der (f :+: g) = Der f :+: Der g
</code></pre>

<p>Correspondingly, the derivative of a sum of functions is the sum of the functions&#8217; derivatives:</p>

<pre><code>der (f + g) ≡ der f + der g
</code></pre>

<p>Again I&#8217;m using the function <code>Num</code> instance from <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/applicative-numbers" title="Haskell library">applicative-numbers</a>.</p>

<p>For a pair, the one hole of a context can be made somewhere in the first component or somewhere in the second component.
So the pair context consists of a holey first component and a full second component or a full first component and a holey second component.</p>

<pre><code>type instance Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
</code></pre>

<p>Similarly, for functions:</p>

<pre><code>der (f * g) ≡ der f * g + f * der g
</code></pre>
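<p>As a tiny sanity check (with illustrative names): a homogeneous pair is <code>Id :*: Id</code>, i.e., <em>x·x</em>, and the product rule gives a derivative isomorphic to <code>Id :+: Id</code>, matching <em>der (x·x) ≡ 2·x</em>: two one-hole contexts, each keeping the component that stayed behind.</p>

<pre><code>data PairCtx a = HoleFst a   -- hole in the first slot; second component kept
               | HoleSnd a   -- hole in the second slot; first component kept
  deriving (Eq, Show)

fillPair :: (PairCtx a, a) -> (a, a)
fillPair (HoleFst y, x) = (x, y)
fillPair (HoleSnd x, y) = (x, y)

pairContexts :: (a, a) -> [(PairCtx a, a)]
pairContexts (x, y) = [(HoleFst y, x), (HoleSnd x, y)]
</code></pre>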

<p>Finally, consider functor composition.
If <code>g</code> and <code>f</code> are container types, then <code>(g :. f) a</code> is the type of <code>g</code> containers of <code>f</code> containers of <code>a</code> elements.
The <code>a</code>-shaped hole must come from one of the contained <code>f a</code> structures.</p>

<pre><code>type instance Der (g :. f) = (Der g :. f) :*: Der f
</code></pre>

<p>Here&#8217;s one way to think of this derivative functor:
to make an <code>a</code>-shaped hole in a <code>g (f a)</code>, first remove an <code>f a</code> structure, leaving an <code>(f a)</code>-shaped hole, and then put back all but an <code>a</code> value extracted from the removed <code>f a</code> structure.
So the overall (one-hole) context can be assembled from two parts: a <code>g</code> context of <code>f a</code> structures, and an <code>f</code> context of <code>a</code> values.</p>

<p>The corresponding rule for function derivatives:</p>

<pre><code>der (g ∘ f) ≡ (der g ∘ f) * der f
</code></pre>

<p>which again uses <code>Num</code> on functions.
Written out more explicitly:</p>

<pre><code>der (g ∘ f) a ≡ der g (f a)  * der f a
</code></pre>

<p>which may look more like the form you&#8217;re used to.</p>

<h3>Summary of derivatives</h3>

<p>To emphasize the correspondence between forms of differentiation, here are rules for <em>function</em> and <em>functor</em> derivatives:</p>

<pre><code>der (const x) ≡ 0
Der (Const x) ≡ Void

der id ≡ 1
Der Id ≡ Unit

der (f  +  g) ≡ der f  +  der g
Der (f :+: g) ≡ Der f :+: Der g

der (f  *  g) ≡ der f  *  g  +  f  *  der g
Der (f :*: g) ≡ Der f :*: g :+: f :*: Der g

der (g  ∘ f) ≡ (der g  ∘ f)  *  der f
Der (g :. f) ≡ (Der g :. f) :*: Der f
</code></pre>

<h3>Filling holes</h3>

<p>Each derivative functor is a one-hole container.
One useful operation on derivatives is filling that hole.</p>

<pre><code>fillC :: Functor f ⇒ Der f a → a → f a
</code></pre>

<p>The specifics of how to fill in a hole will depend on the choice of functor <code>f</code>, so let&#8217;s make the <code>fillC</code> operation a method of a new type class.
This new class is also a handy place to stash the associated type of derivatives, as an alternative to the top-level declarations above.</p>

<pre><code>class Functor f ⇒ Holey f where
  type Der f :: * → *
  fillC :: Der f a → a → f a
</code></pre>

<p>I&#8217;ll add one more method to this class in an upcoming post.</p>

<p>For <code>Const x</code>, there are no cases to handle, since there are no holes.</p>

<pre><code>instance Holey (Const x) where
  type Der (Const x) = Void
  fillC = error "fillC for Const x: no Der values"
</code></pre>

<p>I added a definition just to keep the compiler from complaining.
This particular <code>fillC</code> can only be applied to a value of type <code>Void a</code>, and there are no such values other than ⊥.</p>

<p>Is there a more elegant way to define functions over data types with no constructors?
One idea is to provide a single, polymorphic function over void types:</p>

<pre><code>  voidF :: Void a → b
  voidF = error "voidF: no value of type Void"
</code></pre>

<p>And use it wherever needed, e.g.,</p>

<pre><code>  fillC = voidF
</code></pre>

<p>Next is our identity functor:</p>

<pre><code>instance Holey Id where
  type Der Id = Unit
  fillC (Const ()) a = Id a
</code></pre>

<p>More succinctly,</p>

<pre><code>  fillC (Const ()) = Id
</code></pre>

<p>For sums,</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :+: g) where
  type Der (f :+: g) = Der f :+: Der g
  fillC (InL df) a = InL (fillC df a)
  fillC (InR df) a = InR (fillC df a)
</code></pre>

<p>or</p>

<pre><code>  fillC (InL df) = InL ∘ fillC df
  fillC (InR df) = InR ∘ fillC df
</code></pre>

<p>Products also have two cases, since the derivative of a product is a sum:</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :*: g) where
  type Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
  fillC (InL (dfa :*:  ga)) a = fillC dfa a :*: ga
  fillC (InR ( fa :*: dga)) a = fa :*: fillC dga a
</code></pre>

<p>Less pointfully,</p>

<pre><code>  fillC (InL (dfa :*:  ga)) = (:*: ga) ∘ fillC dfa
  fillC (InR ( fa :*: dga)) = (fa :*:) ∘ fillC dga
</code></pre>
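<p>To make these two cases concrete, here is a small worked example (a sketch, not part of the original development), using the pair functor <code>Id :*: Id</code> and the combinators defined earlier. Its derivative offers a choice of which hole to fill:</p>

<pre><code>-- Der (Id :*: Id) = (Unit :*: Id) :+: (Id :*: Unit)

fillC (InL (Const () :*: Id 2)) 1 ≡ Id 1 :*: Id 2  -- fill the left hole
fillC (InR (Id 1 :*: Const ())) 2 ≡ Id 1 :*: Id 2  -- fill the right hole
</code></pre>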

<p>Finally, functor composition:</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (g :. f) where
  type Der (g :. f) = (Der g :. f) :*: Der f
  fillC (O dgfa :*: dfa) a = O (fillC dgfa (fillC dfa a))
</code></pre>

<p>The less pointful form is more telling.</p>

<pre><code>  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
</code></pre>

<p>In words: filling of the derivative of a composition is a composition of filling of the derivatives.</p>
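<p>As a tiny sanity check (hypothetical, taking <code>g = f = Id</code>): here <code>Der (Id :. Id) = (Unit :. Id) :*: Unit</code>, whose only non-⊥ value is <code>O (Const ()) :*: Const ()</code>, and filling wraps the new element twice:</p>

<pre><code>fillC (O (Const ()) :*: Const ()) 3 ≡ O (Id (Id 3))
</code></pre>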

<h3>Thoughts on composition</h3>

<p>Let&#8217;s return to the derivative rules for composition, i.e., the chain rule, on functions and on functors:</p>

<pre><code>der (g  ∘ f) ≡ (der g  ∘ f)  *  der f

Der (g :. f) ≡ (Der g :. f) :*: Der f
</code></pre>

<p>Written in this way, the functor rule looks quite compelling.
Something bothers me, however.
For functions, multiplication is a special case, not the general case, and is only meaningful and correct when differentiating functions from scalars to scalars.
In general, derivative values are <em>linear maps</em>, and the chain rule uses composition on linear maps rather than multiplication on scalars (that <em>represent</em> linear maps).
I&#8217;ve written several <a href="http://conal.net/blog/tag/derivative/" title="Posts on derivatives">posts on derivatives</a> and a paper <em><a href="http://conal.net/blog/posts/paper-beautiful-differentiation/" title="blog post">Beautiful differentiation</a></em>, describing this perspective, which comes from calculus on manifolds.</p>

<p>Look again at the less pointful formulation of <code>fillC</code> for derivatives of compositions:</p>

<pre><code>  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
</code></pre>

<p>The product in this case is just structural.
The actual use in <code>fillC</code> is indeed a composition of linear maps.
In this context, &#8220;linear&#8221; has a different meaning from before.
It&#8217;s another way of saying &#8220;fills a <em>one-hole</em> context&#8221; (as in the linear patterns of term rewriting and of ML &amp; Haskell).</p>

<p>So maybe there&#8217;s a more general/abstract view of <em>functor</em> derivatives, just as there is a more general/abstract view of <em>function</em> derivatives.
In that view, we might replace the functor chain rule&#8217;s product with a notion of composition.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/differentiation-of-higher-order-types/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fdifferentiation-of-higher-order-types&amp;language=en_GB&amp;category=text&amp;title=Differentiation+of+higher-order+types&amp;description=A+%26%238220%3Bone-hole+context%26%238221%3B+is+a+data+structure+with+one+piece+missing.+Conor+McBride+pointed+out+that+the+derivative+of+a+regular+type+is+its+type+of+one-hole+contexts.+When+a...&amp;tags=derivative%2Cfunctor%2Czipper%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Garbage collecting the semantics of FRP</title>
		<link>http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp</link>
		<comments>http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp#comments</comments>
		<pubDate>Mon, 04 Jan 2010 21:55:30 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[FRP]]></category>
		<category><![CDATA[functional reactive programming]]></category>
		<category><![CDATA[semantics]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=96</guid>
		<description><![CDATA[Ever since ActiveVRML, the model we&#8217;ve been using in functional reactive programming (FRP) for interactive behaviors is (T-&#62;a) -&#62; (T-&#62;b), for dynamic (time-varying) input of type a and dynamic output of type b (where T is time). In &#8220;Classic FRP&#8221; formulations (including ActiveVRML, Fran &#38; Reactive), there is a &#8220;behavior&#8221; abstraction whose denotation is a [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Garbage collecting the semantics of FRP

Tags: FRP, functional reactive programming, semantics, design, derivative

URL: http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp/

-->

<!-- references -->

<!-- teaser -->

<p>Ever since <a href="http://conal.net/papers/ActiveVRML/" title="Tech report: &quot;A Brief Introduction to ActiveVRML&quot;">ActiveVRML</a>, the model we&#8217;ve been using in functional reactive programming (FRP) for interactive behaviors is <code>(T-&gt;a) -&gt; (T-&gt;b)</code>, for dynamic (time-varying) input of type <code>a</code> and dynamic output of type <code>b</code> (where <code>T</code> is time).
In &#8220;Classic FRP&#8221; formulations (including <a href="http://conal.net/papers/ActiveVRML/" title="Tech report: &quot;A Brief Introduction to ActiveVRML&quot;">ActiveVRML</a>, <a href="http://conal.net/papers/icfp97/" title="paper">Fran</a> &amp; <a href="http://conal.net/papers/push-pull-frp/" title="Paper by Conal Elliott and Paul Hudak">Reactive</a>), there is a &#8220;behavior&#8221; abstraction whose denotation is a function of time.
Interactive behaviors are then modeled as host language (e.g., Haskell) functions between behaviors.
Problems with this formulation are described in <em><a href="http://conal.net/blog/posts/why-classic-FRP-does-not-fit-interactive-behavior/" title="blog post">Why classic FRP does not fit interactive behavior</a></em>.
These same problems motivated &#8220;Arrowized FRP&#8221;.
In Arrowized FRP, behaviors (renamed &#8220;signals&#8221;) are purely conceptual.
They are part of the semantic model but do not have any realization in the programming interface.
Instead, the abstraction is a <em>signal transformer</em>, <code>SF a b</code>, whose semantics is <code>(T-&gt;a) -&gt; (T-&gt;b)</code>.
See <em><a href="http://conal.net/papers/genuinely-functional-guis.pdf" title="Paper by Antony Courtney and Conal Elliott">Genuinely Functional User Interfaces</a></em> and <em><a href="http://www.haskell.org/yale/papers/haskellworkshop02/" title="Paper by Henrik Nilsson, Antony Courtney, and John Peterson">Functional Reactive Programming, Continued</a></em>.</p>

<p>Whether in its classic or arrowized embodiment, I&#8217;ve been growing uncomfortable with this semantic model of functions between time functions.
A few weeks ago, I realized that one source of discomfort is that this model is <em>mostly junk</em>.</p>

<p>This post contains some partially formed thoughts about how to eliminate the junk (&#8220;garbage collect the semantics&#8221;), and what might remain.</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-96"></span></p>

<p>There are two generally desirable properties for a denotational semantics: <em>full abstraction</em> and <em>junk-freeness</em>.
Roughly, &#8220;full abstraction&#8221; means we must not distinguish between what is (operationally) indistinguishable, while &#8220;junk-freeness&#8221; means that every semantic value must be denotable.</p>

<p>FRP&#8217;s semantic model, <code>(T-&gt;a) -&gt; (T-&gt;b)</code>, allows not only arbitrary (computable) transformation of input values, but also of time.
The output at some time can depend on the input at any time at all, or even on the input at arbitrarily many different times.
Consequently, this model allows responding to <em>future</em> input, violating a principle sometimes called &#8220;causality&#8221;, which is that outputs may depend on the past or present but not the future.</p>

<p>In a causal system, the present can reach backward to the past but not forward to the future.
I&#8217;m uneasy about this ability as well.
Arbitrary access to the past may be much more powerful than necessary.
As evidence, consult the system we call (physical) Reality.
As far as I can tell, Reality operates without arbitrary access to the past or to the future, and it does a pretty good job at expressiveness.</p>

<p>Moreover, arbitrary past access is also problematic to implement in its semantically simple generality.</p>

<p>There is a thing we informally call &#8220;memory&#8221;, which at first blush may look like access to the past, but it isn&#8217;t really.
Rather, memory is access to a <em>present</em> input, which has come into being through a process of filtering, gradual accumulation, and discarding (forgetting).
I&#8217;m talking about &#8220;memory&#8221; here in the sense of what our brains do, but also what all the rest of physical reality does.
For instance, weather marks on a rock are part of the rock&#8217;s (present) memory of the past weather.</p>

<p>A very simple memory-less semantic model of interactive behavior is just <code>a -&gt; b</code>.
This model is too restrictive, however, as it cannot support <em>any</em> influence of the past on the present.</p>

<p>Which leaves a question: what is a simple and adequate formal model of interactive behavior that reaches neither into the past nor into the future, and yet still allows the past to influence the present?
Inspired in part by a design principle I call &#8220;what would reality do?&#8221; (WWRD), I&#8217;m happy to have some kind of infinitesimal access to the past, but nothing further.</p>

<p>My current intuition is that differentiation/integration plays a crucial role.
That information is carried forward moment by moment in time as &#8220;momentum&#8221; in some sense.</p>

<blockquote>
  <p><em>I call intuition cosmic fishing. You feel a nibble, then you&#8217;ve got to hook the fish.</em> &#8211; Buckminster Fuller</p>
</blockquote>

<p>Where to go with these intuitions?</p>

<p>Perhaps interactive behaviors are some sort of function with all of its derivatives.
See <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="blog post">Beautiful differentiation</a></em> for a specification and derived implementation of numeric operations, and more generally of <code>Functor</code> and <code>Applicative</code>, on which much of FRP is based.</p>
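<p>To make that intuition slightly more concrete, here is one possible (purely illustrative) rendering of &#8220;a function with all of its derivatives&#8221;, in the spirit of the derivative towers from those posts; the names are hypothetical:</p>

<pre><code>-- A value together with all of its higher derivatives (a sketch):
data Dif a = Dif a (Dif a)

-- A speculative model of interactive behavior: transform the input's
-- derivative tower, moment by moment, into the output's.
type Behavior a b = Dif a -&gt; Dif b
</code></pre>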

<p>I suspect the whole event model can be replaced by integration.
Integration is the main remaining piece.</p>

<p>How weak a semantic model can let us define integration?</p>

<h3>Thanks</h3>

<p>My thanks to Luke Palmer and to Noam Lewis for some clarifying chats about these half-baked ideas.
And to the folks on #haskell IRC for <a href="http://tunes.org/~nef/logs/haskell/10.01.04">brainstorming titles for this post</a>.
My favorite suggestions were</p>

<ul>
<li>luqui: instance HasJunk FRP where</li>
<li>luqui: Functional reactive programming&#8217;s semantic baggage</li>
<li>sinelaw: FRP, please take out the trash!</li>
<li>cale: Garbage collecting the semantics of FRP</li>
<li>BMeph: Take out the FRP-ing Trash</li>
</ul>

<p>all of which I preferred over my original &#8220;FRP is mostly junk&#8221;.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp/feed</wfw:commentRss>
		<slash:comments>34</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fgarbage-collecting-the-semantics-of-frp&amp;language=en_GB&amp;category=text&amp;title=Garbage+collecting+the+semantics+of+FRP&amp;description=Ever+since+ActiveVRML%2C+the+model+we%26%238217%3Bve+been+using+in+functional+reactive+programming+%28FRP%29+for+interactive+behaviors+is+%28T-%26gt%3Ba%29+-%26gt%3B+%28T-%26gt%3Bb%29%2C+for+dynamic+%28time-varying%29+input+of+type+a+and+dynamic+output...&amp;tags=derivative%2Cdesign%2CFRP%2Cfunctional+reactive+programming%2Csemantics%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Paper: Beautiful differentiation</title>
		<link>http://conal.net/blog/posts/paper-beautiful-differentiation</link>
		<comments>http://conal.net/blog/posts/paper-beautiful-differentiation#comments</comments>
		<pubDate>Tue, 24 Feb 2009 08:05:10 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[beautiful code]]></category>
		<category><![CDATA[calculus on manifolds]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[paper]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=85</guid>
		<description><![CDATA[I have another paper draft for submission to ICFP 2009. This one is called Beautiful differentiation, The paper is a culmination of the several posts I&#8217;ve written on derivatives and automatic differentiation (AD). I&#8217;m happy with how the derivation keeps getting simpler. Now I&#8217;ve boiled extremely general higher-order AD down to a Functor and Applicative [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Paper: Beautiful differentiation

Tags: derivative, functor, applicative functor, beautiful code, calculus on manifolds, linear map, math, paper

URL: http://conal.net/blog/posts/paper-beautiful-differentiation/

-->

<!-- references -->

<!-- teaser -->

<p>I have another paper draft for submission to <a href="http://www.cs.nott.ac.uk/~gmh/icfp09.html" title="conference page">ICFP 2009</a>.
This one is called <em><a href="http://conal.net/papers/beautiful-differentiation" title="paper">Beautiful differentiation</a></em>.
The paper is a culmination of the <a href="http://conal.net/blog/tag/derivative/">several posts</a> I&#8217;ve written on derivatives and automatic differentiation (AD).
I&#8217;m happy with how the derivation keeps getting simpler.
Now I&#8217;ve boiled extremely general higher-order AD down to a <code>Functor</code> and <code>Applicative</code> morphism.</p>

<p>I&#8217;d love to get some readings and feedback.
I&#8217;m a bit over the page limit, so I&#8217;ll have to do some trimming before submitting.</p>

<p>The abstract:</p>

<blockquote>
  <p>Automatic differentiation (AD) is a precise, efficient, and convenient
  method for computing derivatives of functions. Its implementation can be
  quite simple even when extended to compute all of the higher-order
  derivatives as well. The higher-dimensional case has also been tackled,
  though with extra complexity. This paper develops an implementation of
  higher-dimensional, higher-order differentiation in the extremely
  general and elegant setting of <em>calculus on manifolds</em> and derives that
  implementation from a simple and precise specification.</p>
  
  <p>In order to motivate and discover the implementation, the paper poses
  the question &#8220;What does AD mean, independently of implementation?&#8221; An
  answer arises in the form of <em>naturality</em> of sampling a function and its
  derivative. Automatic differentiation flows out of this naturality
  condition, together with the chain rule. Graduating from first-order to
  higher-order AD corresponds to sampling all derivatives instead of just
  one. Next, the notion of a derivative is generalized via the notions of
  vector space and linear maps. The specification of AD adapts to this
  elegant and very general setting, which even <em>simplifies</em> the
  development.</p>
</blockquote>

<p>You can <a href="http://conal.net/papers/beautiful-differentiation" title="paper">get the paper and see current errata here</a>.</p>

<p>The submission deadline is March 2, so comments before then are most helpful to me.</p>

<p>Enjoy, and thanks!</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/paper-beautiful-differentiation/feed</wfw:commentRss>
		<slash:comments>22</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fpaper-beautiful-differentiation&amp;language=en_GB&amp;category=text&amp;title=Paper%3A+Beautiful+differentiation&amp;description=I+have+another+paper+draft+for+submission+to+ICFP+2009.+This+one+is+called+Beautiful+differentiation%2C+The+paper+is+a+culmination+of+the+several+posts+I%26%238217%3Bve+written+on+derivatives+and...&amp;tags=applicative+functor%2Cbeautiful+code%2Ccalculus+on+manifolds%2Cderivative%2Cfunctor%2Clinear+map%2Cmath%2Cpaper%2Cblog" type="text/html" />
	</item>
		<item>
		<title>What is automatic differentiation, and why does it work?</title>
		<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work</link>
		<comments>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work#comments</comments>
		<pubDate>Wed, 28 Jan 2009 20:09:42 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[type class morphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=79</guid>
		<description><![CDATA[Bertrand Russell remarked that Everything is vague to a degree you do not realize till you have tried to make it precise. I&#8217;m mulling over automatic differentiation (AD) again, neatening up previous posts on derivatives and on linear maps, working them into a coherent whole for an ICFP submission. I understand the mechanics and some [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p
>Bertrand Russell remarked that</p
>

<blockquote
><p
  ><em
    >Everything is vague to a degree you do not realize till you have tried to make it precise.</em
    ></p
  ></blockquote
>

<p
>I&#8217;m mulling over automatic differentiation (AD) again, neatening up previous posts on <a href="http://conal.net/blog/tag/derivative/" title="posts on derivatives"
  >derivatives</a
  > and on <a href="http://conal.net/blog/tag/linear-map/" title="posts on linear maps"
  >linear maps</a
  >, working them into a coherent whole for an ICFP submission. I understand the mechanics and some of the reasons for its correctness. After all, it&#8217;s &quot;just the chain rule&quot;.</p
>

<p
>As usual, in the process of writing, I bumped up against Russell&#8217;s principle. I felt a growing uneasiness and realized that I didn&#8217;t understand AD in the way I like to understand software, namely,</p
>

<ul
><li
  ><em
    >What</em
    > does it mean, independently of implementation?</li
  ><li
  ><em
    >How</em
    > do the implementation and its correctness flow gracefully from that meaning?</li
  ><li
  ><em
    >Where</em
    > else might we go, guided by answers to the first two questions?</li
  ></ul
>

<p
>Ever since writing <em
  ><a href="http://conal.net/papers/simply-reactive" title="paper"
    >Simply efficient functional reactivity</a
    ></em
  >, the idea of <a href="http://conal.net/blog/tag/type-class-morphism/" title="posts on type class morphisms"
  >type class morphisms</a
  > keeps popping up for me as a framework in which to ask and answer these questions. To my delight, this framework gives me new and more satisfying insight into automatic differentiation.</p
>

<p><span id="more-79"></span></p>

<div id="whats-a-derivative"
><h3
  >What&#8217;s a derivative?</h3
  ><p
  >My first guess is that AD has something to do with derivatives, which then raises the question of what a derivative is. For now, I&#8217;m going to substitute a popular but problematic answer to that question and say that</p
  ><pre class="sourceCode haskell"
  ><code
    >deriv <span class="dv"
      >&#8759;</span
      > &#8943; &#8658; (a &#8594; b) &#8594; (a &#8594; b) <span class="co"
      >--  simplification</span
      ><br
       /></code
    ></pre
  ><p
  >As discussed in <em
    ><a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="blog post"
      >What is a derivative, really?</a
      ></em
    >, the popular answer has limited usefulness, applying just to scalar (one-dimensional) domains. The real deal involves distinguishing the type <code
    >b</code
    > from the type <code
    >a :-* b</code
    > of <a href="http://conal.net/blog/tag/linear-map/" title="posts on linear maps"
    >linear maps</a
    > from <code
    >a</code
    > to <code
    >b</code
    >.</p
  ><pre class="sourceCode haskell"
  ><code
    >deriv <span class="dv"
      >&#8759;</span
      > (<span class="dt"
      >VectorSpace</span
      > u, <span class="dt"
      >VectorSpace</span
      > v) &#8658; (u &#8594; v) &#8594; (u &#8594; (u <span class="fu"
      >:-*</span
      > v))<br
       /></code
    ></pre
  ><div id="why-care-about-derivatives"
  ><h4
    >Why care about derivatives?</h4
    ><p
    >Derivatives are useful in a variety of application areas, including root-finding, optimization, curve and surface tessellation, and computation of surface normals for 3D rendering. Considering the usefulness of derivatives, it is worthwhile to find software methods that are</p
    ><ul
    ><li
      >simple (to implement and verify),</li
      ><li
      >convenient,</li
      ><li
      >accurate,</li
      ><li
      >efficient, and</li
      ><li
      >general.</li
      ></ul
    ></div
  ></div
>

<div id="what-isnt-ad"
><h3
  >What <em
    >isn't</em
    > AD?</h3
  ><div id="numeric-approximation"
  ><h4
    >Numeric approximation</h4
    ><p
    >One differentiation method is <em
      >numeric approximation</em
      >, using simple finite differences. This method is based on the definition of (scalar) derivative:</p
    ><div class=math-inset>
<p
    ><span class="math"
      ><em
    >d</em
    ><em
    >e</em
    ><em
    >r</em
    ><em
    >i</em
    ><em
    >v</em
    > <em
    >f</em
    > <em
    >x</em
    > ≡ lim<sub
    ><em
      >h</em
      > → 0</sub
    >(<em
    >f</em
    > (<em
    >x</em
    > + <em
    >h</em
    >) - <em
    >f</em
    > <em
    >x</em
    >) / <em
    >h</em
    ></span
      ></p
    ></div>
<p
    >The left-hand side reads &quot;the derivative of <em
      >f</em
      > at <em
      >x</em
      >&quot;.</p
    ><p
    >To approximate the derivative, use</p
    ><div class=math-inset>
<p
    ><span class="math"
      ><em
    >d</em
    ><em
    >e</em
    ><em
    >r</em
    ><em
    >i</em
    ><em
    >v</em
    > <em
    >f</em
    > <em
    >x</em
    > ≈ (<em
    >f</em
    > (<em
    >x</em
    > + <em
    >h</em
    >) - <em
    >f</em
    > <em
    >x</em
    >) / <em
    >h</em
    ></span
      ></p
    ></div>
<p
    >for a small value of <em
      >h</em
      >. While very simple, this method is often inaccurate, due to choosing either too large or too small a value for <em
      >h</em
      >. (Small values of <em
      >h</em
      > lead to rounding errors.) More sophisticated variations improve accuracy while sacrificing simplicity.</p
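    ><p
    >As a minimal sketch (not part of the original post), the approximation can be written directly, with a hypothetical step size <code
    >h</code
    >:</p
    ><pre class="sourceCode haskell"
    ><code
    >-- Hedged sketch: finite-difference approximation of deriv f x.
-- Accuracy depends on the (arbitrary) choice of step size h.
derivApprox &#8759; Double &#8594; (Double &#8594; Double) &#8594; Double &#8594; Double
derivApprox h f x = (f (x + h) - f x) / h
</code
    ></pre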
    ></div
  ><div id="symbolic-differentiation"
  ><h4
    >Symbolic differentiation</h4
    ><p
    >A second method is <em
      >symbolic differentiation</em
      >. Instead of using the definition of <em
      >deriv</em
      > directly, the symbolic method uses a collection of rules, such as those below:</p
    ><pre class="sourceCode haskell"
    ><code
      >deriv (u <span class="fu"
    >+</span
    > v)   &#8801; deriv u <span class="fu"
    >+</span
    > deriv v<br
     />deriv (u <span class="fu"
    >*</span
    > v)   &#8801; deriv v <span class="fu"
    >*</span
    > u <span class="fu"
    >+</span
    > deriv u <span class="fu"
    >*</span
    > v<br
     />deriv (<span class="fu"
    >-</span
    > u)     &#8801; <span class="fu"
    >-</span
    > deriv u<br
     />deriv (<span class="fu"
    >exp</span
    > u)   &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >exp</span
    > u<br
     />deriv (<span class="fu"
    >log</span
    > u)   &#8801; deriv u <span class="fu"
    >/</span
    > u<br
     />deriv (<span class="fu"
    >sqrt</span
    > u)  &#8801; deriv u <span class="fu"
    >/</span
    > (<span class="dv"
    >2</span
    > <span class="fu"
    >*</span
    > <span class="fu"
    >sqrt</span
    > u)<br
     />deriv (<span class="fu"
    >sin</span
    > u)   &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u<br
     />deriv (<span class="fu"
    >cos</span
    > u)   &#8801; deriv u <span class="fu"
    >*</span
    > (<span class="fu"
    >-</span
    > <span class="fu"
    >sin</span
    > u)<br
     />deriv (<span class="fu"
    >asin</span
    > u)  &#8801; deriv u<span class="fu"
    >/</span
    >(<span class="fu"
    >sqrt</span
    > (<span class="dv"
    >1</span
    > <span class="fu"
    >-</span
    > u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    >))<br
     />deriv (<span class="fu"
    >acos</span
    > u)  &#8801; <span class="fu"
    >-</span
    > deriv u<span class="fu"
    >/</span
    >(<span class="fu"
    >sqrt</span
    > (<span class="dv"
    >1</span
    > <span class="fu"
    >-</span
    > u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    >))<br
     />deriv (<span class="fu"
    >atan</span
    > u)  &#8801; deriv u <span class="fu"
    >/</span
    > (u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    > <span class="fu"
    >+</span
    > <span class="dv"
    >1</span
    >)<br
     />deriv (<span class="fu"
    >sinh</span
    > u)  &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cosh</span
    > u<br
     />deriv (<span class="fu"
    >cosh</span
    > u)  &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >sinh</span
    > u<br
     />deriv (<span class="fu"
    >asinh</span
    > u) &#8801; deriv u <span class="fu"
    >/</span
    > (<span class="fu"
    >sqrt</span
    > (u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    > <span class="fu"
    >+</span
    > <span class="dv"
    >1</span
    >))<br
     />deriv (<span class="fu"
    >acosh</span
    > u) &#8801; <span class="fu"
    >-</span
    > deriv u <span class="fu"
    >/</span
    > (<span class="fu"
    >sqrt</span
    > (u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    > <span class="fu"
    >-</span
    > <span class="dv"
    >1</span
    >))<br
     />deriv (<span class="fu"
    >atanh</span
    > u) &#8801; deriv u <span class="fu"
    >/</span
    > (<span class="dv"
    >1</span
    > <span class="fu"
    >-</span
    > u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    >)<br
     /></code
      ></pre
    ><p
    >There are two main drawbacks to the symbolic approach to differentiation.</p
    ><ul
    ><li
      >As a symbolic method, it requires access to and transformation of source code, and placing restrictions on that source code.</li
      ><li
      >Implementations tend to be quite expensive and in particular perform redundant computation. (I wonder if this latter criticism is a straw man argument. Are symbolic methods <em
    >necessarily</em
    > expensive or just when implemented naïvely? For instance, can simply memoized symbolic differentiation be nearly as cheap as AD?)</li
      ></ul
    ></div
  ></div
>

<div id="what-is-ad-and-how-does-it-work"
><h3
  >What is AD and how does it work?</h3
  ><p
  >A third method is the topic of this post, namely <em
    >automatic differentiation</em
    > (also called &quot;algorithmic differentiation&quot;), or &quot;AD&quot;. The idea of AD is to simultaneously manipulate values and derivatives. Overloading of the standard numerical operations (and literals) makes this combined manipulation as convenient and elegant as manipulating values without derivatives.</p
  ><p
  >The implementation of AD can be quite simple, as shown below:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > <span class="dt"
      >D</span
      > a <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > a a <span class="kw"
      >deriving</span
      > (<span class="kw"
      >Eq</span
      >,<span class="kw"
      >Show</span
      >)<br
       /><br
       /><span class="kw"
      >instance</span
      > <span class="kw"
      >Num</span
      > a &#8658; <span class="kw"
      >Num</span
      > (<span class="dt"
      >D</span
      > a) <span class="kw"
      >where</span
      ><br
       />  <span class="dt"
      >D</span
      > x x' <span class="fu"
      >+</span
      > <span class="dt"
      >D</span
      > y y' <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (x<span class="fu"
      >+</span
      >y) (x'<span class="fu"
      >+</span
      >y')<br
       />  <span class="dt"
      >D</span
      > x x' <span class="fu"
      >*</span
      > <span class="dt"
      >D</span
      > y y' <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (x<span class="fu"
      >*</span
      >y) (y'<span class="fu"
      >*</span
      >x <span class="fu"
      >+</span
      > x'<span class="fu"
      >*</span
      >y)<br
       />  <span class="fu"
      >fromInteger</span
      > x   <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >fromInteger</span
      > x) <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >negate</span
      > (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >negate</span
      > x) (<span class="fu"
      >negate</span
      > x')<br
       />  <span class="fu"
      >signum</span
      > (<span class="dt"
      >D</span
      > x _ ) <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >signum</span
      > x) <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >abs</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >abs</span
      > x) (x' <span class="fu"
      >*</span
      > <span class="fu"
      >signum</span
      > x)<br
       /><br
       /><span class="kw"
      >instance</span
      > <span class="kw"
      >Fractional</span
      > x &#8658; <span class="kw"
      >Fractional</span
      > (<span class="dt"
      >D</span
      > x) <span class="kw"
      >where</span
      ><br
       />  <span class="fu"
      >fromRational</span
      > x  <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >fromRational</span
      > x) <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >recip</span
      >  (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >recip</span
      > x) (- x' <span class="fu"
      >/</span
      > sqr x)<br
       /><br
       />sqr <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Num</span
      > a &#8658; a &#8594; a<br
       />sqr x <span class="fu"
      >=</span
      > x <span class="fu"
      >*</span
      > x<br
       /><br
       /><span class="kw"
      >instance</span
      > <span class="kw"
      >Floating</span
      > x &#8658; <span class="kw"
      >Floating</span
      > (<span class="dt"
      >D</span
      > x) <span class="kw"
      >where</span
      ><br
       />  &#960;              <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > &#960; <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >exp</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >exp</span
      >    x) (x' <span class="fu"
      >*</span
      > <span class="fu"
      >exp</span
      > x)<br
       />  <span class="fu"
      >log</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >log</span
      >    x) (x' <span class="fu"
      >/</span
      > x)<br
       />  <span class="fu"
      >sqrt</span
      >   (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >sqrt</span
      >   x) (x' <span class="fu"
      >/</span
      > (<span class="dv"
      >2</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sqrt</span
      > x))<br
       />  <span class="fu"
      >sin</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >sin</span
      >    x) (x' <span class="fu"
      >*</span
      > <span class="fu"
      >cos</span
      > x)<br
       />  <span class="fu"
      >cos</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >cos</span
      >    x) (x' <span class="fu"
      >*</span
      > (<span class="fu"
      >-</span
      > <span class="fu"
      >sin</span
      > x))<br
       />  <span class="fu"
      >asin</span
      >   (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >asin</span
      >   x) (x' <span class="fu"
      >/</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      > <span class="fu"
      >-</span
      > sqr x))<br
       />  <span class="fu"
      >acos</span
      >   (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >acos</span
      >   x) (x' <span class="fu"
      >/</span
      > (<span class="fu"
      >-</span
      >  <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      > <span class="fu"
      >-</span
      > sqr x)))<br
       />  <span class="co"
      >-- &#8943;</span
      ><br
       /></code
    ></pre
  ><p
  >As an example, define</p
  ><pre class="sourceCode haskell"
  ><code
    >f1 <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Floating</span
      > a &#8658; a &#8594; a<br
       />f1 z <span class="fu"
      >=</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >3</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sin</span
      > z)<br
       /></code
    ></pre
  ><p
  >and try it out in GHCi:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >*</span
      ><span class="dt"
      >Main</span
      ><span class="fu"
      >&gt;</span
      > f1 (<span class="dt"
      >D</span
      > <span class="dv"
      >2</span
      > <span class="dv"
      >1</span
      >)<br
       /><span class="dt"
      >D</span
      > <span class="dv"
      >1</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >6516332160855343</span
      > (<span class="fu"
      >-</span
      ><span class="dv"
      >0</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >3779412091869595</span
      >)<br
       /></code
    ></pre
  ><p
  >To test correctness, here is a symbolically differentiated version:</p
  ><pre class="sourceCode haskell"
  ><code
    >f2 <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Floating</span
      > a &#8658; a &#8594; <span class="dt"
      >D</span
      > a<br
       />f2 x <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (f1 x) (<span class="dv"
      >3</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >cos</span
      > x <span class="fu"
      >/</span
      > (<span class="dv"
      >2</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >3</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sin</span
      > x)))<br
       /></code
    ></pre
  ><p
  >Try it out:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >*</span
      ><span class="dt"
      >Main</span
      ><span class="fu"
      >&gt;</span
      > f2 <span class="dv"
      >2</span
      ><br
       /><span class="dt"
      >D</span
      > <span class="dv"
      >1</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >6516332160855343</span
      > (<span class="fu"
      >-</span
      ><span class="dv"
      >0</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >3779412091869595</span
      >)<br
       /></code
    ></pre
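><p
>The agreement between the two results can also be checked mechanically. The following sketch is self-contained (it restates the <code>D</code> type and just the instance methods that <code>f1</code> needs, and folds the symbolic derivative into <code>f2</code> as above):</p
><pre class="sourceCode haskell"
><code>
```haskell
-- Self-contained check that forward-mode AD (f1 on D) agrees with the
-- hand-differentiated f2. Only the instance methods needed by f1 are
-- restated here.
data D a = D a a deriving (Eq, Show)

instance Num a => Num (D a) where
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' * D y y' = D (x * y) (y' * x + x' * y)
  fromInteger n   = D (fromInteger n) 0
  negate (D x x') = D (negate x) (negate x')
  abs    (D x x') = D (abs x) (x' * signum x)
  signum (D x _ ) = D (signum x) 0

instance Fractional a => Fractional (D a) where
  fromRational r = D (fromRational r) 0
  recip (D x x') = D (recip x) (negate x' / (x * x))

instance Floating a => Floating (D a) where
  pi             = D pi 0
  exp  (D x x')  = D (exp  x) (x' * exp x)
  log  (D x x')  = D (log  x) (x' / x)
  sqrt (D x x')  = D (sqrt x) (x' / (2 * sqrt x))
  sin  (D x x')  = D (sin  x) (x' * cos x)
  cos  (D x x')  = D (cos  x) (x' * negate (sin x))
  -- remaining methods are not needed by f1 and are stubbed out
  asin = error "omitted"; acos = error "omitted"; atan = error "omitted"
  sinh = error "omitted"; cosh = error "omitted"
  asinh = error "omitted"; acosh = error "omitted"; atanh = error "omitted"

f1 :: Floating a => a -> a
f1 z = sqrt (3 * sin z)

f2 :: Floating a => a -> D a
f2 x = D (f1 x) (3 * cos x / (2 * sqrt (3 * sin x)))
```
</code
></pre
><p
>Evaluating <code>f1 (D 2 1)</code> and <code>f2 2</code> then yields the same value/derivative pair, up to floating-point rounding.</p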
  ><p
>This can also be made prettier, as in <em
    ><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="blog post"
      >Beautiful differentiation</a
      ></em
    >. Add an operator that captures the chain rule, which is behind the differentiation laws listed above.</p
  ><pre class="sourceCode haskell"
  ><code
    >infix  <span class="dv"
      >0</span
      > <span class="fu"
      >&gt;-&lt;</span
      ><br
       />(<span class="fu"
      >&gt;-&lt;</span
      >) <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Num</span
      > a &#8658; (a &#8594; a) &#8594; (a &#8594; a) &#8594; (<span class="dt"
      >D</span
      > a &#8594; <span class="dt"
      >D</span
      > a)<br
       />(f <span class="fu"
      >&gt;-&lt;</span
      > f') (<span class="dt"
      >D</span
      > a a') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (f a) (a' <span class="fu"
      >*</span
      > f' a)<br
       /></code
    ></pre
  ><p
  >Then, e.g.,</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >instance</span
      > <span class="kw"
      >Floating</span
      > a &#8658; <span class="kw"
      >Floating</span
      > (<span class="dt"
      >D</span
      > a) <span class="kw"
      >where</span
      ><br
       />  &#960;   <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > &#960; <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >exp</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >exp</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >exp</span
      ><br
       />  <span class="fu"
      >log</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >log</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      ><br
       />  <span class="fu"
      >sqrt</span
      > <span class="fu"
      >=</span
      > <span class="fu"
      >sqrt</span
      > <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      > (<span class="dv"
      >2</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sqrt</span
      >)<br
       />  <span class="fu"
      >sin</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >sin</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >cos</span
      ><br
       />  <span class="fu"
      >cos</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >cos</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >-</span
      > <span class="fu"
      >sin</span
      ><br
       />  <span class="fu"
      >asin</span
      > <span class="fu"
      >=</span
      > <span class="fu"
      >asin</span
      > <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      > (<span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      ><span class="fu"
      >-</span
      >sqr))<br
       />  <span class="fu"
      >acos</span
      > <span class="fu"
      >=</span
      > <span class="fu"
      >acos</span
      > <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      > (<span class="fu"
      >-</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      ><span class="fu"
      >-</span
      >sqr))<br
       />  <span class="co"
      >-- &#8943;</span
      ><br
       /></code
    ></pre
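><p
>To see the chain-rule operator in isolation, here is a minimal, self-contained sketch; the names <code>sinD</code> and <code>cosD</code> are hypothetical, introduced only for this illustration (in the instance above, these definitions are simply the <code>sin</code> and <code>cos</code> methods):</p
><pre class="sourceCode haskell"
><code>
```haskell
-- Minimal sketch of the chain-rule operator, independent of the class
-- instances. sinD and cosD are hypothetical names for illustration only.
data D a = D a a deriving (Eq, Show)

infix 0 >-<
(>-<) :: Num a => (a -> a) -> (a -> a) -> (D a -> D a)
(f >-< f') (D a a') = D (f a) (a' * f' a)

sinD, cosD :: Floating a => D a -> D a
sinD = sin >-< cos             -- (sin u)' = u' * cos u
cosD = cos >-< (negate . sin)  -- (cos u)' = u' * (- sin u)
```
</code
></pre
><p
>For example, <code>sinD (D 2 1)</code> pairs <code>sin 2</code> with its derivative <code>cos 2</code>.</p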
  ><p
>This AD implementation satisfies most of our criteria very well:</p
  ><ul
  ><li
    >It is simple to implement and verify. Both the implementation and its correctness follow directly from the familiar laws given above.</li
    ><li
    >It is convenient to use, as shown with <code
      >f1</code
      > above.</li
    ><li
    >It is accurate, as shown above, producing <em
      >exactly</em
> the same result as the symbolically differentiated code (<code
      >f2</code
      >).</li
    ><li
    >It is efficient, involving no iteration or redundant computation.</li
    ></ul
  ><p
  >The formulation above does less well with <em
    >generality</em
    >:</p
  ><ul
  ><li
    >It computes only first derivatives.</li
    ><li
    >It applies (correctly) only to functions over a scalar (one-dimensional) domain, excluding even complex numbers.</li
    ></ul
  ><p
  >Both of these limitations are removed in the post <em
    ><a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="blog post"
      >Higher-dimensional, higher-order derivatives, functionally</a
      ></em
    >.</p
  ></div
>

<div id="what-is-ad-really"
><h3
  >What is AD, really?</h3
  ><p
  >How do we know whether this AD implementation is correct? We can't begin to address this question until we first answer a more fundamental one: what does its correctness mean?</p
  ><div id="a-model-for-ad"
  ><h4
    >A model for AD</h4
    ><p
    >I'm pretty sure AD has something to do with calculating a function's values and derivative values simultaneously, so I'll start there.</p
    ><pre class="sourceCode haskell"
    ><code
      >withD <span class="dv"
    >&#8759;</span
    > &#8943; &#8658; (a &#8594; a) &#8594; (a &#8594; <span class="dt"
    >D</span
    > a)<br
     />withD f x <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (f x) (deriv f x)<br
     /></code
      ></pre
    ><p
    >Or, in point-free form,</p
    ><pre class="sourceCode haskell"
    ><code
      >withD f <span class="fu"
    >=</span
    > liftA2 <span class="dt"
    >D</span
    > f (deriv f)<br
     /></code
      ></pre
    ><p
>since, on functions,</p
    ><pre class="sourceCode haskell"
    ><code
      >liftA2 h f g <span class="fu"
    >=</span
    > &#955; x &#8594; h (f x) (g x)<br
     /></code
      ></pre
    ><p
    >We don't have an implementation of <code
      >deriv</code
      >, so this definition of <code
      >withD</code
      > will serve as a specification, not an implementation.</p
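><p
>Although <code>deriv</code> has no implementation for black-box functions, the specification can still be approximated and exercised numerically. The following sketch substitutes a central-difference quotient for <code>deriv</code>; <code>derivApprox</code> and the step size are assumptions for testing only, not part of the development:</p
><pre class="sourceCode haskell"
><code>
```haskell
-- withD as an approximately executable specification. derivApprox and
-- the step size h are testing assumptions; deriv itself has no exact
-- implementation for black-box functions.
data D a = D a a deriving (Eq, Show)

derivApprox :: Fractional a => a -> (a -> a) -> (a -> a)
derivApprox h f x = (f (x + h) - f (x - h)) / (2 * h)

withD :: Fractional a => (a -> a) -> (a -> D a)
withD f x = D (f x) (derivApprox 1e-6 f x)
```
</code
></pre
><p
>For instance, <code>withD (\t -> t * t) 3</code> produces approximately <code>D 9 6</code>.</p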
    ><p
    >If AD is structured as type class instances, then I'd want there to be a compelling interpretation function that is faithful to each of those classes, as in the principle of <a href="http://conal.net/blog/tag/type-class-morphism/" title="posts on type class morphisms"
      >type class morphisms</a
      >, which is to say that the interpretation of each method corresponds to the same method for the interpretation.</p
    ><p
    >For AD, the interpretation function is <code
      >withD</code
      >. It's turned around this time (mapping <em
      >to</em
      > instead of <em
      >from</em
      > our type), as is sometimes the case. The <code
      >Num</code
      >, <code
      >Fractional</code
      >, and <code
      >Floating</code
      > morphisms provide the specifications of the instances:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >+</span
    > v) &#8801; withD u <span class="fu"
    >+</span
    > withD v<br
     />withD (u <span class="fu"
    >*</span
    > v) &#8801; withD u <span class="fu"
    >*</span
    > withD v<br
     />withD (<span class="fu"
    >sin</span
    > u) &#8801; <span class="fu"
    >sin</span
    > (withD u)<br
     />&#8943;<br
     /></code
      ></pre
    ><p
    >Note here that the methods on the left are on <code
      >a &#8594; a</code
      >, and on the right are on <code
      >a &#8594; D a</code
      >.</p
    ><p
    >These (morphism) properties exactly define correctness of any implementation of AD, answering my first question:</p
    ><blockquote
    ><p
      ><em
    >What</em
    > does it mean, independently of implementation?</p
      ></blockquote
    ></div
  ></div
>

<div id="deriving-an-ad-implementation"
><h3
  >Deriving an AD implementation</h3
  ><p
  >Now that we have a simple, formal specification of AD (numeric type class morphisms), we can try to prove that the implementation above satisfies the specification. Better yet, let's do the reverse, and use the morphism properties to <em
    >discover</em
    > the implementation, and prove it correct in the process.</p
  ><div id="addition"
  ><h4
    >Addition</h4
    ><p
    >Here is the addition specification:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >+</span
    > v) &#8801; withD u <span class="fu"
    >+</span
    > withD v<br
     /></code
      ></pre
    ><p
    >Start with the left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >+</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (deriv (u <span class="fu"
    >+</span
    > v))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (deriv u <span class="fu"
    >+</span
    > deriv v)<br
     />&#8801;   <span class="co"
    >{- liftA2 on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > ((u <span class="fu"
    >+</span
    > v) x) ((deriv u <span class="fu"
    >+</span
    > deriv v) x)<br
     />&#8801;   <span class="co"
    >{- (+) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (deriv u x <span class="fu"
    >+</span
    > deriv v x)<br
     /></code
      ></pre
    ><p
    >Then start over with the right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >+</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- (+) on functions -}</span
    ><br
     />   &#955; x &#8594; withD u x <span class="fu"
    >+</span
    > withD v x<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (deriv u x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > (v x) (deriv v x)<br
     /></code
      ></pre
    ><p
    >We need a definition of <code
      >(+)</code
      > on <code
      >D</code
      > that makes these two final forms equal, i.e.,</p
    ><pre class="sourceCode haskell"
    ><code
      >   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (deriv u x <span class="fu"
    >+</span
    > deriv v x)<br
     />&#8801;<br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (deriv u x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > (v x) (deriv v x)<br
     /></code
      ></pre
    ><p
    >An easy choice is</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="dt"
    >D</span
    > a a' <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > b b' <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a <span class="fu"
    >+</span
    > b) (a' <span class="fu"
    >+</span
    > b')<br
     /></code
      ></pre
    ><p
>This definition provides the missing link, completing the proof that</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >+</span
    > v) &#8801; withD u <span class="fu"
    >+</span
    > withD v<br
     /></code
      ></pre
    ></div
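><p
>The addition morphism just derived can also be spot-checked numerically, again substituting a central-difference quotient for <code>deriv</code> (an assumption for testing only):</p
><pre class="sourceCode haskell"
><code>
```haskell
-- Numeric spot-check of the addition morphism, with a central-difference
-- stand-in for deriv (a testing assumption, not part of the development).
data D a = D a a deriving (Eq, Show)

instance Num a => Num (D a) where
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' * D y y' = D (x * y) (y' * x + x' * y)
  fromInteger n   = D (fromInteger n) 0
  negate (D x x') = D (negate x) (negate x')
  abs    (D x x') = D (abs x) (x' * signum x)
  signum (D x _ ) = D (signum x) 0

withD :: (Double -> Double) -> Double -> D Double
withD f x = D (f x) ((f (x + h) - f (x - h)) / (2 * h))  where h = 1e-5

-- Left- and right-hand sides of the morphism at a point:
lhs, rhs :: (Double -> Double) -> (Double -> Double) -> Double -> D Double
lhs u v x = withD (\t -> u t + v t) x
rhs u v x = withD u x + withD v x
```
</code
></pre
><p
>At any sample point, <code>lhs</code> and <code>rhs</code> agree up to floating-point rounding.</p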
  ><div id="multiplication"
  ><h4
    >Multiplication</h4
    ><p
    >The specification:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >*</span
    > v) &#8801; withD u <span class="fu"
    >*</span
    > withD v<br
     /></code
      ></pre
    ><p
>Reason similarly to the addition case. Begin with the left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >*</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (deriv (u <span class="fu"
    >*</span
    > v))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (deriv u <span class="fu"
    >*</span
    > v <span class="fu"
    >+</span
    > deriv v <span class="fu"
    >*</span
    > u)<br
     />&#8801;   <span class="co"
    >{- liftA2 on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > ((u <span class="fu"
    >*</span
    > v) x) ((deriv u <span class="fu"
    >*</span
    > v <span class="fu"
    >+</span
    > deriv v <span class="fu"
    >*</span
    > u) x)<br
     />&#8801;   <span class="co"
    >{- (*) and (+) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >*</span
    > v x) (deriv u x <span class="fu"
    >*</span
    > v x <span class="fu"
    >+</span
> deriv v x <span class="fu"
    >*</span
    > u x)<br
     /></code
      ></pre
    ><p
    >Then start over with the right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >*</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- (*) on functions -}</span
    ><br
     />   &#955; x &#8594; withD u x <span class="fu"
    >*</span
    > withD v x<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (deriv u x) <span class="fu"
    >*</span
    > <span class="dt"
    >D</span
    > (v x) (deriv v x)<br
     /></code
      ></pre
    ><p
    >Sufficient definition:</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="dt"
    >D</span
    > a a' <span class="fu"
    >*</span
    > <span class="dt"
    >D</span
    > b b' <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a <span class="fu"
>*</span
    > b) (a' <span class="fu"
    >*</span
    > b <span class="fu"
    >+</span
    > b' <span class="fu"
    >*</span
    > a)<br
     /></code
      ></pre
    ></div
  ><div id="sine"
  ><h4
    >Sine</h4
    ><p
    >Specification:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (<span class="fu"
    >sin</span
    > u) &#8801; <span class="fu"
    >sin</span
    > (withD u)<br
     /></code
      ></pre
    ><p
>Begin with the left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (<span class="fu"
    >sin</span
    > u)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (deriv (<span class="fu"
    >sin</span
    > u))<br
     />&#8801;   <span class="co"
    >{- deriv rule for sin -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u)<br
     />&#8801;   <span class="co"
    >{- liftA2 on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > ((<span class="fu"
    >sin</span
    > u) x) ((deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u) x)<br
     />&#8801;   <span class="co"
    >{- sin, (*) and cos on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > (u x)) (deriv u x <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > (u x))<br
     /></code
      ></pre
    ><p
    >Then start over with the right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   <span class="fu"
    >sin</span
    > (withD u)<br
     />&#8801;   <span class="co"
    >{- sin on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="fu"
    >sin</span
    > (withD u x)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="fu"
    >sin</span
    > (<span class="dt"
    >D</span
    > (u x) (deriv u x))<br
     /></code
      ></pre
    ><p
    >Sufficient definition:</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="fu"
    >sin</span
    > (<span class="dt"
    >D</span
    > a a') <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > a) (a' <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > a)<br
     /></code
      ></pre
    ><p
    >Or, using the chain rule operator,</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="fu"
    >sin</span
    > <span class="fu"
    >=</span
    > <span class="fu"
    >sin</span
    > <span class="fu"
    >&gt;-&lt;</span
    > <span class="fu"
    >cos</span
    ><br
     /></code
      ></pre
    ><p
    >The whole implementation can be derived in exactly this style, answering my second question:</p
    ><blockquote
    ><p
      ><em
    >How</em
    > does the implementation and its correctness flow gracefully from that meaning?</p
      ></blockquote
    ></div
  ></div
>

<div id="higher-order-derivatives"
><h3
  >Higher-order derivatives</h3
  ><p
>Given answers to the first two questions, let's turn to the third:</p
  ><blockquote
  ><p
    ><em
      >Where</em
      > else might we go, guided by answers to the first two questions?</p
    ></blockquote
  ><p
  >Jerzy Karczmarczuk extended the <code
    >D</code
    > representation above to an infinite &quot;lazy tower of derivatives&quot;, in the paper <em
    ><a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper by Jerzy Karczmarczuk"
      >Functional Differentiation of Computer Programs</a
      ></em
    >.</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > <span class="dt"
      >D</span
      > a <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > a (<span class="dt"
      >D</span
      > a)<br
       /></code
    ></pre
  ><p
  >The <code
    >withD</code
    > function easily adapts to this new <code
    >D</code
    > type:</p
  ><pre class="sourceCode haskell"
  ><code
    >withD <span class="dv"
      >&#8759;</span
      > &#8943; &#8658; (a &#8594; a) &#8594; (a &#8594; <span class="dt"
      >D</span
      > a)<br
       />withD f x <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (f x) (withD (deriv f) x)<br
       /></code
    ></pre
  ><p
  >or</p
  ><pre class="sourceCode haskell"
  ><code
    >withD f <span class="fu"
      >=</span
      > liftA2 <span class="dt"
      >D</span
      > f (withD (deriv f))<br
       /></code
    ></pre
  ><p
  >These definitions were not brilliant insights. I looked for the simplest, type-correct possibility (without using ⊥).</p
  ><p
  >Similarly, I'll try tweaking the previous derivations and see what pops out.</p
  ><div id="addition-1"
  ><h4
    >Addition</h4
    ><p
    >Left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >+</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (withD (deriv (u <span class="fu"
    >+</span
    > v)))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (withD (deriv u <span class="fu"
    >+</span
    > deriv v))<br
     />&#8801;   <span class="co"
>{- (fixed-point) induction for withD and (+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (withD (deriv u) <span class="fu"
    >+</span
    > withD (deriv v))<br
     />&#8801;   <span class="co"
    >{- def of liftA2 and (+) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (withD (deriv u) x <span class="fu"
    >+</span
    > withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >Right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >+</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- (+) on functions -}</span
    ><br
     />   &#955; x &#8594; withD u x <span class="fu"
    >+</span
    > withD v x<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
> (u x) (withD (deriv u) x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
> (v x) (withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >Again, we need a definition of <code
      >(+)</code
      > on <code
      >D</code
      > that makes the LHS and RHS final forms equal, i.e.,</p
    ><pre class="sourceCode haskell"
    ><code
      >   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (withD (deriv u) x <span class="fu"
    >+</span
> withD (deriv v) x)<br
     />&#8801;<br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (withD (deriv u) x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > (v x) (withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >Again, an easy choice is</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="dt"
    >D</span
    > a a' <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > b b' <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a <span class="fu"
    >+</span
    > b) (a' <span class="fu"
    >+</span
    > b')<br
     /></code
      ></pre
    ></div
  ><div id="multiplication-1"
  ><h4
    >Multiplication</h4
    ><p
    >Left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >*</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv (u <span class="fu"
    >*</span
    > v)))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv u <span class="fu"
    >*</span
    > v <span class="fu"
    >+</span
    > deriv v <span class="fu"
    >*</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/(+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv u <span class="fu"
    >*</span
    > v) <span class="fu"
    >+</span
    > withD (deriv v <span class="fu"
    >*</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/(*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv u) <span class="fu"
    >*</span
    > withD v <span class="fu"
    >+</span
    > withD (deriv v) <span class="fu"
    >*</span
    > withD u)<br
     />&#8801;   <span class="co"
    >{- liftA2, (*), (+) on functions -}</span
    ><br
>   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >*</span
    > v x) (withD (deriv u) x <span class="fu"
    >*</span
    > withD v x <span class="fu"
    >+</span
    > withD (deriv v) x <span class="fu"
    >*</span
    > withD u x)<br
     /></code
      ></pre
    ><p
    >Right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >*</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > u (withD (deriv u)) <span class="fu"
    >*</span
    > liftA2 <span class="dt"
    >D</span
    > v (withD (deriv v))<br
     />&#8801;   <span class="co"
    >{- liftA2 and (*) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (withD (deriv u) x) <span class="fu"
    >*</span
    > <span class="dt"
    >D</span
    > (v x) (withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >A sufficient definition:</p
    ><pre class="sourceCode haskell"
    ><code
      >a<span class="fu"
    >@</span
    >(<span class="dt"
    >D</span
    > a0 a') <span class="fu"
    >*</span
    > b<span class="fu"
    >@</span
    >(<span class="dt"
    >D</span
    > b0 b') <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a0 <span class="fu"
>*</span
    > b0) (a' <span class="fu"
    >*</span
    > b <span class="fu"
    >+</span
    > b' <span class="fu"
    >*</span
    > a)<br
     /></code
      ></pre
    ><p
    >Because</p
    ><pre class="sourceCode haskell"
    ><code
      >withD u x &#8801; <span class="dt"
    >D</span
    > (u x) (withD (deriv u) x)<br
     /><br
     />withD v x &#8801; <span class="dt"
    >D</span
    > (v x) (withD (deriv v) x)<br
     /></code
      ></pre
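><p
>As an executable sanity check of this definition, here is a minimal, self-contained derivative-tower sketch using this multiplication rule (the names are illustrative; a simplified stand-in for the post's types):</p
><pre class="sourceCode haskell"
><code
>data D = D Double D              -- a value together with all its derivatives

con :: Double -&gt; D               -- constant: every derivative is zero
con x = D x (con 0)

var :: Double -&gt; D               -- the identity at a point: first derivative 1
var x = D x (con 1)

instance Num D where
  fromInteger               = con . fromInteger
  D a0 a' + D b0 b'         = D (a0 + b0) (a' + b')
  a@(D a0 a') * b@(D b0 b') = D (a0 * b0) (a' * b + b' * a)
  negate (D a0 a')          = D (negate a0) (negate a')
  abs    = error "abs: unused in this sketch"
  signum = error "signum: unused in this sketch"

derivs :: Int -&gt; D -&gt; [Double]   -- the first n entries of a tower
derivs 0 _         = []
derivs n (D a0 a') = a0 : derivs (n - 1) a'</code
></pre
><p
>For example, <code>derivs 4 (var 3 * var 3)</code> gives <code>[9,6,2,0]</code>: the value and first three derivatives of <em>x</em>&#178; at 3.</p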
    ></div
  ><div id="sine-1"
  ><h4
    >Sine</h4
    ><p
    >Left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (<span class="fu"
    >sin</span
    > u)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv (<span class="fu"
    >sin</span
    > u)))<br
     />&#8801;   <span class="co"
    >{- deriv rule for sin -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/(*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv u) <span class="fu"
    >*</span
    > withD (<span class="fu"
    >cos</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/cos -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv u) <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > (withD u))<br
     />&#8801;   <span class="co"
    >{- liftA2, sin, cos and (*) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > (u x)) (withD (deriv u) x <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > (withD u x))<br
     /></code
      ></pre
    ><p
    >Right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   <span class="fu"
    >sin</span
    > (withD u)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   <span class="fu"
    >sin</span
    > (liftA2 <span class="dt"
    >D</span
    > u (withD (deriv u)))<br
     />&#8801;   <span class="co"
    >{- liftA2 and sin on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="fu"
    >sin</span
    > (<span class="dt"
    >D</span
    > (u x) (withD (deriv u) x))<br
     /></code
      ></pre
    ><p
    >To make the LHS and RHS final forms equal, define</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="fu"
    >sin</span
    > a<span class="fu"
    >@</span
    >(<span class="dt"
    >D</span
    > a0 a') &#8801; <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > a0) (a' <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > a)<br
     /></code
      ></pre
    ></div
  ></div
>

<div id="higher-dimensional-derivatives"
><h3
  >Higher-dimensional derivatives</h3
  ><p
  >I'll save non-scalar (&quot;multi-variate&quot;) differentiation for another time. In addition to the considerations above, the key ideas are in <em
    ><a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="blog post"
      >Higher-dimensional, higher-order derivatives, functionally</a
      ></em
    > and <em
    ><a href="http://conal.net/blog/posts/simpler-more-efficient-functional-linear-maps/" title="blog post"
      >Simpler, more efficient, functional linear maps</a
      ></em
    >.</p
  ></div
>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fwhat-is-automatic-differentiation-and-why-does-it-work&amp;language=en_GB&amp;category=text&amp;title=What+is+automatic+differentiation%2C+and+why+does+it+work%3F&amp;description=Bertrand+Russell+remarked+that+Everything+is+vague+to+a+degree+you+do+not+realize+till+you+have+tried+to+make+it+precise.+I%26%238217%3Bm+mulling+over+automatic+differentiation+%28AD%29+again%2C+neatening...&amp;tags=derivative%2Csemantics%2Ctype+class+morphism%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Comparing formulations of higher-dimensional, higher-order derivatives</title>
		<link>http://conal.net/blog/posts/comparing-formulations-of-higher-dimensional-higher-order-derivatives</link>
		<comments>http://conal.net/blog/posts/comparing-formulations-of-higher-dimensional-higher-order-derivatives#comments</comments>
		<pubDate>Sun, 25 Jan 2009 01:40:07 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[comparison]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=77</guid>
		<description><![CDATA[I just reread Jason Foutz&#8217;s post Higher order multivariate automatic differentiation in Haskell, as I&#8217;m thinking about this topic again. I like his trick of using an IntMap to hold the partial derivatives and (recursively) the partials of those partials, etc. Some thoughts: I bet one can eliminate the constant (C) case in Jason&#8217;s representation, [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Comparing formulations of higher-dimensional, higher-order derivatives

Tags: derivative, linear map, math, comparison

URL: http://conal.net/blog/posts/comparing-formulations-of-higher-dimensional-higher-order-derivatives/

-->

<!-- references -->

<!-- -->

<p>I just reread Jason Foutz&#8217;s post <a href="http://metavar.blogspot.com/2008/02/higher-order-multivariate-automatic.html">Higher order multivariate automatic differentiation in Haskell</a>, as I&#8217;m thinking about this topic again.  I like his trick of using an <code>IntMap</code> to hold the partial derivatives and (recursively) the partials of those partials, etc.</p>

<p>Some thoughts:</p>

<ul>
<li><p>I bet one can eliminate the constant (<code>C</code>) case in Jason&#8217;s representation, and hence 3/4 of the cases to handle, without much loss in performance.  He already has a fairly efficient representation of constants, which is a <code>D</code> with an empty <code>IntMap</code>.</p></li>
<li><p>I imagine there&#8217;s also a nice generalization of the code for combining two finite maps used in his third multiply case.  The code&#8217;s meaning and correctness follows from a model for those maps as total functions with missing elements denoting a default value (zero in this case).</p></li>
<li><p>Jason&#8217;s data type reminds me of a sparse matrix representation, but cooler in how it&#8217;s infinitely nested.  Perhaps depth <em>n</em> (starting with zero) is a sparse <em>n-dimensional</em> matrix.</p></li>
<li><p>Finally, I suspect there&#8217;s a close connection between Jason&#8217;s <code>IntMap</code>-based implementation and my <code>LinearMap</code>-based implementation described in <em><a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="blog post">Higher-dimensional, higher-order derivatives, functionally</a></em> and in <em><a href="http://conal.net/blog/posts/simpler-more-efficient-functional-linear-maps/">Simpler, more efficient, functional linear maps</a></em>.  For the case of <em>R<sup>n</sup></em>, my formulation uses a trie with entries for <em>n</em> basis elements, while Jason&#8217;s uses an <code>IntMap</code> (which is also a trie) with <em>n</em> entries (counting any implicit zeros).</p></li>
</ul>
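<p>The map-combining model in the second bullet can be made concrete (a sketch of the model, not Jason&#8217;s actual code): viewing an <code>IntMap</code> as a total function whose absent keys denote zero, pointwise addition unions the maps, while pointwise multiplication intersects them, since zero annihilates products.</p>

<pre><code>import qualified Data.IntMap as IM

addM, mulM :: IM.IntMap Double -&gt; IM.IntMap Double -&gt; IM.IntMap Double
addM = IM.unionWith (+)           -- keys present in either map survive
mulM = IM.intersectionWith (*)    -- only keys present in both maps survive
</code></pre>

<p>For example, <code>addM (IM.fromList [(0,1),(1,2)]) (IM.fromList [(1,3),(2,4)])</code> is <code>IM.fromList [(0,1),(1,5),(2,4)]</code>.</p>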

<p>I suspect Jason&#8217;s formulation is more efficient (since it optimizes the constant case), while mine is more statically typed and more flexible (since it handles more than <em>R<sup>n</sup></em>).</p>

<p>For optimizing constants, I think I&#8217;d prefer having a single constructor with a <code>Maybe</code> for the derivatives, to eliminate code duplication.</p>

<p>I am still trying to understand the paper <em><a href="http://www.bcl.hamilton.ie/~qobi/tower/papers/popl2007a.pdf" title="paper by Barak Pearlmutter and Jeffrey Siskind">Lazy Multivariate Higher-Order Forward-Mode AD</a></em>, with its management of various epsilons.</p>

<p>A final remark: I prefer the term &#8220;higher-dimensional&#8221; over the traditional &#8220;multivariate&#8221;.
I hear classic syntax/semantics confusion in the latter.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/comparing-formulations-of-higher-dimensional-higher-order-derivatives/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fcomparing-formulations-of-higher-dimensional-higher-order-derivatives&amp;language=en_GB&amp;category=text&amp;title=Comparing+formulations+of+higher-dimensional%2C+higher-order+derivatives&amp;description=I+just+reread+Jason+Foutz%26%238217%3Bs+post+Higher+order+multivariate+automatic+differentiation+in+Haskell%2C+as+I%26%238217%3Bm+thinking+about+this+topic+again.+I+like+his+trick+of+using+an+IntMap+to+hold...&amp;tags=comparison%2Cderivative%2Clinear+map%2Cmath%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Functional linear maps</title>
		<link>http://conal.net/blog/posts/functional-linear-maps</link>
		<comments>http://conal.net/blog/posts/functional-linear-maps#comments</comments>
		<pubDate>Wed, 04 Jun 2008 05:49:20 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[type family]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=51</guid>
		<description><![CDATA[Two earlier posts described a simple and general notion of derivative that unifies the many concrete notions taught in traditional calculus courses. All of those variations turn out to be concrete representations of the single abstract notion of a linear map. Correspondingly, the various forms of mulitplication in chain rules all turn out to be [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Functional linear maps

Tags: linear maps, math, derivatives, type families

URL: http://conal.net/blog/posts/functional-linear-maps/

-->

<!-- references -->

<!-- teaser -->

<p>Two <a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">earlier</a> <a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="Blog post: &quot;Higher-dimensional, higher-order derivatives, functionally&quot;">posts</a> described a simple and general notion of <em>derivative</em> that unifies the many concrete notions taught in traditional calculus courses.
All of those variations turn out to be concrete representations of the single abstract notion of a <em>linear map</em>.
Correspondingly, the various forms of multiplication in chain rules all turn out to be implementations of <em>composition</em> of linear maps.
For simplicity, I suggested a direct implementation of linear maps as functions.
Unfortunately, that direct representation thwarts efficiency, since functions, unlike data structures, do not cache by default.</p>

<p>This post presents a <em>data</em> representation of linear maps that makes crucial use of (a) linearity and (b) the recently added language feature <em>indexed type families</em> (&#8220;associated types&#8221;).</p>

<p>For a while now, I&#8217;ve wondered if a library for linear maps could replace and generalize matrix libraries.
After all, matrices represent a restricted class of linear maps.
Unlike conventional matrix libraries, however, the linear map library described in this post captures matrix/linear-map dimensions via <em>static typing</em>.
The composition function defined below statically enforces the conformability property required of matrix multiplication (which implements linear map composition).
Likewise, conformance for addition of linear maps is also enforced simply and statically.
Moreover, with sufficiently sophisticated coaxing of the Haskell compiler, of the sort <a href="http://reddit.com/goto?id=6lx36" title="Blog post: &quot;Haskell as fast as C: working at a high altitude for low level performance&quot;">Don Stewart does</a>, perhaps a library like this one could also have terrific performance.  (It doesn&#8217;t yet.)</p>

<p>You can read and try out the code for this post in the module <a href="http://code.haskell.org/vector-space/doc/html/src/Data-LinearMap.html" title="Source module: Data.LinearMap">Data.LinearMap</a> in version 0.2.0 or later of the <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;">vector-space</a> package.
That module also contains an implementation of linear map composition, as well as <code>Functor</code>-like and <code>Applicative</code>-like operations.
<a href="http://www.unsafeperformio.com/index.php" title="Andy Gill's home page">Andy Gill</a> has been helping me get to the bottom of some severe performance problems, apparently involving huge amounts of redundant dictionary creation.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2008-06-04: Brief explanation of the associated data type declaration.</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-51"></span></p>

<h3>Linear maps</h3>

<p>Semantically, a <em>linear map</em> is a function <code>f :: a -&gt; b</code> such that, for all scalar values <code>c</code> and &#8220;vectors&#8221; <code>u, v :: a</code>, the following properties hold:</p>

<pre><code>f (c *^ u)  == c *^ f u
f (u ^+^ v) == f u ^+^ f v
</code></pre>

<p>where <code>(*^)</code> and <code>(^+^)</code> are scalar multiplication and vector addition.
(See <code>VectorSpace</code> details in a <a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="Blog post: &quot;Higher-dimensional, higher-order derivatives, functionally&quot;">previous post</a>.)</p>
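<p>As a concrete spot-check of these two properties, take an (obviously linear) sample map from pairs to scalars, using ordinary <code>(+)</code> and <code>(*)</code> in place of <code>(^+^)</code> and <code>(*^)</code> on these particular types (the function <code>f</code> is my example, not from the library):</p>

<pre><code>f :: (Double, Double) -&gt; Double
f (x, y) = 2*x + 3*y

scaleOk, addOk :: Bool
scaleOk = f (5*1, 5*2) == 5 * f (1, 2)              -- f (c *^ u)  == c *^ f u
addOk   = f (1+10, 2+20) == f (1, 2) + f (10, 20)   -- f (u ^+^ v) == f u ^+^ f v
</code></pre>

<p>Both checks evaluate to <code>True</code>.</p>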

<p>Although the semantics of a linear map will be a function, the representation will be a data structure.</p>

<pre><code>data a :-* b = ...
</code></pre>

<p>The semantic function is</p>

<pre><code>lapply :: (LMapDom a s, VectorSpace b s) =&gt;
          (a :-* b) -&gt; (a -&gt; b)  -- result will be linear
</code></pre>

<p>The first constraint says that we know how to represent linear maps whose domain is the vector space <code>a</code>, which has the associated scalar field <code>s</code>.
The second constraint says that <code>b</code> must be a vector space over that same scalar field.</p>

<p>Conversely, there is also a function to turn any linear function into a linear map:</p>

<pre><code>linear :: LMapDom a s =&gt; (a -&gt; b) -&gt; (a :-* b)  -- argument must be linear
</code></pre>

<p>These two functions and the linear map data type are packaged up as the <code>LMapDom</code> type class:</p>

<pre><code>-- | Domain of a linear map.
class VectorSpace a s =&gt; LMapDom a s | a -&gt; s where
  -- | Linear map type
  data (:-*) a :: * -&gt; *
  -- | Linear map as function
  lapply :: VectorSpace b s =&gt; (a :-* b) -&gt; (a -&gt; b)
  -- | Function (assumed linear) as linear map.
  linear :: (a -&gt; b) -&gt; (a :-* b)
</code></pre>

<p>The <code>data</code> definition means that the data type <code>(a :-* b)</code> (of linear maps from <code>a</code> to <code>b</code>) has a variety of representations, each one associated with a type <code>a</code>.</p>

<p>These two conversion functions are required to be inverses:</p>

<pre><code>{-# RULES

"linear.lapply"   forall m. linear (lapply m) = m

"lapply.linear"   forall f. lapply (linear f) = f

 #-}
</code></pre>

<h3>Scalar domains</h3>

<p>Consider a linear function <code>f</code> over a scalar domain.
Then</p>

<pre><code>f s == f (s *^ 1)
    == s *^ f 1  -- by linearity
</code></pre>

<p>Therefore, <code>f</code> is fully determined by its value at <code>1</code>, and so an adequate representation of <code>f</code> is then simply the value <code>f 1</code>.</p>

<p>This observation leads to <code>LMapDom</code> instances like the following:</p>

<pre><code>instance LMapDom Double Double where
  data Double :-* o  = DoubleL o
  lapply (DoubleL o) = (*^ o)
  linear f           = DoubleL (f 1)
</code></pre>
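<p>Stripped of the type class, the scalar case fits in a few self-contained lines (the names <code>linearD</code> and <code>lapplyD</code> are illustrative):</p>

<pre><code>newtype DoubleL = DoubleL Double   -- a linear map, represented by its value at 1

linearD :: (Double -&gt; Double) -&gt; DoubleL
linearD f = DoubleL (f 1)          -- sample the (assumed linear) function at 1

lapplyD :: DoubleL -&gt; Double -&gt; Double
lapplyD (DoubleL o) s = s * o      -- f s == s *^ f 1
</code></pre>

<p>So <code>lapplyD (linearD (\x -&gt; 3*x)) 2</code> is <code>6</code>, and the round trip <code>linearD (lapplyD m)</code> reproduces <code>m</code>.</p>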

<h3>Non-scalar domains</h3>

<p>Maps over non-scalar domains are a little trickier.
Consider a linear function <code>f</code> over a domain of pairs of scalar values.
Then</p>

<pre><code>f (a,b) == f (a *^ (1,0) ^+^ b *^ (0,1))
        == f (a *^ (1,0)) ^+^ f (b *^ (0,1))  -- linearity
        == a *^ f (1,0) ^+^ b *^ f (0,1)      -- linearity twice more
</code></pre>

<p>So <code>f</code> is determined by <code>f (1,0)</code> and <code>f (0,1)</code> and thus can be represented by those two values.</p>

<pre><code>instance LMapDom (Double,Double) Double where
  data (Double,Double) :-* o = PairD o o
  PairD ao bo `lapply` (a,b) = a *^ ao ^+^ b *^ bo
  linear f = PairD (f (1,0)) (f (0,1))
</code></pre>

<p>and similarly for triples, etc.</p>
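<p>A standalone sketch of the pair case, with illustrative names, makes the sampling order explicit:</p>

<pre><code>data PairD = PairD Double Double     -- holds f (1,0) and f (0,1)

linearP :: ((Double, Double) -&gt; Double) -&gt; PairD
linearP f = PairD (f (1,0)) (f (0,1))

lapplyP :: PairD -&gt; (Double, Double) -&gt; Double
lapplyP (PairD ao bo) (a, b) = a*ao + b*bo
</code></pre>

<p>With <code>f (a,b) = 2*a + 3*b</code>, the representation is <code>PairD 2 3</code>, and <code>lapplyP (linearP f) (10,1)</code> gives <code>23</code>, matching <code>f (10,1)</code>.</p>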

<p>This definition works fine, but I want something compositional.
I&#8217;d like linear maps over pairs of pairs and so on.</p>

<h3>Composing domains</h3>

<p>We can still use part of our linearity property.  Using <code>zeroV</code> as the zero vector for arbitrary vector spaces,</p>

<pre><code>f (a,b) == f ((a,zeroV) ^+^ (zeroV,b))
        == f (a,zeroV) ^+^ f (zeroV,b)
</code></pre>

<p>We see that <code>f</code> is determined by its behavior when either argument is zero.</p>

<p>In other words, <code>f</code> can be reconstructed from two other functions over simpler domains:</p>

<pre><code>fa a = f (a,zeroV)
fb b = f (zeroV,b)
</code></pre>

<p>If <code>f :: (a,b) -&gt; o</code>, then <code>fa :: a -&gt; o</code> and <code>fb :: b -&gt; o</code>.</p>

<p>Exercise: show that <code>fa</code> and <code>fb</code> are linear if <code>f</code> is.
We can thus reduce the problem of representing the linear function <code>f</code> to the problems of representing <code>fa</code> and <code>fb</code>.
This insight is captured in the following <code>LMapDom</code> instance:</p>

<pre><code>instance (LMapDom a s, LMapDom b s) =&gt; LMapDom (a,b) s where
  data (a,b) :-* o = PairL (a :-* o) (b :-* o)
  PairL ao bo `lapply` (a,b) = ao `lapply` a ^+^ bo `lapply` b
  linear f = PairL (linear (\a -&gt; f (a,zeroV)))
                   (linear (\b -&gt; f (zeroV,b)))
</code></pre>

<p>Of course, there are similar instances for triples, etc., as well as for tuple variants with strict fields, such as OpenGL&#8217;s <code>Vector2</code> and <code>Vector3</code> types.</p>

<h3>What have we done?</h3>

<p>If you&#8217;ve studied linear algebra, you may be thinking now about the idea of a <em>basis</em> of a vector space.
A basis is a minimal set of vectors that can be combined linearly to cover the entire vector space.
Any linear map is determined by its behavior on any basis.
For scalars, the set <code>{1}</code> is a basis, while for pairs of scalars, <code>{(1,0), (0,1)}</code> is a basis.
It is not just coincidental that exactly these basis vectors showed up in the definitions of <code>linear</code> for <code>Double</code> and <code>(Double,Double)</code>.</p>

<p>In the general pairing instance of <code>LMapDom</code> above, bases are built up recursively.
Each recursive call to <code>linear</code> results in a data structure that holds the values of <code>fa</code> over a basis for <code>a</code> and the values of <code>fb</code> over a basis for <code>b</code>.
Each of those basis vectors corresponds to a basis vector for <code>(a,b)</code>, by <code>zeroV</code>-padding.</p>

<p>The <em>dimension</em> of a vector space is the number of elements in a basis (which is independent of the particular choice of basis).
For vector space types <code>a</code> and <code>b</code>,</p>

<pre><code>dimension (a,b) == dimension a + dimension b
</code></pre>

<p>which corresponds to the fact that our linear map representation (as built by <code>linear</code>) contains samples for each basis element of <code>a</code>, <em>plus</em> samples for each basis element of <code>b</code> (all <code>zeroV</code>-padded).</p>

<h3>Working with linear maps</h3>

<p>Besides the instances above for creating and applying linear maps, what else can we do?
For starters, let&#8217;s define the identity linear map.
Since the identity function is already linear, simply convert it to a linear map:</p>

<pre><code>idL :: LMapDom a s =&gt; a :-* a
idL = linear id
</code></pre>

<p>Another very useful tool is transforming a linear map by transforming a linear function.</p>

<pre><code>inL :: (LMapDom c s, VectorSpace b s', LMapDom a s') =&gt;
        ((a -&gt; b) -&gt; (c -&gt; d)) -&gt; ((a :-* b) -&gt; (c :-* d))
inL h = linear . h . lapply
</code></pre>

<p>where the higher-order function <code>h</code> is assumed to map linear functions to linear functions.</p>

<p>We don&#8217;t have to stop at <em>unary</em> transformations of linear functions.</p>

<pre><code>-- | Transform linear maps by transforming linear functions.
inL2 :: ( LMapDom c s, VectorSpace b s', LMapDom a s'
        , LMapDom e s, VectorSpace d s ) =&gt;
        ((a  -&gt; b) -&gt; (c  -&gt; d) -&gt; (e  -&gt; f))
     -&gt; ((a :-* b) -&gt; (c :-* d) -&gt; (e :-* f))
inL2 h = inL . h . lapply
</code></pre>

<p>The type constraints are starting to get hairy.
Fortunately, they&#8217;re entirely inferred by the compiler.</p>

<p>Let&#8217;s do some inlining and simplification to see what goes on inside <code>inL2</code>:</p>

<pre><code>inL2 h m n
  == (inL . h . lapply) m n                -- inline inL2
  == inL (h (lapply m)) n                  -- inline (.)
  == (linear . (h (lapply m)) . lapply) n  -- inline inL
  == linear (h (lapply m) (lapply n))      -- inline (.)
</code></pre>

<p>Ternary transformations are defined similarly.
I&#8217;ll spare you the type constraints this time.</p>

<pre><code>inL3 :: ( ... ) =&gt;
        ((a  -&gt; b) -&gt; (c  -&gt; d) -&gt; (e  -&gt; f) -&gt; (p  -&gt; q))
     -&gt; ((a :-* b) -&gt; (c :-* d) -&gt; (e :-* f) -&gt; (p :-* q))
inL3 h = inL2 . h . lapply
</code></pre>

<p>Look what happens when these operations are composed.
As an example,</p>

<pre><code>inL h . inL g
  == (linear . h . lapply) . (linear . g . lapply)
  == linear . h . lapply . linear . g . lapply    -- associativity of (.)
  == linear . h . g . lapply                      -- rule "lapply.linear"
  == inL (h . g)
</code></pre>

<p>This transformation is not actually happening in the compiler yet.
The &#8220;lapply.linear&#8221; rule is not firing, and I don&#8217;t know why.
I&#8217;d appreciate suggestions.</p>

<p>There are a few more operations defined in <a href="http://code.haskell.org/vector-space/doc/html/src/Data-LinearMap.html" title="Source module: Data.LinearMap">Data.LinearMap</a>.
I&#8217;ll end with this simple, general definition of composition of linear maps:</p>

<pre><code>-- | Compose linear maps
(.*) :: (VectorSpace c s, LMapDom b s, LMapDom a s) =&gt;
        (b :-* c) -&gt; (a :-* b) -&gt; (a :-* c)
(.*) = inL2 (.)
</code></pre>

<h3>Derivative towers again</h3>

<p>A similar, but recursive, definition is used in the new definition of the general chain rule for infinite derivative towers, updated since the post <em><a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="Blog post: &quot;Higher-dimensional, higher-order derivatives, functionally&quot;">Higher-dimensional, higher-order derivatives, functionally</a></em>.</p>

<pre><code>(@.) :: (LMapDom b s, LMapDom a s, VectorSpace c s) =&gt;
        (b :~&gt; c) -&gt; (a :~&gt; b) -&gt; (a :~&gt; c)
(h @. g) a0 = D c0 (inL2 (@.) c' b')
  where
    D b0 b' = g a0
    D c0 c' = h b0
</code></pre>
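<p>To see the shape of this chain rule without the full machinery, here is a first-order, self-contained sketch in which linear maps are plain functions and the tower is replaced by a single derivative (the names <code>D</code>, <code>Deriv</code>, and <code>compD</code> are illustrative, not the library&#8217;s):</p>

<pre><code>data D a b = D b (a -&gt; b)           -- value and derivative-as-linear-map

type Deriv a b = a -&gt; D a b

compD :: Deriv b c -&gt; Deriv a b -&gt; Deriv a c
(h `compD` g) a0 = D c0 (c' . b')   -- chain rule: compose the linear maps
  where
    D b0 b' = g a0
    D c0 c' = h b0
</code></pre>

<p>For instance, with <code>sq x = D (x*x) (\dx -&gt; 2*x*dx)</code> and <code>inc x = D (x+1) id</code>, evaluating <code>(sq `compD` inc) 3</code> yields the value <code>16</code> and a derivative map that scales by <code>8</code>.</p>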
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/functional-linear-maps/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Ffunctional-linear-maps&amp;language=en_GB&amp;category=text&amp;title=Functional+linear+maps&amp;description=Two+earlier+posts+described+a+simple+and+general+notion+of+derivative+that+unifies+the+many+concrete+notions+taught+in+traditional+calculus+courses.+All+of+those+variations+turn+out+to+be...&amp;tags=derivative%2Clinear+map%2Cmath%2Ctype+family%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Higher-dimensional, higher-order derivatives, functionally</title>
		<link>http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally</link>
		<comments>http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally#comments</comments>
		<pubDate>Wed, 21 May 2008 05:29:32 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[beautiful code]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=49</guid>
		<description><![CDATA[The post Beautiful differentiation showed some lovely code that makes it easy to compute not just the values of user-written functions, but also all of its derivatives (infinitely many). This elegant technique is limited, however, to functions over a scalar (one-dimensional) domain. Next, we explored what it means to transcend that limitation, asking and answering [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Higher-dimensional, higher-order derivatives, functionally

Tags: derivatives, linear maps, math, beautiful code

Alternative titles:

Derivative towers across the 8th dimension
Higher-dimensional derivative towers
Higher-dimensional, higher-order derivatives, functionally

URL: http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/

-->

<!-- references -->

<!-- teaser -->

<p>The post <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em> showed some lovely code that makes it easy to compute not just the values of user-written functions, but also <em>all</em> of their derivatives (infinitely many).
This elegant technique is limited, however, to functions over a <em>scalar</em> (one-dimensional) domain.
Next, we explored what it means to transcend that limitation, asking and answering the question <em><a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">What is a derivative, really?</a></em>
The answer to that question is that derivative values are <em>linear maps</em> saying how small input changes result in output changes.
This answer allows us to unify several different notions of derivatives and their corresponding chain rules into a single simple and powerful form.</p>

<p>This third post combines the ideas from the two previous posts, to easily compute infinitely many derivatives of functions over arbitrary-dimensional domains.</p>

<p>The code shown here is part of a <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;">new Haskell library</a>, which you can download and play with or peruse on the web.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-49"></span></p>

<h3>The general setting: vector spaces</h3>

<p>Linear maps (transformations) lie at the heart of the generalized idea of derivative <a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">described earlier</a>.
Talking about linearity requires a few simple operations, which are encapsulated in the abstract interface known from math as a <em>vector space</em>.</p>

<p>A vector space <code>v</code> has an associated type <code>s</code> of scalar values (a field) and a set of operations.
In Haskell,</p>

<pre><code>class VectorSpace v s | v -&gt; s where
  zeroV   :: v              -- the zero vector
  (*^)    :: s -&gt; v -&gt; v    -- scale a vector
  (^+^)   :: v -&gt; v -&gt; v    -- add vectors
  negateV :: v -&gt; v         -- additive inverse
</code></pre>

<p>In many cases, we&#8217;ll want to add inner (dot) products as well, to form an <em>inner product space</em>:</p>

<pre><code>class VectorSpace v s =&gt; InnerSpace v s | v -&gt; s where
  (&lt;.&gt;) :: v -&gt; v -&gt; s
</code></pre>

<p>Several other useful operations can be defined in terms of these five methods.
For instance, vector subtraction and linear interpolation for vector spaces, and magnitude and normalization (rescaling to unit length) for inner product spaces.
The <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;">vector-space</a> library defines instances for <code>Float</code>, <code>Double</code>, and <code>Complex</code>, as well as pairs, triples, and quadruples of vectors, and functions with vector ranges.
(By &#8220;vector&#8221; here, I mean any instance of <code>VectorSpace</code>, recursively).</p>
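<p>As a concrete illustration (a standalone sketch specialized to pairs of <code>Double</code>s, rather than the library&#8217;s general classes), the derived operations mentioned above might look like this:</p>

```haskell
-- Standalone sketch: the VectorSpace/InnerSpace operations and the
-- derived operations, specialized to pairs of Doubles. The library
-- versions are class methods; only the shapes are shown here.
type V2 = (Double, Double)

zeroV :: V2
zeroV = (0, 0)

(*^) :: Double -> V2 -> V2                 -- scale a vector
s *^ (x, y) = (s * x, s * y)

(^+^) :: V2 -> V2 -> V2                    -- add vectors
(x1, y1) ^+^ (x2, y2) = (x1 + x2, y1 + y2)

negateV :: V2 -> V2                        -- additive inverse
negateV (x, y) = (-x, -y)

(<.>) :: V2 -> V2 -> Double                -- inner (dot) product
(x1, y1) <.> (x2, y2) = x1 * x2 + y1 * y2

-- Derived operations:
(^-^) :: V2 -> V2 -> V2                    -- vector subtraction
u ^-^ v = u ^+^ negateV v

lerp :: V2 -> V2 -> Double -> V2           -- linear interpolation
lerp u v t = u ^+^ (t *^ (v ^-^ u))

magnitude :: V2 -> Double                  -- length of a vector
magnitude v = sqrt (v <.> v)

normalized :: V2 -> V2                     -- rescale to unit length
normalized v = (1 / magnitude v) *^ v
```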

<p>It&#8217;s pretty easy to define new instances of your own.
For instance, here is the library&#8217;s definition of functions as vector spaces, using the same techniques <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">as before</a>:</p>

<pre><code>instance VectorSpace v s =&gt; VectorSpace (a-&gt;v) s where
  zeroV   = pure   zeroV
  (*^) s  = fmap   (s *^)
  (^+^)   = liftA2 (^+^)
  negateV = fmap   negateV
</code></pre>

<p>Linear transformations could perhaps be defined as an abstract data type, with primitives and a composition operator.
I don&#8217;t know how to provide enough primitives for all possible types of interest.
I also played with linear maps as a <a href="http://www.haskell.org/haskellwiki/GHC/Type_families">type family</a>, indexed on the domain or range type, but it didn&#8217;t quite work out for me.
For now, I&#8217;ll simply represent a linear map as a function, and define a type synonym as a reminder of intention:</p>

<pre><code>type a :-* b = a -&gt; b       -- linear map
</code></pre>

<p>This definition makes some things quite convenient.
Function composition, <code>(.)</code>, implements linear map composition.
The function <code>VectorSpace</code> instance (above) gives the customary meaning for linear maps as vector spaces.
Like <code>(-&gt;)</code>, this new <code>(:-*)</code> operator is <em>right</em>-associative, so <code>a :-* b :-* c</code> means <code>a :-* (b :-* c)</code>.</p>
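<p>To see why plain <code>(.)</code> suffices for composing derivatives under this representation, here is a tiny standalone sketch (with a <code>LinMap</code> synonym standing in for <code>(:-*)</code>, since the infix form needs a language extension):</p>

```haskell
-- Standalone sketch: linear maps represented as functions compose
-- with ordinary function composition. LinMap stands in for (:-*).
type R = Double
type LinMap a b = a -> b

scaleBy2, scaleBy3 :: LinMap R R
scaleBy2 = (2 *)
scaleBy3 = (3 *)

-- Composing two linear maps is just (.); the result is again linear.
scaleBy6 :: LinMap R R
scaleBy6 = scaleBy2 . scaleBy3
```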

<h4>Derivative towers</h4>

<p>A derivative tower contains a value and <em>all</em> derivatives of a function at a point.
<a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">Previously</a>, I&#8217;d suggested the following type for derivative towers.</p>

<pre><code>data a :&gt; b = D b (a :&gt; (a :-* b))   -- old definition
</code></pre>

<p>The values in one of these towers have types <code>b</code>, <code>a :-* b</code>, <code>a :-* a :-* b</code>, &#8230;.
So, for instance, a second derivative value is a linear map from <code>a</code> to linear maps from <code>a</code> to <code>b</code>.
(Uncurrying a second derivative yields a <em>bilinear</em> map.)</p>

<p>Since making this suggestion, I&#8217;ve gotten simpler code using the following variation, which I&#8217;ll use instead:</p>

<pre><code>data a :&gt; b = D b (a :-* (a :&gt; b))
</code></pre>

<p>Now a tower value is a regular value, plus a linear map that yields a tower for the derivative.</p>

<p>We can also write this second version more simply, without the linearity reminder:</p>

<pre><code>data a :&gt; b = D b (a :~&gt; b)
</code></pre>

<p>where <code>a :~&gt; b</code> is the type of infinitely differentiable functions, represented as a function that produces a derivative tower:</p>

<pre><code>type a :~&gt; b = a -&gt; (a :&gt; b)
</code></pre>

<h4>Basics</h4>

<p>As in <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em>, constant functions have all derivatives equal to zero:</p>

<pre><code>dConst :: VectorSpace b s =&gt; b -&gt; a:&gt;b
dConst b = b `D` const dZero

dZero :: VectorSpace b s =&gt; a:&gt;b
dZero = dConst zeroV
</code></pre>

<p>Note the use of the standard Haskell function <code>const</code>, which makes constant functions (always returning the same value).
Also, the use of the zero vector required me to use a <code>VectorSpace</code> constraint in the type signature.
(I could have used <code>0</code> and <code>Num</code> instead, but <code>Num</code> requires more methods and so is less general than <code>VectorSpace</code>.)</p>

<p>The differentiable identity function plays a very important role.
Its towers are sometimes called &#8220;the derivation variable&#8221; or similar, but it&#8217;s not really a variable.
The definition is quite terse:</p>

<pre><code>dId :: VectorSpace u s =&gt; u :~&gt; u
dId u = D u (\ du -&gt; dConst du)
</code></pre>

<p>What&#8217;s going on here?
The differentiable identity function, <code>dId</code>, takes an argument <code>u</code> and yields a tower.
The regular value (the <em>0<sup>th</sup></em> derivative) is simply the argument <code>u</code>, as one would expect from an identity function.
The derivative (a linear map) turns a tiny input offset, <code>du</code>, to a resulting output offset, which is also <code>du</code> (also as expected from an identity function).
The higher derivatives are all zero, so our first derivative tower is <code>dConst du</code>.</p>
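<p>To make this concrete, here is a small standalone sketch (specialized to <code>Double</code>, so the typed tower structure is flattened) that samples the first few derivatives of <code>dId</code> by probing each linear map with the unit input change:</p>

```haskell
-- Standalone sketch, specialized to Double. The real tower type
-- (a :> b) tracks domain and range separately; here both are R.
type R = Double

data D = D R (R -> D)   -- value, and derivative as a linear map

dConst :: R -> D
dConst b = D b (const dZero)

dZero :: D
dZero = dConst 0

dId :: R -> D
dId u = D u (\du -> dConst du)

-- Sample the first n entries of a tower, probing each derivative
-- (a linear map) with the unit input change 1:
derivs :: Int -> D -> [R]
derivs 0 _        = []
derivs n (D b b') = b : derivs (n - 1) (b' 1)
```

<p>For example, <code>derivs 3 (dId 5)</code> yields the value <code>5</code>, first derivative <code>1</code>, and second derivative <code>0</code>, just as described above.</p>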

<h4>Linear functions</h4>

<p>Returning, for a few moments, to thinking of derivatives as numbers, let&#8217;s consider the function <code>f = \ x -&gt; m * x + b</code> for some values <code>m</code> and <code>b</code>.
We&#8217;d usually say that the derivative of <code>f</code> is equal to <code>m</code> everywhere, and indeed <code>f</code> can be interpreted as a line with (constant) slope <code>m</code> and y-intercept <code>b</code>.
In the language of linear algebra, the function <code>f</code> is <em>affine</em> in general, and is (more specifically) <em>linear</em> only when <code>b == 0</code>.</p>

<p>In the generalized view of derivatives as linear maps, we say instead that the derivative is <code>\ x -&gt; m * x</code>.
The derivative everywhere is almost the same as <code>f</code> itself.
If we take <code>b == 0</code> (so that <code>f</code> is linear and not just affine), then the derivative of <code>f</code> is exactly <code>f</code>, everywhere!
Consequently, its higher derivatives are all zero.</p>

<p>In the generalized view of derivatives as linear maps, this relationship always holds.
The derivative of a linear function <code>f</code> is <code>f</code> everywhere.
We can encapsulate this general property as a utility function:</p>

<pre><code>linearD :: VectorSpace v s =&gt; (u :-* v) -&gt; (u :~&gt; v)
linearD f u = D (f u) (\ du -&gt; dConst (f du))
</code></pre>

<p>The <code>dConst</code> here sets up all of the higher derivatives to be zero.
This definition can also be written more succinctly:</p>

<pre><code>linearD f u = D (f u) (dConst . f)
</code></pre>

<p>You may have noticed a similarity between this discussion of linear functions and the identity function above.
This similarity is more than coincidental, because the identity function is linear.
With this insight, we can write a more compact definition for <code>dId</code>, replacing the one above:</p>

<pre><code>dId = linearD id
</code></pre>

<p>As other examples of linear functions, here are differentiable versions of the functions <code>fst</code> and <code>snd</code>, which extract the elements of a pair.</p>

<pre><code>fstD :: VectorSpace a s =&gt; (a,b) :~&gt; a
fstD = linearD fst

sndD :: VectorSpace b s =&gt; (a,b) :~&gt; b
sndD = linearD snd
</code></pre>

<h4>Numeric operations</h4>

<p>Numeric operations can be specified much as they were <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">previously</a>.
First, those definitions again (with variable names changed),</p>

<pre><code>instance Num b =&gt; Num (Dif b) where
  fromInteger               = dConst . fromInteger
  D u0 u' + D v0 v'         = D (u0 + v0) (u' + v')
  D u0 u' - D v0 v'         = D (u0 - v0) (u' - v')
  u@(D u0 u') * v@(D v0 v') = D (u0 * v0) (u' * v + u * v')
</code></pre>

<p>Now the new definition:</p>

<pre><code>instance (Num b, VectorSpace b b) =&gt; Num (a:&gt;b) where
  fromInteger               = dConst . fromInteger
  D u0 u' + D v0 v'         = D (u0 + v0) (u' + v')
  D u0 u' - D v0 v'         = D (u0 - v0) (u' - v')
  u@(D u0 u') * v@(D v0 v') =
    D (u0 * v0) (\ da -&gt; (u * v' da) + (u' da * v))
</code></pre>

<p>The main change shows up in multiplication.
It is no longer meaningful to write something like <code>u' * v</code>, because <code>u' :: a :-* (a :&gt; b)</code>, while <code>v :: a :&gt; b</code>.
Instead, <code>v'</code> gets <em>applied to</em> the small change in input before multiplying by <code>u</code>.
Likewise, <code>u'</code> gets <em>applied to</em> the small change in input before multiplying by <code>v</code>.</p>

<p>The same sort of change has happened silently in the sum and difference cases, but it is hidden by the numeric overloadings provided for functions.
Written more explicitly:</p>

<pre><code>  D u0 u' + D v0 v' = D (u0 + v0) (\ da -&gt; u' da + v' da)
</code></pre>

<p>By the way, a bit of magic can also hide the &#8220;<code>\ da -&gt; ...</code>&#8221; in the definition of multiplication:</p>

<pre><code>  u@(D u0 u') * v@(D v0 v') = D (u0 * v0) ((u *) . v' + (* v) . u')
</code></pre>

<p>The derivative part can be deciphered as follows: transform (the input change) by <code>v'</code> and then pre-multiply by <code>u</code>; transform (the input change) by <code>u'</code> and then post-multiply by <code>v</code>; and add the results.
If this sort of wizardry isn&#8217;t your game, forget about it and use the more explicit form.</p>

<h4>Composition &#8212; the chain rule</h4>

<p>Here&#8217;s the chain rule we used <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">earlier</a>.</p>

<pre><code>(&gt;-&lt;) :: (Num a) =&gt; (a -&gt; a) -&gt; (Dif a -&gt; Dif a) -&gt; (Dif a -&gt; Dif a)
f &gt;-&lt; f' = \ u@(D u0 u') -&gt; D (f u0) (f' u * u')
</code></pre>

<p>The new one differs just slightly:</p>

<pre><code>(&gt;-&lt;) :: VectorSpace u s =&gt;
         (u -&gt; u) -&gt; ((a :&gt; u) -&gt; (a :&gt; s)) -&gt; (a :&gt; u) -&gt; (a :&gt; u)
f &gt;-&lt; f' = \ u@(D u0 u') -&gt; D (f u0) (\ da -&gt; f' u *^ u' da)
</code></pre>

<p>Or we can hide the <code>da</code>, as with multiplication:</p>

<pre><code>f &gt;-&lt; f' = \ u@(D u0 u') -&gt; D (f u0) ((f' u *^) . u')
</code></pre>

<p>With this change, all of the method definitions in <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em> work as before, with only the small adjustments described above.
For instance,</p>

<pre><code>instance (Fractional b, VectorSpace b b) =&gt; Fractional (a:&gt;b) where
  fromRational = dConst . fromRational
  recip        = recip &gt;-&lt; negate (recip sqr)
</code></pre>

<p>See <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;">the library</a> for details.</p>

<h4>The chain rule pure and simple</h4>

<p>The <code>(&gt;-&lt;)</code> operator above is a specialized form of the chain rule that is convenient for automatic differentiation.
In its simplest and most general form, the chain rule says</p>

<pre><code>deriv (f . g) x = deriv f (g x) . deriv g x
</code></pre>

<p>The composition on the right hand side is on linear maps (derivatives).
You may be used to seeing the chain rule in one or more of its specialized forms, using some form of product (scalar/scalar, scalar/vector, vector/vector dot, matrix/vector) instead of composition.
Those forms all mean the same as this general case, but are defined on various <em>representations</em> of linear maps, instead of linear maps themselves.</p>
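<p>A quick numeric check of the general form (a standalone sketch, with hand-written derivatives and linear maps represented as plain functions): take <code>g t = (t, t*t)</code> and <code>f (x,y) = x*y</code>, so that <code>(f . g) t = t^3</code>, whose derivative at <code>t</code> scales a step by <code>3*t^2</code>.</p>

```haskell
-- Standalone sketch of deriv (f . g) x = deriv f (g x) . deriv g x,
-- with derivatives written by hand as linear maps (plain functions).
type R = Double

g :: R -> (R, R)
g t = (t, t * t)

f :: (R, R) -> R
f (x, y) = x * y

derivG :: R -> (R -> (R, R))       -- Jacobian of g, applied to dt
derivG t dt = (dt, 2 * t * dt)

derivF :: (R, R) -> ((R, R) -> R)  -- gradient of f, dotted with (dx, dy)
derivF (x, y) (dx, dy) = y * dx + x * dy

-- The chain rule: compose the two linear maps with (.).
derivFG :: R -> (R -> R)
derivFG t = derivF (g t) . derivG t
```

<p>Probing the composed linear map with a unit step recovers the familiar scalar answer: <code>derivFG 2 1</code> is <code>12</code>, matching <code>3*t^2</code> at <code>t = 2</code>.</p>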

<p>The chain rule above constructs only the first derivatives.
Instead, we&#8217;ll construct all of the derivatives by using all of the derivatives of <code>f</code> and <code>g</code>.</p>

<pre><code>(@.) :: (b :~&gt; c) -&gt; (a :~&gt; b) -&gt; (a :~&gt; c)
(f @. g) a0 = D c0 (c' @. b')
  where
    D b0 b' = g a0
    D c0 c' = f b0
</code></pre>

<h4>Coming attractions</h4>

<p>In this post, we&#8217;ve combined <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">derivative towers</a> with <a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">generalized derivatives (based on linear maps)</a>, for constructing infinitely many derivatives of functions over multi-dimensional (or scalar) domains.
The inner workings are subtler than the previous code, but almost as simple to express and just as easy to use.</p>

<p>If you&#8217;re interested in learning more about generalized derivatives, I recommend the book <a href="http://books.google.com/books?hl=en&amp;id=g_EXJtkz7PYC" title="Book: &quot;Calculus on Manifolds&quot;, by Michael Spivak">Calculus on Manifolds</a>.</p>

<p>Future posts will include:</p>

<ul>
<li>A look at an efficiency issue and some candidate solutions.</li>
<li>Elegant executable specifications of smooth surfaces, using derivatives for the surface normals used in shading.</li>
</ul>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=49&amp;md5=076ecf56639f40c0b1075f23a066ba1f"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2xhttp://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fhigher-dimensional-higher-order-derivatives-functionally&amp;language=en_GB&amp;category=text&amp;title=Higher-dimensional%2C+higher-order+derivatives%2C+functionally&amp;description=The+post+Beautiful+differentiation+showed+some+lovely+code+that+makes+it+easy+to+compute+not+just+the+values+of+user-written+functions%2C+but+also+all+of+its+derivatives+%28infinitely+many%29.+This...&amp;tags=beautiful+code%2Cderivative%2Clinear+map%2Cmath%2Cblog" type="text/html" />
	</item>
		<item>
		<title>What is a derivative, really?</title>
		<link>http://conal.net/blog/posts/what-is-a-derivative-really</link>
		<comments>http://conal.net/blog/posts/what-is-a-derivative-really#comments</comments>
		<pubDate>Mon, 19 May 2008 05:01:08 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[linear map]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=48</guid>
		<description><![CDATA[The post Beautiful differentiation showed how easily and beautifully one can construct an infinite tower of derivative values in Haskell programs, while computing plain old values. The trick (from Jerzy Karczmarczuk) was to overload numeric operators to operate on the following (co)recursive type: data Dif b = D b (Dif b) This representation, however, works [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: What is a derivative, really?

Tags: derivatives, linear maps, math

URL: http://conal.net/blog/posts/what-is-a-derivative-really/

-->

<!-- references -->

<!-- teaser -->

<p>The post <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em> showed how easily and beautifully one can construct an infinite tower of derivative values in Haskell programs, while computing plain old values.
The trick (from Jerzy Karczmarczuk) was to overload numeric operators to operate on the following (co)recursive type:</p>

<pre><code>data Dif b = D b (Dif b)
</code></pre>

<p>This representation, however, works only when differentiating functions from a <em>scalar</em> (one-dimensional) domain, i.e., functions of type <code>a -&gt; b</code> for a scalar type <code>a</code>.
The reason for this limitation is that only in those cases can the type of derivative values be identified with the type of regular values.</p>

<p>Consider a function <code>f :: (R,R) -&gt; R</code>, where <code>R</code> is, say, <code>Double</code>.
The <em>value</em> of <code>f</code> at a domain value <code>(x,y)</code> has type <code>R</code>, but the derivative of <code>f</code> consists of <em>two</em> partial derivatives.
Moreover, the second derivative consists of <em>four</em> partial second-order derivatives (or three, depending on how you count).
A function <code>f :: (R,R) -&gt; (R,R,R)</code> also has two partial derivatives at each point <code>(x,y)</code>, each of which is a triple.
That pair of triples is commonly written as a two-by-three matrix.</p>

<p>Each of these situations has its own derivative shape <em>and</em> its own chain rule (for the derivative of function compositions), using plain-old multiplication, scalar-times-vector, vector-dot-vector, matrix-times-vector, or matrix-times-matrix.
Second derivatives are more complex and varied.</p>

<p>How many forms of derivatives and chain rules are enough?
Are we doomed to work with a plethora of increasingly complex types of derivatives, as well as the diverse chain rules needed to accommodate all compatible pairs of derivatives?
Fortunately, not.
<em>There is a single, simple, unifying generalization</em>.
By reconsidering what we mean by a derivative value, we can see that these various forms are all representations of a single notion, <em>and</em> all the chain rules mean the same thing on the meanings of the representations.</p>

<p>This blog post is about that unifying view of derivatives.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2008-05-20: There are several comments about this post <a href="http://reddit.com/info/6jw8w/comments/">on reddit</a>.</li>
<li>2008-05-20: Renamed derivative operator from <code>D</code> to <code>deriv</code> to avoid confusion with the data constructor for derivative towers.</li>
<li>2008-05-20: Renamed linear map type from <code>(:-&gt;)</code> to <code>(:-*)</code> to make it visually closer to a standard notation.</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-48"></span></p>

<h3>What&#8217;s a derivative?</h3>

<p>To get an intuitive sense of what&#8217;s going on with derivatives in general, let&#8217;s look at some examples.
If you already know about <a href="http://books.google.com/books?hl=en&amp;id=g_EXJtkz7PYC" title="Book: &quot;Calculus on Manifolds&quot;, by Michael Spivak">calculus on manifolds</a>, you might want to <a href="#NewDif">skip ahead</a>.

<h4>One dimension</h4>

<p>Start with a simple function on real numbers:</p>

<pre><code>f1 :: R -&gt; R
f1 x = x^2 + 3*x + 1
</code></pre>

<p>Writing the derivative of a function <code>f</code> as <code>deriv f</code>, let&#8217;s now consider the question: what is <code>deriv f1</code>?
We might say that</p>

<pre><code>deriv f1 x = 2*x+3
</code></pre>

<p>so e.g., <code>deriv f1 5 = 13</code>.
In other words, <code>f1</code> is changing 13 times as fast as its argument, when its argument is passing 5.</p>

<p>Rephrased yet again, if <code>dx</code> is a very tiny number, then <code>f1(5+dx) - f1 5</code> is very nearly <code>13 * dx</code>.
If <code>f1</code> maps seconds to meters, then <code>deriv f1 5</code> is 13 meters per second.
So already, we can see that the range of <code>f</code> (meters) and the range of <code>deriv f</code> (meters/second) disagree.</p>
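<p>A difference quotient confirms the number 13 directly (a standalone sketch, not from the original post):</p>

```haskell
-- Standalone check: (f1 (5+dx) - f1 5) / dx approaches deriv f1 5 = 13
-- as the step dx shrinks.
type R = Double

f1 :: R -> R
f1 x = x^2 + 3*x + 1

slopeNear5 :: R
slopeNear5 = (f1 (5 + dx) - f1 5) / dx  where dx = 1e-6
```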

<h4>Two dimensions in and one dimension out</h4>

<p>As a second example, consider a two-dimensional domain:</p>

<pre><code>f2 :: (R,R) -&gt; R
f2 (x,y) = 2*x*y + 3*x + 5*y + 7
</code></pre>

<p>Again, let&#8217;s consider some units, to get a guess of what kind of thing <code>deriv f2 (x,y)</code> really is.
Suppose that <code>f2</code> measures altitude of terrain above a plane, as a function of the position in the plane.
(So <code>f2</code> is a &#8220;height field&#8221;.)
You can guess that <code>deriv f2 (x,y)</code> is going to have something to do with how fast the altitude is changing, i.e., the slope, at <code>(x,y)</code>.
But there isn&#8217;t a single slope.
Instead, there&#8217;s a slope for <em>every</em> possible compass direction (a hiker&#8217;s degrees of freedom).</p>

<p>Now consider the conventional math answer to what is <code>deriv f2 (x,y)</code>.
Since <code>f2</code> has a two-dimensional domain, it has two partial derivatives, and its derivative is commonly written as a pair of the two partials:</p>

<pre><code>deriv f2 (x,y) = (2*y+3, 2*x+5)
</code></pre>

<p>In our example, these two pieces of information correspond to two of the possible slopes.
The first is the slope if heading directly east, and the second if directly north (increasing <code>x</code> and increasing <code>y</code>, respectively).</p>

<p>What good does it do our hiker to be told just two of the infinitude of possible slopes at a point?
The answer is perhaps magical: for well-behaved terrains, these two pieces of information are enough to calculate <em>all</em> (infinitely many) slopes, with just a bit of math.
Every direction can be described as partly east and partly north (perhaps negatively for westish and southish directions).
Given a direction angle <code>ang</code> (where east is zero and north is 90 degrees), the east and north components are <code>cos ang</code> and <code>sin ang</code>, respectively.
When heading in the direction <code>ang</code>, the slope will be a weighted sum of the east-going and north-going slopes, where the weights are the east and north components (<code>cos ang</code> and <code>sin ang</code>).</p>

<p>Instead of angles, our hiker may prefer thinking directly about the north and east components of a tiny step from the position <code>(x,y)</code>.
If the step is small enough and lands <code>dx</code> feet to the east and <code>dy</code> feet to the north, then the change in altitude, <code>f2(x+dx,y+dy) - f2(x,y)</code> is very nearly equal to <code>(2*y+3)*dx + (2*x+5)*dy</code>.
If we use <code>(&lt;.&gt;)</code> to mean dot (inner) product, then this change in altitude is <code>deriv f2 (x,y) &lt;.&gt; (dx,dy)</code>.</p>
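<p>This claim is easy to test numerically. Here is a standalone sketch comparing the actual change in <code>f2</code> against the dot-product prediction for a small step:</p>

```haskell
-- Standalone sketch: the change in f2 over a small step (dx,dy) is
-- approximately deriv f2 (x,y) <.> (dx,dy).
type R = Double

f2 :: (R, R) -> R
f2 (x, y) = 2*x*y + 3*x + 5*y + 7

derivF2 :: (R, R) -> (R, R)        -- the pair of partial derivatives
derivF2 (x, y) = (2*y + 3, 2*x + 5)

(<.>) :: (R, R) -> (R, R) -> R     -- dot (inner) product
(a, b) <.> (c, d) = a*c + b*d

predictedChange :: (R, R) -> (R, R) -> R
predictedChange p dp = derivF2 p <.> dp

actualChange :: (R, R) -> (R, R) -> R
actualChange (x, y) (dx, dy) = f2 (x + dx, y + dy) - f2 (x, y)
```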

<p>From this second example, we can see that the derivative value is neither a range value nor a single rate of change of range values.
It&#8217;s a pair of such rates, together with the know-how to use them to determine output changes.</p>

<h4>Two dimensions in and three dimensions out</h4>

<p>Next, imagine moving around on a surface in space, say a torus, and suppose that the surface has grid marks to define a two-dimensional parameter space.
As our hiker travels around in the 2D parameter space, his position in 3D space changes accordingly, more flexibly than just an altitude.
This situation corresponds to a function from 2D to 3D:</p>

<pre><code>f3 :: (R,R) -&gt; (R,R,R)
</code></pre>

<p>At any position <code>(s,t)</code> in the parameter space, and for every choice of direction through parameter space, each of the coordinates of the position in 3D space has a rate of change.
Again, if the function is mathematically well-behaved (differentiable), then all of these rates of change can be summarized in two partial derivatives.
This time, however, each partial derivative has components in <em>X</em>, <em>Y</em>, and <em>Z</em>, so it takes six numbers to describe the 3D velocities for all possible directions in parameter space.
These numbers are usually written as a 3-by-2 matrix <code>m</code> (the <em>Jacobian</em> of <code>f3</code>).
Given a small parameter step <code>(dx,dy)</code>, the resulting change in 3D position is equal to the product of the derivative matrix and the difference vector, i.e., <code>m `timesVec` (dx,dy)</code>.</p>

<h3>A common perspective</h3>

<p>The examples above use different representations for derivatives: scalar numbers, a vector (pair of numbers), and a matrix.
Common to <em>all</em> of these representations is the ability to turn a small step in the function&#8217;s domain into a resulting step in the range.</p>

<ul>
<li>In <code>f1</code>, the (scalar) derivative <code>c</code> really means <code>(c *)</code>, meaning multiply by <code>c</code>.</li>
<li>In <code>f2</code>, the (vector) derivative <code>v</code> means <code>(v &lt;.&gt;)</code>.</li>
<li>In <code>f3</code>, the (matrix) derivative <code>m</code> means <code>(m `timesVec`)</code>.</li>
</ul>

<p>So, the common meaning of these derivative representations is a function, and not just any function, but a <em>linear</em> function&#8211;often called a &#8220;linear map&#8221; or &#8220;linear transformation&#8221;.
For a function <code>lf</code> to be <em>linear</em> in this context means that</p>

<ul>
<li><code>lf (u+v) == lf u + lf v</code>, and</li>
<li><code>lf (c*v) == c * lf v</code>, for scalar values <code>c</code>.</li>
</ul>

<p>Now what about the different chain rules, saying to combine derivative values via various kinds of products (scalar/scalar, scalar/vector, vector/vector dot, matrix/vector)?
Each of these products implements the same abstract notion, which is <em>composition</em> of linear maps.</p>

<h3 id="NewDif">What about <code>Dif</code>?</h3>

<p>Now let&#8217;s return to the derivative towers we used before:</p>

<pre><code>data Dif b = D b (Dif b)
</code></pre>

<p>As I mentioned above, this representation only works when derivative values can be represented just like range values.
That punning of derivative values with range values works when the domain type is one dimensional.
For functions over higher-dimensional domains, we&#8217;ll have to use a different representation.</p>

<p>Assume a type of linear functions from <code>a</code> to <code>b</code>:</p>

<pre><code>type a :-* b = . . .
</code></pre>

<p>(In Haskell, type constructors beginning with a colon are used infix.)
Since the derivative type depends on domain as well as range, our derivative tower will have two type parameters instead of one.
To make definitions prettier, I&#8217;ll change derivative towers to an infix operator as well.</p>

<pre><code>data a :&gt; b = D b (a :&gt; (a :-* b))
</code></pre>

<p>An infinitely differentiable function is then one that produces a derivative tower:</p>

<pre><code>type a :~&gt; b = a -&gt; (a:&gt;b)
</code></pre>

<h3>What&#8217;s next?</h3>

<p>Perhaps now you&#8217;re wondering:</p>

<ul>
<li>Are these lovely ideas workable in practice?</li>
<li>What happens to the code from <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em>?</li>
<li>What use are derivatives, anyway?</li>
</ul>

<p>These questions and more will be answered in upcoming installments.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=48&amp;md5=1ab0baee13e05d2dc55d999eedde56b9"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2xhttp://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/what-is-a-derivative-really/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fwhat-is-a-derivative-really&amp;language=en_GB&amp;category=text&amp;title=What+is+a+derivative%2C+really%3F&amp;description=The+post+Beautiful+differentiation+showed+how+easily+and+beautifully+one+can+construct+an+infinite+tower+of+derivative+values+in+Haskell+programs%2C+while+computing+plain+old+values.+The+trick+%28from+Jerzy...&amp;tags=derivative%2Clinear+map%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Beautiful differentiation</title>
		<link>http://conal.net/blog/posts/beautiful-differentiation</link>
		<comments>http://conal.net/blog/posts/beautiful-differentiation#comments</comments>
		<pubDate>Wed, 07 May 2008 22:26:08 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[beautiful code]]></category>
		<category><![CDATA[derivative]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=43</guid>
		<description><![CDATA[Lately I&#8217;ve been playing again with parametric surfaces in Haskell. Surface rendering requires normals, which can be constructed from partial derivatives, which brings up automatic differentiation (AD). Playing with some refactoring, I&#8217;ve stumbled across a terser, lovelier formulation for the derivative rules than I&#8217;ve seen before. Edits: 2008-05-08: Added source files: NumInstances.hs and Dif.hs. 2008-05-20: [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Beautiful differentiation

Tags: automatic differentiation, applicative functors, beautiful code

URL: http://conal.net/blog/posts/beautiful-differentiation/

-->

<!-- references -->

<!-- teaser -->

<p>Lately I&#8217;ve been playing again with parametric surfaces in Haskell.
Surface rendering requires normals, which can be constructed from partial derivatives, which brings up <em>automatic differentiation</em> (AD).
Playing with some refactoring, I&#8217;ve stumbled across a terser, lovelier formulation for the derivative rules than I&#8217;ve seen before.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2008-05-08: Added source files: <a href='http://conal.net/blog/wp-content/uploads/2008/05/NumInstances.hs'>NumInstances.hs</a> and <a href='http://conal.net/blog/wp-content/uploads/2008/05/Dif.hs'>Dif.hs</a>.</li>
<li>2008-05-20: Changed some variable names for clarity and consistency.  For instance, <code>x@(D x0 x')</code> instead of <code>p@(D x x')</code>.</li>
<li>2008-05-20: Removed extraneous <code>Fractional</code> constraint in the <code>Floating</code> instance of <code>Dif</code>.</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-43"></span></p>

<h3>Automatic differentiation</h3>

<p>The idea of AD is to simultaneously manipulate values and derivatives.
Overloading of the standard numerical operations (and literals) makes this combined manipulation as simple and pretty as manipulating values without derivatives.</p>

<p>In <em><a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Functional Differentiation of Computer Programs</a></em>, Jerzy Karczmarczuk extended the usual trick to a &#8220;lazy tower of derivatives&#8221;.
He exploited Haskell&#8217;s laziness to carry <em>infinitely many</em> derivatives, rather than just one.
<a href="http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-2.html" title="Blog post: &quot;Overloading Haskell numbers, part 2, Forward Automatic Differentiation&quot;">Lennart Augustsson&#8217;s AD post</a> contains a summary of Jerzy&#8217;s idea and an application.
I&#8217;ll use some of the details from Lennart&#8217;s version, for simplicity.</p>

<p>For some perspectives on the mathematical structure underlying AD, see <a href="http://sigfpe.blogspot.com/2005/07/automatic-differentiation.html">sigfpe&#8217;s AD post</a>, and <em><a href="http://vandreev.wordpress.com/2006/12/04/non-standard-analysis-and-automatic-differentiation/" title="Blog post by Vlad Andreev">Non-standard analysis, automatic differentiation, Haskell, and other stories</a></em>.</p>

<h3>Representation and overloadings</h3>

<p>The tower of derivatives can be represented as an infinite list.
Since we&#8217;ll use operator overloadings that are not meaningful for lists in general, let&#8217;s instead define a new data type:</p>

<pre><code>data Dif a = D a (Dif a)
</code></pre>

<p>Given a function <code>f :: a -&gt; Dif b</code>, <code>f a</code> has the form <code>D x (D x' (D x'' ...))</code>, where <code>x</code> is the value at <code>a</code>, and <code>x'</code>, <code>x''</code> &#8230;, are the derivatives (first, second, &#8230;) at <code>a</code>.</p>

<p>Constant functions have all derivatives equal to zero.</p>

<pre><code>dConst :: Num a =&gt; a -&gt; Dif a
dConst x0 = D x0 dZero

dZero :: Num a =&gt; Dif a
dZero = D 0 dZero
</code></pre>

<p>Numeric overloadings then are simple.
For instance,</p>

<pre><code>instance Num a =&gt; Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
</code></pre>

<p>In each of the right-hand sides of these last three definitions, the first argument to <code>D</code> is constructed using <code>Num a</code>, while the second argument is <em>recursively</em> constructed using <code>Num (Dif a)</code>.</p>
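<p>To see these instances at work, here is a small, self-contained sketch. The helpers <code>dVar</code> (the tower for the identity function at a point: value <code>a</code>, first derivative 1) and <code>toList</code> are names introduced here for illustration; they are not part of the post&#8217;s code.</p>

```haskell
-- Dif, dConst, dZero and the Num instance repeated from the post;
-- dVar and toList are illustrative helpers added for this sketch.
data Dif a = D a (Dif a)

dConst :: Num a => a -> Dif a
dConst x0 = D x0 dZero

dZero :: Num a => Dif a
dZero = D 0 dZero

instance Num a => Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
  negate (D x0 x')          = D (negate x0) (negate x')
  abs    = undefined   -- not needed for this sketch
  signum = undefined

-- Tower for the identity function at a: value a, derivative 1.
dVar :: Num a => a -> Dif a
dVar a = D a 1

-- Flatten a tower into the (infinite) list of derivatives.
toList :: Dif a -> [a]
toList (D x x') = x : toList x'

f :: Num a => a -> a
f x = x*x + 3*x

main :: IO ()
main = print (take 4 (toList (f (dVar 2))))  -- [10,7,2,0]
```

<p>The four numbers are <code>f 2</code> and the first three derivatives of <code>f</code> at 2; all further derivatives are zero.</p>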

<p><a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Jerzy&#8217;s paper</a> uses a helper function (called <code>dlift</code> in his Section 3.3) that builds a derivative tower from the list of all derivatives of a given function:</p>

<pre><code>lift :: Num a =&gt; [a -&gt; a] -&gt; Dif a -&gt; Dif a
lift (f : f') p@(D x x') = D (f x) (x' * lift f' p)
</code></pre>

<p>The functions in the given list are the successive derivatives of a single function.
Then, derivative towers can be constructed by definitions like the following:</p>

<pre><code>instance Floating a =&gt; Floating (Dif a) where
  pi               = dConst pi
  exp (D x x')     = r where r = D (exp x) (x' * r)
  log p@(D x x')   = D (log x) (x' / p)
  sqrt (D x x')    = r where r = D (sqrt x) (x' / (2 * r))
  sin              = lift (cycle [sin, cos, negate . sin, negate . cos])
  cos              = lift (cycle [cos, negate . sin, negate . cos, sin])
  asin p@(D x x')  = D (asin x) ( x' / sqrt(1 - sqr p))
  acos p@(D x x')  = D (acos x) (-x' / sqrt(1 - sqr p))
  atan p@(D x x')  = D (atan x) ( x' / (sqr p + 1))

sqr :: Num a =&gt; a -&gt; a
sqr x = x*x
</code></pre>
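<p>Here is <code>lift</code> in action for <code>sin</code>, as a self-contained sketch. The helpers <code>dVar</code> and <code>toList</code> are names introduced for illustration only, and the compact <code>fromInteger</code> below plays the role of <code>dConst</code>:</p>

```haskell
-- lift repeated from the post; dVar/toList are illustrative helpers.
data Dif a = D a (Dif a)

instance Num a => Num (Dif a) where
  fromInteger x             = D (fromInteger x) 0  -- same tower as dConst
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
  negate (D x0 x')          = D (negate x0) (negate x')
  abs    = undefined   -- not needed for this sketch
  signum = undefined

lift :: Num a => [a -> a] -> Dif a -> Dif a
lift (f : f') p@(D x x') = D (f x) (x' * lift f' p)

-- Tower for the identity function at a: value a, derivative 1.
dVar :: Num a => a -> Dif a
dVar a = D a 1

-- Flatten a tower into the (infinite) list of derivatives.
toList :: Dif a -> [a]
toList (D x x') = x : toList x'

sinD :: Floating a => Dif a -> Dif a
sinD = lift (cycle [sin, cos, negate . sin, negate . cos])

main :: IO ()
main = print (take 4 (toList (sinD (dVar (0 :: Double)))))
```

<p>The four printed values are <code>sin 0</code>, <code>cos 0</code>, <code>-sin 0</code> and <code>-cos 0</code>, i.e. 0, 1, 0 and &#8722;1.</p>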

<h3>Reintroducing the chain rule</h3>

<p>The code above, which corresponds to section 3 of <a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Jerzy&#8217;s paper</a>, is fairly compact.
It can be made prettier, however, which is the point of this blog post.</p>

<p>First, let&#8217;s simplify <code>lift</code> so that it expresses the chain rule directly.
In fact, this definition is just like <code>dlift</code> from Section 2 (not Section 3) of <a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Jerzy&#8217;s paper</a>.
It&#8217;s the same code, but at a different type, here being used to manipulate infinite derivative towers instead of just a value and its first derivative.</p>

<pre><code>dlift :: Num a =&gt; (a -&gt; a) -&gt; (Dif a -&gt; Dif a) -&gt; Dif a -&gt; Dif a
dlift f f' = \u@(D u0 u') -&gt; D (f u0) (f' u * u')
</code></pre>

<p>This operator lets us write simpler definitions.</p>

<pre><code>instance Floating a =&gt; Floating (Dif a) where
  pi    = dConst pi
  exp   = dlift exp exp
  log   = dlift log recip
  sqrt  = dlift sqrt (recip . (2*) . sqrt)
  sin   = dlift sin cos
  cos   = dlift cos (negate . sin)
  asin  = dlift asin (\ x -&gt; recip (sqrt (1 - sqr x)))
  acos  = dlift acos (\ x -&gt; - recip (sqrt (1 - sqr x)))
  atan  = dlift atan (\ x -&gt; recip (sqr x + 1))
  sinh  = dlift sinh cosh
  cosh  = dlift cosh sinh
  asinh = dlift asinh (\ x -&gt; recip (sqrt (sqr x + 1)))
  acosh = dlift acosh (\ x -&gt; - recip (sqrt (sqr x - 1)))
  atanh = dlift atanh (\ x -&gt; recip (1 - sqr x))
</code></pre>

<p>The necessary recursion has moved out of the lifting function into the class instance (second argument to <code>dlift</code>).</p>

<p>Notice that <code>dlift</code> and the <code>Floating</code> instance are the <em>same code</em> (with minor variations) as in Jerzy&#8217;s section two.
In that section, however, the code computes only first derivatives, while here, we&#8217;re computing all of them.</p>
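<p>As a quick check of <code>dlift</code>: since <code>exp</code> is its own derivative, its lifted version can tie the recursive knot on itself, just as the instance method does. Again a self-contained sketch, with <code>dVar</code> and <code>toList</code> as illustrative helper names:</p>

```haskell
-- dlift repeated from the post; dVar/toList are illustrative helpers.
data Dif a = D a (Dif a)

instance Num a => Num (Dif a) where
  fromInteger x             = D (fromInteger x) 0  -- same tower as dConst
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
  negate (D x0 x')          = D (negate x0) (negate x')
  abs    = undefined   -- not needed for this sketch
  signum = undefined

dlift :: Num a => (a -> a) -> (Dif a -> Dif a) -> Dif a -> Dif a
dlift f f' = \u@(D u0 u') -> D (f u0) (f' u * u')

-- Tower for the identity function at a: value a, derivative 1.
dVar :: Num a => a -> Dif a
dVar a = D a 1

-- Flatten a tower into the (infinite) list of derivatives.
toList :: Dif a -> [a]
toList (D x x') = x : toList x'

-- exp is its own derivative, so the recursion closes on expD itself.
expD :: Floating a => Dif a -> Dif a
expD = dlift exp expD

main :: IO ()
main = print (take 4 (toList (expD (dVar (0 :: Double)))))  -- [1.0,1.0,1.0,1.0]
```

<p>Every derivative of <code>exp</code> at 0 is 1, and laziness lets us take as many as we like.</p>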

<h3>Prettier still, with function-level overloading</h3>

<p>The last steps are cosmetic.
The goal is to make the derivative functions used with <code>lift</code> easier to read and write.</p>

<p>Just as we&#8217;ve overloaded numeric operations for derivative towers (<code>Dif</code>), let&#8217;s also overload them for <em>functions</em>.
This trick is often used informally in math.
For instance, given functions <code>f</code> and <code>g</code>, one might write <code>f + g</code> to mean <code>\ x -&gt; f x + g x</code>.
Using <a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Applicative.html" title="Documentation for Control.Applicative: applicative functors">applicative functor</a> notation makes these instances a breeze to define:</p>

<pre><code>instance Num b =&gt; Num (a-&gt;b) where
  fromInteger = pure . fromInteger
  (+)         = liftA2 (+)
  (*)         = liftA2 (*)
  negate      = fmap negate
  abs         = fmap abs
  signum      = fmap signum
</code></pre>

<p>The other numeric class instances are analogous.
(<em>Any</em> <a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Applicative.html" title="Documentation for Control.Applicative: applicative functors">applicative functor</a> can be given these same instance definitions.)</p>
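<p>A quick sanity check of the function-level instance on ordinary numbers: <code>sin + cos</code> is pointwise addition, and a literal like <code>2</code> denotes a constant function. (A minimal sketch; it needs the <code>FlexibleInstances</code> extension.)</p>

```haskell
{-# LANGUAGE FlexibleInstances #-}
import Control.Applicative (liftA2)

-- The Num-for-functions instance from the post.
instance Num b => Num (a -> b) where
  fromInteger = pure . fromInteger  -- literals become constant functions
  (+)         = liftA2 (+)
  (*)         = liftA2 (*)
  negate      = fmap negate
  abs         = fmap abs
  signum      = fmap signum

main :: IO ()
main = do
  print ((sin + cos) 0 :: Double)    -- sin 0 + cos 0 = 1.0
  print ((2 * negate) 3 :: Integer)  -- const 2 * negate, at 3: -6
```

<p>Here <code>(+)</code>, <code>(*)</code>, and the literals all act pointwise, exactly the informal mathematical convention made precise.</p>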

<p>As a final touch, define an infix operator to replace the name &#8220;<code>dlift</code>&#8221;:</p>

<pre><code>infix 0 &gt;-&lt;
(&gt;-&lt;) = dlift
</code></pre>

<p>Now the complete code:</p>

<pre><code>instance Num a =&gt; Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')

  negate = negate &gt;-&lt; -1
  abs    = abs    &gt;-&lt; signum
  signum = signum &gt;-&lt; 0

instance Fractional a =&gt; Fractional (Dif a) where
  fromRational = dConst . fromRational
  recip        = recip &gt;-&lt; - sqr recip

instance Floating a =&gt; Floating (Dif a) where
  pi    = dConst pi
  exp   = exp   &gt;-&lt; exp
  log   = log   &gt;-&lt; recip
  sqrt  = sqrt  &gt;-&lt; recip (2 * sqrt)
  sin   = sin   &gt;-&lt; cos
  cos   = cos   &gt;-&lt; - sin
  sinh  = sinh  &gt;-&lt; cosh
  cosh  = cosh  &gt;-&lt; sinh
  asin  = asin  &gt;-&lt; recip (sqrt (1-sqr))
  acos  = acos  &gt;-&lt; recip (- sqrt (1-sqr))
  atan  = atan  &gt;-&lt; recip (1+sqr)
  asinh = asinh &gt;-&lt; recip (sqrt (1+sqr))
  acosh = acosh &gt;-&lt; recip (- sqrt (sqr-1))
  atanh = atanh &gt;-&lt; recip (1-sqr)
</code></pre>

<p>The operators and literals on the right of the <code>(&gt;-&lt;)</code> are overloaded for the type <code>Dif a -&gt; Dif a</code>.
For instance, in the definition of <code>sqrt</code>,</p>

<pre><code>2     :: Dif a -&gt; Dif a
recip :: (Dif a -&gt; Dif a) -&gt; (Dif a -&gt; Dif a)
(*)   :: (Dif a -&gt; Dif a) -&gt; (Dif a -&gt; Dif a)
      -&gt; (Dif a -&gt; Dif a)
</code></pre>

<h3>Try it</h3>

<p>You can try out this code yourself.
Just grab the source files: <a href='http://conal.net/blog/wp-content/uploads/2008/05/NumInstances.hs'>NumInstances.hs</a> and <a href='http://conal.net/blog/wp-content/uploads/2008/05/Dif.hs'>Dif.hs</a>.
Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/beautiful-differentiation/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fbeautiful-differentiation&amp;language=en_GB&amp;category=text&amp;title=Beautiful+differentiation&amp;description=Lately+I%26%238217%3Bve+been+playing+again+with+parametric+surfaces+in+Haskell.+Surface+rendering+requires+normals%2C+which+can+be+constructed+from+partial+derivatives%2C+which+brings+up+automatic+differentiation+%28AD%29.+Playing+with+some...&amp;tags=applicative+functor%2Cbeautiful+code%2Cderivative%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
