<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; beautiful code</title>
	<atom:link href="http://conal.net/blog/tag/beautiful-code/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>Paper: Beautiful differentiation</title>
		<link>http://conal.net/blog/posts/paper-beautiful-differentiation</link>
		<comments>http://conal.net/blog/posts/paper-beautiful-differentiation#comments</comments>
		<pubDate>Tue, 24 Feb 2009 08:05:10 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[beautiful code]]></category>
		<category><![CDATA[calculus on manifolds]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[paper]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=85</guid>
		<description><![CDATA[I have another paper draft for submission to ICFP 2009. This one is called Beautiful differentiation. The paper is a culmination of the several posts I&#8217;ve written on derivatives and automatic differentiation (AD). I&#8217;m happy with how the derivation keeps getting simpler. Now I&#8217;ve boiled extremely general higher-order AD down to a Functor and Applicative [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Paper: Beautiful differentiation

Tags: derivative, functor, applicative functor, beautiful code, calculus on manifolds, linear map, math, paper

URL: http://conal.net/blog/posts/paper-beautiful-differentiation/

-->

<!-- references -->

<!-- teaser -->

<p>I have another paper draft for submission to <a href="http://www.cs.nott.ac.uk/~gmh/icfp09.html" title="conference page">ICFP 2009</a>.
This one is called <em><a href="http://conal.net/papers/beautiful-differentiation" title="paper">Beautiful differentiation</a></em>.
The paper is a culmination of the <a href="http://conal.net/blog/tag/derivative/">several posts</a> I&#8217;ve written on derivatives and automatic differentiation (AD).
I&#8217;m happy with how the derivation keeps getting simpler.
Now I&#8217;ve boiled extremely general higher-order AD down to a <code>Functor</code> and <code>Applicative</code> morphism.</p>

<p>I&#8217;d love to get some readings and feedback.
I&#8217;m a bit over the page limit, so I&#8217;ll have to do some trimming before submitting.</p>

<p>The abstract:</p>

<blockquote>
  <p>Automatic differentiation (AD) is a precise, efficient, and convenient
  method for computing derivatives of functions. Its implementation can be
  quite simple even when extended to compute all of the higher-order
  derivatives as well. The higher-dimensional case has also been tackled,
  though with extra complexity. This paper develops an implementation of
  higher-dimensional, higher-order differentiation in the extremely
  general and elegant setting of <em>calculus on manifolds</em> and derives that
  implementation from a simple and precise specification.</p>
  
  <p>In order to motivate and discover the implementation, the paper poses
  the question &#8220;What does AD mean, independently of implementation?&#8221; An
  answer arises in the form of <em>naturality</em> of sampling a function and its
  derivative. Automatic differentiation flows out of this naturality
  condition, together with the chain rule. Graduating from first-order to
  higher-order AD corresponds to sampling all derivatives instead of just
  one. Next, the notion of a derivative is generalized via the notions of
  vector space and linear maps. The specification of AD adapts to this
  elegant and very general setting, which even <em>simplifies</em> the
  development.</p>
</blockquote>

<p>You can <a href="http://conal.net/papers/beautiful-differentiation" title="paper">get the paper and see current errata here</a>.</p>

<p>The submission deadline is March 2, so comments before then are most helpful to me.</p>

<p>Enjoy, and thanks!</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/paper-beautiful-differentiation/feed</wfw:commentRss>
		<slash:comments>22</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fpaper-beautiful-differentiation&amp;language=en_GB&amp;category=text&amp;title=Paper%3A+Beautiful+differentiation&amp;description=I+have+another+paper+draft+for+submission+to+ICFP+2009.+This+one+is+called+Beautiful+differentiation%2C+The+paper+is+a+culmination+of+the+several+posts+I%26%238217%3Bve+written+on+derivatives+and...&amp;tags=applicative+functor%2Cbeautiful+code%2Ccalculus+on+manifolds%2Cderivative%2Cfunctor%2Clinear+map%2Cmath%2Cpaper%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Higher-dimensional, higher-order derivatives, functionally</title>
		<link>http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally</link>
		<comments>http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally#comments</comments>
		<pubDate>Wed, 21 May 2008 05:29:32 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[beautiful code]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=49</guid>
		<description><![CDATA[The post Beautiful differentiation showed some lovely code that makes it easy to compute not just the values of user-written functions, but also all of its derivatives (infinitely many). This elegant technique is limited, however, to functions over a scalar (one-dimensional) domain. Next, we explored what it means to transcend that limitation, asking and answering [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Higher-dimensional, higher-order derivatives, functionally

Tags: derivatives, linear maps, math, beautiful code

Alternative titles:

Derivative towers across the 8th dimension
Higher-dimensional derivative towers
Higher-dimensional, higher-order derivatives, functionally

URL: http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/

-->

<!-- references -->

<!-- teaser -->

<p>The post <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em> showed some lovely code that makes it easy to compute not just the values of user-written functions, but also <em>all</em> of their derivatives (infinitely many).
This elegant technique is limited, however, to functions over a <em>scalar</em> (one-dimensional) domain.
Next, we explored what it means to transcend that limitation, asking and answering the question <em><a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">What is a derivative, really?</a></em>
The answer to that question is that derivative values are <em>linear maps</em> saying how small input changes result in output changes.
This answer allows us to unify several different notions of derivatives and their corresponding chain rules into a single simple and powerful form.</p>

<p>This third post combines the ideas from the two previous posts, to easily compute infinitely many derivatives of functions over arbitrary-dimensional domains.</p>

<p>The code shown here is part of a <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;">new Haskell library</a>, which you can download and play with or peruse on the web.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-49"></span></p>

<h3>The general setting: vector spaces</h3>

<p>Linear maps (transformations) lie at the heart of the generalized idea of derivative <a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">described earlier</a>.
Talking about linearity requires a few simple operations, which are encapsulated in the abstract interface known from math as a <em>vector space</em>.</p>

<p>A vector space <code>v</code> has an associated type <code>s</code> of scalar values (a field) and a set of operations.
In Haskell,</p>

<pre><code>class VectorSpace v s | v -&gt; s where
  zeroV   :: v              -- the zero vector
  (*^)    :: s -&gt; v -&gt; v    -- scale a vector
  (^+^)   :: v -&gt; v -&gt; v    -- add vectors
  negateV :: v -&gt; v         -- additive inverse
</code></pre>

<p>In many cases, we&#8217;ll want to add inner (dot) products as well, to form an <em>inner product space</em>:</p>

<pre><code>class VectorSpace v s =&gt; InnerSpace v s | v -&gt; s where
  (&lt;.&gt;) :: v -&gt; v -&gt; s
</code></pre>

<p>Several other useful operations can be defined in terms of these five methods.
For instance, vector subtraction and linear interpolation for vector spaces, and magnitude and normalization (rescaling to unit length) for inner product spaces.
The <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;">vector-space</a> library defines instances for <code>Float</code>, <code>Double</code>, and <code>Complex</code>, as well as pairs, triples, and quadruples of vectors, and functions with vector ranges.
(By &#8220;vector&#8221; here, I mean any instance of <code>VectorSpace</code>, recursively).</p>
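<p>As a concrete sketch of a few such derived operations (the names, fixities, and the scalar instance below are my guesses for illustration, not necessarily what the library defines):</p>

```haskell
{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

infixr 7 *^
infixl 6 ^+^, ^-^

-- The classes from the post:
class VectorSpace v s | v -> s where
  zeroV   :: v              -- the zero vector
  (*^)    :: s -> v -> v    -- scale a vector
  (^+^)   :: v -> v -> v    -- add vectors
  negateV :: v -> v         -- additive inverse

class VectorSpace v s => InnerSpace v s | v -> s where
  (<.>) :: v -> v -> s

-- Derived operations (hypothetical names):
(^-^) :: VectorSpace v s => v -> v -> v
u ^-^ v = u ^+^ negateV v

-- linear interpolation: u at t == 0, v at t == 1
lerp :: VectorSpace v s => v -> v -> s -> v
lerp u v t = u ^+^ t *^ (v ^-^ u)

-- squared magnitude (avoids a Floating constraint for sqrt)
magnitudeSq :: InnerSpace v s => v -> s
magnitudeSq v = v <.> v

-- A scalar instance, just for trying things out:
instance VectorSpace Double Double where
  zeroV = 0 ; (*^) = (*) ; (^+^) = (+) ; negateV = negate

instance InnerSpace Double Double where
  (<.>) = (*)
```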

<p>It&#8217;s pretty easy to define new instances of your own.
For instance, here is the library&#8217;s definition of functions as vector spaces, using the same techniques <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">as before</a>:</p>

<pre><code>instance VectorSpace v s =&gt; VectorSpace (a-&gt;v) s where
  zeroV   = pure   zeroV
  (*^) s  = fmap   (s *^)
  (^+^)   = liftA2 (^+^)
  negateV = fmap   negateV
</code></pre>

<p>Linear transformations could perhaps be defined as an abstract data type, with primitives and a composition operator.
I don&#8217;t know how to provide enough primitives for all possible types of interest.
I also played with linear maps as a <a href="http://www.haskell.org/haskellwiki/GHC/Type_families">type family</a>, indexed on the domain or range type, but it didn&#8217;t quite work out for me.
For now, I&#8217;ll simply represent a linear map as a function, defining a type synonym as a reminder of intention:</p>

<pre><code>type a :-* b = a -&gt; b       -- linear map
</code></pre>

<p>This definition makes some things quite convenient.
Function composition, <code>(.)</code>, implements linear map composition.
The function <code>VectorSpace</code> instance (above) gives the customary meaning for linear maps as vector spaces.
Like <code>(-&gt;)</code>, this new <code>(:-*)</code> operator is <em>right</em>-associative, so <code>a :-* b :-* c</code> means <code>a :-* (b :-* c)</code>.</p>

<h4>Derivative towers</h4>

<p>A derivative tower contains a value and <em>all</em> derivatives of a function at a point.
<a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">Previously</a>, I&#8217;d suggested the following type for derivative towers.</p>

<pre><code>data a :&gt; b = D b (a :&gt; (a :-* b))   -- old definition
</code></pre>

<p>The values in one of these towers have types <code>b</code>, <code>a :-* b</code>, <code>a :-* a :-* b</code>, &#8230;.
So, for instance, a second derivative value is a linear map from <code>a</code> to linear maps from <code>a</code> to <code>b</code>.
(Uncurrying a second derivative yields a <em>bilinear</em> map.)</p>

<p>Since making this suggestion, I&#8217;ve gotten simpler code using the following variation, which I&#8217;ll use instead:</p>

<pre><code>data a :&gt; b = D b (a :-* (a :&gt; b))
</code></pre>

<p>Now a tower value is a regular value, plus a linear map that yields a tower for the derivative.</p>

<p>We can also write this second version more simply, without the linearity reminder:</p>

<pre><code>data a :&gt; b = D b (a :~&gt; b)
</code></pre>

<p>where <code>a :~&gt; b</code> is the type of infinitely differentiable functions, represented as a function that produces a derivative tower:</p>

<pre><code>type a :~&gt; b = a -&gt; (a :&gt; b)
</code></pre>

<h4>Basics</h4>

<p>As in <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em>, constant functions have all derivatives equal to zero:</p>

<pre><code>dConst :: VectorSpace b s =&gt; b -&gt; a:&gt;b
dConst b = b `D` const dZero

dZero :: VectorSpace b s =&gt; a:&gt;b
dZero = dConst zeroV
</code></pre>

<p>Note the use of the standard Haskell function <code>const</code>, which makes constant functions (always returning the same value).
Also, the use of the zero vector required me to use a <code>VectorSpace</code> constraint in the type signature.
(I could have used <code>0</code> and <code>Num</code> instead, but <code>Num</code> requires more methods and so is less general than <code>VectorSpace</code>.)</p>

<p>The differentiable identity function plays a very important role.
Its towers are sometimes called &#8220;the derivation variable&#8221; or similar, but it&#8217;s not really a variable.
The definition is quite terse:</p>

<pre><code>dId :: VectorSpace u s =&gt; u :~&gt; u
dId u = D u (\ du -&gt; dConst du)
</code></pre>

<p>What&#8217;s going on here?
The differentiable identity function, <code>dId</code>, takes an argument <code>u</code> and yields a tower.
The regular value (the <em>0<sup>th</sup></em> derivative) is simply the argument <code>u</code>, as one would expect from an identity function.
The derivative (a linear map) turns a tiny input offset, <code>du</code>, to a resulting output offset, which is also <code>du</code> (also as expected from an identity function).
The higher derivatives are all zero, so our first derivative tower is <code>dConst du</code>.</p>

<h4>Linear functions</h4>

<p>Returning, for a few moments, to thinking of derivatives as numbers, let&#8217;s consider the function <code>f = \ x -&gt; m * x + b</code> for some values <code>m</code> and <code>b</code>.
We&#8217;d usually say that the derivative of <code>f</code> is equal to <code>m</code> everywhere, and indeed <code>f</code> can be interpreted as a line with (constant) slope <code>m</code> and y-intercept <code>b</code>.
In the language of linear algebra, the function <code>f</code> is <em>affine</em> in general, and is (more specifically) <em>linear</em> only when <code>b == 0</code>.</p>

<p>In the generalized view of derivatives as linear maps, we say instead that the derivative is <code>x -&gt; m * x</code>.
The derivative everywhere is almost the same as <code>f</code> itself.
If we take <code>b == 0</code> (so that <code>f</code> is linear and not just affine), then the derivative of <code>f</code> is exactly <code>f</code>, everywhere!
Consequently, its higher derivatives are all zero.</p>

<p>In the generalized view of derivatives as linear maps, this relationship always holds.
The derivative of a linear function <code>f</code> is <code>f</code> everywhere.
We can encapsulate this general property as a utility function:</p>

<pre><code>linearD :: VectorSpace v s =&gt; (u :-* v) -&gt; (u :~&gt; v)
linearD f u = D (f u) (\ du -&gt; dConst (f du))
</code></pre>

<p>The <code>dConst</code> here sets up all of the higher derivatives to be zero.
This definition can also be written more succinctly:</p>

<pre><code>linearD f u = D (f u) (dConst . f)
</code></pre>

<p>You may have noticed a similarity between this discussion of linear functions and the identity function above.
This similarity is more than coincidental, because the identity function is linear.
With this insight, we can write a more compact definition for <code>dId</code>, replacing the one above:</p>

<pre><code>dId = linearD id
</code></pre>

<p>As other examples of linear functions, here are differentiable versions of the functions <code>fst</code> and <code>snd</code>, which extract the elements of a pair.</p>

<pre><code>fstD :: VectorSpace a s =&gt; (a,b) :~&gt; a
fstD = linearD fst

sndD :: VectorSpace b s =&gt; (a,b) :~&gt; b
sndD = linearD snd
</code></pre>

<h4>Numeric operations</h4>

<p>Numeric operations can be specified much as they were <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">previously</a>.
First, those definitions again (with variable names changed),</p>

<pre><code>instance Num b =&gt; Num (Dif b) where
  fromInteger               = dConst . fromInteger
  D u0 u' + D v0 v'         = D (u0 + v0) (u' + v')
  D u0 u' - D v0 v'         = D (u0 - v0) (u' - v')
  u@(D u0 u') * v@(D v0 v') = D (u0 * v0) (u' * v + u * v')
</code></pre>

<p>Now the new definition:</p>

<pre><code>instance (Num b, VectorSpace b b) =&gt; Num (a:&gt;b) where
  fromInteger               = dConst . fromInteger
  D u0 u' + D v0 v'         = D (u0 + v0) (u' + v')
  D u0 u' - D v0 v'         = D (u0 - v0) (u' - v')
  u@(D u0 u') * v@(D v0 v') =
    D (u0 * v0) (\ da -&gt; (u * v' da) + (u' da * v))
</code></pre>

<p>The main change shows up in multiplication.
It is no longer meaningful to write something like <code>u' * v</code>, because <code>u' :: a :-* (a :&gt; b)</code>, while <code>v :: a :&gt; b</code>.
Instead, <code>v'</code> gets <em>applied to</em> the small change in input before multiplying by <code>u</code>.
Likewise, <code>u'</code> gets <em>applied to</em> the small change in input before multiplying by <code>v</code>.</p>

<p>The same sort of change has happened silently in the sum and difference cases, but is hidden by the numeric overloadings provided for functions.
Written more explicitly:</p>

<pre><code>  D u0 u' + D v0 v' = D (u0 + v0) (\ da -&gt; u' da + v' da)
</code></pre>

<p>By the way, a bit of magic can also hide the &#8220;<code>\ da -&gt; ...</code>&#8221; in the definition of multiplication:</p>

<pre><code>  u@(D u0 u') * v@(D v0 v') = D (u0 * v0) ((u *) . v' + (* v) . u')
</code></pre>

<p>The derivative part can be deciphered as follows: transform (the input change) by <code>v'</code> and then pre-multiply by <code>u</code>; transform (the input change) by <code>u'</code> and then post-multiply by <code>v</code>; and add the result.
If this sort of wizardry isn&#8217;t your game, forget about it and use the more explicit form.</p>

<h4>Composition &#8212; the chain rule</h4>

<p>Here&#8217;s the chain rule we used <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">earlier</a>.</p>

<pre><code>(&gt;-&lt;) :: (Num a) =&gt; (a -&gt; a) -&gt; (Dif a -&gt; Dif a) -&gt; (Dif a -&gt; Dif a)
f &gt;-&lt; f' = \ u@(D u0 u') -&gt; D (f u0) (f' u * u')
</code></pre>

<p>The new one differs just slightly:</p>

<pre><code>(&gt;-&lt;) :: VectorSpace u s =&gt;
         (u -&gt; u) -&gt; ((a :&gt; u) -&gt; (a :&gt; s)) -&gt; (a :&gt; u) -&gt; (a :&gt; u)
f &gt;-&lt; f' = \ u@(D u0 u') -&gt; D (f u0) (\ da -&gt; f' u *^ u' da)
</code></pre>

<p>Or we can hide the <code>da</code>, as with multiplication:</p>

<pre><code>f &gt;-&lt; f' = \ u@(D u0 u') -&gt; D (f u0) ((f' u *^) . u')
</code></pre>

<p>With this change, all of the method definitions in <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">Beautiful differentiation</a></em> work as before, with only this change to the chain rule.
For instance,</p>

<pre><code>instance (Fractional b, VectorSpace b b) =&gt; Fractional (a:&gt;b) where
  fromRational = dConst . fromRational
  recip        = recip &gt;-&lt; (- sqr recip)
</code></pre>

<p>See <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;">the library</a> for details.</p>

<h4>The chain rule pure and simple</h4>

<p>The <code>(&gt;-&lt;)</code> operator above is a specialized form of the chain rule that is convenient for automatic differentiation.
In its simplest and most general form, the chain rule says</p>

<pre><code>deriv (f . g) x = deriv f (g x) . deriv g x
</code></pre>

<p>The composition on the right hand side is on linear maps (derivatives).
You may be used to seeing the chain rule in one or more of its specialized forms, using some form of product (scalar/scalar, scalar/vector, vector/vector dot, matrix/vector) instead of composition.
Those forms all mean the same as this general case, but are defined on various <em>representations</em> of linear maps, instead of linear maps themselves.</p>
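<p>In the one-dimensional case, for instance, composing the two linear maps amounts to multiplying the two slopes, which is easy to check numerically (the finite-difference helper below is my own scaffolding, just for illustration):</p>

```haskell
-- Scalar chain rule check: deriv (f . g) x == deriv f (g x) * deriv g x,
-- since composing one-dimensional linear maps is multiplying slopes.
derivApprox :: (Double -> Double) -> Double -> Double
derivApprox f x = (f (x + h) - f (x - h)) / (2 * h)
  where h = 1e-6

lhs, rhs :: Double
lhs = derivApprox (sin . (^ 2)) 1.5   -- numeric derivative of sin (x^2) at 1.5
rhs = cos (1.5 ^ 2) * (2 * 1.5)       -- chain rule, in closed form
-- lhs and rhs agree to many decimal places
```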

<p>The chain rule above constructs only the first derivatives.
Instead, we&#8217;ll construct all of the derivatives by using all of the derivatives of <code>f</code> and <code>g</code>.</p>

<pre><code>(@.) :: (b :~&gt; c) -&gt; (a :~&gt; b) -&gt; (a :~&gt; c)
(f @. g) a0 = D c0 (c' @. b')
  where
    D b0 b' = g a0
    D c0 c' = f b0
</code></pre>
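<p>Here is a minimal self-contained rendering of this tower-level chain rule, with linear maps represented as bare functions as above. The sampling helpers <code>valD</code> and <code>derivAt</code> are mine, and <code>dConst</code> is simplified to a <code>Num</code> constraint here rather than <code>VectorSpace</code>:</p>

```haskell
{-# LANGUAGE TypeOperators #-}

type a :-* b = a -> b               -- linear map (by convention only)
data a :>  b = D b (a :-* (a :> b))
type a :~> b = a -> (a :> b)

-- Simplified to Num, instead of the post's VectorSpace constraint:
dConst :: Num b => b -> (a :> b)
dConst b = D b (const (dConst 0))

linearD :: Num b => (a :-* b) -> (a :~> b)
linearD f u = D (f u) (dConst . f)

-- The tower-level chain rule, as in the post:
(@.) :: (b :~> c) -> (a :~> b) -> (a :~> c)
(f @. g) a0 = D c0 (c' @. b')
  where
    D b0 b' = g a0
    D c0 c' = f b0

-- Sampling helpers (mine): the value, and the first derivative
-- applied to an input change da.
valD :: (a :> b) -> b
valD (D b _) = b

derivAt :: (a :> b) -> a -> b
derivAt (D _ f') da = valD (f' da)

-- Composing two linear maps: the slopes multiply (2 * 5 == 10),
-- and all higher derivatives are zero.
tower :: Double :> Double
tower = (linearD (* 2) @. linearD (* 5)) 1
-- valD tower == 10, derivAt tower 1 == 10
```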

<h4>Coming attractions</h4>

<p>In this post, we&#8217;ve combined <a href="http://conal.net/blog/posts/beautiful-differentiation/" title="Blog post: &quot;Beautiful differentiation&quot;">derivative towers</a> with <a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="Blog post: &quot;What is a derivative, really?&quot;">generalized derivatives (based on linear maps)</a>, for constructing infinitely many derivatives of functions over multi-dimensional (or scalar) domains.
The inner workings are subtler than the previous code, but almost as simple to express and just as easy to use.</p>

<p>If you&#8217;re interested in learning more about generalized derivatives, I recommend the book <a href="http://books.google.com/books?hl=en&amp;id=g_EXJtkz7PYC" title="Book: &quot;Calculus on Manifolds&quot;, by Michael Spivak">Calculus on Manifolds</a>.</p>

<p>Future posts will include:</p>

<ul>
<li>A look at an efficiency issue and some possible solutions.</li>
<li>Elegant executable specifications of smooth surfaces, using derivatives for the surface normals used in shading.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fhigher-dimensional-higher-order-derivatives-functionally&amp;language=en_GB&amp;category=text&amp;title=Higher-dimensional%2C+higher-order+derivatives%2C+functionally&amp;description=The+post+Beautiful+differentiation+showed+some+lovely+code+that+makes+it+easy+to+compute+not+just+the+values+of+user-written+functions%2C+but+also+all+of+its+derivatives+%28infinitely+many%29.+This...&amp;tags=beautiful+code%2Cderivative%2Clinear+map%2Cmath%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Beautiful differentiation</title>
		<link>http://conal.net/blog/posts/beautiful-differentiation</link>
		<comments>http://conal.net/blog/posts/beautiful-differentiation#comments</comments>
		<pubDate>Wed, 07 May 2008 22:26:08 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[beautiful code]]></category>
		<category><![CDATA[derivative]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=43</guid>
		<description><![CDATA[Lately I&#8217;ve been playing again with parametric surfaces in Haskell. Surface rendering requires normals, which can be constructed from partial derivatives, which brings up automatic differentiation (AD). Playing with some refactoring, I&#8217;ve stumbled across a terser, lovelier formulation for the derivative rules than I&#8217;ve seen before. Edits: 2008-05-08: Added source files: NumInstances.hs and Dif.hs. 2008-05-20: [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Beautiful differentiation

Tags: automatic differentiation, applicative functors, beautiful code

URL: http://conal.net/blog/posts/beautiful-differentiation/

-->

<!-- references -->

<!-- teaser -->

<p>Lately I&#8217;ve been playing again with parametric surfaces in Haskell.
Surface rendering requires normals, which can be constructed from partial derivatives, which brings up <em>automatic differentiation</em> (AD).
Playing with some refactoring, I&#8217;ve stumbled across a terser, lovelier formulation for the derivative rules than I&#8217;ve seen before.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2008-05-08: Added source files: <a href='http://conal.net/blog/wp-content/uploads/2008/05/NumInstances.hs'>NumInstances.hs</a> and <a href='http://conal.net/blog/wp-content/uploads/2008/05/Dif.hs'>Dif.hs</a>.</li>
<li>2008-05-20: Changed some variable names for clarity and consistency.  For instance, <code>x@(D x0 x')</code> instead of <code>p@(D x x')</code>.</li>
<li>2008-05-20: Removed extraneous <code>Fractional</code> constraint in the <code>Floating</code> instance of <code>Dif</code>.</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-43"></span></p>

<h3>Automatic differentiation</h3>

<p>The idea of AD is to simultaneously manipulate values and derivatives.
Overloading of the standard numerical operations (and literals) makes this combined manipulation as simple and pretty as manipulating values without derivatives.</p>

<p>In <em><a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Functional Differentiation of Computer Programs</a></em>, Jerzy Karczmarczuk extended the usual trick to a &#8220;lazy tower of derivatives&#8221;.
He exploited Haskell&#8217;s laziness to carry <em>infinitely many</em> derivatives, rather than just one.
<a href="http://augustss.blogspot.com/2007/04/overloading-haskell-numbers-part-2.html" title="Blog post: &quot;Overloading Haskell numbers, part 2, Forward Automatic Differentiation&quot;">Lennart Augustsson&#8217;s AD post</a> contains a summary of Jerzy&#8217;s idea and an application.
I&#8217;ll use some of the details from Lennart&#8217;s version, for simplicity.</p>

<p>For some perspectives on the mathematical structure underlying AD, see <a href="http://sigfpe.blogspot.com/2005/07/automatic-differentiation.html">sigfpe&#8217;s AD post</a>, and <em><a href="http://vandreev.wordpress.com/2006/12/04/non-standard-analysis-and-automatic-differentiation/" title="Blog post by Vlad Andreev">Non-standard analysis, automatic differentiation, Haskell, and other stories</a></em>.</p>

<h3>Representation and overloadings</h3>

<p>The tower of derivatives can be represented as an infinite list.
Since we&#8217;ll use operator overloadings that are not meaningful for lists in general, let&#8217;s instead define a new data type:</p>

<pre><code>data Dif a = D a (Dif a)
</code></pre>

<p>Given a function <code>f :: a -&gt; Dif b</code>, <code>f a</code> has the form <code>D x (D x' (D x'' ...))</code>, where <code>x</code> is the value at <code>a</code>, and <code>x'</code>, <code>x''</code> &#8230;, are the derivatives (first, second, &#8230;) at <code>a</code>.</p>

<p>Constant functions have all derivatives equal to zero.</p>

<pre><code>dConst :: Num a =&gt; a -&gt; Dif a
dConst x0 = D x0 dZero

dZero :: Num a =&gt; Dif a
dZero = D 0 dZero
</code></pre>

<p>Numeric overloadings then are simple.
For instance,</p>

<pre><code>instance Num a =&gt; Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
</code></pre>

<p>In each of the right-hand sides of these last three definitions, the first argument to <code>D</code> is constructed using <code>Num a</code>, while the second argument is <em>recursively</em> constructed using <code>Num (Dif a)</code>.</p>
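<p>To see these overloadings in action, here is a tiny self-contained version; the helpers <code>dVar</code> and <code>takeDif</code> are my own scaffolding, not from the post:</p>

```haskell
data Dif a = D a (Dif a)

dConst :: Num a => a -> Dif a
dConst x0 = D x0 dZero

dZero :: Num a => Dif a
dZero = D 0 dZero

instance Num a => Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
  abs    = undefined        -- omitted in this sketch
  signum = undefined

-- The tower for the identity function at a point: value x,
-- first derivative 1, all higher derivatives 0.
dVar :: Num a => a -> Dif a
dVar x = D x (dConst 1)

-- Sample the first n entries of a tower.
takeDif :: Int -> Dif a -> [a]
takeDif 0 _        = []
takeDif n (D x x') = x : takeDif (n - 1) x'

-- f x = x*x + 3*x at x = 2: value 10, f' = 2x + 3 = 7, f'' = 2, f''' = 0.
example :: [Integer]
example = takeDif 4 (let x = dVar 2 in x * x + 3 * x)
-- example == [10,7,2,0]
```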

<p><a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Jerzy&#8217;s paper</a> uses a function to provide all of the derivatives of a given function (called <code>dlift</code> from Section 3.3):</p>

<pre><code>lift :: Num a =&gt; [a -&gt; a] -&gt; Dif a -&gt; Dif a
lift (f : f') p@(D x x') = D (f x) (x' * lift f' p)
</code></pre>

<p>The given list of functions consists of all of the derivatives of a given function.
Then, derivative towers can be constructed by definitions like the following:</p>

<pre><code>instance Floating a =&gt; Floating (Dif a) where
  pi               = dConst pi
  exp (D x x')     = r where r = D (exp x) (x' * r)
  log p@(D x x')   = D (log x) (x' / p)
  sqrt (D x x')    = r where r = D (sqrt x) (x' / (2 * r))
  sin              = lift (cycle [sin, cos, negate . sin, negate . cos])
  cos              = lift (cycle [cos, negate . sin, negate . cos, sin])
  asin p@(D x x')  = D (asin x) ( x' / sqrt(1 - sqr p))
  acos p@(D x x')  = D (acos x) (-x' / sqrt(1 - sqr p))
  atan p@(D x x')  = D (atan x) ( x' / (sqr p + 1))

sqr :: Num a =&gt; a -&gt; a
sqr x = x*x
</code></pre>
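<p>For instance, applying <code>lift</code> with the cyclic derivative list for <code>sin</code> recovers the familiar derivatives at 0. This sketch repeats the definitions above; <code>dId</code> and <code>takeDif</code> are hypothetical helpers for inspecting towers, not from the post.</p>

```haskell
data Dif a = D a (Dif a)

dConst :: Num a => a -> Dif a
dConst x0 = D x0 dZero

dZero :: Num a => Dif a
dZero = D 0 dZero

instance Num a => Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
  negate (D x0 x')          = D (negate x0) (negate x')
  abs                       = error "abs: not needed for this sketch"
  signum                    = error "signum: not needed for this sketch"

-- lift, as in the post: the list head is the function, the tail its derivatives.
lift :: Num a => [a -> a] -> Dif a -> Dif a
lift (f : f') p@(D x x') = D (f x) (x' * lift f' p)
lift []      _           = error "lift: list must be infinite"

dSin :: Floating a => Dif a -> Dif a
dSin = lift (cycle [sin, cos, negate . sin, negate . cos])

-- Hypothetical helpers for inspecting towers.
dId :: Num a => a -> Dif a
dId x = D x (dConst 1)

takeDif :: Int -> Dif a -> [a]
takeDif 0 _        = []
takeDif n (D x x') = x : takeDif (n - 1) x'

main :: IO ()
main = print (takeDif 4 (dSin (dId (0 :: Double))))  -- sin, cos, -sin, -cos at 0
```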

<h3>Reintroducing the chain rule</h3>

<p>The code above, which corresponds to section 3 of <a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Jerzy&#8217;s paper</a>, is fairly compact.
It can be made prettier, however, which is the point of this blog post.</p>

<p>First, let&#8217;s simplify <code>lift</code> so that it expresses the chain rule directly.
In fact, this definition is just like <code>dlift</code> from Section 2 (not Section 3) of <a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper &quot;Functional Differentiation of Computer Programs&quot; by Jerzy Karczmarczuk">Jerzy&#8217;s paper</a>.
It&#8217;s the same code, but at a different type, here being used to manipulate infinite derivative towers instead of just value and derivative.</p>

<pre><code>dlift :: Num a =&gt; (a -&gt; a) -&gt; (Dif a -&gt; Dif a) -&gt; Dif a -&gt; Dif a
dlift f f' = \ u@(D u0 u') -&gt; D (f u0) (f' u * u')
</code></pre>

<p>This operator lets us write simpler definitions.</p>

<pre><code>instance Floating a =&gt; Floating (Dif a) where
  pi    = dConst pi
  exp   = dlift exp exp
  log   = dlift log recip
  sqrt  = dlift sqrt (recip . (2*) . sqrt)
  sin   = dlift sin cos
  cos   = dlift cos (negate . sin)
  asin  = dlift asin (\ x -&gt; recip (sqrt (1 - sqr x)))
  acos  = dlift acos (\ x -&gt; - recip (sqrt (1 - sqr x)))
  atan  = dlift atan (\ x -&gt; recip (sqr x + 1))
  sinh  = dlift sinh cosh
  cosh  = dlift cosh sinh
  asinh = dlift asinh (\ x -&gt; recip (sqrt (sqr x + 1)))
  acosh = dlift acosh (\ x -&gt; - recip (sqrt (sqr x - 1)))
  atanh = dlift atanh (\ x -&gt; recip (1 - sqr x))
</code></pre>

<p>The necessary recursion has moved out of the lifting function into the class instance (second argument to <code>dlift</code>).</p>

<p>Notice that <code>dlift</code> and the <code>Floating</code> instance are the <em>same code</em> (with minor variations) as in Jerzy&#8217;s section two.
In that section, however, the code computes only first derivatives, while here, we&#8217;re computing all of them.</p>
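<p>A small check that the relocated recursion works: since <code>exp</code> is its own derivative, its tower at 1 should consist entirely of <code>exp 1</code>. As before, <code>dId</code> and <code>takeDif</code> are helpers added here for inspection, not from the post.</p>

```haskell
data Dif a = D a (Dif a)

dConst :: Num a => a -> Dif a
dConst x0 = D x0 dZero

dZero :: Num a => Dif a
dZero = D 0 dZero

instance Num a => Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')
  negate (D x0 x')          = D (negate x0) (negate x')
  abs                       = error "abs: not needed for this sketch"
  signum                    = error "signum: not needed for this sketch"

-- The chain rule, once and for all.
dlift :: Num a => (a -> a) -> (Dif a -> Dif a) -> Dif a -> Dif a
dlift f f' = \u@(D u0 u') -> D (f u0) (f' u * u')

-- exp is its own derivative; the recursion now lives in dlift's second argument.
dExp :: Floating a => Dif a -> Dif a
dExp = dlift exp dExp

-- Hypothetical helpers for inspecting towers.
dId :: Num a => a -> Dif a
dId x = D x (dConst 1)

takeDif :: Int -> Dif a -> [a]
takeDif 0 _        = []
takeDif n (D x x') = x : takeDif (n - 1) x'

main :: IO ()
main = print (takeDif 3 (dExp (dId (1 :: Double))))  -- every entry is exp 1
```

Laziness is what makes the self-referential <code>dExp = dlift exp dExp</code> productive: each level of the tower is demanded one constructor at a time.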

<h3>Prettier still, with function-level overloading</h3>

<p>The last steps are cosmetic.
The goal is to make the derivative functions used with <code>lift</code> easier to read and write.</p>

<p>Just as we&#8217;ve overloaded numeric operations for derivative towers (<code>Dif</code>), let&#8217;s also overload them for <em>functions</em>.
This trick is often used informally in math.
For instance, given functions <code>f</code> and <code>g</code>, one might write <code>f + g</code> to mean <code>\ x -&gt; f x + g x</code>.
Using <a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Applicative.html" title="Documentation for Control.Applicative: applicative functors">applicative functor</a> notation makes these instances a breeze to define:</p>

<pre><code>instance Num b =&gt; Num (a-&gt;b) where
  fromInteger = pure . fromInteger
  (+)         = liftA2 (+)
  (*)         = liftA2 (*)
  negate      = fmap negate
  abs         = fmap abs
  signum      = fmap signum
</code></pre>

<p>The other numeric class instances are analogous.
(<em>Any</em> <a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Applicative.html" title="Documentation for Control.Applicative: applicative functors">applicative functor</a> can be given these same instance definitions.)</p>
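<p>A quick sanity check of the pointwise behavior (a sketch showing just the <code>Num</code> instance):</p>

```haskell
import Control.Applicative (liftA2)

-- Pointwise numeric operations on functions.
instance Num b => Num (a -> b) where
  fromInteger = pure . fromInteger
  (+)         = liftA2 (+)
  (-)         = liftA2 (-)
  (*)         = liftA2 (*)
  negate      = fmap negate
  abs         = fmap abs
  signum      = fmap signum

main :: IO ()
main = print (((+ 1) + (* 2)) (3 :: Integer))  -- (3 + 1) + (3 * 2) = 10
```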

<p>As a final touch, define an infix operator to replace the name &#8220;<code>dlift</code>&#8221;:</p>

<pre><code>infix 0 &gt;-&lt;

(&gt;-&lt;) :: Num a =&gt; (a -&gt; a) -&gt; (Dif a -&gt; Dif a) -&gt; (Dif a -&gt; Dif a)
(&gt;-&lt;) = dlift
</code></pre>

<p>Now the complete code:</p>

<pre><code>instance Num a =&gt; Num (Dif a) where
  fromInteger               = dConst . fromInteger
  D x0 x' + D y0 y'         = D (x0 + y0) (x' + y')
  D x0 x' - D y0 y'         = D (x0 - y0) (x' - y')
  x@(D x0 x') * y@(D y0 y') = D (x0 * y0) (x' * y + x * y')

  negate = negate &gt;-&lt; -1
  abs    = abs    &gt;-&lt; signum
  signum = signum &gt;-&lt; 0

instance Fractional a =&gt; Fractional (Dif a) where
  fromRational = dConst . fromRational
  recip        = recip &gt;-&lt; - sqr recip

instance Floating a =&gt; Floating (Dif a) where
  pi    = dConst pi
  exp   = exp   &gt;-&lt; exp
  log   = log   &gt;-&lt; recip
  sqrt  = sqrt  &gt;-&lt; recip (2 * sqrt)
  sin   = sin   &gt;-&lt; cos
  cos   = cos   &gt;-&lt; - sin
  sinh  = sinh  &gt;-&lt; cosh
  cosh  = cosh  &gt;-&lt; sinh
  asin  = asin  &gt;-&lt; recip (sqrt (1-sqr))
  acos  = acos  &gt;-&lt; recip (- sqrt (1-sqr))
  atan  = atan  &gt;-&lt; recip (1+sqr)
  asinh = asinh &gt;-&lt; recip (sqrt (1+sqr))
  acosh = acosh &gt;-&lt; recip (- sqrt (sqr-1))
  atanh = atanh &gt;-&lt; recip (1-sqr)
</code></pre>

<p>The operators and literals to the right of <code>(&gt;-&lt;)</code> are overloaded for the type <code>Dif a -&gt; Dif a</code>.
For instance, in the definition of <code>sqrt</code>,</p>

<pre><code>2     :: Dif a -&gt; Dif a
recip :: (Dif a -&gt; Dif a) -&gt; (Dif a -&gt; Dif a)
(*)   :: (Dif a -&gt; Dif a) -&gt; (Dif a -&gt; Dif a)
      -&gt; (Dif a -&gt; Dif a)
</code></pre>
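<p>These pointwise instances can be exercised on their own, here at the simpler type <code>Double -&gt; Double</code> rather than <code>Dif a -&gt; Dif a</code>. Note that <code>sqrt</code> below is the ordinary <code>Double</code> square root, used as a <em>value</em> of function type:</p>

```haskell
import Control.Applicative (liftA2)

-- Pointwise numeric operations on functions, as in the post.
instance Num b => Num (a -> b) where
  fromInteger = pure . fromInteger
  (+)         = liftA2 (+)
  (-)         = liftA2 (-)
  (*)         = liftA2 (*)
  negate      = fmap negate
  abs         = fmap abs
  signum      = fmap signum

instance Fractional b => Fractional (a -> b) where
  fromRational = pure . fromRational
  recip        = fmap recip

-- sqrt's derivative, written point-free exactly as in the instance above:
-- 2 is the constant function, (*) and recip act pointwise.
dSqrt :: Floating a => a -> a
dSqrt = recip (2 * sqrt)

main :: IO ()
main = print (dSqrt (4 :: Double))  -- 1 / (2 * sqrt 4) = 0.25
```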

<h3>Try it</h3>

<p>You can try out this code yourself.
Just grab the source files: <a href='http://conal.net/blog/wp-content/uploads/2008/05/NumInstances.hs'>NumInstances.hs</a> and <a href='http://conal.net/blog/wp-content/uploads/2008/05/Dif.hs'>Dif.hs</a>.
Enjoy!</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/beautiful-differentiation/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
	</item>
	</channel>
</rss>
