Beautiful differentiation

March 2009

Appeared in ICFP 2009

Abstract

Automatic differentiation (AD) is a precise, efficient, and convenient method for computing derivatives of functions. Its forward-mode implementation can be quite simple even when extended to compute all of the higher-order derivatives as well. The higher-dimensional case has also been tackled, though with extra complexity. This paper develops an implementation of higher-dimensional, higher-order, forward-mode AD in the extremely general and elegant setting of calculus on manifolds and derives that implementation from a simple and precise specification.

In order to motivate and discover the implementation, the paper poses the question “What does AD mean, independently of implementation?” An answer arises in the form of naturality of sampling a function and its derivative. Automatic differentiation flows out of this naturality condition, together with the chain rule. Graduating from first-order to higher-order AD corresponds to sampling all derivatives instead of just one. Next, the setting is expanded to arbitrary vector spaces, in which derivative values are linear maps. The specification of AD adapts to this elegant and very general setting, which even simplifies the development.

Versions:

For ICFP 2009 (260K PDF)
Extended version: LambdaPix technical report 2009-02 (271K PDF) (with minor revisions 2013/07/19)

BibTeX

@InProceedings{Elliott2009-beautiful-differentiation,
  author    = {Conal Elliott},
  title     = {Beautiful differentiation},
  booktitle = {International Conference on Functional Programming (ICFP)},
  year      = 2009,
  url       = {http://conal.net/papers/beautiful-differentiation},
}

Errata

Errors and corrections are listed here as they’re reported and fixed.

Version 2009/02/23

Clarified in places that the paper is about forward-mode AD, not general AD.
Clarified somewhat the distinction between calculus on vector spaces and calculus on manifolds (founded on vector spaces).
Page 2, figure 2: add definitions for constD and idD
Figure 5: “dConst” becomes “constD”
Section 6.4, first paragraph: “This only differences” becomes “The only differences”
Section 8: “two-by-three matrix” becomes “three-by-two matrix”
Section 11: “I do know whether” becomes “I don’t know whether”
Section 11: Added reference to Henrik Nilsson’s paper Functional Automatic Differentiation with Dirac Impulses.

Thanks to Anonymous, Barak Pearlmutter, Mark Rafter, and Paul Liu.

Version 2009/02/27

Figure 2, page 2, Fractional instance: “fromInteger” becomes “fromRational”?
Section 9, second paragraph: “an group” becomes “a group”.

Thanks to Freddie Manners and Jared Updike.

Version 2009/03/01

Section 2, paragraph 2, “informality that is is typical”. Drop one “is”.
Section 11, “… and and carefully distinguished …”. Drop one “and”.
Section 11: “this paper based linear maps” becomes “this paper (based on linear maps)”.

Thanks to Vincent Kraeutler.

Version 2010/12/22

Figure 2: the recip derivative is missing a negation. It should be recip (D x x') = D (recip x) (- x' / sqr x)

Thanks to Yrogirg.
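
The corrected recip rule can be checked with a small, self-contained sketch of first-order forward-mode AD. The definitions below are assumptions made for illustration and are not copied from the paper's Figure 2: D is taken to pair a value with its derivative, constD and idD are taken to be the constant and identity injections, sqr is assumed to be plain squaring, and checkRecip is a hypothetical test name.

```haskell
-- Assumed minimal dual-number type: a value paired with its derivative.
data D a = D a a deriving (Eq, Show)

-- Assumed helpers: a constant has derivative 0, the identity has derivative 1.
constD :: Num a => a -> D a
constD x = D x 0

idD :: Num a => a -> D a
idD x = D x 1

-- Assumed meaning of sqr: plain squaring.
sqr :: Num a => a -> a
sqr x = x * x

instance Num a => Num (D a) where
  fromInteger     = constD . fromInteger
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' - D y y' = D (x - y) (x' - y')
  D x x' * D y y' = D (x * y) (x' * y + y' * x)  -- product rule
  negate (D x x') = D (negate x) (negate x')
  abs    (D x x') = D (abs x) (x' * signum x)
  signum (D x _)  = D (signum x) 0

instance Fractional a => Fractional (D a) where
  fromRational   = constD . fromRational
  -- Corrected rule: the derivative of 1/x is -x'/x^2 (note the negation).
  recip (D x x') = D (recip x) (- x' / sqr x)

-- Hypothetical check: at x = 2, 1/x is 0.5 and its derivative is -0.25.
checkRecip :: Bool
checkRecip = recip (idD 2) == (D 0.5 (-0.25) :: D Double)
```

Without the negation, the derivative component at x = 2 would come out as 0.25, the wrong sign for a decreasing function.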

Version 2010/03/16

Section 10.6, last paragraph, and bibliography: removed the tech report’s accidental reference to itself.

Version 2010/04/21

Added a section relating my LinearMap type to traditional linear algebra representations (scalars, column & row vectors, matrices, and beyond). To my astonishment, I realized only the day before my ICFP presentation that the various traditional representations I know fall out of my general representation, thanks to the nature of bases and memo tries.

Version 2010/03/16

At the end of section 5.2.3, the “sufficient definition” should be D x x' * D y y' = D (x * y) (x' * y + y' * x). (Was x + y instead of x * y.)

Thanks to Kirstin Rhys.
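
For reference, the corrected definition is the ordinary product rule applied to value/derivative pairs. The rendering below is a restatement in conventional notation, with the pair (x, x') standing in for D x x' and t as an assumed name for the independent variable:

```latex
(x, x') \cdot (y, y') = \bigl(x\,y,\; x'\,y + y'\,x\bigr)
\qquad\text{since}\qquad
\frac{d}{dt}\bigl(x(t)\,y(t)\bigr) = x'(t)\,y(t) + y'(t)\,x(t).
```

The first component must be the product x * y because it carries the value of the function being differentiated; the earlier x + y would have returned the sum as the value of a product.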