Automatic differentiation (AD) is a precise, efficient, and convenient method for computing derivatives of functions. Its forward-mode implementation can be quite simple even when extended to compute all of the higher-order derivatives as well. The higher-dimensional case has also been tackled, though with extra complexity. This paper develops an implementation of higher-dimensional, higher-order, forward-mode AD in the extremely general and elegant setting of *calculus on manifolds* and derives that implementation from a simple and precise specification.

In order to motivate and discover the implementation, the paper poses the question “What does AD mean, independently of implementation?” An answer arises in the form of *naturality* of sampling a function and its derivative. Automatic differentiation flows out of this naturality condition, together with the chain rule. Graduating from first-order to higher-order AD corresponds to sampling all derivatives instead of just one. Next, the setting is expanded to arbitrary vector spaces, in which derivative values are linear maps. The specification of AD adapts to this elegant and very general setting, which even *simplifies* the development.
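As a rough illustration of "sampling a function and its derivative", here is a minimal first-order, scalar sketch in Haskell. It pairs each value with its derivative and overloads arithmetic accordingly. The names `D`, `constD`, and `idD` follow the errata on this page; `diffAt` is a hypothetical helper added for the example, and the paper's own definitions may differ in detail.

```haskell
-- First-order forward-mode AD sketch: a value paired with its derivative.
data D a = D a a deriving (Eq, Show)

-- A constant has derivative 0; the identity function has derivative 1.
constD :: Num a => a -> D a
constD x = D x 0

idD :: Num a => a -> D a
idD x = D x 1

instance Num a => Num (D a) where
  fromInteger     = constD . fromInteger
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' * D y y' = D (x * y) (x' * y + y' * x)  -- product rule
  negate (D x x') = D (negate x) (negate x')
  abs    (D x x') = D (abs x) (x' * signum x)
  signum (D x _)  = D (signum x) 0

-- Hypothetical helper (not from the paper): sample f and f' at a point.
diffAt :: Num a => (D a -> D a) -> a -> (a, a)
diffAt f x = let D y y' = f (idD x) in (y, y')
```

For instance, `diffAt (\x -> x * x + 3 * x) 2` evaluates to `(10, 7)`: the function's value and its derivative `2 * x + 3` at `x = 2`.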

Versions:

- For ICFP 2009 (260K PDF)
- Extended version: LambdaPix technical report 2009-02 (271K PDF) (with minor revisions 2013/07/19)

See also:

- Blog post
- Slides for ICFP talk
- Video of ICFP talk
- Updated slides (last revised 2013-07-19)

```
@InProceedings{Elliott2009-beautiful-differentiation,
author = {Conal Elliott},
title = {Beautiful differentiation},
booktitle = "International Conference on Functional Programming (ICFP)",
year = 2009,
url = {http://conal.net/papers/beautiful-differentiation},
}
```

Errors and corrections are listed here as they’re reported and fixed.

- Clarified in places that the paper is about *forward-mode* AD, not general AD.
- Clarified somewhat the distinction between calculus on vector spaces and calculus on manifolds (founded on vector spaces).
- Page 2, figure 2: add definitions for *constD* and *idD*
- Figure 5: “*dConst*” becomes “*constD*”
- Section 6.4, first paragraph: “This only differences” becomes “The only differences”
- Section 8: “two-by-three matrix” becomes “three-by-two matrix”
- Section 11: “I do know whether” becomes “I don’t know whether”
- Section 11: Added reference to Henrik Nilsson’s paper *Functional Automatic Differentiation with Dirac Impulses*.

Thanks to Anonymous, Barak Pearlmutter, Mark Rafter, and Paul Liu.

- Figure 2, page 2, `Fractional` instance: “*fromInteger*” becomes “*fromRational*”?
- Section 9, second paragraph: “an group” becomes “a group”.

Thanks to Freddie Manners and Jared Updike.

- Section 2, paragraph 2, “informality that is is typical”. Drop one “is”.
- Section 11, “… and and carefully distinguished …”. Drop one “and”.
- Section 11: “this paper based linear maps” becomes “this paper (based on linear maps)”.

Thanks to Vincent Kraeutler.

- Figure 2: *recip* derivative is missing a negation. Should be `recip (D x x') = D (recip x) (- x' / sqr x)`

Thanks to Yrogirg.
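The sign in the corrected *recip* rule can be checked with the chain rule. Writing $x$ for a nonzero differentiable quantity and $x'$ for its derivative, and assuming `sqr` squares its argument:

$$
\frac{d}{dt}\left(\frac{1}{x}\right) = \frac{d}{dt}\,x^{-1} = -x^{-2}\,x' = -\frac{x'}{x^{2}}
$$

This agrees with the derivative component `- x' / sqr x` in the corrected definition.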

- Section 10.6, last paragraph, and bibliography. Removed accidental reference of tech report to itself.

- Added a section relating my `LinearMap` type to traditional linear algebra representations (scalars, column & row vectors, matrices, and beyond). To my astonishment, I realized only the day before my ICFP presentation that the various traditional representations I know fall out of my general representation, thanks to the nature of bases and memo tries.
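To give a feel for how a matrix can fall out of "bases and memo tries", here is a deliberately simplified Haskell sketch for linear maps from R² to R³. Everything in it (`Basis2`, `Trie2`, `linear`, `lapply`, and the concrete tuple types) is a hypothetical stand-in chosen for illustration; it is not the paper's `LinearMap` type or any library's API. The idea is only that a linear map is determined by its values on a basis, and that a memo trie over a two-element basis is just a pair of stored values.

```haskell
-- Simplified illustration only; these names are not the paper's definitions.

-- The two basis elements of R^2.
data Basis2 = E1 | E2 deriving (Eq, Show)

-- A memo trie over Basis2 stores one value per basis element: a pair.
data Trie2 v = Trie2 v v deriving (Eq, Show)

trie :: (Basis2 -> v) -> Trie2 v
trie f = Trie2 (f E1) (f E2)

untrie :: Trie2 v -> Basis2 -> v
untrie (Trie2 a _) E1 = a
untrie (Trie2 _ b) E2 = b

type V2 = (Double, Double)
type V3 = (Double, Double, Double)

basisValue :: Basis2 -> V2
basisValue E1 = (1, 0)
basisValue E2 = (0, 1)

-- Represent a linear function by sampling it on the basis vectors.
linear :: (V2 -> v) -> Trie2 v
linear f = trie (f . basisValue)

-- Recover the function: combine the stored images, weighted by coordinates.
lapply :: Trie2 V3 -> V2 -> V3
lapply t (x, y) = scale x (untrie t E1) `add` scale y (untrie t E2)
  where
    scale s (a, b, c)       = (s * a, s * b, s * c)
    add (a, b, c) (d, e, g) = (a + d, b + e, c + g)
```

With `f (x, y) = (x, y, x + y)`, `linear f` is `Trie2 (1,0,1) (0,1,1)`. Those two stored triples are the columns of a three-by-two matrix, and `lapply (linear f) (2, 3)` gives `(2,3,5)`, the same as `f (2, 3)`.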

- At the end of section 5.2.3, the “sufficient definition” should be `D x x' * D y y' = D (x * y) (x' * y + y' * x)`. (Was `x + y` instead of `x * y`.)

Thanks to Kirstin Rhys.
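The same product rule extends to higher-order AD if the derivative component is itself a value-with-derivatives. The sketch below assumes a lazy "tower" representation, `data D a = D a (D a)`, which may differ from the paper's exact definitions; `takeD` is a hypothetical helper added for inspection. In this setting the derivative terms multiply a derivative tower by a whole tower, so the rule applies at every level.

```haskell
-- Higher-order sketch: a value followed by the lazy tower of all its derivatives.
data D a = D a (D a)

-- A constant's derivatives are all 0; the identity's first derivative is 1.
constD :: Num a => a -> D a
constD x = D x (constD 0)

idD :: Num a => a -> D a
idD x = D x (constD 1)

instance Num a => Num (D a) where
  fromInteger             = constD . fromInteger
  D x x' + D y y'         = D (x + y) (x' + y')
  p@(D x x') * q@(D y y') = D (x * y) (x' * q + y' * p)  -- product rule, all orders
  negate (D x x')         = D (negate x) (negate x')
  abs p@(D x x')          = D (abs x) (x' * signum p)
  signum (D x _)          = constD (signum x)

-- Hypothetical helper: the first n tower entries (value, f', f'', ...).
takeD :: Int -> D a -> [a]
takeD n (D x x')
  | n <= 0    = []
  | otherwise = x : takeD (n - 1) x'
```

For example, `takeD 4 ((\x -> x * x * x) (idD 2))` yields `[8,12,12,6]`: the value of `x^3` at 2 followed by its first three derivatives there.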