Automatic differentiation (AD) is a precise, efficient, and convenient method for computing derivatives of functions. Its forward-mode implementation can be quite simple even when extended to compute all of the higher-order derivatives as well. The higher-dimensional case has also been tackled, though with extra complexity. This paper develops an implementation of higher-dimensional, higher-order, forward-mode AD in the extremely general and elegant setting of calculus on manifolds and derives that implementation from a simple and precise specification.
In order to motivate and discover the implementation, the paper poses the question “What does AD mean, independently of implementation?” An answer arises in the form of naturality of sampling a function and its derivative. Automatic differentiation flows out of this naturality condition, together with the chain rule. Graduating from first-order to higher-order AD corresponds to sampling all derivatives instead of just one. Next, the setting is expanded to arbitrary vector spaces, in which derivative values are linear maps. The specification of AD adapts to this elegant and very general setting, which even simplifies the development.
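As a rough illustration of the first-order, scalar case described above, here is a minimal Haskell sketch and not necessarily the paper's code. The constructor `D` and its product rule follow the notation used in the errata on this page; the names `constD`, `idD`, `chain`, `sinD`, and `f` are assumptions introduced only for this example.

```haskell
-- A value paired with its first derivative.
data D a = D a a deriving (Eq, Show)

-- A constant has derivative zero.
constD :: Num a => a -> D a
constD x = D x 0

-- The identity function has derivative one; use it to seed the input.
idD :: Num a => a -> D a
idD x = D x 1

instance Num a => Num (D a) where
  fromInteger     = constD . fromInteger
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' * D y y' = D (x * y) (x' * y + y' * x)  -- product rule
  negate (D x x') = D (negate x) (negate x')
  abs    (D x x') = D (abs x) (x' * signum x)
  signum (D x _)  = D (signum x) 0

instance Fractional a => Fractional (D a) where
  fromRational   = constD . fromRational
  recip (D x x') = D (recip x) (negate x' / (x * x))

-- Chain rule: lift a function f with known derivative f' to work on D.
chain :: Num a => (a -> a) -> (a -> a) -> D a -> D a
chain g g' (D x x') = D (g x) (x' * g' x)

-- Hypothetical example of a lifted primitive.
sinD :: Floating a => D a -> D a
sinD = chain sin cos

-- An ordinary numeric function; it needs no changes to be differentiated.
f :: Num a => a -> a
f x = x * x * x + 2 * x
```

Evaluating `f (idD 3)` samples `f` and its derivative together at 3, giving the value 33 and the derivative 29. The function `f` is written against `Num` only, so the same definition works on plain numbers and on `D` values.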
@InProceedings{Elliott2009-beautiful-differentiation,
  author    = {Conal Elliott},
  title     = {Beautiful differentiation},
  booktitle = {International Conference on Functional Programming (ICFP)},
  year      = 2009,
  url       = {http://conal.net/papers/beautiful-differentiation},
}
Errors and corrections are listed here as they’re reported and fixed.
Thanks to Anonymous, Barak Pearlmutter, Mark Rafter, and Paul Liu.
Fractional instance: “fromInteger” becomes “fromRational”. Thanks to Freddie Manners and Jared Updike.
Thanks to Vincent Kraeutler.
Thanks to Yrogirg.
LinearMap type to traditional linear algebra representations (scalars, column & row vectors, matrices, and beyond). To my astonishment, I realized only the day before my ICFP presentation that the various traditional representations I know fall out of my general representation, thanks to the nature of bases and memo tries.
D x x' * D y y' = D (x * y) (x' * y + y' * x). (Was x + y instead of x * y.) Thanks to Kirstin Rhys.