<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: What is automatic differentiation, and why does it work?</title>
	<atom:link href="http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Sat, 26 Sep 2020 21:06:12 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<item>
		<title>By: In Heaven Now Are Three &#171; Changing the world</title>
		<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work#comment-334</link>
		<dc:creator><![CDATA[In Heaven Now Are Three &#171; Changing the world]]></dc:creator>
		<pubDate>Mon, 09 Feb 2009 19:30:56 +0000</pubDate>
		<guid isPermaLink="false">http://conal.net/blog/?p=79#comment-334</guid>
		<description><![CDATA[&lt;p&gt;[...]    Short summary: I rant a little about how I found out about Automatic Differentiation from Conal Elliott and then from wikipedia. After this I speak a little about how this tool was useful once in my [...]&lt;/p&gt;
]]></description>
		<content:encoded><![CDATA[<p>[&#8230;]    Short summary: I rant a little about how I found out about Automatic Differentiation from Conal Elliott and then from wikipedia. After this I speak a little about how this tool was useful once in my [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Conal Elliott &#187; Blog Archive &#187; From the chain rule to automatic differentiation</title>
		<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work#comment-333</link>
		<dc:creator><![CDATA[Conal Elliott &#187; Blog Archive &#187; From the chain rule to automatic differentiation]]></dc:creator>
		<pubDate>Sun, 08 Feb 2009 22:14:16 +0000</pubDate>
		<guid isPermaLink="false">http://conal.net/blog/?p=79#comment-333</guid>
		<description><![CDATA[&lt;p&gt;[...] What is automatic differentiation, and why does it work?, I gave a semantic model that explains what automatic differentiation (AD) accomplishes. Correct [...]&lt;/p&gt;
]]></description>
		<content:encoded><![CDATA[<p>[&#8230;] What is automatic differentiation, and why does it work?, I gave a semantic model that explains what automatic differentiation (AD) accomplishes. Correct [&#8230;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: conal</title>
		<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work#comment-332</link>
		<dc:creator><![CDATA[conal]]></dc:creator>
		<pubDate>Mon, 02 Feb 2009 19:29:16 +0000</pubDate>
		<guid isPermaLink="false">http://conal.net/blog/?p=79#comment-332</guid>
		<description><![CDATA[&lt;p&gt;Simon,
Thanks for the tip.  Yes, I know Barak (CMU classmates).  The issue he raises is &quot;nested AD&quot; -- derivatives of functions built out of derivatives, and the danger of confusing one infinitesimal (perturbation) with another.  I&#039;ve read his &lt;em&gt;&lt;a href=&quot;http://www.bcl.hamilton.ie/~barak/publications.html&quot; rel=&quot;nofollow&quot;&gt;Nesting forward-mode AD in a functional framework&lt;/a&gt;&lt;/em&gt; a couple of times and haven&#039;t yet gotten my head around whether the problem is inherent in AD or in a way of attacking it.  A related post is &lt;em&gt;&lt;a href=&quot;http://conway.rutgers.edu/~ccshan/wiki/blog/posts/Differentiation/&quot; rel=&quot;nofollow&quot;&gt;Differentiating regions&lt;/a&gt;&lt;/em&gt; by &lt;a href=&quot;http://www.cs.rutgers.edu/~ccshan/&quot; rel=&quot;nofollow&quot;&gt;Chung-chieh Shan&lt;/a&gt;.&lt;/p&gt;
]]></description>
		<content:encoded><![CDATA[<p>Simon,
Thanks for the tip.  Yes, I know Barak (CMU classmates).  The issue he raises is &#8220;nested AD&#8221; &#8212; derivatives of functions built out of derivatives, and the danger of confusing one infinitesimal (perturbation) with another.  I&#8217;ve read his <em><a href="http://www.bcl.hamilton.ie/~barak/publications.html" rel="nofollow">Nesting forward-mode AD in a functional framework</a></em> a couple of times and haven&#8217;t yet gotten my head around whether the problem is inherent in AD or in a way of attacking it.  A related post is <em><a href="http://conway.rutgers.edu/~ccshan/wiki/blog/posts/Differentiation/" rel="nofollow">Differentiating regions</a></em> by <a href="http://www.cs.rutgers.edu/~ccshan/" rel="nofollow">Chung-chieh Shan</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Simon PJ</title>
		<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work#comment-331</link>
		<dc:creator><![CDATA[Simon PJ]]></dc:creator>
		<pubDate>Mon, 02 Feb 2009 11:39:14 +0000</pubDate>
		<guid isPermaLink="false">http://conal.net/blog/?p=79#comment-331</guid>
		<description><![CDATA[&lt;p&gt;Conal, do you know Barak Pearlmutter?  http://www-bcl.cs.nuim.ie/~barak/.  He&#039;s an expert on AD, and knows Haskell too.  Well worth talking to.&lt;/p&gt;

&lt;p&gt;My hind-brain memory is that he&#039;s identified something to do with AD that you can nearly-but-not-quite Do Right in Haskell. I never quite got my brain around what that Thing was, but I&#039;m sure he could tell you.  It may be that some implementation hack would let you do it Right, but I&#039;m not quite sure what the hack is.  It&#039;s cool stuff anyway.&lt;/p&gt;

&lt;p&gt;Simon&lt;/p&gt;
]]></description>
		<content:encoded><![CDATA[<p>Conal, do you know Barak Pearlmutter?  <a href="http://www-bcl.cs.nuim.ie/~barak/" rel="nofollow">http://www-bcl.cs.nuim.ie/~barak/</a>.  He&#8217;s an expert on AD, and knows Haskell too.  Well worth talking to.</p>

<p>My hind-brain memory is that he&#8217;s identified something to do with AD that you can nearly-but-not-quite Do Right in Haskell. I never quite got my brain around what that Thing was, but I&#8217;m sure he could tell you.  It may be that some implementation hack would let you do it Right, but I&#8217;m not quite sure what the hack is.  It&#8217;s cool stuff anyway.</p>

<p>Simon</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Raoul Duke</title>
		<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work#comment-330</link>
		<dc:creator><![CDATA[Raoul Duke]]></dc:creator>
		<pubDate>Thu, 29 Jan 2009 00:11:13 +0000</pubDate>
		<guid isPermaLink="false">http://conal.net/blog/?p=79#comment-330</guid>
		<description><![CDATA[&lt;p&gt;i read the wikipedia entry&lt;/p&gt;

&lt;p&gt;http://en.wikipedia.org/wiki/Automatic_differentiation&lt;/p&gt;

&lt;p&gt;and found it not bad. in particular i thought the following really explained AD well vs. the other 2 options:&lt;/p&gt;

&lt;p&gt;&quot;AD exploits the fact that any computer program that implements a vector function y = F(x) (generally) can be decomposed into a sequence of elementary assignments, any one of which may be trivially differentiated by a simple table lookup. These elemental partial derivatives, evaluated at a particular argument, are combined in accordance with the chain rule from derivative calculus to form some derivative information for F (such as gradients, tangents, the Jacobian matrix, etc.). This process yields exact (to numerical accuracy) derivatives. Because the symbolic transformation occurs only at the most basic level, AD avoids the computational problems inherent in complex symbolic computation.&quot;&lt;/p&gt;
]]></description>
		<content:encoded><![CDATA[<p>i read the wikipedia entry</p>

<p><a href="http://en.wikipedia.org/wiki/Automatic_differentiation" rel="nofollow">http://en.wikipedia.org/wiki/Automatic_differentiation</a></p>

<p>and found it not bad. in particular i thought the following really explained AD well vs. the other 2 options:</p>

<p>&#8220;AD exploits the fact that any computer program that implements a vector function y = F(x) (generally) can be decomposed into a sequence of elementary assignments, any one of which may be trivially differentiated by a simple table lookup. These elemental partial derivatives, evaluated at a particular argument, are combined in accordance with the chain rule from derivative calculus to form some derivative information for F (such as gradients, tangents, the Jacobian matrix, etc.). This process yields exact (to numerical accuracy) derivatives. Because the symbolic transformation occurs only at the most basic level, AD avoids the computational problems inherent in complex symbolic computation.&#8221;</p>
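<p>[Editor's note: the quoted process — differentiate each primitive "by table lookup" and combine with the chain rule — can be sketched in a few lines of Haskell with dual numbers. This is a minimal illustrative sketch, not code from the post or from Wikipedia; the names <code>Dual</code> and <code>deriv</code> are invented here.]</p>

```haskell
-- Forward-mode AD sketch: pair each value with its derivative.
-- Each Num primitive carries its "table lookup" derivative rule,
-- and composition of primitives applies the chain rule automatically.
data Dual = Dual Double Double  -- (value, derivative)
  deriving Show

instance Num Dual where
  Dual x x' + Dual y y' = Dual (x + y) (x' + y')          -- sum rule
  Dual x x' * Dual y y' = Dual (x * y) (x' * y + x * y')  -- product rule
  negate (Dual x x')    = Dual (negate x) (negate x')
  fromInteger n         = Dual (fromInteger n) 0          -- constants: zero derivative
  abs    (Dual x x')    = Dual (abs x) (x' * signum x)
  signum (Dual x _)     = Dual (signum x) 0

-- Derivative of f at x: seed the perturbation component with 1.
deriv :: (Dual -> Dual) -> Double -> Double
deriv f x = let Dual _ d = f (Dual x 1) in d

main :: IO ()
main = print (deriv (\x -> x * x + 3 * x + 1) 2)  -- f'(x) = 2x + 3, so prints 7.0
```

<p>[Because <code>Dual</code> is a <code>Num</code> instance, any polymorphic numeric function can be differentiated without change — the "symbolic transformation" happens only at the primitive operations, exactly as the quoted passage describes.]</p>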
]]></content:encoded>
	</item>
</channel>
</rss>
