I just reread Jason Foutz’s post Higher order multivariate automatic differentiation in Haskell, as I’m thinking about this topic again. I like his trick of using an
IntMap to hold the partial derivatives and (recursively) the partials of those partials, etc.
I bet one can eliminate the constant (
C) case in Jason’s representation, and hence 3/4 of the cases to handle, without much loss in performance. He already has a fairly efficient representation of constants, which is a
Dwith an empty
I imagine there’s also a nice generalization of the code for combining two finite maps used in his third multiply case. The code’s meaning and correctness follows from a model for those maps as total functions with missing elements denoting a default value (zero in this case).
Jason’s data type reminds me of a sparse matrix representation, but cooler in how it’s infinitely nested. Perhaps depth n (starting with zero) is a sparse n-dimensional matrix.
Finally, I suspect there’s a close connection between Jason’s
IntMap-based implementation and my
LinearMap-based implementation described in Higher-dimensional, higher-order derivatives, functionally and in Simpler, more efficient, functional linear maps. For the case of Rn, my formulation uses a trie with entries for n basis elements, while Jason’s uses an
IntMap(which is also a trie) with n entries (counting any implicit zeros).
I suspect Jason’s formulation is more efficient (since it optimizes the constant case), while mine is more statically typed and more flexible (since it handles more than Rn).
For optimizing constants, I think I’d prefer having a single constructor with a
Maybe for the derivatives, to eliminate code duplication.
I am still trying to understand the paper Lazy Multivariate Higher-Order Forward-Mode AD, with its management of various epsilons.
A final remark: I prefer the term “higher-dimensional” over the traditional “multivariate”. I hear classic syntax/semantics confusion in the latter.