Bertrand Russell remarked that
Everything is vague to a degree you do not realize till you have tried to make it precise.
I’m mulling over automatic differentiation (AD) again, neatening up previous posts on derivatives and on linear maps, working them into a coherent whole for an ICFP submission. I understand the mechanics and some of the reasons for its correctness. After all, it’s "just the chain rule".
As usual, in the process of writing, I bumped up against Russell’s principle. I felt a growing uneasiness and realized that I didn’t understand AD in the way I like to understand software, namely,
- What does it mean, independently of implementation?
- How do the implementation and its correctness flow gracefully from that meaning?
- Where else might we go, guided by answers to the first two questions?
Ever since writing Simply efficient functional reactivity, the idea of type class morphisms keeps popping up for me as a framework in which to ask and answer these questions. To my delight, this framework gives me new and more satisfying insight into automatic differentiation.