Conal Elliott

What is automatic differentiation, and why does it work?

28th January 2009, 12:09 pm

Bertrand Russell remarked that

Everything is vague to a degree you do not realize till you have tried to make it precise.

I’m mulling over automatic differentiation (AD) again, neatening up previous posts on derivatives and on linear maps, working them into a coherent whole for an ICFP submission. I understand the mechanics and some of the reasons for its correctness. After all, it’s "just the chain rule".

As usual, in the process of writing, I bumped up against Russell’s principle. I felt a growing uneasiness and realized that I didn’t understand AD in the way I like to understand software, namely,

What does it mean, independently of implementation?
How do the implementation and its correctness flow gracefully from that meaning?
Where else might we go, guided by answers to the first two questions?

Ever since writing Simply efficient functional reactivity, the idea of type class morphisms keeps popping up for me as a framework in which to ask and answer these questions. To my delight, this framework gives me new and more satisfying insight into automatic differentiation.

Continue reading ‘What is automatic differentiation, and why does it work?’ »

Tags: derivative, semantics, type class morphism | 5 Comments

Comparing formulations of higher-dimensional, higher-order derivatives

24th January 2009, 05:40 pm

I just reread Jason Foutz’s post Higher order multivariate automatic differentiation in Haskell, as I’m thinking about this topic again. I like his trick of using an IntMap to hold the partial derivatives and (recursively) the partials of those partials, etc.

Some thoughts:

I bet one can eliminate the constant (C) case in Jason’s representation, and hence 3/4 of the cases to handle, without much loss in performance. He already has a fairly efficient representation of constants, which is a D with an empty IntMap.
I imagine there’s also a nice generalization of the code for combining two finite maps used in his third multiply case. The code’s meaning and correctness follows from a model for those maps as total functions with missing elements denoting a default value (zero in this case).
Jason’s data type reminds me of a sparse matrix representation, but cooler in how it’s infinitely nested. Perhaps depth n (starting with zero) is a sparse n-dimensional matrix.
Finally, I suspect there’s a close connection between Jason’s IntMap-based implementation and my LinearMap-based implementation described in Higher-dimensional, higher-order derivatives, functionally and in Simpler, more efficient, functional linear maps. For the case of Rⁿ, my formulation uses a trie with entries for n basis elements, while Jason’s uses an IntMap (which is also a trie) with n entries (counting any implicit zeros).

I suspect Jason’s formulation is more efficient (since it optimizes the constant case), while mine is more statically typed and more flexible (since it handles more than Rⁿ).

For optimizing constants, I think I’d prefer having a single constructor with a Maybe for the derivatives, to eliminate code duplication.

I am still trying to understand the paper Lazy Multivariate Higher-Order Forward-Mode AD, with its management of various epsilons.

A final remark: I prefer the term “higher-dimensional” over the traditional “multivariate”. I hear classic syntax/semantics confusion in the latter.

Tags: comparison, derivative, linear map, math | Comment

Fostering creativity by relinquishing the obvious

24th January 2009, 05:08 pm

“It’s obvious”

A recent thread on haskell-cafe included claims and counter-claims of what is “obvious”.

First,

O[b]viously, bottom [⊥] is not ()

Then a second writer,

Why is this obvious – I would argue that it’s “obvious” that bottom is () – the data type definition says there’s only one value in the type. […]

And a third,

It’s obvious because () is a defined value, while bottom is not – per definitionem.

Personally, I have long suffered unaware under the affliction of “obviousness” that ⊥ is not (), and I suspect many others have as well. My thanks to Bob for jostling me out of my preconceptions. After weighing some alternatives, I might make one choice or another. Still, I like being reminded that other possibilities are available.

I don’t mean to pick on these writers. They just happened to provide at-hand examples of a communication (anti-)pattern I often hear. Also, if you want to address the issue of whether necessarily () /= ⊥, I refer you to the haskell-cafe thread “Re: Laws and partial values”.

A fallacy

I understand discussions of what is “obvious” as being founded on a fallacy, namely believing that obviousness is a property of a thing itself, rather than of an individual’s or community’s mental habits (ruts). (See 2-Place and 1-Place Words.)

Creativity

Still, I wondered why I get so annoyed about the uses of “obvious” in this discussion and in others. (It’s never merely because “Someone is wrong on the Internet”.) Whenever I get bugged and I take the time to look under the surface of my emotional reaction, I find there’s something important & beautiful to me.

My reaction to “obvious” comes from my seeing it as injurious to creativity, which is something I treasure in myself and others. I understand “it’s obvious” to mean “I’m not curious”. Worse yet, in a debate, I hear it as a tactic for discouraging curiosity in others.

For me, creativity begins with and is sustained by curiosity. Creativity is what’s important & beautiful to me here. It’s a value instilled thoroughly & joyfully in me from a very young age by my mother, as curiosity was by my father.

I wonder if “It’s obvious that” is one of those verbal devices that are most appealing when untrue. For instance, “I’m sure that”, as in “I’m sure that your [sick] cat will be okay”. Perhaps “It’s obvious that” most often means “I don’t want to imagine alternatives” or, when used in argument, “I don’t want you to imagine alternatives”. Perhaps “I’m sure that” means “I’d like to be sure that”, or “I’d like you to be sure that”.

In praise of curiosity

Here is a quote from Albert Einstein:

The important thing is not to stop questioning. Curiosity has its own reason for existing. One cannot help but be in awe when he contemplates the mysteries of eternity, of life, of the marvelous structure of reality. It is enough if one tries merely to comprehend a little of this mystery every day. Never lose a holy curiosity.

And another:

I have no special talents. I am only passionately curious.

Today I stumbled across a document that speaks to curiosity and more in a way that I resonate with: Twelve Virtues of Rationality. I warmly recommend it.

Tags: creativity, curiosity | 7 Comments

Lazier function definitions by merging partial values

20th January 2009, 05:06 pm

This post continues from an idea of Ryan Ingram’s in an email thread How to make code least strict?.

A pretty story

Pattern matching in function definitions is very handy and has a declarative feel. For instance,

sum []     = 0
sum (x:xs) = x + sum xs

Simply replace “=” by “==” to read such a set of pattern clauses (partial definitions) as a collection of properties specifying a sum function:

The sum of an empty list equals zero
The sum of a (non-empty) list x:xs equals x plus the sum of the xs.

Moreover, these properties define the sum function, in that sum is the least-defined function that satisfies these two properties.

Guards have a similar style and meaning:

abs x | x < 0 = -x
abs x | x >= 0 =  x

Replacing “=” by “==” and guards by logical implication, we again have two properties that define abs:

x < 0 ==> abs x == -x
x >= 0 ==> abs x ==  x

O, the lies!

This pretty story is a lie, as becomes apparent when we look at overlapping clauses. For instance, we’re more likely to write abs without the second guard:

abs x | x < 0 = -x
abs x          =  x

A declarative of the second clause (∀ x. abs x == x) is false.

I’d more likely write

abs x | x < 0     = -x
      | otherwise =  x

which is all the more deceptive, since “otherwise” doesn’t really mean otherwise. It’s just a synonym for “True“.

Another subtle but common problem arises with definitions like the following, as pointed out by ChrisK in How to make code least strict?:

zip :: [a] -> [b] -> [(a,b)]
zip []      _       = []
zip _       []      = []
zip (x:xs') (y:ys') = (x,y) : zip xs' ys'

These three clauses read like independently true properties for zip. The first two clauses overlap, but their values agree, so what could possibly go wrong with a declarative reading?

The problem is that there are really three flavors of lists, not two. This definition explicitly addresses the nil and cons cases, leaving ⊥.

By the definition above, the value of ‘zip [] ⊥‘ is indeed [], which is consistent with each clause. However, the value of ‘zip ⊥ []‘ is ⊥, because Haskell semantics says that each clause is tried in order, and the first clause forces evaluation of ⊥ when comparing it with []. This ⊥ value is inconsistent with reading the second clause as a property. Swapping the first two clauses fixes the second example but breaks the first one.

Is it possible to fix zip so that its meaning is consistent with these three properties? We seem to be stuck with an arbitrary bias, with strictness in the first or second argument.

Or are we?

Continue reading ‘Lazier function definitions by merging partial values’ »

Tags: lub, partial value, unamb | 6 Comments

3D rendering as functional reactive programming

11th January 2009, 09:38 pm

I’ve been playing with a simple/general semantics for 3D. In the process, I was surprised to see that a key part of the semantics looks exactly like a key part of the semantics of functional reactivity as embodied in the library Reactive. A closer look revealed a closer connection still, as described in this post.

Continue reading ‘3D rendering as functional reactive programming’ »

Tags: 3D, FRP, functional reactive programming, monoid, semantics | 11 Comments

Another angle on functional future values

4th January 2009, 08:01 pm

An earlier post introduced functional future values, which are values that cannot be known until the future, but can be manipulated in the present. That post presented a simple denotational semantics of future values as time/value pairs. With a little care in the definition of Time (using the Max monoid), the instances of Functor, Applicative, Monad are all derived automatically.

A follow-up post gave an implementation of Future values via multi threading. Unfortunately, that implementation did not necessarily satisfy the semantics, because it allowed the nondeterminism of thread scheduling to leak through. Although the implementation is usually correct, I wasn’t satisfied.

After a while, I hit upon an idea that really tickled me. My original simple semantics could indeed serve as a correct and workable implementation if I used a subtler form of time that could reveal partial information. Implementing this subtler form of time turned out to be quite tricky, and was my original motivation for the unamb operator described in the paper Push-pull functional reactive programming and the post Functional concurrency with unambiguous choice.

It took me several days of doodling, pacing outside, and talking to myself before the idea for unamb broke through. Like many of my favorite ideas, it’s simple and obvious in retrospect: to remove the ambiguity of nondeterministic choice (as in the amb operator), restrict its use to values that are equal when non-bottom. Whenever we have two different methods of answering the same question (or possibly failing), we can use unamb to try them both. Failures (errors or non-termination) are no problem in this context. A more powerful variation on unamb is the least upper bound operator lub, as described in Merging partial values.

I’ve been having trouble with the unamb implementation. When two (compatible) computations race, the loser gets killed so as to free up cycles that are no longer needed. My first few implementations, however, did not recursively terminate other threads spawned in service of abandoned computations (from nested use of unamb). I raised this problem in Smarter termination for thread racing, which suggested some better definitions. In the course of several helpful reader comments, some problems with my definitions were addressed, particularly in regard to blocking and unblocking exceptions. None of these definitions so far has done the trick reliably, and now it looks like there is a bug in the GHC run-time system. I hope the bug (if there is one) will be fixed soon, because I’m seeing more & more how unamb and lub can make functional programming even more modular (just as laziness does, as explained by John Hughes in Why Functional Programming Matters).

I started playing with future values and unambiguous choice as a way to implement Reactive, a library for functional reactive programming (FRP). (See Reactive values from the future and Push-pull functional reactive programming.) Over the last few days, I’ve given some thought to ways to implement future values without unambiguous choice. This post describes one such alternative.

Edits:

2010-08-25: Replaced references to Simply efficient functional reactivity with Push-pull functional reactive programming. The latter paper supercedes the former.
2010-08-25: Fixed the unFuture field of FutureG to be TryFuture.

Continue reading ‘Another angle on functional future values’ »

Tags: caching, FRP, functional reactive programming, future value, referential transparency, type class morphism, type composition | 11 Comments