Memoizing polymorphic functions – part one

Memoization takes a function and gives back a semantically equivalent function that reuses rather than recomputes results when applied to the same argument more than once. Variations include not-quite-equivalence due to added strictness, and replacing value equality with pointer equality.

Memoization is often packaged up polymorphically:

memo :: (???) => (k -> v) -> (k -> v)

For pointer-based (“lazy”) memoization, the type constraint (“???”) is empty. For equality-based memoization, we’d need at least Eq k, and probably Ord k or HasTrie k for efficient lookup (in a finite map or a possibly infinite memo trie).

Although memo is polymorphic, its argument is a monomorphic function. Implementations that use maps or tries exploit that monomorphism by using a type like Map k v or Trie k v. Each map or trie is built around a particular (monomorphic) type of keys. That is, a single map or trie does not mix keys of different types.
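
For instance, here is a minimal sketch of such a monomorphic memoizer, specialized to non-negative Int keys and caching results in a lazy list (a degenerate trie); the name memoNat is mine, for illustration:

-- Memoize a function on non-negative Ints by tabulating it in a lazy
-- list. Entries are computed on demand and then shared across calls.
memoNat :: (Int -> v) -> (Int -> v)
memoNat f = (table !!)
  where table = map f [0 ..]

The monomorphism shows up directly in the code: table has type [v], indexed by Int positions, and cannot hold results for keys of any other type.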

Now I find myself wanting to memoize polymorphic functions, and I don’t know how to do it.

Continue reading ‘Memoizing polymorphic functions – part one’ »

The C language is purely functional

There has been a flurry of reaction on Twitter and Reddit to Robert Fischer’s post saying that Scala is Not a Functional Programming Language. James Iry responded by saying that analogous reasoning leads to the conclusion that Erlang is Not Functional.

My first inclination was to suggest that Haskell, as commonly practiced (with monadic IO), is not a functional language either. Instead, I’m going to explain how it is that the C language is purely functional.

Continue reading ‘The C language is purely functional’ »

Notions of purity in Haskell

Lately I’ve been learning that some programming principles I treasure are not widely shared among my Haskell comrades, or at least not among those I’ve been hearing from. I was feeling bummed, so I decided to write this post, in order to help me process the news and to see who resonates with what I’m looking for.

One of the principles I’m talking about is that the value of a closed expression (one not containing free variables) depends solely on the expression itself — not influenced by the dynamic conditions under which it is executed. I relate to this principle as the soul of functional programming and of referential transparency in particular.

Edits:

  • 2009-10-26: Minor typo fix

Continue reading ‘Notions of purity in Haskell’ »

Paper: Beautiful differentiation

I have another paper draft for submission to ICFP 2009. This one is called Beautiful differentiation. The paper is a culmination of the several posts I’ve written on derivatives and automatic differentiation (AD). I’m happy with how the derivation keeps getting simpler. Now I’ve boiled extremely general higher-order AD down to a Functor and Applicative morphism.

I’d love to get some readings and feedback. I’m a bit over the page limit, so I’ll have to do some trimming before submitting.

The abstract:

Automatic differentiation (AD) is a precise, efficient, and convenient method for computing derivatives of functions. Its implementation can be quite simple even when extended to compute all of the higher-order derivatives as well. The higher-dimensional case has also been tackled, though with extra complexity. This paper develops an implementation of higher-dimensional, higher-order differentiation in the extremely general and elegant setting of calculus on manifolds and derives that implementation from a simple and precise specification.

In order to motivate and discover the implementation, the paper poses the question “What does AD mean, independently of implementation?” An answer arises in the form of naturality of sampling a function and its derivative. Automatic differentiation flows out of this naturality condition, together with the chain rule. Graduating from first-order to higher-order AD corresponds to sampling all derivatives instead of just one. Next, the notion of a derivative is generalized via the notions of vector space and linear maps. The specification of AD adapts to this elegant and very general setting, which even simplifies the development.

You can get the paper and see current errata here.

The submission deadline is March 2, so comments before then are most helpful to me.

Enjoy, and thanks!

Denotational design with type class morphisms

I’ve just finished a draft of a paper called Denotational design with type class morphisms, for submission to ICFP 2009. The paper is on a theme I’ve explored in several posts, which is semantics-based design, guided by type class morphisms.

I’d love to get some readings and feedback. Pointers to related work would be particularly appreciated, as well as what’s unclear and what could be cut. It’s an entire page over the limit, so I’ll have to do some trimming before submitting.

The abstract:

Type classes provide a mechanism for varied implementations of standard interfaces. Many of these interfaces are founded in mathematical tradition and so have regularity not only of types but also of properties (laws) that must hold. Types and properties give strong guidance to the library implementor, while leaving freedom as well. Some of the remaining freedom is in how the implementation works, and some is in what it accomplishes.

To give additional guidance to the what, without impinging on the how, this paper proposes a principle of type class morphisms (TCMs), which further refines the compositional style of denotational semantics. The TCM idea is simply that the instance’s meaning is the meaning’s instance. This principle determines the meaning of each type class instance, and hence defines correctness of implementation. In some cases, it also provides a systematic guide to implementation, and in some cases, valuable design feedback.

The paper is illustrated with several examples of types, meanings, and morphisms.
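
In symbols (my shorthand here, not necessarily the paper’s notation): with a meaning function μ :: T -> M for a type T denoting M, the TCM principle says that each instance for T must agree with the corresponding instance for M. For example, for Monoid and Functor:

  μ mempty          == mempty
  μ (a `mappend` b) == μ a `mappend` μ b

  μ (fmap f t)      == fmap f (μ t)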

You can get the paper and see current errata here.

The submission deadline is March 2, so comments before then are most helpful to me.

Enjoy, and thanks!

From the chain rule to automatic differentiation

In What is automatic differentiation, and why does it work?, I gave a semantic model that explains what automatic differentiation (AD) accomplishes. Correct implementations then flowed from that model, by applying the principle of type class morphisms. (An instance’s interpretation is the interpretation’s instance).

I’ve had a nagging discomfort about the role of the chain rule in AD, with an intuition that the chain rule can play a more central role in the specification and implementation. This post gives a variation on the previous AD post that carries the chain rule further into the reasoning and implementation, leading to simpler correctness proofs and a nearly unaltered implementation.

Finally, as a bonus, I’ll show how GHC rewrite rules enable an even simpler and more modular implementation.
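
If you haven’t seen GHC rewrite rules before, here is their general shape, using the standard map/map fusion rule as an example (illustrative only, not the rule developed in this post):

{-# RULES "map/map" forall f g xs.  map f (map g xs) = map (f . g) xs #-}

During simplification, GHC replaces any expression matching the left-hand side with the right-hand side, which is how separately written passes can fuse without changing the source code.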

I’ve included some optional content, including exercises. You can see my answers to the exercises by examining the HTML.

Continue reading ‘From the chain rule to automatic differentiation’ »

Seeking advice on software licensing

Thanks to my time at Microsoft Research (1994-2002), I was able to live for a while (modestly) on asset growth. Last year I started working in earnest on two libraries, Reactive (functional reactive programming) and FieldTrip (functional 3D), plus some supporting libraries. I placed these libraries all under the most liberal license I know, which is BSD3. I’m more enthusiastic than ever about my functional programming work and am enjoying active involvement with the Haskell community.

With the recent economic meltdown, my old means of sustainable income ended, and now I’m looking for a replacement. I’m not yet in crisis, so I have time to make some thoughtful decisions and even take some risk. Rather than abandoning Reactive, FieldTrip, and related projects (some new), I’m looking for ways to continue these projects while building potential for future income related to them. At the same time, it’s very important to me to keep these projects open so as to advance purely functional techniques & tools, as well as to have enjoyable creative connections, and to get feedback & help.

For these reasons, I’m now considering licensing future releases of some of my libraries for non-commercial use, with commercial uses to be arranged as separate licenses. I know almost nothing about licensing issues, because I haven’t been interested, and I’ve always wanted maximum freedom for all users.

So, I’m looking for help in choosing a software license that enables & encourages a creative community, while preserving opportunity to ask for some portion of return in future for-profit uses. If people have alternative perspectives on how to achieve my goals without changing license terms, I’m interested in hearing them as well. I am not trying to “make a killing”, just a living, so that I can keep doing what I love and share it with others.

What is automatic differentiation, and why does it work?

Bertrand Russell remarked that

Everything is vague to a degree you do not realize till you have tried to make it precise.

I’m mulling over automatic differentiation (AD) again, neatening up previous posts on derivatives and on linear maps, working them into a coherent whole for an ICFP submission. I understand the mechanics and some of the reasons for its correctness. After all, it’s “just the chain rule”.
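
As a reminder of what those mechanics are, here is the textbook dual-number formulation of first-order forward AD, as a minimal sketch of mine rather than the development in these posts: pair every number with its derivative, and push the chain rule through each arithmetic operation.

data D = D Double Double   -- a value paired with its derivative

instance Num D where
  fromInteger n   = D (fromInteger n) 0          -- constants have derivative 0
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' * D y y' = D (x * y) (x' * y + x * y')  -- product rule
  negate (D x x') = D (negate x) (negate x')
  abs    (D x x') = D (abs x) (x' * signum x)
  signum (D x _) = D (signum x) 0

-- Sample f and its derivative at x by seeding the derivative with 1.
deriv :: (D -> D) -> Double -> Double
deriv f x = x' where D _ x' = f (D x 1)

For instance, deriv (\x -> x * x + 7) 5 yields 10.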

As usual, in the process of writing, I bumped up against Russell’s principle. I felt a growing uneasiness and realized that I didn’t understand AD in the way I like to understand software, namely,

  • What does it mean, independently of implementation?
  • How do the implementation and its correctness flow gracefully from that meaning?
  • Where else might we go, guided by answers to the first two questions?

Ever since writing Simply efficient functional reactivity, the idea of type class morphisms keeps popping up for me as a framework in which to ask and answer these questions. To my delight, this framework gives me new and more satisfying insight into automatic differentiation.

Continue reading ‘What is automatic differentiation, and why does it work?’ »

Comparing formulations of higher-dimensional, higher-order derivatives

I just reread Jason Foutz’s post Higher order multivariate automatic differentiation in Haskell, as I’m thinking about this topic again. I like his trick of using an IntMap to hold the partial derivatives and (recursively) the partials of those partials, etc.

Some thoughts:

  • I bet one can eliminate the constant (C) case in Jason’s representation, and hence 3/4 of the cases to handle, without much loss in performance. He already has a fairly efficient representation of constants, which is a D with an empty IntMap.

  • I imagine there’s also a nice generalization of the code for combining two finite maps used in his third multiply case. The code’s meaning and correctness follow from a model for those maps as total functions, with missing elements denoting a default value (zero in this case); see the sketch after this list.

  • Jason’s data type reminds me of a sparse matrix representation, but cooler in how it’s infinitely nested. Perhaps depth n (starting with zero) is a sparse n-dimensional matrix.

  • Finally, I suspect there’s a close connection between Jason’s IntMap-based implementation and my LinearMap-based implementation described in Higher-dimensional, higher-order derivatives, functionally and in Simpler, more efficient, functional linear maps. For the case of ℝⁿ, my formulation uses a trie with entries for n basis elements, while Jason’s uses an IntMap (which is also a trie) with n entries (counting any implicit zeros).
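
On the second point above, the generalization might look something like the following sketch (my guess at its shape, with hypothetical names; Jason’s actual code may differ), treating each IntMap as a total function whose missing entries denote a given default:

import qualified Data.IntMap as IM

-- Combine two maps viewed as total functions, with missing keys
-- denoting the default value 'def'. ('combineWithDefault' is a
-- hypothetical name.)
combineWithDefault :: (v -> v -> v) -> v -> IM.IntMap v -> IM.IntMap v -> IM.IntMap v
combineWithDefault op def m1 m2 =
  IM.unions
    [ IM.intersectionWith op m1 m2             -- keys in both maps
    , IM.map (`op` def) (IM.difference m1 m2)  -- keys only in m1
    , IM.map (def `op`) (IM.difference m2 m1)  -- keys only in m2
    ]

With op = (+) and def = 0, missing entries behave exactly like explicit zeros.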

I suspect Jason’s formulation is more efficient (since it optimizes the constant case), while mine is more statically typed and more flexible (since it handles more than ℝⁿ).

For optimizing constants, I think I’d prefer having a single constructor with a Maybe for the derivatives, to eliminate code duplication.
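
Concretely, such a variant might look like this (hypothetical, sketching the suggestion rather than Jason’s code):

import Data.IntMap (IntMap)

-- 'D c Nothing' represents a constant (all derivatives zero), while
-- 'D c (Just partials)' carries the recursively nested partials.
data D v = D v (Maybe (IntMap (D v)))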

I am still trying to understand the paper Lazy Multivariate Higher-Order Forward-Mode AD, with its management of various epsilons.

A final remark: I prefer the term “higher-dimensional” over the traditional “multivariate”. I hear classic syntax/semantics confusion in the latter.

Fostering creativity by relinquishing the obvious

“It’s obvious”

A recent thread on haskell-cafe included claims and counter-claims of what is “obvious”.

First,

O[b]viously, bottom [⊥] is not ()

Then a second writer,

Why is this obvious – I would argue that it’s “obvious” that bottom is () – the data type definition says there’s only one value in the type. […]

And a third,

It’s obvious because () is a defined value, while bottom is not – per definitionem.

Personally, I have long suffered unaware under the affliction of “obviousness” that ⊥ is not (), and I suspect many others have as well. My thanks to Bob for jostling me out of my preconceptions. After weighing some alternatives, I might make one choice or another. Still, I like being reminded that other possibilities are available.

I don’t mean to pick on these writers. They just happened to provide at-hand examples of a communication (anti-)pattern I often hear. Also, if you want to address the issue of whether necessarily () /= ⊥, I refer you to the haskell-cafe thread “Re: Laws and partial values”.

A fallacy

I understand discussions of what is “obvious” as being founded on a fallacy, namely believing that obviousness is a property of a thing itself, rather than of an individual’s or community’s mental habits (ruts). (See 2-Place and 1-Place Words.)

Creativity

Still, I wondered why I get so annoyed about the uses of “obvious” in this discussion and in others. (It’s never merely because “Someone is wrong on the Internet”.) Whenever I get bugged and I take the time to look under the surface of my emotional reaction, I find there’s something important & beautiful to me.

My reaction to “obvious” comes from my seeing it as injurious to creativity, which is something I treasure in myself and others. I understand “it’s obvious” to mean “I’m not curious”. Worse yet, in a debate, I hear it as a tactic for discouraging curiosity in others.

For me, creativity begins with and is sustained by curiosity. Creativity is what’s important & beautiful to me here. It’s a value instilled thoroughly & joyfully in me from a very young age by my mother, as curiosity was by my father.

I wonder if “It’s obvious that” is one of those verbal devices that are most appealing when untrue. For instance, “I’m sure that”, as in “I’m sure that your [sick] cat will be okay”. Perhaps “It’s obvious that” most often means “I don’t want to imagine alternatives” or, when used in argument, “I don’t want you to imagine alternatives”. Perhaps “I’m sure that” means “I’d like to be sure that”, or “I’d like you to be sure that”.

In praise of curiosity

Here is a quote from Albert Einstein:

The important thing is not to stop questioning. Curiosity has its own reason for existing. One cannot help but be in awe when he contemplates the mysteries of eternity, of life, of the marvelous structure of reality. It is enough if one tries merely to comprehend a little of this mystery every day. Never lose a holy curiosity.

And another:

I have no special talents. I am only passionately curious.

Today I stumbled across a document that speaks to curiosity and more in a way that I resonate with: Twelve Virtues of Rationality. I warmly recommend it.