<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; memoization</title>
	<atom:link href="http://conal.net/blog/tag/memoization/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>Memoizing polymorphic functions via unmemoization</title>
		<link>http://conal.net/blog/posts/memoizing-polymorphic-functions-via-unmemoization</link>
		<comments>http://conal.net/blog/posts/memoizing-polymorphic-functions-via-unmemoization#comments</comments>
		<pubDate>Mon, 27 Sep 2010 22:38:46 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[memoization]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=203</guid>
		<description><![CDATA[Last year I wrote two posts on memoizing polymorphic functions. The first post gave a solution of sorts that uses a combination of patricia trees (for integer-keyed maps), stable names, and type equality proofs. The second post showed how to transform some functions that do not quite fit the required form so that they do [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Memoizing polymorphic functions via unmemoization

Tags: memoization

URL: http://conal.net/blog/posts/memoizing-polymorphic-functions-via-unmemoization/

-->

<!-- references -->

<!-- teaser -->

<p>Last year I wrote two posts on memoizing polymorphic functions.
The <a href="http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one/" title="blog post">first post</a> gave a solution of sorts that uses a combination of patricia trees (for integer-keyed maps), stable names, and type equality proofs.
The <a href="http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two/" title="blog post">second post</a> showed how to transform some functions that do not quite fit the required form so that they do fit.</p>

<p>Dan Piponi wrote a follow-up post <em><a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Memoizing Polymorphic Functions with High School Algebra and Quantifiers</a></em> showing a different approach that was more in the spirit of type-directed functional memoization, as it follows purely from mathematical properties, free of the deeply operational magic of stable names.
Recently, I finally studied and worked with Dan&#8217;s post enough to understand what he did.
It&#8217;s <em>very</em> clever and beautiful!</p>

<p>This post re-presents Dan&#8217;s elegant insight as I understand it, via some examples that helped it come together for me.</p>

<!--
**Edits**:

* 2010-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-203"></span></p>

<h3>Playing with quantifiers</h3>

<p>Consider the type of <code>fmap</code>:</p>

<pre><code>fmap :: Functor f ⇒ (a → b) → f a → f b
</code></pre>

<p>and make the type quantification explicit</p>

<pre><code>fmap :: ∀ f a b. Functor f ⇒ (a → b) → f a → f b
</code></pre>

<p>Since the <code>Functor</code> constraint depends only on <code>f</code>, restrict the scope of the other quantifiers:</p>

<pre><code>fmap :: ∀ f. Functor f ⇒ ∀ a b. (a → b) → f a → f b
</code></pre>

<p>We cannot further restrict the scope of <code>a</code> or <code>b</code>, because the first argument of the function involves both <code>a</code> and <code>b</code>.
The <em>second</em> argument, however, ignores <code>b</code>, so let&#8217;s next <code>flip</code> the arguments.</p>

<pre><code>flip fmap :: ∀ f. Functor f ⇒ ∀ a b. f a → (a → b) → f b
</code></pre>

<p>and restrict the scope of <code>b</code>:</p>

<pre><code>flip fmap :: ∀ f. Functor f ⇒ ∀ a. f a → ∀ b. (a → b) → f b
</code></pre>

<p>Next introduce names:</p>

<pre><code>type Yo f a = ∀ b. (a → b) → f b

toYo :: Functor f ⇒ ∀ a. f a → Yo f a
toYo = flip fmap
</code></pre>

<p>So <code>flip fmap</code> is a way to map <code>f a</code> to <code>Yo f a</code>.
The reverse is easier:</p>

<pre><code>fromYo :: ∀ a. Yo f a → f a
fromYo f = f id
</code></pre>

<p>GHC&#8217;s type-checker won&#8217;t really let this <code>toYo</code> definition fly, because of persnickety details about universal quantification.
Instead, inline <code>flip</code> and simplify:</p>

<pre><code>toYo x p = fmap p x
</code></pre>
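<p>Concretely, GHC will accept these definitions if we keep the quantifier in place with a <code>newtype</code>. The following sketch (the wrapper and its <code>runYo</code> field are my additions, assuming the <code>RankNTypes</code> extension) is the ASCII form of the definitions above:</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- Wrap the polymorphic type ∀ b. (a → b) → f b in a newtype
-- so that GHC can track where the quantifier lives.
newtype Yo f a = Yo { runYo :: forall b. (a -> b) -> f b }

toYo :: Functor f => f a -> Yo f a
toYo x = Yo (\p -> fmap p x)

fromYo :: Yo f a -> f a
fromYo (Yo g) = g id
```

<p>With the wrapper in place, <code>fromYo ∘ toYo</code> reproduces its argument, which is exactly what the Yoneda lemma promises.</p>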

<h3>The Yoneda lemma</h3>

<p>Are the functions <code>fromYo</code> and <code>toYo</code> inverses of each other?
The <a href="http://en.wikipedia.org/wiki/Yoneda_lemma">Yoneda lemma</a> from category theory says yes.
The following proof is taken from <a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Dan&#8217;s post</a>:</p>

<pre><code>  fromYo (toYo x)
≡ fromYo (λ p → fmap p x)
≡ (λ p → fmap p x) id
≡ fmap id x
≡ x                    -- a functor law

  toYo (fromYo f)
≡ toYo (f id)
≡ λ p → fmap p (f id)
≡ λ p → f (fmap p id)  -- f is polymorphic/natural
≡ λ p → f (p ∘ id)     -- fmap on functions is (∘)
≡ λ p → f p
≡ f
</code></pre>

<p>Note the critical role of <em>naturality</em>, which holds of parametrically polymorphic functions in Haskell.
To say that a function <code>f</code> is <em>natural</em> means that it commutes with <code>fmap p</code>, for all functions <code>p</code>, i.e.,</p>

<pre><code>fmap p ∘ f  ≡  f ∘ fmap p
</code></pre>

<h2>Playing with the Yoneda lemma</h2>

<p>With a little type massaging, the Yoneda lemma can be applied more broadly than its literal form.
For instance, consider the type <code>(∀ b. f b)</code> for a functor <code>f</code>.
Does it have a simpler isomorphic form?</p>

<p>The Yoneda lemma applies to types of the form <code>∀ b. (a → b) → f b</code> for some type <code>a</code>.
We don&#8217;t have a function to <code>f b</code>, but we could fake one by introducing a unit argument:</p>

<pre><code>  ∀ b. f b
≅ ∀ b. 1 → f b
</code></pre>

<p>Warning: this step is only correct for strict functions, so I&#8217;m fudging here.</p>

<p>I&#8217;m using &#8220;<code>1</code>&#8221; for the Haskell type <code>()</code>.
I&#8217;ll also use &#8220;×&#8221; and &#8220;+&#8221; for product and sum types in these examples, in place of the Haskell names &#8220;<code>(,)</code>&#8221; and &#8220;<code>Either</code>&#8220;.
For parsing the examples, × binds more tightly than +, which binds more tightly than →.</p>

<p>We haven&#8217;t gotten to the required form yet, because we have <code>1</code> where we need <code>a → b</code> for some <code>a</code>.
Hm.
For what type <code>a</code> is there only one function from <code>a</code> to <code>b</code>, where <code>b</code> is an <em>arbitrary</em> type?
By choosing <code>b</code> to have at least two elements, we can make different functions of type <code>a → b</code> by mapping some element of <code>a</code> to different elements of <code>b</code>.
Unless <code>a</code> has no elements at all!</p>

<p>Let <code>0</code> be the type with no elements:</p>

<pre><code>data 0
</code></pre>

<p>This type is usually named &#8220;<code>Void</code>&#8221; in Haskell.</p>

<p>Since there are no values of type <code>0</code>, there is exactly one function from <code>0</code> to any other type <code>b</code>, i.e., <code>1 ≅ 0 → b</code>.
(This type isomorphism corresponds to a familiar equality on numbers.)
Thus continuing our definition from above</p>

<pre><code>⋯ ≅ ∀ b. (0 → b) → f b
</code></pre>

<p>Now we have the form covered directly by the Yoneda lemma</p>

<pre><code>⋯ ≅ f 0
</code></pre>

<p>Oh dear, I keep lying to you.
The step from <code>1</code> to <code>0 → b</code> is also valid only for strict functions, or for a <code>0</code> without ⊥.</p>

<p>Replaying this derivation,</p>

<pre><code>  ∀ b. f b
≅ ∀ b. 1 → f b
≅ ∀ b. (0 → b) → f b
≅ f 0
</code></pre>
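<p>As a runnable sketch of this derivation (the names <code>Zero</code>, <code>absurdZ</code>, <code>toF0</code>, and <code>fromF0</code> are mine; <code>Data.Void</code> provides <code>Void</code> and <code>absurd</code> for the same job), assuming <code>RankNTypes</code> and <code>EmptyCase</code>:</p>

```haskell
{-# LANGUAGE RankNTypes, EmptyCase #-}

-- The type 0, with no (non-⊥) elements.
data Zero

-- The unique function from Zero to any type b (Data.Void calls it absurd).
absurdZ :: Zero -> b
absurdZ z = case z of {}

-- One direction of (∀ b. f b) ≅ f 0 is just instantiation at b = Zero:
toF0 :: (forall b. f b) -> f Zero
toF0 x = x

-- The other direction maps the nonexistent elements away:
fromF0 :: Functor f => f Zero -> (forall b. f b)
fromF0 = fmap absurdZ
```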

<p>Next try a simpler example: <code>∀ a. a</code>.
This time, temporarily introduce the identity functor <code>Id</code>:</p>

<pre><code>  ∀ a. a
≅ ∀ a. Id a
≅ Id 0
≅ 0
</code></pre>

<p>which is correct, as there is no non-⊥ value of type <code>∀ a. a</code>.</p>

<p>Next try <code>∀ a. a → a</code>:</p>

<pre><code>  ∀ a. a → a
≅ ∀ a. (1 → a) → Id a
≅ Id 1
≅ 1
</code></pre>

<p>And indeed, there is only one (non-⊥) function of type <code>∀ a. a → a</code>.</p>

<p>Here&#8217;s a simple specialization of the Yoneda lemma (also mentioned in <a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Dan&#8217;s post</a>):</p>

<pre><code>  ∀ b. (a → b) → b
≅ ∀ b. (a → b) → Id b
≅ Id a
≅ a
</code></pre>
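<p>This specialization says that the continuation-passing (Church) encoding of a type loses no information. A sketch of the two directions (names mine, assuming <code>RankNTypes</code>):</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- ∀ b. (a → b) → b  ≅  a
toCPS :: a -> (forall b. (a -> b) -> b)
toCPS x = \k -> k x

fromCPS :: (forall b. (a -> b) -> b) -> a
fromCPS g = g id
```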

<p>These examples prove the existence of bijections, but we can also synthesize the bijections, as <a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Dan Piponi demonstrates</a>.
The bijections then let us memoize.</p>

<h3>Memoizing polymorphic functions via <em>unmemoization</em></h3>

<p>Let&#8217;s look at the Yoneda lemma again.
Recall</p>

<pre><code>type Yo f a = ∀ b. (a → b) → f b
</code></pre>

<p>For all natural transformations <code>f</code> (in Haskell, polymorphic functions <a href="http://blog.sigfpe.com/2006/11/yoneda-lemma.html" title="blog post by Dan Piponi">&#8220;with no funny stuff&#8221;</a>), the Yoneda lemma says that</p>

<pre><code>Yo f a ≅ f a
</code></pre>

<p>where the conversions between <code>Yo f a</code> and <code>f a</code> are very simple: <code>($ id)</code> and <code>flip fmap</code>.</p>

<p>Now suppose we want to memoize a polymorphic function <code>∀ a. T a → f a</code>.
The Yoneda lemma suggests that we find some <code>Q</code> such that <code>T a ≅ Q → a</code>, in which case</p>

<pre><code>  ∀ a. T a → f a
≅ ∀ a. (Q → a) → f a
≅ f Q
</code></pre>

<p>How do <code>Q</code> and <code>T</code> relate?
<code>T</code> is a memo table for <code>Q</code>.
In other words, <code>Q</code> is the type of <em>indices</em> into <code>T</code>.
In still other words, <code>Q</code> is an <em>unmemoization</em> of <code>T</code>.</p>

<p>This trick is from Dan Piponi&#8217;s post <em><a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Memoizing Polymorphic Functions with High School Algebra and Quantifiers</a></em>.
We started with a polymorphic function, and we ended up with a data structure, i.e., we have <em>memoized</em>.
But the critical step on the route to memoization was <em>unmemoization</em> (of <code>T</code>).
Delightful!</p>

<p>We saw some very simple examples in the previous section.
Let&#8217;s next look at some more examples.</p>

<h4>Pairs</h4>

<p>How can we memoize polymorphic functions on pairs, i.e., functions of type <code>∀ a. a × a → f a</code>?</p>

<p>We can work out an answer by appealing to the same laws of exponents used in <a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">memoization</a>, but now applied in reverse (to create function types instead of eliminating them).</p>

<pre><code>  a × a
≅ (1 → a) × (1 → a)
≅ (1 + 1) → a
≅ Bool → a
</code></pre>

<p>Again, I&#8217;m handling only <em>strict</em> functions here.</p>

<p>Therefore,</p>

<pre><code>(∀ a. a × a → f a)  ≅  f Bool
</code></pre>

<p>Spelling out this conclusion (once more for practice),</p>

<pre><code>  ∀ a. a × a      → f a
≅ ∀ a. (Bool → a) → f a
≅ f Bool
</code></pre>
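<p>Both directions of this isomorphism are tiny. In the sketch below (the names <code>memoPair</code> and <code>unmemoPair</code> are mine), the &#8220;table&#8221; is the function applied to the identity index pair <code>(False, True)</code>, and lookup is an <code>fmap</code>, just as in the Yoneda proof:</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- Memoize a polymorphic function on pairs as a value of type f Bool.
-- (False, True) is the pair whose components are their own indices.
memoPair :: (forall a. (a, a) -> f a) -> f Bool
memoPair h = h (False, True)

-- Recover the function: index into the given pair with each stored Bool.
unmemoPair :: Functor f => f Bool -> (forall a. (a, a) -> f a)
unmemoPair t (x, y) = fmap (\b -> if b then y else x) t
```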

<p>A variation on this derivation is to make the pairing functor explicit as</p>

<pre><code>data P a = P a a
</code></pre>

<p>or, using <a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">functor combinators</a>,</p>

<pre><code>type P = Id :*: Id
</code></pre>

<p>Now imagine we have a type family <code>UnTrie</code> that acts as an inverse to <code>Trie</code></p>

<pre><code>  UnTrie P
= UnTrie (Id :*: Id)
= UnTrie Id + UnTrie Id
= 1 + 1
≅ Bool
</code></pre>

<h4>Streams</h4>

<p>A type of streams (infinite-only lazy lists) is like the type of lists but without a nil case:</p>

<pre><code>data Stream a = Cons a (Stream a)
</code></pre>

<p>How can we memoize <code>∀ a. Stream a → f a</code>?
We simply ask: for what type is <code>Stream</code> a memo table?
Or: what type naturally indexes a stream?
An answer is natural numbers in Peano form:</p>

<pre><code>data Nat = Zero | Succ Nat
</code></pre>

<p>Memoizing <code>Nat → a</code> as in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em> leads to <code>Stream a</code>.</p>

<p>Therefore,</p>

<pre><code>∀ a. Stream a → f a  ≅  f Nat
</code></pre>
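<p>Here is a sketch of both directions (the names <code>memoStream</code>, <code>unmemoStream</code>, <code>(!)</code>, and <code>nats</code> are mine). The key value is <code>nats</code>, the stream that stores each position&#8217;s own index; it plays the role that <code>id</code> played in the Yoneda conversions:</p>

```haskell
{-# LANGUAGE RankNTypes #-}

data Stream a = Cons a (Stream a)

data Nat = Zero | Succ Nat deriving (Eq, Show)

-- Look up a stream element by its index.
(!) :: Stream a -> Nat -> a
(Cons x _)  ! Zero     = x
(Cons _ xs) ! (Succ n) = xs ! n

-- The stream whose nth element is n: Stream as a memo table for Nat.
nats :: Stream Nat
nats = go Zero where go n = Cons n (go (Succ n))

memoStream :: (forall a. Stream a -> f a) -> f Nat
memoStream h = h nats

unmemoStream :: Functor f => f Nat -> (forall a. Stream a -> f a)
unmemoStream t s = fmap (s !) t
```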

<p>In case we didn&#8217;t think of <code>Nat</code> right off, we can derive it systematically.
We&#8217;ll appeal to the same three laws of exponents used in memoization, but now applied in reverse (to create function types instead of eliminating them).</p>

<p>Look for a type <code>N</code> such that <code>N → a ≅ Stream a</code>.</p>

<pre><code>  N → a
≅ Stream a
≅ a × Stream a
≅ (1 → a) × (N → a)   --  coinductively
≅ (1 + N) → a
</code></pre>

<p>A sufficient condition, then, is that <code>N ≅ 1 + N</code>, a condition that translates to a recursive data type definition:</p>

<pre><code>data N = Z | S N
</code></pre>

<p>or</p>

<pre><code>data Nat = Zero | Succ Nat
</code></pre>

<h4>Infinite binary trees</h4>

<p>Next let&#8217;s look at one form of binary trees, having values at each node and only infinite paths.</p>

<pre><code>data BTree a = BTree a (BTree a) (BTree a)
</code></pre>

<p>and let <code>B</code> be an unmemoized form of <code>BTree</code>, or type of indices of <code>BTree</code>, i.e., <code>B → a ≅ BTree a</code>.</p>

<pre><code>  B → a
≅ BTree a
≅ a × BTree a × BTree a
≅ (1 → a) × (B → a) × (B → a)
≅ (1 + B + B) → a
</code></pre>

<p>A sufficient definition for <code>B</code> is then <code>B = 1 + B + B</code>.
Or in legal Haskell form (no recursive type synonyms):</p>

<pre><code>data B = Z | E B | O B
</code></pre>

<p>We can think of this data type as representing natural numbers in <a href="http://en.wikipedia.org/wiki/Endianness">little endian</a> binary.
Expanding the type and constructor names:</p>

<pre><code>data BinNat = Zero | Even BinNat | Odd BinNat
</code></pre>
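<p>To make the correspondence concrete, here is a sketch (names mine) that indexes an infinite tree by a path of type <code>B</code>, and memoizes via <code>paths</code>, the tree that stores each node&#8217;s own path:</p>

```haskell
{-# LANGUAGE RankNTypes #-}

data BTree a = BTree a (BTree a) (BTree a)

data B = Z | E B | O B deriving (Eq, Show)

-- Look up a tree element by its path: Z stops here, E goes left, O goes right.
(!) :: BTree a -> B -> a
(BTree x _ _) ! Z     = x
(BTree _ l _) ! (E b) = l ! b
(BTree _ _ r) ! (O b) = r ! b

-- The tree whose element at path b is b itself: the identity memo table.
paths :: BTree B
paths = go id where go k = BTree (k Z) (go (k . E)) (go (k . O))

memoBTree :: (forall a. BTree a -> f a) -> f B
memoBTree h = h paths

unmemoBTree :: Functor f => f B -> (forall a. BTree a -> f a)
unmemoBTree t tree = fmap (tree !) t
```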

<h4>A variation on infinite binary trees</h4>

<p>We might have written our tree type differently:</p>

<pre><code>data BTree a = BTree a (P (BTree a))
</code></pre>

<p>where <code>P</code> is the pairing functor.
Again, let <code>B</code> be the sought index type for (unmemoization of) <code>BTree</code></p>

<pre><code>  B → a
≅ BTree a
≅ a × P (BTree a)
≅ (1 → a) × (Bool → BTree a)
≅ (1 → a) × (Bool → B → a)     -- coinductively
≅ (1 → a) × (Bool × B → a)
≅ 1 + Bool × B → a
</code></pre>

<p>which then leads to a slightly different representation for our index type:</p>

<pre><code>data BinNat = Zero | NonZero Bool BinNat
</code></pre>

<p>We&#8217;ve made the <em>bit</em> type explicit.
In this form, <code>BinNat</code> is a little-endian list of bits, so we can redefine it as such:</p>

<pre><code>type BinNat = [Bool]
</code></pre>

<p>and the desired isomorphism</p>

<pre><code>(∀ a. BTree a → f a)  ≅  f [Bool]
</code></pre>

<h4>Generalized infinite trees</h4>

<p>The functor <code>P</code> and the type <code>Bool</code> have an important relationship to each other in the previous derivation, namely <code>Bool</code> is the index type for <code>P</code>.
We can play this same trick for <em>all</em> index types and their corresponding trie (memoization) functors.
Generalize from binary trees:</p>

<pre><code>data Tree d a = Tree a (d :→: Tree d a)
</code></pre>

<p>where &#8220;<code>k :→: v</code>&#8221; is the type of memo tries over domain type <code>k</code>, with range <code>v</code>.
(The trie structure is driven entirely by <code>k</code>.)
See other <a href="http://conal.net/blog/tag/memoization/" title="Posts on memoization">posts on memoization</a>.</p>

<p>These generalized trees are indexed by little-endian natural numbers over a &#8220;digit&#8221; type <code>d</code>.
Generalizing the proof for binary trees we get</p>

<pre><code>Tree d a ≅ [d] → a
</code></pre>

<p>Details:</p>

<pre><code>  Tree d a
≅ a × (d :→: Tree d a)
≅ a × (d → Tree d a)
≅ a × (d → [d] → a)   -- coinductively
≅ a × (d × [d] → a)
≅ (1 → a) × (d × [d] → a)
≅ (1 + d × [d] → a)
≅ [d] → a
</code></pre>

<p>So the Yoneda lemma tells us that</p>

<pre><code>(∀ a. Tree d a → f a) ≅ f [d]
</code></pre>

<p>Specializing, we get binary trees indexed by binary numbers, and streams (&#8220;unary trees&#8221;) indexed by unary numbers:</p>

<pre><code>type BTree = Tree Bool
type Stream = Tree 1
</code></pre>

<p>For rose trees we would need an index type <code>d</code> whose memo tries are lists, i.e., <code>d :→: v ≅ [v]</code> for all types <code>v</code>.
I don&#8217;t think there is such a type <code>d</code>, since <code>[v]</code> is isomorphic to a sum type, and memoization, like exponentiation, doesn&#8217;t produce sums.
I&#8217;ll return to this question below.</p>

<h4><code>Maybe</code></h4>

<p>Let&#8217;s try something a bit simpler: memoizing <code>∀ a. Maybe a → f a</code>.
Of what domain type is <code>Maybe a</code> a memo table?</p>

<p>Oops.
None of the memoizing transformations generates a sum type such as <code>Maybe</code>.
So we&#8217;ll have to try something else.
First split the domain sum, yielding a product, and then proceed with the summands.</p>

<pre><code>  ∀ a. Maybe a → f a
≅ ∀ a. (1 + a) → f a
≅ ∀ a. (1 → f a) × (a → f a)
≅ ∀ a. f a × (a → f a)
≅ (∀ a. f a) × (∀ a. a → f a)
≅ f 0 × f 1
</code></pre>
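<p>A sketch of the resulting conversions (names mine; <code>Zero</code> stands in for <code>0</code>, i.e., <code>Data.Void.Void</code>, and <code>EmptyCase</code> supplies its <code>absurd</code>):</p>

```haskell
{-# LANGUAGE RankNTypes, EmptyCase #-}

-- The type 0, with no (non-⊥) elements.
data Zero

-- (∀ a. Maybe a → f a) ≅ f 0 × f 1
memoMaybe :: (forall a. Maybe a -> f a) -> (f Zero, f ())
memoMaybe h = (h Nothing, h (Just ()))

unmemoMaybe :: Functor f => (f Zero, f ()) -> (forall a. Maybe a -> f a)
unmemoMaybe (fz, fu) m = case m of
  Nothing -> fmap (\z -> case z of {}) fz  -- no elements to map
  Just x  -> fmap (\() -> x) fu
```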

<h4>Lists</h4>

<p>(Even more than the previous sections, this one is a regurgitation of material from <a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Dan&#8217;s post</a>.)</p>

<p>We&#8217;ve covered <code>Maybe</code> and <code>Stream</code> above, so lists might work out with a combination of techniques.</p>

<pre><code>  ∀ a. [a] → f a
≅ ∀ a. (1 + a × [a]) → f a
≅ ∀ a. (1 → f a) × (a × [a] → f a)
≅ ∀ a. f a × (a × [a] → f a)
≅ (∀ a. f a) × (∀ a. a × [a] → f a)
≅ f 0 × (∀ a. a × [a] → f a)
</code></pre>

<p>How to continue from here?
The remaining polymorphic function doesn&#8217;t look quite like what we started with.
Dan&#8217;s trick was to generalize (as <a href="http://en.wikipedia.org/wiki/How_to_Solve_It">Pólya recommended</a>), replacing <code>[a]</code> with <code>a^n × [a]</code>, starting the process with <code>n ≡ 0</code>.
Dan generalized even more, where <code>n</code> is an arbitrary <em>type</em>, to <code>(n → a) × [a]</code>, starting with <code>n ≡ 0</code> (zero).</p>

<pre><code>type T f n = ∀ a. (n → a) × [a] → f a
</code></pre>

<p>Then</p>

<pre><code>  T f n
≡ ∀ a. (n → a) × [a] → f a
≅ ∀ a. (n → a) × (1 + a × [a]) → f a
≅ ∀ a. (n → a) × 1 + (n → a) × a × [a] → f a
≅ ∀ a. (n → a) + (n → a) × a × [a] → f a
≅ ∀ a. (n → a) + (n → a) × (1 → a) × [a] → f a
≅ ∀ a. (n → a) + (n + 1 → a) × [a] → f a
≅ ∀ a. ((n → a) → f a) × ((n + 1 → a) × [a] → f a)
≅ (∀ a. (n → a) → f a) × (∀ a. (n + 1 → a) × [a] → f a)
≅ f n × (∀ a. (n + 1 → a) × [a] → f a)
≡ f n × T f (n + 1)
</code></pre>

<p>Inductively, this isomorphism gives rise to a function-free type:</p>

<pre><code>type Q f n = f n × Q f (n + 1)
</code></pre>

<p>for which</p>

<pre><code>Q f n ≅ T f n
</code></pre>

<p>Since Haskell doesn&#8217;t handle recursive type synonyms, instead define a new <code>data</code> type, as in <a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Dan&#8217;s post</a>:</p>

<pre><code>data Q f n = Q (f n) (Q f (n + 1))
</code></pre>

<p>Since <code>n + 1 ≅ Maybe n</code>, we could instead define</p>

<pre><code>data Q f n = Q (f n) (Q f (Maybe n))
</code></pre>

<p>As Dan described, the trick in this derivation is to rework the type <code>[a]</code> in polynomial form <code>[a] ≅ 1 + a + a^2 + a^3 + ⋯</code>.
Then memoization works out fairly easily:</p>

<pre><code>(∀ a. [a] → f a) ≅ f 0 × f 1 × f 2 × ...
</code></pre>

<p>Here the types <code>0</code>, <code>1</code>, <code>2</code>, … have 0, 1, 2, … elements, respectively.
We can choose <code>2 = Maybe 1</code>, <code>3 = Maybe 2</code>, … or <code>2 = 1 + 1</code>, <code>3 = 2 + 1</code>, ….</p>
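<p>Assembling the derivation into working code, as a sketch: <code>memoT</code> and <code>unmemoT</code> (my names) implement the isomorphism <code>T f n ≅ f n × T f (Maybe n)</code> step by step, and the list versions specialize <code>n</code> to <code>0</code>. Note the polymorphic recursion, which is why the type signatures are required:</p>

```haskell
{-# LANGUAGE RankNTypes, ScopedTypeVariables, EmptyCase #-}

-- The type 0.
data Zero

-- Dan's memo structure: one table entry for every possible list length,
-- built lazily.  Q f n holds f n and tables for all longer lengths.
data Q f n = Q (f n) (Q f (Maybe n))

-- Build Q f n from T f n = ∀ a. (n → a) → [a] → f a: the head entry
-- handles the empty-list case (via Yoneda), and the tail absorbs one
-- list element into the index type.
memoT :: forall f n. (forall a. (n -> a) -> [a] -> f a) -> Q f n
memoT h = Q (h id []) (memoT h')
  where
    h' :: forall a. (Maybe n -> a) -> [a] -> f a
    h' k as = h (k . Just) (k Nothing : as)

-- Run the isomorphism the other way: descend as deep as the list is long.
unmemoT :: Functor f => Q f n -> (n -> a) -> [a] -> f a
unmemoT (Q fn _) k []       = fmap k fn
unmemoT (Q _ q)  k (x : xs) = unmemoT q (maybe x k) xs

memoList :: (forall a. [a] -> f a) -> Q f Zero
memoList h = memoT (\_ as -> h as)

unmemoList :: Functor f => Q f Zero -> [a] -> f a
unmemoList q = unmemoT q (\z -> case z of {})
```

<p>For instance, <code>memoList reverse</code> builds an infinite lazy table from which <code>unmemoList</code> recovers <code>reverse</code> at every element type.</p>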

<h3>Where to go from here?</h3>

<h4>Algebraic types</h4>

<p>I&#8217;m not entirely satisfied with this solution to memoizing polymorphic functions over lists (and other algebraic types).
I want to find another angle that doesn&#8217;t require this conversion to polynomial form.
Partly because I don&#8217;t think it handles infinite lists (as Dan mentioned).
I suppose one could just add the infinite case:</p>

<pre><code>(∀ a. [a] → f a) ≅ (f 0 × f 1 × f 2 × ...) × f Nat ≅ Q f 0 × f Nat
</code></pre>

<p>How to justify/derive this isomorphism?
And how to get non-strict memoization, including sharing of work for computations that use bounded input?</p>

<p>I also want a representation that&#8217;s friendly to non-strict memoization, sharing work on common prefixes.</p>

<h4>Constructing isomorphisms</h4>

<p>As always, type isomorphism is at the core of functional memoization.
However, it&#8217;s not enough to show that isomorphisms exist.
We have to <em>construct</em> the isomorphisms in order to implement a harder memoization problem via an easier one.</p>

<p><a href="http://blog.sigfpe.com/2009/11/memoizing-polymorphic-functions-with.html" title="blog post by Dan Piponi">Dan&#8217;s post</a> builds up some isomorphisms.
The process is rather tedious, and I&#8217;d like to make it much easier.
I&#8217;ve tinkered with making this process much more elegant by applying <em><a href="http://conal.net/blog/posts/semantic-editor-combinators/" title="blog post">Semantic editor combinators</a></em> in their generalized form (as in the paper <em><a href="http://conal.net/papers/Eros/" title="Paper">Tangible Functional Programming</a></em>), using bijections as the <a href="DeepArrow">deep arrow</a>.
So far, however, I&#8217;m struggling unsuccessfully against awkwardness in the handling of explicit polymorphism (higher-rank types).</p>

<h4>Automation</h4>

<p>In my other <a href="http://conal.net/blog/tag/memoization/" title="Posts on memoization">posts on memoization</a>, functions are memoized in a fully automatic, type-driven way.
Can the same be done somehow with polymorphic functions?</p>

<h4>Non-strictness</h4>

<p>I fudged treatment of non-strictness above.
Can non-strict polymorphic functions be handled correctly and elegantly, perhaps with a simple combination of previous techniques or perhaps with new ones?</p>

<h4>Other memoization tools?</h4>

<p>Dan&#8217;s use of the Yoneda lemma to memoize polymorphic functions (via <em>unmemoization</em>) came as a surprise to me.
Are there other deep properties of typed, pure functional programming that give rise to additional tools for memoization?
One place to start looking is additional free theorems.</p>

<h4>Relationship to numeric representations</h4>

<p>In <em><a href="http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504" title="book by Chris Okasaki">Purely Functional Data Structures</a></em>, Chris Okasaki points out a correspondence between number representation schemes and collection data structures.
How does this correspondence relate to memoization?</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/memoizing-polymorphic-functions-via-unmemoization/feed</wfw:commentRss>
		<slash:comments>12</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fmemoizing-polymorphic-functions-via-unmemoization&amp;language=en_GB&amp;category=text&amp;title=Memoizing+polymorphic+functions+via+unmemoization&amp;description=Last+year+I+wrote+two+posts+on+memoizing+polymorphic+functions.+The+first+post+gave+a+solution+of+sorts+that+uses+a+combination+of+patricia+trees+%28for+integer-keyed+maps%29%2C+stable+names%2C...&amp;tags=memoization%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Fixing broken isomorphisms &#8212; details for non-strict memoization, part 2</title>
		<link>http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2</link>
		<comments>http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2#comments</comments>
		<pubDate>Wed, 22 Sep 2010 23:02:26 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[unamb]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=198</guid>
		<description><![CDATA[The post Details for non-strict memoization, part 1 works out a systematic way of doing non-strict memoization, i.e., correct memoization of non-strict (and more broadly, non-hyper-strict) functions. As I mentioned at the end, there was an awkward aspect, which is that the purported &#8220;isomorphisms&#8221; used for regular types are not quite isomorphisms. For instance, functions [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Fixing broken isomorphisms - details for non-strict memoization, part 2

Tags: memoization, functor, trie, unamb

URL: http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2/

-->

<!-- references -->

<!-- teaser -->

<!--
**Edits**:

* 2010-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p>The post <em><a href="http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/" title="blog post">Details for non-strict memoization, part 1</a></em> works out a systematic way of doing <em>non-strict</em> memoization, i.e., correct memoization of non-strict (and more broadly, non-hyper-strict) functions.
As I mentioned at the end, there was an awkward aspect, which is that the purported &#8220;isomorphisms&#8221; used for regular types are not quite isomorphisms.</p>

<p>For instance, functions from triples are memoized by converting to and from nested pairs:</p>

<pre><code>untriple ∷ (a,b,c) → ((a,b),c)
untriple (a,b,c) = ((a,b),c)

triple ∷ ((a,b),c) → (a,b,c)
triple ((a,b),c) = (a,b,c)
</code></pre>

<p>Then <code>untriple</code> and <code>triple</code> form an embedding/projection pair, i.e.,</p>

<pre><code>triple ∘ untriple ≡ id
untriple ∘ triple ⊑ id
</code></pre>

<p>The reason for the inequality is that the nested-pair form permits <code>(⊥,c)</code>, which does not correspond to any triple.</p>

<pre><code>untriple (triple (⊥,c)) ≡ untriple ⊥ ≡ ⊥
</code></pre>

<p>Can we patch this problem by simply using an irrefutable (lazy) pattern in the definition of <code>triple</code>, i.e., <code>triple (~(a,b),c) = (a,b,c)</code>?
Let&#8217;s try:</p>

<pre><code>untriple (triple (⊥,c)) ≡ untriple (⊥,⊥,c) ≡ ((⊥,⊥),c)
</code></pre>

<p>So isomorphism fails and so does even the embedding/projection property.</p>
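<p>We can observe both failures by forcing just the outer structure of each round trip, using <code>Control.Exception</code> to witness ⊥ as a thrown error. This is a sketch; <code>tripleL</code> and <code>probe</code> are my names:</p>

```haskell
import Control.Exception (SomeException, evaluate, try)

untriple :: (a, b, c) -> ((a, b), c)
untriple (a, b, c) = ((a, b), c)

triple :: ((a, b), c) -> (a, b, c)
triple ((a, b), c) = (a, b, c)

-- The lazy-pattern variant from the question above.
tripleL :: ((a, b), c) -> (a, b, c)
tripleL (~(a, b), c) = (a, b, c)

-- Start from (⊥, 'x'), make the round trip, and force only the
-- outer pair structure, extracting the second component.
probe :: (((Char, Char), Char) -> (Char, Char, Char))
      -> IO (Either SomeException Char)
probe tr =
  try (evaluate (case untriple (tr (undefined, 'x')) of ((_, _), c) -> c))
```

<p>The strict <code>triple</code> makes the whole round trip ⊥ (so <code>probe triple</code> throws), while <code>tripleL</code> yields the too-defined <code>((⊥,⊥),'x')</code> and <code>probe tripleL</code> succeeds, confirming that neither inequality of an embedding/projection pair holds.</p>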

<p>Similarly, to deal with regular algebraic data types, I used a class that describes regular data types as repeated applications of a single, associated <em>pattern functor</em> (following <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>):</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t ∷ * → *
  unwrap ∷ t → PF t t
  wrap   ∷ PF t t → t
</code></pre>

<p>Here <code>unwrap</code> converts a value into its pattern functor form, and <code>wrap</code> converts back.
For example, here is the <code>Regular</code> instance I had used for lists:</p>

<pre><code>instance Regular [a] where
  type PF [a] = Const () :+: Const a :*: Id

  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)

  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as
</code></pre>

<p>Again, we have an embedding/projection pair, rather than a genuine isomorphism:</p>

<pre><code>wrap ∘ unwrap ≡ id
unwrap ∘ wrap ⊑ id
</code></pre>

<p>The inequality comes from ⊥ values occurring in <code>PF [a] [a]</code> at type <code>Const () [a]</code>, <code>()</code>,  <code>(Const a :*: Id) [a]</code>, <code>Const a [a]</code>, or <code>Id [a]</code>.</p>

<p><span id="more-198"></span></p>

<h3>Why care?</h3>

<p>What harm results from the lack of genuine isomorphism?
For hyper-strict functions, as usually handled (correctly) in memoization, I don&#8217;t think there is any harm.
For correct memoization of non-hyper-strict functions, however, the superfluous points of undefinedness lead to larger memo tries and wasted effort.
For instance, a function from triples goes through some massaging on the way to being memoized:</p>

<pre><code>λ (a,b,c) → ⋯
⇓
λ ((a,b),c) → ⋯
⇓
λ (a,b) → λ c → ⋯
</code></pre>

<p>For hyper-strict memoization, the next step transforms to <code>λ a → λ b → λ c → ⋯</code>.
For non-strict memoization, however, we first stash away the value of the function applied to <code>⊥ ∷ (a,b)</code>, which will always be ⊥ in this context.</p>

<h3>Strict products and sums</h3>

<p>To eliminate the definedness discrepancy and regain isomorphism, we might make all non-strictness explicit via unlifted product &amp; sums, and explicit lifting.</p>

<pre><code>-- | Add a bottom to a type
data Lift a = Lift { unLift ∷ a } deriving Functor

infixl 6 :+!
infixl 7 :*!

-- | Strict pair
data a :*! b = !a :*! !b

-- | Strict sum
data a :+! b = Left' !a | Right' !b
</code></pre>

<p>Note that the <code>Id</code> and <code>Const a</code> functors used in canonical representations are already strict, as they&#8217;re defined via <code>newtype</code>.</p>

<p>With these new tools, we can decompose isomorphically.
For instance,</p>

<pre><code>(a,b,c) ≅ Lift a :*! Lift b :*! Lift c
</code></pre>

<p>with the isomorphism given by</p>

<pre><code>untriple' ∷ (a,b,c) -&gt; Lift a :*! Lift b :*! Lift c
untriple' (a,b,c) = Lift a :*! Lift b :*! Lift c

triple' ∷ Lift a :*! Lift b :*! Lift c -&gt; (a,b,c)
triple' (Lift a :*! Lift b :*! Lift c) = (a,b,c)
</code></pre>
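<p>Unlike <code>wrap</code>/<code>unwrap</code> below, this round trip really is the identity in both directions, because <code>Lift</code> records exactly where ⊥ may occur. A self-contained ASCII rendering of the definitions above, suitable for checking in GHCi:</p>

```haskell
{-# LANGUAGE TypeOperators #-}

infixl 7 :*!

-- Add a bottom to a type:
data Lift a = Lift { unLift :: a }

-- Strict pair:
data a :*! b = !a :*! !b

untriple' :: (a, b, c) -> Lift a :*! Lift b :*! Lift c
untriple' (a, b, c) = Lift a :*! Lift b :*! Lift c

triple' :: Lift a :*! Lift b :*! Lift c -> (a, b, c)
triple' (Lift a :*! Lift b :*! Lift c) = (a, b, c)
```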

<p>For regular types, we&#8217;ll also want variations as functor combinators:</p>

<pre><code>-- | Strict product functor
data (f :*:! g) a = !(f a) :*:! !(g a) deriving Functor

-- | Strict sum functor
data (f :+:! g) a = InL' !(f a) | InR' !(g a) deriving Functor
</code></pre>

<p>Then change the <code>Regular</code> instance on lists to the following:</p>

<pre><code>instance Regular [a] where
  type PF [a] = Const () :+:! Const (Lift a) :*:! Lift

  unwrap []     = InL' (Const ())
  unwrap (a:as) = InR' (Const (Lift a) :*:! Lift as)

  wrap (InL' (Const ()))                    = []
  wrap (InR' (Const (Lift a) :*:! Lift as)) = a:as
</code></pre>

<p>I suppose it would be fairly straightforward to derive such instances for algebraic data types automatically via Template Haskell.</p>

<h3>Tries for non-strict memoization</h3>

<p>As in <em><a href="http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/" title="blog post">part 1</a></em>, represent a non-strict memo trie for a function <code>f ∷ k -&gt; v</code> as a value for <code>f ⊥</code> and a <em>strict</em> (but not hyper-strict) memo trie for <code>f</code>:</p>

<pre><code>type k :→: v = Trie v (k :→ v)
</code></pre>

<p>For non-strict sum domains, the strict memo trie was a pair of non-strict tries:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
  type STrie (Either a b) = Trie a :*: Trie b
  sTrie   f           = trie (f ∘ Left) :*: trie (f ∘ Right)
  sUntrie (ta :*: tb) = untrie ta `either` untrie tb
</code></pre>

<p>For non-strict product, the strict trie was a composition of non-strict tries:</p>

<pre><code>instance (HasTrie a, HasTrie b) =&gt; HasTrie (a , b) where
  type STrie (a , b) = Trie a :. Trie b
  sTrie   f = O (fmap trie (trie (curry f)))
  sUntrie (O tt) = uncurry (untrie (fmap untrie tt))
</code></pre>

<p>What about <em>strict</em> sum and product domains?
Since strict sums &amp; products cannot contain ⊥ as their immediate components, we can omit the values corresponding to ⊥ for those components.
That is, we can use pairs and compositions of <em>strict</em> tries instead.</p>

<pre><code>instance (HasTrie a, HasTrie b) =&gt; HasTrie (a :+! b) where
  type STrie (a :+! b) = STrie a :*: STrie b
  sTrie   f           = sTrie (f . Left') :*: sTrie (f . Right')
  sUntrie (ta :*: tb) = sUntrie ta `either'` sUntrie tb

instance (HasTrie a, HasTrie b) =&gt; HasTrie (a :*! b) where
  type STrie (a :*! b) = STrie a :. STrie b
  sTrie   f      = O (fmap sTrie (sTrie (curry' f)))
  sUntrie (O tt) = uncurry' (sUntrie (fmap sUntrie tt))
</code></pre>

<p>I&#8217;ve also substituted versions of <code>curry</code> and <code>uncurry</code> for strict products and <code>either</code> for strict sums:</p>

<pre><code>curry' ∷ (a :*! b -&gt; c) -&gt; (a -&gt; b -&gt; c)
curry' f a b = f (a :*! b)

uncurry' ∷ (a -&gt; b -&gt; c) -&gt; ((a :*! b) -&gt; c)
uncurry' f (a :*! b) = f a b

either' ∷ (a -&gt; c) -&gt; (b -&gt; c) -&gt; (a :+! b -&gt; c)
either' f _ (Left'  a) = f a
either' _ g (Right' b) = g b
</code></pre>

<p>We&#8217;ll also need to handle the lifting functor.
The type <code>Lift a</code> has an additional bottom.
A strict function or trie over <code>Lift a</code> is only strict in the lower (outer) one.
So a strict trie over <code>Lift a</code> is simply a non-strict trie over <code>a</code>.</p>

<pre><code>instance HasTrie a =&gt; HasTrie (Lift a) where
  type STrie (Lift a) = Trie a
  sTrie   f = trie (f . Lift)
  sUntrie t = untrie t . unLift
</code></pre>

<p>Notice that this instance puts back exactly what was lost from memo tries when going from non-strict products and sums to strict products and sums.
The reason for this relationship is explained by the following simple isomorphisms:</p>

<pre><code>(a,b)      ≅ Lift a :*! Lift b
Either a b ≅ Lift a :+! Lift b
</code></pre>

<p>These isomorphisms can then be used to implement memoization over non-strict products and sums via memoization over strict products and sums.</p>
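<p>For example, the sum isomorphism can be witnessed as follows (a sketch; the function names are hypothetical, not from this post):</p>

```haskell
{-# LANGUAGE TypeOperators #-}

-- Add a bottom to a type:
data Lift a = Lift { unLift :: a }

-- Strict sum:
data a :+! b = Left' !a | Right' !b

-- Witnesses for  Either a b ≅ Lift a :+! Lift b.
-- Note that e.g. Left undefined maps to Left' (Lift undefined),
-- which is defined, so no information is lost in either direction.
toStrictSum :: Either a b -> Lift a :+! Lift b
toStrictSum = either (Left' . Lift) (Right' . Lift)

fromStrictSum :: Lift a :+! Lift b -> Either a b
fromStrictSum (Left'  (Lift a)) = Left  a
fromStrictSum (Right' (Lift b)) = Right b
```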

<h2>Higher-order memoization</h2>

<p>The post <em><a href="http://conal.net/blog/posts/memoizing-higher-order-functions/" title="blog post">Memoizing higher-order functions</a></em> suggested a simple way to memoize functions over function-valued domains by using (as always) type isomorphisms.
The isomorphism used is between functions and memo tries.</p>

<p>I gave one example in that post</p>

<pre><code>ft1 ∷ (Bool → a) → [a]
ft1 f = [f False, f True]
</code></pre>

<p>In retrospect, this example was a lousy choice, as it hides an important problem.
The <code>Bool</code> type is <em>finite</em>, and so the corresponding trie type has only finitely large elements.
For that reason, higher-order memoization can get away with the usual hyper-strict memoization.</p>

<p>If instead we try memoizing a function of type <code>(a → b) → c</code>, where the type <code>a</code> has infinitely many elements (e.g., <code>Integer</code> or <code>[Bool]</code>), then we&#8217;ll have to memoize over the domain <code>a :→: b</code> (memo tries from <code>a</code> to <code>b</code>), which includes infinite elements.
In that case, hyper-strict memoization blows up, so we&#8217;ll want to use non-strict memoization instead.</p>

<p>As mentioned above, the type of non-strict tries contains a value and a strict trie:</p>

<pre><code>type k :→: v = Trie v (k :→ v)
</code></pre>

<p>I thought I&#8217;d memoize by mapping to &amp; from the isomorphic pair type <code>(v, k :→ v)</code>.
However, now I&#8217;m not satisfied with this mapping.
A non-strict trie from <code>k</code> to <code>v</code> is not just <em>any</em> such pair of <code>v</code> and <code>k :→ v</code>.
Monotonicity requires that the single <code>v</code> value (for ⊥) be a lower bound (information-wise) of every <code>v</code> in the trie.
Ignoring this constraint would lead to a trie in which most of the entries do not correspond to any non-strict memo trie.</p>

<p><em>Puzzle:</em> Can this constraint be captured as a <em>static</em> type in modern Haskell&#8217;s (GHC&#8217;s) type system (i.e., without resorting to general dependent typing)?
I don&#8217;t know the answer.</p>

<h2>Memoizing abstract types</h2>

<p>This problem is more widespread still.
Whenever there are constraints on a representation beyond what is expressed directly and statically in the representation type, we will have this same sort of isomorphism puzzle.
Can we capture the constraint as a Haskell type?
When we cannot, what do we do?</p>

<p>If we didn&#8217;t care about efficiency, I think we could ignore the issue, and everything else in this blog post, and accept making memo tries that are much larger than necessary.
Although laziness will keep from filling in range values for unaccessed domain values, I worry that there will be quite a lot time and space wasted navigating past large portions of unusable trie structure.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=198&amp;md5=c2a469f4293d7f7d52b9fe50951c2f43"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2xhttp://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Ffixing-broken-isomorphisms-details-for-non-strict-memoization-part-2&amp;language=en_GB&amp;category=text&amp;title=Fixing+broken+isomorphisms+%26%238212%3B+details+for+non-strict+memoization%2C+part+2&amp;description=The+post+Details+for+non-strict+memoization%2C+part+1+works+out+a+systematic+way+of+doing+non-strict+memoization%2C+i.e.%2C+correct+memoization+of+non-strict+%28and+more+broadly%2C+non-hyper-strict%29+functions.+As+I+mentioned...&amp;tags=functor%2Cmemoization%2Ctrie%2Cunamb%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Details for non-strict memoization, part 1</title>
		<link>http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1</link>
		<comments>http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1#comments</comments>
		<pubDate>Tue, 27 Jul 2010 00:14:20 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[unamb]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=131</guid>
		<description><![CDATA[In Non-strict memoization, I sketched out a means of memoizing non-strict functions. I gave the essential insight but did not show the details of how a nonstrict memoization library comes together. In this new post, I give details, which are a bit delicate, in terms of the implementation described in Elegant memoization with higher-order types. [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Details for non-strict memoization, part 1

Tags: memoization, functor, trie, unamb

URL: http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/

-->

<!-- references -->

<!-- teaser -->

<p>In <em><a href="http://conal.net/blog/posts/nonstrict-memoization/" title="blog post">Non-strict memoization</a></em>, I sketched out a means of memoizing non-strict functions.
I gave the essential insight but did not show the details of how a nonstrict memoization library comes together.
In this new post, I give details, which are a bit delicate, in terms of the implementation described in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em>.</p>

<p>Near the end, I run into some trouble with regular data types, which I don&#8217;t know how to resolve cleanly and efficiently.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2010-09-10: Fixed minor typos.</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-131"></span></p>

<h3>Hyper-strict memo tries</h3>

<p>Strict memoization (really <em>hyper-strict</em>) is centered on a family of trie functors, defined as a functor <code>Trie k</code>, associated with a type <code>k</code>.</p>

<pre><code>type k :→: v = Trie k v

class HasTrie k where
    type Trie k :: * → *
    trie   :: (k  →  v) → (k :→: v)
    untrie :: (k :→: v) → (k  →  v)
</code></pre>
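<p>To see how the class is used, here is a minimal self-contained sketch with a <code>Bool</code> instance, using a data family in place of the associated type synonym so it compiles on its own (the real library is far more general):</p>

```haskell
{-# LANGUAGE TypeFamilies #-}

class HasTrie k where
  data Trie' k v
  trie   :: (k -> v) -> Trie' k v
  untrie :: Trie' k v -> (k -> v)

-- A Bool-keyed trie is just a pair of range values:
instance HasTrie Bool where
  data Trie' Bool v = BoolTrie v v
  trie f = BoolTrie (f False) (f True)
  untrie (BoolTrie fa tr) b = if b then tr else fa

-- Memoization is then the round trip through the trie
-- (as in the MemoTrie library's memo function):
memo :: HasTrie k => (k -> v) -> (k -> v)
memo = untrie . trie
```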

<p>The simplest instance is for the unit type:</p>

<pre><code>instance HasTrie () where
  type Trie ()  = Id
  trie   f      = Id (f ())
  untrie (Id v) = λ () → v
</code></pre>

<p>For consistency with other types, I just made a small change from the previous version, which used <code>const v</code> instead of the stricter <code>λ () → v</code>.</p>

<p>Sums and products are a little more intricate:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
  type Trie (Either a b) = Trie a :*: Trie b
  trie   f           = trie (f ∘ Left) :*: trie (f ∘ Right)
  untrie (ta :*: tb) = untrie ta `either` untrie tb

instance (HasTrie a, HasTrie b) ⇒ HasTrie (a , b) where
  type Trie (a , b) = Trie a :. Trie b
  trie   f = O (trie (trie ∘ curry f))
  untrie (O tt) = uncurry (untrie ∘ untrie tt)
</code></pre>

<p>These trie types are not just strict, they&#8217;re <em>hyper-strict</em>.
During trie search, arguments get thoroughly evaluated.
(See Section 9 in the paper <em><a href="http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/" title="blog post">Denotational design with type class morphisms</a></em>.)
In other words, all of the points of possible undefinedness are lost.</p>

<h3>Strict and non-strict memo tries</h3>

<p>The formulation of strict tries will look very much like the hyper-strict tries we&#8217;ve already seen, with new names for the associated trie type and the conversion methods:</p>

<pre><code>type k :→ v = STrie k v

class HasTrie k where
    type STrie k :: * → *
    sTrie   ::             (k  → v) → (k :→ v)
    sUntrie :: HasLub v ⇒ (k :→ v) → (k  → v)
</code></pre>

<p>Besides renaming, I&#8217;ve also added a <code>HasLub</code> constraint for <code>sUntrie</code>, which we&#8217;ll need later.</p>

<p>For instance, the (almost) simplest strict trie is the one for the unit type, defined exactly as before (with new names):</p>

<pre><code>instance HasTrie () where
  type STrie ()  = Id
  sTrie   f      = Id (f ())
  sUntrie (Id v) = λ () → v
</code></pre>

<p>For <em>non-strict</em> memoization, we&#8217;ll want to recover all of the points of possible undefinedness lost in hyper-strict memoization.
At every level of a structured value, there is the possibility of ⊥ or of a non-⊥ value.
Correspondingly, a non-strict trie consists of the value corresponding to the argument ⊥, together with a strict (but <em>not</em> hyper-strict) trie for the non-⊥ values:</p>

<pre><code>data Trie k v = Trie v (k :→ v)

type k :→: v = Trie k v
</code></pre>

<p>The conversions between functions and non-strict tries are no longer methods, as they can be defined uniformly for all domain types.
To form a non-strict trie, capture the function&#8217;s value at ⊥, and build a strict (but not hyper-strict) trie:</p>

<pre><code>trie   :: (HasTrie k          ) ⇒ (k  →  v) → (k :→: v)
trie f = Trie (f ⊥) (sTrie f)
</code></pre>

<p>To convert back from a non-strict trie to a (now memoized) function, combine the information from two sources: the original function&#8217;s value at ⊥, and the function resulting from the strict (but not hyper-strict) trie:</p>

<pre><code>untrie :: (HasTrie k, HasLub v) ⇒ (k :→: v) → (k  →  v)
untrie (Trie b t) = const b ⊔ sUntrie t
</code></pre>

<p>The least-upper-bound (⊔) here is well-defined because its arguments are information-compatible (consistent, non-contradictory).
More strongly, <code>const b ⊑ sUntrie t</code>, i.e., the first argument is an information approximation to (contains no information absent from) the second argument.
Now we see the need for <code>HasLub v</code> in the type of <code>sUntrie</code> above: functions are ⊔-able exactly when their result types are.</p>
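<p>This pointwise principle can be sketched with a toy class (the real <code>HasLub</code> lives in the <code>lub</code> package; the flat domain below, with an explicit bottom constructor, is purely illustrative):</p>

```haskell
class HasLub a where
  (⊔) :: a -> a -> a

-- Functions take lubs pointwise, so they are ⊔-able exactly when
-- their result types are:
instance HasLub v => HasLub (k -> v) where
  f ⊔ g = \k -> f k ⊔ g k

-- A toy flat domain with an explicit bottom element:
data Flat = Bot | Val Int deriving (Eq, Show)

instance HasLub Flat where
  Bot ⊔ y = y
  x   ⊔ _ = x
```

<p>Here <code>const Bot ⊔ f ≡ f</code>, mirroring <code>const b ⊔ sUntrie t</code> in the case <code>b ≡ ⊥</code>.</p>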

<h3>Sums</h3>

<p>Just as non-strict tries contain strict tries, so also strict tries contain non-strict tries.
For instance, consider a sum type, <code>Either a b</code>.
An element is either ⊥ or <code>Left x</code> or <code>Right y</code>, for <code>x :: a</code> and <code>y :: b</code>.
The types <code>a</code> and <code>b</code> also contain a bottom element, so we&#8217;ll need non-strict memo tries for them:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
  type STrie (Either a b) = Trie a :*: Trie b
  sTrie   f           = trie (f ∘ Left) :*: trie (f ∘ Right)
  sUntrie (ta :*: tb) = untrie ta `either` untrie tb
</code></pre>

<p>Just as in the unit instance (above), the only visible change from hyper-strict to strict is that the left-hand sides use the strict trie type and operations.
The right-hand sides are written exactly as before, though now they refer to non-strict tries and their operations.</p>

<h3>Products</h3>

<p>With product, we run into some trouble.
As a first attempt, change only the names on the left-hand side:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (a , b) where
  type STrie (a , b) = Trie a :. Trie b
  sTrie   f      = O (trie (trie ∘ curry f))
  sUntrie (O tt) = uncurry (untrie ∘ untrie tt)
</code></pre>

<p>This <code>sUntrie</code> definition, however, leads to an error in type-checking:</p>

<pre><code>Could not deduce (HasLub (Trie b v)) from the context (HasLub v)
  arising from a use of `untrie'
</code></pre>

<p>The troublesome <code>untrie</code> use is the one applied directly to <code>tt</code>.
(Thank you for column numbers, GHC.)</p>

<p>So what&#8217;s going on here?
Since <code>sUntrie</code> in this definition takes an argument of type <code>(a,b) :→ v</code>, or equivalently, <code>STrie (a,b) v</code>,</p>

<pre><code>O tt :: (a,b) :→ v
     :: STrie (a,b) v
     :: (Trie a :. Trie b) v
</code></pre>

<p>The definition of type composition (from <a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">an earlier post</a>) is</p>

<pre><code>newtype (g :. f) x = O (g (f x))
</code></pre>

<p>So</p>

<pre><code>tt :: Trie a (Trie b v)
   :: a :→: b :→: v
</code></pre>

<p>and</p>

<pre><code>untrie tt :: HasLub (b :→: v) ⇒ a → (b :→: v)
</code></pre>

<p>The <code>HasLub</code> constraint comes from the type of <code>untrie</code> (above).</p>

<p>Continuing,</p>

<pre><code>untrie ∘ untrie tt ::
  (HasLub v, HasLub (b :→: v)) ⇒ a → (b → v)

uncurry (untrie ∘ untrie tt) ::
  (HasLub v, HasLub (b :→: v)) ⇒ (a , b) → v
</code></pre>

<p>which is <em>almost</em> the required type but contains the extra requirement that <code>HasLub (b :→: v)</code>.</p>

<p>Hm.</p>

<p>Looking at the definition of <code>Trie</code> and the definitions of <code>STrie</code> for various domain types <code>b</code>, I think it&#8217;s the case that <code>HasLub (b :→: v)</code> holds whenever <code>HasLub v</code> does, exactly as needed.
In principle, I could make this requirement of <code>b</code> explicit as a superclass for <code>HasTrie</code>:</p>

<pre><code>class (forall v. HasLub v ⇒ HasLub (b :→: v)) ⇒ HasTrie k where ...
</code></pre>

<p>However, Haskell&#8217;s type system isn&#8217;t quite expressive enough, even with GHC extensions (as far as I know).</p>

<h4>A possible solution</h4>

<p>We could instead define a functor-level variant of <code>HasLub</code>:</p>

<pre><code>class HasLubF f where
  lubF :: HasLub v ⇒ f v → f v → f v
</code></pre>

<p>and then use <code>lubF</code> instead of <code>(⊔)</code> in <code>sUntrie</code>.
The revised <code>HasTrie</code> class definition:</p>

<pre><code>class HasLubF (Trie k) ⇒ HasTrie k where
    type STrie k :: * → *
    sTrie   ::             (k  → v) → (k :→ v)
    sUntrie :: HasLub v ⇒ (k :→ v) → (k  → v)
</code></pre>

<p>I would rather not replicate and modify the <code>HasLub</code> class and all of its instances, so I&#8217;m going to set this idea aside and look for another.</p>

<h4>Another route</h4>

<p>Let&#8217;s return to the problematic definition of <code>sUntrie</code> for pairs:</p>

<pre><code>sUntrie (O tt) = uncurry (untrie ∘ untrie tt)
</code></pre>

<p>and recall that <code>tt :: a :→: b :→: v</code>.
The strategy here was to first convert the outer trie (with domain <code>a</code>) and then the inner trie (with domain <code>b</code>).</p>

<p>Alternatively, we might reverse the order.</p>

<p>If we&#8217;re going to convert inside-out instead of outside-in, then we&#8217;ll need a way to transform each of the <em>range</em> elements of a trie.
Which is exactly what <code>fmap</code> is for.
If only we had a functor instance for <code>Trie a</code>, then we could re-define <code>sUntrie</code> on pair tries as follows:</p>

<pre><code>sUntrie (O tt) = uncurry (untrie (fmap untrie tt))
</code></pre>

<p>As a sanity check, try compiling this definition.
Sure enough, it&#8217;s okay except for a missing <code>Functor</code> instance:</p>

<pre><code>Could not deduce (Functor (Trie a))
  from the context (HasTrie (a, b), HasTrie a, HasTrie b)
  arising from a use of `fmap'
</code></pre>

<p>Fixed easily enough:</p>

<pre><code>instance Functor (STrie k) ⇒ Functor (Trie k) where
  fmap f (Trie b t) = Trie (f b) (fmap f t)
</code></pre>

<p>Or even, using the GHC language extensions <code>DeriveFunctor</code> and <code>StandaloneDeriving</code>, just</p>

<pre><code>deriving instance Functor (STrie k) ⇒ Functor (Trie k)
</code></pre>

<p>Now we get a slightly different error message.
We&#8217;re now missing a Functor instance for <code>STrie a</code> instead of <code>Trie a</code>:</p>

<pre><code>Could not deduce (Functor (STrie a))
  from the context (HasTrie (a, b), HasTrie a, HasTrie b)
  arising from a use of `fmap'
</code></pre>

<p>By the way, we can also construct tries inside-out, if we want:</p>

<pre><code>sTrie f = O (fmap trie (trie (curry f)))
</code></pre>

<p>So we&#8217;ll be in good shape <em>if</em> we can satisfy the <code>Functor</code> requirement on strict tries.
Fortunately, all of the strict trie (higher-order) types appearing are indeed functors, since we built them up using functor combinators.</p>

<p>Still, we&#8217;ll have to help the type-checker <em>prove</em> that all of the trie types involved are indeed functors.
Again, a superclass constraint can capture this requirement:</p>

<pre><code>class Functor (STrie k) ⇒ HasTrie k where ...
</code></pre>

<p>Unlike <code>HasLub</code>, this time the required constraint is already at the functor level, so we don&#8217;t have to define a new class.
We don&#8217;t even have to define any new instances, as our functor combinators come with <code>Functor</code> instances, all of which can be derived automatically by GHC.</p>

<p>With this one change, all of the <code>HasTrie</code> instances go through!</p>

<h3>Isomorphisms</h3>

<p>As pointed out in <em><a href="http://conal.net/blog/posts/memoizing-higher-order-functions/" title="blog post">Memoizing higher-order functions</a></em>, type isomorphism is the central, repeated theme of functional memoization.
In addition to the isomorphism between functions and tries, the tries for many types are given via isomorphism with other types that have tries.
In this way, we only have to define tries for our tiny set of functor combinators.</p>

<p>Isomorphism support is as in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em>, just using the new names:</p>

<pre><code>#define HasTrieIsomorph(Context,Type,IsoType,toIso,fromIso) \
instance Context ⇒ HasTrie (Type) where { \
  type STrie (Type) = STrie (IsoType); \
  sTrie f = sTrie (f ∘ (fromIso)); \
  sUntrie t = sUntrie t ∘ (toIso); \
}
</code></pre>

<p>Note the use of strict tries even on the right-hand sides.</p>

<p><em>Aside:</em> as mentioned in <em><a href="http://conal.net/blog/posts/composing-memo-tries/" title="blog post">Composing memo tries</a></em>, <code>trie</code>/<code>untrie</code> forms not just an isomorphism but a pair of <a href="http://conal.net/blog/tag/type-class-morphism/" title="Posts on type class morphisms">type class morphism</a>s (TCMs).
(For motivation and examples of TCMs in software design, see <em><a href="http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/" title="blog post">Denotational design with type class morphisms</a></em>.)</p>

<h3>Regular data types</h3>

<p><em>Regular data types</em> are isomorphic to fixed-points of functors.
<em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em> gives a brief introduction to these notions and pointers to more information.
That post also shows how to use the <code>Regular</code> type class and its instances (defined for other purposes as well) to provide hyper-strict memo tries for all regular data types.</p>

<p>Switching from hyper-strict to non-strict raises an awkward issue.
The functor isomorphisms we used are only correct for fully defined data types.
When we allow full or partial undefinedness, as in a lazy language like Haskell, our isomorphisms break down.</p>

<p>Following <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>, here is the class I used, where &#8220;<code>PF</code>&#8221; stands for &#8220;pattern functor&#8221;:</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t :: * → *
  unwrap :: t → PF t t
  wrap   :: PF t t → t
</code></pre>

<p>The <code>unwrap</code> method peels off a single layer from a regular type.
For example, the top level of a list is either a unit (nil) or a pair (cons) of an element and a hole in which a list can be placed.</p>

<pre><code>instance Regular [a] where
  type PF [a] = Unit :+: Const a :*: Id   -- note Unit == Const ()

  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)

  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as
</code></pre>

<p>The catch here is that the <code>unwrap</code> and <code>wrap</code> methods do not really form an isomorphism.
Instead, they satisfy a weaker connection: they form an embedding/projection pair.
That is,</p>

<pre><code>wrap ∘ unwrap ≡ id
unwrap ∘ wrap ⊑ id
</code></pre>

<p>To see the mismatch between <code>[a]</code> and <code>PF [a] [a]</code>, note that the latter has opportunities for partial undefinedness that have no corresponding opportunities in <code>[a]</code>.
Specifically, ⊥ could occur at type <code>Const () [a]</code>, <code>()</code>,  <code>(Const a :*: Id) [a]</code>, <code>Const a [a]</code>, or <code>Id [a]</code>.
Any of these ⊥ values will result in <code>wrap</code> returning ⊥ altogether.
For instance, if</p>

<pre><code>oops :: PF [Integer] [Integer]
oops = InR (⊥ :*: Id [3,5])
</code></pre>

<p>then</p>

<pre><code>unwrap (wrap oops) ≡ unwrap ⊥ ≡ ⊥ ⊑ oops
</code></pre>

<p>By examining various cases, we can prove that <code>unwrap (wrap p) ⊑ p</code> for all <code>p</code>, which is to say <code>unwrap ∘ wrap ⊑ id</code>, since
information ordering on functions is defined point-wise.
(See <em><a href="http://conal.net/blog/posts/merging-partial-values/" title="blog post">Merging partial values</a></em>.)</p>
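<p>The inequality can also be observed operationally; here is a self-contained sketch in ASCII, using <code>undefined</code> for ⊥ and an exception handler to detect it (the combinator and type names are local renderings of those used above):</p>

```haskell
{-# LANGUAGE TypeOperators #-}

import Control.Exception (SomeException, evaluate, try)

-- Functor combinators, as in the earlier posts:
newtype Id a      = Id a
newtype Const b a = Const b
data (f :*: g) a  = f a :*: g a
data (f :+: g) a  = InL (f a) | InR (g a)

-- Pattern functor for lists:
type PFList a = Const () :+: (Const a :*: Id)

wrap :: PFList a [a] -> [a]
wrap (InL (Const ()))          = []
wrap (InR (Const a :*: Id as)) = a : as

-- The troublesome value from above, with undefined at Const a:
oops :: PFList Integer [Integer]
oops = InR (undefined :*: Id [3, 5])

-- wrap forces the undefined Const component, so wrap oops is bottom,
-- and hence unwrap (wrap oops) is strictly below oops:
wrapOopsIsBottom :: IO Bool
wrapOopsIsBottom = do
  r <- try (evaluate (wrap oops)) :: IO (Either SomeException [Integer])
  return (either (const True) (const False) r)
```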

<p>Examining the definition of <code>unwrap</code> above shows that it does not give rise to the troublesome ⊥ points, and so a trivial equational proof shows that <code>wrap ∘ unwrap ≡ id</code>.</p>

<p>In the context of memoization, the additional undefined values are problematic.
Consider the case of lists.
The specification macro</p>

<pre><code>HasTrieRegular1([], ListSTrie)
</code></pre>

<p>expands into a <code>newtype</code> and its <code>HasTrie</code> instance.
Changing only the associated type and method names in the <a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">version for hyper-strict memoization</a>:</p>

<pre><code>newtype ListSTrie a v = ListSTrie (PF [a] [a] :→: v)

instance HasTrie a ⇒ HasTrie [a] where
  type STrie [a] = ListSTrie a
  sTrie f = ListSTrie (sTrie (f . wrap))
  sUntrie (ListSTrie t) = sUntrie t . unwrap
</code></pre>

<p>Note that the trie in a <code>ListSTrie</code> contains entries for many ⊥ sub-elements that do not correspond to any list values.
The memoized function is <code>f ∘ wrap</code>, which will have many fewer ⊥ possibilities than the trie structure supports.
At each of the superfluous ⊥ points, the function sampled is strict, so the <code>Trie</code> (rather than <code>STrie</code>) will contain a predictable ⊥.
Considering the definition of <code>untrie</code>:</p>

<pre><code>untrie (Trie b t) = const b ⊔ sUntrie t
</code></pre>

<p>we know <code>b ≡ ⊥</code>, and so <code>const b ⊔ sUntrie t ≡ sUntrie t</code>.
Thus, at these points, the ⊥ value is never helpful, and we could use a strict (though not hyper-strict) trie instead of a non-strict trie.</p>

<p>Perhaps we could safely ignore this whole issue and lose only some efficiency, rather than correctness.
Still, I&#8217;d rather build and traverse just the right trie for our regular types.</p>

<p>As this post is already longer than I intended, and my attention is wandering, I&#8217;ll publish it here and pick up later.
Comments &amp; suggestions please!</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=131&amp;md5=21d7a6c4a22de6280e624f13d3efce1f"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2xhttp://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fdetails-for-nonstrict-memoization-part-1&amp;language=en_GB&amp;category=text&amp;title=Details+for+non-strict+memoization%2C+part+1&amp;description=In+Non-strict+memoization%2C+I+sketched+out+a+means+of+memoizing+non-strict+functions.+I+gave+the+essential+insight+but+did+not+show+the+details+of+how+a+nonstrict+memoization+library+comes...&amp;tags=functor%2Cmemoization%2Ctrie%2Cunamb%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Memoizing higher-order functions</title>
		<link>http://conal.net/blog/posts/memoizing-higher-order-functions</link>
		<comments>http://conal.net/blog/posts/memoizing-higher-order-functions#comments</comments>
		<pubDate>Wed, 21 Jul 2010 15:41:17 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[isomorphism]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=119</guid>
		<description><![CDATA[Memoization incrementally converts functions into data structures. It pays off when a function is repeatedly applied to the same arguments and applying the function is more expensive than accessing the corresponding data structure. In lazy functional memoization, the conversion from function to data structure happens all at once from a denotational perspective, and incrementally from [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p>Memoization incrementally converts functions into data structures. It pays off when a function is repeatedly applied to the same arguments and applying the function is more expensive than accessing the corresponding data structure.</p>

<p>In <em>lazy functional</em> memoization, the conversion from function to data structure happens all at once from a denotational perspective, and incrementally from an operational perspective. See <em><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">Elegant memoization with functional memo tries</a></em> and <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em>.</p>

<p>As Ralf Hinze presented in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.4069" title="Paper: &quot;Generalizing Generalized Tries&quot; by Ralf Hinze">Generalizing Generalized Tries</a></em>, trie-based memoization follows from three simple isomorphisms involving function types:</p>

<div class=math-inset>
<p
><span class="math"
  >1 → <em
    >a</em
    > ≅ <em
    >a</em
    ></span
  ></p
><p
><span class="math"
  >(<em
    >a</em
    > + <em
    >b</em
    >) → <em
    >c</em
    > ≅ (<em
    >a</em
    > → <em
    >c</em
    >) × (<em
    >b</em
    > → <em
    >c</em
    >)</span
  ></p
><p
><span class="math"
  >(<em
    >a</em
    > × <em
    >b</em
    >) → <em
    >c</em
    > ≅ <em
    >a</em
    > → (<em
    >b</em
    > → <em
    >c</em
    >)</span
  ></p
></div>

<p
>which correspond to the familiar laws of exponents</p
>

<div class=math-inset>
<p
><span class="math"
  ><em
    >a</em
    ><sup
    >1</sup
    > = <em
    >a</em
    ></span
  ></p
><p
><span class="math"
  ><em
    >c</em
    ><sup
    ><em
      >a</em
      > + <em
      >b</em
      ></sup
    > = <em
    >c</em
    ><sup
    ><em
      >a</em
      ></sup
    > × <em
    >c</em
    ><sup
    ><em
      >b</em
      ></sup
    ></span
  ></p
><p
><span class="math"
  ><em
    >c</em
    ><sup
    ><em
      >a</em
      > × <em
      >b</em
      ></sup
    > = (<em
    >c</em
    ><sup
    ><em
      >b</em
      ></sup
    >)<sup
    ><em
      >a</em
      ></sup
    ></span
  ></p
></div>

<p
>When applied as a transformation from left to right, each law simplifies the domain part of a function type. Repeated application of the rules then eliminates all function types or reduces them to functions of atomic types. These atomic domains are eliminated as well by additional mappings, such as between a natural number and a list of bits (as in <a href="http://hackage.haskell.org/packages/archive/containers/0.2.0.1/doc/html/Data-IntMap.html" title="hackage package"
  >patricia trees</a
  >). Algebraic data types correspond to sums of products and so are eliminated by the sum and product rules. <em
  >Recursive</em
  > algebraic data types (lists, trees, etc) give rise to correspondingly recursive trie types.</p
>
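
<p>Read at the value level, each of these three isomorphisms is witnessed by a pair of mutually inverse functions. The following is a small sketch (the names here are invented for illustration, not taken from any of the libraries involved):</p>

<pre class="sourceCode haskell"><code>-- Function-level witnesses of the three isomorphisms.
-- (Illustrative sketch; names are invented for this example.)
toUnit :: (() -&gt; a) -&gt; a
toUnit f = f ()

fromUnit :: a -&gt; (() -&gt; a)
fromUnit = const

toSum :: (Either a b -&gt; c) -&gt; (a -&gt; c, b -&gt; c)
toSum f = (f . Left, f . Right)

fromSum :: (a -&gt; c, b -&gt; c) -&gt; (Either a b -&gt; c)
fromSum (g, h) = either g h

toProd :: ((a, b) -&gt; c) -&gt; (a -&gt; (b -&gt; c))
toProd = curry

fromProd :: (a -&gt; (b -&gt; c)) -&gt; ((a, b) -&gt; c)
fromProd = uncurry
</code></pre>

<p>The sum and product witnesses are exactly the shapes the trie instances below follow: a trie over a sum is a pair of tries, and a trie over a product is a trie of tries.</p>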

<p
>So, with a few simple and familiar rules, we can memoize functions over an infinite variety of common types. Have we missed any?</p
>

<p
>Yes. <em
  >What about functions over functions?</em
  ></p
>

<p
><strong
  >Edits</strong
  >:</p
>

<ul
><li
  >2010-07-22: Made the memoization example polymorphic and switched from pairs to lists. The old example accidentally coincided with a specialized version of <code
    >trie</code
    > itself.</li
  ><li
  >2011-02-27: Updated some notation</li
  ></ul
>

<p><span id="more-119"></span></p>

<div id="tries"
><h3
  >Tries</h3
  ><p
  >In <em
    ><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post"
      >Elegant memoization with higher-order types</a
      ></em
    >, I showed a formulation of functional memoization using functor combinators.</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >type</span
      > k &#8603; v <span class="fu"
      >=</span
      > <span class="dt"
      >Trie</span
      > k v<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >class</span
      > <span class="dt"
      >HasTrie</span
      > k <span class="kw"
      >where</span
      ><br
       />  <span class="kw"
      >type</span
      > <span class="dt"
      >Trie</span
      > k <span class="dv"
      >&#8759;</span
      > <span class="fu"
      >*</span
      > &#8594; <span class="fu"
      >*</span
      ><br
       />  trie   <span class="dv"
      >&#8759;</span
      > (k &#8594; v) &#8594; (k &#8603; v)<br
       />  untrie <span class="dv"
      >&#8759;</span
      > (k &#8603; v) &#8594; (k &#8594; v)<br
       /></code
    ></pre
  ><p
  >I will describe higher-order memoization in terms of this formulation. I imagine it would also work out, though less elegantly, in the associated data types formulation described in <em
    ><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post"
      >Elegant memoization with functional memo tries</a
      ></em
    >.</p
  ></div
>
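
<p>As in those earlier posts, memoization itself is just the round trip through the trie. A one-line sketch, here written against the released <a href="http://haskell.org/haskellwiki/MemoTrie" title="Haskell wiki page for the MemoTrie library">MemoTrie</a> library's <code>Data.MemoTrie</code> interface (and renamed, to avoid clashing with the <code>memo</code> that library already exports):</p>

<pre class="sourceCode haskell"><code>import Data.MemoTrie (HasTrie, trie, untrie)

-- Memoize by converting the function to a trie and immediately
-- back to a function. Denotationally the identity; operationally,
-- results are cached in the trie.
memo' :: HasTrie k =&gt; (k -&gt; v) -&gt; (k -&gt; v)
memo' = untrie . trie
</code></pre>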

<div id="domain-isomorphisms"
><h3
  >Domain isomorphisms</h3
  ><p
  ><em
    ><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post"
      >Elegant memoization with higher-order types</a
      ></em
    > showed how to define a <code
    >HasTrie</code
    > instance in terms of the instance of an isomorphic type, e.g., reducing tuples to nested pairs or booleans to a sum of unit types. A C macro, <code
    >HasTrieIsomorph</code
    >, encapsulates the domain isomorphism technique. For instance, to reduce triples to pairs:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="dt"
      >HasTrieIsomorph</span
      >( (<span class="dt"
      >HasTrie</span
      > a, <span class="dt"
      >HasTrie</span
      > b, <span class="dt"
      >HasTrie</span
      > c), (a,b,c), ((a,b),c)<br
       />               , &#955; (a,b,c) &#8594; ((a,b),c), &#955; ((a,b),c) &#8594; (a,b,c))<br
       /></code
    ></pre
  ><p
  >This isomorphism technique applies as well to the standard functor combinators used for constructing tries (and many other purposes). Those combinators again:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > <span class="dt"
      >Const</span
      > x a <span class="fu"
      >=</span
      > <span class="dt"
      >Const</span
      > x<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > <span class="dt"
      >Id</span
      > a <span class="fu"
      >=</span
      > <span class="dt"
      >Id</span
      > a<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > (f &#215; g) a <span class="fu"
      >=</span
      > f a &#215; g a<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > (f <span class="fu"
      >+</span
      > g) a <span class="fu"
      >=</span
      > <span class="dt"
      >InL</span
      > (f a) <span class="fu"
      >|</span
      > <span class="dt"
      >InR</span
      > (g a)<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >newtype</span
      > (g &#8728; f) a <span class="fu"
      >=</span
      > <span class="dt"
      >O</span
      > (g (f a))<br
       /></code
    ></pre
  ><p
  >and their trie definitions:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="dt"
      >HasTrieIsomorph</span
      >( <span class="dt"
      >HasTrie</span
      > a, <span class="dt"
      >Const</span
      > a x, a, getConst, <span class="dt"
      >Const</span
      > )<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="dt"
      >HasTrieIsomorph</span
      >( <span class="dt"
      >HasTrie</span
      > a, <span class="dt"
      >Id</span
      > a, a, unId, <span class="dt"
      >Id</span
      > )<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="dt"
      >HasTrieIsomorph</span
      >( (<span class="dt"
      >HasTrie</span
      > (f a), <span class="dt"
      >HasTrie</span
      > (g a))<br
       />               , (f &#215; g) a, (f a,g a)<br
       />               , &#955; (fa &#215; ga) &#8594; (fa,ga), &#955; (fa,ga) &#8594; (fa &#215; ga) )<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="dt"
      >HasTrieIsomorph</span
      >( (<span class="dt"
      >HasTrie</span
      > (f a), <span class="dt"
      >HasTrie</span
      > (g a))<br
       />               , (f <span class="fu"
      >+</span
      > g) a, <span class="dt"
      >Either</span
      > (f a) (g a)<br
       />               , eitherF <span class="kw"
      >Left</span
      > <span class="kw"
      >Right</span
      >, <span class="fu"
      >either</span
      > <span class="dt"
      >InL</span
      > <span class="dt"
      >InR</span
      > )<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    ><span class="dt"
      >HasTrieIsomorph</span
      >( <span class="dt"
      >HasTrie</span
      > (g (f a))<br
       />               , (g &#8728; f) a, g (f a) , unO, <span class="dt"
      >O</span
      > )<br
       /></code
    ></pre
  ><p
  >The <code
    >eitherF</code
    > function is a variation on <a href="http://haskell.org/ghc/docs/6.12.1/html/libraries/base-4.2.0.0/Prelude.html#v:either"
    ><code
      >either</code
      ></a
    >:</p
  ><pre class="sourceCode haskell"
  ><code
    >eitherF <span class="dv"
      >&#8759;</span
      > (f a &#8594; b) &#8594; (g a &#8594; b) &#8594; (f <span class="fu"
      >+</span
      > g) a &#8594; b<br
       />eitherF p _ (<span class="dt"
      >InL</span
      > fa) <span class="fu"
      >=</span
      > p fa<br
       />eitherF _ q (<span class="dt"
      >InR</span
      > ga) <span class="fu"
      >=</span
      > q ga<br
       /></code
    ></pre
  ></div
>
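
<p>To make the macro concrete, the first of these trie definitions, for <code>Const</code>, expands to roughly the following instance (a sketch of the expansion pattern, not code copied from the library):</p>

<pre class="sourceCode haskell"><code>-- Sketch of what HasTrieIsomorph(HasTrie a, Const a x, a, getConst, Const)
-- expands to: convert with fromIso on the way in, toIso on the way out.
instance HasTrie a =&gt; HasTrie (Const a x) where
  type Trie (Const a x) = Trie a
  trie f   = trie (f . Const)
  untrie t = untrie t . getConst
</code></pre>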

<div id="higher-order-memoization"
><h3
  >Higher-order memoization</h3
  ><p
  >Now higher-order memoization is easy. Apply yet another isomorphism, this time between functions and tries: The <code
    >trie</code
    > and <code
    >untrie</code
    > methods are exactly the mappings we need.</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="dt"
      >HasTrieIsomorph</span
      >( (<span class="dt"
      >HasTrie</span
      > a, <span class="dt"
      >HasTrie</span
      > (a &#8603; b))<br
       />               , a &#8594; b, a &#8603; b, trie, untrie)<br
       /></code
    ></pre
  ><p
  >So, to memoize a higher-order function <code
    >f &#8759; (a &#8594; b) &#8594; v</code
    >, we need only a trie type for <code
    >a</code
    > and one for <code
    >a &#8603; b</code
    >. The latter (tries for trie-valued domains) are provided by the isomorphisms above, together with additional ones.</p
  ></div
>
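
<p>Expanding this last macro call by the same pattern gives, roughly (again a sketch of the expansion, not library code):</p>

<pre class="sourceCode haskell"><code>-- toIso is trie, fromIso is untrie, so the conversions are the
-- class methods themselves.
instance (HasTrie a, HasTrie (a &#8603; b)) =&gt; HasTrie (a -&gt; b) where
  type Trie (a -&gt; b) = Trie (a &#8603; b)
  trie f   = trie (f . untrie)
  untrie t = untrie t . trie
</code></pre>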

<div id="demo"
><h3
  >Demo</h3
  ><p
  >Our sample higher-order function will take a function of booleans and yield its value at <code
    >False</code
    > and at <code
    >True</code
    >:</p
  ><pre class="sourceCode haskell"
  ><code
    >ft1 <span class="dv"
      >&#8759;</span
      > (<span class="dt"
      >Bool</span
      > &#8594; a) &#8594; [a]<br
       />ft1 f <span class="fu"
      >=</span
      > [f <span class="kw"
      >False</span
      >, f <span class="kw"
      >True</span
      >]<br
       /></code
    ></pre
  ><p
  >A sample input converts <code
    >False</code
    > to <code
    >0</code
    > and <code
    >True</code
    > to <code
    >1</code
    >:</p
  ><pre class="sourceCode haskell"
  ><code
    >f1 <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >Bool</span
      > &#8594; <span class="dt"
      >Int</span
      ><br
       />f1 <span class="kw"
      >False</span
      > <span class="fu"
      >=</span
      > <span class="dv"
      >0</span
      ><br
       />f1 <span class="kw"
      >True</span
      >  <span class="fu"
      >=</span
      > <span class="dv"
      >1</span
      ><br
       /></code
    ></pre
  ><p
  >A sample run without memoization:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >*</span
      ><span class="dt"
      >FunctorCombo.MemoTrie</span
      ><span class="fu"
      >&gt;</span
      > ft1 f1<br
       />[<span class="dv"
      >0</span
      >,<span class="dv"
      >1</span
      >]<br
       /></code
    ></pre
  ><p
  >and one with memoization:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >*</span
      ><span class="dt"
      >FunctorCombo.MemoTrie</span
      ><span class="fu"
      >&gt;</span
      > memo ft1 f1<br
       />[<span class="dv"
      >0</span
      >,<span class="dv"
      >1</span
      >]<br
       /></code
    ></pre
  ><p
  >To illustrate what's going on behind the scenes, the following definitions (all of which type-check) progressively reveal the representation of the underlying memo trie. Most steps result from inlining a single <code
    >Trie</code
    > definition (as well as switching between <code
    >Trie k v</code
    > and the synonymous form <code
    >k &#8603; v</code
    >).</p
  ><pre class="sourceCode haskell"
  ><code
    >trie1a <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; (<span class="dt"
      >Bool</span
      > &#8594; a) &#8603; (a, a)<br
       />trie1a <span class="fu"
      >=</span
      > trie ft1<br
       /><br
       />trie1b <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; (<span class="dt"
      >Bool</span
      > &#8603; a) &#8603; (a, a)<br
       />trie1b <span class="fu"
      >=</span
      > trie1a<br
       /><br
       />trie1c <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; (<span class="dt"
      >Either</span
      > () () &#8603; a) &#8603; (a, a)<br
       />trie1c <span class="fu"
      >=</span
      > trie1a<br
       /><br
       />trie1d <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; ((<span class="dt"
      >Trie</span
      > () &#215; <span class="dt"
      >Trie</span
      > ()) a) &#8603; (a, a)<br
       />trie1d <span class="fu"
      >=</span
      > trie1a<br
       /><br
       />trie1e <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; (<span class="dt"
      >Trie</span
      > () a, <span class="dt"
      >Trie</span
      > () a) &#8603; (a, a)<br
       />trie1e <span class="fu"
      >=</span
      > trie1a<br
       /><br
       />trie1f <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; (() &#8603; a, () &#8603; a) &#8603; (a, a)<br
       />trie1f <span class="fu"
      >=</span
      > trie1a<br
       /><br
       />trie1g <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; (a, a) &#8603; (a, a)<br
       />trie1g <span class="fu"
      >=</span
      > trie1a<br
       /><br
       />trie1h <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; (<span class="dt"
      >Trie</span
      > a &#8728; <span class="dt"
      >Trie</span
      > a) (a, a)<br
       />trie1h <span class="fu"
      >=</span
      > trie1a<br
       /><br
       />trie1i <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >HasTrie</span
      > a &#8658; a &#8603; a &#8603; (a, a)<br
       />trie1i <span class="fu"
      >=</span
      > unO trie1a<br
       /></code
    ></pre
  ></div
>

<div id="pragmatics"
><h3
  >Pragmatics</h3
  ><p
  >I'm happy with the correctness and elegance of the method in this post. It gives me the feeling of inevitable simplicity that I strive for -- obvious in hindsight. What about performance? After all, memoization is motivated by a desire for efficiency -- specifically, to reduce the cost of repeatedly applying the same function to the same argument, while keeping almost all of the modularity &amp; simplicity of a naïve algorithm.</p
  ><p
  >Memoization pays off when (a) a function is repeatedly applied to some arguments, and (b) when the cost of recomputing an application exceeds the cost of finding the previously computed result. (I'm over-simplifying here. Space efficiency matters also and can affect time efficiency.) The isomorphism technique used in this post and <a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post"
    >a previous one</a
    > requires transforming an argument to the isomorphic type for each look-up and from the isomorphic type for each application. (I'm using &quot;isomorphic type&quot; to mean the type for which a <code
    >HasTrie</code
    > instance is already defined.) When these transformations are between function and trie form, I wonder how high the break-even threshold becomes.</p
  ><p
  >How might we avoid these transformations, thus reducing the overhead of memoizing?</p
  ><p
  >For conversion to isomorphic type during trie lookup, perhaps the cost could be reduced substantially through deforestation--inlining chains of <code
    >untrie</code
    > methods and applying optimizations to eliminate the many intermediate representation layers. GHC has gotten awfully good at this sort of thing. Maybe someone with more Haskell performance analysis &amp; optimization experience than I have would be interested in collaborating.</p
  ><p
  >For trie construction, I suspect the conversion <em
    >back</em
    > from the isomorphic type could be avoided by somehow holding onto the original form of the argument, before it was converted to the isomorphic type. I haven't attempted this idea yet.</p
  ><p
  >Another angle on reducing the cost of the isomorphism technique is to use memoization! After all, if memoizing is worthwhile at all, there will be repeated applications of the memoized function to the same arguments. Exactly in such a case, the conversion of arguments to isomorphic form will also be done repeatedly for these same arguments. When a conversion is both expensive and repeated, we'd like to memoize. I don't know how to get off the ground with this idea, however. If I'm trying to memoize a function of type <code
    >a &#8594; b</code
    >, then the required conversion has type <code
    >a &#8594; a'</code
    > for some type <code
    >a'</code
    > with a <code
    >HasTrie</code
    > instance. Memoizing that conversion is just as hard as memoizing the function we started with.</p
  ></div
>

<div id="conclusion"
><h3
  >Conclusion</h3
  ><p
  >Existing accounts of functional memoization I know of cover functions of the unit type, sums, and products, and they do so quite elegantly.</p
  ><p
  ><em
    >Type isomorphisms</em
    > form the consistent, central theme in this work. Functions from unit, sums and products have isomorphic forms with simpler domain types (and so on, recursively). Additional isomorphisms extend these fundamental building blocks to many other types, including integer types and algebraic data types. However, functions over function-valued domains are conspicuously missing (though I hadn't noticed until recently). This post fills that gap neatly, using yet another isomorphism, and moreover an isomorphism that has been staring us in the face all along: the one between functions and tries.</p
  ><p
  >I wonder:</p
  ><ul
  ><li
    >Given how this trick shouts to be noticed, has it been discovered and written up?</li
    ><li
    >How useful will higher-order memoization turn out to be?</li
    ><li
    >How efficient is the straightforward implementation given above?</li
    ><li
    >Can the conversions between isomorphic domain types be done inexpensively, perhaps eliminating many altogether?</li
    ><li
    >How does non-strict memoization fit in with higher-order memoization?</li
    ></ul
  ></div
>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=119&amp;md5=38c9c222d1cfdb6bbafc4cb90a393efa"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/memoizing-higher-order-functions/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fmemoizing-higher-order-functions&amp;language=en_GB&amp;category=text&amp;title=Memoizing+higher-order+functions&amp;description=Memoization+incrementally+converts+functions+into+data+structures.+It+pays+off+when+a+function+is+repeatedly+applied+to+the+same+arguments+and+applying+the+function+is+more+expensive+than+accessing+the...&amp;tags=isomorphism%2Cmemoization%2Ctrie%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Elegant memoization with higher-order types</title>
		<link>http://conal.net/blog/posts/elegant-memoization-with-higher-order-types</link>
		<comments>http://conal.net/blog/posts/elegant-memoization-with-higher-order-types#comments</comments>
		<pubDate>Wed, 21 Jul 2010 04:48:22 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[isomorphism]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=117</guid>
		<description><![CDATA[A while back, I got interested in functional memoization, especially after seeing some code from Spencer Janssen using the essential idea of Ralf Hinze&#8217;s paper Generalizing Generalized Tries. The blog post Elegant memoization with functional memo tries describes a library, MemoTrie, based on both of these sources, and using associated data types. I would have [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Elegant memoization with higher-order types

Tags: functor, memoization, isomorphism, trie

URL: http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/

-->

<!-- references -->

<!-- teaser -->

<p>A while back, I got interested in functional memoization, especially after seeing some code from Spencer Janssen using the essential idea of Ralf Hinze&#8217;s paper <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.4069" title="Paper: &quot;Generalizing Generalized Tries&quot; by Ralf Hinze">Generalizing Generalized Tries</a></em>.
The blog post <em><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">Elegant memoization with functional memo tries</a></em> describes a library, <a href="http://haskell.org/haskellwiki/MemoTrie" title="Haskell wiki page for the MemoTrie library">MemoTrie</a>, based on both of these sources, and using <a href="http://www.cse.unsw.edu.au/~chak/papers/papers.html#assoc" title="Paper: &quot;Associated Types with Class&quot;">associated data types</a>.
I would have rather used associated type synonyms and standard types, but I couldn&#8217;t see how to get the details to work out.
Recently, while playing with functor combinators, I realized that they might work for memoization, which they do quite nicely.</p>

<p>This blog post shows how functor combinators lead to an even more elegant formulation of functional memoization.
The code is available as part of the <a href="http://hackage.haskell.org/package/functor-combo" title="Hackage entry: functor-combo">functor-combo</a> package.</p>

<p>The techniques in this post are not so much new as they are ones that have recently been sinking in for me.
See <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.4069" title="Paper: &quot;Generalizing Generalized Tries&quot; by Ralf Hinze">Generalizing Generalized Tries</a></em>, as well as <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.2412" title="Paper by Alexey Rodriguez, Stefan Holdermans, Andres Löh, and Johan Jeuring">Generic programming with fixed points for mutually recursive datatypes</a></em>.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2011-01-28: Fixed small typo: &#8220;<em>b^^a^^</em>&#8221; ⟼ &#8220;<em>b<sup>a</sup></em>&#8221;</li>
<li>2010-09-10: Corrected <code>Const</code> definition to use <code>newtype</code> instead of <code>data</code>.</li>
<li>2010-09-10: Added missing <code>Unit</code> type definition (as <code>Const ()</code>).</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-117"></span></p>

<h3>Tries as associated data type</h3>

<p>The <a href="http://haskell.org/haskellwiki/MemoTrie" title="Haskell wiki page for the MemoTrie library">MemoTrie</a> library is centered on a class <code>HasTrie</code> with an associated data type of tries (efficient indexing structures for memoized functions):</p>

<pre><code>class HasTrie k where
    data (:→:) k :: * → *
    trie   :: (k  →  v) → (k :→: v)
    untrie :: (k :→: v) → (k  →  v)
</code></pre>

<p>The type <code>a :→: b</code> represents a trie that maps values of type <code>a</code> to values of type <code>b</code>.
The trie representation depends only on <code>a</code>.</p>

<p>Memoization is a simple combination of these two methods:</p>

<pre><code>memo :: HasTrie a ⇒ (a → b) → (a → b)
memo = untrie . trie
</code></pre>

<p>The <code>HasTrie</code> instance definitions correspond to isomorphisms involving function types.
The isomorphisms correspond to the familiar rules of exponents, if we translate <em>a → b</em> into <em>b<sup>a</sup></em>.
(See <em><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">Elegant memoization with functional memo tries</a></em> for more explanation.)</p>

<pre><code>instance HasTrie () where
    data () :→: x = UnitTrie x
    trie f = UnitTrie (f ())
    untrie (UnitTrie x) = const x

instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
    data (Either a b) :→: x = EitherTrie (a :→: x) (b :→: x)
    trie f = EitherTrie (trie (f . Left)) (trie (f . Right))
    untrie (EitherTrie s t) = either (untrie s) (untrie t)

instance (HasTrie a, HasTrie b) ⇒ HasTrie (a,b) where
    data (a,b) :→: x = PairTrie (a :→: (b :→: x))
    trie f = PairTrie (trie (trie . curry f))
    untrie (PairTrie t) = uncurry (untrie . untrie t)
</code></pre>

<h3>Functors and functor combinators</h3>

<p>For notational convenience, let &#8220;<code>(:→:)</code>&#8221; be a synonym for &#8220;<code>Trie</code>&#8221;:</p>

<pre><code>type k :→: v = Trie k v
</code></pre>

<p>And replace the associated <code>data</code> with an associated <code>type</code>.</p>

<pre><code>class HasTrie k where
    type Trie k :: * → *
    trie   :: (k  →  v) → (k :→: v)
    untrie :: (k :→: v) → (k  →  v)
</code></pre>

<p>Then, imitating the three <code>HasTrie</code> instances above,</p>

<pre><code>type Trie () v = v

type Trie (Either a b) v = (Trie a v, Trie b v)

type Trie (a,b) v = Trie a (Trie b v)
</code></pre>

<p>Imagine that we have type lambdas for writing higher-kinded types.</p>

<pre><code>type Trie () = λ v → v

type Trie (Either a b) = λ v → (Trie a v, Trie b v)

type Trie (a,b) = λ v → Trie a (Trie b v)
</code></pre>

<p>Type lambdas are often written as &#8220;Λ&#8221; (capital &#8220;λ&#8221;) instead.
In the land of values, these three right-hand sides correspond to common building blocks for functions, namely identity, product, and composition:</p>

<pre><code>id      = λ v → v
f &amp;&amp;&amp; g = λ v → (f v, g v)
g  .  f = λ v → g (f v)
</code></pre>

<p>These building blocks arise in the land of types.</p>

<pre><code>newtype Id a = Id a

data (f :*: g) a = f a :*: g a

newtype (g :. f) a = O (g (f a))
</code></pre>

<p>where <code>Id</code>, <code>f</code> and <code>g</code> are functors.
Sum and a constant functor are also common building blocks:</p>

<pre><code>data (f :+: g) a = InL (f a) | InR (g a)

newtype Const x a = Const x

type Unit = Const () -- one non-⊥ inhabitant
</code></pre>

<h3>Tries as associated type synonym</h3>

<p>Given these standard definitions, we can eliminate the special-purpose data types used, replacing them with our standard functor combinators:</p>

<pre><code>instance HasTrie () where
  type Trie ()  = Id
  trie   f      = Id (f ())
  untrie (Id v) = const v

instance (HasTrie a, HasTrie b) =&gt; HasTrie (Either a b) where
  type Trie (Either a b) = Trie a :*: Trie b
  trie   f           = trie (f . Left) :*: trie (f . Right)
  untrie (ta :*: tb) = untrie ta `either` untrie tb

instance (HasTrie a, HasTrie b) ⇒ HasTrie (a , b) where
  type Trie (a , b) = Trie a :. Trie b
  trie   f      = O (trie (trie . curry f))
  untrie (O tt) = uncurry (untrie . untrie tt)
</code></pre>
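
<p>As a usage sketch: with these instances packaged up (the released <a href="http://haskell.org/haskellwiki/MemoTrie" title="Haskell wiki page for the MemoTrie library">MemoTrie</a> library exports <code>memo</code> and a <code>HasTrie</code> instance for <code>Integer</code>), the classic memoized Fibonacci looks like this:</p>

<pre><code>import Data.MemoTrie (memo)

-- Naive doubly-recursive Fibonacci, made fast by memoizing the
-- open-recursive step: every recursive call goes through the trie.
fib :: Integer -&gt; Integer
fib = memo fib'
 where
   fib' 0 = 0
   fib' 1 = 1
   fib' n = fib (n - 1) + fib (n - 2)
</code></pre>

<p>Because <code>fib</code> is a top-level value, the trie built by <code>memo</code> is shared across all calls, so each Fibonacci number is computed at most once.</p>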

<p>At first blush, it might appear that we&#8217;ve simply moved the data type definitions outside of the instances.
However, the extracted functor combinators have other uses, as explored in polytypic programming.
I&#8217;ll point out some of these uses in the next few blog posts.</p>

<h3>Isomorphisms</h3>

<p>Many types are isomorphic variations, and so their corresponding tries can share a common representation.
For instance, triples are isomorphic to nested pairs:</p>

<pre><code>detrip :: (a,b,c) → ((a,b),c)
detrip (a,b,c) = ((a,b),c)

trip :: ((a,b),c) → (a,b,c)
trip ((a,b),c) = (a,b,c)
</code></pre>

<p>A trie for triples can be a trie for pairs (already defined).
The <code>trie</code> and <code>untrie</code> methods then just perform conversions around the corresponding methods on pairs:</p>

<pre><code>instance (HasTrie a, HasTrie b, HasTrie c) ⇒ HasTrie (a,b,c) where
    type Trie (a,b,c) = Trie ((a,b),c)
    trie f = trie (f . trip)
    untrie t = untrie t . detrip
</code></pre>

<p>All type isomorphisms can use this same pattern.
I don&#8217;t think Haskell is sufficiently expressive to capture this pattern within the language, so I&#8217;ll resort to a C macro.
There are five parameters:</p>

<ul>
<li><code>Context</code>: the instance context;</li>
<li><code>Type</code>: the type whose instance is being defined;</li>
<li><code>IsoType</code>: the isomorphic type;</li>
<li><code>toIso</code>: conversion function <em>to</em> <code>IsoType</code>; and</li>
<li><code>fromIso</code>: conversion function <em>from</em> <code>IsoType</code>.</li>
</ul>

<p>The macro:</p>

<pre><code>#define HasTrieIsomorph(Context,Type,IsoType,toIso,fromIso) \
instance Context ⇒ HasTrie (Type) where { \
  type Trie (Type) = Trie (IsoType); \
  trie f = trie (f . (fromIso)); \
  untrie t = untrie t . (toIso); \
}
</code></pre>

<p>Now we can easily define <code>HasTrie</code> instances:</p>

<pre><code>HasTrieIsomorph( (), Bool, Either () ()
               , λ c → if c then Left () else Right ()
               , either (λ () → True) (λ () → False))

HasTrieIsomorph( (HasTrie a, HasTrie b, HasTrie c), (a,b,c), ((a,b),c)
               , λ (a,b,c) → ((a,b),c), λ ((a,b),c) → (a,b,c))

HasTrieIsomorph( (HasTrie a, HasTrie b, HasTrie c, HasTrie d)
               , (a,b,c,d), ((a,b,c),d)
               , λ (a,b,c,d) → ((a,b,c),d), λ ((a,b,c),d) → (a,b,c,d))
</code></pre>

<p>In most (but not all) cases, the first argument (<code>Context</code>) could simply be a <code>HasTrie</code> constraint on the isomorphic type, e.g.,</p>

<pre><code>HasTrieIsomorph( HasTrie ((a,b),c), (a,b,c), ((a,b),c)
               , λ (a,b,c) → ((a,b),c), λ ((a,b),c) → (a,b,c))
</code></pre>

<p>We could define another macro that captures this pattern and requires one fewer argument.
On the other hand, there is merit to keeping the contextual requirements explicit.</p>

<h3>Regular data types</h3>

<p>A regular data type is one in which the recursive uses are at the same type.
Functions over such types are often defined via <em>monomorphic</em> recursion.
Data types that do not satisfy this constraint are called &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.3551" title="Paper by Richard Bird and Lambert Meertens">nested</a>&#8221;.</p>

<p>As in several recent generic programming systems, regular data types can be encoded generically through a type class that unwraps one level of functor from a type.
The regular data type is the fixpoint of that functor.
See, e.g., <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.6390" title="Paper by Ulf Norell and Patrik Jansson">Polytypic programming in Haskell</a></em>.
Adopting the style of <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>,</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t :: * → *
  wrap   :: PF t t → t
  unwrap :: t → PF t t
</code></pre>

<p>Here &#8220;<code>PF</code>&#8221; stands for &#8220;pattern functor&#8221;.</p>

<p>The pattern functors can be constructed out of the functor combinators above.
For instance, a list at the top level is either empty or a value and a list.
Translating this description:</p>

<pre><code>instance Regular [a] where
  type PF [a] = Unit :+: Const a :*: Id

  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)

  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as
</code></pre>

<p>As another example, consider rose trees (<code>Data.Tree</code>):</p>

<pre><code>data Tree  a = Node a [Tree a]

instance Regular (Tree a) where

  type PF (Tree a) = Const a :*: []

  unwrap (Node a ts) = Const a :*: ts

  wrap (Const a :*: ts) = Node a ts
</code></pre>

<p>Regular types allow for even more succinct <code>HasTrie</code> instance implementations.
Specialize <code>HasTrieIsomorph</code> further:</p>

<pre><code>#define HasTrieRegular(Context,Type)  
HasTrieIsomorph(Context, Type, PF (Type) (Type) , unwrap, wrap)
</code></pre>

<p>For instance, for lists and rose trees:</p>

<pre><code>HasTrieRegular(HasTrie a, [a])
HasTrieRegular(HasTrie a, Tree a)
</code></pre>

<p>The <code>HasTrieRegular</code> macro could be specialized even further for single-parameter polymorphic data types:</p>

<pre><code>#define HasTrieRegular1(TypeCon) HasTrieRegular(HasTrie a, TypeCon a)

HasTrieRegular1([])
HasTrieRegular1(Tree)
</code></pre>

<p>You might wonder if I&#8217;m cheating here, by claiming very simple trie specifications when I&#8217;m really just shuffling code around.
After all, the complexity removed from <code>HasTrie</code> instances shows up in <code>Regular</code> instances.
The win in making this shuffle is that <code>Regular</code> is handy for other purposes, as illustrated in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.2412" title="Paper by Alexey Rodriguez, Stefan Holdermans, Andres Löh, and Johan Jeuring">Generic programming with fixed points for mutually recursive datatypes</a></em> (including <code>fold</code>, <code>unfold</code>, and <code>fmap</code>).
(More examples in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>.)</p>
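<p>As a minimal, self-contained illustration of that reuse, here is a generic <code>fold</code> written once against <code>Regular</code>, with simplified stand-ins for the functor combinators (my reconstruction; details may differ from the actual libraries):</p>

```haskell
{-# LANGUAGE TypeFamilies, TypeOperators #-}

-- Simplified functor combinators, as assumed in the post:
newtype Const a t = Const a
type Unit         = Const ()
newtype Id t      = Id t
data (f :+: g) t  = InL (f t) | InR (g t)
data (f :*: g) t  = f t :*: g t

infixr 6 :+:
infixr 7 :*:

instance Functor (Const a) where fmap _ (Const a) = Const a
instance Functor Id        where fmap h (Id t)    = Id (h t)
instance (Functor f, Functor g) => Functor (f :+: g) where
  fmap h (InL fa) = InL (fmap h fa)
  fmap h (InR ga) = InR (fmap h ga)
instance (Functor f, Functor g) => Functor (f :*: g) where
  fmap h (fa :*: ga) = fmap h fa :*: fmap h ga

class Functor (PF t) => Regular t where
  type PF t :: * -> *
  wrap   :: PF t t -> t
  unwrap :: t -> PF t t

instance Regular [a] where
  type PF [a] = Unit :+: Const a :*: Id
  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)
  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as

-- A generic fold, written once for all regular types:
fold :: Regular t => (PF t r -> r) -> t -> r
fold alg = alg . fmap (fold alg) . unwrap

-- Summing a list via the generic fold:
sumL :: Num a => [a] -> a
sumL = fold alg
 where
   alg (InL (Const ()))         = 0
   alg (InR (Const a :*: Id s)) = a + s
```

<p>The same <code>fold</code> works unchanged for rose trees or any other <code>Regular</code> instance, which is the sense in which the <code>Regular</code> shuffle pays for itself.</p>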

<h3>Trouble</h3>

<p>Sadly, these elegant trie definitions have a problem.
Trying to compile them leads to an error message from GHC.
For instance,</p>

<pre><code>Nested type family application
  in the type family application: Trie (PF [a] [a])
(Use -XUndecidableInstances to permit this)
</code></pre>

<p>Adding <code>UndecidableInstances</code> silences this error message, but leads to nontermination in the compiler.</p>

<p>Expanding definitions, I can see the likely cause of nontermination.
The definition in terms of a type family allows an infinite type to sneak through, and I guess GHC&#8217;s type checker is unfolding infinitely.</p>

<p>As a simpler example:</p>

<pre><code>{-# LANGUAGE TypeFamilies, UndecidableInstances #-}

type family List a :: *

type instance List a = Either () (a, List a)

-- Hangs ghc 6.12.1:
nil :: List a
nil = Left ()
</code></pre>

<h3>A solution</h3>

<p>Since GHC&#8217;s type-checker cannot handle directly recursive types, perhaps we can use a standard avoidance strategy, namely introducing a <code>newtype</code> or <code>data</code> definition to break the cycle.
For instance, as a trie for <code>[a]</code>, we got into trouble by using the trie of the unwrapped form of <code>[a]</code>, i.e., <code>Trie (PF [a] [a])</code>.
So instead,</p>

<pre><code>newtype ListTrie a v = ListTrie (Trie (PF [a] [a]) v)
</code></pre>

<p>which is to say</p>

<pre><code>newtype ListTrie a v = ListTrie (PF [a] [a] :→: v)
</code></pre>

<p>Now <code>wrap</code> and <code>unwrap</code> as before, and add &amp; remove <code>ListTrie</code> as needed:</p>

<pre><code>instance HasTrie a ⇒ HasTrie [a] where
  type Trie [a] = ListTrie a
  trie f = ListTrie (trie (f . wrap))
  untrie (ListTrie t) = untrie t . unwrap
</code></pre>

<p>Again, abstract the boilerplate code into a C macro:</p>

<pre><code>#define HasTrieRegular(Context,Type,TrieType,TrieCon) 
newtype TrieType v = TrieCon (PF (Type) (Type) :→: v); 
instance Context ⇒ HasTrie (Type) where { 
  type Trie (Type) = TrieType; 
  trie f = TrieCon (trie (f . wrap)); 
  untrie (TrieCon t) = untrie t . unwrap; 
}
</code></pre>

<p>For instance,</p>

<pre><code>HasTrieRegular(HasTrie a, [a] , ListTrie a, ListTrie)
HasTrieRegular(HasTrie a, Tree, TreeTrie a, TreeTrie)
</code></pre>

<p>Again, simplify a bit with a specialization to unary regular types:</p>

<pre><code>#define HasTrieRegular1(TypeCon,TrieCon) 
HasTrieRegular(HasTrie a, TypeCon a, TrieCon a, TrieCon)
</code></pre>

<p>And then use the following declarations instead:</p>

<pre><code>HasTrieRegular1([]  , ListTrie)
HasTrieRegular1(Tree, TreeTrie)
</code></pre>

<p>Similarly for binary regular types, and so on, as needed.</p>

<p>The second macro parameter (<code>TrieCon</code>) is just a name, which I don&#8217;t intend to be used other than in the macro-generated code.
It could be eliminated, if there were a way to gensym the name.
Perhaps with Template Haskell?</p>

<h3>Conclusion</h3>

<p>I like the elegance of constructing memo tries in terms of common functor combinators.
Standard pattern functors allow for extremely succinct trie specifications for regular data types.
However, these specifications lead to nontermination of the type checker, which can then be avoided by the standard trick of introducing a newtype to break type recursion.
As is often the case, this trick introduces some clumsiness.
Perhaps the problem can also be avoided by using a formulation using <em>bifunctors</em>, as in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.5251" title="Paper by Jeremy Gibbons">Design Patterns as Higher-Order Datatype-Generic Programs</a></em> and <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.6390" title="Paper by Ulf Norell and Patrik Jansson">Polytypic programming in Haskell</a></em>, which allows the fixed-point nature of regular data types to be exposed.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Felegant-memoization-with-higher-order-types&amp;language=en_GB&amp;category=text&amp;title=Elegant+memoization+with+higher-order+types&amp;description=A+while+back%2C+I+got+interested+in+functional+memoization%2C+especially+after+seeing+some+code+from+Spencer+Janssen+using+the+essential+idea+of+Ralf+Hinze%26%238217%3Bs+paper+Generalizing+Generalized+Tries.+The+blog...&amp;tags=functor%2Cisomorphism%2Cmemoization%2Ctrie%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Non-strict memoization</title>
		<link>http://conal.net/blog/posts/nonstrict-memoization</link>
		<comments>http://conal.net/blog/posts/nonstrict-memoization#comments</comments>
		<pubDate>Wed, 14 Jul 2010 02:46:23 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[unamb]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=110</guid>
		<description><![CDATA[I&#8217;ve written a few posts about functional memoization. In one of them, Luke Palmer commented that the memoization methods are correct only for strict functions, which I had not noticed before. In this note, I correct this flaw, extending correct memoization to non-strict functions as well. The semantic notion of least upper bound (which can [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Non-strict memoization

Tags: unamb, memoization, trie

URL: http://conal.net/blog/posts/nonstrict-memoization/

-->

<!-- references -->

<!-- teaser -->

<p>I&#8217;ve written a few <a href="http://conal.net/blog/tag/memoization/" title="Posts on memoization">posts about functional memoization</a>.
In <a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">one of them</a>, <a href="http://lukepalmer.wordpress.com/" title="Luke Palmer's blog">Luke Palmer</a> commented that the memoization methods are correct only for strict functions, which I had not noticed before.
In this note, I correct this flaw, extending correct memoization to non-strict functions as well.
The semantic notion of <a href="http://conal.net/blog/tag/unamb/" title="Posts on unambiguous choice and least upper bound"><em>least upper bound</em> (which can be built of <em>unambiguous choice</em>)</a> plays a crucial role.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2010-07-13: Fixed the non-strict memoization example to use an argument of <code>undefined</code> (⊥) as intended.</li>
<li>2010-07-23: Changed spelling from &#8220;nonstrict&#8221; to the much more popular &#8220;non-strict&#8221;.</li>
<li>2011-02-16: Fixed minor typo. (&#8220;constraint on result&#8221; → &#8220;constraint on the result type&#8221;)</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-110"></span></p>

<h3>What is memoization?</h3>

<p>In purely functional programming, applying a function to equal arguments gives equal results.
However, the second application is as costly as the first one.
The idea of memoization, invented by Donald Michie in the 1960s, is to cache the results of applications and reuse those results in subsequent applications.
Memoization is a handy technique to know, as it can dramatically reduce expense while making little impact on an algorithm&#8217;s simplicity.</p>

<p>Early implementations of memoization were imperative.
Some sort of table (e.g., a hash table) is initialized as empty.
Whenever the memoized function is applied, the argument is looked up in the table.
If present, the corresponding result is returned.
Otherwise, the original function is applied to the argument, and the result is stored in the table, keyed by the argument.</p>
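<p>A sketch of that imperative scheme in Haskell (an illustration only, using an <code>IORef</code> holding a <code>Data.Map</code>; the name <code>memoIO</code> is my own, not part of the development below):</p>

```haskell
import Data.IORef (newIORef, readIORef, modifyIORef)
import qualified Data.Map as Map

-- Imperative memoization: a mutable table, consulted and updated
-- at each application of the memoized function.
memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
memoIO f = do
  ref <- newIORef Map.empty
  return $ \ x -> do
    table <- readIORef ref
    case Map.lookup x table of
      Just y  -> return y                            -- cache hit
      Nothing -> do let y = f x                      -- cache miss: compute,
                    modifyIORef ref (Map.insert x y) -- store,
                    return y                         -- and return
```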

<h3>Functional memoization</h3>

<p>Can memoization be implemented functionally (without assignment)?
One might argue that it cannot, considering that we want the table structure to get filled in destructively, as the memoized function is sampled.</p>

<p>However, this argument is flawed (like many informal arguments of impossibility).
Although we want a mutation to happen, we needn&#8217;t ask for one explicitly.
Instead, we can exploit the mutation that happens <em>inside the implementation</em> of laziness.</p>

<p>For instance, consider memoizing a function of booleans:</p>

<pre><code>memoBool :: (Bool -&gt; b) -&gt; (Bool -&gt; b)
</code></pre>

<p>In this case, the &#8220;table&#8221; can simply be a pair, with one slot for the argument <code>False</code> and one for <code>True</code>:</p>

<pre><code>type BoolTable a = (a,a)

memoBool f = lookupBool (f False, f True)

lookupBool :: BoolTable b -&gt; Bool -&gt; b
lookupBool (f,_) False = f
lookupBool (_,t) True  = t
</code></pre>

<p>For instance, consider this simple function and a memoized version:</p>

<pre><code>f1 b = if b then 3 else 4

s1 = memoBool f1
</code></pre>

<p>The memo table will be <code>(f1 False, f1 True)</code>, i.e., <code>(4,3)</code>.
Checking that <code>s1</code> is equivalent to <code>f1</code>:</p>

<pre><code>s1 False ≡ lookupBool (4,3) False ≡ 4 ≡ f1 False
s1 True  ≡ lookupBool (4,3) True  ≡ 3 ≡ f1 True
</code></pre>

<p>Other argument types have other table representations, and these table types can be defined <a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">systematically and elegantly</a>.</p>

<p>Now, wait a minute!
Building an entire table up-front doesn&#8217;t sound like the incremental algorithm Michie invented, especially considering that the domain type can be quite large and even infinite.
However, in a <em>lazy</em> language, incremental construction of data structures is automatic and pervasive, and infinite data structures are bread &amp; butter.
So the computing and updating doesn&#8217;t have to be <em>expressed</em> imperatively.</p>

<p>While lazy construction can be helpful for pairs, it&#8217;s <em>essential</em> for infinite tables, as needed for domain types that are enormously large (e.g., <code>Int</code>), and even infinitely large (e.g., <code>Integer</code>, or <code>[Bool]</code>).
However, laziness brings to memoization not only a gift, but also a difficulty, namely the challenge of correctly memoizing <em>non-strict</em> functions, as we&#8217;ll see next.</p>
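<p>For a small taste of such an infinite, lazily filled table (my example, not from the post), we can memoize functions of nonnegative <code>Int</code>s with an infinite list, and tie the recursive knot for Fibonacci:</p>

```haskell
-- An infinite memo table, filled in lazily: entries are computed on
-- first demand and then shared.  (List indexing is linear; the tries
-- in the posts referenced above do much better.  Defined only for
-- nonnegative arguments.)
memoNat :: (Int -> a) -> (Int -> a)
memoNat f = (table !!)
 where table = map f [0 ..]

-- The classic example: Fibonacci, memoized by tying the recursive
-- knot through the table.
fib :: Int -> Integer
fib = memoNat fib'
 where
   fib' 0 = 0
   fib' 1 = 1
   fib' n = fib (n - 1) + fib (n - 2)
```

<p>No assignment is written anywhere, yet each <code>fib</code> result is computed only once, thanks to the mutation hidden inside lazy evaluation.</p>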

<h3>A problem with memoizing non-strict functions</h3>

<p>The confirmation above that <code>s1 ≡ f1</code> has a mistake: it fails to consider a third possible choice of argument, namely ⊥.
Let&#8217;s check this case now:</p>

<pre><code>s1 ⊥ ≡ lookupBool (4,3) ⊥ ≡ ⊥ ≡ f1 ⊥
</code></pre>

<p>The ⊥ case does not show up explicitly in the definition of <code>lookupBool</code>, but is implied by the use of pattern-matching against <code>True</code> and <code>False</code>.
For the same reason (in the definition of <code>if-then-else</code>), <code>f1 ⊥ ≡ ⊥</code>, so indeed <code>s1 ≡ f1</code>.
The key saving grace here is that <code>f1</code> is already strict, so the strictness introduced by <code>lookupBool</code> is harmless.</p>

<p>To see how memoization adds strictness, consider memoizing a <em>non-strict</em> function of booleans:</p>

<pre><code>f2 b = 5

s2 = memoBool f2
</code></pre>

<p>The memo table will be <code>(f2 False, f2 True)</code>, i.e., <code>(5,5)</code>.
Checking that <code>s2</code> is equivalent to <code>f2</code>:</p>

<pre><code>s2 False ≡ lookupBool (5,5) False ≡ 5 ≡ f2 False
s2 True  ≡ lookupBool (5,5) True  ≡ 5 ≡ f2 True
</code></pre>

<p>However,</p>

<pre><code>s2 ⊥ ≡ lookupBool (5,5) ⊥ ≡ ⊥
</code></pre>

<p>The latter equality is due again to pattern matching against <code>False</code> and <code>True</code> in <code>lookupBool</code>.</p>

<p>In contrast, <code>f2 ⊥ ≡ 5</code>, so <code>s2 ≢ f2</code>, so <code>memoBool</code> does not correctly memoize.</p>

<h3>Non-strict memoization</h3>

<p>The bug in <code>memoBool</code> comes from ignoring one of the possible boolean values.
In a lazy language, <code>Bool</code> has three possible values, not two.
A simple solution then might be for the memo table to be a triple instead of a pair:</p>

<pre><code>type BoolTable a = (a,a,a)

memoBool h = lookupBool (h ⊥, h False, h True)
</code></pre>

<p>Table lookup needs one additional case:</p>

<pre><code>lookupBool :: BoolTable a -&gt; Bool -&gt; a
lookupBool (b,_,_) ⊥     = b
lookupBool (_,f,_) False = f
lookupBool (_,_,t) True  = t
</code></pre>

<p>I hope you read my posts with a good deal of open-mindedness, but also with some skepticism.
This revised definition of <code>lookupBool</code> is not legitimate Haskell code, and for a good reason.
If we could write and run this kind of code, we could solve the halting problem:</p>

<pre><code>halts :: a -&gt; Bool
halts ⊥ = False
halts _ = True
</code></pre>

<p>The problem here is not just that ⊥ is not a legitimate Haskell <em>pattern</em>, but more fundamentally that equality with ⊥ is non-computable.</p>

<p>The revised <code>lookupBool</code> function and the <code>halts</code> function violate a fundamental semantic property, namely <em>monotonicity</em>  (of information content).
Monotonicity of a function <code>h</code> means that</p>

<pre><code>∀ a b. a ⊑ b ⟹ h a ⊑ h b
</code></pre>

<p>where &#8220;⊑&#8221; means has less (or equal) information content, as explained in <em><a href="http://conal.net/blog/posts/merging-partial-values/" title="blog post">Merging partial values</a></em>.
In other words, if you tell <code>h</code> more about an argument, it will tell you more about the result, where &#8220;more&#8221; (really more-or-equal) includes compatibility (no contradiction of previous knowledge).</p>

<p>The <code>halts</code> function is nonmonotonic, since, for instance, <code>⊥ ⊑ 3</code>, and <code>halts ⊥ ≡ False</code> and <code>halts 3 ≡ True</code>, but <code>False ⋢ True</code>.
(<code>False</code> and <code>True</code> are incompatible, i.e., they contradict each other.)</p>

<p>Similarly, the function <code>h = lookupBool (5,3,4)</code> is nonmonotonic, which you can verify by applying it to ⊥ and to <code>False</code>.
Although <code>⊥ ⊑ False</code>, <code>h ⊥ ≡ 5</code> and <code>h False ≡ 3</code>, but <code>5 ⋢ 3</code>.
Similarly, <code>⊥ ⊑ True</code>, <code>h ⊥ ≡ 5</code> and <code>h True ≡ 4</code>, but <code>5 ⋢ 4</code>.</p>

<p>So this particular memo table gets us into trouble (nonmonotonicity).
Are there other memo tables <code>(b,f,t)</code> that lead to monotonic lookup?
Re-examining the breakdown shows us a necessary and sufficient condition, which is that <code>b ⊑ f</code> and <code>b ⊑ t</code>.</p>

<p>Look again at the particular use of <code>lookupBool</code> in the definition of <code>memoBool</code> above, and you&#8217;ll see that</p>

<pre><code>b ≡ h ⊥
f ≡ h False
t ≡ h True
</code></pre>

<p>so the monotonicity condition becomes <code>h ⊥ ⊑ h False</code> and <code>h ⊥ ⊑ h True</code>.
This condition holds, thanks to the monotonicity of all computable functions <code>h</code>.</p>

<p>So the triple-based <code>lookupBool</code> can be semantically problematic outside of its motivating context, but never as used in <code>memoBool</code>.
That is, the triple-based definition of <code>memoBool</code> correctly specifies the (computable) meaning we want, but isn&#8217;t an implementation.
How might we correctly implement <code>memoBool</code>?</p>

<p>In <em><a href="http://conal.net/blog/posts/lazier-function-definitions-by-merging-partial-values/" title="blog post">Lazier function definitions by merging partial values</a></em>, I examined the standard Haskell style (inherited from predecessors) of definition by clauses, pointing out how that style is teasingly close to a declarative reading in which each clause is a true equation (possibly conditional).
I transformed the standard style into a form with modular, declarative semantics.</p>

<p>Let&#8217;s try transforming <code>lookupBool</code> into this modular form:</p>

<pre><code>lookupBool :: BoolTable a -&gt; Bool -&gt; a
lookupBool (b,f,t) = (λ ⊥ → b) ⊔ (λ False → f) ⊔ (λ True → t)
</code></pre>

<p>We still have the problem with <code>λ ⊥ → b</code> (nonmonotonicity), but it&#8217;s now isolated.
What if we broaden the domain from just ⊥ (for which we cannot dependably test) to <em>all</em> arguments, i.e., <code>λ _ → b</code> (i.e., <code>const b</code>)?
This latter function is the least one (in the information ordering) that is monotonic and contains all the information present in <code>λ ⊥ → b</code>.
(Exercise: prove.)
Dissecting this function:</p>

<pre><code>const b ≡ (λ _ → b) ≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b)
</code></pre>

<p>So</p>

<pre><code>  const b ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b) ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ False → f) ⊔ (λ True → b) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → (b ⊔ f)) ⊔ (λ True → (b ⊔ t))
≡ (λ ⊥ → b) ⊔ (λ False →      f ) ⊔ (λ True →      t )
</code></pre>

<p>under the condition that <code>b ⊑ f</code> and <code>b ⊑ t</code>, which does hold in the context of our use (again by monotonicity of the <code>h</code> in <code>memoBool</code>).
Therefore, in this context, we can replace the nonmonotonic <code>λ ⊥ → b</code> with the monotonic <code>const b</code>, while preserving the meaning of <code>memoBool</code>.</p>

<p>Behind the dancing symbols in the proof above lies the insight that we can use the ⊥ case even for non-⊥ arguments, because the result will be subsumed by non-⊥ cases, thanks to the lubs (⊔).</p>

<p>The original two non-⊥ cases can be combined back into their more standard (less modular) Haskell form, and we can revert to our original strict table and lookup function.
Our use of ⊔ requires the result type to be ⊔-able.</p>

<pre><code>memoBool :: HasLub b =&gt; (Bool -&gt; b) -&gt; (Bool -&gt; b)

type BoolTable a = (a,a)

memoBool h = const (h ⊥) ⊔ lookupBool (h False, h True)

lookupBool :: BoolTable b -&gt; Bool -&gt; b
lookupBool (f,_) False = f
lookupBool (_,t) True  = t
</code></pre>

<p>So the differences between our original, too-strict <code>memoBool</code> and this correct one are quite small: the <code>HasLub</code> constraint and the &#8220;<code>const (h ⊥) ⊔</code>&#8221;.</p>

<p>The <code>HasLub</code> constraint on the result type warns us of a possible loss of generality.
Are there types for which we do not know how to ⊔?
Primitive types are flat, where ⊔ is equivalent to <code>unamb</code>; and there are <code>HasLub</code> instances for functions, sums, and products.
(See <em><a href="http://conal.net/blog/posts/merging-partial-values/" title="blog post">Merging partial values</a></em>.)
<code>HasLub</code> could be derived automatically for algebraic data types (labeled sums of products) and trivially for <code>newtype</code>.
Perhaps abstract types need some extra thought.</p>

<h3>Demo</h3>

<p>First, import the <a href="http://haskell.org/haskellwiki/lub" title="wiki page">lub</a> package:</p>

<pre><code>{-# LANGUAGE Rank2Types #-}
{-# OPTIONS -Wall #-}
import Data.Lub
</code></pre>

<p>Next, borrowing from <a href="http://lukepalmer.wordpress.com/" title="Luke Palmer's blog">Luke Palmer</a>&#8217;s <a href="http://lukepalmer.wordpress.com/2008/10/14/data-memocombinators/" title="blog post by Luke Palmer">MemoCombinators</a> package, define a type of <em>strict</em> memoizers:</p>

<pre><code>type MemoStrict a = forall r. (a -&gt; r) -&gt; (a -&gt; r)
</code></pre>

<p>Now a strict memoizer for <code>Bool</code>, as above:</p>

<pre><code>memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
 where
   lookupBool (f,_) False = f
   lookupBool (_,t) True  = t
</code></pre>

<p>Test out the strict memoizer.
First on a strict function:</p>

<pre><code>h1, s1 :: Bool -&gt; Integer
h1 = \ b -&gt; if b then 3 else 4
s1 = memoBoolStrict h1
</code></pre>

<p>A test run:</p>

<pre><code>*Main&gt; h1 True
3
*Main&gt; s1 True
3
</code></pre>

<p>Next on a non-strict function:</p>

<pre><code>h2, s2 :: Bool -&gt; Integer
h2 = const 5
s2 = memoBoolStrict h2
</code></pre>

<p>And test:</p>

<pre><code>*Main&gt; h2 undefined
5
*Main&gt; s2 undefined
*** Exception: Prelude.undefined
</code></pre>

<p>Now define a type of non-strict memoizers:</p>

<pre><code>type Memo a = forall r. HasLub r =&gt; (a -&gt; r) -&gt; (a -&gt; r)
</code></pre>

<p>And a non-strict <code>Bool</code> memoizer, along with a memoized version <code>n2</code> of <code>h2</code>:</p>

<pre><code>memoBool :: Memo Bool
memoBool h = const (h undefined) `lub` memoBoolStrict h

n2 :: Bool -&gt; Integer
n2 = memoBool h2
</code></pre>

<p>Testing:</p>

<pre><code>*Main&gt; h2 undefined
5
*Main&gt; n2 undefined
5
</code></pre>

<p>Success!</p>

<h3>Beyond <code>Bool</code></h3>

<p>To determine how to generalize <code>memoBool</code> to types other than <code>Bool</code>, consider what properties of <code>Bool</code> mattered in our development.</p>

<ul>
<li>We know how to strictly memoize over <code>Bool</code> (i.e., what shape to use for the memo table and how to fill it).</li>
<li><code>Bool</code> is flat.</li>
</ul>

<p>The first condition also holds (<a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">elegantly</a>) for integral types, sums, products, and algebraic types.</p>

<p>The second condition is terribly restrictive and fails to hold for sums, products and most algebraic types (e.g., <code>Maybe</code> and <code>[]</code>).</p>

<p>Consider a Haskell function <code>h :: (a,b) -&gt; c</code>.
An element of type <code>(a,b)</code> is either <code>⊥</code> or <code>(x,y)</code>, where <code>x :: a</code> and <code>y :: b</code>.
We can cover the ⊥ case as we did with <code>Bool</code>, by ⊔-ing in <code>const (h ⊥)</code>.
For the <code>(x,y)</code> case, we can proceed just as in strict memoization, by uncurrying, memoizing the outer and inner functions (of <code>a</code> and of <code>b</code> respectively), and recurrying.
For details, see <em><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">Elegant memoization with functional memo tries</a></em>.</p>
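<p>The strict part of that recipe can be sketched directly (my paraphrase of the trie construction, reusing <code>MemoStrict</code> and <code>memoBoolStrict</code> from the demo above; the ⊥ case would still need the ⊔ treatment):</p>

```haskell
{-# LANGUAGE RankNTypes #-}

-- Strict memoizers, as in the demo above:
type MemoStrict a = forall r. (a -> r) -> (a -> r)

memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
 where
   lookupBool (f, _) False = f
   lookupBool (_, t) True  = t

-- Memoize over pairs: uncurry, memoize the outer and inner
-- functions, and recurry.
memoPairStrict :: MemoStrict a -> MemoStrict b -> MemoStrict (a, b)
memoPairStrict ma mb h = uncurry (ma (mb . curry h))
```

<p>For instance, <code>memoPairStrict memoBoolStrict memoBoolStrict</code> strictly memoizes functions of <code>(Bool,Bool)</code>.</p>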

<p>Similarly for sum types.
(A value of type <code>Either a b</code> is either ⊥, or <code>Left x</code> or <code>Right y</code>, where <code>x :: a</code> and <code>y :: b</code>.)
And by following the treatment of products and sums, we can correctly memoize functions over any algebraic type.</p>

<h3>Related work</h3>

<h4>Lazy Memo-functions</h4>

<p>In 1985, John Hughes published a paper <em><a href="http://www.cse.chalmers.se/~rjmh/Papers/hughes_85_lazy.pdf" title="Paper by John Hughes">Lazy Memo-functions</a></em>, in which he points out the laziness-harming property of standard memoization.</p>

<blockquote>
  <p>[&#8230;] In a language with lazy evaluation this problem is aggravated: since verifying that two data-structures are equal requires that each be completely evaluated, all memoised functions are completely strict. This means they cannot be applied to circular or infinite arguments, or to arguments which (for one reason or another) cannot yet be completely evaluated. Therefore memo-functions cannot be combined with the most powerful features of lazy languages.</p>
</blockquote>

<p>John gives a laziness-friendlier alternative, which is to use the <em>addresses</em> rather than contents in the case of structured arguments.
Since it does force evaluation on atomic arguments, I don&#8217;t think it preserves non-strictness.
Moreover, it leads to redundant computation when structured arguments are equal but not pointer-equal.</p>

<h3>Conclusion</h3>

<p>Formulations of function memoization can be quite elegant and practical in a non-strict/lazy functional language.
In such a setting, however, I cannot help but want to correctly handle <em>all</em> functions, including non-strict ones.
This post gives a technique for doing so, making crucial use of the least upper bound (⊔) operator described in <a href="http://conal.net/blog/tag/unamb/" title="Posts on unambiguous choice and least upper bound">various other posts</a>.</p>

<p>Despite the many words above, the modification to strict memoization is simple: for a function <code>h</code>, given an argument <code>x</code>, in addition to indexing a memo trie with <code>x</code>, also evaluate <code>h ⊥</code>, and merge the information obtained from these two attempts (conceptually run in parallel).
Indexing a memo trie forces evaluation of <code>x</code>, which is a problem when <code>h</code> is non-strict and <code>x</code> evaluates to ⊥.
In exactly that case, however, <code>h ⊥</code> is not ⊥, and so provides exactly the information we need.
Moreover, information-monotonicity of <code>h</code> (a property of all computable functions) guarantees that <code>h ⊥ ⊑ h x</code>, so the information being merged is compatible.</p>

<p>Note that this condition is even stronger than compatibility, so perhaps we could use a more restricted and more efficient alternative to the fully general least upper bound.
The technique in <em><a href="http://conal.net/blog/posts/exact-numeric-integration/" title="blog post">Exact numeric integration</a></em> also used this restricted form.</p>

<p>How does this method for correct, non-strict memoization work in practice?
I guess the answer mainly depends on the efficiency and robustness of ⊔ (or of the restricted form mentioned just above).
The current implementation could probably be improved considerably if brought into the runtime system (RTS) and implemented by an RTS expert (which I&#8217;m not).</p>

<p>Information ordering and ⊔ play a central role in the denotational semantics of programming languages.
Since first stumbling onto a use for <code>⊔</code> (initially in its flat form, <code>unamb</code>), I&#8217;ve become very curious about how this operator might impact programming <em>practice</em> as well as theory.
My impression so far is that it is a powerful modularization tool, just as laziness is (as illustrated by John Hughes in <em><a href="http://www.cse.chalmers.se/~rjmh/Papers/whyfp.html" title="Paper by John Hughes">Why Functional Programming Matters</a></em>).
I&#8217;m looking for more examples, to further explore this impression.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/nonstrict-memoization/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fnonstrict-memoization&amp;language=en_GB&amp;category=text&amp;title=Non-strict+memoization&amp;description=I%26%238217%3Bve+written+a+few+posts+about+functional+memoization.+In+one+of+them%2C+Luke+Palmer+commented+that+the+memoization+methods+are+correct+only+for+strict+functions%2C+which+I+had+not+noticed...&amp;tags=memoization%2Ctrie%2Cunamb%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Memoizing polymorphic functions &#8211; part two</title>
		<link>http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two</link>
		<comments>http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two#comments</comments>
		<pubDate>Fri, 12 Jun 2009 22:04:01 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[polymorphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=89</guid>
		<description><![CDATA[Part one of this series introduced the problem of memoizing functions involving polymorphic recursion. The caching data structures used in memoization typically handle only one type of argument at a time. For instance, one can have finite maps of differing types, but each concrete finite map holds just one type of key and one type [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Memoizing polymorphic functions - part two
Tags: memoization, polymorphism

URL: http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two/

-->

<!-- references -->

<!-- teaser -->

<p><a href="http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one/" title="Blog post">Part one</a> of this series introduced the problem of memoizing functions involving polymorphic recursion.
The caching data structures used in memoization typically handle only one <em>type</em> of argument at a time.
For instance, one can have finite maps of differing types, but each concrete finite map holds just one type of key and one type of value.</p>

<p>I extended memoization to handle polymorphic recursion by using an existential type together with a reified type of types.
This extension works (afaik), but it is restricted to a particular form for the type of the polymorphic function being memoized, namely</p>

<pre><code>-- Polymorphic function
type k :--&gt; v = forall a. HasType a =&gt; k a -&gt; v a
</code></pre>

<p>My motivating example is a GADT-based representation of typed lambda calculus, and some of the functions I want to memoize do not fit the pattern.
After writing <a href="http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one/" title="Blog post">part one</a>, I fooled around and found that I could transform these awkwardly typed polymorphic functions into an isomorphic form that does indeed fit the restricted pattern of polymorphic types I can handle.</p>


<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-89"></span></p>

<h3>Awkward types</h3>

<p>The first awkwardly typed memoizee is the function application constructor:</p>

<pre><code>type AT = forall a b . (HasType a, HasType b) =&gt; E (a -&gt; b) -&gt; E a -&gt; E b

(:^) :: AT
</code></pre>

<p>Right away <code>AT</code> misses the required form.
It has two <code>HasType</code> constraints, and the first argument is parameterized over two type variables instead of one.
However, the second argument looks more promising, so let&#8217;s <code>flip</code> the arguments to get an isomorphic type:</p>

<pre><code>forall a b . (HasType a, HasType b) =&gt; E a -&gt; E (a -&gt; b) -&gt; E b
</code></pre>

<p>And then move the quantifier and constraint on <code>b</code> inside the outer (first) <code>-&gt;</code>:</p>

<pre><code>forall a . HasType a =&gt; E a -&gt; (forall b. HasType b =&gt; E (a -&gt; b) -&gt; E b)
</code></pre>

<p>We&#8217;re getting closer.
Next, define a <code>newtype</code> wrapper.</p>

<pre><code>newtype A2 a = A2 (forall b. HasType b =&gt; E (a -&gt; b) -&gt; E b)
</code></pre>

<p>So that <code>AT</code> is isomorphic to</p>

<pre><code>forall a . HasType a =&gt; E a -&gt; A2 a
</code></pre>

<p>i.e.,</p>

<pre><code>E :--&gt; A2
</code></pre>

<p>The function inside of <code>A2</code> doesn&#8217;t have the required form, but another <code>newtype</code> wrapper finishes the job.</p>

<pre><code>newtype EF a b = EF {unEF :: E (a -&gt; b) }

type H' a = EF a :--&gt; E

newtype H a = H { unH :: H' a }
</code></pre>

<p>The <code>AT</code> type is isomorphic to <code>AP</code>, where</p>

<pre><code>type AP = E :--&gt; H
</code></pre>

<h3>Curried memoization</h3>

<p>A &#8220;curried memo function&#8221; is one that takes one argument and produces another memo function.
For a simple memo function, not involving polymorphic recursion, there&#8217;s a simple recipe for curried memoization:</p>

<pre><code>memo2 :: (a -&gt; b -&gt; c) -&gt; (a -&gt; b -&gt; c)
memo2 f = memo (memo . f)
</code></pre>
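<p>To see the recipe in action, here is a toy instantiation (not the trie-based <code>memo</code> from the earlier posts): a lazy list serves as the memo table, assuming non-negative <code>Int</code> keys. The names <code>memoInt</code> and <code>table</code> are my own for this sketch.</p>

```haskell
-- Toy memoizer: a lazy list as the memo table (assumes keys >= 0).
memoInt :: (Int -> v) -> (Int -> v)
memoInt f = (table !!)           -- indexing reuses previously forced entries
  where table = map f [0 ..]

-- Curried memoization, exactly the memo2 recipe above
memo2 :: (Int -> Int -> v) -> (Int -> Int -> v)
memo2 f = memoInt (memoInt . f)

main :: IO ()
main = print (memo2 (+) 3 4)     -- 7
```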

<p>Our more polymorphic <code>memo</code> makes currying a little more awkward.
First, here&#8217;s a helper function for working <em>inside</em> of the representation of an <code>H</code>:</p>

<pre><code>inH :: (H' a -&gt; H' a) -&gt; (H a -&gt; H a)
inH h z = H (h (unH z))
</code></pre>

<p>The following more elegant definition doesn&#8217;t type-check, due to the rank 2 polymorphism:</p>

<pre><code>inH f = H . f . unH  -- type error
</code></pre>

<p>Now our <code>AP</code> memoizer is much like <code>memo2</code>:</p>

<pre><code>memoAP :: AP -&gt; AP
memoAP app' = memo (inH memo . app')
</code></pre>

<p>(A more general, consistent type for <code>memoAP</code> is <code>(f :--&gt; H) -&gt; (f :--&gt; H)</code>.)</p>

<h3>Isomorphisms</h3>

<p>Now, to define the isomorphisms.  Define</p>

<pre><code>toAP   :: AT -&gt; AP
fromAP :: AP -&gt; AT
</code></pre>

<p>The definitions:</p>

<pre><code>toAP app ea = H $ \ (EF eab) -&gt; app eab ea
fromAP app' eab ea = unH (app' ea) (EF eab)
</code></pre>

<p>If you erase the <code>newtype</code> wrappers &amp; unwrappers, you&#8217;ll see that <code>toAP</code> and <code>fromAP</code> are both just <code>flip</code>.</p>

<p>I constructed <code>fromAP</code> from the following specification:</p>

<pre><code>toAP (fromAP app') == app'
</code></pre>

<p>Transforming step-by-step into equivalent specifications:</p>

<pre><code>\ ea -&gt; H $ \ (EF eab) -&gt; (fromAP app') eab ea == app'

H $ \ (EF eab) -&gt; (fromAP app') eab ea == app' ea

\ (EF eab) -&gt; (fromAP app') eab ea == unH (app' ea)

(fromAP app') eab ea == unH (app' ea) (EF eab)

fromAP app' eab ea == unH (app' ea) (EF eab)
</code></pre>

<h3>Memoizing via isomorphisms</h3>

<p>Finally, I can memoize</p>

<pre><code>memoAT :: AT -&gt; AT
memoAT app = fromAP (memoAP (toAP app))
</code></pre>

<p>Again, a more elegant definition via <code>(.)</code> fails to type-check, due to rank 2 polymorphism.</p>

<p>The <code>Lam</code> (lambda abstraction) constructor can be handled similarly:</p>

<pre><code>Lam  :: (HasType a, HasType b) =&gt; V a -&gt; E b -&gt; E (a -&gt; b)
</code></pre>

<p>This time, no <code>flip</code> is needed.</p>

<h3>I wonder</h3>

<p>How far does this isomorphism trick go?</p>

<p>Is there an easier way to memoize polymorphic functions?</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=89&amp;md5=5153d8e78e7eabc69bf9e30c869f1ac2"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/memoizing-polymorphic-functions-part-two/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fmemoizing-polymorphic-functions-part-two&amp;language=en_GB&amp;category=text&amp;title=Memoizing+polymorphic+functions+%26%238211%3B+part+two&amp;description=Part+one+of+this+series+introduced+the+problem+of+memoizing+functions+involving+polymorphic+recursion.+The+caching+data+structures+used+in+memoization+typically+handle+only+one+type+of+argument+at+a...&amp;tags=memoization%2Cpolymorphism%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Memoizing polymorphic functions &#8211; part one</title>
		<link>http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one</link>
		<comments>http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one#comments</comments>
		<pubDate>Thu, 11 Jun 2009 00:36:34 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[polymorphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=88</guid>
		<description><![CDATA[Memoization takes a function and gives back a semantically equivalent function that reuses rather than recomputes when applied to the same argument more than once. Variations include not-quite-equivalence due to added strictness, and replacing value equality with pointer equality. Memoization is often packaged up polymorphically: memo :: (???) =&#62; (k -&#62; v) -&#62; (k -&#62; [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Memoizing polymorphic functions - part one
Tags: memoization, polymorphism

URL: http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one/

-->

<!-- references -->

<!-- teaser -->

<p>Memoization takes a function and gives back a semantically equivalent function that reuses rather than recomputes when applied to the same argument more than once.
Variations include <a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/#comment-10166">not-quite-equivalence due to added strictness</a>, and replacing value equality with pointer equality.</p>

<p>Memoization is often packaged up polymorphically:</p>

<pre><code>memo :: (???) =&gt; (k -&gt; v) -&gt; (k -&gt; v)
</code></pre>

<p>For pointer-based (&#8220;lazy&#8221;) memoization, the type constraint (&#8220;???&#8221;) is empty.
For equality-based memoization, we&#8217;d need at least <code>Eq k</code>, and probably <code>Ord k</code> or <a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post"><code>HasTrie k</code></a> for efficient lookup (in a finite map or a possibly infinite <a href="http://conal.net/blog/tag/memoization/">memo trie</a>).</p>

<p>Although <code>memo</code> is polymorphic, its argument is a <em>monomorphic</em> function.
Implementations that use maps or tries exploit that monomorphism in that they use a type like <code>Map k v</code> or <code>Trie k v</code>.
Each map or trie is built around a particular (monomorphic) type of keys.
That is, a single map or trie does not mix keys of different types.</p>

<p>Now I find myself wanting to memoize <em>polymorphic</em> functions, and I don&#8217;t know how to do it.</p>


<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-88"></span></p>

<h3>Flavors of polymorphism</h3>

<p>If a recursively defined function <code>f</code> is polymorphic, then the recursion may still be monomorphic.
That is, recursive calls may be restricted to the same type instance as the parent call.
Most recursive polymorphic functions fit this form, because most polymorphic recursive data types are &#8220;regular&#8221;, meaning that a polymorphic data type is included in itself only at the same type instance.
For instance, the usual polymorphic lists and trees are regular.</p>
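<p>For contrast, here is a minimal non-regular (&#8220;nested&#8221;) type, a sketch of my own rather than anything from the libraries discussed: the recursive occurrence is at type <code>(a,a)</code> rather than <code>a</code>, so any recursive function over it needs an explicit signature and uses polymorphic recursion.</p>

```haskell
-- A non-regular data type: a Nest a contains a Nest (a, a).
data Nest a = Nil | Cons a (Nest (a, a))

-- Counting leaves requires polymorphic recursion (hence the signature):
-- the recursive call is at type Nest (a, a), not Nest a.
len :: Nest a -> Int
len Nil         = 0
len (Cons _ xs) = 1 + 2 * len xs

main :: IO ()
main = print (len (Cons 1 (Cons (2, 3) Nil)))   -- 3
```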

<h3>Example: GADTs</h3>

<p>Among other places, non-regular, or &#8220;<a href="ftp://ftp.kestrel.edu/pub/papers/meertens/nest5.ps">nested data types</a>&#8221; arise in statically typed encodings of typed languages.
For instance, here&#8217;s a GADT (generalized algebraic data type) for typed lambda calculus expressions:</p>

<pre><code>-- Variables
data V a = V String (Type a)

-- Expressions
data E :: * -&gt; * where
  Lit  :: a -&gt; E a                      -- literal
  Var  :: V  a -&gt; E a                   -- variable
  (:^) :: E (a -&gt; b) -&gt; E a -&gt; E b      -- application
  Lam  :: V a -&gt; E b -&gt; E (a -&gt; b)      -- abstraction
</code></pre>

<p>The <code>Type</code> type is sort of like <a href="http://haskell.org/ghc/docs/latest/html/libraries/base/Data-Typeable.html#v:TypeRep"><code>TypeRep</code></a>, except that it is statically typed.</p>

<pre><code>data Type :: * -&gt; * where
  Bool   :: Type Bool
  Float  :: Type Float
  ...
  (:*:)  :: Type a -&gt; Type b -&gt; Type (a,b)
  (:-&gt;:) :: Type a -&gt; Type b -&gt; Type (a-&gt;b)
</code></pre>

<p>These GADTs (<code>E</code> and <code>Type</code>) are both non-regular, and so recursive functions over them will involve more than one argument type.</p>
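<p>A tiny evaluator over a cut-down <code>E</code> (just literals and application, omitting <code>Var</code> and <code>Lam</code> for this sketch) shows the non-regular recursion concretely: the recursive calls in the application case are at types <code>E (a -&gt; b)</code> and <code>E a</code>, not <code>E b</code>.</p>

```haskell
{-# LANGUAGE GADTs #-}

-- Cut-down version of E for illustration (omits Var and Lam)
data E a where
  Lit  :: a -> E a
  (:^) :: E (a -> b) -> E a -> E b

eval :: E a -> a
eval (Lit a)  = a
eval (f :^ x) = eval f (eval x)   -- recursion at two other type instances

main :: IO ()
main = print (eval (Lit (+ 1) :^ Lit (41 :: Int)))   -- 42
```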

<p>So how can we memoize?</p>

<h3>A first try</h3>

<p>Let&#8217;s consider a specific case of polymorphic functions:</p>

<pre><code>type k :--&gt; v = ∀ a. HasType a =&gt; k a -&gt; v a
</code></pre>

<p><code>HasType</code> is to <code>Typeable</code> as <code>Type</code> is to <code>TypeRep</code>.
The <code>memo</code> implementation I&#8217;m playing with relies on <code>HasType</code>.</p>

<p>The memoizer can have type</p>

<pre><code>memo :: (k :--&gt; v) -&gt; (k :--&gt; v)
</code></pre>

<p>which uses rank 2 polymorphism (because of the argument type&#8217;s <code>∀</code>, which cannot be moved to the outside).</p>

<p>My implementation of <code>memo</code> is similar to the discussion in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.34.1948" title="Paper by Simon Peyton Jones, Simon Marlow, and Conal Elliott">Stretching the storage manager: weak pointers and stable names in Haskell</a></em>, but modernized a bit and adapted for polymorphism.
It uses an <a href="http://hackage.haskell.org/packages/archive/containers/0.2.0.1/doc/html/Data-IntMap.html"><code>IntMap</code></a> of lists of pairs of stable names (akin to pointers) for arguments and values for results.
The idea is to first use <a href="http://haskell.org/ghc/docs/latest/html/libraries/base/System-Mem-StableName.html#v:hashStableName"><code>hashStableName</code></a> to get an <code>Int</code> to use as an <code>IntMap</code> key, and then linearly traverse the resulting list of binding pairs, comparing stable keys for equality.
(<code>StableName</code> has <code>Eq</code> but not <code>Ord</code>.)
Although <code>hashStableName</code> can map different stable names to the same hash value, collisions are rare, so the <code>StableBind</code> lists rarely have more than one element and the linear search is cheap.</p>

<pre><code>type SM k v = I.IntMap [StableBind k v]  -- Stable map

data StableBind k v = ∀ a. HasType a =&gt; SB (StableName (k a)) (v a)
</code></pre>

<p>The reason for hiding the type parameter <code>a</code> in <code>StableBind</code> is so that different bindings in a single list can be for different types.</p>

<p>The key tricky bit is managing static typing while searching for a particular <code>StableName a</code> in a <code>[StableBind]</code>.
Here&#8217;s my implementation:</p>

<pre><code>blookup :: ∀ k v a. HasType a =&gt;
           StableName (k a) -&gt; [StableBind k v] -&gt; Maybe (v a)
blookup stk = look
 where
   look :: [StableBind k v] -&gt; Maybe (v a)
   look [] = Nothing
   look (SB stk' v : binds') 
     | Just Refl &lt;- tya `tyEq` typeOf2 stk', stk == stk' = Just v
     | otherwise                                         = look binds'
   tya :: Type a
   tya = typeT
</code></pre>

<p>The crucial magic bit is</p>

<pre><code>tyEq :: Type a -&gt; Type b -&gt; Maybe (a :=: b)
</code></pre>

<p>where <code>a :=: b</code> represents a proof that the types <code>a</code> and <code>b</code> are the same type.
The proof type has a simple GADT representation:</p>

<pre><code>data (:=:) :: * -&gt; * -&gt; * where Refl :: a :=: a
</code></pre>

<p>This simple type definition ensures that only valid type-equality proofs can exist.
Well, except for ⊥.
The guard&#8217;s pattern match with <code>Just Refl</code> will force evaluation, so that ⊥ can&#8217;t sneak by us.
That match also informs the type-checker that <code>stk</code> and <code>stk'</code> are stable names <em>of the same type</em> in that clause, which then makes <code>stk == stk'</code> be well-typed, and makes <code>Just v</code> have the required type, i.e., <code>Maybe (v a)</code>.</p>
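<p>Here is a self-contained sketch of the proof type at work, with a two-constructor stand-in for <code>Type</code> (the names <code>TBool</code> and <code>TInt</code> are mine). Matching on <code>Refl</code> teaches the type-checker that the two indices coincide.</p>

```haskell
{-# LANGUAGE GADTs, TypeOperators #-}

-- Stand-in for the Type GADT, with just two base types
data Type a where
  TBool :: Type Bool
  TInt  :: Type Int

-- The type-equality proof
data a :=: b where
  Refl :: a :=: a

tyEq :: Type a -> Type b -> Maybe (a :=: b)
tyEq TBool TBool = Just Refl
tyEq TInt  TInt  = Just Refl
tyEq _     _     = Nothing

main :: IO ()
main = case tyEq TInt TInt of
  Just Refl -> print (1 + 41 :: Int)  -- a ~ b in this branch, so this type-checks
  Nothing   -> putStrLn "types differ"
```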

<p>Finally, the <code>typeOf2</code> function is a simple helper that peels off two type constructors and extracts a <code>Type</code>:</p>

<pre><code>typeOf2 :: HasType a =&gt; g (f a) -&gt; Type a
</code></pre>

<h3>Wishing for more</h3>

<p>The type of <code>memo</code> above is too restrictive for my uses.
It only handles polymorphic functions of type <code>k a -&gt; v a</code>, and only with the single constraint <code>HasType a</code>.</p>

<p>The reason I want lazy memoization now is that I&#8217;m compressing expressions to maximize representation sharing, as John Hughes described in <em><a href="http://www.cse.chalmers.se/~rjmh/Papers/hughes_85_lazy.pdf" title="Paper by John Hughes">Lazy Memo-functions</a></em>.
Once sharing is maximized, pointer-based memoization works better, because equal values are pointer-equal.
To compress an expression, simply use a memoized copy function, as John suggested.</p>

<pre><code>compress :: HasType a =&gt; E a -&gt; E a
compress e = mcopy e
 where
   mcopy, copy :: HasType b =&gt; E b -&gt; E b
   -- Memo version
   mcopy = memo copy
   -- Copier, with memo-copied components
   copy (u :^ v)  = appM (mcopy u) (mcopy v)
   copy (Lam v b) = lamM v (mcopy b)
   copy e         = e
   -- memoized constructors
   appM :: (HasType a, HasType b) =&gt; E (a -&gt; b) -&gt; E a -&gt; E b
   appM = memo2 (:^)
   lamM :: (HasType a, HasType b) =&gt; V a -&gt; E b -&gt; E (a -&gt; b)
   lamM = memo2 Lam
</code></pre>

<p>The <code>memo2</code> function is defined in terms of <code>memo</code>, using some <code>newtype</code> trickery.
Its type:</p>

<pre><code>memo2 :: HasType a =&gt;
         (k a -&gt; l a -&gt; v a) -&gt; (k a -&gt; l a -&gt; v a)
</code></pre>

<p>But, sigh, the <code>(:^)</code> and <code>Lam</code> constructors I&#8217;m trying to memoize do not have the required types.
My higher-order-polymorphic memo functions do not have flexible enough types.</p>

<p>And this is where I&#8217;m stuck.</p>

<p>I&#8217;d appreciate your ideas and suggestions.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=88&amp;md5=5c6fe0b3b2071be4213b1ff342689402"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/memoizing-polymorphic-functions-part-one/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fmemoizing-polymorphic-functions-part-one&amp;language=en_GB&amp;category=text&amp;title=Memoizing+polymorphic+functions+%26%238211%3B+part+one&amp;description=Memoization+takes+a+function+and+gives+back+a+semantically+equivalent+function+that+reuses+rather+than+recomputes+when+applied+to+the+same+argument+more+than+once.+Variations+include+not-quite-equivalence+due+to...&amp;tags=memoization%2Cpolymorphism%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Simpler, more efficient, functional linear maps</title>
		<link>http://conal.net/blog/posts/simpler-more-efficient-functional-linear-maps</link>
		<comments>http://conal.net/blog/posts/simpler-more-efficient-functional-linear-maps#comments</comments>
		<pubDate>Mon, 20 Oct 2008 01:26:05 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[type family]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=57</guid>
		<description><![CDATA[A previous post described a data type of functional linear maps. As Andy Gill pointed out, we had a heck of a time trying to get good performance. This note describes a new representation that is very simple and much more efficient. It&#8217;s terribly obvious in retrospect but took me a good while to stumble [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p
>A <a href="http://conal.net/blog/posts/functional-linear-maps/" title="Blog post: &quot;Functional linear maps&quot;"
  >previous post</a
  > described a data type of functional linear maps. As <a href="http://www.unsafeperformio.com/index.php" title="Andy Gill's home page"
  >Andy Gill</a
  > <a href="http://blog.unsafeperformio.com/?p=23" title="Blog post: &quot;Performance problems with functional representation of derivatives&quot;"
  >pointed out</a
  >, we had a heck of a time trying to get good performance. This note describes a new representation that is <em
  >very simple</em
  > and much more efficient. It&#8217;s terribly obvious in retrospect but took me a good while to stumble onto.</p
>

<p
>The Haskell module described here is part of the <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;"
  >vector-space library</a
  > (version 0.5 or later) and requires ghc version 6.10 or better (for associated types).</p
>

<p
><strong
  >Edits</strong
  >:</p
>

<ul
><li
  >2008-11-09: Changed remarks about versions. The vector-space version 0.5 depends on ghc 6.10.</li
  ><li
  >2008-10-21: Fixed the <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page: &quot;vector-space&quot;"
    >vector-space library</a
    > link in the teaser.</li
  ></ul
>

<p><span id="more-57"></span></p>

<div id="linear-maps"
><h3
  >Linear maps</h3
  ><p
  >Semantically, a <em
    >linear map</em
    > is a function <code
    >f &#8759; a &#8594; b</code
    > such that, for all scalar values <code
    >s</code
    > and &quot;vectors&quot; <code
    >u, v &#8759; a</code
    >, the following properties hold:</p
  ><div class=math-inset>
<p
  ><span class="math"
    ><em
      >f</em
      > (<em
      >s</em
      > ⋅ <em
      >u</em
      >) = <em
      >s</em
      > ⋅ <em
      >f</em
      > <em
      >u</em
      ></span
    ></p
  ><p
  ><span class="math"
    ><em
      >f</em
      > (<em
      >u</em
      > + <em
      >v</em
      >) = <em
      >f</em
      > <em
      >u</em
      > + <em
      >f</em
      > <em
      >v</em
      ></span
    ></p
  ></div>
<p
  >By repeated application of these properties,</p
  ><div class=math-inset>
<p
  ><span class="math"
    ><em
      >f</em
      > (<em
      >s</em
      ><sub
      >1</sub
      > ⋅ <em
      >u</em
      ><sub
      >1</sub
      > + ⋯ + <em
      >s</em
      ><sub
      ><em
    >n</em
    ></sub
      > ⋅ <em
      >u</em
      ><sub
      ><em
    >n</em
    ></sub
      >) = <em
      >s</em
      ><sub
      >1</sub
      > ⋅ <em
      >f</em
      > <em
      >u</em
      ><sub
      >1</sub
      > + ⋯ + <em
      >s</em
      ><sub
      ><em
    >n</em
    ></sub
      > ⋅ <em
      >f</em
      > <em
      >u</em
      ><sub
      ><em
    >n</em
    ></sub
      ></span
    ></p
  ></div>
<p
  >Taking the <em
    >u<sub>i</sub></em
    > as basis vectors, this form implies that a linear function is determined by its behavior on any <a href="http://en.wikipedia.org/wiki/Basis_(linear_algebra)" title="Wikipedia article"
    >basis</a
    > of its domain type.</p
  ><p
  >Therefore, a linear function can be represented simply as a function from a basis, using the representation described in <a href="http://conal.net/blog/posts/vector-space-bases-via-type-families/" title="Blog post: &quot;Vector space bases via type families&quot;"
    ><em
      >Vector space bases via type families</em
      ></a
    >.</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >type</span
      > u <span class="fu"
      >:-*</span
      > v <span class="fu"
      >=</span
      > <span class="dt"
      >Basis</span
      > u &#8594; v<br
       /></code
    ></pre
  ><p
  >The semantic function converts from <code
    >(u :-* v)</code
    > to <code
    >(u &#8594; v)</code
    >. It decomposes a source vector into its coordinates, applies the basis function to basis representations, and linearly combines the results.</p
  ><pre class="sourceCode haskell"
  ><code
    >lapply <span class="dv"
      >&#8759;</span
      > ( <span class="dt"
      >VectorSpace</span
      > u, <span class="dt"
      >VectorSpace</span
      > v<br
       />         , <span class="dt"
      >Scalar</span
      > u <span class="fu"
      >~</span
      > <span class="dt"
      >Scalar</span
      > v, <span class="dt"
      >HasBasis</span
      > u ) &#8658;<br
       />         (u <span class="fu"
      >:-*</span
      > v) &#8594; (u &#8594; v)<br
       />lapply lm <span class="fu"
      >=</span
      > &#955; u &#8594; sumV [s <span class="fu"
      >*^</span
      > lm b <span class="fu"
      >|</span
      > (b,s) &#8592; decompose u]<br
       /></code
    ></pre
  ><p
  >or</p
  ><pre class="sourceCode haskell"
  ><code
    >lapply lm <span class="fu"
      >=</span
      > linearCombo &#8728; <span class="fu"
      >fmap</span
      > (first lm) &#8728; decompose<br
       /></code
    ></pre
  ><p
  >The reverse function is easier. Convert a function <code
    >f</code
    >, presumed linear, to a linear map representation:</p
  ><pre class="sourceCode haskell"
  ><code
    >linear <span class="dv"
      >&#8759;</span
      > (<span class="dt"
      >VectorSpace</span
      > u, <span class="dt"
      >VectorSpace</span
      > v, <span class="dt"
      >HasBasis</span
      > u) &#8658;<br
       />         (u &#8594; v) &#8594; (u <span class="fu"
      >:-*</span
      > v)<br
       /></code
    ></pre
  ><p
  >It suffices to apply <code
    >f</code
    > to basis values:</p
  ><pre class="sourceCode haskell"
  ><code
    >linear f <span class="fu"
      >=</span
      > f &#8728; basisValue<br
       /></code
    ></pre
  ></div
>
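<p>A concrete, monomorphic sketch may help: take the domain to be pairs of <code>Double</code>s, with <code>Bool</code> as the basis type, so that a linear map to <code>Double</code> is just a function <code>Bool -&gt; Double</code>. The names below mirror the vector-space library's operations but are defined locally for illustration; this is not the library code.</p>

```haskell
-- Monomorphic sketch of the linear-map representation,
-- with V2 = (Double, Double) and basis type Bool.
type V2 = (Double, Double)

decompose :: V2 -> [(Bool, Double)]
decompose (x, y) = [(False, x), (True, y)]

basisValue :: Bool -> V2
basisValue False = (1, 0)
basisValue True  = (0, 1)

type LinMap = Bool -> Double          -- plays the role of V2 :-* Double

linear :: (V2 -> Double) -> LinMap    -- sample f on basis vectors only
linear f = f . basisValue

lapply :: LinMap -> (V2 -> Double)    -- recombine linearly
lapply lm u = sum [s * lm b | (b, s) <- decompose u]

main :: IO ()
main = print (lapply (linear (\(x, y) -> 3 * x + 4 * y)) (2, 5))   -- 26.0
```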

<div id="memoization"
><h3
  >Memoization</h3
  ><p
>The idea of the linear map representation is to reconstruct an entire (linear) function out of just a few samples. In other words, we can make a very small sampling of a function's domain, and re-use those values in order to compute the function's value at <em
    >all</em
    > domain values. As implemented above, however, this trick makes function application more expensive, not less. If <code
    >lm = linear f</code
    >, then each use of <code
    >lapply lm</code
    > can apply <code
    >f</code
    > to the value of every basis element, and then linearly combine results.</p
  ><p
  >A simple trick fixes this efficiency problem: <em
    >memoize</em
    > the linear map. We could do the memoization privately, e.g.,</p
  ><pre class="sourceCode haskell"
  ><code
    >linear f <span class="fu"
      >=</span
      > memo (f &#8728; basisValue)<br
       /></code
    ></pre
  ><p
  >If <code
    >lm = linear f</code
    >, then no matter how many times <code
    >lapply lm</code
    > is applied, the function <code
    >f</code
    > can only get applied as many times as the dimension of the domain of <code
    >f</code
    >.</p
  ><p
  >However, there are several other ways to make linear maps, and it would be easy to forget to memoize each combining form. So, instead of the function representation above, I ensure that the function be memoized by representing it as a <a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="Blog post: &quot;Elegant memoization with functional memo tries&quot;"
    >memo trie</a
    >.</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >type</span
      > u <span class="fu"
      >:-*</span
      > v <span class="fu"
      >=</span
      > <span class="dt"
      >Basis</span
      > u &#8603; v<br
       /></code
    ></pre
  ><p
  >The conversion functions <code
    >linear</code
    > and <code
    >lapply</code
    > need just a little tweaking. Split <code
    >memo</code
    > into its definition <code
    >untrie &#8728; trie</code
    >, and then move the second phase (<code
    >untrie</code
    >) into <code
    >lapply</code
    >. We'll also have to add <code
    >HasTrie</code
    > constraints:</p
  ><pre class="sourceCode haskell"
  ><code
    >linear <span class="dv"
      >&#8759;</span
      > ( <span class="dt"
      >VectorSpace</span
      > u, <span class="dt"
      >VectorSpace</span
      > v<br
       />         , <span class="dt"
      >HasBasis</span
      > u, <span class="dt"
      >HasTrie</span
      > (<span class="dt"
      >Basis</span
      > u) ) &#8658;<br
       />         (u &#8594; v) &#8594; (u <span class="fu"
      >:-*</span
      > v)<br
       />linear f <span class="fu"
      >=</span
      > trie (f &#8728; basisValue)<br
       /><br
       />lapply <span class="dv"
      >&#8759;</span
      > ( <span class="dt"
      >VectorSpace</span
      > u, <span class="dt"
      >VectorSpace</span
      > v, <span class="dt"
      >Scalar</span
      > u <span class="fu"
      >~</span
      > <span class="dt"
      >Scalar</span
      > v<br
       />         , <span class="dt"
      >HasBasis</span
      > u, <span class="dt"
      >HasTrie</span
      > (<span class="dt"
      >Basis</span
      > u) ) &#8658;<br
       />         (u <span class="fu"
      >:-*</span
      > v) &#8594; (u &#8594; v)<br
       />lapply lm <span class="fu"
      >=</span
      > linearCombo &#8728; <span class="fu"
      >fmap</span
      > (first (untrie lm)) &#8728; decompose<br
       /></code
    ></pre
  ><p
  >Now we can build up linear maps conveniently and efficiently by using the operations on memo tries shown in <a href="http://conal.net/blog/posts/composing-memo-tries/" title="blog post"
    ><em
      >Composing memo tries</em
      ></a
    >. For instance, suppose that <code
    >h</code
    > is a linear function of two arguments (linear in <em
    >both</em
>, not in <em
    >each</em
    >) and <code
    >m</code
    > and <code
    >n</code
    > are two linear maps. Then <code
    >liftA2 h m n</code
    > is the linear function that applies <code
    >h</code
    > to the results of <code
    >m</code
    > and <code
    >n</code
    >.</p
  ><pre class="sourceCode haskell"
  ><code
    >lapply (liftA2 h m n) a <span class="fu"
      >=</span
      > h (lapply m a) (lapply n a)<br
       /></code
    ></pre
  ><p
  >Exploiting the applicative functor instance for functions, we get another formulation:</p
  ><pre class="sourceCode haskell"
  ><code
    >lapply (liftA2 h m n) <span class="fu"
      >=</span
      > liftA2 h (lapply m) (lapply n)<br
       /></code
    ></pre
  ><p
  >In other words, the meaning of a <code
    >liftA2</code
    > is the <code
    >liftA2</code
    > of the meanings, as discussed in <a href="http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms" title="blog post"
    ><em
      >Simplifying semantics with type class morphisms</em
      ></a
    >.</p
  ></div
>
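<p>The function-applicative reading of <code>liftA2</code> used in that last law can be checked directly: for the <code>(-&gt;) r</code> instance, <code>liftA2 h f g</code> applies <code>f</code> and <code>g</code> to the same argument and combines the results with <code>h</code>.</p>

```haskell
import Control.Applicative (liftA2)

-- The Applicative instance for functions: liftA2 h f g a = h (f a) (g a)
main :: IO ()
main = print (liftA2 (+) (* 2) (+ 10) 5)   -- (5*2) + (5+10) = 25
```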
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=57&amp;md5=a424d64bbbfc4572f7de631b69af7b6f"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/simpler-more-efficient-functional-linear-maps/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fsimpler-more-efficient-functional-linear-maps&amp;language=en_GB&amp;category=text&amp;title=Simpler%2C+more+efficient%2C+functional+linear+maps&amp;description=A+previous+post+described+a+data+type+of+functional+linear+maps.+As+Andy+Gill+pointed+out%2C+we+had+a+heck+of+a+time+trying+to+get+good+performance.+This+note...&amp;tags=linear+map%2Cmath%2Cmemoization%2Ctrie%2Ctype+family%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Composing memo tries</title>
		<link>http://conal.net/blog/posts/composing-memo-tries</link>
		<comments>http://conal.net/blog/posts/composing-memo-tries#comments</comments>
		<pubDate>Thu, 16 Oct 2008 02:18:12 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[type class]]></category>
		<category><![CDATA[type class morphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=55</guid>
		<description><![CDATA[The post Elegant memoization with functional memo tries showed a simple type of search tries and their use for functional memoization of functions. This post provides some composition tools for memo tries, whose definitions are inevitable, in that they are determined by the principle presented in Simplifying semantics with type class morphisms. Compositional semantics and [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Composing memo tries

Tags: memoization, trie, type class, type class morphism

URL: http://conal.net/blog/posts/composing-memo-tries/

-->

<!-- references -->

<!-- teaser -->

<p>The post <em><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">Elegant memoization with functional memo tries</a></em> showed a simple type of search tries and their use for functional memoization of functions.
This post provides some composition tools for memo tries, whose definitions are <em>inevitable</em>, in that they are determined by the principle presented in <em><a href="http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms" title="blog post">Simplifying semantics with type class morphisms</a></em>.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-55"></span></p>

<h3>Compositional semantics and type class morphisms</h3>

<p>The discipline of denotational semantics defines meaning functions <em>compositionally</em>, i.e., the meaning of a construct must be some function of the meanings of its components.</p>

<p><a href="http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms" title="blog post">Type class morphisms</a> make the denotational discipline even more specific:</p>

<blockquote>
  <p>The meaning of each method corresponds to the same method for the meaning.</p>
</blockquote>

<p>For instance,</p>

<pre><code>meaning (a `mappend` b) == meaning a `mappend` meaning b
</code></pre>
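
<p>As a concrete illustration of this principle (a minimal sketch, not from the post: <code>DList</code>, <code>meaning</code>, and <code>fromList</code> are hypothetical names), consider difference lists, whose meaning is an ordinary list. The <code>Monoid</code> instance is chosen exactly so that <code>meaning</code> is a monoid morphism:</p>

```haskell
-- Difference lists: a list represented as a prepend function.
newtype DList a = DList ([a] -> [a])

-- The semantic function: the meaning of a DList is a plain list.
meaning :: DList a -> [a]
meaning (DList f) = f []

fromList :: [a] -> DList a
fromList xs = DList (xs ++)

-- Composition of prepend functions; chosen so that
-- meaning (a `mappend` b) == meaning a `mappend` meaning b.
instance Semigroup (DList a) where
  DList f <> DList g = DList (f . g)

instance Monoid (DList a) where
  mempty = DList id

main :: IO ()
main = do
  let a = fromList [1, 2 :: Int]
      b = fromList [3]
  print (meaning (a <> b) == meaning a <> meaning b)
  print (meaning (mempty :: DList Int) == ([] :: [Int]))
```
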

<h3>Memo tries, semantics, and morphisms</h3>

<p>The semantic function for a memo trie is <code>untrie</code>, which converts a trie (back) to a function:</p>

<pre><code>untrie :: (a :-&gt;: b) -&gt; (a  -&gt;  b)
</code></pre>

<p>Let&#8217;s look at the consequences of requiring that <code>untrie</code> be a morphism over <code>Monoid</code>, <code>Functor</code>, <code>Applicative</code>, <code>Monad</code>, <code>Category</code>, and <code>Arrow</code>, i.e.,</p>

<pre><code>untrie mempty          == mempty
untrie (s `mappend` t) == untrie s `mappend` untrie t

untrie (fmap f t)      == fmap f (untrie t)

untrie (pure a)        == pure a
untrie (tf &lt;*&gt; tx)     == untrie tf &lt;*&gt; untrie tx

untrie (return a)      == return a
untrie (u &gt;&gt;= k)       == untrie u &gt;&gt;= untrie . k

untrie id              == id
untrie (s . t)         == untrie s . untrie t

untrie (arr f)         == arr f
untrie (first t)       == first (untrie t)
</code></pre>

<p>These morphism properties imply that all of the expected laws hold, assuming that we interpret equality semantically (or observationally).
For instance,</p>

<pre><code>untrie (mempty `mappend` a)
  == untrie mempty `mappend` untrie a
  == mempty `mappend` untrie a
  == untrie a

untrie (fmap f (fmap g a))
  == fmap f (untrie (fmap g a))
  == fmap f (fmap g (untrie a))
  == fmap (f.g) (untrie a)
  == untrie (fmap (f.g) a)
</code></pre>

<h3>Deriving instances</h3>

<p>The implementation instances follow from applying <code>trie</code> to both sides of each of these morphism laws, using the property <code>trie . untrie == id</code>.</p>

<pre><code>instance (HasTrie a, Monoid b) =&gt; Monoid (a :-&gt;: b) where
  mempty        = trie mempty
  s `mappend` t = trie (untrie s `mappend` untrie t)

instance HasTrie a =&gt; Functor ((:-&gt;:) a) where
  fmap f t      = trie (fmap f (untrie t))

instance HasTrie a =&gt; Applicative ((:-&gt;:) a) where
  pure b        = trie (pure b)
  tf &lt;*&gt; tx     = trie (untrie tf &lt;*&gt; untrie tx)

instance HasTrie a =&gt; Monad ((:-&gt;:) a) where
  return a      = trie (return a)
  u &gt;&gt;= k       = trie (untrie u &gt;&gt;= untrie . k)

instance Category (:-&gt;:) where
  id            = trie id
  s . t         = trie (untrie s . untrie t)

instance Arrow (:-&gt;:) where
  arr f         = trie (arr f)
  first t       = trie (first (untrie t))
</code></pre>

<p>Correctness of these instances follows by applying <code>untrie</code> to each side of each definition and using the property <code>untrie . trie == id</code>.</p>
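
<p>To make this concrete, here is a toy check of the <code>Functor</code> case (an illustrative sketch with a hypothetical <code>Bool</code>-keyed trie standing in for <code>Bool :-&gt;: b</code>, not the post&#8217;s <code>HasTrie</code> machinery):</p>

```haskell
-- A toy trie keyed by Bool: one stored value per domain element.
data BoolTrie b = BoolTrie b b  -- values at False and at True

trie :: (Bool -> b) -> BoolTrie b
trie f = BoolTrie (f False) (f True)

untrie :: BoolTrie b -> (Bool -> b)
untrie (BoolTrie e t) b = if b then t else e

-- The derived instance from the post: fmap defined via trie/untrie.
instance Functor BoolTrie where
  fmap g t = trie (g . untrie t)

main :: IO ()
main = do
  let t = trie fromEnum  -- a memoized fromEnum :: Bool -> Int
  -- Check the morphism law untrie (fmap f t) == fmap f (untrie t) pointwise.
  print [ untrie (fmap (+ 1) t) b == (+ 1) (fromEnum b) | b <- [False, True] ]
```
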

<p>The <code>Category</code> and <code>Arrow</code> instances don&#8217;t quite work, however, because of necessary but disallowed <code>HasTrie</code> constraints on the domain type.
<a href="http://www.cse.chalmers.se/~rjmh/" title="Home page of John Hughes">John Hughes</a> pointed out a similar problem near the end of <em><a href="http://citeseer.ist.psu.edu/hughes98generalising.html" title="Paper by John Hughes, appearing Science of Computer Programming, 2000">Generalising Monads to Arrows</a></em>, saying &#8220;I consider this to be a defect of the Haskell type system, which hopefully can be corrected in a future version of the language.&#8221;</p>
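
<p>One direction a fix can take (a hedged sketch using GHC&#8217;s <code>ConstraintKinds</code>, which postdates this post; the class and method names here are hypothetical) is to give the category class an associated constraint on objects, so that a trie instance could demand <code>HasTrie</code> where the standard <code>Category</code> class cannot:</p>

```haskell
{-# LANGUAGE ConstraintKinds, KindSignatures, TypeFamilies #-}

import Data.Kind (Constraint, Type)

-- A category class with an associated constraint on objects.
-- A trie instance could then set, e.g., Ok k a = HasTrie a.
class CategoryOk (k :: Type -> Type -> Type) where
  type Ok k a :: Constraint
  idOk  :: Ok k a => k a a
  (.:.) :: (Ok k a, Ok k b, Ok k c) => k b c -> k a b -> k a c

-- Ordinary functions satisfy it with the trivial constraint.
newtype Fun a b = Fun (a -> b)

instance CategoryOk Fun where
  type Ok Fun a = ()
  idOk = Fun id
  Fun g .:. Fun f = Fun (g . f)
```
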
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=55&amp;md5=be1c52f1c6ee6a99c5bb04b48f7debf4"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/composing-memo-tries/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fcomposing-memo-tries&amp;language=en_GB&amp;category=text&amp;title=Composing+memo+tries&amp;description=The+post+Elegant+memoization+with+functional+memo+tries+showed+a+simple+type+of+search+tries+and+their+use+for+functional+memoization+of+functions.+This+post+provides+some+composition+tools+for...&amp;tags=memoization%2Ctrie%2Ctype+class%2Ctype+class+morphism%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
