Non-strict memoization

I’ve written a few posts about functional memoization. In one of them, Luke Palmer commented that the memoization methods are correct only for strict functions, which I had not noticed before. In this note, I correct this flaw, extending correct memoization to non-strict functions as well. The semantic notion of least upper bound (which can be built from unambiguous choice) plays a crucial role.

Edits:

  • 2010-07-13: Fixed the non-strict memoization example to use an argument of undefined (⊥) as intended.
  • 2010-07-23: Changed spelling from “nonstrict” to the much more popular “non-strict”.
  • 2011-02-16: Fixed minor typo. (“constraint on result” → “constraint on the result type”)

What is memoization?

In purely functional programming, applying a function to equal arguments gives equal results. However, the second application is as costly as the first one. The idea of memoization, invented by Donald Michie in the 1960s, is to cache the results of applications and reuse those results in subsequent applications. Memoization is a handy technique to know, as it can dramatically reduce expense while having little impact on an algorithm’s simplicity.

Early implementations of memoization were imperative. Some sort of table (e.g., a hash table) is initialized as empty. Whenever the memoized function is applied, the argument is looked up in the table. If present, the corresponding result is returned. Otherwise, the original function is applied to the argument, and the result is stored in the table, keyed by the argument.
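As a concrete reference point, here is a minimal sketch of this imperative scheme in Haskell (the name memoIO and the use of Data.Map are my own illustration, not part of the classic formulations):

```haskell
import Data.IORef
import qualified Data.Map as Map

-- A minimal sketch of imperative memoization: the table lives in a
-- mutable IORef, so the memoized function must run in IO.
memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
memoIO f = do
  ref <- newIORef Map.empty
  return $ \x -> do
    table <- readIORef ref
    case Map.lookup x table of
      Just y  -> return y                          -- cache hit
      Nothing -> do
        let y = f x                                -- cache miss: apply f
        writeIORef ref (Map.insert x y table)      -- store for next time
        return y
```

Note how the caching intrudes on the types, forcing both construction and use of the memoized function into IO.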

Functional memoization

Can memoization be implemented functionally (without assignment)? One might argue that it cannot, considering that we want the table structure to get filled in destructively, as the memoized function is sampled.

However, this argument is flawed (like many informal arguments of impossibility). Although we want a mutation to happen, we needn’t ask for one explicitly. Instead, we can exploit the mutation that happens inside the implementation of laziness.

For instance, consider memoizing a function of booleans:

memoBool :: (Bool -> b) -> (Bool -> b)

In this case, the “table” can simply be a pair, with one slot for the argument False and one for True:

type BoolTable a = (a,a)
 
memoBool f = lookupBool (f False, f True)
 
lookupBool :: BoolTable b -> Bool -> b
lookupBool (f,_) False = f
lookupBool (_,t) True  = t

For instance, consider this simple function and a memoized version:

f1 b = if b then 3 else 4
 
s1 = memoBool f1

The memo table will be (f False, f True), i.e., (4,3). Checking that s1 is equivalent to f1:

s1 False ≡ lookupBool (4,3) False ≡ 4 ≡ f1 False
s1 True  ≡ lookupBool (4,3) True  ≡ 3 ≡ f1 True

Other argument types have other table representations, and these table types can be defined systematically and elegantly.

Now, wait a minute! Building an entire table up-front doesn’t sound like the incremental algorithm Michie invented, especially considering that the domain type can be quite large and even infinite. However, in a lazy language, incremental construction of data structures is automatic and pervasive, and infinite data structures are bread & butter. So the computing and updating don’t have to be expressed imperatively.

While lazy construction can be helpful for pairs, it’s essential for infinite tables, as needed for domain types that are enormously large (e.g., Int), and even infinitely large (e.g., Integer, or [Bool]). However, laziness brings to memoization not only a gift, but also a difficulty, namely the challenge of correctly memoizing non-strict functions, as we’ll see next.
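For a taste of how laziness handles an infinite domain, here is a minimal sketch (my own illustrative memoNat, restricted to non-negative Int arguments) that uses an infinite list as the memo table:

```haskell
-- An infinite memo "table" for non-negative Ints: a lazy list whose
-- nth element caches f n. Each entry is computed at most once, on
-- first demand, and then shared by all later lookups.
memoNat :: (Int -> b) -> (Int -> b)
memoNat f = (table !!)
 where table = map f [0 ..]
```

(List indexing is linear-time; a real memo trie does much better, but the lazily materialized table is the same idea.)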

A problem with memoizing non-strict functions

The confirmation above that s1 ≡ f1 has a mistake: it fails to consider a third possible choice of argument, namely ⊥. Let’s check this case now:

s1 ⊥ ≡ lookupBool (4,3) ⊥ ≡ ⊥ ≡ f1 ⊥

The ⊥ case does not show up explicitly in the definition of lookupBool, but is implied by the use of pattern-matching against True and False. For the same reason (in the definition of if-then-else), f1 ⊥ ≡ ⊥, so indeed s1 ≡ f1. The key saving grace here is that f1 is already strict, so the strictness introduced by lookupBool is harmless.

To see how memoization adds strictness, consider memoizing a non-strict function of booleans:

f2 b = 5
 
s2 = memoBool f2

The memo table will be (f False, f True), i.e., (5,5). Checking that s2 is equivalent to f2:

s2 False ≡ lookupBool (5,5) False ≡ 5 ≡ f2 False
s2 True  ≡ lookupBool (5,5) True  ≡ 5 ≡ f2 True

However,

s2 ⊥ ≡ lookupBool (5,5) ⊥ ≡ ⊥

The latter equality is due again to pattern matching against False and True in lookupBool.

In contrast, f2 ⊥ ≡ 5, so s2 ≢ f2, so memoBool does not correctly memoize.

Non-strict memoization

The bug in memoBool comes from ignoring one of the possible boolean values. In a lazy language, Bool has three possible values, not two. A simple solution then might be for the memo table to be a triple instead of a pair:

type BoolTable a = (a,a,a)
 
memoBool h = lookupBool (h ⊥, h False, h True)

Table lookup needs one additional case:

lookupBool :: BoolTable a -> Bool -> a
lookupBool (b,_,_) ⊥     = b
lookupBool (_,f,_) False = f
lookupBool (_,_,t) True  = t

I hope you read my posts with a good deal of open-mindedness, but also with some skepticism. This revised definition of lookupBool is not legitimate Haskell code, and for a good reason. If we could write and run this kind of code, we could solve the halting problem:

halts :: a -> Bool
halts ⊥ = False
halts _ = True

The problem here is not just that ⊥ is not a legitimate Haskell pattern, but more fundamentally that equality with ⊥ is non-computable.

The revised lookupBool function and the halts function violate a fundamental semantic property, namely monotonicity (of information content). Monotonicity of a function h means that

∀ a b. a ⊑ b ⟹ h a ⊑ h b

where “⊑” means has less (or equal) information content, as explained in Merging partial values. In other words, if you tell h more about an argument, it will tell you more about the result, where “more” (really more-or-equal) includes compatibility (no contradiction of previous knowledge).

The halts function is nonmonotonic, since, for instance, ⊥ ⊑ 3, and halts ⊥ ≡ False and halts 3 ≡ True, but False ⋢ True. (False and True are incompatible, i.e., they contradict each other.)

Similarly, the function h = lookupBool (5,3,4) is nonmonotonic, which you can verify by applying it to ⊥ and to False. Although ⊥ ⊑ False, h ⊥ ≡ 5 and h False ≡ 3, but 5 ⋢ 3. Similarly, ⊥ ⊑ True, h ⊥ ≡ 5 and h True ≡ 4, but 5 ⋢ 4.

So this particular memo table gets us into trouble (nonmonotonicity). Are there other memo tables (b,f,t) that lead to monotonic lookup? Re-examining the breakdown shows us a necessary and sufficient condition, which is that b ⊑ f and b ⊑ t.

Look again at the particular use of lookupBool in the definition of memoBool above, and you’ll see that

b ≡ h ⊥
f ≡ h False
t ≡ h True

so the monotonicity condition becomes h ⊥ ⊑ h False and h ⊥ ⊑ h True. This condition holds, thanks to the monotonicity of all computable functions h.

So the triple-based lookupBool can be semantically problematic outside of its motivating context, but never as used in memoBool. That is, the triple-based definition of memoBool correctly specifies the (computable) meaning we want, but isn’t an implementation. How might we correctly implement memoBool?

In Lazier function definitions by merging partial values, I examined the standard Haskell style (inherited from predecessors) of definition by clauses, pointing out how that style is teasingly close to a declarative reading in which each clause is a true equation (possibly conditional). I transformed the standard style into a form with modular, declarative semantics.

Let’s try transforming lookupBool into this modular form:

lookupBool :: BoolTable a -> Bool -> a
lookupBool (b,f,t) = (λ ⊥ → b) ⊔ (λ False → f) ⊔ (λ True → t)

We still have the problem with λ ⊥ → b (nonmonotonicity), but it’s now isolated. What if we broaden the domain from just ⊥ (for which we cannot dependably test) to all arguments, i.e., λ _ → b (i.e., const b)? This latter function is the least one (in the information ordering) that is monotonic and contains all the information present in λ ⊥ → b. (Exercise: prove.) Dissecting this function:

const b ≡ (λ _ → b) ≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b)

So

  const b ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b) ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ False → f) ⊔ (λ True → b) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → (b ⊔ f)) ⊔ (λ True → (b ⊔ t))
≡ (λ ⊥ → b) ⊔ (λ False →      f ) ⊔ (λ True →      t )

under the condition that b ⊑ f and b ⊑ t, which does hold in the context of our use (again by monotonicity of the h in memoBool). Therefore, in this context, we can replace the nonmonotonic λ ⊥ → b with the monotonic const b, while preserving the meaning of memoBool.

Behind the dancing symbols in the proof above lies the insight that we can use the ⊥ case even for non-⊥ arguments, because the result will be subsumed by non-⊥ cases, thanks to the lubs (⊔).

The original two non-⊥ cases can be combined back into their more standard (less modular) Haskell form, and we can revert to our original strict table and lookup function. Our use of ⊔ requires the result type to be ⊔-able.

memoBool :: HasLub b => (Bool -> b) -> (Bool -> b)
 
type BoolTable a = (a,a)
 
memoBool h = const (h ⊥) ⊔ lookupBool (h False, h True)
 
lookupBool :: BoolTable b -> Bool -> b
lookupBool (f,_) False = f
lookupBool (_,t) True  = t

So the differences between our original, too-strict memoBool and this correct one are quite small: the HasLub constraint and the “const (h ⊥) ⊔”.

The HasLub constraint on the result type warns us of a possible loss of generality. Are there types for which we do not know how to ⊔? Primitive types are flat, where ⊔ is equivalent to unamb; and there are HasLub instances for functions, sums, and products. (See Merging partial values.) HasLub could be derived automatically for algebraic data types (labeled sums of products) and trivially for newtype. Perhaps abstract types need some extra thought.
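For flat types, ⊔ can be computed by racing: evaluate both arguments concurrently and take whichever finishes first. Here is a rough sketch of that idea (my own raceFlat, not the actual Data.Lub implementation, and assuming the two arguments agree whenever both are defined):

```haskell
import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, catch, evaluate)

-- Race two values of a flat type: fork a thread for each, and take
-- whichever reaches weak head normal form first. An erroring argument
-- (⊥ thrown as an exception) is silently ignored; a diverging one
-- simply never answers.
raceFlat :: a -> a -> IO a
raceFlat x y = do
  v <- newEmptyMVar
  let ignore :: SomeException -> IO ()
      ignore _  = return ()
      attempt z = forkIO ((evaluate z >>= putMVar v) `catch` ignore)
  t1 <- attempt x
  t2 <- attempt y
  r  <- takeMVar v                 -- first defined answer wins
  killThread t1
  killThread t2
  return r
```

Wrapping this in a pure interface (as unamb does, via unsafePerformIO) is where most of the subtlety lies.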

Demo

First, import the lub package:

{-# LANGUAGE Rank2Types #-}
{-# OPTIONS -Wall #-}
import Data.Lub

Next, borrowing from Luke Palmer‘s MemoCombinators package, define a type of strict memoizers:

type MemoStrict a = forall r. (a -> r) -> (a -> r)

Now a strict memoizer for Bool, as above:

memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
 where
   lookupBool (f,_) False = f
   lookupBool (_,t) True  = t

Test out the strict memoizer. First on a strict function:

h1, s1 :: Bool -> Integer
h1 = \ b -> if b then 3 else 4
s1 = memoBoolStrict h1

A test run:

*Main> h1 True
3
*Main> s1 True
3

Next on a non-strict function:

h2, s2 :: Bool -> Integer
h2 = const 5
s2 = memoBoolStrict h2

And test:

*Main> h2 undefined
5
*Main> s2 undefined
*** Exception: Prelude.undefined

Now define a type of non-strict memoizers:

type Memo a = forall r. HasLub r => (a -> r) -> (a -> r)

And a non-strict Bool memoizer, together with a memoized version n2 of the non-strict h2:

memoBool :: Memo Bool
memoBool h = const (h undefined) `lub` memoBoolStrict h
 
n2 :: Bool -> Integer
n2 = memoBool h2

Testing:

*Main> h2 undefined
5
*Main> n2 undefined
5

Success!

Beyond Bool

To determine how to generalize memoBool to types other than Bool, consider what properties of Bool mattered in our development.

  • We know how to strictly memoize over Bool (i.e., what shape to use for the memo table and how to fill it).
  • Bool is flat.

The first condition also holds (elegantly) for integral types, sums, products, and algebraic types.

The second condition is terribly restrictive and fails to hold for sums, products and most algebraic types (e.g., Maybe and []).

Consider a Haskell function h :: (a,b) -> c. An element of type (a,b) is either ⊥ or (x,y), where x :: a and y :: b. We can cover the ⊥ case as we did with Bool, by ⊔-ing in const (h ⊥). For the (x,y) case, we can proceed just as in strict memoization, by uncurrying, memoizing the outer and inner functions (of a and of b respectively), and recurrying. For details, see Elegant memoization with functional memo tries.

Similarly for sum types. (A value of type Either a b is either ⊥, or Left x or Right y, where x :: a and y :: b.) And by following the treatment of products and sums, we can correctly memoize functions over any algebraic type.
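To make the product and sum recipes concrete, here is a minimal sketch of the strict halves (my own memoPairStrict and memoEitherStrict, with memoBoolStrict repeated as a base case; the non-strict versions would each ⊔ in const (h ⊥) as before):

```haskell
{-# LANGUAGE Rank2Types #-}

type MemoStrict a = forall r. (a -> r) -> (a -> r)

memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
 where
   lookupBool (f,_) False = f
   lookupBool (_,t) True  = t

-- Products: curry, memoize the outer function of a and each inner
-- function of b, then uncurry back.
memoPairStrict :: MemoStrict a -> MemoStrict b -> MemoStrict (a,b)
memoPairStrict ma mb h = uncurry (ma (mb . curry h))

-- Sums: memoize the Left and Right halves separately and dispatch.
memoEitherStrict :: MemoStrict a -> MemoStrict b -> MemoStrict (Either a b)
memoEitherStrict ma mb h = either (ma (h . Left)) (mb (h . Right))
```

Composing these combinators gives strict memoization over any algebraic type built from the base cases.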

Related work

Lazy Memo-functions

In 1985, John Hughes published a paper Lazy Memo-functions, in which he points out the laziness-harming property of standard memoization.

[...] In a language with lazy evaluation this problem is aggravated: since verifying that two data-structures are equal requires that each be completely evaluated, all memoised functions are completely strict. This means they cannot be applied to circular or infinite arguments, or to arguments which (for one reason or another) cannot yet be completely evaluated. Therefore memo-functions cannot be combined with the most powerful features of lazy languages.

John gives a laziness-friendlier alternative, which is to use the addresses rather than the contents in the case of structured arguments. Since it still forces evaluation of atomic arguments, I don’t think it preserves non-strictness. Moreover, it leads to redundant computation when structured arguments are equal but not pointer-equal.

Conclusion

Formulations of function memoization can be quite elegant and practical in a non-strict/lazy functional language. In such a setting, however, I cannot help but want to correctly handle all functions, including non-strict ones. This post gives a technique for doing so, making crucial use of the least upper bound (⊔) operator described in various other posts.

Despite the many words above, the modification to strict memoization is simple: for a function h, given an argument x, in addition to indexing a memo trie with x, also evaluate h ⊥, and merge the information obtained from these two attempts (conceptually run in parallel). Indexing a memo trie forces evaluation of x, which is a problem when h is non-strict and x evaluates to ⊥. In exactly that case, however, h ⊥ is not ⊥, and so provides exactly the information we need. Moreover, information-monotonicity of h (a property of all computable functions) guarantees that h ⊥ ⊑ h x, so the information being merged is compatible.

Note that this condition is even stronger than compatibility, so perhaps we could use a more restricted and more efficient alternative to the fully general least upper bound. The technique in Exact numeric integration also used this restricted form.

How does this method for correct, non-strict memoization work in practice? I guess the answer mainly depends on the efficiency and robustness of ⊔ (or of the restricted form mentioned just above). The current implementation could probably be improved considerably if brought into the runtime system (RTS) and implemented by an RTS expert (which I’m not).

Information ordering and ⊔ play a central role in the denotational semantics of programming languages. Since first stumbling onto a use for ⊔ (initially in its flat form, unamb), I’ve become very curious about how this operator might impact programming practice as well as theory. My impression so far is that it is a powerful modularization tool, just as laziness is (as illustrated by John Hughes in Why Functional Programming Matters). I’m looking for more examples, to further explore this impression.

8 Comments

  1. Nikolay Orlyuk:

    There may be a problem with memoization of nonstrict result.

    h3, s3 :: Bool -> (Integer,Integer)
    h3 = const (undefined, 3)
    s3 = memoBoolStrict h3
    

    snd (s3 (undefined))

  2. conal:

    Hi Nikolay,

    Let’s give it a try:

    h4, s4,n4 :: Bool -> (Integer,Integer)
    h4 = const (undefined, 4)
    s4 = memoBoolStrict h4
    n4 = memoBool h4

    Testing:

    *Main> snd (h4 undefined)    -- original
    4
    *Main> snd (s4 undefined)    -- strict
    *** Exception: Prelude.undefined
    *Main> snd (n4 undefined)    -- nonstrict
    4

    Looks good.

  3. Sebastian Fischer:

    Very interesting!

    Your remarks under “Beyond Bool” are a little sketchy. I was missing a demo of memoizing a function like and . take 10 on repeat True and False:undefined.

    I guess it works as expected but it would be nice to be convinced by an example!

  4. Job Vranish:

    Neat! Any idea what a memo table type for Int would look like? It seems like such a thing would have some pretty severe restrictions

  5. conal:

    Hi Job,

    For integral types, even including Integer (unbounded), one can use a structure like a Patricia tree. See Chris Okasaki and Andy Gill’s paper, Fast Mergeable Integer Maps or the Word instance of HasTrie in Elegant memoization with functional memo tries. For signed types (like Int and Integer), one has to handle negative and non-negative keys. You can find the details in Luke Palmer’s data-inttrie package.

  6. conal:

    Sebastian:

    I was running out of energy, and the post was running long, and hence the sketchy section. I think I’ll release a new library soon, and at the least, I’ll mention it with your example as a comment here.

  7. Conal Elliott » Blog Archive » Memoizing higher-order functions:

    [...] does nonstrict memoization fit in with higher-order memoization? [...]

  8. Conal Elliott » Blog Archive » Details for non-strict memoization, part 1:

    [...] Non-strict memoization, I sketched out a means of memoizing non-strict functions. I gave the essential insight but did not [...]
