## Non-strict memoization

I’ve written a few posts about functional memoization.
In one of them, Luke Palmer commented that the memoization methods are correct only for strict functions, which I had not noticed before.
In this note, I correct this flaw, extending correct memoization to non-strict functions as well.
The semantic notion of *least upper bound* (which can be built from *unambiguous choice*) plays a crucial role.

**Edits**:

- 2010-07-13: Fixed the non-strict memoization example to use an argument of `undefined` (⊥) as intended.
- 2010-07-23: Changed spelling from “nonstrict” to the much more popular “non-strict”.
- 2011-02-16: Fixed minor typo. (“constraint on result” → “constraint on the result type”)

### What is memoization?

In purely functional programming, applying a function to equal arguments gives equal results. However, the second application is as costly as the first one. The idea of memoization, invented by Donald Michie in the 1960s, is to cache the results of applications and reuse those results in subsequent applications. Memoization is a handy technique to know, as it can dramatically reduce expense while making little impact on an algorithm’s simplicity.

Early implementations of memoization were imperative. Some sort of table (e.g., a hash table) is initialized as empty. Whenever the memoized function is applied, the argument is looked up in the table. If present, the corresponding result is returned. Otherwise, the original function is applied to the argument, and the result is stored in the table, keyed by the argument.
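To make the imperative scheme concrete, here is a small sketch of my own (not from the original post), using a mutable association list as the table:

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)

-- Imperative-style memoization: initialize an empty table, then on each
-- application consult the table before computing.  (A real implementation
-- would use a hash table or balanced map rather than an association list.)
memoizeIO :: Eq a => (a -> b) -> IO (a -> IO b)
memoizeIO f = do
  tableRef <- newIORef []
  return $ \x -> do
    table <- readIORef tableRef
    case lookup x table of
      Just y  -> return y  -- cache hit: reuse the stored result
      Nothing -> do        -- cache miss: compute, store under x, return
        let y = f x
        writeIORef tableRef ((x, y) : table)
        return y
```

Note that, unlike the functional memoization developed below, this version lives in `IO`, and the table lookup is completely strict in the argument.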

### Functional memoization

Can memoization be implemented functionally (without assignment)? One might argue that it cannot, considering that we want the table structure to get filled in destructively, as the memoized function is sampled.

However, this argument is flawed (like many informal arguments of impossibility).
Although we want a mutation to happen, we needn’t ask for one explicitly.
Instead, we can exploit the mutation that happens *inside the implementation* of laziness.

For instance, consider memoizing a function of booleans.
In this case, the “table” can simply be a pair, with one slot for the argument `False` and one for `True`:

```haskell
type BoolTable a = (a,a)

memoBool f = lookupBool (f False, f True)

lookupBool :: BoolTable b -> Bool -> b
lookupBool (f,_) False = f
lookupBool (_,t) True  = t
```

For instance, consider this simple function and a memoized version:

```haskell
f1 b = if b then 3 else 4

s1 = memoBool f1
```

The memo table will be `(f False, f True)`, i.e., `(4,3)`.
Checking that `s1` is equivalent to `f1`:

```
s1 False ≡ lookupBool (4,3) False ≡ 4 ≡ f1 False
s1 True  ≡ lookupBool (4,3) True  ≡ 3 ≡ f1 True
```

Other argument types have other table representations, and these table types can be defined systematically and elegantly.
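As another illustration of a table type (my own, in the spirit of the post), natural-number arguments can be tabulated in an infinite lazy list:

```haskell
-- A memo "table" for natural numbers: an infinite list whose nth element
-- caches f n.  Lazy evaluation fills in each cell on first demand.
memoNat :: (Integer -> b) -> (Integer -> b)
memoNat f = \n -> table !! fromIntegral n
  where table = map f [0 ..]

-- Classic payoff: a self-memoizing Fibonacci.  Every recursive call goes
-- through the shared table, so the exponential blowup vanishes.
fib :: Integer -> Integer
fib = memoNat fib'
  where
    fib' 0 = 0
    fib' 1 = 1
    fib' n = fib (n - 1) + fib (n - 2)
```

(List indexing is linear; the memo-trie posts mentioned in this article replace the list with a trie for logarithmic lookup.)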

Now, wait a minute!
Building an entire table up-front doesn’t sound like the incremental algorithm Michie invented, especially considering that the domain type can be quite large and even infinite.
However, in a *lazy* language, incremental construction of data structures is automatic and pervasive, and infinite data structures are bread & butter.
So the computing and updating doesn’t have to be *expressed* imperatively.

While lazy construction can be helpful for pairs, it’s *essential* for infinite tables, as needed for domain types that are enormously large (e.g., `Int`), and even infinitely large (e.g., `Integer` or `[Bool]`).
However, laziness brings to memoization not only a gift, but also a difficulty, namely the challenge of correctly memoizing *non-strict* functions, as we’ll see next.

### A problem with memoizing non-strict functions

The confirmation above that `s1 ≡ f1` has a mistake: it fails to consider a third possible choice of argument, namely ⊥.
Let’s check this case now:

```
s1 ⊥ ≡ lookupBool (4,3) ⊥ ≡ ⊥ ≡ f1 ⊥
```

The ⊥ case does not show up explicitly in the definition of `lookupBool`, but is implied by the use of pattern-matching against `True` and `False`.
For the same reason (in the definition of `if-then-else`), `f1 ⊥ ≡ ⊥`, so indeed `s1 ≡ f1`.
The key saving grace here is that `f1` is already strict, so the strictness introduced by `lookupBool` is harmless.

To see how memoization adds strictness, consider memoizing a *non-strict* function of booleans:

```haskell
f2 b = 5

s2 = memoBool f2
```

The memo table will be `(f False, f True)`, i.e., `(5,5)`.
Checking that `s2` is equivalent to `f2`:

```
s2 False ≡ lookupBool (5,5) False ≡ 5 ≡ f2 False
s2 True  ≡ lookupBool (5,5) True  ≡ 5 ≡ f2 True
```

However,

```
s2 ⊥ ≡ lookupBool (5,5) ⊥ ≡ ⊥
```

The latter equality is due again to pattern matching against `False` and `True` in `lookupBool`.

In contrast, `f2 ⊥ ≡ 5`, so `s2 ≢ f2`, and hence `memoBool` does not correctly memoize.

### Non-strict memoization

The bug in `memoBool` comes from ignoring one of the possible boolean values.
In a lazy language, `Bool` has three possible values, not two.
A simple solution then might be for the memo table to be a triple instead of a pair:

```
type BoolTable a = (a,a,a)

memoBool h = lookupBool (h ⊥, h False, h True)
```

Table lookup needs one additional case:

```
lookupBool :: BoolTable a -> Bool -> a
lookupBool (b,_,_) ⊥     = b
lookupBool (_,f,_) False = f
lookupBool (_,_,t) True  = t
```

I hope you read my posts with a good deal of open-mindedness, but also with some skepticism.
This revised definition of `lookupBool` is not legitimate Haskell code, and for a good reason.
If we could write and run this kind of code, we could solve the halting problem:

```
halts :: a -> Bool
halts ⊥ = False
halts _ = True
```

The problem here is not just that ⊥ is not a legitimate Haskell *pattern*, but more fundamentally that equality with ⊥ is non-computable.

The revised `lookupBool` function and the `halts` function violate a fundamental semantic property, namely *monotonicity* (of information content).
Monotonicity of a function `h` means that

```
∀ a b. a ⊑ b ⟹ h a ⊑ h b
```

where “⊑” means has less (or equal) information content, as explained in *Merging partial values*.
In other words, if you tell `h` more about an argument, it will tell you more about the result, where “more” (really more-or-equal) includes compatibility (no contradiction of previous knowledge).

The `halts` function is nonmonotonic, since, for instance, `⊥ ⊑ 3`, and `halts ⊥ ≡ False` and `halts 3 ≡ True`, but `False ⋢ True`.
(`False` and `True` are incompatible, i.e., they contradict each other.)

Similarly, the function `h = lookupBool (5,3,4)` is nonmonotonic, which you can verify by applying it to ⊥ and to `False`.
Although `⊥ ⊑ False`, `h ⊥ ≡ 5` and `h False ≡ 3`, but `5 ⋢ 3`.
Similarly, `⊥ ⊑ True`, `h ⊥ ≡ 5` and `h True ≡ 4`, but `5 ⋢ 4`.

So this particular memo table gets us into trouble (nonmonotonicity).
Are there other memo tables `(b,f,t)` that lead to monotonic lookup?
Re-examining the breakdown shows us a necessary and sufficient condition, which is that `b ⊑ f` and `b ⊑ t`.

Look again at the particular use of `lookupBool` in the definition of `memoBool` above, and you’ll see that

```
b ≡ h ⊥
f ≡ h False
t ≡ h True
```

so the monotonicity condition becomes `h ⊥ ⊑ h False` and `h ⊥ ⊑ h True`.
This condition holds, thanks to the monotonicity of all computable functions `h`.

So the triple-based `lookupBool` can be semantically problematic outside of its motivating context, but never as used in `memoBool`.
That is, the triple-based definition of `memoBool` correctly specifies the (computable) meaning we want, but isn’t an implementation.
How might we correctly implement `memoBool`?

In *Lazier function definitions by merging partial values*, I examined the standard Haskell style (inherited from predecessors) of definition by clauses, pointing out how that style is teasingly close to a declarative reading in which each clause is a true equation (possibly conditional).
I transformed the standard style into a form with modular, declarative semantics.

Let’s try transforming `lookupBool` into this modular form:

```
lookupBool :: BoolTable a -> Bool -> a
lookupBool (b,f,t) = (λ ⊥ → b) ⊔ (λ False → f) ⊔ (λ True → t)
```

We still have the problem with `λ ⊥ → b` (nonmonotonicity), but it’s now isolated.
What if we broaden the domain from just ⊥ (for which we cannot dependably test) to *all* arguments, i.e., `λ _ → b` (i.e., `const b`)?
This latter function is the least one (in the information ordering) that is monotonic and contains all the information present in `λ ⊥ → b`.
(Exercise: prove.)
Dissecting this function:

```
const b ≡ (λ _ → b)
        ≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b)
```

So

```
  const b ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b) ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ False → f) ⊔ (λ True → b) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → (b ⊔ f)) ⊔ (λ True → (b ⊔ t))
≡ (λ ⊥ → b) ⊔ (λ False → f) ⊔ (λ True → t)
```

under the condition that `b ⊑ f` and `b ⊑ t`, which does hold in the context of our use (again by monotonicity of the `h` in `memoBool`).
Therefore, in this context, we can replace the nonmonotonic `λ ⊥ → b` with the monotonic `const b`, while preserving the meaning of `memoBool`.

Behind the dancing symbols in the proof above lies the insight that we can use the ⊥ case even for non-⊥ arguments, because the result will be subsumed by non-⊥ cases, thanks to the lubs (⊔).

The original two non-⊥ cases can be combined back into their more standard (less modular) Haskell form, and we can revert to our original strict table and lookup function. Our use of ⊔ requires the result type to be ⊔-able.

```
type BoolTable a = (a,a)

memoBool :: HasLub b => (Bool -> b) -> (Bool -> b)
memoBool h = const (h ⊥) ⊔ lookupBool (h False, h True)

lookupBool :: BoolTable b -> Bool -> b
lookupBool (f,_) False = f
lookupBool (_,t) True  = t
```

So the differences between our original, too-strict `memoBool` and this correct one are quite small: the `HasLub` constraint and the “`const (h ⊥) ⊔`”.

The `HasLub` constraint on the result type warns us of a possible loss of generality.
Are there types for which we do not know how to ⊔?
Primitive types are flat, where ⊔ is equivalent to `unamb`; and there are `HasLub` instances for functions, sums, and products.
(See *Merging partial values*.)
`HasLub` could be derived automatically for algebraic data types (labeled sums of products) and trivially for `newtype`.
Perhaps abstract types need some extra thought.

### Demo

First, import the lub package:

```haskell
{-# LANGUAGE Rank2Types #-}
{-# OPTIONS -Wall #-}

import Data.Lub
```

Next, define a type of *strict* memoizers, borrowing from Luke Palmer’s MemoCombinators package:

```haskell
type MemoStrict a = forall r. (a -> r) -> (a -> r)
```

Now a strict memoizer for `Bool`, as above:

```haskell
memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
  where
    lookupBool (f,_) False = f
    lookupBool (_,t) True  = t
```

Test out the strict memoizer, first on a strict function. A test run:

```
*Main> h1 True
3
*Main> s1 True
3
```
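The definitions of `h1` and `s1` did not survive in this copy of the post. Here is a plausible reconstruction, consistent with the transcript above; the function bodies are my guesses, not the original code:

```haskell
{-# LANGUAGE Rank2Types #-}

type MemoStrict a = forall r. (a -> r) -> (a -> r)

-- Strict Bool memoizer, repeated from above for self-containment.
memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
  where
    lookupBool (f,_) False = f
    lookupBool (_,t) True  = t

-- Hypothetical strict test function and its memoized version, chosen so
-- that both `h1 True` and `s1 True` yield 3, as in the transcript.
h1 :: Bool -> Integer
h1 b = if b then 3 else 4

s1 :: Bool -> Integer
s1 = memoBoolStrict h1
```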

Next on a non-strict function:

And test:

Now define a type of non-strict memoizers:

type Memo a = forall r. HasLub r => (a -> r) -> (a -> r)

And a non-strict `Bool`

memoizer:

Testing:

Success!
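The demo above uses the lub package. For readers without it at hand, here is a self-contained sketch of my own: an `unamb`-style race that approximates ⊔ for *flat* result types by evaluating both candidates in parallel and taking whichever succeeds first. All names here are mine, and this is a crude stand-in under stated assumptions, not the real `lub`:

```haskell
{-# LANGUAGE ScopedTypeVariables #-}

import Control.Concurrent (forkIO, killThread)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, catch, evaluate)
import System.IO.Unsafe (unsafePerformIO)

-- Approximate lub for flat types: race the two computations and return the
-- first non-bottom answer.  Only sound when the arguments are compatible
-- (they agree wherever both are defined), as they are in memoization
-- thanks to monotonicity: h ⊥ ⊑ h x.
raceLub :: a -> a -> a
raceLub x y = unsafePerformIO $ do
  v <- newEmptyMVar
  let put z = (evaluate z >>= putMVar v)
                `catch` \(_ :: SomeException) -> return ()  -- bottom: stay silent
  t1 <- forkIO (put x)
  t2 <- forkIO (put y)
  r  <- takeMVar v
  killThread t1
  killThread t2
  return r

-- Non-strict Bool memoizer: merge the ⊥ information, const (h ⊥), with
-- the strict table lookup.
memoBool' :: (Bool -> b) -> (Bool -> b)
memoBool' h = \x -> raceLub (h bottom) (lookupBool table x)
  where
    bottom = error "bottom"
    table  = (h False, h True)
    lookupBool (f,_) False = f
    lookupBool (_,t) True  = t
```

With `h2 _ = 5`, the application `memoBool' h2 undefined` yields 5, whereas the strict memoizer raises an error on the same argument. Note that this race only handles ⊥-as-error and flat results; the lub package handles divergence and structured values properly.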

### Beyond `Bool`

To determine how to generalize `memoBool` to types other than `Bool`, consider what properties of `Bool` mattered in our development.

- We know how to strictly memoize over `Bool` (i.e., what shape to use for the memo table and how to fill it).
- `Bool` is flat.

The first condition also holds (elegantly) for integral types, sums, products, and algebraic types.

The second condition is terribly restrictive and fails to hold for sums, products, and most algebraic types (e.g., `Maybe` and `[]`).

Consider a Haskell function `h :: (a,b) -> c`.
An element of type `(a,b)` is either `⊥` or `(x,y)`, where `x :: a` and `y :: b`.
We can cover the ⊥ case as we did with `Bool`, by ⊔-ing in `const (h ⊥)`.
For the `(x,y)` case, we can proceed just as in strict memoization, by uncurrying, memoizing the outer and inner functions (of `a` and of `b` respectively), and recurrying.
For details, see *Elegant memoization with functional memo tries*.

Similarly for sum types.
(A value of type `Either a b` is either ⊥, or `Left x` or `Right y`, where `x :: a` and `y :: b`.)
And by following the treatment of products and sums, we can correctly memoize functions over any algebraic type.
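To make the product and sum constructions concrete for the *strict* part of the story, here is a sketch in the MemoCombinators style. The combinator names are mine; the full treatment, including tries, is in the posts cited above:

```haskell
{-# LANGUAGE Rank2Types #-}

type MemoStrict a = forall r. (a -> r) -> (a -> r)

-- Strict Bool memoizer, as in the demo.
memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
  where
    lookupBool (f,_) False = f
    lookupBool (_,t) True  = t

-- Products: curry, memoize the outer function of `a` and each inner
-- function of `b`, then uncurry back.
memoPairStrict :: MemoStrict a -> MemoStrict b -> MemoStrict (a, b)
memoPairStrict ma mb h = uncurry (ma (mb . curry h))

-- Sums: memoize each branch separately and dispatch with `either`.
memoEitherStrict :: MemoStrict a -> MemoStrict b -> MemoStrict (Either a b)
memoEitherStrict ma mb h = either (ma (h . Left)) (mb (h . Right))
```

For instance, `memoPairStrict memoBoolStrict memoBoolStrict` memoizes functions over `(Bool, Bool)` with a two-level table of four entries.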

### Related work

#### Lazy Memo-functions

In 1985, John Hughes published a paper *Lazy Memo-functions*, in which he points out the laziness-harming property of standard memoization.

> [...] In a language with lazy evaluation this problem is aggravated: since verifying that two data-structures are equal requires that each be completely evaluated, all memoised functions are completely strict. This means they cannot be applied to circular or infinite arguments, or to arguments which (for one reason or another) cannot yet be completely evaluated. Therefore memo-functions cannot be combined with the most powerful features of lazy languages.

John gives a laziness-friendlier alternative, which is to use the *addresses* rather than contents in the case of structured arguments.
Since it does force evaluation on atomic arguments, I don’t think it preserves non-strictness.
Moreover, it leads to redundant computation when structured arguments are equal but not pointer-equal.

### Conclusion

Formulations of function memoization can be quite elegant and practical in a non-strict/lazy functional language.
In such a setting, however, I cannot help but want to correctly handle *all* functions, including non-strict ones.
This post gives a technique for doing so, making crucial use of the least upper bound (⊔) operator described in various other posts.

Despite the many words above, the modification to strict memoization is simple: for a function `h`, given an argument `x`, in addition to indexing a memo trie with `x`, also evaluate `h ⊥`, and merge the information obtained from these two attempts (conceptually run in parallel).
Indexing a memo trie forces evaluation of `x`, which is a problem when `h` is non-strict and `x` evaluates to ⊥.
In exactly that case, however, `h ⊥` is not ⊥, and so provides exactly the information we need.
Moreover, information-monotonicity of `h` (a property of all computable functions) guarantees that `h ⊥ ⊑ h x`, so the information being merged is compatible.

Note that this condition is even stronger than compatibility, so perhaps we could use a more restricted and more efficient alternative to the fully general least upper bound.
The technique in *Exact numeric integration* also used this restricted form.

How does this method for correct, non-strict memoization work in practice? I guess the answer mainly depends on the efficiency and robustness of ⊔ (or of the restricted form mentioned just above). The current implementation could probably be improved considerably if brought into the runtime system (RTS) and implemented by an RTS expert (which I’m not).

Information ordering and ⊔ play a central role in the denotational semantics of programming languages.
Since first stumbling onto a use for ⊔ (initially in its flat form, `unamb`), I’ve become very curious about how this operator might impact programming *practice* as well as theory.
My impression so far is that it is a powerful modularization tool, just as laziness is (as illustrated by John Hughes in *Why Functional Programming Matters*).
I’m looking for more examples, to further explore this impression.

## Nikolay Orlyuk:

There may be a problem with memoization of a non-strict result: `snd (s3 (undefined))`.

14 July 2010, 3:57 am

## conal:

Hi Nikolay,

Let’s give it a try:

Testing:

Looks good.

14 July 2010, 8:36 am

## Sebastian Fischer:

Very interesting!

Your remarks under “Beyond Bool” are a little sketchy. I was missing a demo of memoizing a function like `and . take 10` on `repeat True` and `False:undefined`. I guess it works as expected but it would be nice to be convinced by an example!

14 July 2010, 11:49 am

## Job Vranish:

Neat! Any idea what a memo table type for Int would look like? It seems like such a thing would have some pretty severe restrictions.

14 July 2010, 12:25 pm

## conal:

Hi Job,

For integral types, even including `Integer` (unbounded), one can use a structure like a Patricia tree. See Chris Okasaki and Andy Gill’s paper, *Fast Mergeable Integer Maps*, or the `Word` instance of `HasTrie` in *Elegant memoization with functional memo tries*. For signed types (like `Int` and `Integer`), one has to handle negative and non-negative keys. You can find the details in Luke Palmer’s data-inttrie package.

14 July 2010, 2:51 pm

## conal:

Sebastian:

I was running out of energy, and the post was running long, and hence the sketchy section. I think I’ll release a new library soon, and at the least, I’ll mention it with your example as a comment here.

14 July 2010, 2:54 pm

## Conal Elliott » Blog Archive » Memoizing higher-order functions:

[...] does nonstrict memoization fit in with higher-order memoization? [...]

22 July 2010, 11:43 am

## Conal Elliott » Blog Archive » Details for non-strict memoization, part 1:

[...] Non-strict memoization, I sketched out a means of memoizing non-strict functions. I gave the essential insight but did not [...]

10 September 2010, 3:37 pm