## Lazier function definitions by merging partial values

This post continues from an idea of Ryan Ingram’s in an email thread How to make code least strict?.

### A pretty story

Pattern matching in function definitions is very handy and has a declarative feel. For instance,

``````sum []     = 0
sum (x:xs) = x + sum xs
``````

Simply replace “`=`” by “`==`” to read such a set of pattern clauses (partial definitions) as a collection of properties specifying a `sum` function:

• The sum of an empty list equals zero
• The sum of a (non-empty) list `x:xs` equals `x` plus the sum of the `xs`.

Moreover, these properties define the `sum` function, in that `sum` is the least-defined function that satisfies these two properties.

Guards have a similar style and meaning:

``````abs x | x <  0 = -x
abs x | x >= 0 =  x
``````

Replacing “`=`” by “`==`” and guards by logical implication, we again have two properties that define `abs`:

``````x <  0 ==> abs x == -x
x >= 0 ==> abs x ==  x
``````

### O, the lies!

This pretty story is a lie, as becomes apparent when we look at overlapping clauses. For instance, we’re more likely to write `abs` without the second guard:

``````abs x | x <  0 = -x
abs x          =  x
``````

A declarative of the second clause (∀ x. abs x == x) is false.

I’d more likely write

``````abs x | x < 0     = -x
| otherwise =  x
``````

which is all the more deceptive, since “`otherwise`” doesn’t really mean otherwise. It’s just a synonym for “`True`“.

Another subtle but common problem arises with definitions like the following, as pointed out by ChrisK in How to make code least strict?:

``````zip :: [a] -> [b] -> [(a,b)]
zip []      _       = []
zip _       []      = []
zip (x:xs') (y:ys') = (x,y) : zip xs' ys'
``````

These three clauses read like independently true properties for `zip`. The first two clauses overlap, but their values agree, so what could possibly go wrong with a declarative reading?

The problem is that there are really three flavors of lists, not two. This definition explicitly addresses the nil and cons cases, leaving ⊥.

By the definition above, the value of ‘`zip [] ⊥`‘ is indeed `[]`, which is consistent with each clause. However, the value of ‘`zip ⊥ []`‘ is ⊥, because Haskell semantics says that each clause is tried in order, and the first clause forces evaluation of `⊥` when comparing it with `[]`. This ⊥ value is inconsistent with reading the second clause as a property. Swapping the first two clauses fixes the second example but breaks the first one.

Is it possible to fix `zip` so that its meaning is consistent with these three properties? We seem to be stuck with an arbitrary bias, with strictness in the first or second argument.

Or are we?

### Unambiguous choice

Ryan Ingram suggested using `unamb`, and I agree that it is just the ticket for dilemmas like `zip` above, where we want unbiased laziness. (See posts on unambiguous choice and especially Functional concurrency with unambiguous choice.) The `unamb` operator returns the more defined of its two arguments, which are required to be equal if neither is ⊥. This precondition allows `unamb` to have a concurrent implementation with nondeterministic scheduling, while retaining simple functional (referentially transparent) semantics (over its domain of definition).

As Ryan said:

Actually, I see a nice pattern here for unamb + pattern matching:

``````zip xs ys = foldr unamb undefined [p1 xs ys, p2 xs ys, p3 xs ys] where
p1 [] _ = []
p2 _ [] = []
p3 (x:xs) (y:ys) = (x,y) : zip xs ys
``````

Basically, split each pattern out into a separate function (which by definition is ⊥ if there is no match), then use `unamb` to combine them.

The invariant you need to maintain is that potentially overlapping pattern matches (`p1` and `p2`, here) must return the same result.

With a little typeclass hackery you could turn this into

``````zip = unambPatterns [p1,p2,p3] where {- p1, p2, p3 as above -}
``````

I liked Ryan’s `unambPatterns` idea very much, and then it occurred me that the necessary “typeclass hackery” is already part of the lub library, as described in the post Merging partial values. The `lub` operator (“least upper bound”, also written “⊔”), combines information from its arguments and is defined in different ways for different types (via a type class). For functions,

``````instance HasLub b => HasLub (a -> b) where
f ⊔ g =  a -> f a ⊔ g a
``````

More succinctly, on functions, `(⊔) = liftA2 (⊔)`.

Using this instance twice gives a `(⊔)` suitable for curried functions of two arguments, which turns out to give us almost exactly what we want for Ryan’s `zip` definitions.

``````zip' = lubs [p1,p2,p3]
where
p1 []     _      = []
p2 _      []     = []
p3 (x:xs) (y:ys) = (x,y) : zip' xs ys
``````

where

``````lubs :: [a] -> a
lubs = foldr (⊔) undefined
``````

The difference between `zip` and `zip'` shows up in inferred `HasLub` type constraints on `a` and `b`:

``````zip' :: (HasLub a, HasLub b) => [a] -> [b] -> [(a,b)]
``````

Let’s see if this definition works:

``````*Data.Lub> zip' [] undefined :: [(Int,Int)]
[]
*Data.Lub> zip' undefined [] :: [(Int,Int)]
[]
``````

Next, an example that requires some recursive calls:

``````*Data.Lub> zip' [10,20] (1 : 2 : undefined)
[(10,1),(20,2)]
*Data.Lub> zip' (1 : 2 : undefined) [10,20]
[(1,10),(2,20)]
``````

Lazy and bias-free!

If you want to try out out this example, be sure to get at least version 0.1.9 of `unamb`. I tweaked the implementation to handle pattern-match failures gracefully.

### Checking totality

One unfortunate aspect of this definition style is that the Haskell compiler can no longer reliably warn us about non-exhaustive pattern-based definitions. With warnings turned on, I get the following messages:

``````Warning: Pattern match(es) are non-exhaustive
In the definition of `p1': Patterns not matched: (_ : _) _

Warning: Pattern match(es) are non-exhaustive
In the definition of `p2': Patterns not matched: _ (_ : _)

Warning: Pattern match(es) are non-exhaustive
In the definition of `p3':
Patterns not matched:
[] _
(_ : _) []
``````

Perhaps a tool like Neil Mitchell’s Catch could be adapted to understand the semantics of `unamb` and `lub` (⊔) sufficiently to prove totality.

### More possibilities

The `zip'` definition above places `HasLub` constraints on the type parameters `a` and `b`. The origin of these constraints is the `HasLub` instance for pairs, which is roughly the following:

``````instance (HasLub a, HasLub b) => HasLub (a,b) where
(a,b) ⊔ (a',b') = (a ⊔ a', b ⊔ b')
``````

This instance is subtly incorrect. See Merging partial values.

Thanks to this instance, each argument to `(⊔)` can contribute partial information to each half of the pair. (As a special case, each argument could contribute a half.)

Although I haven’t run into a use for this flexibility yet, I’m confident that there are very cool uses waiting to be discovered. Please let me know if you have any ideas.

1. #### Luke Palmer:

Well, I don’t know about uses, but with the existence of lub, I smell a theorem to the order of “Every set of equations has a least strict implementation”, for a suitably restricted definition of equation. If true, I would consider this a very important theorem. If I have some idle brainpower (unlikely) I’ll play with it a bit.

2. #### Eyal Lotem:

Perhaps this calls for a language extension of non-biased pattern-matches?

As a language extension, it will probably be easier to prove totality while retaining nearly identical syntax to existing pattern-matching functions.

3. #### Robert F:

Is this related to the indefinability of parallel or in PCF (and corresponding lack of full abstraction for the domain theoretic semantics thereof)?

4. #### Conal Elliott » Blog Archive » Exact numeric integration:

[…] Now we have some information. How can we mix it in with the sum of recursive calls to integral? We can use (⊔) (least upper bound or “lub”), which is perfect for the job because its meaning is exactly to combine two pieces of information. See Merging partial values and Lazier function definitions by merging partial values. […]

5. #### Conal Elliott » Blog Archive » Nonstrict memoization:

[…] Lazier function definitions by merging partial values, I examined the standard Haskell style (inherited from predecessors) of definition by clauses, […]

6. #### Conal Elliott » Blog Archive » Lazier functional programming, part 1:

[…] reading. (The reading is less straightforward when patterns overlap, as mentioned in Lazier function definitions by merging partial values.) In a non-strict language like Haskell, there are three distinct boolean values, not two. Besides […]