Conal Elliott

Merging partial values

21st November 2008, 09:41 pm

Last year I stumbled across a simple representation for partial information about values, and wrote about it in two posts, A type for partial values and Implementing a type for partial values. Of particular interest is the ability to combine two partial values into one, combining the information present in each one.

More recently, I played with unambiguous choice, described in the previous post.

This post combines these two ideas. It describes how to work with partial values in Haskell natively, i.e., without using any special representation and without the use restrictions of unambiguous choice. I got inspired to try removing those restrictions during stimulating discussions with Thomas Davie, Russell O’Connor others in the #haskell gang.

You can download and play with the library shown described here. There are links and a bit more info on the lub wiki page.

Edits:

2008-11-22: Fixed link: Implementing a type for partial values
2008-11-22: Tidied introduction

Information, more or less

The meanings of programming languages are often defined via a technique called “denotational semantics”. In this style, one specifies a mathematical model, or semantic domain, for the meanings of utterances in the and then writes what looks like a recursive functional program that maps from syntax to semantic domain. That “program” defines the semantic function. Really, there’s one domain and one semantic function each syntactic category. In typed languages like Haskell, every type has an associated semantic domain.

One of the clever ideas of the theory of semantic domains (“domain theory”) is to place a partial ordering on values (domain members), based on information content. Values can not only be equal and unequal, they can also have more or less information content than each other. The value with the least information is at the bottom of this ordering, and so is called “bottom”, often written as “⊥”. In Haskell, ⊥ is the meaning of “undefined“. For succinctness below, I’ll write “⊥” instead of “undefined” in Haskell code.

Many types have corresponding flat domains, meaning that the only values are either completely undefined completely defined. For instance, the Haskell type Integer is (semantically) flat. Its values are all either ⊥ or integers.

Structured types are not flat. For instance, the meaning of (i.e., the domain corresponding to) the Haskell type (Bool,Integer) contains five different kinds of values, as shown in the figure. Each arrow leads from a less-defined (less informative) value to a more-defined value.

To handle the diversity of Haskell types, define a class of types for which we know how to compute lubs.

class HasLub a where (⊔) :: a -> a -> a

The actual lub library, uses “lub” instead of “(⊔)“.

The arguments must be consistent, i.e., must have a common upper bound. This precondition is not checked statically or dynamically, so the programmer must take care.

Flat types

For flat types, (⊔) is equivalent to unamb (see Functional concurrency with unambiguous choice), so

-- Flat types:
instance HasLub ()      where (⊔) = unamb
instance HasLub Bool    where (⊔) = unamb
instance HasLub Char    where (⊔) = unamb
instance HasLub Integer where (⊔) = unamb
instance HasLub Float   where (⊔) = unamb
...

Pairs

Too strict

We can handle pairs easily enough:

instance (HasLub a, HasLub b) => HasLub (a,b) where
  (a,b) ⊔ (a',b') = (a ⊔ a', b ⊔ b')

but we’d be wrong! This definition is too strict, e.g.,

(1,2) ⊔ ⊥ == ⊥

when we’d want (1,2).

Too lazy

No problem. Just make the patterns lazy:

instance (HasLub a, HasLub b) => HasLub (a,b) where
  ~(a,b) ⊔ ~(a',b') = (a ⊔ a', b ⊔ b')

Oops — wrong again. This definition is too lazy:

⊥ ⊔ ⊥ == (⊥,⊥)

when we’d want ⊥.

Almost

We can fix the too-lazy version by checking that one of the arguments is non-bottom, which is what seq does. Which one to we check? The one that isn’t ⊥, or either one if they’re both defined. Our friend unamb can manage this task:

instance (HasLub a, HasLub b) => HasLub (a,b) where
  ~p@(a,b) ⊔ ~p'@(a',b') =
    (p `unamb` p') `seq` (a ⊔ a', b ⊔ b')

But there’s a catch (which I realized only now): p and p' may not satisfy the precondition on unamb. (If they did, we could use (⊔) = unamb.)

Just right

To fix this last problem, check whether each pair is defined. We can’t know which to check first, so test concurrently, using unamb.

instance (HasLub a, HasLub b) => HasLub (a,b) where
  ~p@(a,b) ⊔ ~p'@(a',b') =
    (definedP p `unamb` definedP p')
    `seq` (a ⊔ a', b ⊔ b')

where

definedP :: (a,b) -> Bool
definedP (_,_) = True

The implicit second case in that definition is definedP ⊥ = ⊥.

Some examples:

*Data.Lub> (⊥,False) ⊔ (True,⊥)
(True,False)
*Data.Lub> (⊥,(⊥,False)) ⊔ ((),(⊥,⊥)) ⊔ (⊥,(True,⊥))
((),(True,False))

Sums

For sums, we’ll discriminate between lefts and rights, with the following assistants:

isL :: Either a b -> Bool
isL = either (const True) (const False)

outL :: Either a b -> a
outL = either id (error "outL on Right")

outR :: Either a b -> b
outR = either (error "outR on Left") id

The (⊔) method for sums unwraps its arguments as a Left or as a Right, after checking which kind of arguments it gets. We can’t know which one to check first, so check concurrently:

instance (HasLub a, HasLub b) => HasLub (Either a b) where
  s ⊔ s' = if isL s `unamb` isL s' then
             Left  (outL s ⊔ outL s')
           else
             Right (outR s ⊔ outR s')

Exercise: Why is the use of unamb here legal?

Functions

A function f is said to be less (or equally) defined than a function g when f is less (or equally) defined g for every argument value. Consequently,

instance HasLub b => HasLub (a -> b) where
  f ⊔ g =  a -> f a ⊔ g a

More succinctly:

instance HasLub b => HasLub (a -> b) where (⊔) = liftA2 (⊔)

Other types

We’ve already handled the unit type (), other flat types, pairs, sums, and functions. Algebraic data types can be modeled via this standard set, with a technique from generic programming. Define methods that map to and from a type in the standard set.

class HasRepr t r | t -> r where
  -- Required: unrepr . repr == id
  repr   :: t -> r  --  to repr
  unrepr :: r -> t  -- from repr

We can implement (⊔) on a type by performing a (⊔) on that type’s standard representation, as follows:

repLub :: (HasRepr a v, HasLub v) => a -> a -> a
a `repLub` a' = unrepr (repr a ⊔ repr a')

instance (HasRepr t v, HasLub v) => HasLub t where
  (⊔) = repLub

However, this rule would overlap with all other HasLub instances, because Haskell instance selection is based only on the head of an instance definition, i.e., the part after the “=>“. Instead, we’ll define a HasLub instance per HasRepr instance.

For instance, here are encodings for Maybe and for lists:

instance HasRepr (Maybe a) (Either () a) where
  repr   Nothing   = (Left ())
  repr   (Just a)  = (Right a)

  unrepr (Left ()) = Nothing
  unrepr (Right a) = (Just a)

instance HasRepr [a] (Either () (a,[a])) where
  repr   []             = (Left  ())
  repr   (a:as)         = (Right (a,as))

  unrepr (Left  ())     = []
  unrepr (Right (a,as)) = (a:as)

And corresponding HasLub instances:

instance HasLub a => HasLub (Maybe a) where (⊔) = repLub
instance HasLub a => HasLub [a]       where (⊔) = repLub

Testing:

*Data.Lub> [1,⊥,2] ⊔ [⊥,3,2]
[1,3,2]

| Comment (RSS) | Trackback

6 Comments

Luke Palmer:

I played with almost exactly the same thing quite a while ago. I always felt that this should work in Haskell:
```
if _|_ then [1..] else [1..] = [1..]
```
Semantically, I wanted if with these properties:
```
if True  then a else b = a
if False then a else b = b
if _|_   then a else b = a `lub` b
```
So I made a Lub class just like this and implemented it:
```
if' :: Bool -> a -> a -> a
if' c x y = (if c then x else y) `lub` x `lub` y
```
Which was pretty cool. Unfortunately on my motivating example, it did work, but it crawled like a snail; using 100% cpu and outputting about 2 elements per second.

I wonder if there is some runtime support that would make this speedier.

And I cannot get your alleged markdown support to work…
22 November 2008, 1:42 am
Conal Elliott » Blog Archive » Another angle on functional future values:

[…] It took me several days of doodling, pacing outside, and talking to myself before the idea for unamb broke through. Like many of my favorite ideas, it’s simple and obvious in retrospect: to remove the ambiguity of nondeterministic choice (as in the amb operator), restrict its use to values that are equal when non-bottom. Whenever we have two different methods of answering the same question (or possibly failing), we can use unamb to try them both. Failures (errors or non-termination) are no problem in this context. A more powerful variation on unamb is the least upper bound operator lub, as described in Merging partial values. […]
4 January 2009, 9:49 pm
Conal Elliott » Blog Archive » Exact numeric integration:

[…] is perfect for the job because its meaning is exactly to combine two pieces of information. See Merging partial values and Lazier function definitions by merging partial […]
28 December 2009, 10:25 am
conal:

Hi Luke. Thanks for another insightful comment.

Your use of unamb in defining if' does not meet the required precondition (of information-compatible arguments). From a conversation on #haskell, I know you’ve come up with a really beautiful correct definition. I’d love to see you post the correct version for all to admire.
11 July 2010, 8:14 pm
Conal Elliott » Blog Archive » Nonstrict memoization:

[…] “⊑” means has less (or equal) information content, as explained in Merging partial values. In other words, if you tell f more about an argument, it will tell you more about the result, […]
13 July 2010, 6:47 pm
Conal Elliott » Blog Archive » Lazier functional programming, part 1:

[…] In a strict language, where there are only two boolean values, these two clauses have a straightforward reading. (The reading is less straightforward when patterns overlap, as mentioned in Lazier function definitions by merging partial values.) In a non-strict language like Haskell, there are three distinct boolean values, not two. Besides True and False, Bool also has a value ⊥, pronounced “bottom” for being at the bottom of the information ordering. For an illustration and explanation of information ordering, see Merging partial values. […]
13 September 2010, 2:23 pm

Merging partial values

Information, more or less

Flat types

Pairs

Too strict

Too lazy

Almost

Just right

Sums

Functions

Other types

6 Comments

Luke Palmer:

Conal Elliott » Blog Archive » Another angle on functional future values:

Conal Elliott » Blog Archive » Exact numeric integration:

conal:

Conal Elliott » Blog Archive » Nonstrict memoization:

Conal Elliott » Blog Archive » Lazier functional programming, part 1:

Leave a comment

Recent Posts

Recent Comments

Archives

Tags

Tags

Meta