Lazier functional programming, part 2

In Lazier functional programming, part 1, I suggested that some of the standard tools of lazy functional programming are not as lazy as they might be, including even if-then-else. Generalizing from boolean conditionals, I posed the puzzle of how to define a lazier either function, which encapsulates case analysis for sum types. The standard definition:

data Either a b = Left a | Right b

either :: (a → c) → (b → c) → Either a b → c
either f g (Left  x) = f x
either f g (Right y) = g y

The comments to part 1 fairly quickly converged to something close to the laxer definition I had in mind:

eitherL f g = const (f ⊥ ⊓ g ⊥) ⊔ either f g

which is a simple generalization of Luke Palmer‘s laxer if-then-else (reconstructed from memory):

bool c a b = (a ⊓ b) ⊔ (if c then a else b)

My thanks to Dave for “laxer” as a denotational alternative to the operational term “lazier”.

Note in the either definition that the argument ordering allows computing (f ⊥ ⊓ g ⊥) just once and reusing the result for different values of the third argument. Similarly,

cond :: a → a → Bool → a
cond a b = const (a ⊓ b) ⊔ (λ c → if c then a else b)

Let’s look at some examples:

cond 6 8 ⊥ ≡ (6 ⊓ 8) ⊔ ⊥ ≡ 6 ⊓ 8 ≡ ⊥

cond 7 7 ⊥ ≡ (7 ⊓ 7) ⊔ ⊥ ≡ 7 ⊓ 7 ≡ 7

cond (3,4) (4,5) ⊥ ≡ ((3,4) ⊓ (4,5)) ⊔ ⊥ ≡ (⊥,⊥)

cond (3,4) (3,5) ⊥ ≡ ((3,4) ⊓ (3,5)) ⊔ ⊥ ≡ (3,⊥)

cond [2,3,5] [1,3] ⊥ ≡ ([2,3,5] ⊓ [1,3]) ⊔ ⊥ ≡ ⊥ : 3 : ⊥

These results are more useful than they might at first appear. Monotonicity implies that the following information-inequalities also hold for an arbitrary boolean c:

cond 6 8 c ⊒ ⊥
cond 7 7 c ⊒ 7
cond (3,4) (4,5) c ⊒ (⊥,⊥)
cond (3,4) (3,5) c ⊒ (3,⊥)
cond [2,3,5] [1,3] c ⊒ (⊥ : 3 : ⊥)

In other words, given only a and b, we can already start extracting some information about the value of cond a b c. If c takes a long time to compute (forever in the extreme case), we may be able to get some useful information out of cond before cond gets any information out of c. Moreover, the presence of “⊔” hints at parallelization. Evaluation of a ⊓ b can proceed in one thread, while another computes the standard conditional.

At the function level,

cond 6 8 ⊒ const ⊥
cond 7 7 ⊒ const 7
cond (3,4) (4,5) ⊒ const (⊥,⊥)
cond (3,4) (3,5) ⊒ const (3,⊥)
cond [2,3,5] [1,3] ⊒ const (⊥ : 3 : ⊥)

This observation is handy when cond is partially applied to two arguments and the resulting function is reused. We can even compute a ⊓ b once and share the results among many calls.

These same considerations apply more generally to the laxer either definition above.

Does the definition of either above really satisfy the conditions of the puzzle, and how did I come up with it? A key insight is that monotonicity implies the following two inequalities for all a and b:

eitherL f g ⊥ ⊑ eitherL f g (Left  a)
eitherL f g ⊥ ⊑ eitherL f g (Right b)

Consistency with the defining equations then imply that

eitherL f g (Left  a) ≡ f a
eitherL f g (Right b) ≡ g b

so

eitherL f g ⊥ ⊑ f a
eitherL f g ⊥ ⊑ g b

and hence

eitherL f g ⊥ ⊑ f a ⊓ g b

Specializing to a ≡ b ≡ ⊥ gives us an single least of these upper bounds:

eitherL f g ⊥ ⊑ f ⊥ ⊓ g ⊥

similarly to Albert’s reasoning for the similar case of if-then-else.

I asked for a information-maximal version of eitherL so as to get as much information out with as little information in, in the spirit of lax (non-strict) functional programming. For the maximal version, choose equality:

eitherL f g ⊥ ≡ f ⊥ ⊓ g ⊥

There are three fairly simple proofs remaining, namely that (a) this equality holds for the definition of eitherL above, (b) the use of (⊔) is legitimate in that definition, and (c) the definition as a whole is consistent with the defining equations.

Finally, note that lub (⊔) is used in a restricted way here. In this use u ⊔ v, not only are the arguments u and v information-compatible (the semantic pre-condition of lub), but also, u ⊑ v. This property shows up in several applications of lub, and I suspect it can be useful in efficient implementation.

10 Comments

  1. Paul Brauner:

    So, if I understand, the ideal version cannot be reached effectively without solving the halting problem. The best you can do is to approximate it as much as you want by choosing a timeout as long as you want on the computation of (f ⊥ ⊓ g ⊥). Am I right?

  2. conal:

    Paul: No. I meant the definitions in the post as actual, computable, ideal implementations, except that I left out some class constraints, which I’ll add in an edit.

  3. David Sankel:

    Dimtar’s implementation,

    eitherL' f g x = ( f ⊥ ⊓ g ⊥ ) ⊔ either f g x
    

    , seems to generally have the same properties as your implementation.

    On the other hand, I’m not certain if it “allows computing ( f ⊥ ⊓ g ⊥ ) just once and reusing the result for different values of the third argument”. Couldn’t an evaluator easily detect ( f ⊥ ⊓ g ⊥ ) as a common subexpression in two different applications of eitherL' f g?

  4. conal:

    Couldn’t an evaluator easily detect ( f ⊥ ⊓ g ⊥ ) as a common subexpression in two different applications of eitherL’ f g?

    What do you mean by an “evaluator” in this question? Something that structurally analyzes and transforms thunks at run time?

    If you’re asking about compilers, then my understanding is a compiler could perform the same code transformation that I applied, and that GHC intentionally does not, so as to avoid space leaks.

  5. David Sankel:

    I guess the thing that’s tripping me up is that I don’t know what is being allowed to compute (who is computing?) in the sentence “Note in the either definition that the argument ordering allows computing (f ⊥ ⊓ g ⊥) just once and reusing the result for different values of the third argument”. Maybe the subject here is a Haskell interpreter with certain assumed optimizations (or lack of)?

  6. conal:

    David: Ah, thanks. Now I get the ambiguity. I’ll try to clarify.

    Suppose you define h = eitherL f g (for some functions f and g), and then apply h to several Either-valued arguments. Then under GHC or GHCi, the work of computing f ⊥ ⊓ g ⊥ will be done just once, not repeatedly.

    Now suppose instead that eitherL were defined as

    eitherL f g x = (f ⊥ ⊓ g ⊥)either f g x

    and again, define h = eitherL f g (for some functions f and g). In this case f ⊥ ⊓ g ⊥ will get redundantly computed every time h is applied, because it is expressed within the scope of the “λ x → ...” (after desugaring).

    The same sort of operational difference holds between the two following denotationally-equivalent expressions:

    const (nthPrime 100000)
     
      _ -> nthPrime 100000

  7. David Sankel:

    Thanks. That clears it up. For those interested, I’ve asked haskell-cafe for more information on that particular optimization.

  8. conal:

    The new version of the lub package has modules for glb and for lazier ‘if-then-else’ and ‘either’.

  9. conal:

    Make that laxer if-then-else and either. Just fixed module name.

    Thanks to Luke Palmer for the suggestion, via Twitter.

    luqui: “@conal, lazier? Not laxer?”.

    conal: @luqui For the module name? “Laxer” would be more fitting than “Lazier”, wouldn’t it?

    luqui: @conal, well according to the motivating blog posts, yes. I’m wondering why you went to the trouble to find a good term then didn’t use it.

    conal: @luqui simple reason: i forgot. just fixed. thanks!

  10. Douglas McClean:

    @Paul, I thought so too at first, but now I think I understand why that isn’t the case. If you alternate between taking steps to evaluate f \bot and taking steps to evaluate g \bot, one of three things will happen.

    Either the partial answers you start to get will differ at some point (in which case you are done), or you will get to the end of both without encountering a difference (in which case you are done), or you will keep evaluating forever (in which case the answer was –––\bot anyway).

    (Is this the gist of it? Or am I missing a different reason?)

Leave a comment