## Exact numeric integration

This post describes a simple way to integrate a function over an interval and get an exact answer. The question came out of another one, which is how to optimally render a continuous-space image onto a discrete array of pixels.

For anti-aliasing, I’ll make two simplying assumptions (to be revisited):

- Each pixel is a square area. (With apologies to Alvy Ray Smith.)
- Since I can choose only one color per pixel, I want exactly the
*average*of the continuous image over pixel’s subregion of 2D space.

The average of a function over a region (here a continuous image over a 2D interval) is equal to the integral of the function across the region divided by the size (area for 2D) of the region. Since our regions are simple squares, the average and the integral can each be defined easily in terms of the other (dividing or multiplying by the size).

To simplify the problem further, I’ll consider one-dimensional integration, i.e., integrating a function of **R** over a 1D interval.
My solution below involves the least upper bound operator I’ve written about (and its specialization `unamb`

).

### A first simple algorithm

Integration takes a real-valued function and an interval (low & high) and gives a real.

integral :: (R->R) -> R -> R -> R

Integration has a property of interval additivity, i.e., the integral of a function from *a* to *c* is the sum of the integral from *a* to *b* and the integral from *b* to *c*.

∀ a b c. integral f a c == integral f a b + integral f b c

which immediately leads to a simple recursive algorithm:

integral f low high = integral f low mid + integral f mid high where mid = (low + high) / 2

Extending to 2D is simple: we could divide a rectangular region into four quarter subregions, or into two subregions by splitting the longer dimension.
The quartering variation is very like mipmapping, as used to anti-alias textures.
Mipmapping takes a pixel array and builds a pyramid of successively lower resolution versions.
Each level except the first is constructed out of the previous (next higher-resolution) level by averaging blocks of four pixels into one.
The simple `integral`

algorithm above (extended to 2D) is like mipmapping when we start with an *infinite resolution* (i.e., continuous) texture.

### Umm …

Maybe you’re thinking what I’m thinking: Hey! We don’t have a base case, so we won’t even get off the ground.

Given that our domain is continuous, I don’t know what to use for a base case. So let’s consider what the purpose of a base case is, and see whether that purpose can be accomplished some other way.

### What’s so important about base cases?

Does every recursion need a base case in order to avoid being completely undefined?

Here’s a counter-example: mapping a function over an infinite stream, taken from the source code of the Stream package.

data Stream a = Cons a (Stream a) instance Functor Stream where fmap f ~(Cons x xs) = Cons (f x) (fmap f xs)

The key thing here is that `Cons`

is *non-strict* in its second argument, which holds the recursive call.
(Definition: a function `f`

is “strict” if `f ⊥ == ⊥`

.)

Non-strictness of `if-then-else`

is exactly what allows more mundane recursions to produce defined (non-⊥) results as well, e.g.,

fac n = if n <= 0 then 1 else fac (n-1)

In this `fac`

example, `if-then-else`

must be (and *is*) non-strict in its third argument (the recursive call).

So “base case” is not the heart of the matter; non-strictness is.

### Finding non-strictness

The trouble with our elegant recursive derivation of `integral`

above is that addition is strict in its arguments (where the recursive `integral`

calls appear).
This strictness means that we cannot get *any information at all* out of the right-hand side until we get some information out of the recursive calls, which won’t happen until we get information out of *their* recursive calls, ad infinitum.

To escape this information black hole, can we find some scrap of information about the value of the integral that doesn’t (always) rely on the recursive calls?

Interval analysis (IA) provides an answer.
The idea of IA is to extend functions to apply to *intervals* of values and produce intervals of results.
The interval versions are sloppy in that a result interval may hold values not corresponding to anything in the source interval.
(In math-speak, a result interval may be a *proper* superset of the image of the source interval under the function.)

Applying an interval version of the function to the source interval results in lower and upper bounds for `f`

.
The average must be between these bounds, so the integral is bounded by these bounds multiplied by the interval size.
(Or use more sophisticated variations on IA, such as affine analysis, etc.)

Now we have *some* information.
How can we mix it in with the sum of recursive calls to `integral`

?
We can use `(⊔)`

(least upper bound or “lub”), which is perfect for the job because its meaning is exactly to combine two pieces of information.
See *Merging partial values* and *Lazier function definitions by merging partial values*.

Instead of treating IA as operating on *intervals*, think of it as operating on “partial numbers”, i.e., inexact values.
Suppose we define a type of numbers that consistently generalizes exact numbers but is additionally populated with inexact values.

type Partial R -- abstract between :: Ord a => a -> a -> Partial a exact :: a -> Partial a

Perhaps `exact a = between a a`

.

Now we can escape the black hole:

integral f low high = ((high - low) * f (between low high)) ⊔ (integral f low mid + integral f mid high) where mid = (low + high) / 2

### Representation

One representation of a partial value is an interval.
In this representation, `(⊔)`

is interval intersection.

If the lower and bounds are plain old exact numbers, then this choice of representation and `(⊔)`

has a fatal flaw.
The `max`

and `min`

functions are strict, so `(⊔)`

can easily produce *less* information than it is given, while its job is to combine all information present in both arguments.
(For instance, let `l == 3`

and `l' == ⊥`

. Then `l `max` l' == ⊥`

, so we’ve lost the information from `l`

.)

One possible solution is to change the kind of numbers used in the bounds in such a way that `max`

and `min`

are not strict.
Let’s use two new types, for lower and upper bounds:

type Lower a -- abstract type Upper a -- abstract

These two types also capture partial information about numbers, though they cannot express exact numbers (other than `maxBound`

and `minBound`

, where available).

Now we can represent partial numbers.

type Partial a = (Lower a, Upper a) (l,h) ⊔ (l',h') = (l ⊔ l', h ⊔ h')

Notice that I’ve replaced both `max`

and `min`

each by `(⊔)`

.
Now I realize that I only used `max`

and `min`

above as a way of combining the information that the lower and upper bounds (respectively) gave us.
The `(⊔)`

operator states this intention directly.

Pleasantly, we don’t even have to state this definition, as `(⊔)`

is already defined this way for pairs.
(Well, not quite; see *Merging partial values*.)

### Improving values and intervals

This idea for representing partial values is very like what Warren Burton and Ken Jackson called “improving intervals”, which is a two-sided version of Warren’s “improving values” (corresponding to `Lower`

above).
(See *Encapsulating nondeterminacy in an abstract data type with deterministic semantics* (JFP, 1991) and *Improving Intervals* (JFP, 1993). As these papers are hard to find, you might start with Ken Jackson’s dissertation.)

Warren and Ken represented improving values and intervals as lazy, possibly-infinite lists of improving approximations.
While an improving value is represented as a sequence (finite or infinite) of monotonically increasing values, an improving interval is represented as a sequence of monotonically shrinking intervals (each containing the next).
The *denotation* of one of these improving representations is the limit of the sequence.
Any query for information not specifically present in a partial value would yield ⊥, just as applying a partial function (or data structure) outside its domain yields ⊥.
For instance, one could ask a partial number for successive bits.
If a requested bit cannot be determined from the available bounds, then that bit and all later bits are ⊥.

### Dropping `lub`

There’s a subtlety in the second definition of `integral`

above.
My goal is that `integral`

yield a completely defined (exact) number.
We can meet this goal for fairly well-behaved functions, since IA gives decreasing errors as input intervals shrink, and the result intervals are multiplied by shrinking interval sizes.
Even discontinuities in the integrated function will be smoothed out, if there are only finitely many, thanks to the multiplication.

Completely defined values are at the tops of the information ordering.
(I’m assuming we don’t have the “over-defined” (self-contradictory) value ⊤.)
The sum of recursive calls is also fully defined, and starter partial value, `f (between low high)`

, has strictly less information, i.e.,

(high - low) * f (between low high) ⊑ integral f low high = integral f low mid + integral f mid high

So we can get by with a less general form of `(⊔)`

.
If we represent our numbers as lazy lists of improving intervals, then we can simply use `(:)`

in place of `(⊔)`

.

Hm. My reasoning just above is muddled. I think the crucial property is that

(high - low) * f (between low high) ⊑ (mid - low) * f (between low mid ) + (high - mid) * f (between mid high)

To see why this property holds, first subdivide:

(high - low) * f (between low high) == ((mid - low) + (high - mid)) * f (between low high) == (mid - low) * f (between low high) + (high - mid) * f (between low high)

Next apply information monotonicity, i.e., more information (smaller interval) in yields more information out:

f (between low high) ⊑ f (between low med) f (between low high) ⊑ f (between med high)

from which the crucial property follows.

*Note*: the notation is potentially confusing, since smaller intervals means more information, and so `(⊑)`

means `(⊇)`

, and `(⊔)`

means `(∩)`

.

### Efficiency and benign side-effects

There is a serious efficiency problem with this (lazy) list-based representation of improving values or intervals, as pointed out by Ken and Warren:

At any point in time, it is really only the tightest bounds currently in the list that are important. [...] Hence, the list representation consumes more space than is necessary, and time is wasted examining the out-of-date values in the list. A better representation, in a language that allows update-in-place, would be a pair consisting of the tightest lower bound and the tightest upper bound found so far. This pair would be updated-in-place when better bounds become known. (

Improving Intervals, section 5.2)

The update-in-place suggested here is semantically benign, i.e., does not compromise pure functional semantics, because it doesn’t change the information content. Instead, it makes that content more efficient to access a second time. The same sort of semantically benign optimization underlies lazy evaluation, with the run-time system destructively replacing a thunk upon its first evaluation.

This benign-update idea is also explored in *Another angle on functional future values*.

### Averaging vs integration

In practice, I’d probably rather use a recursive interval average function instead of a recursive integral.
Recall that with the recursive `integral`

, we had to multiply by the interval size at each step.
The intervals get very small, and I worry about having to combine numbers of greatly differing magnitudes.
With a recursive average, the numbers get averaged at each step, which I expect means we’ll be combining numbers of similar magnitudes.

average f low high = f (between low high) ⊔ average f low mid `avg` average f mid high where mid = low `avg` high a `avg` b = (a + b) / 2 integral f low high = (high - low) * average f low high -- non-recursive

Here’s a modified form that can apply to higher dimensions as well:

average f iv = f iv ⊔ mean [average f iv' | iv' <- subdivide iv] mean l = sum l / length l integral f iv = size iv * average f iv -- non-recursive

### Exact computation

The algorithms described above can easily run afoul of inexact numeric representations, such as `Float`

or `Double`

.
With such representation types, as recursive subdivision progresses, at some point, an interval will be bounded by consecutive representable numbers.
At that point, sub-dividing an interval will result in an empty interval plus the pre-divided interval, and we will stop making progress.

One solution to this problem is use *exact* number representations.
Another is to use progressively precise inexact representations.

## Conal:

Heinrich Apfelmus gave a quick & dirty implementation in the Haskell reddit page for this post. Thanks, Heinrich!

2 January 2010, 4:48 pm## Martijn van Steenbergen:

Is the ~ in Stream’s Functor instance necessary? Wouldn’t it already be sufficiently lazy without the ~?

13 January 2010, 5:51 am## Yrogirg:

I think the described method is similar to one in this article:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.54.395&rep=rep1&type=pdf

“Lazy functional algorithms for exact real functionals”

by Alex K Simpson 1998

14 March 2010, 7:25 am## Conal Elliott » Blog Archive » Nonstrict memoization:

[...] restricted and more efficient alternative to the fully general least upper bound. The technique in Exact numeric integration also used this restricted [...]

18 July 2010, 8:41 am