Exact numeric integration

This post describes a simple way to integrate a function over an interval and get an exact answer. The question came out of another one, which is how to optimally render a continuous-space image onto a discrete array of pixels.

For anti-aliasing, I’ll make two simplying assumptions (to be revisited):

• Each pixel is a square area. (With apologies to Alvy Ray Smith.)
• Since I can choose only one color per pixel, I want exactly the average of the continuous image over pixel’s subregion of 2D space.

The average of a function over a region (here a continuous image over a 2D interval) is equal to the integral of the function across the region divided by the size (area for 2D) of the region. Since our regions are simple squares, the average and the integral can each be defined easily in terms of the other (dividing or multiplying by the size).

To simplify the problem further, I’ll consider one-dimensional integration, i.e., integrating a function of R over a 1D interval. My solution below involves the least upper bound operator I’ve written about (and its specialization unamb).

A first simple algorithm

Integration takes a real-valued function and an interval (low & high) and gives a real.

integral :: (R->R) -> R -> R -> R

Integration has a property of interval additivity, i.e., the integral of a function from a to c is the sum of the integral from a to b and the integral from b to c.

∀ a b c. integral f a c == integral f a b + integral f b c

which immediately leads to a simple recursive algorithm:

integral f low high = integral f low mid + integral f mid high
where mid = (low + high) / 2

Extending to 2D is simple: we could divide a rectangular region into four quarter subregions, or into two subregions by splitting the longer dimension. The quartering variation is very like mipmapping, as used to anti-alias textures. Mipmapping takes a pixel array and builds a pyramid of successively lower resolution versions. Each level except the first is constructed out of the previous (next higher-resolution) level by averaging blocks of four pixels into one. The simple integral algorithm above (extended to 2D) is like mipmapping when we start with an infinite resolution (i.e., continuous) texture.

Umm …

Maybe you’re thinking what I’m thinking: Hey! We don’t have a base case, so we won’t even get off the ground.

Given that our domain is continuous, I don’t know what to use for a base case. So let’s consider what the purpose of a base case is, and see whether that purpose can be accomplished some other way.

What’s so important about base cases?

Does every recursion need a base case in order to avoid being completely undefined?

Here’s a counter-example: mapping a function over an infinite stream, taken from the source code of the Stream package.

data Stream a = Cons a (Stream a)

instance Functor Stream where
fmap f ~(Cons x xs) = Cons (f x) (fmap f xs)

The key thing here is that Cons is non-strict in its second argument, which holds the recursive call. (Definition: a function f is “strict” if f ⊥ == ⊥.)

Non-strictness of if-then-else is exactly what allows more mundane recursions to produce defined (non-⊥) results as well, e.g.,

fac n = if n <= 0 then 1 else fac (n-1)

In this fac example, if-then-else must be (and is) non-strict in its third argument (the recursive call).

So “base case” is not the heart of the matter; non-strictness is.

Finding non-strictness

The trouble with our elegant recursive derivation of integral above is that addition is strict in its arguments (where the recursive integral calls appear). This strictness means that we cannot get any information at all out of the right-hand side until we get some information out of the recursive calls, which won’t happen until we get information out of their recursive calls, ad infinitum.

To escape this information black hole, can we find some scrap of information about the value of the integral that doesn’t (always) rely on the recursive calls?

Interval analysis (IA) provides an answer. The idea of IA is to extend functions to apply to intervals of values and produce intervals of results. The interval versions are sloppy in that a result interval may hold values not corresponding to anything in the source interval. (In math-speak, a result interval may be a proper superset of the image of the source interval under the function.)

Applying an interval version of the function to the source interval results in lower and upper bounds for f. The average must be between these bounds, so the integral is bounded by these bounds multiplied by the interval size. (Or use more sophisticated variations on IA, such as affine analysis, etc.)

Now we have some information. How can we mix it in with the sum of recursive calls to integral? We can use (⊔) (least upper bound or “lub”), which is perfect for the job because its meaning is exactly to combine two pieces of information. See Merging partial values and Lazier function definitions by merging partial values.

Instead of treating IA as operating on intervals, think of it as operating on “partial numbers”, i.e., inexact values. Suppose we define a type of numbers that consistently generalizes exact numbers but is additionally populated with inexact values.

type Partial R  -- abstract

between :: Ord a => a -> a -> Partial a
exact   :: a -> Partial a

Perhaps exact a = between a a.

Now we can escape the black hole:

integral f low high = ((high - low) * f (between low high)) ⊔
(integral f low mid + integral f mid high)
where mid = (low + high) / 2

Representation

One representation of a partial value is an interval. In this representation, (⊔) is interval intersection.

type Partial a = (a,a)  -- first try

(l,h) ⊔ (l',h') = (l `max` l', h `min` h')

If the lower and upper bounds are plain old exact numbers, then this choice of representation and (⊔) has a fatal flaw. The max and min functions are strict, so (⊔) can easily produce less information than it is given, while its job is to combine all information present in both arguments. (For instance, let l == 3 and l' == ⊥. Then l `max` l' == ⊥, so we’ve lost the information from l.)

One possible solution is to change the kind of numbers used in the bounds in such a way that max and min are not strict. Let’s use two new types, for lower and upper bounds:

type Lower a  -- abstract
type Upper a  -- abstract

These two types also capture partial information about numbers, though they cannot express exact numbers (other than maxBound and minBound, where available).

Now we can represent partial numbers.

type Partial a = (Lower a, Upper a)

(l,h) ⊔ (l',h') = (l ⊔ l', h ⊔ h')

Notice that I’ve replaced both max and min each by (⊔). Now I realize that I only used max and min above as a way of combining the information that the lower and upper bounds (respectively) gave us. The (⊔) operator states this intention directly.

Pleasantly, we don’t even have to state this definition, as (⊔) is already defined this way for pairs. (Well, not quite; see Merging partial values.)

Improving values and intervals

This idea for representing partial values is very like what Warren Burton and Ken Jackson called “improving intervals”, which is a two-sided version of Warren’s “improving values” (corresponding to Lower above). (See Encapsulating nondeterminacy in an abstract data type with deterministic semantics (JFP, 1991) and Improving Intervals (JFP, 1993). As these papers are hard to find, you might start with Ken Jackson’s dissertation.)

Warren and Ken represented improving values and intervals as lazy, possibly-infinite lists of improving approximations. While an improving value is represented as a sequence (finite or infinite) of monotonically increasing values, an improving interval is represented as a sequence of monotonically shrinking intervals (each containing the next). The denotation of one of these improving representations is the limit of the sequence. Any query for information not specifically present in a partial value would yield ⊥, just as applying a partial function (or data structure) outside its domain yields ⊥. For instance, one could ask a partial number for successive bits. If a requested bit cannot be determined from the available bounds, then that bit and all later bits are ⊥.

Dropping lub

There’s a subtlety in the second definition of integral above. My goal is that integral yield a completely defined (exact) number. We can meet this goal for fairly well-behaved functions, since IA gives decreasing errors as input intervals shrink, and the result intervals are multiplied by shrinking interval sizes. Even discontinuities in the integrated function will be smoothed out, if there are only finitely many, thanks to the multiplication.

Completely defined values are at the tops of the information ordering. (I’m assuming we don’t have the “over-defined” (self-contradictory) value ⊤.) The sum of recursive calls is also fully defined, and starter partial value, f (between low high), has strictly less information, i.e.,

(high - low) * f (between low high)
⊑  integral f low high
=  integral f low mid + integral f mid high

So we can get by with a less general form of (⊔). If we represent our numbers as lazy lists of improving intervals, then we can simply use (:) in place of (⊔).

Hm. My reasoning just above is muddled. I think the crucial property is that

(high - low) * f (between low high)
⊑    (mid  - low) * f (between low mid )
+ (high - mid) * f (between mid high)

To see why this property holds, first subdivide:

(high - low) * f (between low high)
==   ((mid - low) + (high - mid)) * f (between low high)
==   (mid  - low) * f (between low high)
+ (high - mid) * f (between low high)

Next apply information monotonicity, i.e., more information (smaller interval) in yields more information out:

f (between low high) ⊑ f (between low med)

f (between low high) ⊑ f (between med high)

from which the crucial property follows.

Note: the notation is potentially confusing, since smaller intervals means more information, and so (⊑) means (⊇), and (⊔) means (∩).

Efficiency and benign side-effects

There is a serious efficiency problem with this (lazy) list-based representation of improving values or intervals, as pointed out by Ken and Warren:

At any point in time, it is really only the tightest bounds currently in the list that are important. […] Hence, the list representation consumes more space than is necessary, and time is wasted examining the out-of-date values in the list. A better representation, in a language that allows update-in-place, would be a pair consisting of the tightest lower bound and the tightest upper bound found so far. This pair would be updated-in-place when better bounds become known. (Improving Intervals, section 5.2)

The update-in-place suggested here is semantically benign, i.e., does not compromise pure functional semantics, because it doesn’t change the information content. Instead, it makes that content more efficient to access a second time. The same sort of semantically benign optimization underlies lazy evaluation, with the run-time system destructively replacing a thunk upon its first evaluation.

This benign-update idea is also explored in Another angle on functional future values.

Averaging vs integration

In practice, I’d probably rather use a recursive interval average function instead of a recursive integral. Recall that with the recursive integral, we had to multiply by the interval size at each step. The intervals get very small, and I worry about having to combine numbers of greatly differing magnitudes. With a recursive average, the numbers get averaged at each step, which I expect means we’ll be combining numbers of similar magnitudes.

average f low high = f (between low high) ⊔
average f low mid `avg` average f mid high
where mid = low `avg` high

a `avg` b = (a + b) / 2

integral f low high = (high - low) * average f low high   -- non-recursive

Here’s a modified form that can apply to higher dimensions as well:

average f iv = f iv ⊔ mean [average f iv' | iv' <- subdivide iv]

mean l = sum l / length l

integral f iv = size iv * average f iv   -- non-recursive

Exact computation

The algorithms described above can easily run afoul of inexact numeric representations, such as Float or Double. With such representation types, as recursive subdivision progresses, at some point, an interval will be bounded by consecutive representable numbers. At that point, sub-dividing an interval will result in an empty interval plus the pre-divided interval, and we will stop making progress.

One solution to this problem is use exact number representations. Another is to use progressively precise inexact representations.

1. Conal:

Heinrich Apfelmus gave a quick & dirty implementation in the Haskell reddit page for this post. Thanks, Heinrich!

2. Martijn van Steenbergen:

Is the ~ in Stream’s Functor instance necessary? Wouldn’t it already be sufficiently lazy without the ~?

3. Yrogirg:

I think the described method is similar to one in this article: