<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; semantics</title>
	<atom:link href="http://conal.net/blog/tag/semantics/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>Garbage collecting the semantics of FRP</title>
		<link>http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp</link>
		<comments>http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp#comments</comments>
		<pubDate>Mon, 04 Jan 2010 21:55:30 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[FRP]]></category>
		<category><![CDATA[functional reactive programming]]></category>
		<category><![CDATA[semantics]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=96</guid>
		<description><![CDATA[Ever since ActiveVRML, the model we&#8217;ve been using in functional reactive programming (FRP) for interactive behaviors is (T-&#62;a) -&#62; (T-&#62;b), for dynamic (time-varying) input of type a and dynamic output of type b (where T is time). In &#8220;Classic FRP&#8221; formulations (including ActiveVRML, Fran &#38; Reactive), there is a &#8220;behavior&#8221; abstraction whose denotation is a [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Garbage collecting the semantics of FRP

Tags: FRP, functional reactive programming, semantics, design, derivative

URL: http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp/

-->

<!-- references -->

<!-- teaser -->

<p>Ever since <a href="http://conal.net/papers/ActiveVRML/" title="Tech report: &quot;A Brief Introduction to ActiveVRML&quot;">ActiveVRML</a>, the model we&#8217;ve been using in functional reactive programming (FRP) for interactive behaviors is <code>(T-&gt;a) -&gt; (T-&gt;b)</code>, for dynamic (time-varying) input of type <code>a</code> and dynamic output of type <code>b</code> (where <code>T</code> is time).
In &#8220;Classic FRP&#8221; formulations (including <a href="http://conal.net/papers/ActiveVRML/" title="Tech report: &quot;A Brief Introduction to ActiveVRML&quot;">ActiveVRML</a>, <a href="http://conal.net/papers/icfp97/" title="paper">Fran</a> &amp; <a href="http://conal.net/papers/push-pull-frp/" title="Paper by Conal Elliott and Paul Hudak">Reactive</a>), there is a &#8220;behavior&#8221; abstraction whose denotation is a function of time.
Interactive behaviors are then modeled as host language (e.g., Haskell) functions between behaviors.
Problems with this formulation are described in <em><a href="http://conal.net/blog/posts/why-classic-FRP-does-not-fit-interactive-behavior/" title="blog post">Why classic FRP does not fit interactive behavior</a></em>.
These same problems motivated &#8220;Arrowized FRP&#8221;.
In Arrowized FRP, behaviors (renamed &#8220;signals&#8221;) are purely conceptual.
They are part of the semantic model but do not have any realization in the programming interface.
Instead, the abstraction is a <em>signal transformer</em>, <code>SF a b</code>, whose semantics is <code>(T-&gt;a) -&gt; (T-&gt;b)</code>.
See <em><a href="http://conal.net/papers/genuinely-functional-guis.pdf" title="Paper by Antony Courtney and Conal Elliott">Genuinely Functional User Interfaces</a></em> and <em><a href="http://www.haskell.org/yale/papers/haskellworkshop02/" title="Paper by Henrik Nilsson, Antony Courtney, and John Peterson">Functional Reactive Programming, Continued</a></em>.</p>
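<p>In types, the two formulations can be sketched as follows. This is my own illustrative sketch (the names and the choice of <code>T = Double</code> are assumptions, not definitions from the papers above):</p>

```haskell
type T = Double

-- Classic FRP: a behavior denotes a function of time, and interactive
-- behaviors are host-language functions between behaviors.
newtype Behavior a = Behavior (T -> a)

-- Arrowized FRP: signals (T -> a) are purely conceptual; the programming
-- abstraction is the signal transformer SF a b, whose denotation is
-- (T -> a) -> (T -> b).
newtype SF a b = SF ((T -> a) -> (T -> b))

runSF :: SF a b -> (T -> a) -> (T -> b)
runSF (SF f) = f

identitySF :: SF a a
identitySF = SF id
```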

<p>Whether in its classic or arrowized embodiment, I&#8217;ve been growing uncomfortable with this semantic model of functions between time functions.
A few weeks ago, I realized that one source of discomfort is that this model is <em>mostly junk</em>.</p>

<p>This post contains some partially formed thoughts about how to eliminate the junk (&#8220;garbage collect the semantics&#8221;), and what might remain.</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-96"></span></p>

<p>There are two generally desirable properties for a denotational semantics: <em>full abstraction</em> and <em>junk-freeness</em>.
Roughly, &#8220;full abstraction&#8221; means we must not distinguish between what is (operationally) indistinguishable, while &#8220;junk-freeness&#8221; means that every semantic value must be denotable.</p>

<p>FRP&#8217;s semantic model, <code>(T-&gt;a) -&gt; (T-&gt;b)</code>, allows not only arbitrary (computable) transformation of input values, but also of time.
The output at some time can depend on the input at any time at all, or even on the input at arbitrarily many different times.
Consequently, this model allows responding to <em>future</em> input, violating a principle sometimes called &#8220;causality&#8221;, which is that outputs may depend on the past or present but not the future.</p>
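<p>For instance, nothing in the model forbids a transformer that samples its input in the future. A tiny sketch (the names and <code>T = Double</code> are mine):</p>

```haskell
type T = Double

-- The bare semantic model: functions between time functions.
type Model a b = (T -> a) -> (T -> b)

-- Non-causal: the output at time t depends on the input one second ahead.
precog :: Model a a
precog input t = input (t + 1)
```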

<p>In a causal system, the present can reach backward to the past but not forward to the future.
I&#8217;m uneasy about this ability as well.
Arbitrary access to the past may be much more powerful than necessary.
As evidence, consult the system we call (physical) Reality.
As far as I can tell, Reality operates without arbitrary access to the past or to the future, and it does a pretty good job at expressiveness.</p>

<p>Moreover, arbitrary past access is also problematic to implement in its semantically simple generality.</p>

<p>There is a thing we informally call &#8220;memory&#8221;, which at first blush may look like access to the past, but it isn&#8217;t really.
Rather, memory is access to a <em>present</em> input, which has come into being through a process of filtering, gradual accumulation, and discarding (forgetting).
I&#8217;m talking about &#8220;memory&#8221; here in the sense of what our brains do, but also what all the rest of physical reality does.
For instance, weather marks on a rock are part of the rock&#8217;s (present) memory of the past weather.</p>

<p>A very simple memory-less semantic model of interactive behavior is just <code>a -&gt; b</code>.
This model is too restrictive, however, as it cannot support <em>any</em> influence of the past on the present.</p>

<p>Which leaves a question: what is a simple and adequate formal model of interactive behavior that reaches neither into the past nor into the future, and yet still allows the past to influence the present?
Inspired in part by a design principle I call &#8220;what would reality do?&#8221; (WWRD), I&#8217;m happy to have some kind of infinitesimal access to the past, but nothing further.</p>

<p>My current intuition is that differentiation/integration plays a crucial role.
That information is carried forward moment by moment in time as &#8220;momentum&#8221; in some sense.</p>
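<p>One way to make this intuition concrete is integration, where information is carried forward step by step and each new state uses only the present state. A minimal Euler-step sketch (my illustration, for an autonomous system <code>y' = f y</code> with step size <code>h</code>):</p>

```haskell
-- Euler integration: each state arises from the present state plus an
-- (ideally infinitesimal) step -- no arbitrary reach into past or future.
euler :: Double -> (Double -> Double) -> Double -> [Double]
euler h f = iterate step
  where step y = y + h * f y
```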

<blockquote>
  <p><em>I call intuition cosmic fishing. You feel a nibble, then you&#8217;ve got to hook the fish.</em> &#8211; Buckminster Fuller</p>
</blockquote>

<p>Where to go with these intuitions?</p>

<p>Perhaps interactive behaviors are some sort of function with all of its derivatives.
See <em><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="blog post">Beautiful differentiation</a></em> for a specification and derived implementation of numeric operations, and more generally of <code>Functor</code> and <code>Applicative</code>, on which much of FRP is based.</p>

<p>I suspect the whole event model can be replaced by integration.
Integration is the main remaining piece.</p>

<p>How weak a semantic model can let us define integration?</p>

<h3>Thanks</h3>

<p>My thanks to Luke Palmer and to Noam Lewis for some clarifying chats about these half-baked ideas.
And to the folks on #haskell IRC for <a href="http://tunes.org/~nef/logs/haskell/10.01.04">brainstorming titles for this post</a>.
My favorite suggestions were</p>

<ul>
<li>luqui: instance HasJunk FRP where</li>
<li>luqui: Functional reactive programming&#8217;s semantic baggage</li>
<li>sinelaw: FRP, please take out the trash!</li>
<li>cale: Garbage collecting the semantics of FRP</li>
<li>BMeph: Take out the FRP-ing Trash</li>
</ul>

<p>all of which I preferred over my original &#8220;FRP is mostly junk&#8221;.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=96&amp;md5=a0b309c313791bd63f34ab08b5fb4c3b"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/garbage-collecting-the-semantics-of-frp/feed</wfw:commentRss>
		<slash:comments>34</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fgarbage-collecting-the-semantics-of-frp&amp;language=en_GB&amp;category=text&amp;title=Garbage+collecting+the+semantics+of+FRP&amp;description=Ever+since+ActiveVRML%2C+the+model+we%26%238217%3Bve+been+using+in+functional+reactive+programming+%28FRP%29+for+interactive+behaviors+is+%28T-%26gt%3Ba%29+-%26gt%3B+%28T-%26gt%3Bb%29%2C+for+dynamic+%28time-varying%29+input+of+type+a+and+dynamic+output...&amp;tags=derivative%2Cdesign%2CFRP%2Cfunctional+reactive+programming%2Csemantics%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Thoughts on semantics for 3D graphics</title>
		<link>http://conal.net/blog/posts/thoughts-on-semantics-for-3d-graphics</link>
		<comments>http://conal.net/blog/posts/thoughts-on-semantics-for-3d-graphics#comments</comments>
		<pubDate>Mon, 23 Nov 2009 07:41:30 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[3D]]></category>
		<category><![CDATA[arrow]]></category>
		<category><![CDATA[design]]></category>
		<category><![CDATA[geometry]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[type class morphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=90</guid>
		<description><![CDATA[The central question for me in designing software is always What does it mean? With functional programming, this question is especially crisp. For each data type I define, I want to have a precise and simple mathematical model. (For instance, my model for behavior is function-of-time, and my model of images is function-of-2D-space.) Every operation [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Thoughts on semantics for 3D graphics

Tags: semantics, design, 3D, arrow, type class morphism, geometry

URL: http://conal.net/blog/posts/thoughts-on-semantics-for-3d-graphics/

-->

<!-- references -->

<!-- teaser -->

<p>The central question for me in designing software is always</p>

<blockquote>
  <p>What does it mean?</p>
</blockquote>

<p>With functional programming, this question is especially crisp.
For each data type I define, I want to have a precise and simple mathematical model.
(For instance, my model for behavior is function-of-time, and my model of images is function-of-2D-space.)
Every operation on the type is also given a meaning in terms of that semantic model.</p>

<p>This specification process, which is denotational semantics applied to data types, provides a basis for</p>

<ul>
<li>correctness of the implementation,</li>
<li>user documentation free of implementation detail,</li>
<li>generating and proving properties, which can then be used in automated testing, and</li>
<li>evaluating and comparing the elegance and expressive power of design decisions.</li>
</ul>

<p>For an example (2D images), some motivation of this process, and discussion, see Luke Palmer&#8217;s post <em><a href="http://lukepalmer.wordpress.com/2008/07/18/semantic-design/" title="Blog post by Luke Palmer">Semantic Design</a></em>.
See also my posts on the idea and use of <em><a href="http://conal.net/blog/tag/type-class-morphism/" title="Posts on type class morphisms">type class morphisms</a></em>, which provide additional structure to denotational design.</p>

<p>In spring of 2008, I started working on a functional 3D library, <a href="http://haskell.org/haskellwiki/FieldTrip" title="Library wiki page">FieldTrip</a>.
I&#8217;ve designed functional 3D libraries before as part of <a href="http://conal.net/tbag/" title="Project web page">TBAG</a>, <a href="http://conal.net/papers/ActiveVRML/" title="Tech report: &quot;A Brief Introduction to ActiveVRML&quot;">ActiveVRML</a>, and <a href="http://conal.net/Fran" title="Functional reactive animation">Fran</a>.
This time I wanted a semantics-based design, for all of the reasons given above.
As always, I want a model that is</p>

<ul>
<li>simple, </li>
<li>elegant, and </li>
<li>general.</li>
</ul>

<p>For 3D, I also want the model to be GPU-friendly, i.e., to execute well on (modern) GPUs and to give access to their abilities.</p>

<p>I hadn&#8217;t thought of or heard a model that I was happy with, and so I didn&#8217;t have the sort of firm ground I like to stand on in working on FieldTrip.
Last February, such a model occurred to me.
I&#8217;ve had this blog post mostly written since then.
Recently, I&#8217;ve been focused on functional 3D again for GPU-based rendering, and then Sean McDirmid <a href="http://mcdirmid.wordpress.com/2009/11/20/designing-a-gpu-oriented-geometry-abstraction-part-one/">posed a similar question</a>, which got me thinking again.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-90"></span></p>

<h3>Geometry</h3>

<p>3D graphics involves a variety of concepts.
Let&#8217;s start with 3D geometry, using a <em>surface</em> (rather than a <em>solid</em>) model.</p>

<p>Examples of 3D (surface) geometry include</p>

<ul>
<li>the boundary (surface) of a solid box, sphere, or torus,</li>
<li>a filled triangle, rectangle, or circle,</li>
<li>a collection of geometry, and</li>
<li>a spatial transformation of geometry.</li>
</ul>

<h4>First model: set of geometric primitives</h4>

<p>One model of geometry is a set of geometric primitives.
In this model, <code>union</code> means set union, and spatial transformation means transforming all of the 3D points in all of the primitives in the set.
Primitives contain infinitely (even uncountably) many points, so that&#8217;s a lot of transforming.
Fortunately, we&#8217;re talking about what (semantics), and not how (implementation).</p>

<p><em>What is a geometric primitive?</em></p>

<p>We could say it&#8217;s a triangle, specified by three coordinates.
After all, computer graphics reduces everything to sets of triangles.
Oops &#8212; we&#8217;re confusing semantics and implementation.
Tessellation <em>approximates</em> curved surfaces by sets of triangles but loses information in the process.
I want a story that includes this approximation process but keeps it clearly distinct from semantically ideal curved surfaces.
Then users can work with the ideal, simple semantics and rely on the implementation to perform intelligent, dynamic, view-dependent tessellation that adapts to available hardware resources.</p>

<p>Another model of geometric primitive is a function from 2D space to 3D space, i.e., the &#8220;parametric&#8221; representation of surfaces.
Along with the function, we&#8217;ll probably want some means of describing the subset of 2D over which the surface is defined, so as to trim our surfaces.
A simple formalization would be</p>

<pre><code>type Surf = R2 -&gt; Maybe R3
</code></pre>

<p>where</p>

<pre><code>type R  -- real numbers
type R2 = (R,R)
type R3 = (R,R,R)
</code></pre>
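<p>For concreteness, here is one surface expressible in this model. The sphere parameterization is my own illustration (with <code>R</code> taken as <code>Double</code>), not part of the original design:</p>

```haskell
type R    = Double
type R2   = (R, R)
type R3   = (R, R, R)
type Surf = R2 -> Maybe R3

-- Unit sphere, parameterized over [0,1] x [0,1] (longitude/latitude);
-- Nothing outside that square, i.e., a trimmed surface.
sphere :: Surf
sphere (u, v)
  | inUnit u && inUnit v = Just (cos th * cos ph, sin th * cos ph, sin ph)
  | otherwise            = Nothing
  where
    inUnit x = 0 <= x && x <= 1
    th = 2 * pi * u          -- longitude
    ph = pi * (v - 0.5)      -- latitude
```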

<p>For shading, we&#8217;ll also need normals, and possibly tangents &amp; bitangents.
We can get these features and more by including derivatives, either just first derivatives or all of them.
See my <a href="http://conal.net/blog/tag/derivative/" title="Posts on derivatives">posts on derivatives</a> and paper <em><a href="http://conal.net/papers/beautiful-differentiation/" title="Paper: Beautiful differentiation">Beautiful differentiation</a></em>.</p>

<p>In addition to position and derivatives, each point on a primitive also has material properties, which determine how light is reflected by and transmitted through the surface at the point.</p>

<pre><code>type Surf = R2 -&gt; Maybe (R2 :&gt; R3, Material)
</code></pre>

<p>where <code>a :&gt; b</code> contains all derivatives (including zeroth) at a point of a function of type <code>a-&gt;b</code>.
See <em><a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="blog post">Higher-dimensional, higher-order derivatives, functionally</a></em>.
We could perhaps also include derivatives of material properties:</p>

<pre><code>type Surf = R2 :~&gt; Maybe (R3, Material)
</code></pre>

<p>where <code>a :~&gt; b</code> is the type of infinitely differentiable functions.</p>

<h4>Combining geometry values</h4>

<p>The <code>union</code> function gives one way to combine two geometry values.
Another is morphing (interpolation) of positions and of material properties.
What can the semantics of morphing be?</p>

<p>Morphing between two <em>surfaces</em> is easier to define.
A surface is a function, so we can interpolate <em>point-wise</em>: given surfaces <code>r</code> and <code>s</code>, for each point <code>p</code> in parameter space, interpolate between (a) <code>r</code> at <code>p</code> and (b) <code>s</code> at <code>p</code>, which is what <code>liftA2</code> (on functions) would suggest.</p>

<p>This definition works <em>if</em> we have a way to interpolate between <code>Maybe</code> values.
If we use <code>liftA2</code> again, now on <code>Maybe</code> values, then the <code>Just</code>/<code>Nothing</code> (and <code>Nothing</code>/<code>Just</code>) cases will yield <code>Nothing</code>.
Is this semantics desirable?
As an example, consider a flat square surface with a hole in the middle.
One square has a small hole, and the other has a big hole.
If the size of the hole corresponds to size of the portion of parameter space mapped to <code>Nothing</code>, then point-wise interpolation will always yield the larger hole, rather than interpolating between hole sizes.
On the other hand, the two surfaces with holes might be <code>Just</code> over exactly the same set of parameters, with the function determining how much the <code>Just</code> space gets stretched.</p>
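<p>The point-wise reading above amounts to two nested uses of <code>liftA2</code>: once at functions, once at <code>Maybe</code>. A sketch, with a hypothetical one-dimensional <code>lerp1</code> standing in for interpolation of positions and materials:</p>

```haskell
import Control.Applicative (liftA2)

lerp1 :: Double -> Double -> Double -> Double
lerp1 t a b = a + t * (b - a)

-- Point-wise morph of partial surfaces: liftA2 on functions (same
-- parameter point), then liftA2 on Maybe. Just/Nothing and Nothing/Just
-- yield Nothing, so holes persist, as discussed above.
morph :: Double -> (u -> Maybe Double) -> (u -> Maybe Double)
      -> (u -> Maybe Double)
morph t = liftA2 (liftA2 (lerp1 t))
```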

<p>One way to characterize this awkwardness of morphing is that the two functions (surfaces) might have <em>different domains</em>.
This interpretation comes from seeing <code>a -&gt; Maybe b</code> as encoding a function from a <em>subset</em> of <code>a</code> (i.e., a <em>partial</em> function on <code>a</code>).</p>

<p>Even if we had a satisfactory way to combine surfaces (point-wise), how could we extend it to combining full geometry values, which can contain any number of surfaces?
One idea is to model geometry as a <em>structured</em> collection of surfaces, e.g., a list.
Then we could combine the collections element-wise.
Again, we&#8217;d have to deal with the possibility that the collections do not match up.</p>

<h3>Surface tuples</h3>

<p>Let&#8217;s briefly return to a simpler model of surfaces:</p>

<pre><code>type Surf = R2 -&gt; R3
</code></pre>

<p>We could represent a collection of such surfaces as a structured collection, e.g., a list:</p>

<pre><code>type Geometry = [Surf]
</code></pre>

<p>But then the type doesn&#8217;t capture the number of surfaces, leading to mismatches when combining geometry values point-wise.</p>

<p>Alternatively, we could make the number of surfaces explicit in the type, via tuples, possibly nested.
For instance, two surfaces would have type <code>(Surf,Surf)</code>.</p>

<p>Interpolation in this model becomes very simple.
A general interpolator works on vector spaces:</p>

<pre><code>lerp :: VectorSpace v =&gt; v -&gt; v -&gt; Scalar v -&gt; v
lerp a b t = a ^+^ t*^(b ^-^ a)
</code></pre>

<p>or on affine spaces:</p>

<pre><code>alerp :: (AffineSpace p, VectorSpace (Diff p)) =&gt;
         p -&gt; p -&gt; Scalar (Diff p) -&gt; p
alerp p p' s = p .+^ s*^(p' .-. p)
</code></pre>

<p>Both definitions are in the <a href="http://haskell.org/haskellwiki/vector-space" title="Library wiki page">vector-space</a> package.
That package also includes <code>VectorSpace</code> and <code>AffineSpace</code> instances for both functions and tuples.
These instances, together with instances for real values, suffice to make (possibly nested) tuples of surfaces be vector spaces and affine spaces.</p>

<h3>From products to sums</h3>

<p>Function pairing admits some useful isomorphisms.
One replaces a product of functions with a single function into a product:</p>

<!-- $$(a to b) times (a to c) cong a to (b times c)$$ -->

<pre><code>(a → b) × (a → c) ≅ a → (b × c)
</code></pre>

<p>Using this product/product isomorphism, we could replace tuples of surfaces with a single function from <em>R<sup>2</sup></em> to tuples of <em>R<sup>3</sup></em>.</p>

<p>There is also a handy isomorphism that relates products to sums, in the context of functions:</p>

<!-- $$(b to a) times (c to a) cong (b + c) to a$$ -->

<pre><code>(b → a) × (c → a) ≅ (b + c) → a
</code></pre>

<p>This second isomorphism lets us replace tuples of surfaces with a single &#8220;surface&#8221;, if we generalize the notion of surface to include domains more complex than <em>R<sup>2</sup></em>.</p>

<p>In fact, these two isomorphisms are uncurried forms of the general and useful Haskell functions <code>(&amp;&amp;&amp;)</code> and <code>(|||)</code>, defined on arrows:</p>

<pre><code>(&amp;&amp;&amp;) :: Arrow       (~&gt;) =&gt; (a ~&gt; b) -&gt; (a ~&gt; c) -&gt; (a ~&gt; (b,c))
(|||) :: ArrowChoice (~&gt;) =&gt; (a ~&gt; c) -&gt; (b ~&gt; c) -&gt; (Either a b ~&gt; c)
</code></pre>

<p>Restricted to the function arrow, <code>(|||) == either</code>.</p>
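<p>At the function arrow, both isomorphisms can be witnessed directly with these standard combinators (the helper names are mine):</p>

```haskell
import Control.Arrow ((&&&), (|||))

-- (a -> b) x (a -> c)  is isomorphic to  a -> (b, c)
pairUp :: (a -> b, a -> c) -> (a -> (b, c))
pairUp = uncurry (&&&)

-- (b -> a) x (c -> a)  is isomorphic to  Either b c -> a
merge :: (b -> a, c -> a) -> (Either b c -> a)
merge = uncurry (|||)
```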

<p>The second isomorphism, <code>uncurry (|||)</code>, has another benefit.
Relaxing the domain type to allow sums opens the way to other domain variations as well.
For instance, we can have types for triangular domains, shapes with holes, and other flavors of bounded and unbounded parameter spaces.
All of these domains are two-dimensional, although they may result from several patches.</p>

<p>Our <code>Geometry</code> type now becomes parameterized:</p>

<pre><code>type Geometry a = a -&gt; (R3,Material)
</code></pre>

<p>The first isomorphism, <code>uncurry (&amp;&amp;&amp;)</code>, is also useful in a geometric setting.
Think of each component of the range type (here <code>R3</code> and <code>Material</code>) as a surface &#8220;attribute&#8221;.
Then <code>(&amp;&amp;&amp;)</code> merges two compatible geometries, including attributes from each.
Attributes could include position (and derivatives) and shading-related material, as well as non-visual properties like temperature, elasticity, stickiness, etc.</p>

<p>With this flexibility in mind, <code>Geometry</code> gets a second type parameter, which is the range type.
Now there&#8217;s nothing left of the <code>Geometry</code> type but general functions:</p>

<pre><code>type Geometry = (-&gt;)
</code></pre>

<p>Recall that we&#8217;re looking for a <em>semantics</em> for 3D geometry.
The <em>type</em> for <code>Geometry</code> might be abstract, with <code>(-&gt;)</code> being its semantic model.
In that case, the model suggests that <code>Geometry</code> have all of the same type class instances that <code>(-&gt;)</code> (and its full or partial applications) has, including <code>Monoid</code>, <code>Functor</code>, <code>Applicative</code>, <code>Monad</code>, and <code>Arrow</code>.
The semantics of these instances would be given by the corresponding instances for <code>(-&gt;)</code>.
(See posts on <a href="http://conal.net/blog/tag/type-class-morphism/" title="Posts on type class morphisms">type class morphisms</a> and the paper <em><a href="http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/" title="blog post">Denotational design with type class morphisms</a></em>.)</p>

<p>Or drop the notion of <code>Geometry</code> altogether and use functions directly.</p>

<h3>Domains</h3>

<p>I&#8217;m happy with the simplicity of geometry as functions.
Functions fit the flexibility of programmable GPUs, and they provide simple, powerful &amp; familiar notions of attribute merging (<code>(&amp;&amp;&amp;)</code>) and union (<code>(|||)</code>/<code>either</code>).</p>

<p>The main question I&#8217;m left with: what are the domains?</p>

<p>One simple domain is a one-dimensional interval, say [-1,1].</p>

<p>Two useful domain building blocks are sum and product.
I mentioned sum above, in connection with geometric union (<code>(|||)</code>/<code>either</code>).
Product combines domains into higher-dimensional domains.
For instance, the product of two 1D intervals is a 2D interval (axis-aligned filled rectangle), which is handy for some parametric surfaces.</p>

<p>What about other domains, e.g., triangular, or having one or more holes?  Or multi-way branching surfaces?  Or unbounded?</p>

<p>One idea is to stitch together simple domains using sum.
We don&#8217;t have to build any particular spatial shapes or sizes, since the &#8220;geometry&#8221; functions themselves yield the shape and size.
For instance, a square region can be mapped to a triangular or even circular region.
An infinite domain can be stitched together from infinitely many finite domains.
Or it can be mapped to from a single finite domain.
For instance, the function <code>x -&gt; x / (1 - abs x)</code> maps [-1,1] to [-∞,∞].</p>
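<p>A sketch of that last map (assuming the intended function is <code>x / (1 - abs x)</code>, which is monotonic on the interior and sends ±1 to ±∞ in floating point):</p>

```haskell
-- Maps [-1,1] onto [-inf,inf]; the open interior maps onto the reals.
stretch :: Double -> Double
stretch x = x / (1 - abs x)
```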

<p>Alternatively, we could represent domains as typed predicates (characteristic functions).
For instance, the closed interval [-1,1] would be <code>x -&gt; abs x &lt;= 1</code>.
Replacing <code>abs</code> with <code>magnitude</code> (for <a href="http://hackage.haskell.org/packages/archive/vector-space/latest/doc/html/Data-VectorSpace.html#t%3AInnerSpace">inner product spaces</a>) generalizes this formulation to encompass [-1,1] (1D), a unit disk (2D), and a unit ball (3D).</p>
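<p>A sketch of the predicate formulation, with explicit magnitudes in place of the vector-space package&#8217;s <code>magnitude</code> so the example stays self-contained:</p>

```haskell
type Predicate a = a -> Bool

-- The closed interval [-1,1].
interval :: Predicate Double
interval x = abs x <= 1

-- Unit disk (2D) and unit ball (3D): the same shape of definition,
-- generalized over the magnitude function.
disk :: Predicate (Double, Double)
disk (x, y) = sqrt (x * x + y * y) <= 1

ball :: Predicate (Double, Double, Double)
ball (x, y, z) = sqrt (x * x + y * y + z * z) <= 1
```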

<p>I like the simple generality of the predicate approach, and I also like how the pure type approach supports interpolation and other pointwise operations (via <code>liftA2</code> etc.).</p>

<h3>Tessellation</h3>

<p>I&#8217;ve intentionally formulated the graphics semantics over continuous space, which makes it resolution-independent and easy to compose.
(This formulation is typical for 3D geometry and 2D vector graphics.
The benefits of continuity apply generally to <a href="http://conal.net/Pan/Gallery" title="Pan image gallery">imagery</a> and to <a href="http://conal.net/Fran/tutorial.htm" title="Animated tutorial: &quot;Composing Reactive Animations&quot;">animation/behavior</a>.)</p>

<p>Graphics hardware specializes in finite collections of triangles.
For rendering, curved surfaces have to be <em>tessellated</em>, i.e., approximated as collections of triangles.
Desirable choice of tessellation depends on characteristics of the surface and of the view, as well as scene complexity and available CPU and GPU resources.
Formulating geometry in its ideal curved form allows for automated analysis and choice of tessellation.
For instance, since triangles are linear, the error of a triangle relative to the surface it approximates depends on how <em>non-linear</em> the surface is over the subset of its domain corresponding to the triangle.
Using <a href="http://en.wikipedia.org/wiki/Talk:Interval_arithmetic" title="Wikipedia page on interval analysis/arithmetic">interval analysis</a> and <a href="http://conal.net/blog/tag/derivative/" title="Posts on derivatives">derivatives</a>, non-linearity can be measured as a size bound on the second derivative or a range of first derivative.
Error could also be analyzed in terms of the resulting image rather than the surface.</p>

<p>For a GPU-based implementation, one could tessellate dynamically, in a &#8220;geometry shader&#8221; or (I presume) in a more general framework like CUDA or OpenCL.</p>

<h3>Abstractness</h3>

<p>A denotational model is &#8220;fully abstract&#8221; when it equates observationally equivalent terms.
The parametric model of surfaces is not fully abstract, in that reparameterizing a surface yields a different function that denotes the same geometric surface.
(Surface reparametrization alters the relationship between domain and range, while covering exactly the same surface, geometrically.)
Properties that are independent of particular parametrization are called &#8220;geometric&#8221;, which I think corresponds to full abstraction (considering those properties as semantic functions).</p>

<p>What might a fully abstract (geometric) model for geometry be?</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=90&amp;md5=3a9d3f59c6f8c7d6110dbfab2555bddf"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/thoughts-on-semantics-for-3d-graphics/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fthoughts-on-semantics-for-3d-graphics&amp;language=en_GB&amp;category=text&amp;title=Thoughts+on+semantics+for+3D+graphics&amp;description=The+central+question+for+me+in+designing+software+is+always+What+does+it+mean%3F+With+functional+programming%2C+this+question+is+especially+crisp.+For+each+data+type+I+define%2C+I+want...&amp;tags=3D%2Carrow%2Cdesign%2Cgeometry%2Csemantics%2Ctype+class+morphism%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Notions of purity in Haskell</title>
		<link>http://conal.net/blog/posts/notions-of-purity-in-haskell</link>
		<comments>http://conal.net/blog/posts/notions-of-purity-in-haskell#comments</comments>
		<pubDate>Mon, 30 Mar 2009 19:00:48 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[purity]]></category>
		<category><![CDATA[referential transparency]]></category>
		<category><![CDATA[semantics]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=86</guid>
		<description><![CDATA[Lately I&#8217;ve been learning that some programming principles I treasure are not widely shared among my Haskell comrades. Or at least not widely among those I&#8217;ve been hearing from. I was feeling bummed, so I decided to write this post, in order to help me process the news and to see who resonates with what [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Notions of purity in Haskell

Tags: purity, referential transparency, semantics

URL: http://conal.net/blog/posts/notions-of-purity-in-haskell/

-->

<!-- references -->

<!-- teaser -->

<p>Lately I&#8217;ve been learning that some programming principles I treasure are not widely shared among my Haskell comrades.
Or at least not widely among those I&#8217;ve been hearing from.
I was feeling bummed, so I decided to write this post, in order to help me process the news and to see who resonates with what I&#8217;m looking for.</p>

<p>One of the principles I&#8217;m talking about is that the value of a closed expression (one not containing free variables) depends solely on the expression itself &#8212; not influenced by the dynamic conditions under which it is executed.
I relate to this principle as the soul of functional programming and of referential transparency in particular.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2009-10-26: Minor typo fix</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-86"></span></p>

<p>Recently I encountered two facts about standard Haskell libraries that I have trouble reconciling with this principle.</p>

<ul>
<li>The meaning of <code>Int</code> operations in overflow situations is machine-dependent.  Typically they use 32 bits when running on a 32-bit machine and 64 bits when running on a 64-bit machine.  Implementations are free to use as few as 29 bits.  Thus the value of the expression &#8220;<code>2^32 == (0 ::Int)</code>&#8221; may be either <code>False</code> or <code>True</code>, depending on the dynamic conditions under which it is evaluated.</li>
<li>The expression &#8220;<code>System.Info.os</code>&#8221; has type <code>String</code>, although its value as a sequence of characters depends on the circumstances of its execution.  (Similarly for the other exports from <a href="http://haskell.org/ghc/docs/latest/html/libraries/base/System-Info.html"><code>System.Info</code></a>.  Hm.  I just noticed that the module is labeled as &#8220;portable&#8221;.  Typo?  Joke?)</li>
</ul>
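<p>A small probe of both machine-dependent values (my own sketch, not part of the original post; its output varies by platform, which is exactly the point):</p>

```haskell
-- Sketch: probing two machine-dependent "pure" values.
-- No particular output is claimed; it differs across machines.
import Data.Bits (finiteBitSize)
import qualified System.Info

main :: IO ()
main = do
  -- The Haskell report only guarantees a small minimum range for Int;
  -- GHC typically uses the native word size.
  print (finiteBitSize (0 :: Int))  -- e.g. 64 on a 64-bit machine
  print ((2 ^ 32 :: Int) == 0)      -- False on 64-bit, True on 32-bit GHC
  print System.Info.os              -- e.g. "linux" or "darwin"
```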

<p>Although I&#8217;ve been programming primarily in Haskell since around 1995, I didn&#8217;t realize that these implementation-dependent meanings were there.
As in many romantic relationships, I suppose I&#8217;ve been seeing Haskell not as she is, but as I idealized her to be.</p>

<p>There&#8217;s another principle that is closely related to the one above and even more fundamental to me: every type has a precise, specific, and preferably simple denotation.
If an expression <code>e</code> has type <code>T</code>, then the meaning (value) of <code>e</code> is a member of the collection denoted by <code>T</code>.
For instance, I think of the meaning of the type <code>String</code>, i.e., of <code>[Char]</code>, as being sequences of characters.
Well, not quite that simple, because it also contains some partially defined sequences and has a partial information ordering (non-flat in this case).
Given this second principle, if <code>os :: String</code>, then the meaning of <code>os</code> is some sequence of characters.
Assuming the sequence is finite and non-partial, it can be written down as a literal string, and that literal can be substituted for every occurrence of &#8220;<code>os</code>&#8221; in a program, without changing the program&#8217;s meaning.
However, <code>os</code> evaluates to &#8220;linux&#8221; on my machine and evaluates to &#8220;darwin&#8221; on my friend Bob&#8217;s machine, so substituting <em>any</em> literal string for &#8220;<code>os</code>&#8221; would change the meaning, as observable on at least one of these machines.</p>

<p>Now I realize I&#8217;m really talking about standard Haskell <em>libraries</em>, not Haskell itself.
When I <a href="http://tunes.org/~nef/logs/haskell/09.03.29">discussed my confusion &amp; dismay in the #haskell chat room</a>, someone suggested explaining these semantic differences in terms of different libraries and hence different programs (if one takes programs to include the libraries they use).
One would not expect different programs (due to different libraries) to have the same meaning.</p>

<p>I understand this different-library perspective &#8212; in a literal way.
And yet I&#8217;m not really satisfied.
What I get is that standard libraries are &#8220;standard&#8221; in signature (form), not in meaning (substance).
With no promises about semantic commonality, I don&#8217;t know how standard libraries can be useful.</p>

<p>Another perspective that came up on #haskell was that the kind of semantic consistency I&#8217;m looking for is <em>impossible</em>, because of possibilities of failure.
For instance, evaluating an expression might one time fail due to memory exhaustion, while succeeding (perhaps just barely) on another attempt.
After mulling over that point, I&#8217;d like to weaken my principle a little.
Instead of asking that all evaluations of an expression yield the <em>same</em> value, I ask that all evaluations of an expression yield <em>consistent</em> answers.
By &#8220;consistent&#8221; I mean in the sense of information content.
<em>Answers don&#8217;t have to agree, but they must not disagree.</em>
Failures like exhausted memory are modeled as ⊥, which is called &#8220;bottom&#8221; because it is the bottom of the information partial ordering.
It contains no information and so is consistent with every value, disagreeing with no value.
More precisely, values are <em>consistent</em> when they have a shared upper (information) bound, and <em>inconsistent</em> when they don&#8217;t.
The value ⊥ means <em>i-don&#8217;t-know</em>, and the value <code>(1,⊥,3)</code> means (1, <em>i-don&#8217;t-know</em>, 3).
The consistent-value principle accepts possible failures due to finite resources and hardware failure, while rejecting &#8220;linux&#8221; vs &#8220;darwin&#8221; for <code>System.Info.os</code> or <code>False</code> vs <code>True</code> for &#8220;<code>2^32 == (0 ::Int)</code>&#8221;.
It also accepts <code>System.Info.os :: IO String</code>, which is the type I would have expected, because the semantics of <code>IO String</code> is big enough to accommodate dependence on dynamic conditions.</p>
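<p>To make the ⊥-consistency point concrete, here is a small sketch (mine, not part of the original post): under non-strict evaluation, a tuple with a ⊥ component still yields its defined components, so <code>(1,⊥,3)</code> genuinely carries the information (1, <em>i-don&#8217;t-know</em>, 3).</p>

```haskell
-- Sketch: a bottom (⊥) component carries no information, but it does
-- not poison the defined components, thanks to non-strict evaluation.
main :: IO ()
main = do
  let (a, _, c) = (1 :: Int, undefined :: Int, 3 :: Int)
  print (a + c)  -- prints 4; the ⊥ in the middle is never demanded
```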

<p>If you also cherish the principles I mention above, I&#8217;d love to hear from you.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=86&amp;md5=9ff66b6506b3b491599ed696fdd04e2d"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/notions-of-purity-in-haskell/feed</wfw:commentRss>
		<slash:comments>55</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fnotions-of-purity-in-haskell&amp;language=en_GB&amp;category=text&amp;title=Notions+of+purity+in+Haskell&amp;description=Lately+I%26%238217%3Bve+been+learning+that+some+programming+principles+I+treasure+are+not+widely+shared+among+my+Haskell+comrades.+Or+at+least+not+widely+among+those+I%26%238217%3Bve+been+hearing+from.+I...&amp;tags=purity%2Creferential+transparency%2Csemantics%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Denotational design with type class morphisms</title>
		<link>http://conal.net/blog/posts/denotational-design-with-type-class-morphisms</link>
		<comments>http://conal.net/blog/posts/denotational-design-with-type-class-morphisms#comments</comments>
		<pubDate>Thu, 19 Feb 2009 02:34:08 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[arrow]]></category>
		<category><![CDATA[associated type]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[monad]]></category>
		<category><![CDATA[monoid]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[type class morphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=84</guid>
		<description><![CDATA[I&#8217;ve just finished a draft of a paper called Denotational design with type class morphisms, for submission to ICFP 2009. The paper is on a theme I&#8217;ve explored in several posts, which is semantics-based design, guided by type class morphisms. I&#8217;d love to get some readings and feedback. Pointers to related work would be particularly [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Denotational design with type class morphisms

Tags: paper, semantics, type class morphism, monoid, functor, applicative functor, monad, arrow, associated type, trie

URL: http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/

-->

<!-- references -->

<!-- teaser -->

<p>I&#8217;ve just finished a draft of a paper called <em><a href="http://conal.net/papers/type-class-morphisms" title="paper">Denotational design with type class morphisms</a></em>, for submission to <a href="http://www.cs.nott.ac.uk/~gmh/icfp09.html" title="conference page">ICFP 2009</a>.
The paper is on a theme I&#8217;ve explored in <a href="http://conal.net/blog/tag/type-class-morphism/">several posts</a>, which is semantics-based design, guided by type class morphisms.</p>

<p>I&#8217;d love to get some readings and feedback.
Pointers to related work would be particularly appreciated, as well as what&#8217;s unclear and what could be cut.
It&#8217;s an entire page over the limit, so I&#8217;ll have to do some trimming before submitting.</p>

<p>The abstract:</p>

<blockquote>
  <p>Type classes provide a mechanism for varied implementations of standard
  interfaces. Many of these interfaces are founded in mathematical
  tradition and so have regularity not only of <em>types</em> but also of
  <em>properties</em> (laws) that must hold. Types and properties give strong
  guidance to the library implementor, while leaving freedom as well. Some
  of the remaining freedom is in <em>how</em> the implementation works, and some
  is in <em>what</em> it accomplishes.</p>
  
  <p>To give additional guidance to the <em>what</em>, without impinging on the
  <em>how</em>, this paper proposes a principle of <em>type class morphisms</em> (TCMs),
  which further refines the compositional style of denotational
  semantics. The TCM idea is simply that <em>the instance&#8217;s meaning is the
  meaning&#8217;s instance</em>. This principle determines the meaning of each type
  class instance, and hence defines correctness of implementation. In some
  cases, it also provides a systematic guide to implementation, and in
  some cases, valuable design feedback.</p>
  
  <p>The paper is illustrated with several examples of types, meanings, and
  morphisms.</p>
</blockquote>

<p>You can <a href="http://conal.net/papers/type-class-morphisms" title="paper">get the paper and see current errata here</a>.</p>

<p>The submission deadline is March 2, so comments before then are most helpful to me.</p>

<p>Enjoy, and thanks!</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=84&amp;md5=8ce3b83d01ccfad97ade1469b72d2a04"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/feed</wfw:commentRss>
		<slash:comments>8</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fdenotational-design-with-type-class-morphisms&amp;language=en_GB&amp;category=text&amp;title=Denotational+design+with+type+class+morphisms&amp;description=I%26%238217%3Bve+just+finished+a+draft+of+a+paper+called+Denotational+design+with+type+class+morphisms%2C+for+submission+to+ICFP+2009.+The+paper+is+on+a+theme+I%26%238217%3Bve+explored+in+several...&amp;tags=applicative+functor%2Carrow%2Cassociated+type%2Cfunctor%2Cmonad%2Cmonoid%2Cpaper%2Csemantics%2Ctrie%2Ctype+class+morphism%2Cblog" type="text/html" />
	</item>
		<item>
		<title>What is automatic differentiation, and why does it work?</title>
		<link>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work</link>
		<comments>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work#comments</comments>
		<pubDate>Wed, 28 Jan 2009 20:09:42 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[type class morphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=79</guid>
		<description><![CDATA[Bertrand Russell remarked that Everything is vague to a degree you do not realize till you have tried to make it precise. I&#8217;m mulling over automatic differentiation (AD) again, neatening up previous posts on derivatives and on linear maps, working them into a coherent whole for an ICFP submission. I understand the mechanics and some [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p
>Bertrand Russell remarked that</p
>

<blockquote
><p
  ><em
    >Everything is vague to a degree you do not realize till you have tried to make it precise.</em
    ></p
  ></blockquote
>

<p
>I&#8217;m mulling over automatic differentiation (AD) again, neatening up previous posts on <a href="http://conal.net/blog/tag/derivative/" title="posts on derivatives"
  >derivatives</a
  > and on <a href="http://conal.net/blog/tag/linear-map/" title="posts on linear maps"
  >linear maps</a
  >, working them into a coherent whole for an ICFP submission. I understand the mechanics and some of the reasons for its correctness. After all, it&#8217;s &quot;just the chain rule&quot;.</p
>

<p
>As usual, in the process of writing, I bumped up against Russell&#8217;s principle. I felt a growing uneasiness and realized that I didn&#8217;t understand AD in the way I like to understand software, namely,</p
>

<ul
><li
  ><em
    >What</em
    > does it mean, independently of implementation?</li
  ><li
  ><em
    >How</em
    > do the implementation and its correctness flow gracefully from that meaning?</li
  ><li
  ><em
    >Where</em
    > else might we go, guided by answers to the first two questions?</li
  ></ul
>

<p
>Ever since writing <em
  ><a href="http://conal.net/papers/simply-reactive" title="paper"
    >Simply efficient functional reactivity</a
    ></em
  >, the idea of <a href="http://conal.net/blog/tag/type-class-morphism/" title="posts on type class morphisms"
  >type class morphisms</a
  > keeps popping up for me as a framework in which to ask and answer these questions. To my delight, this framework gives me new and more satisfying insight into automatic differentiation.</p
>

<p><span id="more-79"></span></p>

<div id="whats-a-derivative"
><h3
  >What&#8217;s a derivative?</h3
  ><p
  >My first guess is that AD has something to do with derivatives, which then raises the question of what is a derivative. For now, I&#8217;m going to substitute a popular but problematic answer to that question and say that</p
  ><pre class="sourceCode haskell"
  ><code
    >deriv <span class="dv"
      >&#8759;</span
      > &#8943; &#8658; (a &#8594; b) &#8594; (a &#8594; b) <span class="co"
      >--  simplification</span
      ><br
       /></code
    ></pre
  ><p
  >As discussed in <em
    ><a href="http://conal.net/blog/posts/what-is-a-derivative-really/" title="blog post"
      >What is a derivative, really?</a
      ></em
    >, the popular answer has limited usefulness, applying just to scalar (one-dimensional) domain. The real deal involves distinguishing the type <code
    >b</code
    > from the type <code
    >a :-* b</code
    > of <a href="http://conal.net/blog/tag/linear-map/" title="posts on linear maps"
    >linear maps</a
    > from <code
    >a</code
    > to <code
    >b</code
    >.</p
  ><pre class="sourceCode haskell"
  ><code
    >deriv <span class="dv"
      >&#8759;</span
      > (<span class="dt"
      >VectorSpace</span
      > u, <span class="dt"
      >VectorSpace</span
      > v) &#8658; (u &#8594; v) &#8594; (u &#8594; (u <span class="fu"
      >:-*</span
      > v))<br
       /></code
    ></pre
  ><div id="why-care-about-derivatives"
  ><h4
    >Why care about derivatives?</h4
    ><p
    >Derivatives are useful in a variety of application areas, including root-finding, optimization, curve and surface tessellation, and computation of surface normals for 3D rendering. Considering the usefulness of derivatives, it is worthwhile to find software methods that are</p
    ><ul
    ><li
      >simple (to implement and verify),</li
      ><li
      >convenient,</li
      ><li
      >accurate,</li
      ><li
      >efficient, and</li
      ><li
      >general.</li
      ></ul
    ></div
  ></div
>

<div id="what-isnt-ad"
><h3
  >What <em
    >isn't</em
    > AD?</h3
  ><div id="numeric-approximation"
  ><h4
    >Numeric approximation</h4
    ><p
    >One differentiation method is <em
      >numeric approximation</em
      >, using simple finite differences. This method is based on the definition of (scalar) derivative:</p
    ><div class=math-inset>
<p
    ><span class="math"
      ><em
    >d</em
    ><em
    >e</em
    ><em
    >r</em
    ><em
    >i</em
    ><em
    >v</em
    > <em
    >f</em
    > <em
    >x</em
    > ≡ lim<sub
    ><em
      >h</em
      > → 0</sub
    >(<em
    >f</em
    > (<em
    >x</em
    > + <em
    >h</em
    >) - <em
    >f</em
    > <em
    >x</em
    >) / <em
    >h</em
    ></span
      ></p
    ></div>
<p
    >The left-hand side reads &quot;the derivative of <em
      >f</em
      > at <em
      >x</em
      >&quot;.</p
    ><p
    >To approximate the derivative, use</p
    ><div class=math-inset>
<p
    ><span class="math"
      ><em
    >d</em
    ><em
    >e</em
    ><em
    >r</em
    ><em
    >i</em
    ><em
    >v</em
    > <em
    >f</em
    > <em
    >x</em
    > ≈ (<em
    >f</em
    > (<em
    >x</em
    > + <em
    >h</em
    >) - <em
    >f</em
    > <em
    >x</em
    >) / <em
    >h</em
    ></span
      ></p
    ></div>
<p
    >for a small value of <em
      >h</em
      >. While very simple, this method is often inaccurate, due to choosing either too large or too small a value for <em
      >h</em
      >. (Small values of <em
      >h</em
      > lead to rounding errors.) More sophisticated variations improve accuracy while sacrificing simplicity.</p
    ></div
  ><div id="symbolic-differentiation"
  ><h4
    >Symbolic differentiation</h4
    ><p
    >A second method is <em
      >symbolic differentiation</em
      >. Instead of using the definition of <em
      >deriv</em
      > directly, the symbolic method uses a collection of rules, such as those below:</p
    ><pre class="sourceCode haskell"
    ><code
      >deriv (u <span class="fu"
    >+</span
    > v)   &#8801; deriv u <span class="fu"
    >+</span
    > deriv v<br
     />deriv (u <span class="fu"
    >*</span
    > v)   &#8801; deriv v <span class="fu"
    >*</span
    > u <span class="fu"
    >+</span
    > deriv u <span class="fu"
    >*</span
    > v<br
     />deriv (<span class="fu"
    >-</span
    > u)     &#8801; <span class="fu"
    >-</span
    > deriv u<br
     />deriv (<span class="fu"
    >exp</span
    > u)   &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >exp</span
    > u<br
     />deriv (<span class="fu"
    >log</span
    > u)   &#8801; deriv u <span class="fu"
    >/</span
    > u<br
     />deriv (<span class="fu"
    >sqrt</span
    > u)  &#8801; deriv u <span class="fu"
    >/</span
    > (<span class="dv"
    >2</span
    > <span class="fu"
    >*</span
    > <span class="fu"
    >sqrt</span
    > u)<br
     />deriv (<span class="fu"
    >sin</span
    > u)   &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u<br
     />deriv (<span class="fu"
    >cos</span
    > u)   &#8801; deriv u <span class="fu"
    >*</span
    > (<span class="fu"
    >-</span
    > <span class="fu"
    >sin</span
    > u)<br
     />deriv (<span class="fu"
    >asin</span
    > u)  &#8801; deriv u<span class="fu"
    >/</span
    >(<span class="fu"
    >sqrt</span
    > (<span class="dv"
    >1</span
    > <span class="fu"
    >-</span
    > u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    >))<br
     />deriv (<span class="fu"
    >acos</span
    > u)  &#8801; <span class="fu"
    >-</span
    > deriv u<span class="fu"
    >/</span
    >(<span class="fu"
    >sqrt</span
    > (<span class="dv"
    >1</span
    > <span class="fu"
    >-</span
    > u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    >))<br
     />deriv (<span class="fu"
    >atan</span
    > u)  &#8801; deriv u <span class="fu"
    >/</span
    > (u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    > <span class="fu"
    >+</span
    > <span class="dv"
    >1</span
    >)<br
     />deriv (<span class="fu"
    >sinh</span
    > u)  &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cosh</span
    > u<br
     />deriv (<span class="fu"
    >cosh</span
    > u)  &#8801; deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >sinh</span
    > u<br
     />deriv (<span class="fu"
    >asinh</span
    > u) &#8801; deriv u <span class="fu"
    >/</span
    > (<span class="fu"
    >sqrt</span
    > (u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    > <span class="fu"
    >+</span
    > <span class="dv"
    >1</span
    >))<br
     />deriv (<span class="fu"
    >acosh</span
    > u) &#8801; <span class="fu"
    >-</span
    > deriv u <span class="fu"
    >/</span
    > (<span class="fu"
    >sqrt</span
    > (u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    > <span class="fu"
    >-</span
    > <span class="dv"
    >1</span
    >))<br
     />deriv (<span class="fu"
    >atanh</span
    > u) &#8801; deriv u <span class="fu"
    >/</span
    > (<span class="dv"
    >1</span
    > <span class="fu"
    >-</span
    > u<span class="fu"
    >^</span
    ><span class="dv"
    >2</span
    >)<br
     /></code
      ></pre
    ><p
    >There are two main drawbacks to the symbolic approach to differentiation.</p
    ><ul
    ><li
      >As a symbolic method, it requires access to and transformation of source code, and placing restrictions on that source code.</li
      ><li
      >Implementations tend to be quite expensive and in particular perform redundant computation. (I wonder if this latter criticism is a straw man argument. Are symbolic methods <em
    >necessarily</em
    > expensive or just when implemented naïvely? For instance, can simply memoized symbolic differentiation be nearly as cheap as AD?)</li
      ></ul
    ></div
  ></div
>

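<p>The finite-difference method just described is easy to sketch and to check against the exact derivative. The following is my own illustration (not from the post), using the same <code>f1 z = sqrt (3 * sin z)</code> example that appears later; the central difference is a common refinement that improves on the forward difference&#8217;s accuracy.</p>

```haskell
-- Sketch: forward and central finite differences, compared with the
-- exact derivative of f1 x = sqrt (3 * sin x), namely
-- 3 * cos x / (2 * sqrt (3 * sin x)).
forwardDiff, centralDiff :: Double -> (Double -> Double) -> Double -> Double
forwardDiff h f x = (f (x + h) - f x) / h
centralDiff h f x = (f (x + h) - f (x - h)) / (2 * h)

f1 :: Double -> Double
f1 z = sqrt (3 * sin z)

exactDeriv :: Double -> Double
exactDeriv x = 3 * cos x / (2 * sqrt (3 * sin x))

main :: IO ()
main = do
  print (forwardDiff 1e-5 f1 2)  -- close to the exact value
  print (centralDiff 1e-5 f1 2)  -- closer still
  print (exactDeriv 2)           -- about -0.3779, matching the AD run later
```

Choosing <code>h</code> is the delicate part: too large and the truncation error dominates; too small and floating-point rounding does, as the post notes.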
<div id="what-is-ad-and-how-does-it-work"
><h3
  >What is AD and how does it work?</h3
  ><p
  >A third method is the topic of this post, namely <em
    >automatic differentiation</em
    > (also called &quot;algorithmic differentiation&quot;), or &quot;AD&quot;. The idea of AD is to simultaneously manipulate values and derivatives. Overloading of the standard numerical operations (and literals) makes this combined manipulation as convenient and elegant as manipulating values without derivatives.</p
  ><p
  >The implementation of AD can be quite simple, as shown below:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > <span class="dt"
      >D</span
      > a <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > a a <span class="kw"
      >deriving</span
      > (<span class="kw"
      >Eq</span
      >,<span class="kw"
      >Show</span
      >)<br
       /><br
       /><span class="kw"
      >instance</span
      > <span class="kw"
      >Num</span
      > a &#8658; <span class="kw"
      >Num</span
      > (<span class="dt"
      >D</span
      > a) <span class="kw"
      >where</span
      ><br
       />  <span class="dt"
      >D</span
      > x x' <span class="fu"
      >+</span
      > <span class="dt"
      >D</span
      > y y' <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (x<span class="fu"
      >+</span
      >y) (x'<span class="fu"
      >+</span
      >y')<br
       />  <span class="dt"
      >D</span
      > x x' <span class="fu"
      >*</span
      > <span class="dt"
      >D</span
      > y y' <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (x<span class="fu"
      >*</span
      >y) (y'<span class="fu"
      >*</span
      >x <span class="fu"
      >+</span
      > x'<span class="fu"
      >*</span
      >y)<br
       />  <span class="fu"
      >fromInteger</span
      > x   <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >fromInteger</span
      > x) <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >negate</span
      > (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >negate</span
      > x) (<span class="fu"
      >negate</span
      > x')<br
       />  <span class="fu"
      >signum</span
      > (<span class="dt"
      >D</span
      > x _ ) <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >signum</span
      > x) <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >abs</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >abs</span
      > x) (x' <span class="fu"
      >*</span
      > <span class="fu"
      >signum</span
      > x)<br
       /><br
       /><span class="kw"
      >instance</span
      > <span class="kw"
      >Fractional</span
      > x &#8658; <span class="kw"
      >Fractional</span
      > (<span class="dt"
      >D</span
      > x) <span class="kw"
      >where</span
      ><br
       />  <span class="fu"
      >fromRational</span
      > x  <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >fromRational</span
      > x) <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >recip</span
      >  (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >recip</span
      > x) (- x' <span class="fu"
      >/</span
      > sqr x)<br
       /><br
       />sqr <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Num</span
      > a &#8658; a &#8594; a<br
       />sqr x <span class="fu"
      >=</span
      > x <span class="fu"
      >*</span
      > x<br
       /><br
       /><span class="kw"
      >instance</span
      > <span class="kw"
      >Floating</span
      > x &#8658; <span class="kw"
      >Floating</span
      > (<span class="dt"
      >D</span
      > x) <span class="kw"
      >where</span
      ><br
       />  &#960;              <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > &#960; <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >exp</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >exp</span
      >    x) (x' <span class="fu"
      >*</span
      > <span class="fu"
      >exp</span
      > x)<br
       />  <span class="fu"
      >log</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >log</span
      >    x) (x' <span class="fu"
      >/</span
      > x)<br
       />  <span class="fu"
      >sqrt</span
      >   (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >sqrt</span
      >   x) (x' <span class="fu"
      >/</span
      > (<span class="dv"
      >2</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sqrt</span
      > x))<br
       />  <span class="fu"
      >sin</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >sin</span
      >    x) (x' <span class="fu"
      >*</span
      > <span class="fu"
      >cos</span
      > x)<br
       />  <span class="fu"
      >cos</span
      >    (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >cos</span
      >    x) (x' <span class="fu"
      >*</span
      > (<span class="fu"
      >-</span
      > <span class="fu"
      >sin</span
      > x))<br
       />  <span class="fu"
      >asin</span
      >   (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >asin</span
      >   x) (x' <span class="fu"
      >/</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      > <span class="fu"
      >-</span
      > sqr x))<br
       />  <span class="fu"
      >acos</span
      >   (<span class="dt"
      >D</span
      > x x') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (<span class="fu"
      >acos</span
      >   x) (x' <span class="fu"
      >/</span
      > (<span class="fu"
      >-</span
      >  <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      > <span class="fu"
      >-</span
      > sqr x)))<br
       />  <span class="co"
      >-- &#8943;</span
      ><br
       /></code
    ></pre
  ><p
  >As an example, define</p
  ><pre class="sourceCode haskell"
  ><code
    >f1 <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Floating</span
      > a &#8658; a &#8594; a<br
       />f1 z <span class="fu"
      >=</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >3</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sin</span
      > z)<br
       /></code
    ></pre
  ><p
  >and try it out in GHCi:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >*</span
      ><span class="dt"
      >Main</span
      ><span class="fu"
      >&gt;</span
      > f1 (<span class="dt"
      >D</span
      > <span class="dv"
      >2</span
      > <span class="dv"
      >1</span
      >)<br
       /><span class="dt"
      >D</span
      > <span class="dv"
      >1</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >6516332160855343</span
      > (<span class="fu"
      >-</span
      ><span class="dv"
      >0</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >3779412091869595</span
      >)<br
       /></code
    ></pre
  ><p
  >To test correctness, here is a symbolically differentiated version:</p
  ><pre class="sourceCode haskell"
  ><code
    >f2 <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Floating</span
      > a &#8658; a &#8594; <span class="dt"
      >D</span
      > a<br
       />f2 x <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (f1 x) (<span class="dv"
      >3</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >cos</span
      > x <span class="fu"
      >/</span
      > (<span class="dv"
      >2</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >3</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sin</span
      > x)))<br
       /></code
    ></pre
  ><p
  >Try it out:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >*</span
      ><span class="dt"
      >Main</span
      ><span class="fu"
      >&gt;</span
      > f2 <span class="dv"
      >2</span
      ><br
       /><span class="dt"
      >D</span
      > <span class="dv"
      >1</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >6516332160855343</span
      > (<span class="fu"
      >-</span
      ><span class="dv"
      >0</span
      ><span class="fu"
      >.</span
      ><span class="dv"
      >3779412091869595</span
      >)<br
       /></code
    ></pre
  ><p
>The code can also be made prettier, as in <em
    ><a href="http://conal.net/blog/posts/beautiful-differentiation/" title="blog post"
      >Beautiful differentiation</a
      ></em
    >. Add an operator that captures the chain rule, which is behind the differentiation laws listed above.</p
  ><pre class="sourceCode haskell"
  ><code
    >infix  <span class="dv"
      >0</span
      > <span class="fu"
      >&gt;-&lt;</span
      ><br
       />(<span class="fu"
      >&gt;-&lt;</span
      >) <span class="dv"
      >&#8759;</span
      > <span class="kw"
      >Num</span
      > a &#8658; (a &#8594; a) &#8594; (a &#8594; a) &#8594; (<span class="dt"
      >D</span
      > a &#8594; <span class="dt"
      >D</span
      > a)<br
       />(f <span class="fu"
      >&gt;-&lt;</span
      > f') (<span class="dt"
      >D</span
      > a a') <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (f a) (a' <span class="fu"
      >*</span
      > f' a)<br
       /></code
    ></pre
  ><p
  >Then, e.g.,</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >instance</span
      > <span class="kw"
      >Floating</span
      > a &#8658; <span class="kw"
      >Floating</span
      > (<span class="dt"
      >D</span
      > a) <span class="kw"
      >where</span
      ><br
       />  &#960;   <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > &#960; <span class="dv"
      >0</span
      ><br
       />  <span class="fu"
      >exp</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >exp</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >exp</span
      ><br
       />  <span class="fu"
      >log</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >log</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      ><br
       />  <span class="fu"
      >sqrt</span
      > <span class="fu"
      >=</span
      > <span class="fu"
      >sqrt</span
      > <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      > (<span class="dv"
      >2</span
      > <span class="fu"
      >*</span
      > <span class="fu"
      >sqrt</span
      >)<br
       />  <span class="fu"
      >sin</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >sin</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >cos</span
      ><br
       />  <span class="fu"
      >cos</span
      >  <span class="fu"
      >=</span
      > <span class="fu"
      >cos</span
      >  <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >-</span
      > <span class="fu"
      >sin</span
      ><br
       />  <span class="fu"
      >asin</span
      > <span class="fu"
      >=</span
      > <span class="fu"
      >asin</span
      > <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      > (<span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      ><span class="fu"
      >-</span
      >sqr))<br
       />  <span class="fu"
      >acos</span
      > <span class="fu"
      >=</span
      > <span class="fu"
      >acos</span
      > <span class="fu"
      >&gt;-&lt;</span
      > <span class="fu"
      >recip</span
      > (<span class="fu"
      >-</span
      > <span class="fu"
      >sqrt</span
      > (<span class="dv"
      >1</span
      ><span class="fu"
      >-</span
      >sqr))<br
       />  <span class="co"
      >-- &#8943;</span
      ><br
       /></code
    ></pre
  ><p
>This AD implementation satisfies most of our criteria very well:</p
  ><ul
  ><li
    >It is simple to implement and verify. Both the implementation and its correctness follow directly from the familiar laws given above.</li
    ><li
    >It is convenient to use, as shown with <code
      >f1</code
      > above.</li
    ><li
    >It is accurate, as shown above, producing <em
      >exactly</em
> the same result as the symbolically differentiated code (<code
      >f2</code
      >).</li
    ><li
    >It is efficient, involving no iteration or redundant computation.</li
    ></ul
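><p
>To make the pieces concrete, here is a minimal consolidated <code
>Num</code
> instance, my own sketch, assembled only from definitions appearing in this post:</p
><pre class="sourceCode haskell"
><code
>data D a = D a a deriving Show

instance Num a &#8658; Num (D a) where
  fromInteger n   = D (fromInteger n) 0
  D a a' + D b b' = D (a + b) (a' + b')
  D a a' * D b b' = D (a * b) (a' * b + b' * a)
  negate (D a a') = D (negate a) (negate a')
  -- abs and signum omitted in this sketch
</code
></pre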
  ><p
  >The formulation above does less well with <em
    >generality</em
    >:</p
  ><ul
  ><li
    >It computes only first derivatives.</li
    ><li
    >It applies (correctly) only to functions over a scalar (one-dimensional) domain, excluding even complex numbers.</li
    ></ul
  ><p
  >Both of these limitations are removed in the post <em
    ><a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="blog post"
      >Higher-dimensional, higher-order derivatives, functionally</a
      ></em
    >.</p
  ></div
>

<div id="what-is-ad-really"
><h3
  >What is AD, really?</h3
  ><p
  >How do we know whether this AD implementation is correct? We can't begin to address this question until we first answer a more fundamental one: what does its correctness mean?</p
  ><div id="a-model-for-ad"
  ><h4
    >A model for AD</h4
    ><p
    >I'm pretty sure AD has something to do with calculating a function's values and derivative values simultaneously, so I'll start there.</p
    ><pre class="sourceCode haskell"
    ><code
      >withD <span class="dv"
    >&#8759;</span
    > &#8943; &#8658; (a &#8594; a) &#8594; (a &#8594; <span class="dt"
    >D</span
    > a)<br
     />withD f x <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (f x) (deriv f x)<br
     /></code
      ></pre
    ><p
    >Or, in point-free form,</p
    ><pre class="sourceCode haskell"
    ><code
      >withD f <span class="fu"
    >=</span
    > liftA2 <span class="dt"
    >D</span
    > f (deriv f)<br
     /></code
      ></pre
    ><p
>This point-free form works because, on functions,</p
    ><pre class="sourceCode haskell"
    ><code
      >liftA2 h f g <span class="fu"
    >=</span
    > &#955; x &#8594; h (f x) (g x)<br
     /></code
      ></pre
    ><p
    >We don't have an implementation of <code
      >deriv</code
      >, so this definition of <code
      >withD</code
      > will serve as a specification, not an implementation.</p
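><p
>Although <code
>deriv</code
> has no implementation, a crude numeric stand-in can still help spot-check instances against this specification. The following helper is hypothetical (mine, not from the post), using a central finite difference with step <code
>h</code
>:</p
><pre class="sourceCode haskell"
><code
>-- Hypothetical: approximate deriv, for sanity checks only.
derivApprox &#8759; Fractional a &#8658; a &#8594; (a &#8594; a) &#8594; (a &#8594; a)
derivApprox h f x = (f (x + h) - f (x - h)) / (2 * h)
</code
></pre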
    ><p
    >If AD is structured as type class instances, then I'd want there to be a compelling interpretation function that is faithful to each of those classes, as in the principle of <a href="http://conal.net/blog/tag/type-class-morphism/" title="posts on type class morphisms"
      >type class morphisms</a
      >, which is to say that the interpretation of each method corresponds to the same method for the interpretation.</p
    ><p
    >For AD, the interpretation function is <code
      >withD</code
      >. It's turned around this time (mapping <em
      >to</em
      > instead of <em
      >from</em
      > our type), as is sometimes the case. The <code
      >Num</code
      >, <code
      >Fractional</code
      >, and <code
      >Floating</code
      > morphisms provide the specifications of the instances:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >+</span
    > v) &#8801; withD u <span class="fu"
    >+</span
    > withD v<br
     />withD (u <span class="fu"
    >*</span
    > v) &#8801; withD u <span class="fu"
    >*</span
    > withD v<br
     />withD (<span class="fu"
    >sin</span
    > u) &#8801; <span class="fu"
    >sin</span
    > (withD u)<br
     />&#8943;<br
     /></code
      ></pre
    ><p
    >Note here that the methods on the left are on <code
      >a &#8594; a</code
      >, and on the right are on <code
      >a &#8594; D a</code
      >.</p
    ><p
    >These (morphism) properties exactly define correctness of any implementation of AD, answering my first question:</p
    ><blockquote
    ><p
      ><em
    >What</em
    > does it mean, independently of implementation?</p
      ></blockquote
    ></div
  ></div
>

<div id="deriving-an-ad-implementation"
><h3
  >Deriving an AD implementation</h3
  ><p
  >Now that we have a simple, formal specification of AD (numeric type class morphisms), we can try to prove that the implementation above satisfies the specification. Better yet, let's do the reverse, and use the morphism properties to <em
    >discover</em
    > the implementation, and prove it correct in the process.</p
  ><div id="addition"
  ><h4
    >Addition</h4
    ><p
    >Here is the addition specification:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >+</span
    > v) &#8801; withD u <span class="fu"
    >+</span
    > withD v<br
     /></code
      ></pre
    ><p
    >Start with the left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >+</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (deriv (u <span class="fu"
    >+</span
    > v))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (deriv u <span class="fu"
    >+</span
    > deriv v)<br
     />&#8801;   <span class="co"
    >{- liftA2 on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > ((u <span class="fu"
    >+</span
    > v) x) ((deriv u <span class="fu"
    >+</span
    > deriv v) x)<br
     />&#8801;   <span class="co"
    >{- (+) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (deriv u x <span class="fu"
    >+</span
    > deriv v x)<br
     /></code
      ></pre
    ><p
    >Then start over with the right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >+</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- (+) on functions -}</span
    ><br
     />   &#955; x &#8594; withD u x <span class="fu"
    >+</span
    > withD v x<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (deriv u x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > (v x) (deriv v x)<br
     /></code
      ></pre
    ><p
    >We need a definition of <code
      >(+)</code
      > on <code
      >D</code
      > that makes these two final forms equal, i.e.,</p
    ><pre class="sourceCode haskell"
    ><code
      >   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (deriv u x <span class="fu"
    >+</span
    > deriv v x)<br
     />&#8801;<br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (deriv u x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > (v x) (deriv v x)<br
     /></code
      ></pre
    ><p
    >An easy choice is</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="dt"
    >D</span
    > a a' <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > b b' <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a <span class="fu"
    >+</span
    > b) (a' <span class="fu"
    >+</span
    > b')<br
     /></code
      ></pre
    ><p
>This definition provides the missing link, completing the proof that</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >+</span
    > v) &#8801; withD u <span class="fu"
    >+</span
    > withD v<br
     /></code
      ></pre
    ></div
  ><div id="multiplication"
  ><h4
    >Multiplication</h4
    ><p
    >The specification:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (u <span class="fu"
    >*</span
    > v) &#8801; withD u <span class="fu"
    >*</span
    > withD v<br
     /></code
      ></pre
    ><p
    >Reason similarly to the addition case. Begin with the left hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >*</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (deriv (u <span class="fu"
    >*</span
    > v))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (deriv u <span class="fu"
    >*</span
    > v <span class="fu"
    >+</span
    > deriv v <span class="fu"
    >*</span
    > u)<br
     />&#8801;   <span class="co"
    >{- liftA2 on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > ((u <span class="fu"
    >*</span
    > v) x) ((deriv u <span class="fu"
    >*</span
    > v <span class="fu"
    >+</span
    > deriv v <span class="fu"
    >*</span
    > u) x)<br
     />&#8801;   <span class="co"
    >{- (*) and (+) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >*</span
    > v x) (deriv u x <span class="fu"
    >*</span
    > v x <span class="fu"
    >+</span
    > deriv v x <span class="fu"
    >*</span
    > u x)<br
     /></code
      ></pre
    ><p
    >Then start over with the right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >*</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- (*) on functions -}</span
    ><br
     />   &#955; x &#8594; withD u x <span class="fu"
    >*</span
    > withD v x<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (deriv u x) <span class="fu"
    >*</span
    > <span class="dt"
    >D</span
    > (v x) (deriv v x)<br
     /></code
      ></pre
    ><p
    >Sufficient definition:</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="dt"
    >D</span
    > a a' <span class="fu"
    >*</span
    > <span class="dt"
    >D</span
    > b b' <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a <span class="fu"
>*</span
    > b) (a' <span class="fu"
    >*</span
    > b <span class="fu"
    >+</span
    > b' <span class="fu"
    >*</span
    > a)<br
     /></code
      ></pre
    ></div
  ><div id="sine"
  ><h4
    >Sine</h4
    ><p
    >Specification:</p
    ><pre class="sourceCode haskell"
    ><code
      >withD (<span class="fu"
    >sin</span
    > u) &#8801; <span class="fu"
    >sin</span
    > (withD u)<br
     /></code
      ></pre
    ><p
    >Begin with the left hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (<span class="fu"
    >sin</span
    > u)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (deriv (<span class="fu"
    >sin</span
    > u))<br
     />&#8801;   <span class="co"
    >{- deriv rule for sin -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u)<br
     />&#8801;   <span class="co"
    >{- liftA2 on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > ((<span class="fu"
    >sin</span
    > u) x) ((deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u) x)<br
     />&#8801;   <span class="co"
    >{- sin, (*) and cos on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > (u x)) (deriv u x <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > (u x))<br
     /></code
      ></pre
    ><p
    >Then start over with the right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   <span class="fu"
    >sin</span
    > (withD u)<br
     />&#8801;   <span class="co"
    >{- sin on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="fu"
    >sin</span
    > (withD u x)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="fu"
    >sin</span
    > (<span class="dt"
    >D</span
    > (u x) (deriv u x))<br
     /></code
      ></pre
    ><p
    >Sufficient definition:</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="fu"
    >sin</span
    > (<span class="dt"
    >D</span
    > a a') <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > a) (a' <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > a)<br
     /></code
      ></pre
    ><p
    >Or, using the chain rule operator,</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="fu"
    >sin</span
    > <span class="fu"
    >=</span
    > <span class="fu"
    >sin</span
    > <span class="fu"
    >&gt;-&lt;</span
    > <span class="fu"
    >cos</span
    ><br
     /></code
      ></pre
    ><p
    >The whole implementation can be derived in exactly this style, answering my second question:</p
    ><blockquote
    ><p
      ><em
    >How</em
> do the implementation and its correctness flow gracefully from that meaning?</p
      ></blockquote
    ></div
  ></div
>

<div id="higher-order-derivatives"
><h3
  >Higher-order derivatives</h3
  ><p
>Given answers to the first two questions, let's turn to the third:</p
  ><blockquote
  ><p
    ><em
      >Where</em
      > else might we go, guided by answers to the first two questions?</p
    ></blockquote
  ><p
  >Jerzy Karczmarczuk extended the <code
    >D</code
    > representation above to an infinite &quot;lazy tower of derivatives&quot;, in the paper <em
    ><a href="http://citeseer.ist.psu.edu/karczmarczuk98functional.html" title="ICFP '98 paper by Jerzy Karczmarczuk"
      >Functional Differentiation of Computer Programs</a
      ></em
    >.</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > <span class="dt"
      >D</span
      > a <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > a (<span class="dt"
      >D</span
      > a)<br
       /></code
    ></pre
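><p
>For intuition, such a tower can be flattened into the infinite list of successive derivative values. The helper below is my own addition, not from the paper or post:</p
><pre class="sourceCode haskell"
><code
>-- Hypothetical: all derivatives in the tower, outermost first.
derivs &#8759; D a &#8594; [a]
derivs (D a d) = a : derivs d
</code
></pre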
  ><p
  >The <code
    >withD</code
    > function easily adapts to this new <code
    >D</code
    > type:</p
  ><pre class="sourceCode haskell"
  ><code
    >withD <span class="dv"
      >&#8759;</span
      > &#8943; &#8658; (a &#8594; a) &#8594; (a &#8594; <span class="dt"
      >D</span
      > a)<br
       />withD f x <span class="fu"
      >=</span
      > <span class="dt"
      >D</span
      > (f x) (withD (deriv f) x)<br
       /></code
    ></pre
  ><p
  >or</p
  ><pre class="sourceCode haskell"
  ><code
    >withD f <span class="fu"
      >=</span
      > liftA2 <span class="dt"
      >D</span
      > f (withD (deriv f))<br
       /></code
    ></pre
  ><p
  >These definitions were not brilliant insights. I looked for the simplest, type-correct possibility (without using ⊥).</p
  ><p
  >Similarly, I'll try tweaking the previous derivations and see what pops out.</p
  ><div id="addition-1"
  ><h4
    >Addition</h4
    ><p
    >Left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >+</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (withD (deriv (u <span class="fu"
    >+</span
    > v)))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (withD (deriv u <span class="fu"
    >+</span
    > deriv v))<br
     />&#8801;   <span class="co"
    >{- (fixed-point) induction withD and (+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >+</span
    > v) (withD (deriv u) <span class="fu"
    >+</span
    > withD (deriv v))<br
     />&#8801;   <span class="co"
    >{- def of liftA2 and (+) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (withD (deriv u) x <span class="fu"
    >+</span
    > withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >Right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >+</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- (+) on functions -}</span
    ><br
     />   &#955; x &#8594; withD u x <span class="fu"
    >+</span
    > withD v x<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
> (u x) (withD (deriv u) x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
> (v x) (withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >Again, we need a definition of <code
      >(+)</code
      > on <code
      >D</code
      > that makes the LHS and RHS final forms equal, i.e.,</p
    ><pre class="sourceCode haskell"
    ><code
      >   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >+</span
    > v x) (withD (deriv u) x <span class="fu"
    >+</span
> withD (deriv v) x)<br
     />&#8801;<br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (withD (deriv u) x) <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > (v x) (withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >Again, an easy choice is</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="dt"
    >D</span
    > a a' <span class="fu"
    >+</span
    > <span class="dt"
    >D</span
    > b b' <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a <span class="fu"
    >+</span
    > b) (a' <span class="fu"
    >+</span
    > b')<br
     /></code
      ></pre
    ></div
  ><div id="multiplication-1"
  ><h4
    >Multiplication</h4
    ><p
    >Left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (u <span class="fu"
    >*</span
    > v)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv (u <span class="fu"
    >*</span
    > v)))<br
     />&#8801;   <span class="co"
    >{- deriv rule for (*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv u <span class="fu"
    >*</span
    > v <span class="fu"
    >+</span
    > deriv v <span class="fu"
    >*</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/(+) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv u <span class="fu"
    >*</span
    > v) <span class="fu"
    >+</span
    > withD (deriv v <span class="fu"
    >*</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/(*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (u <span class="fu"
    >*</span
    > v) (withD (deriv u) <span class="fu"
    >*</span
    > withD v <span class="fu"
    >+</span
    > withD (deriv v) <span class="fu"
    >*</span
    > withD u)<br
     />&#8801;   <span class="co"
    >{- liftA2, (*), (+) on functions -}</span
    ><br
 />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x <span class="fu"
    >*</span
    > v x) (withD (deriv u) x <span class="fu"
    >*</span
    > withD v x <span class="fu"
    >+</span
    > withD (deriv v) x <span class="fu"
    >*</span
    > withD u x)<br
     /></code
      ></pre
    ><p
    >Right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD u <span class="fu"
    >*</span
    > withD v<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > u (withD (deriv u)) <span class="fu"
    >*</span
    > liftA2 <span class="dt"
    >D</span
    > v (withD (deriv v))<br
     />&#8801;   <span class="co"
    >{- liftA2 and (*) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (u x) (withD (deriv u) x) <span class="fu"
    >*</span
    > <span class="dt"
    >D</span
    > (v x) (withD (deriv v) x)<br
     /></code
      ></pre
    ><p
    >A sufficient definition:</p
    ><pre class="sourceCode haskell"
    ><code
      >a<span class="fu"
    >@</span
    >(<span class="dt"
    >D</span
    > a0 a') <span class="fu"
    >*</span
    > b<span class="fu"
    >@</span
    >(<span class="dt"
    >D</span
    > b0 b') <span class="fu"
    >=</span
    > <span class="dt"
    >D</span
    > (a0 <span class="fu"
>*</span
    > b0) (a' <span class="fu"
    >*</span
    > b <span class="fu"
    >+</span
    > b' <span class="fu"
    >*</span
    > a)<br
     /></code
      ></pre
    ><p
    >Because</p
    ><pre class="sourceCode haskell"
    ><code
      >withD u x &#8801; <span class="dt"
    >D</span
    > (u x) (withD (deriv u) x)<br
     /><br
     />withD v x &#8801; <span class="dt"
    >D</span
    > (v x) (withD (deriv v) x)<br
     /></code
      ></pre
    ></div
  ><div id="sine-1"
  ><h4
    >Sine</h4
    ><p
    >Left-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   withD (<span class="fu"
    >sin</span
    > u)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv (<span class="fu"
    >sin</span
    > u)))<br
     />&#8801;   <span class="co"
    >{- deriv rule for sin -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv u <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/(*) -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv u) <span class="fu"
    >*</span
    > withD (<span class="fu"
    >cos</span
    > u))<br
     />&#8801;   <span class="co"
    >{- induction for withD/cos -}</span
    ><br
     />   liftA2 <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > u) (withD (deriv u) <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > (withD u))<br
     />&#8801;   <span class="co"
    >{- liftA2, sin, cos and (*) on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > (u x)) (withD (deriv u) x <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > (withD u x))<br
     /></code
      ></pre
    ><p
    >Right-hand side:</p
    ><pre class="sourceCode haskell"
    ><code
      >   <span class="fu"
    >sin</span
    > (withD u)<br
     />&#8801;   <span class="co"
    >{- def of withD -}</span
    ><br
     />   <span class="fu"
    >sin</span
    > (liftA2 <span class="dt"
    >D</span
    > u (withD (deriv u)))<br
     />&#8801;   <span class="co"
    >{- liftA2 and sin on functions -}</span
    ><br
     />   &#955; x &#8594; <span class="fu"
    >sin</span
    > (<span class="dt"
    >D</span
    > (u x) (withD (deriv u) x))<br
     /></code
      ></pre
    ><p
    >To make the LHS and RHS final forms equal, define</p
    ><pre class="sourceCode haskell"
    ><code
      ><span class="fu"
    >sin</span
    > a<span class="fu"
    >@</span
    >(<span class="dt"
    >D</span
> a0 a') <span class="fu"
>=</span
> <span class="dt"
    >D</span
    > (<span class="fu"
    >sin</span
    > a0) (a' <span class="fu"
    >*</span
    > <span class="fu"
    >cos</span
    > a)<br
     /></code
      ></pre
    ></div
  ></div
>

<div id="higher-dimensional-derivatives"
><h3
  >Higher-dimensional derivatives</h3
  ><p
  >I'll save non-scalar (&quot;multi-variate&quot;) differentiation for another time. In addition to the considerations above, the key ideas are in <em
    ><a href="http://conal.net/blog/posts/higher-dimensional-higher-order-derivatives-functionally/" title="blog post"
      >Higher-dimensional, higher-order derivatives, functionally</a
      ></em
    > and <em
    ><a href="http://conal.net/blog/posts/simpler-more-efficient-functional-linear-maps/" title="blog post"
      >Simpler, more efficient, functional linear maps</a
      ></em
    >.</p
  ></div
>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/what-is-automatic-differentiation-and-why-does-it-work/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fwhat-is-automatic-differentiation-and-why-does-it-work&amp;language=en_GB&amp;category=text&amp;title=What+is+automatic+differentiation%2C+and+why+does+it+work%3F&amp;description=Bertrand+Russell+remarked+that+Everything+is+vague+to+a+degree+you+do+not+realize+till+you+have+tried+to+make+it+precise.+I%26%238217%3Bm+mulling+over+automatic+differentiation+%28AD%29+again%2C+neatening...&amp;tags=derivative%2Csemantics%2Ctype+class+morphism%2Cblog" type="text/html" />
	</item>
		<item>
		<title>3D rendering as functional reactive programming</title>
		<link>http://conal.net/blog/posts/3d-rendering-as-functional-reactive-programming</link>
		<comments>http://conal.net/blog/posts/3d-rendering-as-functional-reactive-programming#comments</comments>
		<pubDate>Mon, 12 Jan 2009 05:38:58 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[3D]]></category>
		<category><![CDATA[FRP]]></category>
		<category><![CDATA[functional reactive programming]]></category>
		<category><![CDATA[monoid]]></category>
		<category><![CDATA[semantics]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=75</guid>
		<description><![CDATA[I&#8217;ve been playing with a simple/general semantics for 3D. In the process, I was surprised to see that a key part of the semantics looks exactly like a key part of the semantics of functional reactivity as embodied in the library Reactive. A closer look revealed a closer connection still, as described in this post. [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: 3D rendering as functional reactive programming

Tags: 3D, semantics, FRP, functional reactive programming, monoid

URL: http://conal.net/blog/posts/3d-rendering-as-functional-reactive-programming/

-->

<!-- references -->

<!-- teaser -->

<p>I&#8217;ve been playing with a simple/general semantics for 3D.
In the process, I was surprised to see that a key part of the semantics looks exactly like a key part of the semantics of functional reactivity as embodied in the library <em><a href="http://haskell.org/haskellwiki/Reactive" title="Wiki page for the Reactive library">Reactive</a></em>.
A closer look revealed a closer connection still, as described in this post.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-75"></span></p>

<h3>What is 3D rendering?</h3>

<p>Most programmers think of 3D rendering as being about executing a sequence of side-effects on a frame buffer or some other mutable array of pixels.
This way of thinking (sequences of side-effects) comes to us from the design of early sequential computers.
Although computer hardware architecture has evolved a great deal, most programming languages, and hence most programming thinking, are still shaped by this first sequential model.
(See John Backus&#8217;s Turing Award lecture <em><a href="http://www.stanford.edu/class/cs242/readings/backus.pdf" title="Turing Award lecture by John Backus">Can Programming Be Liberated from the von Neumann Style?  A functional style and its algebra of programs</a></em>.)
The invention of monadic <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.2504" title="paper by Simon Peyton Jones and Philip Wadler">Imperative functional programming</a></em> allows Haskellers to think and program within the imperative paradigm as well.</p>

<p>What&#8217;s a <em>functional</em> alternative?
Rendering is a function from something to something else.
Let&#8217;s call these somethings (3D) &#8220;Geometry&#8221; and (2D) &#8220;Image&#8221;, where <code>Geometry</code> and <code>Image</code> are types of functional (immutable) values.</p>

<pre><code>type Rendering = Image Color

render :: Geometry -&gt; Rendering
</code></pre>

<p>To simplify, I&#8217;m assuming a fixed view.
What remains is to define what these two types <em>mean</em> and, secondarily, how to represent and implement them.</p>

<p>An upcoming post will suggest an answer for the meaning of <code>Geometry</code>.
For now, think of it as a collection of curved and polygonal surfaces, i.e., the <em>outsides</em> (boundaries) of solid shapes.
Each point on these surfaces has a location, a normal (perpendicular direction), and material properties (determining how light is reflected by and transmitted through the surface at the point).
The geometry will contain light sources.</p>

<p>Next, what is the meaning of <code>Image</code>?
A popular answer is that an image is a rectangular array of finite-precision encodings of color (e.g., with eight bits for each of red, blue, green and possibly opacity).
This answer leads to poor compositionality and complex meanings for operations like scaling and rotation, so I prefer another model.
As in <a href="http://conal.net/Pan" title="project web page">Pan</a>, an image (the meaning of the type <code>Image Color</code>) is a function from infinite continuous 2D space to colors, where the <code>Color</code> type includes partial opacity.
For motivation of this model and examples of its use, see <em><a href="http://conal.net/papers/functional-images/" title="book chapter">Functional images</a></em> and the corresponding <a href="http://conal.net/Pan/Gallery" title="gallery of functional images">Pan gallery</a> of functional images.
<em>Composition</em> occurs on infinite &amp; continuous images.</p>

<p>After all composition is done, the resulting image can be sampled into a finite, rectangular array of finite precision color encodings.
I&#8217;m talking about a conceptual/semantic pipeline.
The implementation computes the finite sampling without having to compute the values for the entire infinite image.</p>
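
<p>A minimal sketch of this model in Haskell may make it concrete.
The names <code>Point</code>, <code>Image</code>, and <code>sample</code> here are illustrative only, not Pan&#8217;s actual API:</p>

<pre><code>-- Illustrative semantic model: an image is a function from infinite,
-- continuous 2D space to values.
type Point   = (Double, Double)
type Image a = Point -&gt; a

-- Sample an image on a finite w-by-h grid over the unit square.
-- Laziness means only these w*h sample points are ever computed.
sample :: Int -&gt; Int -&gt; Image a -&gt; [[a]]
sample w h im =
  [ [ im (fromIntegral i / fromIntegral w, fromIntegral j / fromIntegral h)
    | i &lt;- [0 .. w - 1] ]
  | j &lt;- [0 .. h - 1] ]
</code></pre>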

<p>Rendering has several components.
I&#8217;ll just address one and show how it relates to functional reactive programming (FRP).</p>

<h3>Visual occlusion</h3>

<p>One aspect of 3D rendering is <a href="https://en.wikipedia.org/wiki/Hidden_surface_determination">hidden surface determination</a>.
Relative to the viewer&#8217;s position and orientation, some 3D objects may be fully or partially occluded by nearer objects.</p>

<p>An image is a function of (infinite and continuous) 2D space, so specifying that function means determining its value at every sample point.
Each point can correspond to a number of geometric objects, some closer and some further.
If we assume for now that our colors are fully opaque, then we&#8217;ll need to know the color (after transformation and lighting) of the <em>nearest</em> surface point that is projected onto the sample point.
(We&#8217;ll remove this opacity assumption later.)</p>

<p>Let&#8217;s consider how we&#8217;ll combine two <code>Geometry</code> values into one:</p>

<pre><code>union :: Geometry -&gt; Geometry -&gt; Geometry
</code></pre>

<p>Because of occlusion, the <code>render</code> function cannot be compositional with respect to <code>union</code>.
If it were, then there would exist a function <code>unionR</code> such that</p>

<pre><code>forall ga gb. render (ga `union` gb) == render ga `unionR` render gb
</code></pre>

<p>In other words, to render a union of two geometries, we can render each and combine the results.</p>

<p>The reason we can&#8217;t find such a <code>unionR</code> is that <code>render</code> doesn&#8217;t let <code>unionR</code> know how close each colored point is.
A solution then is simple: add in the missing depth information:</p>

<pre><code>type RenderingD = Image (Depth, Color)  -- first try

renderD :: Geometry -&gt; RenderingD
</code></pre>

<p>Now we have enough information for compositional rendering, i.e., we can define <code>unionR</code> such that</p>

<pre><code>forall ga gb. renderD (ga `union` gb) == renderD ga `unionR` renderD gb
</code></pre>

<p>where</p>

<pre><code>unionR :: RenderingD -&gt; RenderingD -&gt; RenderingD

unionR im im' p = if d &lt;= d' then (d,c) else (d',c')
 where
   (d ,c ) = im  p
   (d',c') = im' p
</code></pre>

<p>When we&#8217;re done composing, we can discard the depths:</p>

<pre><code>render g = snd . renderD g
</code></pre>

<p>or, with <em><a href="http://conal.net/blog/posts/semantic-editor-combinators/" title="blog post">Semantic editor combinators</a></em>:</p>

<pre><code>render = (result.result) snd renderD
</code></pre>

<h3>Simpler, prettier</h3>

<p>The <code>unionR</code> definition is not very complicated, but still, I like to tease out common structure and reuse definitions wherever I can.
The first thing I notice about <code>unionR</code> is that it works pointwise.
That is, the value at a point is a function of the values of two other images at the same point.
The pattern is captured by <code>liftA2</code> on functions, thanks to the <code>Applicative</code> instance for functions.</p>

<pre><code>liftA2 :: (b -&gt; c -&gt; d) -&gt; (a -&gt; b) -&gt; (a -&gt; c) -&gt; (a -&gt; d)
</code></pre>

<p>So that</p>

<pre><code>unionR = liftA2 closer

closer (d,c) (d',c') = if d &lt;= d' then (d,c) else (d',c')
</code></pre>

<p>Or</p>

<pre><code>closer dc@(d,_) dc'@(d',_) = if d &lt;= d' then dc else dc'
</code></pre>

<p>Or even</p>

<pre><code>closer = minBy fst
</code></pre>

<p>where</p>

<pre><code>minBy f u v = if f u &lt;= f v then u else v
</code></pre>

<p>This definition of <code>unionR</code> is not only simpler, it&#8217;s quite a bit more general, as type inference reveals:</p>

<pre><code>unionR :: (Ord a, Applicative f) =&gt; f (a,b) -&gt; f (a,b) -&gt; f (a,b)

closer :: Ord a =&gt; (a,b) -&gt; (a,b) -&gt; (a,b)
</code></pre>

<p>Once again, simplicity and generality go hand-in-hand.</p>
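
<p>These definitions are small enough to check as a self-contained sketch (the types are exactly those inferred above):</p>

<pre><code>import Control.Applicative (liftA2)

minBy :: Ord b =&gt; (a -&gt; b) -&gt; a -&gt; a -&gt; a
minBy f u v = if f u &lt;= f v then u else v

closer :: Ord a =&gt; (a,b) -&gt; (a,b) -&gt; (a,b)
closer = minBy fst

-- Works for any Applicative; for images-as-functions it composites pointwise.
unionR :: (Ord a, Applicative f) =&gt; f (a,b) -&gt; f (a,b) -&gt; f (a,b)
unionR = liftA2 closer
</code></pre>

<p>For instance, with images as functions, <code>unionR (const (1, "near")) (const (2, "far"))</code> is the image that is everywhere <code>(1, "near")</code>.</p>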

<h3>Another type class morphism</h3>

<p>Let&#8217;s see if we can make <code>union</code> rendering simpler and more inevitable.
Rendering is <em>nearly</em> a homomorphism.
That is, <code>render</code> nearly distributes over <code>union</code>, but we have to replace <code>union</code> by <code>unionR</code>.
I&#8217;d rather eliminate this discrepancy, ending up with</p>

<pre><code>forall ga gb. renderD (ga `op` gb) == renderD ga `op` renderD gb
</code></pre>

<p>for some <code>op</code> that is equal to <code>union</code> on the left and <code>unionR</code> on the right.
Since <code>union</code> and <code>unionR</code> have different types (with neither being a polymorphic instance of the other), <code>op</code> will have to be a method of some type class.</p>

<p>My favorite binary method is <code>mappend</code>, from <code>Monoid</code>, so let&#8217;s give it a try.
<code>Monoid</code> requires there also to be an identity element <code>mempty</code> and that <code>mappend</code> be associative.
For <code>Geometry</code>, we can define</p>

<pre><code>instance Monoid Geometry where
  mempty  = emptyGeometry
  mappend = union
</code></pre>

<p>Images with depth are a little trickier.
Image already has a <code>Monoid</code> instance, whose semantics is determined by the principle of <a href="http://conal.net/blog/tag/type-class-morphism/" title="Posts on type class morphisms">type class morphisms</a>, namely</p>

<blockquote>
  <p><em>The meaning of an instance is the instance of the meaning</em></p>
</blockquote>

<p>The meaning of an image is a function, and functions have a <code>Monoid</code> instance:</p>

<pre><code>instance Monoid b =&gt; Monoid (a -&gt; b) where
  mempty = const mempty
  f `mappend` g = \ a -&gt; f a `mappend` g a
</code></pre>

<p>which simplifies nicely to a standard form, by using the <code>Applicative</code> instance for functions.</p>

<pre><code>instance Applicative ((-&gt;) a) where
  pure      = const
  hf &lt;*&gt; xf = \ a -&gt; (hf a) (xf a)

instance Monoid b =&gt; Monoid (a -&gt; b) where
  mempty  = pure   mempty
  mappend = liftA2 mappend
</code></pre>

<p>We&#8217;re in luck.
Since we&#8217;ve defined <code>unionR</code> as <code>liftA2 closer</code>, we just need it to turn out that <code>closer == mappend</code> and that <code>closer</code> is associative and has an identity element.</p>

<p>However, <code>closer</code> is defined on pairs, and the standard <code>Monoid</code> instance on pairs doesn&#8217;t fit.</p>

<pre><code>instance (Monoid a, Monoid b) =&gt; Monoid (a,b) where
  mempty = (mempty,mempty)
  (a,b) `mappend` (a',b') = (a `mappend` a', b `mappend` b')
</code></pre>

<p>To avoid this conflict, define a new data type to be used in place of pairs.</p>

<pre><code>data DepthG d a = Depth d a  -- first try
</code></pre>

<p>Alternatively,</p>

<pre><code>newtype DepthG d a = Depth { unDepth :: (d,a) }
</code></pre>

<p>I&#8217;ll go with this latter version, as it turns out to be more convenient.</p>

<p>Then we can define our monoid:</p>

<pre><code>instance (Ord d, Bounded d) =&gt; Monoid (DepthG d a) where
  mempty  = Depth (maxBound,undefined)
  Depth p `mappend` Depth p' = Depth (p `closer` p')
</code></pre>

<p>The second method definition can be simplified nicely</p>

<pre><code>  mappend = inDepth2 closer
</code></pre>

<p>where</p>

<pre><code>  inDepth2 = unDepth ~&gt; unDepth ~&gt; Depth
</code></pre>

<p>using the ideas from <em><a href="http://conal.net/blog/posts/prettier-functions-for-wrapping-and-wrapping/" title="blog post">Prettier functions for wrapping and wrapping</a></em> and the notational improvement from Matt Hellige&#8217;s <em><a href="http://matt.immute.net/content/pointless-fun" title="blog post by Matt Hellige">Pointless fun</a></em>.</p>

<h3>FRP &#8212; Future values</h3>

<p>The <code>Monoid</code> instance for <code>Depth</code> may look familiar to you if you&#8217;ve been following along with my <a href="http://conal.net/blog/tag/future-value/" title="Posts on futures values">future value</a>s or have read the paper <em><a href="http://conal.net/papers/simply-reactive" title="Paper: &quot;Simply efficient functional reactivity&quot;">Simply efficient functional reactivity</a></em>.
A <em>future value</em> has a time and a value.
Usually, the value cannot be known until its time arrives.</p>

<pre><code>newtype FutureG t a = Future (Time t, a)

instance (Ord t, Bounded t) =&gt; Monoid (FutureG t a) where
  mempty = Future (maxBound, undefined)
  Future (s,a) `mappend` Future (t,b) =
    Future (s `min` t, if s &lt;= t then a else b)
</code></pre>

<p>When we&#8217;re using a non-lazy (flat) representation of time, this <code>mappend</code> definition can be written more simply:</p>

<pre><code>  mappend = minBy futTime

  futTime (Future (t,_)) = t
</code></pre>

<p>Equivalently,</p>

<pre><code>  mappend = inFuture2 (minBy fst)
</code></pre>

<p>There&#8217;s really nothing special about time in the <code>Time</code> type.
It is just a synonym for the <a href="http://hackage.haskell.org/packages/archive/reactive/latest/doc/html/Data-Max.html" title="module documentation"><code>Max</code> monoid</a>, as needed for the <code>Applicative</code> and <code>Monad</code> instances.</p>
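
<p>As a self-contained sketch that can be checked directly, here is the same monoid, simplified to use a plain <code>Ord</code>/<code>Bounded</code> time instead of the <code>Time</code> wrapper, and split into the <code>Semigroup</code>/<code>Monoid</code> pair that modern GHC requires:</p>

<pre><code>newtype FutureG t a = Future (t, a)

futTime :: FutureG t a -&gt; t
futTime (Future (t, _)) = t

futVal :: FutureG t a -&gt; a
futVal (Future (_, a)) = a

instance Ord t =&gt; Semigroup (FutureG t a) where
  Future (s, a) &lt;&gt; Future (t, b) =
    Future (s `min` t, if s &lt;= t then a else b)

instance (Ord t, Bounded t) =&gt; Monoid (FutureG t a) where
  mempty = Future (maxBound, undefined)  -- the never-occurring future
</code></pre>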

<p>This connection with future values means we can discard more code.</p>

<pre><code>type RenderingD d = Image (FutureG d Color)
renderD :: (Ord d, Bounded d) =&gt; Geometry -&gt; RenderingD d
</code></pre>

<p>Now we have our monoid (homo)morphism properties:</p>

<pre><code>renderD mempty == mempty

renderD (ga `mappend` gb) == renderD ga `mappend` renderD gb
</code></pre>

<p>And we&#8217;ve eliminated the code for <code>unionR</code> by reusing an existing type (future values).</p>

<h3>Future values?</h3>

<p>What does it mean to think about depth/color pairs as being &#8220;future&#8221; colors?
If we were to probe outward along a ray, say at the speed of light, we would bump into some number of 3D objects.
The one we hit earliest is the nearest, so in this sense, <code>mappend</code> on futures (choosing the earlier one) is the right tool for the job.</p>

<p>I once read that a popular belief in the past was that vision (light) reaches outward to strike objects, as I&#8217;ve just described.
I&#8217;ve forgotten where I read about that belief, though I think in a book about perspective, and I&#8217;d appreciate a pointer from someone else who might have a reference.</p>

<p>We moderns believe that light travels to us from the objects we see.
What we see of nearby objects comes from the very recent past, while of further objects we see the more remote past.
From this modern perspective, therefore, the connection I&#8217;ve made with future values is exactly backward.
Now that I think about it in this way, of course it&#8217;s backward, because we see (slightly) into the past rather than the future.</p>

<p>Fixing this conceptual flaw is simple: define a type of &#8220;past values&#8221;.
Give them exactly the same representation as future values, and derive their class instances entirely from that representation.</p>

<pre><code>newtype PastG t a = Past (FutureG t a)
  deriving (Monoid, Functor, Applicative, Monad)
</code></pre>

<p>Alternatively, choose a temporally neutral replacement for the name &#8220;future values&#8221;.</p>

<h3>The bug in Z-buffering</h3>

<p>The <code>renderD</code> function implements continuous, infinite Z-buffering, with <code>mappend</code> performing the z-compare and conditional overwrite.
Z-buffering is the dominant algorithm used in real-time 3D graphics and is supported in hardware even on low-end graphics cards (though not in its full continuous and infinite glory).</p>

<p>However, Z-buffering also has a serious bug: it is only correct for fully opaque colors.
Consider a geometry <code>g</code> and a point <code>p</code> in the domain of the result image.
There may be many different points in <code>g</code> that project to <code>p</code>.
If <code>g</code> has only fully opaque colors, then at most one place on <code>g</code> contributes to the rendered image at <code>p</code>, and specifically, the nearest such point.
If <code>g</code> is the <code>union</code> (<code>mappend</code>) of two other geometries, <code>g == ga `union` gb</code>, then the nearest contribution of <code>g</code> (for <code>p</code>) will be the nearer (<code>mappend</code>) of the nearest contributions of <code>ga</code> and of <code>gb</code>.</p>

<p>When colors may be <em>partially</em> opaque, the color of the rendering at a point <code>p</code> can depend on <em>all</em> of the points in the geometry that get projected to <code>p</code>.
Correct rendering in the presence of partial opacity requires a <code>fold</code> that combines all of the colors that project onto a point, <em>in order of distance</em>, where the color-combining function (alpha-blending) is <em>not</em> commutative.
Consider again <code>g == ga `union` gb</code>.
The contributions of <code>ga</code> to <code>p</code> might be entirely closer than the contributions of <code>gb</code>, or entirely further, or interleaved.
If interleaved, then the colors generated from each cannot be combined into a single color for further combination.
To handle the general case, replace the single distance/color pair with an ordered <em>collection</em> of them:</p>

<pre><code>type RenderingD d = Image [FutureG d Color]  -- multiple projections, first try
</code></pre>

<p>Rendering a <code>union</code> (<code>mappend</code>) requires a merging of two lists of futures (distance/color pairs) into a single one.</p>
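
<p>That merging is ordinary order-preserving list merging.
A sketch, representing each future simply as a distance/color pair and assuming both input lists are already sorted by distance:</p>

<pre><code>-- Merge two distance-ordered lists, keeping the result ordered.
-- At equal distances, elements of the first list come first.
merge :: Ord d =&gt; [(d, c)] -&gt; [(d, c)] -&gt; [(d, c)]
merge [] ys = ys
merge xs [] = xs
merge xs@(x@(dx, _) : xs') ys@(y@(dy, _) : ys')
  | dx &lt;= dy  = x : merge xs' ys
  | otherwise = y : merge xs ys'
</code></pre>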

<h3>More FRP &#8212; Events</h3>

<p>Sadly, we&#8217;ve now lost our monoid morphism, because list <code>mappend</code> is <code>(++)</code>, not the required merging.
However, we can fix this problem as we did before, by introducing a new type.</p>

<p>Or, we can look for an existing type that matches our required semantics.
There is just such a thing in the <em><a href="http://haskell.org/haskellwiki/Reactive" title="Wiki page for the Reactive library">Reactive</a></em> formulation of FRP, namely an <em>event</em>.
We can simply use the FRP <code>Event</code> type:</p>

<pre><code>type RenderingD d = Image (EventG d Color)

renderD :: (Ord d, Bounded d) =&gt; Geometry -&gt; RenderingD d
</code></pre>

<h3>Spatial transformation</h3>

<p>Introducing depths allowed rendering to be defined compositionally with respect to geometric union.
Is the depth model, enhanced with lists (events), sufficient for compositionality of rendering with respect to other <code>Geometry</code> operations as well?
Let&#8217;s look at spatial transformation.</p>

<pre><code>(*%)  :: Transform3 -&gt; Geometry -&gt; Geometry
</code></pre>

<p>Compositionality of rendering would mean that we can render <code>xf *% g</code> by rendering <code>g</code> and then using <code>xf</code> in some way to transform that rendering.
In other words, there would have to exist a function <code>(*%%)</code> such that</p>

<pre><code>forall xf g. renderD (xf *% g) == xf *%% renderD g
</code></pre>

<p>I don&#8217;t know if the required <code>(*%%)</code> function exists, or what restrictions on <code>Geometry</code> or <code>Transform3</code> it implies, or whether such a function could be useful in practice.
Instead, let&#8217;s change the type of renderings again, so that rendering can accumulate transformations and apply them to surfaces.</p>

<pre><code>type RenderingDX d = Transform3 -&gt; RenderingD d

renderDX :: (Ord d, Bounded d) =&gt; Geometry -&gt; RenderingDX d
</code></pre>

<p>with or without correct treatment of partial opacity (i.e., using events or futures, respectively).</p>

<p>This new function has a simple specification:</p>

<pre><code>renderDX g xf == renderD (xf *% g)
</code></pre>

<p>from which it follows that</p>

<pre><code>renderD g == renderDX g identityX
</code></pre>

<p>Rendering a transformed geometry then is a simple accumulation, justified as follows:</p>

<pre><code>renderDX (xfi *% g)

  == {- specification of renderDX -}

\ xfo -&gt; renderD (xfo *% (xfi *% g))

  == {- property of transformation -}

\ xfo -&gt; renderD ((xfo `composeX` xfi) *% g)

  == {- specification of renderDX  -}

\ xfo -&gt; renderDX g (xfo `composeX` xfi)
</code></pre>

<p>Render an empty geometry:</p>

<pre><code>renderDX mempty

  == {- specification of renderDX -}

\ xf -&gt; renderD (xf *% mempty)

  == {- property of (*%) and mempty -}

\ xf -&gt; renderD mempty

  == {- renderD is a monoid morphism -}

\ xf -&gt; mempty

  == {- definition of pure on functions -}

pure mempty

  == {- definition of mempty on functions -}

mempty
</code></pre>

<p>Render a geometric union:</p>

<pre><code>renderDX (ga `mappend` gb)

  == {- specification of renderDX -}

\ xf -&gt; renderD (xf *% (ga `mappend` gb))

  == {- property of transformation and union -}

\ xf -&gt; renderD ((xf *% ga) `mappend` (xf *% gb))

  == {- renderD is a monoid morphism -}

\ xf -&gt; renderD (xf *% ga) `mappend` renderD (xf *% gb)

  == {- specification of renderDX  -}

\ xf -&gt; renderDX ga xf `mappend` renderDX gb xf

  == {- definition of liftA2/(&lt;*&gt;) on functions -}

liftA2 mappend (renderDX ga) (renderDX gb)

  == {- definition of mappend on functions -}

renderDX ga `mappend` renderDX gb
</code></pre>

<p>Hurray!
<code>renderDX</code> is still a monoid morphism.</p>

<p>The two properties of transformation and union used above say together that <code>(xf *%)</code> is a monoid morphism for all transforms <code>xf</code>.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=75&amp;md5=a47ae6e1e1a51016836d913e562dbd3e"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/3d-rendering-as-functional-reactive-programming/feed</wfw:commentRss>
		<slash:comments>11</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2F3d-rendering-as-functional-reactive-programming&amp;language=en_GB&amp;category=text&amp;title=3D+rendering+as+functional+reactive+programming&amp;description=I%26%238217%3Bve+been+playing+with+a+simple%2Fgeneral+semantics+for+3D.+In+the+process%2C+I+was+surprised+to+see+that+a+key+part+of+the+semantics+looks+exactly+like+a+key+part...&amp;tags=3D%2CFRP%2Cfunctional+reactive+programming%2Cmonoid%2Csemantics%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Another lovely example of type class morphisms</title>
		<link>http://conal.net/blog/posts/another-lovely-example-of-type-class-morphisms</link>
		<comments>http://conal.net/blog/posts/another-lovely-example-of-type-class-morphisms#comments</comments>
		<pubDate>Fri, 14 Nov 2008 06:20:06 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[fold]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[type class]]></category>
		<category><![CDATA[type class morphism]]></category>
		<category><![CDATA[zip]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=58</guid>
		<description><![CDATA[I read Max Rabkin&#8217;s recent post Beautiful folding with great excitement. He shows how to combine multiple folds over the same list into a single pass, which can then drastically reduce memory requirements of a lazy functional program. Max&#8217;s trick is giving folds a data representation and a way to combine representations that corresponds [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Another lovely example of type class morphisms

Tags: applicative functor, functor, type class morphism, semantics, type class, fold, zip

URL: http://conal.net/blog/posts/another-lovely-example-of-type-class-morphisms/

-->

<!-- references -->

<!-- teaser -->

<p>I read Max Rabkin&#8217;s recent post <a href="http://squing.blogspot.com/2008/11/beautiful-folding.html" title="blog post by Max Rabkin">Beautiful folding</a> with great excitement.
He shows how to combine multiple folds over the same list into a single pass, which can then drastically reduce memory requirements of a lazy functional program.
Max&#8217;s trick is giving folds a data representation and a way to combine representations that corresponds to combining the folds.</p>

<p>Peeking out from behind Max&#8217;s definitions is a lovely pattern I&#8217;ve been noticing more and more over the last couple of years, namely <a href="http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms" title="blog post">type class morphisms</a>.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-58"></span></p>

<h3>Folds as data</h3>

<p>Max gives a data representation of folds and adds on a post-fold step, which makes them composable.</p>

<pre><code>data Fold b c = forall a. F (a -&gt; b -&gt; a) a (a -&gt; c)
</code></pre>

<p>The components of a <code>Fold</code> are a (strict left) fold&#8217;s combiner function and initial value, plus a post-fold step.
This interpretation is done by a function <code>cfoldl'</code>, which turns these data folds into function folds:</p>

<pre><code>cfoldl' :: Fold b c -&gt; [b] -&gt; c
cfoldl' (F op e k) = k . foldl' op e
</code></pre>

<p>where <code>foldl'</code> is the standard strict, left-fold functional:</p>

<pre><code>foldl' :: (a -&gt; b -&gt; a) -&gt; a -&gt; [b] -&gt; a
foldl' op a []     = a
foldl' op a (b:bs) =
  let a' = a `op` b in a' `seq` foldl' op a' bs
</code></pre>

<h3>Standard classes</h3>

<p>As Twan van Laarhoven pointed out in a comment on <a href="http://squing.blogspot.com/2008/11/beautiful-folding.html" title="blog post by Max Rabkin">Max&#8217;s post</a>, <code>Fold b</code> is a functor and an applicative functor, so some of Max&#8217;s <code>Fold</code>-manipulating functions can be replaced by standard vocabulary.</p>

<p>The <code>Functor</code> instance is pretty simple:</p>

<pre><code>instance Functor (Fold b) where
  fmap h (F op e k) = F op e (h . k)
</code></pre>

<p>The <code>Applicative</code> instance is a bit trickier.
For strictness, Max used a type of strict pairs:</p>

<pre><code>data Pair c c' = P !c !c'
</code></pre>

<p>The instance:</p>

<pre><code>instance Applicative (Fold b) where
  pure a = F (error "no op") (error "no e") (const a)

  F op e k &lt;*&gt; F op' e' k' = F op'' e'' k''
   where
     P a a' `op''` b = P (a `op` b) (a' `op'` b)
     e''             = P e e'
     k'' (P a a')    = (k a) (k' a')
</code></pre>

<p>Given that <code>Fold b</code> is an applicative functor, Max&#8217;s <code>bothWith</code> function is then <code>liftA2</code>.
Max&#8217;s <code>multi-cfoldl'</code> rule then becomes:</p>

<pre><code>forall h f g xs.
   h (cfoldl' f xs) (cfoldl' g xs) == cfoldl' (liftA2 h f g) xs
</code></pre>

<p>thus replacing two passes with one pass.</p>
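
<p>Assembling Max&#8217;s pieces into a self-contained sketch (using a total definition of <code>pure</code>, as discussed below) shows the one-pass combination in action, computing a mean as a single traversal:</p>

<pre><code>{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

data Fold b c = forall a. F (a -&gt; b -&gt; a) a (a -&gt; c)

cfoldl' :: Fold b c -&gt; [b] -&gt; c
cfoldl' (F op e k) = k . foldl' op e

instance Functor (Fold b) where
  fmap h (F op e k) = F op e (h . k)

data Pair c c' = P !c !c'

instance Applicative (Fold b) where
  pure a = F (\_ _ -&gt; ()) () (const a)
  F op e k &lt;*&gt; F op' e' k' = F op'' (P e e') k''
   where
     P a a' `op''` b = P (a `op` b) (a' `op'` b)
     k'' (P a a')    = k a (k' a')

sumF, lengthF :: Fold Double Double
sumF    = F (+) 0 id
lengthF = F (\n _ -&gt; n + 1) 0 id

-- Sum and length fused into a single strict left fold.
meanF :: Fold Double Double
meanF = (/) &lt;$&gt; sumF &lt;*&gt; lengthF
</code></pre>

<p>so that <code>cfoldl' meanF [1,2,3,4]</code> traverses the list once and yields <code>2.5</code>.</p>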

<h3>Beautiful properties</h3>

<p>Now here&#8217;s the fun part.
Looking at the <code>Applicative</code> instance for <code>((-&gt;) a)</code>, the rule above is equivalent to</p>

<pre><code>forall h f g.
  liftA2 h (cfoldl' f) (cfoldl' g) == cfoldl' (liftA2 h f g)
</code></pre>

<p>Flipped around, this rule says that <code>liftA2</code> distributes over <code>cfoldl'</code>.
Or, &#8220;the meaning of <code>liftA2</code> is <code>liftA2</code>&#8221;.
Neat, huh?</p>

<p>Moreover, this <code>liftA2</code> property is equivalent to the following:</p>

<pre><code>forall f g.
  cfoldl' f &lt;*&gt; cfoldl' g == cfoldl' (f &lt;*&gt; g)
</code></pre>

<p>This form is one of the two <code>Applicative</code> morphism laws (which I usually write in the reverse direction).</p>

<p>For more about these morphisms, see <a href="http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms" title="blog post">Simplifying semantics with type class morphisms</a>.
That post suggests that semantic functions in particular ought to be type class morphisms (and if not, then you&#8217;d have an abstraction leak).
And <code>cfoldl'</code> is a semantic function, in that it gives meaning to a <code>Fold</code>.</p>

<p>The other type class morphisms in this case are</p>

<pre><code>cfoldl' (pure a  ) == pure a

cfoldl' (fmap h f) == fmap h (cfoldl' f)
</code></pre>

<p>Given the <code>Functor</code> and <code>Applicative</code> instances of <code>((-&gt;) a)</code>, these two properties are equivalent to</p>

<pre><code>cfoldl' (pure a  ) == const a

cfoldl' (fmap h f) == h . cfoldl' f
</code></pre>

<p>or</p>

<pre><code>cfoldl' (pure a  ) xs == a

cfoldl' (fmap h f) xs == h (cfoldl' f xs)
</code></pre>

<h3>Rewrite rules</h3>

<p>Max pointed out that GHC does not handle his original <code>multi-cfoldl'</code> rule.
The reason is that the head of the LHS (left-hand side) is a variable.
However, the type class morphism laws have constant (known) functions at the head, so I expect they could usefully act as fusion rewrite rules.</p>

<h3>Inevitable instances</h3>

<p>Given the implementations (instances) of <code>Functor</code> and <code>Applicative</code> for <code>Fold</code>, I&#8217;d like to verify that the morphism laws for <code>cfoldl'</code> (above) hold.</p>

<h4>Functor</h4>

<p>Start with <code>fmap</code>.
The morphism law:</p>

<pre><code>cfoldl' (fmap h f) == fmap h (cfoldl' f)
</code></pre>

<p>First, give the <code>Fold</code> argument more structure, so that (without loss of generality) the law becomes</p>

<pre><code>cfoldl' (fmap h (F op e k)) == fmap h (cfoldl' (F op e k))
</code></pre>

<p>The game is to work backward from this law to the definition of <code>fmap</code> for <code>Fold</code>.
I&#8217;ll do so by massaging the RHS (right-hand side) into the form <code>cfoldl' (...)</code>, where &#8220;<code>...</code>&#8221; is the definition <code>fmap h (F op e k)</code>.</p>

<pre><code>fmap h (cfoldl' (F op e k))

  ==  {- inline cfoldl' -}

fmap h (k . foldl' op e)

  ==  {- inline fmap on functions -}

h . (k . foldl' op e)

  ==  {- associativity of (.) -}

(h . k) . foldl' op e

  ==  {- uninline cfoldl' -}

cfoldl' (F op e (h . k))

  ==  {- uninline fmap on Fold  -}

cfoldl' (fmap h (F op e k))
</code></pre>

<p>This proof shows why Max had to add the post-fold function <code>k</code> to his <code>Fold</code> type.
If <code>k</code> weren&#8217;t there, we couldn&#8217;t have buried the <code>h</code> in it.</p>
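<p>The derivation pins down the <code>Functor</code> instance. Here it is as a standalone sketch (repeating the <code>Fold</code> type and runner so the fragment compiles on its own):</p>

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

data Fold b c = forall a. F (a -> b -> a) a (a -> c)

cfoldl' :: Fold b c -> [b] -> c
cfoldl' (F op e k) = k . foldl' op e

-- The definition the proof works backward to: bury h in the post-fold k.
instance Functor (Fold b) where
  fmap h (F op e k) = F op e (h . k)
```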

<p>More usefully, this proof suggests how we could have discovered the <code>fmap</code> definition.
For instance, we might have tried with a simpler and more obvious <code>Fold</code> representation:</p>

<pre><code>data FoldS a b = FS (a -&gt; b -&gt; a) a
</code></pre>

<p>Getting into the <code>fmap</code> derivation, we&#8217;d come to</p>

<pre><code>h . foldl' op e
</code></pre>

<p>and then we&#8217;d be stuck.
But not really, because the awkward extra bit (<code>h .</code>) beckons us to generalize by adding Max&#8217;s post-fold function.</p>

<h4>Applicative</h4>

<p>Next, <code>pure</code>:</p>

<pre><code>cfoldl' (pure a) == pure a
</code></pre>

<p>Reason as before, starting with the RHS:</p>

<pre><code>pure a

  ==  {- inline pure on functions -}

const a

  ==  {- property of const -}

const a . foldl' op e

  ==  {- uninline cfoldl' -}

cfoldl' (F op e (const a))
</code></pre>

<p>The imaginative step was inventing structure to match the definition of <code>cfoldl'</code>.
This equation holds for <em>any</em> values of <code>op</code> and <code>e</code>, so we can use bottom in the definition:</p>

<pre><code>instance Applicative (Fold b) where
  pure a = F undefined undefined (const a)
</code></pre>

<p>As Twan noticed, the existential (<code>forall</code>) also lets us pick defined values for <code>op</code> and <code>e</code>.
He chose <code>(\_ _ -&gt; ())</code> and <code>()</code>.</p>
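<p>In code, as a standalone sketch (<code>pureFold</code> is a hypothetical name for what would become the <code>pure</code> method):</p>

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.List (foldl')

data Fold b c = forall a. F (a -> b -> a) a (a -> c)

cfoldl' :: Fold b c -> [b] -> c
cfoldl' (F op e k) = k . foldl' op e

-- Twan's defined choice: fold the list into (), then ignore the result.
pureFold :: c -> Fold b c
pureFold a = F (\_ _ -> ()) () (const a)
```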

<p>The derivation of <code>(&lt;*&gt;)</code> is trickier and is the heart of the problem of fusing folds to reduce multiple traversals to a single one.
Why the heart?  Because <code>(&lt;*&gt;)</code> is all about <em>combining</em> two things into one.</p>

<h4>Intermission</h4>

<p>I&#8217;m taking a break here.
While fiddling with a proof of the <code>(&lt;*&gt;)</code> morphism law, I realized a simpler way to structure these folds, which will be the topic of an upcoming post.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/another-lovely-example-of-type-class-morphisms/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fanother-lovely-example-of-type-class-morphisms&amp;language=en_GB&amp;category=text&amp;title=Another+lovely+example+of+type+class+morphisms&amp;description=I+read+Max+Rabkin%26%238217%3Bs+recent+post+Beautiful+folding+with+great+excitement.+He+shows+how+to+make+combine+multiple+folds+over+the+same+list+into+a+single+pass%2C+which+can+then...&amp;tags=applicative+functor%2Cfold%2Cfunctor%2Csemantics%2Ctype+class%2Ctype+class+morphism%2Czip%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Simplifying semantics with type class morphisms</title>
		<link>http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms</link>
		<comments>http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms#comments</comments>
		<pubDate>Wed, 09 Apr 2008 04:22:35 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[FRP]]></category>
		<category><![CDATA[functional reactive programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[monad]]></category>
		<category><![CDATA[monoid]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[type class]]></category>
		<category><![CDATA[type class morphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=23</guid>
		<description><![CDATA[When I first started playing with functional reactivity in Fran and its predecessors, I didn&#8217;t realize that much of the functionality of events and reactive behaviors could be packaged via standard type classes. Then Conor McBride &#38; Ross Paterson introduced us to applicative functors, and I remembered using that pattern to reduce all of the [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Simplifying semantics with type class morphisms

Tags: type class, functor, applicative functor, monad, monoid, type class morphism, semantics, FRP, functional reactive programming

URL: http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms

-->

<!-- references -->

<!-- teaser -->

<p>When I first started playing with functional reactivity in Fran and its predecessors, I didn&#8217;t realize that much of the functionality of events and reactive behaviors could be packaged via standard type classes.
Then Conor McBride &amp; Ross Paterson introduced us to <em><a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Applicative.html" title="Documentation for Control.Applicative: applicative functors">applicative functors</a></em>, and I remembered using that pattern to reduce all of the lifting operators in Fran to just two, which correspond to <code>pure</code> and <code>(&lt;*&gt;)</code> in the <code>Applicative</code> class.
So, in working on a new library for functional reactive programming (FRP), I thought I&#8217;d modernize the interface to use standard type classes as much as possible.</p>

<p>While spelling out a precise (denotational) semantics for the FRP instances of these classes, I noticed a lovely recurring pattern:</p>

<blockquote>
  <p>The meaning of each method corresponds to the same method for the meaning.</p>
</blockquote>

<p>In this post, I&#8217;ll give some examples of this principle and muse a bit over its usefulness.
For more details, see the paper <em><a href="http://conal.net/blog/posts/simply-efficient-functional-reactivity/" title="Blog post: &quot;Simply efficient functional reactivity&quot;">Simply efficient functional reactivity</a></em>.
Another post will start exploring type class morphisms and type composition, and ask questions I&#8217;m wondering about.</p>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-23"></span></p>

<h3>Behaviors</h3>

<p>The meaning of a (reactive) behavior is a function from time:</p>

<pre><code>type B a = Time -&gt; a

at :: Behavior a -&gt; B a
</code></pre>

<p>So the semantic function, <code>at</code>, maps from the <code>Behavior</code> type (for use in FRP programs) to the <code>B</code> type (for understanding FRP programs).</p>

<p>As a simple example, the meaning of the behavior <code>time</code> is the identity function:</p>

<pre><code>at time == id
</code></pre>

<h4>Functor</h4>

<p>Given <code>b :: Behavior a</code> and a function <code>f :: a -&gt; b</code>, we can apply <code>f</code> to the value of <code>b</code> at every moment in (infinite and continuous) time.
This operation corresponds to the <code>Functor</code> method <code>fmap</code>, so</p>

<pre><code>instance Functor Behavior where ...
</code></pre>

<p>The informal description of <code>fmap</code> on behaviors translates to a formal definition of its semantics:</p>

<pre><code>  fmap f b `at` t == f (b `at` t)
</code></pre>

<p>Equivalently,</p>

<pre><code>  at (fmap f b) == \t -&gt; f (b `at` t)
                == f . (\t -&gt; b `at` t)
                == f . at b
</code></pre>

<p>Now here&#8217;s the fun part.
While <code>Behavior</code> is a functor, <em>so is its meaning</em>:</p>

<pre><code>instance Functor ((-&gt;) t) where fmap = (.)
</code></pre>

<p>So, replacing <code>f . at b</code> with <code>fmap f (at b)</code> above,</p>

<pre><code>  at (fmap f b) == fmap f (at b)
</code></pre>

<p>which can also be written</p>

<pre><code>  at . fmap f == fmap f . at
</code></pre>

<p>Keep in mind that the <code>fmap</code> on the left is on behaviors, and the one on the right is on functions (of time).</p>

<p>This last equation can also be written as a simple square commutative diagram and is sometimes expressed by saying that <code>at</code> is a &#8220;natural transformation&#8221; or &#8220;morphism on functors&#8221; [<a href="http://books.google.com/books?id=eBvhyc4z8HQC" title="Book: &quot;Categories for the Working Mathematician&quot; by Saunders Mac Lane">Categories for the Working Mathematician</a>].
For consistency with similar properties on other type classes, I suggest &#8220;functor morphism&#8221; as a synonym for natural transformation.</p>

<p>The <a href="http://www.haskell.org/haskellwiki/Category_theory/Natural_transformation" title="Haskell wiki page: &quot;Haskell wiki page on natural transformations&quot;">Haskell wiki page on natural transformations</a> shows the commutative diagram and gives <code>maybeToList</code> as another example.</p>
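<p>Naturality of <code>maybeToList</code> is easy to check at a sample value (a quick sketch):</p>

```haskell
import Data.Maybe (maybeToList)

-- maybeToList commutes with fmap, i.e. it is a natural transformation:
--   maybeToList . fmap f == fmap f . maybeToList
lhs, rhs :: [Int]
lhs = maybeToList (fmap (+1) (Just 41))
rhs = fmap (+1) (maybeToList (Just 41))
```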

<h4>Applicative functor</h4>

<p>The <code>fmap</code> method applies a static (not time-varying) function to a dynamic (time-varying) argument.
A more general operation applies a dynamic function to a dynamic argument.
Also useful is promoting a static value to a dynamic one.
These two operations correspond to <code>(&lt;*&gt;)</code> and <code>pure</code> for <a href="http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Applicative.html" title="Documentation for Control.Applicative: applicative functors">applicative functors</a>:</p>

<pre><code>infixl 4 &lt;*&gt;
class Functor f =&gt; Applicative f where
  pure  :: a -&gt; f a
  (&lt;*&gt;) :: f (a-&gt;b) -&gt; f a -&gt; f b
</code></pre>

<p>where, e.g., <code>f == Behavior</code>.</p>

<p>From these two methods, all of the n-ary lifting functions follow.
For instance,</p>

<pre><code>liftA3 :: Applicative f =&gt;
          (  a -&gt;   b -&gt;   c -&gt;   d)
       -&gt;  f a -&gt; f b -&gt; f c -&gt; f d
liftA3 h fa fb fc = pure h &lt;*&gt; fa &lt;*&gt; fb &lt;*&gt; fc
</code></pre>

<p>Or use <code>fmap h fa</code> in place of <code>pure h &lt;*&gt; fa</code>.
For prettier code, <code>(&lt;$&gt;)</code> (left infix) is synonymous with <code>fmap</code>.</p>
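<p>For example, over <code>Maybe</code> (another applicative functor), <code>liftA3</code> combines three wrapped values, and any <code>Nothing</code> propagates:</p>

```haskell
import Control.Applicative (liftA3)

-- liftA3 over Maybe: the result is present only if all three inputs are.
triple :: Maybe (Int, Int, Int)
triple = liftA3 (,,) (Just 1) (Just 2) (Just 3)

missing :: Maybe (Int, Int, Int)
missing = liftA3 (,,) (Just 1) Nothing (Just 3)
```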

<p>Now, what about semantics?
Applying a dynamic function <code>fb</code> to a dynamic argument <code>xb</code> gives a dynamic result, whose value at time <code>t</code> is the value of <code>fb</code> at <code>t</code>, applied to the value of <code>xb</code> at <code>t</code>.</p>

<pre><code>at (fb &lt;*&gt; xb) == \t -&gt; (fb `at` t) (xb `at` t)
</code></pre>

<p>The <code>(&lt;*&gt;)</code> operator is the heart of FRP&#8217;s concurrency model, which is determinate, synchronous, and continuous.</p>

<p>Promoting a static value yields a constant behavior:</p>

<pre><code>at (pure a) == \t -&gt; a
            == const a
</code></pre>

<p>As with <code>Functor</code>, let&#8217;s look at the <code>Applicative</code> instance of functions (the meaning of behaviors):</p>

<pre><code>instance Applicative ((-&gt;) t) where
  pure a    = const a
  hf &lt;*&gt; xf = \t -&gt; (hf t) (xf t)
</code></pre>

<p>Wow &#8212; these two definitions look a lot like the meanings given above for <code>pure</code> and <code>(&lt;*&gt;)</code> on behaviors.
And sure enough, we can use the function instance to simplify these semantic definitions:</p>

<pre><code>at (pure a)    == pure a
at (fb &lt;*&gt; xb) == at fb &lt;*&gt; at xb
</code></pre>

<p>Thus the semantic function distributes over the <code>Applicative</code> methods.
In other words, the meaning of each method is the method on the meaning.
I don&#8217;t know of any standard term (like &#8220;natural transformation&#8221;) for this relationship between <code>at</code> and <code>pure</code>/<code>(&lt;*&gt;)</code>.
I suggest calling <code>at</code> an &#8220;applicative functor morphism&#8221;.</p>
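<p>To see the two sides line up concretely, here is a toy check in which a behavior is carried directly as its meaning, so <code>at</code> is just unwrapping. This is purely illustrative, not the library&#8217;s representation:</p>

```haskell
type Time = Double

-- Toy Behavior: represented by its own meaning (a function of time).
newtype Behavior a = B { at :: Time -> a }

instance Functor Behavior where
  fmap f b = B (f . at b)

instance Applicative Behavior where
  pure a    = B (const a)
  fb <*> xb = B (\t -> (fb `at` t) (xb `at` t))

time :: Behavior Time
time = B id

-- Sample both sides of  at (fb <*> xb) == at fb <*> at xb  at t = 5,
-- using the standard ((->) Time) instances on the right.
lhs, rhs :: Time
lhs = (fmap (+) time <*> pure 2) `at` 5
rhs = (fmap (+) (at time) <*> pure 2) 5
```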

<h4>Monad</h4>

<p>Monad morphisms are a bit trickier, due to the types.
There are two equivalent forms of the definition of a monad morphism, depending on whether you use <code>join</code> or <code>(&gt;&gt;=)</code>.
In the <code>join</code> form (e.g., in <a href="http://citeseer.ist.psu.edu/wadler92comprehending.html" title="Paper: &quot;Comprehending Monads&quot;">Comprehending Monads</a>, section 6), for monads <code>m</code> and <code>n</code>, the function <code>nu :: forall a. m a -&gt; n a</code> is a monad morphism if</p>

<pre><code>nu . join == join . nu . fmap nu
</code></pre>

<p>where</p>

<pre><code>join :: Monad m =&gt; m (m a) -&gt; m a
</code></pre>

<p>For behavior semantics, <code>m == Behavior</code>, <code>n == B == (-&gt;) Time</code>, and <code>nu == at</code>.</p>

<p>Then <code>at</code> is also a monad morphism if</p>

<pre><code>at (return a) == return a
at (join bb)  == join (at (fmap at bb))
</code></pre>

<p>And, since for functions <code>f</code>,</p>

<pre><code>fmap h f == h . f
join f   == \t -&gt; f t t
</code></pre>

<p>the second condition is</p>

<pre><code>at (join bb) == join (at (fmap at bb))
             == \t -&gt; (at . at bb) t t
             == \t -&gt; at (at bb t) t
             == \t -&gt; (bb `at` t) `at` t
</code></pre>

<p>So sampling <code>join bb</code> at <code>t</code> means sampling <code>bb</code> at <code>t</code> to get a behavior <code>b</code>, which is also sampled at <code>t</code>.
That&#8217;s exactly what I&#8217;d guess <code>join</code> to mean on behaviors.</p>
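<p>The standard <code>join</code> for the function monad behaves exactly this way, passing its argument twice:</p>

```haskell
import Control.Monad (join)

-- For the function monad, join f == \t -> f t t: the argument is
-- supplied twice, just as t is used twice to sample bb above.
doubled :: Int
doubled = join (+) 5
```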

<p><em>Note:</em> the FRP implementation described in <em><a href="http://conal.net/blog/posts/simply-efficient-functional-reactivity/" title="Blog post: &quot;Simply efficient functional reactivity&quot;">Simply efficient functional reactivity</a></em> <em>does not</em> include a <code>Monad</code> instance for <code>Behavior</code>, because I don&#8217;t see how to implement one with the hybrid data-/demand-driven <code>Behavior</code> implementation.
However, the closely related but less expressive type, <code>Reactive</code>, has the same semantic model as <code>Behavior</code>.  <code>Reactive</code> does have a Monad instance, and its semantic function (<code>rats</code>) <em>is</em> a monad morphism.</p>

<h4>Other examples</h4>

<p><a href="http://conal.net/blog/posts/simply-efficient-functional-reactivity/" title="Blog post: &quot;Simply efficient functional reactivity&quot;">The <em>Simply</em> paper</a> contains several more examples of type class morphisms:</p>

<ul>
<li><a href="http://conal.net/blog/posts/reactive-values-from-the-future/" title="Blog post: &quot;Reactive values from the future&quot;">Reactive values</a>, time functions, and <a href="http://conal.net/blog/posts/future-values/" title="Blog post: &quot;Future values&quot;">future values</a> are also morphisms on <code>Functor</code>, <code>Applicative</code>, and <code>Monad</code>.</li>
<li><em>Improving values</em> are morphisms on <code>Ord</code>.</li>
</ul>

<p>The paper also includes a significant <em>non-example</em>, namely events.
The semantics I gave for <code>Event a</code> is a time-ordered list of time/value pairs.  However, the semantic function (<code>occs</code>) <em>is not</em> a <code>Monoid</code> morphism, because</p>

<pre><code>occs (e `mappend` e') == occs e `merge` occs e'
</code></pre>

<p>and <code>merge</code> is not <code>(++)</code>, which is <code>mappend</code> on lists.</p>
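<p>A finite-list sketch of such a <code>merge</code> on time-ordered occurrence lists (hypothetical; the paper&#8217;s version must also handle infinite lists and not-yet-known times):</p>

```haskell
-- Interleave two time-ordered occurrence lists by time, left-biased
-- on ties. Note that the result differs from (++) in general.
merge :: Ord t => [(t, a)] -> [(t, a)] -> [(t, a)]
merge [] ys = ys
merge xs [] = xs
merge xs@((tx, x) : xs') ys@((ty, y) : ys')
  | tx <= ty  = (tx, x) : merge xs' ys
  | otherwise = (ty, y) : merge xs ys'
```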

<h4>Why care about type class morphisms?</h4>

<p>I want my library&#8217;s users to think of behaviors and future values as being their semantic models (functions of time and time/value pairs).
Why?
Because these denotational models are simple and precise and have simple and useful formal properties.
Those properties allow library users to program with confidence, and allow library providers to make radical changes in representation and implementation (even from demand-driven to data-driven) without breaking client programs.</p>

<p>When I think of a behavior as a function of time, I&#8217;d like it to act like a function of time, hence <code>Functor</code>, <code>Applicative</code>, and <code>Monad</code>.
And if it does implement any classes in common with functions, then it had better agree with the function instances of those classes.
Otherwise, user expectations will be mistaken, and the illusion is broken.</p>

<p>I&#8217;d love to hear about other examples of type class morphisms, particularly for <code>Applicative</code> and <code>Monad</code>, as well as thoughts on their usefulness.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/simplifying-semantics-with-type-class-morphisms/feed</wfw:commentRss>
		<slash:comments>18</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fsimplifying-semantics-with-type-class-morphisms&amp;language=en_GB&amp;category=text&amp;title=Simplifying+semantics+with+type+class+morphisms&amp;description=When+I+first+started+playing+with+functional+reactivity+in+Fran+and+its+predecessors%2C+I+didn%26%238217%3Bt+realize+that+much+of+the+functionality+of+events+and+reactive+behaviors+could+be+packaged+via...&amp;tags=applicative+functor%2CFRP%2Cfunctional+reactive+programming%2Cfunctor%2Cmonad%2Cmonoid%2Csemantics%2Ctype+class%2Ctype+class+morphism%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Simply efficient functional reactivity</title>
		<link>http://conal.net/blog/posts/simply-efficient-functional-reactivity</link>
		<comments>http://conal.net/blog/posts/simply-efficient-functional-reactivity#comments</comments>
		<pubDate>Fri, 04 Apr 2008 22:27:43 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[continuous]]></category>
		<category><![CDATA[discrete]]></category>
		<category><![CDATA[events]]></category>
		<category><![CDATA[FRP]]></category>
		<category><![CDATA[functional reactive programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[future value]]></category>
		<category><![CDATA[icfp]]></category>
		<category><![CDATA[implementation]]></category>
		<category><![CDATA[joinMaybes]]></category>
		<category><![CDATA[monad]]></category>
		<category><![CDATA[monoid]]></category>
		<category><![CDATA[multi-threading]]></category>
		<category><![CDATA[normal form]]></category>
		<category><![CDATA[paper]]></category>
		<category><![CDATA[reactive behavior]]></category>
		<category><![CDATA[reactive value]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[time]]></category>
		<category><![CDATA[type class]]></category>
		<category><![CDATA[type class morphism]]></category>
		<category><![CDATA[type composition]]></category>

		<guid isPermaLink="false">http://conal.net/blog/posts/simply-efficient-functional-reactivity/</guid>
		<description><![CDATA[I submitted a paper Simply efficient functional reactivity to ICFP 2008. Abstract: Functional reactive programming (FRP) has simple and powerful semantics, but has resisted efficient implementation. In particular, most past implementations have used demand-driven sampling, which accommodates FRP&#8217;s continuous time semantics and fits well with the nature of functional programming. Consequently, values are wastefully recomputed [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Simply efficient functional reactivity

Tags: applicative functor, continuous, discrete, event, FRP, functional reactive programming, functor, future value, icfp, implementation, joinMaybes, monad, monoid, type class morphism, multi-threading, normal form, paper, reactive behavior, reactive value, semantics, time, type class, type composition

URL: http://conal.net/blog/posts/simply-efficient-functional-reactivity/

-->

<!-- references -->

<!-- teaser -->

<p>I submitted a paper <em><a href="http://conal.net/papers/simply-reactive" title="Paper: &quot;Simply efficient functional reactivity&quot;">Simply efficient functional reactivity</a></em> to <a href="http://www.icfpconference.org/icfp2008" title="ICFP 2008 conference page">ICFP 2008</a>.</p>

<p><strong>Abstract:</strong></p>

<blockquote>
  <p>Functional reactive programming (FRP) has simple and powerful semantics, but has resisted efficient implementation.  In particular, most past implementations have used demand-driven sampling, which accommodates FRP&#8217;s continuous time semantics and fits well with the nature of functional programming.  Consequently, values are wastefully recomputed even when inputs don&#8217;t change, and reaction latency can be as high as the sampling period.</p>
  
  <p>This paper presents a way to implement FRP that combines data- and demand-driven evaluation, in which values are recomputed only when necessary, and reactions are nearly instantaneous.  The implementation is rooted in a new simple formulation of FRP and its semantics and so is easy to understand and reason about.</p>
  
  <p>On the road to efficiency and simplicity, we&#8217;ll meet some old friends (monoids, functors, applicative functors, monads, morphisms, and improving values) and make some new friends (functional future values, reactive normal form, and concurrent &#8220;unambiguous choice&#8221;).</p>
</blockquote>

<!--
**Edits**:

* 2008-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/simply-efficient-functional-reactivity/feed</wfw:commentRss>
		<slash:comments>33</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fsimply-efficient-functional-reactivity&amp;language=en_GB&amp;category=text&amp;title=Simply+efficient+functional+reactivity&amp;description=I+submitted+a+paper+Simply+efficient+functional+reactivity+to+ICFP+2008.+Abstract%3A+Functional+reactive+programming+%28FRP%29+has+simple+and+powerful+semantics%2C+but+has+resisted+efficient+implementation.+In+particular%2C+most+past...&amp;tags=applicative+functor%2Ccontinuous%2Cdiscrete%2Cevents%2CFRP%2Cfunctional+reactive+programming%2Cfunctor%2Cfuture+value%2Cicfp%2Cimplementation%2CjoinMaybes%2Cmonad%2Cmonoid%2Cmulti-threading%2Cnormal+form%2Cpaper%2Creactive+behavior%2Creactive+value%2Csemantics%2Ctime%2Ctype+class%2Ctype+class+morphism%2Ctype+composition%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Future values</title>
		<link>http://conal.net/blog/posts/future-values</link>
		<comments>http://conal.net/blog/posts/future-values#comments</comments>
		<pubDate>Wed, 16 Jan 2008 01:31:00 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[future value]]></category>
		<category><![CDATA[monad]]></category>
		<category><![CDATA[monoid]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[type class]]></category>

		<guid isPermaLink="false">http://conal.net/blog/posts/future-values-part-one-semantics/</guid>
		<description><![CDATA[A future value (or simply &#8220;future&#8221;) is a value that might not be knowable until a later time, such as &#8220;the value of the next key you press&#8221;, or &#8220;the value of LambdaPix stock at noon next Monday&#8221; (both from the time you first read this sentence), or &#8220;how many tries it will take me [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Future values

Tags: applicative functors, future values, monads, monoids, semantics, reactivity

-->

<!-- References -->

<!-- teaser -->

<p>A <em>future value</em> (or simply &#8220;future&#8221;) is a value that might not be knowable until a later time, such as &#8220;the value of the next key you press&#8221;, or &#8220;the value of LambdaPix stock at noon next Monday&#8221; (both from the time you first read this sentence), or &#8220;how many tries it will take me to blow out all the candles on my next birthday cake&#8221;.  Unlike an imperative computation, each future has a unique value &#8212; although you probably cannot yet know what that value is.  I&#8217;ve implemented this notion of futures as part of a library <a href="http://haskell.org/haskellwiki/Reactive" title="Reactive">Reactive</a>.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2008-04-04: tweaked tag; removed first section heading.</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-7"></span></p>

<p>You can <em>force</em> a future, which makes you wait (block) until its value is knowable.  Meanwhile, what kinds of things can you do with a future <em>now</em>?</p>

<ul>
<li>Apply a function to the not-yet-known value, resulting in another future.  For instance, suppose <code>fc :: Future Char</code> is the first character you type after a specific time.  Then <code>fmap toUpper fc :: Future Char</code> is the capitalized version of the future character.  Thus, <code>Future</code> is a functor.  The resulting future is knowable when <code>fc</code> is knowable.</li>
<li>What about combining two or more future values?  For instance, how many days between the first time after the start of 2008 that the temperature exceeds 80 degrees Fahrenheit at (a) my home and (b) your home.  Each of those dates is a future value, and so is the difference between them.  If those futures are <code>m80, y80 :: Future Day</code>, then the difference is <code>diff80 = liftA2 (-) m80 y80</code>.  That difference becomes knowable when the <em>later</em> of <code>m80</code> and <code>y80</code> becomes knowable.  So <code>Future</code> is an applicative functor (AF), and one can apply a future function to a future argument to get a future result (<code>futRes = futFun &lt;*&gt; futArg</code>).  The other AF method is <code>pure :: a -&gt; Future a</code>, which makes a future value that is always knowable to be a given value.</li>
<li>Sometimes questions about the future are staged, such as &#8220;What will be the price of milk the day after the temperature next drops below freezing&#8221; (plus specifics about where and starting when).  Suppose <code>priceOn :: Day -&gt; Future Price</code> gives the price of milk on a given day (at some specified place), and <code>nextFreeze :: Day -&gt; Future Day</code> is the first date of a freeze (also at a specified place) after a given date.  Then our query is expressed as <code>nextFreeze today &gt;&gt;= priceOn</code>, which has type <code>Future Price</code>.  <code>Future</code> is thus a monad.  (The <code>return</code> method of a monad is the same as the <code>pure</code> method of an AF.)  From another perspective on monads, we can collapse a future future into a future, using <code>join :: Future (Future a) -&gt; Future a</code>.</li>
</ul>

<p>These three ways of manipulating futures are all focused on the value of futures.  There is one more, very useful, combining operation that focuses on the <em>timing</em> of futures: given two futures, which one comes first.  Although we can&#8217;t know the answer now, we can ask the question now and get a future.  For example, what is the next character that either you or I will type?  Call those characters <code>mc, yc :: Future Char</code>.  The earlier of the two is <code>mc `mappend` yc</code>, which has type <code>Future Char</code>.  Thus, <code>Future ty</code> is a monoid for every type <code>ty</code>.  The other monoid method is <code>mempty</code> (the identity for <code>mappend</code>), which is the future that never happens.</p>

<h3>Why aren&#8217;t futures just lazy values?</h3>

<p>If futures were just lazy values, then we wouldn&#8217;t have to use <code>pure</code>, <code>fmap</code>, <code>(&lt;*&gt;)</code> (and <code>liftA</code><em>n</em>), and <code>(&gt;&gt;=)</code>.  However, there isn&#8217;t enough semantic content in a plain old value to determine which of two values is <em>earlier</em> (<code>mappend</code> on futures).</p>

<h2>A semantics for futures</h2>

<p>To clarify my thinking about future values, I&#8217;d like to have a simple and precise denotational semantics and then an implementation that is faithful to the semantics.  The module <code>Data.SFuture</code> provides such a semantics, although the implementation in <code>Data.Future</code> is not completely faithful.</p>

<h3>The model</h3>

<p>The semantic model is very simple: (the meaning of) a future value is just a time/value pair.  The particular choice of &#8220;time&#8221; type is not important, as long as it is ordered.</p>

<pre><code>newtype Future t a = Future (Time t, a)
  deriving (Functor, Applicative, Monad, Show)
</code></pre>

<p>Delightfully, almost all required functionality comes automatically from the derived class instances, thanks to the standard instances for pairs and the definition of <code>Time</code>, given below.  Rather than require our time type to be bounded, we can easily add bounds to an arbitrary type.  Rather than defining <code>Time t</code> now, let&#8217;s discover the definition while considering the required meanings of the class instances.  The definition will use just a bit of wrapping around the type <code>t</code>, demonstrating a principle Conor McBride <a href="http://article.gmane.org/gmane.comp.lang.haskell.cafe/26520">expressed</a> as &#8220;types don&#8217;t just contain data, types explain data&#8221;.</p>

<h3>Functor</h3>

<p>The <code>Functor</code> instance is provided entirely by the standard instance for pairs:</p>

<pre><code>instance Functor ((,) a) where fmap f (a,b) = (a, f b)
</code></pre>

<p>In particular, <code>fmap f (Future (t,b)) == Future (t, f b)</code>, as desired.</p>

<h3>Applicative and Time</h3>

<p>Look next at the <code>Applicative</code> instance for pairs:</p>

<pre><code>instance Monoid a =&gt; Applicative ((,) a) where
  pure x = (mempty, x)
  (u, f) &lt;*&gt; (v, x) = (u `mappend` v, f x)
</code></pre>

<p>So <code>Time t</code> must be a monoid, with <code>mempty</code> being the earliest time and <code>mappend</code> being <code>max</code>.  We&#8217;ll define <code>Time</code> with the help of the <code>Max</code> monoid:</p>

<pre><code>newtype Max a = Max { getMax :: a }
  deriving (Eq, Ord, Read, Show, Bounded)

instance (Ord a, Bounded a) =&gt; Monoid (Max a) where
  mempty = Max minBound
  Max a `mappend` Max b = Max (a `max` b)
</code></pre>
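<p>Here is a runnable sketch of the <code>Max</code> monoid above.  One assumption of mine: on modern GHC (8.4+), <code>Monoid</code> requires a <code>Semigroup</code> superclass instance (the post predates that change), so <code>mappend</code>&#8217;s work moves into <code>(&lt;&gt;)</code>:</p>

```haskell
-- The Max monoid from the post, adapted for modern GHC:
-- mappend's definition lives in the Semigroup instance as (<>).
newtype Max a = Max { getMax :: a }
  deriving (Eq, Ord, Read, Show, Bounded)

instance Ord a => Semigroup (Max a) where
  Max a <> Max b = Max (a `max` b)

instance (Ord a, Bounded a) => Monoid (Max a) where
  mempty = Max minBound

main :: IO ()
main = do
  print (getMax (Max (3 :: Int) <> Max 7))   -- 7
  print (getMax (mempty <> Max (5 :: Int)))  -- 5
```

<p>Note that <code>(&lt;&gt;)</code> needs only <code>Ord</code>; <code>Bounded</code> is needed just for <code>mempty</code>.</p>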

<p>We could require that the underlying time parameter type <code>t</code> be <code>Bounded</code>, but I want to have as few restrictions as possible.  For instance, <code>Integer</code>, <code>Float</code>, and <code>Double</code> are not <code>Bounded</code>, and neither are the types in the <code>Time</code> library.  Fortunately, it&#8217;s easy to add bounds to any type, preserving the existing ordering.</p>

<pre><code>data AddBounds a = MinBound | NoBound a | MaxBound
  deriving (Eq, Ord, Read, Show)

instance Bounded (AddBounds a) where
  minBound = MinBound
  maxBound = MaxBound
</code></pre>
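<p>The derived <code>Ord</code> compares constructors in declaration order, so <code>MinBound</code> sits below every <code>NoBound</code> value and <code>MaxBound</code> above.  A few checks of mine:</p>

```haskell
-- AddBounds as in the post; derived Ord orders by constructor,
-- and within NoBound by the underlying value.
data AddBounds a = MinBound | NoBound a | MaxBound
  deriving (Eq, Ord, Read, Show)

main :: IO ()
main = do
  print (MinBound < NoBound (0 :: Integer))     -- True
  print (NoBound (3 :: Integer) < NoBound 5)    -- True
  print (NoBound (maxBound :: Int) < MaxBound)  -- True
```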

<p>With these two reusable building blocks, our <code>Time</code> definition falls right out:</p>

<pre><code>type Time t = Max (AddBounds t)
</code></pre>

<h3>Monad</h3>

<p>For our <code>Monad</code> instance, we just need an instance for pairs, equivalent to the writer monad:</p>

<pre><code>instance Monoid o =&gt; Monad ((,) o) where
  return = pure
  (o,a) &gt;&gt;= f = (o `mappend` o', a') where (o',a') = f a
</code></pre>

<p>Consequently (using <code>join m = m &gt;&gt;= id</code>), <code>join (o, (o',a)) == (o `mappend` o', a)</code>.  Again, the standard instance implies exactly the desired meaning for futures: <code>Future (t,a) &gt;&gt;= f</code> is available exactly at the later of <code>t</code> and the availability of <code>f a</code>.  We might have guessed instead that the time is simply the time of <code>f a</code>, on the assumption that it is always at least <code>t</code>.  However, <code>f a</code> could result from <code>pure</code> and so have time <code>minBound</code>.</p>
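<p>To see that timing behavior concretely, here is a small sketch using the <code>((,) w)</code> instances that now ship in <code>base</code>, with <code>Data.Semigroup.Max</code> standing in for <code>Time</code>:</p>

```haskell
import Data.Semigroup (Max (..))

-- pure places a value at the earliest possible time (minBound) ...
atMinTime :: (Max Int, String)
atMinTime = pure "hello"

-- ... so (>>=) must take the later of the two times, via mappend = max.
chained :: (Max Int, String)
chained = (Max 3, "x") >>= \s -> (Max 7, s ++ "y")

main :: IO ()
main = do
  print (getMax (fst atMinTime) == minBound)  -- True
  print chained                               -- (Max {getMax = 7},"xy")
```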

<h3>Monoid</h3>

<p>The last piece of <code>Future</code> functionality is the <code>Monoid</code> instance, and I don&#8217;t know how to get that instance to define itself.  I want <code>mappend</code> to yield the <em>earlier</em> of two futures, choosing the first argument when simultaneous.  The never-occurring <code>mempty</code> has a time beyond all <code>t</code> values.</p>

<pre><code>instance Ord t =&gt; Monoid (Future t a) where
  mempty  = Future (maxBound, error "it'll never happen, buddy")
  fut@(Future (t,_)) `mappend` fut'@(Future (t',_)) =
    if t &lt;= t' then fut else fut'
</code></pre>
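<p>Putting the pieces together, here is a hypothetical, self-contained check that <code>mappend</code> really picks the earlier future.  Assumptions of mine: <code>Semigroup</code> instances are added for modern GHC, and <code>future</code> and <code>valueOf</code> are illustrative helpers, not from the post:</p>

```haskell
-- The post's pieces, minimally assembled (Semigroup added for modern GHC).
data AddBounds a = MinBound | NoBound a | MaxBound
  deriving (Eq, Ord, Show)

newtype Max a = Max { getMax :: a }
  deriving (Eq, Ord, Show)

instance Ord a => Semigroup (Max a) where
  Max a <> Max b = Max (a `max` b)

type Time t = Max (AddBounds t)

newtype Future t a = Future (Time t, a)

-- mappend yields the earlier future, first argument on a tie.
instance Ord t => Semigroup (Future t a) where
  fut@(Future (t, _)) <> fut'@(Future (t', _)) =
    if t <= t' then fut else fut'

instance Ord t => Monoid (Future t a) where
  mempty = Future (Max MaxBound, error "it'll never happen, buddy")

-- Hypothetical helper: a future known at an ordinary time t.
future :: t -> a -> Future t a
future t a = Future (Max (NoBound t), a)

valueOf :: Future t a -> a
valueOf (Future (_, a)) = a

main :: IO ()
main = do
  print (valueOf (future (3 :: Int) "early" <> future 7 "late"))  -- "early"
  print (valueOf (future (5 :: Int) "x" <> mempty))               -- "x"
```

<p>The second line also shows why <code>mempty</code>&#8217;s bottom value is harmless: the earlier future wins before the error is ever forced.</p>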

<h2>Coming next</h2>

<p>Tune in for the <a href="http://conal.net/blog/posts/future-values-via-multi-threading/" title="Blog post: &quot;Future values via multi-threading&quot;">next post</a>, which describes the current implementation of future values in <a href="http://haskell.org/haskellwiki/Reactive" title="Reactive">Reactive</a>.  The implementation uses multi-threading and is not quite faithful to the semantics given here.  I&#8217;m looking for a faithful implementation.</p>

<p>A following post will then describe the use of future values in an elegant new implementation of functional reactive programming.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/future-values/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Ffuture-values&amp;language=en_GB&amp;category=text&amp;title=Future+values&amp;description=A+future+value+%28or+simply+%26%238220%3Bfuture%26%238221%3B%29+is+a+value+that+might+not+be+knowable+until+a+later+time%2C+such+as+%26%238220%3Bthe+value+of+the+next+key+you+press%26%238221%3B%2C+or+%26%238220%3Bthe...&amp;tags=applicative+functor%2Cfuture+value%2Cmonad%2Cmonoid%2Csemantics%2Ctype+class%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
