Thoughts on semantics for 3D graphics
The central question for me in designing software is always
What does it mean?
With functional programming, this question is especially crisp. For each data type I define, I want to have a precise and simple mathematical model. (For instance, my model for behavior is function-of-time, and my model of images is function-of-2D-space.) Every operation on the type is also given a meaning in terms of that semantic model.
This specification process, which is denotational semantics applied to data types, provides a basis for
- correctness of the implementation,
- user documentation free of implementation detail,
- generating and proving properties, which can then be used in automated testing, and
- evaluating and comparing the elegance and expressive power of design decisions.
For an example (2D images), some motivation of this process, and discussion, see Luke Palmer’s post Semantic Design. See also my posts on the idea and use of type class morphisms, which provide additional structure to denotational design.
In spring of 2008, I started working on a functional 3D library, FieldTrip. I’ve designed functional 3D libraries before as part of TBAG, ActiveVRML, and Fran. This time I wanted a semantics-based design, for all of the reasons given above. As always, I want a model that is
- simple,
- elegant, and
- general.
For 3D, I also want the model to be GPU-friendly, i.e., to execute well on (modern) GPUs and to give access to their abilities.
I hadn’t thought of or heard of a model that I was happy with, and so I didn’t have the sort of firm ground I like to stand on in working on FieldTrip. Last February, such a model occurred to me. I’ve had this blog post mostly written since then. Recently, I’ve been focused on functional 3D again for GPU-based rendering, and then Sean McDirmid posed a similar question, which got me thinking again.
Geometry
3D graphics involves a variety of concepts. Let’s start with 3D geometry, using a surface (rather than a solid) model.
Examples of 3D (surface) geometry include
- the boundary (surface) of a solid box, sphere, or torus,
- a filled triangle, rectangle, or circle,
- a collection (union) of geometry values, and
- a spatial transformation of geometry.
First model: set of geometric primitives
One model of geometry is a set of geometric primitives.
In this model, union means set union, and spatial transformation means transforming all of the 3D points in all of the primitives in the set.
Primitives contain infinitely (even uncountably) many points, so that’s a lot of transforming.
Fortunately, we’re talking about what (semantics), and not how (implementation).
What is a geometric primitive?
We could say it’s a triangle, specified by three coordinates. After all, computer graphics reduces everything to sets of triangles. Oops — we’re confusing semantics and implementation. Tessellation approximates curved surfaces by sets of triangles but loses information in the process. I want a story that includes this approximation process but keeps it clearly distinct from semantically ideal curved surfaces. Then users can work with the ideal, simple semantics and rely on the implementation to perform intelligent, dynamic, view-dependent tessellation that adapts to available hardware resources.
Another model of geometric primitive is a function from 2D space to 3D space, i.e., the “parametric” representation of surfaces. Along with the function, we’ll probably want some means of describing the subset of 2D over which the surface is defined, so as to trim our surfaces. A simple formalization would be
type Surf = R2 -> Maybe R3
where
type R -- real numbers
type R2 = (R,R)
type R3 = (R,R,R)
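To make trimming concrete, here’s a minimal sketch (my own illustration, taking R to be Double; sphere and disk are made-up examples):

type R    = Double
type R2   = (R,R)
type R3   = (R,R,R)
type Surf = R2 -> Maybe R3

-- A unit sphere, parametrized by longitude/latitude over the unit square;
-- defined everywhere in parameter space.
sphere :: Surf
sphere (u,v) = Just (cos th * sin ph, sin th * sin ph, cos ph)
  where th = 2*pi*u
        ph = pi*v

-- A unit disk: the plane z = 0, trimmed to parameters inside the unit circle.
disk :: Surf
disk (u,v) | u*u + v*v <= 1 = Just (u, v, 0)
           | otherwise      = Nothing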
For shading, we’ll also need normals, and possibly tangents & bitangents. We can get these features and more by including derivatives, either just first derivatives or all of them. See my posts on derivatives and my paper Beautiful differentiation.
In addition to position and derivatives, each point on a primitive also has material properties, which determine how light is reflected by and transmitted through the surface at that point.
type Surf = R2 -> Maybe (R2 :> R3, Material)
where a :> b contains all derivatives (including the zeroth) at a point of a function of type a -> b.
See Higher-dimensional, higher-order derivatives, functionally.
We could perhaps also include derivatives of material properties:
type Surf = R2 :~> Maybe (R3, Material)
where a :~> b is the type of infinitely differentiable functions.
Combining geometry values
The union function gives one way to combine two geometry values.
Another is morphing (interpolation) of positions and of material properties.
What can the semantics of morphing be?
Morphing between two surfaces is easier to define.
A surface is a function, so we can interpolate point-wise: given surfaces r and s, for each point p in parameter space, interpolate between (a) r at p and (b) s at p, which is what liftA2 (on functions) would suggest.
This definition works if we have a way to interpolate between Maybe values. If we use liftA2 again, now on Maybe values, then the Just/Nothing (and Nothing/Just) cases will yield Nothing.
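As a minimal sketch of this point-wise morphing (my names and parametrizations, taking R to be Double):

import Control.Applicative (liftA2)

type R    = Double
type R2   = (R,R)
type R3   = (R,R,R)
type Surf = R2 -> Maybe R3

-- Linear interpolation of 3D points, component-wise.
interpR3 :: R -> R3 -> R3 -> R3
interpR3 t (x,y,z) (x',y',z') = (l x x', l y y', l z z')
  where l a b = a + t * (b - a)

-- liftA2 on Maybe: mismatched Just/Nothing yields Nothing.
interpMb :: R -> Maybe R3 -> Maybe R3 -> Maybe R3
interpMb t = liftA2 (interpR3 t)

-- liftA2 on functions: interpolate point-wise over parameter space.
interpSurf :: R -> Surf -> Surf -> Surf
interpSurf t = liftA2 (interpMb t)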
Is this semantics desirable?
As an example, consider a flat square surface with a hole in the middle. One square has a small hole, and the other has a big hole. If the size of the hole corresponds to the size of the portion of parameter space mapped to Nothing, then point-wise interpolation will always yield the larger hole, rather than interpolating between hole sizes. On the other hand, the two surfaces with holes might be Just over exactly the same set of parameters, with the function determining how much the Just space gets stretched.
One way to characterize this awkwardness of morphing is that the two functions (surfaces) might have different domains. This interpretation comes from seeing a -> Maybe b as encoding a function from a subset of a (i.e., a partial function on a).
Even if we had a satisfactory way to combine surfaces (point-wise), how could we extend it to combining full geometry values, which can contain any number of surfaces? One idea is to model geometry as a structured collection of surfaces, e.g., a list. Then we could combine the collections element-wise. Again, we’d have to deal with the possibility that the collections do not match up.
Surface tuples
Let’s briefly return to a simpler model of surfaces:
type Surf = R2 -> R3
We could represent a collection of such surfaces as a structured collection, e.g., a list:
type Geometry = [Surf]
But then the type doesn’t capture the number of surfaces, leading to mismatches when combining geometry values point-wise.
Alternatively, we could make the number of surfaces explicit in the type, via tuples, possibly nested.
For instance, two surfaces would have type (Surf,Surf)
.
Interpolation in this model becomes very simple. A general interpolator works on vector spaces:
lerp :: VectorSpace v => v -> v -> Scalar v -> v
lerp a b t = a ^+^ t*^(b ^-^ a)
or on affine spaces:
alerp :: (AffineSpace p, VectorSpace (Diff p)) =>
p -> p -> Scalar (Diff p) -> p
alerp p p' s = p .+^ s*^(p' .-. p)
Both definitions are in the vector-space package.
That package also includes VectorSpace and AffineSpace instances for both functions and tuples. These instances, together with instances for real values, suffice to make (possibly nested) tuples of surfaces be vector spaces and affine spaces.
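For instance (a sketch of my own; sphere, flat, and their parametrizations are made up for illustration), morphing then needs no new code:

import Data.VectorSpace (lerp)

type R    = Double
type R2   = (R,R)
type R3   = (R,R,R)
type Surf = R2 -> R3

sphere, flat :: Surf
sphere (u,v) = (cos th * sin ph, sin th * sin ph, cos ph)
  where th = 2*pi*u
        ph = pi*v
flat (u,v) = (u, v, 0)

-- Functions into a vector space form a vector space, so surfaces interpolate.
morph :: R -> Surf
morph = lerp sphere flat

-- Tuples of surfaces interpolate the same way, again with no new code.
morph2 :: R -> (Surf, Surf)
morph2 = lerp (sphere, flat) (flat, sphere)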
From products to sums
Function pairing admits some useful isomorphisms. One replaces a product of functions with a single function yielding a product:
(a → b) × (a → c) ≅ a → (b × c)
Using this product/product isomorphism, we could replace tuples of surfaces with a single function from R2 to tuples of R3.
There is also a handy isomorphism that relates products to sums, in the context of functions:
(b → a) × (c → a) ≅ (b + c) → a
This second isomorphism lets us replace tuples of surfaces with a single “surface”, if we generalize the notion of surface to include domains more complex than R2.
In fact, these two isomorphisms are uncurried forms of the general and useful Haskell functions (&&&) and (|||), defined on arrows:
(&&&) :: Arrow (~>) => (a ~> b) -> (a ~> c) -> (a ~> (b,c))
(|||) :: ArrowChoice (~>) => (a ~> c) -> (b ~> c) -> (Either a b ~> c)
Restricted to the function arrow, (|||) == either.
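At the function arrow, both combinators apply directly to geometry. A small sketch (mine; plane, dome, and red are hypothetical examples, with Material reduced to a color):

import Control.Arrow ((&&&), (|||))

type R        = Double
type R2       = (R,R)
type R3       = (R,R,R)
type Material = R3   -- stand-in: just a diffuse color

plane, dome :: R2 -> R3
plane (u,v) = (u, v, 0)
dome  (u,v) = (u, v, sqrt (max 0 (1 - u*u - v*v)))

red :: R2 -> Material
red = const (1,0,0)

-- Attribute merging via (&&&): pair position with material, point-wise.
planeWithMat :: R2 -> (R3, Material)
planeWithMat = plane &&& red

-- Union via (|||)/either: one function over a sum-typed domain.
scene :: Either R2 R2 -> R3
scene = plane ||| dome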
The second isomorphism, uncurry (|||), has another benefit.
Relaxing the domain type to allow sums opens the way to other domain variations as well.
For instance, we can have types for triangular domains, shapes with holes, and other flavors of bounded and unbounded parameter spaces.
All of these domains are two-dimensional, although they may result from several patches.
Our Geometry type now becomes parameterized:
type Geometry a = a -> (R3,Material)
The first isomorphism, uncurry (&&&), is also useful in a geometric setting. Think of each component of the range type (here R3 and Material) as a surface “attribute”. Then (&&&) merges two compatible geometries, including attributes from each.
Attributes could include position (and derivatives) and shading-related material, as well as non-visual properties like temperature, elasticity, stickiness, etc.
With this flexibility in mind, Geometry gets a second type parameter, which is the range type. Now there’s nothing left of the Geometry type but general functions:
type Geometry = (->)
Recall that we’re looking for a semantics for 3D geometry.
The type for Geometry might be abstract, with (->) being its semantic model. In that case, the model suggests that Geometry have all of the same type class instances that (->) (and its full or partial applications) has, including Monoid, Functor, Applicative, Monad, and Arrow. The semantics of these instances would be given by the corresponding instances for (->).
(See posts on type class morphisms and the paper Denotational design with type class morphisms.)
Or drop the notion of Geometry altogether and use functions directly.
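A sketch of the abstract option (my illustration, not FieldTrip’s actual code): a newtype over (->) inherits the model’s instances mechanically, so implementation and semantics agree class by class.

{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Category (Category)
import Control.Arrow (Arrow, ArrowChoice)

-- Abstract geometry whose semantic model is (->); the derived instances
-- coincide with the corresponding instances for functions.
newtype Geometry a b = Geometry (a -> b)
  deriving (Functor, Applicative, Monad, Category, Arrow, ArrowChoice)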
Domains
I’m happy with the simplicity of geometry as functions.
Functions fit the flexibility of programmable GPUs, and they provide simple, powerful & familiar notions of attribute merging ((&&&)) and union ((|||)/either).
The main question I’m left with: what are the domains?
One simple domain is a one-dimensional interval, say [-1,1].
Two useful domain building blocks are sum and product.
I mentioned sum above, in connection with geometric union ((|||)/either).
Product combines domains into higher-dimensional domains.
For instance, the product of two 1D intervals is a 2D interval (axis-aligned filled rectangle), which is handy for some parametric surfaces.
What about other domains, e.g., triangular, or having one or more holes? Or multi-way branching surfaces? Or unbounded domains?
One idea is to stitch together simple domains using sum.
We don’t have to build any particular spatial shapes or sizes, since the “geometry” functions themselves yield the shape and size.
For instance, a square region can be mapped to a triangular or even circular region.
An infinite domain can be stitched together from infinitely many finite domains.
Or it can be mapped to from a single finite domain.
For instance, the function x -> x / (1 - abs x) maps (-1,1) onto (-∞,∞).
Alternatively, we could represent domains as typed predicates (characteristic functions).
For instance, the closed interval [-1,1] would be x -> abs x <= 1. Replacing abs with magnitude (for inner product spaces) generalizes this formulation to encompass [-1,1] (1D), a unit disk (2D), and a unit ball (3D).
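In code (a sketch; Pred and unitBall are my names), one predicate covers all three cases, given the vector-space package’s InnerSpace instances:

{-# LANGUAGE FlexibleContexts #-}

import Data.VectorSpace (InnerSpace, Scalar, magnitude)

-- Domains as characteristic functions.
type Pred a = a -> Bool

-- The unit ball of any inner product space: interval, disk, ball, ...
unitBall :: (InnerSpace v, Floating (Scalar v), Ord (Scalar v)) => Pred v
unitBall v = magnitude v <= 1

interval :: Pred Double
interval = unitBall                 -- [-1,1]

disk :: Pred (Double, Double)
disk = unitBall                     -- unit disk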
I like the simple generality of the predicate approach, while I also like how the pure type approach supports interpolation and other point-wise operations (via liftA2, etc.).
Tessellation
I’ve intentionally formulated the graphics semantics over continuous space, which makes it resolution-independent and easy to compose. (This formulation is typical for 3D geometry and 2D vector graphics. The benefits of continuity apply generally to imagery and to animation/behavior.)
Graphics hardware specializes in finite collections of triangles. For rendering, curved surfaces have to be tessellated, i.e., approximated as collections of triangles. The desirable choice of tessellation depends on characteristics of the surface and of the view, as well as scene complexity and available CPU and GPU resources. Formulating geometry in its ideal curved form allows for automated analysis and choice of tessellation. For instance, since triangles are linear, the error of a triangle relative to the surface it approximates depends on how non-linear the surface is over the subset of its domain corresponding to the triangle. Using interval analysis and derivatives, non-linearity can be measured as a size bound on the second derivative or a range of the first derivative. Error could also be analyzed in terms of the resulting image rather than the surface.
For a GPU-based implementation, one could tessellate dynamically, in a “geometry shader” or (I presume) in a more general framework like CUDA or OpenCL.
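As a toy illustration of non-linearity-driven subdivision (my own sketch, not FieldTrip’s algorithm; a serious version would also account for view and hardware, and this one can recurse deeply on wild surfaces):

import Data.VectorSpace (magnitude, (^-^), (^+^), (^/))

type R    = Double
type R2   = (R,R)
type R3   = (R,R,R)
type Surf = R2 -> R3

mid :: R2 -> R2 -> R2
mid (u,v) (u',v') = ((u+u')/2, (v+v')/2)

-- Split a parametric triangle until the surface is nearly linear over it,
-- measured by how far surface values at edge midpoints stray from the
-- corresponding chord midpoints.
tessellate :: R -> Surf -> (R2,R2,R2) -> [(R3,R3,R3)]
tessellate eps f (a,b,c)
  | flatEnough = [(f a, f b, f c)]
  | otherwise  = concatMap (tessellate eps f)
                   [(a,ab,ca), (ab,b,bc), (ca,bc,c), (ab,bc,ca)]
  where
    ab = mid a b
    bc = mid b c
    ca = mid c a
    err p q m  = magnitude (f m ^-^ ((f p ^+^ f q) ^/ 2))
    flatEnough = maximum [err a b ab, err b c bc, err c a ca] <= eps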
Abstractness
A denotational model is “fully abstract” when it equates observationally equivalent terms. The parametric model of surfaces is not fully abstract, in that reparametrizing a surface yields a different function that denotes the same geometric surface. (Surface reparametrization alters the relationship between domain and range while covering exactly the same surface, geometrically.) Properties that are independent of the particular parametrization are called “geometric”, which I think corresponds to full abstraction (considering those properties as semantic functions).
What might a fully abstract (geometric) model for geometry be?
Luke Palmer:
Interesting discussion. Here are a few notes:
In a dependently typed setting, the two domains you discuss (pure types and characteristic functions) are the same, barring computability concerns.
You seem to have forgotten about derivatives by the end of this article. I get the impression that continuity and differentiability are pretty important to 3D geometry. Continuity has a general interpretation in the realm of computable functions: i.e. a function from [0,1] to Bool can be continuous as long as it includes a _|_ in between the intervals where it is True and those where it is False. I have been quietly nagged for a while about the corresponding notion for derivatives. What would a differentiable function from Either [0,1] [0,1] look like?
But I feel like if you can capture derivatives, the polymorphic domain approach is pretty good. Mathematicians (in diff. geometry) work with parameterized surfaces using whatever domain is the most convenient… and they have thought about this stuff a lot.
23 November 2009, 10:21 am
Patai Gergely:
Have you considered looking at this problem from the angle of a modelling artist? How can you create complex shapes, like models of real-life objects and creatures, in a human friendly way? It seems to me that the biggest challenge is not to come up with a semantics that’s consistent and elegant in isolation, but to provide a structure that gives elegant solutions to real problems. Of course, it also depends on the audience what ‘real problem’ means, since e.g. procedural geometry is a completely different story from that of the human sculptor.
23 November 2009, 12:22 pm
conal:
I look at these challenges as in harmony rather than in tension. My faith (stemming from personal intuition and experience) is that when I get to the heart of the matter in elegant isolation, I see pragmatic strengths and defects clearly. I don’t expect this perspective to be a popular one, and I don’t mind unpopularity. It’s a personal path that’s true to myself, which is enough for me.
So, reworking your words to speak for myself: It seems to me that the biggest challenge is to come up with a semantics that’s consistent and elegant in isolation and therefore provides a structure that gives elegant solutions to real problems, including ones we haven’t yet imagined.
The notion of “real problems” is a slippery one. So often our difficulties are in the questions we’re used to asking and the old paradigms we’re attached to. 3D artists might love using an elegant denotational design, or they might have to have their brains rewired, or just wait for a younger generation.
To sum up, I’ll repeat two principles that Luke Palmer shared in a post from last June:
23 November 2009, 1:11 pm
Jake McArthur:
I’m curious about your reasons for the surface model for 3D rather than the image model as you use for 2D. Is it primarily for GPU friendliness?
23 November 2009, 4:34 pm
conal:
Hi Jake.
I’m unclear on the question. The surface model is the image model (functions of 2D).
Or maybe you mean why didn’t I use 3D images, i.e., functions of R3 (infinite, continuous voxel map). If so, then yes, mainly for GPU-friendliness.
23 November 2009, 4:56 pm
Jake McArthur:
I did indeed mean the latter, why you didn’t use 3D images. To quote your quotation of Luke Palmer, “If extreme idealism isn’t working out for you, maybe it is because you are not wholeheartedly following your own ideals.” Do you, for any reason besides ease of implementation, not see 3D images (perhaps with some constraints on what kind of operations are actually exposed, to be friendly to the GPU) as an ideal? (You may delete my previous comment. I did not mean to hit submit.)
23 November 2009, 5:08 pm
conal:
Jake: yes, I am copping out here and choosing an implementation-friendly, unnatural model in going for surfaces (over 2D) instead of a 3D model. And as always, the cheater has to pay. For instance, my cop-out means that I can’t do CSG naturally, nor the effect of the medium (air, smoke, heat, water, glass, etc) through which light passes.
Hm. It feels good to come clean. No wonder Catholics like going to confession.
23 November 2009, 5:23 pm
Michael Robinson:
Thanks for the inspiring article! As an applied mathematician with a strong affinity for differential (and other) topology, I’m wondering how much of this might abstract over dimension. (I’m tangentially responding to Luke’s comment as well.)
Namely, can you lift your discussion easily into one concerning smooth maps between manifolds of arbitrary dimension (or better yet, definable maps for some o-minimal structure, so you can handle corners in a principled fashion)?
Of course, to supply some hedged answers (though without a doubt you’re more aware of how to formalize this into Haskell than I), it seems that there are two major obstructions: 1. Abstracting over dimension seems hard to write. Lists, for instance, don’t have compile-time length guarantees, and tuples get awkward when you must say (R,R,R,…R). Then again, I really appreciate your vector space library! 2. Bookkeeping manifold charts is bound to be kind of troublesome. On one hand, you might require charts to look the same, with specified ways to combine them (simplicial complexes are a really rigid way to manage this, CW complexes are more flexible but still restrictive, and of course neither fits the usual definitions). The essential problem is one of defining subsets of R^n in a nice way; perhaps o-minimal structures help here too.
In any event, this is idle speculation; but it would be a helpful semantics for computing on manifolds in Haskell!
23 November 2009, 7:34 pm
conal:
Thanks for the suggestions, Michael. Mark Rafter made some detailed suggestions along these lines in March, which I’d forgotten. Worth picking up!
23 November 2009, 7:58 pm
Jake McArthur:
“Hm. It feels good to come clean. No wonder Catholics like going to confession.”
Exposing the limitations of a model also gives me a clearer picture of it, which makes it simpler for me to reason about, although it loses a superficial appearance of simplicity in the process. Perhaps a good analogy would be my taste for Arch Linux over Ubuntu. They both share the same basic architecture, but Ubuntu hides its warts behind a simple interface, which has the advantage of not initially confusing a user but the disadvantage of masking the true cause of problems that do arise. Arch intentionally exposes the inherent complexity of its architecture, perhaps making the learning curve steeper, but also making it easier to reason about due to its full disclosure. While Ubuntu has an abstraction which Arch lacks, it is not a good abstraction, because it leaks: the user must learn not only the abstraction but also how it works. Masking or failing to make clear the inherent flaws of a model, I think, is kind of like a leaky abstraction.
24 November 2009, 5:25 pm
Jake McArthur:
I think maybe I pushed that “leaky abstraction” analogy a bit far. All I really mean is that I prefer a model with explicit flaws over a model with hidden flaws.
24 November 2009, 5:27 pm
Patai Gergely:
So do you have any intuition about what kind of shape this essence you’re looking for might take? Are you looking for the ‘SK calculus of 3D geometry’, or a higher-level view?
Of course it is slippery, since there are several ways to look at the same thing, which I also alluded to. But I was basically trying to say here that the ultimate ‘grand unified theory’ should be able to capture all these facets, so there will be a point where you’ll have to consider these connections. As for extrapolation to yet unknown problems, my feeling is that it will practically come for free if the unification is successful.
25 November 2009, 1:30 pm
conal:
Yes, I do. It will be simple, familiar, and general to math/types/FP folks. Sum/product/function — that sort of thing. It will explain known domain-specific operations as special cases of more general/simple operations. There will be plenty of TCMs in the API and not much else. Like Reactive’s FRP model built simply from product (Future) and function (Behavior).
Oh — then I guess we’re on the same track.
25 November 2009, 4:15 pm
Vladimir Slepnev:
This comment is about 5 years late, but I have to write it because I know exactly the right answer to your question! Look up the work of Inigo Quilez on rendering with distance fields.
The basic idea is to represent a 3D object as a “distance field”, i.e. a continuous function from R^3 to R that returns the distance to the object. The value of that function is zero inside the object, and grows as you get further away from it. That technique is well suited to ray marching on the GPU, because when you’re marching along the ray, evaluating the distance field at the current point gives you the size of the next step, so you can make bigger steps when the ray passes far from the object. Simple objects like spheres have distance fields that are easy to define, and simple operations on objects often correspond to simple operations on distance fields. It’s also easy to approximate things like normal direction, ambient occlusion, and even penumbras (send a ray to the light source and note how closely it passed to the occluder).
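A minimal sketch of the marching loop Vladimir describes (my code, not his; constants like the step tolerance and fuel are arbitrary):

type R3   = (Double, Double, Double)
type Dist = R3 -> Double          -- distance to the object (0 inside/on it)

-- Distance field for a sphere of radius r centered at the origin.
sphereD :: Double -> Dist
sphereD r (x,y,z) = max 0 (sqrt (x*x + y*y + z*z) - r)

-- March from origin o along unit direction d: at each point, the field value
-- bounds how far the ray can safely advance. Just the hit point, or Nothing
-- if we give up after a fixed number of steps.
march :: Dist -> R3 -> R3 -> Maybe R3
march dist o d = go o (100 :: Int)
  where
    go p fuel
      | fuel <= 0   = Nothing
      | step < 1e-4 = Just p
      | otherwise   = go (p .+. (step .* d)) (fuel - 1)
      where step = dist p
    (x,y,z) .+. (x',y',z') = (x + x', y + y', z + z')
    s .* (x,y,z) = (s*x, s*y, s*z)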
In a perfect world, we could have a combinator library in Haskell for rendering distance fields that would compile to a single GPU shader program. It’s scary how well this idea fits together.
11 May 2014, 2:35 am
Doug Moen:
I implemented Vladimir Slepnev’s idea (not in Haskell, but another pure functional language). Using signed distance fields as the shape representation gives a continuous functional representation for 3D solids, with union, morph, blending, etc. You can represent a lot of shapes that aren’t representable using traditional methods like vector graphics, or using discrete representations like voxels or triangle meshes. For example, you can model fractals, which have theoretically infinite detail, which you can zoom into. I compile my functional code into a single GPU shader program, as per Inigo Quilez. I’m pretty excited by the results.

But like all representations for 3D solids, it’s a leaky abstraction. You ultimately have to be aware of performance issues, and some shapes can’t be represented because their distance functions would be too expensive to evaluate. Sphere tracing requires the distance function to be Lipschitz(1). If you have a rich set of shape operators, like twist and bend and morph, then the output can violate the Lipschitz(1) requirement in certain cases, and ultimately the user has to be aware of this (leaky abstraction) and work around the problem. I’m extending my program to overcome these problems, so that I can represent a larger set of shapes, by giving the user more control over the internal shape representation and providing multiple representations and rendering methods, but by making the program more powerful in this way, the leaky abstraction problem gets worse. There is no non-leaky abstraction for real numbers, so it’s not surprising that the same problems recur for geometric shapes.
30 April 2020, 3:35 pm
Conal:
Thanks for the comment, Doug! I’d be grateful for more references, including how the Lipschitz condition relates to sphere tracing and how that requirement gets violated. Also the project you’re working on (if public) and/or related projects, as well as the work you mentioned by Vladimir Slepnev and by Inigo Quilez.
30 April 2020, 5:42 pm
Doug Moen:
Hi Conal. My project is https://github.com/curv3d/curv. Curv is a curried, pure functional language that I created as a DSL for making 2D and 3D shapes. You’ve done a lot of research that matches up with what I’m trying to do, so I’m reading your papers now: FRP, Tangible Values, etc. (My shapes are already continuous functions of time; next step is to make them more generally interactive, beyond what I currently do using sliders to change parameters.)
An excellent reference is Sphere Tracing by John C. Hart: https://github.com/curv3d/curv/blob/master/docs/papers/sphere_tracing.pdf
Here’s a paper I wrote on the math behind Curv’s shape representation. It says some things that Hart does not, which are relevant to Curv: https://github.com/curv3d/curv/blob/master/docs/Theory.rst
My library of shape combinators is documented here: https://github.com/curv3d/curv/tree/master/docs/shapes
This doc explains that a shape has a distance field, and it classifies distance fields as exact, mitred, approximate, bad and discontinuous. You need to know this because shape combinators are sensitive to the class of distance field they receive as input, and they may return a worse class of distance than what they received as input. https://github.com/curv3d/curv/blob/master/docs/shapes/Shapes.rst
The documentation for each shape combinator describes its limitations with respect to the classes of distance field that it accepts as input and produces as output.
Other people have addressed this problem (of users needing to worry about distance field properties) by restricting expressiveness, limiting the set of shapes and shape combinators you can define or use. All of the related projects listed below restrict expressiveness and thereby make the abstraction less leaky.
Related projects:
ImplicitCAD is written in Haskell, and uses the same signed distance representation. No GPU compilation, and there is a fixed set of distance field operations hard coded into the library, which you have to hack to define new primitives. It’s not extensible like Curv is. No colour or animation. https://github.com/colah/ImplicitCAD
libfive uses Scheme as its extension language. It has some restrictions relative to Curv: distance functions must be closed-form expressions (no conditionals, no recursion), so many Curv primitives cannot be ported. But there is no Lipschitz continuity restriction on distance functions, so this restriction removes a burden from users. Shapes are converted to a triangle mesh before being displayed, so no fine detail, no infinite shapes. Also no colour or animation. The author, Matt Keeter, has a paper in this year’s SigGraph on compiling his closed-form distance functions to GPU shader code without using sphere tracing. This is a new algorithm: the paper and the source code are not yet available at time of writing.
The video game “Dreams” on Playstation by Media Molecule uses signed distance fields to represent all geometric shapes in the game. You can create your own content using an editor, by unioning, differencing and blending a small fixed set of geometric primitives. The set of geometric primitives and shape operators is sharply limited to make everything well behaved. The GPU implementation is quite interesting and impressive; I’m studying it with an eye to duplicating their best tricks in Curv some day. https://www.mediamolecule.com/blog/article/siggraph_2015
30 April 2020, 7:06 pm
Conal:
Terrific. Thanks again, Doug! I’ve missed working in graphics, and I can use some inspiration right now.
I bet a lovely repackaging of your work would be as a reinterpretation of Haskell programs via Compiling to categories.
30 April 2020, 7:19 pm