## Circuits as a bicartesian closed category

My previous few posts have been about cartesian closed categories (CCCs). In From Haskell to hardware via cartesian closed categories, I gave a brief motivation: typed lambda expressions and the CCC vocabulary are equally expressive, but have different strengths:

• In Haskell, the CCC vocabulary is overloadable and so can be interpreted more flexibly than lambda and application.
• Lambda expressions are friendlier for human programmers to write and read.

An interpretation I’m especially keen on—and the one that inspired this series of posts—is circuits, as described in this post.

## Optimizing CCCs

In the post Overloading lambda, I gave a translation from a typed lambda calculus into the vocabulary of cartesian closed categories (CCCs). This simple translation leads to unnecessarily complex expressions. For instance, the simple lambda term, “`λ ds → (λ (a,b) → (b,a)) ds`”, translated to a rather complicated CCC term:

``apply ∘ (curry (apply ∘ (apply ∘ (const (,) △ (id ∘ exr) ∘ exr) △ (id ∘ exl) ∘ exr)) △ id)``

(Recall from the previous post that `(∘)` binds more tightly than `(△)` and `(▽)`.)

However, we can do much better, translating to

``exr △ exl``

which says to pair the right and left halves of the argument pair, i.e., swap.

This post applies some equational properties to greatly simplify/optimize the result of translation to CCC form, including example above. First I’ll show the equational reasoning and then how it’s automated in the lambda-ccc library.

Haskell’s type class facility is a powerful abstraction mechanism. Using it, we can overload multiple interpretations onto a single vocabulary, with each interpretation corresponding to a different type. The class laws constrain these interpretations and allow reasoning that is valid over all (law-abiding) instances—even ones not yet defined.

As Haskell is a higher-order functional language in the heritage of Church’s (typed) lambda calculus, it also supports “lambda abstraction”.

Sadly, however, these two forms of abstraction don’t go together. When we use the vocabulary of lambda abstraction (“`λ x → ⋯`”) and application (“`u v`”), our expressions can only be interpreted as one type (constructor), namely functions. (Note that I am not talking about parametric polymorphism, which is available with both lambda abstraction and type-class-style overloading.) Is it possible to overload lambda and application using type classes, or perhaps in the same spirit? The answer is yes, and there are some wonderful benefits of doing so. I’ll explain the how in this post and hint at the why, to be elaborated in futures posts.

## From Haskell to hardware via cartesian closed categories

Since fall of last year, I’ve been working at Tabula, a Silicon Valley start-up developing an innovative programmable hardware architecture called “Spacetime”, somewhat similar to an FPGA, but much more flexible and efficient. I met the founder, Steve Teig, at a Bay Area Haskell Hackathon in February of 2011. He described his Spacetime architecture, which is based on the geometry of the same name, developed by Hermann Minkowski to elegantly capture Einstein’s theory of special relativity. Within the first 30 seconds or so of hearing what Steve was up to, I knew I wanted to help.

The vision Steve shared with me included not only a better alternative for hardware designers (programmed in hardware languages like Verilog and VHDL), but also a platform for massively parallel execution of software written in a purely functional language. Lately, I’ve been working mainly on this latter aspect, and specifically on the problem of how to compile Haskell. Our plan is to develop the Haskell compiler openly and encourage collaboration. If anything you see in this blog series interests you, and especially if have advice or you’d like to collaborate on the project, please let me know.

In my next series of blog posts, I’ll describe some of the technical ideas I’ve been working with for compiling Haskell for massively parallel execution. For now, I want to introduce a central idea I’m using to approach the problem.

## Reimagining matrices

The function of the imagination is not
to make strange things settled, so much as
to make settled things strange.

- G.K. Chesterton

Why is matrix multiplication defined so very differently from matrix addition? If we didn’t know these procedures, could we derive them from first principles? What might those principles be?

This post gives a simple semantic model for matrices and then uses it to systematically derive the implementations that we call matrix addition and multiplication. The development illustrates what I call “denotational design”, particularly with type class morphisms. On the way, I give a somewhat unusual formulation of matrices and accompanying definition of matrix “multiplication”.

For more details, see the linear-map-gadt source code.

Edits:

• 2012–12–17: Replaced lost $B$ entries in description of matrix addition. Thanks to Travis Cardwell.

Note: I’m using MathML for the math below, which appears to work well on Firefox but on neither Safari nor Chrome. I use Pandoc to generate the HTML+MathML from markdown+lhs+LaTeX. There’s probably a workaround using different Pandoc settings and requiring some tweaks to my WordPress installation. If anyone knows how (especially the WordPress end), I’d appreciate some pointers.

## Parallel speculative addition via memoization

I’ve been thinking much more about parallel computation for the last couple of years, especially since starting to work at Tabula a year ago. Until getting into parallelism explicitly, I’d naïvely thought that my pure functional programming style was mostly free of sequential bias. After all, functional programming lacks the implicit accidental dependencies imposed by the imperative model. Now, however, I’m coming to see that designing parallel-friendly algorithms takes attention to minimizing the depth of the remaining, explicit data dependencies.

As an example, consider binary addition, carried out from least to most significant bit (as usual). We can immediately compute the first (least significant) bit of the result, but in order to compute the second bit, we’ll have to know whether or not a carry resulted from the first addition. More generally, the $\left(n+1\right)$th sum & carry require knowing the $n$th carry, so this algorithm does not allow parallel execution. Even if we have one processor per bit position, only one processor will be able to work at a time, due to the linear chain of dependencies.

One general technique for improving parallelism is speculation—doing more work than might be needed so that we don’t have to wait to find out exactly what will be needed. In this post, we’ll see a progression of definitions for bitwise addition. We’ll start with a linear-depth chain of carry dependencies and end with logarithmic depth. Moreover, by making careful use of abstraction, these versions will be simply different type specializations of a single polymorphic definition with an extremely terse definition.

• 2011-06-04: "Suppose we have a bottom-up tree of top-down trees, i.e., `t ∷ TB (TT a)`. Was backwards. (Thanks to Noah Easterly.)
• 2011-06-04: Notation: "`f ➶ n`" and "`f ➴ n`".