<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; category</title>
	<atom:link href="http://conal.net/blog/tag/category/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>Optimizing CCCs</title>
		<link>http://conal.net/blog/posts/optimizing-cccs</link>
		<comments>http://conal.net/blog/posts/optimizing-cccs#comments</comments>
		<pubDate>Sat, 14 Sep 2013 01:27:22 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[category]]></category>
		<category><![CDATA[CCC]]></category>
		<category><![CDATA[overloading]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=537</guid>
		<description><![CDATA[In the post Overloading lambda, I gave a translation from a typed lambda calculus into the vocabulary of cartesian closed categories (CCCs). This simple translation leads to unnecessarily complex expressions. For instance, the simple lambda term, “λ ds → (λ (a,b) → (b,a)) ds”, translated to a rather complicated CCC term: apply ∘ (curry (apply [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><!-- references --></p>

<p><!-- teaser --></p>

<p>In the post <a href="http://conal.net/blog/posts/overloading-lambda/" title="blog post"><em>Overloading lambda</em></a>, I gave a translation from a typed lambda calculus into the vocabulary of cartesian closed categories (CCCs). This simple translation leads to unnecessarily complex expressions. For instance, the simple lambda term, “<code>λ ds → (λ (a,b) → (b,a)) ds</code>”, translated to a rather complicated CCC term:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> (apply ∘ (apply ∘ (<span class="fu">const</span> (,) △ (<span class="fu">id</span> ∘ exr) ∘ exr) △ (<span class="fu">id</span> ∘ exl) ∘ exr)) △ <span class="fu">id</span>)</code></pre>

<p>(Recall from the previous post that <code>(∘)</code> binds more tightly than <code>(△)</code> and <code>(▽)</code>.)</p>

<p>However, we can do much better, translating to</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">exr △ exl</code></pre>

<p>which says to pair the right and left halves of the argument pair, i.e., swap.</p>
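<p>As a sanity check under the function interpretation (a minimal sketch, not code from lambda-ccc), both terms can be transcribed into plain Haskell and compared. Here <code>(&amp;&amp;&amp;)</code> from <code>Control.Arrow</code> stands in for <code>(△)</code>, and <code>exl</code>, <code>exr</code>, and <code>apply</code> are assumed to mean <code>fst</code>, <code>snd</code>, and function application. The names <code>unoptimized</code>, <code>optimized</code>, and <code>agree</code> are made up for illustration:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Control.Arrow ((&amp;&amp;&amp;))

-- Function interpretations of the CCC vocabulary (assumed for this sketch).
exl :: (a, b) -&gt; a
exl = fst

exr :: (a, b) -&gt; b
exr = snd

apply :: (a -&gt; b, a) -&gt; b
apply (f, x) = f x

-- Direct transcription of the unoptimized CCC term.
-- (.) is infixr 9 and (&amp;&amp;&amp;) is infixr 3, so composition binds more tightly.
unoptimized :: (a, b) -&gt; (b, a)
unoptimized =
  apply . (curry (apply . (apply . (const (,) &amp;&amp;&amp; (id . exr) . exr) &amp;&amp;&amp; (id . exl) . exr)) &amp;&amp;&amp; id)

-- The optimized term: pair the right and left halves.
optimized :: (a, b) -&gt; (b, a)
optimized = exr &amp;&amp;&amp; exl

-- Compare the two on one sample input; this is a spot check, not a proof.
agree :: Bool
agree = unoptimized (1 :: Int, 'x') == optimized (1, 'x')</code></pre>

<p>That both definitions type-check at <code>(a, b) -&gt; (b, a)</code> is already some evidence that the transcription is faithful; evaluating <code>agree</code> in GHCi compares them on a concrete pair.</p>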

<p>This post applies some equational properties to greatly simplify/optimize the result of translation to CCC form, including the example above. First I’ll show the equational reasoning and then how it’s automated in the <a href="https://github.com/conal/lambda-ccc" title="Github project">lambda-ccc</a> library.</p>

<p><span id="more-537"></span></p>

<h3 id="equational-reasoning-on-ccc-terms">Equational reasoning on CCC terms</h3>

<p>First, use the identity/composition laws:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">f ∘ <span class="fu">id</span> ≡ f
<span class="fu">id</span> ∘ g ≡ g</code></pre>

<p>Our example is now slightly simpler:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> (apply ∘ (apply ∘ (<span class="fu">const</span> (,) △ exr ∘ exr) △ exl ∘ exr)) △ <span class="fu">id</span>)</code></pre>

<p>Next, consider the subterm <code>apply ∘ (const (,) △ exr ∘ exr)</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ (<span class="fu">const</span> (,) △ exr ∘ exr)
≡ <span class="co">{- definition of (∘)  -}</span>
  λ x <span class="ot">→</span> apply ((<span class="fu">const</span> (,) △ exr ∘ exr) x)
≡ <span class="co">{- definition of (△) -}</span>
  λ x <span class="ot">→</span> apply (<span class="fu">const</span> (,) x, (exr ∘ exr) x)
≡ <span class="co">{- definition of apply -}</span>
  λ x <span class="ot">→</span> <span class="fu">const</span> (,) x ((exr ∘ exr) x)
≡ <span class="co">{- definition of const -}</span>
  λ x <span class="ot">→</span> (,) ((exr ∘ exr) x)
≡ <span class="co">{- η-reduce -}</span>
  (,) ∘ (exr ∘ exr)</code></pre>

<p>We didn’t use any properties of <code>(,)</code> or of <code>(exr ∘ exr)</code>, so let’s generalize:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ (<span class="fu">const</span> g △ f)
≡ λ x <span class="ot">→</span> apply ((<span class="fu">const</span> g △ f) x)
≡ λ x <span class="ot">→</span> apply (<span class="fu">const</span> g x, f x)
≡ λ x <span class="ot">→</span> <span class="fu">const</span> g x (f x)
≡ λ x <span class="ot">→</span> g (f x)
≡ g ∘ f</code></pre>

<p>(Note that I’ve cheated here by appealing to the <em>function</em> interpretations of <code>apply</code> and <code>const</code>. <em>Question:</em> Is there a purely algebraic proof, using only the CCC laws?)</p>
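<p>Since the derivation relies on the function interpretation, the law can at least be spot-checked there. The following is a small sketch under that assumption (with <code>(&amp;&amp;&amp;)</code> playing the role of <code>(△)</code>); the names <code>lhs</code>, <code>rhs</code>, and <code>lawHolds</code> are illustrative only, and a passing check is of course no substitute for an algebraic proof:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Control.Arrow ((&amp;&amp;&amp;))

apply :: (a -&gt; b, a) -&gt; b
apply (f, x) = f x

-- Left- and right-hand sides of  apply . (const g &amp;&amp;&amp; f) == g . f
lhs, rhs :: (b -&gt; c) -&gt; (a -&gt; b) -&gt; a -&gt; c
lhs g f = apply . (const g &amp;&amp;&amp; f)
rhs g f = g . f

-- Compare both sides on a handful of sample inputs.
lawHolds :: Bool
lawHolds = all (\x -&gt; lhs show (+ 1) x == rhs show (+ 1) x) [0 .. 10 :: Int]</code></pre>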

<p>With this equivalence, our example simplifies further:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> (apply ∘ ((,) ∘ exr ∘ exr △ exl ∘ exr)) △ <span class="fu">id</span>)</code></pre>

<p>Next, let’s focus on <code>apply ∘ ((,) ∘ exr ∘ exr △ exl ∘ exr)</code>. Generalize to <code>apply ∘ (h ∘ f △ g)</code> and fiddle about:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ (h ∘ f △ g)
≡ λ x <span class="ot">→</span> apply (h (f x), g x)
≡ λ x <span class="ot">→</span> h (f x) (g x)
≡ λ x <span class="ot">→</span> <span class="fu">uncurry</span> h (f x, g x)
≡ <span class="fu">uncurry</span> h ∘ (λ x <span class="ot">→</span> (f x, g x))
≡ <span class="fu">uncurry</span> h ∘ (f △ g)</code></pre>

<p>Apply to our example:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> (<span class="fu">uncurry</span> (,) ∘ (exr ∘ exr △ exl ∘ exr)) △ <span class="fu">id</span>)</code></pre>

<p>We can simplify <code>uncurry (,)</code> as follows:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  <span class="fu">uncurry</span> (,)
≡ λ (x,y) <span class="ot">→</span> <span class="fu">uncurry</span> (,) (x,y)
≡ λ (x,y) <span class="ot">→</span> (,) x y
≡ λ (x,y) <span class="ot">→</span> (x,y)
≡ <span class="fu">id</span></code></pre>

<p>Together with the left identity law, our example now becomes</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> (exr ∘ exr △ exl ∘ exr) △ <span class="fu">id</span>)</code></pre>

<p>Next use the law that relates <code>(∘)</code> and <code>(△)</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">f ∘ r △ g ∘ r ≡ (f △ g) ∘ r</code></pre>

<p>In our example, <code>exr ∘ exr △ exl ∘ exr</code> becomes <code>(exr △ exl) ∘ exr</code>, so we have</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> ((exr △ exl) ∘ exr) △ <span class="fu">id</span>)</code></pre>

<p>Let’s now look at how <code>apply</code>, <code>(△)</code>, and <code>curry</code> interact:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ (<span class="fu">curry</span> h △ g)
≡ λ p <span class="ot">→</span> apply ((<span class="fu">curry</span> h △ g) p)
≡ λ p <span class="ot">→</span> apply (<span class="fu">curry</span> h p, g p)
≡ λ p <span class="ot">→</span> <span class="fu">curry</span> h p (g p)
≡ λ p <span class="ot">→</span> h (p, g p)
≡ h ∘ (<span class="fu">id</span> △ g)</code></pre>

<p>We can add more variety for other uses:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ (<span class="fu">curry</span> h ∘ f △ g)
≡ λ p <span class="ot">→</span> apply ((<span class="fu">curry</span> h ∘ f △ g) p)
≡ λ p <span class="ot">→</span> apply (<span class="fu">curry</span> h (f p), g p)
≡ λ p <span class="ot">→</span> <span class="fu">curry</span> h (f p) (g p)
≡ λ p <span class="ot">→</span> h (f p, g p)
≡ h ∘ (f △ g)</code></pre>
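<p>Both forms of this <code>apply</code>/<code>curry</code> rule can be spot-checked on ordinary functions. This is only a sketch under the function interpretation; the sample functions <code>h</code>, <code>f</code>, and <code>g</code> and the names <code>generalLaw</code> and <code>specialLaw</code> are arbitrary choices for illustration:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Control.Arrow ((&amp;&amp;&amp;))

apply :: (a -&gt; b, a) -&gt; b
apply (fn, x) = fn x

-- Arbitrary sample functions.
h :: (Int, Int) -&gt; Int
h (x, y) = 10 * x + y

f, g :: Int -&gt; Int
f = (+ 1)
g = negate

-- apply . (curry h . f &amp;&amp;&amp; g) == h . (f &amp;&amp;&amp; g)
generalLaw :: Bool
generalLaw =
  all (\p -&gt; (apply . (curry h . f &amp;&amp;&amp; g)) p == (h . (f &amp;&amp;&amp; g)) p) [-3 .. 3]

-- The specialized form, with f = id:
-- apply . (curry h &amp;&amp;&amp; g) == h . (id &amp;&amp;&amp; g)
specialLaw :: Bool
specialLaw =
  all (\p -&gt; (apply . (curry h &amp;&amp;&amp; g)) p == (h . (id &amp;&amp;&amp; g)) p) [-3 .. 3]</code></pre>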

<p>With this rule (even in its more specialized form),</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> ((exr △ exl) ∘ exr) △ <span class="fu">id</span>)</code></pre>

<p>becomes</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(exr △ exl) ∘ exr ∘ (<span class="fu">id</span> △ <span class="fu">id</span>)</code></pre>

<p>Next use the universal property of <code>(△)</code>, which is that it is the unique solution of the following two equations (universally quantified over <code>f</code> and <code>g</code>):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">exl ∘ (f △ g) ≡ f
exr ∘ (f △ g) ≡ g</code></pre>

<p>(See <a href="http://www.cs.ox.ac.uk/jeremy.gibbons/publications/acmmpc-calcfp.pdf" title="Paper by Jeremy Gibbons"><em>Calculating Functional Programs</em></a>, Section 1.3.6.)</p>

<p>Applying the second rule to <code>exr ∘ (id △ id)</code> gives <code>id</code>, so our <code>swap</code> example becomes</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">exr △ exl</code></pre>

<h3 id="automation">Automation</h3>

<p>By using a collection of equational properties, we’ve greatly simplified our CCC example. These properties and more are used in <a href="https://github.com/conal/lambda-ccc/blob/master/src/LambdaCCC/CCC.hs" title="Source module"><code>LambdaCCC.CCC</code></a> to simplify CCC terms during construction. As a general technique, whenever building terms, rather than applying the GADT constructors directly, we’ll use so-called “smart constructors” with built-in optimizations. I’ll show a few smart constructor definitions here. See the <a href="https://github.com/conal/lambda-ccc/blob/master/src/LambdaCCC/CCC.hs" title="Source module"><code>LambdaCCC.CCC</code></a> source code for others.</p>

<p>As a first simple example, consider the identity laws for composition:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">f ∘ <span class="fu">id</span> ≡ f
<span class="fu">id</span> ∘ g ≡ g</code></pre>

<p>Since the top-level operator on the LHSs (left-hand sides) is <code>(∘)</code>, we can easily implement these laws in a “smart constructor” for <code>(∘)</code>, which handles special cases and uses the plain (dumb) constructor if no simplifications apply:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">infixr</span> <span class="dv">9</span> <span class="fu">@</span>∘
(<span class="fu">@</span>∘) <span class="ot">∷</span> (b ↣ c) <span class="ot">→</span> (a ↣ b) <span class="ot">→</span> (a ↣ c)
⋯ <span class="co">-- simplifications go here</span>
g <span class="fu">@</span>∘ f  <span class="fu">=</span> g <span class="fu">:</span>∘ f</code></pre>

<p>where <code>↣</code> is the GADT that represents biCCC terms, as shown in <a href="http://conal.net/blog/posts/overloading-lambda/" title="blog post"><em>Overloading lambda</em></a>.</p>

<p>The identity laws are easy to implement:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">f <span class="fu">@</span>∘ <span class="dt">Id</span> <span class="fu">=</span> f
<span class="dt">Id</span> <span class="fu">@</span>∘ g <span class="fu">=</span> g</code></pre>
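<p>To try the idea in isolation, here is a self-contained sketch with a deliberately tiny term type. The type <code>Term</code>, the ASCII operators <code>(:.)</code> and <code>(@.)</code>, and the helper <code>size</code> are stand-ins invented for this example; the real <code>(↣)</code> type in lambda-ccc has many more constructors:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs #-}

-- A tiny stand-in for the term GADT (hypothetical, for illustration).
data Term a b where
  Id   :: Term a a
  Exl  :: Term (a, b) a
  Exr  :: Term (a, b) b
  (:.) :: Term b c -&gt; Term a b -&gt; Term a c

infixr 9 :.

-- Smart constructor: try the identity laws, else fall back to (:.).
infixr 9 @.
(@.) :: Term b c -&gt; Term a b -&gt; Term a c
f  @. Id = f
Id @. g  = g
g  @. f  = g :. f

-- Count constructors, to observe the effect of the simplifications.
size :: Term a b -&gt; Int
size (g :. f) = 1 + size g + size f
size _        = 1

dumb, smart :: Term (x, (a, b)) a
dumb  = Exl :. Id :. Exr   -- keeps the redundant Id
smart = Exl @. Id @. Exr   -- the Id is dropped during construction</code></pre>

<p>Here <code>size dumb</code> counts five constructors while <code>size smart</code> counts three, since <code>Id @. Exr</code> reduces to <code>Exr</code> before the outer composition is built.</p>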

<p>Next, the <code>apply</code>/<code>const</code> law derived above:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">const</span> g △ f) ≡ g ∘ f</code></pre>

<p>This rule translates fairly easily:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="dt">Apply</span> <span class="fu">@</span>∘ (<span class="dt">Const</span> g <span class="fu">:</span>△ f) <span class="fu">=</span> prim g <span class="fu">@</span>∘ f</code></pre>

<p>where <code>prim</code> is a smart constructor for <code>Prim</code>.</p>

<p>There are some details worth noting:</p>

<ul>
<li>The LHS uses only dumb constructors and variables except for the smart constructor being defined (here <code>(@∘)</code>).</li>
<li>Besides variables bound on the LHS, the RHS uses only smart constructors, so that the constructed combinations are optimized as well. For instance, <code>f</code> might be <code>Id</code> here.</li>
</ul>

<p>Despite these details, this definition is inadequate in many cases. Consider the following example:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ ((<span class="fu">const</span> u △ v) ∘ w)</code></pre>

<p><em>Syntactically</em>, the LHS of our rule <em>does not</em> match this term, because the two compositions are associated to the right instead of the left. <em>Semantically</em>, the rule does match, since composition is associative. In order to apply this rule, we can first left-associate and then apply the rule.</p>

<p>We could associate <em>all</em> compositions to the left during construction, in which case this rule will apply purely via syntactic matching. However, there will be other rewrites that require <em>right</em>-association in order to apply. Instead, for rules like this one, let’s explicitly left-decompose.</p>

<p>Suppose we have a smart constructor <code>composeApply g</code> that constructs an optimized version of <code>apply ∘ g</code>. This equivalence implies the following type:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">composeApply <span class="ot">∷</span> (z ↣ (a ⇨ b) × a) <span class="ot">→</span> (z ↣ b)</code></pre>

<p>Thus</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ (g ∘ f)
≡ (apply ∘ g) ∘ f
≡ composeApply g ∘ f</code></pre>

<p>Now we can define a general rule for composing <code>apply</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="dt">Apply</span> <span class="fu">@</span>∘ (decompL <span class="ot">→</span> g <span class="fu">:</span>∘ f) <span class="fu">=</span> composeApply g <span class="fu">@</span>∘ f</code></pre>

<p>The function <code>decompL</code> (defined below) does a left-decomposition and is conveniently used here in a <a href="http://ghc.haskell.org/trac/ghc/wiki/ViewPatterns" title="GHC wiki page">view pattern</a>. It decomposes a given term into <code>g ∘ f</code>, where <code>g</code> is as small as possible, but not <code>Id</code>. Where <code>decompL</code> finds such a decomposition, it yields a term with a top-level <code>(:∘)</code> constructor, and <code>composeApply</code> is used. Otherwise, the clause fails.</p>

<p>The implementation of <code>decompL</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">decompL <span class="ot">∷</span> (a ↣ c) <span class="ot">→</span> (a ↣ c)
decompL <span class="dt">Id</span>                        <span class="fu">=</span> <span class="dt">Id</span>
decompL ((decompL <span class="ot">→</span> h <span class="fu">:</span>∘ g) <span class="fu">:</span>∘ f) <span class="fu">=</span> h <span class="fu">:</span>∘ (g <span class="fu">@</span>∘ f)
decompL comp<span class="fu">@</span>(_ <span class="fu">:</span>∘ _)             <span class="fu">=</span> comp
decompL f                         <span class="fu">=</span> f <span class="fu">:</span>∘ <span class="dt">Id</span></code></pre>

<p>There’s also <code>decompR</code> for right-factoring, similarly defined.</p>

<p>Note that I broke my rule of using only smart constructors on RHSs, since I specifically want to generate a <code>(:∘)</code> term.</p>
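<p>For experimenting with <code>decompL</code> outside the library, the following self-contained sketch may help. It reuses the four-clause definition above over a cut-down term type; <code>Term</code>, <code>(:.)</code>, <code>(@.)</code>, and <code>render</code> are hypothetical stand-ins, not the library’s definitions:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs, ViewPatterns #-}

-- A tiny stand-in for the term GADT (hypothetical, for illustration).
data Term a b where
  Id   :: Term a a
  Exl  :: Term (a, b) a
  Exr  :: Term (a, b) b
  (:.) :: Term b c -&gt; Term a b -&gt; Term a c

infixr 9 :.

-- Smart composition with just the identity laws.
infixr 9 @.
(@.) :: Term b c -&gt; Term a b -&gt; Term a c
f  @. Id = f
Id @. g  = g
g  @. f  = g :. f

-- Left-decomposition, following the definition in the post.
decompL :: Term a c -&gt; Term a c
decompL Id                         = Id
decompL ((decompL -&gt; h :. g) :. f) = h :. (g @. f)
decompL comp@(_ :. _)              = comp
decompL f                          = f :. Id

-- Fully parenthesized rendering, so the association is visible.
render :: Term a b -&gt; String
render Id       = "id"
render Exl      = "exl"
render Exr      = "exr"
render (g :. f) = "(" ++ render g ++ " . " ++ render f ++ ")"

-- A left-nested composition, before and after decomposition.
before, after :: String
before = render ((Exl :. Exr) :. Exr)            -- "((exl . exr) . exr)"
after  = render (decompL ((Exl :. Exr) :. Exr))  -- "(exl . (exr . exr))"</code></pre>

<p>After decomposition the head <code>Exl</code> sits directly under the top-level <code>(:.)</code>, which is what a clause such as the <code>Apply</code> rule needs in order to match syntactically.</p>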

<p>With this re-association trick in place, we can now look at compose/apply rules.</p>

<p>The equivalence</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">const</span> g △ f) ≡ g ∘ f</code></pre>

<p>becomes</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">composeApply (<span class="dt">Const</span> p <span class="fu">:</span>△ f) <span class="fu">=</span> prim p <span class="fu">@</span>∘ f</code></pre>

<p>Likewise, the equivalence</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (h ∘ f △ g) ≡ <span class="fu">uncurry</span> h ∘ (f △ g)</code></pre>

<p>becomes</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">composeApply (h <span class="fu">:</span>∘ f <span class="fu">:</span>△ g) <span class="fu">=</span> uncurryE h <span class="fu">@</span>∘ (f △ g)</code></pre>

<p>where <code>(△)</code> is the smart constructor for <code>(:△)</code>, and <code>uncurryE</code> is a smart constructor for <code>Uncurry</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">uncurryE <span class="ot">∷</span> (a ↣ (b ⇨ c)) <span class="ot">→</span> (a × b ↣ c)
uncurryE (<span class="dt">Curry</span> f)    <span class="fu">=</span> f
uncurryE (<span class="dt">Prim</span> <span class="dt">PairP</span>) <span class="fu">=</span> <span class="dt">Id</span>
uncurryE h            <span class="fu">=</span> <span class="dt">Uncurry</span> h</code></pre>

<p>Two more <code>(∘)</code>/<code>apply</code> properties:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ (<span class="fu">curry</span> (g ∘ exr) △ f)
≡ λ x <span class="ot">→</span> <span class="fu">curry</span> (g ∘ exr) x (f x)
≡ λ x <span class="ot">→</span> (g ∘ exr) (x, f x)
≡ λ x <span class="ot">→</span> g (f x)
≡ g ∘ f</code></pre>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply ∘ first f
≡ λ p <span class="ot">→</span> apply (first f p)
≡ λ (a,b) <span class="ot">→</span> apply (first f (a,b))
≡ λ (a,b) <span class="ot">→</span> apply (f a, b)
≡ λ (a,b) <span class="ot">→</span> f a b
≡ <span class="fu">uncurry</span> f</code></pre>

<p>The <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Arrow.html#v:first"><code>first</code></a> combinator is not represented directly in our <code>(↣)</code> data type, but rather is defined via simpler parts in <a href="https://github.com/conal/lambda-ccc/blob/master/src/LambdaCCC/CCC.hs" title="Source module"><code>LambdaCCC.CCC</code></a>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">first <span class="ot">∷</span> (a ↣ c) <span class="ot">→</span> (a × b ↣ c × b)
first f <span class="fu">=</span> f × <span class="dt">Id</span>

(×) <span class="ot">∷</span> (a ↣ c) <span class="ot">→</span> (b ↣ d) <span class="ot">→</span> (a × b ↣ c × d)
f × g <span class="fu">=</span> f <span class="fu">@</span>∘ <span class="dt">Exl</span> △ g <span class="fu">@</span>∘ <span class="dt">Exr</span></code></pre>
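<p>Read as functions, these definitions make the <code>apply ∘ first f ≡ uncurry f</code> property easy to spot-check. The sketch below assumes the function interpretation, with <code>(&amp;&amp;&amp;)</code> for <code>(△)</code>; the names <code>cross</code>, <code>first'</code>, and <code>applyFirst</code> are illustrative and chosen to avoid clashing with <code>Control.Arrow.first</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Control.Arrow ((&amp;&amp;&amp;))

exl :: (a, b) -&gt; a
exl = fst

exr :: (a, b) -&gt; b
exr = snd

apply :: (a -&gt; b, a) -&gt; b
apply (f, x) = f x

-- Function-level counterpart of (×).
cross :: (a -&gt; c) -&gt; (b -&gt; d) -&gt; (a, b) -&gt; (c, d)
cross f g = f . exl &amp;&amp;&amp; g . exr

-- Function-level counterpart of first.
first' :: (a -&gt; c) -&gt; (a, b) -&gt; (c, b)
first' f = f `cross` id

-- apply . first' f == uncurry f, checked for f = (+) on a small grid.
applyFirst :: Bool
applyFirst =
  all (\p -&gt; apply (first' (+) p) == uncurry (+) p)
      [(x, y) | x &lt;- [0 .. 3], y &lt;- [0 .. 3 :: Int]]</code></pre>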

<p>Implementations of these two properties:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">composeApply (<span class="dt">Curry</span> (decompR <span class="ot">→</span> g <span class="fu">:</span>∘ <span class="dt">Exr</span>) <span class="fu">:</span>△ f) <span class="fu">=</span> g <span class="fu">@</span>∘ f

composeApply (f <span class="fu">:</span>∘ <span class="dt">Exl</span> <span class="fu">:</span>△ <span class="dt">Exr</span>) <span class="fu">=</span> uncurryE f</code></pre>

<p>These properties arose while examining CCC terms produced by translation from lambda terms. See the <a href="https://github.com/conal/lambda-ccc/blob/master/src/LambdaCCC/CCC.hs" title="Source module"><code>LambdaCCC.CCC</code></a> source for more optimizations. I expect that others will arise with more experience.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/optimizing-cccs/feed</wfw:commentRss>
		<slash:comments>4</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Foptimizing-cccs&amp;language=en_GB&amp;category=text&amp;title=Optimizing+CCCs&amp;description=In+the+post+Overloading+lambda%2C+I+gave+a+translation+from+a+typed+lambda+calculus+into+the+vocabulary+of+cartesian+closed+categories+%28CCCs%29.+This+simple+translation+leads+to+unnecessarily+complex+expressions....&amp;tags=category%2CCCC%2Coverloading%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Overloading lambda</title>
		<link>http://conal.net/blog/posts/overloading-lambda</link>
		<comments>http://conal.net/blog/posts/overloading-lambda#comments</comments>
		<pubDate>Fri, 13 Sep 2013 16:31:40 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[category]]></category>
		<category><![CDATA[CCC]]></category>
		<category><![CDATA[overloading]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=533</guid>
		<description><![CDATA[Haskell’s type class facility is a powerful abstraction mechanism. Using it, we can overload multiple interpretations onto a single vocabulary, with each interpretation corresponding to a different type. The class laws constrain these interpretations and allow reasoning that is valid over all (law-abiding) instances—even ones not yet defined. As Haskell is a higher-order functional language [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Haskell’s type class facility is a powerful abstraction mechanism. Using it, we can overload multiple interpretations onto a single vocabulary, with each interpretation corresponding to a different type. The class laws constrain these interpretations and allow reasoning that is valid over all (law-abiding) instances—even ones not yet defined.</p>

<p>As Haskell is a higher-order functional language in the heritage of Church’s (typed) lambda calculus, it also supports “lambda abstraction”.</p>

<p>Sadly, however, these two forms of abstraction don’t go together. When we use the vocabulary of lambda abstraction (“<code>λ x → ⋯</code>”) and application (“<code>u v</code>”), our expressions can only be interpreted as one type (constructor), namely functions. (Note that I am not talking about parametric polymorphism, which is available with both lambda abstraction and type-class-style overloading.) Is it possible to overload lambda and application using type classes, or perhaps in the same spirit? The answer is yes, and there are some wonderful benefits of doing so. I’ll explain the how in this post and hint at the why, to be elaborated in future posts.</p>

<p><span id="more-533"></span></p>

<h3 id="generalizing-functions">Generalizing functions</h3>

<p>First, let’s look at a related question. Instead of generalized interpretation of the particular <em>vocabulary</em> of lambda abstraction and application, let’s look at re-expressing functions via an alternative vocabulary that can be generalized more readily. If you are into math or have been using Haskell for a while, you may already know where I’m going: the mathematical notion of a <em>category</em> (and the embodiment in the <code>Category</code> and <code>Arrow</code> type classes).</p>

<p>Much has been written about categories, both in the setting of math and of Haskell, so I’ll give only the most cursory summary here.</p>

<p>Recall that every function has two associated sets (or types, CPOs, etc) often referred to as the function’s “domain” and “range”. (As <a href="http://math.stackexchange.com/questions/59432/domain-co-domain-range-of-a-function">explained elsewhere</a>, the term “range” can be misleading.) Moreover, there are two general building blocks (among others) for functions, namely the identity function and composition of compatibly typed functions, satisfying the following properties:</p>

<ul>
<li><em>left identity:</em>  <code>id ∘ f ≡ f</code></li>
<li><em>right identity:</em>  <code>f ∘ id ≡ f</code></li>
<li><em>associativity:</em>  <code>h ∘ (g ∘ f) ≡ (h ∘ g) ∘ f</code></li>
</ul>

<p>Now we can separate these properties from the other specifics of functions. A <em>category</em> is something that has these properties but needn’t be function-like in other ways. Each category has <em>objects</em> (e.g., sets) and <em>morphisms/arrows</em> (e.g., functions), and two building blocks <code>id</code> and <code>(∘)</code> on compatible morphisms. Rather than “domain” and “range”, we usually use the terms (a) “domain” and “codomain” or (b) “source” and “target”.</p>

<p>Examples of categories include sets &amp; functions (as we’ve seen), restricted sets &amp; functions (e.g., vector spaces &amp; linear transformations), preorders, and any monoid (as a one-object category).</p>
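<p>To make the last example concrete, here is a minimal sketch of a monoid packaged as an instance of the <code>Category</code> class from <code>Control.Category</code>. The wrapper <code>MonoidCat</code> and the helper <code>unMonoidCat</code> are hypothetical names. Note one caveat: the type parameters <code>a</code> and <code>b</code> are phantoms, so the “one object” is a convention rather than something the types enforce:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Prelude hiding (id, (.))
import Control.Category

-- A monoid viewed as a category: every morphism is a monoid element.
-- The object parameters a and b are phantoms (ignored).
newtype MonoidCat m a b = MonoidCat m

unMonoidCat :: MonoidCat m a b -&gt; m
unMonoidCat (MonoidCat x) = x

-- Identity is mempty and composition is (&lt;&gt;), so the category laws
-- are exactly the monoid laws.
instance Monoid m =&gt; Category (MonoidCat m) where
  id = MonoidCat mempty
  MonoidCat x . MonoidCat y = MonoidCat (x &lt;&gt; y)

-- Composing two string "morphisms" concatenates them.
example :: String
example = unMonoidCat (MonoidCat "ab" . MonoidCat "cd" . id)</code></pre>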

<p>The notion of category is very general and correspondingly weak. By imposing so few constraints, it embraces a wide range of mathematical notions (including many appearing in programming) but gives correspondingly little leverage with which to define and prove more specific ideas and theorems. Thus we’ll often want additional structure, including products, coproducts (with products distributing over coproducts) and a notion of “exponential”, which is an object that represents a morphism. For the familiar terrain of sets/types and functions, products correspond to pairing, coproducts to sums (and choice), and exponentials to functions as things/values. (In programming, we often refer to exponentials as the types of “first class functions”. Some languages have them, and some don’t.) These aspects, together with associated laws, are called “cartesian”, “cocartesian”, and “closed”, respectively. Altogether, we have “bicartesian closed categories”, more succinctly called “biCCCs” (or “CCCs”, without the cocartesian requirement).</p>

<p>The <em>cartesian</em> vocabulary consists of a product operation on objects, <code>a × b</code>, plus three morphism building blocks:</p>

<ul>
<li><code>exl ∷ a × b ↝ a</code></li>
<li><code>exr ∷ a × b ↝ b</code></li>
<li><code>f △ g ∷ a ↝ b × c</code> where <code>f ∷ a ↝ b</code> and <code>g ∷ a ↝ c</code></li>
</ul>

<p>I’m using “<code>↝</code>” to refer to morphisms.</p>

<p>We’ll also want the dual notion of coproducts, <code>a + b</code>, with building blocks and laws exactly dual to products:</p>

<ul>
<li><code>inl ∷ a ↝ a + b</code></li>
<li><code>inr ∷ b ↝ a + b</code></li>
<li><code>f ▽ g ∷ a + b ↝ c</code> where <code>f ∷ a ↝ c</code> and <code>g ∷ b ↝ c</code></li>
</ul>

<p>You may have noticed that (a) <code>exl</code> and <code>exr</code> generalize <code>fst</code> and <code>snd</code>, (b) <code>inl</code> and <code>inr</code> generalize <code>Left</code> and <code>Right</code>, and (c) <code>(△)</code> and <code>(▽)</code> come from <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Arrow.html" title="Hackage documentation"><code>Control.Arrow</code></a>, where they’re called “<code>(&amp;&amp;&amp;)</code>” and “<code>(|||)</code>”. I took the names above from <a href="http://www.cs.ox.ac.uk/jeremy.gibbons/publications/acmmpc-calcfp.pdf" title="Paper by Jeremy Gibbons"><em>Calculating Functional Programs</em></a>, where <code>(△)</code> and <code>(▽)</code> are also called “fork” and “join”.</p>

<p>For product and coproduct laws, see <a href="http://www.cs.ox.ac.uk/jeremy.gibbons/publications/acmmpc-calcfp.pdf" title="Paper by Jeremy Gibbons"><em>Calculating Functional Programs</em></a> (pp 155–156) or <a href="http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.125" title="Paper by Erik Meijer, Maarten Fokkinga, and Ross Paterson"><em>Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire</em></a> (p 9).</p>

<p>The <em>closed</em> vocabulary consists of an exponential operation on objects, <code>a ⇨ b</code> (often written “<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msup><mi>b</mi><mi>a</mi></msup></mrow></math>”), plus three morphism building blocks:</p>

<ul>
<li><code>uncurry h ∷ a × b ↝ c</code> where <code>h ∷ a ↝ (b ⇨ c)</code></li>
<li><code>curry f ∷ a ↝ (b ⇨ c)</code> where <code>f ∷ a × b ↝ c</code></li>
<li><code>apply ∷ (a ⇨ b) × a ↝ b</code> (sometimes called “<code>eval</code>”)</li>
</ul>

<p>Again, there are laws associated with <code>exl</code>, <code>exr</code>, <code>(△)</code>, <code>inl</code>, <code>inr</code>, <code>(▽)</code>, and with <code>curry</code>, <code>uncurry</code>, and <code>apply</code>.</p>

<p>In reading the signatures above, the operators <code>×</code>, <code>+</code>, and <code>⇨</code> all bind more tightly than <code>↝</code>, and <code>(∘)</code> binds more tightly than <code>(△)</code> and <code>(▽)</code>.</p>

<p>Keep in mind the distinction between morphisms (“<code>↝</code>”) and exponentials (“<code>⇨</code>”). The latter is a sort of data/object representation of the former.</p>
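<p>One way this vocabulary might be rendered as Haskell type classes is sketched below. This is an illustration under simplifying assumptions, not the representation used later in the post or in lambda-ccc: products, coproducts, and exponentials are fixed to Haskell’s pairs, <code>Either</code>, and functions, and the class names and the ASCII method names <code>fork</code> and <code>join</code> (for <code>(△)</code> and <code>(▽)</code>) are choices made for this example:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Prelude hiding (id, (.), curry, uncurry)
import Control.Category

class Category k =&gt; Cartesian k where
  exl  :: k (a, b) a
  exr  :: k (a, b) b
  fork :: k a b -&gt; k a c -&gt; k a (b, c)        -- (△)

class Category k =&gt; Cocartesian k where
  inl  :: k a (Either a b)
  inr  :: k b (Either a b)
  join :: k a c -&gt; k b c -&gt; k (Either a b) c  -- (▽)

class Cartesian k =&gt; Closed k where
  apply   :: k (a -&gt; b, a) b
  curry   :: k (a, b) c -&gt; k a (b -&gt; c)
  uncurry :: k a (b -&gt; c) -&gt; k (a, b) c

-- The familiar interpretation: ordinary functions.
instance Cartesian (-&gt;) where
  exl = fst
  exr = snd
  fork f g x = (f x, g x)

instance Cocartesian (-&gt;) where
  inl  = Left
  inr  = Right
  join = either

instance Closed (-&gt;) where
  apply (f, x)     = f x
  curry f a b      = f (a, b)
  uncurry h (a, b) = h a b

-- A term written once against the vocabulary, usable in any instance.
swap :: Cartesian k =&gt; k (a, b) (b, a)
swap = fork exr exl</code></pre>

<p>With the function instance, <code>swap (1, 'x')</code> behaves like the usual pair swap, while the same definition remains available to any other law-abiding instance.</p>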

<h3 id="where-are-we-going">Where are we going?</h3>

<p>I suggested that the <em>vocabulary</em> of the lambda calculus—namely lambda abstraction and application—can be generalized beyond functions. Then I showed something else, which is that an <em>alternative</em> vocabulary (biCCC) that applies to functions can be overloaded beyond functions. Instead of overloading the lambda calculus notation, we could simply use the alternative algebraic notation of biCCCs. Unfortunately, doing so leads to rather ugly results. The lambda calculus is a much more human-friendly notation than the algebraic language of biCCC.</p>

<p>I’m not just wasting your time and mine, however; there is a way to combine the flexibility of biCCC with the friendliness of lambda calculus: <em>automatically translate from lambda calculus to biCCC form</em>. The discovery that typed lambda calculus can be interpreted in any CCC is due to Joachim Lambek. See pointers <a href="http://math.ucr.edu/home/baez/qg-fall2006/ccc.html">on John Baez’s blog</a>. (Coproducts do not arise in translation unless the source language has a construct like <code>if-then-else</code> or definition by cases with pattern matching.)</p>

<h3 id="overview-from-lambda-expressions-to-biccc">Overview: from lambda expressions to biCCC</h3>

<p>We’re going to need a few pieces to complete this story and have it be useful in a language like Haskell:</p>

<ul>
<li>a representation of lambda expressions,</li>
<li>a representation of biCCC expressions,</li>
<li>a translation of lambda expressions to biCCC, and</li>
<li>a translation of Haskell to lambda expressions.</li>
</ul>

<p>This last step (which is actually the first step in turning Haskell into biCCC) is already done by a typical compiler. We start with a syntactically rich language and desugar it into a much smaller lambda calculus. GHC in particular has a small language called “Core”, which is much smaller than the Haskell source language.</p>

<p>I originally intended to convert from Core directly to biCCC form, but I found it difficult to do correctly. Core is dynamically typed, so a type-correct Haskell program can manipulate Core in type-incorrect ways. In other words, a type-correct Haskell program can construct type-incorrect Core. Moreover, Core representations contain an enormous amount of type information, since all type inference has already been done and recorded, so getting all of that type information right is tedious and therefore error-prone. For just this reason, GHC includes an explicit type-checker, “<a href="http://www.haskell.org/ghc/docs/6.10.4/html/users_guide/options-debugging.html#checking-consistency">Core Lint</a>”, for catching type inconsistencies (but not their causes) after the fact. While Core Lint is much better than nothing, it is less helpful than static checking, which points to inconsistencies in the source code (of the Core-manipulation).</p>

<p>Because I want static checking of my source code for lambda-to-biCCC conversion, I defined my own alternative to Core, using a generalized algebraic data type (GADT). The first step of translation then is conversion from GHC Core into this GADT.</p>

<p>The source fragments I’ll show below are from the Github project <a href="https://github.com/conal/lambda-ccc" title="Github project">lambda-ccc</a>.</p>

<h3 id="a-typeful-lambda-calculus-representation">A typeful lambda calculus representation</h3>

<p>In Haskell, pair types are usually written “<code>(a,b)</code>”, sums as “<code>Either a b</code>”, and functions as “<code>a → b</code>”. For the categorical generalizations (products, coproducts, and exponentials), I’ll instead use the notation “<code>a × b</code>”, “<code>a + b</code>”, and “<code>a ⇨ b</code>”. (My blogging software typesets some operators differently from what you’ll see in the <a href="https://github.com/conal/lambda-ccc/blob/master/src/LambdaCCC/Ty.hs" title="Source module">source code</a>.)</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">infixl</span> <span class="dv">7</span> ×
<span class="kw">infixl</span> <span class="dv">6</span> <span class="fu">+</span>
<span class="kw">infixr</span> <span class="dv">1</span> ⇨</code></pre>

<p>For reasons to become clearer in future posts, I’ll want a typed representation of types. The data constructors are named to reflect the types they construct:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Ty</span> <span class="ot">∷</span> <span class="fu">*</span> <span class="ot">→</span> <span class="fu">*</span> <span class="kw">where</span>
  <span class="dt">Unit</span> <span class="ot">∷</span> <span class="dt">Ty</span> <span class="dt">Unit</span>
  (×)  <span class="ot">∷</span> <span class="dt">Ty</span> a <span class="ot">→</span> <span class="dt">Ty</span> b <span class="ot">→</span> <span class="dt">Ty</span> (a × b)
  (<span class="fu">+</span>)  <span class="ot">∷</span> <span class="dt">Ty</span> a <span class="ot">→</span> <span class="dt">Ty</span> b <span class="ot">→</span> <span class="dt">Ty</span> (a <span class="fu">+</span> b)
  (⇨)  <span class="ot">∷</span> <span class="dt">Ty</span> a <span class="ot">→</span> <span class="dt">Ty</span> b <span class="ot">→</span> <span class="dt">Ty</span> (a ⇨ b)</code></pre>

<p>Note that <code>Ty a</code> is a singleton or empty for every type <code>a</code>. I could instead use promoted data type constructors and <a href="http://www.cis.upenn.edu/~eir/packages/singletons/" title="Haskell library">singletons</a>.</p>
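<p>As a rough illustration of why a typed representation of types is handy, here is a minimal, self-contained sketch. It is not the <code>Ty</code> from lambda-ccc: the constructor names (<code>UnitT</code>, <code>PairT</code>, <code>SumT</code>, <code>FunT</code>) and the helpers <code>showTy</code>, <code>size</code>, and <code>boolT</code> are hypothetical, and the built-in unit, pair, <code>Either</code>, and function types stand in for <code>Unit</code>, <code>×</code>, <code>+</code>, and <code>⇨</code>. Counting inhabitants (ignoring <code>⊥</code>) also hints at where the names product, sum, and exponential come from:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs, UnicodeSyntax #-}

-- Hypothetical stand-in for Ty, indexed by ordinary Haskell types.
data Ty a where
  UnitT ∷ Ty ()
  PairT ∷ Ty a → Ty b → Ty (a, b)
  SumT  ∷ Ty a → Ty b → Ty (Either a b)
  FunT  ∷ Ty a → Ty b → Ty (a → b)

-- Render a type representation as ASCII text.
showTy ∷ Ty a → String
showTy UnitT       = "Unit"
showTy (PairT a b) = "(" ++ showTy a ++ " * " ++ showTy b ++ ")"
showTy (SumT a b)  = "(" ++ showTy a ++ " + " ++ showTy b ++ ")"
showTy (FunT a b)  = "(" ++ showTy a ++ " => " ++ showTy b ++ ")"

-- Number of total (bottom-free) inhabitants of the represented type.
size ∷ Ty a → Integer
size UnitT       = 1
size (PairT a b) = size a * size b
size (SumT a b)  = size a + size b
size (FunT a b)  = size b ^ size a

-- A two-element type, built as Unit + Unit.
boolT ∷ Ty (Either () ())
boolT = SumT UnitT UnitT

main ∷ IO ()
main = do
  let t = FunT boolT (PairT boolT UnitT)
  putStrLn (showTy t)   -- ((Unit + Unit) => ((Unit + Unit) * Unit))
  print (size t)        -- 4
</code></pre>

<p>Because the index of <code>Ty a</code> is determined by the constructors used, a function such as <code>size</code> can be written once by structural recursion and applied to any represented type.</p>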

<p>Next, names and typed variables:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">Name</span> <span class="fu">=</span> <span class="dt">String</span>
<span class="kw">data</span> <span class="dt">V</span> a <span class="fu">=</span> <span class="dt">V</span> <span class="dt">Name</span> (<span class="dt">Ty</span> a)</code></pre>

<p>Lambda expressions contain binding patterns. For now, we’ll have just the unit pattern, variables, and pairs of patterns:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Pat</span> <span class="ot">∷</span> <span class="fu">*</span> <span class="ot">→</span> <span class="fu">*</span> <span class="kw">where</span>
  <span class="dt">UnitPat</span> <span class="ot">∷</span> <span class="dt">Pat</span> <span class="dt">Unit</span>
  <span class="dt">VarPat</span>  <span class="ot">∷</span> <span class="dt">V</span> a <span class="ot">→</span> <span class="dt">Pat</span> a
  (<span class="fu">:#</span>)    <span class="ot">∷</span> <span class="dt">Pat</span> a <span class="ot">→</span> <span class="dt">Pat</span> b <span class="ot">→</span> <span class="dt">Pat</span> (a × b)</code></pre>

<p>Finally, we have lambda expressions, with constructors for variables, constants, application, and abstraction:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">infixl</span> <span class="dv">9</span> <span class="fu">:^</span>
<span class="kw">data</span> <span class="dt">E</span> <span class="ot">∷</span> <span class="fu">*</span> <span class="ot">→</span> <span class="fu">*</span> <span class="kw">where</span>
  <span class="dt">Var</span>    <span class="ot">∷</span> <span class="dt">V</span> a <span class="ot">→</span> <span class="dt">E</span> a
  <span class="dt">ConstE</span> <span class="ot">∷</span> <span class="dt">Prim</span> a <span class="ot">→</span> <span class="dt">Ty</span> a <span class="ot">→</span> <span class="dt">E</span> a
  (<span class="fu">:^</span>)   <span class="ot">∷</span> <span class="dt">E</span> (a ⇨ b) <span class="ot">→</span> <span class="dt">E</span> a <span class="ot">→</span> <span class="dt">E</span> b
  <span class="dt">Lam</span>    <span class="ot">∷</span> <span class="dt">Pat</span> a <span class="ot">→</span> <span class="dt">E</span> b <span class="ot">→</span> <span class="dt">E</span> (a ⇨ b)</code></pre>

<p>The <code>Prim</code> GADT contains typed primitives. The <code>ConstE</code> constructor accompanies a <code>Prim</code> with its specific type, since primitives can be polymorphic.</p>

<h3 id="a-typeful-biccc-representation">A typeful biCCC representation</h3>

<p>The data type <code>a ↣ b</code> contains biCCC expressions that represent morphisms from <code>a</code> to <code>b</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">data</span> (↣) <span class="ot">∷</span> <span class="fu">*</span> <span class="ot">→</span> <span class="fu">*</span> <span class="ot">→</span> <span class="fu">*</span> <span class="kw">where</span>
  <span class="co">-- Category</span>
  <span class="dt">Id</span>      <span class="ot">∷</span> a ↣ a
  (<span class="fu">:</span>∘)    <span class="ot">∷</span> (b ↣ c) <span class="ot">→</span> (a ↣ b) <span class="ot">→</span> (a ↣ c)
  <span class="co">-- Products</span>
  <span class="dt">Exl</span>     <span class="ot">∷</span> a × b ↣ a
  <span class="dt">Exr</span>     <span class="ot">∷</span> a × b ↣ b
  (<span class="fu">:</span>△)    <span class="ot">∷</span> (a ↣ b) <span class="ot">→</span> (a ↣ c) <span class="ot">→</span> (a ↣ b × c)
  <span class="co">-- Coproducts</span>
  <span class="dt">Inl</span>     <span class="ot">∷</span> a ↣ a <span class="fu">+</span> b
  <span class="dt">Inr</span>     <span class="ot">∷</span> b ↣ a <span class="fu">+</span> b
  (<span class="fu">:</span>▽)    <span class="ot">∷</span> (b ↣ a) <span class="ot">→</span> (c ↣ a) <span class="ot">→</span> (b <span class="fu">+</span> c ↣ a)
  <span class="co">-- Exponentials</span>
  <span class="dt">Apply</span>   <span class="ot">∷</span> (a ⇨ b) × a ↣ b
  <span class="dt">Curry</span>   <span class="ot">∷</span> (a × b ↣ c) <span class="ot">→</span> (a ↣ (b ⇨ c))
  <span class="dt">Uncurry</span> <span class="ot">∷</span> (a ↣ (b ⇨ c)) <span class="ot">→</span> (a × b ↣ c)
  <span class="co">-- Primitives</span>
  <span class="dt">Prim</span>    <span class="ot">∷</span> <span class="dt">Prim</span> (a <span class="ot">→</span> b) <span class="ot">→</span> (a ↣ b)
  <span class="dt">Const</span>   <span class="ot">∷</span> <span class="dt">Prim</span>       b  <span class="ot">→</span> (a ↣ b)</code></pre>

<p>The actual representation has some constraints on the type variables involved. I could have used type classes instead of a GADT here, except that the existing classes do not allow polymorphism constraints on the methods. The <code>ConstraintKinds</code> language extension allows instance-specific constraints, but I’ve been unable to work out the details in this case.</p>

<p>I’m not happy with the similarity of <code>Prim</code> and <code>Const</code>. Perhaps there’s a simpler formulation.</p>

<h3 id="lambda-to-ccc">Lambda to CCC</h3>

<p>We’ll always convert terms of the form <code>λ p → e</code>, and we’ll keep the pattern <code>p</code> and expression <code>e</code> separate:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert <span class="ot">∷</span> <span class="dt">Pat</span> a <span class="ot">→</span> <span class="dt">E</span> b <span class="ot">→</span> (a ↣ b)</code></pre>

<p>The pattern argument gets built up from patterns appearing in lambdas and serves as a variable binding “context”. To begin, we’ll strip the pattern off of a lambda, eta-expanding if necessary:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">toCCC <span class="ot">∷</span> <span class="dt">E</span> (a ⇨ b) <span class="ot">→</span> (a ↣ b)
toCCC (<span class="dt">Lam</span> p e) <span class="fu">=</span> convert p e
toCCC e <span class="fu">=</span> toCCC (etaExpand e)</code></pre>

<p>(We could instead begin with a dummy unit pattern/context, giving <code>toCCC</code> the type <code>E c → (() ↣ c)</code>.)</p>

<p>The conversion algorithm uses a collection of simple equivalences.</p>

<p>For constants, we have a simple equivalence:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">λ p <span class="ot">→</span> c ≡ <span class="fu">const</span> c</code></pre>

<p>Thus the implementation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert _ (<span class="dt">ConstE</span> o _) <span class="fu">=</span> <span class="dt">Const</span> o</code></pre>

<p>For applications, split the expression in two (repeating the context), compute the function and argument parts separately, combine with <code>(△)</code>, and then <code>apply</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">λ p <span class="ot">→</span> u v ≡ apply ∘ ((λ p <span class="ot">→</span> u) △ (λ p <span class="ot">→</span> v))</code></pre>

<p>The implementation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert p (u <span class="fu">:^</span> v) <span class="fu">=</span> <span class="dt">Apply</span> <span class="fu">:</span>∘ (convert p u <span class="fu">:</span>△ convert p v)</code></pre>

<p>For lambda expressions, simply curry:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">λ p <span class="ot">→</span> λ q <span class="ot">→</span> e  ≡ <span class="fu">curry</span> (λ (p,q) <span class="ot">→</span> e)</code></pre>

<p>Assume that there is no variable shadowing, so that <code>p</code> and <code>q</code> have no variables in common. The implementation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert p (<span class="dt">Lam</span> q e) <span class="fu">=</span> <span class="dt">Curry</span> (convert (p <span class="fu">:#</span> q) e)</code></pre>
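<p>The equivalences used so far (for constants, application, and abstraction) can be spot-checked in the familiar category of Haskell functions. The following is only a sketch under that assumption: <code>apply</code> and <code>fork</code> (standing in for <code>(△)</code>) are defined locally, and the particular choices of <code>c</code>, <code>u</code>, <code>v</code>, and <code>e</code> are arbitrary examples of mine. Agreement on a few inputs is evidence, not a proof:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE UnicodeSyntax #-}

-- Function-level versions of the CCC operations (names chosen for this sketch).
apply ∷ (a → b, a) → b
apply (f, a) = f a

fork ∷ (a → b) → (a → c) → (a → (b, c))   -- plays the role of (△)
fork f g a = (f a, g a)

-- Constants:  λ p → c  ≡  const c
lhsConst, rhsConst ∷ Int → String
lhsConst = \_ → "c"
rhsConst = const "c"

-- Application:  λ p → u v  ≡  apply ∘ ((λ p → u) △ (λ p → v))
-- with u = (λ x → x * p) and v = p + 1, both mentioning p.
lhsApp, rhsApp ∷ Int → Int
lhsApp = \p → (\x → x * p) (p + 1)
rhsApp = apply . fork (\p → (\x → x * p)) (\p → p + 1)

-- Abstraction:  λ p → λ q → e  ≡  curry (λ (p,q) → e), with e = p - q.
lhsLam, rhsLam ∷ Int → Int → Int
lhsLam = \p → \q → p - q
rhsLam = curry (\(p, q) → p - q)

-- Compare both sides of each equivalence on a handful of inputs.
main ∷ IO ()
main = print (and checks)
  where
    inputs = [-3 .. 3]
    checks =
      [ lhsConst p == rhsConst p | p ← inputs ] ++
      [ lhsApp p == rhsApp p | p ← inputs ] ++
      [ lhsLam p q == rhsLam p q | p ← inputs, q ← inputs ]
</code></pre>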

<p>Finally, we have to deal with variables. Given <code>λ p → v</code> for a pattern <code>p</code> and variable <code>v</code> appearing in <code>p</code>, either <code>v ≡ p</code>, or <code>p</code> is a pair pattern with <code>v</code> appearing in the left or the right part. To handle these three possibilities, appeal to the following equivalences:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">λ v <span class="ot">→</span> v     ≡ <span class="fu">id</span>
λ (p,q) <span class="ot">→</span> e ≡ (λ p <span class="ot">→</span> e) ∘ exl  <span class="co">-- if q not free in e</span>
λ (p,q) <span class="ot">→</span> e ≡ (λ q <span class="ot">→</span> e) ∘ exr  <span class="co">-- if p not free in e</span></code></pre>

<p>By a pattern not occurring freely, I mean that no variable in the pattern occurs freely.</p>

<p>These properties lead to an implementation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert (<span class="dt">VarPat</span> u) (<span class="dt">Var</span> v) <span class="fu">|</span> u ≡ v              <span class="fu">=</span> <span class="dt">Id</span>
convert (p <span class="fu">:#</span> q)   e       <span class="fu">|</span> <span class="fu">not</span> (q <span class="ot">`occurs`</span> e) <span class="fu">=</span> convert p e <span class="fu">:</span>∘ <span class="dt">Exl</span>
convert (p <span class="fu">:#</span> q)   e       <span class="fu">|</span> <span class="fu">not</span> (p <span class="ot">`occurs`</span> e) <span class="fu">=</span> convert q e <span class="fu">:</span>∘ <span class="dt">Exr</span></code></pre>

<p>There are two problems with this code. The first is a performance issue. The recursive <code>convert</code> calls will do considerable redundant work due to the recursive nature of <code>occurs</code>.</p>

<p>To fix this performance problem, handle only <code>λ p → v</code> (variables), and search through the pattern structure only once, returning a <code>Maybe (a ↣ b)</code>. The return value is <code>Nothing</code> when <code>v</code> does not occur in <code>p</code>.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert p (<span class="dt">Var</span> v) <span class="fu">=</span>
  fromMaybe (<span class="fu">error</span> (<span class="st">&quot;convert: unbound variable: &quot;</span> <span class="fu">++</span> <span class="fu">show</span> v)) <span class="fu">$</span>
  convertVar v p</code></pre>

<p>If a sub-pattern search succeeds, tack on the <code>(:∘ Exl)</code> or <code>(:∘ Exr)</code> using <code>(&lt;$&gt;)</code> (i.e., <code>fmap</code>). Backtrack using <code>mplus</code>.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convertVar <span class="ot">∷</span> <span class="ot">∀</span> b a<span class="fu">.</span> <span class="dt">V</span> b <span class="ot">→</span> <span class="dt">Pat</span> a <span class="ot">→</span> <span class="dt">Maybe</span> (a ↣ b)
convertVar u <span class="fu">=</span> conv
 <span class="kw">where</span>
   conv <span class="ot">∷</span> <span class="dt">Pat</span> c <span class="ot">→</span> <span class="dt">Maybe</span> (c ↣ b)
   conv (<span class="dt">VarPat</span> v) <span class="fu">|</span> u ≡ v    <span class="fu">=</span> <span class="kw">Just</span> <span class="dt">Id</span>
                   <span class="fu">|</span> <span class="fu">otherwise</span> <span class="fu">=</span> <span class="kw">Nothing</span>
   conv <span class="dt">UnitPat</span>  <span class="fu">=</span> <span class="kw">Nothing</span>
   conv (p <span class="fu">:#</span> q) <span class="fu">=</span> ((<span class="fu">:</span>∘ <span class="dt">Exr</span>) <span class="fu">&lt;$&gt;</span> conv q) <span class="ot">`mplus`</span> ((<span class="fu">:</span>∘ <span class="dt">Exl</span>) <span class="fu">&lt;$&gt;</span> conv p)</code></pre>

<p>(The explicit type quantification and the <code>ScopedTypeVariables</code> language extension relate the <code>b</code> in the signatures of <code>convertVar</code> and <code>conv</code>.) Note that we’ve solved the problem of redundant <code>occurs</code> testing, eliminating those tests altogether.</p>

<p>The second problem is more troubling: the definitions of <code>convert</code> for <code>Var</code> above do not type-check. Look again at the first try:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert <span class="ot">∷</span> <span class="dt">Pat</span> a <span class="ot">→</span> <span class="dt">E</span> b <span class="ot">→</span> (a ↣ b)
convert (<span class="dt">VarPat</span> u) (<span class="dt">Var</span> v) <span class="fu">|</span> u ≡ v <span class="fu">=</span> <span class="dt">Id</span></code></pre>

<p>The error message:</p>

<pre><code>Could not deduce (b ~ a)
...
Expected type: V a
  Actual type: V b
In the second argument of `(==)&#39;, namely `v&#39;
In the expression: u == v</code></pre>

<p>The bug here is that we cannot compare <code>u</code> and <code>v</code> for equality, because their types may differ. The definition of <code>convertVar</code> has a similar type error.</p>

<h3 id="taking-care-with-types">Taking care with types</h3>

<p>There’s a trick I’ve used in many libraries to handle this situation of wanting to compare for equality two values that may or may not have the same type. For equal values, don’t return simply <code>True</code>, but rather a proof that the types do indeed match. For unequal values, we simply fail to return an equality proof. Thus the comparison operation on <code>V</code> has the following type:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">varTyEq <span class="ot">∷</span> <span class="dt">V</span> a <span class="ot">→</span> <span class="dt">V</span> b <span class="ot">→</span> <span class="dt">Maybe</span> (a <span class="fu">:=:</span> b)</code></pre>

<p>where <code>a :=: b</code> is populated only by proofs that <code>a</code> and <code>b</code> are the same type.</p>

<p>The type of type equality proofs is defined in <a href="http://hackage.haskell.org/packages/archive/ty/0.1.4/doc/html/Data-Proof-EQ.html">Data.Proof.EQ</a> from the <a href="http://hackage.haskell.org/package/ty" title="Haskell package">ty</a> package:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">data</span> (<span class="fu">:=:</span>) <span class="ot">∷</span> <span class="fu">*</span> <span class="ot">→</span> <span class="fu">*</span> <span class="ot">→</span> <span class="fu">*</span> <span class="kw">where</span> <span class="dt">Refl</span> <span class="ot">∷</span> a <span class="fu">:=:</span> a</code></pre>

<p>The <code>Refl</code> constructor is named to suggest the axiom of reflexivity, which says that anything is equal to itself. There are other utilities for commutativity, associativity, and lifting of equality to type constructors.</p>

<p>In fact, this pattern comes up often enough that there’s a type class in the <a href="http://hackage.haskell.org/packages/archive/ty/0.1.4/doc/html/Data-IsTy.html">Data.IsTy</a> module of the <a href="http://hackage.haskell.org/package/ty" title="Haskell package">ty</a> package:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">class</span> <span class="dt">IsTy</span> f <span class="kw">where</span>
  tyEq <span class="ot">∷</span> f a <span class="ot">→</span> f b <span class="ot">→</span> <span class="dt">Maybe</span> (a <span class="fu">:=:</span> b)</code></pre>
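<p>As a small standalone illustration of the equality-proof idea (separate from the converter), here is a sketch of a typed lookup in a heterogeneous environment. To keep it self-contained I use <code>(:~:)</code> and <code>Refl</code> from <code>Data.Type.Equality</code> in base rather than the ty package, and a plain function <code>tyEq</code> rather than the <code>IsTy</code> class. The names <code>Binding</code>, <code>Bind</code>, <code>lookupVar</code>, and the two-constructor <code>Ty</code> are hypothetical:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs, TypeOperators, UnicodeSyntax #-}
import Data.Type.Equality ((:~:)(Refl))

-- A tiny universe of represented types, just for this sketch.
data Ty a where
  IntT  ∷ Ty Int
  BoolT ∷ Ty Bool

-- Succeeds with a proof exactly when the two representations match.
tyEq ∷ Ty a → Ty b → Maybe (a :~: b)
tyEq IntT  IntT  = Just Refl
tyEq BoolT BoolT = Just Refl
tyEq _     _     = Nothing

data V a = V String (Ty a)

-- A variable paired with a value of its type; the type is hidden.
data Binding where
  Bind ∷ V a → a → Binding

-- Find a binding with the same name and the same type.
lookupVar ∷ V a → [Binding] → Maybe a
lookupVar _ [] = Nothing
lookupVar u@(V m s) (Bind (V n t) x : rest)
  | m == n    = case tyEq t s of
                  Just Refl → Just x            -- the proof lets x be used at type a
                  Nothing   → lookupVar u rest
  | otherwise = lookupVar u rest

main ∷ IO ()
main = do
  let env = [Bind (V "n" IntT) 3, Bind (V "ok" BoolT) True]
  print (lookupVar (V "n"  IntT)  env)   -- Just 3
  print (lookupVar (V "ok" BoolT) env)   -- Just True
  print (lookupVar (V "n"  BoolT) env)   -- Nothing: same name, different type
</code></pre>

<p>Without the <code>Refl</code> match, returning <code>x</code> would be rejected, since the hidden type of the binding is unrelated to <code>a</code> as far as the type-checker knows.</p>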

<p>With this trick, we can fix our type-incorrect code above. Instead of</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert (<span class="dt">VarPat</span> u) (<span class="dt">Var</span> v) <span class="fu">|</span> u ≡ v <span class="fu">=</span> <span class="dt">Id</span></code></pre>

<p>define</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convert (<span class="dt">VarPat</span> u) (<span class="dt">Var</span> v) <span class="fu">|</span> <span class="kw">Just</span> <span class="dt">Refl</span> <span class="ot">←</span> u <span class="ot">`tyEq`</span> v <span class="fu">=</span> <span class="dt">Id</span></code></pre>

<p>During type-checking, GHC uses the guard (“<code>Just Refl ← u `tyEq` v</code>”) to deduce an additional <em>local</em> constraint to use in type-checking the right-hand side (here <code>Id</code>). That constraint (<code>a ~ b</code>) suffices to make the definition type-correct.</p>

<p>In the same way, we can fix the more efficient implementation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">convertVar <span class="ot">∷</span> <span class="ot">∀</span> b a<span class="fu">.</span> <span class="dt">V</span> b <span class="ot">→</span> <span class="dt">Pat</span> a <span class="ot">→</span> <span class="dt">Maybe</span> (a ↣ b)
convertVar u <span class="fu">=</span> conv
 <span class="kw">where</span>
   conv <span class="ot">∷</span> <span class="dt">Pat</span> c <span class="ot">→</span> <span class="dt">Maybe</span> (c ↣ b)
   conv (<span class="dt">VarPat</span> v) <span class="fu">|</span> <span class="kw">Just</span> <span class="dt">Refl</span> <span class="ot">←</span> v <span class="ot">`tyEq`</span> u <span class="fu">=</span> <span class="kw">Just</span> <span class="dt">Id</span>
                   <span class="fu">|</span> <span class="fu">otherwise</span>              <span class="fu">=</span> <span class="kw">Nothing</span>
   conv <span class="dt">UnitPat</span>  <span class="fu">=</span> <span class="kw">Nothing</span>
   conv (p <span class="fu">:#</span> q) <span class="fu">=</span> ((<span class="fu">:</span>∘ <span class="dt">Exr</span>) <span class="fu">&lt;$&gt;</span> conv q) <span class="ot">`mplus`</span> ((<span class="fu">:</span>∘ <span class="dt">Exl</span>) <span class="fu">&lt;$&gt;</span> conv p)</code></pre>
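<p>To see the pieces of this post working together, here is a compact, self-contained sketch of the converter. It is a simplification and not the lambda-ccc code: <code>CCC a b</code> stands in for <code>a ↣ b</code>, <code>(:.)</code> and <code>Fork</code> for <code>(:∘)</code> and <code>(:△)</code>, built-in pairs and functions for <code>×</code> and <code>⇨</code>, constants carry a plain Haskell value instead of a <code>Prim</code>, coproducts and <code>Uncurry</code> are omitted, <code>toCCC</code> does not eta-expand, and equality proofs come from <code>Data.Type.Equality</code> in base. The names <code>varEq</code>, <code>eval</code>, and <code>swapE</code> are mine:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs, TypeOperators, UnicodeSyntax #-}
import Control.Monad (mplus)
import Data.Maybe (fromMaybe)
import Data.Type.Equality ((:~:)(Refl))

infixr 9 :.
infixl 9 :^
infixr 5 :#

-- Type representations (Int added as a base type for the demo).
data Ty a where
  IntT  ∷ Ty Int
  PairT ∷ Ty a → Ty b → Ty (a, b)

tyEq ∷ Ty a → Ty b → Maybe (a :~: b)
tyEq IntT IntT = Just Refl
tyEq (PairT a b) (PairT c d) = do
  Refl ← tyEq a c
  Refl ← tyEq b d
  Just Refl
tyEq _ _ = Nothing

data V a = V String (Ty a)

-- Variables are equal when both name and type agree.
varEq ∷ V a → V b → Maybe (a :~: b)
varEq (V m s) (V n t) | m == n    = tyEq s t
                      | otherwise = Nothing

data Pat a where
  UnitPat ∷ Pat ()
  VarPat  ∷ V a → Pat a
  (:#)    ∷ Pat a → Pat b → Pat (a, b)

data E a where
  Var    ∷ V a → E a
  ConstE ∷ a → E a                  -- simplification: a plain Haskell value
  (:^)   ∷ E (a → b) → E a → E b
  Lam    ∷ Pat a → E b → E (a → b)

data CCC a b where
  Id    ∷ CCC a a
  (:.)  ∷ CCC b c → CCC a b → CCC a c
  Exl   ∷ CCC (a, b) a
  Exr   ∷ CCC (a, b) b
  Fork  ∷ CCC a b → CCC a c → CCC a (b, c)
  Apply ∷ CCC (a → b, a) b
  Curry ∷ CCC (a, b) c → CCC a (b → c)
  Const ∷ b → CCC a b

-- Interpret CCC terms as ordinary functions.
eval ∷ CCC a b → (a → b)
eval Id         = id
eval (g :. f)   = eval g . eval f
eval Exl        = fst
eval Exr        = snd
eval (Fork f g) = \a → (eval f a, eval g a)
eval Apply      = \(f, a) → f a
eval (Curry f)  = curry (eval f)
eval (Const b)  = const b

convert ∷ Pat a → E b → CCC a b
convert _ (ConstE c) = Const c
convert p (Var v)    = fromMaybe (error "convert: unbound variable") (convertVar v p)
convert p (u :^ v)   = Apply :. Fork (convert p u) (convert p v)
convert p (Lam q e)  = Curry (convert (p :# q) e)

-- Search the pattern once, building the projection path to the variable.
convertVar ∷ V b → Pat a → Maybe (CCC a b)
convertVar u (VarPat v) | Just Refl ← varEq v u = Just Id
                        | otherwise             = Nothing
convertVar _ UnitPat    = Nothing
convertVar u (p :# q)   =
  fmap (:. Exr) (convertVar u q) `mplus` fmap (:. Exl) (convertVar u p)

toCCC ∷ E (a → b) → CCC a b
toCCC (Lam p e) = convert p e
toCCC _         = error "toCCC: expected a lambda (eta-expansion omitted)"

-- λ ds → (λ (a,b) → (b,a)) ds
swapE ∷ E ((Int, Int) → (Int, Int))
swapE = Lam (VarPat ds) (Lam (VarPat a :# VarPat b) (ConstE (,) :^ Var b :^ Var a) :^ Var ds)
  where
    a  = V "a" IntT
    b  = V "b" IntT
    ds = V "ds" (PairT IntT IntT)

main ∷ IO ()
main = print (eval (toCCC swapE) (1, 2))   -- (2,1)
</code></pre>

<p>Interpreting the result with <code>eval</code> is only one choice; the point of building a <code>CCC</code> value first is that other interpretations can consume the same term.</p>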

<h3 id="example">Example</h3>

<p>To see how conversion works in practice, consider a simple swap function:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">swap (a,b) <span class="fu">=</span> (b,a)</code></pre>

<p>When reified (as explained in a future post), we get</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">λ ds <span class="ot">→</span> (λ (a,b) <span class="ot">→</span> (b,a)) ds</code></pre>

<p>Lambda expressions can be optimized at construction, in which case an <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>η</mi></mrow></math>-reduction would yield the simpler <code>λ (a,b) → (b,a)</code>. However, to make the translation more interesting, I’ll leave the lambda term unoptimized.</p>

<p>With the conversion algorithm given above, the (unoptimized) lambda term gets translated into the following:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ∘ (<span class="fu">curry</span> (apply ∘ (apply ∘ (<span class="fu">const</span> (,) △ (<span class="fu">id</span> ∘ exr) ∘ exr) △ (<span class="fu">id</span> ∘ exl) ∘ exr)) △ <span class="fu">id</span>)</code></pre>

<p>Reformatted with line breaks:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply
∘ ( <span class="fu">curry</span> (apply ∘ ( apply ∘ (<span class="fu">const</span> (,) △ (<span class="fu">id</span> ∘ exr) ∘ exr)
                   △ (<span class="fu">id</span> ∘ exl) ∘ exr) )
  △ <span class="fu">id</span> )</code></pre>

<p>If you squint, you may be able to see how this CCC expression relates to the lambda expression. The “<code>λ ds →</code>” got stripped initially. The remaining application “<code>(λ (a,b) → (b,a)) ds</code>” became <code>apply ∘ (⋯ △ ⋯)</code>, where the right “<code>⋯</code>” is <code>id</code>, which came from <code>ds</code>. The left “<code>⋯</code>” has a <code>curry</code> from the “<code>λ (a,b) →</code>” and two <code>apply</code>s from the curried application of <code>(,)</code> to <code>b</code> and <code>a</code>. The variables <code>b</code> and <code>a</code> become <code>(id ∘ exr) ∘ exr</code> and <code>(id ∘ exl) ∘ exr</code>, which are paths to <code>b</code> and <code>a</code> in the constructed binding pattern <code>(ds,(a,b))</code>.</p>

<p>I hope this example gives you a feeling for how the lambda-to-CCC translation works in practice, <em>and</em> for the complexity of the result. Fortunately, we can simplify the CCC terms as they’re constructed. For this example, as we’ll see in the next post, we get a much simpler result:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">exr △ exl</code></pre>

<p>This combination is common enough that it pretty-prints as</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">swapP</code></pre>

<p>when CCC desugaring is turned on. (The “<code>P</code>” suffix refers to “product”, to distinguish from coproduct swap.)</p>
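<p>As a sanity check, the unoptimized term and the simplified one can be compared in the category of Haskell functions. This sketch assumes local function-level definitions of <code>(∘)</code>, <code>(△)</code>, <code>exl</code>, <code>exr</code>, and <code>apply</code>, with fixities chosen so that <code>(∘)</code> binds more tightly than <code>(△)</code>; the names <code>verbose</code> and <code>simple</code> are mine. Agreement on sample inputs is evidence rather than proof:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE UnicodeSyntax #-}

infixr 9 ∘
infixr 3 △

(∘) ∷ (b → c) → (a → b) → (a → c)
(∘) = (.)

(△) ∷ (a → b) → (a → c) → (a → (b, c))
(f △ g) a = (f a, g a)

exl ∷ (a, b) → a
exl = fst

exr ∷ (a, b) → b
exr = snd

apply ∷ (a → b, a) → b
apply (f, a) = f a

-- The unoptimized translation, transcribed from the term shown above.
verbose ∷ (a, b) → (b, a)
verbose =
  apply ∘ (curry (apply ∘ (apply ∘ (const (,) △ (id ∘ exr) ∘ exr) △ (id ∘ exl) ∘ exr)) △ id)

-- The simplified form.
simple ∷ (a, b) → (b, a)
simple = exr △ exl

main ∷ IO ()
main = print [ verbose p == simple p | p ← [(1 ∷ Int, "one"), (2, "two")] ]
</code></pre>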

<h3 id="coming-up">Coming up</h3>

<p>I’ll close this blog post now to keep it digestible. Upcoming posts will address optimization of biCCC expressions, circuit generation and analysis as biCCCs, and the GHC plugin that handles conversion of Haskell code to biCCC form, among other topics.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/overloading-lambda/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Foverloading-lambda&amp;language=en_GB&amp;category=text&amp;title=Overloading+lambda&amp;description=Haskell%E2%80%99s+type+class+facility+is+a+powerful+abstraction+mechanism.+Using+it%2C+we+can+overload+multiple+interpretations+onto+a+single+vocabulary%2C+with+each+interpretation+corresponding+to+a+different+type.+The+class...&amp;tags=category%2CCCC%2Coverloading%2Cblog" type="text/html" />
	</item>
		<item>
		<title>From Haskell to hardware via cartesian closed categories</title>
		<link>http://conal.net/blog/posts/haskell-to-hardware-via-cccs</link>
		<comments>http://conal.net/blog/posts/haskell-to-hardware-via-cccs#comments</comments>
		<pubDate>Thu, 12 Sep 2013 23:20:44 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[category]]></category>
		<category><![CDATA[CCC]]></category>
		<category><![CDATA[compilation]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=523</guid>
		<description><![CDATA[Since fall of last year, I’ve been working at Tabula, a Silicon Valley start-up developing an innovative programmable hardware architecture called “Spacetime”, somewhat similar to an FPGA, but much more flexible and efficient. I met the founder, Steve Teig, at a Bay Area Haskell Hackathon in February of 2011. He described his Spacetime architecture, which [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><!-- references --></p>

<p><!-- teaser --></p>

<p>Since fall of last year, I’ve been working at <a href="http://www.tabula.com">Tabula</a>, a Silicon Valley start-up developing an innovative programmable hardware architecture called “Spacetime”, somewhat similar to an FPGA, but much more flexible and efficient. I met the founder, Steve Teig, at a Bay Area Haskell Hackathon in February of 2011. He described his Spacetime architecture, which is based on the geometry of the same name, developed by Hermann Minkowski to elegantly capture Einstein’s theory of special relativity. Within the first 30 seconds or so of hearing what Steve was up to, I knew I wanted to help.</p>

<p>The vision Steve shared with me included not only a better alternative for <em>hardware</em> designers (programmed in hardware languages like Verilog and VHDL), but also a platform for massively parallel execution of <em>software</em> written in a purely functional language. Lately, I’ve been working mainly on this latter aspect, and specifically on the problem of how to compile Haskell. Our plan is to develop the Haskell compiler openly and encourage collaboration. If anything you see in this blog series interests you, and especially if you have advice or you’d like to collaborate on the project, please let me know.</p>

<p>In my next series of blog posts, I’ll describe some of the technical ideas I’ve been working with for compiling Haskell for massively parallel execution. For now, I want to introduce a central idea I’m using to approach the problem.</p>

<p><span id="more-523"></span></p>

<h3 id="lambda-calculus-and-cartesian-closed-categories">Lambda calculus and cartesian closed categories</h3>

<p>I’m used to thinking of the typed lambda calculi as languages for describing functions and other mathematical values. For instance, if the type of an expression <code>e</code> is <code>Bool → Bool</code>, then the meaning of <code>e</code> is a function from Booleans to Booleans. (In non-strict pure languages like Haskell, both Boolean types include <code>⊥</code>. In hypothetically pure strict languages, the range is extended to include <code>⊥</code>, but the domain isn’t.)</p>

<p>However, there are other ways to interpret typed lambda-calculi.</p>

<p>You may have heard of “cartesian closed categories” (CCCs). CCC is an abstraction having a small vocabulary with associated laws:</p>

<ul>
<li>The “category” part means we have a notion of “morphisms” (or “arrows”) each having a domain and codomain “object”. There is an identity morphism for each object and an associative composition operator. If this description of morphisms and objects sounds like functions and types (or sets), it’s because functions and types are one example, with <code>id</code> and <code>(∘)</code>.</li>
<li>The “cartesian” part means that we have products, with projection functions and an operator to combine two functions into a pair-producing function. For Haskell functions, these operations are <code>fst</code> and <code>snd</code>, together with <code>(&amp;&amp;&amp;)</code> from <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Arrow.html" title="Hackage documentation"><code>Control.Arrow</code></a>.</li>
<li>The “closed” part means that we have a way to represent morphisms via objects, referred to as “exponentials”. The corresponding operations are <code>curry</code>, <code>uncurry</code>, and <code>apply</code>. Since Haskell is a higher-order language, these exponential objects are simply (first class) functions.</li>
</ul>
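<p>The vocabulary in the list above can be sketched as three small type classes, with Haskell functions as one instance. This is only an illustration under strong simplifying assumptions: products are fixed to Haskell pairs and exponentials to Haskell functions, which a more general formulation would not do, and the names <code>idC</code>, <code>curryC</code>, <code>uncurryC</code>, <code>swapP</code>, and <code>applyViaUncurry</code> are hypothetical:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE UnicodeSyntax #-}

infixr 9 ∘
infixr 3 △

class Category k where
  idC ∷ k a a
  (∘) ∷ k b c → k a b → k a c

-- Simplification: products are always Haskell pairs.
class Category k ⇒ Cartesian k where
  exl ∷ k (a, b) a
  exr ∷ k (a, b) b
  (△) ∷ k a b → k a c → k a (b, c)

-- Simplification: exponentials are always Haskell functions.
class Cartesian k ⇒ Closed k where
  apply    ∷ k (a → b, a) b
  curryC   ∷ k (a, b) c → k a (b → c)
  uncurryC ∷ k a (b → c) → k (a, b) c

instance Category (→) where
  idC = id
  (∘) = (.)

instance Cartesian (→) where
  exl = fst
  exr = snd
  (f △ g) a = (f a, g a)

instance Closed (→) where
  apply (f, a) = f a
  curryC   = curry
  uncurryC = uncurry

-- Written against the vocabulary only, so it works in any instance.
swapP ∷ Cartesian k ⇒ k (a, b) (b, a)
swapP = exr △ exl

-- Application recovered from uncurrying the identity morphism.
applyViaUncurry ∷ Closed k ⇒ k (a → b, a) b
applyViaUncurry = uncurryC idC

main ∷ IO ()
main = do
  print (swapP (1 ∷ Int, True))                -- (True,1)
  print (applyViaUncurry ((+ 1), 41 ∷ Int))    -- 42
</code></pre>

<p>Definitions such as <code>swapP</code> mention only the class methods, so any other instance of the classes would reinterpret them without change.</p>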

<p>A wonderful thing about the CCC interface is that it suffices to translate any lambda expression, as discovered by Joachim Lambek. In other words, lambda expressions can be systematically translated into the CCC vocabulary. Any (law-abiding) interpretation of that vocabulary is thus an interpretation of the lambda calculus.</p>

<p>Besides intellectual curiosity, why might one care about interpreting lambda expressions in terms of CCCs other than the one we usually think of for functional programs? I got interested because I’ve been thinking about how to compile Haskell programs to “circuits”, both the standard static kind and more dynamic variants. Since Haskell is a typed lambda calculus, if we can formulate circuits as a CCC, we’ll have our Haskell-to-circuit compiler. Other interpretations enable analysis of timing and demand propagation (including strictness).</p>

<h3 id="some-future-topics">Some future topics</h3>

<ul>
<li>Converting lambda expressions to CCC form.</li>
<li>Optimizing CCC expressions.</li>
<li>Plugging into GHC, to convert from Haskell source to CCC.</li>
<li>Applications of this translation, including the following:
<ul>
<li>Circuits</li>
<li>Timing analysis</li>
<li>Strictness/demand analysis</li>
<li>Type simplification (normalization)</li>
</ul></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/haskell-to-hardware-via-cccs/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fhaskell-to-hardware-via-cccs&amp;language=en_GB&amp;category=text&amp;title=From+Haskell+to+hardware+via+cartesian+closed+categories&amp;description=Since+fall+of+last+year%2C+I%E2%80%99ve+been+working+at+Tabula%2C+a+Silicon+Valley+start-up+developing+an+innovative+programmable+hardware+architecture+called+%E2%80%9CSpacetime%E2%80%9D%2C+somewhat+similar+to+an+FPGA%2C+but+much+more...&amp;tags=category%2CCCC%2Ccompilation%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Reimagining matrices</title>
		<link>http://conal.net/blog/posts/reimagining-matrices</link>
		<comments>http://conal.net/blog/posts/reimagining-matrices#comments</comments>
		<pubDate>Mon, 17 Dec 2012 02:45:42 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[category]]></category>
		<category><![CDATA[denotational design]]></category>
		<category><![CDATA[linear algebra]]></category>
		<category><![CDATA[type class morphism]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=503</guid>
		<description><![CDATA[The function of the imagination is not to make strange things settled, so much as to make settled things strange. - G.K. Chesterton Why is matrix multiplication defined so very differently from matrix addition? If we didn’t know these procedures, could we derive them from first principles? What might those principles be? This post gives a simple semantic [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- LaTeX macros -->

<!-- teaser -->

<div class=flushright>
<em>The function of the imagination is not<br />to make strange things settled, so much as<br />to make settled things strange.</em><br />- G.K. Chesterton
</div>

<p>Why is matrix multiplication defined so very differently from matrix addition? If we didn’t know these procedures, could we derive them from first principles? What might those principles be?</p>

<p>This post gives a simple semantic model for matrices and then uses it to systematically <em>derive</em> the implementations that we call matrix addition and multiplication. The development illustrates what I call “denotational design”, particularly with type class morphisms. On the way, I give a somewhat unusual formulation of matrices and accompanying definition of matrix “multiplication”.</p>

<p>For more details, see the <a href="https://github.com/conal/linear-map-gadt" title="github repository">linear-map-gadt</a> source code.</p>

<p><strong>Edits:</strong></p>

<ul>
<li>2012–12–17: Replaced lost <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>B</mi></mrow></math> entries in description of matrix addition. Thanks to Travis Cardwell.</li>
<li>2012–12–18: Added note about math/browser compatibility.</li>
</ul>

<p><strong>Note:</strong> I’m using MathML for the math below, which appears to work well on Firefox but on neither Safari nor Chrome. I use Pandoc to generate the HTML+MathML from markdown+lhs+LaTeX. There’s probably a workaround using different Pandoc settings and requiring some tweaks to my WordPress installation. If anyone knows how (especially the WordPress end), I’d appreciate some pointers.</p>

<p><span id="more-503"></span></p>

<h3 id="matrices">Matrices</h3>

<p>For now, I’ll write matrices in the usual form: <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>A</mi><mrow><mn>1</mn><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd><mo>⋱</mo></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>A</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow></mrow></math></p>

<h4 id="addition">Addition</h4>

<p>To add two matrices, we add their corresponding components. If <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>A</mi><mo>=</mo><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>A</mi><mn>11</mn></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd><mo>⋱</mo></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>A</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow><mspace width="0.167em"></mspace><mspace width="0.167em"></mspace><mrow><mtext mathvariant="normal">and </mtext><mspace width="0.333em"></mspace></mrow><mi>B</mi><mo>=</mo><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>B</mi><mn>11</mn></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>B</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd><mo>⋱</mo></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>B</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>B</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow><mo>,</mo></mrow></math> then <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>A</mi><mo>+</mo><mi>B</mi><mo>=</mo><mrow><mo 
stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>A</mi><mn>11</mn></msub><mo>+</mo><msub><mi>B</mi><mn>11</mn></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub><mo>+</mo><msub><mi>B</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd><mo>⋱</mo></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>A</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub><mo>+</mo><msub><mi>B</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub><mo>+</mo><msub><mi>B</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow><mo>.</mo></mrow></math> More succinctly, <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>A</mi><mo>+</mo><mi>B</mi><msub><mo stretchy="false">)</mo><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>=</mo><msub><mi>A</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>+</mo><msub><mi>B</mi><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>.</mo></mrow></math></p>

<h4 id="multiplication">Multiplication</h4>

<p>Multiplication, on the other hand, works quite differently. If <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>A</mi><mo>=</mo><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>A</mi><mn>11</mn></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd><mo>⋱</mo></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>A</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow><mspace width="0.167em"></mspace><mspace width="0.167em"></mspace><mrow><mtext mathvariant="normal">and </mtext><mspace width="0.333em"></mspace></mrow><mi>B</mi><mo>=</mo><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>B</mi><mn>11</mn></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>B</mi><mrow><mn>1</mn><mi>p</mi></mrow></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd><mo>⋱</mo></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>B</mi><mrow><mi>m</mi><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>B</mi><mrow><mi>m</mi><mi>p</mi></mrow></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow><mo>,</mo></mrow></math> then <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>A</mi><mo>∙</mo><mi>B</mi><msub><mo stretchy="false">)</mo><mrow><mi>i</mi><mi>j</mi></mrow></msub><mo>=</mo><munderover><mo>∑</mo><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></munderover><msub><mi>A</mi><mrow><mi>i</mi><mi>k</mi></mrow></msub><mo>⋅</mo><msub><mi>B</mi><mrow><mi>k</mi><mi>j</mi></mrow></msub><mo>.</mo></mrow></math> This time, we form the dot product of each <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>A</mi></mrow></math> row and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>B</mi></mrow></math> column.</p>

<p>Why are these two matrix operations defined so differently? Perhaps these two operations are <em>implementations</em> of more fundamental <em>specifications</em>. If so, then making those specifications explicit could lead us to clear and compelling explanations of matrix addition and multiplication.</p>

<h4 id="transforming-vectors">Transforming vectors</h4>

<p>Simplifying from matrix multiplication, we have transformation of a vector by a matrix. If <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>A</mi><mo>=</mo><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>A</mi><mn>11</mn></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd><mo>⋱</mo></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>A</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub></mtd><mtd><mo>⋯</mo></mtd><mtd><msub><mi>A</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow><mspace width="0.167em"></mspace><mspace width="0.167em"></mspace><mrow><mtext mathvariant="normal">and </mtext><mspace width="0.333em"></mspace></mrow><mi>x</mi><mo>=</mo><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>x</mi><mn>1</mn></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>x</mi><mi>m</mi></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow><mo>,</mo></mrow></math> then <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>A</mi><mo>⋅</mo><mi>x</mi><mo>=</mo><mrow><mo stretchy="true">(</mo><mtable><mtr><mtd><msub><mi>A</mi><mrow><mn>1</mn><mn>1</mn></mrow></msub><mo>⋅</mo><msub><mi>x</mi><mn>1</mn></msub></mtd><mtd><mo>+</mo></mtd><mtd><mo>⋯</mo></mtd><mtd><mo>+</mo></mtd><mtd><msub><mi>A</mi><mrow><mn>1</mn><mi>m</mi></mrow></msub><mo>⋅</mo><msub><mi>x</mi><mi>m</mi></msub></mtd></mtr><mtr><mtd><mo>⋮</mo></mtd><mtd></mtd><mtd><mo>⋱</mo></mtd><mtd></mtd><mtd><mo>⋮</mo></mtd></mtr><mtr><mtd><msub><mi>A</mi><mrow><mi>n</mi><mn>1</mn></mrow></msub><mo>⋅</mo><msub><mi>x</mi><mn>1</mn></msub></mtd><mtd><mo>+</mo></mtd><mtd><mo>⋯</mo></mtd><mtd><mo>+</mo></mtd><mtd><msub><mi>A</mi><mrow><mi>n</mi><mi>m</mi></mrow></msub><mo>⋅</mo><msub><mi>x</mi><mi>m</mi></msub></mtd></mtr></mtable><mo stretchy="true">)</mo></mrow></mrow></math> More succinctly, <math 
display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>A</mi><mo>⋅</mo><mi>x</mi><msub><mo stretchy="false">)</mo><mi>i</mi></msub><mo>=</mo><munderover><mo>∑</mo><mrow><mi>k</mi><mo>=</mo><mn>1</mn></mrow><mi>m</mi></munderover><msub><mi>A</mi><mrow><mi>i</mi><mi>k</mi></mrow></msub><mo>⋅</mo><msub><mi>x</mi><mi>k</mi></msub><mo>.</mo></mrow></math></p>

<h3 id="whats-it-all-about">What’s it all about?</h3>

<p>We can interpret matrices <em>as</em> transformations. Matrix addition then <em>adds</em> transformations:</p>

<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>A</mi><mo>+</mo><mi>B</mi><mo stretchy="false">)</mo><mspace width="0.167em"></mspace><mi>x</mi><mo>=</mo><mi>A</mi><mspace width="0.167em"></mspace><mi>x</mi><mo>+</mo><mi>B</mi><mspace width="0.167em"></mspace><mi>x</mi></mrow></math></p>

<p>Matrix “multiplication” <em>composes</em> transformations:</p>

<p><math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>A</mi><mo>∙</mo><mi>B</mi><mo stretchy="false">)</mo><mspace width="0.167em"></mspace><mi>x</mi><mo>=</mo><mi>A</mi><mspace width="0.167em"></mspace><mo stretchy="false">(</mo><mi>B</mi><mspace width="0.167em"></mspace><mi>x</mi><mo stretchy="false">)</mo></mrow></math></p>
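<p>These two equations can be checked on a small example. The following is a minimal sketch that assumes the conventional list-of-rows representation from the sections above. The names <code>Vector</code>, <code>Matrix</code>, <code>addV</code>, <code>addM</code>, <code>applyM</code>, and <code>mulM</code> are made up for this illustration, and the entries are small whole numbers so that comparing <code>Double</code> values with <code>(==)</code> is exact:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Data.List (transpose)

type Vector = [Double]
type Matrix = [[Double]]   -- a list of rows

addV :: Vector -&gt; Vector -&gt; Vector
addV = zipWith (+)

-- Componentwise matrix addition.
addM :: Matrix -&gt; Matrix -&gt; Matrix
addM = zipWith addV

-- Transform a vector: dot each row of the matrix with the vector.
applyM :: Matrix -&gt; Vector -&gt; Vector
applyM a x = [ sum (zipWith (*) row x) | row &lt;- a ]

-- Matrix product: dot each row of a with each column of b.
mulM :: Matrix -&gt; Matrix -&gt; Matrix
mulM a b = [ [ sum (zipWith (*) row col) | col &lt;- transpose b ] | row &lt;- a ]

main :: IO ()
main = do
  let a = [[1, 2], [3, 4]]
      b = [[0, 1], [1, 1]]
      x = [5, 7]
  -- Addition adds transformations.
  print (applyM (addM a b) x == addV (applyM a x) (applyM b x))   -- True
  -- "Multiplication" composes transformations.
  print (applyM (mulM a b) x == applyM a (applyM b x))            -- True</code></pre>

<p>One example is of course not a proof, but it shows concretely what the two equations claim about <code>addM</code> and <code>mulM</code>.</p>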

<p>What kinds of transformations?</p>

<h4 id="linear-transformations">Linear transformations</h4>

<p>Matrices represent <em>linear</em> transformations. To say that a transformation (or “function” or “map”) <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>f</mi></mrow></math> is “linear” means that <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>f</mi></mrow></math> preserves the structure of addition and scalar multiplication. In other words, <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mtable><mtr><mtd columnalign="right"><mi>f</mi><mspace width="0.167em"></mspace><mspace width="0.167em"></mspace><mo stretchy="false">(</mo><mi>x</mi><mo>+</mo><mi>y</mi><mo stretchy="false">)</mo></mtd><mtd columnalign="center"><mo>=</mo></mtd><mtd columnalign="left"><mi>f</mi><mspace width="0.167em"></mspace><mi>x</mi><mo>+</mo><mi>f</mi><mspace width="0.167em"></mspace><mi>y</mi></mtd></mtr><mtr><mtd columnalign="right"><mi>f</mi><mspace width="0.167em"></mspace><mspace width="0.167em"></mspace><mo stretchy="false">(</mo><mi>c</mi><mo>⋅</mo><mi>x</mi><mo stretchy="false">)</mo></mtd><mtd columnalign="center"><mo>=</mo></mtd><mtd columnalign="left"><mi>c</mi><mo>⋅</mo><mi>f</mi><mspace width="0.167em"></mspace><mi>x</mi></mtd></mtr></mtable></mrow></math> Equivalently, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>f</mi></mrow></math> preserves all <em>linear combinations</em>: <math display="block" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>f</mi><mspace width="0.167em"></mspace><mo stretchy="false">(</mo><msub><mi>c</mi><mn>1</mn></msub><mo>⋅</mo><msub><mi>x</mi><mn>1</mn></msub><mo>+</mo><mo>⋯</mo><mo>+</mo><msub><mi>c</mi><mi>m</mi></msub><mo>⋅</mo><msub><mi>x</mi><mi>m</mi></msub><mo stretchy="false">)</mo><mo>=</mo><msub><mi>c</mi><mn>1</mn></msub><mo>⋅</mo><mi>f</mi><mspace width="0.167em"></mspace><msub><mi>x</mi><mn>1</mn></msub><mo>+</mo><mo>⋯</mo><mo>+</mo><msub><mi>c</mi><mi>m</mi></msub><mo>⋅</mo><mi>f</mi><mspace 
width="0.167em"></mspace><msub><mi>x</mi><mi>m</mi></msub></mrow></math></p>

<p>What does it mean to say that “matrices represent linear transformations”? As we saw in the previous section, we can use a matrix to transform a vector. Our semantic function will be exactly this use, i.e., the <em>meaning</em> of a matrix is a function (map) from vectors to vectors. Moreover, these functions will satisfy the linearity properties above.</p>

<h4 id="representation">Representation</h4>

<p>For simplicity, I’m going to structure matrices in an unconventional way. Instead of a rectangular arrangement of numbers, use the following generalized algebraic data type (GADT):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">data</span> a ⊸ b <span class="kw">where</span>
  <span class="dt">Dot</span>   <span class="ot">∷</span> <span class="dt">InnerSpace</span> b <span class="ot">⇒</span>
          b <span class="ot">→</span> (b ⊸ <span class="dt">Scalar</span> b)
  (<span class="fu">:&amp;&amp;</span>) <span class="ot">∷</span> <span class="dt">VS3</span> a c d <span class="ot">⇒</span>  <span class="co">-- vector spaces with same scalar field</span>
          (a ⊸ c) <span class="ot">→</span> (a ⊸ d) <span class="ot">→</span> (a ⊸ c × d)</code></pre>

<p>I’m using the notation “<code>c × d</code>” in place of the usual “<code>(c,d)</code>”. Precedences are such that “<code>×</code>” binds more tightly than “<code>⊸</code>”, which binds more tightly than “<code>→</code>”.</p>

<p>This definition builds on the <a href="http://hackage.haskell.org/packages/archive/vector-space/latest/doc/html/Data-VectorSpace.html#t:VectorSpace" title="Hackage documentation"><code>VectorSpace</code></a> class, with its associated <code>Scalar</code> type and <a href="http://hackage.haskell.org/packages/archive/vector-space/latest/doc/html/Data-VectorSpace.html#t:InnerSpace" title="Hackage documentation"><code>InnerSpace</code></a> subclass. Using <code>VectorSpace</code> is overkill for linear maps. It suffices to use <a href="http://en.wikipedia.org/wiki/Module_%28mathematics%29" title="Wikipedia entry">module</a>s over <a href="http://en.wikipedia.org/wiki/Semiring" title="Wikipedia entry">semiring</a>s, which means that we don’t assume multiplicative or additive inverses. The more general setting enables many more useful applications than vector spaces do, some of which I will describe in future posts.</p>

<p>The idea here is that a linear map results in either (a) a scalar, in which case it’s equivalent to <code>dot v</code> (partially applied dot product) for some <code>v</code>, or (b) a product, in which case it can be decomposed into two linear maps with simpler range types. Each row in a conventional matrix corresponds to <code>Dot v</code> for some vector <code>v</code>, and the stacking of rows corresponds to nested applications of <code>(:&amp;&amp;)</code>.</p>

<h4 id="semantics">Semantics</h4>

<p>The semantic function, <code>apply</code>, interprets a representation of a linear map as a function (satisfying linearity):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply <span class="ot">∷</span> (a ⊸ b) <span class="ot">→</span> (a <span class="ot">→</span> b)
apply (<span class="dt">Dot</span> b)   <span class="fu">=</span> dot b
apply (f <span class="fu">:&amp;&amp;</span> g) <span class="fu">=</span> apply f <span class="fu">&amp;&amp;&amp;</span> apply g</code></pre>

<p>where <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Arrow.html#v:-38--38--38-" title="Hackage documentation"><code>(&amp;&amp;&amp;)</code></a> is from <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Control-Arrow.html" title="Hackage documentation"><code>Control.Arrow</code></a>.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(<span class="fu">&amp;&amp;&amp;</span>) <span class="ot">∷</span> <span class="dt">Arrow</span> (↝) <span class="ot">⇒</span> (a ↝ b) <span class="ot">→</span> (a ↝ c) <span class="ot">→</span> (a ↝ (b,c))</code></pre>

<p>For functions,</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(f <span class="fu">&amp;&amp;&amp;</span> g) a <span class="fu">=</span> (f a, g a)</code></pre>
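<p>To see the representation and its semantics in action, here is a minimal self-contained sketch that depends only on <code>base</code>. It makes some simplifying assumptions that are not part of the definitions above: the scalar type is fixed to <code>Double</code>, a small hypothetical class <code>Vec</code> stands in for <code>InnerSpace</code>, the <code>VS3</code> constraint is dropped, and the type is written <code>a :-* b</code> in ASCII instead of <code>a ⊸ b</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs, TypeOperators #-}
import Control.Arrow ((&amp;&amp;&amp;))

-- Hypothetical stand-in for InnerSpace, with scalars fixed to Double.
class Vec v where
  dot :: v -&gt; v -&gt; Double

instance Vec Double where
  dot = (*)

instance (Vec a, Vec b) =&gt; Vec (a, b) where
  dot (a, b) (c, d) = dot a c + dot b d

infixr 3 :&amp;&amp;

-- Linear maps from a to b.
data a :-* b where
  Dot   :: Vec b =&gt; b -&gt; b :-* Double
  (:&amp;&amp;) :: (a :-* c) -&gt; (a :-* d) -&gt; a :-* (c, d)

-- The semantic function.
apply :: (a :-* b) -&gt; (a -&gt; b)
apply (Dot b)   = dot b
apply (f :&amp;&amp; g) = apply f &amp;&amp;&amp; apply g

-- The 2x3 matrix with rows (1 2 3) and (4 5 6): one Dot per row.
m :: (Double, (Double, Double)) :-* (Double, Double)
m = Dot (1, (2, 3)) :&amp;&amp; Dot (4, (5, 6))

main :: IO ()
main = print (apply m (1, (1, 1)))   -- prints (6.0,15.0)</code></pre>

<p>Each component of the result is the dot product of one row with the argument vector, matching the conventional matrix-times-vector formula.</p>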

<h3 id="functions-linearity-and-multilinearity">Functions, linearity, and multilinearity</h3>

<p>Functions form a vector space, with scaling and addition defined “pointwise”. Instances from the <a href="http://hackage.haskell.org/package/vector-space" title="Hackage package">vector-space</a> package:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">AdditiveGroup</span> v <span class="ot">⇒</span> <span class="dt">AdditiveGroup</span> (a <span class="ot">→</span> v) <span class="kw">where</span>
  zeroV   <span class="fu">=</span> pure   zeroV
  (<span class="fu">^+^</span>)   <span class="fu">=</span> liftA2 (<span class="fu">^+^</span>)
  negateV <span class="fu">=</span> <span class="fu">fmap</span>   negateV

<span class="kw">instance</span> <span class="dt">VectorSpace</span> v <span class="ot">⇒</span> <span class="dt">VectorSpace</span> (a <span class="ot">→</span> v) <span class="kw">where</span>
  <span class="kw">type</span> <span class="dt">Scalar</span> (a <span class="ot">→</span> v) <span class="fu">=</span> <span class="dt">Scalar</span> v
  (<span class="fu">*^</span>) s <span class="fu">=</span> <span class="fu">fmap</span> (s <span class="fu">*^</span>)</code></pre>

<p>I wrote the definitions in this form to fit a template for applicative functors in general. Inlining the definitions of <code>pure</code>, <code>liftA2</code>, and <code>fmap</code> on functions, we get the following equivalent instances:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">AdditiveGroup</span> v <span class="ot">⇒</span> <span class="dt">AdditiveGroup</span> (a <span class="ot">→</span> v) <span class="kw">where</span>
  zeroV     <span class="fu">=</span> λ _ <span class="ot">→</span> zeroV
  f <span class="fu">^+^</span> g   <span class="fu">=</span> λ a <span class="ot">→</span> f a <span class="fu">^+^</span> g a
  negateV f <span class="fu">=</span> λ a <span class="ot">→</span> negateV (f a)

<span class="kw">instance</span> <span class="dt">VectorSpace</span> v <span class="ot">⇒</span> <span class="dt">VectorSpace</span> (a <span class="ot">→</span> v) <span class="kw">where</span>
  <span class="kw">type</span> <span class="dt">Scalar</span> (a <span class="ot">→</span> v) <span class="fu">=</span> <span class="dt">Scalar</span> v
  s <span class="fu">*^</span> f <span class="fu">=</span> λ a <span class="ot">→</span> s <span class="fu">*^</span> f a</code></pre>

<p>In math, we usually say that dot product is “bilinear”, or “linear in each argument”, i.e.,</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">dot (s <span class="fu">*^</span> u,v) ≡ s <span class="fu">*^</span> dot (u,v)
dot (u <span class="fu">^+^</span> w, v) ≡ dot (u,v) <span class="fu">^+^</span> dot (w,v)</code></pre>

<p>Similarly for the second argument:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">dot (u,s <span class="fu">*^</span> v) ≡ s <span class="fu">*^</span> dot (u,v)
dot (u, v <span class="fu">^+^</span> w) ≡ dot (u,v) <span class="fu">^+^</span> dot (u,w)</code></pre>

<p>Now recast the first of these properties in a curried form:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">dot (s <span class="fu">*^</span> u) v ≡ s <span class="fu">*^</span> dot u v</code></pre>

<p>i.e.,</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">dot (s <span class="fu">*^</span> u)
 ≡ <span class="co">{- η-expand -}</span>
λ v <span class="ot">→</span> dot (s <span class="fu">*^</span> u) v
 ≡ <span class="co">{- &quot;bilinearity&quot; -}</span>
λ v <span class="ot">→</span> s <span class="fu">*^</span> dot u v
 ≡ <span class="co">{- (*^) on functions -}</span>
λ v <span class="ot">→</span> (s <span class="fu">*^</span> dot u) v
 ≡ <span class="co">{- η-contract -}</span>
s <span class="fu">*^</span> dot u</code></pre>

<p>Likewise,</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">dot (u <span class="fu">^+^</span> v)
 ≡ <span class="co">{- η-expand -}</span>
λ w <span class="ot">→</span> dot (u <span class="fu">^+^</span> v) w
 ≡ <span class="co">{- &quot;bilinearity&quot; -}</span>
λ w <span class="ot">→</span> dot u w <span class="fu">^+^</span> dot v w
 ≡ <span class="co">{- (^+^) on functions -}</span>
dot u <span class="fu">^+^</span> dot v</code></pre>

<p>Thus, when “bilinearity” is recast in terms of curried functions, it becomes just linearity. (The same reasoning applies more generally to multilinearity.)</p>

<p>Note that we could also define function addition as follows:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">f <span class="fu">^+^</span> g <span class="fu">=</span> add ∘ (f <span class="fu">&amp;&amp;&amp;</span> g)</code></pre>

<p>where</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add <span class="fu">=</span> <span class="fu">uncurry</span> (<span class="fu">^+^</span>)</code></pre>

<p>This uncurried form will come in handy in derivations below.</p>
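<p>The properties in this section can be spot-checked on concrete values. The following minimal sketch assumes vectors are pairs of <code>Double</code> and uses made-up names (<code>addF</code>, <code>addF'</code>, <code>scaleF</code>, <code>addV</code>, <code>scaleV</code>) in place of the overloaded <code>(^+^)</code> and <code>(*^)</code>, so that it needs nothing beyond <code>base</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">import Control.Arrow ((&amp;&amp;&amp;))

-- Pointwise addition of functions, in lambda form and in uncurried form.
addF, addF' :: Num v =&gt; (a -&gt; v) -&gt; (a -&gt; v) -&gt; (a -&gt; v)
addF  f g = \a -&gt; f a + g a
addF' f g = add . (f &amp;&amp;&amp; g)
  where add = uncurry (+)

-- Pointwise scaling of a scalar-valued function.
scaleF :: Double -&gt; (a -&gt; Double) -&gt; (a -&gt; Double)
scaleF s f = \a -&gt; s * f a

-- Vectors as pairs of Doubles, with a curried dot product.
type V2 = (Double, Double)

addV :: V2 -&gt; V2 -&gt; V2
addV (a, b) (c, d) = (a + c, b + d)

scaleV :: Double -&gt; V2 -&gt; V2
scaleV s (a, b) = (s * a, s * b)

dot :: V2 -&gt; V2 -&gt; Double
dot (a, b) (c, d) = a * c + b * d

main :: IO ()
main = do
  let u = (1, 2)
      v = (3, 5)
      w = (4, 6)
  -- The two definitions of function addition agree.
  print (addF (* 2) (+ 1) (10 :: Int) == addF' (* 2) (+ 1) 10)   -- True
  -- dot (s *^ u) == s *^ dot u, sampled at w.
  print (dot (scaleV 3 u) w == scaleF 3 (dot u) w)               -- True
  -- dot (u ^+^ v) == dot u ^+^ dot v, sampled at w.
  print (dot (addV u v) w == addF (dot u) (dot v) w)             -- True</code></pre>

<p>Since functions cannot be compared for equality directly, each check samples both sides at a single argument.</p>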

<h3 id="deriving-matrix-operations">Deriving matrix operations</h3>

<h4 id="addition-1">Addition</h4>

<p>We’ll add two linear maps using the <a href="http://hackage.haskell.org/packages/archive/vector-space/latest/doc/html/Data-AdditiveGroup.html#v:-94--43--94-" title="Hackage documentation"><code>(^+^)</code></a> operation from <a href="http://hackage.haskell.org/packages/archive/vector-space/latest/doc/html/Data-AdditiveGroup.html" title="Hackage documentation"><code>Data.AdditiveGroup</code></a>.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(<span class="fu">^+^</span>) <span class="ot">∷</span> (a ⊸ b) <span class="ot">→</span> (a ⊸ b) <span class="ot">→</span> (a ⊸ b)</code></pre>

<p>Following the principle of semantic <a href="http://conal.net/blog/tag/type-class-morphism/" title="Posts on type class morphisms">type class morphism</a>s, the specification simply says that the meaning of the sum is the sum of the meanings:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (f <span class="fu">^+^</span> g) ≡ apply f <span class="fu">^+^</span> apply g</code></pre>

<p>which is half of the definition of “linearity” for <code>apply</code>.</p>

<p>The game plan (as always) is to use the semantic specification to derive (or “calculate”) a correct implementation of each operation. For addition, this goal means we want to come up with a definition like</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">f <span class="fu">^+^</span> g <span class="fu">=</span> <span class="fu">&lt;</span>rhs<span class="fu">&gt;</span></code></pre>

<p>where <code>&lt;rhs&gt;</code> is some expression in terms of <code>f</code> and <code>g</code> whose <em>meaning</em> is the same as the meaning of <code>f ^+^ g</code>, i.e., where</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (f <span class="fu">^+^</span> g) ≡ apply <span class="fu">&lt;</span>rhs<span class="fu">&gt;</span></code></pre>

<p>Haskell has convenient pattern matching, so we’ll use it for our definition of <code>(^+^)</code>. Addition has two arguments and our data type has two constructors, so there are at most four different cases to consider.</p>

<p>First, add <code>Dot</code> and <code>Dot</code>. The specification</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (f <span class="fu">^+^</span> g) ≡ apply f <span class="fu">^+^</span> apply g</code></pre>

<p>specializes to</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (<span class="dt">Dot</span> b <span class="fu">^+^</span> <span class="dt">Dot</span> c) ≡ apply (<span class="dt">Dot</span> b) <span class="fu">^+^</span> apply (<span class="dt">Dot</span> c)</code></pre>

<p>Now simplify the right-hand side (RHS):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (<span class="dt">Dot</span> b) <span class="fu">^+^</span> apply (<span class="dt">Dot</span> c)
 ≡ <span class="co">{- apply definition -}</span>
dot b <span class="fu">^+^</span> dot c
 ≡ <span class="co">{- (bi)linearity of dot, as described above -}</span>
dot (b <span class="fu">^+^</span> c)
 ≡ <span class="co">{- apply definition -}</span>
apply (<span class="dt">Dot</span> (b <span class="fu">^+^</span> c))</code></pre>

<p>So our specialized specification becomes</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (<span class="dt">Dot</span> b <span class="fu">^+^</span> <span class="dt">Dot</span> c) ≡ apply (<span class="dt">Dot</span> (b <span class="fu">^+^</span> c))</code></pre>

<p>which is implied by</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="dt">Dot</span> b <span class="fu">^+^</span> <span class="dt">Dot</span> c ≡ <span class="dt">Dot</span> (b <span class="fu">^+^</span> c)</code></pre>

<p>and easily satisfied by the following partial definition (replacing “<code>≡</code>” by “<code>=</code>”):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="dt">Dot</span> b <span class="fu">^+^</span> <span class="dt">Dot</span> c <span class="fu">=</span> <span class="dt">Dot</span> (b <span class="fu">^+^</span> c)</code></pre>

<p>Now consider the case of addition with two <code>(:&amp;&amp;)</code> constructors. The specification specializes to</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ((f <span class="fu">:&amp;&amp;</span> g) <span class="fu">^+^</span> (h <span class="fu">:&amp;&amp;</span> k)) ≡ apply (f <span class="fu">:&amp;&amp;</span> g) <span class="fu">^+^</span> apply (h <span class="fu">:&amp;&amp;</span> k)</code></pre>

<p>As with <code>Dot</code>, simplify the RHS:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (f <span class="fu">:&amp;&amp;</span> g) <span class="fu">^+^</span> apply (h <span class="fu">:&amp;&amp;</span> k)
 ≡ <span class="co">{- apply definition -}</span>
(apply f <span class="fu">&amp;&amp;&amp;</span> apply g) <span class="fu">^+^</span> (apply h <span class="fu">&amp;&amp;&amp;</span> apply k)
 ≡ <span class="co">{- See below -}</span>
(apply f <span class="fu">^+^</span> apply h) <span class="fu">&amp;&amp;&amp;</span> (apply g <span class="fu">^+^</span> apply k)
 ≡ <span class="co">{- induction -}</span>
apply (f <span class="fu">^+^</span> h) <span class="fu">&amp;&amp;&amp;</span> apply (g <span class="fu">^+^</span> k)
 ≡ <span class="co">{- apply definition -}</span>
apply ((f <span class="fu">^+^</span> h) <span class="fu">:&amp;&amp;</span> (g <span class="fu">^+^</span> k))</code></pre>

<p>I used the following property (on functions):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(f <span class="fu">&amp;&amp;&amp;</span> g) <span class="fu">^+^</span> (h <span class="fu">&amp;&amp;&amp;</span> k) ≡ (f <span class="fu">^+^</span> h) <span class="fu">&amp;&amp;&amp;</span> (g <span class="fu">^+^</span> k)</code></pre>

<p>Proof:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(f <span class="fu">&amp;&amp;&amp;</span> g) <span class="fu">^+^</span> (h <span class="fu">&amp;&amp;&amp;</span> k)
 ≡ <span class="co">{- η-expand -}</span>
λ x <span class="ot">→</span> ((f <span class="fu">&amp;&amp;&amp;</span> g) <span class="fu">^+^</span> (h <span class="fu">&amp;&amp;&amp;</span> k)) x
 ≡ <span class="co">{- (&amp;&amp;&amp;) definition for functions -}</span>
λ x <span class="ot">→</span> (f x, g x) <span class="fu">^+^</span> (h x, k x)
 ≡ <span class="co">{- (^+^) definition for pairs -}</span>
λ x <span class="ot">→</span> (f x <span class="fu">^+^</span> h x, g x <span class="fu">^+^</span> k x)
 ≡ <span class="co">{- (^+^) definition for functions -}</span>
λ x <span class="ot">→</span> ((f <span class="fu">^+^</span> h) x, (g <span class="fu">^+^</span> k) x)
 ≡ <span class="co">{- (&amp;&amp;&amp;) definition for functions -}</span>
(f <span class="fu">^+^</span> h) <span class="fu">&amp;&amp;&amp;</span> (g <span class="fu">^+^</span> k)</code></pre>
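
<p>The property can also be spot-checked numerically. The following is only a minimal sketch in plain ASCII Haskell: <code>fork</code>, <code>addP</code>, and <code>addF</code> are hypothetical stand-ins for <code>(&amp;&amp;&amp;)</code> on functions, <code>(^+^)</code> on pairs, and <code>(^+^)</code> on functions, and the four sample functions are arbitrary.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">-- Hypothetical ASCII stand-ins: fork is fan-out on functions, addP adds
-- pairs componentwise, and addF adds functions pointwise.
fork :: (a -> c) -> (a -> d) -> a -> (c, d)
fork p q x = (p x, q x)

addP :: (Double, Double) -> (Double, Double) -> (Double, Double)
addP (a, b) (c, d) = (a + c, b + d)

addF :: (b -> b -> b) -> (a -> b) -> (a -> b) -> a -> b
addF plus p q x = p x `plus` q x

-- Four arbitrary linear functions on Double.
f, g, h, k :: Double -> Double
f = (2 *)
g = (3 *)
h = negate
k = (5 *)

-- Left- and right-hand sides of the property.
lhs, rhs :: Double -> (Double, Double)
lhs = addF addP (fork f g) (fork h k)
rhs = fork (addF (+) f h) (addF (+) g k)

-- Compare both sides at a few sample points.
propertyHolds :: Bool
propertyHolds = all (\x -> lhs x == rhs x) [-2, 0, 1, 4]</code></pre>

<p>A check at sample points is no substitute for the equational proof above; it only guards against transcription slips.</p>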

<p>The specification becomes</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ((f <span class="fu">:&amp;&amp;</span> g) <span class="fu">^+^</span> (h <span class="fu">:&amp;&amp;</span> k)) ≡ apply ((f <span class="fu">^+^</span> h) <span class="fu">:&amp;&amp;</span> (g <span class="fu">^+^</span> k))</code></pre>

<p>which is easily satisfied by the following partial definition</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(f <span class="fu">:&amp;&amp;</span> g) <span class="fu">^+^</span> (h <span class="fu">:&amp;&amp;</span> k) <span class="fu">=</span> (f <span class="fu">^+^</span> h) <span class="fu">:&amp;&amp;</span> (g <span class="fu">^+^</span> k)</code></pre>

<p>The other two cases are (a) <code>Dot</code> and <code>(:&amp;&amp;)</code>, and (b) <code>(:&amp;&amp;)</code> and <code>Dot</code>, but they don’t type-check (assuming that pairs are not scalars).</p>
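
<p>To see the derived addition at work, here is a self-contained sketch under simplifying assumptions, not the code from the repository: the scalar type is fixed to <code>Double</code>, the class <code>V</code> is a hypothetical stand-in for the vector space classes, <code>L</code> stands in for <code>(⊸)</code>, <code>Fork</code> for <code>(:&amp;&amp;)</code>, <code>inner</code> for <code>dot</code>, and <code>addL</code> for <code>(^+^)</code> on linear maps. It is written in ASCII syntax so that it compiles as is.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs #-}

infixl 6 ^+^

-- Hypothetical stand-in for the post's vector-space classes, with the
-- scalar type fixed to Double to keep the sketch short.
class V a where
  (^+^) :: a -> a -> a
  inner :: a -> a -> Double   -- the post's dot, as a two-argument function

instance V Double where
  (^+^) = (+)
  inner = (*)

instance (V a, V b) => V (a, b) where
  (a, b) ^+^ (c, d) = (a ^+^ c, b ^+^ d)
  inner (a, b) (c, d) = inner a c + inner b d

-- Linear maps; Fork is an ASCII stand-in for the fan-out constructor.
data L a b where
  Dot  :: V a => a -> L a Double
  Fork :: (V c, V d) => L a c -> L a d -> L a (c, d)

-- Semantic function: the meaning of a representation.
apply :: L a b -> a -> b
apply (Dot b)    = inner b
apply (Fork f g) = \x -> (apply f x, apply g x)

-- Addition of linear maps. The mixed Dot/Fork cases cannot type-check here,
-- since Double is not a pair, so two clauses cover every well-typed input.
addL :: L a b -> L a b -> L a b
addL (Dot b)    (Dot c)    = Dot (b ^+^ c)
addL (Fork f g) (Fork h k) = Fork (addL f h) (addL g k)

m1, m2 :: L (Double, Double) (Double, Double)
m1 = Fork (Dot (1, 2)) (Dot (3, 4))
m2 = Fork (Dot (10, 20)) (Dot (30, 40))

-- Spot check of: apply (f ^+^ g) ≡ apply f ^+^ apply g
specHolds :: Bool
specHolds = apply (addL m1 m2) v == (apply m1 v ^+^ apply m2 v)
  where v = (1, 1)</code></pre>

<p>In this sketch the result type of <code>Dot</code> is <code>Double</code> and the result type of <code>Fork</code> is a pair, which is what rules out the two mixed cases.</p>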

<h3 id="composing-linear-maps">Composing linear maps</h3>

<p>I’ll write linear map composition as “<code>g ∘ f</code>”, with type</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(∘) <span class="ot">∷</span> (b ⊸ c) <span class="ot">→</span> (a ⊸ b) <span class="ot">→</span> (a ⊸ c)</code></pre>

<p>This notation is thanks to a <code>Category</code> instance, which depends on a generalized <code>Category</code> class that uses the recent <code>ConstraintKinds</code> language extension. (See the <a href="https://github.com/conal/linear-map-gadt" title="github repository">source code</a>.)</p>

<p>Following the semantic <a href="http://conal.net/blog/tag/type-class-morphism/" title="Posts on type class morphisms">type class morphism</a> principle again, the specification says that the meaning of the composition is the composition of the meanings:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (g ∘ f) ≡ apply g ∘ apply f</code></pre>

<p>In the following, note that the <code>∘</code> operator binds more tightly than <code>&amp;&amp;&amp;</code>, so <code>f ∘ h &amp;&amp;&amp; g ∘ h</code> means <code>(f ∘ h) &amp;&amp;&amp; (g ∘ h)</code>.</p>

<h4 id="derivation">Derivation</h4>

<p>Again, since there are two constructors, we have four possible cases. We can handle two of these cases together, namely <code>(:&amp;&amp;)</code> and anything. The specification:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply ((f <span class="fu">:&amp;&amp;</span> g) ∘ h) ≡ apply (f <span class="fu">:&amp;&amp;</span> g) ∘ apply h</code></pre>

<p>Reasoning proceeds as above, by simplifying the RHS of the constructor-specialized specification:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (f <span class="fu">:&amp;&amp;</span> g) ∘ apply h
 ≡ <span class="co">{- apply definition -}</span>
(apply f <span class="fu">&amp;&amp;&amp;</span> apply g) ∘ apply h
 ≡ <span class="co">{- see below -}</span>
apply f ∘ apply h <span class="fu">&amp;&amp;&amp;</span> apply g ∘ apply h
 ≡ <span class="co">{- induction -}</span>
apply (f ∘ h) <span class="fu">&amp;&amp;&amp;</span> apply (g ∘ h)
 ≡ <span class="co">{- apply definition -}</span>
apply (f ∘ h <span class="fu">:&amp;&amp;</span> g ∘ h)</code></pre>

<p>This simplification uses the following property of functions:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(p <span class="fu">&amp;&amp;&amp;</span> q) ∘ r ≡ p ∘ r <span class="fu">&amp;&amp;&amp;</span> q ∘ r</code></pre>

<p>Sufficient definition:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(f <span class="fu">:&amp;&amp;</span> g) ∘ h <span class="fu">=</span> f ∘ h <span class="fu">:&amp;&amp;</span> g ∘ h</code></pre>

<p>We have two more cases, specified as follows:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (<span class="dt">Dot</span> c ∘ <span class="dt">Dot</span> b) ≡ apply (<span class="dt">Dot</span> c) ∘ apply (<span class="dt">Dot</span> b)

apply (<span class="dt">Dot</span> c ∘ (f <span class="fu">:&amp;&amp;</span> g)) ≡ apply (<span class="dt">Dot</span> c) ∘ apply (f <span class="fu">:&amp;&amp;</span> g)</code></pre>

<p>Based on types, <code>c</code> must be a scalar in the first case and a pair in the second. (<code>Dot b</code> produces a scalar, while <code>f :&amp;&amp; g</code> produces a pair.) Thus, we can write these two cases more specifically:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (<span class="dt">Dot</span> s ∘ <span class="dt">Dot</span> b) ≡ apply (<span class="dt">Dot</span> s) ∘ apply (<span class="dt">Dot</span> b)

apply (<span class="dt">Dot</span> (a,b) ∘ (f <span class="fu">:&amp;&amp;</span> g)) ≡ apply (<span class="dt">Dot</span> (a,b)) ∘ apply (f <span class="fu">:&amp;&amp;</span> g)</code></pre>

<p>In the derivation, I won’t spell out as many details as before. Simplify the RHSs:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply (<span class="dt">Dot</span> s) ∘ apply (<span class="dt">Dot</span> b)
≡ dot s ∘ dot b
≡ dot (s <span class="fu">*^</span> b)
≡ apply (<span class="dt">Dot</span> (s <span class="fu">*^</span> b))</code></pre>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply (<span class="dt">Dot</span> (a,b)) ∘ apply (f <span class="fu">:&amp;&amp;</span> g)
≡ dot (a,b) ∘ (apply f <span class="fu">&amp;&amp;&amp;</span> apply g)
≡ add ∘ (dot a ∘ apply f <span class="fu">&amp;&amp;&amp;</span> dot b ∘ apply g)
≡ dot a ∘ apply f <span class="fu">^+^</span> dot b ∘ apply g
≡ apply (<span class="dt">Dot</span> a ∘ f <span class="fu">^+^</span> <span class="dt">Dot</span> b ∘ g)</code></pre>

<p>I’ve used the following properties of functions:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">dot (a,b)             ≡ add ∘ (dot a <span class="fu">***</span> dot b)

(r <span class="fu">***</span> s) ∘ (p <span class="fu">&amp;&amp;&amp;</span> q) ≡ r ∘ p <span class="fu">&amp;&amp;&amp;</span> s ∘ q

add ∘ (p <span class="fu">&amp;&amp;&amp;</span> q)       ≡ p <span class="fu">^+^</span> q

apply (f <span class="fu">^+^</span> g)       ≡ apply f <span class="fu">^+^</span> apply g</code></pre>

<p>Implementation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"> <span class="dt">Dot</span> s     ∘ <span class="dt">Dot</span> b     <span class="fu">=</span> <span class="dt">Dot</span> (s <span class="fu">*^</span> b)
 <span class="dt">Dot</span> (a,b) ∘ (f <span class="fu">:&amp;&amp;</span> g) <span class="fu">=</span> <span class="dt">Dot</span> a ∘ f <span class="fu">^+^</span> <span class="dt">Dot</span> b ∘ g</code></pre>
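
<p>The three composition cases can be exercised in a self-contained sketch. As assumptions for illustration only: scalars are fixed to <code>Double</code>, the class <code>V</code> is a hypothetical stand-in for the vector space classes, <code>L</code> stands in for <code>(⊸)</code>, <code>Fork</code> for <code>(:&amp;&amp;)</code>, <code>scale</code> for <code>(*^)</code>, <code>inner</code> for <code>dot</code>, and <code>addL</code> and <code>compL</code> for addition and composition of linear maps. In this version <code>Fork</code> carries class constraints, and the pair in the last clause is taken apart with a <code>case</code>, because the type checker only learns that the <code>Dot</code> argument is a pair after matching the second argument.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs #-}

infixl 6 ^+^

-- Hypothetical stand-in for the post's vector-space classes, with the
-- scalar type fixed to Double to keep the sketch short.
class V a where
  (^+^) :: a -> a -> a
  scale :: Double -> a -> a   -- plays the role of (*^)
  inner :: a -> a -> Double   -- the post's dot, as a two-argument function

instance V Double where
  (^+^) = (+)
  scale = (*)
  inner = (*)

instance (V a, V b) => V (a, b) where
  (a, b) ^+^ (c, d) = (a ^+^ c, b ^+^ d)
  scale s (a, b) = (scale s a, scale s b)
  inner (a, b) (c, d) = inner a c + inner b d

-- Linear maps; Fork is an ASCII stand-in for the fan-out constructor. It
-- carries V constraints so that pattern matching makes them available.
data L a b where
  Dot  :: V a => a -> L a Double
  Fork :: (V c, V d) => L a c -> L a d -> L a (c, d)

apply :: L a b -> a -> b
apply (Dot b)    = inner b
apply (Fork f g) = \x -> (apply f x, apply g x)

addL :: L a b -> L a b -> L a b
addL (Dot b)    (Dot c)    = Dot (b ^+^ c)
addL (Fork f g) (Fork h k) = Fork (addL f h) (addL g k)

-- Composition, one clause per derived case.
compL :: L b c -> L a b -> L a c
compL (Fork f g) h          = Fork (compL f h) (compL g h)
compL (Dot s)    (Dot b)    = Dot (scale s b)
compL (Dot ab)   (Fork f g) = case ab of
  (a, b) -> addL (compL (Dot a) f) (compL (Dot b) g)

m, n :: L (Double, Double) (Double, Double)
m = Fork (Dot (1, 2)) (Dot (3, 4))   -- rows (1,2) and (3,4)
n = Fork (Dot (0, 1)) (Dot (1, 0))   -- swaps the two components

-- Spot check of: apply (g ∘ f) ≡ apply g ∘ apply f
specHolds :: Bool
specHolds = apply (compL m n) v == (apply m . apply n) v
  where v = (5, 7)</code></pre>

<p>Read as matrices, <code>m</code> and <code>n</code> are 2×2, and <code>compL m n</code> computes their product row by row without ever mentioning indices.</p>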

<h3 id="cross-products">Cross products</h3>

<p>Another <code>Arrow</code> operation handy for linear maps is the parallel composition (product):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">(<span class="fu">***</span>) <span class="ot">∷</span> (a ⊸ c) <span class="ot">→</span> (b ⊸ d) <span class="ot">→</span> (a × b ⊸ c × d)</code></pre>

<p>The specification says that <code>apply</code> distributes over <code>(***)</code>. In other words, the meaning of the product is the product of the meanings.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (f <span class="fu">***</span> g) ≡ apply f <span class="fu">***</span> apply g</code></pre>

<p>Where, on functions,</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">p <span class="fu">***</span> q <span class="fu">=</span> λ (a,b) <span class="ot">→</span> (p a, q b)
        ≡ p ∘ <span class="fu">fst</span> <span class="fu">&amp;&amp;&amp;</span> q ∘ <span class="fu">snd</span></code></pre>

<p>Simplify the specification’s RHS:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply f <span class="fu">***</span> apply g
≡ apply f ∘ <span class="fu">fst</span> <span class="fu">&amp;&amp;&amp;</span> apply g ∘ <span class="fu">snd</span></code></pre>

<p>If we knew how to represent <code>fst</code> and <code>snd</code> via our linear map constructors, we’d be nearly done. Instead, let’s suppose we have the following functions.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">compFst <span class="ot">∷</span> <span class="dt">VS3</span> a b c <span class="ot">⇒</span> a ⊸ c <span class="ot">→</span> a × b ⊸ c
compSnd <span class="ot">∷</span> <span class="dt">VS3</span> a b c <span class="ot">⇒</span> b ⊸ c <span class="ot">→</span> a × b ⊸ c</code></pre>

<p>specified as follows:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">apply (compFst f) ≡ apply f ∘ <span class="fu">fst</span>
apply (compSnd g) ≡ apply g ∘ <span class="fu">snd</span></code></pre>

<p>With these two functions (to be defined) in hand, let’s try again.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">  apply f <span class="fu">***</span> apply g
≡ apply f ∘ <span class="fu">fst</span> <span class="fu">&amp;&amp;&amp;</span> apply g ∘ <span class="fu">snd</span>
≡ apply (compFst f) <span class="fu">&amp;&amp;&amp;</span> apply (compSnd g)
≡ apply (compFst f <span class="fu">:&amp;&amp;</span> compSnd g)</code></pre>

<p>So a sufficient definition is <code>f *** g = compFst f :&amp;&amp; compSnd g</code>.</p>

<h4 id="composing-with-fst-and-snd">Composing with <code>fst</code> and <code>snd</code></h4>

<p>I’ll elide even more of the derivation this time, focusing reasoning on the meanings. Relating to the representation is left as an exercise. The key steps in the derivation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">dot a     ∘ <span class="fu">fst</span> ≡ dot (a,<span class="dv">0</span>)

(f <span class="fu">&amp;&amp;&amp;</span> g) ∘ <span class="fu">fst</span> ≡ f ∘ <span class="fu">fst</span> <span class="fu">&amp;&amp;&amp;</span> g ∘ <span class="fu">fst</span>

dot b     ∘ <span class="fu">snd</span> ≡ dot (<span class="dv">0</span>,b)

(f <span class="fu">&amp;&amp;&amp;</span> g) ∘ <span class="fu">snd</span> ≡ f ∘ <span class="fu">snd</span> <span class="fu">&amp;&amp;&amp;</span> g ∘ <span class="fu">snd</span></code></pre>

<p>Implementation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">compFst (<span class="dt">Dot</span> a)   <span class="fu">=</span> <span class="dt">Dot</span> (a,zeroV)
compFst (f <span class="fu">:&amp;&amp;</span> g) <span class="fu">=</span> compFst f <span class="fu">&amp;&amp;&amp;</span> compFst g

compSnd (<span class="dt">Dot</span> b)   <span class="fu">=</span> <span class="dt">Dot</span> (zeroV,b)
compSnd (f <span class="fu">:&amp;&amp;</span> g) <span class="fu">=</span> compSnd f <span class="fu">&amp;&amp;&amp;</span> compSnd g</code></pre>

<p>where <code>zeroV</code> is the zero vector.</p>

<p>Given <code>compFst</code> and <code>compSnd</code>, we can implement <code>fst</code> and <code>snd</code> as linear maps simply as <code>compFst id</code> and <code>compSnd id</code>, where <code>id</code> is the (polymorphic) identity linear map.</p>
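
<p>Putting these pieces together, the sketch below shows <code>compFst</code>, <code>compSnd</code>, the cross product, and the two projections in one compilable file. It rests on assumptions that are not from the repository: scalars are fixed to <code>Double</code>, <code>V</code> is a hypothetical stand-in for the vector space classes (so the <code>VS3</code> constraint becomes plain <code>V</code> constraints), <code>L</code> stands in for <code>(⊸)</code>, <code>Fork</code> for <code>(:&amp;&amp;)</code>, <code>inner</code> for <code>dot</code>, and <code>crossL</code> for <code>(***)</code>. The class <code>HasId</code> with method <code>idL</code> is one hypothetical way to supply the identity linear map at each type.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">{-# LANGUAGE GADTs #-}

-- Hypothetical stand-in for the post's vector-space classes, with the
-- scalar type fixed to Double to keep the sketch short.
class V a where
  zeroV :: a
  inner :: a -> a -> Double   -- the post's dot, as a two-argument function

instance V Double where
  zeroV = 0
  inner = (*)

instance (V a, V b) => V (a, b) where
  zeroV = (zeroV, zeroV)
  inner (a, b) (c, d) = inner a c + inner b d

-- Linear maps; Fork is an ASCII stand-in for the fan-out constructor.
data L a b where
  Dot  :: V a => a -> L a Double
  Fork :: (V c, V d) => L a c -> L a d -> L a (c, d)

apply :: L a b -> a -> b
apply (Dot b)    = inner b
apply (Fork f g) = \x -> (apply f x, apply g x)

-- Precompose a linear map with the first or second projection.
compFst :: V b => L a c -> L (a, b) c
compFst (Dot a)    = Dot (a, zeroV)
compFst (Fork f g) = Fork (compFst f) (compFst g)

compSnd :: V a => L b c -> L (a, b) c
compSnd (Dot b)    = Dot (zeroV, b)
compSnd (Fork f g) = Fork (compSnd f) (compSnd g)

-- Parallel composition, as derived from the specification.
crossL :: (V a, V b, V c, V d) => L a c -> L b d -> L (a, b) (c, d)
crossL f g = Fork (compFst f) (compSnd g)

-- Hypothetical class providing the identity linear map at each type.
class V a => HasId a where
  idL :: L a a

instance HasId Double where
  idL = Dot 1

instance (HasId a, HasId b) => HasId (a, b) where
  idL = Fork fstL sndL

-- Projections as linear maps.
fstL :: (HasId a, V b) => L (a, b) a
fstL = compFst idL

sndL :: (V a, HasId b) => L (a, b) b
sndL = compSnd idL

f :: L Double Double
f = Dot 2

g :: L (Double, Double) Double
g = Dot (1, 3)

-- Spot check of: apply (f *** g) ≡ apply f *** apply g
specHolds :: Bool
specHolds = apply (crossL f g) (x, y) == (apply f x, apply g y)
  where
    x = 5
    y = (1, 1)

-- The projections mean fst and snd.
projectionsHold :: Bool
projectionsHold = and [apply fstL p == fst p, apply sndL p == snd p]
  where p = (7, 8) :: (Double, Double)</code></pre>

<p>Note that the identity on a pair type is itself built from the two projections, so <code>idL</code>, <code>fstL</code>, and <code>sndL</code> are mutually defined in this sketch.</p>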

<h3 id="reflections">Reflections</h3>

<p>This post reflects an approach to programming that I apply wherever I’m able. As a summary:</p>

<ul>
<li>Look for an elegant <em>what</em> behind a familiar <em>how</em>.</li>
<li><em>Define</em> a semantic function for each data type.</li>
<li><em>Derive</em> a correct implementation from the semantics.</li>
</ul>

<p>You can find more examples of this methodology elsewhere in this blog and in the paper <a href="http://conal.net/papers/type-class-morphisms/"><em>Denotational design with type class morphisms</em></a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/reimagining-matrices/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Freimagining-matrices&amp;language=en_GB&amp;category=text&amp;title=Reimagining+matrices&amp;description=The+function+of+the+imagination+is+notto+make+strange+things+settled%2C+so+much+asto+make+settled+things+strange.-+G.K.+Chesterton+Why+is+matrix+multiplication+defined+so+very+differently+from+matrix...&amp;tags=category%2Cdenotational+design%2Clinear+algebra%2Ctype+class+morphism%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
