<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; parallelism</title>
	<atom:link href="http://conal.net/blog/tag/parallelism/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>Parallel speculative addition via memoization</title>
		<link>http://conal.net/blog/posts/parallel-speculative-addition-via-memoization</link>
		<comments>http://conal.net/blog/posts/parallel-speculative-addition-via-memoization#comments</comments>
		<pubDate>Tue, 27 Nov 2012 23:39:42 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[parallelism]]></category>
		<category><![CDATA[speculation]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=493</guid>
		<description><![CDATA[I’ve been thinking much more about parallel computation for the last couple of years, especially since starting to work at Tabula a year ago. Until getting into parallelism explicitly, I’d naïvely thought that my pure functional programming style was mostly free of sequential bias. After all, functional programming lacks the implicit accidental dependencies imposed by [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p>I’ve been thinking much more about parallel computation for the last couple of years, especially since starting to work at <a href="http://www.tabula.com">Tabula</a> a year ago. Until getting into parallelism explicitly, I’d naïvely thought that my pure functional programming style was mostly free of sequential bias. After all, functional programming lacks the implicit accidental dependencies imposed by the imperative model. Now, however, I’m coming to see that designing parallel-friendly algorithms takes attention to minimizing the depth of the remaining, explicit data dependencies.</p>

<p>As an example, consider binary addition, carried out from least to most significant bit (as usual). We can immediately compute the first (least significant) bit of the result, but in order to compute the second bit, we’ll have to know whether or not a carry resulted from the first addition. More generally, the <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>n</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo></mrow></math><em>th</em> sum &amp; carry require knowing the <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math><em>th</em> carry, so this algorithm does not allow parallel execution. Even if we have one processor per bit position, only one processor will be able to work at a time, due to the linear chain of dependencies.</p>

<p>One general technique for improving parallelism is <em>speculation</em>—doing more work than might be needed so that we don’t have to wait to find out exactly what <em>will</em> be needed. In this post, we’ll see a progression of definitions for bitwise addition. We’ll start with a linear-depth chain of carry dependencies and end with logarithmic depth. Moreover, by making careful use of abstraction, these versions will be simply different type specializations of a single, extremely terse polymorphic definition.</p>

<p><span id="more-493"></span></p>

<h3 id="a-full-adder">A full adder</h3>

<p>Let’s start with an adder for two one-bit numbers. Because of the possibility of overflow, the result will be two bits, which I’ll call “sum” and “carry”. So that we can chain these one-bit adders, we’ll also add a carry input.</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addB <span class="ot">∷</span> (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> <span class="dt">Bool</span> <span class="ot">→</span> (<span class="dt">Bool</span>,<span class="dt">Bool</span>)</code></pre>

<p>In the result, the first <code>Bool</code> will be the sum, and the second will be the carry. I’ve curried the carry input to make it stand out from the (other) addends.</p>

<p>There are a few ways to define <code>addB</code> in terms of logic operations. I like the following definition, as it shares a little work between sum &amp; carry:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addB (a,b) cin <span class="fu">=</span> (axb ≠ cin, anb ∨ (cin ∧ axb))
 <span class="kw">where</span>
   axb <span class="fu">=</span> a ≠ b
   anb <span class="fu">=</span> a ∧ b</code></pre>

<p>I’m using <code>(≠)</code> on <code>Bool</code> for exclusive or.</p>
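<p>As a quick sanity check, here are two rows of the full-adder truth table, shown as a hypothetical GHCi session:</p>

<pre class="sourceCode haskell"><code class="sourceCode haskell">λ&gt; addB (True,True) False    -- 1 + 1 + 0 = 10₂: sum 0, carry 1
(False,True)
λ&gt; addB (True,False) True    -- 1 + 0 + 1 = 10₂: sum 0, carry 1
(False,True)</code></pre>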

<h3 id="a-ripple-carry-adder">A ripple carry adder</h3>

<p>Now suppose we have not just two bits, but two <em>sequences</em> of bits, interpreted as binary numbers arranged from least to most significant bit. For simplicity, I’d like to assume that these sequences have the same length, so rather than taking a pair of bit lists, let’s take a list of bit pairs:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add <span class="ot">∷</span> [(<span class="dt">Bool</span>,<span class="dt">Bool</span>)] <span class="ot">→</span> <span class="dt">Bool</span> <span class="ot">→</span> ([<span class="dt">Bool</span>],<span class="dt">Bool</span>)</code></pre>

<p>To implement <code>add</code>, traverse the list of bit pairs, threading the carries:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add [] c     <span class="fu">=</span> ([]  , c)
add (p<span class="fu">:</span>ps) c <span class="fu">=</span> (s<span class="fu">:</span>ss, c&#39;&#39;)
 <span class="kw">where</span>
   (s ,c&#39; ) <span class="fu">=</span> addB p c
   (ss,c&#39;&#39;) <span class="fu">=</span> add ps c&#39;</code></pre>
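<p>For example, adding the two-bit numbers 3 (<code>[True,True]</code>, least significant bit first) and 1 (<code>[True,False]</code>) with no carry in yields zero in both sum bits plus a carry out—that is, 4:</p>

<pre class="sourceCode haskell"><code class="sourceCode haskell">λ&gt; add [(True,True),(True,False)] False
([False,False],True)</code></pre>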

<h3 id="state">State</h3>

<p>This <code>add</code> definition contains a familiar pattern. The carry values act as a sort of <em>state</em> that gets updated in a linear (non-branching) way. The <code>State</code> monad captures this pattern of computation:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">State</span> s a <span class="fu">=</span> <span class="dt">State</span> (s <span class="ot">→</span> (a,s))</code></pre>
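<p>The post assumes the usual machinery for this <code>State</code> type. For completeness, here is a minimal, self-contained sketch of the instances and of <code>get</code> and <code>put</code> (the definitions below are mine, standing in for the usual library imports):</p>

<pre class="sourceCode haskell"><code class="sourceCode haskell">runState ∷ State s a → s → (a,s)
runState (State f) = f

instance Functor (State s) where
  fmap f st = State (\s → let (a,s') = runState st s in (f a, s'))

instance Applicative (State s) where
  pure a    = State (\s → (a,s))
  sf &lt;*&gt; sa = State (\s → let (f,s' ) = runState sf s
                              (a,s'') = runState sa s'
                          in (f a, s''))

instance Monad (State s) where
  st &gt;&gt;= k = State (\s → let (a,s') = runState st s
                         in  runState (k a) s')

get ∷ State s s
get = State (\s → (s,s))

put ∷ s → State s ()
put s' = State (\_ → ((),s'))</code></pre>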

<p>By using <code>State</code> and its <code>Monad</code> instance, we can shorten our <code>add</code> definition. First we’ll need a new full adder definition, tweaked for <code>State</code>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addB <span class="ot">∷</span> (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> <span class="dt">State</span> <span class="dt">Bool</span> <span class="dt">Bool</span>
addB (a,b) <span class="fu">=</span> <span class="kw">do</span> cin <span class="ot">←</span> get
                put (anb ∨ cin ∧ axb)
                <span class="fu">return</span> (axb ≠ cin)
 <span class="kw">where</span>
   anb <span class="fu">=</span> a ∧ b
   axb <span class="fu">=</span> a ≠ b</code></pre>

<p>And then the multi-bit adder:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add <span class="ot">∷</span> [(<span class="dt">Bool</span>,<span class="dt">Bool</span>)] <span class="ot">→</span> <span class="dt">State</span> <span class="dt">Bool</span> [<span class="dt">Bool</span>]
add []     <span class="fu">=</span> <span class="fu">return</span> []
add (p<span class="fu">:</span>ps) <span class="fu">=</span> <span class="kw">do</span> s  <span class="ot">←</span> addB p
                ss <span class="ot">←</span> add ps
                <span class="fu">return</span> (s<span class="fu">:</span>ss)</code></pre>

<p>We don’t really need the <code>Monad</code> interface to define <code>add</code>. The simpler and more general <code>Applicative</code> interface suffices:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add []     <span class="fu">=</span> pure []
add (p<span class="fu">:</span>ps) <span class="fu">=</span> liftA2 (<span class="fu">:</span>) (addB p) (add ps)</code></pre>

<p>This pattern also looks familiar. Oh — the <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Traversable.html#t:Traversable"><code>Traversable</code></a> instance for lists makes for a very compact definition:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add <span class="fu">=</span> traverse addB</code></pre>

<p>Wow. The definition is now so simple that it doesn’t depend on the specific choice of lists. To find out the most general type <code>add</code> can have (with this definition), remove the type signature, turn off the monomorphism restriction, and see what GHCi has to say:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add <span class="ot">∷</span> <span class="kw">Traversable</span> t <span class="ot">⇒</span> t (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> <span class="dt">State</span> <span class="dt">Bool</span> (t <span class="dt">Bool</span>)</code></pre>

<p>This constraint is <em>very</em> lenient. <code>Traversable</code> can be derived automatically for <em>all</em> algebraic data types, including nested/non-regular ones.</p>

<p>For instance,</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Tree</span> a <span class="fu">=</span> <span class="dt">Leaf</span> a <span class="fu">|</span> <span class="dt">Branch</span> (<span class="dt">Tree</span> a) (<span class="dt">Tree</span> a)
  <span class="kw">deriving</span> (<span class="kw">Functor</span>,<span class="kw">Foldable</span>,<span class="kw">Traversable</span>)</code></pre>
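<p>(With GHC, these derivations need the corresponding language extensions enabled, e.g.:)</p>

<pre class="sourceCode haskell"><code class="sourceCode haskell">{-# LANGUAGE DeriveFunctor, DeriveFoldable, DeriveTraversable #-}</code></pre>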

<p>We can now specialize this general <code>add</code> back to lists:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addLS <span class="ot">∷</span> [(<span class="dt">Bool</span>,<span class="dt">Bool</span>)] <span class="ot">→</span> <span class="dt">State</span> <span class="dt">Bool</span> [<span class="dt">Bool</span>]
addLS <span class="fu">=</span> add</code></pre>

<p>We can also specialize for trees:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addTS <span class="ot">∷</span> <span class="dt">Tree</span> (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> <span class="dt">State</span> <span class="dt">Bool</span> (<span class="dt">Tree</span> <span class="dt">Bool</span>)
addTS <span class="fu">=</span> add</code></pre>

<p>Or for depth-typed perfect trees (e.g., as described in <a href="http://conal.net/blog/posts/from-tries-to-trees/" title="blog post"><em>From tries to trees</em></a>):</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addTnS <span class="ot">∷</span> <span class="dt">IsNat</span> n <span class="ot">⇒</span>
         <span class="dt">T</span> n (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> <span class="dt">State</span> <span class="dt">Bool</span> (<span class="dt">T</span> n <span class="dt">Bool</span>)
addTnS <span class="fu">=</span> add</code></pre>

<p>Binary trees are often better than lists for parallelism, because they allow quick recursive splitting and joining. In the case of ripple adders, we don’t really get parallelism, however, because of the single-threaded (linear) nature of <code>State</code>. Can we get around this unfortunate linearization?</p>

<h3 id="speculation">Speculation</h3>

<p>The linearity of carry propagation interferes with parallel execution even when using a tree representation. The problem is that each <code>addB</code> (full adder) invocation must access the carry out from the previous (immediately less significant) bit position and so must wait for that carry to be computed. Since each bit addition must wait for the previous one to finish, we get linear running time, even with unlimited parallel processing available. If we didn’t have to wait for carries, we could instead get logarithmic running time using the tree representation, since subtrees could be added in parallel.</p>

<p>A way out of this dilemma is to speculatively compute the bit sums for <em>both</em> possibilities, i.e., for carry and no carry. We’ll do more work, but much less waiting.</p>

<h3 id="state-memoization">State memoization</h3>

<p>Recall the <code>State</code> definition:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">State</span> s a <span class="fu">=</span> <span class="dt">State</span> (s <span class="ot">→</span> (a,s))</code></pre>

<p>Rather than using a <em>function</em> of <code>s</code>, let’s use a <em>table</em> indexed by <code>s</code>. Since <code>s</code> is <code>Bool</code> in our use, a table is simply a uniform pair, so we could replace <code>State Bool a</code> with the following:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">BoolStateTable</span> a <span class="fu">=</span> <span class="dt">BST</span> ((a,<span class="dt">Bool</span>), (a,<span class="dt">Bool</span>))</code></pre>
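<p>To see the correspondence with <code>State Bool</code>, note that such a table converts back into a state-transition function by a conditional lookup (a sketch; the name and the convention that the first component is the <code>False</code> entry are mine):</p>

<pre class="sourceCode haskell"><code class="sourceCode haskell">unBST ∷ BoolStateTable a → (Bool → (a,Bool))
unBST (BST (f,t)) b = if b then t else f</code></pre>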

<p><em>Exercise:</em> define <code>Functor</code>, <code>Applicative</code>, and <code>Monad</code> instances for <code>BoolStateTable</code>.</p>

<p>Rather than defining such a specialized type, let’s stand back and consider what’s going on. We’re replacing a function by an isomorphic data type. This replacement is exactly what memoization is about. So let’s define a general <em>memoizing state monad</em>:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">StateTrie</span> s a <span class="fu">=</span> <span class="dt">StateTrie</span> (s ⇰ (a,s))</code></pre>

<p>Note that the definition of memoizing state is nearly identical to <code>State</code>. I’ve simply replaced “<code>→</code>” by “<code>⇰</code>”, i.e., <a href="http://conal.net/blog/tag/memoization/" title="Posts on memoization">memo</a> <a href="http://conal.net/blog/tag/trie/" title="Posts on tries">tries</a>. For the (simple) source code of <code>StateTrie</code>, see <a href="http://github.com/conal/state-trie.git">the github project</a>. (Poking around on Hackage, I just found <a href="http://hackage.haskell.org/package/monad-memo">monad-memo</a>, which looks related.)</p>

<p>The full-adder function <code>addB</code> is restricted to <code>State</code>, but unnecessarily so. The most general type is inferred as</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addB <span class="ot">∷</span> <span class="dt">MonadState</span> <span class="dt">Bool</span> m <span class="ot">⇒</span> (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> m <span class="dt">Bool</span></code></pre>

<p>where the <a href="http://hackage.haskell.org/packages/archive/mtl/latest/doc/html/Control-Monad-State-Class.html#t:MonadState"><code>MonadState</code></a> class comes from the mtl package.</p>

<p>With the type-generalized <code>addB</code>, we get a more general type for <code>add</code> as well:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">add <span class="ot">∷</span> (<span class="kw">Traversable</span> t, <span class="kw">Applicative</span> m, <span class="dt">MonadState</span> <span class="dt">Bool</span> m) <span class="ot">⇒</span>
      t (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> m (t <span class="dt">Bool</span>)
add <span class="fu">=</span> traverse addB</code></pre>

<p>Now we can specialize <code>add</code> to work with memoized state:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addLM <span class="ot">∷</span> [(<span class="dt">Bool</span>,<span class="dt">Bool</span>)] <span class="ot">→</span> <span class="dt">StateTrie</span> <span class="dt">Bool</span> [<span class="dt">Bool</span>]
addLM <span class="fu">=</span> add

addTM <span class="ot">∷</span> <span class="dt">Tree</span> (<span class="dt">Bool</span>,<span class="dt">Bool</span>) <span class="ot">→</span> <span class="dt">StateTrie</span> <span class="dt">Bool</span> (<span class="dt">Tree</span> <span class="dt">Bool</span>)
addTM <span class="fu">=</span> add</code></pre>

<h3 id="what-have-we-done">What have we done?</h3>

<p>The essential tricks in this post are to (a) boost parallelism by speculative evaluation (an old idea) and (b) express speculation as memoization (new, to me at least). The technique wins for binary addition thanks to the small number of possible states, which then makes memoization (full speculation) affordable.</p>

<p>I’m not suggesting that the code above has impressive parallel execution when compiled under GHC. Perhaps it could with some <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/lang-parallel.html#id653837"><code>par</code> and <code>pseq</code> annotations</a>. I haven’t tried. This exploration helps me understand a little of the space of hardware-oriented algorithms.</p>

<p>The <a href="http://www.aoki.ecei.tohoku.ac.jp/arith/mg/algorithm.html#fsa_csu">conditional sum adder</a> looks quite similar to the development above. It has the twist, however, of speculating carries on blocks of a few bits rather than single bits. It’s astonishingly easy to adapt the development above for such a hybrid scheme, forming traversable structures of sequences of bits:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">addH <span class="ot">∷</span> <span class="dt">Tree</span> [(<span class="dt">Bool</span>,<span class="dt">Bool</span>)] <span class="ot">→</span> <span class="dt">StateTrie</span> <span class="dt">Bool</span> (<span class="dt">Tree</span> [<span class="dt">Bool</span>])
addH <span class="fu">=</span> traverse (fromState ∘ add)</code></pre>

<p>I’m using the adapter <code>fromState</code> so that the inner list additions will use <code>State</code> while the outer tree additions will use <code>StateTrie</code>, thanks to type inference. This adapter memoizes and rewraps the state transition function:</p>

<pre class="sourceCode literate haskell"><code class="sourceCode haskell">fromState <span class="ot">∷</span> <span class="dt">HasTrie</span> s <span class="ot">⇒</span> <span class="dt">State</span> s a <span class="ot">→</span> <span class="dt">StateTrie</span> s a
fromState <span class="fu">=</span> <span class="dt">StateTrie</span> ∘ trie ∘ runState</code></pre>
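<p>Going back the other way is symmetric, un-memoizing the table with <code>untrie</code> (assuming an unwrapper <code>unStateTrie</code>, which I’m naming here for illustration):</p>

<pre class="sourceCode haskell"><code class="sourceCode haskell">toState ∷ HasTrie s ⇒ StateTrie s a → State s a
toState = State ∘ untrie ∘ unStateTrie</code></pre>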
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/parallel-speculative-addition-via-memoization/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fparallel-speculative-addition-via-memoization&amp;language=en_GB&amp;category=text&amp;title=Parallel+speculative+addition+via+memoization&amp;description=I%E2%80%99ve+been+thinking+much+more+about+parallel+computation+for+the+last+couple+of+years%2C+especially+since+starting+to+work+at+Tabula+a+year+ago.+Until+getting+into+parallelism+explicitly%2C+I%E2%80%99d...&amp;tags=number%2Cparallelism%2Cspeculation%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
