<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; program derivation</title>
	<atom:link href="http://conal.net/blog/tag/program-derivation/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>Parallel tree scanning by composition</title>
		<link>http://conal.net/blog/posts/parallel-tree-scanning-by-composition</link>
		<comments>http://conal.net/blog/posts/parallel-tree-scanning-by-composition#comments</comments>
		<pubDate>Tue, 24 May 2011 20:31:23 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[program derivation]]></category>
		<category><![CDATA[scan]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=429</guid>
		<description><![CDATA[My last few blog posts have been on the theme of scans, and particularly on parallel scans. In Composable parallel scanning, I tackled parallel scanning in a very general setting. There are five simple building blocks out of which a vast assortment of data structures can be built, namely constant (no value), identity (one value), [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p>My last few blog posts have been on the theme of <em>scans</em>, and particularly on <em>parallel</em> scans. In <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>, I tackled parallel scanning in a very general setting. There are five simple building blocks out of which a vast assortment of data structures can be built, namely constant (no value), identity (one value), sum, product, and composition. The post defined parallel prefix and suffix scan for each of these five &quot;functor combinators&quot;, in terms of the same scan operation on each of the component functors. Every functor built out of this basic set thus has a parallel scan. Functors defined more conventionally can be given scan implementations simply by converting to a composition of the basic set, scanning, and then back to the original functor. Moreover, I expect this implementation could be generated automatically, similarly to GHC&#8217;s <code>DeriveFunctor</code> extension.</p>

<p>Now I&#8217;d like to show two examples of parallel scan composition in terms of binary trees, namely the top-down and bottom-up variants of perfect binary leaf trees used in previous posts. (In previous posts, I used the terms &quot;right-folded&quot; and &quot;left-folded&quot; instead of &quot;top-down&quot; and &quot;bottom-up&quot;.) The resulting two algorithms are expressed nearly identically, but differ significantly in the work performed. The top-down version does <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math> work, while the bottom-up version does only <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math>, and thus the latter algorithm is work-efficient, while the former is not. Moreover, with a <em>very</em> simple optimization, the bottom-up tree algorithm corresponds closely to Guy Blelloch&#8217;s parallel prefix scan for arrays, given in <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.5739" title="Paper by Guy Blelloch"><em>Programming parallel algorithms</em></a>. I&#8217;m delighted with this result, as I had been wondering how to think about Guy&#8217;s algorithm.</p>

<p><strong>Edit:</strong></p>

<ul>
<li>2011-05-31: Added <code>Scan</code> and <code>Applicative</code> instances for <code>T2</code> and <code>T4</code>.</li>
</ul>

<p><span id="more-429"></span></p>

<h3 id="scanning-via-functor-combinators">Scanning via functor combinators</h3>

<p>In <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>, we saw the <code>Scan</code> class:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">class</span> <span class="dt">Scan</span> f <span class="kw">where</span><br />  prefixScan, suffixScan <span class="ot">&#8759;</span> <span class="dt">Monoid</span> m <span class="ot">&#8658;</span> f m <span class="ot">&#8594;</span> (m, f m)</code></pre>

<p>Given a structure of values, the prefix and suffix scan methods generate the overall <code>fold</code> (of type <code>m</code>), plus a structure of the same type as the input. (In contrast, the usual Haskell <code>scanl</code> and <code>scanr</code> functions on lists yield a single list with one more element than the source list. I changed the interface for generality and composability.) The <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post">post</a> gave instances for the basic set of five functor combinators.</p>
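<p>To make the difference from <code>scanl</code>/<code>scanr</code> concrete, here is a small sequential sketch (not code from the post; <code>prefixScanList</code> and <code>suffixScanList</code> are hypothetical names) that produces the <code>(m, f m)</code> shape on lists by splitting off the extra element that <code>scanl</code> and <code>scanr</code> return.</p>

<pre class="sourceCode"><code class="sourceCode haskell">-- Hypothetical sequential scans with the (total, structure) interface.
-- Each output element combines the elements strictly before it (prefix)
-- or strictly after it (suffix); the overall fold is returned separately.
prefixScanList :: Monoid m =&gt; [m] -&gt; (m, [m])
prefixScanList ms = (last acc, init acc)
  where acc = scanl (&lt;&gt;) mempty ms   -- never empty: one longer than ms

suffixScanList :: Monoid m =&gt; [m] -&gt; (m, [m])
suffixScanList ms = case scanr (&lt;&gt;) mempty ms of
  total : rest -&gt; (total, rest)
  []           -&gt; (mempty, [])       -- unreachable: scanr never returns []
</code></pre>

<p>On <code>["a", "b", "c", "d"]</code>, <code>scanl (&lt;&gt;) mempty</code> yields the five strings <code>"", "a", "ab", "abc", "abcd"</code>, while <code>prefixScanList</code> returns the same information as the total <code>"abcd"</code> plus a four-element list, so its output structure has the same shape as its input.</p>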

<p>Most functors are not defined via the basic combinators, but as mentioned above, we can scan by conversion to and from the basic set. For convenience, encapsulate this conversion in a type class:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">class</span> <span class="dt">EncodeF</span> f <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> f <span class="ot">&#8759;</span> <span class="fu">*</span> <span class="ot">&#8594;</span> <span class="fu">*</span><br />  encode <span class="ot">&#8759;</span> f a <span class="ot">&#8594;</span> <span class="dt">Enc</span> f a<br />  decode <span class="ot">&#8759;</span> <span class="dt">Enc</span> f a <span class="ot">&#8594;</span> f a</code></pre>

<p>and define scan functions via <code>EncodeF</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">prefixScanEnc, suffixScanEnc <span class="ot">&#8759;</span><br />  (<span class="dt">EncodeF</span> f, <span class="dt">Scan</span> (<span class="dt">Enc</span> f), <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> f m <span class="ot">&#8594;</span> (m, f m)<br />prefixScanEnc <span class="fu">=</span> second decode &#8728; prefixScan &#8728; encode<br />suffixScanEnc <span class="fu">=</span> second decode &#8728; suffixScan &#8728; encode</code></pre>

<h4 id="lists">Lists</h4>

<p>As a first example, consider</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> [] <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> [] <span class="fu">=</span> <span class="dt">Const</span> () <span class="fu">+</span> <span class="dt">Id</span> &#215; []<br />  encode [] <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Const</span> ())<br />  encode (a <span class="fu">:</span> <span class="kw">as</span>) <span class="fu">=</span> <span class="dt">InR</span> (<span class="dt">Id</span> a &#215; <span class="kw">as</span>)<br />  decode (<span class="dt">InL</span> (<span class="dt">Const</span> ())) <span class="fu">=</span> []<br />  decode (<span class="dt">InR</span> (<span class="dt">Id</span> a &#215; <span class="kw">as</span>)) <span class="fu">=</span> a <span class="fu">:</span> <span class="kw">as</span></code></pre>

<p>And declare a boilerplate <code>Scan</code> instance via <code>EncodeF</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> [] <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>
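<p>For experimenting, here is a self-contained sketch of this list pipeline. It is an adaptation rather than the post&#8217;s own code: it uses the functor combinators from <code>base</code> (<code>Const</code>, <code>Identity</code>, <code>Sum</code> with constructors <code>InL</code>/<code>InR</code>, and <code>Product</code> with constructor <code>Pair</code>, unrelated to the post&#8217;s <code>Pair</code> type) in place of the post&#8217;s <code>Const</code>, <code>Id</code>, <code>(+)</code>, and <code>(×)</code>. It shows only <code>prefixScan</code>, and the <code>Const</code> instance is my reconstruction, since it is not shown here.</p>

<pre class="sourceCode"><code class="sourceCode haskell">{-# LANGUAGE TypeFamilies, FlexibleContexts #-}
import Data.Bifunctor (second)
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))
import Data.Functor.Product (Product (..))
import Data.Functor.Sum (Sum (..))
import Data.Kind (Type)

class Scan f where
  prefixScan :: Monoid m =&gt; f m -&gt; (m, f m)

-- Const holds no values: the fold is mempty and the structure is unchanged.
instance Scan (Const x) where
  prefixScan (Const x) = (mempty, Const x)

-- Identity holds one value, whose exclusive prefix is mempty.
instance Scan Identity where
  prefixScan (Identity m) = (m, Identity mempty)

-- Sums: scan whichever alternative is present.
instance (Scan f, Scan g) =&gt; Scan (Sum f g) where
  prefixScan (InL fa) = second InL (prefixScan fa)
  prefixScan (InR ga) = second InR (prefixScan ga)

-- Products: scan both halves independently, then offset the second half
-- by the total of the first.
instance (Scan f, Scan g, Functor g) =&gt; Scan (Product f g) where
  prefixScan (Pair fa ga) = (af &lt;&gt; ag, Pair fa' ((af &lt;&gt;) &lt;$&gt; ga'))
    where (af, fa') = prefixScan fa
          (ag, ga') = prefixScan ga

class EncodeF f where
  type Enc f :: Type -&gt; Type
  encode :: f a -&gt; Enc f a
  decode :: Enc f a -&gt; f a

prefixScanEnc :: (EncodeF f, Scan (Enc f), Monoid m) =&gt; f m -&gt; (m, f m)
prefixScanEnc = second decode . prefixScan . encode

-- A list is either empty or a head paired with a tail.
instance EncodeF [] where
  type Enc [] = Sum (Const ()) (Product Identity [])
  encode []         = InL (Const ())
  encode (a : rest) = InR (Pair (Identity a) rest)
  decode (InL (Const ()))               = []
  decode (InR (Pair (Identity a) rest)) = a : rest

instance Scan [] where
  prefixScan = prefixScanEnc
</code></pre>

<p>With this sketch, <code>prefixScan ["a", "b", "c"]</code> evaluates to <code>("abc", ["", "a", "ab"])</code>; using string concatenation as the monoid makes it easy to see which elements each position combines.</p>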

<p>I haven&#8217;t checked the details, but I think with this instance, suffix scanning has okay performance, while prefix scan does quadratic work. The reason is that in the <code>Scan</code> instance for products, the two components are scanned independently (in parallel), and then the whole second component is adjusted for <code>prefixScan</code>, while the whole first component is adjusted for <code>suffixScan</code>. In the case of lists, the first component is the list head, and the second component is the list tail.</p>

<p>For your reading convenience, here&#8217;s that <code>Scan</code> instance again:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> (<span class="dt">Scan</span> f, <span class="dt">Scan</span> g, <span class="kw">Functor</span> f, <span class="kw">Functor</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (f &#215; g) <span class="kw">where</span><br />  prefixScan (fa &#215; ga) <span class="fu">=</span> (af &#8853; ag, fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga'))<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> prefixScan fa<br />         (ag,ga') <span class="fu">=</span> prefixScan ga<br /><br />  suffixScan (fa &#215; ga) <span class="fu">=</span> (af &#8853; ag, ((&#8853; ag) <span class="fu">&lt;$&gt;</span> fa') &#215; ga')<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> suffixScan fa<br />         (ag,ga') <span class="fu">=</span> suffixScan ga</code></pre>

<p>The lopsidedness of the list type thus interferes with parallelization, and makes the parallel scans perform much worse than cumulative sequential scans.</p>
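<p>Inlining the encoding by hand, in the same style as the <code>Pair</code> derivation later in this post, makes the asymmetry explicit. The following is my own specialization of the <code>EncodeF</code>-based list instance rather than code from the post, and <code>prefixScanL</code>/<code>suffixScanL</code> are hypothetical names.</p>

<pre class="sourceCode"><code class="sourceCode haskell">-- prefixScan for Id × [], inlined: the head's exclusive prefix is mempty,
-- and the whole scanned tail is offset by the head.
prefixScanL :: Monoid m =&gt; [m] -&gt; (m, [m])
prefixScanL []         = (mempty, [])
prefixScanL (a : rest) = (a &lt;&gt; ag, mempty : map (a &lt;&gt;) rest')
  where (ag, rest') = prefixScanL rest

-- suffixScan for Id × [], inlined: only the head position is adjusted,
-- and mempty &lt;&gt; ag simplifies to ag by the monoid identity law.
suffixScanL :: Monoid m =&gt; [m] -&gt; (m, [m])
suffixScanL []         = (mempty, [])
suffixScanL (a : rest) = (a &lt;&gt; ag, ag : rest')
  where (ag, rest') = suffixScanL rest
</code></pre>

<p>For a list of length <code>n</code>, <code>prefixScanL</code> maps over scanned tails of lengths <code>n - 1</code> down to 0, for <code>n(n - 1)/2</code> offset operations in total, while <code>suffixScanL</code> does a constant amount of work per element.</p>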

<p>Let&#8217;s next look at a more balanced type.</p>

<h3 id="binary-trees">Binary Trees</h3>

<p>We&#8217;ll get better parallel performance by organizing our data so that we can cheaply partition it into roughly equal pieces. Tree types allow such partitioning.</p>

<h4 id="top-down-trees">Top-down trees</h4>

<p>We&#8217;ll try a few variations, starting with a simple binary tree.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T1</span> a <span class="fu">=</span> <span class="dt">L1</span> a <span class="fu">|</span> <span class="dt">B1</span> (<span class="dt">T1</span> a) (<span class="dt">T1</span> a) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>Encoding and decoding are straightforward:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T1</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T1</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">T1</span> &#215; <span class="dt">T1</span><br />  encode (<span class="dt">L1</span> a)   <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B1</span> s t) <span class="fu">=</span> <span class="dt">InR</span> (s &#215; t)<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a))  <span class="fu">=</span> <span class="dt">L1</span> a<br />  decode (<span class="dt">InR</span> (s &#215; t)) <span class="fu">=</span> <span class="dt">B1</span> s t<br /><br /><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">T1</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>

<p>Note that these definitions could be generated automatically from the data type definition.</p>

<p>For <em>balanced trees</em>, prefix and suffix scan divide the problem in half at each step, solve each half, and do linear work to patch up one of the two halves. Letting <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math> be the number of elements, and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>W</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math> the work, we have the recurrence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>W</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo><mo>=</mo><mn>2</mn><mspace width="0.167em"></mspace><mi>W</mi><mo stretchy="false">(</mo><mi>n</mi><mo>/</mo><mn>2</mn><mo stretchy="false">)</mo><mo>+</mo><mi>c</mi><mspace width="0.167em"></mspace><mi>n</mi></mrow></math> for some constant factor <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>c</mi></mrow></math>. By the <a href="http://en.wikipedia.org/wiki/Master_theorem" title="Wikipedia entry">Master theorem</a>, therefore, the work done is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math>. (Use case 2, with <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>a</mi><mo>=</mo><mi>b</mi><mo>=</mo><mn>2</mn></mrow></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>f</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo><mo>=</mo><mi>c</mi><mspace width="0.167em"></mspace><mi>n</mi></mrow></math>, and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>k</mi><mo>=</mo><mn>0</mn></mrow></math>.)</p>

<p>Again assuming a <em>balanced</em> tree, the computation dependencies have logarithmic depth, so the ideal parallel running time (assuming sufficient processors) is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>log</mi><mi>n</mi><mo stretchy="false">)</mo></mrow></math>. Thus we have an algorithm that is depth-efficient (modulo constant factors) but work-inefficient.</p>
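<p>To make the recurrence concrete, here is a sketch of my own (not from the post) that inlines the <code>T1</code> encoding and the sum and product instances by hand. The helper <code>fromList</code> and the cost function <code>adjustments</code> are hypothetical; the latter charges one unit per use of <code>(&lt;&gt;)</code>.</p>

<pre class="sourceCode"><code class="sourceCode haskell">{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}

data T1 a = L1 a | B1 (T1 a) (T1 a) deriving (Functor, Foldable, Show)

-- prefixScanEnc for T1 with the encoding and the sum/product instances
-- inlined: scan both subtrees independently, then offset the whole
-- right subtree by the total of the left one.
prefixScanT1 :: Monoid m =&gt; T1 m -&gt; (m, T1 m)
prefixScanT1 (L1 m)   = (m, L1 mempty)
prefixScanT1 (B1 s t) = (ms &lt;&gt; mt, B1 s' (fmap (ms &lt;&gt;) t'))
  where (ms, s') = prefixScanT1 s
        (mt, t') = prefixScanT1 t

-- Hypothetical helper: a balanced tree from a non-empty list.
fromList :: [a] -&gt; T1 a
fromList []  = error "fromList: empty list"
fromList [x] = L1 x
fromList xs  = B1 (fromList l) (fromList r)
  where (l, r) = splitAt (length xs `div` 2) xs

-- Cost model: count the (&lt;&gt;) applications made by prefixScanT1, namely one
-- to combine the two totals plus one per element of the right subtree.
adjustments :: T1 a -&gt; Int
adjustments (L1 _)   = 0
adjustments (B1 s t) = 1 + length t + adjustments s + adjustments t
</code></pre>

<p>For a perfect tree with <code>n = 2^k</code> leaves, <code>adjustments</code> satisfies <code>W(n) = 2 W(n/2) + n/2 + 1</code> with <code>W(1) = 0</code>, whose solution is <code>(n/2) log₂ n + n - 1</code>: 19 for 8 leaves and 47 for 16, in line with the &#920;(n log n) bound above.</p>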

<h4 id="composition">Composition</h4>

<p>A binary tree as defined above is either a leaf or a pair of binary trees. We can make this pair-ness more explicit with a reformulation:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T2</span> a <span class="fu">=</span> <span class="dt">L2</span> a <span class="fu">|</span> <span class="dt">B2</span> (<span class="dt">Pair</span> (<span class="dt">T2</span> a)) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>where <code>Pair</code>, as in <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>, is defined as</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Pair</span> a <span class="fu">=</span> a <span class="fu">:#</span> a <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>or even</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">Pair</span> <span class="fu">=</span> <span class="dt">Id</span> &#215; <span class="dt">Id</span></code></pre>

<p>For encoding and decoding, we could use the same representation as with <code>T1</code>, but let&#8217;s instead use a more natural one for the definition of <code>T2</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T2</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T2</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">Pair</span> &#8728; <span class="dt">T2</span><br />  encode (<span class="dt">L2</span> a)  <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B2</span> st) <span class="fu">=</span> <span class="dt">InR</span> (<span class="dt">O</span> st)<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a)) <span class="fu">=</span> <span class="dt">L2</span> a<br />  decode (<span class="dt">InR</span> (<span class="dt">O</span> st)) <span class="fu">=</span> <span class="dt">B2</span> st</code></pre>

<p>Boilerplate scanning:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">T2</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>

<p>for which we&#8217;ll need an applicative instance:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Applicative</span> <span class="dt">T2</span> <span class="kw">where</span><br />  pure <span class="fu">=</span> <span class="dt">L2</span><br />  <span class="dt">L2</span> f <span class="fu">&lt;*&gt;</span> <span class="dt">L2</span> x <span class="fu">=</span> <span class="dt">L2</span> (f x)<br />  <span class="dt">B2</span> (fs <span class="fu">:#</span> gs) <span class="fu">&lt;*&gt;</span> <span class="dt">B2</span> (xs <span class="fu">:#</span> ys) <span class="fu">=</span> <span class="dt">B2</span> ((fs <span class="fu">&lt;*&gt;</span> xs) <span class="fu">:#</span> (gs <span class="fu">&lt;*&gt;</span> ys))<br />  _ <span class="fu">&lt;*&gt;</span> _ <span class="fu">=</span> <span class="fu">error</span> <span class="st">&quot;T2 (&lt;*&gt;): structure mismatch&quot;</span></code></pre>

<p>The <code>O</code> constructor, used in the encoding above, is for functor composition.</p>

<p>With a small change to the tree type, we can make the composition of <code>Pair</code> with the tree itself more explicit:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T3</span> a <span class="fu">=</span> <span class="dt">L3</span> a <span class="fu">|</span> <span class="dt">B3</span> ((<span class="dt">Pair</span> &#8728; <span class="dt">T3</span>) a) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>Then the conversion becomes even simpler, since there&#8217;s no need to add or remove <code>O</code> wrappers:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T3</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T3</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">Pair</span> &#8728; <span class="dt">T3</span><br />  encode (<span class="dt">L3</span> a)  <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B3</span> st) <span class="fu">=</span> <span class="dt">InR</span> st<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a)) <span class="fu">=</span> <span class="dt">L3</span> a<br />  decode (<span class="dt">InR</span> st)     <span class="fu">=</span> <span class="dt">B3</span> st</code></pre>

<h4 id="bottom-up-trees">Bottom-up trees</h4>

<p>In the formulations above, a non-leaf tree consists of a pair of trees. I&#8217;ll call these trees &quot;top-down&quot;, since visible pair structure begins at the top.</p>

<p>With a very small change, we can instead use a tree of pairs:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T4</span> a <span class="fu">=</span> <span class="dt">L4</span> a <span class="fu">|</span> <span class="dt">B4</span> (<span class="dt">T4</span> (<span class="dt">Pair</span> a)) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>Again an applicative instance allows a standard <code>Scan</code> instance:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">T4</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc<br /><br /><span class="kw">instance</span> <span class="dt">Applicative</span> <span class="dt">T4</span> <span class="kw">where</span><br />  pure <span class="fu">=</span> <span class="dt">L4</span><br />  <span class="dt">L4</span> f   <span class="fu">&lt;*&gt;</span> <span class="dt">L4</span> x   <span class="fu">=</span> <span class="dt">L4</span> (f x)<br />  <span class="dt">B4</span> fgs <span class="fu">&lt;*&gt;</span> <span class="dt">B4</span> xys <span class="fu">=</span> <span class="dt">B4</span> (liftA2 h fgs xys)<br />   <span class="kw">where</span> h (f <span class="fu">:#</span> g) (x <span class="fu">:#</span> y) <span class="fu">=</span> f x <span class="fu">:#</span> g y<br />  _ <span class="fu">&lt;*&gt;</span> _ <span class="fu">=</span> <span class="fu">error</span> <span class="st">&quot;T4 (&lt;*&gt;): structure mismatch&quot;</span></code></pre>

<p>or a more explicitly composed form:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T5</span> a <span class="fu">=</span> <span class="dt">L5</span> a <span class="fu">|</span> <span class="dt">B5</span> ((<span class="dt">T5</span> &#8728; <span class="dt">Pair</span>) a) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>I&#8217;ll call these new variations &quot;bottom-up&quot; trees, since visible pair structure begins at the bottom. After stripping off the branch constructor, <code>B4</code>, we can get at the pair-valued leaves by means of <code>fmap</code>, <code>fold</code>, or <code>traverse</code> (or variations). For <code>B5</code>, we&#8217;d also have to strip off the <code>O</code> wrapper (functor composition).</p>
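<p>To see the difference in shape, here is a sketch that builds both kinds of tree from a list; the helper names are hypothetical and not from the post. The top-down <code>T2</code> splits the list in half at the root, while the bottom-up <code>T4</code> pairs up adjacent elements and recurses on the half-length list of pairs, which requires polymorphic recursion (hence the explicit type signatures) and, in this sketch, a power-of-two length.</p>

<pre class="sourceCode"><code class="sourceCode haskell">data Pair a = a :# a deriving Show

data T2 a = L2 a | B2 (Pair (T2 a)) deriving Show   -- pairs at the top
data T4 a = L4 a | B4 (T4 (Pair a)) deriving Show   -- pairs at the bottom

-- Top-down: split the list in half, then build each half.
fromListT2 :: [a] -&gt; T2 a
fromListT2 []  = error "fromListT2: empty list"
fromListT2 [x] = L2 x
fromListT2 xs  = B2 (fromListT2 l :# fromListT2 r)
  where (l, r) = splitAt (length xs `div` 2) xs

-- Bottom-up: pair adjacent elements, then build a tree of the pairs.
fromListT4 :: [a] -&gt; T4 a
fromListT4 []  = error "fromListT4: empty list"
fromListT4 [x] = L4 x
fromListT4 xs  = B4 (fromListT4 (pairUp xs))

pairUp :: [a] -&gt; [Pair a]
pairUp (x : y : rest) = (x :# y) : pairUp rest
pairUp []             = []
pairUp _              = error "pairUp: length is not a power of two"

-- Flattening follows the same two patterns in reverse.
toListT2 :: T2 a -&gt; [a]
toListT2 (L2 a)        = [a]
toListT2 (B2 (s :# t)) = toListT2 s ++ toListT2 t

toListT4 :: T4 a -&gt; [a]
toListT4 (L4 a) = [a]
toListT4 (B4 t) = concatMap (\(a :# b) -&gt; [a, b]) (toListT4 t)
</code></pre>

<p>Both builders yield the same leaves in the same left-to-right order; they differ only in where the <code>Pair</code> structure sits, which the encodings of <code>T2</code> and <code>T4</code> expose as <code>Pair ∘ T2</code> and <code>T4 ∘ Pair</code>.</p>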

<p>Encoding is nearly the same as with top-down trees. For instance,</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T4</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T4</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">T4</span> &#8728; <span class="dt">Pair</span><br />  encode (<span class="dt">L4</span> a) <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B4</span> t) <span class="fu">=</span> <span class="dt">InR</span> (<span class="dt">O</span> t)<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a)) <span class="fu">=</span> <span class="dt">L4</span> a<br />  decode (<span class="dt">InR</span> (<span class="dt">O</span> t))  <span class="fu">=</span> <span class="dt">B4</span> t</code></pre>

<h3 id="scanning-pairs">Scanning pairs</h3>

<p>We&#8217;ll need to scan on the <code>Pair</code> functor. If we use the definition of <code>Pair</code> above in terms of <code>Id</code> and <code>(×)</code>, then we&#8217;ll get scanning for free. For <em>using</em> <code>Pair</code>, I find the explicit data type definition above more convenient. We can then derive a <code>Scan</code> instance by conversion. Start with a standard specification:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Pair</span> a <span class="fu">=</span> a <span class="fu">:#</span> a <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>And encode &amp; decode explicitly:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">Pair</span> <span class="fu">=</span> <span class="dt">Id</span> &#215; <span class="dt">Id</span><br />  encode (a <span class="fu">:#</span> b) <span class="fu">=</span> <span class="dt">Id</span> a &#215; <span class="dt">Id</span> b<br />  decode (<span class="dt">Id</span> a &#215; <span class="dt">Id</span> b) <span class="fu">=</span> a <span class="fu">:#</span> b</code></pre>

<p>Then use our boilerplate <code>Scan</code> instance for <code>EncodeF</code> instances:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>

<p>We&#8217;ve seen the <code>Scan</code> instance for <code>(×)</code> above. The instance for <code>Id</code> is very simple:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">Id</span> a <span class="fu">=</span> <span class="dt">Id</span> a<br /><br /><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Id</span> <span class="kw">where</span><br />  prefixScan (<span class="dt">Id</span> m) <span class="fu">=</span> (m, <span class="dt">Id</span> &#8709;)<br />  suffixScan        <span class="fu">=</span> prefixScan</code></pre>

<p>Given these definitions, we can calculate a more streamlined <code>Scan</code> instance for <code>Pair</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (a <span class="fu">:#</span> b)<br />&#8801;  <span class="co">{- specification -}</span><br />  prefixScanEnc (a <span class="fu">:#</span> b)<br />&#8801;  <span class="co">{- prefixScanEnc definition -}</span><br />  (second decode &#8728; prefixScan &#8728; encode) (a <span class="fu">:#</span> b)<br />&#8801;  <span class="co">{- (&#8728;) -}</span><br />  second decode (prefixScan (encode (a <span class="fu">:#</span> b)))<br />&#8801;  <span class="co">{- encode definition for Pair -}</span><br />  second decode (prefixScan (<span class="dt">Id</span> a &#215; <span class="dt">Id</span> b))<br />&#8801;  <span class="co">{- prefixScan definition for f &#215; g -}</span><br />  second decode<br />    (af &#8853; ag, fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga'))<br />     <span class="kw">where</span> (af,fa') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> a)<br />           (ag,ga') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> b)<br />&#8801;  <span class="co">{- Definition of second on functions -}</span><br />  (af &#8853; ag, decode (fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga')))<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> a)<br />         (ag,ga') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> b)<br />&#8801;  <span class="co">{- prefixScan definition for Id -}</span><br />  (af &#8853; ag, decode (fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga')))<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> (a, <span class="dt">Id</span> &#8709;)<br />         (ag,ga') <span class="fu">=</span> (b, <span class="dt">Id</span> &#8709;)<br />&#8801;  <span class="co">{- substitution -}</span><br />  (a &#8853; b, decode (<span class="dt">Id</span> &#8709; &#215; ((a &#8853;) <span 
class="fu">&lt;$&gt;</span> <span class="dt">Id</span> &#8709;)))<br />&#8801;  <span class="co">{- fmap/(&lt;$&gt;) for Id -}</span><br />  (a &#8853; b, decode (<span class="dt">Id</span> &#8709; &#215; <span class="dt">Id</span> (a &#8853; &#8709;)))<br />&#8801;  <span class="co">{- Monoid law -}</span><br />  (a &#8853; b, decode (<span class="dt">Id</span> &#8709; &#215; <span class="dt">Id</span> a))<br />&#8801;  <span class="co">{- decode definition on Pair -}</span><br />  (a &#8853; b, (&#8709; <span class="fu">:#</span> a))</code></pre>

<p>Whew! And similarly for <code>suffixScan</code>.</p>

<p>Now let&#8217;s recall the <code>Scan</code> instance for <code>Pair</code> given in <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  prefixScan (a <span class="fu">:#</span> b) <span class="fu">=</span> (a &#8853; b, (&#8709; <span class="fu">:#</span> a))<br />  suffixScan (a <span class="fu">:#</span> b) <span class="fu">=</span> (a &#8853; b, (b <span class="fu">:#</span> &#8709;))</code></pre>

<p>Hurray! The derivation led us to the same definition. A &quot;sufficiently smart&quot; compiler could do this derivation automatically.</p>
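<p>The derivation can also be spot-checked by running both routes side by side. This sketch is my own: it uses <code>Identity</code> and <code>Product</code> from <code>base</code> as stand-ins for the post&#8217;s <code>Id</code> and <code>(×)</code>, and the names <code>encodeP</code>, <code>decodeP</code>, <code>prefixViaEnc</code>, <code>suffixViaEnc</code>, <code>prefixPair</code>, and <code>suffixPair</code> are hypothetical.</p>

<pre class="sourceCode"><code class="sourceCode haskell">import Data.Bifunctor (second)
import Data.Functor.Identity (Identity (..))
import Data.Functor.Product (Product (..))

-- The type Pair is the post's; the constructor Pair used below is
-- Data.Functor.Product's (types and constructors live in separate namespaces).
data Pair a = a :# a deriving (Eq, Show)

class Scan f where
  prefixScan, suffixScan :: Monoid m =&gt; f m -&gt; (m, f m)

instance Scan Identity where
  prefixScan (Identity m) = (m, Identity mempty)
  suffixScan = prefixScan

instance (Scan f, Scan g, Functor f, Functor g) =&gt; Scan (Product f g) where
  prefixScan (Pair fa ga) = (af &lt;&gt; ag, Pair fa' ((af &lt;&gt;) &lt;$&gt; ga'))
    where (af, fa') = prefixScan fa
          (ag, ga') = prefixScan ga
  suffixScan (Pair fa ga) = (af &lt;&gt; ag, Pair ((&lt;&gt; ag) &lt;$&gt; fa') ga')
    where (af, fa') = suffixScan fa
          (ag, ga') = suffixScan ga

encodeP :: Pair a -&gt; Product Identity Identity a
encodeP (a :# b) = Pair (Identity a) (Identity b)

decodeP :: Product Identity Identity a -&gt; Pair a
decodeP (Pair (Identity a) (Identity b)) = a :# b

-- Route 1: the specification, via the encoding.
prefixViaEnc, suffixViaEnc :: Monoid m =&gt; Pair m -&gt; (m, Pair m)
prefixViaEnc = second decodeP . prefixScan . encodeP
suffixViaEnc = second decodeP . suffixScan . encodeP

-- Route 2: the streamlined definitions from the derivation.
prefixPair, suffixPair :: Monoid m =&gt; Pair m -&gt; (m, Pair m)
prefixPair (a :# b) = (a &lt;&gt; b, mempty :# a)
suffixPair (a :# b) = (a &lt;&gt; b, b :# mempty)
</code></pre>

<p>The check can only succeed for lawful monoids (here, string concatenation), since the derivation relied on the identity law <code>a ⊕ ∅ ≡ a</code>, and the suffix case on <code>∅ ⊕ b ≡ b</code>.</p>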

<p>With this warm-up derivation, let&#8217;s now turn to trees.</p>

<h3 id="scanning-trees">Scanning trees</h3>

<p>Given the tree encodings above, how does scan work? We&#8217;ll have to consult <code>Scan</code> instances for some of the functor combinators. The product instance is repeated above. We&#8217;ll also want the instances for sum and composition. Omitting the <code>suffixScan</code> definitions for brevity:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> (f <span class="fu">+</span> g) a <span class="fu">=</span> <span class="dt">InL</span> (f a) <span class="fu">|</span> <span class="dt">InR</span> (g a)<br /><br /><span class="kw">instance</span> (<span class="dt">Scan</span> f, <span class="dt">Scan</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (f <span class="fu">+</span> g) <span class="kw">where</span><br />  prefixScan (<span class="dt">InL</span> fa) <span class="fu">=</span> second <span class="dt">InL</span> (prefixScan fa)<br />  prefixScan (<span class="dt">InR</span> ga) <span class="fu">=</span> second <span class="dt">InR</span> (prefixScan ga)<br /><br /><span class="kw">newtype</span> (g &#8728; f) a <span class="fu">=</span> <span class="dt">O</span> (g (f a))<br /><br /><span class="kw">instance</span> (<span class="dt">Scan</span> g, <span class="dt">Scan</span> f, <span class="kw">Functor</span> f, <span class="dt">Applicative</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (g &#8728; f) <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>)<br />             &#8728; assocR<br />             &#8728; first prefixScan<br />             &#8728; <span class="fu">unzip</span><br />             &#8728; <span class="fu">fmap</span> prefixScan<br />             &#8728; unO</code></pre>

<p>This last definition uses a few utility functions:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">zip</span> <span class="ot">&#8759;</span> <span class="dt">Applicative</span> g <span class="ot">&#8658;</span> (g a, g b) <span class="ot">&#8594;</span> g (a,b)<br /><span class="fu">zip</span> <span class="fu">=</span> <span class="fu">uncurry</span> (liftA2 (,))<br /><br /><span class="fu">unzip</span> <span class="ot">&#8759;</span> <span class="kw">Functor</span> g <span class="ot">&#8658;</span> g (a,b) <span class="ot">&#8594;</span> (g a, g b)<br /><span class="fu">unzip</span> <span class="fu">=</span> <span class="fu">fmap</span> <span class="fu">fst</span> <span class="fu">&amp;&amp;&amp;</span> <span class="fu">fmap</span> <span class="fu">snd</span><br /><br />assocR <span class="ot">&#8759;</span> ((a,b),c) <span class="ot">&#8594;</span> (a,(b,c))<br />assocR   ((a,b),c) <span class="fu">=</span>  (a,(b,c))<br /><br />adjustL <span class="ot">&#8759;</span> (<span class="kw">Functor</span> f, <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> (m, f m) <span class="ot">&#8594;</span> f m<br />adjustL (m, ms) <span class="fu">=</span> (m &#8853;) <span class="fu">&lt;$&gt;</span> ms</code></pre>
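<p>To experiment with these definitions, here is a minimal runnable sketch in plain GHC Haskell. It makes a few assumptions of its own. The class declaration is reconstructed from the method type used above and includes only <code>prefixScan</code>. The ASCII type operators <code>:+:</code> and <code>:.:</code> stand in for <code>+</code> and <code>&#8728;</code>. <code>(&lt;&gt;)</code> and <code>mempty</code> play the roles of <code>(&#8853;)</code> and <code>&#8709;</code>.</p>

```haskell
{-# LANGUAGE TypeOperators #-}
import Prelude hiding (zip, unzip)
import Control.Applicative (liftA2)
import Control.Arrow (first, second, (&&&))
import Data.Monoid (Sum (..)) -- used only in the example below

-- Reconstructed class: only prefixScan (suffixScan omitted, as above).
class Scan f where
  prefixScan :: Monoid m => f m -> (m, f m)

-- Identity functor: exactly one value.
newtype Id a = Id a deriving (Show, Eq)

instance Scan Id where
  prefixScan (Id m) = (m, Id mempty)

-- Pairs of values.
data Pair a = a :# a deriving (Show, Eq)

instance Functor Pair where
  fmap h (a :# b) = h a :# h b

instance Applicative Pair where
  pure a = a :# a
  (f :# g) <*> (a :# b) = f a :# g b

instance Scan Pair where
  prefixScan (a :# b) = (a <> b, mempty :# a)

-- Functor sum, written :+: here.
data (f :+: g) a = InL (f a) | InR (g a)

instance (Scan f, Scan g) => Scan (f :+: g) where
  prefixScan (InL fa) = second InL (prefixScan fa)
  prefixScan (InR ga) = second InR (prefixScan ga)

-- Functor composition, written :.: here.
newtype (g :.: f) a = O { unO :: g (f a) }

instance (Scan g, Scan f, Functor f, Applicative g) => Scan (g :.: f) where
  prefixScan = second (O . fmap adjustL . zip)
             . assocR
             . first prefixScan
             . unzip
             . fmap prefixScan
             . unO

zip :: Applicative g => (g a, g b) -> g (a, b)
zip = uncurry (liftA2 (,))

unzip :: Functor g => g (a, b) -> (g a, g b)
unzip = fmap fst &&& fmap snd

assocR :: ((a, b), c) -> (a, (b, c))
assocR ((a, b), c) = (a, (b, c))

-- Add the prefix m to every element of the inner structure.
adjustL :: (Functor f, Monoid m) => (m, f m) -> f m
adjustL (m, ms) = (m <>) <$> ms
```

<p>For example, scanning <code>O ((1 :# 2) :# (3 :# 4))</code> at type <code>(Pair :.: Pair) (Sum Int)</code> gives the total 10 and the exclusive prefixes <code>(0 :# 1) :# (3 :# 6)</code>.</p>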

<p>Let&#8217;s consider how the <code>Scan (g ∘ f)</code> instance plays out for top-down vs bottom-up trees, given the functor-composition encodings above. The critical definitions:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T2</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">Pair</span> &#8728; <span class="dt">T2</span><br /><br /><span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T4</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">T4</span> &#8728; <span class="dt">Pair</span></code></pre>

<p>Focusing on the branch case, we have <code>Pair ∘ T2</code> vs <code>T4 ∘ Pair</code>, so we&#8217;ll use the <code>Scan (g ∘ f)</code> instance either way. Let&#8217;s consider the work implied by that instance. There are two calls to <code>prefixScan</code>, plus a linear amount of other work. The meanings of those two calls differ, however:</p>

<ul>
<li>For top-down trees (<code>T2</code>), the recursive tree scans are in <code>fmap prefixScan</code>, mapping over the pair of trees. The <code>first prefixScan</code> is a pair scan and so does constant work. Since there are two recursive calls, each working on a tree of half size (assuming balance), plus linear other work, the total work is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math>, as explained above.</li>
<li>For bottom-up trees (<code>T4</code>), there is only one recursive tree scan, which appears in <code>first prefixScan</code>. The <code>prefixScan</code> in <code>fmap prefixScan</code> is a pair scan and so does constant work, but it is mapped over the half-sized tree (of pairs), and so does linear work altogether. Since there is only one recursive tree scan, at half size, plus linear other work, the total work is proportional to <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>+</mo><mi>n</mi><mo>/</mo><mn>2</mn><mo>+</mo><mi>n</mi><mo>/</mo><mn>4</mn><mo>+</mo><mo>&#8230;</mo><mo>&#8776;</mo><mn>2</mn><mspace width="0.167em"></mspace><mi>n</mi><mo>=</mo><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math>. So we have a work-efficient algorithm!</li>
</ul>
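<p>To make the two recurrences concrete, the following sketch simply evaluates them under an assumed cost model: a perfectly balanced tree whose size <code>n</code> is a power of two, one unit of &quot;other&quot; work per element, and one unit for a leaf. The function names are hypothetical.</p>

```haskell
-- Illustrative cost model (an assumption, not a measurement):
-- n is a power of two, each level costs n units of non-recursive work.

-- Top-down (T2): two recursive scans on halves, plus linear work.
workTopDown :: Integer -> Integer
workTopDown n
  | n <= 1    = 1
  | otherwise = 2 * workTopDown (n `div` 2) + n

-- Bottom-up (T4): one recursive scan on the half-sized tree of pairs,
-- plus linear work.
workBottomUp :: Integer -> Integer
workBottomUp n
  | n <= 1    = 1
  | otherwise = workBottomUp (n `div` 2) + n
```

<p>For <code>n = 1024</code> these give 11264 and 2047 respectively, matching the closed forms <code>n log n + n</code> and <code>2n &#8722; 1</code> for this model.</p>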

<h3 id="looking-deeper">Looking deeper</h3>

<p>Beyond the simple analysis above of scanning top-down and bottom-up trees, let&#8217;s look in detail at what transpires and how each case can be optimized. This section may well have more detail than you&#8217;re interested in. If so, feel free to skip ahead.</p>

<h4 id="top-down">Top-down</h4>

<p>Beginning as with <code>Pair</code>,</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan t<br />&#8801;  <span class="co">{- specification -}</span><br />  prefixScanEnc t<br />&#8801;  <span class="co">{- prefixScanEnc definition -}</span><br />  (second decode &#8728; prefixScan &#8728; encode) t<br />&#8801;  <span class="co">{- (&#8728;) -}</span><br />  second decode (prefixScan (encode t))</code></pre>

<p>Take <code>T2</code>, with <code>T3</code> being quite similar. Split into two cases for the two constructors of <code>T2</code>. First, the leaf case:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">L2</span> m)<br />&#8801;  <span class="co">{- as above -}</span><br />  second decode (prefixScan (encode (<span class="dt">L2</span> m)))<br />&#8801;  <span class="co">{- encode for L2 -}</span><br />  second decode (prefixScan (<span class="dt">InL</span> (<span class="dt">Id</span> m)))<br />&#8801;  <span class="co">{- prefixScan for functor sum -}</span><br />  second decode (second <span class="dt">InL</span> (prefixScan (<span class="dt">Id</span> m)))<br />&#8801;  <span class="co">{- prefixScan for Id -}</span><br />  second decode (second <span class="dt">InL</span> (m, <span class="dt">Id</span> &#8709;))<br />&#8801;  <span class="co">{- second for functions -}</span><br />  second decode (m, <span class="dt">InL</span> (<span class="dt">Id</span> &#8709;))<br />&#8801;  <span class="co">{- second for functions -}</span><br />  (m, decode (<span class="dt">InL</span> (<span class="dt">Id</span> &#8709;)))<br />&#8801;  <span class="co">{- decode for L2 -}</span><br />  (m, <span class="dt">L2</span> &#8709;)</code></pre>

<p>Then branch:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B2</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- as above -}</span><br />  second decode (prefixScan (encode (<span class="dt">B2</span> (s <span class="fu">:#</span> t))))<br />&#8801;  <span class="co">{- encode for B2 -}</span><br />  second decode (prefixScan (<span class="dt">InR</span> (<span class="dt">O</span> (s <span class="fu">:#</span> t))))<br />&#8801;  <span class="co">{- prefixScan for (+) -}</span><br />  second decode (second <span class="dt">InR</span> (prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t))))<br />&#8801;  <span class="co">{- property of second -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t)))</code></pre>

<p>Focus on the <code>prefixScan</code> application:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- prefixScan for (&#8728;) -}</span><br />  ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br />  &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan &#8728; unO ) (<span class="dt">O</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- unO/O -}</span><br />  ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br />  &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan ) (s <span class="fu">:#</span> t)<br />&#8801;  <span class="co">{- fmap on Pair -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan &#8728; <span class="fu">unzip</span>)<br />    (prefixScan s <span class="fu">:#</span> prefixScan t)<br />&#8801;  <span class="co">{- expand prefixScan -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan &#8728; <span class="fu">unzip</span>)<br />    ((ms,s') <span class="fu">:#</span> (mt,t'))<br />      <span class="kw">where</span> (ms,s') <span class="fu">=</span> prefixScan s<br />            (mt,t') <span class="fu">=</span> prefixScan t<br />&#8801;  <span class="co">{- unzip -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan)<br />    ((ms <span class="fu">:#</span> mt), (s' <span class="fu">:#</span> t')) <span
class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- first -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    (prefixScan (ms <span class="fu">:#</span> mt), (s' <span class="fu">:#</span> t')) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- prefixScan for Pair -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    ((ms &#8853; mt, (&#8709; <span class="fu">:#</span> ms)), (s' <span class="fu">:#</span> t')) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- assocR -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>))<br />    (ms &#8853; mt, ((&#8709; <span class="fu">:#</span> ms), (s' <span class="fu">:#</span> t'))) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- second -}</span><br />  ( ms &#8853; mt<br />  , (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) ((&#8709; <span class="fu">:#</span> ms), (s' <span class="fu">:#</span> t')) ) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- zip -}</span><br />  ( ms &#8853; mt<br />  , (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL) ((&#8709;,s') <span class="fu">:#</span> (ms,t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- fmap for Pair -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> (adjustL (&#8709;,s') <span class="fu">:#</span> adjustL (ms,t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- adjustL -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> (((&#8709; &#8853;) <span class="fu">&lt;$&gt;</span> s') <span class="fu">:#</span> ((ms 
&#8853;) <span class="fu">&lt;$&gt;</span> t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- Monoid law (left identity) -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> ((<span class="fu">id</span> <span class="fu">&lt;$&gt;</span> s') <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- Functor law (fmap id) -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')) )<br />      <span class="kw">where</span> (ms,s') <span class="fu">=</span> prefixScan s<br />            (mt,t') <span class="fu">=</span> prefixScan t</code></pre>

<p>Continuing from above,</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B2</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- see above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t)))<br />&#8801;  <span class="co">{- prefixScan focus from above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>)<br />    ( ms &#8853; mt<br />    , <span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')) )<br />        <span class="kw">where</span> (ms,s') <span class="fu">=</span> prefixScan s<br />              (mt,t') <span class="fu">=</span> prefixScan t<br />&#8801;  <span class="co">{- definition of second on functions -}</span><br />    (ms &#8853; mt, (decode &#8728; <span class="dt">InR</span>) (<span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')))) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- (&#8728;) -}</span><br />    (ms &#8853; mt, decode (<span class="dt">InR</span> (<span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t'))))) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- decode for B2 -}</span><br />    (ms &#8853; mt, <span class="dt">B2</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t'))) <span class="kw">where</span> &#8943;</code></pre>

<p>This final form is as in <a href="http://conal.net/blog/posts/deriving-parallel-tree-scans/" title="blog post"><em>Deriving parallel tree scans</em></a>, changed for the new scan interface. The derivation saved some work in wrapping &amp; unwrapping and method invocation, plus one of the two adjustment passes over the sub-trees. As explained above, this algorithm performs <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math> work.</p>
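<p>Here is the derived top-down algorithm transcribed into standalone Haskell, as a sketch for testing. The <code>Pair</code> and <code>T2</code> declarations follow the shapes used above (a branch holds a <code>Pair</code> of subtrees). <code>fromListT2</code> and <code>toListT2</code> are hypothetical helpers for building balanced trees and reading results back. <code>(&lt;&gt;)</code> and <code>mempty</code> stand in for <code>(&#8853;)</code> and <code>&#8709;</code>.</p>

```haskell
import Data.Monoid (Sum (..)) -- used only in the example below

data Pair a = a :# a deriving (Show, Eq)

instance Functor Pair where
  fmap h (a :# b) = h a :# h b

-- Top-down binary tree: a branch holds a pair of subtrees.
data T2 a = L2 a | B2 (Pair (T2 a)) deriving Show

instance Functor T2 where
  fmap h (L2 a) = L2 (h a)
  fmap h (B2 p) = B2 (fmap (fmap h) p)

-- The derived top-down scan: two recursive scans, then a single
-- adjustment pass over the right subtree only.
prefixScanT2 :: Monoid m => T2 m -> (m, T2 m)
prefixScanT2 (L2 m)        = (m, L2 mempty)
prefixScanT2 (B2 (s :# t)) = (ms <> mt, B2 (s' :# fmap (ms <>) t'))
  where
    (ms, s') = prefixScanT2 s
    (mt, t') = prefixScanT2 t

-- Hypothetical helpers (not from the post) for trying it out.
fromListT2 :: [a] -> T2 a
fromListT2 []  = error "fromListT2: empty list"
fromListT2 [a] = L2 a
fromListT2 xs  = B2 (fromListT2 l :# fromListT2 r)
  where (l, r) = splitAt (length xs `div` 2) xs

toListT2 :: T2 a -> [a]
toListT2 (L2 a)        = [a]
toListT2 (B2 (s :# t)) = toListT2 s ++ toListT2 t
```

<p>Comparing against <code>init (scanl (+) 0 xs)</code> on the flattened input is a cheap way to test any exclusive prefix scan. This version works for any non-empty tree shape; balance matters only for the work bound.</p>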

<p>I&#8217;ll leave <code>suffixScan</code> for you to do yourself.</p>

<h4 id="bottom-up">Bottom-up</h4>

<p>What happens if we switch from top-down to bottom-up binary trees? I&#8217;ll use <code>T4</code> (though <code>T5</code> would work as well):</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T4</span> a <span class="fu">=</span> <span class="dt">L4</span> a <span class="fu">|</span> <span class="dt">B4</span> (<span class="dt">T4</span> (<span class="dt">Pair</span> a))</code></pre>

<p>The leaf case is just as with <code>T2</code> above, so let&#8217;s get right to branches.</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B4</span> t)<br />&#8801;  <span class="co">{- as above -}</span><br />  second decode (prefixScan (encode (<span class="dt">B4</span> t)))<br />&#8801;  <span class="co">{- encode for B4 -}</span><br />  second decode (prefixScan (<span class="dt">InR</span> (<span class="dt">O</span> t)))<br />&#8801;  <span class="co">{- prefixScan for (+) -}</span><br />  second decode (second <span class="dt">InR</span> (prefixScan (<span class="dt">O</span> t)))<br />&#8801;  <span class="co">{- property of second -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> t))</code></pre>

<p>As before, now focus on the <code>prefixScan</code> call.</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">O</span> t)<br />&#8801;  <span class="co">{- prefixScan for (&#8728;) -}</span><br />  ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br />  &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan &#8728; unO ) (<span class="dt">O</span> t)<br />&#8801;  <span class="co">{- unO/O -}</span><br />  ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br />  &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan ) t<br />&#8801;  <span class="co">{- prefixScan on Pair (derived above) -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan &#8728; <span class="fu">unzip</span>)<br />    (<span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b, (&#8709; <span class="fu">:#</span> a))) t)<br />&#8801;  <span class="co">{- unzip/fmap -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan)<br />    ( <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t<br />    , <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a))   t )<br />&#8801;  <span class="co">{- first on functions -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    ( prefixScan (<span
class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)<br />    , <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a))   t )<br />&#8801;  <span class="co">{- expand prefixScan -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    ((mp,p'), <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a)) t)<br />   <span class="kw">where</span> (mp,p') <span class="fu">=</span> prefixScan (<span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)<br />&#8801;  <span class="co">{- assocR -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>))<br />    (mp, (p', <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a)) t))<br />   <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- second on functions -}</span><br />  (mp, (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) (p', <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a)) t))<br />    <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- fmap/zip/fmap -}</span><br />  (mp, <span class="dt">O</span> (liftA2 tweak p' t))<br />    <span class="kw">where</span> tweak s (a <span class="fu">:#</span> _) <span class="fu">=</span> adjustL (s, (&#8709; <span class="fu">:#</span> a))<br />          (mp,p') <span class="fu">=</span> prefixScan (<span class="fu">fmap</span> (&#955; (a <span 
class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)<br />&#8801;  <span class="co">{- adjustL, then simplify -}</span><br />  (mp, <span class="dt">O</span> (liftA2 tweak p' t))<br />    <span class="kw">where</span> tweak s (a <span class="fu">:#</span> _) <span class="fu">=</span> (s <span class="fu">:#</span> s &#8853; a)<br />          (mp,p') <span class="fu">=</span> prefixScan (<span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)</code></pre>

<p>Now re-introduce the context of <code>prefixScan (O t)</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B4</span> t)<br />&#8801;  <span class="co">{- see above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> t))<br />&#8801;  <span class="co">{- see above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>)<br />    (mp, <span class="dt">O</span> (liftA2 tweak p' t))<br />      <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- decode for T4 -}</span><br />  (mp, <span class="dt">B4</span> (liftA2 tweak p' t))<br />    <span class="kw">where</span> p <span class="fu">=</span> <span class="fu">fmap</span> (&#955; (e <span class="fu">:#</span> o) <span class="ot">&#8594;</span> (e &#8853; o)) t<br />          (mp,p') <span class="fu">=</span> prefixScan p<br />          tweak s (e <span class="fu">:#</span> _) <span class="fu">=</span> (s <span class="fu">:#</span> s &#8853; e)</code></pre>

<p>Notice how much this bottom-up tree scan algorithm differs from the top-down algorithm derived above. In particular, there&#8217;s only one recursive tree scan (on a half-sized tree) instead of two, plus linear additional work, for a total of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math> work.</p>
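<p>The bottom-up algorithm can be sketched the same way. One assumption to flag: the derivation uses <code>liftA2</code> on <code>T4</code>, which presumes an applicative that zips same-shaped trees. This sketch substitutes a hypothetical <code>zipWithT4</code> for it. Because <code>T4</code> is a nested data type, functions that recurse into <code>T4 (Pair a)</code> (such as <code>zipWithT4</code> and the hypothetical helpers <code>fromListT4</code> and <code>toListT4</code>) need explicit type signatures for polymorphic recursion.</p>

```haskell
import Data.Monoid (Sum (..)) -- used only in the example below

data Pair a = a :# a deriving (Show, Eq)

instance Functor Pair where
  fmap h (a :# b) = h a :# h b

-- Bottom-up (nested) tree: each B4 level holds a tree of pairs.
data T4 a = L4 a | B4 (T4 (Pair a))

instance Functor T4 where
  fmap h (L4 a) = L4 (h a)
  fmap h (B4 t) = B4 (fmap (fmap h) t)

-- Hypothetical stand-in for liftA2 on same-shaped trees.
zipWithT4 :: (a -> b -> c) -> T4 a -> T4 b -> T4 c
zipWithT4 h (L4 a) (L4 b) = L4 (h a b)
zipWithT4 h (B4 s) (B4 t) =
  B4 (zipWithT4 (\(a :# a') (b :# b') -> h a b :# h a' b') s t)
zipWithT4 _ _ _ = error "zipWithT4: shape mismatch"

-- The derived bottom-up scan: one recursive scan on a half-sized tree.
prefixScanT4 :: Monoid m => T4 m -> (m, T4 m)
prefixScanT4 (L4 m) = (m, L4 mempty)
prefixScanT4 (B4 t) = (mp, B4 (zipWithT4 tweak p' t))
  where
    p        = fmap (\(e :# o) -> e <> o) t -- sum each pair
    (mp, p') = prefixScanT4 p               -- scan the pair sums
    tweak s (e :# _) = s :# (s <> e)        -- prefixes within a pair

-- Hypothetical helpers: build from exactly 2^k elements, and flatten.
fromListT4 :: Int -> [a] -> T4 a
fromListT4 0 [a] = L4 a
fromListT4 k xs
  | k > 0     = B4 (fromListT4 (k - 1) (pairUp xs))
  | otherwise = error "fromListT4: length must be 2^k"

pairUp :: [a] -> [Pair a]
pairUp (a : b : rest) = (a :# b) : pairUp rest
pairUp []             = []
pairUp _              = error "pairUp: odd length"

toListT4 :: T4 a -> [a]
toListT4 (L4 a) = [a]
toListT4 (B4 t) = concatMap (\(a :# b) -> [a, b]) (toListT4 t)
```

<p>Note that the recursive call <code>prefixScanT4 p</code> is at the same element type <code>m</code>; only the tree is half as large, which is where the linear work bound comes from.</p>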

<h3 id="guy-blellochs-parallel-scan-algorithm">Guy Blelloch&#8217;s parallel scan algorithm</h3>

<p>In <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.5739" title="Paper by Guy Blelloch"><em>Programming parallel algorithms</em></a>, Guy Blelloch gives the following algorithm for parallel prefix scan, expressed in the parallel functional language NESL:</p>

<pre class="sourceCode"><code class="sourceCode haskell">function scan(a) <span class="fu">=</span><br /><span class="kw">if</span> <span class="fu">#</span>a &#8801; <span class="dv">1</span> <span class="kw">then</span> [<span class="dv">0</span>]<br /><span class="kw">else</span><br />  <span class="kw">let</span> es <span class="fu">=</span> even_elts(a);<br />      os <span class="fu">=</span> odd_elts(a);<br />      ss <span class="fu">=</span> scan({e<span class="fu">+</span>o<span class="fu">:</span> e <span class="kw">in</span> es; o <span class="kw">in</span> os})<br />  <span class="kw">in</span> interleave(ss,{s<span class="fu">+</span>e<span class="fu">:</span> s <span class="kw">in</span> ss; e <span class="kw">in</span> es})</code></pre>

<p>This algorithm is nearly identical to the <code>T4</code> scan algorithm above. I was very glad to find this route to Guy&#8217;s algorithm, which had been fairly mysterious to me. I mean, I could believe that the algorithm worked, but I had no idea how I might have discovered it myself. With the functor composition approach to scanning, I now see how Guy&#8217;s algorithm emerges as well as how it generalizes to other data structures.</p>
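<p>For comparison, here is a direct transliteration of the NESL code into Haskell lists, as a sketch. It generalizes <code>+</code> and <code>0</code> to <code>(&lt;&gt;)</code> and <code>mempty</code>, and it assumes (as the pairing of even and odd elements requires) that the input length is a power of two. The names <code>evens</code>, <code>odds</code> and <code>interleave</code> are hypothetical stand-ins for NESL&#8217;s <code>even_elts</code>, <code>odd_elts</code> and <code>interleave</code>.</p>

```haskell
import Data.Monoid (Sum (..)) -- used only in the example below

-- Exclusive prefix scan following the NESL code above.
-- Assumes the input length is a power of two; [] is returned unchanged.
blellochScan :: Monoid m => [m] -> [m]
blellochScan []  = []
blellochScan [_] = [mempty]
blellochScan a   = interleave ss (zipWith (<>) ss es)
  where
    es = evens a                           -- even_elts(a)
    os = odds a                            -- odd_elts(a)
    ss = blellochScan (zipWith (<>) es os) -- scan({e+o: ...})

-- Elements at even indices (0, 2, ...) and odd indices (1, 3, ...).
evens, odds :: [a] -> [a]
evens (x : _ : rest) = x : evens rest
evens xs             = xs
odds (_ : y : rest)  = y : odds rest
odds _               = []

-- Alternate elements from the two lists, starting with the first.
interleave :: [a] -> [a] -> [a]
interleave (x : xs) (y : ys) = x : y : interleave xs ys
interleave xs       []       = xs
interleave []       ys       = ys
```

<p>Reading <code>es</code>, <code>os</code> and <code>ss</code> alongside the <code>where</code> clause of the <code>T4</code> scan shows the correspondence: the pair sums <code>e &#8853; o</code> are scanned recursively, and each result <code>s</code> yields the two outputs <code>s</code> and <code>s &#8853; e</code>.</p>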

<h3 id="nested-data-types-and-parallelism">Nested data types and parallelism</h3>

<p>Most of the recursive algebraic data types that appear in Haskell programs are <em>regular</em>, meaning that each recursive occurrence of the type is applied to the same type parameter as the containing type. For instance, a top-down tree of elements of type <code>a</code> is either a leaf or a branch with two subtrees whose elements have that same type <code>a</code>. In contrast, in a bottom-up tree, the (single) recursively contained tree is over elements of type <code>Pair a</code>. Such non-regular data types are called &quot;nested&quot;. The two tree scan algorithms above suggest to me that nested data types are particularly useful for efficient parallel algorithms.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/parallel-tree-scanning-by-composition/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fparallel-tree-scanning-by-composition&amp;language=en_GB&amp;category=text&amp;title=Parallel+tree+scanning+by+composition&amp;description=My+last+few+blog+posts+have+been+on+the+theme+of+scans%2C+and+particularly+on+parallel+scans.+In+Composable+parallel+scanning%2C+I+tackled+parallel+scanning+in+a+very+general+setting....&amp;tags=functor%2Cprogram+derivation%2Cscan%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Deriving parallel tree scans</title>
		<link>http://conal.net/blog/posts/deriving-parallel-tree-scans</link>
		<comments>http://conal.net/blog/posts/deriving-parallel-tree-scans#comments</comments>
		<pubDate>Tue, 01 Mar 2011 20:41:09 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[program derivation]]></category>
		<category><![CDATA[scan]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=330</guid>
		<description><![CDATA[The post Deriving list scans explored folds and scans on lists and showed how the usual, efficient scan implementations can be derived from simpler specifications. Let&#8217;s see now how to apply the same techniques to scans over trees. This new post is one of a series leading toward algorithms optimized for execution on massively parallel, [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p>The post <a href="http://conal.net/blog/posts/deriving-list-scans/" title="blog post"><em>Deriving list scans</em></a> explored folds and scans on lists and showed how the usual, efficient scan implementations can be derived from simpler specifications.</p>

<p>Let&#8217;s see now how to apply the same techniques to scans over trees.</p>

<p>This new post is one of a series leading toward algorithms optimized for execution on massively parallel, consumer hardware, using CUDA or OpenCL.</p>

<p><strong>Edits:</strong></p>

<ul>
<li>2011-03-01: Added clarification about &quot;<code>∅</code>&quot; and &quot;<code>(⊕)</code>&quot;.</li>
<li>2011-03-23: corrected &quot;linear-time&quot; to &quot;linear-work&quot; in two places.</li>
</ul>

<p><span id="more-330"></span></p>

<h3 id="trees">Trees</h3>

<p>Our trees will be non-empty and binary:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T</span> a <span class="fu">=</span> <span class="dt">Leaf</span> a <span class="fu">|</span> <span class="dt">Branch</span> (<span class="dt">T</span> a) (<span class="dt">T</span> a)<br /><br /><span class="kw">instance</span> <span class="kw">Show</span> a <span class="ot">&#8658;</span> <span class="kw">Show</span> (<span class="dt">T</span> a) <span class="kw">where</span><br />  <span class="fu">show</span> (<span class="dt">Leaf</span> a)     <span class="fu">=</span> <span class="fu">show</span> a<br />  <span class="fu">show</span> (<span class="dt">Branch</span> s t) <span class="fu">=</span> <span class="st">&quot;(&quot;</span><span class="fu">++</span><span class="fu">show</span> s<span class="fu">++</span><span class="st">&quot;,&quot;</span><span class="fu">++</span><span class="fu">show</span> t<span class="fu">++</span><span class="st">&quot;)&quot;</span></code></pre>

<p>Nothing surprising in the instances:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="kw">Functor</span> <span class="dt">T</span> <span class="kw">where</span><br />  <span class="fu">fmap</span> f (<span class="dt">Leaf</span> a)     <span class="fu">=</span> <span class="dt">Leaf</span> (f a)<br />  <span class="fu">fmap</span> f (<span class="dt">Branch</span> s t) <span class="fu">=</span> <span class="dt">Branch</span> (<span class="fu">fmap</span> f s) (<span class="fu">fmap</span> f t)<br /><br /><span class="kw">instance</span> <span class="dt">Foldable</span> <span class="dt">T</span> <span class="kw">where</span><br />  fold (<span class="dt">Leaf</span> a)     <span class="fu">=</span> a<br />  fold (<span class="dt">Branch</span> s t) <span class="fu">=</span> fold s &#8853; fold t<br /><br /><span class="kw">instance</span> <span class="dt">Traversable</span> <span class="dt">T</span> <span class="kw">where</span><br />  sequenceA (<span class="dt">Leaf</span> a)     <span class="fu">=</span> <span class="fu">fmap</span> <span class="dt">Leaf</span> a<br />  sequenceA (<span class="dt">Branch</span> s t) <span class="fu">=</span><br />    liftA2 <span class="dt">Branch</span> (sequenceA s) (sequenceA t)</code></pre>
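<p>A note for readers who want to compile these instances: in current versions of GHC&#8217;s <code>base</code>, the minimal complete definition of <code>Foldable</code> is <code>foldMap</code> or <code>foldr</code>, so an instance that defines only <code>fold</code> draws a missing-method warning, and methods such as <code>foldMap</code> are then left without a terminating definition. A sketch that defines <code>foldMap</code> instead, and otherwise keeps the definitions above (with <code>(&lt;&gt;)</code> for <code>(&#8853;)</code>), might look like this:</p>

```haskell
import Control.Applicative (liftA2)
import Data.Foldable (fold)
import Data.Monoid (Sum (..)) -- used only in the example below

data T a = Leaf a | Branch (T a) (T a)

instance Show a => Show (T a) where
  show (Leaf a)     = show a
  show (Branch s t) = "(" ++ show s ++ "," ++ show t ++ ")"

instance Functor T where
  fmap f (Leaf a)     = Leaf (f a)
  fmap f (Branch s t) = Branch (fmap f s) (fmap f t)

-- Define foldMap; the class default for fold (foldMap id) then agrees
-- with the equations for fold given above.
instance Foldable T where
  foldMap f (Leaf a)     = f a
  foldMap f (Branch s t) = foldMap f s <> foldMap f t

instance Traversable T where
  sequenceA (Leaf a)     = fmap Leaf a
  sequenceA (Branch s t) = liftA2 Branch (sequenceA s) (sequenceA t)
```

<p>With this instance, <code>fold (fmap Sum t)</code> sums the leaves, and the remaining <code>Foldable</code> methods (such as <code>length</code> and <code>toList</code>) come from the class defaults.</p>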

<p>BTW, <a href="https://github.com/conal/fix-symbols-gitit/">my type-setting software</a> uses &quot;<code>∅</code>&quot; and &quot;<code>(⊕)</code>&quot; for Haskell&#8217;s &quot;mempty&quot; and &quot;mappend&quot;.</p>

<p>Also handy will be extracting the first and last (i.e., leftmost and rightmost) leaves in a tree:</p>

<pre class="sourceCode"><code class="sourceCode haskell">headT <span class="ot">&#8759;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> a<br />headT (<span class="dt">Leaf</span> a)       <span class="fu">=</span> a<br />headT (s <span class="ot">`Branch`</span> _) <span class="fu">=</span> headT s<br /><br />lastT <span class="ot">&#8759;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> a<br />lastT (<span class="dt">Leaf</span> a)       <span class="fu">=</span> a<br />lastT (_ <span class="ot">`Branch`</span> t) <span class="fu">=</span> lastT t</code></pre>

<div class=exercise>
<p><em>Exercise:</em> Prove that</p>
<pre class="sourceCode"><code class="sourceCode haskell">headT &#8728; <span class="fu">fmap</span> f &#8801; f &#8728; headT<br />lastT &#8728; <span class="fu">fmap</span> f &#8801; f &#8728; lastT</code></pre>
<p>Answer:</p>

<div class=toggle>

<p>Consider the <code>Leaf</code> and <code>Branch</code> cases separately:</p>
<pre class="sourceCode"><code class="sourceCode haskell">  headT (<span class="fu">fmap</span> f (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- fmap on T -}</span><br />  headT (<span class="dt">Leaf</span> (f a))<br />&#8801;  <span class="co">{- headT def -}</span><br />  f a<br />&#8801;  <span class="co">{- headT def -}</span><br />  f (headT (<span class="dt">Leaf</span> a))</code></pre>
<pre class="sourceCode"><code class="sourceCode haskell">  headT (<span class="fu">fmap</span> f (<span class="dt">Branch</span> s t))<br />&#8801;  <span class="co">{- fmap on T -}</span><br />  headT (<span class="dt">Branch</span> (<span class="fu">fmap</span> f s) (<span class="fu">fmap</span> f t))<br />&#8801;  <span class="co">{- headT def -}</span><br />  headT (<span class="fu">fmap</span> f s)<br />&#8801;  <span class="co">{- induction -}</span><br />  f (headT s)<br />&#8801;  <span class="co">{- headT def -}</span><br />  f (headT (<span class="dt">Branch</span> s t))</code></pre>
<p>Similarly for <code>lastT</code>.</p>

</div>
 </div>

<h3 id="from-lists-to-trees-and-back">From lists to trees and back</h3>

<p>We can flatten trees into lists:</p>

<pre class="sourceCode"><code class="sourceCode haskell">flatten <span class="ot">&#8759;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> [a]<br />flatten <span class="fu">=</span> fold &#8728; <span class="fu">fmap</span> (<span class="fu">:</span>[])</code></pre>

<p>Equivalently, using <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Foldable.html#v:foldMap"><code>foldMap</code></a>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">flatten <span class="fu">=</span> foldMap (<span class="fu">:</span>[])</code></pre>

<p>Alternatively, we could define <code>fold</code> via <code>flatten</code>, giving <code>flatten</code> a direct recursive definition:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Foldable</span> <span class="dt">T</span> <span class="kw">where</span> fold <span class="fu">=</span> fold &#8728; flatten</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell">flatten <span class="ot">&#8759;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> [a]<br />flatten (<span class="dt">Leaf</span> a)     <span class="fu">=</span> [a]<br />flatten (<span class="dt">Branch</span> s t) <span class="fu">=</span> flatten s <span class="fu">++</span> flatten t</code></pre>

<p>We can also &quot;unflatten&quot; lists into balanced trees:</p>

<pre class="sourceCode"><code class="sourceCode haskell">unflatten <span class="ot">&#8759;</span> [a] <span class="ot">&#8594;</span> <span class="dt">T</span> a<br />unflatten []  <span class="fu">=</span> <span class="fu">error</span> <span class="st">&quot;unflatten: Oops! Empty list&quot;</span><br />unflatten [a] <span class="fu">=</span> <span class="dt">Leaf</span> a<br />unflatten xs  <span class="fu">=</span> <span class="dt">Branch</span> (unflatten prefix) (unflatten suffix)<br /> <span class="kw">where</span><br />   (prefix,suffix) <span class="fu">=</span> <span class="fu">splitAt</span> (<span class="fu">length</span> xs <span class="ot">`div`</span> <span class="dv">2</span>) xs</code></pre>

<p>Both <code>flatten</code> and <code>unflatten</code> can be implemented more efficiently.</p>

<p>For instance,</p>

<pre class="sourceCode"><code class="sourceCode haskell">t1,t2 <span class="ot">&#8759;</span> <span class="dt">T</span> <span class="dt">Int</span><br />t1 <span class="fu">=</span> unflatten [<span class="dv">1</span><span class="fu">&#8229;</span><span class="dv">3</span>]<br />t2 <span class="fu">=</span> unflatten [<span class="dv">1</span><span class="fu">&#8229;</span><span class="dv">16</span>]</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> t1<br />(<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> t2<br />((((<span class="dv">1</span>,<span class="dv">2</span>),(<span class="dv">3</span>,<span class="dv">4</span>)),((<span class="dv">5</span>,<span class="dv">6</span>),(<span class="dv">7</span>,<span class="dv">8</span>))),(((<span class="dv">9</span>,<span class="dv">10</span>),(<span class="dv">11</span>,<span class="dv">12</span>)),((<span class="dv">13</span>,<span class="dv">14</span>),(<span class="dv">15</span>,<span class="dv">16</span>))))</code></pre>

<h3 id="specifying-tree-scans">Specifying tree scans</h3>

<h4 id="prefixes-and-suffixes">Prefixes and suffixes</h4>

<p>The post <a href="http://conal.net/blog/posts/deriving-list-scans/" title="blog post"><em>Deriving list scans</em></a> gave specifications for list scanning in terms of <code>inits</code> and <code>tails</code>. One consequence of this specification is that the output of scanning has one more element than the input. Alternatively, we could use non-empty variants of <code>inits</code> and <code>tails</code>, so that the input &amp; output are in one-to-one correspondence.</p>

<pre class="sourceCode"><code class="sourceCode haskell">inits' <span class="ot">&#8759;</span> [a] <span class="ot">&#8594;</span> [[a]]<br />inits' []     <span class="fu">=</span> []<br />inits' (x<span class="fu">:</span>xs) <span class="fu">=</span> <span class="fu">map</span> (x<span class="fu">:</span>) ([] <span class="fu">:</span> inits' xs)</code></pre>

<p>The cons case can also be written as</p>

<pre class="sourceCode"><code class="sourceCode haskell">inits' (x<span class="fu">:</span>xs) <span class="fu">=</span> [x] <span class="fu">:</span> <span class="fu">map</span> (x<span class="fu">:</span>) (inits' xs)</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell">tails' <span class="ot">&#8759;</span> [a] <span class="ot">&#8594;</span> [[a]]<br />tails' []         <span class="fu">=</span> []<br />tails' xs<span class="fu">@</span>(_<span class="fu">:</span>xs') <span class="fu">=</span> xs <span class="fu">:</span> tails' xs'</code></pre>

<p>For instance,</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> inits' <span class="st">&quot;abcd&quot;</span><br />[<span class="st">&quot;a&quot;</span>,<span class="st">&quot;ab&quot;</span>,<span class="st">&quot;abc&quot;</span>,<span class="st">&quot;abcd&quot;</span>]<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> tails' <span class="st">&quot;abcd&quot;</span><br />[<span class="st">&quot;abcd&quot;</span>,<span class="st">&quot;bcd&quot;</span>,<span class="st">&quot;cd&quot;</span>,<span class="st">&quot;d&quot;</span>]</code></pre>

<p>Our tree functor has a symmetric definition, so we get more symmetry in the counterparts to <code>inits'</code> and <code>tails'</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">initTs <span class="ot">&#8759;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> <span class="dt">T</span> (<span class="dt">T</span> a)<br />initTs (<span class="dt">Leaf</span> a)       <span class="fu">=</span> <span class="dt">Leaf</span> (<span class="dt">Leaf</span> a)<br />initTs (s <span class="ot">`Branch`</span> t) <span class="fu">=</span><br />  <span class="dt">Branch</span> (initTs s) (<span class="fu">fmap</span> (s <span class="ot">`Branch`</span>) (initTs t))<br /><br />tailTs <span class="ot">&#8759;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> <span class="dt">T</span> (<span class="dt">T</span> a)<br />tailTs (<span class="dt">Leaf</span> a)       <span class="fu">=</span> <span class="dt">Leaf</span> (<span class="dt">Leaf</span> a)<br />tailTs (s <span class="ot">`Branch`</span> t) <span class="fu">=</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (<span class="ot">`Branch`</span> t) (tailTs s)) (tailTs t)</code></pre>

<p>Try it:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> t1<br />(<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> initTs t1<br />(<span class="dv">1</span>,((<span class="dv">1</span>,<span class="dv">2</span>),(<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>))))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> tailTs t1<br />((<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>)),((<span class="dv">2</span>,<span class="dv">3</span>),<span class="dv">3</span>))<br /><br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> unflatten [<span class="dv">1</span><span class="fu">&#8229;</span><span class="dv">5</span>]<br />((<span class="dv">1</span>,<span class="dv">2</span>),(<span class="dv">3</span>,(<span class="dv">4</span>,<span class="dv">5</span>)))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> initTs (unflatten [<span class="dv">1</span><span class="fu">&#8229;</span><span class="dv">5</span>])<br />((<span class="dv">1</span>,(<span class="dv">1</span>,<span class="dv">2</span>)),(((<span class="dv">1</span>,<span class="dv">2</span>),<span class="dv">3</span>),(((<span class="dv">1</span>,<span class="dv">2</span>),(<span class="dv">3</span>,<span class="dv">4</span>)),((<span class="dv">1</span>,<span class="dv">2</span>),(<span class="dv">3</span>,(<span class="dv">4</span>,<span class="dv">5</span>))))))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> tailTs (unflatten [<span class="dv">1</span><span class="fu">&#8229;</span><span class="dv">5</span>])<br />((((<span class="dv">1</span>,<span class="dv">2</span>),(<span class="dv">3</span>,(<span 
class="dv">4</span>,<span class="dv">5</span>))),(<span class="dv">2</span>,(<span class="dv">3</span>,(<span class="dv">4</span>,<span class="dv">5</span>)))),((<span class="dv">3</span>,(<span class="dv">4</span>,<span class="dv">5</span>)),((<span class="dv">4</span>,<span class="dv">5</span>),<span class="dv">5</span>)))</code></pre>

<div class=exercise>
<p><em>Exercise:</em> Prove that</p>
<pre class="sourceCode"><code class="sourceCode haskell">lastT &#8728; initTs &#8801; <span class="fu">id</span><br />headT &#8728; tailTs &#8801; <span class="fu">id</span></code></pre>
<p>Answer:</p>

<div class=toggle>

<pre class="sourceCode"><code class="sourceCode haskell">  lastT (initTs (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- initTs def -}</span><br />  lastT (<span class="dt">Leaf</span> (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- lastT def -}</span><br />  <span class="dt">Leaf</span> a<br /><br />  lastT (initTs (s <span class="ot">`Branch`</span> t))<br />&#8801;  <span class="co">{- initTs def -}</span><br />  lastT (<span class="dt">Branch</span> (&#8943;) (<span class="fu">fmap</span> (s <span class="ot">`Branch`</span>) (initTs t)))<br />&#8801;  <span class="co">{- lastT def -}</span><br />  lastT (<span class="fu">fmap</span> (s <span class="ot">`Branch`</span>) (initTs t))<br />&#8801;  <span class="co">{- lastT &#8728; fmap f -}</span><br />  (s <span class="ot">`Branch`</span>) (lastT (initTs t))<br />&#8801;  <span class="co">{- trivial -}</span><br />  s <span class="ot">`Branch`</span> lastT (initTs t)<br />&#8801;  <span class="co">{- induction -}</span><br />  s <span class="ot">`Branch`</span> t</code></pre>

</div>
 </div>
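<p>The proof can be complemented by a quick test. The sketch below restates the tree definitions (assuming <code>data T a = Leaf a | Branch (T a) (T a)</code>, with <code>headT</code> and <code>lastT</code> following the definitions used in the proofs above) and checks both identities on the balanced tree of every size up to a bound. <code>identitiesHold</code> is a hypothetical helper, and passing it is evidence, not proof.</p>

```haskell
-- Assumption: the post's tree type and helpers, restated so this compiles alone.
data T a = Leaf a | Branch (T a) (T a) deriving (Show, Eq)

instance Functor T where
  fmap f (Leaf a)     = Leaf (f a)
  fmap f (Branch s t) = Branch (fmap f s) (fmap f t)

headT, lastT :: T a -> a
headT (Leaf a)     = a
headT (Branch s _) = headT s
lastT (Leaf a)     = a
lastT (Branch _ t) = lastT t

initTs, tailTs :: T a -> T (T a)
initTs (Leaf a)     = Leaf (Leaf a)
initTs (Branch s t) = Branch (initTs s) (fmap (s `Branch`) (initTs t))
tailTs (Leaf a)     = Leaf (Leaf a)
tailTs (Branch s t) = Branch (fmap (`Branch` t) (tailTs s)) (tailTs t)

unflatten :: [a] -> T a
unflatten []  = error "unflatten: empty list"
unflatten [a] = Leaf a
unflatten xs  = Branch (unflatten prefix) (unflatten suffix)
  where
    (prefix, suffix) = splitAt (length xs `div` 2) xs

-- Test (not prove) lastT . initTs == id and headT . tailTs == id
-- on the balanced tree of every size from 1 to n.
identitiesHold :: Int -> Bool
identitiesHold n = and
  [ lastT (initTs t) == t && headT (tailTs t) == t
  | k <- [1 .. n]
  , let t = unflatten [1 .. k]
  ]
```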

<h4 id="scan-specification">Scan specification</h4>

<p>Now we can specify prefix &amp; suffix scanning:</p>

<pre class="sourceCode"><code class="sourceCode haskell">scanlT, scanrT <span class="ot">&#8759;</span> <span class="dt">Monoid</span> a <span class="ot">&#8658;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> <span class="dt">T</span> a<br />scanlT <span class="fu">=</span> <span class="fu">fmap</span> fold &#8728; initTs<br />scanrT <span class="fu">=</span> <span class="fu">fmap</span> fold &#8728; tailTs</code></pre>

<p>Try it out:</p>

<pre class="sourceCode"><code class="sourceCode haskell">t3 <span class="ot">&#8759;</span> <span class="dt">T</span> <span class="dt">String</span><br />t3 <span class="fu">=</span> <span class="fu">fmap</span> (<span class="fu">:</span>[]) (unflatten <span class="st">&quot;abcde&quot;</span>)</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> t3<br />((<span class="st">&quot;a&quot;</span>,<span class="st">&quot;b&quot;</span>),(<span class="st">&quot;c&quot;</span>,(<span class="st">&quot;d&quot;</span>,<span class="st">&quot;e&quot;</span>)))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> scanlT t3<br />((<span class="st">&quot;a&quot;</span>,<span class="st">&quot;ab&quot;</span>),(<span class="st">&quot;abc&quot;</span>,(<span class="st">&quot;abcd&quot;</span>,<span class="st">&quot;abcde&quot;</span>)))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> scanrT t3<br />((<span class="st">&quot;abcde&quot;</span>,<span class="st">&quot;bcde&quot;</span>),(<span class="st">&quot;cde&quot;</span>,(<span class="st">&quot;de&quot;</span>,<span class="st">&quot;e&quot;</span>)))</code></pre>

<p>To test on numbers, I&#8217;ll use a <a href="http://matt.immute.net/content/pointless-fun" title="blog post by Matt Hellige">handy notation from Matt Hellige</a> to add pre- and post-processing:</p>

<pre class="sourceCode"><code class="sourceCode haskell">(&#8605;) <span class="ot">&#8759;</span> (a' <span class="ot">&#8594;</span> a) <span class="ot">&#8594;</span> (b <span class="ot">&#8594;</span> b') <span class="ot">&#8594;</span> ((a <span class="ot">&#8594;</span> b) <span class="ot">&#8594;</span> (a' <span class="ot">&#8594;</span> b'))<br />(f &#8605; h) g <span class="fu">=</span> h &#8728; g &#8728; f</code></pre>

<p>And a version specialized to functors:</p>

<pre class="sourceCode"><code class="sourceCode haskell">(&#8605;<span class="fu">*</span>) <span class="ot">&#8759;</span> <span class="kw">Functor</span> f <span class="ot">&#8658;</span> (a' <span class="ot">&#8594;</span> a) <span class="ot">&#8594;</span> (b <span class="ot">&#8594;</span> b')<br />     <span class="ot">&#8594;</span> (f a <span class="ot">&#8594;</span> f b) <span class="ot">&#8594;</span> (f a' <span class="ot">&#8594;</span> f b')<br />f &#8605;<span class="fu">*</span> g <span class="fu">=</span> <span class="fu">fmap</span> f &#8605; <span class="fu">fmap</span> g</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell">t4 <span class="ot">&#8759;</span> <span class="dt">T</span> <span class="dt">Integer</span><br />t4 <span class="fu">=</span> unflatten [<span class="dv">1</span><span class="fu">&#8229;</span><span class="dv">6</span>]<br /><br />t5 <span class="ot">&#8759;</span> <span class="dt">T</span> <span class="dt">Integer</span><br />t5 <span class="fu">=</span> (<span class="dt">Sum</span> &#8605;<span class="fu">*</span> getSum) scanlT t4</code></pre>

<p>Try it:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> t4<br />((<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>)),(<span class="dv">4</span>,(<span class="dv">5</span>,<span class="dv">6</span>)))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> initTs t4<br />((<span class="dv">1</span>,((<span class="dv">1</span>,<span class="dv">2</span>),(<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>)))),(((<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>)),<span class="dv">4</span>),(((<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>)),(<span class="dv">4</span>,<span class="dv">5</span>)),((<span class="dv">1</span>,(<span class="dv">2</span>,<span class="dv">3</span>)),(<span class="dv">4</span>,(<span class="dv">5</span>,<span class="dv">6</span>))))))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> t5<br />((<span class="dv">1</span>,(<span class="dv">3</span>,<span class="dv">6</span>)),(<span class="dv">10</span>,(<span class="dv">15</span>,<span class="dv">21</span>)))</code></pre>

<div class=exercise>
<p><em>Exercise</em>: Prove that <code>fold</code>, <code>scanlT</code>, and <code>scanrT</code> satisfy properties analogous to those relating folds and scans on lists:</p>
<pre class="sourceCode"><code class="sourceCode haskell">fold &#8801; lastT &#8728; scanlT<br />fold &#8801; headT &#8728; scanrT</code></pre>
<p>Answer:</p>

<div class=toggle>

<pre class="sourceCode"><code class="sourceCode haskell">  lastT &#8728; scanlT<br />&#8801;  <span class="co">{- scanlT spec -}</span><br />  lastT &#8728; <span class="fu">fmap</span> fold &#8728; initTs<br />&#8801;  <span class="co">{- lastT &#8728; fmap f -}</span><br />  fold &#8728; lastT &#8728; initTs<br />&#8801;  <span class="co">{- lastT &#8728; initTs -}</span><br />  fold<br /><br />  headT &#8728; scanrT <br />&#8801;  <span class="co">{- scanrT def -}</span><br />  headT &#8728; <span class="fu">fmap</span> fold &#8728; tailTs<br />&#8801;  <span class="co">{- headT &#8728; fmap f -}</span><br />  fold &#8728; headT &#8728; tailTs<br />&#8801;  <span class="co">{- headT &#8728; tailTs -}</span><br />  fold</code></pre>

</div>

<p>For instance,</p>
<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> fold t3<br /><span class="st">&quot;abcde&quot;</span><br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> (lastT &#8728; scanlT) t3<br /><span class="st">&quot;abcde&quot;</span><br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> (headT &#8728; scanrT) t3<br /><span class="st">&quot;abcde&quot;</span></code></pre>

</div>

<h3 id="deriving-faster-scans">Deriving faster scans</h3>

<p>Recall the specifications:</p>

<pre class="sourceCode"><code class="sourceCode haskell">scanlT <span class="fu">=</span> <span class="fu">fmap</span> fold &#8728; initTs<br />scanrT <span class="fu">=</span> <span class="fu">fmap</span> fold &#8728; tailTs</code></pre>

<p>To derive more efficient implementations, proceed as in <a href="http://conal.net/blog/posts/deriving-list-scans/" title="blog post"><em>Deriving list scans</em></a>. Start with prefix scan (<code>scanlT</code>), and consider the <code>Leaf</code> and <code>Branch</code> cases separately.</p>

<pre class="sourceCode"><code class="sourceCode haskell">  scanlT (<span class="dt">Leaf</span> a)<br />&#8801;  <span class="co">{- scanlT spec -}</span><br />  <span class="fu">fmap</span> fold (initTs (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- initTs def -}</span><br />  <span class="fu">fmap</span> fold (<span class="dt">Leaf</span> (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- fmap def -}</span><br />  <span class="dt">Leaf</span> (fold (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- fold def -}</span><br />  <span class="dt">Leaf</span> a<br /><br />  scanlT (s <span class="ot">`Branch`</span> t)<br />&#8801;  <span class="co">{- scanlT spec -}</span><br />  <span class="fu">fmap</span> fold (initTs (s <span class="ot">`Branch`</span> t))<br />&#8801;  <span class="co">{- initTs def -}</span><br />  <span class="fu">fmap</span> fold (<span class="dt">Branch</span> (initTs s) (<span class="fu">fmap</span> (s <span class="ot">`Branch`</span>) (initTs t)))<br />&#8801;  <span class="co">{- fmap def -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> fold (initTs s)) (<span class="fu">fmap</span> fold (<span class="fu">fmap</span> (s <span class="ot">`Branch`</span>) (initTs t)))<br />&#8801;  <span class="co">{- scanlT spec -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> fold (<span class="fu">fmap</span> (s <span class="ot">`Branch`</span>) (initTs t)))<br />&#8801;  <span class="co">{- functor law -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> (fold &#8728; (s <span class="ot">`Branch`</span>)) (initTs t))<br />&#8801;  <span class="co">{- rework as &#955; -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> (&#955; t' <span class="ot">&#8594;</span> fold (s <span class="ot">`Branch`</span> t')) (initTs t))<br />&#8801;  <span class="co">{- fold def -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> (&#955; t' <span class="ot">&#8594;</span> fold s &#8853; fold t') (initTs t))<br />&#8801;  <span class="co">{- rework &#955; -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> ((fold s &#8853;) &#8728; fold) (initTs t))<br />&#8801;  <span class="co">{- functor law -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> (fold s &#8853;) (<span class="fu">fmap</span> fold (initTs t)))<br />&#8801;  <span class="co">{- scanlT spec -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> (fold s &#8853;) (scanlT t))<br />&#8801;  <span class="co">{- lastT &#8728; scanlT &#8801; fold -}</span><br />  <span class="dt">Branch</span> (scanlT s) (<span class="fu">fmap</span> (lastT (scanlT s) &#8853;) (scanlT t))<br />&#8801;  <span class="co">{- factor out defs -}</span><br />  <span class="dt">Branch</span> s' (<span class="fu">fmap</span> (lastT s' &#8853;) t')<br />     <span class="kw">where</span> s' <span class="fu">=</span> scanlT s<br />           t' <span class="fu">=</span> scanlT t</code></pre>

<p>Suffix scan has a similar derivation.</p>

<div class=toggle>

<pre class="sourceCode"><code class="sourceCode haskell">  scanrT (<span class="dt">Leaf</span> a)<br />&#8801;  <span class="co">{- scanrT spec -}</span><br />  <span class="fu">fmap</span> fold (tailTs (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- tailTs def -}</span><br />  <span class="fu">fmap</span> fold (<span class="dt">Leaf</span> (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- fmap def -}</span><br />  <span class="dt">Leaf</span> (fold (<span class="dt">Leaf</span> a))<br />&#8801;  <span class="co">{- fold def -}</span><br />  <span class="dt">Leaf</span> a<br /><br />  scanrT (s <span class="ot">`Branch`</span> t)<br />&#8801;  <span class="co">{- scanrT spec -}</span><br />  <span class="fu">fmap</span> fold (tailTs (s <span class="ot">`Branch`</span> t))<br />&#8801;  <span class="co">{- tailTs def -}</span><br />  <span class="fu">fmap</span> fold (<span class="dt">Branch</span> (<span class="fu">fmap</span> (<span class="ot">`Branch`</span> t) (tailTs s)) (tailTs t))<br />&#8801;  <span class="co">{- fmap def -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> fold (<span class="fu">fmap</span> (<span class="ot">`Branch`</span> t) (tailTs s))) (<span class="fu">fmap</span> fold (tailTs t))<br />&#8801;  <span class="co">{- scanrT spec -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> fold (<span class="fu">fmap</span> (<span class="ot">`Branch`</span> t) (tailTs s))) (scanrT t)<br />&#8801;  <span class="co">{- functor law -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (fold &#8728; (<span class="ot">`Branch`</span> t)) (tailTs s)) (scanrT t)<br />&#8801;  <span class="co">{- rework as &#955; -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (&#955; s' <span class="ot">&#8594;</span> fold (s' <span class="ot">`Branch`</span> t)) (tailTs s)) (scanrT t)<br />&#8801;  <span class="co">{- fold def -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (&#955; s' <span class="ot">&#8594;</span> fold s' &#8853; fold t) (tailTs s)) (scanrT t)<br />&#8801;  <span class="co">{- rework &#955; -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> ((&#8853; fold t) &#8728; fold) (tailTs s)) (scanrT t)<br />&#8801;  <span class="co">{- functor law -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (&#8853; fold t) (<span class="fu">fmap</span> fold (tailTs s))) (scanrT t)<br />&#8801;  <span class="co">{- scanrT spec -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (&#8853; fold t) (scanrT s)) (scanrT t)<br />&#8801;  <span class="co">{- headT &#8728; scanrT &#8801; fold -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (&#8853; headT (scanrT t)) (scanrT s)) (scanrT t)<br />&#8801;  <span class="co">{- factor out defs -}</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (&#8853; headT t') s') t'<br />    <span class="kw">where</span> s' <span class="fu">=</span> scanrT s<br />          t' <span class="fu">=</span> scanrT t</code></pre>

</div>

<p>Extract code from these derivations:</p>

<pre class="sourceCode"><code class="sourceCode haskell">scanlT' <span class="ot">&#8759;</span> <span class="dt">Monoid</span> a <span class="ot">&#8658;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> <span class="dt">T</span> a<br />scanlT' (<span class="dt">Leaf</span> a)       <span class="fu">=</span> <span class="dt">Leaf</span> a<br />scanlT' (s <span class="ot">`Branch`</span> t) <span class="fu">=</span><br />  <span class="dt">Branch</span> s' (<span class="fu">fmap</span> (lastT s' &#8853;) t')<br />     <span class="kw">where</span> s' <span class="fu">=</span> scanlT' s<br />           t' <span class="fu">=</span> scanlT' t<br /><br />scanrT' <span class="ot">&#8759;</span> <span class="dt">Monoid</span> a <span class="ot">&#8658;</span> <span class="dt">T</span> a <span class="ot">&#8594;</span> <span class="dt">T</span> a<br />scanrT' (<span class="dt">Leaf</span> a)       <span class="fu">=</span> <span class="dt">Leaf</span> a<br />scanrT' (s <span class="ot">`Branch`</span> t) <span class="fu">=</span><br />  <span class="dt">Branch</span> (<span class="fu">fmap</span> (&#8853; headT t') s') t'<br />    <span class="kw">where</span> s' <span class="fu">=</span> scanrT' s<br />          t' <span class="fu">=</span> scanrT' t</code></pre>

<p>Try it:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> t3<br />((<span class="st">&quot;a&quot;</span>,<span class="st">&quot;b&quot;</span>),(<span class="st">&quot;c&quot;</span>,(<span class="st">&quot;d&quot;</span>,<span class="st">&quot;e&quot;</span>)))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> scanlT' t3<br />((<span class="st">&quot;a&quot;</span>,<span class="st">&quot;ab&quot;</span>),(<span class="st">&quot;abc&quot;</span>,(<span class="st">&quot;abcd&quot;</span>,<span class="st">&quot;abcde&quot;</span>)))<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> scanrT' t3<br />((<span class="st">&quot;abcde&quot;</span>,<span class="st">&quot;bcde&quot;</span>),(<span class="st">&quot;cde&quot;</span>,(<span class="st">&quot;de&quot;</span>,<span class="st">&quot;e&quot;</span>)))</code></pre>

<h3 id="efficiency">Efficiency</h3>

<p>Although I was just following my nose, without trying to get anywhere in particular, this result is exactly the algorithm I first thought of when considering how to parallelize tree scanning.</p>

<p>Let&#8217;s now consider the running time of this algorithm. Assume that the tree is <em>balanced</em>, to maximize parallelism. (I think balancing is optimal for parallelism here, but I&#8217;m not certain.)</p>

<p>For a tree with <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math> leaves, the work <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>W</mi><mspace width="0.167em"></mspace><mi>n</mi></mrow></math> is constant when <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>=</mo><mn>1</mn></mrow></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mn>2</mn><mo>&#8901;</mo><mi>W</mi><mspace width="0.167em"></mspace><mo stretchy="false">(</mo><mi>n</mi><mo>/</mo><mn>2</mn><mo stretchy="false">)</mo><mo>+</mo><mi>n</mi></mrow></math> when <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>&gt;</mo><mn>1</mn></mrow></math>, since the <code>Branch</code> case makes two recursive calls on half-size trees and then does linear work mapping <code>(lastT s' &#8853;)</code> over <code>t'</code>. Using <a href="http://en.wikipedia.org/wiki/Master_theorem#Case_2">the <em>Master Theorem</em></a> (explained in more detail <a href="http://www.math.dartmouth.edu/archive/m19w03/public_html/Section5-2.pdf">here</a>), <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>W</mi><mspace width="0.167em"></mspace><mi>n</mi><mo>=</mo><mo>&#920;</mo><mspace width="0.167em"></mspace><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math>.</p>
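<p>To see the <em>n</em>&#160;log&#160;<em>n</em> growth concretely, a hypothetical instrumented copy of <code>scanlT'</code> can count the <code>(&lt;&gt;)</code> applications it sets up. Counting only those gives <em>C</em>(<em>n</em>) = 2&#8901;<em>C</em>(<em>n</em>/2) + <em>n</em>/2 for balanced trees, whose solution for <em>n</em> = 2<sup><em>k</em></sup> is <em>k</em>&#8901;<em>n</em>/2 (for example, 32 for 16 leaves and 5120 for 1024 leaves). The <code>lastT</code> walk down the right spine is not counted. The sketch restates the assumed <code>T</code>, <code>lastT</code>, and <code>unflatten</code>; <code>scanlCount</code> and <code>combines</code> are hypothetical names.</p>

```haskell
import Data.Monoid (Sum (..))

-- Assumption: the post's tree type, restated so this compiles alone.
data T a = Leaf a | Branch (T a) (T a)

instance Functor T where
  fmap f (Leaf a)     = Leaf (f a)
  fmap f (Branch s t) = Branch (fmap f s) (fmap f t)

instance Foldable T where
  foldMap f (Leaf a)     = f a
  foldMap f (Branch s t) = foldMap f s <> foldMap f t

lastT :: T a -> a
lastT (Leaf a)     = a
lastT (Branch _ t) = lastT t

unflatten :: [a] -> T a
unflatten []  = error "unflatten: empty list"
unflatten [a] = Leaf a
unflatten xs  = Branch (unflatten prefix) (unflatten suffix)
  where
    (prefix, suffix) = splitAt (length xs `div` 2) xs

-- Hypothetical instrumented copy of scanlT': same result tree, plus the
-- number of (<>) applications it sets up (one per element of t' at each
-- Branch). lastT's walk down the right spine is not counted.
scanlCount :: Monoid a => T a -> (T a, Int)
scanlCount (Leaf a)     = (Leaf a, 0)
scanlCount (Branch s t) = (Branch s' (fmap (lastT s' <>) t'), cs + ct + length t')
  where
    (s', cs) = scanlCount s
    (t', ct) = scanlCount t

-- Count for a balanced tree with n leaves, using Sum Int as the monoid.
combines :: Int -> Int
combines n = snd (scanlCount (fmap Sum (unflatten [1 .. n])))
```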

<p>This result is disappointing, since scanning can be done with linear work by threading a single accumulator while traversing the input tree and building up the output tree.</p>
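<p>For comparison, here is a sketch of the sequential accumulator approach, using <code>mapAccumL</code> and <code>mapAccumR</code> from <code>Data.Traversable</code>. It assumes a <code>Traversable</code> instance for <code>T</code> that visits leaves left to right (written out below, not necessarily the one in the post), and the names <code>scanlSeq</code> and <code>scanrSeq</code> are hypothetical. Each leaf costs one <code>(&lt;&gt;)</code>, so the work is linear, but every step depends on the previous one, which leaves no room for parallelism.</p>

```haskell
import Data.Traversable (mapAccumL, mapAccumR)

-- Assumption: the post's tree type, restated so this compiles alone.
data T a = Leaf a | Branch (T a) (T a)

instance Functor T where
  fmap f (Leaf a)     = Leaf (f a)
  fmap f (Branch s t) = Branch (fmap f s) (fmap f t)

instance Foldable T where
  foldMap f (Leaf a)     = f a
  foldMap f (Branch s t) = foldMap f s <> foldMap f t

-- Assumed instance: visits leaves left to right.
instance Traversable T where
  traverse f (Leaf a)     = Leaf <$> f a
  traverse f (Branch s t) = Branch <$> traverse f s <*> traverse f t

unflatten :: [a] -> T a
unflatten []  = error "unflatten: empty list"
unflatten [a] = Leaf a
unflatten xs  = Branch (unflatten prefix) (unflatten suffix)
  where
    (prefix, suffix) = splitAt (length xs `div` 2) xs

-- Sequential prefix scan: carry one running total from left to right and
-- replace each leaf by the total so far (one (<>) per leaf).
scanlSeq :: Monoid a => T a -> T a
scanlSeq = snd . mapAccumL step mempty
  where
    step acc x = let acc' = acc <> x in (acc', acc')

-- Sequential suffix scan: the same idea, from right to left.
scanrSeq :: Monoid a => T a -> T a
scanrSeq = snd . mapAccumR step mempty
  where
    step acc x = let acc' = x <> acc in (acc', acc')
```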

<p>I&#8217;m using the term &quot;work&quot; instead of &quot;time&quot; here, since I&#8217;m not assuming sequential execution.</p>

<p>We have a parallel algorithm that performs <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi></mrow></math> work, and a sequential program that performs linear work. Can we construct a parallel algorithm that performs only linear work?</p>

<p>Yes. Guy Blelloch came up with a clever linear-work parallel algorithm, which I&#8217;ll derive in another post.</p>

<h3 id="generalizing-head-and-last">Generalizing <code>head</code> and <code>last</code></h3>

<p>Can we replace the ad hoc (tree-specific) <code>headT</code> and <code>lastT</code> functions with general versions that work on all foldables? I&#8217;d want these to also generalize the list functions <code>head</code> and <code>last</code> or, rather, <em>total</em> variants of them (ones that cannot fail on an empty list). For totality, take a default value to return when there are no elements.</p>

<pre class="sourceCode"><code class="sourceCode haskell">headF, lastF <span class="ot">&#8759;</span> <span class="dt">Foldable</span> f <span class="ot">&#8658;</span> a <span class="ot">&#8594;</span> f a <span class="ot">&#8594;</span> a</code></pre>

<p>I also want these functions to be as efficient on lists as <code>head</code> and <code>last</code> and as efficient on trees as <code>headT</code> and <code>lastT</code>.</p>

<p>The <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Monoid.html#v:First"><code>First</code></a> and <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-Monoid.html#v:Last"><code>Last</code></a> monoids provide left-biased and right-biased choice. They&#8217;re implemented as <code>newtype</code> wrappers around <code>Maybe</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">First</span> a <span class="fu">=</span> <span class="dt">First</span> { getFirst <span class="ot">&#8759;</span> <span class="dt">Maybe</span> a }<br /><br /><span class="kw">instance</span> <span class="dt">Monoid</span> (<span class="dt">First</span> a) <span class="kw">where</span><br />  &#8709; <span class="fu">=</span> <span class="dt">First</span> <span class="kw">Nothing</span><br />  r<span class="fu">@</span>(<span class="dt">First</span> (<span class="kw">Just</span> _)) &#8853; _ <span class="fu">=</span> r<br />  <span class="dt">First</span> <span class="kw">Nothing</span>      &#8853; r <span class="fu">=</span> r</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">Last</span> a <span class="fu">=</span> <span class="dt">Last</span> { getLast <span class="ot">&#8759;</span> <span class="dt">Maybe</span> a }<br /><br /><span class="kw">instance</span> <span class="dt">Monoid</span> (<span class="dt">Last</span> a) <span class="kw">where</span><br />  &#8709; <span class="fu">=</span> <span class="dt">Last</span> <span class="kw">Nothing</span><br />  _ &#8853; r<span class="fu">@</span>(<span class="dt">Last</span> (<span class="kw">Just</span> _)) <span class="fu">=</span> r<br />  r &#8853; <span class="dt">Last</span> <span class="kw">Nothing</span>      <span class="fu">=</span> r</code></pre>

<p>For <code>headF</code>, embed each element into the <code>First</code> monoid (via <code>First ∘ Just</code>), fold the resulting structure, and extract the <code>Maybe</code> value, using the provided default in case there are no elements. Similarly for <code>lastF</code>, with <code>Last</code>.</p>

<pre class="sourceCode"><code class="sourceCode haskell">headF dflt <span class="fu">=</span> fromMaybe dflt &#8728; getFirst &#8728; foldMap (<span class="dt">First</span> &#8728; <span class="kw">Just</span>)<br />lastF dflt <span class="fu">=</span> fromMaybe dflt &#8728; getLast  &#8728; foldMap (<span class="dt">Last</span>  &#8728; <span class="kw">Just</span>)</code></pre>

<p>For instance,</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> headF <span class="dv">3</span> [<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">4</span>,<span class="dv">8</span>]<br /><span class="dv">1</span><br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> headF <span class="dv">3</span> []<br /><span class="dv">3</span></code></pre>

<p>When our elements belong to a monoid, we can use <code>∅</code> as the default:</p>

<pre class="sourceCode"><code class="sourceCode haskell">headFM <span class="ot">&#8759;</span> (<span class="dt">Foldable</span> f, <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> f m <span class="ot">&#8594;</span> m<br />headFM <span class="fu">=</span> headF &#8709;<br /><br />lastFM <span class="ot">&#8759;</span> (<span class="dt">Foldable</span> f, <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> f m <span class="ot">&#8594;</span> m<br />lastFM <span class="fu">=</span> lastF &#8709;</code></pre>

<p>For instance,</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> lastFM ([] <span class="ot">&#8759;</span> [<span class="dt">String</span>])<br /><span class="st">&quot;&quot;</span></code></pre>

<p>Using <code>headFM</code> and <code>lastFM</code> in place of <code>headT</code> and <code>lastT</code>, we can easily handle the addition of an <code>Empty</code> case to the tree functor in this post. The key choice is that <code>fold Empty ≡ ∅</code> and <code>fmap _ Empty ≡ Empty</code>. Then <code>headFM</code> will choose the first <em>leaf</em>, and <code>lastFM</code> the last one.</p>
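<p>To make the <code>Empty</code> remark concrete, here is a minimal sketch using a <em>hypothetical</em> tree type <code>T</code> with <code>Empty</code>, <code>Leaf</code>, and <code>Branch</code> constructors. These names are assumptions for illustration, not necessarily those of the post's tree functor. The derived <code>Foldable</code> and <code>Functor</code> instances give exactly the two choices just mentioned: folding <code>Empty</code> yields <code>mempty</code>, and mapping over <code>Empty</code> yields <code>Empty</code>. The standard <code>Data.Monoid</code> wrappers stand in for the <code>First</code> and <code>Last</code> types defined above.</p>

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Data.Maybe (fromMaybe)
import Data.Monoid (First (..), Last (..))

-- Hypothetical tree functor with an explicit Empty case.
-- Derived Foldable: folding Empty gives mempty.
-- Derived Functor: fmap _ Empty = Empty.
data T a = Empty | Leaf a | Branch (T a) (T a)
  deriving (Show, Functor, Foldable)

headF, lastF :: Foldable f => a -> f a -> a
headF dflt = fromMaybe dflt . getFirst . foldMap (First . Just)
lastF dflt = fromMaybe dflt . getLast . foldMap (Last . Just)

-- For monoidal elements, mempty serves as the default.
headFM, lastFM :: (Foldable f, Monoid m) => f m -> m
headFM = headF mempty
lastFM = lastF mempty

-- Empty subtrees contribute nothing, so headFM and lastFM
-- return the first and last leaves.
example :: T String
example = Branch (Branch Empty (Leaf "a")) (Branch (Leaf "b") Empty)
```

<p>Here <code>headFM example</code> should be <code>&quot;a&quot;</code>, <code>lastFM example</code> should be <code>&quot;b&quot;</code>, and both give <code>&quot;&quot;</code> for a tree with no leaves.</p>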

<p>What about efficiency? Because <code>headF</code> and <code>lastF</code> are defined via <code>foldMap</code>, which is a composition of <code>fold</code> and <code>fmap</code>, one might think that we have to traverse the entire structure when used with functors like <code>[]</code> or <code>T</code>.</p>

<p>Laziness saves us, however, and we can even extract the head of an infinite list or a partially defined one. For instance,</p>

<pre class="sourceCode"><code class="sourceCode haskell">  foldMap (<span class="dt">First</span> &#8728; <span class="kw">Just</span>) [<span class="dv">5</span> <span class="fu">&#8229;</span>]<br />&#8801; foldMap (<span class="dt">First</span> &#8728; <span class="kw">Just</span>) (<span class="dv">5</span> <span class="fu">:</span> [<span class="dv">6</span> <span class="fu">&#8229;</span>])<br />&#8801; <span class="dt">First</span> (<span class="kw">Just</span> <span class="dv">5</span>) &#8853; foldMap (<span class="dt">First</span> &#8728; <span class="kw">Just</span>) [<span class="dv">6</span> <span class="fu">&#8229;</span>]<br />&#8801; <span class="dt">First</span> (<span class="kw">Just</span> <span class="dv">5</span>)</code></pre>

<p>So</p>

<pre class="sourceCode"><code class="sourceCode haskell">  headF d [<span class="dv">5</span> <span class="fu">&#8229;</span>]<br />&#8801; fromMaybe d (getFirst (foldMap (<span class="dt">First</span> &#8728; <span class="kw">Just</span>) [<span class="dv">5</span> <span class="fu">&#8229;</span>]))<br />&#8801; fromMaybe d (getFirst (<span class="dt">First</span> (<span class="kw">Just</span> <span class="dv">5</span>)))<br />&#8801; fromMaybe d (<span class="kw">Just</span> <span class="dv">5</span>)<br />&#8801; <span class="dv">5</span></code></pre>

<p>And, sure enough,</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> foldMap (<span class="dt">First</span> &#8728; <span class="kw">Just</span>) [<span class="dv">5</span> <span class="fu">&#8229;</span>]<br /><span class="dt">First</span> {getFirst <span class="fu">=</span> <span class="kw">Just</span> <span class="dv">5</span>}<br /><span class="fu">*</span><span class="dt">T</span><span class="fu">&gt;</span> headF &#8869; [<span class="dv">5</span> <span class="fu">&#8229;</span>]<br /><span class="dv">5</span></code></pre>
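<p>The same laziness argument can be checked mechanically. The sketch below again assumes the standard <code>First</code> from <code>Data.Monoid</code>. Once its first argument holds a <code>Just</code>, <code>First</code>'s combining operation returns that argument without inspecting the second. The sketch takes the head of an infinite list and of a partially defined list whose tail is <code>undefined</code>, and in the first case passes an <code>undefined</code> default, which is never forced.</p>

```haskell
import Data.Maybe (fromMaybe)
import Data.Monoid (First (..))

headF :: Foldable f => a -> f a -> a
headF dflt = fromMaybe dflt . getFirst . foldMap (First . Just)

-- An infinite list: only its first cons cell is examined,
-- and the undefined default is never needed.
fromInfinite :: Integer
fromInfinite = headF undefined [5 ..]

-- A partially defined list: the undefined tail is never forced.
fromPartial :: Integer
fromPartial = headF 0 (5 : undefined)
```

<p>Both values should be <code>5</code>. By contrast, <code>lastF</code> has to inspect the rest of the structure before it can commit to an answer, so on an infinite list it would not terminate.</p>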

<h3 id="where-to-go-from-here">Where to go from here?</h3>

<ul>
<li>As mentioned above, the derived scanning implementations perform asymptotically more work than necessary. Future posts will explore how to derive parallel-friendly, linear-work algorithms. Then we&#8217;ll see how to transform the parallel-friendly algorithms so that they work <em>destructively</em>, overwriting their input as they go, which makes them suitable for execution entirely in CUDA or OpenCL.</li>
<li>The functions <code>initTs</code> and <code>tailTs</code> are still tree-specific. To generalize the specification and derivation of list and tree scanning, find a way to generalize these two functions. The types of <code>initTs</code> and <code>tailTs</code> fit with the <a href="http://hackage.haskell.org/packages/archive/comonad/1.0.1/doc/html/Data-Functor-Extend.html#v:duplicate"><code>duplicate</code></a> method on comonads. Moreover, <code>tails</code> is the usual definition of <code>duplicate</code> on lists, and I think <code>inits</code> would be <code>duplicate</code> for &quot;snoc lists&quot;. For trees, however, I don&#8217;t think the correspondence holds. Am I missing something?</li>
<li>In particular, I want to extend the derivation to depth-typed, perfectly balanced trees, of the sort I played with in <a href="http://conal.net/blog/posts/a-trie-for-length-typed-vectors/" title="blog post"><em>A trie for length-typed vectors</em></a> and <a href="http://conal.net/blog/posts/from-tries-to-trees/" title="blog post"><em>From tries to trees</em></a>. The functions <code>initTs</code> and <code>tailTs</code> make unbalanced trees out of balanced ones, so I don&#8217;t know how to adapt the specifications given here to the setting of depth-typed balanced trees. Maybe I could just fill up the to-be-ignored elements with <code>∅</code>.</li>
</ul>
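<p>To make the comonad remark a little more concrete for lists, here is a small sketch. The names <code>duplicateL</code> and <code>extendL</code> are hypothetical and do not come from any library. Lists are only comonad-like here, because <code>extract</code> would be partial on <code>[]</code>. If we take <code>duplicate</code> to be <code>tails</code> and use the usual relationship <code>extend f ≡ fmap f ∘ duplicate</code>, then extending with a right fold gives exactly <code>scanr</code>. Dually, mapping a left fold over <code>inits</code> gives <code>scanl</code>.</p>

```haskell
import Data.List (inits, tails)

-- Hypothetical list analogue of the comonad method duplicate:
-- every suffix of the list, longest first.
duplicateL :: [a] -> [[a]]
duplicateL = tails

-- Hypothetical extend, via the usual default extend f = fmap f . duplicate.
extendL :: ([a] -> b) -> [a] -> [b]
extendL f = map f . duplicateL

-- Extending a right fold over all suffixes is a right scan.
scanrViaExtend :: (a -> b -> b) -> b -> [a] -> [b]
scanrViaExtend f z = extendL (foldr f z)

-- Dually, mapping a left fold over all prefixes is a left scan,
-- mirroring the guess that inits plays duplicate's role for snoc lists.
scanlViaInits :: (b -> a -> b) -> b -> [a] -> [b]
scanlViaInits f z = map (foldl f z) . inits
```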
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/deriving-parallel-tree-scans/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fderiving-parallel-tree-scans&amp;language=en_GB&amp;category=text&amp;title=Deriving+parallel+tree+scans&amp;description=The+post+Deriving+list+scans+explored+folds+and+scans+on+lists+and+showed+how+the+usual%2C+efficient+scan+implementations+can+be+derived+from+simpler+specifications.+Let%26%238217%3Bs+see+now+how+to...&amp;tags=program+derivation%2Cscan%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Deriving list scans</title>
		<link>http://conal.net/blog/posts/deriving-list-scans</link>
		<comments>http://conal.net/blog/posts/deriving-list-scans#comments</comments>
		<pubDate>Tue, 22 Feb 2011 20:42:40 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[fold]]></category>
		<category><![CDATA[program derivation]]></category>
		<category><![CDATA[scan]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=341</guid>
		<description><![CDATA[I&#8217;ve been playing with deriving efficient parallel, imperative implementations of &#34;prefix sum&#34; or more generally &#34;left scan&#34;. Following posts will explore the parallel &#38; imperative derivations, but as a warm-up, I&#8217;ll tackle the functional &#38; sequential case here. Folds. You&#8217;re probably familiar with the higher-order functions for left and right &#34;fold&#34;. The current documentation says: foldl, [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p
>I&#8217;ve been playing with deriving efficient parallel, imperative implementations of &quot;prefix sum&quot; or more generally &quot;left scan&quot;. Following posts will explore the parallel &amp; imperative derivations, but as a warm-up, I&#8217;ll tackle the functional &amp; sequential case here.</p
>

<div id="folds"
><h3
  >Folds</h3
  ><p
  >You&#8217;re probably familiar with the higher-order functions for left and right &quot;fold&quot;. The current documentation says:</p
  ><blockquote>
<p
  ><a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:foldl"
    ><code
      >foldl</code
      ></a
    >, applied to a binary operator, a starting value (typically the left-identity of the operator), and a list, reduces the list using the binary operator, from left to right:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >foldl</span
      > f z [x1, x2, &#8943;, xn] &#8801; (&#8943;((z <span class="ot"
      >`f`</span
      > x1) <span class="ot"
      >`f`</span
      > x2) <span class="ot"
      >`f`</span
      >&#8943;) <span class="ot"
      >`f`</span
      > xn<br
       /></code
    ></pre
  ><p
  >The list must be finite.</p
  ><p
  ><a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:foldr"
    ><code
      >foldr</code
      ></a
    >, applied to a binary operator, a starting value (typically the right-identity of the operator), and a list, reduces the list using the binary operator, from right to left:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >foldr</span
      > f z [x1, x2, &#8943;, xn] &#8801; x1 <span class="ot"
      >`f`</span
      > (x2 <span class="ot"
      >`f`</span
      > &#8943; (xn <span class="ot"
      >`f`</span
      > z)&#8943;)<br
       /></code
    ></pre
  ></blockquote>
<p
  >And here are typical definitions:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >foldl</span
      > <span class="dv"
      >&#8759;</span
      > (b &#8594; a &#8594; b) &#8594; b &#8594; [a] &#8594; b<br
       /><span class="fu"
      >foldl</span
      > f z []     <span class="fu"
      >=</span
      > z<br
       /><span class="fu"
      >foldl</span
      > f z (x<span class="fu"
      >:</span
      >xs) <span class="fu"
      >=</span
      > <span class="fu"
      >foldl</span
      > f (z <span class="ot"
      >`f`</span
      > x) xs<br
       /><br
       /><span class="fu"
      >foldr</span
      > <span class="dv"
      >&#8759;</span
      > (a &#8594; b &#8594; b) &#8594; b &#8594; [a] &#8594; b<br
       /><span class="fu"
      >foldr</span
      > f z []     <span class="fu"
      >=</span
      > z<br
       /><span class="fu"
      >foldr</span
      > f z (x<span class="fu"
      >:</span
      >xs) <span class="fu"
      >=</span
      > x <span class="ot"
      >`f`</span
      > <span class="fu"
      >foldr</span
      > f z xs<br
       /></code
    ></pre
  ><p
  >Notice that <code
    >foldl</code
    > builds up its result one step at a time and reveals it all at once, in the end. The whole result value is locked up until the entire input list has been traversed. In contrast, <code
    >foldr</code
    > can start revealing information right away (when its operator is lazy in its second argument), and so can work well with infinite lists. Like <code
    >foldl</code
    >, <code
    >foldr</code
    > also yields only a final value.</p
  ><p
  >Sometimes it's handy to get all of the intermediate values as well. Doing so takes us beyond the land of folds to the kingdom of scans.</p
  ></div
>
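<p>Transcribed into ASCII Haskell, the typical definitions above run as written. In the sketch below, the Prelude versions are hidden so that these definitions are the ones in use. Subtraction exposes the difference in bracketing. The last two values illustrate the laziness point: with an operator that does not need its second argument right away, <code>foldr</code> can produce results from an infinite list, whereas <code>foldl</code> on the same list would never return.</p>

```haskell
import Prelude hiding (foldl, foldr)

foldl :: (b -> a -> b) -> b -> [a] -> b
foldl f z []     = z
foldl f z (x:xs) = foldl f (z `f` x) xs

foldr :: (a -> b -> b) -> b -> [a] -> b
foldr f z []     = z
foldr f z (x:xs) = x `f` foldr f z xs

-- Subtraction is neither commutative nor associative, so it shows
-- the bracketing: ((10-1)-2)-3 versus 1-(2-(3-10)).
leftDiff, rightDiff :: Integer
leftDiff  = foldl (-) 10 [1, 2, 3]
rightDiff = foldr (-) 10 [1, 2, 3]

-- (:) and (||) do not force their second argument up front,
-- so foldr can produce output from an infinite list.
firstThree :: [Integer]
firstThree = take 3 (foldr (:) [] [1 ..])

anyOverThree :: Bool
anyOverThree = foldr (\x r -> x > 3 || r) False [1 :: Integer ..]
```

<p>Here <code>leftDiff</code> should be <code>4</code>, <code>rightDiff</code> should be <code>-8</code>, <code>firstThree</code> should be <code>[1,2,3]</code>, and <code>anyOverThree</code> should be <code>True</code>.</p>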

<div id="scans"
><h3
  >Scans</h3
  ><p
  >The <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:scanl"
    ><code
      >scanl</code
      ></a
    > and <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:scanr"
    ><code
      >scanr</code
      ></a
    > functions correspond to <code
    >foldl</code
    > and <code
    >foldr</code
    > but produce <em
    >all</em
    > intermediate accumulations, not just the final one.</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >scanl</span
      > <span class="dv"
      >&#8759;</span
      > (b &#8594; a &#8594; b) &#8594; b &#8594; [a] &#8594; [b]<br
       /><br
       /><span class="fu"
      >scanl</span
      > f z [x1, x2,  &#8943; ] &#8801; [z, z <span class="ot"
      >`f`</span
      > x1, (z <span class="ot"
      >`f`</span
      > x1) <span class="ot"
      >`f`</span
      > x2, &#8943;]<br
       /><br
       /><span class="fu"
      >scanr</span
      > <span class="dv"
      >&#8759;</span
      > (a &#8594; b &#8594; b) &#8594; b &#8594; [a] &#8594; [b]<br
       /><br
       /><span class="fu"
      >scanr</span
      > f z [&#8943;, xn_1, xn] &#8801; [&#8943;, xn_1 <span class="ot"
      >`f`</span
      > (xn <span class="ot"
      >`f`</span
      > z), xn <span class="ot"
      >`f`</span
      > z, z]<br
       /></code
    ></pre
  ><p
  >As you might expect, the last value of a left scan is the complete left fold, and the <em
    >first</em
    > value of a right scan is the complete <em
    >right</em
    > fold:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >last</span
      > (<span class="fu"
      >scanl</span
      > f z xs) &#8801; <span class="fu"
      >foldl</span
      > f z xs<br
       /><span class="fu"
      >head</span
      > (<span class="fu"
      >scanr</span
      > f z xs) &#8801; <span class="fu"
      >foldr</span
      > f z xs<br
       /></code
    ></pre
  ><p
  >which is to say</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >last</span
      > &#8728; <span class="fu"
      >scanl</span
      > f z &#8801; <span class="fu"
      >foldl</span
      > f z<br
       /><span class="fu"
      >head</span
      > &#8728; <span class="fu"
      >scanr</span
      > f z &#8801; <span class="fu"
      >foldr</span
      > f z<br
       /></code
    ></pre
  ><p
  >The standard scan definitions are trickier than the fold definitions:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >scanl</span
      > <span class="dv"
      >&#8759;</span
      > (b &#8594; a &#8594; b) &#8594; b &#8594; [a] &#8594; [b]<br
       /><span class="fu"
      >scanl</span
      > f z ls <span class="fu"
      >=</span
      > z <span class="fu"
      >:</span
      > (<span class="kw"
      >case</span
      > ls <span class="kw"
      >of</span
      ><br
       />                     []   &#8594; []<br
       />                     x<span class="fu"
      >:</span
      >xs &#8594; <span class="fu"
      >scanl</span
      > f (z <span class="ot"
      >`f`</span
      > x) xs)<br
       /><br
       /><span class="fu"
      >scanr</span
      > <span class="dv"
      >&#8759;</span
      > (a &#8594; b &#8594; b) &#8594; b &#8594; [a] &#8594; [b]<br
       /><span class="fu"
      >scanr</span
      > _ z []     <span class="fu"
      >=</span
      > [z]<br
       /><span class="fu"
      >scanr</span
      > f z (x<span class="fu"
      >:</span
      >xs) <span class="fu"
      >=</span
      > (x <span class="ot"
      >`f`</span
      > q) <span class="fu"
      >:</span
      > qs<br
       />                   <span class="kw"
      >where</span
      > qs<span class="fu"
      >@</span
      >(q<span class="fu"
      >:</span
      >_) <span class="fu"
      >=</span
      > <span class="fu"
      >scanr</span
      > f z xs<br
       /></code
    ></pre
  ><p
  >Every time I encounter these definitions, I have to walk through them again to see what's going on. I finally sat down to figure out how these tricky definitions might <em
    >emerge</em
    > from simpler specifications. In other words, I wanted to <em
    >derive</em
    > these definitions systematically from simpler but less efficient ones.</p
  ><p
  >Most likely, these derivations have been done before, but I learned something from the effort, and I hope you do, too.</p
  ><span id="more-341"></span></div
>
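<p>Here is a quick, runnable check of the two properties just stated, using the standard <code>scanl</code>, <code>scanr</code>, <code>foldl</code>, and <code>foldr</code> from the Prelude. Subtraction again serves as a deliberately non-associative operator, so the bracketing shown in the equations above matters. The last value shows that <code>scanl</code>, unlike <code>foldl</code>, produces usable output even on an infinite list.</p>

```haskell
-- Left scan: every partial left fold, starting with z.
leftScan :: [Integer]
leftScan = scanl (-) 10 [1, 2, 3]    -- [10, 10-1, (10-1)-2, ((10-1)-2)-3]

-- Right scan: every partial right fold, ending with z.
rightScan :: [Integer]
rightScan = scanr (-) 10 [1, 2, 3]   -- [1-(2-(3-10)), 2-(3-10), 3-10, 10]

-- The last element of a left scan is the whole left fold, and the
-- first element of a right scan is the whole right fold.
propertiesHold :: Bool
propertiesHold = and
  [ last leftScan == foldl (-) 10 [1, 2, 3]
  , head rightScan == foldr (-) 10 [1, 2, 3]
  ]

-- scanl is productive on infinite input: running sums of 1, 2, 3, ...
runningSums :: [Integer]
runningSums = take 5 (scanl (+) 0 [1 ..])
```

<p>Evaluating these should give <code>[10,9,7,4]</code>, <code>[-8,9,-7,10]</code>, <code>True</code>, and <code>[0,1,3,6,10]</code>.</p>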

<div id="specifying-scans"
><h3
  >Specifying scans</h3
  ><p
  >As I pointed out above, the last element of a left scan is the left fold over the whole list. In fact, <em
    >all</em
    > of the elements are left folds, but over <em
    >prefixes</em
    > of the list. Similarly, all of the elements of a right scan are right folds, but over <em
    >suffixes</em
    > of the list. These observations give rise to very simple specifications for <code
    >scanl</code
    > and <code
    >scanr</code
    >:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >scanl</span
      > f z xs <span class="fu"
      >=</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) (inits xs)<br
       /><span class="fu"
      >scanr</span
      > f z xs <span class="fu"
      >=</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldr</span
      > f z) (tails xs)<br
       /></code
    ></pre
  ><p
  >Equivalently,</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >scanl</span
      > f z <span class="fu"
      >=</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) &#8728; inits<br
       /><span class="fu"
      >scanr</span
      > f z <span class="fu"
      >=</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldr</span
      > f z) &#8728; tails<br
       /></code
    ></pre
  ><p
  >Here I'm using the standard <code
    >inits</code
    > and <code
    >tails</code
    > functions from <code
    >Data.List</code
    >, documented as follows:</p
  ><blockquote>
<p
  >The <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:inits"
    ><code
      >inits</code
      ></a
    > function returns all initial segments of the argument, shortest first. For example,</p
  ><pre class="sourceCode haskell"
  ><code
    >inits <span class="st"
      >&quot;abc&quot;</span
      > &#8801; [<span class="st"
      >&quot;&quot;</span
      >,<span class="st"
      >&quot;a&quot;</span
      >,<span class="st"
      >&quot;ab&quot;</span
      >,<span class="st"
      >&quot;abc&quot;</span
      >]<br
       /></code
    ></pre
  ><p
  >The <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:tails"
    ><code
      >tails</code
      ></a
    > function returns all final segments of the argument, longest first. For example,</p
  ><pre class="sourceCode haskell"
  ><code
    >tails <span class="st"
      >&quot;abc&quot;</span
      > &#8801; [<span class="st"
      >&quot;abc&quot;</span
      >, <span class="st"
      >&quot;bc&quot;</span
      >, <span class="st"
      >&quot;c&quot;</span
      >,<span class="st"
      >&quot;&quot;</span
      >]<br
       /></code
    ></pre
  ></blockquote>
<p
  ><a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/Data-List.html#inits"
    >The</a
    > <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/src/Data-List.html#tails"
    >definitions</a
    >:</p
  ><pre class="sourceCode haskell"
  ><code
    >inits <span class="dv"
      >&#8759;</span
      > [a] &#8594; [[a]]<br
       />inits []     <span class="fu"
      >=</span
      > [[]]<br
       />inits (x<span class="fu"
      >:</span
      >xs) <span class="fu"
      >=</span
      > [[]] <span class="fu"
      >++</span
      > <span class="fu"
      >map</span
      > (x<span class="fu"
      >:</span
      >) (inits xs)<br
       /><br
       />tails <span class="dv"
      >&#8759;</span
      > [a] &#8594; [[a]]<br
       />tails []         <span class="fu"
      >=</span
      > [[]]<br
       />tails xs<span class="fu"
      >@</span
      >(_<span class="fu"
      >:</span
      >xs') <span class="fu"
      >=</span
      > xs <span class="fu"
      >:</span
      > tails xs'<br
       /></code
    ></pre
  ><p
  >This definition of <code
    >inits</code
    > is stricter than necessary, as it examines its argument before emitting anything but ends up emitting an initial empty list whether the argument is nil or a cons. Here's a lazier definition:</p
  ><pre class="sourceCode haskell"
  ><code
    >inits xs <span class="fu"
      >=</span
      > [] <span class="fu"
      >:</span
      > <span class="kw"
      >case</span
      > xs <span class="kw"
      >of</span
      ><br
       />                  []     &#8594; []<br
       />                  (x<span class="fu"
      >:</span
      >xs) &#8594; <span class="fu"
      >map</span
      > (x<span class="fu"
      >:</span
      >) (inits xs)<br
       /></code
    ></pre
  ><p
  >This second version produces the initial <code
    >[]</code
    > before examining its argument, which helps to avoid deadlock in some recursive contexts.</p
  ><p
  >These specifications of <code
    >scanl</code
    > and <code
    >scanr</code
    > make it very easy to prove the properties given above that</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >last</span
      > &#8728; <span class="fu"
      >scanl</span
      > f z &#8801; <span class="fu"
      >foldl</span
      > f z<br
       /><span class="fu"
      >head</span
      > &#8728; <span class="fu"
      >scanr</span
      > f z &#8801; <span class="fu"
      >foldr</span
      > f z<br
       /></code
    ></pre
  ><p
  >(Hint: use the fact that <code
    >last &#8728; inits &#8801; id</code
    >, and <code
    >head &#8728; tails &#8801; id</code
    >.)</p
  ><p
  >Although these specifications of <code
    >scanl</code
    > and <code
    >scanr</code
    > (via <code
    >map</code
    >, <code
    >foldl</code
    >/<code
    >foldr</code
    >, and <code
    >inits</code
    >/<code
    >tails</code
    >) state succinctly and simply <em
    >what</em
    > <code
    >scanl</code
    > and <code
    >scanr</code
    > compute, they are terrible recipes for <em
    >how</em
    >, because they perform quadratic work instead of linear.</p
  ><p
  >But that's okay, because now we're going to see how to turn these inefficient &amp; clear specifications into efficient &amp; less clear implementations.</p
  ></div
>
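<p>Because the specifications are directly executable, they make a handy test oracle. The sketch below transcribes them into ASCII Haskell and compares them with the Prelude scans on a few inputs. The names <code>scanlSpec</code>, <code>scanrSpec</code>, and <code>initsLazy</code> are just illustrative. The sketch also checks the hint (<code>last &#8728; inits &#8801; id</code> and <code>head &#8728; tails &#8801; id</code>) on an example, and shows that the lazier <code>inits</code> yields its first element without examining its argument.</p>

```haskell
import Data.List (inits, tails)

-- Quadratic-work specifications: fold every prefix / every suffix.
scanlSpec :: (b -> a -> b) -> b -> [a] -> [b]
scanlSpec f z = map (foldl f z) . inits

scanrSpec :: (a -> b -> b) -> b -> [a] -> [b]
scanrSpec f z = map (foldr f z) . tails

-- The lazier inits: emits [] before looking at its argument.
initsLazy :: [a] -> [[a]]
initsLazy xs = [] : case xs of
                      []     -> []
                      (y:ys) -> map (y:) (initsLazy ys)

-- Sample inputs of several lengths, including the empty list.
samples :: [[Integer]]
samples = [[], [1], [1, 2], [3, 1, 4, 1, 5]]

-- The specifications agree with the Prelude scans on every sample.
specsAgree :: Bool
specsAgree = and (lefts ++ rights)
  where
    lefts  = [ scanlSpec (-) 10 xs == scanl (-) 10 xs | xs <- samples ]
    rights = [ scanrSpec (-) 10 xs == scanr (-) 10 xs | xs <- samples ]
```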

<div id="deriving-efficient-scans"
><h3
  >Deriving efficient scans</h3
  ><p
  >To derive efficient scans, use the specifications and perform some simplifications.</p
  ><p
  >Split <code
    >scanl</code
    > into cases for empty and nonempty lists, starting with the empty case:</p
  ><pre class="sourceCode haskell"
  ><code
    >  <span class="fu"
      >scanl</span
      > f z []<br
       />&#8801; <span class="co"
      >{- scanl spec -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) (inits [])<br
       />&#8801; <span class="co"
      >{- inits def -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) [[]]<br
       />&#8801; <span class="co"
      >{- map def -}</span
      ><br
       />  [<span class="fu"
      >foldl</span
      > f z []]<br
       />&#8801; <span class="co"
      >{- foldl def -}</span
      ><br
       />  [z]<br
       /></code
    ></pre
  ><pre class="sourceCode haskell"
  ><code
    >  <span class="fu"
      >scanl</span
      > f z (x<span class="fu"
      >:</span
      >xs)<br
       />&#8801; <span class="co"
      >{- scanl spec -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) (inits (x<span class="fu"
      >:</span
      >xs))<br
       />&#8801; <span class="co"
      >{- inits def -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) ([] <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (x<span class="fu"
      >:</span
      >) (inits xs))<br
       />&#8801; <span class="co"
      >{- map def -}</span
      ><br
       />  <span class="fu"
      >foldl</span
      > f z [] <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) (<span class="fu"
      >map</span
      > (x<span class="fu"
      >:</span
      >) (inits xs))<br
       />&#8801; <span class="co"
      >{- foldl def -}</span
      ><br
       />  z <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z) (<span class="fu"
      >map</span
      > (x<span class="fu"
      >:</span
      >) (inits xs))<br
       />&#8801; <span class="co"
      >{- map g &#8728; map f &#8801; map (g &#8728; f) -}</span
      ><br
       />  z <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f z &#8728; (x<span class="fu"
      >:</span
      >)) (inits xs)<br
       />&#8801; <span class="co"
      >{- (&#8728;) def -}</span
      ><br
       />  z <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (&#955; ys &#8594; <span class="fu"
      >foldl</span
      > f z (x<span class="fu"
      >:</span
      >ys)) (inits xs)<br
       />&#8801; <span class="co"
      >{- foldl def -}</span
      ><br
       />  z <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (&#955; ys &#8594; <span class="fu"
      >foldl</span
      > f (z <span class="ot"
      >`f`</span
      > x) ys) (inits xs)<br
       />&#8801; <span class="co"
      >{- &#951; reduction -}</span
      ><br
       />  z <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldl</span
      > f (z <span class="ot"
      >`f`</span
      > x)) (inits xs)<br
       />&#8801; <span class="co"
      >{- scanl spec -}</span
      ><br
       />  z <span class="fu"
      >:</span
      > <span class="fu"
      >scanl</span
      > f (z <span class="ot"
      >`f`</span
      > x) xs<br
       /></code
    ></pre
  ><p
  >Combine these conclusions and factor out the common <code
    >(z : )</code
    > to yield the standard &quot;optimized&quot; definition.</p
  ><p
  >Does <code
    >scanr</code
    > work out similarly? Let's find out, replacing <code
    >scanl</code
    > with <code
    >scanr</code
    >, <code
    >foldl</code
    > with <code
    >foldr</code
    >, and <code
    >inits</code
    > with <code
    >tails</code
    >:</p
  ><pre class="sourceCode haskell"
  ><code
    >  <span class="fu"
      >scanr</span
      > f z []<br
       />&#8801; <span class="co"
      >{- scanr spec -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldr</span
      > f z) (tails [])<br
       />&#8801; <span class="co"
      >{- tails def -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldr</span
      > f z) ([[]])<br
       />&#8801; <span class="co"
      >{- map def -}</span
      ><br
       />  [<span class="fu"
      >foldr</span
      > f z []]<br
       />&#8801; <span class="co"
      >{- foldr def -}</span
      ><br
       />  [z]<br
       /></code
    ></pre
  ><p
  >The derivation for nonempty lists deviates from <code
    >scanl</code
    >, due to differences between <code
    >inits</code
    > and <code
    >tails</code
    >, but it all works out nicely.</p
  ><pre class="sourceCode haskell"
  ><code
    >  <span class="fu"
      >scanr</span
      > f z (x<span class="fu"
      >:</span
      >xs)<br
       />&#8801; <span class="co"
      >{- scanr spec -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldr</span
      > f z) (tails (x<span class="fu"
      >:</span
      >xs))<br
       />&#8801; <span class="co"
      >{- tails def -}</span
      ><br
       />  <span class="fu"
      >map</span
      > (<span class="fu"
      >foldr</span
      > f z) ((x<span class="fu"
      >:</span
      >xs) <span class="fu"
      >:</span
      > tails xs)<br
       />&#8801; <span class="co"
      >{- map def -}</span
      ><br
       />  <span class="fu"
      >foldr</span
      > f z (x<span class="fu"
      >:</span
      >xs) <span class="fu"
      >:</span
      > <span class="fu"
      >map</span
      > (<span class="fu"
      >foldr</span
      > f z) (tails xs)<br
       />&#8801; <span class="co"
      >{- scanr spec -}</span
      ><br
       />  <span class="fu"
      >foldr</span
      > f z (x<span class="fu"
      >:</span
      >xs) <span class="fu"
      >:</span
      > <span class="fu"
      >scanr</span
      > f z xs<br
       />&#8801; <span class="co"
      >{- foldr def -}</span
      ><br
       />  (x <span class="ot"
      >`f`</span
      > <span class="fu"
      >foldr</span
      > f z xs) <span class="fu"
      >:</span
      > <span class="fu"
      >scanr</span
      > f z xs<br
       />&#8801; <span class="co"
      >{- head/scanr property -}</span
      ><br
       />  (x <span class="ot"
      >`f`</span
      > <span class="fu"
      >head</span
      > (<span class="fu"
      >scanr</span
      > f z xs)) <span class="fu"
      >:</span
      > <span class="fu"
      >scanr</span
      > f z xs<br
       />&#8801; <span class="co"
      >{- factor out shared expression -}</span
      ><br
       />  (x <span class="ot"
      >`f`</span
      > <span class="fu"
      >head</span
      > qs) <span class="fu"
      >:</span
      > qs <span class="kw"
      >where</span
      > qs <span class="fu"
      >=</span
      > <span class="fu"
      >scanr</span
      > f z xs<br
       />&#8801; <span class="co"
      >{- stylistic variation -}</span
      ><br
       />  (x <span class="ot"
      >`f`</span
      > q) <span class="fu"
      >:</span
      > qs <span class="kw"
      >where</span
      > qs<span class="fu"
      >@</span
      >(q<span class="fu"
      >:</span
      >_) <span class="fu"
      >=</span
      > <span class="fu"
      >scanr</span
      > f z xs<br
       /></code
    ></pre
  ></div
>
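<p>As a sanity check on both derivations, the sketch below transcribes the derived equations and compares them with the specifications on every list of length at most 4 over the alphabet <code>[0, 1, 2]</code>. The derived versions use the hypothetical names <code>scanlD</code> and <code>scanrD</code>, and <code>scanrD</code> keeps <code>head qs</code> rather than the stylistic <code>qs@(q:_)</code> variant. The operator <code>op</code> is an arbitrary choice that is neither commutative nor associative. Exhaustive testing on small inputs is no substitute for the equational reasoning above, but it would quickly catch a slip in one of the steps.</p>

```haskell
import Control.Monad (replicateM)
import Data.List (inits, tails)

-- Specifications.
scanlSpec :: (b -> a -> b) -> b -> [a] -> [b]
scanlSpec f z = map (foldl f z) . inits

scanrSpec :: (a -> b -> b) -> b -> [a] -> [b]
scanrSpec f z = map (foldr f z) . tails

-- Derived definitions, following the final steps of each derivation.
scanlD :: (b -> a -> b) -> b -> [a] -> [b]
scanlD _ z []     = [z]
scanlD f z (x:xs) = z : scanlD f (z `f` x) xs

scanrD :: (a -> b -> b) -> b -> [a] -> [b]
scanrD _ z []     = [z]
scanrD f z (x:xs) = (x `f` head qs) : qs   -- qs is never empty
  where qs = scanrD f z xs

-- All lists of length 0 to 4 over the alphabet [0, 1, 2].
smallLists :: [[Integer]]
smallLists = concatMap (\n -> replicateM n [0, 1, 2]) [0 .. 4]

-- Neither commutative nor associative, so mistakes become visible.
op :: Integer -> Integer -> Integer
op a b = 2 * a - b

derivationsAgree :: Bool
derivationsAgree = and (leftOk ++ rightOk)
  where
    leftOk  = [ scanlD op 5 xs == scanlSpec op 5 xs | xs <- smallLists ]
    rightOk = [ scanrD op 5 xs == scanrSpec op 5 xs | xs <- smallLists ]
```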

<div id="coming-attractions"
><h3
  >Coming attractions</h3
  ><p
  >The scan implementations above are thoroughly sequential, in that they thread a single linear chain of data dependencies throughout the computation. Upcoming posts will look at more parallel-friendly variations.</p
  ></div
>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/deriving-list-scans/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fderiving-list-scans&amp;language=en_GB&amp;category=text&amp;title=Deriving+list+scans&amp;description=I%26%238217%3Bve+been+playing+with+deriving+efficient+parallel%2C+imperative+implementations+of+%26quot%3Bprefix+sum%26quot%3B+or+more+generally+%26quot%3Bleft+scan%26quot%3B.+Following+posts+will+explore+the+parallel+%26amp%3B+imperative+derivations%2C+but+as+a+warm-up%2C...&amp;tags=fold%2Cprogram+derivation%2Cscan%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
