Parallel tree scanning by composition

My last few blog posts have been on the theme of scans, and particularly on parallel scans. In Composable parallel scanning, I tackled parallel scanning in a very general setting. There are five simple building blocks out of which a vast assortment of data structures can be built, namely constant (no value), identity (one value), sum, product, and composition. The post defined parallel prefix and suffix scan for each of these five "functor combinators", in terms of the same scan operation on each of the component functors. Every functor built out of this basic set thus has a parallel scan. Functors defined more conventionally can be given scan implementations simply by converting to a composition of the basic set, scanning, and then back to the original functor. Moreover, I expect this implementation could be generated automatically, similarly to GHC’s `DerivingFunctor` extension.

Now I’d like to show two examples of parallel scan composition in terms of binary trees, namely the top-down and bottom-up variants of perfect binary leaf trees used in previous posts. (In previous posts, I used the terms "right-folded" and "left-folded" instead of "top-down" and "bottom-up".) The resulting two algorithms are expressed nearly identically, but have differ significantly in the work performed. The top-down version does $\Theta \left(n\phantom{\rule{0.167em}{0ex}}\mathrm{log}\phantom{\rule{0.167em}{0ex}}n\right)$ work, while the bottom-up version does only $\Theta \left(n\right)$, and thus the latter algorithm is work-efficient, while the former is not. Moreover, with a very simple optimization, the bottom-up tree algorithm corresponds closely to Guy Blelloch’s parallel prefix scan for arrays, given in Programming parallel algorithms. I’m delighted with this result, as I had been wondering how to think about Guy’s algorithm.

Edit:

• 2011-05-31: Added `Scan` and `Applicative` instances for `T2` and `T4`.

Continue reading ‘Parallel tree scanning by composition’ »

Composable parallel scanning

The post Deriving list scans gave a simple specification of the list-scanning functions `scanl` and `scanr`, and then transformed those specifications into the standard optimized implementations. Next, the post Deriving parallel tree scans adapted the specifications and derivations to a type of binary trees. The resulting implementations are parallel-friendly, but not work-efficient, in that they perform $n\phantom{\rule{0.167em}{0ex}}\mathrm{log}\phantom{\rule{0.167em}{0ex}}n$ work vs linear work as in the best-known sequential algorithm.

Besides the work-inefficiency, I don’t know how to extend the critical `initTs` and `tailTs` functions (analogs of `inits` and `tails` on lists) to depth-typed, perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The difficulty I encounter is that the functions `initTs` and `tailTs` make unbalanced trees out of balanced ones, so I don’t know how to adapt the specifications when types prevent the existence of unbalanced trees.

This new post explores an approach to generalized scanning via type classes. After defining the classes and giving a simple example, I’ll give a simple & general framework based on composing functor combinators.

Edits:

• 2011-03-02: Fixed typo. "constant functor is easiest" (instead of "identity functor"). Thanks, frguybob.
• 2011-03-05: Removed final unfinished sentence.
• 2011-07-28: Replace "`assocL`" with "`assocR`" in `prefixScan` derivation for `g ∘ f`.

Continue reading ‘Composable parallel scanning’ »

Deriving parallel tree scans

The post Deriving list scans explored folds and scans on lists and showed how the usual, efficient scan implementations can be derived from simpler specifications.

Let’s see now how to apply the same techniques to scans over trees.

This new post is one of a series leading toward algorithms optimized for execution on massively parallel, consumer hardware, using CUDA or OpenCL.

Edits:

• 2011-03-01: Added clarification about "`∅`" and "`(⊕)`".
• 2011-03-23: corrected "linear-time" to "linear-work" in two places.

Continue reading ‘Deriving parallel tree scans’ »

Deriving list scans

I’ve been playing with deriving efficient parallel, imperative implementations of "prefix sum" or more generally "left scan". Following posts will explore the parallel & imperative derivations, but as a warm-up, I’ll tackle the functional & sequential case here.

Folds

You’re probably familiar with the higher-order functions for left and right "fold". The current documentation says:

`foldl`, applied to a binary operator, a starting value (typically the left-identity of the operator), and a list, reduces the list using the binary operator, from left to right:

``foldl f z [x1, x2, ⋯, xn] ≡ (⋯((z `f` x1) `f` x2) `f`⋯) `f` xn``

The list must be finite.

`foldr`, applied to a binary operator, a starting value (typically the right-identity of the operator), and a list, reduces the list using the binary operator, from right to left:

``foldr f z [x1, x2, ⋯, xn] ≡ x1 `f` (x2 `f` ⋯ (xn `f` z)⋯)``

And here are typical definitions:

``foldl ∷ (b → a → b) → b → [a] → bfoldl f z []     = zfoldl f z (x:xs) = foldl f (z `f` x) xsfoldr ∷ (a → b → b) → b → [a] → bfoldr f z []     = zfoldr f z (x:xs) = x `f` foldr f z xs``

Notice that `foldl` builds up its result one step at a time and reveals it all at once, in the end. The whole result value is locked up until the entire input list has been traversed. In contrast, `foldr` starts revealing information right away, and so works well with infinite lists. Like `foldl`, `foldr` also yields only a final value.

Sometimes it’s handy to also get to all of the intermediate steps. Doing so takes us beyond the land of folds to the kingdom of scans.

Scans

The `scanl` and `scanr` functions correspond to `foldl` and `foldr` but produce all intermediate accumulations, not just the final one.

``scanl ∷ (b → a → b) → b → [a] → [b]scanl f z [x1, x2,  ⋯ ] ≡ [z, z `f` x1, (z `f` x1) `f` x2, ⋯]scanr ∷ (a → b → b) → b → [a] → [b]scanr f z [⋯, xn_1, xn] ≡ [⋯, xn_1 `f` (xn `f` z), xn `f` z, z]``

As you might expect, the last value is the complete left fold, and the first value in the scan is the complete right fold:

``last (scanl f z xs) ≡ foldl f z xshead (scanr f z xs) ≡ foldr f z xs``

which is to say

``last ∘ scanl f z ≡ foldl f zhead ∘ scanr f z ≡ foldr f z``

The standard scan definitions are trickier than the fold definitions:

``scanl ∷ (b → a → b) → b → [a] → [b]scanl f z ls = z : (case ls of                     []   → []                     x:xs → scanl f (z `f` x) xs)scanr ∷ (a → b → b) → b → [a] → [b]scanr _ z []     = [z]scanr f z (x:xs) = (x `f` q) : qs                   where qs@(q:_) = scanr f z xs``

Every time I encounter these definitions, I have to walk through it again to see what’s going on. I finally sat down to figure out how these tricky definitions might emerge from simpler specifications. In other words, how to derive these definitions systematically from simpler but less efficient definitions.

Most likely, these derivations have been done before, but I learned something from the effort, and I hope you do, too.

Continue reading ‘Deriving list scans’ »