## Composable parallel scanning

The post Deriving list scans gave a simple specification of the list-scanning functions scanl and scanr, and then transformed those specifications into the standard optimized implementations. Next, the post Deriving parallel tree scans adapted the specifications and derivations to a type of binary trees. The resulting implementations are parallel-friendly, but not work-efficient, in that they perform $n\phantom{\rule{0.167em}{0ex}}\mathrm{log}\phantom{\rule{0.167em}{0ex}}n$ work vs linear work as in the best-known sequential algorithm.

Besides the work-inefficiency, I don’t know how to extend the critical initTs and tailTs functions (analogs of inits and tails on lists) to depth-typed, perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The difficulty I encounter is that the functions initTs and tailTs make unbalanced trees out of balanced ones, so I don’t know how to adapt the specifications when types prevent the existence of unbalanced trees.

This new post explores an approach to generalized scanning via type classes. After defining the classes and giving a simple example, I’ll give a simple & general framework based on composing functor combinators.

Edits:

• 2011-03-02: Fixed typo. "constant functor is easiest" (instead of "identity functor"). Thanks, frguybob.
• 2011-03-05: Removed final unfinished sentence.
• 2011-07-28: Replace "assocL" with "assocR" in prefixScan derivation for g ∘ f.

### Generalizing list scans

The left and right scan functions on lists have an awkward feature. The output list has one more element than the input list, corresponding to the fact that the number of prefixes (inits) of a list is one more than the number of elements, and similarly for suffixes (tails).

While it’s easy to extend a list by adding one more element, it’s not easy with other functors. In Deriving parallel tree scans, I simply removed the element from the scan. In this post, I’ll instead change the interface to produce an output of exactly the same shape, plus one extra element. The extra element will equal a fold over the complete input. If you recall, we had to search for that complete fold in an input subtree in order to adjust the other subtree. (See headT and lastT and their generalizations in Deriving parallel tree scans.) Separating out this value eliminates the search.

Define a class with methods for prefix and suffix scan:

class Scan f where
prefixScan, suffixScan Monoid m f m (m, f m)

Prefix scans (prefixScan) accumulate moving left-to-right, while suffix scans (suffixScan) accumulate moving right-to-left.

#### A simple example: pairs

To get a first sense of generalized scans, let’s use see how to scan over a pair functor.

data Pair a = a :# a deriving (Eq,Ord,Show)

With GHC’s DeriveFunctor option, we could also derive a Functor instance, but for clarity, define it explicitly:

instance Functor Pair where
fmap f (a :# b) = (f a :# f b)

The scans:

instance Scan Pair where
prefixScan (a :# b) = (a ⊕ b, (∅ :# a))
suffixScan (a :# b) = (a ⊕ b, (b :# ∅))

As you can see, if we eliminated the elements, we could shift to the left or right and forgo the extra result.

Naturally, there is also a Fold instance, and the scans produce the fold results as well sub-folds:

instance Foldable Pair where
fold (a :# b) = a ⊕ b

The Pair functor also has unsurprising instances for Applicative and Traversable.

instance Applicative Pair where
pure a = a :# a
(f :# g) <*> (x :# y) = (f x :# g y)

instance Traversable Pair where
sequenceA (fa :# fb) = (:#) <\$> fa <*> fb

We don’t really have to figure out how to define scans for every functor separately. We can instead look at how functors are are composed out of their essential building blocks.

### Scans for functor combinators

To see how to scan over a broad range of functors, let’s look at each of the functor combinators, e.g., as in Elegant memoization with higher-order types.

#### Constant

The constant functor is easiest.

newtype Const x a = Const x

There are no values to accumulate, so the final result (fold) is .

instance Scan (Const x) where
prefixScan (Const x) = (∅, Const x)
suffixScan = prefixScan

#### Identity

The identity functor is nearly as easy.

newtype Id a = Id a
instance Scan Id where
prefixScan (Id m) = (m, Id ∅)
suffixScan = prefixScan

#### Sum

Scanning in a sum is just scanning in a summand:

data (f + g) a = InL (f a) | InR (g a)
instance (Scan f, Scan g)  Scan (f + g) where
prefixScan (InL fa) = second InL (prefixScan fa)
prefixScan (InR ga) = second InR (prefixScan ga)

suffixScan (InL fa) = second InL (suffixScan fa)
suffixScan (InR ga) = second InR (suffixScan ga)

These definitions correspond to simple "commutative diagram" properties, e.g.,

prefixScan ∘ InL ≡ second InL ∘ prefixScan

#### Product

Product scannning is a little trickier.

data (f × g) a = f a × g a

Scan each of the two parts separately, and then combine the final (fold) part of one result with each of the non-final elements of the other.

instance (Scan f, Scan g, Functor f, Functor g)  Scan (f × g) where
prefixScan (fa × ga) = (af ⊕ ag, fa' × ((af ⊕) <\$> ga'))
where (af,fa') = prefixScan fa
(ag,ga') = prefixScan ga

suffixScan (fa × ga) = (af ⊕ ag, ((⊕ ag) <\$> fa') × ga')
where (af,fa') = suffixScan fa
(ag,ga') = suffixScan ga

#### Composition

Finally, composition is the trickiest.

newtype (g ∘ f) a = O (g (f a))

The target signatures:

prefixScan, suffixScan  Monoid m  (g ∘ f) m  (m, (g ∘ f) m)

To find the prefix and suffix scan definitions, fiddle with types beginning at the domain type for prefixScan or suffixScan and arriving at the range type.

Some helpers:

zip  Applicative g  (g a, g b)  g (a,b)
zip = uncurry (liftA2 (,))

unzip Functor g g (a,b) (g a, g b)
unzip = fmap fst &&& fmap snd
assocR  ((a,b),c)  (a,(b,c))
assocR ((a,b),c) = (a,(b,c))
adjustL  (Functor f, Monoid m)  (m, f m)  f m
adjustL (m, ms) = (m ⊕) <\$> ms

adjustR (Functor f, Monoid m) (m, f m) f m
adjustR (m, ms) = (⊕ m) <\$> ms

First prefixScan:

gofm                      (g ∘ f) m
unO '' g (f m)
fmap prefixScan '' g (m, f m)
unzip '' (g m, g (f m))
first prefixScan '' ((m, g m), g (f m))
assocR '' (m, (g m, g (f m)))
second zip '' (m, g (m, f m))
second (fmap adjustL) '' (m, g (f m))
second O '' (m, (g ∘ f) m)

Then suffixScan:

gofm                      (g ∘ f) m
unO '' g (f m)
fmap suffixScan '' g (m, f m)
unzip '' (g m, g (f m))
first suffixScan '' ((m, g m), g (f m))
assocR '' (m, (g m, g (f m)))
second zip '' (m, (g (m, f m)))
second (fmap adjustR) '' (m, (g (f m)))
second O '' (m, ((g ∘ f) m))

Putting together the pieces and simplifying just a bit leads to the method definitions:

instance (Scan g, Scan f, Functor f, Applicative g)  Scan (g ∘ f) where
prefixScan = second (Ofmap adjustL ∘ zip)
∘ assocR
∘ first prefixScan
unzip
fmap prefixScan
∘ unO

suffixScan = second (Ofmap adjustR ∘ zip)
∘ assocR
∘ first suffixScan
unzip
fmap suffixScan
∘ unO

### What’s coming up?

• What might not easy to spot at this point is that the prefixScan and suffixScan methods given in this post do essentially the same job as in Deriving parallel tree scans, when the binary tree type is deconstructed into functor combinators. A future post will show this connection.
• Switch from standard (right-folded) trees to left-folded trees (in the sense of A trie for length-typed vectors and From tries to trees), which reduces the running time from $\Theta \phantom{\rule{0.167em}{0ex}}\left(n\phantom{\rule{0.167em}{0ex}}\mathrm{log}\phantom{\rule{0.167em}{0ex}}n\right)$ to $\Theta \phantom{\rule{0.167em}{0ex}}n$.
• Scanning in place, i.e., destructively replacing the values in the input structure rather than allocating a new structure.

Your post seems to end mid-sentence (unless it is just very dry humour!).

2. #### conal:

Thanks, Jade. I’ve eliminated that final sentence fragment.

3. #### Conal Elliott » Blog Archive » Parallel tree scanning by composition:

[...] About « Composable parallel scanning [...]

4. #### Twan van Laarhoven:

You could implement prefixScan and suffixScan using mapAccumL and mapAccumR from Data.Traversable. Since Ghc can also derive Traversable, this way you need no extra manually written code for each data type.

5. #### conal:

Twan: Yes. I was thinking of using mapAccumL and mapAccumR to define prefixScan and suffixScan as specifications. I wouldn’t want the resulting implementations, since they’re sequential, not parallel.