Conal Elliott

Composable parallel scanning

1st March 2011, 02:33 pm

The post Deriving list scans gave a simple specification of the list-scanning functions scanl and scanr, and then transformed those specifications into the standard optimized implementations. Next, the post Deriving parallel tree scans adapted the specifications and derivations to a type of binary trees. The resulting implementations are parallel-friendly, but not work-efficient, in that they perform $n \log n$ work vs linear work as in the best-known sequential algorithm.

Besides the work-inefficiency, I don’t know how to extend the critical initTs and tailTs functions (analogs of inits and tails on lists) to depth-typed, perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The difficulty I encounter is that the functions initTs and tailTs make unbalanced trees out of balanced ones, so I don’t know how to adapt the specifications when types prevent the existence of unbalanced trees.

This new post explores an approach to generalized scanning via type classes. After defining the classes and giving a simple example, I’ll give a simple & general framework based on composing functor combinators.

Edits:

2011-03-02: Fixed typo. "constant functor is easiest" (instead of "identity functor"). Thanks, frguybob.
2011-03-05: Removed final unfinished sentence.
2011-07-28: Replace "assocL" with "assocR" in prefixScan derivation for g ∘ f.

Generalizing list scans

The left and right scan functions on lists have an awkward feature. The output list has one more element than the input list, corresponding to the fact that the number of prefixes (inits) of a list is one more than the number of elements, and similarly for suffixes (tails).

While it’s easy to extend a list by adding one more element, it’s not easy with other functors. In Deriving parallel tree scans, I simply removed the ∅ element from the scan. In this post, I’ll instead change the interface to produce an output of exactly the same shape, plus one extra element. The extra element will equal a fold over the complete input. If you recall, we had to search for that complete fold in an input subtree in order to adjust the other subtree. (See headT and lastT and their generalizations in Deriving parallel tree scans.) Separating out this value eliminates the search.

Define a class with methods for prefix and suffix scan:

class Scan f where
  prefixScan, suffixScan ∷ Monoid m ⇒ f m → (m, f m)

Prefix scans (prefixScan) accumulate moving left-to-right, while suffix scans (suffixScan) accumulate moving right-to-left.

A simple example: pairs

To get a first sense of generalized scans, let’s use see how to scan over a pair functor.

data Pair a = a :# a deriving (Eq,Ord,Show)

With GHC’s DeriveFunctor option, we could also derive a Functor instance, but for clarity, define it explicitly:

instance Functor Pair where
  fmap f (a :# b) = (f a :# f b)

The scans:

instance Scan Pair where
  prefixScan (a :# b) = (a ⊕ b, (∅ :# a))
  suffixScan (a :# b) = (a ⊕ b, (b :# ∅))

As you can see, if we eliminated the ∅ elements, we could shift to the left or right and forgo the extra result.

Naturally, there is also a Fold instance, and the scans produce the fold results as well sub-folds:

instance Foldable Pair where
  fold (a :# b) = a ⊕ b

The Pair functor also has unsurprising instances for Applicative and Traversable.

instance Applicative Pair where
  pure a = a :# a
  (f :# g) <*> (x :# y) = (f x :# g y)

instance Traversable Pair where
  sequenceA (fa :# fb) = (:#) <$> fa <*> fb

We don’t really have to figure out how to define scans for every functor separately. We can instead look at how functors are are composed out of their essential building blocks.

Scans for functor combinators

To see how to scan over a broad range of functors, let’s look at each of the functor combinators, e.g., as in Elegant memoization with higher-order types.

Constant

The constant functor is easiest.

newtype Const x a = Const x

There are no values to accumulate, so the final result (fold) is ∅.

instance Scan (Const x) where
  prefixScan (Const x) = (∅, Const x)
  suffixScan           = prefixScan

Identity

The identity functor is nearly as easy.

newtype Id a = Id a

instance Scan Id where
  prefixScan (Id m) = (m, Id ∅)
  suffixScan        = prefixScan

Sum

Scanning in a sum is just scanning in a summand:

data (f + g) a = InL (f a) | InR (g a)

instance (Scan f, Scan g) ⇒ Scan (f + g) where
  prefixScan (InL fa) = second InL (prefixScan fa)
  prefixScan (InR ga) = second InR (prefixScan ga)

  suffixScan (InL fa) = second InL (suffixScan fa)
  suffixScan (InR ga) = second InR (suffixScan ga)

These definitions correspond to simple "commutative diagram" properties, e.g.,

prefixScan ∘ InL ≡ second InL ∘ prefixScan

Product

Product scannning is a little trickier.

data (f × g) a = f a × g a

Scan each of the two parts separately, and then combine the final (fold) part of one result with each of the non-final elements of the other.

instance (Scan f, Scan g, Functor f, Functor g) ⇒ Scan (f × g) where
  prefixScan (fa × ga) = (af ⊕ ag, fa' × ((af ⊕) <$> ga'))
   where (af,fa') = prefixScan fa
         (ag,ga') = prefixScan ga

  suffixScan (fa × ga) = (af ⊕ ag, ((⊕ ag) <$> fa') × ga')
   where (af,fa') = suffixScan fa
         (ag,ga') = suffixScan ga

Composition

Finally, composition is the trickiest.

newtype (g ∘ f) a = O (g (f a))

The target signatures:

  prefixScan, suffixScan ∷ Monoid m ⇒ (g ∘ f) m → (m, (g ∘ f) m)

To find the prefix and suffix scan definitions, fiddle with types beginning at the domain type for prefixScan or suffixScan and arriving at the range type.

Some helpers:

zip ∷ Applicative g ⇒ (g a, g b) → g (a,b)
zip = uncurry (liftA2 (,))

unzip ∷ Functor g ⇒ g (a,b) → (g a, g b)
unzip = fmap fst &&& fmap snd

assocR ∷ ((a,b),c) → (a,(b,c))
assocR   ((a,b),c) =  (a,(b,c))

adjustL ∷ (Functor f, Monoid m) ⇒ (m, f m) → f m
adjustL (m, ms) = (m ⊕) <$> ms

adjustR ∷ (Functor f, Monoid m) ⇒ (m, f m) → f m
adjustR (m, ms) = (⊕ m) <$> ms

First prefixScan:

gofm                     ∷ (g ∘ f) m
unO                   '' ∷ g (f m)
fmap prefixScan       '' ∷ g (m, f m)
unzip                 '' ∷ (g m, g (f m))
first prefixScan      '' ∷ ((m, g m), g (f m))
assocR                '' ∷ (m, (g m, g (f m)))
second zip            '' ∷ (m, g (m, f m))
second (fmap adjustL) '' ∷ (m, g (f m))
second O              '' ∷ (m, (g ∘ f) m)

Then suffixScan:

gofm                     ∷ (g ∘ f) m
unO                   '' ∷ g (f m)
fmap suffixScan       '' ∷ g (m, f m)
unzip                 '' ∷ (g m, g (f m))
first suffixScan      '' ∷ ((m, g m), g (f m))
assocR                '' ∷ (m, (g m, g (f m)))
second zip            '' ∷ (m, (g (m, f m)))
second (fmap adjustR) '' ∷ (m, (g (f m)))
second O              '' ∷ (m, ((g ∘ f) m))

Putting together the pieces and simplifying just a bit leads to the method definitions:

instance (Scan g, Scan f, Functor f, Applicative g) ⇒ Scan (g ∘ f) where
  prefixScan = second (O ∘ fmap adjustL ∘ zip)
             ∘ assocR
             ∘ first prefixScan
             ∘ unzip
             ∘ fmap prefixScan
             ∘ unO

  suffixScan = second (O ∘ fmap adjustR ∘ zip)
             ∘ assocR
             ∘ first suffixScan
             ∘ unzip
             ∘ fmap suffixScan
             ∘ unO

What’s coming up?

What might not easy to spot at this point is that the prefixScan and suffixScan methods given in this post do essentially the same job as in Deriving parallel tree scans, when the binary tree type is deconstructed into functor combinators. A future post will show this connection.
Switch from standard (right-folded) trees to left-folded trees (in the sense of A trie for length-typed vectors and From tries to trees), which reduces the running time from $Θ (n \log n)$ to $Θ n$ .
Scanning in place, i.e., destructively replacing the values in the input structure rather than allocating a new structure.

Tags: functor, scan | Comment (RSS) | Trackback

5 Comments

Jade NB:

Your post seems to end mid-sentence (unless it is just very dry humour!).
2 March 2011, 2:55 pm
conal:

Thanks, Jade. I’ve eliminated that final sentence fragment.
5 March 2011, 11:57 am
Conal Elliott » Blog Archive » Parallel tree scanning by composition:

[…] About « Composable parallel scanning […]
24 May 2011, 12:31 pm
Twan van Laarhoven:

You could implement prefixScan and suffixScan using mapAccumL and mapAccumR from Data.Traversable. Since Ghc can also derive Traversable, this way you need no extra manually written code for each data type.
26 May 2011, 11:58 pm
conal:

Twan: Yes. I was thinking of using mapAccumL and mapAccumR to define prefixScan and suffixScan as specifications. I wouldn’t want the resulting implementations, since they’re sequential, not parallel.
27 May 2011, 4:39 pm

Composable parallel scanning

Generalizing list scans

A simple example: pairs

Scans for functor combinators

Constant

Identity

Sum

Product

Composition

What’s coming up?

5 Comments

Jade NB:

conal:

Conal Elliott » Blog Archive » Parallel tree scanning by composition:

Twan van Laarhoven:

conal:

Leave a comment

Recent Posts

Recent Comments

Archives

Tags

Tags

Meta