Conal Elliott » program derivation

Parallel tree scanning by composition

Conal — Tue, 24 May 2011 20:31:23 +0000

My last few blog posts have been on the theme of scans, and particularly on parallel scans. In Composable parallel scanning, I tackled parallel scanning in a very general setting. There are five simple building blocks out of which a vast assortment of data structures can be built, namely constant (no value), identity (one value), sum, product, and composition. The post defined parallel prefix and suffix scan for each of these five "functor combinators", in terms of the same scan operation on each of the component functors. Every functor built out of this basic set thus has a parallel scan. Functors defined more conventionally can be given scan implementations simply by converting to a composition of the basic set, scanning, and then converting back to the original functor. Moreover, I expect this implementation could be generated automatically, similarly to GHC’s DeriveFunctor extension.

Now I’d like to show two examples of parallel scan composition in terms of binary trees, namely the top-down and bottom-up variants of perfect binary leaf trees used in previous posts. (In previous posts, I used the terms "right-folded" and "left-folded" instead of "top-down" and "bottom-up".) The resulting two algorithms are expressed nearly identically, but differ significantly in the work performed. The top-down version does $Θ (n \log n)$ work, while the bottom-up version does only $Θ (n)$, and thus the latter algorithm is work-efficient, while the former is not. Moreover, with a very simple optimization, the bottom-up tree algorithm corresponds closely to Guy Blelloch’s parallel prefix scan for arrays, given in Programming parallel algorithms. I’m delighted with this result, as I had been wondering how to think about Guy’s algorithm.

Edit:

2011-05-31: Added Scan and Applicative instances for T2 and T4.

Scanning via functor combinators

In Composable parallel scanning, we saw the Scan class:

class Scan f where
  prefixScan, suffixScan ∷ Monoid m ⇒ f m → (m, f m)

Given a structure of values, the prefix and suffix scan methods generate the overall fold (of type m), plus a structure of the same type as the input. (In contrast, the usual Haskell scanl and scanr functions on lists yield a single list with one more element than the source list. I changed the interface for generality and composability.) The post gave instances for the basic set of five functor combinators.
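To relate this interface to the familiar list functions, here is a minimal sketch for lists only, written in plain ASCII Haskell (mempty for ∅, <> for ⊕). The names prefixScanList and suffixScanList are hypothetical and are not the Scan instance for lists discussed below; they simply split the results of scanl and scanr into the overall fold and a list of the same length as the input.

```haskell
-- Illustrative only: the (m, f m) interface specialized to lists,
-- obtained by splitting the results of scanl and scanr.

-- scanl yields one more element than its input; the last element is the
-- overall fold, and the others are the exclusive prefix folds.
prefixScanList :: Monoid m => [m] -> (m, [m])
prefixScanList ms = (last sums, init sums)
  where
    sums = scanl (<>) mempty ms  -- never empty, so last and init are safe

-- Dually, the first element of scanr is the overall fold, and the others
-- are the exclusive suffix folds.
suffixScanList :: Monoid m => [m] -> (m, [m])
suffixScanList ms =
  case scanr (<>) mempty ms of
    total : rest -> (total, rest)
    []           -> (mempty, [])  -- unreachable: scanr never returns []
```

For example, prefixScanList ["a","b","c"] gives ("abc",["","a","ab"]), and suffixScanList ["a","b","c"] gives ("abc",["bc","c",""]).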

Most functors are not defined via the basic combinators, but as mentioned above, we can scan by conversion to and from the basic set. For convenience, encapsulate this conversion in a type class:

class EncodeF f where
  type Enc f ∷ * → *
  encode ∷ f a → Enc f a
  decode ∷ Enc f a → f a

and define scan functions via EncodeF:

prefixScanEnc, suffixScanEnc ∷
  (EncodeF f, Scan (Enc f), Monoid m) ⇒ f m → (m, f m)
prefixScanEnc = second decode ∘ prefixScan ∘ encode
suffixScanEnc = second decode ∘ suffixScan ∘ encode

Lists

As a first example, consider

instance EncodeF [] where
  type Enc [] = Const () + Id × []
  encode [] = InL (Const ())
  encode (a : as) = InR (Id a × as)
  decode (InL (Const ())) = []
  decode (InR (Id a × as)) = a : as

And declare a boilerplate Scan instance via EncodeF:

instance Scan [] where
  prefixScan = prefixScanEnc
  suffixScan = suffixScanEnc

I haven’t checked the details, but I think with this instance, suffix scanning has okay performance, while prefix scan does quadratic work. The reason is that in the Scan instance for products, the two components are scanned independently (in parallel), and then the whole second component is adjusted for prefixScan, while the whole first component is adjusted for suffixScan. In the case of lists, the first component is the list head, and the second component is the list tail.

For your reading convenience, here’s that Scan instance again:

instance (Scan f, Scan g, Functor f, Functor g) ⇒ Scan (f × g) where
  prefixScan (fa × ga) = (af ⊕ ag, fa' × ((af ⊕) <$> ga'))
   where (af,fa') = prefixScan fa
         (ag,ga') = prefixScan ga

  suffixScan (fa × ga) = (af ⊕ ag, ((⊕ ag) <$> fa') × ga')
   where (af,fa') = suffixScan fa
         (ag,ga') = suffixScan ga

The lop-sidedness of the list type thus interferes with parallelization, and makes the parallel scans perform much worse than cumulative sequential scans.
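To make the asymmetry concrete, here is a sketch of what the two list scans look like once the encoding and the sum and product instances are unfolded by hand. The names prefixScanL and suffixScanL are hypothetical, and the empty-list case assumes that scanning Const () yields ∅ (mempty) as the overall fold, which this post does not restate.

```haskell
-- Illustrative only: the list scans that result from unfolding the
-- encoding Const () + Id × [] through the sum and product instances.
-- Assumption: scanning Const () yields mempty as the overall fold.

prefixScanL :: Monoid m => [m] -> (m, [m])
prefixScanL []       = (mempty, [])
prefixScanL (a : as) = (a <> total, mempty : map (a <>) as')
  where
    (total, as') = prefixScanL as  -- scan the tail, then adjust all of it

suffixScanL :: Monoid m => [m] -> (m, [m])
suffixScanL []       = (mempty, [])
suffixScanL (a : as) = (a <> total, total : as')
  where
    (total, as') = suffixScanL as  -- only the head position is adjusted
```

The map over the whole scanned tail at every step is where the quadratic work of the prefix scan comes from; the suffix scan touches only the head position at each step.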

Let’s next look at a more balanced type.

Binary Trees

We’ll get better parallel performance by organizing our data so that we can cheaply partition it into roughly equal pieces. Tree types allow such partitioning.

Top-down trees

We’ll try a few variations, starting with a simple binary tree.

data T1 a = L1 a | B1 (T1 a) (T1 a) deriving Functor

Encoding and decoding is straightforward:

instance EncodeF T1 where
  type Enc T1 = Id + T1 × T1
  encode (L1 a)   = InL (Id a)
  encode (B1 s t) = InR (s × t)
  decode (InL (Id a))  = L1 a
  decode (InR (s × t)) = B1 s t

instance Scan T1 where
  prefixScan = prefixScanEnc
  suffixScan = suffixScanEnc

Note that these definitions could be generated automatically from the data type definition.

For balanced trees, prefix and suffix scan divide the problem in half at each step, solve each half, and do linear work to patch up one of the two halves. Letting $n$ be the number of elements, and $W (n)$ the work, we have the recurrence $W (n) = 2 W (n / 2) + c n$ for some constant factor $c$. By the Master theorem, therefore, the work done is $Θ (n \log n)$. (Use case 2, with $a = b = 2$, $f (n) = c n$, and $k = 0$.)

Again assuming a balanced tree, the computation dependencies have logarithmic depth, so the ideal parallel running time (assuming sufficient processors) is $Θ (\log n)$. Thus we have an algorithm that is depth-efficient (modulo constant factors) but work-inefficient.

Composition

A binary tree as defined above is either a leaf or a pair of binary trees. We can make this pair-ness more explicit with a reformulation:

data T2 a = L2 a | B2 (Pair (T2 a)) deriving Functor

where Pair, as in Composable parallel scanning, is defined as

data Pair a = a :# a deriving Functor

or even

type Pair = Id × Id

For encoding and decoding, we could use the same representation as with T1, but let’s instead use a more natural one for the definition of T2:

instance EncodeF T2 where
  type Enc T2 = Id + Pair ∘ T2
  encode (L2 a)  = InL (Id a)
  encode (B2 st) = InR (O st)
  decode (InL (Id a)) = L2 a
  decode (InR (O st)) = B2 st

Boilerplate scanning:

instance Scan T2 where
  prefixScan = prefixScanEnc
  suffixScan = suffixScanEnc

for which we’ll need an applicative instance:

instance Applicative T2 where
  pure = L2
  L2 f <*> L2 x = L2 (f x)
  B2 (fs :# gs) <*> B2 (xs :# ys) = B2 ((fs <*> xs) :# (gs <*> ys))
  _ <*> _ = error "T2 (<*>): structure mismatch"

The O constructor is for functor composition.

With a small change to the tree type, we can make the composition of Pair and T more explicit:

data T3 a = L3 a | B3 ((Pair ∘ T3) a) deriving Functor

Then the conversion becomes even simpler, since there’s no need to add or remove O wrappers:

instance EncodeF T3 where
  type Enc T3 = Id + Pair ∘ T3
  encode (L3 a)  = InL (Id a)
  encode (B3 st) = InR st
  decode (InL (Id a)) = L3 a
  decode (InR st)     = B3 st

Bottom-up trees

In the formulations above, a non-leaf tree consists of a pair of trees. I’ll call these trees "top-down", since visible pair structure begins at the top.

With a very small change, we can instead use a tree of pairs:

data T4 a = L4 a | B4 (T4 (Pair a)) deriving Functor

Again an applicative instance allows a standard Scan instance:

instance Scan T4 where
  prefixScan = prefixScanEnc
  suffixScan = suffixScanEnc

instance Applicative T4 where
  pure = L4
  L4 f   <*> L4 x   = L4 (f x)
  B4 fgs <*> B4 xys = B4 (liftA2 h fgs xys)
   where h (f :# g) (x :# y) = f x :# g y
  _ <*> _ = error "T4 (<*>): structure mismatch"

or a more explicitly composed form:

data T5 a = L5 a | B5 ((T5 ∘ Pair) a) deriving Functor

I’ll call these new variations "bottom-up" trees, since visible pair structure begins at the bottom. After stripping off the branch constructor, B4, we can get at the pair-valued leaves by means of fmap, fold, or traverse (or variations). For B5, we’d also have to strip off the O wrapper (functor composition).

Encoding is nearly the same as with top-down trees. For instance,

instance EncodeF T4 where
  type Enc T4 = Id + T4 ∘ Pair
  encode (L4 a) = InL (Id a)
  encode (B4 t) = InR (O t)
  decode (InL (Id a)) = L4 a
  decode (InR (O t))  = B4 t

Scanning pairs

We’ll need to scan on the Pair functor. If we use the definition of Pair above in terms of Id and (×), then we’ll get scanning for free. For using Pair, I find the explicit data type definition above more convenient. We can then derive a Scan instance by conversion. Start with a standard specification:

data Pair a = a :# a deriving Functor

And encode & decode explicitly:

instance EncodeF Pair where
  type Enc Pair = Id × Id
  encode (a :# b) = Id a × Id b
  decode (Id a × Id b) = a :# b

Then use our boilerplate Scan instance for EncodeF instances:

instance Scan Pair where
  prefixScan = prefixScanEnc
  suffixScan = suffixScanEnc

We’ve seen the Scan instance for (×) above. The instance for Id is very simple:

newtype Id a = Id a

instance Scan Id where
  prefixScan (Id m) = (m, Id ∅)
  suffixScan        = prefixScan

Given these definitions, we can calculate a more streamlined Scan instance for Pair:

  prefixScan (a :# b)
≡  {- specification -}
  prefixScanEnc (a :# b)
≡  {- prefixScanEnc definition -}
  (second decode ∘ prefixScan ∘ encode) (a :# b)
≡  {- (∘) -}
  second decode (prefixScan (encode (a :# b)))
≡  {- encode definition for Pair -}
  second decode (prefixScan (Id a × Id b))
≡  {- prefixScan definition for f × g -}
  second decode
    (af ⊕ ag, fa' × ((af ⊕) <$> ga'))
     where (af,fa') = prefixScan (Id a)
           (ag,ga') = prefixScan (Id b)
≡  {- Definition of second on functions -}
  (af ⊕ ag, decode (fa' × ((af ⊕) <$> ga')))
   where (af,fa') = prefixScan (Id a)
         (ag,ga') = prefixScan (Id b)
≡  {- prefixScan definition for Id -}
  (af ⊕ ag, decode (fa' × ((af ⊕) <$> ga')))
   where (af,fa') = (a, Id ∅)
         (ag,ga') = (b, Id ∅)
≡  {- substitution -}
  (a ⊕ b, decode (Id ∅ × ((a ⊕) <$> Id ∅)))
≡  {- fmap/(<$>) for Id -}
  (a ⊕ b, decode (Id ∅ × Id (a ⊕ ∅)))
≡  {- Monoid law -}
  (a ⊕ b, decode (Id ∅ × Id a))
≡  {- decode definition on Pair -}
  (a ⊕ b, (∅ :# a))

Whew! And similarly for suffixScan.

Now let’s recall the Scan instance for Pair given in Composable parallel scanning:

instance Scan Pair where
  prefixScan (a :# b) = (a ⊕ b, (∅ :# a))
  suffixScan (a :# b) = (a ⊕ b, (b :# ∅))

Hurray! The derivation led us to the same definition. A "sufficiently smart" compiler could do this derivation automatically.

With this warm-up derivation, let’s now turn to trees.

Scanning trees

Given the tree encodings above, how does scan work? We’ll have to consult Scan instances for some of the functor combinators. The product instance is repeated above. We’ll also want the instances for sum and composition. Omitting the suffixScan definitions for brevity:

data (f + g) a = InL (f a) | InR (g a)

instance (Scan f, Scan g) ⇒ Scan (f + g) where
  prefixScan (InL fa) = second InL (prefixScan fa)
  prefixScan (InR ga) = second InR (prefixScan ga)

newtype (g ∘ f) a = O (g (f a))

instance (Scan g, Scan f, Functor f, Applicative g) ⇒ Scan (g ∘ f) where
  prefixScan = second (O ∘ fmap adjustL ∘ zip)
             ∘ assocR
             ∘ first prefixScan
             ∘ unzip
             ∘ fmap prefixScan
             ∘ unO

This last definition uses a few utility functions:

zip ∷ Applicative g ⇒ (g a, g b) → g (a,b)
zip = uncurry (liftA2 (,))

unzip ∷ Functor g ⇒ g (a,b) → (g a, g b)
unzip = fmap fst &&& fmap snd

assocR ∷ ((a,b),c) → (a,(b,c))
assocR   ((a,b),c) =  (a,(b,c))

adjustL ∷ (Functor f, Monoid m) ⇒ (m, f m) → f m
adjustL (m, ms) = (m ⊕) <$> ms

Let’s consider how the Scan (g ∘ f) instance plays out for top-down vs bottom-up trees, given the functor-composition encodings above. The critical definitions:

type Enc T2 = Id + Pair ∘ T2

type Enc T4 = Id + T4 ∘ Pair

Focusing on the branch case, we have Pair ∘ T2 vs T4 ∘ Pair, so we’ll use the Scan (g ∘ f) instance either way. Let’s consider the work implied by that instance. There are two calls to prefixScan, plus a linear amount of other work. The meanings of those two calls differ, however:

For top-down trees (T2), the recursive tree scans are in fmap prefixScan, mapping over the pair of trees. The first prefixScan is a pair scan and so does constant work. Since there are two recursive calls, each working on a tree of half size (assuming balance), plus linear other work, the total work is $Θ (n \log n)$, as explained above.
For bottom-up trees (T4), there is only one recursive tree scan, which appears in first prefixScan. The prefixScan in fmap prefixScan is a pair scan and so does constant work, but it is mapped over the half-sized tree (of pairs), and so does linear work altogether. Since there is only one recursive tree scan, at half size, plus linear other work, the total work is then proportional to $n + n / 2 + n / 4 + \dots \approx 2 n = Θ (n)$. So we have a work-efficient algorithm!
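The two work estimates can be checked numerically with a small sketch of the recurrences. The function names are hypothetical, and the unit cost per element for the non-recursive work and the zero cost at a single leaf are simplifying assumptions.

```haskell
-- Illustrative only: the two work recurrences, with a unit cost per element
-- for the non-recursive work and zero cost at a single leaf (both are
-- simplifying assumptions). Intended for n a power of two.

-- Two half-sized recursive scans plus linear other work.
workTopDown :: Integer -> Integer
workTopDown n
  | n <= 1    = 0
  | otherwise = 2 * workTopDown (n `div` 2) + n

-- One half-sized recursive scan plus linear other work.
workBottomUp :: Integer -> Integer
workBottomUp n
  | n <= 1    = 0
  | otherwise = workBottomUp (n `div` 2) + n
```

Under these assumptions, workTopDown 1024 is 10240 (that is, $n \log_2 n$), while workBottomUp 1024 is 2046 (that is, $2 n - 2$).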

Looking deeper

In addition to the simple analysis above of scanning over top-down and over bottom-up, let’s look in detail at what transpires and how each case can be optimized. This section may well have more detail than you’re interested in. If so, feel free to skip ahead.

Top-down

Beginning as with Pair,

  prefixScan t
≡  {- specification -}
  prefixScanEnc t
≡  {- prefixScanEnc definition -}
  (second decode ∘ prefixScan ∘ encode) t
≡  {- (∘) -}
  second decode (prefixScan (encode t))

Take T2, with T3 being quite similar. Now split into two cases for the two constructors of T2. First leaf:

  prefixScan (L2 m)
≡  {- as above -}
  second decode (prefixScan (encode (L2 m)))
≡  {- encode for L2 -}
  second decode (prefixScan (InL (Id m)))
≡  {- prefixScan for functor sum -}
  second decode (second InL (prefixScan (Id m)))
≡  {- prefixScan for Id -}
  second decode (second InL (m, Id ∅))
≡  {- second for functions -}
  second decode (m, InL (Id ∅))
≡  {- second for functions -}
  (m, decode (InL (Id ∅)))
≡  {- decode for L2 -}
  (m, L2 ∅)

Then branch:

  prefixScan (B2 (s :# t))
≡  {- as above -}
  second decode (prefixScan (encode (B2 (s :# t))))
≡  {- encode for B2 -}
  second decode (prefixScan (InR (O (s :# t))))
≡  {- prefixScan for (+) -}
  second decode (second InR (prefixScan (O (s :# t))))
≡  {- property of second -}
  second (decode ∘ InR) (prefixScan (O (s :# t)))

Focus on the prefixScan application:

  prefixScan (O (s :# t))
≡  {- prefixScan for (∘) -}
 ( second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan
 ∘ unzip ∘ fmap prefixScan ∘ unO ) (O (s :# t))
≡  {- unO/O -}
  ( second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan
  ∘ unzip ∘ fmap prefixScan ) (s :# t)
≡  {- fmap on Pair -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan ∘ unzip)
    (prefixScan s :# prefixScan t)
≡  {- expand prefixScan -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan ∘ unzip)
    ((ms,s') :# (mt,t'))
      where (ms,s') = prefixScan s
            (mt,t') = prefixScan t
≡  {- unzip -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan)
    ((ms :# mt), (s' :# t')) where ⋯
≡  {- first -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR)
    (prefixScan (ms :# mt), (s' :# t')) where ⋯
≡  {- prefixScan for Pair -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR)
    ((ms ⊕ mt, (∅ :# ms)), (s' :# t')) where ⋯
≡  {- assocR -}
  (second (O ∘ fmap adjustL ∘ zip))
    (ms ⊕ mt, ((∅ :# ms), (s' :# t'))) where ⋯
≡  {- second -}
  ( ms ⊕ mt
  , (O ∘ fmap adjustL ∘ zip) ((∅ :# ms), (s' :# t')) ) where ⋯
≡  {- zip -}
  ( ms ⊕ mt
  , (O ∘ fmap adjustL) ((∅,s') :# (ms,t')) )  where ⋯
≡  {- fmap for Pair -}
  ( ms ⊕ mt
  , O (adjustL (∅,s') :# adjustL (ms,t')) )  where ⋯
≡  {- adjustL -}
  ( ms ⊕ mt
  , O (((∅ ⊕) <$> s') :# ((ms ⊕) <$> t')) )  where ⋯
≡  {- Monoid law (left identity) -}
  ( ms ⊕ mt
  , O ((id <$> s') :# ((ms ⊕) <$> t')) )  where ⋯
≡  {- Functor law (fmap id) -}
  ( ms ⊕ mt
  , O (s' :# ((ms ⊕) <$> t')) )
      where (ms,s') = prefixScan s
            (mt,t') = prefixScan t

Continuing from above,

  prefixScan (B2 (s :# t))
≡  {- see above -}
  second (decode ∘ InR) (prefixScan (O (s :# t)))
≡  {- prefixScan focus from above -}
  second (decode ∘ InR)
    ( ms ⊕ mt
    , O (s' :# ((ms ⊕) <$> t')) )
        where (ms,s') = prefixScan s
              (mt,t') = prefixScan t
≡  {- definition of second on functions -}
    (ms ⊕ mt, (decode ∘ InR) (O (s' :# ((ms ⊕) <$> t')))) where ⋯
≡  {- (∘) -}
    (ms ⊕ mt, decode (InR (O (s' :# ((ms ⊕) <$> t'))))) where ⋯
≡  {- decode for B2 -}
    (ms ⊕ mt, B2 (s' :# ((ms ⊕) <$> t'))) where ⋯

This final form is as in Deriving parallel tree scans, changed for the new scan interface. The derivation saved some work in wrapping & unwrapping and method invocation, plus one of the two adjustment passes over the sub-trees. As explained above, this algorithm performs $Θ (n \log n)$ work.
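As a runnable sketch of this final form, here is the same algorithm on a plain binary leaf tree, written in ASCII Haskell (mempty for ∅, <> for ⊕). The names Tree and prefixScanTD are hypothetical, and the O and Pair wrappers are dropped for brevity.

```haskell
{-# LANGUAGE DeriveFunctor #-}

-- Illustrative only: the derived top-down scan on a plain binary leaf tree.
data Tree a = Leaf a | Branch (Tree a) (Tree a)
  deriving (Show, Eq, Functor)

prefixScanTD :: Monoid m => Tree m -> (m, Tree m)
prefixScanTD (Leaf m)     = (m, Leaf mempty)
prefixScanTD (Branch s t) = (ms <> mt, Branch s' ((ms <>) <$> t'))
  where
    (ms, s') = prefixScanTD s  -- the two recursive scans are independent
    (mt, t') = prefixScanTD t  -- only the right result needs adjusting
```

On the four-leaf tree with leaves "a", "b", "c", "d", the overall fold is "abcd" and the scanned leaves are "", "a", "ab", "abc".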

I’ll leave suffixScan for you to do yourself.

Bottom-up

What happens if we switch from top-down to bottom-up binary trees? I’ll use T4 (though T5 would work as well):

data T4 a = L4 a | B4 (T4 (Pair a))

The leaf case is just as with T2 above, so let’s get right to branches.

  prefixScan (B4 t)
≡  {- as above -}
  second decode (prefixScan (encode (B4 t)))
≡  {- encode for B4 -}
  second decode (prefixScan (InR (O t)))
≡  {- prefixScan for (+) -}
  second decode (second InR (prefixScan (O t)))
≡  {- property of second -}
  second (decode ∘ InR) (prefixScan (O t))

As before, now focus on the prefixScan call.

  prefixScan (O t)
≡  {- prefixScan for (∘) -}
 ( second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan
 ∘ unzip ∘ fmap prefixScan ∘ unO ) (O t)
≡  {- unO/O -}
  ( second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan
  ∘ unzip ∘ fmap prefixScan ) t
≡  {- prefixScan on Pair (derived above) -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan ∘ unzip)
    (fmap (λ (a :# b) → (a ⊕ b, (∅ :# a))) t)
≡  {- unzip/fmap -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR ∘ first prefixScan)
    ( fmap (λ (a :# b) → (a ⊕ b)) t
    , fmap (λ (a :# b) → (∅ :# a))   t )
≡  {- first on functions -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR)
    ( prefixScan (fmap (λ (a :# b) → (a ⊕ b)) t)
    , fmap (λ (a :# b) → (∅ :# a))   t )
≡  {- expand prefixScan -}
  (second (O ∘ fmap adjustL ∘ zip) ∘ assocR)
    ((mp,p'), fmap (λ (a :# b) → (∅ :# a)) t)
   where (mp,p') = prefixScan (fmap (λ (a :# b) → (a ⊕ b)) t)
≡  {- assocR -}
  (second (O ∘ fmap adjustL ∘ zip))
    (mp, (p', fmap (λ (a :# b) → (∅ :# a)) t))
   where ⋯
≡  {- second on functions -}
  (mp, (O ∘ fmap adjustL ∘ zip) (p', fmap (λ (a :# b) → (∅ :# a)) t))
    where ⋯
≡  {- fmap/zip/fmap -}
  (mp, O (liftA2 tweak p' t))
    where tweak s (a :# _) = adjustL (s, (∅ :# a))
          (mp,p') = prefixScan (fmap (λ (a :# b) → (a ⊕ b)) t)
≡  {- adjustL, then simplify -}
  (mp, O (liftA2 tweak p' t))
    where tweak s (a :# _) = (s :# s ⊕ a)
          (mp,p') = prefixScan (fmap (λ (a :# b) → (a ⊕ b)) t)

Now re-introduce the context of prefixScan (O t):

  prefixScan (B4 t)
≡  {- see above -}
  second (decode ∘ InR) (prefixScan (O t))
≡  {- see above -}
  second (decode ∘ InR)
    (mp, O (liftA2 tweak p' t))
      where ⋯
≡  {- decode for T4 -}
  (mp, B4 (liftA2 tweak p' t))
    where p = fmap (λ (e :# o) → (e ⊕ o)) t
          (mp,p') = prefixScan p
          tweak s (e :# _) = (s :# s ⊕ e)

Notice how much this bottom-up tree scan algorithm differs from the top-down algorithm derived above. In particular, there’s only one recursive tree scan (on a half-sized tree) instead of two, plus linear additional work, for a total of $Θ (n)$ work.
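Here is a self-contained sketch of the bottom-up result in ASCII Haskell (mempty for ∅, <> for ⊕). The Applicative instance is the one given earlier; the names prefixScanBU, fromList4, and toList4 are hypothetical helpers added only so the scan can be tried on a list, and fromList4 assumes the list length is a power of two.

```haskell
{-# LANGUAGE DeriveFunctor #-}

import Control.Applicative (liftA2)

-- Illustrative only: the derived bottom-up scan, plus hypothetical helpers
-- (fromList4, toList4) that are not part of the post.

data Pair a = a :# a deriving Functor

data T4 a = L4 a | B4 (T4 (Pair a)) deriving Functor

-- Structure-matching Applicative instance, as given earlier in the post.
instance Applicative T4 where
  pure = L4
  L4 f   <*> L4 x   = L4 (f x)
  B4 fgs <*> B4 xys = B4 (liftA2 h fgs xys)
    where h (f :# g) (x :# y) = f x :# g y
  _      <*> _      = error "T4 (<*>): structure mismatch"

prefixScanBU :: Monoid m => T4 m -> (m, T4 m)
prefixScanBU (L4 m) = (m, L4 mempty)
prefixScanBU (B4 t) = (mp, B4 (liftA2 tweak p' t))
  where
    p        = fmap (\(e :# o) -> e <> o) t  -- combine each pair: half-sized tree
    (mp, p') = prefixScanBU p                -- the single recursive scan
    tweak s (e :# _) = s :# (s <> e)         -- expand each prefix back to a pair

-- Build a perfect tree from a list whose length is a power of two.
fromList4 :: [a] -> T4 a
fromList4 []  = error "fromList4: empty list"
fromList4 [a] = L4 a
fromList4 as  = B4 (fromList4 (pairUp as))
  where
    pairUp (a : b : rest) = (a :# b) : pairUp rest
    pairUp _              = []

-- Read the leaves back in left-to-right order.
toList4 :: T4 a -> [a]
toList4 (L4 a) = [a]
toList4 (B4 t) = concatMap (\(a :# b) -> [a, b]) (toList4 t)
```

Scanning fromList4 ["a","b","c","d"] yields the overall fold "abcd", and toList4 of the scanned tree is ["","a","ab","abc"].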

Guy Blelloch’s parallel scan algorithm

In Programming parallel algorithms, Guy Blelloch gives the following algorithm for parallel prefix scan, expressed in the parallel functional language NESL:

function scan(a) =
if #a ≡ 1 then [0]
else
  let es = even_elts(a);
      os = odd_elts(a);
      ss = scan({e+o: e in es; o in os})
  in interleave(ss,{s+e: s in ss; e in es})

This algorithm is nearly identical to the T4 scan algorithm above. I was very glad to find this route to Guy’s algorithm, which had been fairly mysterious to me. I mean, I could believe that the algorithm worked, but I had no idea how I might have discovered it myself. With the functor composition approach to scanning, I now see how Guy’s algorithm emerges as well as how it generalizes to other data structures.
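For comparison, here is a rough list transcription of the NESL code into Haskell, specialized to addition as in the original. The helper names evenElts, oddElts, interleave, and scanB are my own stand-ins for the NESL functions, the input length is assumed to be a power of two, and lists are sequential, so the sketch shows only the structure of the algorithm, not its parallelism.

```haskell
-- Illustrative only: a list transcription of the NESL scan above, for
-- addition, assuming the input length is a power of two.

evenElts, oddElts :: [a] -> [a]
evenElts (a : _ : rest) = a : evenElts rest
evenElts as             = as  -- zero or one element left
oddElts (_ : b : rest)  = b : oddElts rest
oddElts _               = []

interleave :: [a] -> [a] -> [a]
interleave (a : as) (b : bs) = a : b : interleave as bs
interleave _        _        = []

scanB :: Num a => [a] -> [a]
scanB []  = []
scanB [_] = [0]
scanB as  = interleave ss (zipWith (+) ss es)
  where
    es = evenElts as
    os = oddElts as
    ss = scanB (zipWith (+) es os)  -- scan the pairwise sums (half length)
```

For example, scanB [1,2,3,4] gives [0,1,3,6]. The pairwise sums play the role of fmap (λ (e :# o) → e ⊕ o) in the T4 scan, and the interleaving plays the role of tweak.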

Nested data types and parallelism

Most of the recursive algebraic data types that appear in Haskell programs are regular, meaning that the recursive instances are instantiated with the same type parameter as the containing type. For instance, a top-down tree of elements of type a is either a leaf or has two trees whose elements have that same type a. In contrast, in a bottom-up tree, the (single) recursively contained tree is over elements of type (a,a). Such non-regular data types are called "nested". The two tree scan algorithms above suggest to me that nested data types are particularly useful for efficient parallel algorithms.

Deriving parallel tree scans

Conal — Tue, 01 Mar 2011 20:41:09 +0000

The post Deriving list scans explored folds and scans on lists and showed how the usual, efficient scan implementations can be derived from simpler specifications.

Let’s see now how to apply the same techniques to scans over trees.

This new post is one of a series leading toward algorithms optimized for execution on massively parallel, consumer hardware, using CUDA or OpenCL.

Edits:

2011-03-01: Added clarification about "∅" and "(⊕)".
2011-03-23: corrected "linear-time" to "linear-work" in two places.

Trees

Our trees will be non-empty and binary:

data T a = Leaf a | Branch (T a) (T a)

instance Show a ⇒ Show (T a) where
  show (Leaf a)     = show a
  show (Branch s t) = "("++show s++","++show t++")"

Nothing surprising in the instances:

instance Functor T where
  fmap f (Leaf a)     = Leaf (f a)
  fmap f (Branch s t) = Branch (fmap f s) (fmap f t)

instance Foldable T where
  fold (Leaf a)     = a
  fold (Branch s t) = fold s ⊕ fold t

instance Traversable T where
  sequenceA (Leaf a)     = fmap Leaf a
  sequenceA (Branch s t) =
    liftA2 Branch (sequenceA s) (sequenceA t)

BTW, my type-setting software uses "∅" and "(⊕)" for Haskell’s "mempty" and "mappend".

Also handy will be extracting the first and last (i.e., leftmost and rightmost) leaves in a tree:

headT ∷ T a → a
headT (Leaf a)       = a
headT (s `Branch` _) = headT s

lastT ∷ T a → a
lastT (Leaf a)       = a
lastT (_ `Branch` t) = lastT t

Exercise: Prove that

headT ∘ fmap f ≡ f ∘ headT
lastT ∘ fmap f ≡ f ∘ lastT

Answer:

Consider the Leaf and Branch cases separately:

  headT (fmap f (Leaf a))
≡  {- fmap on T -}
  headT (Leaf (f a))
≡  {- headT def -}
  f a
≡  {- headT def -}
  f (headT (Leaf a))

  headT (fmap f (Branch s t))
≡  {- fmap on T -}
  headT (Branch (fmap f s) (fmap f t))
≡  {- headT def -}
  headT (fmap f s)
≡  {- induction -}
  f (headT s)
≡  {- headT def -}
  f (headT (Branch s t))

Similarly for lastT.

From lists to trees and back

We can flatten trees into lists:

flatten ∷ T a → [a]
flatten = fold ∘ fmap (:[])

Equivalently, using foldMap:

flatten = foldMap (:[])

Alternatively, we could define fold via flatten:

instance Foldable T where fold = fold ∘ flatten

flatten ∷ T a → [a]
flatten (Leaf a)     = [a]
flatten (Branch s t) = flatten s ++ flatten t

We can also "unflatten" lists into balanced trees:

unflatten ∷ [a] → T a
unflatten []  = error "unflatten: Oops! Empty list"
unflatten [a] = Leaf a
unflatten xs  = Branch (unflatten prefix) (unflatten suffix)
 where
   (prefix,suffix) = splitAt (length xs `div` 2) xs

Both flatten and unflatten can be implemented more efficiently.

For instance,

t1,t2 ∷ T Int
t1 = unflatten [1‥3]
t2 = unflatten [1‥16]

*T> t1
(1,(2,3))
*T> t2
((((1,2),(3,4)),((5,6),(7,8))),(((9,10),(11,12)),((13,14),(15,16))))
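As one possible take on the remark that flatten can be implemented more efficiently, here is a sketch that avoids repeated (++) by threading the rest of the output list through the traversal. The name flattenAcc is hypothetical, and this is only one of several ways to do it.

```haskell
-- Illustrative only: a linear-time flatten that threads the remainder of
-- the output list as an accumulator instead of using (++).
data T a = Leaf a | Branch (T a) (T a)

flattenAcc :: T a -> [a]
flattenAcc t = go t []
  where
    -- go u rest prepends the leaves of u, in order, onto rest
    go (Leaf a)     rest = a : rest
    go (Branch s u) rest = go s (go u rest)
```

Each leaf is consed onto the result exactly once, so the cost does not depend on how the tree is nested.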

Specifying tree scans

Prefixes and suffixes

The post Deriving list scans gave specifications for list scanning in terms of inits and tails. One consequence of this specification is that the output of scanning has one more element than the input. Alternatively, we could use non-empty variants of inits and tails, so that the input & output are in one-to-one correspondence.

inits' ∷ [a] → [[a]]
inits' []     = []
inits' (x:xs) = map (x:) ([] : inits' xs)

The cons case can also be written as

inits' (x:xs) = [x] : map (x:) (inits' xs)

tails' ∷ [a] → [[a]]
tails' []         = []
tails' xs@(_:xs') = xs : tails' xs'

For instance,

*T> inits' "abcd"
["a","ab","abc","abcd"]
*T> tails' "abcd"
["abcd","bcd","cd","d"]

Our tree functor has a symmetric definition, so we get more symmetry in the counterparts to inits' and tails':

initTs ∷ T a → T (T a)
initTs (Leaf a)       = Leaf (Leaf a)
initTs (s `Branch` t) =
  Branch (initTs s) (fmap (s `Branch`) (initTs t))

tailTs ∷ T a → T (T a)
tailTs (Leaf a)       = Leaf (Leaf a)
tailTs (s `Branch` t) =
  Branch (fmap (`Branch` t) (tailTs s)) (tailTs t)

Try it:

*T> t1
(1,(2,3))
*T> initTs t1
(1,((1,2),(1,(2,3))))
*T> tailTs t1
((1,(2,3)),((2,3),3))

*T> unflatten [1‥5]
((1,2),(3,(4,5)))
*T> initTs (unflatten [1‥5])
((1,(1,2)),(((1,2),3),(((1,2),(3,4)),((1,2),(3,(4,5))))))
*T> tailTs (unflatten [1‥5])
((((1,2),(3,(4,5))),(2,(3,(4,5)))),((3,(4,5)),((4,5),5)))

Exercise: Prove that

lastT ∘ initTs ≡ id
headT ∘ tailTs ≡ id

Answer:

  lastT (initTs (Leaf a))
≡  {- initTs def -}
  lastT (Leaf (Leaf a))
≡  {- lastT def -}
  Leaf a

  lastT (initTs (s `Branch` t))
≡  {- initTs def -}
  lastT (Branch (⋯) (fmap (s `Branch`) (initTs t)))
≡  {- lastT def -}
  lastT (fmap (s `Branch`) (initTs t))
≡  {- lastT ∘ fmap f -}
  (s `Branch`) (lastT (initTs t))
≡  {- trivial -}
  s `Branch` lastT (initTs t)
≡  {- induction -}
  s `Branch` t

Scan specification

Now we can specify prefix & suffix scanning:

scanlT, scanrT ∷ Monoid a ⇒ T a → T a
scanlT = fmap fold ∘ initTs
scanrT = fmap fold ∘ tailTs

Try it out:

t3 ∷ T String
t3 = fmap (:[]) (unflatten "abcde")

*T> t3
(("a","b"),("c",("d","e")))
*T> scanlT t3
(("a","ab"),("abc",("abcd","abcde")))
*T> scanrT t3
(("abcde","bcde"),("cde",("de","e")))

To test on numbers, I’ll use a handy notation from Matt Hellige to add pre- and post-processing:

(↝) ∷ (a' → a) → (b → b') → ((a → b) → (a' → b'))
(f ↝ h) g = h ∘ g ∘ f

And a version specialized to functors:

(↝*) ∷ Functor f ⇒ (a' → a) → (b → b')
     → (f a → f b) → (f a' → f b')
f ↝* g = fmap f ↝ fmap g

t4 ∷ T Integer
t4 = unflatten [1‥6]

t5 ∷ T Integer
t5 = (Sum ↝* getSum) scanlT t4

Try it:

*T> t4
((1,(2,3)),(4,(5,6)))
*T> initTs t4
((1,((1,2),(1,(2,3)))),(((1,(2,3)),4),(((1,(2,3)),(4,5)),((1,(2,3)),(4,(5,6))))))
*T> t5
((1,(3,6)),(10,(15,21)))

Exercise: Prove that we have properties similar to the ones relating fold, scanl, and scanr on lists:

fold ≡ lastT ∘ scanlT
fold ≡ headT ∘ scanrT

Answer:

  lastT ∘ scanlT
≡  {- scanlT spec -}
  lastT ∘ fmap fold ∘ initTs
≡  {- lastT ∘ fmap f -}
  fold ∘ lastT ∘ initTs
≡  {- lastT ∘ initTs -}
  fold

  headT ∘ scanrT 
≡  {- scanrT def -}
  headT ∘ fmap fold ∘ tailTs
≡  {- headT ∘ fmap f -}
  fold ∘ headT ∘ tailTs
≡  {- headT ∘ tailTs -}
  fold

For instance,

*T> fold t3
"abcde"
*T> (lastT ∘ scanlT) t3
"abcde"
*T> (headT ∘ scanrT) t3
"abcde"

Deriving faster scans

Recall the specifications:

scanlT = fmap fold ∘ initTs
scanrT = fmap fold ∘ tailTs

To derive more efficient implementations, proceed as in Deriving list scans. Start with prefix scan (scanlT), and consider the Leaf and Branch cases separately.

  scanlT (Leaf a)
≡  {- scanlT spec -}
  fmap fold (initTs (Leaf a))
≡  {- initTs def -}
  fmap fold (Leaf (Leaf a))
≡  {- fmap def -}
  Leaf (fold (Leaf a))
≡  {- fold def -}
  Leaf a

  scanlT (s `Branch` t)
≡  {- scanlT spec -}
  fmap fold (initTs (s `Branch` t))
≡  {- initTs def -}
  fmap fold (Branch (initTs s) (fmap (s `Branch`) (initTs t)))
≡  {- fmap def -}
  Branch (fmap fold (initTs s)) (fmap fold (fmap (s `Branch`) (initTs t)))
≡  {- scanlT spec -}
  Branch (scanlT s) (fmap fold (fmap (s `Branch`) (initTs t)))
≡  {- functor law -}
  Branch (scanlT s) (fmap (fold ∘ (s `Branch`)) (initTs t))
≡  {- rework as λ -}
  Branch (scanlT s) (fmap (λ t' → fold (s `Branch` t')) (initTs t))
≡  {- fold def -}
  Branch (scanlT s) (fmap (λ t' → fold s ⊕ fold t') (initTs t))
≡  {- rework λ -}
  Branch (scanlT s) (fmap ((fold s ⊕) ∘ fold) (initTs t))
≡  {- functor law -}
  Branch (scanlT s) (fmap (fold s ⊕) (fmap fold (initTs t)))
≡  {- scanlT spec -}
  Branch (scanlT s) (fmap (fold s ⊕) (scanlT t))
≡  {- lastT ∘ scanlT ≡ fold -}
  Branch (scanlT s) (fmap (lastT (scanlT s) ⊕) (scanlT t))
≡  {- factor out defs -}
  Branch s' (fmap (lastT s' ⊕) t')
     where s' = scanlT s
           t' = scanlT t

Suffix scan has a similar derivation.

  scanrT (Leaf a)
≡  {- scanrT def -}
  fmap fold (tailTs (Leaf a))
≡  {- tailTs def -}
  fmap fold (Leaf (Leaf a))
≡  {- fmap on T -}
  Leaf (fold (Leaf a))
≡  {- fold def -}
  Leaf a

  scanrT (s `Branch` t)
≡  {- scanrT spec -}
  fmap fold (tailTs (s `Branch` t))
≡  {- tailTs def -}
  fmap fold (Branch (fmap (`Branch` t) (tailTs s)) (tailTs t))
≡  {- fmap def -}
  Branch (fmap fold (fmap (`Branch` t) (tailTs s))) (fmap fold (tailTs t))
≡  {- scanrT spec -}
  Branch (fmap fold (fmap (`Branch` t) (tailTs s))) (scanrT t)
≡  {- functor law -}
  Branch (fmap (fold ∘ (`Branch` t)) (tailTs s)) (scanrT t)
≡  {- rework as λ -}
  Branch (fmap (λ s' → fold (s' `Branch` t)) (tailTs s)) (scanrT t)
≡  {- fold def -}
  Branch (fmap (λ s' → fold s' ⊕ fold t) (tailTs s)) (scanrT t)
≡  {- rework λ -}
  Branch (fmap ((⊕ fold t) ∘ fold) (tailTs s)) (scanrT t)
≡  {- functor law, then scanrT spec -}
  Branch (fmap (⊕ fold t) (scanrT s)) (scanrT t)
≡  {- headT ∘ scanrT -}
  Branch (fmap (⊕ headT (scanrT t)) (scanrT s)) (scanrT t)
≡  {- factor out defs -}
  Branch (fmap (⊕ headT t') s') t'
    where s' = scanrT s
          t' = scanrT t

Extract code from these derivations:

scanlT' ∷ Monoid a ⇒ T a → T a
scanlT' (Leaf a)       = Leaf a
scanlT' (s `Branch` t) =
  Branch s' (fmap (lastT s' ⊕) t')
     where s' = scanlT' s
           t' = scanlT' t

scanrT' ∷ Monoid a ⇒ T a → T a
scanrT' (Leaf a)       = Leaf a
scanrT' (s `Branch` t) =
  Branch (fmap (⊕ headT t') s') t'
    where s' = scanrT' s
          t' = scanrT' t

Try it:

*T> t3
(("a","b"),("c",("d","e")))
*T> scanlT' t3
(("a","ab"),("abc",("abcd","abcde")))
*T> scanrT' t3
(("abcde","bcde"),("cde",("de","e")))

Efficiency

Although I was just following my nose, without trying to get anywhere in particular, this result is exactly the algorithm I first thought of when considering how to parallelize tree scanning.

Let’s now consider the running time of this algorithm. Assume that the tree is balanced, to maximize parallelism. (I think balancing is optimal for parallelism here, but I’m not certain.)

For a tree with $n$ leaves, the work $W (n)$ is constant when $n = 1$ and $2 \cdot W (n / 2) + n$ when $n > 1$. By the Master Theorem, $W (n) = Θ (n \log n)$.

This result is disappointing, since scanning can be done with linear work by threading a single accumulator while traversing the input tree and building up the output tree.
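A minimal sketch of that sequential, linear-work scan follows, in ASCII Haskell (mempty for ∅, <> for ⊕). The name scanlSeq is hypothetical; the function is meant to agree with scanlT, but it is inherently sequential because each leaf depends on the accumulator from everything to its left.

```haskell
-- Illustrative only: a sequential, linear-work inclusive prefix scan that
-- threads one accumulator left to right through the tree.
data T a = Leaf a | Branch (T a) (T a) deriving (Show, Eq)

scanlSeq :: Monoid a => T a -> T a
scanlSeq = snd . go mempty
  where
    -- go takes the fold of everything to the left, and returns the fold
    -- that also includes this subtree, along with the scanned subtree.
    go acc (Leaf a)     = (acc', Leaf acc') where acc' = acc <> a
    go acc (Branch s t) = (acc2, Branch s' t')
      where
        (acc1, s') = go acc s
        (acc2, t') = go acc1 t
```

On the tree t3 from above, shown as (("a","b"),("c",("d","e"))), this produces (("a","ab"),("abc",("abcd","abcde"))), matching scanlT' t3.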

I’m using the term "work" instead of "time" here, since I’m not assuming sequential execution.

We have a parallel algorithm that performs $Θ (n \log n)$ work, and a sequential program that performs linear work. Can we construct a linear-work parallel algorithm?

Yes. Guy Blelloch came up with a clever linear-work parallel algorithm, which I’ll derive in another post.

Generalizing `head` and `last`

Can we replace the ad hoc (tree-specific) headT and lastT functions with general versions that work on all foldables? I’d want the generalization to also generalize the list functions head and last or, rather, to total variants (ones that cannot error due to empty list). For totality, provide a default value for when there are no elements.

headF, lastF ∷ Foldable f ⇒ a → f a → a

I also want these functions to be as efficient on lists as head and last and as efficient on trees as headT and lastT.

The First and Last monoids provide left-biased and right-biased choice. They’re implemented as newtype wrappers around Maybe:

newtype First a = First { getFirst ∷ Maybe a }

instance Monoid (First a) where
  ∅ = First Nothing
  r@(First (Just _)) ⊕ _ = r
  First Nothing      ⊕ r = r

newtype Last a = Last { getLast ∷ Maybe a }

instance Monoid (Last a) where
  ∅ = Last Nothing
  _ ⊕ r@(Last (Just _)) = r
  r ⊕ Last Nothing      = r

For headF, embed all of the elements into the First monoid (via First ∘ Just), fold over the result, and extract the result, using the provided default value in case there are no elements. Similarly for lastF.

headF dflt = fromMaybe dflt ∘ getFirst ∘ foldMap (First ∘ Just)
lastF dflt = fromMaybe dflt ∘ getLast  ∘ foldMap (Last  ∘ Just)

For instance,

*T> headF 3 [1,2,4,8]
1
*T> headF 3 []
3
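In plain ASCII Haskell, and assuming the `First` and `Last` wrappers exported by `Data.Monoid` in base (which have the same shape as the definitions shown above), the two functions can be written as follows.

```haskell
import Data.Maybe  (fromMaybe)
import Data.Monoid (First (..), Last (..))

-- Total variants of head and last for any Foldable, with a default
-- value for the empty case.
headF, lastF :: Foldable f => a -> f a -> a
headF dflt = fromMaybe dflt . getFirst . foldMap (First . Just)
lastF dflt = fromMaybe dflt . getLast  . foldMap (Last  . Just)
```

Because both are written against `Foldable`, the same definitions apply to `Maybe`, to lists, and to any tree type with a `Foldable` instance.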

When our elements belong to a monoid, we can use ∅ as the default:

headFM ∷ (Foldable f, Monoid m) ⇒ f m → m
headFM = headF ∅

lastFM ∷ (Foldable f, Monoid m) ⇒ f m → m
lastFM = lastF ∅

For instance,

*T> lastFM ([] ∷ [String])
""

Using headFM and lastFM in place of headT and lastT, we can easily handle addition of an Empty case to our tree functor in this post. The key choice is that fold Empty ≡ ∅ and fmap _ Empty ≡ Empty. Then headFM will choose the first leaf, and lastFM will choose the last leaf.
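One way the extension could look is sketched below. The `Empty` constructor and the clauses that handle it are assumptions made for illustration, not code from the original post; the `Foldable` instance encodes fold Empty ≡ ∅, and the derived `Functor` gives fmap _ Empty ≡ Empty.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.Maybe  (fromMaybe)
import Data.Monoid (First (..), Last (..))

-- Leaf tree extended with a hypothetical Empty constructor.
data T a = Empty | Leaf a | Branch (T a) (T a)
  deriving (Eq, Show, Functor)

-- Empty contributes nothing to a fold.
instance Foldable T where
  foldMap _ Empty        = mempty
  foldMap f (Leaf a)     = f a
  foldMap f (Branch s t) = foldMap f s <> foldMap f t

-- First and last elements, defaulting to mempty when there are none.
headFM, lastFM :: (Foldable f, Monoid m) => f m -> m
headFM = fromMaybe mempty . getFirst . foldMap (First . Just)
lastFM = fromMaybe mempty . getLast  . foldMap (Last  . Just)

-- The derived scans, with headT/lastT replaced by headFM/lastFM.
scanlT', scanrT' :: Monoid a => T a -> T a
scanlT' (Branch s t) = Branch s' (fmap (lastFM s' <>) t')
  where
    s' = scanlT' s
    t' = scanlT' t
scanlT' t = t   -- Empty and Leaf are their own scans

scanrT' (Branch s t) = Branch (fmap (<> headFM t') s') t'
  where
    s' = scanrT' s
    t' = scanrT' t
scanrT' t = t
```

When one side of a `Branch` is empty, `lastFM` or `headFM` returns `mempty`, so the `fmap` leaves the other side unchanged.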

What about efficiency? Because headF and lastF are defined via foldMap, which is a composition of fold and fmap, one might think that we have to traverse the entire structure when used with functors like [] or T.

Laziness saves us, however, and we can even extract the head of an infinite list or a partially defined one. For instance,

  foldMap (First ∘ Just) [5 ‥]
≡ foldMap (First ∘ Just) (5 : [6 ‥])
≡ First (Just 5) ⊕ foldMap (First ∘ Just) [6 ‥]
≡ First (Just 5)

  headF d [5 ‥]
≡ fromMaybe d (getFirst (foldMap (First ∘ Just) [5 ‥]))
≡ fromMaybe d (getFirst (First (Just 5)))
≡ fromMaybe d (Just 5)
≡ 5

And, sure enough,

*T> foldMap (First ∘ Just) [5 ‥]
First {getFirst = Just 5}
*T> headF ⊥ [5 ‥]
5

Where to go from here?

As mentioned above, the derived scanning implementations perform asymptotically more work than necessary. Future posts explore how to derive parallel-friendly, linear-work algorithms. Then we’ll see how to transform the parallel-friendly algorithms so that they work destructively, overwriting their input as they go, and hence suitable for execution entirely in CUDA or OpenCL.
The functions initTs and tailTs are still tree-specific. To generalize the specification and derivation of list and tree scanning, find a way to generalize these two functions. The types of initTs and tailTs fit with the duplicate method on comonads. Moreover, tails is the usual definition of duplicate on lists, and I think inits would be extend for "snoc lists". For trees, however, I don’t think the correspondence holds. Am I missing something?
In particular, I want to extend the derivation to depth-typed, perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The functions initTs and tailTs make unbalanced trees out of balanced ones, so I don’t know how to adapt the specifications given here to the setting of depth-typed balanced trees. Maybe I could just fill up the to-be-ignored elements with ∅.

Deriving list scans

Conal — Tue, 22 Feb 2011 20:42:40 +0000

I’ve been playing with deriving efficient parallel, imperative implementations of "prefix sum" or more generally "left scan". Following posts will explore the parallel & imperative derivations, but as a warm-up, I’ll tackle the functional & sequential case here.

Folds

You’re probably familiar with the higher-order functions for left and right "fold". The current documentation says:

foldl, applied to a binary operator, a starting value (typically the left-identity of the operator), and a list, reduces the list using the binary operator, from left to right:
foldl f z [x1, x2, ⋯, xn] ≡ (⋯((z `f` x1) `f` x2) `f`⋯) `f` xn
The list must be finite.
foldr, applied to a binary operator, a starting value (typically the right-identity of the operator), and a list, reduces the list using the binary operator, from right to left:
foldr f z [x1, x2, ⋯, xn] ≡ x1 `f` (x2 `f` ⋯ (xn `f` z)⋯)

And here are typical definitions:

foldl ∷ (b → a → b) → b → [a] → b
foldl f z []     = z
foldl f z (x:xs) = foldl f (z `f` x) xs

foldr ∷ (a → b → b) → b → [a] → b
foldr f z []     = z
foldr f z (x:xs) = x `f` foldr f z xs

Notice that foldl builds up its result one step at a time and reveals it all at once, in the end. The whole result value is locked up until the entire input list has been traversed. In contrast, foldr starts revealing information right away, and so works well with infinite lists. Like foldl, foldr also yields only a final value.
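The contrast can be tried out with the small sketch below. The definitions are renamed `myFoldl` and `myFoldr` (names chosen here) so that they do not clash with the Prelude, and the two example values are illustrations rather than part of the original post.

```haskell
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ z []     = z
myFoldl f z (x:xs) = myFoldl f (z `f` x) xs

myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []     = z
myFoldr f z (x:xs) = x `f` myFoldr f z xs

-- Terminates on an infinite list: (||) ignores its second argument once
-- the first is True, so the recursive fold is never demanded past 11.
anyBig :: Bool
anyBig = myFoldr (\x r -> x > 10 || r) False [1 :: Integer ..]

-- Rebuilds a list lazily with a right fold; each cons cell is revealed
-- before the rest of the input is examined.
doubled :: [Integer]
doubled = myFoldr (\x r -> 2 * x : r) [] [1 ..]
```

Replacing `myFoldr` with `myFoldl` in either example would never produce a result, since `myFoldl` only returns after reaching the end of the list.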

Sometimes it's handy to also get to all of the intermediate steps. Doing so takes us beyond the land of folds to the kingdom of scans.

Scans

The scanl and scanr functions correspond to foldl and foldr but produce all intermediate accumulations, not just the final one.

scanl ∷ (b → a → b) → b → [a] → [b]

scanl f z [x1, x2,  ⋯ ] ≡ [z, z `f` x1, (z `f` x1) `f` x2, ⋯]

scanr ∷ (a → b → b) → b → [a] → [b]

scanr f z [⋯, xn_1, xn] ≡ [⋯, xn_1 `f` (xn `f` z), xn `f` z, z]

As you might expect, the last value is the complete left fold, and the first value in the scan is the complete right fold:

last (scanl f z xs) ≡ foldl f z xs
head (scanr f z xs) ≡ foldr f z xs

which is to say

last ∘ scanl f z ≡ foldl f z
head ∘ scanr f z ≡ foldr f z

The standard scan definitions are trickier than the fold definitions:

scanl ∷ (b → a → b) → b → [a] → [b]
scanl f z ls = z : (case ls of
                     []   → []
                     x:xs → scanl f (z `f` x) xs)

scanr ∷ (a → b → b) → b → [a] → [b]
scanr _ z []     = [z]
scanr f z (x:xs) = (x `f` q) : qs
                   where qs@(q:_) = scanr f z xs

Every time I encounter these definitions, I have to walk through them again to see what's going on. I finally sat down to figure out how these tricky definitions might emerge from simpler specifications. In other words, how to derive these definitions systematically from simpler but less efficient definitions.

Most likely, these derivations have been done before, but I learned something from the effort, and I hope you do, too.

Specifying scans

As I pointed out above, the last element of a left scan is the left fold over the whole list. In fact, all of the elements are left folds, but over prefixes of the list. Similarly, all of the elements of a right-scan are right folds, but over suffixes of the list. These observations give rise to very simple specifications for scanl and scanr:

scanl f z xs = map (foldl f z) (inits xs)
scanr f z xs = map (foldr f z) (tails xs)

Equivalently,

scanl f z = map (foldl f z) ∘ inits
scanr f z = map (foldr f z) ∘ tails

Here I'm using the standard inits and tails functions from Data.List, documented as follows:

The inits function returns all initial segments of the argument, shortest first. For example,
inits "abc" ≡ ["","a","ab","abc"]
The tails function returns all final segments of the argument, longest first. For example,
tails "abc" ≡ ["abc", "bc", "c",""]

The definitions:

inits ∷ [a] → [[a]]
inits []     = [[]]
inits (x:xs) = [[]] ++ map (x:) (inits xs)

tails ∷ [a] → [[a]]
tails []         = [[]]
tails xs@(_:xs') = xs : tails xs'

This definition of inits is stricter than necessary, as it examines its argument before emitting anything but ends up emitting an initial empty list whether the argument is nil or a cons. Here's a lazier definition:

inits xs = [] : case xs of
                  []     → []
                  (x:xs) → map (x:) (inits xs)

This second version produces the initial [] before examining its argument, which helps to avoid deadlock in some recursive contexts.
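The difference in strictness shows up when the argument is undefined. In this sketch the two versions are given the names `initsStrict` and `initsLazy` (chosen here) so they can sit side by side.

```haskell
-- First definition: pattern matches on the argument before producing anything.
initsStrict :: [a] -> [[a]]
initsStrict []     = [[]]
initsStrict (x:xs) = [[]] ++ map (x:) (initsStrict xs)

-- Second definition: emits the initial [] first, then inspects the argument.
initsLazy :: [a] -> [[a]]
initsLazy xs = [] : case xs of
                      []     -> []
                      (y:ys) -> map (y:) (initsLazy ys)

-- The first prefix is available even though the argument is undefined.
-- The same expression with initsStrict would diverge.
firstPrefix :: [[Int]]
firstPrefix = take 1 (initsLazy undefined)
```

On fully defined lists, finite or infinite, the two versions agree.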

These specifications of scanl and scanr make it very easy to prove the properties given above that

last ∘ scanl f z ≡ foldl f z
head ∘ scanr f z ≡ foldr f z

(Hint: use the fact that last ∘ inits ≡ id, and head ∘ tails ≡ id.)
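The specifications are directly executable, so the two properties can at least be spot-checked before proving them. The names `scanlSpec`, `scanrSpec`, `propLast` and `propHead` are introduced here, and the properties are specialized to `(-)` on `Int` purely as an example.

```haskell
import Data.List (inits, tails)

-- The specifications, transcribed to ASCII.
scanlSpec :: (b -> a -> b) -> b -> [a] -> [b]
scanlSpec f z = map (foldl f z) . inits

scanrSpec :: (a -> b -> b) -> b -> [a] -> [b]
scanrSpec f z = map (foldr f z) . tails

-- last . scanl f z == foldl f z, and head . scanr f z == foldr f z,
-- checked for one operator and starting value.
propLast, propHead :: [Int] -> Bool
propLast xs = last (scanlSpec (-) 0 xs) == foldl (-) 0 xs
propHead xs = head (scanrSpec (-) 0 xs) == foldr (-) 0 xs
```

Neither `last` nor `head` can fail here, because `inits` and `tails` always return at least one segment.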

Although these specifications of scanl and scanr (via map, foldl/foldr, and inits/tails) state succinctly and simply what scanl and scanr compute, they are terrible recipes for how, because they perform quadratic work instead of linear.

But that's okay, because now we're going to see how to turn these inefficient & clear specifications into efficient & less clear implementations.

Deriving efficient scans

To derive efficient scans, use the specifications and perform some simplifications.

Divide scanl into empty lists and nonempty lists, starting with empty:

  scanl f z []
≡ {- scanl spec -}
  map (foldl f z) (inits [])
≡ {- inits def -}
  map (foldl f z) [[]]
≡ {- map def -}
  [foldl f z []]
≡ {- foldl def -}
  [z]

  scanl f z (x:xs)
≡ {- scanl spec -}
  map (foldl f z) (inits (x:xs))
≡ {- inits def -}
  map (foldl f z) ([] : map (x:) (inits xs))
≡ {- map def -}
  foldl f z [] : map (foldl f z) (map (x:) (inits xs))
≡ {- foldl def -}
  z : map (foldl f z) (map (x:) (inits xs))
≡ {- map g ∘ map f ≡ map (g ∘ f) -}
  z : map (foldl f z ∘ (x:)) (inits xs)
≡ {- (∘) def -}
  z : map (λ ys → foldl f z (x:ys)) (inits xs)
≡ {- foldl def -}
  z : map (λ ys → foldl f (z `f` x) ys) (inits xs)
≡ {- η reduction -}
  z : map (foldl f (z `f` x)) (inits xs)
≡ {- scanl spec -}
  z : scanl f (z `f` x) xs

Combine these conclusions and factor out the common (z : ) to yield the standard "optimized" definition.
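Spelled out in ASCII, the combination step might look like this. `scanlCases` is the pair of derived equations as they stand, `scanlFactored` has the common `(z :)` pulled out, and `scanlSpec` is the specification kept for comparison; all three names are introduced here.

```haskell
import Data.List (inits)

-- Specification: quadratic work.
scanlSpec :: (b -> a -> b) -> b -> [a] -> [b]
scanlSpec f z = map (foldl f z) . inits

-- The two conclusions of the derivation, one equation per case.
scanlCases :: (b -> a -> b) -> b -> [a] -> [b]
scanlCases _ z []     = [z]
scanlCases f z (x:xs) = z : scanlCases f (z `f` x) xs

-- Both cases start with z, so emit it before looking at the list.
scanlFactored :: (b -> a -> b) -> b -> [a] -> [b]
scanlFactored f z ls = z : case ls of
                             []   -> []
                             x:xs -> scanlFactored f (z `f` x) xs
```

Factoring out `(z :)` is more than cosmetic: like the lazier `inits`, the factored version yields its first element without examining the input list.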

Does scanr work out similarly? Let's find out, replacing scanl with scanr, foldl with foldr, and inits with tails:

  scanr f z []
≡ {- scanr spec -}
  map (foldr f z) (tails [])
≡ {- tails def -}
  map (foldr f z) ([[]])
≡ {- map def -}
  [foldr f z []]
≡ {- foldr def -}
  [z]

The derivation for nonempty lists deviates from scanl, due to differences between inits and tails, but it all works out nicely.

  scanr f z (x:xs)
≡ {- scanr spec -}
  map (foldr f z) (tails (x:xs))
≡ {- tails def -}
  map (foldr f z) ((x:xs) : tails xs)
≡ {- map def -}
  foldr f z (x:xs) : map (foldr f z) (tails xs)
≡ {- scanr spec -}
  foldr f z (x:xs) : scanr f z xs
≡ {- foldr def -}
  (x `f` foldr f z xs) : scanr f z xs
≡ {- head/scanr property -}
  (x `f` head (scanr f z xs)) : scanr f z xs
≡ {- factor out shared expression -}
  (x `f` head qs) : qs where qs = scanr f z xs
≡ {- stylistic variation -}
  (x `f` q) : qs where qs@(q:_) = scanr f z xs
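As with `scanl`, the derived equations can be transcribed and compared against the specification. The names `scanrSpec` and `scanrDerived` are introduced here for the sketch.

```haskell
import Data.List (tails)

-- Specification: a separate right fold for every suffix.
scanrSpec :: (a -> b -> b) -> b -> [a] -> [b]
scanrSpec f z = map (foldr f z) . tails

-- Derived version: the scan of the tail is computed once and shared.
-- Its head q is the fold of the tail, so one more application of f
-- gives the fold of the whole list.
scanrDerived :: (a -> b -> b) -> b -> [a] -> [b]
scanrDerived _ z []     = [z]
scanrDerived f z (x:xs) = (x `f` q) : qs
  where
    qs@(q:_) = scanrDerived f z xs   -- never empty: both cases return a cons
```

The pattern `qs@(q:_)` cannot fail, because every result of `scanrDerived` has at least one element.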

Coming attractions

The scan implementations above are thoroughly sequential, in that they thread a single linear chain of data dependencies throughout the computation. Upcoming posts will look at more parallel-friendly variations.
