<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Conal Elliott &#187; functor</title>
	<atom:link href="http://conal.net/blog/tag/functor/feed" rel="self" type="application/rss+xml" />
	<link>http://conal.net/blog</link>
	<description>Inspirations &#38; experiments, mainly about denotative/functional programming in Haskell</description>
	<lastBuildDate>Thu, 25 Jul 2019 18:15:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=4.1.17</generator>
	<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2F&amp;language=en_US&amp;category=text&amp;title=Conal+Elliott&amp;description=Inspirations+%26amp%3B+experiments%2C+mainly+about+denotative%2Ffunctional+programming+in+Haskell&amp;tags=blog" type="text/html" />
	<item>
		<title>A third view on trees</title>
		<link>http://conal.net/blog/posts/a-third-view-on-trees</link>
		<comments>http://conal.net/blog/posts/a-third-view-on-trees#comments</comments>
		<pubDate>Sat, 04 Jun 2011 02:46:20 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=460</guid>
		<description><![CDATA[A few recent posts have played with trees from two perspectives. The more commonly used I call &#34;top-down&#34;, because the top-level structure is most immediately apparent. A top-down binary tree is either a leaf or a pair of such trees, and that pair can be accessed without wading through intervening structure. Much less commonly used [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p>A few recent posts have played with trees from two perspectives. The more commonly used I call &quot;top-down&quot;, because the top-level structure is most immediately apparent. A top-down binary tree is either a leaf or a pair of such trees, and that pair can be accessed without wading through intervening structure. Much less commonly used are &quot;bottom-up&quot; trees. A bottom-up binary tree is either a leaf or a single such tree of pairs. In the non-leaf case, the pair structure of the tree elements is accessible by operations like mapping, folding, or scanning. The difference is between a pair of trees and a tree of pairs.</p>

<p>As an alternative to the top-down and bottom-up views on trees, I now want to examine a third view, which is a hybrid of the two. Instead of pairs of trees or trees of pairs, this hybrid view is of trees of trees, and more specifically of bottom-up trees of top-down trees. As we&#8217;ll see, these hybrid trees emerge naturally from the top-down and bottom-up views. A later post will show how this third view lends itself to an <em>in-place</em> (destructive) scan algorithm, suitable for execution on modern GPUs.</p>

<p><strong>Edits:</strong></p>

<ul>
<li>2011-06-04: &quot;Suppose we have a bottom-up tree of top-down trees, i.e., <code>t ∷ TB (TT a)</code>.&quot; Was backwards. (Thanks to Noah Easterly.)</li>
<li>2011-06-04: Notation: &quot;<code>f ➶ n</code>&quot; and &quot;<code>f ➴ n</code>&quot;.</li>
</ul>

<p><span id="more-460"></span></p>

<p>The post <a href="http://conal.net/blog/posts/parallel-tree-scanning-by-composition/" title="blog post"><em>Parallel tree scanning by composition</em></a> defines &quot;top-down&quot; and a &quot;bottom-up&quot; binary trees as follows (modulo type and constructor names):</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">TT</span> a <span class="fu">=</span> <span class="kw">LT</span> a <span class="fu">|</span> <span class="dt">BT</span> { unBT <span class="ot">&#8759;</span> <span class="dt">Pair</span> (<span class="dt">TT</span> a) } <span class="kw">deriving</span> <span class="kw">Functor</span><br /><br /><span class="kw">data</span> <span class="dt">TB</span> a <span class="fu">=</span> <span class="dt">LB</span> a <span class="fu">|</span> <span class="dt">BB</span> { unBB <span class="ot">&#8759;</span> <span class="dt">TB</span> (<span class="dt">Pair</span> a) } <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>So, while a non-leaf <code>TT</code> (top-down tree) has a pair at the top (outside), a non-leaf <code>TB</code> (bottom-up tree) has pairs at the bottom (inside).</p>
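<p>To make the two shapes concrete, here is a minimal, self-contained sketch (constructor names differ from the post&#8217;s, since <code>LT</code> collides with the Prelude&#8217;s <code>Ordering</code> constructor): the same four elements stored both ways.</p>

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Data.Foldable (toList)

data Pair a = a :# a deriving (Functor, Foldable)

-- Top-down: a non-leaf is a pair of trees.
data TT a = LeafT a | BranchT (Pair (TT a)) deriving (Functor, Foldable)

-- Bottom-up: a non-leaf is a tree of pairs.
data TB a = LeafB a | BranchB (TB (Pair a)) deriving (Functor, Foldable)

-- The same four elements, two shapes:
tt4 :: TT Int
tt4 = BranchT (BranchT (LeafT 1 :# LeafT 2) :# BranchT (LeafT 3 :# LeafT 4))

tb4 :: TB Int
tb4 = BranchB (BranchB (LeafB ((1 :# 2) :# (3 :# 4))))

main :: IO ()
main = print (toList tt4, toList tb4)  -- both flatten to [1,2,3,4]
```

<p>Folding either tree visits the same elements in the same order; only where the pair structure sits differs.</p>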

<p>Combining these two observations leads to an interesting possibility. Suppose we have a bottom-up tree of top-down trees, i.e., <code>t ∷ TB (TT a)</code>. If <code>t</code> is not a leaf, then <code>t ≡ BB tt</code> where <code>tt</code> is a bottom-up tree whose leaves are pairs of top-down trees, i.e., <code>tt ∷ TB (Pair (TT a))</code>. Each of those leaves of type <code>Pair (TT a)</code> can be converted to type <code>TT a</code> (single tree), simply by applying the <code>BT</code> constructor. Moreover, this transformation is invertible. For convenience, define a type alias for hybrid trees:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">TH</span> a <span class="fu">=</span> <span class="dt">TB</span> (<span class="dt">TT</span> a)</code></pre>

<p>Then the two conversions:</p>

<pre class="sourceCode"><code class="sourceCode haskell">upT   <span class="ot">&#8759;</span> <span class="dt">TH</span> a <span class="ot">&#8594;</span> <span class="dt">TH</span> a<br />upT   <span class="fu">=</span> <span class="fu">fmap</span> <span class="dt">BT</span> &#8728; unBB<br /><br />downT <span class="ot">&#8759;</span> <span class="dt">TH</span> a <span class="ot">&#8594;</span> <span class="dt">TH</span> a<br />downT <span class="fu">=</span> <span class="dt">BB</span> &#8728; <span class="fu">fmap</span> unBT</code></pre>

<div class=exercise>

<p><em>Exercise:</em> Prove <code>upT</code> and <code>downT</code> are inverses where defined.</p>
<p>Answer:</p>

<div class=toggle>

<pre class="sourceCode"><code class="sourceCode haskell">  upT &#8728; downT<br />&#8801; <span class="fu">fmap</span> <span class="dt">BT</span> &#8728; unBB &#8728; <span class="dt">BB</span> &#8728; <span class="fu">fmap</span> unBT<br />&#8801; <span class="fu">fmap</span> <span class="dt">BT</span> &#8728; <span class="fu">fmap</span> unBT<br />&#8801; <span class="fu">fmap</span> (<span class="dt">BT</span> &#8728; unBT)<br />&#8801; <span class="fu">fmap</span> <span class="fu">id</span><br />&#8801; <span class="fu">id</span><br /><br />  downT &#8728; upT<br />&#8801; <span class="dt">BB</span> &#8728; <span class="fu">fmap</span> unBT &#8728; <span class="fu">fmap</span> <span class="dt">BT</span> &#8728; unBB<br />&#8801; <span class="dt">BB</span> &#8728; <span class="fu">fmap</span> (unBT &#8728; <span class="dt">BT</span>) &#8728; unBB<br />&#8801; <span class="dt">BB</span> &#8728; <span class="fu">fmap</span> <span class="fu">id</span> &#8728; unBB<br />&#8801; <span class="dt">BB</span> &#8728; <span class="fu">id</span> &#8728; unBB<br />&#8801; <span class="dt">BB</span> &#8728; unBB<br />&#8801; <span class="fu">id</span></code></pre>

</div>


</div>

<p>Consider a perfect binary leaf tree of depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math>, i.e., an <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math>-deep binary tree with each level full and data only at the leaves (where a leaf is a depth-<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mn>0</mn></mrow></math> tree). We can view such a tree as top-down, or bottom-up, or as a hybrid.</p>

<p>Each of these three views is really <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>+</mo><mn>1</mn></mrow></math> views:</p>

<ul>
<li>Top-down: a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math> tree, or a pair of depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>-</mo><mn>1</mn></mrow></math> trees, or a pair of pairs of depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>-</mo><mn>2</mn></mrow></math> trees, etc.</li>
<li>Bottom-up: a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math> tree, or a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>-</mo><mn>1</mn></mrow></math> tree of pairs, or a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>-</mo><mn>2</mn></mrow></math> tree of pairs of pairs, etc.</li>
<li>Hybrid: a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math> tree of depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mn>0</mn></mrow></math> trees, or a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>-</mo><mn>1</mn></mrow></math> tree of depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mn>1</mn></mrow></math> trees, or, &#8230;, or a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mn>0</mn></mrow></math> tree of depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math> trees.</li>
</ul>

<p>In the hybrid case, counting from <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mn>0</mn></mrow></math> to <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math>, the <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><msup><mi>k</mi><mrow><mi>t</mi><mi>h</mi></mrow></msup></mrow></math> such view is a depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>-</mo><mi>k</mi></mrow></math> bottom-up tree whose elements (leaf values) are depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>k</mi></mrow></math> top-down trees. When <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>k</mi><mo>=</mo><mi>n</mi></mrow></math>, we have a bottom-up tree whose leaves are all single-leaf trees, and when <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>k</mi><mo>=</mo><mn>0</mn></mrow></math>, we have a single-leaf bottom-up tree containing a top-down tree. Imagine a horizontal line at depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>k</mi></mrow></math>, dividing the bottom-up outer structure from the top-down inner structure. The <code>downT</code> function moves the dividing line downward, and the <code>upT</code> function moves the line upward. Both functions are partial.</p>
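<p>The dividing line can be watched in motion with a small, self-contained sketch of <code>upT</code> and <code>downT</code> (again with renamed constructors to dodge the Prelude&#8217;s <code>LT</code>; <code>unBranchB</code> and <code>unBranchT</code> play the roles of <code>unBB</code> and <code>unBT</code>, so both shifts are partial, failing on leaves):</p>

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Data.Foldable (toList)

data Pair a = a :# a deriving (Functor, Foldable)

data TT a = LeafT a | BranchT { unBranchT :: Pair (TT a) } deriving (Functor, Foldable)
data TB a = LeafB a | BranchB { unBranchB :: TB (Pair a) } deriving (Functor, Foldable)

type TH a = TB (TT a)

-- Shift the dividing line: upT grows the top-down (inner) part,
-- downT grows the bottom-up (outer) part.  Both are partial.
upT, downT :: TH a -> TH a
upT   = fmap BranchT . unBranchB
downT = BranchB . fmap unBranchT

-- Flatten a hybrid tree to its leaf values, for inspection.
flatten :: TH a -> [a]
flatten = concatMap toList . toList

-- A depth-1 bottom-up tree whose elements are depth-1 top-down trees.
th :: TH Int
th = BranchB (LeafB (BranchT (LeafT 1 :# LeafT 2) :# BranchT (LeafT 3 :# LeafT 4)))

main :: IO ()
main = print (flatten th, flatten (upT th), flatten (downT (upT th)))
```

<p>Shifting the line changes where the structure lives, not which elements it holds: all three flattenings agree.</p>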

<h3 id="generalizing">Generalizing</h3>

<p>The role of <code>Pair</code> in the tree types above is simple and regular. We can abstract out this particular type constructor, generalizing to an arbitrary functor. I&#8217;ll call this generalization &quot;functor trees&quot;. Again, there are top-down and bottom-up versions:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">FT</span> f a <span class="fu">=</span> <span class="dt">FLT</span> a <span class="fu">|</span> <span class="dt">FBT</span> { unFBT <span class="ot">&#8759;</span> f (<span class="dt">FT</span> f a) } <span class="kw">deriving</span> <span class="kw">Functor</span><br /><br /><span class="kw">data</span> <span class="dt">FB</span> f a <span class="fu">=</span> <span class="dt">FLB</span> a <span class="fu">|</span> <span class="dt">FBB</span> { unFBB <span class="ot">&#8759;</span> <span class="dt">FB</span> f (f a) } <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>And a hybrid version, with generalized versions of <code>upT</code> and <code>downT</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">FH</span> f a <span class="fu">=</span> <span class="dt">FB</span> f (<span class="dt">FT</span> f a)<br /><br />upH   <span class="ot">&#8759;</span> <span class="kw">Functor</span> f <span class="ot">&#8658;</span> <span class="dt">FH</span> f a <span class="ot">&#8594;</span> <span class="dt">FH</span> f a<br />upH   <span class="fu">=</span> <span class="fu">fmap</span> <span class="dt">FBT</span> &#8728; unFBB<br /><br />downH <span class="ot">&#8759;</span> <span class="kw">Functor</span> f <span class="ot">&#8658;</span> <span class="dt">FH</span> f a <span class="ot">&#8594;</span> <span class="dt">FH</span> f a<br />downH <span class="fu">=</span> <span class="dt">FBB</span> &#8728; <span class="fu">fmap</span> unFBT</code></pre>

<p>These definitions specialize to the earlier binary-tree ones by substituting <code>Pair</code> for the parameter <code>f</code>.</p>
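<p>As an illustration of the generality, here is a hypothetical instantiation at <code>f = []</code> (my choice, just for illustration), giving trees with arbitrary branching; the definitions are repeated so the sketch stands alone:</p>

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
import Data.Foldable (toList)

-- Functor trees: Pair generalized to an arbitrary functor f.
data FT f a = FLT a | FBT (f (FT f a)) deriving (Functor, Foldable)
data FB f a = FLB a | FBB (FB f (f a)) deriving (Functor, Foldable)

-- With f = [], a top-down functor tree is a leaf-valued rose tree:
t1 :: FT [] Int
t1 = FBT [FLT 1, FBT [FLT 2, FLT 3], FLT 4]

-- and a bottom-up one nests the list structure inward:
b1 :: FB [] Int
b1 = FBB (FBB (FLB [[10, 20], [30]]))

main :: IO ()
main = print (toList (fmap (* 2) t1), toList b1)
```

<p>GHC&#8217;s <code>DeriveFunctor</code>/<code>DeriveFoldable</code> handle the nested recursion in <code>FB</code>, inserting the <code>Functor f</code>/<code>Foldable f</code> contexts automatically.</p>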

<h3 id="depth-typing">Depth-typing</h3>

<p>The upward and downward view-changing functions above are partial, as they can fail at extreme tree views (at depth <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mn>0</mn></mrow></math> or <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math>). We could make this partiality explicit by changing the result type to <code>Maybe (TH a)</code> for binary hybrid trees and to <code>Maybe (FH f a)</code> for the functor generalization. Alternatively, make the tree depths <em>explicit</em> in the types, as in a few recent posts, including <a href="http://conal.net/blog/posts/a-trie-for-length-typed-vectors/" title="blog post"><em>A trie for length-typed vectors</em></a>. (In those posts, I used the terms &quot;right-folded&quot; and &quot;left-folded&quot; in place of &quot;top-down&quot; and &quot;bottom-up&quot;, reflecting the right- or left-folding of functor composition. The &quot;folded&quot; terms led to some confusion, especially in the context of data type folds and scans.) In the depth-typed versions, &quot;leaves&quot; are zero-ary compositions, and &quot;branches&quot; are <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>m</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo></mrow></math>-ary compositions for some <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>m</mi></mrow></math>.</p>

<p>Top-down:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> (&#10164;) <span class="ot">&#8759;</span> (<span class="fu">*</span> <span class="ot">&#8594;</span> <span class="fu">*</span>) <span class="ot">&#8594;</span> <span class="fu">*</span> <span class="ot">&#8594;</span> (<span class="fu">*</span> <span class="ot">&#8594;</span> <span class="fu">*</span>) <span class="kw">where</span><br />  <span class="dt">ZeroT</span> <span class="ot">&#8759;</span> a <span class="ot">&#8594;</span> (f &#10164; <span class="dt">Z</span>) a<br />  <span class="dt">SuccT</span> <span class="ot">&#8759;</span> <span class="dt">IsNat</span> n <span class="ot">&#8658;</span> f ((f &#10164; n) a) <span class="ot">&#8594;</span> (f &#10164; <span class="dt">S</span> n) a<br /><br />unZeroT <span class="ot">&#8759;</span> (f &#10164; <span class="dt">Z</span>) a <span class="ot">&#8594;</span> a<br />unZeroT (<span class="dt">ZeroT</span> a) <span class="fu">=</span> a<br /><br />unSuccT <span class="ot">&#8759;</span> (f &#10164; <span class="dt">S</span> n) a <span class="ot">&#8594;</span> f ((f &#10164; n) a)<br />unSuccT (<span class="dt">SuccT</span> fsa) <span class="fu">=</span> fsa<br /><br /><span class="kw">instance</span> <span class="kw">Functor</span> f <span class="ot">&#8658;</span> <span class="kw">Functor</span> (f &#10164; n) <span class="kw">where</span><br />  <span class="fu">fmap</span> h (<span class="dt">ZeroT</span> a)  <span class="fu">=</span> <span class="dt">ZeroT</span> (h a)<br />  <span class="fu">fmap</span> h (<span class="dt">SuccT</span> fs) <span class="fu">=</span> <span class="dt">SuccT</span> ((fmap&#8728;fmap) h fs)</code></pre>

<p>Bottom-up:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> (&#10166;) <span class="ot">&#8759;</span> (<span class="fu">*</span> <span class="ot">&#8594;</span> <span class="fu">*</span>) <span class="ot">&#8594;</span> <span class="fu">*</span> <span class="ot">&#8594;</span> (<span class="fu">*</span> <span class="ot">&#8594;</span> <span class="fu">*</span>) <span class="kw">where</span><br />  <span class="dt">ZeroB</span> <span class="ot">&#8759;</span> a <span class="ot">&#8594;</span> (f &#10166; <span class="dt">Z</span>) a<br />  <span class="dt">SuccB</span> <span class="ot">&#8759;</span> <span class="dt">IsNat</span> n <span class="ot">&#8658;</span> (f &#10166; n) (f a) <span class="ot">&#8594;</span> (f &#10166; <span class="dt">S</span> n) a<br /><br />unZeroB <span class="ot">&#8759;</span> (f &#10166; <span class="dt">Z</span>) a <span class="ot">&#8594;</span> a<br />unZeroB (<span class="dt">ZeroB</span> a) <span class="fu">=</span> a<br /><br />unSuccB <span class="ot">&#8759;</span> (f &#10166; <span class="dt">S</span> n) a <span class="ot">&#8594;</span> (f &#10166; n) (f a)<br />unSuccB (<span class="dt">SuccB</span> fsa) <span class="fu">=</span> fsa<br /><br /><span class="kw">instance</span> <span class="kw">Functor</span> f <span class="ot">&#8658;</span> <span class="kw">Functor</span> (f &#10166; n) <span class="kw">where</span><br />  <span class="fu">fmap</span> h (<span class="dt">ZeroB</span> a)  <span class="fu">=</span> <span class="dt">ZeroB</span> (h a)<br />  <span class="fu">fmap</span> h (<span class="dt">SuccB</span> fs) <span class="fu">=</span> <span class="dt">SuccB</span> ((fmap&#8728;fmap) h fs)</code></pre>

<p>Hybrid:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">H</span> p q f a <span class="fu">=</span> (f &#10166; p) ((f &#10164; q) a)</code></pre>

<p>Upward and downward shift become total functions, and their types explicitly describe how the line shifts between <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo stretchy="false">(</mo><mi>p</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo><mo>/</mo><mi>q</mi></mrow></math> and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>p</mi><mo>/</mo><mo stretchy="false">(</mo><mi>q</mi><mo>+</mo><mn>1</mn><mo stretchy="false">)</mo></mrow></math>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">up   <span class="ot">&#8759;</span> (<span class="kw">Functor</span> f, <span class="dt">IsNat</span> q) <span class="ot">&#8658;</span> <span class="dt">H</span> (<span class="dt">S</span> p) q f a <span class="ot">&#8594;</span> <span class="dt">H</span> p (<span class="dt">S</span> q) f a<br />up   <span class="fu">=</span> <span class="fu">fmap</span> <span class="dt">SuccT</span> &#8728; unSuccB<br /><br />down <span class="ot">&#8759;</span> (<span class="kw">Functor</span> f, <span class="dt">IsNat</span> p) <span class="ot">&#8658;</span> <span class="dt">H</span> p (<span class="dt">S</span> q) f a <span class="ot">&#8594;</span> <span class="dt">H</span> (<span class="dt">S</span> p) q f a<br />down <span class="fu">=</span> <span class="dt">SuccB</span> &#8728; <span class="fu">fmap</span> unSuccT</code></pre>
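<p>The post&#8217;s <code>Z</code>, <code>S</code>, and <code>IsNat</code> come from its earlier length-typed-vector machinery. A self-contained approximation using <code>DataKinds</code> (dropping <code>IsNat</code>, and spelling the tree types alphabetically as <code>TD</code> and <code>BU</code> rather than with the arrow operators) shows the totality of the shifts in the types:</p>

```haskell
{-# LANGUAGE GADTs, DataKinds, KindSignatures, DeriveFunctor, DeriveFoldable #-}
import Data.Foldable (toList)

data Nat = Z | S Nat

data Pair a = a :# a deriving (Functor, Foldable)

-- Top-down: n-fold composition with the outermost f first.
data TD f (n :: Nat) a where
  ZeroT :: a -> TD f 'Z a
  SuccT :: f (TD f n a) -> TD f ('S n) a

-- Bottom-up: n-fold composition with the innermost f first.
data BU f (n :: Nat) a where
  ZeroB :: a -> BU f 'Z a
  SuccB :: BU f n (f a) -> BU f ('S n) a

unSuccT :: TD f ('S n) a -> f (TD f n a)
unSuccT (SuccT t) = t

unSuccB :: BU f ('S n) a -> BU f n (f a)
unSuccB (SuccB t) = t

instance Functor f => Functor (TD f n) where
  fmap h (ZeroT a) = ZeroT (h a)
  fmap h (SuccT t) = SuccT (fmap (fmap h) t)

instance Functor f => Functor (BU f n) where
  fmap h (ZeroB a) = ZeroB (h a)
  fmap h (SuccB t) = SuccB (fmap (fmap h) t)

-- Hybrid trees split at depth p/q; the shifts are now total.
type H p q f a = BU f p (TD f q a)

up :: Functor f => H ('S p) q f a -> H p ('S q) f a
up = fmap SuccT . unSuccB

down :: Functor f => H p ('S q) f a -> H ('S p) q f a
down = SuccB . fmap unSuccT

-- Flattening, to observe that shifting preserves the elements.
flatT :: Foldable f => TD f n a -> [a]
flatT (ZeroT a) = [a]
flatT (SuccT t) = concatMap flatT (toList t)

flatB :: Foldable f => BU f n a -> [a]
flatB (ZeroB a) = [a]
flatB (SuccB t) = concatMap toList (flatB t)

flatH :: Foldable f => H p q f a -> [a]
flatH = concatMap flatT . flatB

ex :: H ('S 'Z) 'Z Pair Int
ex = SuccB (ZeroB (ZeroT 1 :# ZeroT 2))

main :: IO ()
main = print (flatH ex, flatH (up ex), flatH (down (up ex)))
```

<p>The types alone guarantee that <code>down &#8728; up ≡ id</code> is even well-posed: the depth indices bookkeep exactly where the dividing line sits.</p>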

<h3 id="so-what">So what?</h3>

<p>Why care about the multitude of views on trees?</p>

<ul>
<li>It&#8217;s pretty.</li>
<li>A future post will show how these hybrid trees enable an elegant formulation of parallel scanning that lends itself to an in-place, GPU-friendly implementation.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/a-third-view-on-trees/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fa-third-view-on-trees&amp;language=en_GB&amp;category=text&amp;title=A+third+view+on+trees&amp;description=A+few+recent+posts+have+played+with+trees+from+two+perspectives.+The+more+commonly+used+I+call+%26quot%3Btop-down%26quot%3B%2C+because+the+top-level+structure+is+most+immediately+apparent.+A+top-down+binary+tree...&amp;tags=functor%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Parallel tree scanning by composition</title>
		<link>http://conal.net/blog/posts/parallel-tree-scanning-by-composition</link>
		<comments>http://conal.net/blog/posts/parallel-tree-scanning-by-composition#comments</comments>
		<pubDate>Tue, 24 May 2011 20:31:23 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[program derivation]]></category>
		<category><![CDATA[scan]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=429</guid>
		<description><![CDATA[My last few blog posts have been on the theme of scans, and particularly on parallel scans. In Composable parallel scanning, I tackled parallel scanning in a very general setting. There are five simple building blocks out of which a vast assortment of data structures can be built, namely constant (no value), identity (one value), [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p>My last few blog posts have been on the theme of <em>scans</em>, and particularly on <em>parallel</em> scans. In <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>, I tackled parallel scanning in a very general setting. There are five simple building blocks out of which a vast assortment of data structures can be built, namely constant (no value), identity (one value), sum, product, and composition. The post defined parallel prefix and suffix scan for each of these five &quot;functor combinators&quot;, in terms of the same scan operation on each of the component functors. Every functor built out of this basic set thus has a parallel scan. Functors defined more conventionally can be given scan implementations simply by converting to a composition of the basic set, scanning, and then back to the original functor. Moreover, I expect this implementation could be generated automatically, similarly to GHC&#8217;s <code>DeriveFunctor</code> extension.</p>

<p>Now I&#8217;d like to show two examples of parallel scan composition in terms of binary trees, namely the top-down and bottom-up variants of perfect binary leaf trees used in previous posts. (In previous posts, I used the terms &quot;right-folded&quot; and &quot;left-folded&quot; instead of &quot;top-down&quot; and &quot;bottom-up&quot;.) The resulting two algorithms are expressed nearly identically, but differ significantly in the work performed. The top-down version does <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math> work, while the bottom-up version does only <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math>, and thus the latter algorithm is work-efficient, while the former is not. Moreover, with a <em>very</em> simple optimization, the bottom-up tree algorithm corresponds closely to Guy Blelloch&#8217;s parallel prefix scan for arrays, given in <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.5739" title="Paper by Guy Blelloch"><em>Programming parallel algorithms</em></a>. I&#8217;m delighted with this result, as I had been wondering how to think about Guy&#8217;s algorithm.</p>

<p><strong>Edit:</strong></p>

<ul>
<li>2011-05-31: Added <code>Scan</code> and <code>Applicative</code> instances for <code>T2</code> and <code>T4</code>.</li>
</ul>

<p><span id="more-429"></span></p>

<h3 id="scanning-via-functor-combinators">Scanning via functor combinators</h3>

<p>In <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>, we saw the <code>Scan</code> class:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">class</span> <span class="dt">Scan</span> f <span class="kw">where</span><br />  prefixScan, suffixScan <span class="ot">&#8759;</span> <span class="dt">Monoid</span> m <span class="ot">&#8658;</span> f m <span class="ot">&#8594;</span> (m, f m)</code></pre>

<p>Given a structure of values, the prefix and suffix scan methods generate the overall <code>fold</code> (of type <code>m</code>), plus a structure of the same type as the input. (In contrast, the usual Haskell <code>scanl</code> and <code>scanr</code> functions on lists yield a single list with one more element than the source list. I changed the interface for generality and composability.) The <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post">post</a> gave instances for the basic set of five functor combinators.</p>
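<p>To pin down this interface, here is a hypothetical direct list implementation of the two scans (not the combinator-derived one below; the names are mine), using <code>Sum</code> to scan numbers:</p>

```haskell
import Data.Monoid (Sum (..))

-- prefixScan: total fold, plus each position's fold of everything before it.
-- suffixScan: total fold, plus each position's fold of everything after it.
prefixScanList, suffixScanList :: Monoid m => [m] -> (m, [m])
prefixScanList ms = (mconcat ms, init (scanl (<>) mempty ms))
suffixScanList ms = (mconcat ms, tail (scanr (<>) mempty ms))

main :: IO ()
main = do
  let (tot, ps) = prefixScanList (map Sum [1, 2, 3, 4 :: Int])
  print (getSum tot, map getSum ps)  -- total 10; prefix folds 0,1,3,6
```

<p>Note how the result structure has exactly the shape of the input, with the overall fold carried separately, unlike <code>scanl</code>&#8217;s one-longer list.</p>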

<p>Most functors are not defined via the basic combinators, but as mentioned above, we can scan by conversion to and from the basic set. For convenience, encapsulate this conversion in a type class:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">class</span> <span class="dt">EncodeF</span> f <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> f <span class="ot">&#8759;</span> <span class="fu">*</span> <span class="ot">&#8594;</span> <span class="fu">*</span><br />  encode <span class="ot">&#8759;</span> f a <span class="ot">&#8594;</span> <span class="dt">Enc</span> f a<br />  decode <span class="ot">&#8759;</span> <span class="dt">Enc</span> f a <span class="ot">&#8594;</span> f a</code></pre>

<p>and define scan functions via <code>EncodeF</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">prefixScanEnc, suffixScanEnc <span class="ot">&#8759;</span><br />  (<span class="dt">EncodeF</span> f, <span class="dt">Scan</span> (<span class="dt">Enc</span> f), <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> f m <span class="ot">&#8594;</span> (m, f m)<br />prefixScanEnc <span class="fu">=</span> second decode &#8728; prefixScan &#8728; encode<br />suffixScanEnc <span class="fu">=</span> second decode &#8728; suffixScan &#8728; encode</code></pre>

<h4 id="lists">Lists</h4>

<p>As a first example, consider</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> [] <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> [] <span class="fu">=</span> <span class="dt">Const</span> () <span class="fu">+</span> <span class="dt">Id</span> &#215; []<br />  encode [] <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Const</span> ())<br />  encode (a <span class="fu">:</span> <span class="kw">as</span>) <span class="fu">=</span> <span class="dt">InR</span> (<span class="dt">Id</span> a &#215; <span class="kw">as</span>)<br />  decode (<span class="dt">InL</span> (<span class="dt">Const</span> ())) <span class="fu">=</span> []<br />  decode (<span class="dt">InR</span> (<span class="dt">Id</span> a &#215; <span class="kw">as</span>)) <span class="fu">=</span> a <span class="fu">:</span> <span class="kw">as</span></code></pre>

<p>And declare a boilerplate <code>Scan</code> instance via <code>EncodeF</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> [] <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>

<p>I haven&#8217;t checked the details, but I think with this instance, suffix scanning has okay performance, while prefix scanning does quadratic work. The reason is that in the <code>Scan</code> instance for products, the two components are scanned independently (in parallel), and then the whole second component is adjusted for <code>prefixScan</code>, while the whole first component is adjusted for <code>suffixScan</code>. In the case of lists, the first component is the list head, and the second component is the list tail.</p>

<p>For your reading convenience, here&#8217;s that <code>Scan</code> instance again:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> (<span class="dt">Scan</span> f, <span class="dt">Scan</span> g, <span class="kw">Functor</span> f, <span class="kw">Functor</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (f &#215; g) <span class="kw">where</span><br />  prefixScan (fa &#215; ga) <span class="fu">=</span> (af &#8853; ag, fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga'))<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> prefixScan fa<br />         (ag,ga') <span class="fu">=</span> prefixScan ga<br /><br />  suffixScan (fa &#215; ga) <span class="fu">=</span> (af &#8853; ag, ((&#8853; ag) <span class="fu">&lt;$&gt;</span> fa') &#215; ga')<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> suffixScan fa<br />         (ag,ga') <span class="fu">=</span> suffixScan ga</code></pre>

<p>The lop-sidedness of the list type thus interferes with parallelization, and makes the parallel scans perform much worse than cumulative sequential scans.</p>

<p>Let&#8217;s next look at a more balanced type.</p>

<h3 id="binary-trees">Binary Trees</h3>

<p>We&#8217;ll get better parallel performance by organizing our data so that we can cheaply partition it into roughly equal pieces. Tree types allow such partitioning.</p>

<h4 id="top-down-trees">Top-down trees</h4>

<p>We&#8217;ll try a few variations, starting with a simple binary tree.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T1</span> a <span class="fu">=</span> <span class="dt">L1</span> a <span class="fu">|</span> <span class="dt">B1</span> (<span class="dt">T1</span> a) (<span class="dt">T1</span> a) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>Encoding and decoding is straightforward:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T1</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T1</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">T1</span> &#215; <span class="dt">T1</span><br />  encode (<span class="dt">L1</span> a)   <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B1</span> s t) <span class="fu">=</span> <span class="dt">InR</span> (s &#215; t)<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a))  <span class="fu">=</span> <span class="dt">L1</span> a<br />  decode (<span class="dt">InR</span> (s &#215; t)) <span class="fu">=</span> <span class="dt">B1</span> s t<br /><br /><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">T1</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>

<p>Note that these definitions could be generated automatically from the data type definition.</p>

<p>For <em>balanced trees</em>, prefix and suffix scan divide the problem in half at each step, solve each half, and do linear work to patch up one of the two halves. Letting <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi></mrow></math> be the number of elements, and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>W</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math> the work, we have the recurrence <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>W</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo><mo>=</mo><mn>2</mn><mspace width="0.167em"></mspace><mi>W</mi><mo stretchy="false">(</mo><mi>n</mi><mo>/</mo><mn>2</mn><mo stretchy="false">)</mo><mo>+</mo><mi>c</mi><mspace width="0.167em"></mspace><mi>n</mi></mrow></math> for some constant factor <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>c</mi></mrow></math>. By the <a href="http://en.wikipedia.org/wiki/Master_theorem" title="Wikipedia entry">Master theorem</a>, therefore, the work done is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math>. (Use case 2, with <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>a</mi><mo>=</mo><mi>b</mi><mo>=</mo><mn>2</mn></mrow></math>, <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>f</mi><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo><mo>=</mo><mi>c</mi><mspace width="0.167em"></mspace><mi>n</mi></mrow></math>, and <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>k</mi><mo>=</mo><mn>0</mn></mrow></math>.)</p>
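<p>As a quick sanity check on that recurrence, here&#8217;s a numeric rendering with the constants arbitrarily set to one (the name <code>work</code> is mine, for illustration only, with <code>c = 1</code> and <code>W(1) = 1</code>):</p>

<pre class="sourceCode"><code class="sourceCode haskell">-- Work for scanning a balanced top-down tree:<br />-- two half-size recursive scans plus linear patch-up.<br />work :: Int -&gt; Int<br />work 1 = 1<br />work n = 2 * work (n `div` 2) + n<br /><br />-- work 1024 == 11264, i.e. n * (lg n + 1) for n = 1024</code></pre>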

<p>Again assuming a <em>balanced</em> tree, the computation dependencies have logarithmic depth, so the ideal parallel running time (assuming sufficient processors) is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>log</mi><mi>n</mi><mo stretchy="false">)</mo></mrow></math>. Thus we have an algorithm that is depth-efficient (modulo constant factors) but work-inefficient.</p>

<h4 id="composition">Composition</h4>

<p>A binary tree as defined above is either a leaf or a pair of binary trees. We can make this pair-ness more explicit with a reformulation:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T2</span> a <span class="fu">=</span> <span class="dt">L2</span> a <span class="fu">|</span> <span class="dt">B2</span> (<span class="dt">Pair</span> (<span class="dt">T2</span> a)) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>where <code>Pair</code>, as in <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>, is defined as</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Pair</span> a <span class="fu">=</span> a <span class="fu">:#</span> a <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>or even</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">Pair</span> <span class="fu">=</span> <span class="dt">Id</span> &#215; <span class="dt">Id</span></code></pre>

<p>For encoding and decoding, we could use the same representation as with <code>T1</code>, but let&#8217;s instead use a more natural one for the definition of <code>T2</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T2</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T2</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">Pair</span> &#8728; <span class="dt">T2</span><br />  encode (<span class="dt">L2</span> a)  <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B2</span> st) <span class="fu">=</span> <span class="dt">InR</span> (<span class="dt">O</span> st)<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a)) <span class="fu">=</span> <span class="dt">L2</span> a<br />  decode (<span class="dt">InR</span> (<span class="dt">O</span> st)) <span class="fu">=</span> <span class="dt">B2</span> st</code></pre>

<p>Boilerplate scanning:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">T2</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>

<p>for which we&#8217;ll need an applicative instance:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Applicative</span> <span class="dt">T2</span> <span class="kw">where</span><br />  pure <span class="fu">=</span> <span class="dt">L2</span><br />  <span class="dt">L2</span> f <span class="fu">&lt;*&gt;</span> <span class="dt">L2</span> x <span class="fu">=</span> <span class="dt">L2</span> (f x)<br />  <span class="dt">B2</span> (fs <span class="fu">:#</span> gs) <span class="fu">&lt;*&gt;</span> <span class="dt">B2</span> (xs <span class="fu">:#</span> ys) <span class="fu">=</span> <span class="dt">B2</span> ((fs <span class="fu">&lt;*&gt;</span> xs) <span class="fu">:#</span> (gs <span class="fu">&lt;*&gt;</span> ys))<br />  _ <span class="fu">&lt;*&gt;</span> _ <span class="fu">=</span> <span class="fu">error</span> <span class="st">&quot;T2 (&lt;*&gt;): structure mismatch&quot;</span></code></pre>

<p>The <code>O</code> constructor is for functor composition.</p>

<p>With a small change to the tree type, we can make the composition of <code>Pair</code> and <code>T</code> more explicit:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T3</span> a <span class="fu">=</span> <span class="dt">L3</span> a <span class="fu">|</span> <span class="dt">B3</span> ((<span class="dt">Pair</span> &#8728; <span class="dt">T3</span>) a) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>Then the conversion becomes even simpler, since there&#8217;s no need to add or remove <code>O</code> wrappers:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T3</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T3</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">Pair</span> &#8728; <span class="dt">T3</span><br />  encode (<span class="dt">L3</span> a)  <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B3</span> st) <span class="fu">=</span> <span class="dt">InR</span> st<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a)) <span class="fu">=</span> <span class="dt">L3</span> a<br />  decode (<span class="dt">InR</span> st)     <span class="fu">=</span> <span class="dt">B3</span> st</code></pre>

<h4 id="bottom-up-trees">Bottom-up trees</h4>

<p>In the formulations above, a non-leaf tree consists of a pair of trees. I&#8217;ll call these trees &quot;top-down&quot;, since visible pair structure begins at the top.</p>

<p>With a very small change, we can instead use a tree of pairs:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T4</span> a <span class="fu">=</span> <span class="dt">L4</span> a <span class="fu">|</span> <span class="dt">B4</span> (<span class="dt">T4</span> (<span class="dt">Pair</span> a)) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>Again an applicative instance allows a standard <code>Scan</code> instance:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">T4</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc<br /><br /><span class="kw">instance</span> <span class="dt">Applicative</span> <span class="dt">T4</span> <span class="kw">where</span><br />  pure <span class="fu">=</span> <span class="dt">L4</span><br />  <span class="dt">L4</span> f   <span class="fu">&lt;*&gt;</span> <span class="dt">L4</span> x   <span class="fu">=</span> <span class="dt">L4</span> (f x)<br />  <span class="dt">B4</span> fgs <span class="fu">&lt;*&gt;</span> <span class="dt">B4</span> xys <span class="fu">=</span> <span class="dt">B4</span> (liftA2 h fgs xys)<br />   <span class="kw">where</span> h (f <span class="fu">:#</span> g) (x <span class="fu">:#</span> y) <span class="fu">=</span> f x <span class="fu">:#</span> g y<br />  _ <span class="fu">&lt;*&gt;</span> _ <span class="fu">=</span> <span class="fu">error</span> <span class="st">&quot;T4 (&lt;*&gt;): structure mismatch&quot;</span></code></pre>

<p>or a more explicitly composed form:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T5</span> a <span class="fu">=</span> <span class="dt">L5</span> a <span class="fu">|</span> <span class="dt">B5</span> ((<span class="dt">T5</span> &#8728; <span class="dt">Pair</span>) a) <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>I&#8217;ll call these new variations &quot;bottom-up&quot; trees, since visible pair structure begins at the bottom. After stripping off the branch constructor, <code>B4</code>, we can get at the pair-valued leaves by means of <code>fmap</code>, <code>fold</code>, or <code>traverse</code> (or variations). For <code>B5</code>, we&#8217;d also have to strip off the <code>O</code> wrapper (functor composition).</p>
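<p>To make the structural difference concrete, here&#8217;s a four-element example of each shape (with made-up numeric leaves). In the top-down tree the pairing is at the root; in the bottom-up tree it&#8217;s at the leaves:</p>

<pre class="sourceCode"><code class="sourceCode haskell">td :: T1 Int<br />td = B1 (B1 (L1 1) (L1 2)) (B1 (L1 3) (L1 4))<br /><br />bu :: T4 Int<br />bu = B4 (B4 (L4 ((1 :# 2) :# (3 :# 4))))</code></pre>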

<p>Encoding is nearly the same as with top-down trees. For instance,</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">T4</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T4</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">T4</span> &#8728; <span class="dt">Pair</span><br />  encode (<span class="dt">L4</span> a) <span class="fu">=</span> <span class="dt">InL</span> (<span class="dt">Id</span> a)<br />  encode (<span class="dt">B4</span> t) <span class="fu">=</span> <span class="dt">InR</span> (<span class="dt">O</span> t)<br />  decode (<span class="dt">InL</span> (<span class="dt">Id</span> a)) <span class="fu">=</span> <span class="dt">L4</span> a<br />  decode (<span class="dt">InR</span> (<span class="dt">O</span> t))  <span class="fu">=</span> <span class="dt">B4</span> t</code></pre>

<h3 id="scanning-pairs">Scanning pairs</h3>

<p>We&#8217;ll need to scan on the <code>Pair</code> functor. If we use the definition of <code>Pair</code> above in terms of <code>Id</code> and <code>(×)</code>, then we&#8217;ll get scanning for free. For <em>using</em> <code>Pair</code>, I find the explicit data type definition above more convenient. We can then derive a <code>Scan</code> instance by conversion. Start with a standard specification:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Pair</span> a <span class="fu">=</span> a <span class="fu">:#</span> a <span class="kw">deriving</span> <span class="kw">Functor</span></code></pre>

<p>And encode &amp; decode explicitly:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">EncodeF</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  <span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">Pair</span> <span class="fu">=</span> <span class="dt">Id</span> &#215; <span class="dt">Id</span><br />  encode (a <span class="fu">:#</span> b) <span class="fu">=</span> <span class="dt">Id</span> a &#215; <span class="dt">Id</span> b<br />  decode (<span class="dt">Id</span> a &#215; <span class="dt">Id</span> b) <span class="fu">=</span> a <span class="fu">:#</span> b</code></pre>

<p>Then use our boilerplate <code>Scan</code> instance for <code>EncodeF</code> instances:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> prefixScanEnc<br />  suffixScan <span class="fu">=</span> suffixScanEnc</code></pre>

<p>We&#8217;ve seen the <code>Scan</code> instance for <code>(×)</code> above. The instance for <code>Id</code> is very simple:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">Id</span> a <span class="fu">=</span> <span class="dt">Id</span> a<br /><br /><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Id</span> <span class="kw">where</span><br />  prefixScan (<span class="dt">Id</span> m) <span class="fu">=</span> (m, <span class="dt">Id</span> &#8709;)<br />  suffixScan        <span class="fu">=</span> prefixScan</code></pre>

<p>Given these definitions, we can calculate a more streamlined <code>Scan</code> instance for <code>Pair</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (a <span class="fu">:#</span> b)<br />&#8801;  <span class="co">{- specification -}</span><br />  prefixScanEnc (a <span class="fu">:#</span> b)<br />&#8801;  <span class="co">{- prefixScanEnc definition -}</span><br />  (second decode &#8728; prefixScan &#8728; encode) (a <span class="fu">:#</span> b)<br />&#8801;  <span class="co">{- (&#8728;) -}</span><br />  second decode (prefixScan (encode (a <span class="fu">:#</span> b)))<br />&#8801;  <span class="co">{- encode definition for Pair -}</span><br />  second decode (prefixScan (<span class="dt">Id</span> a &#215; <span class="dt">Id</span> b))<br />&#8801;  <span class="co">{- prefixScan definition for f &#215; g -}</span><br />  second decode<br />    (af &#8853; ag, fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga'))<br />     <span class="kw">where</span> (af,fa') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> a)<br />           (ag,ga') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> b)<br />&#8801;  <span class="co">{- Definition of second on functions -}</span><br />  (af &#8853; ag, decode (fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga')))<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> a)<br />         (ag,ga') <span class="fu">=</span> prefixScan (<span class="dt">Id</span> b)<br />&#8801;  <span class="co">{- prefixScan definition for Id -}</span><br />  (af &#8853; ag, decode (fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga')))<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> (a, <span class="dt">Id</span> &#8709;)<br />         (ag,ga') <span class="fu">=</span> (b, <span class="dt">Id</span> &#8709;)<br />&#8801;  <span class="co">{- substitution -}</span><br />  (a &#8853; b, decode (<span class="dt">Id</span> &#8709; &#215; ((a &#8853;) <span 
class="fu">&lt;$&gt;</span> <span class="dt">Id</span> &#8709;)))<br />&#8801;  <span class="co">{- fmap/(&lt;$&gt;) for Id -}</span><br />  (a &#8853; b, decode (<span class="dt">Id</span> &#8709; &#215; <span class="dt">Id</span> (a &#8853; &#8709;)))<br />&#8801;  <span class="co">{- Monoid law -}</span><br />  (a &#8853; b, decode (<span class="dt">Id</span> &#8709; &#215; <span class="dt">Id</span> a))<br />&#8801;  <span class="co">{- decode definition on Pair -}</span><br />  (a &#8853; b, (&#8709; <span class="fu">:#</span> a))</code></pre>

<p>Whew! And similarly for <code>suffixScan</code>.</p>

<p>Now let&#8217;s recall the <code>Scan</code> instance for <code>Pair</code> given in <a href="http://conal.net/blog/posts/composable-parallel-scanning/" title="blog post"><em>Composable parallel scanning</em></a>:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  prefixScan (a <span class="fu">:#</span> b) <span class="fu">=</span> (a &#8853; b, (&#8709; <span class="fu">:#</span> a))<br />  suffixScan (a <span class="fu">:#</span> b) <span class="fu">=</span> (a &#8853; b, (b <span class="fu">:#</span> &#8709;))</code></pre>

<p>Hurray! The derivation led us to the same definition. A &quot;sufficiently smart&quot; compiler could do this derivation automatically.</p>
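<p>For instance, with the <code>Sum</code> monoid from <code>Data.Monoid</code>, this instance gives</p>

<pre class="sourceCode"><code class="sourceCode haskell">prefixScan (Sum 3 :# Sum 4) == (Sum 7, Sum 0 :# Sum 3)<br />suffixScan (Sum 3 :# Sum 4) == (Sum 7, Sum 4 :# Sum 0)</code></pre>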

<p>With this warm-up derivation, let&#8217;s now turn to trees.</p>

<h3 id="scanning-trees">Scanning trees</h3>

<p>Given the tree encodings above, how does scan work? We&#8217;ll have to consult <code>Scan</code> instances for some of the functor combinators. The product instance is repeated above. We&#8217;ll also want the instances for sum and composition. Omitting the <code>suffixScan</code> definitions for brevity:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> (f <span class="fu">+</span> g) a <span class="fu">=</span> <span class="dt">InL</span> (f a) <span class="fu">|</span> <span class="dt">InR</span> (g a)<br /><br /><span class="kw">instance</span> (<span class="dt">Scan</span> f, <span class="dt">Scan</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (f <span class="fu">+</span> g) <span class="kw">where</span><br />  prefixScan (<span class="dt">InL</span> fa) <span class="fu">=</span> second <span class="dt">InL</span> (prefixScan fa)<br />  prefixScan (<span class="dt">InR</span> ga) <span class="fu">=</span> second <span class="dt">InR</span> (prefixScan ga)<br /><br /><span class="kw">newtype</span> (g &#8728; f) a <span class="fu">=</span> <span class="dt">O</span> (g (f a))<br /><br /><span class="kw">instance</span> (<span class="dt">Scan</span> g, <span class="dt">Scan</span> f, <span class="kw">Functor</span> f, <span class="dt">Applicative</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (g &#8728; f) <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>)<br />             &#8728; assocR<br />             &#8728; first prefixScan<br />             &#8728; <span class="fu">unzip</span><br />             &#8728; <span class="fu">fmap</span> prefixScan<br />             &#8728; unO</code></pre>

<p>This last definition uses a few utility functions:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">zip</span> <span class="ot">&#8759;</span> <span class="dt">Applicative</span> g <span class="ot">&#8658;</span> (g a, g b) <span class="ot">&#8594;</span> g (a,b)<br /><span class="fu">zip</span> <span class="fu">=</span> <span class="fu">uncurry</span> (liftA2 (,))<br /><br /><span class="fu">unzip</span> <span class="ot">&#8759;</span> <span class="kw">Functor</span> g <span class="ot">&#8658;</span> g (a,b) <span class="ot">&#8594;</span> (g a, g b)<br /><span class="fu">unzip</span> <span class="fu">=</span> <span class="fu">fmap</span> <span class="fu">fst</span> <span class="fu">&amp;&amp;&amp;</span> <span class="fu">fmap</span> <span class="fu">snd</span><br /><br />assocR <span class="ot">&#8759;</span> ((a,b),c) <span class="ot">&#8594;</span> (a,(b,c))<br />assocR   ((a,b),c) <span class="fu">=</span>  (a,(b,c))<br /><br />adjustL <span class="ot">&#8759;</span> (<span class="kw">Functor</span> f, <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> (m, f m) <span class="ot">&#8594;</span> f m<br />adjustL (m, ms) <span class="fu">=</span> (m &#8853;) <span class="fu">&lt;$&gt;</span> ms</code></pre>
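<p>For intuition, here&#8217;s how these utilities behave on <code>Pair</code> (assuming the usual &#8220;zippy&#8221; <code>Applicative</code> instance for <code>Pair</code>, i.e., <code>(f :# g) &lt;*&gt; (x :# y) = f x :# g y</code>):</p>

<pre class="sourceCode"><code class="sourceCode haskell">zip (1 :# 2, 10 :# 20)            == (1,10) :# (2,20)<br />unzip ((1,10) :# (2,20))          == (1 :# 2, 10 :# 20)<br />adjustL (Sum 1, Sum 10 :# Sum 20) == Sum 11 :# Sum 21</code></pre>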

<p>Let&#8217;s consider how the <code>Scan (g ∘ f)</code> instance plays out for top-down vs bottom-up trees, given the functor-composition encodings above. The critical definitions:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T2</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">Pair</span> &#8728; <span class="dt">T2</span><br /><br /><span class="kw">type</span> <span class="dt">Enc</span> <span class="dt">T4</span> <span class="fu">=</span> <span class="dt">Id</span> <span class="fu">+</span> <span class="dt">T4</span> &#8728; <span class="dt">Pair</span></code></pre>

<p>Focusing on the branch case, we have <code>Pair ∘ T2</code> vs <code>T4 ∘ Pair</code>, so we&#8217;ll use the <code>Scan (g ∘ f)</code> instance either way. Let&#8217;s consider the work implied by that instance. There are two calls to <code>prefixScan</code>, plus a linear amount of other work. The meanings of those two calls differ, however:</p>

<ul>
<li>For top-down trees (<code>T2</code>), the recursive tree scans are in <code>fmap prefixScan</code>, mapping over the pair of trees. The <code>first prefixScan</code> is a pair scan and so does constant work. Since there are two recursive calls, each working on a tree of half size (assuming balance), plus linear other work, the total work is <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math>, as explained above.</li>
<li>For bottom-up trees (<code>T4</code>), there is only one recursive tree scan, which appears in <code>first prefixScan</code>. The <code>prefixScan</code> in <code>fmap prefixScan</code> is a pair scan and so does constant work, but it is mapped over the half-sized tree (of pairs), and so does linear work altogether. Since there is only one recursive tree scan, at half size, plus linear other work, the total work is then proportional to <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mo>+</mo><mi>n</mi><mo>/</mo><mn>2</mn><mo>+</mo><mi>n</mi><mo>/</mo><mn>4</mn><mo>+</mo><mo>&#8230;</mo><mo>&#8776;</mo><mn>2</mn><mspace width="0.167em"></mspace><mi>n</mi><mo>=</mo><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math>. So we have a work-efficient algorithm!</li>
</ul>
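<p>The bottom-up case corresponds to the recurrence <code>W(n) = W(n/2) + c n</code>, whose unrolling is exactly the geometric series above. A quick numeric check, again with the constants arbitrarily set to one (my naming, for illustration only):</p>

<pre class="sourceCode"><code class="sourceCode haskell">-- Work for scanning a balanced bottom-up tree:<br />-- one half-size recursive scan plus linear patch-up.<br />workBU :: Int -&gt; Int<br />workBU 1 = 1<br />workBU n = workBU (n `div` 2) + n<br /><br />-- workBU 1024 == 2047, just under 2 * n</code></pre>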

<h3 id="looking-deeper">Looking deeper</h3>

<p>In addition to the simple analysis above of scanning over top-down and over bottom-up, let&#8217;s look in detail at what transpires and how each case can be optimized. This section may well have more detail than you&#8217;re interested in. If so, feel free to skip ahead.</p>

<h4 id="top-down">Top-down</h4>

<p>Beginning as with <code>Pair</code>,</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan t<br />&#8801;  <span class="co">{- specification -}</span><br />  prefixScanEnc t<br />&#8801;  <span class="co">{- prefixScanEnc definition -}</span><br />  (second decode &#8728; prefixScan &#8728; encode) t<br />&#8801;  <span class="co">{- (&#8728;) -}</span><br />  second decode (prefixScan (encode t))</code></pre>

<p>Take <code>T2</code>, with <code>T3</code> being quite similar. Now split into two cases for the two constructors of <code>T2</code>. First leaf:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">L2</span> m)<br />&#8801;  <span class="co">{- as above -}</span><br />  second decode (prefixScan (encode (<span class="dt">L2</span> m)))<br />&#8801;  <span class="co">{- encode for L2 -}</span><br />  second decode (prefixScan (<span class="dt">InL</span> (<span class="dt">Id</span> m)))<br />&#8801;  <span class="co">{- prefixScan for functor sum -}</span><br />  second decode (second <span class="dt">InL</span> (prefixScan (<span class="dt">Id</span> m)))<br />&#8801;  <span class="co">{- prefixScan for Id -}</span><br />  second decode (second <span class="dt">InL</span> (m, <span class="dt">Id</span> &#8709;))<br />&#8801;  <span class="co">{- second for functions -}</span><br />  second decode (m, <span class="dt">InL</span> (<span class="dt">Id</span> &#8709;))<br />&#8801;  <span class="co">{- second for functions -}</span><br />  (m, decode (<span class="dt">InL</span> (<span class="dt">Id</span> &#8709;)))<br />&#8801;  <span class="co">{- decode for L2 -}</span><br />  (m, <span class="dt">L2</span> &#8709;)</code></pre>

<p>Then branch:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B2</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- as above -}</span><br />  second decode (prefixScan (encode (<span class="dt">B2</span> (s <span class="fu">:#</span> t))))<br />&#8801;  <span class="co">{- encode for B2 -}</span><br />  second decode (prefixScan (<span class="dt">InR</span> (<span class="dt">O</span> (s <span class="fu">:#</span> t))))<br />&#8801;  <span class="co">{- prefixScan for (+) -}</span><br />  second decode (second <span class="dt">InR</span> (prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t))))<br />&#8801;  <span class="co">{- property of second -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t)))</code></pre>

<p>Focus on the <code>prefixScan</code> application:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- prefixScan for (&#8728;) -}</span><br /> ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br /> &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan &#8728; unO ) (<span class="dt">O</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- unO/O -}</span><br />  ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br />  &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan ) (s <span class="fu">:#</span> t)<br />&#8801;  <span class="co">{- fmap on Pair -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan &#8728; <span class="fu">unzip</span>)<br />    (prefixScan s <span class="fu">:#</span> prefixScan t)<br />&#8801;  <span class="co">{- expand prefixScan -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan &#8728; <span class="fu">unzip</span>)<br />    ((ms,s') <span class="fu">:#</span> (mt,t'))<br />      <span class="kw">where</span> (ms,s') <span class="fu">=</span> prefixScan s<br />            (mt,t') <span class="fu">=</span> prefixScan t<br />&#8801;  <span class="co">{- unzip -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan)<br />    ((ms <span class="fu">:#</span> mt), (s' <span class="fu">:#</span> t')) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- first -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    (prefixScan (ms <span class="fu">:#</span> mt), (s' <span class="fu">:#</span> t')) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- prefixScan for Pair -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    ((ms &#8853; mt, (&#8709; <span class="fu">:#</span> ms)), (s' <span class="fu">:#</span> t')) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- assocR -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>))<br />    (ms &#8853; mt, ((&#8709; <span class="fu">:#</span> ms), (s' <span class="fu">:#</span> t'))) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- second -}</span><br />  ( ms &#8853; mt<br />  , (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) ((&#8709; <span class="fu">:#</span> ms), (s' <span class="fu">:#</span> t')) ) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- zip -}</span><br />  ( ms &#8853; mt<br />  , (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL) ((&#8709;,s') <span class="fu">:#</span> (ms,t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- fmap for Pair -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> (adjustL (&#8709;,s') <span class="fu">:#</span> adjustL (ms,t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- adjustL -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> (((&#8709; &#8853;) <span class="fu">&lt;$&gt;</span> s') <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- Monoid law (left identity) -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> ((<span class="fu">id</span> <span class="fu">&lt;$&gt;</span> s') <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')) )  <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- Functor law (fmap id) -}</span><br />  ( ms &#8853; mt<br />  , <span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')) )<br />      <span class="kw">where</span> (ms,s') <span class="fu">=</span> prefixScan s<br />            (mt,t') <span class="fu">=</span> prefixScan t</code></pre>

<p>Continuing from above,</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B2</span> (s <span class="fu">:#</span> t))<br />&#8801;  <span class="co">{- see above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> (s <span class="fu">:#</span> t)))<br />&#8801;  <span class="co">{- prefixScan focus from above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>)<br />    ( ms &#8853; mt<br />    , <span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')) )<br />        <span class="kw">where</span> (ms,s') <span class="fu">=</span> prefixScan s<br />              (mt,t') <span class="fu">=</span> prefixScan t<br />&#8801;  <span class="co">{- definition of second on functions -}</span><br />    (ms &#8853; mt, (decode &#8728; <span class="dt">InR</span>) (<span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t')))) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- (&#8728;) -}</span><br />    (ms &#8853; mt, decode (<span class="dt">InR</span> (<span class="dt">O</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t'))))) <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- decode for B2 -}</span><br />    (ms &#8853; mt, <span class="dt">B2</span> (s' <span class="fu">:#</span> ((ms &#8853;) <span class="fu">&lt;$&gt;</span> t'))) <span class="kw">where</span> &#8943;</code></pre>

<p>This final form is as in <a href="http://conal.net/blog/posts/deriving-parallel-tree-scans/" title="blog post"><em>Deriving parallel tree scans</em></a>, changed for the new scan interface. The derivation eliminated some wrapping, unwrapping, and method-invocation overhead, as well as one of the two adjustment passes over the sub-trees. As explained above, this algorithm performs <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math> work.</p>

<p>I&#8217;ll leave <code>suffixScan</code> for you to do yourself.</p>

<h4 id="bottom-up">Bottom-up</h4>

<p>What happens if we switch from top-down to bottom-up binary trees? I&#8217;ll use <code>T4</code> (though <code>T5</code> would work as well):</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">T4</span> a <span class="fu">=</span> <span class="dt">L4</span> a <span class="fu">|</span> <span class="dt">B4</span> (<span class="dt">T4</span> (<span class="dt">Pair</span> a))</code></pre>
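<p>For concreteness, here is a hypothetical four-element value of this type, along with a flattening helper (the names <code>t4</code> and <code>flat4</code> are mine, not from the post):</p>

```haskell
-- Pair as defined in the earlier posts; T4 as above.
data Pair a = a :# a deriving Show
data T4 a = L4 a | B4 (T4 (Pair a)) deriving Show

-- A four-element bottom-up tree: one B4 per level, with all the
-- pairing accumulated at the single leaf.
t4 :: T4 Int
t4 = B4 (B4 (L4 ((1 :# 2) :# (3 :# 4))))

-- Flatten left-to-right (a checking aid, not part of the post).
flat4 :: T4 a -> [a]
flat4 (L4 a) = [a]
flat4 (B4 t) = concatMap (\(a :# b) -> [a, b]) (flat4 t)
```

<p>Note how the element type, rather than the tree structure, grows with depth: the single leaf holds a <code>Pair (Pair Int)</code>.</p>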

<p>The leaf case is just as with <code>T2</code> above, so let&#8217;s get right to branches.</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B4</span> t)<br />&#8801;  <span class="co">{- as above -}</span><br />  second decode (prefixScan (encode (<span class="dt">B4</span> t)))<br />&#8801;  <span class="co">{- encode for B4 -}</span><br />  second decode (prefixScan (<span class="dt">InR</span> (<span class="dt">O</span> t)))<br />&#8801;  <span class="co">{- prefixScan for (+) -}</span><br />  second decode (second <span class="dt">InR</span> (prefixScan (<span class="dt">O</span> t)))<br />&#8801;  <span class="co">{- property of second -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> t))</code></pre>

<p>As before, now focus on the <code>prefixScan</code> call.</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">O</span> t)<br />&#8801;  <span class="co">{- prefixScan for (&#8728;) -}</span><br /> ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br /> &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan &#8728; unO ) (<span class="dt">O</span> t)<br />&#8801;  <span class="co">{- unO/O -}</span><br />  ( second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan<br />  &#8728; <span class="fu">unzip</span> &#8728; <span class="fu">fmap</span> prefixScan ) t<br />&#8801;  <span class="co">{- prefixScan on Pair (derived above) -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan &#8728; <span class="fu">unzip</span>)<br />    (<span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b, (&#8709; <span class="fu">:#</span> a))) t)<br />&#8801;  <span class="co">{- unzip/fmap -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR &#8728; first prefixScan)<br />    ( <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t<br />    , <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a))   t )<br />&#8801;  <span class="co">{- first on functions -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    ( prefixScan (<span
class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)<br />    , <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a))   t )<br />&#8801;  <span class="co">{- expand prefixScan -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) &#8728; assocR)<br />    ((mp,p'), <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a)) t)<br />   <span class="kw">where</span> (mp,p') <span class="fu">=</span> prefixScan (<span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)<br />&#8801;  <span class="co">{- assocR -}</span><br />  (second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>))<br />    (mp, (p', <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a)) t))<br />   <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- second on functions -}</span><br />  (mp, (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>) (p', <span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (&#8709; <span class="fu">:#</span> a)) t))<br />    <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- fmap/zip/fmap -}</span><br />  (mp, <span class="dt">O</span> (liftA2 tweak p' t))<br />    <span class="kw">where</span> tweak s (a <span class="fu">:#</span> _) <span class="fu">=</span> adjustL (s, (&#8709; <span class="fu">:#</span> a))<br />          (mp,p') <span class="fu">=</span> prefixScan (<span class="fu">fmap</span> (&#955; (a <span 
class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)<br />&#8801;  <span class="co">{- adjustL, then simplify -}</span><br />  (mp, <span class="dt">O</span> (liftA2 tweak p' t))<br />    <span class="kw">where</span> tweak s (a <span class="fu">:#</span> _) <span class="fu">=</span> (s <span class="fu">:#</span> s &#8853; a)<br />          (mp,p') <span class="fu">=</span> prefixScan (<span class="fu">fmap</span> (&#955; (a <span class="fu">:#</span> b) <span class="ot">&#8594;</span> (a &#8853; b)) t)</code></pre>

<p>Now re-introduce the context of <code>prefixScan (O t)</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan (<span class="dt">B4</span> t)<br />&#8801;  <span class="co">{- see above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>) (prefixScan (<span class="dt">O</span> t))<br />&#8801;  <span class="co">{- see above -}</span><br />  second (decode &#8728; <span class="dt">InR</span>)<br />    (mp, <span class="dt">O</span> (liftA2 tweak p' t))<br />      <span class="kw">where</span> &#8943;<br />&#8801;  <span class="co">{- decode for T4 -}</span><br />  (mp, <span class="dt">B4</span> (liftA2 tweak p' t))<br />    <span class="kw">where</span> p <span class="fu">=</span> <span class="fu">fmap</span> (&#955; (e <span class="fu">:#</span> o) <span class="ot">&#8594;</span> (e &#8853; o)) t<br />          (mp,p') <span class="fu">=</span> prefixScan p<br />          tweak s (e <span class="fu">:#</span> _) <span class="fu">=</span> (s <span class="fu">:#</span> s &#8853; e)</code></pre>

<p>Notice how much this bottom-up tree scan algorithm differs from the top-down algorithm derived above. In particular, there&#8217;s only one recursive tree scan (on a half-sized tree) instead of two, plus linear additional work, for a total of <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mo stretchy="false">(</mo><mi>n</mi><mo stretchy="false">)</mo></mrow></math> work.</p>
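<p>As a sanity check on the derivation, the final <code>B4</code> case transcribes into a small standalone program. This is a sketch rather than the post&#8217;s actual module: <code>prefixScanT4</code> stands in for the class method, <code>mempty</code>/<code>&lt;&gt;</code> for &#8709;/&#8853;, and <code>zipT4</code> plays the role of the <code>liftA2</code> used above, zipping two same-shape trees position-wise.</p>

```haskell
{-# LANGUAGE DeriveFunctor #-}
import Data.Monoid (Sum (..))

-- Pair and T4 as in the post; the rest is scaffolding for the check.
data Pair a = a :# a deriving (Eq, Show, Functor)
data T4 a = L4 a | B4 (T4 (Pair a)) deriving (Eq, Show, Functor)

-- Position-wise zip of two same-shape trees (plays the role of liftA2).
zipT4 :: (a -> b -> c) -> T4 a -> T4 b -> T4 c
zipT4 f (L4 a) (L4 b) = L4 (f a b)
zipT4 f (B4 s) (B4 t) =
  B4 (zipT4 (\(a :# a') (b :# b') -> f a b :# f a' b') s t)
zipT4 _ _ _ = error "zipT4: shape mismatch"

-- The derived algorithm: one recursive scan on a half-sized tree,
-- plus linear extra work.
prefixScanT4 :: Monoid m => T4 m -> (m, T4 m)
prefixScanT4 (L4 m) = (m, L4 mempty)
prefixScanT4 (B4 t) = (mp, B4 (zipT4 tweak p' t))
  where
    p        = fmap (\(e :# o) -> e <> o) t
    (mp, p') = prefixScanT4 p
    tweak s (e :# _) = s :# (s <> e)
```

<p>On <code>B4 (B4 (L4 ((Sum 1 :# Sum 2) :# (Sum 3 :# Sum 4))))</code> this yields the total <code>Sum 10</code> together with the exclusive prefixes <code>0, 1, 3, 6</code>, matching <code>init (scanl (&lt;&gt;) mempty &#8943;)</code> on the flattened elements.</p>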

<h3 id="guy-blellochs-parallel-scan-algorithm">Guy Blelloch&#8217;s parallel scan algorithm</h3>

<p>In <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.53.5739" title="Paper by Guy Blelloch"><em>Programming parallel algorithms</em></a>, Guy Blelloch gives the following algorithm for parallel prefix scan, expressed in the parallel functional language NESL:</p>

<pre class="sourceCode"><code class="sourceCode haskell">function scan(a) <span class="fu">=</span><br /><span class="kw">if</span> <span class="fu">#</span>a &#8801; <span class="dv">1</span> <span class="kw">then</span> [<span class="dv">0</span>]<br /><span class="kw">else</span><br />  <span class="kw">let</span> es <span class="fu">=</span> even_elts(a);<br />      os <span class="fu">=</span> odd_elts(a);<br />      ss <span class="fu">=</span> scan({e<span class="fu">+</span>o<span class="fu">:</span> e <span class="kw">in</span> es; o <span class="kw">in</span> os})<br />  <span class="kw">in</span> interleave(ss,{s<span class="fu">+</span>e<span class="fu">:</span> s <span class="kw">in</span> ss; e <span class="kw">in</span> es})</code></pre>

<p>This algorithm is nearly identical to the <code>T4</code> scan algorithm above. I was very glad to find this route to Guy&#8217;s algorithm, which had been fairly mysterious to me. I mean, I could believe that the algorithm worked, but I had no idea how I might have discovered it myself. With the functor composition approach to scanning, I now see how Guy&#8217;s algorithm emerges as well as how it generalizes to other data structures.</p>
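<p>For comparison, Guy&#8217;s NESL code transcribes almost directly into Haskell lists. The following is a sketch, specialized to <code>Int</code> addition and assuming a power-of-two input length (as in the NESL setting); the helper names mirror NESL&#8217;s.</p>

```haskell
-- Blelloch-style exclusive scan over addition; input length a power of two.
scanB :: [Int] -> [Int]
scanB [_] = [0]
scanB a   = interleave ss (zipWith (+) ss es)
  where
    es = evenElts a
    os = oddElts a
    ss = scanB (zipWith (+) es os)

-- Elements at even and odd positions, respectively.
evenElts, oddElts :: [a] -> [a]
evenElts []       = []
evenElts (x : xs) = x : oddElts xs
oddElts  []       = []
oddElts  (_ : xs) = evenElts xs

-- Alternate elements from two equal-length lists.
interleave :: [a] -> [a] -> [a]
interleave (x : xs) (y : ys) = x : y : interleave xs ys
interleave _        _        = []
```

<p>For example, <code>scanB [1,2,3,4]</code> yields <code>[0,1,3,6]</code>, which is <code>init (scanl (+) 0 [1,2,3,4])</code>.</p>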

<h3 id="nested-data-types-and-parallelism">Nested data types and parallelism</h3>

<p>Most of the recursive algebraic data types that appear in Haskell programs are <em>regular</em>, meaning that the recursive instances are instantiated with the same type parameter as the containing type. For instance, a top-down tree of elements of type <code>a</code> is either a leaf or has two trees whose elements have that same type <code>a</code>. In contrast, in a bottom-up tree, the (single) recursively contained tree is over elements of type <code>(a,a)</code>. Such non-regular data types are called &quot;nested&quot;. The two tree scan algorithms above suggest to me that nested data types are particularly useful for efficient parallel algorithms.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/parallel-tree-scanning-by-composition/feed</wfw:commentRss>
		<slash:comments>7</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fparallel-tree-scanning-by-composition&amp;language=en_GB&amp;category=text&amp;title=Parallel+tree+scanning+by+composition&amp;description=My+last+few+blog+posts+have+been+on+the+theme+of+scans%2C+and+particularly+on+parallel+scans.+In+Composable+parallel+scanning%2C+I+tackled+parallel+scanning+in+a+very+general+setting....&amp;tags=functor%2Cprogram+derivation%2Cscan%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Composable parallel scanning</title>
		<link>http://conal.net/blog/posts/composable-parallel-scanning</link>
		<comments>http://conal.net/blog/posts/composable-parallel-scanning#comments</comments>
		<pubDate>Tue, 01 Mar 2011 22:33:36 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[scan]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=411</guid>
		<description><![CDATA[The post Deriving list scans gave a simple specification of the list-scanning functions scanl and scanr, and then transformed those specifications into the standard optimized implementations. Next, the post Deriving parallel tree scans adapted the specifications and derivations to a type of binary trees. The resulting implementations are parallel-friendly, but not work-efficient, in that they [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p>The post <a href="http://conal.net/blog/posts/deriving-list-scans/" title="blog post"><em>Deriving list scans</em></a> gave a simple specification of the list-scanning functions <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:scanl"><code>scanl</code></a> and <a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:scanr"><code>scanr</code></a>, and then transformed those specifications into the standard optimized implementations. Next, the post <a href="http://conal.net/blog/posts/deriving-parallel-tree-scans/" title="blog post"><em>Deriving parallel tree scans</em></a> adapted the specifications and derivations to a type of binary trees. The resulting implementations are parallel-friendly, but not work-efficient, in that they perform <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi></mrow></math> work, versus the linear work of the best-known sequential algorithm.</p>

<p>Besides the work-inefficiency, I don&#8217;t know how to extend the critical <code>initTs</code> and <code>tailTs</code> functions (analogs of <code>inits</code> and <code>tails</code> on lists) to depth-typed, perfectly balanced trees, of the sort I played with in <a href="http://conal.net/blog/posts/a-trie-for-length-typed-vectors/" title="blog post"><em>A trie for length-typed vectors</em></a> and <a href="http://conal.net/blog/posts/from-tries-to-trees/" title="blog post"><em>From tries to trees</em></a>. The difficulty I encounter is that the functions <code>initTs</code> and <code>tailTs</code> make unbalanced trees out of balanced ones, so I don&#8217;t know how to adapt the specifications when types prevent the existence of unbalanced trees.</p>

<p>This new post explores an approach to generalized scanning via type classes. After defining the classes and giving a simple example, I&#8217;ll give a simple &amp; general framework based on composing functor combinators.</p>

<p><strong>Edits:</strong></p>

<ul>
<li>2011-03-02: Fixed typo. &quot;constant functor is easiest&quot; (instead of &quot;identity functor&quot;). Thanks, frguybob.</li>
<li>2011-03-05: Removed final unfinished sentence.</li>
<li>2011-07-28: Replace &quot;<code>assocL</code>&quot; with &quot;<code>assocR</code>&quot; in <code>prefixScan</code> derivation for <code>g ∘ f</code>.</li>
</ul>

<p><span id="more-411"></span></p>

<h3 id="generalizing-list-scans">Generalizing list scans</h3>

<p>The left and right scan functions on lists have an awkward feature. The output list has one more element than the input list, corresponding to the fact that the number of prefixes (<a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:inits"><code>inits</code></a>) of a list is one more than the number of elements, and similarly for suffixes (<a href="http://hackage.haskell.org/packages/archive/base/latest/doc/html/Data-List.html#v:tails"><code>tails</code></a>).</p>

<p>While it&#8217;s easy to extend a list by adding one more element, it&#8217;s not easy with other functors. In <a href="http://conal.net/blog/posts/deriving-parallel-tree-scans/" title="blog post"><em>Deriving parallel tree scans</em></a>, I simply removed the <code>∅</code> element from the scan. In this post, I&#8217;ll instead change the interface to produce an output of exactly the same shape, plus one extra element. The extra element will equal a <code>fold</code> over the complete input. If you recall, we had to search for that complete fold in an input subtree in order to adjust the other subtree. (See <code>headT</code> and <code>lastT</code> and their generalizations in <a href="http://conal.net/blog/posts/deriving-parallel-tree-scans/" title="blog post"><em>Deriving parallel tree scans</em></a>.) Separating out this value eliminates the search.</p>

<p>Define a class with methods for prefix and suffix scan:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">class</span> <span class="dt">Scan</span> f <span class="kw">where</span><br />  prefixScan, suffixScan <span class="ot">&#8759;</span> <span class="dt">Monoid</span> m <span class="ot">&#8658;</span> f m <span class="ot">&#8594;</span> (m, f m)</code></pre>

<p>Prefix scans (<code>prefixScan</code>) accumulate moving left-to-right, while suffix scans (<code>suffixScan</code>) accumulate moving right-to-left.</p>

<h4 id="a-simple-example-pairs">A simple example: pairs</h4>

<p>To get a first sense of generalized scans, let&#8217;s see how to scan over a pair functor.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> <span class="dt">Pair</span> a <span class="fu">=</span> a <span class="fu">:#</span> a <span class="kw">deriving</span> (<span class="kw">Eq</span>,<span class="kw">Ord</span>,<span class="kw">Show</span>)</code></pre>

<p>With GHC&#8217;s <code>DeriveFunctor</code> option, we could also derive a <code>Functor</code> instance, but for clarity, define it explicitly:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="kw">Functor</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  <span class="fu">fmap</span> f (a <span class="fu">:#</span> b) <span class="fu">=</span> (f a <span class="fu">:#</span> f b)</code></pre>

<p>The scans:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  prefixScan (a <span class="fu">:#</span> b) <span class="fu">=</span> (a &#8853; b, (&#8709; <span class="fu">:#</span> a))<br />  suffixScan (a <span class="fu">:#</span> b) <span class="fu">=</span> (a &#8853; b, (b <span class="fu">:#</span> &#8709;))</code></pre>

<p>As you can see, if we eliminated the <code>∅</code> elements, we could shift to the left or right and forgo the extra result.</p>
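<p>To make this concrete, here is the <code>Pair</code> scan at the <code>Sum</code> monoid, as a standalone transcription (the names <code>prefixScanP</code> and <code>suffixScanP</code> stand in for the class methods, and <code>mempty</code>/<code>&lt;&gt;</code> for &#8709;/&#8853;):</p>

```haskell
import Data.Monoid (Sum (..))

data Pair a = a :# a deriving (Eq, Ord, Show)

-- The Pair scans from the post, as plain functions.
prefixScanP, suffixScanP :: Monoid m => Pair m -> (m, Pair m)
prefixScanP (a :# b) = (a <> b, mempty :# a)
suffixScanP (a :# b) = (a <> b, b :# mempty)
```

<p><code>prefixScanP (Sum 3 :# Sum 4)</code> is <code>(Sum 7, Sum 0 :# Sum 3)</code>: the fold over the whole pair, and at each position the fold of the elements strictly before it.</p>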

<p>Naturally, there is also a <code>Foldable</code> instance, and the scans produce the fold result as well as the sub-folds:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Foldable</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  fold (a <span class="fu">:#</span> b) <span class="fu">=</span> a &#8853; b</code></pre>

<p>The <code>Pair</code> functor also has unsurprising instances for <code>Applicative</code> and <code>Traversable</code>.</p>

<div class="toggle">

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Applicative</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  pure a <span class="fu">=</span> a <span class="fu">:#</span> a<br />  (f <span class="fu">:#</span> g) <span class="fu">&lt;*&gt;</span> (x <span class="fu">:#</span> y) <span class="fu">=</span> (f x <span class="fu">:#</span> g y)<br /><br /><span class="kw">instance</span> <span class="dt">Traversable</span> <span class="dt">Pair</span> <span class="kw">where</span><br />  sequenceA (fa <span class="fu">:#</span> fb) <span class="fu">=</span> (<span class="fu">:#</span>) <span class="fu">&lt;$&gt;</span> fa <span class="fu">&lt;*&gt;</span> fb</code></pre>

</div>

<p>We don&#8217;t really have to figure out how to define scans for every functor separately. We can instead look at how functors are composed out of their essential building blocks.</p>

<h3 id="scans-for-functor-combinators">Scans for functor combinators</h3>

<p>To see how to scan over a broad range of functors, let&#8217;s look at each of the functor combinators, e.g., as in <a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post"><em>Elegant memoization with higher-order types</em></a>.</p>

<h4 id="constant">Constant</h4>

<p>The constant functor is easiest.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">Const</span> x a <span class="fu">=</span> <span class="dt">Const</span> x</code></pre>

<p>There are no values to accumulate, so the final result (fold) is <code>∅</code>.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> (<span class="dt">Const</span> x) <span class="kw">where</span><br />  prefixScan (<span class="dt">Const</span> x) <span class="fu">=</span> (&#8709;, <span class="dt">Const</span> x)<br />  suffixScan           <span class="fu">=</span> prefixScan</code></pre>

<h4 id="identity">Identity</h4>

<p>The identity functor is nearly as easy.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">newtype</span> <span class="dt">Id</span> a <span class="fu">=</span> <span class="dt">Id</span> a</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> <span class="dt">Scan</span> <span class="dt">Id</span> <span class="kw">where</span><br />  prefixScan (<span class="dt">Id</span> m) <span class="fu">=</span> (m, <span class="dt">Id</span> &#8709;)<br />  suffixScan        <span class="fu">=</span> prefixScan</code></pre>

<h4 id="sum">Sum</h4>

<p>Scanning in a sum is just scanning in a summand:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> (f <span class="fu">+</span> g) a <span class="fu">=</span> <span class="dt">InL</span> (f a) <span class="fu">|</span> <span class="dt">InR</span> (g a)</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> (<span class="dt">Scan</span> f, <span class="dt">Scan</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (f <span class="fu">+</span> g) <span class="kw">where</span><br />  prefixScan (<span class="dt">InL</span> fa) <span class="fu">=</span> second <span class="dt">InL</span> (prefixScan fa)<br />  prefixScan (<span class="dt">InR</span> ga) <span class="fu">=</span> second <span class="dt">InR</span> (prefixScan ga)<br /><br />  suffixScan (<span class="dt">InL</span> fa) <span class="fu">=</span> second <span class="dt">InL</span> (suffixScan fa)<br />  suffixScan (<span class="dt">InR</span> ga) <span class="fu">=</span> second <span class="dt">InR</span> (suffixScan ga)</code></pre>

<p>These definitions correspond to simple &quot;commutative diagram&quot; properties, e.g.,</p>

<pre class="sourceCode"><code class="sourceCode haskell">prefixScan &#8728; <span class="dt">InL</span> &#8801; second <span class="dt">InL</span> &#8728; prefixScan</code></pre>

<h4 id="product">Product</h4>

<p>Product scanning is a little trickier.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">data</span> (f &#215; g) a <span class="fu">=</span> f a &#215; g a</code></pre>

<p>Scan each of the two parts separately, and then combine the final (<code>fold</code>) part of one result with each of the non-final elements of the other.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> (<span class="dt">Scan</span> f, <span class="dt">Scan</span> g, <span class="kw">Functor</span> f, <span class="kw">Functor</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (f &#215; g) <span class="kw">where</span><br />  prefixScan (fa &#215; ga) <span class="fu">=</span> (af &#8853; ag, fa' &#215; ((af &#8853;) <span class="fu">&lt;$&gt;</span> ga'))<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> prefixScan fa<br />         (ag,ga') <span class="fu">=</span> prefixScan ga<br /><br />  suffixScan (fa &#215; ga) <span class="fu">=</span> (af &#8853; ag, ((&#8853; ag) <span class="fu">&lt;$&gt;</span> fa') &#215; ga')<br />   <span class="kw">where</span> (af,fa') <span class="fu">=</span> suffixScan fa<br />         (ag,ga') <span class="fu">=</span> suffixScan ga</code></pre>
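<p>A quick check of the product rule, with the product functor modeled as an ordinary Haskell pair of functor applications and both components taken to be <code>Pair</code> (a standalone sketch; <code>prefixScanProd</code> is my name for it):</p>

```haskell
import Data.Monoid (Sum (..))

data Pair a = a :# a deriving (Eq, Show)

instance Functor Pair where
  fmap f (a :# b) = f a :# f b

prefixScanP :: Monoid m => Pair m -> (m, Pair m)
prefixScanP (a :# b) = (a <> b, mempty :# a)

-- The product rule: scan each part, then shift the second part's
-- elements by the first part's fold.
prefixScanProd :: Monoid m => (Pair m, Pair m) -> (m, (Pair m, Pair m))
prefixScanProd (fa, ga) = (af <> ag, (fa', (af <>) <$> ga'))
  where
    (af, fa') = prefixScanP fa
    (ag, ga') = prefixScanP ga
```

<p>On <code>(Sum 1 :# Sum 2, Sum 3 :# Sum 4)</code> this gives <code>(Sum 10, (Sum 0 :# Sum 1, Sum 3 :# Sum 6))</code>: exactly the four-element exclusive scan <code>0, 1, 3, 6</code>, distributed across the product.</p>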

<h4 id="composition">Composition</h4>

<p>Finally, composition is the trickiest.</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">newtype</span> (g &#8728; f) a <span class="fu">=</span> <span class="dt">O</span> (g (f a))</code></pre>

<p>The target signatures:</p>

<pre class="sourceCode"><code class="sourceCode haskell">  prefixScan, suffixScan <span class="ot">&#8759;</span> <span class="dt">Monoid</span> m <span class="ot">&#8658;</span> (g &#8728; f) m <span class="ot">&#8594;</span> (m, (g &#8728; f) m)</code></pre>

<p>To find the prefix and suffix scan definitions, fiddle with types beginning at the domain type for <code>prefixScan</code> or <code>suffixScan</code> and arriving at the range type.</p>

<p>Some helpers:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="fu">zip</span> <span class="ot">&#8759;</span> <span class="dt">Applicative</span> g <span class="ot">&#8658;</span> (g a, g b) <span class="ot">&#8594;</span> g (a,b)<br /><span class="fu">zip</span> <span class="fu">=</span> <span class="fu">uncurry</span> (liftA2 (,))<br /><br /><span class="fu">unzip</span> <span class="ot">&#8759;</span> <span class="kw">Functor</span> g <span class="ot">&#8658;</span> g (a,b) <span class="ot">&#8594;</span> (g a, g b)<br /><span class="fu">unzip</span> <span class="fu">=</span> <span class="fu">fmap</span> <span class="fu">fst</span> <span class="fu">&amp;&amp;&amp;</span> <span class="fu">fmap</span> <span class="fu">snd</span></code></pre>

<pre class="sourceCode"><code class="sourceCode haskell">assocR <span class="ot">&#8759;</span> ((a,b),c) <span class="ot">&#8594;</span> (a,(b,c))<br />assocR   ((a,b),c) <span class="fu">=</span>  (a,(b,c))</code></pre>

<pre class="sourceCode"><code class="sourceCode haskell">adjustL <span class="ot">&#8759;</span> (<span class="kw">Functor</span> f, <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> (m, f m) <span class="ot">&#8594;</span> f m<br />adjustL (m, ms) <span class="fu">=</span> (m &#8853;) <span class="fu">&lt;$&gt;</span> ms<br /><br />adjustR <span class="ot">&#8759;</span> (<span class="kw">Functor</span> f, <span class="dt">Monoid</span> m) <span class="ot">&#8658;</span> (m, f m) <span class="ot">&#8594;</span> f m<br />adjustR (m, ms) <span class="fu">=</span> (&#8853; m) <span class="fu">&lt;$&gt;</span> ms</code></pre>

<p>First <code>prefixScan</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">gofm                     <span class="ot">&#8759;</span> (g &#8728; f) m<br />unO                   <span class="ch">''</span> <span class="ot">&#8759;</span> g (f m)<br /><span class="fu">fmap</span> prefixScan       <span class="ch">''</span> <span class="ot">&#8759;</span> g (m, f m)<br /><span class="fu">unzip</span>                 <span class="ch">''</span> <span class="ot">&#8759;</span> (g m, g (f m))<br />first prefixScan      <span class="ch">''</span> <span class="ot">&#8759;</span> ((m, g m), g (f m))<br />assocR                <span class="ch">''</span> <span class="ot">&#8759;</span> (m, (g m, g (f m)))<br />second <span class="fu">zip</span>            <span class="ch">''</span> <span class="ot">&#8759;</span> (m, g (m, f m))<br />second (<span class="fu">fmap</span> adjustL) <span class="ch">''</span> <span class="ot">&#8759;</span> (m, g (f m))<br />second <span class="dt">O</span>              <span class="ch">''</span> <span class="ot">&#8759;</span> (m, (g &#8728; f) m)</code></pre>

<p>Then <code>suffixScan</code>:</p>

<pre class="sourceCode"><code class="sourceCode haskell">gofm                     <span class="ot">&#8759;</span> (g &#8728; f) m<br />unO                   <span class="ch">''</span> <span class="ot">&#8759;</span> g (f m)<br /><span class="fu">fmap</span> suffixScan       <span class="ch">''</span> <span class="ot">&#8759;</span> g (m, f m)<br /><span class="fu">unzip</span>                 <span class="ch">''</span> <span class="ot">&#8759;</span> (g m, g (f m))<br />first suffixScan      <span class="ch">''</span> <span class="ot">&#8759;</span> ((m, g m), g (f m))<br />assocR                <span class="ch">''</span> <span class="ot">&#8759;</span> (m, (g m, g (f m)))<br />second <span class="fu">zip</span>            <span class="ch">''</span> <span class="ot">&#8759;</span> (m, g (m, f m))<br />second (<span class="fu">fmap</span> adjustR) <span class="ch">''</span> <span class="ot">&#8759;</span> (m, g (f m))<br />second <span class="dt">O</span>              <span class="ch">''</span> <span class="ot">&#8759;</span> (m, (g &#8728; f) m)</code></pre>

<p>Putting together the pieces and simplifying just a bit leads to the method definitions:</p>

<pre class="sourceCode"><code class="sourceCode haskell"><span class="kw">instance</span> (<span class="dt">Scan</span> g, <span class="dt">Scan</span> f, <span class="kw">Functor</span> f, <span class="dt">Applicative</span> g) <span class="ot">&#8658;</span> <span class="dt">Scan</span> (g &#8728; f) <span class="kw">where</span><br />  prefixScan <span class="fu">=</span> second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustL &#8728; <span class="fu">zip</span>)<br />             &#8728; assocR<br />             &#8728; first prefixScan<br />             &#8728; <span class="fu">unzip</span><br />             &#8728; <span class="fu">fmap</span> prefixScan<br />             &#8728; unO<br /><br />  suffixScan <span class="fu">=</span> second (<span class="dt">O</span> &#8728; <span class="fu">fmap</span> adjustR &#8728; <span class="fu">zip</span>)<br />             &#8728; assocR<br />             &#8728; first suffixScan<br />             &#8728; <span class="fu">unzip</span><br />             &#8728; <span class="fu">fmap</span> suffixScan<br />             &#8728; unO</code></pre>
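<p>Assembled into one small self-contained module (a sketch covering only <code>prefixScan</code>, with &#8709;/&#8853; as <code>mempty</code>/<code>&lt;&gt;</code> and the composition written as a prefix <code>O g f</code>), the instance scans <code>Pair &#8728; Pair</code> as expected:</p>

```haskell
import Control.Applicative (liftA2)
import Control.Arrow (first, second, (&&&))
import Data.Monoid (Sum (..))
import Prelude hiding (unzip, zip)

class Scan f where
  prefixScan :: Monoid m => f m -> (m, f m)

data Pair a = a :# a deriving (Eq, Show)

instance Functor Pair where
  fmap f (a :# b) = f a :# f b

instance Applicative Pair where
  pure a = a :# a
  (f :# g) <*> (x :# y) = f x :# g y

instance Scan Pair where
  prefixScan (a :# b) = (a <> b, mempty :# a)

-- Functor composition, written prefix instead of the post's infix (g ∘ f).
newtype O g f a = O (g (f a))

unO :: O g f a -> g (f a)
unO (O gfa) = gfa

-- The helpers from the post.
zip :: Applicative g => (g a, g b) -> g (a, b)
zip = uncurry (liftA2 (,))

unzip :: Functor g => g (a, b) -> (g a, g b)
unzip = fmap fst &&& fmap snd

assocR :: ((a, b), c) -> (a, (b, c))
assocR ((a, b), c) = (a, (b, c))

adjustL :: (Functor f, Monoid m) => (m, f m) -> f m
adjustL (m, ms) = (m <>) <$> ms

-- The composition instance, exactly as derived.
instance (Scan g, Scan f, Functor f, Applicative g) => Scan (O g f) where
  prefixScan = second (O . fmap adjustL . zip)
             . assocR
             . first prefixScan
             . unzip
             . fmap prefixScan
             . unO
```

<p>For instance, <code>prefixScan (O ((Sum 1 :# Sum 2) :# (Sum 3 :# Sum 4)))</code> produces <code>Sum 10</code> together with <code>O ((Sum 0 :# Sum 1) :# (Sum 3 :# Sum 6))</code>: the same exclusive prefix sums as a flat four-element scan.</p>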

<h3 id="whats-coming-up">What&#8217;s coming up?</h3>

<ul>
<li>What might not be easy to spot at this point is that the <code>prefixScan</code> and <code>suffixScan</code> methods given in this post do essentially the same job as in <a href="http://conal.net/blog/posts/deriving-parallel-tree-scans/" title="blog post"><em>Deriving parallel tree scans</em></a>, when the binary tree type is deconstructed into functor combinators. A future post will show this connection.</li>
<li>Switch from standard (right-folded) trees to left-folded trees (in the sense of <a href="http://conal.net/blog/posts/a-trie-for-length-typed-vectors/" title="blog post"><em>A trie for length-typed vectors</em></a> and <a href="http://conal.net/blog/posts/from-tries-to-trees/" title="blog post"><em>From tries to trees</em></a>), which reduces the running time from <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mspace width="0.167em"></mspace><mo stretchy="false">(</mo><mi>n</mi><mspace width="0.167em"></mspace><mi>log</mi><mspace width="0.167em"></mspace><mi>n</mi><mo stretchy="false">)</mo></mrow></math> to <math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mrow><mo>&#920;</mo><mspace width="0.167em"></mspace><mi>n</mi></mrow></math>.</li>
<li>Scanning in place, i.e., destructively replacing the values in the input structure rather than allocating a new structure.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/composable-parallel-scanning/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fcomposable-parallel-scanning&amp;language=en_GB&amp;category=text&amp;title=Composable+parallel+scanning&amp;description=The+post+Deriving+list+scans+gave+a+simple+specification+of+the+list-scanning+functions+scanl+and+scanr%2C+and+then+transformed+those+specifications+into+the+standard+optimized+implementations.+Next%2C+the+post+Deriving...&amp;tags=functor%2Cscan%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Fixing lists</title>
		<link>http://conal.net/blog/posts/fixing-lists</link>
		<comments>http://conal.net/blog/posts/fixing-lists#comments</comments>
		<pubDate>Sun, 30 Jan 2011 18:14:30 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[number]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[vector]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=284</guid>
		<description><![CDATA[In the post Memoizing polymorphic functions via unmemoization, I toyed with the idea of lists as tries. I don&#8217;t think [a] is a trie, simply because [a] is a sum type (being either nil or a cons), while tries are built out of the identity, product, and composition functors. In contrast, Stream is a trie, [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- teaser -->

<p
>In the post <a href="http://conal.net/blog/posts/memoizing-polymorphic-functions-via-unmemoization/" title="blog post"
  ><em
    >Memoizing polymorphic functions via unmemoization</em
    ></a
  >, I toyed with the idea of lists as tries. I don&#8217;t think <code
  >[a]</code
  > is a trie, simply because <code
  >[a]</code
  > is a <em
  >sum</em
  > type (being either nil or a cons), while tries are built out of the identity, product, and composition functors. In contrast, <code
  >Stream</code
  > <em
  >is</em
  > a trie, being built solely with the identity and product functors. Moreover, <code
  >Stream</code
  > is not just any old trie, it is the trie that corresponds to Peano (unary natural) numbers, i.e., <code
  >Stream a &#8773; N &#8594; a</code
  >, where</p
>

<pre class="sourceCode haskell"
><code
  ><span class="kw"
    >data</span
    > <span class="dt"
    >N</span
    > <span class="fu"
    >=</span
    > <span class="dt"
    >Zero</span
    > <span class="fu"
    >|</span
    > <span class="dt"
    >Succ</span
    > <span class="dt"
    >N</span
    ><br
     /><br
     /><span class="kw"
    >data</span
    > <span class="dt"
    >Stream</span
    > a <span class="fu"
    >=</span
    > <span class="dt"
    >Cons</span
    > a (<span class="dt"
    >Stream</span
    > a)<br
     /></code
  ></pre
>

<p
>If we didn't already know the <code
  >Stream</code
  > type, we would derive it systematically from <code
  >N</code
  >, using standard isomorphisms.</p
>
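<p>As a hedged sketch of that derivation (the function names here are mine, not from the post): since <code>N &#8773; 1 + N</code>, we have <code>N &#8594; a &#8773; a &#215; (N &#8594; a)</code>, which is precisely the shape of <code>Stream a</code>. The isomorphism can be witnessed directly:</p>

```haskell
data N = Zero | Succ N

data Stream a = Cons a (Stream a)

-- Tabulate a function over unary naturals into a stream
-- (one direction of Stream a ≅ (N -> a)).
toStream :: (N -> a) -> Stream a
toStream f = Cons (f Zero) (toStream (f . Succ))

-- Index into a stream by a unary natural (the other direction).
fromStream :: Stream a -> (N -> a)
fromStream (Cons a _)  Zero     = a
fromStream (Cons _ as) (Succ n) = fromStream as n
```

<p>Laziness is essential here: <code>toStream</code> builds its infinite stream incrementally, so <code>fromStream (toStream f)</code> agrees with <code>f</code> at every natural.</p>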

<p
><code
  >Stream</code
  > is a trie (over unary numbers), thanks to it having no choice points, i.e., no sums in its construction. However, streams are infinite-only, which is not always what we want. In contrast, lists can be finite, but are not a trie in any sense I understand. In this post, I look at how to <em
  >fix</em
  > lists, so they can be finite and yet be a trie, thanks to having no choice points (sums).</p
>

<p
>You can find the code for this post and the <a href="http://conal.net/blog/posts/type-bounded-numbers/" title="blog post"
  >previous one</a
  > in a <a href="https://github.com/conal/numbers-vectors-trees/" title="github repository"
  >code repository</a
  >.</p
>

<p
><strong
  >Edits</strong
  >:</p
>

<ul
><li
  >2011-01-30: Added spoilers warning.</li
  ><li
  >2011-01-30: Pointer to <a href="https://github.com/conal/numbers-vectors-trees/" title="github repository"
    >code repository</a
    >.</li
  ></ul
>

<p><span id="more-284"></span></p>

<div id="fixing-lists"
><h3
  >Fixing lists</h3
  ><p
  >Is there a type of finite lists without choice points (sums)? Yes. There are lots of them. One for each length. Instead of having a single type of lists, have an infinite family of types of <span class="math"
    ><em
      >n</em
      ></span
    >-element lists, one type for each <span class="math"
    ><em
      >n</em
      ></span
    >.</p
  ><p
  >In other words, to fix the problem with lists (trie-unfriendliness), split up the usual list type into subtypes (so to speak), each of which has a fixed length.</p
  ><p
  >I realize I'm changing the question to a simpler one. I hope you'll forgive me and hang in to see where this ride goes.</p
  ><p
  >As a first try, we might use tuples as our fixed-length lists:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >type</span
      > <span class="dt"
      >L0</span
      > a <span class="fu"
      >=</span
      > ()<br
       /><span class="kw"
      >type</span
      > <span class="dt"
      >L1</span
      > a <span class="fu"
      >=</span
      > (a)<br
       /><span class="kw"
      >type</span
      > <span class="dt"
      >L2</span
      > a <span class="fu"
      >=</span
      > (a,a)<br
       /><span class="kw"
      >type</span
      > <span class="dt"
      >L3</span
      > a <span class="fu"
      >=</span
      > (a,a,a)<br
       />&#8943;<br
       /></code
    ></pre
  ><p
  >However, we can only write down finitely many such types, and I don't know how we could write any definitions that are polymorphic over <em
    >length</em
    >.</p
  ><p
  >What can &quot;polymorphic over length&quot; mean in a setting like Haskell, where polymorphism is over <em
    >types</em
    > rather than <em
    >values</em
    >? Can we express numbers (for lengths, etc.) as types? Yes, as in the previous post, <a href="http://conal.net/blog/posts/type-bounded-numbers/" title="blog post"
    ><em
      >Type-bounded numbers</em
      ></a
    >, using a common encoding:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >data</span
      > <span class="dt"
      >Z</span
      >    <span class="co"
      >-- zero</span
      ><br
       /><span class="kw"
      >data</span
      > <span class="dt"
      >S</span
      > n  <span class="co"
      >-- successor</span
      ><br
       /></code
    ></pre
  ><p
  >Given these type-level numbers, we can define a data type <code
    >Vec n a</code
    >, containing only vectors (fixed lists) of length <code
    >n</code
    > and elements of type <code
    >a</code
    >. Such vectors can be built up as either the zero-length vector, or by adding an element to a vector of length <span class="math"
    ><em
      >n</em
      ></span
    > to get a vector of length <span class="math"
    ><em
      >n</em
      > + 1</span
    >. I don't know how to define this type as a regular algebraic data type, but it's easy as a <em
    >generalized</em
    > algebraic data type (<a href="http://en.wikibooks.org/wiki/Haskell/GADT" title="Haskell Wikibook page"
    >GADT</a
    >):</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >infixr</span
      > <span class="dv"
      >5</span
      > <span class="fu"
      >:&lt;</span
      ><br
       /><br
       /><span class="kw"
      >data</span
      > <span class="dt"
      >Vec</span
      > <span class="dv"
      >&#8759;</span
      > <span class="fu"
      >*</span
      > &#8594; <span class="fu"
      >*</span
      > &#8594; <span class="fu"
      >*</span
      > <span class="kw"
      >where</span
      ><br
       />  <span class="dt"
      >ZVec</span
      > <span class="dv"
      >&#8759;</span
      >                <span class="dt"
      >Vec</span
      > <span class="dt"
      >Z</span
      >     a<br
       />  (<span class="fu"
      >:&lt;</span
      >) <span class="dv"
      >&#8759;</span
      > a &#8594; <span class="dt"
      >Vec</span
      > n a &#8594; <span class="dt"
      >Vec</span
      > (<span class="dt"
      >S</span
      > n) a<br
       /></code
    ></pre
  ><p
  >For example,</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="fu"
      >*</span
      ><span class="dt"
      >Vec</span
      ><span class="fu"
      >&gt;</span
      > <span class="fu"
      >:</span
      >ty <span class="ch"
      >'z'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="ch"
      >'o'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="ch"
      >'m'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="ch"
      >'g'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="dt"
      >ZVec</span
      ><br
       /><span class="ch"
      >'z'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="ch"
      >'o'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="ch"
      >'m'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="ch"
      >'g'</span
      > <span class="fu"
      >:&lt;</span
      > <span class="dt"
      >ZVec</span
      > <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >Vec</span
      > (<span class="dt"
      >S</span
      > (<span class="dt"
      >S</span
      > (<span class="dt"
      >S</span
      > (<span class="dt"
      >S</span
      > <span class="dt"
      >Z</span
      >)))) <span class="dt"
      >Char</span
      ><br
       /></code
    ></pre
  ><p
  >As desired, <code
    >Vec</code
    > is length-typed, covers all (finite) lengths, and allows definition of length-polymorphic functions. For instance, it's easy to map functions over vectors:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >instance</span
      > <span class="kw"
      >Functor</span
      > (<span class="dt"
      >Vec</span
      > n) <span class="kw"
      >where</span
      ><br
       />  <span class="fu"
      >fmap</span
      > _ <span class="dt"
      >ZVec</span
      >     <span class="fu"
      >=</span
      > <span class="dt"
      >ZVec</span
      ><br
       />  <span class="fu"
      >fmap</span
      > f (a <span class="fu"
      >:&lt;</span
      > u) <span class="fu"
      >=</span
      > f a <span class="fu"
      >:&lt;</span
      > <span class="fu"
      >fmap</span
      > f u<br
       /></code
    ></pre
  ><p
  >The type of <code
    >fmap</code
    > here is <code
    >(a &#8594; b) &#8594; Vec n a &#8594; Vec n b</code
    >.</p
  ><p
  >Folding over vectors is also straightforward:</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >instance</span
      > <span class="dt"
      >Foldable</span
      > (<span class="dt"
      >Vec</span
      > n) <span class="kw"
      >where</span
      ><br
       />  <span class="fu"
      >foldr</span
      > _ b <span class="dt"
      >ZVec</span
      >      <span class="fu"
      >=</span
      > b<br
       />  <span class="fu"
      >foldr</span
      > h b (a <span class="fu"
      >:&lt;</span
      > as) <span class="fu"
      >=</span
      > a <span class="ot"
      >`h`</span
      > <span class="fu"
      >foldr</span
      > h b as<br
       /></code
    ></pre
  ><p
  >Is <code
    >Vec n</code
    > an applicative functor as well?</p
  ><pre class="sourceCode haskell"
  ><code
    ><span class="kw"
      >instance</span
      > <span class="dt"
      >Applicative</span
      > (<span class="dt"
      >Vec</span
      > n) <span class="kw"
      >where</span
      ><br
       />  &#8943;<br
       /></code
    ></pre
  ><p
  >We would need</p
  ><pre class="sourceCode haskell"
  ><code
    >pure <span class="dv"
      >&#8759;</span
      > a &#8594; <span class="dt"
      >Vec</span
      > n a<br
       />(&#8859;)  <span class="dv"
      >&#8759;</span
      > <span class="dt"
      >Vec</span
      > n (a &#8594; b) &#8594; <span class="dt"
      >Vec</span
      > n a &#8594; <span class="dt"
      >Vec</span
      > n b<br
       /></code
    ></pre
  ><p
  >The <code
    >(&#8859;)</code
    > method can be defined similarly to <code
    >fmap</code
    >:</p
  ><pre class="sourceCode haskell"
  ><code
    >  <span class="dt"
      >ZVec</span
      >      &#8859; <span class="dt"
      >ZVec</span
      >      <span class="fu"
      >=</span
      > <span class="dt"
      >ZVec</span
      ><br
       />  (f <span class="fu"
      >:&lt;</span
      > fs) &#8859; (x <span class="fu"
      >:&lt;</span
      > xs) <span class="fu"
      >=</span
      > f x <span class="fu"
      >:&lt;</span
      > (fs &#8859; xs)<br
       /></code
    ></pre
  ><p
  >Unlike <code
    >fmap</code
    > and <code
    >(&#8859;)</code
    >, <code
    >pure</code
    > doesn't have a vector structure to crawl over. It must create just the right structure anyway. You might enjoy thinking about how to solve this puzzle, which I'll tackle in my next post. (Warning: spoilers in the comments below.)</p
  ></div
>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/fixing-lists/feed</wfw:commentRss>
		<slash:comments>10</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Ffixing-lists&amp;language=en_GB&amp;category=text&amp;title=Fixing+lists&amp;description=In+the+post+Memoizing+polymorphic+functions+via+unmemoization%2C+I+toyed+with+the+idea+of+lists+as+tries.+I+don%26%238217%3Bt+think+%5Ba%5D+is+a+trie%2C+simply+because+%5Ba%5D+is+a+sum...&amp;tags=applicative+functor%2Cfunctor%2Cnumber%2Ctrie%2Cvector%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Fixing broken isomorphisms &#8212; details for non-strict memoization, part 2</title>
		<link>http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2</link>
		<comments>http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2#comments</comments>
		<pubDate>Wed, 22 Sep 2010 23:02:26 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[unamb]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=198</guid>
		<description><![CDATA[The post Details for non-strict memoization, part 1 works out a systematic way of doing non-strict memoization, i.e., correct memoization of non-strict (and more broadly, non-hyper-strict) functions. As I mentioned at the end, there was an awkward aspect, which is that the purported &#8220;isomorphisms&#8221; used for regular types are not quite isomorphisms. For instance, functions [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Fixing broken isomorphisms - details for non-strict memoization, part 2

Tags: memoization, functor, trie, unamb

URL: http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2/

-->

<!-- references -->

<!-- teaser -->

<!--
**Edits**:

* 2010-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p>The post <em><a href="http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/" title="blog post">Details for non-strict memoization, part 1</a></em> works out a systematic way of doing <em>non-strict</em> memoization, i.e., correct memoization of non-strict (and more broadly, non-hyper-strict) functions.
As I mentioned at the end, there was an awkward aspect, which is that the purported &#8220;isomorphisms&#8221; used for regular types are not quite isomorphisms.</p>

<p>For instance, functions from triples are memoized by converting to and from nested pairs:</p>

<pre><code>untriple ∷ (a,b,c) -&gt; ((a,b),c)
untriple (a,b,c) = ((a,b),c)

triple ∷ ((a,b),c) -&gt; (a,b,c)
triple ((a,b),c) = (a,b,c)
</code></pre>

<p>Then <code>untriple</code> and <code>triple</code> form an embedding/projection pair, i.e.,</p>

<pre><code>triple ∘ untriple ≡ id
untriple ∘ triple ⊑ id
</code></pre>

<p>The reason for the inequality is that the nested-pair form permits <code>(⊥,c)</code>, which does not correspond to any triple.</p>

<pre><code>untriple (triple (⊥,c)) ≡ untriple ⊥ ≡ ⊥
</code></pre>

<p>Can we patch this problem by simply using an irrefutable (lazy) pattern in the definition of <code>triple</code>, i.e., <code>triple (~(a,b),c) = (a,b,c)</code>?
Let&#8217;s try:</p>

<pre><code>untriple (triple (⊥,c)) ≡ untriple (⊥,⊥,c) ≡ ((⊥,⊥),c)
</code></pre>

<p>So the isomorphism fails, and so does even the embedding/projection property.</p>

<p>Similarly, to deal with regular algebraic data types, I used a class that describes regular data types as repeated applications of a single, associated <em>pattern functor</em> (following <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>):</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t ∷ * → *
  unwrap ∷ t → PF t t
  wrap   ∷ PF t t → t
</code></pre>

<p>Here <code>unwrap</code> converts a value into its pattern functor form, and <code>wrap</code> converts back.
For example, here is the <code>Regular</code> instance I had used for lists:</p>

<pre><code>instance Regular [a] where
  type PF [a] = Const () :+: Const a :*: Id

  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)

  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as
</code></pre>

<p>Again, we have an embedding/projection pair, rather than a genuine isomorphism:</p>

<pre><code>wrap ∘ unwrap ≡ id
unwrap ∘ wrap ⊑ id
</code></pre>

<p>The inequality comes from ⊥ values occurring in <code>PF [a] [a]</code> at type <code>Const () [a]</code>, <code>()</code>,  <code>(Const a :*: Id) [a]</code>, <code>Const a [a]</code>, or <code>Id [a]</code>.</p>

<p><span id="more-198"></span></p>

<h3>Why care?</h3>

<p>What harm results from the lack of genuine isomorphism?
For hyper-strict functions, as usually handled (correctly) in memoization, I don&#8217;t think there is any harm.
For correct memoization of non-hyper-strict functions, however, the superfluous points of undefinedness lead to larger memo tries and wasted effort.
For instance, a function from triples goes through some massaging on the way to being memoized:</p>

<pre><code>λ (a,b,c) → ⋯
⇓
λ ((a,b),c) → ⋯
⇓
λ (a,b) → λ c → ⋯
</code></pre>

<p>For hyper-strict memoization, the next step transforms to <code>λ a → λ b → λ c → ⋯</code>.
For non-strict memoization, however, we first stash away the value of the function applied to <code>⊥ ∷ (a,b)</code>, which will always be ⊥ in this context.</p>

<h3>Strict products and sums</h3>

<p>To eliminate the definedness discrepancy and regain isomorphism, we might make all non-strictness explicit via unlifted products &amp; sums, and explicit lifting.</p>

<pre><code>-- | Add a bottom to a type
data Lift a = Lift { unLift ∷ a } deriving Functor

infixl 6 :+!, :+:!
infixl 7 :*!, :*:!

-- | Strict pair
data a :*! b = !a :*! !b

-- | Strict sum
data a :+! b = Left' !a | Right' !b
</code></pre>

<p>Note that the <code>Id</code> and <code>Const a</code> functors used in canonical representations are already strict, as they&#8217;re defined via <code>newtype</code>.</p>
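<p>For reference, those two functors are conventionally defined along these lines (a sketch; the precise definitions live in the library accompanying these posts):</p>

```haskell
-- Identity functor: being a newtype, Id a adds no extra bottom
-- beyond the one already in a, so it is "already strict".
newtype Id a = Id a deriving Eq

-- Constant functor: holds an x and ignores its type argument.
newtype Const x a = Const x deriving Eq

instance Functor Id where
  fmap f (Id a) = Id (f a)

instance Functor (Const x) where
  fmap _ (Const x) = Const x
```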

<p>With these new tools, we can decompose isomorphically.
For instance,</p>

<pre><code>(a,b,c) ≅ Lift a :*! Lift b :*! Lift c
</code></pre>

<p>with the isomorphism given by</p>

<pre><code>untriple' ∷ (a,b,c) -&gt; Lift a :*! Lift b :*! Lift c
untriple' (a,b,c) = Lift a :*! Lift b :*! Lift c

triple' ∷ Lift a :*! Lift b :*! Lift c -&gt; (a,b,c)
triple' (Lift a :*! Lift b :*! Lift c) = (a,b,c)
</code></pre>

<p>For regular types, we&#8217;ll also want variations as functor combinators:</p>

<pre><code>-- | Strict product functor
data (f :*:! g) a = !(f a) :*:! !(g a) deriving Functor

-- | Strict sum functor
data (f :+:! g) a = InL' !(f a) | InR' !(g a) deriving Functor
</code></pre>

<p>Then change the <code>Regular</code> instance on lists to the following:</p>

<pre><code>instance Regular [a] where
  type PF [a] = Const () :+:! Const (Lift a) :*:! Lift

  unwrap []     = InL' (Const ())
  unwrap (a:as) = InR' (Const (Lift a) :*:! Lift as)

  wrap (InL' (Const ()))                    = []
  wrap (InR' (Const (Lift a) :*:! Lift as)) = a:as
</code></pre>

<p>I suppose it would be fairly straightforward to derive such instances for algebraic data types automatically via Template Haskell.</p>

<h3>Tries for non-strict memoization</h3>

<p>As in <em><a href="http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/" title="blog post">part 1</a></em>, represent a non-strict memo trie for a function <code>f ∷ k -&gt; v</code> as a value for <code>f ⊥</code> and a <em>strict</em> (but not hyper-strict) memo trie for <code>f</code>:</p>

<pre><code>type k :→: v = Trie v (k :→ v)
</code></pre>

<p>For non-strict sum domains, the strict memo trie was a pair of non-strict tries:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
  type STrie (Either a b) = Trie a :*: Trie b
  sTrie   f           = trie (f ∘ Left) :*: trie (f ∘ Right)
  sUntrie (ta :*: tb) = untrie ta `either` untrie tb
</code></pre>

<p>For non-strict product, the strict trie was a composition of non-strict tries:</p>

<pre><code>instance (HasTrie a, HasTrie b) =&gt; HasTrie (a , b) where
  type STrie (a , b) = Trie a :. Trie b
  sTrie   f = O (fmap trie (trie (curry f)))
  sUntrie (O tt) = uncurry (untrie (fmap untrie tt))
</code></pre>

<p>What about <em>strict</em> sum and product domains?
Since strict sums &amp; products cannot contain ⊥ as their immediate components, we can omit the values corresponding to ⊥ for those components.
That is, we can use pairs and compositions of <em>strict</em> tries instead.</p>

<pre><code>instance (HasTrie a, HasTrie b) =&gt; HasTrie (a :+! b) where
  type STrie (a :+! b) = STrie a :*: STrie b
  sTrie   f           = sTrie (f . Left') :*: sTrie (f . Right')
  sUntrie (ta :*: tb) = sUntrie ta `either'` sUntrie tb

instance (HasTrie a, HasTrie b) =&gt; HasTrie (a :*! b) where
  type STrie (a :*! b) = STrie a :. STrie b
  sTrie   f      = O (fmap sTrie (sTrie (curry' f)))
  sUntrie (O tt) = uncurry' (sUntrie (fmap sUntrie tt))
</code></pre>

<p>I&#8217;ve also substituted versions of <code>curry</code> and <code>uncurry</code> for strict products and <code>either</code> for strict sums:</p>

<pre><code>curry' ∷ (a :*! b -&gt; c) -&gt; (a -&gt; b -&gt; c)
curry' f a b = f (a :*! b)

uncurry' ∷ (a -&gt; b -&gt; c) -&gt; ((a :*! b) -&gt; c)
uncurry' f (a :*! b) = f a b

either' ∷ (a -&gt; c) -&gt; (b -&gt; c) -&gt; (a :+! b -&gt; c)
either' f _ (Left'  a) = f a
either' _ g (Right' b) = g b
</code></pre>

<p>We&#8217;ll also need to handle the lifting functor.
The type <code>Lift a</code> has an additional bottom.
A strict function or trie over <code>Lift a</code> is only strict in the lower (outer) one.
So a strict trie over <code>Lift a</code> is simply a non-strict trie over <code>a</code>.</p>

<pre><code>instance HasTrie a =&gt; HasTrie (Lift a) where
  type STrie (Lift a) = Trie a
  sTrie   f = trie (f . Lift)
  sUntrie t = untrie t . unLift
</code></pre>

<p>Notice that this instance puts back exactly what was lost from memo tries when going from non-strict products and sums to strict products and sums.
The reason for this relationship is explained in the following simple isomorphisms:</p>

<pre><code>(a,b)      ≅ Lift a :*! Lift b
Either a b ≅ Lift a :+! Lift b
</code></pre>

<p>These isomorphisms can then be used to implement memoization over non-strict products and sums via memoization over strict products and sums.</p>
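<p>For instance, the first isomorphism might be witnessed as follows (a hedged sketch; the <code>Lift</code> and <code>(:*!)</code> types are repeated from earlier in the post so the fragment is self-contained, and the conversion names are mine):</p>

```haskell
{-# LANGUAGE TypeOperators #-}

-- Lifting and strict pairs, as defined earlier in the post.
data Lift a = Lift { unLift :: a }

infixl 7 :*!
data a :*! b = !a :*! !b

-- (a,b) ≅ Lift a :*! Lift b: the strict pair cannot hide a bottom
-- component, so the Lift wrappers restore exactly the lifting that
-- Haskell's lazy pairs have, and each side has the same elements.
toStrict :: (a, b) -> Lift a :*! Lift b
toStrict (a, b) = Lift a :*! Lift b

fromStrict :: (Lift a :*! Lift b) -> (a, b)
fromStrict (Lift a :*! Lift b) = (a, b)
```

<p>Memoizing a function over lazy pairs would then compose these conversions with the strict-product <code>HasTrie</code> machinery.</p>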

<h2>Higher-order memoization</h2>

<p>The post <em><a href="http://conal.net/blog/posts/memoizing-higher-order-functions/" title="blog post">Memoizing higher-order functions</a></em> suggested a simple way to memoize functions over function-valued domains by using (as always) type isomorphisms.
The isomorphism used is between functions and memo tries.</p>

<p>I gave one example in that post</p>

<pre><code>ft1 ∷ (Bool → a) → [a]
ft1 f = [f False, f True]
</code></pre>

<p>In retrospect, this example was a lousy choice, as it hides an important problem.
The <code>Bool</code> type is <em>finite</em>, and so the corresponding trie type has only finitely large elements.
For that reason, higher-order memoization can get away with the usual hyper-strict memoization.</p>
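<p>For example (a small illustration with names of my own, not the library&#8217;s), the trie for a <code>Bool</code> domain is just a pair of values, so the entire trie can be built and consulted:</p>

```haskell
-- A memo trie for functions out of Bool: one slot per domain element.
data BoolTrie a = BoolTrie a a   -- values at False and at True

trieBool :: (Bool -> a) -> BoolTrie a
trieBool f = BoolTrie (f False) (f True)

untrieBool :: BoolTrie a -> (Bool -> a)
untrieBool (BoolTrie f _) False = f
untrieBool (BoolTrie _ t) True  = t

-- Memoize by a round trip through the (finite) trie.
memoBool :: (Bool -> a) -> (Bool -> a)
memoBool = untrieBool . trieBool
```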

<p>If instead we try memoizing a function of type <code>(a → b) → c</code>, where the type <code>a</code> has infinitely many elements (e.g., <code>Integer</code> or <code>[Bool]</code>), then we&#8217;ll have to memoize over the domain <code>a :→: b</code> (memo tries from <code>a</code> to <code>b</code>), which includes infinite elements.
In that case, hyper-strict memoization blows up, so we&#8217;ll want to use non-strict memoization instead.</p>

<p>As mentioned above, the type of non-strict tries contains a value and a strict trie:</p>

<pre><code>type k :→: v = Trie v (k :→ v)
</code></pre>

<p>I thought I&#8217;d memoize by mapping to &amp; from the isomorphic pair type <code>(v, k :→ v)</code>.
However, now I&#8217;m not satisfied with this mapping.
A non-strict trie from <code>k</code> to <code>v</code> is not just <em>any</em> such pair of <code>v</code> and <code>k :→ v</code>.
Monotonicity requires that the single <code>v</code> value (for ⊥) be a lower bound (information-wise) of every <code>v</code> in the trie.
Ignoring this constraint would lead to a trie in which most of the entries do not correspond to any non-strict memo trie.</p>

<p><em>Puzzle:</em> Can this constraint be captured as a <em>static</em> type in modern Haskell&#8217;s (GHC&#8217;s) type system (i.e., without resorting to general dependent typing)?
I don&#8217;t know the answer.</p>

<h2>Memoizing abstract types</h2>

<p>This problem is more widespread still.
Whenever there are constraints on a representation beyond what is expressed directly and statically in the representation type, we will have this same sort of isomorphism puzzle.
Can we capture the constraint as a Haskell type?
When we cannot, what do we do?</p>

<p>If we didn&#8217;t care about efficiency, I think we could ignore the issue, and everything else in this blog post, and accept making memo tries that are much larger than necessary.
Although laziness will keep us from filling in range values for unaccessed domain values, I worry that there will be quite a lot of time and space wasted navigating past large portions of unusable trie structure.</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/fixing-broken-isomorphisms-details-for-non-strict-memoization-part-2/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Ffixing-broken-isomorphisms-details-for-non-strict-memoization-part-2&amp;language=en_GB&amp;category=text&amp;title=Fixing+broken+isomorphisms+%26%238212%3B+details+for+non-strict+memoization%2C+part+2&amp;description=The+post+Details+for+non-strict+memoization%2C+part+1+works+out+a+systematic+way+of+doing+non-strict+memoization%2C+i.e.%2C+correct+memoization+of+non-strict+%28and+more+broadly%2C+non-hyper-strict%29+functions.+As+I+mentioned...&amp;tags=functor%2Cmemoization%2Ctrie%2Cunamb%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Another angle on zippers</title>
		<link>http://conal.net/blog/posts/another-angle-on-zippers</link>
		<comments>http://conal.net/blog/posts/another-angle-on-zippers#comments</comments>
		<pubDate>Thu, 29 Jul 2010 17:06:10 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[zipper]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=154</guid>
		<description><![CDATA[The zipper is an efficient and elegant data structure for purely functional editing of tree-like data structures, first published by Gérard Huet. Zippers maintain a location of focus in a tree and support navigation operations (up, down, left, right) and editing (replace current focus). The original zipper type and operations are customized for a single [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Another angle on zippers

Tags: derivative, functor, zipper

URL: http://conal.net/blog/posts/another-angle-on-zippers/

-->

<!-- references -->

<!-- teaser -->

<p>The zipper is an efficient and elegant data structure for purely functional editing of tree-like data structures, <a href="http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf" title="paper by Gérard Huet">first published by Gérard Huet</a>.
Zippers maintain a location of focus in a tree and support navigation operations (up, down, left, right) and editing (replace current focus).</p>

<p>The original zipper type and operations are customized for a single type, but it&#8217;s not hard to see how to adapt to other tree-like types, and hence to regular data types.
There have been many follow-up papers to <em><a href="http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf" title="paper by Gérard Huet">The Zipper</a></em>, including a polytypic version in the paper <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.1.6342" title="paper by Ralf Hinze, Johan Jeuring, and Andres Löh">Type-indexed data types</a></em>.</p>

<p>All of the zipper adaptations and generalizations I&#8217;ve seen so far maintain the original navigation interface.
In this post, I propose an alternative interface that appears to significantly simplify matters.
There are only two navigation functions instead of four, and each of the two is specified and implemented via a fairly simple one-liner.</p>

<p>I haven&#8217;t used this new zipper formulation in an application yet, so I do not know whether some usefulness has been lost in simplifying the interface.</p>

<p>The code in this blog post is taken from the Haskell library <a href="http://hackage.haskell.org/package/functor-combo" title="Hackage entry: functor-combo">functor-combo</a> and completes the <code>Holey</code> type class introduced in <em><a href="http://conal.net/blog/posts/differentiation-of-higher-order-types/" title="blog post">Differentiation of higher-order types</a></em>.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2010-07-29: Removed some stray <code>Just</code> applications in <code>up</code> definitions.  (Thanks, illissius.)</li>
<li>2010-07-29: Augmented my complicated definition of <code>tweak2</code> with a much simpler version from Sjoerd Visscher.</li>
<li>2010-07-29: Replaced <code>fmap (first (:ds'))</code> with <code>(fmap.first) (:ds')</code> in <code>down</code> definitions.  (Thanks, Sjoerd.)</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-154"></span></p>

<h3>Extraction</h3>

<p>The post <em><a href="http://conal.net/blog/posts/differentiation-of-higher-order-types/" title="blog post">Differentiation of higher-order types</a></em> gave part of a type class for one-hole contexts (functor derivatives) and the filling of those contexts:</p>

<pre><code>class Functor f ⇒ Holey f where
  type Der f :: * → *
  fillC :: Der f a → a → f a
</code></pre>

<p>The arguments of <code>fillC</code> correspond roughly to the components of what Gérard Huet called a &#8220;location&#8221;, namely context and something to fill the context:</p>

<pre><code>type Loc f a = (Der f a, a)
</code></pre>

<p>So an alternative hole-filling interface is</p>

<pre><code>fill :: Holey f ⇒ Loc f a → f a
fill = uncurry fillC
</code></pre>

<p>Now consider a reverse operation, a kind of <em>extraction</em>:</p>

<pre><code>guess1 :: f a → Loc f a
</code></pre>

<p>There&#8217;s an awkward problem here.
What if <code>f a</code> has more than one possible hole, or has no hole at all?
If more than one, then which do we pick?
Perhaps the left-most.
If none, then we might want to have a failure representation, e.g.,</p>

<pre><code>guess2 :: f a → Maybe (Loc f a)
</code></pre>

<p>To handle the more-than-one possibility, we could add another method for traversing the various extractions, like the <code>go_right</code> operation in <em><a href="http://www.st.cs.uni-saarland.de/edu/seminare/2005/advanced-fp/docs/huet-zipper.pdf" title="paper by Gérard Huet">The Zipper</a></em>, section 2.2.
I don&#8217;t know what changes we&#8217;d have to make to the <code>Loc</code> type.</p>

<p>We could instead use a list of possible extractions.</p>

<pre><code>guess3 :: f a → [Loc f a]
</code></pre>

<p>Why a <em>list</em>?
I guess because it&#8217;s in our habitual functional toolbox, and it covers any number of alternative extracted locations.
On the other hand, our toolbox is growing, and sometimes list isn&#8217;t the best functor for the job.
For instance, we might use a finger tree, which has better performance for some sequence operations.</p>

<p>Or we could use a functor closer at hand, namely <code>f</code> itself.</p>

<pre><code>class Functor f ⇒ Holey f where
  type Der f :: * → *
  fillC   :: Der f a → a → f a
  extract :: f a → f (Loc f a)
</code></pre>

<p>For instance, when <code>f ≡ []</code>, <code>extract</code> returns a list of extractions; and when <code>f ≡ Id :*: Id</code>, <code>extract</code> returns a pair of extractions.</p>

<h3>How to extract</h3>

<p>A constant functor has void derivative.
Extraction yields another constant structure, with the same data but a different type:</p>

<pre><code>instance Holey (Const x) where
  type Der (Const x) = Void
  fillC = voidF
  extract (Const x) = Const x
</code></pre>

<p>The identity functor has exactly one opportunity for a hole, leaving no information behind:</p>

<pre><code>instance Holey Id where
  type Der Id = Unit
  fillC (Const ()) = Id
  extract (Id a) = Id (Const (), a)
</code></pre>

<p>The definitions of <code>Der</code> and <code>fillC</code> above and below are lifted directly from <em><a href="http://conal.net/blog/posts/differentiation-of-higher-order-types/" title="blog post">Differentiation of higher-order types</a></em>.</p>

<p>For sums, there are two cases: <code>InL fa, InR ga :: (f :+: g) a</code>.
Starting with the first case:</p>

<pre><code>InL fa :: (f :+: g) a

fa :: f a

extract fa :: f (Loc f a)

           :: f (Der f a, a)

(fmap.first) InL (extract fa) :: f ((Der f :+: Der g) a, a)

                              :: f ((Der (f :+: g) a), a)
</code></pre>

<p>See <em><a href="http://conal.net/blog/posts/semantic-editor-combinators/" title="blog post">Semantic editor combinators</a></em> for an explanation of <code>(fmap.first)</code> and friends.
Continuing, apply the definition of <code>Der</code> on sums:</p>

<pre><code>InL ((fmap.first) InL (extract fa)) :: (f :+: g) ((Der (f :+: g) a), a)

                                    :: (f :+: g) (Loc (f :+: g) a)
</code></pre>

<p>The two steps that introduce <code>g</code> are motivated by the required type of <code>extract</code>.
Similarly, for the second case:</p>

<pre><code>InR ((fmap.first) InR (extract ga)) :: (f :+: g) (Loc (f :+: g) a)
</code></pre>

<p>So,</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :+: g) where
  type Der (f :+: g) = Der f :+: Der g
  fillC (InL df) = InL ∘ fillC df
  fillC (InR df) = InR ∘ fillC df
  extract (InL fa) = InL ((fmap.first) InL (extract fa))
  extract (InR ga) = InR ((fmap.first) InR (extract ga))
</code></pre>

<p>For products, recall the derivative type:</p>

<pre><code>  type Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
</code></pre>

<p>To extract from a product, we extract from either component and then pair with the other component.
The form of an argument to <code>extract</code> is</p>

<pre><code>fa :*: ga :: (f :*: g) a
</code></pre>

<p>Again, start with the left part:</p>

<pre><code>fa :: f a

extract fa :: f (Loc f a)
           :: f (Der f a, a)

(fmap.first) (:*: ga) (extract fa) :: f ((Der f :*: g) a, a)

(fmap.first) (InL ∘ (:*: ga)) (extract fa)
  :: f (((Der f :*: g) :+: (f :*: Der g)) a, a)

  :: f ((Der (f :*: g)) a, a)
</code></pre>

<p>Similarly, for the second component,</p>

<pre><code>(fmap.first) (InR ∘ (fa :*:)) (extract ga)
  :: g ((Der (f :*: g)) a, a)
</code></pre>

<p>Combining the two extraction routes:</p>

<pre><code>(fmap.first) (InL ∘ (:*: ga)) (extract fa) :*:
(fmap.first) (InR ∘ (fa :*:)) (extract ga)
  :: (f :*: g) (Der (f :*: g) a, a)
</code></pre>

<p>So,</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :*: g) where
  type Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
  fillC (InL (dfa :*:  ga)) = (:*: ga) ∘ fillC dfa
  fillC (InR ( fa :*: dga)) = (fa :*:) ∘ fillC dga
  extract (fa :*: ga) = 
    (fmap.first) (InL ∘ (:*: ga)) (extract fa) :*:
    (fmap.first) (InR ∘ (fa :*:)) (extract ga)
</code></pre>
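
<p>For a concrete sense of this instance, here is a hand evaluation (a sketch I&#8217;ve added, not part of the original derivation) of <code>extract</code> on a two-element product <code>Id :*: Id</code>:</p>

<pre><code>extract (Id 1 :*: Id 2)
  ≡ (fmap.first) (InL ∘ (:*: Id 2)) (extract (Id 1)) :*:
    (fmap.first) (InR ∘ (Id 1 :*:)) (extract (Id 2))
  ≡ Id (InL (Const () :*: Id 2), 1) :*:
    Id (InR (Id 1 :*: Const ()), 2)
</code></pre>

<p>Each component of the result pairs one element with its one-hole context, which here is exactly the <em>other</em> element.</p>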

<p>Finally, the chain rule, for functor composition:</p>

<pre><code>type Der (g :. f) = Der g :. f  :*:  Der f
</code></pre>

<p>A value of type <code>(g :. f) a</code> has form <code>O gfa</code>, where <code>gfa :: g (f a)</code>.
To extract:</p>

<ul>
<li>form all <code>g</code>-extractions, yielding values of type <code>fa :: f a</code> and their contexts of type <code>Der g (f a)</code>;</li>
<li>form all <code>f</code>-extractions of each such <code>fa</code>, yielding values of type <code>a</code> and their contexts of type <code>Der f a</code>; and</li>
<li>reassemble these pieces into the shape determined by <code>Der (g :. f)</code>.</li>
</ul>

<p>Let&#8217;s go:</p>

<pre><code>gfa :: g (f a)

extract gfa :: g (Loc g (f a))

            :: g (Der g (f a), f a)

fmap (second extract) (extract gfa)
  :: g (Der g (f a), f (Loc f a))
</code></pre>

<p>Continuing, the following lemmas come in handy.</p>

<pre><code>tweak2 :: Functor f ⇒
          (dg (f a), f (df a, a)) → f (((dg :. f) :*: df) a, a)
tweak2 = (fmap.first) chainRule ∘ tweak1

tweak1 :: Functor f ⇒
          (dg (fa), f (dfa, a)) → f ((dg (fa), dfa), a)
tweak1 = fmap lassoc ∘ squishP

squishP :: Functor f ⇒ (a, f b) → f (a,b)
squishP (a,fb) = fmap (a,) fb

chainRule :: (dg (f a), df a) → ((dg :. f) :*: df) a
chainRule (dgfa, dfa) = O dgfa :*: dfa

lassoc :: (p,(q,r)) → ((p,q),r)
lassoc    (p,(q,r)) =  ((p,q),r)
</code></pre>

<p><em>Edit:</em> Sjoerd Visscher found a much simpler form to replace the previous group of definitions:</p>

<pre><code>tweak2 (dgfa, fl) = (fmap.first) (O dgfa :*:) fl
</code></pre>

<p>More specifically,</p>

<pre><code>tweak2 :: Functor f ⇒ (Der g (f a), f (Loc f a))
                    → f (((Der g :. f) :*: Der f) a, a)
       :: Functor f ⇒ (Der g (f a), f (Loc f a))
                    → f (Der (g :. f) a, a)
       :: Functor f ⇒ (Der g (f a), f (Loc f a))
                    → f (Loc (g :. f) a)
</code></pre>

<p>This lemma gives just what we need to tweak the inner extraction:</p>

<pre><code>fmap (tweak2 ∘ second extract) (extract gfa) :: g (f (Loc (g :. f) a))
</code></pre>

<p>So</p>

<pre><code>extractGF :: (Holey f, Holey g) ⇒
             g (f a) → g (f (Loc (g :. f) a))
extractGF = fmap (tweak2 ∘ second extract) ∘ extract
</code></pre>

<p>and</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (g :. f) where
  type Der (g :.  f) = Der g :. f  :*:  Der f
  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
  extract = inO extractGF
</code></pre>

<p>where <code>inO</code> is from <a href="http://hackage.haskell.org/packages/archive/TypeCompose/latest/doc/html/Control-Compose.html" title="module documentation">Control.Compose</a>, and is defined using the ideas from <em><a href="http://conal.net/blog/posts/prettier-functions-for-wrapping-and-wrapping/" title="blog post">Prettier functions for wrapping and wrapping</a></em> and the notational improvement from Matt Hellige&#8217;s <em><a href="http://matt.immute.net/content/pointless-fun" title="blog post by Matt Hellige">Pointless fun</a></em>.</p>

<pre><code>-- | Apply a unary function within the 'O' constructor.
inO :: (g (f a) → g' (f' a')) → ((g :. f) a → (g' :. f') a')
inO = unO ~&gt; O

infixr 1 ~&gt;
-- | Add pre- and post-processing
(~&gt;) :: (a' → a) → (b → b') → ((a → b) → (a' → b'))
(f ~&gt; h) g = h ∘ g ∘ f
</code></pre>

<p>In case you&#8217;re wondering, these definitions did not come to me effortlessly.
I sweated through the derivation, guided always by my intuition and the necessary types, as determined by the shape of <code>Der (g :. f)</code>.
The type-checker helped me get from one step to the next.</p>

<p>I do a lot of type-directed derivations of this style while I program in Haskell, with the type-checker checking each step for me.
I&#8217;d love to have mechanized help in <em>creating</em> these derivations, not just <em>checking</em> them.</p>

<h3>Zippers</h3>

<p>How does the <code>Holey</code> class relate to zippers?
As in a few recent blog posts, let&#8217;s use the fact that regular data types are isomorphic to fixed-points of functors.</p>

<p>Functor fixed-points are like function fixed-points:</p>

<pre><code>fix f = f (fix f)

type Fix f = f (Fix f)
</code></pre>

<p>However, Haskell doesn&#8217;t support recursive type synonyms, so use a <code>newtype</code>:</p>

<pre><code>newtype Fix f = Fix { unFix :: f (Fix f) }
</code></pre>

<p>A context for a functor fixed-point is either empty, if we&#8217;re at the very top of an &#8220;<code>f</code>-tree&#8221;, or it pairs an <code>f</code>-context for <code>f (Fix f)</code> with a parent context:</p>

<pre><code>data Context f = TopC | Context (Der f (Fix f)) (Context f)  -- first try
</code></pre>

<p>Hm.
On the outside, <code>Context f</code> looks like a list, so let&#8217;s use a list instead:</p>

<pre><code>type Context f = [Der f (Fix f)]
</code></pre>

<p>The location type we used above is</p>

<pre><code>type Loc f a = (Der f a, a)
</code></pre>

<p>Similarly, define a type of zippers (also called &#8220;locations&#8221;) for functor fixed-points:</p>

<pre><code>type Zipper f = (Context f, Fix f)
</code></pre>

<p>This <code>Zipper</code> type has just two navigation operations, <code>up</code> and <code>down</code>.
The <code>down</code> motion can yield multiple results.</p>

<pre><code>up   :: Holey f ⇒ Zipper f →    Zipper f

down :: Holey f ⇒ Zipper f → f (Zipper f)
</code></pre>

<p>Since <code>down</code> yields an <code>f</code>-collection of locations, we do not need sibling navigation functions (<code>left</code> &amp; <code>right</code>).</p>

<p>To move up in <code>Zipper</code>, strip off a derivative (one-hole functor context) and fill the hole with the current tree, leaving the other derivatives as the remaining fixed-point context.
Like so:</p>

<pre><code>up   :: Holey f ⇒ Zipper f →    Zipper f
up (d:ds', t) = (ds', Fix (fill (d,t)))
</code></pre>

<p>To see how the typing works out:</p>

<pre><code>(d:ds', t) :: Zipper f
(d:ds', t) :: (Context f, Fix f)

d:ds' :: [Der f (Fix f)]

t :: Fix f

d   ::  Der f (Fix f)
ds' :: [Der f (Fix f)]

fill :: Loc f b → f b
fill :: (Der f b, b) → f b
fill :: (Der f (Fix f), Fix f) → f (Fix f)

fill (d,t) :: f (Fix f)

Fix (fill (d,t)) :: Fix f

(ds', Fix (fill (d,t))) :: (Context f, Fix f)
                        :: Zipper f
</code></pre>

<p>Note that the <code>up</code> motion fails when at the top of a zipper (empty context).
If desired, we can also provide an unfailing version (really, a version with explicitly typed failure):</p>

<pre><code>up' :: Holey f ⇒ Zipper f → Maybe (Zipper f)
up' ([]   , _) = Nothing
up' l          = Just (up l)
</code></pre>

<p>To move down in an <code>f</code>-tree <code>t</code>, form the extractions of <code>t</code>, each of which has a derivative and a sub-tree.
The derivative becomes part of an extended fixed-point context, and the sub-tree becomes the new sub-tree of focus.</p>

<pre><code>down :: Holey f ⇒ Zipper f → f (Zipper f)
down (ds', t) = (fmap.first) (:ds') (extract (unFix t))
</code></pre>

<p>The typing (in case you&#8217;re curious):</p>

<pre><code>(ds',t) :: Zipper f
        :: (Context f, Fix f)
        :: ([Der f (Fix f)], Fix f)

ds' :: [Der f (Fix f)]
t :: Fix f
unFix t :: f (Fix f)

extract (unFix t) :: f (Der f (Fix f), Fix f)

(fmap.first) (:ds') (extract (unFix t))
  :: f ([Der f (Fix f)], Fix f)
  :: f (Context f, Fix f)
  :: f (Zipper f)
</code></pre>
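
<p>As a small sanity check (my own sketch, assuming the <code>Holey</code> instances above), consider binary leaf trees as a functor fixed-point:</p>

<pre><code>type TreeF = Const Int :+: (Id :*: Id)

type Tree = Fix TreeF

leaf :: Int → Tree
leaf n = Fix (InL (Const n))

branch :: Tree → Tree → Tree
branch s t = Fix (InR (Id s :*: Id t))
</code></pre>

<p>Starting from the top-level zipper <code>([], branch s t)</code>, <code>down</code> yields (inside the <code>InR</code> alternative) a pair of zippers, one focused on <code>s</code> and one on <code>t</code>, each with a singleton context; applying <code>up</code> to either one rebuilds the original zipper.</p>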

<h3>Zipping back to regular data types</h3>

<p>I like the (functor) fixed-point perspective on regular data types, for its austere formal simplicity.
It shows me the naked essence of regular data types, so I can more easily see and more deeply understand patterns like memoization, derivatives, and zippers.</p>

<p>For convenience and friendliness of <em>use</em>, I prefer working with regular types directly, rather than through the (nearly) isomorphic form of functor fixed-points.
While the fixed-point perspective is formalism-friendly, the <em>pattern functor</em> perspective is more user-friendly, allowing us to work with our familiar regular data as they are.</p>

<p>As in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em>, let&#8217;s use the following class:</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t :: * → *
  wrap   :: PF t t → t
  unwrap :: t → PF t t
</code></pre>

<p>The idea is that a type <code>t</code> is isomorphic to <code>Fix (PF t)</code>, although really there may be more points of undefinedness in the fixed-point representation, so rather than an isomorphism, we have an embedding/projection pair.</p>
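
<p>For instance (my sketch, using the functor combinators above), top-down binary leaf trees over <code>Int</code> might be made an instance as follows:</p>

<pre><code>data Tree = Leaf Int | Branch Tree Tree

instance Regular Tree where
  type PF Tree = Const Int :+: (Id :*: Id)
  wrap (InL (Const n))       = Leaf n
  wrap (InR (Id s :*: Id t)) = Branch s t
  unwrap (Leaf n)            = InL (Const n)
  unwrap (Branch s t)        = InR (Id s :*: Id t)
</code></pre>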

<p>The notions of context and location are similar to the ones above:</p>

<pre><code>type Context t = [Der (PF t) t]

type Zipper t = (Context t, t)
</code></pre>

<p>So are the <code>up</code> and <code>down</code> motions, in which <code>wrap</code> and <code>unwrap</code> replace <code>Fix</code> and <code>unFix</code>:</p>

<pre><code>up   :: (Regular t, Holey (PF t)) ⇒ Zipper t →       Zipper t
down :: (Regular t, Holey (PF t)) ⇒ Zipper t → PF t (Zipper t)

up (d:ds', t) = (ds', wrap (fill (d,t)))

down (ds', t) = (fmap.first) (:ds') (extract (unwrap t))
</code></pre>
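<p>As another instance sketch (mine, and again assuming the combinators above), lists recover the familiar list zipper:</p>

<pre><code>instance Regular [a] where
  type PF [a] = Unit :+: (Const a :*: Id)
  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a : as
  unwrap []                      = InL (Const ())
  unwrap (a:as)                  = InR (Const a :*: Id as)
</code></pre>

<p>The context frames have type <code>Der (PF [a]) [a]</code>, whose only inhabited alternative carries a <code>Const a</code>, so <code>Context [a]</code> is essentially a list of passed-over elements, i.e., the reversed prefix in Huet&#8217;s list zipper.</p>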
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=154&amp;md5=29e8305b61bfd15d10c9a8d4f4b13bdf"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2xhttp://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/another-angle-on-zippers/feed</wfw:commentRss>
		<slash:comments>16</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fanother-angle-on-zippers&amp;language=en_GB&amp;category=text&amp;title=Another+angle+on+zippers&amp;description=The+zipper+is+an+efficient+and+elegant+data+structure+for+purely+functional+editing+of+tree-like+data+structures%2C+first+published+by+G%C3%A9rard+Huet.+Zippers+maintain+a+location+of+focus+in+a...&amp;tags=derivative%2Cfunctor%2Czipper%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Differentiation of higher-order types</title>
		<link>http://conal.net/blog/posts/differentiation-of-higher-order-types</link>
		<comments>http://conal.net/blog/posts/differentiation-of-higher-order-types#comments</comments>
		<pubDate>Thu, 29 Jul 2010 02:45:51 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[zipper]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=140</guid>
		<description><![CDATA[A &#8220;one-hole context&#8221; is a data structure with one piece missing. Conor McBride pointed out that the derivative of a regular type is its type of one-hole contexts. When a data structure is assembled out of common functor combinators, a corresponding type of one-hole contexts can be derived mechanically by rules that mirror the standard [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Differentiation of higher-order types

Tags: derivative, functor, zipper

URL: http://conal.net/blog/posts/differentiation-of-higher-order-types/

-->

<!-- references -->

<!-- teaser -->

<p>A &#8220;one-hole context&#8221; is a data structure with one piece missing.
Conor McBride pointed out that <a href="http://www.cs.nott.ac.uk/~ctm/diff.pdf" title="paper by Conor McBride">the derivative of a regular type is its type of one-hole contexts</a>.
When a data structure is assembled out of common functor combinators, a corresponding type of one-hole contexts can be derived mechanically by rules that mirror the standard derivative rules learned in beginning differential calculus.</p>

<p>I&#8217;ve been playing with functor combinators lately.
I was delighted to find that the data-structure derivatives can be expressed directly using the standard functor combinators and type families.</p>

<p>The code in this blog post is taken from the Haskell library <a href="http://hackage.haskell.org/package/functor-combo" title="Hackage entry: functor-combo">functor-combo</a>.</p>

<p>See also the <a href="http://en.wikibooks.org/wiki/Haskell/Zippers" title="Wikibooks entry">Haskell Wikibooks page on zippers</a>, especially the section called &#8220;Differentiation of data types&#8221;.</p>

<p>I mean this post not as new research, but rather as a tidy, concrete presentation of some of Conor&#8217;s delightful insight.</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-140"></span></p>

<h3>Functor combinators</h3>

<p>Let&#8217;s use the same set of functor combinators as in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em> and <em><a href="http://conal.net/blog/posts/memoizing-higher-order-functions/" title="blog post">Memoizing higher-order functions</a></em>:</p>

<pre><code>data Void a   -- no constructors

type Unit a        = Const () a

data Const x a     = Const x

newtype Id a       = Id a

data (f :+: g) a   = InL (f a) | InR (g a)

data (f :*: g) a   = f a :*: g a

newtype (g :. f) a = O (g (f a))
</code></pre>

<h3>Derivatives</h3>

<p>The derivative of a functor is another functor.
Since the shape of the derivative is non-uniform (it depends on the shape of the functor being differentiated), define a higher-order <a href="http://www.haskell.org/ghc/docs/latest/html/users_guide/type-families.html" title="GHC documentation on type families">type family</a>:</p>

<pre><code>type family Der (f :: (* → *)) :: (* → *)
</code></pre>

<p>The usual derivative rules can then be translated without applying much imagination, provided we start with the derivative rules in their <em>functional</em> form (e.g., as in the paper <em><a href="http://conal.net/blog/posts/paper-beautiful-differentiation/" title="blog post">Beautiful differentiation</a></em>, Section 2 and Figure 1).</p>

<p>For instance, the derivative of the constant function is the constant 0 function, and the derivative of the identity function is the constant 1 function.
If <code>der</code> is the derivative functional mapping functions (of real numbers) to functions,</p>

<pre><code>der (const x) ≡ 0
der id        ≡ 1
</code></pre>

<p>On the right-hand sides, I am exploiting the function instances of <code>Num</code> from the library <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/applicative-numbers" title="Haskell library">applicative-numbers</a>.
To be more explicit, I could have written &#8220;<code>const 0</code>&#8221; and &#8220;<code>const 1</code>&#8220;.</p>

<p>Correspondingly,</p>

<pre><code>type instance Der (Const x) = Void   -- 0

type instance Der Id        = Unit   -- 1
</code></pre>

<p>Note that the types <code>Void a</code> and <code>Unit a</code> have 0 and 1 element, respectively, if we ignore ⊥.
Moreover, <code>Void</code> is a sort of additive identity, and <code>Unit</code> is a sort of multiplicative identity, again ignoring ⊥.
For these reasons, <code>Void</code> and <code>Unit</code> might be more aptly named &#8220;<code>Zero</code>&#8221; and &#8220;<code>One</code>&#8220;.</p>

<p>The first rule says that a value of type <code>Const x a</code> has no one-hole context (for type <code>a</code>), which is true, since there is an <code>x</code> but no <code>a</code>.
The second rule says that there is exactly one possible context for <code>Id a</code>, since the one and only <code>a</code> value must be removed, and no information remains.</p>

<p>A (one-hole) context for a sum is a context for the left or the right possibility of the sum:</p>

<pre><code>type instance Der (f :+: g) = Der f :+: Der g
</code></pre>

<p>Correspondingly, the derivative of a sum of functions is the sum of the functions&#8217; derivatives:</p>

<pre><code>der (f + g) ≡ der f + der g
</code></pre>

<p>Again I&#8217;m using the function <code>Num</code> instance from <a href="http://hackage.haskell.org/cgi-bin/hackage-scripts/package/applicative-numbers" title="Haskell library">applicative-numbers</a>.</p>

<p>For a pair, the one hole of a context can be made somewhere in the first component or somewhere in the second component.
So the pair context consists of a holey first component and a full second component, or a full first component and a holey second component.</p>

<pre><code>type instance Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
</code></pre>

<p>Similarly, for functions:</p>

<pre><code>der (f * g) ≡ der f * g + f * der g
</code></pre>

<p>Finally, consider functor composition.
If <code>g</code> and <code>f</code> are container types, then <code>(g :. f) a</code> is the type of <code>g</code> containers of <code>f</code> containers of <code>a</code> elements.
The <code>a</code>-shaped hole must come from one of the contained <code>f a</code> structures.</p>

<pre><code>type instance Der (g :. f) = (Der g :. f) :*: Der f
</code></pre>

<p>Here&#8217;s one way to think of this derivative functor:
to make an <code>a</code>-shaped hole in a <code>g (f a)</code>, first remove an <code>f a</code> structure, leaving an <code>(f a)</code>-shaped hole, and then put back all but an <code>a</code> value extracted from the removed <code>f a</code> structure.
So the overall (one-hole) context can be assembled from two parts: a <code>g</code> context of <code>f a</code> structures, and an <code>f</code> context of <code>a</code> values.</p>

<p>The corresponding rule for function derivatives:</p>

<pre><code>der (g ∘ f) ≡ (der g ∘ f) * der f
</code></pre>

<p>which again uses <code>Num</code> on functions.
Written out more explicitly:</p>

<pre><code>der (g ∘ f) a ≡ der g (f a)  * der f a
</code></pre>

<p>which may look more like the form you&#8217;re used to.</p>

<h3>Summary of derivatives</h3>

<p>To emphasize the correspondence between forms of differentiation, here are rules for <em>function</em> and <em>functor</em> derivatives:</p>

<pre><code>der (const x) ≡ 0
Der (Const x) ≡ Void

der id ≡ 1
Der Id ≡ Unit

der (f  +  g) ≡ der f  +  der g
Der (f :+: g) ≡ Der f :+: Der g

der (f  *  g) ≡ der f  *  g  +  f  *  der g
Der (f :*: g) ≡ Der f :*: g :+: f :*: Der g

der (g  ∘ f) ≡ (der g  ∘ f)  *  der f
Der (g :. f) ≡ (Der g :. f) :*: Der f
</code></pre>
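
<p>As a worked example (my addition), apply these rules to the pattern functor of binary leaf trees, <code>Const x :+: (Id :*: Id)</code>:</p>

<pre><code>Der (Const x :+: (Id :*: Id))
  ≡ Der (Const x) :+: Der (Id :*: Id)
  ≡ Void :+: (Der Id :*: Id  :+:  Id :*: Der Id)
  ≡ Void :+: (Unit :*: Id  :+:  Id :*: Unit)
</code></pre>

<p>Modulo the uninhabited <code>Void</code> alternative and the information-free <code>Unit</code> factors, a one-hole context for a branch is a choice of side plus the untouched sibling subtree, just as in Huet&#8217;s tree zipper.</p>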

<h3>Filling holes</h3>

<p>Each derivative functor is a one-hole container.
One useful operation on derivatives is filling that hole.</p>

<pre><code>fillC :: Functor f ⇒ Der f a → a → f a
</code></pre>

<p>The specifics of how to fill in a hole will depend on the choice of functor <code>f</code>, so let&#8217;s make the <code>fillC</code> operation a method of a new type class.
This new class is also a handy place to stash the associated type of derivatives, as an alternative to the top-level declarations above.</p>

<pre><code>class Functor f ⇒ Holey f where
  type Der f :: * → *
  fillC :: Der f a → a → f a
</code></pre>

<p>I&#8217;ll add one more method to this class in an upcoming post.</p>

<p>For <code>Const x</code>, there are no cases to handle, since there are no holes.</p>

<pre><code>instance Holey (Const x) where
  type Der (Const x) = Void
  fillC = error "fillC for Const x: no Der values"
</code></pre>

<p>I added a definition just to keep the compiler from complaining.
This particular <code>fillC</code> can only be applied to a value of type <code>Void a</code>, and there are no such values other than ⊥.</p>

<p>Is there a more elegant way to define functions over data types with no constructors?
One idea is to provide a single, polymorphic function over void types:</p>

<pre><code>  voidF :: Void a → b
  voidF = error "voidF: no value of type Void"
</code></pre>

<p>and use it wherever needed, e.g.,</p>

<pre><code>  fillC = voidF
</code></pre>

<p>Next is our identity functor:</p>

<pre><code>instance Holey Id where
  type Der Id = Unit
  fillC (Const ()) a = Id a
</code></pre>

<p>More succinctly,</p>

<pre><code>  fillC (Const ()) = Id
</code></pre>

<p>For sums,</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :+: g) where
  type Der (f :+: g) = Der f :+: Der g
  fillC (InL df) a = InL (fillC df a)
  fillC (InR df) a = InR (fillC df a)
</code></pre>

<p>or</p>

<pre><code>  fillC (InL df) = InL ∘ fillC df
  fillC (InR df) = InR ∘ fillC df
</code></pre>

<p>Products also have two cases, since the derivative of a product is a sum:</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (f :*: g) where
  type Der (f :*: g) = Der f :*: g  :+:  f :*: Der g
  fillC (InL (dfa :*:  ga)) a = fillC dfa a :*: ga
  fillC (InR ( fa :*: dga)) a = fa :*: fillC dga a
</code></pre>

<p>Less pointfully,</p>

<pre><code>  fillC (InL (dfa :*:  ga)) = (:*: ga) ∘ fillC dfa
  fillC (InR ( fa :*: dga)) = (fa :*:) ∘ fillC dga
</code></pre>

<p>Finally, functor composition:</p>

<pre><code>instance (Holey f, Holey g) ⇒ Holey (g :. f) where
  type Der (g :. f) = (Der g :. f) :*: Der f
  fillC (O dgfa :*: dfa) a = O (fillC dgfa (fillC dfa a))
</code></pre>

<p>The less pointful form is more telling.</p>

<pre><code>  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
</code></pre>

<p>In words: filling of the derivative of a composition is a composition of filling of the derivatives.</p>

<h3>Thoughts on composition</h3>

<p>Let&#8217;s return to the derivative rules for composition, i.e., the chain rule, on functions and on functors:</p>

<pre><code>der (g  ∘ f) ≡ (der g  ∘ f)  *  der f

Der (g :. f) ≡ (Der g :. f) :*: Der f
</code></pre>

<p>Written in this way, the functor rule looks quite compelling.
Something bothers me, however.
For functions, multiplication is a special case, not the general case, and is only meaningful and correct when differentiating functions from scalars to scalars.
In general, derivative values are <em>linear maps</em>, and the chain rule uses composition on linear maps rather than multiplication on scalars (that <em>represent</em> linear maps).
I&#8217;ve written several <a href="http://conal.net/blog/tag/derivative/" title="Posts on derivatives">posts on derivatives</a> and a paper <em><a href="http://conal.net/blog/posts/paper-beautiful-differentiation/" title="blog post">Beautiful differentiation</a></em>, describing this perspective, which comes from calculus on manifolds.</p>

<p>Look again at the less pointful formulation of <code>fillC</code> for derivatives of compositions:</p>

<pre><code>  fillC (O dgfa :*: dfa) = O ∘ fillC dgfa ∘ fillC dfa
</code></pre>

<p>The product in this case is just structural.
The actual use in <code>fillC</code> is indeed a composition of linear maps.
In this context, &#8220;linear&#8221; has a different meaning from before.
It&#8217;s another way of saying &#8220;fills a <em>one-hole</em> context&#8221; (as with the linear patterns of term rewriting and of ML &amp; Haskell).</p>

<p>So maybe there&#8217;s a more general/abstract view of <em>functor</em> derivatives, just as there is a more general/abstract view of <em>function</em> derivatives.
In that view, we might replace the functor chain rule&#8217;s product with a notion of composition.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=140&amp;md5=aa94cf980e9605e54025a8adb6e7d65f"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2xhttp://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/differentiation-of-higher-order-types/feed</wfw:commentRss>
		<slash:comments>5</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fdifferentiation-of-higher-order-types&amp;language=en_GB&amp;category=text&amp;title=Differentiation+of+higher-order+types&amp;description=A+%26%238220%3Bone-hole+context%26%238221%3B+is+a+data+structure+with+one+piece+missing.+Conor+McBride+pointed+out+that+the+derivative+of+a+regular+type+is+its+type+of+one-hole+contexts.+When+a...&amp;tags=derivative%2Cfunctor%2Czipper%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Details for non-strict memoization, part 1</title>
		<link>http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1</link>
		<comments>http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1#comments</comments>
		<pubDate>Tue, 27 Jul 2010 00:14:20 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>
		<category><![CDATA[unamb]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=131</guid>
		<description><![CDATA[In Non-strict memoization, I sketched out a means of memoizing non-strict functions. I gave the essential insight but did not show the details of how a nonstrict memoization library comes together. In this new post, I give details, which are a bit delicate, in terms of the implementation described in Elegant memoization with higher-order types. [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Details for non-strict memoization, part 1

Tags: memoization, functor, trie, unamb

URL: http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/

-->

<!-- references -->

<!-- teaser -->

<p>In <em><a href="http://conal.net/blog/posts/nonstrict-memoization/" title="blog post">Non-strict memoization</a></em>, I sketched out a means of memoizing non-strict functions.
I gave the essential insight but did not show the details of how a nonstrict memoization library comes together.
In this new post, I give details, which are a bit delicate, in terms of the implementation described in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em>.</p>

<p>Near the end, I run into some trouble with regular data types, which I don&#8217;t know how to resolve cleanly and efficiently.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2010-09-10: Fixed minor typos.</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-131"></span></p>

<h3>Hyper-strict memo tries</h3>

<p>Strict memoization (really <em>hyper-strict</em>) is centered on a family of trie functors, defined as a functor <code>Trie k</code>, associated with a type <code>k</code>.</p>

<pre><code>type k :→: v = Trie k v

class HasTrie k where
    type Trie k :: * → *
    trie   :: (k  →  v) → (k :→: v)
    untrie :: (k :→: v) → (k  →  v)
</code></pre>

<p>The simplest instance is for the unit type:</p>

<pre><code>instance HasTrie () where
  type Trie ()  = Id
  trie   f      = Id (f ())
  untrie (Id v) = λ () → v
</code></pre>

<p>For consistency with other types, I just made a small change from the previous version, which used <code>const v</code> instead of the stricter <code>λ () → v</code>.</p>
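<p>To see the difference concretely: <code>const v</code> never inspects its argument, whereas the pattern <code>λ () → v</code> forces it. A small self-contained check (illustrative names, not from the memoization library):</p>

<pre><code>import Control.Exception (SomeException, evaluate, try)

-- 'const 3' ignores its argument, so even undefined is fine.
lazyAt :: () -> Int
lazyAt = const 3

-- Matching on the () pattern forces the argument.
strictAt :: () -> Int
strictAt () = 3

report :: Either SomeException Int -> String
report = either (const "diverges") show

main :: IO ()
main =
  try (evaluate (lazyAt undefined)) >>= \r1 ->
  try (evaluate (strictAt undefined)) >>= \r2 ->
  mapM_ (putStrLn . report) [r1, r2]   -- prints "3", then "diverges"
</code></pre>

<p>The first result is <code>3</code> because <code>const</code> ignores ⊥; the second diverges (caught here as an exception) because the <code>()</code> match forces ⊥.</p>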

<p>Sums and products are a little more intricate:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
  type Trie (Either a b) = Trie a :*: Trie b
  trie   f           = trie (f ∘ Left) :*: trie (f ∘ Right)
  untrie (ta :*: tb) = untrie ta `either` untrie tb

instance (HasTrie a, HasTrie b) ⇒ HasTrie (a , b) where
  type Trie (a , b) = Trie a :. Trie b
  trie   f = O (trie (trie ∘ curry f))
  untrie (O tt) = uncurry (untrie ∘ untrie tt)
</code></pre>

<p>These trie types are not just strict; they&#8217;re <em>hyper-strict</em>.
During trie search, arguments get thoroughly evaluated.
(See Section 9 in the paper <em><a href="http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/" title="blog post">Denotational design with type class morphisms</a></em>.)
In other words, all of the points of possible undefinedness are lost.</p>

<h3>Strict and non-strict memo tries</h3>

<p>The formulation of strict tries will look very like the hyper-strict tries we&#8217;ve already seen, with new names for the associated trie type and the conversion methods:</p>

<pre><code>type k :→ v = STrie k v

class HasTrie k where
    type STrie k :: * → *
    sTrie   ::             (k  → v) → (k :→ v)
    sUntrie :: HasLub v ⇒ (k :→ v) → (k  → v)
</code></pre>

<p>Besides renaming, I&#8217;ve also added a <code>HasLub</code> constraint for <code>sUntrie</code>, which we&#8217;ll need later.</p>

<p>For instance, the (almost) simplest strict trie is the one for the unit type, defined exactly as before (with new names):</p>

<pre><code>instance HasTrie () where
  type STrie ()  = Id
  sTrie   f      = Id (f ())
  sUntrie (Id v) = λ () → v
</code></pre>

<p>For <em>non-strict</em> memoization, we&#8217;ll want to recover all of the points of possible undefinedness lost in hyper-strict memoization.
At every level of a structured value, there is the possibility of ⊥ or of a non-⊥ value.
Correspondingly, a non-strict trie consists of the value corresponding to the argument ⊥, together with a strict (but <em>not</em> hyper-strict) trie for the non-⊥ values:</p>

<pre><code>data Trie k v = Trie v (k :→ v)

type k :→: v = Trie k v
</code></pre>

<p>The conversions between functions and non-strict tries are no longer methods, as they can be defined uniformly for all domain types.
To form a non-strict trie, capture the function&#8217;s value at ⊥, and build a strict (but not hyper-strict) trie:</p>

<pre><code>trie   :: (HasTrie k          ) ⇒ (k  →  v) → (k :→: v)
trie f = Trie (f ⊥) (sTrie f)
</code></pre>

<p>To convert back from a non-strict trie to a (now memoized) function, combine the information from two sources: the original function&#8217;s value at ⊥, and the function resulting from the strict (but not hyper-strict) trie:</p>

<pre><code>untrie :: (HasTrie k, HasLub v) ⇒ (k :→: v) → (k  →  v)
untrie (Trie b t) = const b ⊔ sUntrie t
</code></pre>

<p>The least-upper-bound (⊔) here is well-defined because its arguments are information-compatible (consistent, non-contradictory).
More strongly, <code>const b ⊑ sUntrie t</code>, i.e., the first argument is an information approximation to (contains no information absent from) the second argument.
Now we see the need for <code>HasLub v</code> in the type of <code>sUntrie</code> above: functions are ⊔-able exactly when their result types are.</p>

<h3>Sums</h3>

<p>Just as non-strict tries contain strict tries, so also strict tries contain non-strict tries.
For instance, consider a sum type, <code>Either a b</code>.
An element is either ⊥ or <code>Left x</code> or <code>Right y</code>, for <code>x :: a</code> and <code>y :: b</code>.
The types <code>a</code> and <code>b</code> also contain a bottom element, so we&#8217;ll need non-strict memo tries for them:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
  type STrie (Either a b) = Trie a :*: Trie b
  sTrie   f           = trie (f ∘ Left) :*: trie (f ∘ Right)
  sUntrie (ta :*: tb) = untrie ta `either` untrie tb
</code></pre>

<p>Just as in the unit instance (above), the only visible change from hyper-strict to strict is that the left-hand sides use the strict trie type and operations.
The right-hand sides are written exactly as before, though now they refer to non-strict tries and their operations.</p>

<h3>Products</h3>

<p>With product, we run into some trouble.
As a first attempt, change only the names on the left-hand side:</p>

<pre><code>instance (HasTrie a, HasTrie b) ⇒ HasTrie (a , b) where
  type STrie (a , b) = Trie a :. Trie b
  sTrie   f      = O (trie (trie ∘ curry f))
  sUntrie (O tt) = uncurry (untrie ∘ untrie tt)
</code></pre>

<p>This <code>sUntrie</code> definition, however, leads to an error in type-checking:</p>

<pre><code>Could not deduce (HasLub (Trie b v)) from the context (HasLub v)
  arising from a use of `untrie'
</code></pre>

<p>The troublesome <code>untrie</code> use is the one applied directly to <code>tt</code>.
(Thank you for column numbers, GHC.)</p>

<p>So what&#8217;s going on here?
Since <code>sUntrie</code> in this definition takes an argument of type <code>(a,b) :→ v</code>, or equivalently, <code>STrie (a,b) v</code>,</p>

<pre><code>O tt :: (a,b) :→ v
     :: STrie (a,b) v
     :: (Trie a :. Trie b) v
</code></pre>

<p>The definition of type composition (from <a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">an earlier post</a>) is</p>

<pre><code>newtype (g :. f) x = O (g (f x))
</code></pre>

<p>So</p>

<pre><code>tt :: Trie a (Trie b v)
   :: a :→: b :→: v
</code></pre>

<p>and</p>

<pre><code>untrie tt :: HasLub (b :→: v) ⇒ a → (b :→: v)
</code></pre>

<p>The <code>HasLub</code> constraint comes from the type of <code>untrie</code> (above).</p>

<p>Continuing,</p>

<pre><code>untrie ∘ untrie tt ::
  (HasLub v, HasLub (b :→: v)) ⇒ a → (b → v)

uncurry (untrie ∘ untrie tt) ::
  (HasLub v, HasLub (b :→: v)) ⇒ (a , b) → v
</code></pre>

<p>which is <em>almost</em> the required type but contains the extra requirement that <code>HasLub (b :→: v)</code>.</p>

<p>Hm.</p>

<p>Looking at the definition of <code>Trie</code> and the definitions of <code>STrie</code> for various domain types <code>b</code>, I think it&#8217;s the case that <code>HasLub (b :→: v)</code> holds whenever <code>HasLub v</code> does, exactly as needed.
In principle, I could make this requirement on the domain type explicit as a superclass for <code>HasTrie</code>:</p>

<pre><code>class (forall v. HasLub v ⇒ HasLub (k :→: v)) ⇒ HasTrie k where ...
</code></pre>

<p>However, Haskell&#8217;s type system isn&#8217;t quite expressive enough, even with GHC extensions (as far as I know).</p>

<h4>A possible solution</h4>

<p>We could instead define a functor-level variant of <code>HasLub</code>:</p>

<pre><code>class HasLubF f where
  lubF :: HasLub v ⇒ f v → f v → f v
</code></pre>

<p>and then use <code>lubF</code> instead of <code>(⊔)</code> in <code>sUntrie</code>.
The revised <code>HasTrie</code> class definition:</p>

<pre><code>class HasLubF (Trie k) ⇒ HasTrie k where
    type STrie k :: * → *
    sTrie   ::             (k  → v) → (k :→ v)
    sUntrie :: HasLub v ⇒ (k :→ v) → (k  → v)
</code></pre>

<p>I would rather not replicate and modify the <code>HasLub</code> class and all of its instances, so I&#8217;m going to set this idea aside and look for another.</p>

<h4>Another route</h4>

<p>Let&#8217;s return to the problematic definition of <code>sUntrie</code> for pairs:</p>

<pre><code>sUntrie (O tt) = uncurry (untrie ∘ untrie tt)
</code></pre>

<p>and recall that <code>tt :: a :→: b :→: v</code>.
The strategy here was to first convert the outer trie (with domain <code>a</code>) and then the inner trie (with domain <code>b</code>).</p>

<p>Alternatively, we might reverse the order.</p>

<p>If we&#8217;re going to convert inside-out instead of outside-in, then we&#8217;ll need a way to transform each of the <em>range</em> elements of a trie.
Which is exactly what <code>fmap</code> is for.
If only we had a functor instance for <code>Trie a</code>, then we could re-define <code>sUntrie</code> on pair tries as follows:</p>

<pre><code>sUntrie (O tt) = uncurry (untrie (fmap untrie tt))
</code></pre>

<p>As a sanity check, try compiling this definition.
Sure enough, it&#8217;s okay except for a missing <code>Functor</code> instance:</p>

<pre><code>Could not deduce (Functor (Trie a))
  from the context (HasTrie (a, b), HasTrie a, HasTrie b)
  arising from a use of `fmap'
</code></pre>

<p>Fixed easily enough:</p>

<pre><code>instance Functor (STrie k) ⇒ Functor (Trie k) where
  fmap f (Trie b t) = Trie (f b) (fmap f t)
</code></pre>

<p>Or even, using the GHC language extensions <code>DeriveFunctor</code> and <code>StandaloneDeriving</code>, just</p>

<pre><code>deriving instance Functor (STrie k) ⇒ Functor (Trie k)
</code></pre>

<p>Now we get a slightly different error message.
We&#8217;re now missing a Functor instance for <code>STrie a</code> instead of <code>Trie a</code>:</p>

<pre><code>Could not deduce (Functor (STrie a))
  from the context (HasTrie (a, b), HasTrie a, HasTrie b)
  arising from a use of `fmap'
</code></pre>

<p>By the way, we can also construct tries inside-out, if we want:</p>

<pre><code>sTrie f = O (fmap trie (trie (curry f)))
</code></pre>

<p>So we&#8217;ll be in good shape <em>if</em> we can satisfy the <code>Functor</code> requirement on strict tries.
Fortunately, all of the strict trie (higher-order) types appearing are indeed functors, since we built them up using functor combinators.</p>

<p>Still, we&#8217;ll have to help the type-checker <em>prove</em> that all of the trie types involved are indeed functors.
Again, a superclass constraint can capture this requirement:</p>

<pre><code>class Functor (STrie k) ⇒ HasTrie k where ...
</code></pre>

<p>Unlike <code>HasLub</code>, this time the required constraint is already at the functor level, so we don&#8217;t have to define a new class.
We don&#8217;t even have to define any new instances, as our functor combinators come with <code>Functor</code> instances, all of which can be derived automatically by GHC.</p>

<p>With this one change, all of the <code>HasTrie</code> instances go through!</p>

<h3>Isomorphisms</h3>

<p>As pointed out in <em><a href="http://conal.net/blog/posts/memoizing-higher-order-functions/" title="blog post">Memoizing higher-order functions</a></em>, type isomorphism is the central, repeated theme of functional memoization.
In addition to the isomorphism between functions and tries, the tries for many types are given via isomorphism with other types that have tries.
In this way, we only have to define tries for our tiny set of functor combinators.</p>

<p>Isomorphism support is as in <em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em>, just using the new names:</p>

<pre><code>#define HasTrieIsomorph(Context,Type,IsoType,toIso,fromIso) 
instance Context ⇒ HasTrie (Type) where { 
  type STrie (Type) = STrie (IsoType); 
  sTrie f = sTrie (f ∘ (fromIso)); 
  sUntrie t = sUntrie t ∘ (toIso); 
}
</code></pre>

<p>Note the use of strict tries even on the right-hand sides.</p>

<p><em>Aside:</em> as mentioned in <em><a href="http://conal.net/blog/posts/composing-memo-tries/" title="blog post">Composing memo tries</a></em>, <code>trie</code>/<code>untrie</code> forms not just an isomorphism but a pair of <a href="http://conal.net/blog/tag/type-class-morphism/" title="Posts on type class morphisms">type class morphism</a>s (TCMs).
(For motivation and examples of TCMs in software design, see <em><a href="http://conal.net/blog/posts/denotational-design-with-type-class-morphisms/" title="blog post">Denotational design with type class morphisms</a></em>.)</p>

<h3>Regular data types</h3>

<p><em>Regular data types</em> are isomorphic to fixed-points of functors.
<em><a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">Elegant memoization with higher-order types</a></em> gives a brief introduction to these notions and pointers to more information.
That post also shows how to use the <code>Regular</code> type class and its instances (defined for other purposes as well) to provide hyper-strict memo tries for all regular data types.</p>

<p>Switching from hyper-strict to non-strict raises an awkward issue.
The functor isomorphisms we used are only correct for fully defined data-types.
When we allow full or partial undefinedness, as in a lazy language like Haskell, our isomorphisms break down.</p>

<p>Following <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>, here is the class I used, where &#8220;<code>PF</code>&#8221; stands for &#8220;pattern functor&#8221;:</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t :: * → *
  unwrap :: t → PF t t
  wrap   :: PF t t → t
</code></pre>

<p>The <code>unwrap</code> method peels off a single layer from a regular type.
For example, the top level of a list is either a unit (nil) or a pair (cons) of an element and a hole in which a list can be placed.</p>

<pre><code>instance Regular [a] where
  type PF [a] = Unit :+: Const a :*: Id   -- note Unit == Const ()

  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)

  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as
</code></pre>

<p>The catch here is that the <code>unwrap</code> and <code>wrap</code> methods do not really form an isomorphism.
Instead, they satisfy a weaker connection: they form an embedding/projection pair.
That is,</p>

<pre><code>wrap ∘ unwrap ≡ id
unwrap ∘ wrap ⊑ id
</code></pre>
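<p>The first law is easy to check on fully defined lists. Here is a self-contained sketch of the list instance, with the combinators written out and the <code>Functor (PF t)</code> superclass dropped for brevity:</p>

<pre><code>{-# LANGUAGE TypeFamilies, TypeOperators #-}

-- Functor combinators, as in the earlier post.
newtype Id a        = Id a
newtype Const x a   = Const x
data    (f :+: g) a = InL (f a) | InR (g a)
data    (f :*: g) a = f a :*: g a
type Unit = Const ()

class Regular t where
  type PF t :: * -> *
  unwrap :: t -> PF t t
  wrap   :: PF t t -> t

instance Regular [a] where
  type PF [a] = Unit :+: (Const a :*: Id)
  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)
  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as

main :: IO ()
main = print (wrap (unwrap [3,5 :: Integer]))   -- [3,5]
</code></pre>

<p>The second law, <code>unwrap ∘ wrap ⊑ id</code>, cannot be tested this way, since it differs from equality only at partially undefined arguments.</p>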

<p>To see the mismatch between <code>[a]</code> and <code>PF [a] [a]</code>, note that the latter has opportunities for partial undefinedness that have no corresponding opportunities in <code>[a]</code>.
Specifically, ⊥ could occur at type <code>Const () [a]</code>, <code>()</code>,  <code>(Const a :*: Id) [a]</code>, <code>Const a [a]</code>, or <code>Id [a]</code>.
Any of these ⊥ values will result in <code>wrap</code> returning ⊥ altogether.
For instance, if</p>

<pre><code>oops :: PF [Integer]
oops = InR (⊥ :*: Id [3,5])
</code></pre>

<p>then</p>

<pre><code>unwrap (wrap oops) ≡ unwrap ⊥ ≡ ⊥ ⊑ oops
</code></pre>

<p>By examining various cases, we can prove that <code>unwrap (wrap p) ⊑ p</code> for all <code>p</code>, which is to say <code>unwrap ∘ wrap ⊑ id</code>, since
information ordering on functions is defined point-wise.
(See <em><a href="http://conal.net/blog/posts/merging-partial-values/" title="blog post">Merging partial values</a></em>.)</p>

<p>Examining the definition of <code>unwrap</code> above shows that it does not give rise to the troublesome ⊥ points, and so a trivial equational proof shows that <code>wrap ∘ unwrap ≡ id</code>.</p>

<p>In the context of memoization, the additional undefined values are problematic.
Consider the case of lists.
The specification macro</p>

<pre><code>HasTrieRegular1([], ListSTrie)
</code></pre>

<p>expands into a <code>newtype</code> and its <code>HasTrie</code> instance.
Changing only the associated type and method names in the <a href="http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/" title="blog post">version for hyper-strict memoization</a>:</p>

<pre><code>newtype ListSTrie a v = ListSTrie (PF [a] [a] :→: v)

instance HasTrie a ⇒ HasTrie [a] where
  type STrie [a] = ListSTrie a
  sTrie f = ListSTrie (sTrie (f . wrap))
  sUntrie (ListSTrie t) = sUntrie t . unwrap
</code></pre>

<p>Note that the trie in <code>ListSTrie</code> contains entries for many ⊥ sub-elements that do not correspond to any list values.
The memoized function is <code>f ∘ wrap</code>, which will have many fewer ⊥ possibilities than the trie structure supports.
At each of the superfluous ⊥ points, the function sampled is strict, so the <code>Trie</code> (rather than <code>STrie</code>) will contain a predictable ⊥.
Considering the definition of <code>untrie</code>:</p>

<pre><code>untrie (Trie b t) = const b ⊔ sUntrie t
</code></pre>

<p>we know <code>b ≡ ⊥</code>, and so <code>const b ⊔ sUntrie t ≡ sUntrie t</code>.
Thus, at these points, the ⊥ value is never helpful, and we could use a strict (though not hyper-strict) trie instead of a non-strict trie.</p>

<p>Perhaps we could safely ignore this whole issue and lose only some efficiency, rather than correctness.
Still, I&#8217;d rather build and traverse just the right trie for our regular types.</p>

<p>As this post is already longer than I intended, and my attention is wandering, I&#8217;ll publish it here and pick up later.
Comments &amp; suggestions please!</p>
]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/details-for-nonstrict-memoization-part-1/feed</wfw:commentRss>
		<slash:comments>2</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fdetails-for-nonstrict-memoization-part-1&amp;language=en_GB&amp;category=text&amp;title=Details+for+non-strict+memoization%2C+part+1&amp;description=In+Non-strict+memoization%2C+I+sketched+out+a+means+of+memoizing+non-strict+functions.+I+gave+the+essential+insight+but+did+not+show+the+details+of+how+a+nonstrict+memoization+library+comes...&amp;tags=functor%2Cmemoization%2Ctrie%2Cunamb%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Elegant memoization with higher-order types</title>
		<link>http://conal.net/blog/posts/elegant-memoization-with-higher-order-types</link>
		<comments>http://conal.net/blog/posts/elegant-memoization-with-higher-order-types#comments</comments>
		<pubDate>Wed, 21 Jul 2010 04:48:22 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[isomorphism]]></category>
		<category><![CDATA[memoization]]></category>
		<category><![CDATA[trie]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=117</guid>
		<description><![CDATA[A while back, I got interested in functional memoization, especially after seeing some code from Spencer Janssen using the essential idea of Ralf Hinze&#8217;s paper Generalizing Generalized Tries. The blog post Elegant memoization with functional memo tries describes a library, MemoTrie, based on both of these sources, and using associated data types. I would have [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Elegant memoization with higher-order types

Tags: functor, memoization, isomorphism, trie

URL: http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/

-->

<!-- references -->

<!-- teaser -->

<p>A while back, I got interested in functional memoization, especially after seeing some code from Spencer Janssen using the essential idea of Ralf Hinze&#8217;s paper <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.4069" title="Paper: &quot;Generalizing Generalized Tries&quot; by Ralf Hinze">Generalizing Generalized Tries</a></em>.
The blog post <em><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">Elegant memoization with functional memo tries</a></em> describes a library, <a href="http://haskell.org/haskellwiki/MemoTrie" title="Haskell wiki page for the MemoTrie library">MemoTrie</a>, based on both of these sources, and using <a href="http://www.cse.unsw.edu.au/~chak/papers/papers.html#assoc" title="Paper: &quot;Associated Types with Class&quot;">associated data types</a>.
I would rather have used associated type synonyms and standard types, but I couldn&#8217;t see how to get the details to work out.
Recently, while playing with functor combinators, I realized that they might work for memoization, which they do quite nicely.</p>

<p>This blog post shows how functor combinators lead to an even more elegant formulation of functional memoization.
The code is available as part of the <a href="http://hackage.haskell.org/package/functor-combo" title="Hackage entry: functor-combo">functor-combo</a> package.</p>

<p>The techniques in this post are not so much new as they are ones that have recently been sinking in for me.
See <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.8.4069" title="Paper: &quot;Generalizing Generalized Tries&quot; by Ralf Hinze">Generalizing Generalized Tries</a></em>, as well as <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.2412" title="Paper by Alexey Rodriguez, Stefan Holdermans, Andres Löh, and Johan Jeuring">Generic programming with fixed points for mutually recursive datatypes</a></em>.</p>

<p><strong>Edits</strong>:</p>

<ul>
<li>2011-01-28: Fixed small typo: &#8220;<em>b^^a^^</em>&#8221; ⟼ &#8220;<em>b<sup>a</sup></em>&#8221;</li>
<li>2010-09-10: Corrected <code>Const</code> definition to use <code>newtype</code> instead of <code>data</code>.</li>
<li>2010-09-10: Added missing <code>Unit</code> type definition (as <code>Const ()</code>).</li>
</ul>

<!-- without a comment or something here, the last item above becomes a paragraph -->

<p><span id="more-117"></span></p>

<h3>Tries as associated data type</h3>

<p>The <a href="http://haskell.org/haskellwiki/MemoTrie" title="Haskell wiki page for the MemoTrie library">MemoTrie</a> library is centered on a class <code>HasTrie</code> with an associated data type of tries (efficient indexing structures for memoized functions):</p>

<pre><code>class HasTrie k where
    data (:→:) k :: * → *
    trie   :: (k  →  v) → (k :→: v)
    untrie :: (k :→: v) → (k  →  v)
</code></pre>

<p>The type <code>a :→: b</code> represents a trie that maps values of type <code>a</code> to values of type <code>b</code>.
The trie representation depends only on <code>a</code>.</p>

<p>Memoization is a simple combination of these two methods:</p>

<pre><code>memo :: HasTrie a ⇒ (a → b) → (a → b)
memo = untrie . trie
</code></pre>
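<p>As a tiny self-contained illustration of this pattern (a directly written <code>Bool</code> instance with an illustrative <code>Trie'</code> name; the library instead reaches <code>Bool</code> through an isomorphism with <code>Either () ()</code>):</p>

<pre><code>{-# LANGUAGE TypeFamilies #-}

class HasTrie k where
  data Trie' k :: * -> *        -- written (:→:) k in the post
  trie   :: (k -> v) -> Trie' k v
  untrie :: Trie' k v -> (k -> v)

instance HasTrie Bool where
  data Trie' Bool v = BoolTrie v v
  trie f = BoolTrie (f False) (f True)
  untrie (BoolTrie f t) b = if b then t else f

memo :: HasTrie k => (k -> v) -> (k -> v)
memo = untrie . trie

main :: IO ()
main = print (map (memo not) [False, True])   -- [True,False]
</code></pre>

<p>Sharing the result of <code>memo not</code> means <code>not</code> is sampled at most once per domain value.</p>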

<p>The <code>HasTrie</code> instance definitions correspond to isomorphisms involving function types.
The isomorphisms correspond to the familiar rules of exponents, if we translate <em>a → b</em> into <em>b<sup>a</sup></em>.
(See <em><a href="http://conal.net/blog/posts/elegant-memoization-with-functional-memo-tries/" title="blog post">Elegant memoization with functional memo tries</a></em> for more explanation.)</p>

<pre><code>instance HasTrie () where
    data () :→: x = UnitTrie x
    trie f = UnitTrie (f ())
    untrie (UnitTrie x) = const x

instance (HasTrie a, HasTrie b) ⇒ HasTrie (Either a b) where
    data (Either a b) :→: x = EitherTrie (a :→: x) (b :→: x)
    trie f = EitherTrie (trie (f . Left)) (trie (f . Right))
    untrie (EitherTrie s t) = either (untrie s) (untrie t)

instance (HasTrie a, HasTrie b) ⇒ HasTrie (a,b) where
    data (a,b) :→: x = PairTrie (a :→: (b :→: x))
    trie f = PairTrie (trie (trie . curry f))
    untrie (PairTrie t) = uncurry (untrie .  untrie t)
</code></pre>

<h3>Functors and functor combinators</h3>

<p>For notational convenience, let &#8220;<code>(:→:)</code>&#8221; be a synonym for &#8220;<code>Trie</code>&#8221;:</p>

<pre><code>type k :→: v = Trie k v
</code></pre>

<p>And replace the associated <code>data</code> with an associated <code>type</code>.</p>

<pre><code>class HasTrie k where
    type Trie k :: * → *
    trie   :: (k  →  v) → (k :→: v)
    untrie :: (k :→: v) → (k  →  v)
</code></pre>

<p>Then, imitating the three <code>HasTrie</code> instances above,</p>

<pre><code>type Trie () v = v

type Trie (Either a b) v = (Trie a v, Trie b v)

type Trie (a,b) v = Trie a (Trie b v)
</code></pre>

<p>Imagine that we have type lambdas for writing higher-kinded types.</p>

<pre><code>type Trie () = λ v → v

type Trie (Either a b) = λ v → (Trie a v, Trie b v)

type Trie (a,b) = λ v → Trie a (Trie b v)
</code></pre>

<p>Type lambdas are often written as &#8220;Λ&#8221; (capital &#8220;λ&#8221;) instead.
In the land of values, these three right-hand sides correspond to common building blocks for functions, namely identity, product, and composition:</p>

<pre><code>id      = λ v → v
f *** g = λ v → (f v, g v)
g  .  f = λ v → g (f v)
</code></pre>

<p>The same building blocks arise in the land of types.</p>

<pre><code>newtype Id a = Id a

data (f :*: g) a = f a :*: g a

newtype (g :. f) a = O (g (f a))
</code></pre>

<p>where <code>Id</code>, <code>f</code> and <code>g</code> are functors.
Sum and a constant functor are also common building blocks:</p>

<pre><code>data (f :+: g) a = InL (f a) | InR (g a)

newtype Const x a = Const x

type Unit = Const () -- one non-⊥ inhabitant
</code></pre>

<h3>Tries as associated type synonym</h3>

<p>Given these standard definitions, we can eliminate the special-purpose data types used, replacing them with our standard functor combinators:</p>

<pre><code>instance HasTrie () where
  type Trie ()  = Id
  trie   f      = Id (f ())
  untrie (Id v) = const v

instance (HasTrie a, HasTrie b) =&gt; HasTrie (Either a b) where
  type Trie (Either a b) = Trie a :*: Trie b
  trie   f           = trie (f . Left) :*: trie (f . Right)
  untrie (ta :*: tb) = untrie ta `either` untrie tb

instance (HasTrie a, HasTrie b) ⇒ HasTrie (a , b) where
  type Trie (a , b) = Trie a :. Trie b
  trie   f      = O (trie (trie . curry f))
  untrie (O tt) = uncurry (untrie . untrie tt)
</code></pre>
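<p>A self-contained check that the three instances nest as expected (ASCII syntax, with <code>memo</code> as defined above), memoizing a function over the composite domain <code>(Either () (), ())</code>:</p>

<pre><code>{-# LANGUAGE TypeFamilies, TypeOperators #-}

newtype Id a        = Id a
data    (f :*: g) a = f a :*: g a
newtype (g :. f) a  = O (g (f a))

class HasTrie k where
  type Trie k :: * -> *
  trie   :: (k -> v) -> Trie k v
  untrie :: Trie k v -> (k -> v)

instance HasTrie () where
  type Trie () = Id
  trie f = Id (f ())
  untrie (Id v) = const v

instance (HasTrie a, HasTrie b) => HasTrie (Either a b) where
  type Trie (Either a b) = Trie a :*: Trie b
  trie f = trie (f . Left) :*: trie (f . Right)
  untrie (ta :*: tb) = either (untrie ta) (untrie tb)

instance (HasTrie a, HasTrie b) => HasTrie (a, b) where
  type Trie (a, b) = Trie a :. Trie b
  trie f = O (trie (trie . curry f))
  untrie (O tt) = uncurry (untrie . untrie tt)

memo :: HasTrie k => (k -> v) -> (k -> v)
memo = untrie . trie

-- A function on a composite domain, memoized through nested tries.
f :: (Either () (), ()) -> Int
f (Left  (), ()) = 1
f (Right (), ()) = 2

main :: IO ()
main = print (map (memo f) [(Left (), ()), (Right (), ())])   -- [1,2]
</code></pre>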

<p>At first blush, it might appear that we&#8217;ve simply moved the data type definitions outside of the instances.
However, the extracted functor combinators have other uses, as explored in polytypic programming.
I&#8217;ll point out some of these uses in the next few blog posts.</p>

<h3>Isomorphisms</h3>

<p>Many types are isomorphic variations, and so their corresponding tries can share a common representation.
For instance, triples are isomorphic to nested pairs:</p>

<pre><code>detrip :: (a,b,c) → ((a,b),c)
detrip (a,b,c) = ((a,b),c)

trip :: ((a,b),c) → (a,b,c)
trip ((a,b),c) = (a,b,c)
</code></pre>

<p>A trie for triples can be a trie for pairs (already defined).
The <code>trie</code> and <code>untrie</code> methods then just perform conversions around the corresponding methods on pairs:</p>

<pre><code>instance (HasTrie a, HasTrie b, HasTrie c) ⇒ HasTrie (a,b,c) where
    type Trie (a,b,c) = Trie ((a,b),c)
    trie f = trie (f . trip)
    untrie t = untrie t . detrip
</code></pre>

<p>All type isomorphisms can use this same pattern.
I don&#8217;t think Haskell is sufficiently expressive to capture this pattern within the language, so I&#8217;ll resort to a C macro.
There are five parameters:</p>

<ul>
<li><code>Context</code>: the instance context;</li>
<li><code>Type</code>: the type whose instance is being defined;</li>
<li><code>IsoType</code>: the isomorphic type;</li>
<li><code>toIso</code>: conversion function <em>to</em> <code>IsoType</code>; and</li>
<li><code>fromIso</code>: conversion function <em>from</em> <code>IsoType</code>.</li>
</ul>

<p>The macro:</p>

<pre><code>#define HasTrieIsomorph(Context,Type,IsoType,toIso,fromIso)  
instance Context ⇒ HasTrie (Type) where {  
  type Trie (Type) = Trie (IsoType);  
  trie f = trie (f . (fromIso));  
  untrie t = untrie t . (toIso);  
}
</code></pre>

<p>Now we can easily define <code>HasTrie</code> instances:</p>

<pre><code>HasTrieIsomorph( (), Bool, Either () ()
               , λ c → if c then Left () else Right ()
               , either (λ () → True) (λ () → False))

HasTrieIsomorph( (HasTrie a, HasTrie b, HasTrie c), (a,b,c), ((a,b),c)
               , λ (a,b,c) → ((a,b),c), λ ((a,b),c) → (a,b,c))

HasTrieIsomorph( (HasTrie a, HasTrie b, HasTrie c, HasTrie d)
               , (a,b,c,d), ((a,b,c),d)
               , λ (a,b,c,d) → ((a,b,c),d), λ ((a,b,c),d) → (a,b,c,d))
</code></pre>

<p>In most (but not all) cases, the first argument (<code>Context</code>) could simply require a <code>HasTrie</code> instance for the isomorphic type, e.g.,</p>

<pre><code>HasTrieIsomorph( HasTrie ((a,b),c), (a,b,c), ((a,b),c)
               , λ (a,b,c) → ((a,b),c), λ ((a,b),c) → (a,b,c))
</code></pre>

<p>We could define another macro that captures this pattern and requires one fewer argument.
On the other hand, there is merit to keeping the contextual requirements explicit.</p>

<h3>Regular data types</h3>

<p>A regular data type is one in which the recursive uses are at the same type.
Functions over such types are often defined via <em>monomorphic</em> recursion.
Data types that do not satisfy this constraint are called &#8220;<a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.3551" title="Paper by Richard Bird and Lambert Meertens">nested</a>&#8221;.</p>

<p>As in several recent generic programming systems, regular data types can be encoded generically through a type class that unwraps one level of functor from a type.
The regular data type is the fixpoint of that functor.
See, e.g., <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.6390" title="Paper by Ulf Norell and Patrik Jansson">Polytypic programming in Haskell</a></em>.
Adopting the style of <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>,</p>

<pre><code>class Functor (PF t) ⇒ Regular t where
  type PF t :: * → *
  wrap   :: PF t t → t
  unwrap :: t → PF t t
</code></pre>

<p>Here &#8220;<code>PF</code>&#8221; stands for &#8220;pattern functor&#8221;.</p>

<p>The pattern functors can be constructed out of the functor combinators above.
For instance, a list at the top level is either empty or a value and a list.
Translating this description:</p>

<pre><code>instance Regular [a] where
  type PF [a] = Unit :+: Const a :*: Id

  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)

  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as
</code></pre>

<p>As another example, consider rose trees (<code>Data.Tree</code>):</p>

<pre><code>data Tree  a = Node a [Tree a]

instance Regular (Tree a) where

  type PF (Tree a) = Const a :*: []

  unwrap (Node a ts) = Const a :*: ts

  wrap (Const a :*: ts) = Node a ts
</code></pre>

<p>Regular types allow for even more succinct <code>HasTrie</code> instance implementations.
Specialize <code>HasTrieIsomorph</code> further:</p>

<pre><code>#define HasTrieRegular(Context,Type)  
HasTrieIsomorph(Context, Type, PF (Type) (Type) , unwrap, wrap)
</code></pre>

<p>For instance, for lists and rose trees:</p>

<pre><code>HasTrieRegular(HasTrie a, [a])
HasTrieRegular(HasTrie a, Tree a)
</code></pre>

<p>The <code>HasTrieRegular</code> macro could be specialized even further for single-parameter polymorphic data types:</p>

<pre><code>#define HasTrieRegular1(TypeCon) HasTrieRegular(HasTrie a, TypeCon a)

HasTrieRegular1([])
HasTrieRegular1(Tree)
</code></pre>

<p>You might wonder if I&#8217;m cheating here, by claiming very simple trie specifications when I&#8217;m really just shuffling code around.
After all, the complexity removed from <code>HasTrie</code> instances shows up in <code>Regular</code> instances.
The win in making this shuffle is that <code>Regular</code> is handy for other purposes, as illustrated in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.2412" title="Paper by Alexey Rodriguez, Stefan Holdermans, Andres Löh, and Johan Jeuring">Generic programming with fixed points for mutually recursive datatypes</a></em> (including <code>fold</code>, <code>unfold</code>, and <code>fmap</code>).
(More examples in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.142.4778" title="Paper by Thomas Noort, Alexey Rodriguez, Stefan Holdermans, Johan Jeuring, Bastiaan Heeren">A Lightweight Approach to Datatype-Generic Rewriting</a></em>.)</p>
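<p>As a small illustration of that extra utility, a generic fold can be written once for all regular types. The sketch below is self-contained rather than the post&#8217;s code: it repeats a minimal <code>Regular</code> class (moving the <code>Functor</code> superclass constraint into <code>fold</code>&#8217;s signature to keep the sketch simple) and uses a hypothetical <code>ListF</code> pattern functor in place of <code>Unit :+: Const a :*: Id</code>.</p>

```haskell
{-# LANGUAGE TypeFamilies, FlexibleContexts #-}

-- Standalone copy of the Regular class, so this sketch compiles on its own.
class Regular t where
  type PF t :: * -> *
  wrap   :: PF t t -> t
  unwrap :: t -> PF t t

-- Generic fold (catamorphism): unwrap one layer, fold the children, combine.
fold :: (Regular t, Functor (PF t)) => (PF t a -> a) -> t -> a
fold alg = alg . fmap (fold alg) . unwrap

-- Hypothetical pattern functor for lists, standing in for Unit :+: Const a :*: Id.
data ListF a r = NilF | ConsF a r

instance Functor (ListF a) where
  fmap _ NilF        = NilF
  fmap g (ConsF a r) = ConsF a (g r)

instance Regular [a] where
  type PF [a] = ListF a
  wrap NilF         = []
  wrap (ConsF a as) = a : as
  unwrap []         = NilF
  unwrap (a : as)   = ConsF a as

-- Summing a list is then just an algebra for a single layer.
sumAlg :: Num a => ListF a a -> a
sumAlg NilF        = 0
sumAlg (ConsF a s) = a + s
```

<p>For example, <code>fold sumAlg [1,2,3]</code> evaluates to <code>6</code>; <code>unfold</code> and a generic <code>fmap</code> follow the same one-layer-at-a-time pattern.</p>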

<h3>Trouble</h3>

<p>Sadly, these elegant trie definitions have a problem.
Trying to compile them leads to an error message from GHC.
For instance,</p>

<pre><code>Nested type family application
  in the type family application: Trie (PF [a] [a])
(Use -XUndecidableInstances to permit this)
</code></pre>

<p>Adding <code>UndecidableInstances</code> silences this error message, but leads to nontermination in the compiler.</p>

<p>Expanding definitions, I can see the likely cause of nontermination.
The definition in terms of a type family allows an infinite type to sneak through, and I guess GHC&#8217;s type checker is unfolding infinitely.</p>

<p>As a simpler example:</p>

<pre><code>{-# LANGUAGE TypeFamilies, UndecidableInstances #-}

type family List a :: *

type instance List a = Either () (a, List a)

-- Hangs ghc 6.12.1:
nil :: List a
nil = Left ()
</code></pre>
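<p>The same shape goes through once the recursion is guarded by a constructor. A minimal sketch of the toy example with the cycle broken (the names <code>List</code>, <code>nil</code>, <code>cons</code>, and <code>toList</code> are mine, not the post&#8217;s):</p>

```haskell
-- Breaking the cycle with a newtype: List is now a proper (nominal) type
-- constructor, so the type checker no longer unfolds it infinitely.
newtype List a = List (Either () (a, List a))

nil :: List a
nil = List (Left ())

cons :: a -> List a -> List a
cons a as = List (Right (a, as))

-- Convert back to an ordinary list, to observe the structure.
toList :: List a -> [a]
toList (List (Left ()))       = []
toList (List (Right (a, as))) = a : toList as
```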

<h3>A solution</h3>

<p>Since GHC&#8217;s type checker cannot handle recursion in type synonym and type family definitions, perhaps we can use a standard avoidance strategy, namely introducing a <code>newtype</code> or <code>data</code> definition to break the cycle.
For instance, as a trie for <code>[a]</code>, we got into trouble by using the trie of the unwrapped form of <code>[a]</code>, i.e., <code>Trie (PF [a] [a])</code>.
So instead,</p>

<pre><code>newtype ListTrie a v = ListTrie (Trie (PF [a] [a]) v)
</code></pre>

<p>which is to say</p>

<pre><code>newtype ListTrie a v = ListTrie (PF [a] [a] :→: v)
</code></pre>

<p>Now <code>wrap</code> and <code>unwrap</code> as before, and add &amp; remove <code>ListTrie</code> as needed:</p>

<pre><code>instance HasTrie a ⇒ HasTrie [a] where
  type Trie [a] = ListTrie a
  trie f = ListTrie (trie (f . wrap))
  untrie (ListTrie t) = untrie t . unwrap
</code></pre>
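<p>To see what the <code>ListTrie</code> recursion amounts to operationally, here is a hand-specialized sketch (not the generic code above), fixed at <code>Bool</code> keys so that it stands alone: a trie for <code>[Bool]</code> holds the value at <code>[]</code> plus one subtrie per possible head element. The names <code>BoolsTrie</code>, <code>trieL</code>, <code>untrieL</code>, and <code>memoL</code> are mine.</p>

```haskell
-- A lazy, infinite trie over [Bool]: the value at [], a subtrie for keys
-- starting with True, and a subtrie for keys starting with False.
data BoolsTrie v = BoolsTrie v (BoolsTrie v) (BoolsTrie v)

-- Build the whole (infinite) trie lazily; only consulted paths are forced.
trieL :: ([Bool] -> v) -> BoolsTrie v
trieL f = BoolsTrie (f [])
                    (trieL (f . (True  :)))
                    (trieL (f . (False :)))

-- Look a key up by walking the corresponding path.
untrieL :: BoolsTrie v -> [Bool] -> v
untrieL (BoolsTrie n _ _) []           = n
untrieL (BoolsTrie _ t _) (True  : bs) = untrieL t bs
untrieL (BoolsTrie _ _ e) (False : bs) = untrieL e bs

-- Memoization is the round trip, just as with the class methods.
memoL :: ([Bool] -> v) -> ([Bool] -> v)
memoL = untrieL . trieL
```

<p>Looking the same key up twice forces each trie node on its path only once; that sharing is where the memoization comes from.</p>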

<p>Again, abstract the boilerplate code into a C macro:</p>

<pre><code>#define HasTrieRegular(Context,Type,TrieType,TrieCon) 
newtype TrieType v = TrieCon (PF (Type) (Type) :→: v); 
instance Context ⇒ HasTrie (Type) where { 
  type Trie (Type) = TrieType; 
  trie f = TrieCon (trie (f . wrap)); 
  untrie (TrieCon t) = untrie t . unwrap; 
}
</code></pre>

<p>For instance,</p>

<pre><code>HasTrieRegular(HasTrie a, [a] , ListTrie a, ListTrie)
HasTrieRegular(HasTrie a, Tree, TreeTrie a, TreeTrie)
</code></pre>

<p>Again, simplify a bit with a specialization to unary regular types:</p>

<pre><code>#define HasTrieRegular1(TypeCon,TrieCon) 
HasTrieRegular(HasTrie a, TypeCon a, TrieCon a, TrieCon)
</code></pre>

<p>And then use the following declarations instead:</p>

<pre><code>HasTrieRegular1([]  , ListTrie)
HasTrieRegular1(Tree, TreeTrie)
</code></pre>

<p>Similarly for binary type constructors, etc., as needed.</p>

<p>The second macro parameter (<code>TrieCon</code>) is just a name, which I don&#8217;t expect to be used other than in the macro-generated code.
It could be eliminated, if there were a way to gensym the name.
Perhaps with Template Haskell?</p>

<h3>Conclusion</h3>

<p>I like the elegance of constructing memo tries in terms of common functor combinators.
Standard pattern functors allow for extremely succinct trie specifications for regular data types.
However, these specifications lead to nontermination of the type checker, which can then be avoided by the standard trick of introducing a newtype to break type recursion.
As is often the case, this trick introduces some clumsiness.
Perhaps the problem can also be avoided by a formulation in terms of <em>bifunctors</em>, as in <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.60.5251" title="Paper by Jeremy Gibbons">Design Patterns as Higher-Order Datatype-Generic Programs</a></em> and <em><a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.6390" title="Paper by Ulf Norell and Patrik Jansson">Polytypic programming in Haskell</a></em>, which allows the fixed-point nature of regular data types to be exposed.</p>
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=117&amp;md5=32e9a954390ac5cfba0f5fb929af467d"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/elegant-memoization-with-higher-order-types/feed</wfw:commentRss>
		<slash:comments>6</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Felegant-memoization-with-higher-order-types&amp;language=en_GB&amp;category=text&amp;title=Elegant+memoization+with+higher-order+types&amp;description=A+while+back%2C+I+got+interested+in+functional+memoization%2C+especially+after+seeing+some+code+from+Spencer+Janssen+using+the+essential+idea+of+Ralf+Hinze%26%238217%3Bs+paper+Generalizing+Generalized+Tries.+The+blog...&amp;tags=functor%2Cisomorphism%2Cmemoization%2Ctrie%2Cblog" type="text/html" />
	</item>
		<item>
		<title>Paper: Beautiful differentiation</title>
		<link>http://conal.net/blog/posts/paper-beautiful-differentiation</link>
		<comments>http://conal.net/blog/posts/paper-beautiful-differentiation#comments</comments>
		<pubDate>Tue, 24 Feb 2009 08:05:10 +0000</pubDate>
		<dc:creator><![CDATA[Conal]]></dc:creator>
				<category><![CDATA[Functional programming]]></category>
		<category><![CDATA[applicative functor]]></category>
		<category><![CDATA[beautiful code]]></category>
		<category><![CDATA[calculus on manifolds]]></category>
		<category><![CDATA[derivative]]></category>
		<category><![CDATA[functor]]></category>
		<category><![CDATA[linear map]]></category>
		<category><![CDATA[math]]></category>
		<category><![CDATA[paper]]></category>

		<guid isPermaLink="false">http://conal.net/blog/?p=85</guid>
		<description><![CDATA[I have another paper draft for submission to ICFP 2009. This one is called Beautiful differentiation, The paper is a culmination of the several posts I&#8217;ve written on derivatives and automatic differentiation (AD). I&#8217;m happy with how the derivation keeps getting simpler. Now I&#8217;ve boiled extremely general higher-order AD down to a Functor and Applicative [&#8230;]]]></description>
				<content:encoded><![CDATA[<!-- 

Title: Paper: Beautiful differentiation

Tags: derivative, functor, applicative functor, beautiful code, calculus on manifolds, linear map, math, paper

URL: http://conal.net/blog/posts/paper-beautiful-differentiation/

-->

<!-- references -->

<!-- teaser -->

<p>I have another paper draft for submission to <a href="http://www.cs.nott.ac.uk/~gmh/icfp09.html" title="conference page">ICFP 2009</a>.
This one is called <em><a href="http://conal.net/papers/beautiful-differentiation" title="paper">Beautiful differentiation</a></em>.
The paper is a culmination of the <a href="http://conal.net/blog/tag/derivative/">several posts</a> I&#8217;ve written on derivatives and automatic differentiation (AD).
I&#8217;m happy with how the derivation keeps getting simpler.
Now I&#8217;ve boiled extremely general higher-order AD down to a <code>Functor</code> and <code>Applicative</code> morphism.</p>
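<p>For readers new to the topic, the first-order core of forward-mode AD is tiny. The following is only a sketch in the spirit of the paper&#8217;s starting point, not its actual (much more general) development: a value paired with its derivative, with the sum and product rules baked into a <code>Num</code> instance.</p>

```haskell
-- A number together with its derivative with respect to some input.
data D a = D a a deriving Show

instance Num a => Num (D a) where
  fromInteger n   = D (fromInteger n) 0          -- constants have derivative 0
  D x x' + D y y' = D (x + y) (x' + y')
  D x x' - D y y' = D (x - y) (x' - y')
  D x x' * D y y' = D (x * y) (x' * y + x * y')  -- product rule
  negate (D x x') = D (negate x) (negate x')
  abs    (D x x') = D (abs x) (x' * signum x)    -- valid away from 0
  signum (D x _)  = D (signum x) 0

-- Differentiate f at x by seeding the derivative slot with 1.
deriv :: Num a => (D a -> D a) -> a -> a
deriv f x = let D _ x' = f (D x 1) in x'
```

<p>For instance, <code>deriv (\x -> x*x + 3*x) 2</code> yields <code>7</code>. The paper generalizes this idea to all higher-order derivatives and to calculus on manifolds.</p>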

<p>I&#8217;d love to get some readings and feedback.
I&#8217;m a bit over the page limit, so I&#8217;ll have to do some trimming before submitting.</p>

<p>The abstract:</p>

<blockquote>
  <p>Automatic differentiation (AD) is a precise, efficient, and convenient
  method for computing derivatives of functions. Its implementation can be
  quite simple even when extended to compute all of the higher-order
  derivatives as well. The higher-dimensional case has also been tackled,
  though with extra complexity. This paper develops an implementation of
  higher-dimensional, higher-order differentiation in the extremely
  general and elegant setting of <em>calculus on manifolds</em> and derives that
  implementation from a simple and precise specification.</p>
  
  <p>In order to motivate and discover the implementation, the paper poses
  the question &#8220;What does AD mean, independently of implementation?&#8221; An
  answer arises in the form of <em>naturality</em> of sampling a function and its
  derivative. Automatic differentiation flows out of this naturality
  condition, together with the chain rule. Graduating from first-order to
  higher-order AD corresponds to sampling all derivatives instead of just
  one. Next, the notion of a derivative is generalized via the notions of
  vector space and linear maps. The specification of AD adapts to this
  elegant and very general setting, which even <em>simplifies</em> the
  development.</p>
</blockquote>

<p>You can <a href="http://conal.net/papers/beautiful-differentiation" title="paper">get the paper and see current errata here</a>.</p>

<p>The submission deadline is March 2, so comments before then are most helpful to me.</p>

<p>Enjoy, and thanks!</p>

<!--
**Edits**:

* 2009-02-09: just fiddling around
-->
<p><a href="http://conal.net/blog/?flattrss_redirect&amp;id=85&amp;md5=2f6565c8f001a5925c3e6dbd29158c37"><img src="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png" srcset="http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white.png, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@2x.png 2x, http://conal.net/blog/wp-content/plugins/flattr/img/flattr-badge-white@3x.png 3x" alt="Flattr this!"/></a></p>]]></content:encoded>
			<wfw:commentRss>http://conal.net/blog/posts/paper-beautiful-differentiation/feed</wfw:commentRss>
		<slash:comments>22</slash:comments>
		<atom:link rel="payment" title="Flattr this!" href="https://flattr.com/submit/auto?user_id=conal&amp;popout=1&amp;url=http%3A%2F%2Fconal.net%2Fblog%2Fposts%2Fpaper-beautiful-differentiation&amp;language=en_GB&amp;category=text&amp;title=Paper%3A+Beautiful+differentiation&amp;description=I+have+another+paper+draft+for+submission+to+ICFP+2009.+This+one+is+called+Beautiful+differentiation%2C+The+paper+is+a+culmination+of+the+several+posts+I%26%238217%3Bve+written+on+derivatives+and...&amp;tags=applicative+functor%2Cbeautiful+code%2Ccalculus+on+manifolds%2Cderivative%2Cfunctor%2Clinear+map%2Cmath%2Cpaper%2Cblog" type="text/html" />
	</item>
	</channel>
</rss>
