## From tries to trees

This post is the last of a series of six relating numbers, vectors, and trees, revolving around the themes of static size-typing and memo tries. We’ve seen that length-typed vectors form a trie for bounded numbers, and can handily represent numbers as well. We’ve also seen that n-dimensional vectors themselves have an elegant trie, which is the n-ary composition of the element type’s trie functor:

``type VTrie n a = Trie a :^ n ``

where for any functor `f` and natural number type `n`,

``f :^ n ≅ f ∘ ⋯ ∘ f  -- (n times)``

This final post in the series places this elegant mechanism of n-ary functor composition into a familiar & useful context, namely trees. Again, type-encoded Peano numbers are central. Just as `BNat` uses these number types to (statically) bound natural numbers (e.g., for a vector index or a numerical digit), and `Vec` uses number types to capture vector length, we'll next use number types to capture tree depth.

Edits:

• 2011-02-02: Changes thanks to comments from Sebastian Fischer
• Added pointer to Memoizing polymorphic functions via unmemoization for derivation of `Tree d a ≅ [d] → a`.
• Fixed signatures for some `Branch` variants, bringing type parameter `a` into parens.
• Clarification about number of `VecTree` vs pairing constructors in remarks on left- vs right-folded trees.
• 2011-02-06: Fixed link to From Fast Exponentiation to Square Matrices.

### Infinite trees

In the post Memoizing polymorphic functions via unmemoization, I played with a number of container types, looking at which ones are tries over what domain types. I referred to these domain types as "index types" for the container type. One such container was a type of infinite binary trees with values at every node:

``data BinTree a = BinTree a (BinTree a) (BinTree a)``

By the usual exponent laws, this `BinTree` functor is (isomorphic to) the functor of tries over a type of binary natural numbers formulated as follows:

``data BinNat = Zero | Even BinNat | Odd BinNat``

As a variation on this `BinTree`, we can replace the two subtrees with a pair of subtrees:

``data BinTree a = BinTree a (Pair (BinTree a))``

Where `Pair` could be defined as

``data Pair a = Pair a a``

or, using functor combinators,

``type Pair = Id × Id``

The reformulation of `BinTree` leads to a slightly different representation for our index type, a little-endian list of bits:

``data BinNat = Zero | NonZero Bool BinNat``

or simply

``type BinNat = [Bool]``

Note that `Bool` is the index type for `Pair` (and conversely, `Pair` is the trie for `Bool`), which suggests that we play this same trick for all index types and their corresponding trie functors. Generalizing,

``data Tree d a = Tree a (d ↛ Tree d a)``

where `k ↛ v` is short for `Trie k v`, and `Trie k` is the trie functor associated with the type `k`. See Elegant memoization with higher-order types.

These generalized trees are indexed by little-endian natural numbers over a "digit" type `d`:

``Tree d a ≅ [d] → a``

which is to say that `Tree d` is a trie for `[d]`. See Memoizing polymorphic functions via unmemoization for a derivation.

Note that all of these number representations have a serious problem, which is that they distinguish between number representations that differ only by leading zeros. The size-typed versions do not have this problem.

### Finite trees

The reason I chose infinite trees was that the finite tree types I knew of have choice-points/alternatives, and so are isomorphic to sums. I don't know of trie-construction techniques that synthesize sums.

Can we design a tree type that is both finite and choice-free? We've already tackled a similar challenge above with lists earlier in previous posts.

In Fixing lists, I wanted to "fix" lists, in the sense of eliminating the choice points in the standard type `[a]` so that the result could be a trie. Doing so led to the type `Vec n a`, which appears to have choice points, due to the two constructors `ZVec` and `(:<)`, but for any given `n`, at most one constructor is applicable. (For this reason, regular algebraic data types are inadequate.) For handy review,

``data Vec ∷ * → * → * where  ZVec ∷                Vec Z     a  (:<) ∷ a → Vec n a → Vec (S n) a``

Let's try the same trick with trees, fixing depth instead length, to get depth-typed binary trees:

``data BinTree ∷ * → * → * where  Leaf   ∷ a                         → BinTree Z     a  Branch ∷ BinTree n a → BinTree n a → BinTree (S n) a``

Again, we can replace the two subtrees with a single pair of subtrees in the `Branch` constructor::

``  Branch ∷ Pair (BinTree n) a → BinTree (S n) a``

Or, recalling that `Bool` is the index type for `Pair`:

``data BinTree ∷ * → * → * where  Leaf   ∷                    a → BinTree Z     a  Branch ∷ (Bool ↛ BinTree n a) → BinTree (S n) a``

The use of `Bool` is rather ad hoc. Its useful property is isomorphism with `1 + 1`, whose corresponding trie functor is `Id + Id`, i.e., `Pair`. In the post Type-bounded numbers, we saw another, more systematic, type isomorphic to `1 + 1`, which is `BNat TwoT` (i.e., `BNat (S (S Z))`), which is the type of natural numbers less than two.

``  Branch ∷ (BNat TwoT ↛ BinTree n a) → BinTree (S n) a``

This replacement suggests a generalization from binary trees to b-ary trees (i.e., having branch factor b).

``data VecTree ∷ * → * → * → * where  Leaf   ∷                        a → VecTree b Z     a  Branch ∷ (BNat b ↛ VecTree b n a) → VecTree b (S n) a``

Recall that the motivation for `BNat b` was as an index type for `Vec b`, which is to say that `Vec b` turned out to be the trie functor for the type `BNat b`. With this relationship in mind, the `Branch` type is equivalent to

``  Branch ∷ Vec b (VecTree b n a) → VecTree b (S n) a``

Unsurprisingly then, a b-ary tree is either a single leaf value or a branch node containing b subtrees. Also, the depth of a leaf is zero, and the depth of a branch node containing b subtrees each of of depth n is n + 1.

We can also generalize this `VecTree` type by replacing `Vec b` with an arbitrary functor `f`:

``data FTree ∷ (* → *) → * → * → * where  Leaf   ∷               a → FTree f Z     a  Branch ∷ f (FTree f n) a → FTree f (S n) atype VecTree b = FTree (Vec b)``

Better yet, introduce an intermediate generalization, using the property that `Vec b ≡ Trie (BNat b)`:

``type TrieTree i = FTree (Trie i)type VecTree b = TrieTree (BNat b)``

With the exception of the most general form (`FTree`), these trees are also tries.

### Generalizing and inverting our trees

The `FTree` type looks very like another data type that came up above, namely right-folded b-ary functor composition:

``data (:^) ∷ (* → *) → * → (* → *) where  ZeroC ∷             a  → (f :^ Z    ) a  SuccC ∷ f ((f :^ n) a) → (f :^ (S n)) a``

These two types are not just similar; they're identical (different only in naming, i.e., α-equivalent), so we can use `f :^ n` in place of `FTree f n`:

``type TrieTree i = (:^) (Trie i)``

Instead of right-folded functor composition, we could go with left-folded. What difference would it make to our notions of b-ary or binary trees?

First look at (right-folded) `BinTree`:

``data BinTree ∷ * → * → * where  Leaf   ∷                  a → BinTree Z     a  Branch ∷ Pair (BinTree n) a → BinTree (S n) a``

Equivalently,

``data BinTree ∷ * → * → * where  Leaf   ∷                         a → BinTree Z     a  Branch ∷ (BNat TwoT ↛ BinTree n a) → BinTree (S n) a``

With left-folding, the `Branch` constructors would be

``  Branch ∷ BinTree n (Pair a) → BinTree (S n) a``

or

``  Branch ∷ BinTree n (BNat TwoT ↛ a) → BinTree (S n) a``

Then (right-folded) `VecTree`:

``data VecTree ∷ * → * → * → * where  Leaf   ∷                        a → VecTree b Z     a  Branch ∷ (BNat b ↛ VecTree b n a) → VecTree b (S n) a``

Equivalently,

``data VecTree ∷ * → * → * → * where  Leaf   ∷                    a  → VecTree b Z     a  Branch ∷ Vec b (VecTree b n a) → VecTree b (S n) a``

With left-folding:

``  Branch ∷ VecTree b n (BNat b ↛ a) → VecTree b (S n) a``

or

``  Branch ∷ VecTree b n (Vec b a) → VecTree b (S n) a``

In shifting from right- to left-folding, our tree structuring becomes inverted. Now a "b-ary" tree really has only one subtree per branch node, not b subtrees.

For instance, right-folded a binary tree of depth two might look like

``Branch (Branch (Leaf 0, Leaf 1), Branch (Leaf 2, Leaf 3))``

For readability, I'm using normal pairs instead of 2-vectors or `Pair` pairs here. In contrast, the corresponding left-folded a binary tree would look like

``Branch (Branch (Leaf ((0,1),(2,3))))``

Note that the `VecTree` constructors are in a linear chain, forming an outer shell, and the number of such constructors is the one more than the depth, and hence logarithmic in the number of leaves. The right-folded form has `VecTree` constructors scattered throughout the tree, and the number of such constructors is exponential in the depth, and hence linear in the number of leaves. (As Sebastian Fischer pointed out, however, the number of pairing constructors is not reduced in the left-folded form.)

For more examples of this sort of inversion, see Chris Okasaki's gem of a paper From Fast Exponentiation to Square Matrices.

### What sort of trees do we have?

I pulled a bit of a bait-and-switch above in reformulating trees. The initial infinite tree type had values and branching at every node:

``data BinTree a = BinTree a (BinTree a) (BinTree a)``

In contrast, the depth-typed trees (whether binary, b-ary, trie-ary, or functor-ary) all have strict separation of leaf nodes from branching nodes.

A conventional, finite binary tree data type might look like

``data BinTree a = Leaf a | Branch (Pair (BinTree a))``

Its inverted (left-folded) form:

``data BinTree a = Leaf a | Branch (BinTree (Pair a))``

When we had depth typing, the right- and left-folded forms were equally expressive. They both described "perfect" trees, with each depth consisting entirely of branching or (for the deepest level) entirely of values. (I'm speaking figuratively for the left-folded case, since literally there is no branching.)

Without depth typing, the expressiveness differs significantly. Right-folded trees can be ragged, with leaves occurring at various depths in the same tree. Left-folded binary trees of depth n can only be perfect, even though the depth is determined dynamically, not statically (i.e., not from type).

Dynamically-depthed binary trees generalize to b-ary, trie-ary, and functor-ary versions. In each case, the left-folded versions are much more statically constrained than their right-folded counterparts.

### From here

This post is the last of a six-part series on tries and static size-typing in the context of numbers, vectors, and trees. Maybe you're curious where these ideas came from and where they might be going.

I got interested in these relationships while noodling over some imperative, data-parallel programs. I asked one one of my standard questions: What elegant beauty is hiding deep beneath these low-level implementation details. In this case, prominent details include array indices, bit fiddling, and power-of-two restrictions, which led me to play with binary numbers. Moreover, parallel algorithms often use a divide-and-conquer strategy. That strategy hints at balanced binary trees, which then can be indexed by binary numbers (bit sequences). Indexing brought memo tries to mind.

I expect to write soon about some ideas & techniques for deriving low-level, side-effecting, parallel algorithms from semantically simple and elegant specifications.

1. #### augustss:

With dependent types you can encode sums as dependent pairs. The first component of the pair is the tag and the second is the data (with a type depending on the first component). I wonder if you can use this encoding for tries.

2. #### Conal Elliott » Blog Archive » Deriving parallel tree scans:

[…] perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The functions initTs and tailTs make unbalanced trees out of balanced ones, so I don’t know […]

3. #### Conal Elliott » Blog Archive » Composable parallel scanning:

[…] perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The difficulty I encounter is that the functions initTs and tailTs make unbalanced trees out of […]

4. #### Conal Elliott » Blog Archive » Parallel speculative addition via memoization:

[…] Or for depth-typed perfect trees (e.g., as described in From tries to trees): […]