From tries to trees

This post is the last of a series of six relating numbers, vectors, and trees, revolving around the themes of static size-typing and memo tries. We’ve seen that length-typed vectors form a trie for bounded numbers, and can handily represent numbers as well. We’ve also seen that n-dimensional vectors themselves have an elegant trie, which is the n-ary composition of the element type’s trie functor:

type VTrie n a = Trie a :^ n 

where for any functor f and natural number type n,

f :^ n  f ∘ ⋯ ∘ f  -- (n times)

This final post in the series places this elegant mechanism of n-ary functor composition into a familiar & useful context, namely trees. Again, type-encoded Peano numbers are central. Just as BNat uses these number types to (statically) bound natural numbers (e.g., for a vector index or a numerical digit), and Vec uses number types to capture vector length, we'll next use number types to capture tree depth.

Edits:

  • 2011-02-02: Changes thanks to comments from Sebastian Fischer
    • Added note about number representations and leading zeros (without size-typing).
    • Added pointer to Memoizing polymorphic functions via unmemoization for derivation of Tree d a ≅ [d] → a.
    • Fixed signatures for some Branch variants, bringing type parameter a into parens.
    • Clarification about number of VecTree vs pairing constructors in remarks on left- vs right-folded trees.
  • 2011-02-06: Fixed link to From Fast Exponentiation to Square Matrices.

Infinite trees

In the post Memoizing polymorphic functions via unmemoization, I played with a number of container types, looking at which ones are tries over what domain types. I referred to these domain types as "index types" for the container type. One such container was a type of infinite binary trees with values at every node:

data BinTree a = BinTree a (BinTree a) (BinTree a)

By the usual exponent laws, this BinTree functor is (isomorphic to) the functor of tries over a type of binary natural numbers formulated as follows:

data BinNat = Zero | Even BinNat | Odd BinNat

As a variation on this BinTree, we can replace the two subtrees with a pair of subtrees:

data BinTree a = BinTree a (Pair (BinTree a))

Where Pair could be defined as

data Pair a = Pair a a

or, using functor combinators,

type Pair = Id × Id

The reformulation of BinTree leads to a slightly different representation for our index type, a little-endian list of bits:

data BinNat = Zero | NonZero Bool BinNat

or simply

type BinNat = [Bool]

Note that Bool is the index type for Pair (and conversely, Pair is the trie for Bool), which suggests that we play this same trick for all index types and their corresponding trie functors. Generalizing,

data Tree d a = Tree a (d ↛ Tree d a)

where k ↛ v is short for Trie k v, and Trie k is the trie functor associated with the type k. See Elegant memoization with higher-order types.

These generalized trees are indexed by little-endian natural numbers over a "digit" type d:

Tree d a  [d] → a

which is to say that Tree d is a trie for [d]. See Memoizing polymorphic functions via unmemoization for a derivation.

Note that all of these number representations have a serious problem, which is that they distinguish between number representations that differ only by leading zeros. The size-typed versions do not have this problem.

Finite trees

The reason I chose infinite trees was that the finite tree types I knew of have choice-points/alternatives, and so are isomorphic to sums. I don't know of trie-construction techniques that synthesize sums.

Can we design a tree type that is both finite and choice-free? We've already tackled a similar challenge above with lists earlier in previous posts.

In Fixing lists, I wanted to "fix" lists, in the sense of eliminating the choice points in the standard type [a] so that the result could be a trie. Doing so led to the type Vec n a, which appears to have choice points, due to the two constructors ZVec and (:<), but for any given n, at most one constructor is applicable. (For this reason, regular algebraic data types are inadequate.) For handy review,

data Vec  *** where
ZVec Vec Z a
(:<) a → Vec n a → Vec (S n) a

Let's try the same trick with trees, fixing depth instead length, to get depth-typed binary trees:

data BinTree  *** where
Leaf a → BinTree Z a
Branch BinTree n a → BinTree n a → BinTree (S n) a

Again, we can replace the two subtrees with a single pair of subtrees in the Branch constructor::

  Branch  Pair (BinTree n) a → BinTree (S n) a

Or, recalling that Bool is the index type for Pair:

data BinTree  *** where
Leaf a → BinTree Z a
Branch (BoolBinTree n a) → BinTree (S n) a

The use of Bool is rather ad hoc. Its useful property is isomorphism with 1 + 1, whose corresponding trie functor is Id + Id, i.e., Pair. In the post Type-bounded numbers, we saw another, more systematic, type isomorphic to 1 + 1, which is BNat TwoT (i.e., BNat (S (S Z))), which is the type of natural numbers less than two.

  Branch  (BNat TwoTBinTree n a) → BinTree (S n) a

This replacement suggests a generalization from binary trees to b-ary trees (i.e., having branch factor b).

data VecTree  **** where
Leaf a → VecTree b Z a
Branch (BNat b ↛ VecTree b n a) → VecTree b (S n) a

Recall that the motivation for BNat b was as an index type for Vec b, which is to say that Vec b turned out to be the trie functor for the type BNat b. With this relationship in mind, the Branch type is equivalent to

  Branch  Vec b (VecTree b n a) → VecTree b (S n) a

Unsurprisingly then, a b-ary tree is either a single leaf value or a branch node containing b subtrees. Also, the depth of a leaf is zero, and the depth of a branch node containing b subtrees each of of depth n is n + 1.

We can also generalize this VecTree type by replacing Vec b with an arbitrary functor f:

data FTree  (**) → *** where
Leaf a → FTree f Z a
Branch f (FTree f n) a → FTree f (S n) a

type VecTree b = FTree (Vec b)

Better yet, introduce an intermediate generalization, using the property that Vec b ≡ Trie (BNat b):

type TrieTree i = FTree (Trie i)

type VecTree b = TrieTree (BNat b)

With the exception of the most general form (FTree), these trees are also tries.

Generalizing and inverting our trees

The FTree type looks very like another data type that came up above, namely right-folded b-ary functor composition:

data (:^)  (**) → * → (**) where
ZeroC a → (f :^ Z ) a
SuccC f ((f :^ n) a) → (f :^ (S n)) a

These two types are not just similar; they're identical (different only in naming, i.e., α-equivalent), so we can use f :^ n in place of FTree f n:

type TrieTree i = (:^) (Trie i)

Instead of right-folded functor composition, we could go with left-folded. What difference would it make to our notions of b-ary or binary trees?

First look at (right-folded) BinTree:

data BinTree  *** where
Leaf a → BinTree Z a
Branch Pair (BinTree n) a → BinTree (S n) a

Equivalently,

data BinTree  *** where
Leaf a → BinTree Z a
Branch (BNat TwoTBinTree n a) → BinTree (S n) a

With left-folding, the Branch constructors would be

  Branch  BinTree n (Pair a) → BinTree (S n) a

or

  Branch  BinTree n (BNat TwoT ↛ a) → BinTree (S n) a

Then (right-folded) VecTree:

data VecTree  **** where
Leaf a → VecTree b Z a
Branch (BNat b ↛ VecTree b n a) → VecTree b (S n) a

Equivalently,

data VecTree  **** where
Leaf a → VecTree b Z a
Branch Vec b (VecTree b n a) → VecTree b (S n) a

With left-folding:

  Branch  VecTree b n (BNat b ↛ a) → VecTree b (S n) a

or

  Branch  VecTree b n (Vec b a) → VecTree b (S n) a

In shifting from right- to left-folding, our tree structuring becomes inverted. Now a "b-ary" tree really has only one subtree per branch node, not b subtrees.

For instance, right-folded a binary tree of depth two might look like

Branch (Branch (Leaf 0, Leaf 1), Branch (Leaf 2, Leaf 3))

For readability, I'm using normal pairs instead of 2-vectors or Pair pairs here. In contrast, the corresponding left-folded a binary tree would look like

Branch (Branch (Leaf ((0,1),(2,3))))

Note that the VecTree constructors are in a linear chain, forming an outer shell, and the number of such constructors is the one more than the depth, and hence logarithmic in the number of leaves. The right-folded form has VecTree constructors scattered throughout the tree, and the number of such constructors is exponential in the depth, and hence linear in the number of leaves. (As Sebastian Fischer pointed out, however, the number of pairing constructors is not reduced in the left-folded form.)

For more examples of this sort of inversion, see Chris Okasaki's gem of a paper From Fast Exponentiation to Square Matrices.

What sort of trees do we have?

I pulled a bit of a bait-and-switch above in reformulating trees. The initial infinite tree type had values and branching at every node:

data BinTree a = BinTree a (BinTree a) (BinTree a)

In contrast, the depth-typed trees (whether binary, b-ary, trie-ary, or functor-ary) all have strict separation of leaf nodes from branching nodes.

A conventional, finite binary tree data type might look like

data BinTree a = Leaf a | Branch (Pair (BinTree a))

Its inverted (left-folded) form:

data BinTree a = Leaf a | Branch (BinTree (Pair a))

When we had depth typing, the right- and left-folded forms were equally expressive. They both described "perfect" trees, with each depth consisting entirely of branching or (for the deepest level) entirely of values. (I'm speaking figuratively for the left-folded case, since literally there is no branching.)

Without depth typing, the expressiveness differs significantly. Right-folded trees can be ragged, with leaves occurring at various depths in the same tree. Left-folded binary trees of depth n can only be perfect, even though the depth is determined dynamically, not statically (i.e., not from type).

Dynamically-depthed binary trees generalize to b-ary, trie-ary, and functor-ary versions. In each case, the left-folded versions are much more statically constrained than their right-folded counterparts.

From here

This post is the last of a six-part series on tries and static size-typing in the context of numbers, vectors, and trees. Maybe you're curious where these ideas came from and where they might be going.

I got interested in these relationships while noodling over some imperative, data-parallel programs. I asked one one of my standard questions: What elegant beauty is hiding deep beneath these low-level implementation details. In this case, prominent details include array indices, bit fiddling, and power-of-two restrictions, which led me to play with binary numbers. Moreover, parallel algorithms often use a divide-and-conquer strategy. That strategy hints at balanced binary trees, which then can be indexed by binary numbers (bit sequences). Indexing brought memo tries to mind.

I expect to write soon about some ideas & techniques for deriving low-level, side-effecting, parallel algorithms from semantically simple and elegant specifications.

4 Comments

  1. augustss:

    With dependent types you can encode sums as dependent pairs. The first component of the pair is the tag and the second is the data (with a type depending on the first component). I wonder if you can use this encoding for tries.

  2. Conal Elliott » Blog Archive » Deriving parallel tree scans:

    […] perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The functions initTs and tailTs make unbalanced trees out of balanced ones, so I don’t know […]

  3. Conal Elliott » Blog Archive » Composable parallel scanning:

    […] perfectly balanced trees, of the sort I played with in A trie for length-typed vectors and From tries to trees. The difficulty I encounter is that the functions initTs and tailTs make unbalanced trees out of […]

  4. Conal Elliott » Blog Archive » Parallel speculative addition via memoization:

    […] Or for depth-typed perfect trees (e.g., as described in From tries to trees): […]

Leave a comment