From tries to trees

This post is the last of a series of six relating numbers, vectors, and trees, revolving around the themes of static size-typing and memo tries. We’ve seen that length-typed vectors form a trie for bounded numbers, and can handily represent numbers as well. We’ve also seen that n-dimensional vectors themselves have an elegant trie, which is the n-ary composition of the element type’s trie functor:

type VTrie n a = Trie a :^ n 

where for any functor f and natural number type n,

f :^ n  f ∘ ⋯ ∘ f  -- (n times)

This final post in the series places this elegant mechanism of n-ary functor composition into a familiar & useful context, namely trees. Again, type-encoded Peano numbers are central. Just as BNat uses these number types to (statically) bound natural numbers (e.g., for a vector index or a numerical digit), and Vec uses number types to capture vector length, we’ll next use number types to capture tree depth.

Edits:

  • 2011-02-02: Changes thanks to comments from Sebastian Fischer
    • Added note about number representations and leading zeros (without size-typing).
    • Added pointer to Memoizing polymorphic functions via unmemoization for derivation of Tree d a ≅ [d] → a.
    • Fixed signatures for some Branch variants, bringing type parameter a into parens.
    • Clarification about number of VecTree vs pairing constructors in remarks on left- vs right-folded trees.
  • 2011-02-06: Fixed link to From Fast Exponentiation to Square Matrices.

Continue reading ‘From tries to trees’ »

A trie for length-typed vectors

As you might have noticed, I’ve been thinking and writing about memo tries lately. I don’t mean to; they just keep coming up.

Memoization is the conversion of functions to data structures. A simple, elegant, and purely functional form of memoization comes from applying three common type isomorphisms, which also correspond to three laws of exponents, familiar from high school math, as noted by Ralf Hinze in his paper Generalizing Generalized Tries.

In Haskell, one can neatly formulate memo tries via an associated functor, Trie, with a convenient synonym "k ↛ v" for Trie k v, as in Elegant memoization with higher-order types. (Note that I’ve changed my pretty-printing from "k :→: v" to "k ↛ v".) The key property is that the data structure encodes (is isomorphic to) a function, i.e.,

k ↛ a  k → a

In most cases, we ignore non-strictness, though there is a delightful solution for memoizing non-strict functions correctly.

My previous four posts explored use of types to statically bound numbers and to determine lengths of vectors.

Just as (infinite-only) streams are the natural trie for unary natural numbers, we saw in Reverse-engineering length-typed vectors that length-typed vectors (one-dimensional arrays) are the natural trie for statically bounded natural numbers.

BNat n ↛ a ≡ Vec n a

and so

BNat n → a  Vec n a

In retrospect, this relationship is completely unsurprising, since a vector of length n is a collection of values, indexed by 0, . . . , n - 1.

In that same post, I noted that vectors are not only a trie for bounded numbers, but when the elements are also bounded numbers, the vectors can also be thought of as numbers. Both the number of digits and the number base are captured statically, in types:

type Digits n b = Vec n (BNat b)

The type parameters n and b here are type-encodigs of unary numbers, i.e., built up from zero and successor (Z and S). For instance, when b ≡ S (S Z), we have n-bit binary numbers.

In this new post, I look at another question of tries and vectors. Given that Vec n is the trie for BNat n, is there also a trie for Vec n?

Edits:

  • 2011-01-31: Switched trie notation to "k ↛ v" to avoid missing character on iPad.

Continue reading ‘A trie for length-typed vectors’ »

Fixing lists

In the post Memoizing polymorphic functions via unmemoization, I toyed with the idea of lists as tries. I don’t think [a] is a trie, simply because [a] is a sum type (being either nil or a cons), while tries are built out of the identity, product, and composition functors. In contrast, Stream is a trie, being built solely with the identity and product functors. Moreover, Stream is not just any old trie, it is the trie that corresponds to Peano (unary natural) numbers, i.e., Stream a ≅ N → a, where

data N = Zero | Succ N

data Stream a = Cons a (Stream a)

If we didn’t already know the Stream type, we would derive it systematically from N, using standard isomorphisms.

Stream is a trie (over unary numbers), thanks to it having no choice points, i.e., no sums in its construction. However, streams are infinite-only, which is not always what we want. In contrast, lists can be finite, but are not a trie in any sense I understand. In this post, I look at how to fix lists, so they can be finite and yet be a trie, thanks to having no choice points (sums)?

You can find the code for this post and the previous one in a code repository.

Edits:

Continue reading ‘Fixing lists’ »

Fixing broken isomorphisms — details for non-strict memoization, part 2

The post Details for non-strict memoization, part 1 works out a systematic way of doing non-strict memoization, i.e., correct memoization of non-strict (and more broadly, non-hyper-strict) functions. As I mentioned at the end, there was an awkward aspect, which is that the purported “isomorphisms” used for regular types are not quite isomorphisms.

For instance, functions from triples are memoized by converting to and from nested pairs:

untriple ∷ (a,b,c) -> ((a,b),c)
untriple (a,b,c) = ((a,b),c)
 
triple ∷ ((a,b),c) -> (a,b,c)
triple ((a,b),c) = (a,b,c)

Then untriple and triple form an embedding/projection pair, i.e.,

triple ∘ untriple ≡ id
untriple ∘ triple ⊑ id

The reason for the inequality is that the nested-pair form permits (⊥,c), which does not correspond to any triple.

untriple (triple (,c)) ≡ untriple ⊥ ≡ ⊥

Can we patch this problem by simply using an irrefutable (lazy) pattern in the definition of triple, i.e., triple (~(a,b),c) = (a,b,c)? Let’s try:

untriple (triple (,c)) ≡ untriple (,,c)((,),c)

So isomorphism fails and so does even the embedding/projection property.

Similarly, to deal with regular algebraic data types, I used a class that describes regular data types as repeated applications of a single, associated pattern functor (following A Lightweight Approach to Datatype-Generic Rewriting):

class Functor (PF t) ⇒ Regular t where
  type PF t ∷ **
  unwrap ∷ t → PF t t
  wrap   ∷ PF t t → t

Here unwrap converts a value into its pattern functor form, and wrap converts back. For example, here is the Regular instance I had used for lists:

instance Regular [a] where
  type PF [a] = Const () :+: Const a :*: Id
 
  unwrap []     = InL (Const ())
  unwrap (a:as) = InR (Const a :*: Id as)
 
  wrap (InL (Const ()))          = []
  wrap (InR (Const a :*: Id as)) = a:as

Again, we have an embedding/projection pair, rather than a genuine isomorphism:

wrap ∘ unwrap ≡ id
unwrap ∘ wrap ⊑ id

The inequality comes from ⊥ values occurring in PF [a] [a] at type Const () [a], (), (Const a :*: Id) [a], Const a [a], or Id [a].

Continue reading ‘Fixing broken isomorphisms — details for non-strict memoization, part 2’ »

Details for non-strict memoization, part 1

Memoizing higher-order functions

Memoization incrementally converts functions into data structures. It pays off when a function is repeatedly applied to the same arguments and applying the function is more expensive than accessing the corresponding data structure.

In lazy functional memoization, the conversion from function to data structure happens all at once from a denotational perspective, and incrementally from an operational perspective. See Elegant memoization with functional memo tries and Elegant memoization with higher-order types.

As Ralf Hinze presented in Generalizing Generalized Tries, trie-based memoization follows from three simple isomorphisms involving functions types:

1 → a ≅ a

(a + b) → c ≅ (a → c) × (b → c)

(a × b) → c ≅ a → (b → c)

which correspond to the familiar laws of exponents

a ^ 1 = a

ca + b = ca × cb

ca × b = (cb)a

When applied as a transformation from left to right, each law simplifies the domain part of a function type. Repeated application of the rules then eliminate all function types or reduce them to functions of atomic types. These atomic domains are eliminated as well by additional mappings, such as between a natural number and a list of bits (as in patricia trees). Algebraic data types corresponding to sums of products and so are eliminated by the sum and product rules. Recursive algebraic data types (lists, trees, etc) give rise to correspondingly recursive trie types.

So, with a few simple and familiar rules, we can memoize functions over an infinite variety of common types. Have we missed any?

Yes. What about functions over functions?

Edits:

  • 2010-07-22: Made the memoization example polymorphic and switched from pairs to lists. The old example accidentally coincided with a specialized version of trie itself.
  • 2011-02-27: updated some notation

Continue reading ‘Memoizing higher-order functions’ »

Elegant memoization with higher-order types

A while back, I got interested in functional memoization, especially after seeing some code from Spencer Janssen using the essential idea of Ralf Hinze’s paper Generalizing Generalized Tries. The blog post Elegant memoization with functional memo tries describes a library, MemoTrie, based on both of these sources, and using associated data types. I would have rather used associated type synonyms and standard types, but I couldn’t see how to get the details to work out. Recently, while playing with functor combinators, I realized that they might work for memoization, which they do quite nicely.

This blog post shows how functor combinators lead to an even more elegant formulation of functional memoization. The code is available as part of the functor-combo package.

The techniques in this post are not so much new as they are ones that have recently been sinking in for me. See Generalizing Generalized Tries, as well as Generic programming with fixed points for mutually recursive datatypes.

Edits:

  • 2011-01-28: Fixed small typo: “b^^a^^” ⟼ “ba
  • 2010-09-10: Corrected Const definition to use newtype instead of data.
  • 2010-09-10: Added missing Unit type definition (as Const ()).

Continue reading ‘Elegant memoization with higher-order types’ »