Recursion Schemes, Part 4½: Better Living Through Base Functors
2018-01-24
In an effort to publish more than one blog post a year, I’ve decided to write about smaller topics. Today I’m going to talk about the notion of a ‘base functor’, and how the popular recursion-schemes library uses base functors to make recursion schemes more elegant and ergonomic in practice.
Repeating Ourselves
Throughout this series of posts, we’ve seen the pattern of parameterizing our data types in terms of a recursive type variable. In the first installment, we went from this:
data Expr
= Index Expr Expr
| Call Expr [Expr]
| Unary String Expr
| Binary Expr String Expr
| Paren Expr
| Literal Int
deriving (Show, Eq)
to this—the same data type, but of kind * -> *, with all recursive occurrences of the Expr type replaced with the type variable a. (If you’re not familiar with the notion of a ‘kind’, you can think of the kind of a data type, loosely, as a measure of how many arguments it takes. Our Expr type takes no arguments, so its kind is ‘star’, written *. A data type that takes one argument has kind ‘star to star’, * -> *. A data type that takes two arguments, like Either, has kind * -> * -> *. The high-level description of a kind is ‘the type of a type’, but you can think of kinds as simply describing the parameters, if any, that a data type takes. The :k command in GHCi reports the kind of any type or typeclass you give it.)
data ExprF a
= Index a a
| Call a [a]
| Unary String a
| Binary a String a
| Paren a
| Literal Int
deriving (Show, Eq, Functor)
We then used the Term
trick, a Y-combinator for the type system, to parameterize ExprF
in terms of itself, yielding a data type equivalent to the original Expr
but that can be mapped over with fmap
and folded with cata
:
newtype Term f = In { out :: f (Term f) }
type Expr' = Term ExprF
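(To recap part one: a value such as In (Paren (In (Literal 5))) has the type Term ExprF, and it is these Term-wrapped values that cata folds.)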
Similarly, in part 4 we represented the natural numbers with a data type of kind * -> *
:
data Nat a
= Zero
| Next a
deriving Functor
And we represented the Plant
type we grow with another such data type:
data Plant a
= Root a
| Stalk a
| Fork a a a
| Bloom
This is a good and fluent way to build data types. However, it has its disadvantages.
Consider the humble linked list, containing values of type a
: [a]
.
infixr 5 :
data [] a = a : [a]
| []
Certainly cata
, the fundamental right-fold operation, should be able to support folding over this structure: after all, it’s of kind * -> *
. But if we apply Term to this data type, we run into trouble quite quickly: we lose the ability to store elements in a list. The element type variable a is instantiated to Term [], so such a list can only hold other nested lists: there is no place we can actually store list elements. This is, as you can tell, not particularly useful.
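To make the problem concrete, here is a small sketch of my own (not from the original post), using the Term defined above: the only values of type Term [] are lists nested inside other lists, with no room anywhere for an element.

-- A sketch: Term applied directly to []. Term [] unrolls to [Term []],
-- so every element must itself be a Term []. We can build arbitrarily
-- nested lists, but we can never store an actual value in them.
nestedOnly :: Term []
nestedOnly = In [In [], In [In []]]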
Now, of course, we can do our standard song-and-dance routine, converting [a]
into a type where its recursive element is replaced by a new type variable to represent recursive occurrences (the [a]
in the (:)
constructor):
data ListF a b
= Cons a b
| Nil
deriving (Show, Eq, Functor)
This looks a little different from our prior examples—it’s of kind * -> * -> *
, not * -> *
—but the principle is the same. We add a new type variable, here called b
, to represent recursive occurrences of the List
type. And now we can fold over a list:
listSum :: Num a => Algebra (ListF a) a
listSum (Cons a b) = a + b
listSum Nil        = 0
But this is clumsy. There’s something fundamentally wrong if we have to write a replacement for the venerable []
type. Recursion schemes are meant to be elegant—manually creating alternate implementations for every recursive data type doesn’t strike me as particularly elegant. But I am, thankfully, not alone in feeling this way—the literature contains many ways to work around this. I’ll focus on the solution provided by the recursion-schemes package.
Base Functors to the Rescue
The first thing you see when you open up the recursion-schemes documentation is the following type family declaration:

type family Base t :: * -> *

This understandably intimidates a number of people, especially given its lack of documentation. But it is less forbidding than it appears, and the added mental overhead is worth it—the presence of the Base type family is one of the things that sets recursion-schemes apart from its competition in terms of ease of use.
The purpose of the Base
type family is to tie together a standard Haskell data type—such as our original Expr
formulation, or the humble []
list—with its parameterized, “base” representation. Its definition is very concise:
type family Base t :: * -> *
While a full-fledged tutorial on type families is beyond the scope of this post (the best such reference is the GHC wiki), we can think of type families as a way to write functions on the type-level. If we were to declare a type family, and an instance of this family (analogous to an instance of a typeclass):
type family Something t
type instance Something Foo = Bar
then anywhere we encounter the invocation Something Foo
the GHC type system will resolve that to Bar
. While this may seem unnecessary—if you mean Bar
, why not just write Bar
?—it provides us a rich facility with which to associate one type with another.
Look at the definition of Base
again. The kind signature there indicates that, whenever you pass in a concrete type as the variable t
, you get back a data type parameterized by one additional type variable. This corresponds to our experience with ExprF and ListF: Expr went from kind * to * -> *, and the list type went from kind * -> * to * -> * -> *.
The Base
type family doesn’t tell us much on its own. The most illustrative path open to us is to look at an instance declared with Base
.
type instance Base [a] = ListF a
This declares a type family instance. Anywhere we mean to read ListF a
, we can write Base [a]
. This provides important organizational meaning: there is only one sensible parameterized implementation for any recursive type, and thus (thanks to Haskell’s support for type families) there is only one implementation of Base a
for any given type a
.
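As a quick sanity check, GHCi can reduce the type family for us. Here is a sketch of a session (assuming recursion-schemes is in scope); the :kind! command normalizes type families before printing:

ghci> :kind! Base [Int]
Base [Int] :: * -> *
= ListF Int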
This isn’t particularly exciting on its own. The real fireworks start here, with the definition (simplified here for pedagogical purposes) of the Recursive
typeclass.
class (Functor (Base t)) => Recursive t where
  project :: t -> Base t t
  cata    :: (Base t a -> a) -> t -> a
The Recursive typeclass is similar to the Foldable typeclass. (As you can tell from the name of its containing module, Data.Functor.Foldable, this class was originally called Foldable; the presence of Foldable in the standard library made the duplication unsupportable, and it was changed to Recursive.) The analogy holds: if we define a Recursive
instance for our data type t
, whether that’s Expr
or [a]
or anything else, we can fold over this type. However, instead of providing a simple fold, we provide two functions: a project
function that takes a type t
and returns the same type but transformed into its Base
form, and a cata
function that, given an algebra from Base t a
to a
(to wit: Base t a -> a
), and an initial t
value over which to fold, yields us a folded a
value.
This is, at first glance, more unwieldy than our formulation of cata
, which only involved a Functor
constraint, and nothing related to Base
functors or a new Recursive
type:
cata :: (Functor f) => Algebra f a -> Term f -> a
cata f = out >>> fmap (cata f) >>> f
But, critically, this formulation of cata
forces us to work with Term (ListF a)
rather than the simple [a]
. However, Recursive
allows us to work with ordinary data types: rather than provide a Term t
to cata
, we give an ordinary t
. Recursive
instances use the project
function to transform the provided type into a parameterized one, then pass that transformed data type to the cata
function. As such, we can now use cata
to fold over an ordinary list, without having to wrap it in Term
and Cons
values.
sumList :: Num a => [a] -> a
sumList = cata go where
  go Nil          = 0
  go (Cons a acc) = a + acc
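For example, sumList [1, 2, 3] evaluates to 6: cata uses project to view the plain list through ListF, one layer at a time, and go collapses each layer.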
Recursive
has further tricks up its sleeve. Thanks to a MINIMAL
pragma, the Recursive
class only needs an implementation of project
to implement Recursive
fully—we still get a universal cata
function for free. The implementation looks like this, if you’re interested:
class Functor (Base t) => Recursive t where
  project :: t -> Base t t

  cata :: (Base t a -> a) -- ^ a (Base t)-algebra
       -> t               -- ^ fixed point
       -> a               -- ^ result
  cata f = c where c = f . fmap c . project
Note that the cata
function is given a default definition. If you had some magic data type that admitted a faster cata
than the default implementation, you could override it—however, I struggle to think of a type which would admit a custom cata
.
Contained inside the Recursive
class are other useful folds—para
, the paramorphism, which we discussed in part 3, and ones we haven’t covered yet—the generalized paramorphism gpara
and Fokkinga’s prepromorphism prepro
. (We will discuss these in a future installment).
Note that the Base type is constrained in Recursive instances: t
must have a Base
instance, and the Base
instance for t
must be a Functor
. Since cata
is defined in terms of a recursive invocation with fmap
, we need a useful fmap
function over any Base
instance to have a cata
that typechecks.
Thanks to the Recursive
typeclass, we can deal in simple data types—[a]
rather than ListF
, Expr
rather than ExprF
—while retaining the expressive folding power granted by parameterized data types. This is cool as hell. This technique is used in other libraries, such as José Pedro Magalhães’s regular
.
The implementation of Recursive
for [a]
follows. We convert the empty list []
into Nil
, and replace :
with the Cons
constructor.
instance Recursive [a] where
  project (x:xs) = Cons x xs
  project []     = Nil
Another valuable instance is one for Natural
—as we discussed in the previous installment, we can fold over the natural numbers, bottoming out when we hit zero. We built our own Nat
data type, observing that it was equivalent in definition to Maybe
—recursion-schemes
just uses Maybe
for its Recursive
and Base
instances for Natural
.
type instance Base Natural = Maybe

instance Recursive Natural where
  project 0 = Nothing
  project n = Just (n - 1)
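As a small sketch of my own (assuming Numeric.Natural and Data.Functor.Foldable are imported), this instance lets cata fold a Natural directly; here we compute 2^n:

-- A sketch, not from the original post: folding a Natural with cata
-- through its Maybe base functor. Nothing is the zero case; Just r
-- carries the already-folded result for the predecessor.
pow2 :: Natural -> Natural
pow2 = cata go where
  go Nothing  = 1
  go (Just r) = 2 * r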
Yet More Concision
As we’ve mentioned before, given a data type t
, the steps for constructing its Base
instance are straightforward: add a new type variable to the definition, and for each data constructor, create a new constructor with all recursive occurrences of t
replaced by the new type variable.
Thanks to the magic of Template Haskell, recursion-schemes
can generate this code for us:
import Data.Functor.Foldable.TH
data Expr
= Index Expr Expr
| Call Expr [Expr]
| Unary String Expr
| Binary Expr String Expr
| Paren Expr
| Literal Int
deriving (Show, Eq)
makeBaseFunctor ''Expr
The makeBaseFunctor
call generates code equivalent to the following:
data ExprF a
= IndexF a a
| CallF a [a]
| UnaryF String a
| BinaryF a String a
| ParenF a
| LiteralF Int
deriving (Show, Eq, Functor)
type instance Base Expr = ExprF
instance Recursive Expr where
  project (Index a b) = IndexF a b
  project (Call a b)  = CallF a b
  -- and so on and so forth
This is the result of applying the aforementioned algorithm to our Expr
type. To avoid name collisions, constructor names are suffixed with an ‘F’. (Infix operators are suffixed with a $
).
The inclusion of these Template Haskell splices means that you, the programmer, can pick up the recursion-schemes
library with a minimum of fuss and boilerplate. This is, in my experience writing Haskell in production, truly invaluable when dealing with nested data types: you can fold beautifully without setting up dozens of lines of boilerplate.
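As an illustration of how little ceremony remains, here is a sketch of my own (the depth function is hypothetical, not part of the library or the post) that folds the original Expr with cata, using only the ExprF constructors generated by the splice above:

-- A sketch: the depth of an expression, folded with cata over the
-- generated ExprF base functor. Each case receives already-computed
-- depths for its children.
depth :: Expr -> Int
depth = cata go where
  go (IndexF a b)    = 1 + max a b
  go (CallF f args)  = 1 + maximum (f : args)
  go (UnaryF _ a)    = 1 + a
  go (BinaryF a _ b) = 1 + max a b
  go (ParenF a)      = 1 + a
  go (LiteralF _)    = 1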
Reversing the Arrows, Again
In part two of this series, we generated an unfold by ‘reversing the arrows’ in cata
. As you might be able to infer from the title of this section, we can do the same for Recursive
, which yields us a Corecursive
typeclass for unfolds:
class Functor (Base t) => Corecursive t where
  embed :: Base t t -> t
  ana   :: (a -> Base t a) -> a -> t
We’ve already reversed the arrows and generated ana
from cata
. The only thing left to do is reverse the arrows in project
, yielding embed
—rather than going from a t
to a Base
functor, as project
does, we go from a Base
functor to a t
.
As with cata
, ana
is defined in terms of fmap
and embed
:
class Functor (Base t) => Corecursive t where
  embed :: Base t t -> t

  ana :: (a -> Base t a) -- ^ a (Base t)-coalgebra
      -> a               -- ^ seed
      -> t               -- ^ resulting fixed point
  ana g = a where a = embed . fmap a . g
Instances for embed
are similarly straightforward:
instance Corecursive [a] where
  embed (Cons x xs) = x:xs
  embed Nil         = []
In practice, you won’t need to write your own Corecursive
instances, as makeBaseFunctor
creates both Recursive
and Corecursive
instances.
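As a final sketch of my own, ana with this instance unfolds a seed into a list; here we count down from a non-negative n to 1:

-- A sketch, not from the original post: unfolding a countdown list with
-- ana and the ListF base functor. The coalgebra either stops (Nil) or
-- emits an element together with the next seed (Cons).
countdown :: Int -> [Int]
countdown = ana go where
  go 0 = Nil
  go n = Cons n (n - 1)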
One More Aside
Particularly alert readers will notice that the definition for cata
provided by Kmett in recursion-schemes
is slightly different from ours. Our definition relies on partial application of cata; the partially applied function cata f is what we pass to fmap:
cata :: (Functor f) => Algebra f a -> Term f -> a
cata f = out >>> fmap (cata f) >>> f
By contrast, Kmett’s cata
uses a where-clause to capture cata f
with a specific name, c
.
cata :: (Base t a -> a) -> t -> a
cata f = c where c = f . fmap c . project
Both formulations of cata
are defined point-free, but Kmett’s struck me as somewhat unusual—the name c
appears unnecessary, given that you can just pass cata f
to fmap
. It took several years before I inferred the reason behind this—GHC generates more efficient code if you avoid partial applications. Partially applied functions must carry their arguments along with them, forcing the evaluation process to dredge up the applied arguments and call them when invoking the function, whereas bare functions are much simpler to invoke. (For more of the gory details, you can consult the GHC wiki page on the representation of heap objects.)
Bearing this implementation detail in mind, this formulation of cata
is quite elegant: by naming cata f
, we can reference it not as a partially applied function, but as a regular function. Passing that function into fmap
generates more efficient code—usually this kind of microoptimization doesn’t matter much, but given that cata
is invoked on every iteration of a fold, the savings add up.
That’s All
I owe thanks to Edward Kmett, whose recursion-schemes
library is magnificent and inspiring, and to Austin Seipp, who checked my statements about GHC code generation for accuracy.
In part V, we discuss refolds—hylomorphisms and Elgot algebras.