Type inference for Haskell, part 5
This is part 5 of the series on type inference for Haskell.
Now it’s time to start implementing the logic for type inference. The first step
is to define the data structures that represent types. How do you store a type
like Int -> a -> [a] in a data structure? That’s what we’ll tackle in this
post.
There are a few different forms a type can come in. One is a concrete type name,
like Int or Maybe. Here’s how that will get represented:
type Id = String
data Tycon
= Tycon Id
deriving (Eq, Ord, Show)Another form of types is type variables. These are types like a. They have a
very similar structure:
data Tyvar
= Tyvar Id
deriving (Eq, Ord, Show)Those are the two base forms of types. Now we need a way to put types together
into bigger types. How do you make a type like Maybe Int? It seems to be made
out of two type constructors, Maybe and Int, but how do you put them
together? This is represented as “applying” one type to another (analogous to
how you apply a function to a value). The type Maybe is applied to the type
Int to produce the type Maybe Int.
You can recursively apply types to other applied types. For example, you could
(if you wanted) apply Maybe again to Maybe Int resulting in the type Maybe (Maybe Int). The left-hand side can also be a type application. For example,
Int can be applied to Either String to get (Either String) Int (more
commonly written as just Either String Int). Just like functions, types are
applied a single argument at a time.
To be able to describe type applications, we need some unified structure that can refer to itself (for type applications) and to the other two structures:
data Type
= TVar Tyvar
| TCon Tycon
| TAp Type Type
deriving (Eq, Ord, Show)Going back to the first example, Int -> a -> [a], it might seem like some
things are missing. How do you deal with function type arrows -> and list
types like [a]?
It turns out that [a] is just a special-case bit of syntax. All it really
means is the application of the list type to the type variable a. It just
means the list type (which is spelled []) applied to a. Though weird, this
is perfectly valid Haskell:
bar :: [] Int
bar = [123]It’s the same story for function types. The -> infix operator is just another
type that is applied to two arguments. You don’t actually have to use it as an
infix operator. Again, you can write this in a style that is closer to how the
types are actually represented:
foo :: ((->) a) a
foo x = xLet’s see what it looks like in practice to try to write a type as this data
structure. How is Int -> a -> [a] represented as data?
First, remember the grouping of function arrows: Int -> (a -> [a]). Next,
let’s convert those function arrows and list syntax to the prefix form:
((->) Int) (((->) a) ([] a))
That type is now expressed entirely in the three forms defined above: constructors, variables, and applications. This means that it can be written using that data structure:
TAp (TAp (TCon $ Tycon "->") (TCon $ Tycon "Int"))
(TAp (TAp (TCon $ Tycon "->") (TVar $ Tyvar "a"))
(TAp (TCon $ Tycon "[]") (TVar $ Tyvar "a")))
This structure is quite verbose, but it ends up being easy to work with in algorithms. Be glad that you don’t have to write types this way.
Here’s a handful of definitions of types and functions for building up types:
tUnit = TCon (Tycon "()")
tChar = TCon (Tycon "Char")
tInt = TCon (Tycon "Int")
tInteger = TCon (Tycon "Integer")
tFloat = TCon (Tycon "Float")
tDouble = TCon (Tycon "Double")
tList = TCon (Tycon "[]")
tArrow = TCon (Tycon "(->)")
tTuple2 = TCon (Tycon "(,)")
tString :: Type
tString = list tChar
infixr 4 `fn`
fn :: Type -> Type -> Type
a `fn` b = TAp (TAp tArrow a) b
list :: Type -> Type
list t = TAp tList t
pair :: Type -> Type -> Type
pair a b = TAp (TAp tTuple2 a) bNote that fn is infix just like the -> operator.
Narrowing types
Some of the previous posts have casually mentioned “narrowing” or “restricting” a type without actually defining what that means.
When you restrict a type, you make it refer to a subset of the possible values
it did before. Note that both sets might be infinite, but it is still possible
to say with certainty that one is a subset of the other. For example, the type
[Int] describes a subset (an infinite one, no less) of the possible values
that [a] describes. That’s because any value of type [Int] can be a [a]
and the values of type [Bool] could also be of type [a].
Can you change a type like Int to restrict it to a subset of its prior values?
No, because there is no type that describes just a subset of the values that
Int describes. What about a -> b? Yes, you can by replacing b with Int
getting a -> Int. Another way to restrict it is (c -> c) -> b. Yet another
is a -> (Bool -> d).
In general, you can restrict a type by replacing some of the type variables with other things. Those things might also include type variables.
What if you restrict a -> b by replacing b with c? Well, does a -> c
describe a subset of values that a -> b does? Yes, but it’s an improper subset
because they describe exactly the same set of values. You can always rename the
type variables to a unused name without changing the meaning of the type.
If you had replaced b with a getting a -> a, that would have restricted
the type. Types like Int -> String describe a subset of the values that a -> b does, but those values aren’t in the set described by a -> a (since a
can’t be both Int and String).
The next post will use this structure to implement the type inference rules.