Type inference for Haskell, part 6
This is part 6 of the series on type inference for Haskell.
This post builds on the type structure introduced in part 5 to construct the core logic of type inference.
(THIH Section 11.3, Expressions)
The function that actually does the type inference is called `tiExpr`. It takes in an expression and returns its type. It also needs a list of all the variables currently in scope and what their types are. That list is known as the "assumptions" - you "assume" that a variable is defined somewhere in the outer context when dealing with an expression.
The `tiExpr` function returns what is known so far about the type of the expression. That type might get refined later as more is learned from other expressions. The information used to further refine the type is hidden in the monad (it's basically the state monad). Calls to the inference function might update this hidden information. I won't cover how that information works until the next post, but know for now that it is called the "substitution" and it captures what is known so far about the solution to the constraints.
If there is a type error in the expression, this function will fail. It will also fail if an expression tries to use a variable that has not been defined.
The function recursively calls itself for the sub-expressions (assuming there are any) before figuring out the type of the current expression.
I should note that `tiExpr` will depend on a few more helper functions. Some I'll talk about in this post, but some will have to wait until later.
```haskell
type Infer e t = [Assump] -> e -> TI t

tiExpr :: Infer Expr Type
tiExpr as (Var i) = find i as
tiExpr as (Lit l) = tiLit l
tiExpr _ (Const (_ :>: t)) = return t
tiExpr as (Ap e f) = do
  te <- tiExpr as e
  tf <- tiExpr as f
  t <- newTVar
  unify (tf `fn` t) te
  return t
tiExpr as (Abs i e) = do
  t <- newTVar
  te <- tiExpr ((i :>: t) : as) e
  return $ t `fn` te
tiExpr as (Let bind e) = do
  as' <- tiBinding as bind
  tiExpr (as' ++ as) e
```
If you expand the `Infer` type alias in the function definition, you get

```haskell
tiExpr :: [Assump] -> Expr -> TI Type
```
Let’s take a look at how the logic rules work for each sort of expression.
Variables involve the list of assumptions about what variables exist. The process is quite simple. It amounts to “look up the type for the variable and use that.”
The `ID` part of `ID :>: Type` is the name of the variable. This is kind of an odd data definition since it uses an infix constructor, `:>:`, but hey, that's what THIH went with, so that's what I'll use.
Note that the type it finds isn’t the “finished” type for this variable. In fact, what is looked up is only the first guess at what the type is (the assumptions aren’t updated with what is learned about a variable). Any information learned since the variable was introduced is stored in the monad. The next two posts will cover how that works.
Literals also involve just a call to a helper function, `tiLit`. That helper is about as direct as you could ask for.
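A minimal sketch of what `tiLit` might look like (the `Literal` constructors and the concrete type names here are my guesses for illustration, not the series' actual definitions):

```haskell
-- Hypothetical stand-ins for the series' actual definitions.
data Literal = LitInt Integer | LitStr String

data Type = TCon String | TVar String | TAp Type Type
  deriving (Eq, Show)

-- Map each kind of literal directly to its concrete type.
-- Written against any Monad here; the real version runs in TI.
tiLit :: Monad m => Literal -> m Type
tiLit (LitInt _) = return (TCon "Int")
tiLit (LitStr _) = return (TCon "String")
```

No unification or fresh variables needed - a literal's type is known on sight.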
The logic to handle constructors is also quite simple - just use the type stored in the definition. That's because we are assuming that some preprocessing has already been done. When there is a constructor like `Just`, we expect that some other code has already looked up the definition of that constructor and discovered that the type should be `a -> Maybe a`. The structure used for storing the syntax of the constructor now includes an assumption containing that type.
The code for looking up the definition and converting it to a type isn't included here, but it's not hard to see how it would work. If the definition of the data was something like:

```haskell
data MyType a b = SomeConstructor Int b (a, a)
```

The type for `SomeConstructor` would be

```haskell
Int -> b -> (a, a) -> MyType a b
```
In short, each field you need to supply turns into an argument in a function type, and the resulting type is the return type.
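That fold from field types into a function type only takes a couple of lines. A sketch under assumed simplified definitions (`constructorType` and this cut-down `Type` are mine, not the series'):

```haskell
data Type = TCon String | TAp Type Type
  deriving (Eq, Show)

-- Function arrow helper, in the same spirit as the series' `fn`.
fn :: Type -> Type -> Type
fn a b = TAp (TAp (TCon "(->)") a) b

-- Each field becomes an argument in a function type; the data type
-- being defined is the final result type.
constructorType :: [Type] -> Type -> Type
constructorType fields result = foldr fn result fields
```

For `SomeConstructor`, folding its three field types onto `MyType a b` produces exactly the `Int -> b -> (a, a) -> MyType a b` shape above.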
Applying a function takes a few lines of code, but they aren't that complicated to understand. The code starts by finding the types of the two subexpressions: `e` is the function and `f` is the argument. Remember that these recursive calls can only give what is known so far about the type of those subexpressions.
The fact that the two subexpressions are used together in a function application can increase what we know about both of their types. That's where the `unify` function comes in. I won't cover how it works until a later post, but I will tell you what it does: it figures out how to change the two types it is given to make them the same. In this case, we're making the type of the function (`te`) be the same as a function type that goes from the argument type (`tf`) to some as-yet-unknown return type (`t`). This single request - asking that those two types be made the same - handles all three of the rules for the types of function applications.
Say the type of the function had been `a -> a`, the type of the argument had been `Int`, and the return type variable had been `_t123`. The `unify` function is what figures out that `a` must be `Int`, and thus that `_t123` must also be `Int`. Notice how `unify` doesn't return anything. That's because all it does is update the information about the solution that is hidden in the monad.
`t` is created with `newTVar`, which simply generates a new type variable name that hasn't been used yet. `n` in the `TI` monad is a counter that is incremented each time a new type variable name is generated (`s` is that hidden state storing the solution).
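Here's a sketch of what `newTVar` might look like if `TI` were a plain state monad carrying only the counter (in the real `TI`, the state also holds the substitution `s`, and the `_t` naming scheme is my assumption):

```haskell
import Control.Monad.State

data Type = TVar String | TCon String | TAp Type Type
  deriving (Eq, Show)

-- Simplified TI: the state is just the counter n.
type TI = State Int

-- Generate a fresh type variable name and bump the counter.
newTVar :: TI Type
newTVar = do
  n <- get
  put (n + 1)
  return (TVar ("_t" ++ show n))
```

Because the counter only ever goes up, every call hands back a name that no earlier call could have produced.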
The result of type inference is then that `t`, which refers to the return type of the function call. This is just the type variable name that `newTVar` returned - it doesn't have what `unify` learned applied to it. In general, we are trusting that the caller will apply that knowledge when it actually needs it. That saves re-applying the same thing over and over.
The THIH paper doesn't directly include syntax for lambda expressions because you can emulate `(\x -> e)` with `let f x = e in f`. However, we don't have function declaration syntax like `f x = e` just yet, so we'll add a case for dealing with function abstractions.
The logic is basically:
- Create a new scope by adding the argument variable name (`i` in the code) to the current scope. This is done by prepending `(i :>: t)` to the list of assumptions. The `t` there is just some new type variable, since we don't yet know anything about the type of the argument.
- Find the type (`te`) of the function body inside that new scope.
- Construct a function type out of the type of the argument and the type of the body.
Like with function applications, we might know more about the type than what we are returning. In particular, we likely know more about the type of the argument than just the type variable generated by `newTVar`. That extra information is stored in the monad for now.
Much of the logic of a let binding is moved out to a separate function. Doing this will become more useful later when more features are added to the language, since the `tiBinding` function will become more complicated.
The logic is:
- Find the type of the first expression. That involves a recursive call to `tiExpr`. Note that it is using an unmodified copy of the assumptions `as` here. That's important because it means that the variable we are introducing isn't in scope inside its own definition, preventing (among other things) recursive function calls.
- Use that type as the type of the variable when adding it to the scope (which happens in the `as' ++ as` expression).
- Use that new scope to find the type of the second expression.
- That type is the type of the whole let expression.
Putting it together
The type that `tiExpr` returns for an expression isn't the final type. To get the final type, we have to apply that knowledge hidden in the monad. The `tiProgram` function is a wrapper that takes care of that. It gets that knowledge out (`getSubst`) after inferring the type of the expression and applies it to the type (`apply sub t`) to find out what the actual resulting type is. That piece of knowledge is called a substitution.
This code referenced a few functions (like `apply`) that have not been defined yet. We also haven't talked about what a substitution is or how it works. The next two posts will finish up the algorithm by going over these.