Type inference for Haskell, part 6

Posted on January 6, 2019

This is part 6 of the series on type inference for Haskell.

This post builds on the type structure introduced in part 5 to build the core of the logic of type inference.

(THIH Section 11.3, Expressions)

The function that actually does the type inference is called tiExpr. It takes in an expression and returns its type. It also needs a list of all the variables currently in scope and what their types are. That list is known as the “assumptions” - you “assume” that a variable is defined somewhere in the outer context when dealing with an expression.

Our tiExpr function returns what is known so far about the type of the expression. That type might get refined later as more is learned from other expressions. The information used to further refine the type is hidden in the monad (it’s basically the state monad). Calls to the inference function may update this hidden information. I won’t cover how that information works until the next post, but know for now that it is called the “substitution” and it captures what is known so far about the solution to the constraints.

If there is a type error in the expression, this function fails. It also fails if an expression uses a variable that is not in scope.

The function recursively calls itself for the sub-expressions (assuming there are any) before figuring out the type of the current expression.

I should note that tiExpr depends on a few more helper functions. Some I’ll talk about in this post, but some will have to wait until later.

If you expand the Infer type alias in the function definition, you get

tiExpr :: [Assump] -> Expr -> TI Type

Let’s take a look at how the logic rules work for each sort of expression.
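For reference, here is a plausible shape for the expression type those cases pattern-match on. The exact definition appeared earlier in the series, so the constructor names here (Var, Lit, Const, Ap, Lam, Let, Binding) are my assumptions, loosely following THIH plus the abstraction case this post adds:

```haskell
type Id = String

data Literal = LitInt Integer | LitStr String
  deriving Show

data Type = TVar Id | TCon Id | TAp Type Type
  deriving Show

data Assump = Id :>: Type
  deriving Show

data Binding = Binding Id Expr
  deriving Show

data Expr
  = Var Id            -- variables
  | Lit Literal       -- literals
  | Const Assump      -- constructors, tagged with their type
  | Ap Expr Expr      -- function application
  | Lam Id Expr       -- function abstraction
  | Let Binding Expr  -- let bindings
  deriving Show
```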

Variables

Variables involve the list of assumptions about what variables exist. The process is quite simple. It amounts to “look up the type for the variable and use that.”

The Id part of Id :>: Type is the name of the variable. This is kind of an odd data definition since it uses an infix constructor :>:, but hey, that’s what THIH went with so that’s what I’ll use.

Note that the type it finds isn’t the “finished” type for this variable. In fact, what is looked up is only the first guess at what the type is (the assumptions aren’t updated with what is learned about a variable). Any information learned since the variable was introduced is stored in the monad. The next two posts will cover how that works.
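As a sketch of what that lookup might look like: THIH’s helper is called find and runs in the TI monad, failing the monad when the variable is unbound; this standalone version uses Either instead, and the names are my own for illustration:

```haskell
type Id = String
data Type = TVar Id | TCon Id deriving (Eq, Show)
data Assump = Id :>: Type

-- Walk the assumption list looking for the identifier. The first match
-- wins, which is what makes shadowing via prepending work later on.
findType :: Id -> [Assump] -> Either String Type
findType i []                = Left ("unbound identifier: " ++ i)
findType i ((i' :>: t) : as)
  | i == i'                  = Right t
  | otherwise                = findType i as
```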

Literals

Literals also involve just a call to a helper function. That helper function is about as direct as you could ask for:
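A sketch of such a helper, assuming literal types are fixed to concrete types for now (THIH’s real tiLit gives integer literals a fresh Num-constrained type variable; pinning them to Int is a simplification for this sketch):

```haskell
data Type = TCon String deriving (Eq, Show)
data Literal = LitInt Integer | LitStr String | LitChar Char

-- Map each kind of literal directly to its type
tiLit :: Literal -> Type
tiLit (LitInt  _) = TCon "Int"
tiLit (LitStr  _) = TCon "String"
tiLit (LitChar _) = TCon "Char"
```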

Constructors

The logic to handle constructors is also quite simple - just use the type stored in the definition. That’s because we are assuming that some preprocessing has already been done. When there is a constructor like Just, we expect that some other code has already looked up the definition of that constructor and discovered that the type should be a -> Maybe a. The structure used for storing the syntax of the constructor now includes an assumption containing that type.

The code for looking up the definition and converting it to a type isn’t included here, but it’s not hard to see how it would work. If the definition of the data was something like:

data MyType a b = SomeConstructor Int b (a, a)

The type for SomeConstructor would be

Int -> b -> (a, a) -> MyType a b

In short, each field you need to supply turns into an argument in a function type, and the resulting type is the return type.
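That fields-to-arrows conversion is a simple fold. Assuming a THIH-style Type with a fn helper for building the arrow (the names here are assumptions for illustration):

```haskell
data Type = TVar String | TCon String | TAp Type Type deriving (Eq, Show)

-- Build the function type a -> b
fn :: Type -> Type -> Type
fn a b = TAp (TAp (TCon "(->)") a) b

-- Fold the constructor's field types into a function type that ends in
-- the data type itself
conType :: [Type] -> Type -> Type
conType fields result = foldr fn result fields
```

For SomeConstructor above, conType would be given the three field types and the result type MyType a b, producing the chain of arrows shown. A constructor with no fields just yields the result type itself.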

Function application

Applying a function is a couple of lines of code, but they aren’t that complicated to understand. The code starts by finding the types of the two subexpressions: e is the function and f is the argument. Remember that these recursive calls can only give what is known so far about the types of those subexpressions.

The fact that the two subexpressions are used together in a function application can increase what we know about both of their types. That’s where the unify function comes in. I won’t cover how it works until a later post, but I will tell you what it does: it figures out how to change the two types it is given to make them become the same. In this case, we’re making the type of the function (te) be the same as a function type that goes from the argument type (tf) to some as-yet unknown return type (t). This single request - asking that those two types be made the same - handles all three of the rules for the types of function applications.

Say the type of the function had been a -> a, the type of the argument had been Int, and the return type variable t was _t123. The unify function is what figures out that a must be Int, and thus _t123 must also be Int. Notice how unify doesn’t return anything. That’s because all it does is update the information about the solution that is hidden in the monad.

That t is created with newTVar which simply generates a new type variable name that hasn’t been used yet.

The n in the TI monad is a counter that is incremented each time a new type variable name is generated (s is that hidden state storing the solution).
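As a sketch, TI could be expressed with the standard State monad, where the state is exactly that (s, n) pair. THIH hand-rolls its own monad rather than using Control.Monad.State, and the "_t" naming scheme below is an assumption:

```haskell
import Control.Monad.State

data Type = TVar String deriving (Eq, Show)
type Subst = [(String, Type)]

-- The hidden state: the substitution found so far plus the counter used
-- to generate fresh type variable names
type TI a = State (Subst, Int) a

-- Generate a type variable name that hasn't been used yet, bumping the
-- counter and leaving the substitution untouched
newTVar :: TI Type
newTVar = do
  (s, n) <- get
  put (s, n + 1)
  return (TVar ("_t" ++ show n))
```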

The result of type inference is then that t, which refers to the return type of the function call. This is just the type variable name that newTVar returned - what unify learned has not been applied to it. In general, we are trusting that the caller will apply that knowledge when it actually needs it. That saves re-applying the same substitution over and over.

Function abstraction

The THIH paper doesn’t directly include syntax for lambda expressions because you can emulate (\x -> e) with let f x = e in f. However, we don’t have function declaration syntax like f x = e just yet, so we’ll add a case for dealing with function abstractions.

The logic is basically:

  • Create a new scope by adding the argument variable name (i in the code) to the current scope. This is done by prepending (i :>: t) to the list of assumptions. The t there is just some new type variable, since we don’t yet know anything about the type of the argument.
  • Find the type (te) of the function body inside that new scope.
  • Construct a function type out of the type of the argument and the type of the body.
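The steps above can be sketched in a stripped-down tiExpr that handles only variables and abstractions, with the state reduced to just the fresh-name counter (no substitution yet; the names are assumptions following the post’s description):

```haskell
import Control.Monad.State

type Id = String
data Type = TVar Id | TCon Id | TAp Type Type deriving (Eq, Show)
data Assump = Id :>: Type
data Expr = Var Id | Lam Id Expr

fn :: Type -> Type -> Type
fn a b = TAp (TAp (TCon "(->)") a) b

type TI a = State Int a

newTVar :: TI Type
newTVar = do
  n <- get
  put (n + 1)
  return (TVar ("_t" ++ show n))

tiExpr :: [Assump] -> Expr -> TI Type
tiExpr as (Var i) =
  case [t | (i' :>: t) <- as, i == i'] of
    (t : _) -> return t
    []      -> error ("unbound identifier: " ++ i)
tiExpr as (Lam i e) = do
  t  <- newTVar                     -- the argument's type is unknown so far
  te <- tiExpr ((i :>: t) : as) e   -- infer the body in the extended scope
  return (t `fn` te)                -- argument type -> body type
```

Running it on the identity function shows the shape of the result: the argument and the body get the same fresh variable, so the whole thing comes out as _t0 -> _t0.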

Like with function applications, we might know more about the type than what we are returning. In particular, we likely know more about the type of the argument than just the type variable generated by newTVar. That extra information is stored in the monad for now.

Let bindings

Much of the logic of a let binding is moved out to a separate function. Doing this will become more useful later when more features are added to the language since the tiBinding function will become more complicated.

The logic is:

  • Find the type of the first expression. That involves a recursive call to tiExpr. Note that it uses an unmodified copy of the assumptions (as) here. That’s important because it means the variable we are introducing isn’t in scope inside its own definition, preventing (among other things) recursive function calls.
  • Use that type as the type of the variable when adding it to the scope (which happens in the as' ++ as expression).
  • Use that new scope to find the type of the second expression.
  • That type is the type of the whole let expression.
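Ignoring the monad and tiBinding for a moment, the scoping logic in those steps can be sketched as a pure mini-version (integer literals fixed to Int for brevity; the names are assumptions):

```haskell
type Id = String
data Type = TVar Id | TCon Id deriving (Eq, Show)
data Assump = Id :>: Type
data Binding = Binding Id Expr
data Expr = Var Id | Lit Integer | Let Binding Expr

tiExpr :: [Assump] -> Expr -> Type
tiExpr as (Var i) =
  case [t | (i' :>: t) <- as, i == i'] of
    (t : _) -> t
    []      -> error ("unbound identifier: " ++ i)
tiExpr _  (Lit _) = TCon "Int"
tiExpr as (Let (Binding i e1) e2) =
  let t1  = tiExpr as e1       -- old scope: i is NOT visible inside e1
      as' = [i :>: t1]         -- the assumption introduced by the binding
  in tiExpr (as' ++ as) e2     -- extended scope for the body
```

Because as' is prepended, the new binding also shadows any outer variable with the same name.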

Putting it together

The type that tiExpr returns for an expression isn’t the final type. To get the final type, we have to apply that knowledge hidden in the monad. The tiProgram function is a wrapper that takes care of that. It gets that knowledge out (getSubst) after inferring the type of the expression and applies it to the type (apply sub t) to find out what the actual resulting type is.

That piece of knowledge is called a substitution.


This code referenced a few functions (unify, getSubst, and apply) that have not been defined yet. We also haven’t talked about what a substitution is and how it works. The next two posts will finish up the algorithm by going over these.