Type inference for Haskell, part 1

Posted on January 1, 2019

This is the first post in a series explaining how type inference for Haskell works. There is a wonderful paper Typing Haskell in Haskell that provides a full description of Haskell’s type system and code for a type inference engine. That paper has a lot of information that is hard to find elsewhere, but it can be hard to follow as someone new to type inference.

The aim of this series is to accompany Typing Haskell in Haskell (henceforth abbreviated as THIH) and complement it with examples and more beginner-friendly explanations. Where possible, the examples will help demonstrate how you can see the impact of each algorithm in samples of Haskell code. This series also rearranges the material presented in THIH to build up to Haskell’s type system by adding features to a simpler type system.

This will not cover anything related to reasoning about the soundness of the type system or other formal properties. If you are interested in that side of things, a good starting point might be the Software Foundations books.

Audience

This series is aimed at someone who knows how to program in Haskell and is interested in learning more about how it works. If you are thinking about implementing your own language with type inference, I hope this series will provide a good starting point. Even if you are not interested in implementing a programming language, you’ll almost certainly learn something you didn’t know about Haskell while reading this. I sure did while doing the research for writing this.

Haskell is both the language for which types are inferred and the language the inference engine is written in. A fairly solid understanding of Haskell will be very helpful for reading this series. As a quick self-test, if you know what this type signature means, you are probably fine.

(Show a) => [a] -> String

If not, I suggest reading Learn You a Haskell for Great Good! before coming back to this. For later sections you’ll want to have an idea of how to use typeclasses, but you can understand the earlier sections without that.

Layout

Haskell’s type system is rich (read: complicated) and type inference for it is no simple feat. Tackling the whole thing at once would be quite daunting. The way to hike a mountain is one step at a time, and that’s the approach taken here. It’s much easier to understand how each feature works if they are dealt with individually, building atop the previous features.

This series of posts will start by explaining a small type inference system in the first part. That will infer types for a language that is a subset of Haskell. Once that is squared away, it becomes a foundation to build upon. The next part will then add a number of extensions to this small type inference system. These will build it up into quite a usable type system. The only thing that will be left to do is to add type classes. The third and final part covers how these work.

The first part will include the complete code for the small type inference system, and the next part will describe what code changes are necessary to add each new feature. The third part will reference the code from THIH. It won’t include every single line of code from the paper, but the stuff left out is just boilerplate. Any important code will be discussed. (And besides, the full code is available online.)

Some Background Knowledge

This post may be just an introduction, but that doesn’t mean you won’t learn anything from it! There are two details of Haskell that are important to know before moving on.

Single-argument functions

This is a bit of trivia about Haskell that even some experienced users may not necessarily be aware of. It turns out to be important both for how the type system works and in terms of what code Haskell allows you to write.

All functions in Haskell take a single argument.

“But,” you might say, “that cannot be right. I’ve declared functions like this:”

myFn x y z = ...

True, but remember that an expression like myFn 123 partially applies that function. (This is called currying.) The expression myFn 123 evaluates to another function which can later by applied to the rest of the arguments. Since it works to call myFn 123 with a single argument, it must be a single-argument function.

When you call it with three arguments like myFn 123 'x' False, this actually creates three function calls (the first two of which return another function):

((myFn 123) 'x') False

Look also at the type signature: myFn :: a -> b -> c -> d. The infix arrows group like this: myFn :: a -> (b -> (c -> d)). That means that when you apply the function to a value of type a, the result is another function of type b -> (c -> d).

Function bindings

A “binding” is a another way of saying a declaration. Binding is a commonly used term in other literature, so this series will use it, too.

You can think of the binding myFn x y z = ... as syntactic sugar for binding a lambda function to a variable:

myFn = (\x -> (\y -> (\z -> ...)))

(In Haskell syntax, \x -> y is a function with argument x that returns y.)

For now, the type inference algorithm will expect the function to be written in this form. This is really just to let the type inference algorithm use the same code for function bindings as it uses for lambda functions anywhere else in an expression (such as (\x -> bar) foo).

Near the end of this series, we’ll see a case¹ where Haskell actually treats the two definitions differently, but that is a language design decision. They could have been exactly equivalent if the designers of haskell had thought it was a good idea.

That concludes the first post in the series. There will be on the order of 20 posts of a similar size in this whole series. By the end, you’ll have a pretty good idea of how to actually implement a type inference engine for a language like Haskell.

The next post will start by looking at the syntactic forms for the smaller language (a subset of Haskell) that we are initially going to be doing type inference for.

If you want to look ahead, try comparing plus x y = x + y vs. plus = (\x y -> x + y). Google “monomorphism restriction” to see why.↩