Type inference for Haskell, part 15

Posted on January 15, 2019

This is part 15 of the series on type inference for Haskell.

Welcome to the third and final section of this series. The first section covered an algorithm for basic type inference. In the second section, that algorithm was extended with polymorphism, recursive bindings, explicitly typed bindings, kind checking, and pattern matching. This third section will cover the biggest extension, type classes.

From here on out, the code referenced will all come from the Typing Haskell in Haskell paper itself. The code in previous posts attempted to imitate the paper’s code as closely as possible, so many functions there should look familiar. The website for the paper does include a downloadable copy of the code, however it no longer works with modern Haskell compilers (it is nearly 20 years old, after all!). You can find an updated copy here.

In this post, I’ll start with a informally describing type classes, the rules placed on them, and how they behave.

One last note before getting started. There are many extensions to Haskell’s type system, and a lot of them have to do with type classes. Examples include multi-parameter type classes and functional dependencies. The discussion here does not cover these extensions. The language without any extensions offers more than enough complexiy on its own!

Classy code

Remember that polymorphic functions are those that can take on different types in each place they are used. For example:

head :: [a] -> a

is the type of a function that returns the first element of a list. This function is polymorphic because it can be applied to a list of any type. The a stands in for a as-of-yet unknown type.

What about functions that need to do some operation on the unknown type? For example, what about a function that finds the least element in a list?

minOfList [x]    = x
minOfList (x:xs) =
    let mxs = minOfList xs
    in if x < mxs then x else xms

Say the type was

minOfList :: [a] -> a

What would happen if it were given an list of things that cannot be compared with <? I suppose that a runtime error is the best you could do. This is why type classes are useful. They allow code to request that a type support some function without needing to specify what that type is.

The correct type for minOfList is:

minOfList :: (Ord a) => [a] -> a

The (Ord a) part specifies that whatever type gets used in the place of a must have an instance of the Ord class. (Ord provides comparison operations like <.) This part is called a class predicate, or just predicate for short. It’s a logical predicate in the sense that it holds true whenever a implements Ord.

Having (Ord a) => in the type almost looks like an extra argument. If you read about dictionary passing (which is outside of the scope of this series), you’ll see why this is no mistake.

The parts of the puzzle

Depending on how much you’ve used Haskell, there is a good chance you’ve created your own typeclass before. Still, you might find some details in this section you weren’t aware of.

Classes

In order to use a class, it first has to be declared. That declaration includes:

  • A unique class name.
  • A type variable.
  • One or more method type declarations.
  • Zero or more superclasses.

For example, the definition of Ord look roughly like:

(The real Ord definition has more methods, but this is enough for demonstration purposes.)

Let’s break down what is in this definition. This is declaring a class called Ord. It uses the type variable a to stand for some type implementing this class. The (Eq a) => part says that there is one superclass, Eq. This means that all types implementing Ord must also have an implementation of Eq and that the implementations of Ord may implicitly depend on Eq without declaring it. The superclass hierarchy must be acyclic.

After the where keyword, the types of the methods in the class are declared. Note how they involve the type variable a that was declared before. It’s required that the type of each method involve the type variable somewhere (that’s important for deciding which instance is the right one to use). Note that the type variable doesn’t necessarily need to appear in one of the argument positions. For example, this class is silly, but completely valid even though a only appears in the return type:

One detail on the allowed types for the methods: They are allowed to have class predicates on any type variable other than then one that is the instance of the class. For example, this is valid:

But this is not (at least, not without enabling non-standard language features):

It will fail to compile with Constraint 'Show a' in the type of 'someMethod' constrains only the class type variables.

Lastly, it’s OK for those predicates to contain cyclic references to the class. This will compile:

I’ll bet at least something here was a feature of typeclasses you didn’t know of before.

Instances

In order to use a class for anything, you probably want an instance of it. An instance contains three things:

  • Which class this is an instance of.
  • What type this instance is for.
  • Zero or more predicates on that type.
  • An implementation for each of the methods.

For example, one possible instance of the silly class above is:

This is saying that there is an instance of the class ListOf for the type Bool, and provides the logic for implementing the method. Note that you are actually not allowed to declare the type of the listOf function here. The combination of the type in the class declaration along with the type Bool is enough for the compiler to tell what type it is supposed to have.

You can optionally have some predicates on the type that the implementation is for. As an example, look at this class and instance:

This says that in order to use the PrettyPrint class on the Either type, the first type Either is parameterized with must implement Show and the second must implement PrettyPrint. Unlike the superclass hierarchy, this is allowed to contain cyclic references to classes.

Any given class can only be implemented for a given type once. For example, you can’t have two different instance ListOf Bool where declarations in the same module. In fact, you cannot even have the type “overlap” with the type for any other instance. Two types overlap if, even though they might not be identical, they share some types in common. For example, (a, Int) and (Bool, b) overlap because (Bool, Int) is an instance of both of them. This usually isn’t a problem unless you have the FlexibleInstances language feature enabled, because by default (a, Int) and (Bool, b) aren’t valid types for an instance declaration.

The reason those types aren’t valid is because the types are required to be a type constructor (so it can’t be just a type variable) instantiated with zero or more type variables. The problem with both (a, Int) and (Bool, b) is that the type constructor for a 2-tuple (which is spelled (,,) in Haskell) is being instantiated with another type constructor (Int in the first case) and not just type variables. As another example, [a] is a valid type for an instance but [String] is not.

For more details, see Class and instance declarations in the GHC users guide.

How type classes are used

When code uses a function from a typeclasses, the only thing that might change about the code is that its binding (the variable or function declaration) might get some predicates attached to its type.

As always, examples are best. Let’s start with an example that doesn’t use type classes in order to see how it changes when type classes are added. Imagine we had a concrete showInt function that converts integers to strings. I’ll write these functions along with the type that gets inferred for them.

Now let’s generalize that function to any showable input type with the help of the show function from the Show typeclass:

Remarkably little has changed. The [Int] part of the type was generalized to [a], and a predicate (Show a) was added. How did it know that was the right type? It knows that the argument is a list because of the pattern matching in the definition of the function. The type inside the list is left a generic a because nothing specifies the type any more than that. Even the call to show doesn’t force it to be a particular type. That is because show’s type is (Show a) => a -> String. However, the call to show does add a predicate, (Show a). In this case, that predicate just gets bubbled up to the function declaration.

It’s possible to use type classes without a predicate showing up in the function’s type. To see why, look at this really silly function:

This uses show, yet there is no predicate (Show something) in the function’s type. While calling show does still add that predicate, we soon learn that show is only being called on the concrete type Bool. That means the predicate now looks like (Show Bool). Since there is a Show instance for Bool, we know that predicate always holds true. So why bother writing it?

This example also used another type class, Eq by using the == function. That predicate, which became (Eq String), was removed in a similar way.

So, predicates can be kept and added to the function’s type, or they can be removed because they are obviously true. Can anything else happen? It turns out, yes! Predicates can be modified before being added to the function’s type.

Time to look at another silly function.

This uses the Eq type class because it uses the == function. The type of == is (Eq a) => a -> a -> Bool. Here, we’re applying it to lists. That means the type will instantiate to (Eq [t]) => [t] -> [t] -> Bool. Based on this, you might expect the type of areTheSame to be (Eq [t]) -> t -> t -> Bool. After all, there needs to be a way to compare lists of t with ==, not just individual elements of t, right? Why does it only require that elements of t be comparable with ==?

This is where predicates can be reduced using knowledge of what instances of a class exist. In this case, it takes advantage of the fact that there is an instance of Eq for lists, instance (Eq a) => Eq [a] where .... This says that there is an instance of Eq for a list type provided that there is also an instance of Eq for the type inside the list. When trying to reduce the (Eq [t]) predicate on the areTheSame function, it uses that instance to “trade in” (Eq [t]) for (Eq t).

This is a good thing, because Haskell requires that all class predicates reference a type variable, not a concrete type.

There is one more possibility: trying to use a function from a type class might result in an error. For example:

(Remember that in Haskell, list concatenation is done with ++.) The + function is from the Num class and has the type (Num a) => a -> a -> a. Thus, using the + function will add the predicate (Num [Bool]). Unlike Eq, there is no instance of Num for a list of anything. The step where it tries to simplify the predicate to something more like (Num t) fails as a result.

Now, at this point it won’t leave (Num [Bool]) as a predicate on your doesNotCompile function. It already figured out that there is no possible type that can be used in that function, so it’s better to give an error to tell the programmer that this function is broken rather than leave a function that can never be called.

To talk in more general terms about the behavior of type classes, there are a few things that might happen to the class constraints that get added when using a method from a typeclass.

  • The predicates be added to the type directly, like (Show a) on toStrings.
  • They might be reduced before being added to the type.
  • The predicates might even disappear completely.
  • They might result in a type error if it becomes apparent that we are trying to use a type as a class it clearly does not have an implementation of.

This actually isn’t the full set of what can happen. There are two edge cases that I’ll explain in individual posts, ambiguities and deferred predicates. For now, don’t worry about those.


To give a sneak peak of what the next posts cover, here’s one tidbit about how inference for this actually works: type inference for expressions just captures a list of all the predicates it sees and returns them. It doesn’t do any processing to them. It’s only when dealing with a binding that the inference starts checking that they make sense.