Type inference for Haskell, part 12

Posted on January 12, 2019

This is part 12 of the series on type inference for Haskell.

This post will take a look at how to deal with functions (or variables) where the user declared a type. Bindings with a type siguature are called “explicitly typed” bindings. As seen in the previous post, it is sometimes necessary to declare a type in order to get the type you want for a recursive function.

Declaring a type can also be used to restrict the type of a binding to be a less general type than what type inference would have picked. A programmer might find that ability useful for documenting the indent of a function or to help catch bugs. For example, type inference would have picked the type [a] -> [a] for this function:

dupString :: String -> String
dupString s = s ++ s

(THIH Section 11.6.1, Explicitly Typed Bindings)

There are a few issues to consider when dealing with explicitly typed bindings. For instance, what if a user declares the wrong type? If the given type is one that the binding cannot possibly have, that should result in a type error. This means that there needs to be some code to check if the type given by the user actually makes sense.

You also have to think about how explicitly typed bindings interact with implicitly typed bindings. Which order do you process the bindings in? Let’s look at an example with cyclic dependencies (a references b, which references c, which references a again).

a x = [b x]

b :: a -> a
b y = let foo = c 'c' in y

c z = "foo" ++ a z

If b wasn’t explicitly typed, these three definitions would constitute a single binding group. Due to the restriction on the types of things in binding groups, a would have the type Char -> [Char] and b would have the type Char -> Char. (c’s type is Char -> [Char] regardless of whether b is explicitly typed or not.)

Explicitly specifying the type for b breaks apart this binding group. Now the type of a is t -> [t] and the type of b is a -> a. Explicitly stating the type of b changed not only b’s type, but also the type of a. That’s because a is not longer in a binding group with c.

Before computing the binding groups and sorting them, explicitly typed bindings are first pulled out of the call graph¹. They are kept in a separate list. Then the remaining bindings (the implicitly typed ones) are examined to find the dependencies between them. All this is done with the goal of making the types for these bindings be as general as possible. Typing a as t -> [t] is preferrable to typing it as Char -> [Char].

A new list is added to the BindGroup type to hold the explicitly typed bindings:

type Impl = (Id, Expr)
type Expl = (Id, Scheme, Expr)
type BindGroup = ([Expl], [[Impl]])

An Expl holds an explicitly typed binding. It’s like an Impl with a type scheme added to it.

We have to be careful about the order we process these bindings in. As you saw in the example above, an implicitly typed binding might depend on an explicitly typed binding. That means that the types of the explicitly typed bindings have to be added to the assumptions when inferring the type of implicitly typed bindings.

Explicitly typed bindings can also reference implicitly typed ones. When checking that the type of the explicitly typed binding is valid, we have to know the type of the implicitly typed bindings in order to be able to tell if the type the user provided is correct or not. This means that checking the validity of explicitly typed bindings must happen after processing the implicitly typed ones.

When processing the implicitly typed bindings, we have to trust that the user-provided types are correct because we can’t actually check them until after processing the implicitly typed bindings. Here’s what that logic now looks like:

tiBindGroup :: Infer BindGroup [Assump]
tiBindGroup as (es,iss) = do
  let as' = [ v :>: sc | (v, sc, _) <- es ]
  as'' <- tiSeq tiImpls (as' ++ as) iss
  _ <- mapM (tiExpl (as'' ++ as' ++ as)) es
  return $ as'' ++ as'

In this code, es is the list of explicitly typed bindings. as' is the assumptions for the explicitly typed bindings (trusting that the provided type is correct). Then as'' is the list of assumptions that come from the implicitly typed bindings. Finally, tiExpl is applied to each of the explicitly typed bindings to check that their type is valid.

tiExpl :: Infer Expl ()
tiExpl as (i, sc, e) = do
  t <- freshInst sc
  tiBinding as e t

  s <- getSubst
  let t'  = apply s t
  let fs  = tv (apply s as)
  let gs  = tv t' \\ fs

  let sc' = quantify gs t'
  if sc /= sc' then
      fail "signature too general"
    else
      return ()

The logic in tiExpl starts off by using the same tiBinding function that tiImpls used. Remember that tiBinding unifies the type found for the expression e with the type t. If the declared type is invalid in some way, that unification will fail. For example, if the type contained Bool where the code needed an Int, that wouldn’t unify. That’s the first way this could fail.

If the explicit type is less general than the most general type that could have been inferred, unification will update the substitution to restrict the type down to what is explicitly stated.

The middle section of tiExpl finds the set of type variables gs to quantify over. This is the same process that happened in tiImpls.

The last section quantifies the type and checks that the quantified type matches the explicitly declared type. If it doesn’t, that means the type is too general. For example, if you wrote myFunction :: a -> a when it actually only works with strings, that type declaration would be too general. Types that are too specific or just plain wrong would have already been caught by unification. But if the declared type was too general, unification would have happily accepted it. That’s why this additional check is needed. This is the second way this function could fail.

That’s all we need to handle explicitly typed bindings. Though the logic is a bit subtle, the code doesn’t need to change that much.

The finished product is now in explicit.hs.

So far we’ve added three extensions to our type system: let polymorphism, recursive bindings, and now explicitly typed bindings. There are a few minor things left to add. Stay tuned for a discussion of kinds in the next post.

more work that we assume happens as a preprocessing stage↩