Type inference for Haskell, part 14

Posted on January 14, 2019

This is part 14 of the series on type inference for Haskell.

We’ve finally gotten to the last extension to the type system to add before getting to type classes! Trust me, I’m even more relieved than you are.

Right now our code cannot handle pattern matching syntax. This means it can’t yet deal with a case statement or with patterns in function arguments. The good news is that pattern matching is not particularly hard to add.

(THIH Section 11.2, Patterns)

There are a number of different sorts of patterns supported by Haskell.

You can match using a variable. That matches everything, and captures what it matches in that variable. Haskell requires that variable names only appear once in patterns and function definitions, but our code assumes this has already been checked.

There is also the wildcard which is spelled with an underscore. Like a variable this matches anything, but it doesn’t capture the result.

Haskell lets you capture the result of an inner pattern in a variable. For example, err@(Left _) captures the whole Left value in err.

You can also match literal values, like "foo" or 123.

There’s another way to match integers called an “n + k” pattern. This matches any positive integer that is greater than or equal to k. The variable n is then bound to the original value minus k. This feature isn’t that commonly used.

Lastly, you can match structures like lists, tuples, and data records. All of these are handled together. The various sorts of list patterns you can write are just syntactic sugar for plain-old data structure matching. The same goes for tuples.

These patterns can be captured with a simple data structure:

data Pat
  = PVar Id
  | PWildcard
  | PAs  Id Pat
  | PLit Literal
  | PNpk Id Integer
  | PCon Assump [Pat]

Patterns impact type inference in a few ways. First, they can restrict the type of the thing being matched. For example, this case statement is sufficient to know that the type of x is Either String Bool:

case x of
    Left "foo" -> False
    Left _     -> True
    Right b    -> b

Second, patterns can introduce variables. Type inference needs to figure out the type of those variables.

Third, all the right-hand sides of the pattern matching have to be the same type.

Here’s how that is implemented. The tiPat function takes in a pattern and returns a list of assumptions for any variables that were introduced, and the type of the thing being matched. Note that this function isn’t given the right-hand side of pattern matching. That’s handled later.

tiPat :: Pat -> TI ([Assump], Type)
tiPat (PVar i)               = do
  v <- newTVar Star
  return ([i :>: toScheme v], v)
tiPat PWildcard              = do
  v <- newTVar Star
  return ([], v)
tiPat (PAs i pat)            = do
  (as, t) <- tiPat pat
  return ((i :>: toScheme t):as, t)
tiPat (PLit l)               = do
  t <- tiLit l
  return ([], t)
tiPat (PNpk i k)             = do
  let t = tInteger
  return ([i :>: toScheme t], t)
tiPat (PCon (i :>: sc) pats) = do
  (as, ts) <- tiPats pats
  t'       <- newTVar Star
  t        <- freshInst sc
  unify t (foldr fn t' ts)
  return (as, t')

The variable and wildcard cases both use a new type variable as the type. That type variable will likely get replaced with a concrete type later. The only difference between the variable and the wildcard is whether an assumption is added.

Handling the i@pat first handles the inner pattern, then simply adds the variable name to the assumptions.

We already know the type of literals, so that type can be returned directly. It’s similar for n+k patterns, though that does also have to add an assumption.

Lastly, we come to handling structures. It processes all the inner patterns with a function we haven’t defined yet. It then creates a type variable for referencing the resulting type. It instantiates the scheme stored in the constructor’s assumption. Remember that for constructors like Just, this will result in a type like a -> Maybe a. It then sticks all the types from the inner pattern together into a function type with that new type variable as the return type. That’s so that it can then unify that function type with the type from the constructor. This can fail if some types in the pattern don’t work. Finally, it returns the assumptions and that type variable.

The function to handle a list of patterns just maps tiPat over the list of patterns, then collects the assumptions and types. If I were writing this function today, I would have used unzip to get the assumptions and types out of the list asts.

tiPats :: [Pat] -> TI ([Assump], [Type])
tiPats pats = do
  asts <- mapM tiPat pats
  let as = concat [ as' | (as', _) <- asts ]
  let ts = [ t | (_, t) <- asts ]
  return (as, ts)

(THIH Section 11.4, Alternatives)

We now have code to infer the type for patterns, but we still need a way to connect this up with the rest of the type inference algorithm. To do that, we’ll extend the definition of functions to be able to handle patterns as the arguments. We won’t actually introduce syntax for case statements, though you can emulate a case statement with a function binding in a let statement. For example, this:

case e of
    pat1 -> e1
    pat2 -> e2

can be transformed into

let f pat1 = e1
    f pat2 = e2
in f e

To handle a function like f pat1 pat2 pat3 = bar, we need a list of patterns and an expression. Those patterns might all be just variables (we use the same code regardless of whether the function definition has anything more interesting patterns). This is captured in a structure called an Alt (short for alternative):

type Alt = ([Pat], Expr)

Note that the list of patterns can be empty. This happens for declarations like x = e1.

The structure is called an alternative because a function can have multiple alternative ways of being evaluated. The two lines of the following function are both alternatives:

isJust (Just _) = True
isJust Nothing  = False

The implicitly-typed and explicitly-typed declarations now accept a list of alternatives (they used to just have an expression):

type Impl = (Id, [Alt])
type Expl = (Id, Scheme, [Alt])

Inference for an alternative is thankfully fairly straightforward. First, infer the types of the patterns. This will return zero or more assumptions and a list of types, one type per argument. It then finds the type of the expression (with those extra assumptions added). Finally, the resulting type is a function type where the argument types are the types from those patterns and the return type is the type of that expression.

tiAlt :: Infer Alt Type
tiAlt as (pats, e) = do
  (as', ts) <- tiPats pats
  t <- tiExpr (as' ++ as) e
  return $ foldr fn t ts

tiAlts :: [Assump] -> [Alt] -> Type -> TI ()
tiAlts as alts t = do
  ts <- mapM (tiAlt as) alts
  mapM_ (unify t) ts

The tiAlts function handles a list of alternatives and does one extra thing: on the last line it ensures that all the different function bodies have the same type.

The last change is that tiExpl and tiImpls now call tiAlts when they had been calling tiBinding. In fact, the tiBinding function now goes away.

And that’s it! It’s worth noting that even functions that just use variables for arguments still are typed using the pattern inference code.

You can find the finished code in patterns.hs.

This finishes up the middle third of the series on type inference for Haskell. Next up: Typeclasses.