Type inference for Haskell, part 14
This is part 14 of the series on type inference for Haskell.
We’ve finally gotten to the last extension to the type system to add before getting to type classes! Trust me, I’m even more relieved than you are.
Right now our code cannot handle pattern matching syntax. This means it can’t
yet deal with a case statement or with patterns in function arguments. The
good news is that pattern matching is not particularly hard to add.
(THIH Section 11.2, Patterns)
There are a number of different sorts of patterns supported by Haskell.
You can match using a variable. That matches everything, and captures what it matches in that variable. Haskell requires that variable names only appear once in patterns and function definitions, but our code assumes this has already been checked.
There is also the wildcard which is spelled with an underscore. Like a variable this matches anything, but it doesn’t capture the result.
Haskell lets you capture the result of an inner pattern in a variable. For
example, err@(Left _) captures the whole Left value in err.
You can also match literal values, like "foo" or 123.
There’s another way to match integers called an “n + k” pattern. This matches any positive integer that is greater than or equal to k. The variable n is then bound to the original value minus k. This feature isn’t that commonly used.
Lastly, you can match structures like lists, tuples, and data records. All of these are handled together. The various sorts of list patterns you can write are just syntactic sugar for plain-old data structure matching. The same goes for tuples.
These patterns can be captured with a simple data structure:
data Pat
= PVar Id
| PWildcard
| PAs Id Pat
| PLit Literal
| PNpk Id Integer
| PCon Assump [Pat]Patterns impact type inference in a few ways. First, they can restrict the type
of the thing being matched. For example, this case statement is sufficient to
know that the type of x is Either String Bool:
case x of
Left "foo" -> False
Left _ -> True
Right b -> bSecond, patterns can introduce variables. Type inference needs to figure out the type of those variables.
Third, all the right-hand sides of the pattern matching have to be the same type.
Here’s how that is implemented. The tiPat function takes in a pattern and
returns a list of assumptions for any variables that were introduced, and the
type of the thing being matched. Note that this function isn’t given the
right-hand side of pattern matching. That’s handled later.
tiPat :: Pat -> TI ([Assump], Type)
tiPat (PVar i) = do
v <- newTVar Star
return ([i :>: toScheme v], v)
tiPat PWildcard = do
v <- newTVar Star
return ([], v)
tiPat (PAs i pat) = do
(as, t) <- tiPat pat
return ((i :>: toScheme t):as, t)
tiPat (PLit l) = do
t <- tiLit l
return ([], t)
tiPat (PNpk i k) = do
let t = tInteger
return ([i :>: toScheme t], t)
tiPat (PCon (i :>: sc) pats) = do
(as, ts) <- tiPats pats
t' <- newTVar Star
t <- freshInst sc
unify t (foldr fn t' ts)
return (as, t')The variable and wildcard cases both use a new type variable as the type. That type variable will likely get replaced with a concrete type later. The only difference between the variable and the wildcard is whether an assumption is added.
Handling the i@pat first handles the inner pattern, then simply adds the
variable name to the assumptions.
We already know the type of literals, so that type can be returned directly. It’s similar for n+k patterns, though that does also have to add an assumption.
Lastly, we come to handling structures. It processes all the inner patterns with
a function we haven’t defined yet. It then creates a type variable for
referencing the resulting type. It instantiates the scheme stored in the
constructor’s assumption. Remember that for constructors like Just, this will
result in a type like a -> Maybe a. It then sticks all the types from the
inner pattern together into a function type with that new type variable as the
return type. That’s so that it can then unify that function type with the type
from the constructor. This can fail if some types in the pattern don’t
work. Finally, it returns the assumptions and that type variable.
The function to handle a list of patterns just maps tiPat over the list of
patterns, then collects the assumptions and types. If I were writing this
function today, I would have used unzip to get the assumptions and types out
of the list asts.
tiPats :: [Pat] -> TI ([Assump], [Type])
tiPats pats = do
asts <- mapM tiPat pats
let as = concat [ as' | (as', _) <- asts ]
let ts = [ t | (_, t) <- asts ]
return (as, ts)(THIH Section 11.4, Alternatives)
We now have code to infer the type for patterns, but we still need a way to
connect this up with the rest of the type inference algorithm. To do that, we’ll
extend the definition of functions to be able to handle patterns as the
arguments. We won’t actually introduce syntax for case statements, though you
can emulate a case statement with a function binding in a let statement. For
example, this:
case e of
pat1 -> e1
pat2 -> e2can be transformed into
let f pat1 = e1
f pat2 = e2
in f eTo handle a function like f pat1 pat2 pat3 = bar, we need a list of patterns
and an expression. Those patterns might all be just variables (we use the same
code regardless of whether the function definition has anything more interesting
patterns). This is captured in a structure called an Alt (short for
alternative):
type Alt = ([Pat], Expr)Note that the list of patterns can be empty. This happens for declarations like
x = e1.
The structure is called an alternative because a function can have multiple alternative ways of being evaluated. The two lines of the following function are both alternatives:
isJust (Just _) = True
isJust Nothing = FalseThe implicitly-typed and explicitly-typed declarations now accept a list of alternatives (they used to just have an expression):
type Impl = (Id, [Alt])
type Expl = (Id, Scheme, [Alt])Inference for an alternative is thankfully fairly straightforward. First, infer the types of the patterns. This will return zero or more assumptions and a list of types, one type per argument. It then finds the type of the expression (with those extra assumptions added). Finally, the resulting type is a function type where the argument types are the types from those patterns and the return type is the type of that expression.
tiAlt :: Infer Alt Type
tiAlt as (pats, e) = do
(as', ts) <- tiPats pats
t <- tiExpr (as' ++ as) e
return $ foldr fn t ts
tiAlts :: [Assump] -> [Alt] -> Type -> TI ()
tiAlts as alts t = do
ts <- mapM (tiAlt as) alts
mapM_ (unify t) tsThe tiAlts function handles a list of alternatives and does one extra thing:
on the last line it ensures that all the different function bodies have the same
type.
The last change is that tiExpl and tiImpls now call tiAlts when they had
been calling tiBinding. In fact, the tiBinding function now goes away.
And that’s it! It’s worth noting that even functions that just use variables for arguments still are typed using the pattern inference code.
You can find the finished code in patterns.hs.
This finishes up the middle third of the series on type inference for Haskell. Next up: Typeclasses.