Type inference for Haskell, part 23
This is part 23, the final part of the series on type inference for Haskell.
All the hard work for type classes is done. The data structures have been defined. We dealt with context reduction, which involved predicate entailment and converting predicates to head normal form. Most of the predicates needed to be retained on the current binding, but some needed to be deferred to an enclosing binding. Some predicates resulted in ambiguities (some of which could be resolved by defaulting). And finally, we added the monomorphism restriction.
Now we just need to attach this logic to the rest of the type inference system. You already saw two changes to existing functions in the previous post. The top level function tiProgram
was changed to apply a defaulting substitution, and tiImpls
gained logic to deal with the monomorphism restriction. There was another change to tiImpls
: It now returns a list of predicates in addition to the list of assumptions it was returning before. Where do those predicates come from?
Ultimately, they can come from one of four places.
First, a number literal now comes with a predicate attached. Where we previously had hard-coded a type of Integer
for a number, we can now return a generic type with the stipulation that the type be in some Num
class. The tiLit
function is now:
tiLit :: Literal -> TI ([Pred], Type)
tiLit (LitChar _) =
return ([], tChar)
tiLit (LitInt _) = do
v <- newTVar Star
return ([IsIn "Num" v], v)
tiLit (LitStr _) =
return ([], tString)
tiLit (LitRat _) = do
v <- newTVar Star
return ([IsIn "Fractional" v], v)
The LitRat
case handles literals like 2.78
.
Second, the N+K pattern also can add a predicate. This restricts the type a bit more than a literal number does (Integral
is a subclass of Num
).
If you look at the other cases in tiPat
(I won’t duplicate all of them here), you’ll see that most do one of two things. Either they return an empty list of predicates, or they return the list of predicates from processing an inner pattern. The constructor pattern is interesting because it needs to combine multiple sources of predicates:
tiPat (PCon (i:>:sc) pats) = do
(ps,as,ts) <- tiPats pats
t' <- newTVar Star
(qs :=> t) <- freshInst sc
unify t (foldr fn t' ts)
return (ps++qs, as, t')
Here, both ps
and qs
are lists of predicates. The list ps
is also the result of combining lists of predicates:
tiPats :: [Pat] -> TI ([Pred], [Assump], [Type])
tiPats pats = do
psasts <- mapM tiPat pats
let ps = concat [ ps' | (ps',_,_) <- psasts ]
as = concat [ as' | (_,as',_) <- psasts ]
ts = [ t | (_,_,t) <- psasts ]
return (ps, as, ts)
This is a general pattern you’ll see throughout the code. A small handful of things actually give predicates. The rest either give an empty list of predicates or return the predicates given by nested calls. Throughout all the type inference code, it just builds up a list of all the predicates it can find.
The third place you can get a predicate from is a variable. If a variable’s binding carries some predicates, they will be added to the list when that variable is used. This might happen if the variable is the name of a method in a type class. As an example, if you use the function show
, you would get the predicate (Show a)
.
Finally, the fourth source of predicates is constructor expressions. In practice, these will mostly have an empty list of predicates.
After that, everything else just collects and returns predicates. That holds true for the other cases in tiExpr
:
tiExpr ce as (Lit l) = do
(ps,t) <- tiLit l
return (ps, t)
tiExpr ce as (Ap e f) = do
(ps,te) <- tiExpr ce as e
(qs,tf) <- tiExpr ce as f
t <- newTVar Star
unify (tf `fn` t) te
return (ps ++ qs, t)
tiExpr ce as (Let bg e) = do
(ps, as') <- tiBindGroup ce as bg
(qs, t) <- tiExpr ce (as' ++ as) e
return (ps ++ qs, t)
You might notice that the type for tiExpr
is Infer Expr Type
, which makes use of the type alias Infer
. That is defined as follows:
The rule of gathering and returning predicates is also followed by tiAlt
and tiAlts
:
tiAlt :: Infer Alt Type
tiAlt ce as (pats, e) = do
(ps, as', ts) <- tiPats pats
(qs,t) <- tiExpr ce (as'++as) e
return (ps++qs, foldr fn t ts)
tiAlts :: ClassEnv -> [Assump] -> [Alt] -> Type -> TI [Pred]
tiAlts ce as alts t = do
psts <- mapM (tiAlt ce as) alts
mapM (unify t) (map snd psts)
return (concat (map fst psts))
The same is true of tiBindGroup
:
tiBindGroup :: Infer BindGroup [Assump]
tiBindGroup ce as (es,iss) = do
let as' = [ v:>:sc | (v,sc,alts) <- es ]
(ps, as'') <- tiSeq tiImpls ce (as'++as) iss
qss <- mapM (tiExpl ce (as''++as'++as)) es
return (ps++concat qss, as''++as')
And of tiSeq
:
tiSeq :: Infer bg [Assump] -> Infer [bg] [Assump]
tiSeq ti ce as [] = return ([],[])
tiSeq ti ce as (bs:bss) = do
(ps,as') <- ti ce as bs
(qs,as'') <- tiSeq ti ce (as'++as) bss
return (ps++qs, as''++as')
Let’s now look at where the rubber meets the road and see where these lists of predicates gathered up end up going to all that code we defined before. We need to look at both tiExpl
and tiImpls
.
Explicitly typed bindings are handled by tiExpl
. This means there was a type declared for them. That type is stored in the scheme sc
. Remember before how we needed to check that the declared type wasn’t too general for the binding? We need to do something similar with predicates. When the scheme sc
is instantiated, we get the predicates qs
. After type inference is done, we apply the current substitution s
to qs
to get qs'
. This makes sure all the type variables mentioned in qs
are up to date.
Back where the inference happened, you can see that tiAlts
gathers a list of predicates ps
. This is what we need to check works with qs'
. If the declared type for the binding didn’t mention a predicate that turns out to be necessary, that would be a problem. Haskell wouldn’t allow you to write this because the type you declare is missing a (Show a)
predicate:
So, we need to check that the predicates you supply cover everything that is needed. This is the second place the entail
function is called (the first was in simplifying the list of predicates). The predicates in ps
are filtered out if they are entailed by the declared predicates qs
. Any that remain are stored in ps'
.
It’s actually possible for it to be OK for there to be predicates left in ps'
. If all of the predicates that are left are deferred, then that means that there is no problem with the declared type. Some other binding will need to have those predicates. The call to split
determines whether the predicate is deferred or retained. If any retained predicates are left, that does indicate an error.
tiExpl :: ClassEnv -> [Assump] -> Expl -> TI [Pred]
tiExpl ce as (i, sc, alts) = do
(qs :=> t) <- freshInst sc
ps <- tiAlts ce as alts t
s <- getSubst
let qs' = apply s qs
t' = apply s t
fs = tv (apply s as)
gs = tv t' \\ fs
sc' = quantify gs (qs':=>t')
ps' = filter (not . entail ce qs') (apply s ps)
(ds,rs) <- split ce fs gs ps'
if sc /= sc' then
fail "signature too general"
else if not (null rs) then
fail "context too weak"
else
return ds
You have already seen the last few lines of tiImpls
in the previous post. Here is the function in its entirety:
tiImpls :: Infer [Impl] [Assump]
tiImpls ce as bs = do
ts <- mapM (\_ -> newTVar Star) bs
let is = map fst bs
scs = map toScheme ts
as' = zipWith (:>:) is scs ++ as
altss = map snd bs
pss <- sequence (zipWith (tiAlts ce as') altss ts)
s <- getSubst
let ps' = apply s (concat pss)
ts' = apply s ts
fs = tv (apply s as)
vss = map tv ts'
gs = foldr1 union vss \\ fs
(ds,rs) <- split ce fs (foldr1 intersect vss) ps'
if restricted bs then
let gs' = gs \\ tv rs
scs' = map (quantify gs' . ([]:=>)) ts'
in return (ds++rs, zipWith (:>:) is scs')
else
let scs' = map (quantify gs . (rs:=>)) ts'
in return (ds, zipWith (:>:) is scs')
It’s handling of predicates is actually simpler than tiExpl
because it doesn’t have to check if any set of predicates is correct. When the types are inferred for each implicitly typed binding, the result is a list of lists of predicates pss
. These are concatenated together before having the current substitution s
applied to them, resulting in ps'
.
Those predicates are then passed to split
to decide which are deferred (ds
) and which are retained (rs
). From there, the rest is all logic that was discussed in the previous post, but there’s one last thing I want to mention. Assuming the monomorphism restriction doesn’t apply to these bindings, the retained predicates rs
are added to the type schemes of all of the bindings in the group. That might make one of the types ambiguous if one of those retained predicates referred to a type variable not found in the type of one of those bindings. To ensure that doesn’t happen, the list of type variables passed to split
is the intersection (not the union) of the type variables (vss
) that appear in all the binding’s types (ts'
).
Fin
We’re finally done! I hope this has been useful to you.
This covered nearly all of the code found in the Typing Haskell in Haskell paper. If you haven’t already, I highly encourage you to read over both that paper and the included code. It has a number of details that I didn’t include here.
The code (and the THIH paper) can be found at https://web.cecs.pdx.edu/~mpj/thih/. However, this code is from 2000 and is thus slightly out of date. I’ve uploaded a modified version that compiles with GHC 8.0 here (alt link).