Type inference for Haskell, part 23

Posted on January 24, 2019

This is part 23, the final part of the series on type inference for Haskell.

All the hard work for type classes is done. The data structures have been defined. We dealt with context reduction, which involved predicate entailment and converting predicates to head normal form. Most of the predicates needed to be retained on the current binding, but some needed to be deferred to an enclosing binding. Some predicates resulted in ambiguities (some of which could be resolved by defaulting). And finally, we added the monomorphism restriction.

Now we just need to attach this logic to the rest of the type inference system. You already saw two changes to existing functions in the previous post. The top level function tiProgram was changed to apply a defaulting substitution, and tiImpls gained logic to deal with the monomorphism restriction. There was another change to tiImpls: It now returns a list of predicates in addition to the list of assumptions it was returning before. Where do those predicates come from?

Ultimately, they can come from one of four places.

First, a number literal now comes with a predicate attached. Where we previously had hard-coded a type of Integer for a number, we can now return a generic type with the stipulation that the type be in some Num class. The tiLit function is now:

tiLit :: Literal -> TI ([Pred], Type)
tiLit (LitChar _) =
  return ([], tChar)
tiLit (LitInt _)  = do
  v <- newTVar Star
  return ([IsIn "Num" v], v)
tiLit (LitStr _)  =
  return ([], tString)
tiLit (LitRat _)  = do
  v <- newTVar Star
  return ([IsIn "Fractional" v], v)

The LitRat case handles literals like 2.78.

Second, the N+K pattern also can add a predicate. This restricts the type a bit more than a literal number does (Integral is a subclass of Num).

tiPat (PNpk i k) = do
  t <- newTVar Star
  return ([IsIn "Integral" t], [i:>:toScheme t], t)

If you look at the other cases in tiPat (I won’t duplicate all of them here), you’ll see that most do one of two things. Either they return an empty list of predicates, or they return the list of predicates from processing an inner pattern. The constructor pattern is interesting because it needs to combine multiple sources of predicates:

tiPat (PCon (i:>:sc) pats) = do
    (ps,as,ts) <- tiPats pats
    t'         <- newTVar Star
    (qs :=> t) <- freshInst sc
    unify t (foldr fn t' ts)
    return (ps++qs, as, t')

Here, both ps and qs are lists of predicates. The list ps is also the result of combining lists of predicates:

tiPats :: [Pat] -> TI ([Pred], [Assump], [Type])
tiPats pats = do
    psasts <- mapM tiPat pats
    let ps = concat [ ps' | (ps',_,_) <- psasts ]
        as = concat [ as' | (_,as',_) <- psasts ]
        ts = [ t | (_,_,t) <- psasts ]
    return (ps, as, ts)

This is a general pattern you’ll see throughout the code. A small handful of things actually give predicates. The rest either give an empty list of predicates or return the predicates given by nested calls. Throughout all the type inference code, it just builds up a list of all the predicates it can find.

The third place you can get a predicate from is a variable. If a variable’s binding carries some predicates, they will be added to the list when that variable is used. This might happen if the variable is the name of a method in a type class. As an example, if you use the function show, you would get the predicate (Show a).

tiExpr ce as (Var i) = do
    sc         <- find i as
    (ps :=> t) <- freshInst sc
    return (ps, t)

Finally, the fourth source of predicates is constructor expressions. In practice, these will mostly have an empty list of predicates.

tiExpr ce as (Const (i:>:sc)) = do
    (ps :=> t) <- freshInst sc
    return (ps, t)

After that, everything else just collects and returns predicates. That holds true for the other cases in tiExpr:

tiExpr ce as (Lit l)    = do
  (ps,t) <- tiLit l
  return (ps, t)

tiExpr ce as (Ap e f)   = do
  (ps,te) <- tiExpr ce as e
  (qs,tf) <- tiExpr ce as f
  t       <- newTVar Star
  unify (tf `fn` t) te
  return (ps ++ qs, t)

tiExpr ce as (Let bg e) = do
  (ps, as') <- tiBindGroup ce as bg
  (qs, t)   <- tiExpr ce (as' ++ as) e
  return (ps ++ qs, t)

You might notice that the type for tiExpr is Infer Expr Type, which makes use of the type alias Infer. That is defined as follows:

type Infer e t = ClassEnv -> [Assump] -> e -> TI ([Pred], t)

The rule of gathering and returning predicates is also followed by tiAlt and tiAlts:

tiAlt :: Infer Alt Type
tiAlt ce as (pats, e) = do
    (ps, as', ts) <- tiPats pats
    (qs,t)  <- tiExpr ce (as'++as) e
    return (ps++qs, foldr fn t ts)

tiAlts :: ClassEnv -> [Assump] -> [Alt] -> Type -> TI [Pred]
tiAlts ce as alts t = do
    psts <- mapM (tiAlt ce as) alts
    mapM (unify t) (map snd psts)
    return (concat (map fst psts))

The same is true of tiBindGroup:

tiBindGroup :: Infer BindGroup [Assump]
tiBindGroup ce as (es,iss) = do
  let as' = [ v:>:sc | (v,sc,alts) <- es ]
  (ps, as'') <- tiSeq tiImpls ce (as'++as) iss
  qss        <- mapM (tiExpl ce (as''++as'++as)) es
  return (ps++concat qss, as''++as')

And of tiSeq:

tiSeq :: Infer bg [Assump] -> Infer [bg] [Assump]
tiSeq ti ce as []       = return ([],[])
tiSeq ti ce as (bs:bss) = do
    (ps,as')  <- ti ce as bs
    (qs,as'') <- tiSeq ti ce (as'++as) bss
    return (ps++qs, as''++as')

Let’s now look at where the rubber meets the road and see where these lists of predicates gathered up end up going to all that code we defined before. We need to look at both tiExpl and tiImpls.

Explicitly typed bindings are handled by tiExpl. This means there was a type declared for them. That type is stored in the scheme sc. Remember before how we needed to check that the declared type wasn’t too general for the binding? We need to do something similar with predicates. When the scheme sc is instantiated, we get the predicates qs. After type inference is done, we apply the current substitution s to qs to get qs'. This makes sure all the type variables mentioned in qs are up to date.

Back where the inference happened, you can see that tiAlts gathers a list of predicates ps. This is what we need to check works with qs'. If the declared type for the binding didn’t mention a predicate that turns out to be necessary, that would be a problem. Haskell wouldn’t allow you to write this because the type you declare is missing a (Show a) predicate:

foo :: a -> a
foo x = show x

So, we need to check that the predicates you supply cover everything that is needed. This is the second place the entail function is called (the first was in simplifying the list of predicates). The predicates in ps are filtered out if they are entailed by the declared predicates qs. Any that remain are stored in ps'.

It’s actually possible for it to be OK for there to be predicates left in ps'. If all of the predicates that are left are deferred, then that means that there is no problem with the declared type. Some other binding will need to have those predicates. The call to split determines whether the predicate is deferred or retained. If any retained predicates are left, that does indicate an error.

tiExpl :: ClassEnv -> [Assump] -> Expl -> TI [Pred]
tiExpl ce as (i, sc, alts) = do
  (qs :=> t) <- freshInst sc
  ps         <- tiAlts ce as alts t
  s          <- getSubst
  let qs' = apply s qs
      t'  = apply s t
      fs  = tv (apply s as)
      gs  = tv t' \\ fs
      sc' = quantify gs (qs':=>t')
      ps' = filter (not . entail ce qs') (apply s ps)
  (ds,rs)    <- split ce fs gs ps'
  if sc /= sc' then
      fail "signature too general"
    else if not (null rs) then
      fail "context too weak"
    else
      return ds

You have already seen the last few lines of tiImpls in the previous post. Here is the function in its entirety:

tiImpls :: Infer [Impl] [Assump]
tiImpls ce as bs = do
  ts <- mapM (\_ -> newTVar Star) bs
  let is    = map fst bs
      scs   = map toScheme ts
      as'   = zipWith (:>:) is scs ++ as
      altss = map snd bs
  pss <- sequence (zipWith (tiAlts ce as') altss ts)
  s   <- getSubst
  let ps' = apply s (concat pss)
      ts' = apply s ts
      fs  = tv (apply s as)
      vss = map tv ts'
      gs  = foldr1 union vss \\ fs
  (ds,rs) <- split ce fs (foldr1 intersect vss) ps'
  if restricted bs then
      let gs'  = gs \\ tv rs
          scs' = map (quantify gs' . ([]:=>)) ts'
      in return (ds++rs, zipWith (:>:) is scs')
    else
      let scs' = map (quantify gs . (rs:=>)) ts'
      in return (ds, zipWith (:>:) is scs')

It’s handling of predicates is actually simpler than tiExpl because it doesn’t have to check if any set of predicates is correct. When the types are inferred for each implicitly typed binding, the result is a list of lists of predicates pss. These are concatenated together before having the current substitution s applied to them, resulting in ps'.

Those predicates are then passed to split to decide which are deferred (ds) and which are retained (rs). From there, the rest is all logic that was discussed in the previous post, but there’s one last thing I want to mention. Assuming the monomorphism restriction doesn’t apply to these bindings, the retained predicates rs are added to the type schemes of all of the bindings in the group. That might make one of the types ambiguous if one of those retained predicates referred to a type variable not found in the type of one of those bindings. To ensure that doesn’t happen, the list of type variables passed to split is the intersection (not the union) of the type variables (vss) that appear in all the binding’s types (ts').

Fin

We’re finally done! I hope this has been useful to you.

This covered nearly all of the code found in the Typing Haskell in Haskell paper. If you haven’t already, I highly encourage you to read over both that paper and the included code. It has a number of details that I didn’t include here.

The code (and the THIH paper) can be found at https://web.cecs.pdx.edu/~mpj/thih/. However, this code is from 2000 and is thus slightly out of date. I’ve uploaded a modified version that compiles with GHC 8.0 here (alt link).