Type inference for Haskell, part 7
This is part 7 of the series on type inference for Haskell.
Remember how I said there would be some way of keeping track of the solution thus far? Substitutions are how the algorithm stores what it has learned so far.
Substitutions are mappings from type variable to types. A substitution mapping
the type variable a to the type Int is saying that you should replace the
type variable a with the type Int whenever you see a in a type. The type
on the right hand side could be (or contain) other type variables, so you might
have a substitution that replaces a with b, or one that replaces c with
Maybe d.
In this code, that’s stored as a list of tuples. It also could have been a
Data.Map, but later on it will end up being convenient to know the order in
which stuff was added to the substitution.
type Subst = [(Tyvar, Type)]There are also some utility functions for creating simple substitution values:
nullSubst :: Subst
nullSubst = []
(+->) :: Tyvar -> Type -> Subst
u +-> t = [(u, t)]A substitution from some type variable to a type basically says that we’ve
learned that the type variable has that type. If we learn that a type must be an
Int where we had previously been using the type variable a, that means we
should substitute a with Int. This also works for storing the fact that two
types are the same (even if we don’t know what either one is yet): we just
replace one type variable for the other. Now all types will just use the other
type variable. Using the same type variable in two places forces the type to be
the same in those two places.
How do you actually use a substitution once it is made? The process of applying
a substitution to a type recursively looks over the type. Where it finds any
type variables that the substitution replaces, it uses that replacement. It
leaves any other type variable alone. So, if you have a type Bool -> a -> b
and a substitution that replaces a with Int, the result of applying that
substitution to that type would be Bool -> Int -> b. Applying an empty
substitution always returns the type unchanged.
Here’s what the code for that looks like:
apply :: Subst -> Type -> Type
apply s (TVar u) =
case lookup u s of
Just t -> t
Nothing -> TVar u
apply s (TAp l r) = TAp (apply s l) (apply s r)
apply s t = tAs type inference looks at different expressions, it learns more about the
solution to the problem. That means that it needs to add information to the
substitution. At first glance, it looks like appending more (type variable,
type) pairs to the existing substitution would work for this. However, this
doesn’t work when there is a chain of more than two things that need to have the
same type. Say that we know that the type variable a is the same type as the
type variable b, and thus have a substitution that replaces a with b. I’ll
use python-like syntax to write this concisely: {a: b}. Now we learn that
b is the same type as c (giving {b: c}). Combining those by appending
the two gives {a: b, b: c}. This is incorrect. The three type variables
should all be replaced with the same thing, but here a and b are being
replaced with different things.
In order to make it work right, we need to update the existing substitution with
what we’ve learned before adding new stuff to it. That means applying the new
substitution to the right-hand sides of the old substitution. In the example
above, that means applying {b: c} to the b in {a: b}. That results in {a: c}. Now we can safely combine the two, giving {a: c, b: c}. Now all three
type variables get converted to the same thing like we wanted.
As a general note, when a type variable appears in one of the left-hand sides of a substitution, you don’t want it to appear in any the right-hand sides. That would mean you would have to apply the same substitution multiple times to get the full result. It’s even worse when a variable is replaced by something containing itself - you’d never finish applying that!
Here’s the code for composing two substitutions. s2 is the older of the two
substitutions here:
infixr 4 @@
(@@) :: Subst -> Subst -> Subst
s1 @@ s2 = [ (u, apply s1 t) | (u,t) <- s2 ] ++ s1By the way, when you compose two substitutions like this, it produces a new
substitution that, when applied to a type, is equivalent to first applying the
old substitution and then applying the new one. That is to say, apply sNew (apply sOld t) should do the same thing to t that apply (sNew @@ sOld) t
does.
If you are studying Algorithm W (perhaps by reading Algorithm W Step by Step), you’ll see that each step there returns a substitution. The code here is slightly different. We are keeping a single, global substitution and adding on to it instead of constantly creating new substitutions. The two algorithms are equivalent. I’ve seen this variant called “Algorithm J” in at least one place.
Speaking of which, there are two functions for working with that global substitution. The first gets the current state of the substitution out of the monad, and the second updates it by composing the current substitution with the new substitution.
getSubst :: TI Subst
getSubst = TI (\s n -> (s,n,s))
extSubst :: Subst -> TI ()
extSubst s' = TI (\s n -> (s'@@s, n, ()))One last point of interest about substitutions: once two types become the same due to an update to the substitution, nothing will happen to cause them to become different again.
The next post will cover the last pieces of logic needed for type
inference. That includes the unify function, which is where new information
gets added to the substitution (it will call extSubst).