# Type inference for Haskell, part 11

This is part 11 of the series on type inference for Haskell.

Right now the type system doesn’t allow the definition of a binding to refer to itself. This means that you cannot have recursive functions. That’s a big problem in a functional language.

To allow for recursive functions, the rule for let bindings needs to change slightly. Remember that the general form of a let binding is `let v1 = e1 in e2`

. Up till now, the rule has been that the variable *v1* is not in scope in the expression *e1*. To allow recursion, we simply change it so that the variable is in scope in *e1*. Note that this means that *e1* can no longer refer to a shadowed variable of the same name as *v1*.

That rule change means that when figuring out the type of the expression *e1*, we need the variable *v1* to be in scope already. The variable’s type is not known yet, but we can still create an assumption it. The type will just start off as a new type variable. Once inferring the type of *e1* is done, we’ll update the notion of what type *v1* ends up being before we start to look at *e2*.

This approach isn’t perfect. It requires that the function always make the recursive call with the same type at which it was called. Normally this isn’t a problem. For example, if you define your own map function:

and that function was called like `myMap show [1,2,3]`

, the type of `myMap`

in that call would be `(Int -> String) -> [Int] -> [String]`

. In the recursive call (the `myMap f xs`

), the type of `myMap`

is the same.

Where you run into a problem is when the recursive call has a different type. The code below uses a strange data structure definition to cause each recursive call to be at a different type. The fact that the next link is a `CrazyList [a]`

instead of a `CrazyList a`

means that each link’s data is nested one level deeper inside nested lists. This means that the recursive call (`crazyListSize rest`

) is a different type than the first call was.

```
data CrazyList a = End | List a (CrazyList [a])
-- crazyListSize :: CrazyList a -> Int
crazyListSize End = 0
crazyListSize (List _ rest) = 1 + crazyListSize rest
```

If you try to compile the code above in Haskell, it will fail. Type inference can’t handle that particular edge case. Fortunately, there is an easy work-around: you can make it work by explicitly declaring the type. The example will compile if you uncomment the type declaration above.

Dealing with explicitly typed bindings will be a problem for the next post.

There is actually still something missing. As defined right now, we cannot handle mutually recursive definitions. Fixing that requires that a single let expression be able to declare multiple variables. The definitions of the variables can then reference any of the introduced variables. We can now write expressions like:

```
let evens n = collatz (n `div` 2)
odds n = collatz (3 * n + 1)
collatz 1 = 1
collatz n = 1 + (if n `mod` 2 == 0 then evens n else odds n)
in collatz 13
```

Above we looked at a problem caused by the fact that the recursive call has to be the same type as the original call of that function. That problem gets even worse when mutual recursion comes into play: the types of *all* the calls of a function have to match the type of the original function call. This could accidentally result in types that are far more restrictive than you wanted. Look at this (admittedly silly) example:

That restriction causes the type of `identity`

to be `String -> String`

instead of `a -> a`

because it is used as `String -> String`

in the definition of `foo`

. This only happens because the `unused = foo "bar"`

bit causes the definitions to be mutually recursive.

To avoid getting overly restricted types for things we need to be careful to only treat bindings as mutually recursive when they actually are. That means that bindings need to be split up as much as possible while still keeping the mutually recursive bindings together. This requires a topological sort of the strongly connected components of the call graph of these variables. The strongly connected components are the groups of bindings that are mutually recursive. Topologically sorting them makes it so we don’t deal with a binding that references something we haven’t dealt with yet.

Our algorithm assumes this grouping and sorting has already done for us. The new structure of bindings will reflect this.

An `Impl`

is an implicitly typed (i.e. no type declaration) binding. The `Id`

is the name of the variable.

In `BindGroup`

, each item in the outer list is a mutually-recursive group (which might be just one binding if it isn’t involved in mutual recursion), and each item in the group is one binding.

The data structure for `Let`

now references the `BindGroup`

structure:

The inference logic references a function for finding the type of an entire group of bindings:

`tiBindGroup`

returns an assumption for each variable in `as'`

. It works by repeatedly running `tiImpls`

for each binding group. The `tiSeq`

function is a helper function for looping over each binding group, collecting the added assumptions along the way. `iss`

is the list of lists of implicitly typed bindings.

The code for `tiImpls`

looks more scary that it actually is. The first four lines create a new list of assumptions `as'`

that contain the original assumptions plus the temporary assumptions for the variables being introduced. `bs`

is the list of bindings in the group, `as`

is the original assumptions, `ts`

is a list of new type variables for the temporary assumptions, `scs`

is those type variables converted to schemes, and `is`

is the names of the variables introduced.

The middle portion gets the list of expressions the variables are bound to and (with the help of `tiBinding`

), finds the type for all of them. `sequence`

is a function provided by Haskell that runs a list of monadic expressions. Interestingly, we don’t use the result of that expression. The fact that `tiBinding`

unifies the result of each expression with the respective type variable from `ts`

means that our substitution gets updated with the real type.

That’s where the third part comes in. It gets that substitution and applies it to all the types in `ts`

to get the real types `ts'`

. It then figures out the type variables in the assumptions (after applying the substitution to them) and in the new bindings for the sake of finding the type variables to quantify over (`gs`

). That list is the set of type variables that appear in any of the types of the variables being introduced and not in any of the existing bindings. That list is then used to quantify each type in `ts'`

to get the type scheme for each variable introduced. The last line assembles the names and the schemes into assumptions.

```
tiImpls :: Infer [Impl] [Assump]
tiImpls as bs = do
ts <- mapM (\_ -> newTVar) bs
let scs = map toScheme ts
let is = map fst bs
let as' = (zipWith (:>:) is scs) ++ as
let exprs = map snd bs
_ <- sequence $ zipWith (tiBinding as') exprs ts
s <- getSubst
let ts' = apply s ts
let fs = tv (apply s as)
let vss = map tv ts'
let gs = foldr1 union vss \\ fs
let scs' = map (quantify gs) ts'
return $ zipWith (:>:) is scs'
tiBinding :: [Assump] -> Expr -> Type -> TI ()
tiBinding as e t = do
t' <- tiExpr as e
unify t t'
```

Here’s the implementation of `tiSeq`

. It runs the inference function `ti`

for each element in the list, adding the resulting assumptions to the set of assumptions before moving on to the next element in the list.

```
tiSeq :: Infer bg [Assump] -> Infer [bg] [Assump]
tiSeq _ _ [] =
return []
tiSeq ti as (bs:bss) = do
as' <- ti as bs
as'' <- tiSeq ti (as' ++ as) bss
return $ as'' ++ as'
```

It is now also convenient to make the top-level of type inference take a whole binding group instead of a single expression. That way, you can find out the types of a whole list of bindings.

```
type Program = [BindGroup]
tiProgram :: [Assump] -> Program -> [Assump]
tiProgram as bgs = runTI $ do
as' <- tiSeq tiBindGroup as bgs
s <- getSubst
return (apply s as')
```

And that’s it! Here’s the finished code.

Now that the type system has let polymorphism and recursive bindings, it is a fairly useful one. That said, it still has a long way to go until it matches Haskell’s. There are a few more small features left to add before moving on to the huge feature, type classes.