Redo type checker using "propagate types inward" trick #99
To add source locations to the `Term` type, my first attempt looks like this:

```haskell
data TermLoc = TermLoc { start :: Int, end :: Int }
  deriving (Eq, Show, Data)

-- Add TermLoc to the relevant terms
data Term
  = TVar TermLoc Var
  | TDef TermLoc Var (Maybe Polytype) Term
  | ...
```

which could be parsed like so:

```haskell
locatedTerm :: (TermLoc -> a -> Term) -> Parser a -> Parser Term
locatedTerm mkTerm parser = do
  start <- getOffset
  a <- parser
  end <- getOffset
  pure $ mkTerm (TermLoc start end) a
```

so that we could record the location of each parsed term.
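For a self-contained illustration of the pattern, here is a toy offset-tracking parser (hypothetical — not Swarm's actual Megaparsec-based one, though `getOffset` mirrors Megaparsec's function of the same name), showing how `locatedTerm` records the span consumed by a sub-parser:

```haskell
import Data.Char (isAlpha)

data TermLoc = TermLoc { start :: Int, end :: Int } deriving (Eq, Show)

-- A toy parser over (current offset, remaining input).
newtype Parser a = Parser { runParser :: (Int, String) -> Maybe (a, (Int, String)) }

instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> fmap (\(a, s') -> (f a, s')) (p s)

instance Applicative Parser where
  pure a = Parser $ \s -> Just (a, s)
  Parser pf <*> Parser pa = Parser $ \s -> do
    (f, s1) <- pf s
    (a, s2) <- pa s1
    Just (f a, s2)

instance Monad Parser where
  Parser p >>= k = Parser $ \s -> do
    (a, s') <- p s
    runParser (k a) s'

-- Report the current offset without consuming input.
getOffset :: Parser Int
getOffset = Parser $ \(o, rest) -> Just (o, (o, rest))

-- Consume a nonempty alphabetic identifier, advancing the offset.
ident :: Parser String
ident = Parser $ \(o, rest) ->
  case span isAlpha rest of
    ([], _)       -> Nothing
    (name, rest') -> Just (name, (o + length name, rest'))

-- The pattern from the comment above: read the offset before and after
-- the sub-parser, then attach the resulting TermLoc.
locatedTerm :: (TermLoc -> a -> t) -> Parser a -> Parser t
locatedTerm mkTerm parser = do
  s <- getOffset
  a <- parser
  e <- getOffset
  pure (mkTerm (TermLoc s e) a)

data Term = TVar TermLoc String deriving (Eq, Show)

main :: IO ()
main = print (fst <$> runParser (locatedTerm TVar ident) (0, "hello"))
```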
Note that to add source locations to types as well, we could use something like `type Type = Fix (Compose ((,) TermLoc) TypeF)`, with lots of utility functions and pattern synonyms to make it not painful to use. Being able to easily decorate every node like this is one of the benefits of using a two-level type. Off the top of my head, though, I'm not sure how this would interact with unification; we would need to tell the library not to worry about the location annotations. However, in any case I'm not sure adding locations to types is worth it. Yes, if we do the "propagate types inward" trick, we could probably pick up a relevant source location from the term being checked when throwing an error.
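A minimal sketch of that two-level encoding, with pattern synonyms hiding the `Fix`/`Compose` plumbing and a `forget` function of the sort a location-unaware unifier would need; all names here are illustrative, not Swarm's actual definitions:

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE DeriveFunctor   #-}
import Data.Functor.Compose (Compose (..))

newtype Fix f = Fix (f (Fix f))

data TermLoc = TermLoc { start :: Int, end :: Int } deriving (Eq, Show)

-- The "base functor" layer of types; tied through Fix below.
data TypeF t = TyIntF | TyFunF t t deriving (Eq, Show, Functor)

-- Every node carries a TermLoc alongside one TypeF layer.
type LocType = Fix (Compose ((,) TermLoc) TypeF)

-- Pattern synonyms keep use sites free of Fix/Compose noise.
pattern TyInt :: TermLoc -> LocType
pattern TyInt l = Fix (Compose (l, TyIntF))

pattern TyFun :: TermLoc -> LocType -> LocType -> LocType
pattern TyFun l a b = Fix (Compose (l, TyFunF a b))

-- Strip the annotations, e.g. before handing a type to a unifier
-- that should not see the locations.
forget :: LocType -> Fix TypeF
forget (Fix (Compose (_, f))) = Fix (fmap forget f)

-- Read back the annotation on the root node.
locOf :: LocType -> TermLoc
locOf (Fix (Compose (l, _))) = l

main :: IO ()
main =
  let t = TyFun (TermLoc 0 10) (TyInt (TermLoc 0 3)) (TyInt (TermLoc 7 10))
  in print (locOf t)
```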
Thank you for all the details. I'd rather wait for a new type checker implementation that could use the failing term's loc when throwing the error than fiddle with the current library. You also mention another solution you'd like to try in #100, and perhaps there is another technique we can use to manage the loc. For example, the Dhall.Core.Expr data type uses a `Note` constructor to wrap subexpressions with their source span.
As another example, Grace.Syntax uses something like a `location` field attached to every syntax node.
Yes, that's a nicer approach than manually adding a location field to every constructor.
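As a sketch of that shape (illustrative names only, not the actual Grace or Swarm definitions): a single wrapper record carries the location, and the node constructors themselves stay clean:

```haskell
data Loc = Loc { start :: Int, end :: Int } deriving (Eq, Show)

-- One wrapper record holds the location for every node...
data Syntax = Syntax { location :: Loc, node :: Node } deriving (Eq, Show)

-- ...so the constructors need no per-constructor location argument.
data Node
  = Var String
  | App Syntax Syntax
  deriving (Eq, Show)

-- Every node, however deep, can report its own span.
main :: IO ()
main =
  let f = Syntax (Loc 0 1) (Var "f")
      x = Syntax (Loc 2 3) (Var "x")
  in print (location (Syntax (Loc 0 3) (App f x)))
```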
I spent a little more time thinking about this and think I understand why it would be an improvement. Suppose we have a definition whose declared type does not match its body.

We call `infer` on the body, generate fresh unification variables, and only discover the mismatch later, during unification, with no connection to a specific source location. But what if, instead of having only `infer`, we had a `check` function that takes an expected type and pushes it down into the term? Then the failure would happen right at the subterm that doesn't match, and we could report its location.

This doesn't always work, and there are still other problems (such as the fact that the place where it actually fails might not be the real error at all), but it's still better than what we have now.
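Here is a toy sketch of that difference (a hypothetical mini-language, not Swarm's): `check` pushes the expected result type through a lambda, so a mismatched body is reported at its own location instead of surfacing later as a disconnected unification failure:

```haskell
type Loc = (Int, Int)

data Ty = TyInt | TyText | TyFun Ty Ty deriving (Eq, Show)

data Term
  = Lit Int Loc          -- integer literal with its source span
  | Str String Loc       -- text literal with its source span
  | Lam String Term Loc  -- lambda with its source span
  deriving Show

-- Checking mode: the expected type comes from the context.
check :: Term -> Ty -> Either (Loc, String) ()
check (Lit _ l) ty = expect l TyInt ty
check (Str _ l) ty = expect l TyText ty
check (Lam _ body _) (TyFun _ res) = check body res  -- push the result type inward
check (Lam _ _ l) ty =
  Left (l, "expected " ++ show ty ++ ", but got a function")

-- Flag a mismatch immediately, at the offending subterm's location.
expect :: Loc -> Ty -> Ty -> Either (Loc, String) ()
expect l actual expected
  | actual == expected = Right ()
  | otherwise = Left (l, "expected " ++ show expected ++ ", but got " ++ show actual)

main :: IO ()
main = print (check (Lam "x" (Str "hi" (5, 9)) (0, 9)) (TyFun TyInt TyInt))
```

Checking `\x. "hi"` against `int -> int` blames the body's own span `(5, 9)` with an "expected/but got" message, rather than producing a generic unification error somewhere else.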
As a proof of concept, I started by modifying one part of the typechecker to stay in checking mode instead of immediately switching to inference.

I put this on a branch. It doesn't yet produce any better error messages, but I think this proves that we can improve things incrementally, without having to do an entire rewrite of the typechecker. To actually get better error messages, we have to try as hard as possible to catch errors that we can flag right at checking time, instead of putting them all off to the unifier. For example, in this case, once we recursively check the pieces of the term, we could flag any mismatch immediately rather than deferring it.
Add record types to the language: record values are written like `[x = 3, y = "hi"]` and have types like `[x : int, y : text]`. Empty and singleton records are allowed. You can project a field out of a record using standard dot notation, like `r.x`. If things named e.g. `x` and `y` are in scope, you can also write e.g. `[x, y]` as a shorthand for `[x=x, y=y]`. Closes #1093. #153 would make this even nicer to use.

One reason this is significant is that record projection is our first language construct whose type cannot be inferred: if we see something like `r.x`, all we know about the type of `r` is that it is a record type with at least one field `x`, but we don't know how many other fields it might have. Without some complex machinery like row polymorphism we can't deal with that, so we just punt and throw an error saying that we can't infer the type of a projection. To make this usable we have to do a better job checking types, a la #99. For example, `def f : [x:int] -> int = \r. r.x end` would not have type checked before, since when checking the lambda we immediately switched into inference mode, and then encountered the record projection and threw up our hands. Now we work harder to push the given function type down into the lambda, so that we are still in checking mode when we get to `r.x`, which makes it work. But it is probably easy to write examples of other things where this doesn't work. Eventually we will want to fully implement #99; in the meantime one can always add a type annotation (#1164) on the record to get around this problem.

Note: I was planning to add an `open e1 in e2` syntax, which would take a record expression `e1` and "open" it locally in `e2`, so all the fields would be in scope within `e2`. For example, if we had `r = [x = 3, y = 7]`, then instead of writing `r.x + r.y` you could write `open r in x + y`. This would be especially useful for imports, as in `open import foo.sw in ...`.
However, it turns out to be problematic: the only way to figure out the free variables in `open e1 in e2` is to know the *type* of `e1`, so you know which names it binds in `e2`. (In all other cases, bound names can be determined statically from the *syntax*.) In our current codebase there is one place where we get the free variables of an untyped term: we decide at parse time whether definitions are recursive (and fill in a boolean to that effect) by checking whether the name of the thing being defined occurs free in its body. One idea might be to either fill in this boolean later, after typechecking, or simply compute it on the fly when it is needed; currently this is slightly problematic because we need the info about whether a definition is recursive when doing capability checking, which is currently independent of typechecking.

I was also planning to add an `export` keyword which creates a record with all names currently in scope --- this could be useful for creating modules. However, I realized that very often you don't really want *all* in-scope names, so it's not that useful to have `export`. Instead I added record punning, so if you have several variables `x`, `y`, `z` in scope that you want to package into a record, you can just write `[x, y, z]` instead of `[x=x, y=y, z=z]`. Though it could still be rather annoying if you wanted to make a module with tons of useful functions and had to list them all in a record at the end...

Originally I started adding records because I thought it would be a helpful way to organize modules and imports. However, that would require having records that contain fields with polymorphic types, and I am not yet sure how that would play out. It would essentially allow encoding arbitrary higher-rank types, so it sounds kind of scary. In any case, I'm still glad I implemented records and I learned a lot, even if they can't be used for my original motivation.
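A toy sketch (illustrative only, not Swarm's implementation) of why pushing the function type into the lambda rescues record projection: once `r`'s record type is in the typing context, `r.x` becomes a plain field lookup:

```haskell
import qualified Data.Map as M

data Ty = TyInt | TyRcd (M.Map String Ty) | TyFun Ty Ty deriving (Eq, Show)

data Term = Var String | Lam String Term | Proj Term String deriving Show

type Ctx = M.Map String Ty

check :: Ctx -> Term -> Ty -> Either String ()
check ctx (Lam x body) (TyFun arg res) =
  check (M.insert x arg ctx) body res  -- push the argument type inward
check ctx t expected = do
  actual <- infer ctx t
  if actual == expected
    then Right ()
    else Left ("expected " ++ show expected ++ ", but got " ++ show actual)

infer :: Ctx -> Term -> Either String Ty
infer ctx (Var x) = maybe (Left ("unbound " ++ x)) Right (M.lookup x ctx)
infer ctx (Proj r f) = do
  rty <- infer ctx r  -- succeeds here because r's type is already known
  case rty of
    TyRcd fields -> maybe (Left ("no field " ++ f)) Right (M.lookup f fields)
    _            -> Left "projection from non-record"
infer _ (Lam _ _) = Left "cannot infer the type of a bare lambda"

main :: IO ()
main =
  -- \r. r.x  checked against  [x:int] -> int
  print (check M.empty (Lam "r" (Proj (Var "r") "x"))
               (TyFun (TyRcd (M.fromList [("x", TyInt)])) TyInt))
```

Inferring the bare `\r. r.x` fails (we would have to guess `r`'s record type), but checking it against the annotated function type succeeds, mirroring the `def f : [x:int] -> int = \r. r.x end` example above.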
I can't think of a way to make a scenario that requires the use of records. Eventually once we have proper #94 we could make a scenario where you have to communicate with another robot and send it a value of some required type. That would be a cool way to test the use of other language features like lambdas, too.
The basic idea is that we try as hard as possible to stay in "checking mode", where we have an "expected type" from the context and "push it down" through a term. We also try hard to avoid generating fresh unification type variables unless absolutely necessary. That way, when we come to something that doesn't match the expected type, we can immediately flag a specific error about that part of the code, rather than later having a generic unification error. This is mostly just refactoring, but it already improves the type error messages somewhat:

* Instead of saying something like "can't unify int and text", it might say something like "expected type int, but got text", with a reference to a specific source code location.
* It's probably possible to still get it to sometimes fail with a "can't unify" type error, but with only this technique I don't think we can ever 100% get rid of that.
* Type errors now report a specific line and column number, not just a line number like they used to.

There is still a lot more I want to do to improve the type error messages, but this is a start. Closes #99.
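On the line-and-column point: converting a flat source offset (like the one `getOffset` produces in the parsing sketch earlier) into a line and column could look like this hypothetical helper (not Swarm's actual code):

```haskell
-- Compute the 1-based (line, column) of a character offset by walking
-- the prefix of the source and counting newlines.
lineCol :: String -> Int -> (Int, Int)
lineCol src off = go 1 1 (take off src)
  where
    go l _ ('\n' : rest) = go (l + 1) 1 rest
    go l c (_    : rest) = go l (c + 1) rest
    go l c []            = (l, c)

main :: IO ()
main = print (lineCol "def f =\n  x + 1" 10)
```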
Peyton Jones et al., in section 5.4 of *Practical type inference for arbitrary-rank types*, describe an apparently well-known trick for improving type error messages: instead of an `infer` function, write a `check` function which takes an expected type; we can then recover `infer` by simply generating a fresh unification variable and passing it along as the "expected type". I hope this may lay the groundwork for better error messages when type inference fails.
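A minimal sketch of the trick itself, with toy types and hypothetical names: `check` always receives an expected type and emits constraints for a later unifier, and `infer` is recovered by checking against a freshly generated unification variable:

```haskell
data Ty = TyInt | TyText | TyVar Int deriving (Eq, Show)

data Term = Lit Int | Str String deriving Show

-- Constraints are handed off to the unifier to solve later.
type Constraint = (Ty, Ty)

-- Checking mode: takes the expected type; `n` is a supply of fresh
-- unification-variable numbers, threaded through.
check :: Term -> Ty -> Int -> ([Constraint], Int)
check (Lit _) expected n = ([(expected, TyInt)], n)
check (Str _) expected n = ([(expected, TyText)], n)

-- Inference mode recovered from checking mode: generate a fresh
-- unification variable and pass it along as the "expected type".
infer :: Term -> Int -> (Ty, [Constraint], Int)
infer t n =
  let v = TyVar n
      (cs, n') = check t v (n + 1)
  in (v, cs, n')

main :: IO ()
main = print (infer (Lit 42) 0)
```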