Redo type checker using "propagate types inward" trick #99

Closed
byorgey opened this issue Sep 28, 2021 · 7 comments · Fixed by #1283
Assignees: byorgey
Labels:
* C-Moderate Effort: Should take a moderate amount of time to address.
* L-Type inference: The process of inferring the type of a Swarm expression.
* S-Nice to have: The bug fix or feature would be nice but doesn't currently have much negative impact.
* Z-Refactoring: This issue is about restructuring the code without changing the behaviour to improve code quality.
* Z-User Experience: This issue seeks to make the game more enjoyable to play.

Comments

@byorgey (Member) commented Sep 28, 2021

Peyton Jones et al., in section 5.4 of Practical type inference for arbitrary-rank types, describe an apparently well-known trick for improving type error messages: instead of an infer function, write a check function which takes an expected type; we can then recover infer by simply generating a fresh unification variable and passing it along as the "expected type".
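
A minimal sketch of what recovering infer from check might look like, using the Syntax/UType/Infer names that show up later in this thread (the signatures here are assumptions, not the actual Swarm code):

-- 'check' does the real work, pushing the expected type down through the term.
check :: Syntax -> UType -> Infer ()
check = error "defined elsewhere; see the proof of concept later in this issue"

-- 'infer' is recovered from 'check' by inventing a fresh unification variable
-- to stand in for the "expected type".
infer :: Syntax -> Infer UType
infer s = do
  ty <- fresh
  check s ty
  return ty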

I hope this may lay the groundwork for better error messages when type inference fails.

byorgey added the Z-Refactoring, C-Moderate Effort, S-Nice to have, and L-Type inference labels Sep 28, 2021
@TristanCacqueray (Collaborator) commented:

To add a source location to the TypeErr message, I tried to thread a TermLoc into the TypeF data, and indeed that seems rather tricky =)

My first attempt looks like this:

data TermLoc = TermLoc {start :: Int, end :: Int}
  deriving (Eq, Show, Data)

-- Add TermLoc to the relevant terms
data Term
  = TVar TermLoc Var
  | TDef TermLoc Var (Maybe Polytype) Term
  | ...

which could be parsed like so:

locatedTerm :: (TermLoc -> a -> Term) -> Parser a -> Parser Term
locatedTerm mkTerm parser = do
  start <- getOffset
  a <- parser
  end <- getOffset
  pure $ mkTerm (TermLoc start end) a

so that we could add the location to the UnboundVar and DefNotTopLevel type errors. This is a rather simple solution; could it also work for the "propagate types inward" trick?

@byorgey (Member, Author) commented Sep 29, 2021

> To add a source location to the TypeErr message, I tried to thread a TermLoc into the TypeF data, and indeed that seems rather tricky =)

Note that to add a TermLoc to types, instead of adding it to TypeF directly, it could probably be done like this:

type Type = Fix (Compose ((,) TermLoc) TypeF)

with lots of utility functions and pattern synonyms to make it not painful to use. Being able to easily decorate every node like this is one of the benefits of using a two-level type. Off the top of my head, though, I'm not sure how this would interact with unification; we would need to tell the unification library not to worry about TermLocs that don't match.
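
For illustration, the decorated type plus one pattern synonym might look roughly like this (a sketch only: LocType, TyProdF, and LocProd are illustrative names and may not match Swarm's actual definitions):

-- Each node is a (TermLoc, TypeF ...) pair, using Fix from Data.Fix and
-- Compose from Data.Functor.Compose (requires the PatternSynonyms extension).
type LocType = Fix (Compose ((,) TermLoc) TypeF)

-- A pattern synonym hides the annotation layer, so code can still match
-- directly on the underlying TypeF constructor.
pattern LocProd :: TermLoc -> LocType -> LocType -> LocType
pattern LocProd l t1 t2 = Fix (Compose (l, TyProdF t1 t2))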

However, in any case I'm not sure adding TermLocs to types like this makes sense. Lots of Type values arise in ways that don't correspond to a particular place in the code.

Yes, if we do the "propagate types inward" trick, we could probably pick up a TermLoc from the term we are processing at the point where we get a unification failure. Pointing to the term where unification happened to fail is a notoriously bad way to identify the real location of the error, but it would be way better than nothing.

@TristanCacqueray (Collaborator) commented:

Thank you for all the details. I'd rather wait for a new type checker implementation that can use the failing term's loc when throwing the error than fiddle with the current library.

You also mention another solution you'd like to try in #100, and perhaps there is another technique we can use to manage the loc. For example, the Dhall.Core.Expr data type uses a Note Src Expr constructor to wrap terms when needed.
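
Spelled out, that approach adds a single wrapper constructor carrying a source span, roughly like this (an illustrative sketch reusing the TermLoc and Var from the earlier snippet; TNote and the other constructors shown here are made up, not the actual Dhall or Swarm definitions):

data Term
  = TVar Var
  | TApp Term Term
  | TNote TermLoc Term  -- wrap any subterm with its source location when needed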

@TristanCacqueray (Collaborator) commented Sep 29, 2021

As another example, Grace.Syntax uses something like data Syntax = Syntax { loc :: TermLoc, term :: Term }, where the recursive Term uses that Syntax type to keep track of nested locs. Grace.Type uses the same technique.
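
In full, the idea is a pair of mutually recursive types, roughly like this (an illustrative sketch; the constructor names are made up and don't necessarily match Grace or Swarm):

-- Every node carries its location via the wrapper...
data Syntax = Syntax { loc :: TermLoc, term :: Term }

-- ...and the term constructors recurse through the wrapper, so every nested
-- subterm automatically has a location attached.
data Term
  = TVar Var
  | TApp Syntax Syntax
  | TLam Var Syntax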

@byorgey (Member, Author) commented Sep 29, 2021

> As another example, Grace.Syntax uses something like data Syntax = Syntax { loc :: TermLoc, term :: Term }, where the recursive Term uses that Syntax type to keep track of nested locs.

Yes, that's a nicer approach than manually adding a TermLoc to every constructor of Term. It's isomorphic to the Fix (Compose ((,) TermLoc) ...) construction above.

@byorgey (Member, Author) commented Feb 16, 2022

I spent a little more time thinking about this and think I understand why it would be an improvement. Suppose we have a definition like this:

def p : int * int = (3, true)

We call check on the body of the definition with the given type signature. But what currently happens is that we immediately switch to calling infer, which tries to infer some type for the body (in this case int * bool), and then generates a constraint that the given type int * int must be equal to the inferred type int * bool. This constraint is then given to the constraint solver, which eventually complains that it can't unify int and bool. But at that point we don't really know where the constraint came from.

But what if, instead of having infer do most of the work, we have check do most of the work? Checking that a pair has a product type would turn into recursive checks that the left and right sides have the specified component types. In this example we would reach a point where we are checking that true has type int, which would raise an error; at this point we know exactly which part of the term we are checking, and we can indicate that in the error message.

This doesn't always work, and there are still other problems (such as the fact that the place where it actually fails might not be the real error at all), but it's still better than what we have now.
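
Concretely, on the example above the checking pass would go something like this (an illustrative trace, not actual compiler output):

check (3, true) (int * int)
  -- a pair checked against a product type: decompose and recurse
  check 3    int   -- fine
  check true int   -- mismatch detected right here, at the subterm true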

byorgey added the Z-User Experience label Apr 30, 2022
byorgey self-assigned this Jun 20, 2022
byorgey removed their assignment Aug 23, 2022
@byorgey (Member, Author) commented Dec 18, 2022

As a proof of concept, I started by modifying the check function to look like this:

-- | @check t ty@ checks that @t@ has type @ty@.
check :: Syntax -> UType -> Infer ()
check s@(Syntax _ t) ty = case t of
  SPair s1 s2 -> do
    (ty1, ty2) <- decomposeProdTy ty
    check s1 ty1
    check s2 ty2
  _ -> do
    ty' <- infer s
    _ <- ty =:= ty'
    return ()

decomposeProdTy :: UType -> Infer (UType, UType)
decomposeProdTy (UTyProd ty1 ty2) = return (ty1, ty2)
decomposeProdTy ty = do
  ty1 <- fresh
  ty2 <- fresh
  ty =:= UTyProd ty1 ty2
  return (ty1, ty2)

I put this on the feature/propagate-types branch.

It doesn't yet produce any better error messages, but I think this proves that we can improve things incrementally, without having to do an entire rewrite of the typechecker.

To actually get better error messages, we have to try as hard as possible to catch errors that we can flag right at checking time, instead of putting them all off to the unifier. For example, once we recursively check that 3 and true both have type int, we reach the call check true int. At that point we need to notice that true must have type bool, and since int is a concrete type that is visibly not equal to bool (as opposed to, say, a unification variable), we can immediately throw an error using the specific SrcLoc info we have, without calling the unifier at all. We'll probably want a variant of =:= which does some concrete checking first and generates a unification constraint only as a last resort.
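
A rough sketch of what such a variant might look like (hypothetical: unifyAt, throwTypeErr, and Mismatch are made-up names, and the pattern synonyms for concrete UTypes may differ from the real ones):

unifyAt :: SrcLoc -> UType -> UType -> Infer ()
unifyAt loc expected actual = case (expected, actual) of
  -- Both heads are concrete and visibly different: fail immediately,
  -- at the location of the term currently being checked.
  (UTyInt, UTyBool) -> throwTypeErr loc (Mismatch expected actual)
  (UTyBool, UTyInt) -> throwTypeErr loc (Mismatch expected actual)
  -- Otherwise (e.g. a unification variable on either side), fall back to
  -- generating an ordinary constraint, exactly as before.
  _ -> do
    _ <- expected =:= actual
    return ()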

byorgey added a commit that referenced this issue Dec 18, 2022
byorgey mentioned this issue Mar 8, 2023
byorgey added a commit that referenced this issue Mar 15, 2023
mergify bot pushed a commit that referenced this issue Mar 25, 2023
Add record types to the language: record values are written like `[x = 3, y = "hi"]` and have types like `[x : int, y : text]`.  Empty and singleton records are allowed.  You can project a field out of a record using standard dot notation, like `r.x`.  If things named e.g. `x` and `y` are in scope, you can also write e.g. `[x, y]` as a shorthand for `[x=x, y=y]`.

Closes #1093.

#153 would make this even nicer to use.

One reason this is significant is that record projection is our first language construct whose type cannot be inferred: if we see something like `r.x`, all we know about the type of `r` is that it is a record type with at least one field `x`, but we don't know how many other fields it might have. Without something complex like row polymorphism we can't deal with that, so we just punt and throw an error saying that we can't infer the type of a projection. To make this usable we have to do a better job checking types, à la #99. For example `def f : [x:int] -> int = \r. r.x end` would not have type checked before, since when checking the lambda we immediately switched into inference mode, and then encountered the record projection and threw up our hands. Now we work harder to push the given function type down into the lambda, so that we are still in checking mode when we get to `r.x`, which makes it work. But it is probably easy to write examples of other things where this doesn't work. Eventually we will want to fully implement #99; in the meantime one can always add a type annotation (#1164) on the record to get around this problem.
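
By analogy with the decomposeProdTy snippet earlier in this issue, the "push the function type into the lambda" step might look roughly like this (a sketch under assumed names: checkLam, decomposeFunTy, withBinding, and UTyFun are illustrative, not necessarily the real definitions from the PR):

-- Checking a lambda against an expected type: split the expected type into
-- argument and result parts, bind the argument, and keep checking the body.
checkLam :: Var -> Syntax -> UType -> Infer ()
checkLam x body expected = do
  (argTy, resTy) <- decomposeFunTy expected
  withBinding x argTy (check body resTy)  -- the body stays in checking mode

decomposeFunTy :: UType -> Infer (UType, UType)
decomposeFunTy (UTyFun ty1 ty2) = return (ty1, ty2)
decomposeFunTy ty = do
  ty1 <- fresh
  ty2 <- fresh
  _ <- ty =:= UTyFun ty1 ty2
  return (ty1, ty2)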

Note, I was planning to add an `open e1 in e2` syntax, which would take a record expression `e1` and "open" it locally in `e2`, so all the fields would be in scope within `e2`.  For example, if we had `r = [x = 3, y = 7]` then instead of writing `r.x + r.y` you could write `open r in x + y`.  This would be especially useful for imports, as in `open import foo.sw in ...`.  However, it turns out to be problematic: the only way to figure out the free variables in `open e1 in e2` is if you know the *type* of `e1`, so you know which names it binds in `e2`.  (In all other cases, bound names can be determined statically from the *syntax*.)  However, in our current codebase there is one place where we get the free variables of an untyped term: we decide at parse time whether definitions are recursive (and fill in a boolean to that effect) by checking whether the name of the thing being defined occurs free in its body.  One idea might be to either fill in this boolean later, after typechecking, or simply compute it on the fly when needed; this is slightly problematic because we need to know whether a definition is recursive when doing capability checking, which is currently independent of typechecking.

I was also planning to add `export` keyword which creates a record with all names currently in scope --- this could be useful for creating modules.  However, I realized that very often you don't really want *all* in-scope names, so it's not that useful to have `export`.  Instead I added record punning so if you have several variables `x`, `y`, `z` in scope that you want to package into a record, you can just write `[x, y, z]` instead of `[x=x, y=y, z=z]`.  Though it could still be rather annoying if you wanted to make a module with tons of useful functions and had to list them all in a record at the end...

Originally I started adding records because I thought it would be a helpful way to organize modules and imports.  However, that would require having records that contain fields with polymorphic types.  I am not yet sure how that would play out.  It would essentially allow encoding arbitrary higher-rank types, so it sounds kind of scary.  In any case, I'm still glad I implemented records and I learned a lot, even if they can't be used for my original motivation.

I can't think of a way to make a scenario that requires the use of records.  Eventually once we have proper #94 we could make a scenario where you have to communicate with another robot and send it a value of some required type.  That would be a cool way to test the use of other language features like lambdas, too.
byorgey self-assigned this May 24, 2023
mergify bot closed this as completed in #1283 May 28, 2023
mergify bot pushed a commit that referenced this issue May 28, 2023
The basic idea is that we try as hard as possible to stay in "checking mode", where we have an "expected type" from the context and "push it down" through a term.  We also try hard to avoid generating fresh unification type variables unless absolutely necessary.  That way, when we come to something that doesn't match the expected type, we can immediately flag a specific error about that part of the code, rather than later having a generic unification error.

This is mostly just refactoring, but it already improves the type error messages somewhat:

* Instead of saying something like "can't unify int and text", it might say something like "expected type int, but got text" with a reference to a specific source code location.
* It's probably still possible to get a "can't unify" type error in some cases, but with this technique alone I don't think we can ever get rid of that entirely.
* Type errors now report a specific line and column number, not just a line number like they used to. 

There is still a lot more I want to do to improve the type error messages but this is a start.

Closes #99.