-
Notifications
You must be signed in to change notification settings - Fork 53
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Type Ascription Syntax to Resolve Issue #10 #1164
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I was just thinking about whether I should implement this myself since it might help with #1148 , so I'm really glad someone else did instead! 😄 This is looking very nice, there are just a few things I think need to be addressed before we can merge.
src/Swarm/Language/Syntax.hs
Outdated
@@ -1001,6 +1009,7 @@ freeVarsS f = go S.empty | |||
SDef r x xty s1 -> rewrap $ SDef r x xty <$> go (S.insert (lvVar x) bound) s1 | |||
SBind mx s1 s2 -> rewrap $ SBind mx <$> go bound s1 <*> go (maybe id (S.insert . lvVar) mx bound) s2 | |||
SDelay m s1 -> rewrap $ SDelay m <$> go bound s1 | |||
SAnnotate s1 _ -> go bound s1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be something more like rewrap $ SAnnotate <$> go bound s1 <*> pure ty
(untested). There's no reason that traversing over free variables should remove SAnnotate
nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I completely agree. Thank you for pointing out this mistake. I think that your proposed fix works and will change this line accordingly.
(valid "(3 : int) + 5") | ||
, testCase | ||
"invalid type ascription" | ||
(process "1 : text" "1: Can't unify text and int") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a few more tests? e.g. let's at least have a test where we annotate something with a polytype. Actually, trying it out just now, this doesn't seem to work:
((\x . x) : a -> a) 3
This ought to return 3
, but instead we get an error Can't unify s0 and int
. I think I see what the issue is, I will add a comment to the typechecker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added this test case, but I will add more test cases tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added a few additional test cases. Do you still see a gap in the test coverage?
src/Swarm/Language/Typecheck.hs
Outdated
@@ -452,6 +452,9 @@ infer s@(Syntax l t) = (`catchError` addLocToTypeErr s) $ case t of | |||
c2' <- maybe id ((`withBinding` Forall [] a) . lvVar) mx $ infer c2 | |||
_ <- decomposeCmdTy (c2' ^. sType) | |||
return $ Syntax' l (SBind mx c1' c2') (c2' ^. sType) | |||
SAnnotate c pty -> do | |||
uty <- skolemize $ toU pty | |||
check c uty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the right idea but I think it has a couple issues:
- The general idea is that the typechecker annotates the AST with types, but otherwise leaves the AST intact. I think even in the annotation case we should leave the annotation there instead of stripping it off (for example, because we might later want to pretty-print the term in an error message and make sure we preserve the way the user wrote it; or even more relevant, for Ability to save definitions typed at the REPL to a
.sw
file #7 we might want to later pretty-print the term to save it in a file, in which case it might not even typecheck any more if we got rid of the annotation). So instead of justcheck c uty
this should say something likec' <- check c uty; return $ Syntax' l (SAnnotate c' pty) ...
- The second issue is that after skolemizing the type in order to check it, we need to re-generalize so that we don't end up later trying to match skolem variables against other types when the annotated thing is used. In other words, the skolemized type should only be used internally to check that the annotated thing is valid with the given type annotation; it should not be used externally as the inferred type of the thing. Actually it may be as simple as using
toU pty
as the external type. So something likereturn $ Syntax' l (SAnnotate c' pty) (toU pty)
. I haven't tested this, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Again I immediately agree with your point 1 that it is not a good idea to erase type signatures at this point. I should have paid closer attention to the SLam
and SLet
cases to realize this.
Regarding your point 2 I had to think and experiment a bit to get to a workable solution. Your proposal doesn't quite work, since toU pty
is of type UPolytype
where UType
is needed. I think the actual problem is that I invoke skolemize
to replace universally quantified type variables with instances that cannot unify with anything other than themselves; surely we have a problem if they escape to the external world. In think we should rather instantiate
them with unification variables.
However, if I implement it this way, I get strange errors about a Skolem variable escaping its scope under certain circumstances. An example for this is the following expression:
((\x.x):(a->a)->(a->a))(\x:a->a.x)
I can evaluate it once and get the type (a->a)->a->a
, while a second evaluation of this expression raises the error Skolem variable a would escape its scope
. It seems to me that the reason for this problem lies not in the new type ascription but rather in how type annotations inside a lambda expression are handled. As the corresponding constructor in the AST is SLam LocVar (Maybe Type) (Syntax' ty)
and not something like SLam LocVar (Maybe Polytype) (Syntax' ty)
a type annotation in this context is not polymorphic and free type variables seem to be interpreted as a kind of type constant.
I would like to look into this problem a little longer to evaluate whether it makes sense to solve it in this merge request or enter a separate issue for it.
Fix typo. Co-authored-by: Brent Yorgey <byorgey@gmail.com>
I have noticed that type inference in the presence of type acriptions still doesn't quite work as expected after my last change. Now it is too permissive in that for example For the moment I convert this pull request to a draft, since in this state it clearly should not be merged. |
src/Swarm/Language/Typecheck.hs
Outdated
@@ -453,8 +453,10 @@ infer s@(Syntax l t) = (`catchError` addLocToTypeErr s) $ case t of | |||
_ <- decomposeCmdTy (c2' ^. sType) | |||
return $ Syntax' l (SBind mx c1' c2') (c2' ^. sType) | |||
SAnnotate c pty -> do | |||
uty <- skolemize $ toU pty | |||
check c uty | |||
uty <- instantiate $ toU pty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this isn't going to work because it allows things like 3 : a
, as you noted. If a
turns into a unification variable then it will be free to unify with int
. We really do need skolemize
here for checking the type of c
, so that we can check that it really is polymorphic as advertised. But then we need to throw away the type we got from checking c
and return a version which has been instantiate
d. So maybe something like this (untested):
sty <- skolemize $ toU pty
uty <- instantiate $ toU pty
c' <- check c sty
return $ Syntax' l (SAnnotate c' pty) uty
An alternative would be to just say that polytypes are not allowed in type annotations. That would make things a lot simpler. I don't have a good sense of how big/annoying of a restriction that would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this explanation. In fact I have just pushed a commit that is very similar to your proposed solution (which I have just seen afterwards). What really helped me understand what is going on here is that I read through your rather detailed and (at least in my opinion) well-written blog post on your Hendley-Milner implementation.
I have added a few more test cases to validate the correctness of my last change. So I am confident that it should work now. Essentially it only differs from your proposal by the presence of a check for escaping Skolem variables. Without it, \f. (f:forall a. a->a) 3
would typecheck, which it really shouldn't without higher ranked types. If I understand it correctly, this would be the same problem as in #211.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see another difference now which puzzles me a little bit. You only check c
against sty
and use the result in the resulting AST node. I did something similar in an intermediate version of my implementation, but wondered whether this is a good idea, since check
puts the unified type into the resulting AST node. So wouldn't return $ Syntax' l (SAnnotate c' pty) uty
lead to a possibly problematic missmatch between uty
and c' ^. sType
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, glad you found the blog post useful!
Checking for escaping skolems makes sense, and of course you're right about the potential problems with my version having a mismatch between uty
and c' ^. sType
. I think the way you have done it makes sense. In an abstract sense it bothers me somewhat to call check
twice on c
, once with a skolemized type and once with an instantiated type---but I don't actually know of a better way to do it!
additional test cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! There's just the one remaining issue of preserving the SAnnotate
constructor in erase
, but feel free to just fix that up and then merge.
src/Swarm/Language/Syntax.hs
Outdated
@@ -960,6 +967,7 @@ erase (SApp s1 s2) = TApp (eraseS s1) (eraseS s2) | |||
erase (SLet r x mty s1 s2) = TLet r (lvVar x) mty (eraseS s1) (eraseS s2) | |||
erase (SDef r x mty s) = TDef r (lvVar x) mty (eraseS s) | |||
erase (SBind mx s1 s2) = TBind (lvVar <$> mx) (eraseS s1) (eraseS s2) | |||
erase (SAnnotate s _) = eraseS s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I didn't notice this before, we should preserve the SAnnotate
constructor here too, for the same reasons as elsewhere. Erasing types just means erasing the inferred types the typechecker filled in, not erasing an explicit type annotation the user wrote.
src/Swarm/Language/Typecheck.hs
Outdated
@@ -453,8 +453,10 @@ infer s@(Syntax l t) = (`catchError` addLocToTypeErr s) $ case t of | |||
_ <- decomposeCmdTy (c2' ^. sType) | |||
return $ Syntax' l (SBind mx c1' c2') (c2' ^. sType) | |||
SAnnotate c pty -> do | |||
uty <- skolemize $ toU pty | |||
check c uty | |||
uty <- instantiate $ toU pty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, glad you found the blog post useful!
Checking for escaping skolems makes sense, and of course you're right about the potential problems with my version having a mismatch between uty
and c' ^. sType
. I think the way you have done it makes sense. In an abstract sense it bothers me somewhat to call check
twice on c
, once with a skolemized type and once with an instantiated type---but I don't actually know of a better way to do it!
Add record types to the language: record values are written like `[x = 3, y = "hi"]` and have types like `[x : int, y : text]`. Empty and singleton records are allowed. You can project a field out of a record using standard dot notation, like `r.x`. If things named e.g. `x` and `y` are in scope, you can also write e.g. `[x, y]` as a shorthand for `[x=x, y=y]`. Closes #1093 . #153 would make this even nicer to use. One reason this is significant is that record projection is our first language construct whose type cannot be inferred, because if we see something like `r.x` all we know about the type of `r` is that it is a record type with at least one field `x`, but we don't know how many other fields it might have. Without some complex stuff like row polymorphism we can't deal with that, so we just punt and throw an error saying that we can't infer the type of a projection. To make this usable we have to do a better job checking types, a la #99 . For example `def f : [x:int] -> int = \r. r.x end` would not have type checked before, since when checking the lambda we immediately switched into inference mode, and then encountered the record projection and threw up our hands. Now we work harder to push the given function type down into the lambda so that we are still in checking mode when we get to `r.x` which makes it work. But it is probably easy to write examples of other things where this doesn't work. Eventually we will want to fully implement #99 ; in the meantime one can always add a type annotation (#1164) on the record to get around this problem. Note, I was planning to add a `open e1 in e2` syntax, which would take a record expression `e1` and "open" it locally in `e2`, so all the fields would be in scope within `e2`. For example, if we had `r = [x = 3, y = 7]` then instead of writing `r.x + r.y` you could write `open r in x + y`. This would be especially useful for imports, as in `open import foo.sw in ...`. However, it turns out to be problematic: the only way to figure out the free variables in `open e1 in e2` is if you know the *type* of `e1`, so you know which names it binds in `e2`. (In all other cases, bound names can be determined statically from the *syntax*.) However, in our current codebase there is one place where we get the free variables of an untyped term: we decide at parse time whether definitions are recursive (and fill in a boolean to that effect) by checking whether the name of the thing being defined occurs free in its body. One idea might be to either fill in this boolean later, after typechecking, or simply compute it on the fly when it is needed; currently this is slightly problematic because we need the info about whether a definition is recursive when doing capability checking, which is currently independent of typechecking. I was also planning to add `export` keyword which creates a record with all names currently in scope --- this could be useful for creating modules. However, I realized that very often you don't really want *all* in-scope names, so it's not that useful to have `export`. Instead I added record punning so if you have several variables `x`, `y`, `z` in scope that you want to package into a record, you can just write `[x, y, z]` instead of `[x=x, y=y, z=z]`. Though it could still be rather annoying if you wanted to make a module with tons of useful functions and had to list them all in a record at the end... Originally I started adding records because I thought it would be a helpful way to organize modules and imports. However, that would require having records that contain fields with polymorphic types. I am not yet sure how that would play out. It would essentially allow encoding arbitrary higher-rank types, so it sounds kind of scary. In any case, I'm still glad I implemented records and I learned a lot, even if they can't be used for my original motivation. I can't think of a way to make a scenario that requires the use of records. Eventually once we have proper #94 we could make a scenario where you have to communicate with another robot and send it a value of some required type. That would be a cool way to test the use of other language features like lambdas, too.
This pull request implements type ascription syntax and should resolve Issue #10.