
Add Type Ascription Syntax to Resolve Issue #10 (#1164)

Merged
9 commits merged into main from create-type-ascription-syntax on Mar 19, 2023

Conversation

Alexander-Block
Collaborator

This pull request implements type ascription syntax and should resolve Issue #10.

@Alexander-Block Alexander-Block requested a review from byorgey March 17, 2023 22:44
Member

@byorgey byorgey left a comment

Thanks! I was just thinking about whether I should implement this myself since it might help with #1148, so I'm really glad someone else did instead! 😄 This is looking very nice; there are just a few things I think need to be addressed before we can merge.

src/Swarm/Language/LSP/Hover.hs (outdated; resolved)
@@ -1001,6 +1009,7 @@ freeVarsS f = go S.empty
SDef r x xty s1 -> rewrap $ SDef r x xty <$> go (S.insert (lvVar x) bound) s1
SBind mx s1 s2 -> rewrap $ SBind mx <$> go bound s1 <*> go (maybe id (S.insert . lvVar) mx bound) s2
SDelay m s1 -> rewrap $ SDelay m <$> go bound s1
SAnnotate s1 _ -> go bound s1
Member

This should be something more like `rewrap $ SAnnotate <$> go bound s1 <*> pure ty` (untested). There's no reason that traversing over free variables should remove `SAnnotate` nodes.
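For concreteness, a minimal sketch of the suggested case (untested; it assumes the annotation is bound as `ty` in the pattern, mirroring the surrounding `freeVarsS` cases):

```haskell
-- Keep the annotation node; only the subterm is traversed for free variables.
SAnnotate s1 ty -> rewrap $ SAnnotate <$> go bound s1 <*> pure ty
```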

Collaborator Author

I completely agree. Thank you for pointing out this mistake. I think that your proposed fix works and will change this line accordingly.

(valid "(3 : int) + 5")
, testCase
"invalid type ascription"
(process "1 : text" "1: Can't unify text and int")
Member

Can we add a few more tests? E.g., let's at least have a test where we annotate something with a polytype. Actually, trying it out just now, this doesn't seem to work:

((\x . x) : a -> a) 3

This ought to return 3, but instead we get the error `Can't unify s0 and int`. I think I see what the issue is; I will add a comment to the typechecker.

Collaborator Author

I have added this test case, but I will add more test cases tomorrow.

Collaborator Author

I have added a few additional test cases. Do you still see a gap in the test coverage?
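For illustration, additional cases in the style of the existing tests might look something like this (hypothetical test names and strings, not necessarily the ones that were added):

```haskell
-- Monomorphic ascription on a lambda.
, testCase
    "type ascription on a lambda"
    (valid "((\\x. x) : int -> int) 3")
-- Polytype ascription, as discussed above; this ought to evaluate to 3.
, testCase
    "polytype ascription on a lambda"
    (valid "((\\x. x) : a -> a) 3")
```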

@@ -452,6 +452,9 @@ infer s@(Syntax l t) = (`catchError` addLocToTypeErr s) $ case t of
c2' <- maybe id ((`withBinding` Forall [] a) . lvVar) mx $ infer c2
_ <- decomposeCmdTy (c2' ^. sType)
return $ Syntax' l (SBind mx c1' c2') (c2' ^. sType)
SAnnotate c pty -> do
uty <- skolemize $ toU pty
check c uty
Member

This is the right idea but I think it has a couple of issues:

  1. The general idea is that the typechecker annotates the AST with types, but otherwise leaves the AST intact. I think even in the annotation case we should leave the annotation there instead of stripping it off (for example, because we might later want to pretty-print the term in an error message and make sure we preserve the way the user wrote it; or, even more relevant, for #7 (ability to save definitions typed at the REPL to a .sw file) we might want to later pretty-print the term to save it in a file, in which case it might not even typecheck any more if we got rid of the annotation). So instead of just `check c uty` this should say something like `c' <- check c uty; return $ Syntax' l (SAnnotate c' pty) ...` (see the sketch after this list).
  2. The second issue is that after skolemizing the type in order to check it, we need to re-generalize so that we don't end up later trying to match skolem variables against other types when the annotated thing is used. In other words, the skolemized type should only be used internally to check that the annotated thing is valid with the given type annotation; it should not be used externally as the inferred type of the thing. Actually it may be as simple as using `toU pty` as the external type, so something like `return $ Syntax' l (SAnnotate c' pty) (toU pty)`. I haven't tested this, however.
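A rough sketch of point 1 (untested, following the shape of the surrounding `infer` cases); the externally visible type in the last line is the part that point 2 says still needs re-generalization, and it is revisited later in this thread:

```haskell
SAnnotate c pty -> do
  -- Check the term against the skolemized annotation...
  uty <- skolemize (toU pty)
  c' <- check c uty
  -- ...but keep the SAnnotate node in the elaborated AST instead of stripping it.
  return $ Syntax' l (SAnnotate c' pty) uty  -- external type still to be refined (point 2)
```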

Collaborator Author

Thank you! Again, I immediately agree with your point 1 that it is not a good idea to erase type signatures at this point. I should have paid closer attention to the `SLam` and `SLet` cases to realize this.

Regarding your point 2, I had to think and experiment a bit to get to a workable solution. Your proposal doesn't quite work, since `toU pty` is of type `UPolytype` where a `UType` is needed. I think the actual problem is that I invoke `skolemize` to replace universally quantified type variables with instances that cannot unify with anything other than themselves; surely we have a problem if they escape to the external world. I think we should rather instantiate them with unification variables.

However, if I implement it this way, I get strange errors about a Skolem variable escaping its scope under certain circumstances. An example of this is the following expression:

((\x.x):(a->a)->(a->a))(\x:a->a.x)

I can evaluate it once and get the type `(a->a) -> a -> a`, while a second evaluation of the same expression raises the error `Skolem variable a would escape its scope`. It seems to me that the reason for this problem lies not in the new type ascription but rather in how type annotations inside a lambda expression are handled. Since the corresponding constructor in the AST is `SLam LocVar (Maybe Type) (Syntax' ty)` rather than something like `SLam LocVar (Maybe Polytype) (Syntax' ty)`, a type annotation in this context is not polymorphic, and free type variables seem to be interpreted as a kind of type constant.

I would like to look into this problem a little longer to evaluate whether it makes sense to solve it in this pull request or to open a separate issue for it.

Alexander-Block and others added 2 commits March 18, 2023 15:25
Fix typo.

Co-authored-by: Brent Yorgey <byorgey@gmail.com>
@Alexander-Block
Collaborator Author

I have noticed that type inference in the presence of type ascriptions still doesn't quite work as expected after my last change. Now it is too permissive, in that for example `3 : a` typechecks. I think I should go through the type inference again step by step for some examples and compare them to the Hindley-Milner rules to figure out the correct solution at this point.

For the moment I am converting this pull request to a draft, since in this state it clearly should not be merged.

@Alexander-Block Alexander-Block marked this pull request as draft March 19, 2023 10:47
@@ -453,8 +453,10 @@ infer s@(Syntax l t) = (`catchError` addLocToTypeErr s) $ case t of
_ <- decomposeCmdTy (c2' ^. sType)
return $ Syntax' l (SBind mx c1' c2') (c2' ^. sType)
SAnnotate c pty -> do
uty <- skolemize $ toU pty
check c uty
uty <- instantiate $ toU pty
Member

Right, this isn't going to work because it allows things like `3 : a`, as you noted. If `a` turns into a unification variable then it will be free to unify with `int`. We really do need `skolemize` here for checking the type of `c`, so that we can check that it really is polymorphic as advertised. But then we need to throw away the type we got from checking `c` and return a version which has been instantiated. So maybe something like this (untested):

sty <- skolemize $ toU pty
uty <- instantiate $ toU pty
c' <- check c sty
return $ Syntax' l (SAnnotate c' pty) uty

An alternative would be to just say that polytypes are not allowed in type annotations. That would make things a lot simpler. I don't have a good sense of how big or annoying a restriction that would be.

Collaborator Author

@Alexander-Block Alexander-Block Mar 19, 2023

Thank you for this explanation. In fact, I have just pushed a commit that is very similar to your proposed solution (which I only saw afterwards). What really helped me understand what is going on here was reading through your rather detailed and (at least in my opinion) well-written blog post on your Hindley-Milner implementation.

I have added a few more test cases to validate the correctness of my last change, so I am confident that it should work now. Essentially, it only differs from your proposal by the presence of a check for escaping Skolem variables. Without it, `\f. (f : forall a. a -> a) 3` would typecheck, which it really shouldn't without higher-rank types. If I understand it correctly, this would be the same problem as in #211.
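For reference, a rough sketch of the approach described here (untested, and not necessarily the exact merged code): check against the skolemized type, reject escaping skolems, and expose an instantiated copy of the annotation as the node's type. `checkNoSkolemEscape` is a hypothetical stand-in for whatever escape check the implementation actually uses:

```haskell
SAnnotate c pty -> do
  let upty = toU pty
  -- Check against the skolemized polytype, so the term must really be
  -- as polymorphic as the annotation claims.
  sty <- skolemize upty
  _ <- check c sty
  -- Reject terms like \f. (f : forall a. a -> a) 3, where a skolem
  -- variable would escape into the surrounding context.
  checkNoSkolemEscape sty
  -- Check again against an instantiated copy, so the elaborated AST
  -- carries ordinary unification variables rather than skolems.
  uty <- instantiate upty
  c' <- check c uty
  return $ Syntax' l (SAnnotate c' pty) uty
```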

Collaborator Author

@Alexander-Block Alexander-Block Mar 19, 2023

I see another difference now which puzzles me a little bit. You only check `c` against `sty` and use the result in the resulting AST node. I did something similar in an intermediate version of my implementation, but wondered whether this is a good idea, since `check` puts the unified type into the resulting AST node. So wouldn't `return $ Syntax' l (SAnnotate c' pty) uty` lead to a possibly problematic mismatch between `uty` and `c' ^. sType`?

Member

Ah, glad you found the blog post useful!

Checking for escaping skolems makes sense, and of course you're right about the potential problems with my version having a mismatch between `uty` and `c' ^. sType`. I think the way you have done it makes sense. In an abstract sense it bothers me somewhat to call `check` twice on `c`, once with a skolemized type and once with an instantiated type, but I don't actually know of a better way to do it!

@Alexander-Block Alexander-Block marked this pull request as ready for review March 19, 2023 20:04
Member

@byorgey byorgey left a comment

Looks good! There's just the one remaining issue of preserving the `SAnnotate` constructor in `erase`, but feel free to just fix that up and then merge.

@@ -960,6 +967,7 @@ erase (SApp s1 s2) = TApp (eraseS s1) (eraseS s2)
erase (SLet r x mty s1 s2) = TLet r (lvVar x) mty (eraseS s1) (eraseS s2)
erase (SDef r x mty s) = TDef r (lvVar x) mty (eraseS s)
erase (SBind mx s1 s2) = TBind (lvVar <$> mx) (eraseS s1) (eraseS s2)
erase (SAnnotate s _) = eraseS s
Member

Oh, I didn't notice this before: we should preserve the `SAnnotate` constructor here too, for the same reasons as elsewhere. Erasing types just means erasing the inferred types the typechecker filled in, not erasing an explicit type annotation the user wrote.
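A one-line sketch of the corrected equation (untested; it assumes the corresponding constructor on the untyped `Term` side is called `TAnnotate`):

```haskell
erase (SAnnotate s ty) = TAnnotate (eraseS s) ty
```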

@byorgey byorgey linked an issue Mar 19, 2023 that may be closed by this pull request
@Alexander-Block Alexander-Block added the merge me label (Trigger the merge process of the Pull request) Mar 19, 2023
@mergify mergify bot merged commit 8d6a2b8 into main Mar 19, 2023
@mergify mergify bot deleted the create-type-ascription-syntax branch March 19, 2023 22:05
@byorgey byorgey mentioned this pull request Mar 21, 2023
mergify bot pushed a commit that referenced this pull request Mar 25, 2023
Add record types to the language: record values are written like `[x = 3, y = "hi"]` and have types like `[x : int, y : text]`.  Empty and singleton records are allowed.  You can project a field out of a record using standard dot notation, like `r.x`.  If things named e.g. `x` and `y` are in scope, you can also write e.g. `[x, y]` as a shorthand for `[x=x, y=y]`.

Closes #1093 .

#153 would make this even nicer to use.

One reason this is significant is that record projection is our first language construct whose type cannot be inferred, because if we see something like `r.x` all we know about the type of `r` is that it is a record type with at least one field `x`, but we don't know how many other fields it might have.  Without some complex stuff like row polymorphism we can't deal with that, so we just punt and throw an error saying that we can't infer the type of a projection.  To make this usable we have to do a better job checking types, a la #99 . For example `def f : [x:int] -> int = \r. r.x end` would not have type checked before, since when checking the lambda we immediately switched into inference mode, and then encountered the record projection and threw up our hands.  Now we work harder to push the given function type down into the lambda so that we are still in checking mode when we get to `r.x` which makes it work.  But it is probably easy to write examples of other things where this doesn't work.  Eventually we will want to fully implement #99 ; in the meantime one can always add a type annotation (#1164) on the record to get around this problem.

Note, I was planning to add an `open e1 in e2` syntax, which would take a record expression `e1` and "open" it locally in `e2`, so all the fields would be in scope within `e2`.  For example, if we had `r = [x = 3, y = 7]` then instead of writing `r.x + r.y` you could write `open r in x + y`.  This would be especially useful for imports, as in `open import foo.sw in ...`.  However, it turns out to be problematic: the only way to figure out the free variables in `open e1 in e2` is if you know the *type* of `e1`, so you know which names it binds in `e2`.  (In all other cases, bound names can be determined statically from the *syntax*.)  However, in our current codebase there is one place where we get the free variables of an untyped term: we decide at parse time whether definitions are recursive (and fill in a boolean to that effect) by checking whether the name of the thing being defined occurs free in its body.  One idea might be to either fill in this boolean later, after typechecking, or simply compute it on the fly when it is needed; currently this is slightly problematic because we need the info about whether a definition is recursive when doing capability checking, which is currently independent of typechecking.

I was also planning to add an `export` keyword which creates a record with all names currently in scope --- this could be useful for creating modules.  However, I realized that very often you don't really want *all* in-scope names, so it's not that useful to have `export`.  Instead I added record punning, so if you have several variables `x`, `y`, `z` in scope that you want to package into a record, you can just write `[x, y, z]` instead of `[x=x, y=y, z=z]`.  Though it could still be rather annoying if you wanted to make a module with tons of useful functions and had to list them all in a record at the end...

Originally I started adding records because I thought it would be a helpful way to organize modules and imports.  However, that would require having records that contain fields with polymorphic types.  I am not yet sure how that would play out.  It would essentially allow encoding arbitrary higher-rank types, so it sounds kind of scary.  In any case, I'm still glad I implemented records and I learned a lot, even if they can't be used for my original motivation.

I can't think of a way to make a scenario that requires the use of records.  Eventually once we have proper #94 we could make a scenario where you have to communicate with another robot and send it a value of some required type.  That would be a cool way to test the use of other language features like lambdas, too.