
Add Type Ascription Syntax to Resolve Issue #10 (#1164)

Merged
9 commits merged into main from create-type-ascription-syntax on Mar 19, 2023

Conversation

Alexander-Block
Collaborator

This pull request implements type ascription syntax and should resolve Issue #10.

@Alexander-Block Alexander-Block requested a review from byorgey March 17, 2023 22:44
Member

@byorgey byorgey left a comment

Thanks! I was just thinking about whether I should implement this myself since it might help with #1148, so I'm really glad someone else did instead! 😄 This is looking very nice; there are just a few things I think need to be addressed before we can merge.

src/Swarm/Language/LSP/Hover.hs (outdated; resolved)
@@ -1001,6 +1009,7 @@ freeVarsS f = go S.empty
SDef r x xty s1 -> rewrap $ SDef r x xty <$> go (S.insert (lvVar x) bound) s1
SBind mx s1 s2 -> rewrap $ SBind mx <$> go bound s1 <*> go (maybe id (S.insert . lvVar) mx bound) s2
SDelay m s1 -> rewrap $ SDelay m <$> go bound s1
SAnnotate s1 _ -> go bound s1
Member

This should be something more like `rewrap $ SAnnotate <$> go bound s1 <*> pure ty` (untested). There's no reason that traversing over free variables should remove `SAnnotate` nodes.
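For concreteness, a minimal sketch of the suggested case (untested; it assumes the annotation is bound as `ty` in the pattern, mirroring the surrounding `freeVarsS` cases):

```haskell
-- Keep the annotation node; only the subterm is traversed for free variables.
SAnnotate s1 ty -> rewrap $ SAnnotate <$> go bound s1 <*> pure ty
```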

Collaborator Author

I completely agree. Thank you for pointing out this mistake. I think that your proposed fix works and will change this line accordingly.

(valid "(3 : int) + 5")
, testCase
"invalid type ascription"
(process "1 : text" "1: Can't unify text and int")
Member

Can we add a few more tests? E.g., let's at least have a test where we annotate something with a polytype. Actually, trying it out just now, this doesn't seem to work:

((\x . x) : a -> a) 3

This ought to return 3, but instead we get the error `Can't unify s0 and int`. I think I see what the issue is; I will add a comment to the typechecker.

Collaborator Author

I have added this test case, but I will add more test cases tomorrow.

Collaborator Author

I have added a few additional test cases. Do you still see a gap in the test coverage?
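For illustration, additional cases in the style of the existing tests might look something like this (hypothetical test names and strings, not necessarily the ones that were added):

```haskell
-- Monomorphic ascription on a lambda.
, testCase
    "type ascription on a lambda"
    (valid "((\\x. x) : int -> int) 3")
-- Polytype ascription, as discussed above; this ought to evaluate to 3.
, testCase
    "polytype ascription on a lambda"
    (valid "((\\x. x) : a -> a) 3")
```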

@@ -452,6 +452,9 @@ infer s@(Syntax l t) = (`catchError` addLocToTypeErr s) $ case t of
c2' <- maybe id ((`withBinding` Forall [] a) . lvVar) mx $ infer c2
_ <- decomposeCmdTy (c2' ^. sType)
return $ Syntax' l (SBind mx c1' c2') (c2' ^. sType)
SAnnotate c pty -> do
uty <- skolemize $ toU pty
check c uty
Member

This is the right idea but I think it has a couple of issues:

  1. The general idea is that the typechecker annotates the AST with types, but otherwise leaves the AST intact. I think even in the annotation case we should leave the annotation there instead of stripping it off (for example, because we might later want to pretty-print the term in an error message and make sure we preserve the way the user wrote it; or, even more relevant, for #7 (ability to save definitions typed at the REPL to a .sw file) we might want to later pretty-print the term to save it in a file, in which case it might not even typecheck any more if we got rid of the annotation). So instead of just `check c uty` this should say something like `c' <- check c uty; return $ Syntax' l (SAnnotate c' pty) ...` (see the sketch after this list).
  2. The second issue is that after skolemizing the type in order to check it, we need to re-generalize so that we don't end up later trying to match skolem variables against other types when the annotated thing is used. In other words, the skolemized type should only be used internally to check that the annotated thing is valid with the given type annotation; it should not be used externally as the inferred type of the thing. Actually it may be as simple as using `toU pty` as the external type, so something like `return $ Syntax' l (SAnnotate c' pty) (toU pty)`. I haven't tested this, however.
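A rough sketch of point 1 (untested, following the shape of the surrounding `infer` cases); the externally visible type in the last line is the part that point 2 says still needs re-generalization, and it is revisited later in this thread:

```haskell
SAnnotate c pty -> do
  -- Check the term against the skolemized annotation...
  uty <- skolemize (toU pty)
  c' <- check c uty
  -- ...but keep the SAnnotate node in the elaborated AST instead of stripping it.
  return $ Syntax' l (SAnnotate c' pty) uty  -- external type still to be refined (point 2)
```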

Collaborator Author

Thank you! Again, I immediately agree with your point 1 that it is not a good idea to erase type signatures at this point. I should have paid closer attention to the `SLam` and `SLet` cases to realize this.

Regarding your point 2, I had to think and experiment a bit to get to a workable solution. Your proposal doesn't quite work, since `toU pty` is of type `UPolytype` where a `UType` is needed. I think the actual problem is that I invoke `skolemize` to replace universally quantified type variables with instances that cannot unify with anything other than themselves; surely we have a problem if they escape to the external world. I think we should rather instantiate them with unification variables.

However, if I implement it this way, I get strange errors about a Skolem variable escaping its scope under certain circumstances. An example of this is the following expression:

((\x.x):(a->a)->(a->a))(\x:a->a.x)

I can evaluate it once and get the type `(a->a) -> a -> a`, while a second evaluation of the same expression raises the error `Skolem variable a would escape its scope`. It seems to me that the reason for this problem lies not in the new type ascription but rather in how type annotations inside a lambda expression are handled. Since the corresponding constructor in the AST is `SLam LocVar (Maybe Type) (Syntax' ty)` rather than something like `SLam LocVar (Maybe Polytype) (Syntax' ty)`, a type annotation in this context is not polymorphic, and free type variables seem to be interpreted as a kind of type constant.

I would like to look into this problem a little longer to evaluate whether it makes sense to solve it in this pull request or to open a separate issue for it.

Alexander-Block and others added 2 commits March 18, 2023 15:25
Fix typo.

Co-authored-by: Brent Yorgey <byorgey@gmail.com>
@Alexander-Block
Collaborator Author

I have noticed that type inference in the presence of type ascriptions still doesn't quite work as expected after my last change. Now it is too permissive, in that for example `3 : a` typechecks. I think I should go through the type inference again step by step for some examples and compare them to the Hindley-Milner rules to figure out the correct solution at this point.

For the moment I am converting this pull request to a draft, since in this state it clearly should not be merged.

@Alexander-Block Alexander-Block marked this pull request as draft March 19, 2023 10:47
@@ -453,8 +453,10 @@ infer s@(Syntax l t) = (`catchError` addLocToTypeErr s) $ case t of
_ <- decomposeCmdTy (c2' ^. sType)
return $ Syntax' l (SBind mx c1' c2') (c2' ^. sType)
SAnnotate c pty -> do
uty <- skolemize $ toU pty
check c uty
uty <- instantiate $ toU pty
Member

Right, this isn't going to work because it allows things like `3 : a`, as you noted. If `a` turns into a unification variable then it will be free to unify with `int`. We really do need `skolemize` here for checking the type of `c`, so that we can check that it really is polymorphic as advertised. But then we need to throw away the type we got from checking `c` and return a version which has been instantiated. So maybe something like this (untested):

sty <- skolemize $ toU pty
uty <- instantiate $ toU pty
c' <- check c sty
return $ Syntax' l (SAnnotate c' pty) uty

An alternative would be to just say that polytypes are not allowed in type annotations. That would make things a lot simpler. I don't have a good sense of how big or annoying a restriction that would be.

Collaborator Author

@Alexander-Block Alexander-Block Mar 19, 2023

Thank you for this explanation. In fact, I have just pushed a commit that is very similar to your proposed solution (which I only saw afterwards). What really helped me understand what is going on here was reading through your rather detailed and (at least in my opinion) well-written blog post on your Hindley-Milner implementation.

I have added a few more test cases to validate the correctness of my last change, so I am confident that it should work now. Essentially, it only differs from your proposal by the presence of a check for escaping Skolem variables. Without it, `\f. (f : forall a. a -> a) 3` would typecheck, which it really shouldn't without higher-rank types. If I understand it correctly, this would be the same problem as in #211.
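For reference, a rough sketch of the approach described here (untested, and not necessarily the exact merged code): check against the skolemized type, reject escaping skolems, and expose an instantiated copy of the annotation as the node's type. `checkNoSkolemEscape` is a hypothetical stand-in for whatever escape check the implementation actually uses:

```haskell
SAnnotate c pty -> do
  let upty = toU pty
  -- Check against the skolemized polytype, so the term must really be
  -- as polymorphic as the annotation claims.
  sty <- skolemize upty
  _ <- check c sty
  -- Reject terms like \f. (f : forall a. a -> a) 3, where a skolem
  -- variable would escape into the surrounding context.
  checkNoSkolemEscape sty
  -- Check again against an instantiated copy, so the elaborated AST
  -- carries ordinary unification variables rather than skolems.
  uty <- instantiate upty
  c' <- check c uty
  return $ Syntax' l (SAnnotate c' pty) uty
```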

Collaborator Author

@Alexander-Block Alexander-Block Mar 19, 2023

I see another difference now which puzzles me a little bit. You only check `c` against `sty` and use the result in the resulting AST node. I did something similar in an intermediate version of my implementation, but wondered whether this is a good idea, since `check` puts the unified type into the resulting AST node. So wouldn't `return $ Syntax' l (SAnnotate c' pty) uty` lead to a possibly problematic mismatch between `uty` and `c' ^. sType`?

Member

Ah, glad you found the blog post useful!

Checking for escaping skolems makes sense, and of course you're right about the potential problems with my version having a mismatch between `uty` and `c' ^. sType`. I think the way you have done it makes sense. In an abstract sense it bothers me somewhat to call `check` twice on `c`, once with a skolemized type and once with an instantiated type, but I don't actually know of a better way to do it!

@Alexander-Block Alexander-Block marked this pull request as ready for review March 19, 2023 20:04
Member

@byorgey byorgey left a comment

Looks good! There's just the one remaining issue of preserving the `SAnnotate` constructor in `erase`, but feel free to just fix that up and then merge.

@@ -960,6 +967,7 @@ erase (SApp s1 s2) = TApp (eraseS s1) (eraseS s2)
erase (SLet r x mty s1 s2) = TLet r (lvVar x) mty (eraseS s1) (eraseS s2)
erase (SDef r x mty s) = TDef r (lvVar x) mty (eraseS s)
erase (SBind mx s1 s2) = TBind (lvVar <$> mx) (eraseS s1) (eraseS s2)
erase (SAnnotate s _) = eraseS s
Member

Oh, I didn't notice this before: we should preserve the `SAnnotate` constructor here too, for the same reasons as elsewhere. Erasing types just means erasing the inferred types the typechecker filled in, not erasing an explicit type annotation the user wrote.
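A one-line sketch of the corrected equation (untested; it assumes the corresponding constructor on the untyped `Term` side is called `TAnnotate`):

```haskell
erase (SAnnotate s ty) = TAnnotate (eraseS s) ty
```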

@byorgey byorgey linked an issue Mar 19, 2023 that may be closed by this pull request
@Alexander-Block Alexander-Block added the merge me label (Trigger the merge process of the Pull request) Mar 19, 2023
@mergify mergify bot merged commit 8d6a2b8 into main Mar 19, 2023
@mergify mergify bot deleted the create-type-ascription-syntax branch March 19, 2023 22:05
@byorgey byorgey mentioned this pull request Mar 21, 2023
mergify bot pushed a commit that referenced this pull request Mar 25, 2023
Add record types to the language: record values are written like `[x = 3, y = "hi"]` and have types like `[x : int, y : text]`.  Empty and singleton records are allowed.  You can project a field out of a record using standard dot notation, like `r.x`.  If things named e.g. `x` and `y` are in scope, you can also write e.g. `[x, y]` as a shorthand for `[x=x, y=y]`.

Closes #1093 .

#153 would make this even nicer to use.

One reason this is significant is that record projection is our first language construct whose type cannot be inferred, because if we see something like `r.x` all we know about the type of `r` is that it is a record type with at least one field `x`, but we don't know how many other fields it might have.  Without some complex stuff like row polymorphism we can't deal with that, so we just punt and throw an error saying that we can't infer the type of a projection.  To make this usable we have to do a better job checking types, a la #99 . For example `def f : [x:int] -> int = \r. r.x end` would not have type checked before, since when checking the lambda we immediately switched into inference mode, and then encountered the record projection and threw up our hands.  Now we work harder to push the given function type down into the lambda so that we are still in checking mode when we get to `r.x` which makes it work.  But it is probably easy to write examples of other things where this doesn't work.  Eventually we will want to fully implement #99 ; in the meantime one can always add a type annotation (#1164) on the record to get around this problem.

Note, I was planning to add an `open e1 in e2` syntax, which would take a record expression `e1` and "open" it locally in `e2`, so all the fields would be in scope within `e2`.  For example, if we had `r = [x = 3, y = 7]` then instead of writing `r.x + r.y` you could write `open r in x + y`.  This would be especially useful for imports, as in `open import foo.sw in ...`.  However, it turns out to be problematic: the only way to figure out the free variables in `open e1 in e2` is if you know the *type* of `e1`, so you know which names it binds in `e2`.  (In all other cases, bound names can be determined statically from the *syntax*.)  However, in our current codebase there is one place where we get the free variables of an untyped term: we decide at parse time whether definitions are recursive (and fill in a boolean to that effect) by checking whether the name of the thing being defined occurs free in its body.  One idea might be to either fill in this boolean later, after typechecking, or simply compute it on the fly when it is needed; currently this is slightly problematic because we need the info about whether a definition is recursive when doing capability checking, which is currently independent of typechecking.

I was also planning to add an `export` keyword which creates a record with all names currently in scope --- this could be useful for creating modules.  However, I realized that very often you don't really want *all* in-scope names, so it's not that useful to have `export`.  Instead I added record punning, so if you have several variables `x`, `y`, `z` in scope that you want to package into a record, you can just write `[x, y, z]` instead of `[x=x, y=y, z=z]`.  Though it could still be rather annoying if you wanted to make a module with tons of useful functions and had to list them all in a record at the end...

Originally I started adding records because I thought it would be a helpful way to organize modules and imports.  However, that would require having records that contain fields with polymorphic types.  I am not yet sure how that would play out.  It would essentially allow encoding arbitrary higher-rank types, so it sounds kind of scary.  In any case, I'm still glad I implemented records and I learned a lot, even if they can't be used for my original motivation.

I can't think of a way to make a scenario that requires the use of records.  Eventually once we have proper #94 we could make a scenario where you have to communicate with another robot and send it a value of some required type.  That would be a cool way to test the use of other language features like lambdas, too.