Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

go/types, types2: missing types for unions #50093

Closed
findleyr opened this issue Dec 10, 2021 · 27 comments
Closed

go/types, types2: missing types for unions #50093

findleyr opened this issue Dec 10, 2021 · 27 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Milestone

Comments

@findleyr
Copy link
Member

findleyr commented Dec 10, 2021

Pointed out in slack by @dominikh, we're not recording types for union (T1 | T2) or approximation (~T) elements in interfaces, or inlined in constraints. This is inconsistent and may be the first time we don't record a type for a binary or unary expression.

We should consider changing this behavior to record the relevant union type. It is not clear what to do in the case of A | B | C: this expression consists of two binary expressions, but we only build one union in the type checker.

@findleyr
Copy link
Member Author

CC @griesemer

@findleyr findleyr added okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 release-blocker labels Dec 10, 2021
@findleyr findleyr added this to the Go1.18 milestone Dec 10, 2021
@findleyr findleyr added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Dec 10, 2021
@griesemer griesemer self-assigned this Dec 11, 2021
@griesemer
Copy link
Contributor

I don't believe this is a release blocker. While the new notations look like expressions, the syntax itself doesn't use expression syntax but is specific to interface types. For instance, the a | b | c is more akin to a list than an expression (a | b) | c (never mind that we represent it as binary expressions in the AST - that's an implementation detail). Similarly, while ~a looks like a unary operator, the ~ is simply a marker. Note that ~(a | b) is not permitted. All the relevant information needed about the interface is still exposed, I believe.

In summary it's not clear to me that we need to do anything here but perhaps update the API documentation. (That is not to say that we don't need to do more; I am just not convinced yet.)

@dominikh Is there anything important missing in the information provided through the API but for what might look like an inconsistency?

@dominikh
Copy link
Member

dominikh commented Dec 11, 2021

Personally I'm less concerned with missing information, and more with a change in behavior. I have existing code that looks like TypeOf(binop).Underlying(), which is valid for all existing Go code. However, with type lists, that can now panic, because TypeOf can now return nil for some binary expressions whereas before it never could.

This isn't technically a backwards incompatible change; types.Info.Types and types.Info.TypeOf both document that nils are possible, but I reckon I'm not the only user of go/types that relied on the type always being non-nil for binary expressions.

That is, the problem is exactly the implementation detail that type lists are expressed as binary expressions in the AST. Walking the AST will encounter these expressions, and existing code, written before type parameters, will assume that they're ordinary expressions.

@griesemer
Copy link
Contributor

Understood but unions are not binary expressions - they just happened to be represented as binary expressions - and the only reason for that is to minimize the changes required to the AST. It would have been less work for the implementation (parser and type checker) if a union were a new AST node, but we wanted to avoid that. Again note that something like (a | b) | (c | d) is not permitted syntactically.

To be more concrete: The union int | string | bool is represented as (int | string) | bool. There is no type nor underlying type associated with this - this notation specifies a type set. Yes, we could (would) use a union for this, but that may break all kinds of other assumptions. We've tried to make sure that unions only appear as parts of interfaces and don't "escape" otherwise. If they were escaping, I suspect you'd have to deal with unions in many places where you currently don't need to know anything about them.

It seems to me that it's going to be less work to recognize that you're walking the internals of an interface and that binary expressions need to be handled differently. That is, rather than generalizing all code, it seems simpler to specialize this one case. This would be what you'd do if there was a new union node, after all.

@dominikh
Copy link
Member

It seems to me that it's going to be less work to recognize that you're walking the internals of an interface and that binary expressions need to be handled differently. That is, rather than generalizing all code, it seems simpler to specialize this one case. This would be what you'd do if there was a new union node, after all.

I disagree. I think that handling a new node type requires less work than adding special handling of an existing node type.

  • If unions were a new node type, then a lot of existing uses of go/ast would continue to work as-is. For example, most analyzes don't care about all kinds of nodes, but only a subset. An analysis that flags x + 0 only cares about binary expressions, for example.
  • Code that cares about all node types is usually set up to fail in an easily recognizable way when encountering a new node type, for example by using an exhaustive type switch (one that panics in its default branch). In contrast, code that doesn't expect BinaryExprs to represent unions will panic in unexpected ways.
  • Handling the context of a node is cumbersome in go/ast. You either have to maintain your own stack, or use the helpers in astutil. Existing code that never had to consider the context is doing neither, and is just looking at a node at a time using ast.Inspect. In contrast, handling a new node type is usually just another case in a type switch.
  • Unions can show up in at least two places, because syntactically the interface can be implicit. This makes it more intricate to handle unions. It may also require further changes in the future should unions be allowed in more places.
  • My guess is that a lot more code will want to look at actual binary expressions than at unions. And virtually all code looking at binary expressions will have to check the context.

It's quite possible that I'm overlooking all of the complications that a new node type would cause, but in the context of Staticcheck, I cannot think of any case where I would prefer the reuse of BinaryExpr over a new node type.

@griesemer
Copy link
Contributor

griesemer commented Dec 11, 2021

Ok, so this issue has morphed somewhat at this point.

Let's say we had such a union node, would you still need a type for it, and what should it be? Second, how do you envision the union node looks like?

It's rather late in the (release) game but I suspect a union type could simplify our code as well, so maybe it's worth trying that change and see how bad it's impact would be overall. I believe on the type checker side the impact would be small and it won't escape elsewhere. We could leave types2 alone to reduce risk on the compiler.

It seems worthwhile having all the data before making further conclusions. What do you think?

I'll do the tentative CL on the go/types and parser side to explore the impact.

@dominikh
Copy link
Member

Ok, so this issue has morphed somewhat at this point.

Right. The desire to have types for the BinaryExprs that represent unions was entirely to help existing code not break, by being handed some type. Having an actual union node type would simplify this and we wouldn't need types for the sub-unions in A | B | C | ....

Second, how do you envision the union node looks like?

Naively I would expect something like the following (omitting fields for position information)

type Union struct {
    Terms []Expr
}

var _ Expr = (*Approximation)(nil)

type Approximation struct {
  Type Expr
}

Let's say we had such a union node, would you still need a type for it, and what should it be?

Instinctively I feel like an ast.Union should map to a types.Union, but I don't know what an ast.Approximation would map to. The type of the types.Term for ~int seems to be int, so conceivably the same would make sense for the AST node.

In either case I think that these mappings should exist for completeness sake. Everything in the AST can be mapped to its respective type or object after type checking.

Consider func fn[_ int | string](), where the ast.Union is the only way to refer to the union, and there is no other way to go from ast.Union to types.Union than a direct mapping, since there is neither a named object nor an ast.InterfaceType.

And for something (contrived) like

type I interface {
    ~int | T1
    ~int | T2
}

the individual unions have types different from the overall type constraint of I.

Note that in all these cases, I'm arguing based on what I think makes sense. I haven't yet encountered actual needs for these types, but I've also only just begun preparing Staticcheck for type parameters and this was one of the first issues I ran into.

@findleyr
Copy link
Member Author

findleyr commented Dec 11, 2021

This is somewhat unfortunate. I agree with your arguments -- by analogy with StarExpr, it probably would have been more consistent to have a new node type for approximation and union elements. Your characterization is accurate: we were trying to avoid adding new nodes, and there was no need for new nodes for these type expressions...

It's worth experimenting with new nodes, though I suspect it is too late to make this change. Nit: the nodes should probably be ApproximationTypeTildeExpr and UnionType, and we'd need more position information.

Being pragmatic, I also think it's OK to stick with what we've got, though I don't see much downside to recording types for the new expressions. It would be fine to produce a union for the intermediate BinaryExpr nodes, and might even (mildly) simplify the code in the type checker. It would allocate more, but would not make a meaningful difference in performance. I don't think it's a problem that this will make types.Union show up in more places -- in fact it would allow more easiliy filtering out type exprs when looking at BinaryExpr and UnaryExpr nodes.

@griesemer
Copy link
Contributor

Here's a much simpler solution: we can just introduce a new token, say token.UNION to represent the | token in unions. token.TILDE is already only used for ~T expressions. This way, it's easy to tell these binary (or unary, for ~) expressions apart from others. The effect is the same as if there was a new ast.Union type - a client can easily tell what kind of expression there is.

If we have a top-level union literal in a type parameter list, such as [P int|string|bool] we record a type for (int|string) | bool which will be an implicit interface - same as we already do (I think). For any lower level binary expressions (here: (int|string)) we don't record a type since it doesn't make much sense (and only costs memory and time). It's easy for an inspector to recognize token.UNION and token.TILDE expressions and do the desired thing.

@findleyr
Copy link
Member Author

findleyr commented Dec 12, 2021

we can just introduce a new token, say token.UNION

I think if we're introducing a new token we may as well introduce a new node type. I also don't think we should overload tokens in this way.

If we have a top-level union literal in a type parameter list, such as [P int|string|bool] we record a type for (int|string) | bool which will be an implicit interface - same as we already do (I think)

It appears we don't already do that. Here's the playground link from @dominikh's report on slack, which I should have included:
https://gotipplay.golang.org/p/LBPF8ntViqI

we don't record a type since it doesn't make much sense (and only costs memory and time)

Assuming we'd share the underlying slice storage and all the terms, recording exprs for the intervening BinaryExprs would just be a slice header. Not that different from recording constant.Value for intermediate expressions, but also not particularly valuable.

I've thought about this more. I wish we'd more strongly considered one or both of TildeExpr and UnionType -- at the time, I think we had inaccurate assumptions about what mattered with respect to AST changes. However, I also feel that what we have now is OK, after fixing the bug of not recording constraint types.

@griesemer griesemer changed the title go/types, types2: consider recording types for unions go/types, types2: missing types for unions Dec 13, 2021
@griesemer
Copy link
Contributor

@findleyr and I have been discussing our options here, including potential API changes.

But given the late stage in the release process, we concluded that we simply cannot make any API changes at this point (and for that matter, the API changes that we have made were communicated and agreed upond in the August time frame).

This leaves the need to record any missing types for union expressions: Because union expressions of the form a | b | c | ... are considered a single "unit" rather than a sequence of binary expressions ((a | b) | c) ... we will record the top-level type only. We believe that AST inspection can make the necessary decisions with that information (such as deciding to not further inspect sub-nodes).

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/371514 mentions this issue: go/types: record (top-level) union types

@findleyr
Copy link
Member Author

I think there's also a principled case to be made for consistent parsing of expressions. A | B | C is currently a valid (value) expression, and it's arguably a good thing that we parse it the same independent of context.

@dominikh
Copy link
Member

I'm sorry, but I'm feeling rather dissatisfied with this solution. It ignores most of the points I made in #50093 (comment) in particular with regard to backwards compatibility for old tools. With the current fix, existing tools will still encounter BinaryExprs without types, and will still need to add special checks handling type parameters even when they don't have to care about them.

the API changes that we have made were communicated and agreed upond in the August time frame

That is, without a usable implementation until much later, which means the changes were difficult to evaluate. I feel like the tooling ecosystem wasn't given the opportunity to test these changes and provide their feedback. Few of us would've been able to update our tools while battling with implementation bugs in the type checker and compiler. Now seems to be like a good time to test things. Furthermore, looking over #47781, it doesn't look like much attention was paid to this particular issue in the first place, and the proposal saw very little discussion overall.

The way I see it, we're stuck with an unclean API because "it's too late to change the API", but by the time useful feedback could be provided, it was already too late.

@seebs
Copy link
Contributor

seebs commented Dec 14, 2021

This seems really pretty unfortunate. If they're not binary expressions, and can't be used in code that previously worked on all binary expressions, this appears to me to violate the compatibility promise, at least in spirit -- it makes code that doesn't otherwise need to know or care about generics start panicing doing something that was previously totally safe.

My very limited experiments with AST walking have usually not involved any kind of fancy secondary processing at a higher level to make sure that I'm blocking descent into trees that are labeled as binary expressions but actually secretly internally they are not binary expressions and their sub-expressions won't have types. I don't think they should have to.

Ideally, I think, if these aren't really binary expressions, they shouldn't look like them in the syntax tree. If they are binary expressions, they should have types exactly the way other binary expressions would.

This feels like one of those things where we're going to spend the next ten years explaining this problem over and over, but won't be able to fix it because it was in a release.

@mvdan
Copy link
Member

mvdan commented Dec 14, 2021

To add my 2c: I don't think any external tool making heavy use of go/ast and go/types has fully adapted to the changes for generics. The first one looks to be Staticcheck, and Dominik has already explained how the changes are leaning towards the "breaks my code in many places" end of the spectrum.

I think there might have been a slight miscommunication when we discussed the go/ast and go/types API proposals. At least speaking for myself, all I could say is "sounds good" - it was impossible to tell whether my tools would break or not until the APIs were actually fully implemented. So I would say we had a hopeful plan back in August, but we've only started testing it for real in the past few weeks.

And, as much as it is fairly late to re-evaluate, I also think we have a small time window to make it happen. Beta1 isn't out yet, and it is looking like 1.18 will be late by a full month if not longer. I'd very much rather see a couple more last-minute API changes for the sake of breaking fewer tools once beta1/rc1/1.18 come out. It seems to me like we wouldn't even be that late, given that the changes for go/types finished landing just a week ago.

@findleyr
Copy link
Member Author

I'm sorry, but I'm feeling rather dissatisfied with this solution.

No I'm sorry. We didn't mean for closing this to be dismissive of the underlying issue; rather, we thought that it was a clear bug that we're not recording a type for the top-level binary expr, and it was perhaps unfortunate but OK that we didn't choose to use a new node type for unions. (and for the record, I argued that it was OK, not @griesemer). Based on the response, let's reopen and reconsider. We certainly don't want to "spend the next ten years explaining this problem over and over", as @seebs put it, just because we didn't want to take a step back here. (though I'm not yet convinced that that prediction is accurate). If it is significantly better to change the API now, I agree we should do it, painful though it may be.

Let me also offer a blanket apology for the timing. We've been rather focused on updating go/types and x/tools, and I wasn't expecting there to be fundamental problems with the go/ast proposal, which at the time seemed uncontroversial, probably for the same reasons as @mvdan said "sounds good".

With that said, let's try to keep this issue about making the best decision right now, given everything we know, rather than figuring out what happened. I don't mean to deflect blame: once generics lands, I plan to write up a postmortem for myself, and I'd be happy to share it and accept input from others.

As I see it, we need to decide the following, very soon:

  • Does this need to block the beta?
  • Do we need to change the current representation of unions in the AST?

(I put those in perhaps the reverse order as one would expect, because the singular most pressing issue is whether to block the beta).

Once those have been decided, we need to address the following, pretty soon:

  • What is the ideal representation for unions in the AST?
  • What updates are needed for go/types, and other downstream tools.

In my opinion, this shouldn't block the beta. My rationale for this is that a decision here might break some tools, but that is less of a problem than delaying the critical exposure we get from cutting the beta.

As for whether a change is needed, I'm not yet sure. My argument against changing the current representation essentially comes down to the fact that we already parse the value expression A | B | C as a binary expr, and there may be unexpected consequences to having the same textual expression parse differently depending on context. What would parser.ParseExpr do? What consequences could this have on tools or editors in the future, particularly if we ever allow A | B outside of constraints?

How much would it mitigate the problem if we recorded types for every binary expr in A | B | C? We didn't do this mostly because types.Union memoizes its type set, so doing this is potentially non-linear in memory. We could actually move that memoization to a map during type checking, in which case recording a type for the A | B in A | B | C would not use significant memory. This would solve the problem of BinaryExprs without a type, but of course wouldn't change the fact that a BinaryExpr can now be a type expr. However, there is already precedent (with StarExpr and Ident) for expression types that may be a value or a type. Could it be the case that there will be an adjustment period for BinaryExpr and UnaryExpr, but once that period is over things will be fine?

@findleyr findleyr reopened this Dec 14, 2021
@findleyr
Copy link
Member Author

I am marking this as a beta blocker until we reach a decision that it needn't block the beta. As I said in #50093 (comment), I don't think it should block the beta, but given that the beta is about to be cut I think we need to grab the lock temporarily until others have an opportunity to respond.

CC @golang/release

@findleyr findleyr added release-blocker and removed okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 labels Dec 14, 2021
@rsc
Copy link
Contributor

rsc commented Dec 14, 2021

Let's not block the beta on this. It affects almost no users directly. Yes, it affects tool authors, but the whole point of the beta is to get the code into the hands of both users and tools authors, so that they can give us feedback. This probably isn't the only problem they will run into.

We do expect the release to be about a month late overall, and that is okay. This is quite possibly the most significant release since Go 1. It will take time to get right, and we will give it that time. It may be that a second beta will be needed, and that would be okay too. But the most important thing we can do right now is package up what we have - which is really very good!, just not perfect - and give it a wider audience.

@findleyr findleyr added the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Dec 14, 2021
@dominikh
Copy link
Member

To be clear, I'm not at all concerned with the beta, only the final release, and I agree with Russ.

As far as changing the AST representation is concerned: I don't have a clear favourite. I'd be equally fine with introducing a new node type, not treating type lists as binary expressions, or properly treating them as binary expressions and recording type information for all the subexpressions. I think both approaches have about the same amount of backwards compatibility for existing tools, and both approaches allow easily differentiating between type lists and "real" binary expressions. It is only the current mixture of "we're representing it as a BinaryExpr, but not treating it as one" that's causing issues.

I don't think that it'd be a big problem for BinaryExpr referring to types. Most code should be able to handle that transparently.

@cherrymui cherrymui removed the okay-after-beta1 Used by release team to mark a release-blocker issue as okay to resolve either before or after beta1 label Dec 14, 2021
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/371756 mentions this issue: go/types: externalize union type sets

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/371757 mentions this issue: go/types: record types for union subexpressions

@findleyr
Copy link
Member Author

I'd be equally fine with introducing a new node type, not treating type lists as binary expressions, or properly treating them as binary expressions and recording type information for all the subexpressions.

Unless someone feels strongly that we should use a new node type, I think it makes more sense to just treat them as proper binary expressions. Same rationale as I mentioned above: parsing A | B | C differently based on context seems problematic, if not now then in a good percentage of potential futures.

I believe the two CLs above achieve the desired result, by recording types for all subexpressions of the union, including the UnaryExpr tilde-terms. For the latter, I record a Union type with a single term, which is consistent with what we would do for the ~A constraint expression.

I think we can merge these once they're reviewed, even if we're not done discussing here. They are mostly-safe changes that just add additional information to types.Info.

gopherbot pushed a commit that referenced this issue Dec 14, 2021
Move calculated type sets for unions into a map, rather than storing
them on the Union type.

Type sets for unions only matter during calculation of interface type
sets, and to a lesser extent inside of Identical. The latter should not
be encountered during type checking, as Identical uses the precomputed
interface type set when comparing interfaces, and unions do not arise
outside of interface types.

Removing the tset field from Union potentially frees up memory, and
eliminates a source of races via calls to NewUnion and Identical. It
also sets the stage for recording Unions for every subexpression of
union terms, which preserves an existing invariant that BinaryExprs and
UnaryExprs should have a recorded type.

Updates #50093

Change-Id: I5956fa59be6b0907c3a71faeba9fa5dd8aae0d65
Reviewed-on: https://go-review.googlesource.com/c/go/+/371756
Trust: Robert Findley <rfindley@google.com>
Run-TryBot: Robert Findley <rfindley@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
gopherbot pushed a commit that referenced this issue Dec 14, 2021
Prior to unions, unary and binary expressions always had a recorded
type. Preserve this by recording a type for all unary and binary
expressions encountered while parsing a union type.

Updates #50093

Change-Id: I5ba20f37854760596350d91ea325dc98e67e115a
Reviewed-on: https://go-review.googlesource.com/c/go/+/371757
Trust: Robert Findley <rfindley@google.com>
Run-TryBot: Robert Findley <rfindley@google.com>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/372475 mentions this issue: cmd/compile/internal/types2: record types for union subexpressions

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/372474 mentions this issue: cmd/compile/internal/types2: externalize union type sets

@griesemer
Copy link
Contributor

@dominikh This is now implemented in go/types (with pending analogue changes for the compiler, though that doesn't matter for this issue). Please close this issue if you're satisfied with the fix. Thanks.

gopherbot pushed a commit that referenced this issue Dec 15, 2021
This is a port of CL 371756 from go/types to types2 with
minor adjustments due to different error handling or AST.

Updates #50093

Change-Id: Iab6a4634f8fc917bf99df439d31098624085f52a
Reviewed-on: https://go-review.googlesource.com/c/go/+/372474
Trust: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Findley <rfindley@google.com>
gopherbot pushed a commit that referenced this issue Dec 15, 2021
This is a port of CL 371757 from go/types to types2, with
minor adjustments for different error handling and AST.

It also names the added API test cases more consistently.
The same renaming was applied to the respective go/types
file.

Updates #50093

Change-Id: Iaa132106a197a207f831525432e62e9d452b17c9
Reviewed-on: https://go-review.googlesource.com/c/go/+/372475
Trust: Robert Griesemer <gri@golang.org>
Reviewed-by: Robert Findley <rfindley@google.com>
@dominikh
Copy link
Member

LGTM. Thanks for having given this another look.

@golang golang locked and limited conversation to collaborators Jun 22, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. release-blocker
Projects
None yet
Development

No branches or pull requests

8 participants