Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding and removing extended modifiers #4136

Open
SirYwell opened this issue Aug 28, 2021 · 21 comments
Open

Adding and removing extended modifiers #4136

SirYwell opened this issue Aug 28, 2021 · 21 comments

Comments

@SirYwell
Copy link
Collaborator

SirYwell commented Aug 28, 2021

I was wondering why it isn't possible to directly add or remove extended modifiers. The only way is to copy the set, add your modifier and set it again. This is currently done in #4038 and #4134, and I'm currently trying to figure out some more issues with implicit modifiers where this functionality would make things a lot easier.

Is there any specific reason why such methods don't exist? If it just wasn't considered as useful before, I'd be happy to open a pull request for that.


Edit: Another useful API would be two static factory methods in CtExtendedModifier:

  • CtExtentedModifier.implicit(ModifierKind)
  • CtExtendedModifier.explicit(ModifierKind)

This would make it easier and more readable to work with extended modifiers.

@slarse
Copy link
Collaborator

slarse commented Aug 28, 2021

Is there any specific reason why such methods don't exist?

No, it just hasn't been done I suppose. See discussion in #4033.

Ping @raghav-deepsource, I think you were perhaps looking at this as well?

@raghav-deepsource
Copy link

Yup. Any thoughts on adding a way to pass a List of extended modifiers, instead of just a set? It would be useful for cases where the user wants to preserve the order of modifiers they assign to an element.

@slarse
Copy link
Collaborator

slarse commented Aug 28, 2021

Any thoughts on adding a way to pass a List of extended modifiers, instead of just a set?

I'm not sure that's necessary. There are ordered sets. Really, the use cases where the order of (semantically) unordered elements matter are very few. We want to keep the API as small as possible.

If there's a concrete use case that can't be served by the current API (plus the now planned add and remove methods for extended modifiers), then we can consider it. Otherwise, I'd personally say no.

@raghav-deepsource
Copy link

That makes sense.

@SirYwell
Copy link
Collaborator Author

SirYwell commented Aug 29, 2021

The problem with passing a List is, that it does not have an effect on the underlying implementation. Spoon only exposes a Set with no guarantees about order. One could stricten that requirement to return an ordered set but a method taking a List only makes sense if the underlying implementation can also return a List with the exact same order.

Also, the required order probably depends on the context - for the sniper printer the insertion order is more relevant, for another printer a strict order that does not depend on the insertion order is more relevant - I basically agree with what @I-Al-Istannen mentioned in #4033. (Though, lets not mix up the discussion about that in two issues now)


Another issue with extended modifiers is that, for example, records and enums are always static (the static modifier can be implicit or explicit if the record/enum is a nested class, but not if is a top level class) but the final modifier always needs to be implicit. I'm not sure in what extent this should be validated, but any validation there would be a lot easier with an add method too.

@raghav-deepsource
Copy link

raghav-deepsource commented Aug 29, 2021

disclaimer while all the stuff I have said below may apply to some extent, the argument is rendered moot since spoon generates an unordered set of modifiers during model building anyway. A better solution has been brought up in #4033 and I will create a new PR based on that idea.

Spoon only exposes a Set with no guarantees about order.

I've changed the implementation of CtModifierHandler from a HashSet to a LinkedHashSet in #4122 which could change this.

But users can't really reap the benefits of this change directly; there's no indication that the modifier set is ordered in getModifiers or getExtendedModifiers.

Also, the required order probably depends on the context -

Regarding sniper printing; I don't think it would be possible to guarantee consistency unless we can guarantee ordering from the time the model was constructed. And sniper printing is important enough to justify making it easier to accomplish within the model. Strict ordering of modifiers would require extra processing any way, regardless of how they looked at first, unless we go with a sorted set. Of course ,we lose out on easier sniper printing then.

From the user's perspective, there's no way to tell that you can set a particular order you want the modifiers to be in. One thing we do need to do is make it clear in method docs that we will store modifiers in terms of insertion order, provided we use only the add methods to add modifiers. With the set methods, ordering can only be guaranteed by passing in an ordered set ( note, I am talking from the perspective of the changes I have made in my pr).

I hope you can understand why I'm saying this here; this problem sort of encompasses a number of GitHub issues.

any validation there would be a lot easier with an add method too.

I have seen that while CtModifierHandler provides a number of methods, not all of them are exposed to the user when it is used in an element class. We could disallow setting modifiers all at once in the context of one of these classes should we wish to.

@SirYwell
Copy link
Collaborator Author

I was looking into this a little bit more and came across a few (probably rather minor) issues:

1) Set being used to store extended modifiers

Currently, CtModifierHandler keeps a Set of CtExtendedModifiers. This adds the need to deal with adding modifiers, as adding an explicit modifier of kind A should remove an implicit modifier of kind A (and visa-versa). It would probably make more sense to keep the data in a Map<ModifierKind, CtExtendedModifier> to keep this invariant without manual removal. WIth a custom map implementation, it would be possible to return a set directly and also the Set<ModifierKind> getModifiers() method could be simplified.

2) CtExtendedModifiers are modifiable

Especially, fields included in the hashCode can me modified - therefore it's easy to break data structures relying on the hashcode. The Set returned by CtModifierHandler itself is unmodifiable, but the elements in it can be modified.

3) Handling of exclusive modifiers

As only one of public, protected and private is allowed, there are special methods setVisibility() and getVisibility() to deal with that. But that does not prevent you from just adding two or more of them with the regular methods (This will also be an issue with sealed classes, as only one of final, sealed and non-sealed is allowed). This somewhat overlaps with 1) and the last paragraph of my previous comment I guess...

@monperrus
Copy link
Collaborator

Thanks for the analysis. We can fix those issues with one PR each! The second one is good but potentially breaking, but CtExtendedModifier are not used much, even less in a mutable setting, so I'd be for it too.

@slarse
Copy link
Collaborator

slarse commented Sep 23, 2021

Especially, fields included in the hashCode can me modified - therefore it's easy to break data structures relying on the hashcode. The Set returned by CtModifierHandler itself is unmodifiable, but the elements in it can be modified.

To me, it makes sense to be able to change the implicitness of a modifier. Changing its kind doesn't, however.

In terms of implementation convenience, I think it would make sense to:

  1. Disallow changing the kind of an extended modifier
    • IMO we should remove CtExtendedModifier.setKind(), it breaks a lot of stuff
    • We could also consider deprecating it and logging an aggressive warning about how much stuff it breaks, and how naughty it is to use it
  2. Consider equality between extended modifiers only based on the kind
    • This both allows changing implicitness without breaking the contract of a set, and solves the problem of being able to add duplicated implicit/explicit modifiers

@SirYwell
Copy link
Collaborator Author

To me, it makes sense to be able to change the implicitness of a modifier. Changing its kind doesn't, however.

That makes sense, yes.

We could also consider deprecating it and logging an aggressive warning about how much stuff it breaks, and how naughty it is to use it

I think I would prefer that over just removing it. Not sure how you deal with that in general, but it could be removed in a subsequent release.

This both allows changing implicitness without breaking the contract of a set, and solves the problem of being able to add duplicated implicit/explicit modifiers

I like that idea, however, it would mean that adding a public|implicit modifier to a set already containing a public|explicit modifier would not work with the Set#add method (it won't be added because it already "exists"). Maybe this can be avoided by reconsider the original idea of this issue:

What if the CtModifierHandler already contains all modifiers that are applicable to the CtElement (my ModifierTarget approach from #4177 could be used there) combined with a TriState IMPLICIT, EXPLICIT, ABSENT (or something similar). Then, instead of adding and removing modifiers, you'd change their state instead. On the other hand, that would not directly solve the issues with the exclusive modifiers (visiblity, and "finality" in java 17) but rather make it harder there.

Maybe you got a better idea how to deal with that, I'm not quite happy with mine here.

@I-Al-Istannen
Copy link
Collaborator

I'd like the modifiers to be immutable for #4176, as I otherwise need to clone the modifiers when cloning a CtElement. I will do that in the meantime, but not having to care about that would be nice. I am not sure how often you change the implicitness and doing so by overwriting the modifier manually sounds a bit nicer.

It breaks backwards compatibility more than allowing setImplicit so it might not be worthwhile, but it feels a bit more intuitive. The Modifiers are enums and immutable, the extended modifiers can suddenly change from under you.

@slarse
Copy link
Collaborator

slarse commented Sep 24, 2021

I'd like the modifiers to be immutable for #4176, as I otherwise need to clone the modifiers when cloning a CtElement

All CtElement need to be cloned when cloning. In a strict sense, there's no such thing as an immutable CtElement. For example, everything that derives from CtElement can have metadata added or removed at will, and that's not something we want to disable for the sake of immutability.

What we want to guarantee here is that CtExtendedModifier can be safely used in a Set, i.e. it should be made immutable from the perspective of the equals and hashCode methods.

@I-Al-Istannen
Copy link
Collaborator

I-Al-Istannen commented Sep 24, 2021

I was not talking about an immutable CtElement, but an immutable CtExtendedModifier. Currently I need to clone the modifiers on their own as changes in the original will affect the clone.

That was also just another thing that tripped me up dealing with them and not explicitly targeted at the solution in this PR. It just felt mostly relevant to the discussion at hand.

@slarse
Copy link
Collaborator

slarse commented Sep 24, 2021

Ohhh, wait, sorry I had it in the back of my mind that CtExtendedModifier was a CtElement. But you're right, in that sense it could potentially be immutable.

But, the solution to this problem would probably be to make CtExtendedModifier an actual CtElement. They should be fully part of the AST, but perhaps that's somehow breaking when considering the legacy use of enums. But we should look into that, as you say cloning becomes awkward otherwise.

@SirYwell
Copy link
Collaborator Author

But, the solution to this problem would probably be to make CtExtendedModifier an actual CtElement. They should be fully part of the AST, but perhaps that's somehow breaking when considering the legacy use of enums. But we should look into that, as you say cloning becomes awkward otherwise.

As I was thinking about ways to make sure that modifiers are only applied if it conforms with the JLS, this would basically make it impossible to cover all cases (e.g. for enums, an explicit final is never allowed). If CtExtendedModifiers were immutable, it would be possible to do the checks in the appropriate places (for example, when having a addExtendedModifier(...) method in CtEnum, you could validate the modifier there before passing it to the CtModifierHandler)

While this is something that was not validated at all before, I thought it would be worth to mention here.

@raghav-deepsource
Copy link

raghav-deepsource commented Apr 20, 2022

revisiting this now, I can draw a parallel between how spoon does this and other similar libraries such as JavaParser. IMO, the crux of this and many other issues is that spoon delegates many aspects of model generation to JDT, which does not provide all the information required to generate an AST that is true to the original source code.

This is just to express my feelings on this matter, but here are the main reasons for this and many other problems from my POV.

  1. things like modifiers aren't given source positions because they are treated as metadata in the JDT model.
  2. position information is sometimes confusing/missing because of idiosyncrasies in JDT's representation (which was what made me open better handling of invocation source positions #4171)
  3. Pretty printing is unreliable because spoon does not include certain vital context such as the positions of individual tokens and the order of modifiers, to name two things. This leads to either incorrect (For certain edge cases), correct but modified code (in cases where ideally no changes should have occurred) or crashes (like when attempting to print a modified local variable declaration in a declarator list).

To me, Spoon has been great for performing AST based analysis, but not so much for AST modification due to its behavior for edge cases.

Fixing these issues will likely be a breaking change but it would greatly increase spoon's usability.

The best way to do this in my opinion is to have separate tree building code that can produce a concrete syntax tree, and match elements from the CST with the data produced by JDT. Inspiration could be taken from how projects like intellij's PSI and also C#'s roslyn handle source representation, as an overlay of an AST on top of a CST. I've been exploring the use of tree sitter to do the cst generation step, though it's been slow going due to the scope of it. Introducing a native dependency like tree sitter may also not be the solution for everyone, but I think it could be rewarding if used correctly.

@monperrus
Copy link
Collaborator

Agree. Yet, the heavy dependency to JDT is both an advantage and a limitation. It does provide us with many useful pieces of information but we also inherit its problems.

Several times over the past years (decade?), we considered writing a new tree builder on top another library, starting with Javac (think a class JavacTreeBuilder in addition to JDTTreeBuilder) while remaining API compatible.

We never started this due to the amount of work that it represents.

@raghav-deepsource
Copy link

I am undertaking a rewrite of the position building code for my job and this work will be available publicly on my account, but I am not sure if it would be feasible to merge the changes I will make into spoon.

@slarse
Copy link
Collaborator

slarse commented Apr 20, 2022

@raghav-deepsource

IMO, the crux of this and many other issues is that spoon delegates many aspects of model generation to JDT, which does not provide all the information required to generate an AST that is true to the original source code.

I disagree with this statement. The AST is entirely true to the source code. The fact that it lacks detailed position information about certain decidedly non-AST elements (e.g. commas) and doesn't retain the order of unordered elements (e.g. modifiers) does not detract from this. The matter being discussed here is not about the AST not being true to the source code, it's about the fact that high fidelity (sniper) pretty printing requires information that is not the concern of an AST.

The primary limitation of a CST is that acting on it is a lot more arduous than working with an AST. I don't find it feasible to build Spoon upon a CST, it would be too hard to work with and too breaking.

If we present the current Spoon API to the user and maintain a separate CST in the background, Spoon itself has to map all operations on the AST to the CST. That's a massive performance and memory overhead, not to mention that making sure that every modifying operation on the AST also correctly updates the CST would be incredibly error prone. I've played around with the concept on multiple occasions and each time I've come out of it with the conclusion that it's simply infeasible given the resources that we have.

An idea that we have toyed with several times over the years is to augment AST nodes with CST information, and utilize this when pretty printing. For example, one could include leading and trailing formatting and delimiters in each node and puzzle that together when printing. This I found to be feasible but have never had the time to act on it.

If you come up with solutions to these very hard problems, we are all ears to your findings. I don't mean to discourage you from attempting to solve them, I just wanted to note that we've thought about this extensively and concluded that we don't have the resources to make it happen.

@raghav-deepsource
Copy link

I disagree with this statement. The AST is entirely true to the source code.

True, I misspoke. Spoon is, in the end, an AST library, not a CST one.

I don't mean to discourage you from attempting to solve them
No offense taken!

@slarse
Copy link
Collaborator

slarse commented Apr 22, 2022

I disagree with this statement. The AST is entirely true to the source code.

True, I misspoke. Spoon is, in the end, an AST library, not a CST one.

Honestly, reading my own response again, I was being a bit petty and taking your statement overly literally. It's quite clear what your intent was. I am sorry about that, I guess I was having a know-it-all day.

I don't mean to discourage you from attempting to solve them
No offense taken!

I want to emphasize here that just because we (and in large part I) didn't find a workable solution it doesn't mean there isn't one. We've had fairly limited resources to devote to this problem, you may very well find a breakthrough we did not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants