Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add CtElement.getCompilationUnit method to the public API #3942

Open
algomaster99 opened this issue May 21, 2021 · 16 comments
Open

Add CtElement.getCompilationUnit method to the public API #3942

algomaster99 opened this issue May 21, 2021 · 16 comments

Comments

@algomaster99
Copy link
Contributor

Currently, we have to retrieve the CtCompilationUnit of an element, CtType in my particular case, using the factory methods defined.

element.getFactory().CompilationUnit().getOrCreate((CtType<?>) element);

I propose we add a public API to directly get the CtCompilationUnit. It can be a part of the CtElement as each element belongs to some compilation unit.

Reference: ASSERT-KTH/diffmin#65 (comment)

@andrewbwogi
Copy link
Contributor

andrewbwogi commented May 23, 2021

There have been several issues opened about CtCompilationUnit lately. It is quite hidden in the library for such a common entity, it would be good to make it more prominent. You can already do ctElement.getPosition().getCompilationUnit(), so directly exposing getCompilationUnit in CtElement seems harmless.

Note that in general, it is not the case that each CtElement belongs to some compilation unit since a CtElement can be created independently and detached from any parent type. CtElement.getCompilationUnit would then have to return NoSourcePosition.NullCompilationUnit or something similar.

@slarse
Copy link
Collaborator

slarse commented May 23, 2021

Note that in general, it is not the case that each CtElement belongs to some compilation unit since a CtElement can be created independently and detached from any parent type.

I'd say it's the opposite. In general, it's the case that each CtElement belongs to a compilation unit, but there are exceptions (such as the one you mentioned). Just like in general, each element has a parent, but there are exceptions.

CtElement.getCompilationUnit would then have to return NoSourcePosition.NullCompilationUnit or something similar.

I wouldn't involve SourcePosition into this, that just represents the original state of the element as it was during parsing. Elements without a position can belong to a compilation unit. For example, you may have created a new type and created a compilation unit for it (like the code example in OP shows). Or even worse, an element that was part of compilation unit A during parsing might have been moved to compilation unit B at runtime, and then the source position CU is simply wrong.

EDIT: See #3942 (comment) for why we can't avoid involving the source position with the current design.

As Spoon allows us to create and mutate compilation units, I think not accounting for that would make for some very unexpected situations.

What I think would be nice is a CtElement.getOrCreateCompilationUnit() that's essentially just a shortcut to the code snippet in OP. It should return a compilation unit where possible, and if it's not possible (say, it's called on an element that isn't rooted in a type, package or module), it should crash. It could be implemented something like so:

public CtCompilationUnit getOrCreateCompilationUnit() {
    if (this instanceof CtType) {
        element.getFactory().CompilationUnit().getOrCreate((CtType<?>) element);
    } else if (this instanceof ....) {
        // handle all of the special cases, e.g. `CtPackage`, `CtModule` etc)
    } else {
        CtType<?> enclosingType = getParent(CtType.class);
        if (enclosingType != null) {
            return enclosingType;
        } else {
            throw new SpoonException("does not belong to a compilation unit");
        }
    }
}

It's then the client's responsibility to check before-hand that the element in question actually belongs to a compilation unit, and in general, that can just be assumed unless they have created elements themselves. I dislike the idea of returning a null type, because if a compilation unit can't be found for an element then the method has been called on an unrooted element, and that seems like a programmatic error to me. In this case, even returning null would be preferable to a null object, as I can't imagine many cases where it makes sense to use a non-existing compilation unit, it would just hide problems.

@algomaster99
Copy link
Contributor Author

algomaster99 commented May 23, 2021

@andrewbwogi

ctElement.getPosition().getCompilationUnit()

Apart from Simon's explanation, I want to point out a minor detail. This returns the CompilationUnit which has been deprecated. @slarse why was this interface created in the first place when it is exactly same as CtCompilationUnit?

Actually, even the factory methods return an instance of CompilationUnit whose interface now has been annotated as deprecated. However, its implementation is not marked so. Is there any specific reason for deprecating the interface but not the class?

@slarse

Or even worse, an element that was part of compilation unit A during parsing might have been moved to compilation unit B at runtime, and then the source position CU is simply wrong.

getOrCreate(CtType) does exactly that. I think we would need to make it independent of the source position or modify the source position too.

throw new SpoonException("does not belong to a compilation unit");

This error method made me question the name of the method getOrCreateCompilationUnit. I thought that if the type, module, package, or a path does not exist, the method would still create a CompilationUnit. However, getOrCreate(CtType), getOrCreate(CtModule), and getOrCreate(CtPackage) can return null or throw an exception respectively.

  • I maybe wrong but I think if CtType passed is null, the method should still create a virtual file based on unnamed module and unnamed package. We can probably call the the type as UnnamedType and the compilation unit can contain something like this.
    class UnnamedType { }
  • CtModule and CtPackage throw IOException in some cases. But I cannot figure out how will this exception be raised. Which method in the try-catch block throws it? Is it possible to avoid it?

The above two points are just written to suggest a possibility and question why it is not possible that getOrCreate always returns a CompilationUnit for the above mentioned CtElements. I am not saying that we should always return the compilation for each CtElement as tackling elements like CtBinaryOperator, CtStatement, etc require the program to create a lot of parents before it can create the compilation unit. In that cases, maybe we can return null if those elements are not part of a CtType. But maybe we can try and return a compilation unit in case of CtModule, CtPackage, and CtModule if they are not part of one.

@slarse
Copy link
Collaborator

slarse commented May 24, 2021

@slarse why was this interface created in the first place when it is exactly same as CtCompilationUnit?

CompilationUnit is the original class, and it was not a CtElement. CtCompilationUnit, which is a CtElement, effectively replaced CompilationUnit a couple of years back, and the CompilationUnit interface was kept only for backwards compatibility. See #2702.

Or even worse, an element that was part of compilation unit A during parsing might have been moved to compilation unit B at runtime, and then the source position CU is simply wrong.

getOrCreate(CtType) does exactly that.

Yes, see my code snippet.

I think we would need to make it independent of the source position or modify the source position too.

The source position must be immutable. Otherwise it does not make sense. It's fine if the CU of the source position differs from the current CU of the element, as the source position just represents the initial state.

I maybe wrong but I think if CtType passed is null, the method should still create a virtual file based on unnamed module and unnamed package. We can probably call the the type as UnnamedType and the compilation unit can contain something like this.

The problem I have with this is that you're creating an unreasonable compilation unit. If you have say a standalone CtExpression (i.e. that's the root), it by definition does not belong to a compilation unit. It can't, because that wouldn't be valid Java. So returning a compilation unit for that element does not make sense, and we should fail-fast instead.

What is the point of getting a compilation unit that cannot exist in the Java language model?

etc require the program to create a lot of parents before it can create the compilation unit.

Do you mean that getOrCreateCompilationUnit should create parents for the element it's called on if it isn't rooted in a type, package or module? That, I would say, is a very unexpected side effect of calling the method.

I stand by my original suggestion that CtElement.getOrCreateCompilationUnit() should:

  1. Return the CU for this if it is a type, package or module.
  2. Return the CU for its closest parent that has a CU.
  3. Throw if the element does not belong to a CU, and there is no path to a parent that does.

@algomaster99
Copy link
Contributor Author

algomaster99 commented May 24, 2021

The source position must be immutable.

Is it because it tells where the CtElement originated from? And that should not change even if the compilation unit is changed.

Yes, see my code snippet.

Your code snippet internally uses the factory methods which use source position to get the compilation unit. Especially getOrCreate(CtType) does that. See here.

If you have say a standalone CtExpression (i.e. that's the root)

I am saying that getOrCreateCompilationUnit(CtType) should not return null if null is passed as I feel we can still create a compilation unit and just add an UnnamedType.

We can modify the source code here as below.

if (type == null) {
   // createType("UnnamedType");
}
// The line 137 will add the type to the compilation unit created.

Do you mean that getOrCreateCompilationUnit should create parents for the element it's called on if it isn't rooted in a type, package or module? That, I would say, is a very unexpected side effect of calling the method.

No, I am against that too. I am just proposing to make an exception for null CtType only.

  1. Return the CU for this if it is a type, package or module.
  2. Return the CU for its closest parent that has a CU.
  3. Throw if the element does not belong to a CU, and there is no path to a parent that does.

I agree with this approach too. But we should not use factory methods internally because they use source position to get the compilation unit.

@slarse
Copy link
Collaborator

slarse commented May 24, 2021

Your code snippet internally uses the factory methods which use source position to get the compilation unit. Especially getOrCreate(CtType) does that. See here.

Oh, woopsie. Actually, that might be considered a bug in getOrCreate.

Looking over the implementation of compilation units, it's pretty clear that the current design intends the compilation unit to be intrinsically linked to the source position. I don't agree with that design as we have methods to manipulate compilation units completely independently of source positions, but that problem is really orthogonal to the discussion of this convenience method.

So, for now, we'll leave it at the CU and source position being linked is, changing that is beyond the scope of this discussion.

If you have say a standalone CtExpression (i.e. that's the root)

I am saying that getOrCreateCompilationUnit(CtType) should not return null if null is passed as I feel we can still create a compilation unit and just add an UnnamedType.

I'm not sure what you mean by "if null is passed". The method I proposed doesn't take any arguments. It just retrieves the CU the receiver belongs to.

Even if it did take an argument, there is no "unnamed type" in Java's language model. Spoon is designed to reflect the semantics of Java, and an unnamed type simply does not make sense semantically. I'm guessing you're thinking that it would be they type equivalent of e.g. the unnamed module or unnamed package, but those two are things that actually exist in the language model, and so it isn't the same.

We can modify the source code here as below.

if (type == null) {
   // createType("UnnamedType");
}
// The line 137 will add the type to the compilation unit created.

We can, but why? What is the purpose of this? What would you use an unnamed type for? Also, what if there's already a type called UnnamedType?

@algomaster99
Copy link
Contributor Author

algomaster99 commented May 24, 2021

Oh, woopsie. Actually, that might be considered a bug in getOrCreate.

Oh, okay. I will open a ticket for this. But don't propose a solution yet. I am assuming it is not urgent so it can wait. I will submit a PR for that soon. 😅

I'm not sure what you mean by "if null is passed".

I meant this snippet only.

if (type == null) {
   // createType("UnnamedType");
}
// The line 137 will add the type to the compilation unit created.

I will explain its purpose under the next blockquote. It is not very significant thought.

We can, but why? What is the purpose of this? What would you use an unnamed type for?

I was just trying to make getOrCreate in the CompilationUnitFactory adhere more to its name and just force it to return a compilation unit in as many cases as possible.

EDIT:

So, for now, we'll leave it at the CU and source position being linked is, changing that is beyond the scope of this discussion.

Okay, we can open an issue for this and maybe fix it later.

@slarse
Copy link
Collaborator

slarse commented May 24, 2021

Oh, woopsie. Actually, that might be considered a bug in getOrCreate.

Oh, okay. I will open a ticket for this. But don't propose a solution yet. I am assuming it is not urgent so it can wait. I will submit a PR for that soon. sweat_smile

See my edit to that comment.

We can, but why? What is the purpose of this? What would you use an unnamed type for?

I was just trying to make getOrCreate in the CompilationUnitFactory adhere more to its name and just force it to return a compilation unit in as many cases as possible.

I disagree with that design philosophy. If input is unreasonable for the requested action, it's better to let the caller know that by throwing (or as is often the case returning null, although I dislike that in general) than to do something that has no purpose or makes no sense.

@algomaster99
Copy link
Contributor Author

See my edit to that comment.

Yes, I saw. I agree with it. But should we open an issue so that we may come back later?

I disagree with that design philosophy. If input is unreasonable for the requested action, it's better to let the caller know that by throwing (or as is often the case returning null, although I dislike that in general) than to do something that has no purpose or makes no sense.

Sounds good. My philosophy came from how Django implements get_or_create. But of course, there are cases when it throws an exception too.

@monperrus
Copy link
Collaborator

CtElement.getCompilationUnit()

This can be see as a derived property, a helper API method, so it's a good idea. The key design is what to return when there no CU: return Null, a NoCompilationUnit fake object or throw an exception

@slarse
Copy link
Collaborator

slarse commented Jun 11, 2021

CtElement.getCompilationUnit()

This can be see as a derived property, a helper API method, so it's a good idea. The key design is what to return when there no CU: return Null, a NoCompilationUnit fake object or throw an exception

@monperrus There is another problem as well. The only reasonable return value for CtElement.getCompilationUnit() is the CtCompilationUnit that the element currently belongs to. This may or may not be the compilation unit from which it was parsed.

However, all current ways of getting the compilation unit of an element return the compilation unit from which the element was originally parsed. I think it's fine if when the CU has changed, then element.getCompilationUnit() != element.getPosition().getCompilationUnit(), as the latter obviously is accessing the original compilation unit. However, it would be rather confusing if element.getCompilationUnit() != element.getFactory().CompilationUnit().getOrCreate(element) simply because the compilation unit has changed.

Note that CompilationUnitFactory.getOrCreate(CtType) returns the source position's compilation unit, if available.

As it's entirely possible to move elements across compilation units at runtime, I think it's important that this is reflected in a CtElement.getCompilationUnit() method, just like e.g. CtElement.getParent() returns the current parent, and not the original one.

@monperrus
Copy link
Collaborator

The only reasonable return value for CtElement.getCompilationUnit() is the CtCompilationUnit that the element currently belongs to

Fully agree.

The question I'm asking is about the behavior when no ancestor has a compilation unit.

@slarse
Copy link
Collaborator

slarse commented Jun 11, 2021

The question I'm asking is about the behavior when no ancestor has a compilation unit.

Yes, I got that, and that is indeed a key question. I would say either throw an exception, or the null object. But not null itself :). For clients, I think throwing an exception is the best solution, because an element not in any way being associated with a compilation unit only happens if you have created an element at runtime and not associated it with some preexisting element. It's an exceptional situation that clearly points to a programmatic error somewhere.

I could be wrong, but I feel like if you're in a situation where you want to get the compilation unit of an element, but there's a chance that the element doesn't have one, then you should have try/catch error handling.

For the other key problem that I highlighted, I think that the best solution may be to introduce CtElement.getCompilationUnit() at the next major revision, and then have CompilationUnitFactory.getOrCreate() defer to that instead of to Position.getCompilationUnit(). As the semantics of CtElement.getCompilationUnit() and Position.getCompilationUnit() will necessarily differ, this is a breaking change.

@algomaster99
Copy link
Contributor Author

@slarse

that would lead to massive information duplication, making it incredibly difficult to ensure that the model is consistent.

That is a good point. All will have to update the compilation unit in each child of the node. That could be really cumbersome.

Are there any more cases apart from replace and insert where the compilation unit of the element may change? If not, modification of CtCompilationUnit can be a part of these APIs. This would also avoid operating up the tree.

@monperrus

This abstraction is meant to help the user to do transformations. If the user has to take care of this by herself, this is a failure in having a friendly and simple API that abstracts away details.

Sound good. Let me just rephrase this to confirm if I understood what you meant. Spoon is concerned with code transformations only and the client should also be concerned with that only. An option for modifying the compilation unit (CtElement.setCompilationUnit) is not required as such details are not directly required for transformation so must be abstracted.

@slarse
Copy link
Collaborator

slarse commented Jun 14, 2021

@slarse

that would lead to massive information duplication, making it incredibly difficult to ensure that the model is consistent.

That is a good point. All will have to update the compilation unit in each child of the node. That could be really cumbersome.

Are there any more cases apart from replace and insert where the compilation unit of the element may change? If not, modification of CtCompilationUnit can be a part of these APIs. This would also avoid operating up the tree.

Not quite sure what you mean with replace and insert, but I don't think there's any need to extend or modify the API for mutating compilation units. We have mutators inside of CtCompilationUnit, which is sufficient. Changing the compilation unit of a type is such a rare use case that I don't think it's warranted to add additional helpers for this.

Let's focus this issue on CtElement.getCompilationUnit(), which we all agree is something that we need.

@algomaster99
Copy link
Contributor Author

Not quite sure what you mean with replace and insert

We would need to change the compilation unit of a specific element only when that element is inserted or put as replacement in a different model, right? This is why I thought we would need to take care of updating compilation units in replace and insert APIs.

Let's focus this issue on CtElement.getCompilationUnit(), which we all agree is something that we need.

Prior to that, we need to figure out a way to update the compilation units too. Otherwise, the current solution will compel us to be dependent on SourcePosition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants