Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create infer_object #2167

Closed
wants to merge 1 commit into from
Closed

Conversation

DanielNoord
Copy link
Collaborator

Type of Changes

Type
βœ“ πŸ› Bug fix
βœ“ ✨ New feature
βœ“ πŸ”¨ Refactoring

Description

Perhaps I should have created an issue for this, but:

This is my proposal to remove the import hell from astroid and the method assignment we do for the inference.py file.
Instead of calling node.infer() we would now just call infer_object(node).
This decouples the nodes and their meta-information from the actual astroid magic that needs and uses that information. Next steps would be to change calls in the astroid code base and remove _infer everywhere.
The migration in pylint seems rather straightforward as there are not that many calls to infer() actually.

I haven't made a news fragment as I'd like to do this after we have finalised the internal transition.

Let me know what you think guys!

@DanielNoord DanielNoord added the Enhancement ✨ Improvement to a component label May 6, 2023
@DanielNoord DanielNoord added this to the 3.0.0a3 milestone May 6, 2023
@codecov
Copy link

codecov bot commented May 6, 2023

Codecov Report

Merging #2167 (75f0b08) into main (0740a0d) will decrease coverage by 0.02%.
The diff coverage is 98.92%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2167      +/-   ##
==========================================
- Coverage   92.53%   92.52%   -0.02%     
==========================================
  Files          94       94              
  Lines       10804    10864      +60     
==========================================
+ Hits         9998    10052      +54     
- Misses        806      812       +6     
Flag Coverage Ξ”
linux 92.28% <98.92%> (-0.02%) ⬇️
pypy 87.50% <98.92%> (+0.01%) ⬆️
windows 92.12% <98.92%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Ξ”
astroid/inference.py 95.17% <98.90%> (+0.59%) ⬆️
astroid/nodes/node_ng.py 91.58% <100.00%> (-1.11%) ⬇️

... and 2 files with indirect coverage changes

Copy link
Member

@Pierre-Sassoulas Pierre-Sassoulas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what the right approach is, but shouldn't we remove more code in node_classe and scope_nodes if we refactor the way it's done right now ?

node: nodes.NodeNG, context: InferenceContext | None = None, **kwargs: Any
) -> Generator[InferenceResult, None, None]:
"""Find the infer method for the given node and call it."""
if isinstance(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the number of time this is going to be called it might be a good idea to do stats on the most common one so it's tested first if we go with this approach.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think we might want to refactor this anyway before releasing the final version.

For now I'd say, let's use this as a start and see how we can optimise. I think there are a lot of other small optimisations we could do here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with a statistical approach, I'm nervous about this approach, it feels like a code smell that we should be using polymorphism, which is more like the existing way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also use a dictionary of nodes types and associated methods, which makes it (almost) O(1). However, I think there might be good reason to eventually refactor this inference function to be part of an instantiated AstroidManager object instead of having it rely on one global AstroidManager class (as this makes parallelisation much harder).
Thus, too many optimisations to this don't make sense as the moment I think. It's more about the intent of decoupling nodes from the interaction with nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with @jacobtylerwalls about the code smell. Couldn't we put this in the nodes instead of outside the node ? Sometime the deferred creation of the infer function wasn't required.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just look at the imports that are needed in this file. If we want to put this in astroid.nodes we would have to do all of those within the methods. See also my other comment in the main thread.
This design really boils down whether the nodes should know how to infer themselves. I don't think they do as it heavily complicates the structure of the code and so far we haven't found a good way to combine it in one file.

@DanielNoord
Copy link
Collaborator Author

No, this works already because we don't remove the old way of doing inference with this PR.
Only after we remove _infer we need to do some more refactoring.

@DanielNoord DanielNoord mentioned this pull request May 7, 2023
2 tasks
@jacobtylerwalls
Copy link
Member

Before offering a view on whether this is the right direction, I'd like to understand a bit more of the benefits.

This decouples the nodes and their meta-information from the actual astroid magic that needs and uses that information.

Can you elaborate on this? I'm not sure I see the immediate upside.

@DanielNoord
Copy link
Collaborator Author

Can you elaborate on this? I'm not sure I see the immediate upside.

One of the issue (imo) with the current code structure is that the node objects serve two purposes: they 1) encode information about the node, its position, children, parent, etc. and 2) have a lot of methods that interact with them. However, this second category doesn't actually need to be "on" the nodes and gives a lot of issues such as this 1000 line file of method assignments.
I think it should be possible to create a full astroid.nodes module without any imports from other files: they are essentially just drop in replacements of stdlib ast nodes. The second part of the astroid library would then be the inference logic that can interact with the nodes package and can tell you more about the nodes based on inference. By decoupling the storage of "nodes information" and the "interaction with nodes" I think we would get a much cleaner code base without the import and method assignment mess that we currently deal with.

@Pierre-Sassoulas
Copy link
Member

However, this second category doesn't actually need to be "on" the nodes and gives a lot of issues such as this 1000 line file of method assignments.
I think it should be possible to create a full astroid.nodes module without any imports from other files: they are essentially just drop in replacements of stdlib ast nodes.

And ast nodes are very well typed so inheriting from ast nodes would be nice.

We could use composition inside NodeNG for inference instead of using an external function that check the type of its argument hough.

@jacobtylerwalls
Copy link
Member

I think we would get a much cleaner code base without the import and method assignment mess that we currently deal with.

The method assignments are definitely lame, but we can solve that in two ways: either your proposal, or just overriding _infer() everywhere. I think I'd like to see the second approach before deciding, and I'm volunteering to put up a PR to compare!

Also, I don't think we should merge either approach to main just yet--I think we should still review and merge incremental PRs, but just merge them to a dedicated branch. With something this disruptive it would be nice to wait to merge to main until the docs are done and we're sure the performance has improved. (In other words, I'd like the chance to have a final go/no-go decision.)

@DanielNoord
Copy link
Collaborator Author

We could use composition inside NodeNG for inference instead of using an external function that check the type of its argument hough.

This doesn't solve the issue that many inference functions use parts of the astroid.nodes package themselves. For example, some nodes are inferred based on the type of their parent. This is an inherent cyclic import in the inference code paths which will always exist as long as nodes need to know how to infer themselves.

@DanielNoord
Copy link
Collaborator Author

I think we would get a much cleaner code base without the import and method assignment mess that we currently deal with.

The method assignments are definitely lame, but we can solve that in two ways: either your proposal, or just overriding _infer() everywhere. I think I'd like to see the second approach before deciding, and I'm volunteering to put up a PR to compare!

I could help review such as PR. I have tried previously to fix this, but the import dependency were just too sever to define them cleanly on the nodes themselves.

Also, I don't think we should merge either approach to main just yet--I think we should still review and merge incremental PRs, but just merge them to a dedicated branch. With something this disruptive it would be nice to wait to merge to main until the docs are done and we're sure the performance has improved. (In other words, I'd like the chance to have a final go/no-go decision.)

I can understand that, but do think this creates a very high risk of becoming another 2.0 branch. The change itself is very disruptive, but is managable for client libraries (as can be seen from searching for .infer( in pylint). There is a balance to strike here.

@jacobtylerwalls
Copy link
Member

I could help review such as PR.

Thank you! Taking a break, but I made some good progress earlier today on this.

@jacobtylerwalls
Copy link
Member

Superseded by #2171

@jacobtylerwalls jacobtylerwalls removed their request for review May 13, 2023 13:39
@jacobtylerwalls
Copy link
Member

Sounds like I prematurely closed this @DanielNoord, sorry πŸ˜…

We could also use a dictionary of nodes types and associated methods, which makes it (almost) O(1)

For me, this was one of the main flaws I felt with this approach. If we reintroduce a dynamic mapping of nodes to inference functions, we've reintroduced the original indirection we were trying to improve on. If it's not dynamic, then it's coded with a match/case or a storm of isinstance checks.

@DanielNoord
Copy link
Collaborator Author

For me, this was one of the main flaws I felt with this approach. If we reintroduce a dynamic mapping of nodes to inference functions, we've reintroduced the original indirection we were trying to improve on. If it's not dynamic, then it's coded with a match/case or a storm of isinstance checks.

I think if we hardcode the mapping there is much less indirection. For me the main issue was that the node objects are defined across multiple files and you can't see what methods they have at runtime.
My splitting off the inference from the nodes you can still reasonably understand the different code paths by looking at the code.

@jacobtylerwalls
Copy link
Member

if it's not dynamic, then it's coded with a match/case

And then in this case, we lose the static analysis/type checking benefits of the polymorphism approach. See this disable on my branch necessitated from better inference.

@jacobtylerwalls
Copy link
Member

I'm starting to wonder if we're solving two different problems or if we're just coming from two different ideas of what makes better Python design. Or both πŸ˜… !

For me the main issue was that the node objects are defined across multiple files and you can't see what methods they have at runtime.
My splitting off the inference from the nodes you can still reasonably understand the different code paths by looking at the code.

Would you be able to flesh this out a little more? I'm not sure what you're referring to when you say you can't see what methods they have at runtime.

@DanielNoord
Copy link
Collaborator Author

DanielNoord commented May 14, 2023

I'm starting to wonder if we're solving two different problems or if we're just coming from two different ideas of what makes better Python design. Or both πŸ˜… !

Well, I'm in general not a big fan of composition when it involves setting instance attributes. The current base nodes "work" because they "upgrade" (or build upon) the node that they inherit from them and give some more functionality.
I think the pattern where we have to annotate an attribute at the class level but can't enforce it being set shows that composition might not be ideal in that situation.

Would you be able to flesh this out a little more? I'm not sure what you're referring to when you say you can't see what methods they have at runtime.

Well, by looking at the definition of a node in the current code you can't see they will have a _infer method at runtime. That's (obviously) a flaw. Your proposed fix fixes that issue and is therefore already a clear improvement.

However, I still wonder whether it makes sense to have that much code in the nodes modules that depends on other modules or on itself. If I were able to redesign astroid I would probably split the inference engine from the nodes module. The inference just has too much state that is now put into the nodes. For example, NodeNG.infer still depends on global state of the AstroidManager. I think it would be much better to be able to instantiate a manager and ask it to infer a node for you that you pass to it. That makes it much easier to have one entry point for astroid which I think makes it easier to parallelise or daemon-ize some of the parts of pylint and astroid.

So, I'm not necessarily opposed to your PR as it is definitely an improvement. However, on a higher level I think we should move in another direction and we could start doing so with this issue/fix.

@DanielNoord
Copy link
Collaborator Author

@jacobtylerwalls gentle ping on this issue as I don't want to lose all the work you put into the other PR (and this discussion).

@jacobtylerwalls
Copy link
Member

Sounds like we need a decision. Should we get more views from others?

@DanielNoord
Copy link
Collaborator Author

@Pierre-Sassoulas @mbyrnepr2 @cdce8p Do you have an opinion on the matter?

@Pierre-Sassoulas
Copy link
Member

I think #2171 is the best approach because the implementation would be in the node class where I'd personally expect it to be.

@jacobtylerwalls
Copy link
Member

Nick weighed in in favor of #2171 at pylint-dev/pylint#8735 (comment).

@DanielNoord
Copy link
Collaborator Author

Nick weighed in in favor of #2171 at pylint-dev/pylint#8735 (comment).

I mean, Nick would probably also be in favour of an approach where we split inference from the nodes as long as the module isn't so "gnarly".

I must confess it's a bit frustrating that I feel that some of my points (e.g., the use of global state and its impact on future multiprocessing improvements, composition with incompatible __init__ and __postinit signatures, etc.) haven't been discussed. I still feel strongly about the fact that we should keep the nodes simple like they are in ast and do everything else we need in a separate module. That said, it feels like I'm fighting an uphill battle here so I'll give in. I probably won't be contributing much to the implementation but I'd be happy to give input when asked for it!

@DanielNoord DanielNoord deleted the split-infer branch June 6, 2023 18:40
@jacobtylerwalls
Copy link
Member

One problem is that we have splintered conversation across a couple of PRs. DaniΓ«l, I'd be glad to open an issue and summarize the discussion we've had so far. I'd like to make sure that all points get a hearing before we take a decision. Will that work well for you?

@jacobtylerwalls
Copy link
Member

(It's possible that for the issues you feel strongly about, I just failed to see clearly their connection with the changesets we're discussing--but that could become clearer for me once we have a side-by-side pros-n-cons table. I'll tag you to collaborate on it when it's ready!)

@Pierre-Sassoulas
Copy link
Member

I still feel strongly about the fact that we should keep the nodes simple like they are in ast and do everything else we need in a separate module.

I view astroid as an overlay over the ast module with a lot of embedded utilities to make your life easier (the main one being inference). If we want those utility to be discoverable when using astroid without reading the doc we need them to be on the node object. I'm not super knowledgeable about astroid but it feel like this is one positive aspect of astroid's design that you're able to simply explore the node with a small reproducer file with the node you want to target and a debugger without having to read tons of docs (docs that are lacking, and that we should start to produce if we choose to do external utility functions imo).

Regarding the limitation of global states and multiprocessing, I'm not sure I followed well enough because I don't see how the proposed design would affect it ? Could you summarize the main argument in another issue like Jacob suggested ? If there's a limitation with the approach from #2171 we need to take it into account before investing in a refactor, for sure.

@DanielNoord
Copy link
Collaborator Author

but that could become clearer for me once we have a side-by-side pros-n-cons table. I'll tag you to collaborate on it when it's ready!)

I'm assuming Jacob will make an issue, or am I misunderstanding you here?

@jacobtylerwalls
Copy link
Member

I will when I have time in the next few days!

@jacobtylerwalls
Copy link
Member

Turns out we had an existing issue! See #679 (comment).

@DanielNoord I would be very grateful if you would read and respond when you have time, including fleshing out the parts where I didn't want to speak for you or didn't fully understand your argument. (I would like to be able to πŸ˜„ !)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants