Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove monkey-patching of methods onto classes #679

Closed
degustaf opened this issue Jun 19, 2019 · 8 comments · Fixed by #2171 or #2303
Closed

Remove monkey-patching of methods onto classes #679

degustaf opened this issue Jun 19, 2019 · 8 comments · Fixed by #2171 or #2303
Labels
Enhancement ✨ Improvement to a component
Milestone

Comments

@degustaf
Copy link
Contributor

In this comment, @PCManticore stated that there was a plan to remove monkey-patching of methods onto classes. inference.py and protocols.py seem to suggest that this hasn't happened.

Is this still a design goal?

It looks to me like this would be a fairly mechanical change. I am assuming that since it hasn't happened yet, I am missing something. Is it simply a "haven't had time" issue?

@PCManticore
Copy link
Contributor

Thanks for creating the issue @degustaf

Yes, this is still a design goal that we'd want to achieve with astroid. There are two reasons for why this didn't happen yet, the first one being the one that you identified about not having enough time to do it. The second one is that it's a rather tricky issue, since there are various cyclic dependencies between inference.py, protocols.py and the rest of astroid files. We attempted to do this in the so called "2.0" branch of astroid (https://github.com/PyCQA/astroid/tree/2.0) which was an effort to reengineer most of astroid's architecture. If you look in that branch, you can see that we managed to reduce that monkeypatching quite a bit, if not all, but at this point, that branch is quite behind everything else and has a lot more changes that can't come easily in master without additional work.

If you're interested in helping, cherry-picking the refactoring for dropping the monkeypatch would be definitely amazing.

@PCManticore PCManticore added Good first issue Friendly and approachable by new contributors Enhancement ✨ Improvement to a component labels Jun 20, 2019
@gyermolenko
Copy link
Contributor

gyermolenko commented Aug 4, 2019

Can someone provide some details on this please?
astroid 2.0 branch is the default for a long time now, but Manticore's comment is quite recent.
Are there any examples before-after?

But overall I just saw "contributor friendly" tag and wanted to see if I could be of any help.

@PCManticore
Copy link
Contributor

Hey @gyermolenko ! This might not be contributor friendly after all, as it requires going through https://github.com/PyCQA/astroid/tree/2.0 and bringing back into master the removal of monkeypatching methods. The gist of the solution is the use of single dispatch to register inference functions: https://github.com/PyCQA/astroid/blob/2.0/astroid/inference.py#L37

@PCManticore PCManticore removed the Good first issue Friendly and approachable by new contributors label Aug 6, 2019
@jacobtylerwalls jacobtylerwalls added this to the 3.0.0a5 milestone Jun 7, 2023
@jacobtylerwalls
Copy link
Member

jacobtylerwalls commented Jun 7, 2023

@DanielNoord made a start toward this by suggesting a new design where nodes don't need to know how to infer themselves in #2167 (with a new infer_object() API).

Then I suggested an alternative in #2171 where we keep the existing API and just define the methods on the classes where we expect them, and solve cyclic imports along the way.

We have consensus that we'd like to fix this in an alpha of astroid 3.0, but we don't have consensus yet on which direction to take. I'd like to flow out all the concerns folks raised to make sure everything is sounded out before choosing a direction.


Concerns with the infer_object()/AstroidManager approach:

  • Requires "looking up" the inference method through a series of isinstance() checks or a mapping from node types to inference methods
    • JW: a mapping from node types to inference methods won't support subclassing (a la pylint's unidiomatic-type-check msg)
    • JW: a series of isinstance() checks could commit us to a performance degradation
    • DN: This could probably be improved upon with some better code-paths and more sharing of code-paths between node types, but it is indeed a cost.
  • The nodes no longer know how to infer themselves
    • JW: from the perspective of making astroid easier to work with, the indirection created here is, for me, slightly better than the status quo, because you no longer have to search through various long files to find the assignment of a method, but it still doesn't seem like a Pythonic design: the node inference methods are node-specific (rely on attributes of those specific nodes). It strikes me as unwanted indirection to have to use a map from a node to node-specific method instead of having that node-specific method defined on the node.
    • DN: Is knowing how to infer themselves really the responsibility of the nodes? Personally I believe that having the nodes act as an (almost) drop-in replacement of ast and letting another part of the code handle the inferences of said nodes makes more sense. It means that the nodes module would just become a data layer over ast while all of the core logic can be moved into a separate module. That should reduce/remove a lot of the interdependencies between the data/model layer and the layer that actually uses it.
  • API-breakage
    • JW: I'm not sure what the final intended pattern is, e.g. if we're going to leave "infer()" as a shim forever--and accept a cyclic import of inference.infer_object--or if we're going to remove it. We have permission to remove interfaces in a major version, but the cost to users should be weighed in the total analysis as a minor factor.
    • DN: I don't think in recent months/years we have seen many contributions from other users than pylint. In fact, the only person that was actively contributing for a longer period of time moved away because astroid is so incompatible. See Add an alias ImportFrom.module and fix other inconsistencies with the ast module #1338 and other issues opened by them at that time. We don't seem to gain much from keeping API stability.

DN: I'd also like to reiterate that I do think that infer_object is less than optimal and think that we can improve upon it considerably. However, I wanted to do this in smaller steps, see my subsequent PR as well. My main intent is to have on AstroidManager class (or something similar) that is the entry point for inference and doesn't have any global state. I believe this is the only way to make pylint at least somewhat useful in a multiprocessing environment.


Concerns with keeping the existing API and refactoring to avoid cyclic imports

DN: What I don't really like about keeping the existing API is that I don't see a clear path towards some of the existing issues we have. Although you showed that it does allow us to fix some of the cyclic imports etc. for me it just doesn't click how making one class/type responsible for everything improves the state of the codebase. I think separation of concerns would be better and makes it easier to tackle individual problems. Perhaps I'm too influenced by my day to day work but I tend to see astroid as having a data and an application layer (with pylint being the frontend if you want to complete this comparison). Separation between the data modelling layer and the layer that interacts with it feels more natural to me, but that might just be my own preference.

@DanielNoord
Copy link
Collaborator

I hope to find time tonight to get back to this! My comment in #2210 is also somewhat related to my general ideas about astroid startup.

@DanielNoord
Copy link
Collaborator

I have added my comments to the list. Let's continue the discussion in comments from now on and not keep editing the original post by Jacob.

@jacobtylerwalls
Copy link
Member

Thanks, I really appreciate your taking the time to add notes! I'm hoping the silly bullet-point debate-flow-outline might help others follow the discussions we had across several PRs. 😄

I'll let this be my last post on the subject for a while, because if I drone on it could discourage other people from speaking up. 📣


I'm hearing the arguments for the infer_object() approach[*] as mostly design-philosophical and priced with a "we think this will make future fixes easier" discount. That's not enough for a merge for me. I don't think making an inference a pure function instead of an instance method makes astroid either easier or harder to contribute to, or multiprocessing bugs either easier or harder to solve. Pure functions are beautiful! I spent a lot of the last year in React, which loves them. But even React has pragmatic escape-hatches when you need global state or side effects, and when we address the multiprocessing optimizations, we will ideally have less global state than now, but likely will still need a little global state, and then we'll need to put it somewhere, and then the pure infer_object() won't be so pure anymore. And then I'll be back where I started, wondering why we merged it (and potentially with new things to optimize, like the isisnstance checks).

Or not! But it's too early to know. Is this PR supposed to be the first PR in a series of PRs to fix multiprocessing? It seems to me like the wrong place to start. Or, in other words, it seems like only one person has a mental roadmap of how infer_object() would pay off for multiprocessing. Pierre indicated a similar hesitation. (I did look at the second PR, which I took to be #2168, but it just removes the monkey-patching, which we've already agreed both PRs do just fine.)

[*] just noticed that @DanielNoord edited the post to clarify that it's the "infer_object()/AstroidManager" approach, but this is the crux of the whole thing: I don't see the two as connected. I'll gladly work on PRs on the AstroidManager after merging my alternative to just get rid of the monkey patching. I feel like the in-person analogue of where we're at right now is that we have two contributors looking at a whiteboard and making opposite educated guesses; in that scenario, I think it's better to merge the consensual parts: just removing the monkey-patching.


My main intent is to have on AstroidManager class (or something similar) that is the entry point for inference and doesn't have any global state. I believe this is the only way to make pylint at least somewhat useful in a multiprocessing environment.

Same as above, I'm not convinced this would be the only way to do that? I'd rather see at least some proof-of-concept of some element of removing global state and how infer_object() is necessary to make it work.

What I don't really like about keeping the existing API is that I don't see a clear path towards some of the existing issues we have. Although you showed that it does allow us to fix some of the cyclic imports etc. for me it just doesn't click how making one class/type responsible for everything improves the state of the codebase. I think separation of concerns would be better and makes it easier to tackle individual problems.

As said above, I don't see infer_object as the final version of the inference system but I do see it as a clearer step towards having an inference system that can be instantiated on demand and without any shared state.

To avoid repeating myself :D...
The multiprocessing bugs: just a cyclic-import behavior bug and then optimizing to make sure we actually are parallelizing the right things, right? Performance optimizations can be very surprising; it's better to optimize from reasoning about the existing system and profiling it, not from heuristics and design philosophy. This is why I'm so hesitant.

I think making astroid be more of a drop-in could (finally) increase the number of contributions.

I confess to not even being sure how this would work. A few lines of pseudocode would help, if you're up for it. What do people want with the astroid library if they don't want inference? I know there was a comment along these lines from PCManticore in #169 (comment), though.

I think the most impactful intervention for new contributors would be continuing to use the good-first-issues label and ensuring we don't ask for big refactors along the way. (The str/repr PR in #2198 was a good experience, I felt!)

@DanielNoord
Copy link
Collaborator

I'm hearing the arguments for the infer_object() approach[*] as mostly design-philosophical and priced with a "we think this will make future fixes easier" discount. That's not enough for a merge for me.

This is fair. I think the counter-argument is why I am so hesitant for the other approach: I don't see the end-goal and only a fix for one particular issue. We have so many difficulties with getting astroid fully typed an into a more modern codebase that support parallelisation that I think we would benefit from a more clear migration path (similar to the famous V2 branch). Since the end-goal I envisioned is 180 degrees different to what you propose in that PR I find it hard to see how we go to a truly better design from there.

Is this PR supposed to be the first PR in a series of PRs to fix multiprocessing? It seems to me like the wrong place to start.

This is basically my argument above reworded in my opinion 😄 What is the end objective of that PR? And if there isn't one: shouldn't there be one?

Or, in other words, it seems like only one person has a mental roadmap of how infer_object() would pay off for multiprocessing.

I agree, I should have discussed this more clearly.

I feel like the in-person analogue of where we're at right now is that we have two contributors looking at a whiteboard and making opposite educated guesses; in that scenario, I think it's better to merge the consensual parts: just removing the monkey-patching.

👍

Same as above, I'm not convinced this would be the only way to do that? I'd rather see at least some proof-of-concept of some element of removing global state and how infer_object() is necessary to make it work.

I think this is more of a philosophical/design decision. Like I said, to me it feels like we should disconnect the data and the data-handling as they have different responsibilities. infer_object is a move towards such a distinction, that's why it is connected in my mind.

To avoid repeating myself :D... The multiprocessing bugs: just a cyclic-import behavior bug and then optimizing to make sure we actually are parallelizing the right things, right?

See #2048. In the current setup I don't think we can parallelise what we should. That's an issue 😄

I confess to not even being sure how this would work. A few lines of pseudocode would help, if you're up for it. What do people want with the astroid library if they don't want inference? I know there was a comment along these lines from PCManticore in #169 (comment), though.

Well, in #1338 I know the author was trying to replace ast with astroid in a tool they wrote but weren't able to due to the API incompatibilities. I think they will likely want to have inference but if they could just do AstroidManager().infer_this_node_for_me(node) that would probably also be fine for them. By keeping the API of the nodes simple, minimal and mostly in line with ast they would have been able to use astroid as a replacement.

I think the most impactful intervention for new contributors would be continuing to use the good-first-issues label and ensuring we don't ask for big refactors along the way. (The str/repr PR in #2198 was a good experience, I felt!)

Agreed!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement ✨ Improvement to a component
Projects
None yet
6 participants