Remove monkey-patching of methods onto classes #679

degustaf · 2019-06-19T19:33:32Z

In this comment, @PCManticore stated that there was a plan to remove monkey-patching of methods onto classes. inference.py and protocols.py seem to suggest that this hasn't happened.

Is this still a design goal?

It looks to me like this would be a fairly mechanical change. I am assuming that since it hasn't happened yet, I am missing something. Is it simply a "haven't had time" issue?

PCManticore · 2019-06-20T07:31:25Z

Thanks for creating the issue @degustaf

Yes, this is still a design goal that we'd want to achieve with astroid. There are two reasons for why this didn't happen yet, the first one being the one that you identified about not having enough time to do it. The second one is that it's a rather tricky issue, since there are various cyclic dependencies between inference.py, protocols.py and the rest of astroid files. We attempted to do this in the so called "2.0" branch of astroid (https://github.com/PyCQA/astroid/tree/2.0) which was an effort to reengineer most of astroid's architecture. If you look in that branch, you can see that we managed to reduce that monkeypatching quite a bit, if not all, but at this point, that branch is quite behind everything else and has a lot more changes that can't come easily in master without additional work.

If you're interested in helping, cherry-picking the refactoring for dropping the monkeypatch would be definitely amazing.

gyermolenko · 2019-08-04T17:11:05Z

Can someone provide some details on this please?
astroid 2.0 branch is the default for a long time now, but Manticore's comment is quite recent.
Are there any examples before-after?

But overall I just saw "contributor friendly" tag and wanted to see if I could be of any help.

PCManticore · 2019-08-06T13:38:03Z

Hey @gyermolenko ! This might not be contributor friendly after all, as it requires going through https://github.com/PyCQA/astroid/tree/2.0 and bringing back into master the removal of monkeypatching methods. The gist of the solution is the use of single dispatch to register inference functions: https://github.com/PyCQA/astroid/blob/2.0/astroid/inference.py#L37

jacobtylerwalls · 2023-06-07T09:15:21Z

@DanielNoord made a start toward this by suggesting a new design where nodes don't need to know how to infer themselves in #2167 (with a new infer_object() API).

Then I suggested an alternative in #2171 where we keep the existing API and just define the methods on the classes where we expect them, and solve cyclic imports along the way.

We have consensus that we'd like to fix this in an alpha of astroid 3.0, but we don't have consensus yet on which direction to take. I'd like to flow out all the concerns folks raised to make sure everything is sounded out before choosing a direction.

Concerns with the infer_object()/`AstroidManager` approach:

Requires "looking up" the inference method through a series of isinstance() checks or a mapping from node types to inference methods
- JW: a mapping from node types to inference methods won't support subclassing (a la pylint's unidiomatic-type-check msg)
- JW: a series of isinstance() checks could commit us to a performance degradation
- DN: This could probably be improved upon with some better code-paths and more sharing of code-paths between node types, but it is indeed a cost.
The nodes no longer know how to infer themselves
- JW: from the perspective of making astroid easier to work with, the indirection created here is, for me, slightly better than the status quo, because you no longer have to search through various long files to find the assignment of a method, but it still doesn't seem like a Pythonic design: the node inference methods are node-specific (rely on attributes of those specific nodes). It strikes me as unwanted indirection to have to use a map from a node to node-specific method instead of having that node-specific method defined on the node.
- DN: Is knowing how to infer themselves really the responsibility of the nodes? Personally I believe that having the nodes act as an (almost) drop-in replacement of ast and letting another part of the code handle the inferences of said nodes makes more sense. It means that the nodes module would just become a data layer over ast while all of the core logic can be moved into a separate module. That should reduce/remove a lot of the interdependencies between the data/model layer and the layer that actually uses it.
API-breakage
- JW: I'm not sure what the final intended pattern is, e.g. if we're going to leave "infer()" as a shim forever--and accept a cyclic import of inference.infer_object--or if we're going to remove it. We have permission to remove interfaces in a major version, but the cost to users should be weighed in the total analysis as a minor factor.
- DN: I don't think in recent months/years we have seen many contributions from other users than pylint. In fact, the only person that was actively contributing for a longer period of time moved away because astroid is so incompatible. See Add an alias ImportFrom.module and fix other inconsistencies with the ast module #1338 and other issues opened by them at that time. We don't seem to gain much from keeping API stability.

DN: I'd also like to reiterate that I do think that infer_object is less than optimal and think that we can improve upon it considerably. However, I wanted to do this in smaller steps, see my subsequent PR as well. My main intent is to have on AstroidManager class (or something similar) that is the entry point for inference and doesn't have any global state. I believe this is the only way to make pylint at least somewhat useful in a multiprocessing environment.

Concerns with keeping the existing API and refactoring to avoid cyclic imports

Introduces some mixin classes
- JW: Personally, I rarely guard mixins against unintended instantiation. If there is concern about unintended instantiation, I'll just get rid of the mixins and add a helper function.
- DN: I really dislikes Mixins as the Python tooling such as mypy and pylint doesn't handle them very well.
Prevents decoupling nodes from inference in order to have a drop-in replacement for ast nodes
- DN: See above as well. I think making astroid be more of a drop-in could (finally) increase the number of contributions.
  - JW: Marc indicated here his inclination against viewing astroid as a drop-in replacement for ast nodes
NodeNG.infer() depends on global AstroidManager state
- DN:
  
  astroid/astroid/nodes/node_ng.py
  
  Line 178 in ee12160
  
  limit = AstroidManager.max_inferable_values
  
  . Although we could probably work around this there are multiple places in our code where we just do AstroidManager() and then use the global state. As said above, I don't see infer_object as the final version of the inference system but I do see it as a clearer step towards having an inference system that can be instantiated on demand and without any shared state.
  - JW: I didn't follow this one, hoping to grok it after DN's updates. I understand that global state jeopardizes parallelism, but that seems like an orthogonal problem to me. Do you have an example of how Create infer_object #2167 would help in this regard?
Composition when involving setting instance attributes
- JW: Do you have an example of how Create infer_object #2167 would help in this regard? I didn't grok this.
- DN: See above. I don't like how we use postinit and have to see instance attributes as class-scope typed attributes to make things work for mypy. Mixins make this even worse. To me those types seem like an anti-pattern that we should try and avoid. Create infer_object #2167 doesn't fix this, but also doesn't make the problem worse.
The nodes modules will become longer

DN: What I don't really like about keeping the existing API is that I don't see a clear path towards some of the existing issues we have. Although you showed that it does allow us to fix some of the cyclic imports etc. for me it just doesn't click how making one class/type responsible for everything improves the state of the codebase. I think separation of concerns would be better and makes it easier to tackle individual problems. Perhaps I'm too influenced by my day to day work but I tend to see astroid as having a data and an application layer (with pylint being the frontend if you want to complete this comparison). Separation between the data modelling layer and the layer that interacts with it feels more natural to me, but that might just be my own preference.

DanielNoord · 2023-06-12T06:56:07Z

I hope to find time tonight to get back to this! My comment in #2210 is also somewhat related to my general ideas about astroid startup.

DanielNoord · 2023-06-12T18:45:20Z

I have added my comments to the list. Let's continue the discussion in comments from now on and not keep editing the original post by Jacob.

jacobtylerwalls · 2023-06-12T22:33:16Z

Thanks, I really appreciate your taking the time to add notes! I'm hoping the silly bullet-point debate-flow-outline might help others follow the discussions we had across several PRs. 😄

I'll let this be my last post on the subject for a while, because if I drone on it could discourage other people from speaking up. 📣

I'm hearing the arguments for the infer_object() approach[*] as mostly design-philosophical and priced with a "we think this will make future fixes easier" discount. That's not enough for a merge for me. I don't think making an inference a pure function instead of an instance method makes astroid either easier or harder to contribute to, or multiprocessing bugs either easier or harder to solve. Pure functions are beautiful! I spent a lot of the last year in React, which loves them. But even React has pragmatic escape-hatches when you need global state or side effects, and when we address the multiprocessing optimizations, we will ideally have less global state than now, but likely will still need a little global state, and then we'll need to put it somewhere, and then the pure infer_object() won't be so pure anymore. And then I'll be back where I started, wondering why we merged it (and potentially with new things to optimize, like the isisnstance checks).

Or not! But it's too early to know. Is this PR supposed to be the first PR in a series of PRs to fix multiprocessing? It seems to me like the wrong place to start. Or, in other words, it seems like only one person has a mental roadmap of how infer_object() would pay off for multiprocessing. Pierre indicated a similar hesitation. (I did look at the second PR, which I took to be #2168, but it just removes the monkey-patching, which we've already agreed both PRs do just fine.)

[*] just noticed that @DanielNoord edited the post to clarify that it's the "infer_object()/AstroidManager" approach, but this is the crux of the whole thing: I don't see the two as connected. I'll gladly work on PRs on the AstroidManager after merging my alternative to just get rid of the monkey patching. I feel like the in-person analogue of where we're at right now is that we have two contributors looking at a whiteboard and making opposite educated guesses; in that scenario, I think it's better to merge the consensual parts: just removing the monkey-patching.

My main intent is to have on AstroidManager class (or something similar) that is the entry point for inference and doesn't have any global state. I believe this is the only way to make pylint at least somewhat useful in a multiprocessing environment.

Same as above, I'm not convinced this would be the only way to do that? I'd rather see at least some proof-of-concept of some element of removing global state and how infer_object() is necessary to make it work.

What I don't really like about keeping the existing API is that I don't see a clear path towards some of the existing issues we have. Although you showed that it does allow us to fix some of the cyclic imports etc. for me it just doesn't click how making one class/type responsible for everything improves the state of the codebase. I think separation of concerns would be better and makes it easier to tackle individual problems.

As said above, I don't see infer_object as the final version of the inference system but I do see it as a clearer step towards having an inference system that can be instantiated on demand and without any shared state.

To avoid repeating myself :D...
The multiprocessing bugs: just a cyclic-import behavior bug and then optimizing to make sure we actually are parallelizing the right things, right? Performance optimizations can be very surprising; it's better to optimize from reasoning about the existing system and profiling it, not from heuristics and design philosophy. This is why I'm so hesitant.

I think making astroid be more of a drop-in could (finally) increase the number of contributions.

I confess to not even being sure how this would work. A few lines of pseudocode would help, if you're up for it. What do people want with the astroid library if they don't want inference? I know there was a comment along these lines from PCManticore in #169 (comment), though.

I think the most impactful intervention for new contributors would be continuing to use the good-first-issues label and ensuring we don't ask for big refactors along the way. (The str/repr PR in #2198 was a good experience, I felt!)

DanielNoord · 2023-06-13T06:10:31Z

I'm hearing the arguments for the infer_object() approach[*] as mostly design-philosophical and priced with a "we think this will make future fixes easier" discount. That's not enough for a merge for me.

This is fair. I think the counter-argument is why I am so hesitant for the other approach: I don't see the end-goal and only a fix for one particular issue. We have so many difficulties with getting astroid fully typed an into a more modern codebase that support parallelisation that I think we would benefit from a more clear migration path (similar to the famous V2 branch). Since the end-goal I envisioned is 180 degrees different to what you propose in that PR I find it hard to see how we go to a truly better design from there.

Is this PR supposed to be the first PR in a series of PRs to fix multiprocessing? It seems to me like the wrong place to start.

This is basically my argument above reworded in my opinion 😄 What is the end objective of that PR? And if there isn't one: shouldn't there be one?

Or, in other words, it seems like only one person has a mental roadmap of how infer_object() would pay off for multiprocessing.

I agree, I should have discussed this more clearly.

I feel like the in-person analogue of where we're at right now is that we have two contributors looking at a whiteboard and making opposite educated guesses; in that scenario, I think it's better to merge the consensual parts: just removing the monkey-patching.

👍

Same as above, I'm not convinced this would be the only way to do that? I'd rather see at least some proof-of-concept of some element of removing global state and how infer_object() is necessary to make it work.

I think this is more of a philosophical/design decision. Like I said, to me it feels like we should disconnect the data and the data-handling as they have different responsibilities. infer_object is a move towards such a distinction, that's why it is connected in my mind.

To avoid repeating myself :D... The multiprocessing bugs: just a cyclic-import behavior bug and then optimizing to make sure we actually are parallelizing the right things, right?

See #2048. In the current setup I don't think we can parallelise what we should. That's an issue 😄

I confess to not even being sure how this would work. A few lines of pseudocode would help, if you're up for it. What do people want with the astroid library if they don't want inference? I know there was a comment along these lines from PCManticore in #169 (comment), though.

Well, in #1338 I know the author was trying to replace ast with astroid in a tool they wrote but weren't able to due to the API incompatibilities. I think they will likely want to have inference but if they could just do AstroidManager().infer_this_node_for_me(node) that would probably also be fine for them. By keeping the API of the nodes simple, minimal and mostly in line with ast they would have been able to use astroid as a replacement.

I think the most impactful intervention for new contributors would be continuing to use the good-first-issues label and ensuring we don't ask for big refactors along the way. (The str/repr PR in #2198 was a good experience, I felt!)

Agreed!

PCManticore added Good first issue Friendly and approachable by new contributors Enhancement ✨ Improvement to a component labels Jun 20, 2019

PCManticore removed the Good first issue Friendly and approachable by new contributors label Aug 6, 2019

DanielNoord mentioned this issue Mar 17, 2022

Rename of branches is incorrect #1479

Closed

jacobtylerwalls added this to the 3.0.0a5 milestone Jun 7, 2023

This was referenced Jun 7, 2023

Create infer_object #2167

Closed

Move inference methods to nodes module(s) #2171

Merged

Pierre-Sassoulas modified the milestones: 3.0.0a5, 3.0.0a6 Jun 13, 2023

jacobtylerwalls closed this as completed in #2171 Jul 2, 2023

Pierre-Sassoulas mentioned this issue Sep 25, 2023

Bump astroid to 3.0.0, update changelog #2303

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove monkey-patching of methods onto classes #679

Remove monkey-patching of methods onto classes #679

degustaf commented Jun 19, 2019

PCManticore commented Jun 20, 2019

gyermolenko commented Aug 4, 2019 •

edited

Loading

PCManticore commented Aug 6, 2019

jacobtylerwalls commented Jun 7, 2023 •

edited by DanielNoord

Loading

DanielNoord commented Jun 12, 2023

DanielNoord commented Jun 12, 2023

jacobtylerwalls commented Jun 12, 2023

DanielNoord commented Jun 13, 2023

Remove monkey-patching of methods onto classes #679

Remove monkey-patching of methods onto classes #679

Comments

degustaf commented Jun 19, 2019

PCManticore commented Jun 20, 2019

gyermolenko commented Aug 4, 2019 • edited Loading

PCManticore commented Aug 6, 2019

jacobtylerwalls commented Jun 7, 2023 • edited by DanielNoord Loading

Concerns with the infer_object()/AstroidManager approach:

Concerns with keeping the existing API and refactoring to avoid cyclic imports

DanielNoord commented Jun 12, 2023

DanielNoord commented Jun 12, 2023

jacobtylerwalls commented Jun 12, 2023

DanielNoord commented Jun 13, 2023

gyermolenko commented Aug 4, 2019 •

edited

Loading

jacobtylerwalls commented Jun 7, 2023 •

edited by DanielNoord

Loading

Concerns with the infer_object()/`AstroidManager` approach: