Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify on non-Module roots #2536

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

temyurchenko
Copy link
Contributor

The PR fixes a missing parent on the "main" node and weakens the typing of "NodeNG.root" to better reflect the reality

I don't know what entry to fill in in the ChangeLog.

Type of Changes

Type
🐛 Bug fix
📜 Docs

Description

Refs #120, partially.

@temyurchenko
Copy link
Contributor Author

temyurchenko commented Sep 4, 2024

The reason the CI is failing is that at the class construction stage classes have "Unknown" for a parent. On the previous revision, the result was just that all builtin (or not just?) classes have "Unknown" as their locals['__module__'].
I'm thinking of a better way to approach it in this version.

@DanielNoord
Copy link
Collaborator

I think this will cause a lot of type errors in pylint as we assume root() always returns... Can we get away by asserting?

@temyurchenko
Copy link
Contributor Author

I think this will cause a lot of type errors in pylint as we assume root() always returns... Can we get away by asserting?

I'm currently in the process of eliminating non-module roots. Without that, the assert is going to fail a loot of times.

@temyurchenko
Copy link
Contributor Author

temyurchenko commented Sep 5, 2024

I think this will cause a lot of type errors in pylint as we assume root() always returns... Can we get away by asserting?

What is the difference between Uninferable and InferenceError? Which one should be used and in what cases? Is there some documentation on it? Or a specific person to ask?

@DanielNoord
Copy link
Collaborator

I think this will cause a lot of type errors in pylint as we assume root() always returns... Can we get away by asserting?

I'm currently in the process of eliminating non-module roots. Without that, the assert is going to fail a loot of times.

I don't understand what you mean with this?

I think this will cause a lot of type errors in pylint as we assume root() always returns... Can we get away by asserting?

What is the difference between Uninferable and InferenceError? Which one should be used and in what cases? Is there some documentation on it? Or a specific person to ask?

It depends on the situation. Ideally InferenceError isn't exposed to the users of astroid such as pylint so that they only need to handle Uninferable.

@temyurchenko
Copy link
Contributor Author

I don't understand what you mean with this?

The root() currently returns non-Module nodes, often. So, if we just assert that the return is a Module, that assert is going to fail often.

It depends on the situation. Ideally InferenceError isn't exposed to the users of astroid such as pylint so that they only need to handle Uninferable.

What is the logic for preferring one or the other internally? I often see lines like:

            if not name.startswith("__") and self.has_dynamic_getattr(context):
                # class handle some dynamic attributes, return a Uninferable object
                yield util.Uninferable
            else:
                raise InferenceError(
                    str(error), target=self, attribute=name, context=context
                ) from error

(this one is from scoped_nodes.ClassDef.igetattr)

@temyurchenko
Copy link
Contributor Author

I don't understand what you mean with this?

The root() currently returns non-Module nodes, often. So, if we just assert that the return is a Module, that assert is going to fail often.

It depends on the situation. Ideally InferenceError isn't exposed to the users of astroid such as pylint so that they only need to handle Uninferable.

What is the logic for preferring one or the other internally? I often see lines like:

            if not name.startswith("__") and self.has_dynamic_getattr(context):
                # class handle some dynamic attributes, return a Uninferable object
                yield util.Uninferable
            else:
                raise InferenceError(
                    str(error), target=self, attribute=name, context=context
                ) from error

(this one is from scoped_nodes.ClassDef.igetattr)

What is the logic for preferring one or the other internally? I often see lines like:

I see, raising the InferenceError doesn't let adding new inference results, while just yielding uninferrable does allow it.

@temyurchenko temyurchenko force-pushed the clarify-non-module-root branch 3 times, most recently from b693781 to c2c38cd Compare September 6, 2024 00:19
Copy link

codecov bot commented Sep 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.09%. Comparing base (a99967e) to head (00a70e9).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #2536      +/-   ##
==========================================
+ Coverage   93.05%   93.09%   +0.04%     
==========================================
  Files          93       93              
  Lines       11050    11042       -8     
==========================================
- Hits        10283    10280       -3     
+ Misses        767      762       -5     
Flag Coverage Δ
linux 92.94% <100.00%> (+<0.01%) ⬆️
pypy 93.09% <100.00%> (+0.04%) ⬆️
windows 93.08% <100.00%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
astroid/bases.py 88.32% <100.00%> (+0.03%) ⬆️
astroid/brain/brain_namedtuple_enum.py 93.50% <100.00%> (+0.61%) ⬆️
astroid/nodes/node_ng.py 93.05% <100.00%> (+0.04%) ⬆️
astroid/nodes/scoped_nodes/scoped_nodes.py 93.21% <100.00%> (+0.07%) ⬆️
astroid/raw_building.py 94.80% <100.00%> (-0.12%) ⬇️

... and 5 files with indirect coverage changes

@temyurchenko temyurchenko force-pushed the clarify-non-module-root branch 5 times, most recently from 703f23a to ebae45f Compare September 6, 2024 22:12
@temyurchenko
Copy link
Contributor Author

Now, this should also close #1490.

I tried my best to give complete descriptions to the commits, because they are lengthy. I also changed the annotation of root() back to just Module (with an added assert).

I'm happy to elaborate more on changes, since there are quite a few of them.

As a side note, I still feel that it's not a very honest type signature, given how many nodes we are creating out-of-tree and thus without a parent (see const_factory). Furthermore, there is no enforcement of the parent is not None invariant anywhere in the codebase, so the accidents are bound to creep in.

@temyurchenko
Copy link
Contributor Author

Also fixes #2517

Copy link
Collaborator

@DanielNoord DanielNoord left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome changes! But also pretty hard to review.

Maintainer time on astroid is sparse and nobody has actively worked on its internals for about a year.

I'd be open to spending some time of the next days/weeks to get this merged but I an only approve smaller bite sized PRs. If you're op for that let's go down that route!

For example, the first commit could be a separate PR to declutter this PR.
The second commit does a lot of things at once (as indicated by the commit message). I think a lot of changes in there could be separated into separate PRs, also to show their effects on the tests more clearly. That helps with reviewing them.

astroid/bases.py Outdated
@@ -5,6 +5,8 @@
"""This module contains base classes and functions for the nodes and some
inference utils.
"""
# isort: off
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line

yield from _infer_stmts(
self._wrap_attr(get_attr, context), context, frame=self
)
attrs = self.getattr(name, context, lookupclass=False)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you split out these changes into a separate commit/PR? I'd like to see the effect of these changes on the tests by itself.

@@ -20,13 +20,10 @@ def infer_namespace(node, context: InferenceContext | None = None):
"Namespace",
lineno=node.lineno,
col_offset=node.col_offset,
parent=nodes.Unknown(),
parent=AstroidManager().adhoc_module, # this class is not real
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the name of adhoc. But I also don't have better suggestions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can come up with other options. Please, tell what you are unsure about it, to guide the thought process for new names.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Pierre-Sassoulas Any suggestions here?

@@ -4,6 +4,8 @@

"""Astroid hooks for various builtins."""

# isort: off
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove this line

@@ -641,12 +646,13 @@ def infer_property(

prop_func = objects.Property(
function=inferred,
name=inferred.name,
name="<property>",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change? Doesn't that lose value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I talk about it in the commit message:

  • fixing construction of in-place properties (bar = property(getter)). They just create a nameless object, not the one
    with the name of the getter. Thus, the name was changed to
    "". Furthermore, the definition of that property is not
    attached to any scope, as it's again nameless.

This change is for "inplace", nameless, properties, not the ones created with a decorator.

What does it mean for it to lose the value? It's still accessible through an Assign field, for example.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, makes sense. I understand the name change a little better now (although I would probably do f"<property> {inferred.name}" or something, so we don't lose that context.

I don't really understand the parent changes though.

class A:
    temperature = property(get_temperature, set_temperature)

For temperature I would still expect A to be the parent no? Not the adhoc module

Copy link
Contributor Author

@temyurchenko temyurchenko Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For temperature it is. However, temperature is just a name in the AssignName. It is assigned a call to the constructor of actual property. The actual property class is defined somewhere in the builtins module. However, since we are treating property specially, we are creating our own definition of it, separate for each use. Kind of like an Instance. And that special definition isn't defined anywhere in the actual source. We've just made it up on the spot, ad-hoc.

Treating it as if it was defined in A is furthermore incorrect, since it would cause it to be in As body, or searchable from A.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand now, thanks!

lineno=node.lineno,
col_offset=node.col_offset,
# ↓ semantically, the definition of this property isn't within
# node.frame (or anywhere else, really)
parent=AstroidManager().adhoc_module,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this change? Why doesn't the property have the parent of the function being decorated as its parent?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above, this is about «inplace» properties, not the «decorator» properties.

# A typical ClassDef automatically adds its name to the parent scope,
# but doing so causes problems, so defer setting parent until after init
# see: https://github.com/pylint-dev/pylint/issues/5982
class_node.parent = node.parent
class_node.postinit(
# set base class=tuple
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think L156 can be remove?

@@ -21,6 +21,8 @@
mechanism.
"""

# isort: off
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not going to comment on all, but please remove these lines

@temyurchenko
Copy link
Contributor Author

I'd be open to spending some time of the next days/weeks to get this merged but I an only approve smaller bite sized PRs. If you're op for that let's go down that route!

For sure.

For example, the first commit could be a separate PR to declutter this PR.

Will do.

The second commit does a lot of things at once (as indicated by the commit message). I think a lot of changes in there could be separated into separate PRs, also to show their effects on the tests more clearly. That helps with reviewing them.

Will do.

Regarding isort. I've grepped the project and found a few similar isort comment. Based on that I concluded that the project uses isort to sort imports. However, when I ran isort on it, it resulted in quite a few changes, thus the "isort: off" comments. Could you please tell me if I should use isort? If so, is there a special config for the project? I wasn't able to find one.

@DanielNoord
Copy link
Collaborator

"I", # isort

We use ruff to do our import sorting now. Any comments you found are probably outdated (and could be removed in a separate PR)

@temyurchenko
Copy link
Contributor Author

temyurchenko commented Sep 9, 2024

I think a lot of changes in there could be separated into separate PRs, also to show their effects on the tests more clearly.

Is it possible to have something like an merge train on github? That is, for one PR to depend on some previous PR? It's possible to emulate with merging into branches, but then you'd have to do reviews on my local fork, since I can't create branches in this repository.

If not, the simplest way would probably be do just breakup commits within this PR. Would you be open to that?

@DanielNoord
Copy link
Collaborator

DanielNoord commented Sep 9, 2024

I think a lot of changes in there could be separated into separate PRs, also to show their effects on the tests more clearly.

Is it possible to have something like an merge train on github? That is, for one PR to depend on some previous PR? It's possible to emulate with merging into branches, but then you'd have to do reviews on my local fork, since I can't create branches in this repository.

If not, the simplest way would probably be do just breakup commits within this PR. Would you be open to that?

Sadly it doesn't.

I'm okay with splitting up in this PR but I think that increases the risk of me (and other maintainers) finding it hard to approve of the full PR and merge it.

Separate PRs make it much easier to spend 15-20 minutes during lunch to review and merge the smaller commits. From experience I know although it involves more rebase work in the end it significantly increases the chances of astroid PRs landing on main.

Edit: I wanted to add that I'm mainly saying this because I really think the changes in this PR are super valuable and should be merged. I'd hate for this to die down due to maintainer fatigue and want to do anything to prevent that from happening :)

@temyurchenko
Copy link
Contributor Author

I'm okay with splitting up in this PR but I think that increases the risk of me (and other maintainers) finding it hard to approve of the full PR and merge it.

Sure, I'll do my best to achieve that.

@temyurchenko
Copy link
Contributor Author

I've made the first batch of PRs that cover a big part of this one. One thing is, #2557 is better reviewed the last.

@Pierre-Sassoulas
Copy link
Member

The secret to merge PR with volunteer maintainers has been told. I have multiple 5mn free windows on mobile during the days. I'll hit that merge button when my rust code is compiling, when the pasta's water is heating, when I'm waiting for the train, or when a butterfly distracted me and I checked the notification on my phone. I have some 10mn/20mn windows during the week. But one hour of uninterrupted concentration time on PC ? I have to plan ahead for that, it's not going to happens (not in a reasonable time at least)

@temyurchenko
Copy link
Contributor Author

The secret to merge PR with volunteer maintainers has been told. I have multiple 5mn free windows on mobile during the days. I'll hit that merge button when my rust code is compiling, when the pasta's water is heating, when I'm waiting for the train, or when a butterfly distracted me and I checked the notification on my phone. I have some 10mn/20mn windows during the week. But one hour of uninterrupted concentration time on PC ? I have to plan ahead for that, it's not going to happens (not in a reasonable time at least)

I did my best to split off the PR into smaller ones, as you can see in the comment above. They currently don't cover the whole of this PR, but I will finish the job some time later.

@DanielNoord
Copy link
Collaborator

I think I went through all PRs and merged the first one that should solve most rebase issues. Let me know when you rebased the others or need me to do another review!

@temyurchenko
Copy link
Contributor Author

I think I went through all PRs and merged the first one that should solve most rebase issues. Let me know when you rebased the others or need me to do another review!

Thank you for the reviews!

@DanielNoord
Copy link
Collaborator

To give an update: changes coming a long nicely. @temyurchenko has been very helpful in rebasing and responding to comments.

#2563 is on my radar, but after a long week at work I don't really have the capacity to have a good look at it. I hope to do so this weekend or on Monday.

- using "tuple" ClassDef for a base of 'namedtuple' instead of a
   Name. We're already doing it for "enum"s, and I don't know how to
   ensure that the Name actually refers to the actual tuple, and not
   something shadowing it in the scope. Removed the test that asserted
   that the base is a Name and not a ClassDef. If it is actually
   useful, it should be checked and reworked comprehensively across
   all nodes (cf. Enum).

it's a part of the campaign to get rid of non-module roots
We also remove `add_local_node` to avoid redundancy. Instead we do the
   attachment to the parent scope in the constructor of `FunctionDef`.

We append a node to the body of the frame when it is also the
   parent. If it's not a parent, then the node should belong to the
   "body" of the parent if it existed. An example is a definition
   within an "if", where the parent is the If node, but the frame is
   the whole module.

it's a part of the campaign to get rid of non-module roots
Instances often have «special_attributes». ClassDef does a lot of
   transformations of attibutes during inference, so let's reuse them
   in Instance.

We are fixing inference of the "special" attributes of
   Instances. Before these changes, the FunctionDef attributes
   wouldn't get wrapped into a BoundMethod. This was facilitated by
   extracting the logic of inferring attributes into
   'FunctionDef._infer_attrs' and reusing it in BaseInstance.

  This issue isn't visible right now, because the special attributes
   are simply not found due not being attached to the parent (the
   instance). Which in turn is caused by not providing the actual
   parent in the constructor. Since we are on a quest to always
  provide a parent, this change is necessary.
that involved several changes

- creating an "adhoc" module, specifically for nodes that are not
   based in the real syntactic tree, but created "ad-hoc". See
   "__astroid_adhoc" and 'AstroidManager().adhoc_module'.

- eliminating all "Unknown" nodes as parents. They are not
   'Module's. Most of the time, it is sufficient to either specify the
   actual parent in the constructor or the adhoc module from above.

- fixing construction of in-place properties (`bar =
   property(getter)`). They just create a nameless object, not the one
   with the name of the getter. Thus, the name was changed to
   "<property>". Furthermore, the definition of that property is not
   attached to any scope, as it's again nameless.

- using "tuple" ClassDef for a base of 'namedtuple' instead of a
   Name. We're already doing it for "enum"s, and I don't know how to
   ensure that the Name actually refers to the actual tuple, and not
   something shadowing it in the scope. Removed the test that asserted
   that the base is a Name and not a ClassDef. If it is actually
   useful, it should be checked and reworked comprehensively across
   all nodes (cf. Enum).

- appending a node to the body of the frame when it is also the
   parent. If it's not a parent, then the node should belong to the
   "body" of the parent if it existed. An example is a definition
   within an "if", where the parent is the If node, but the frame is
   the whole module. See FunctionDef.__init__.

- fixing inference of the "special" attributes of Instances. Before
   these changes, the FunctionDef attributes wouldn't get wrapped into
   a BoundMethod. This was facilitated by extracting the logic of
   inferring attributes into 'FunctionDef._infer_attrs' and reusing it
   in BaseInstance.

  This issue wasn't visible before, because the special attributes
   were simply not found due not being attached to the parent (the
   instance). Which in turn was caused by not providing the actual
   parent in the constructor.

- enforcing a non-None parent in various builders in raw_building.py

- fixing tests to accomodate for changes

attach classes to their parent; fix bugs uncovered by this

1. some '__doc__' fields of standard library
   symbols (e.g. WrapperDescriptorType.__doc__) don't return a string,
   they return a 'getset_descriptor'. Thus, an attempt to print "as
   string" fails. The solution is to check that __doc__ is an instance
   of str.

2. A "temporary_class" was attached to a function node, when it
   shouldn't have been. The solution is to reattach it to the adhoc
   module.
Sometimes a class accesses a member by a different name than
   "__name__" of that member: in pypy3 `list.__mul__.__name__ ==
   "__rmul__"`.

As a result, in the example above we weren't able to find
   "list.__mul__", because it was recorded only as "list.__rmul__".
The nodes are often created in an ad-hoc way, and their parent is not
   always set. We can't control for that invariant fully in the
   constructor, since the parent is sometimes set outside of the
   constructor. The previous commits did their best to clean up such
   situations, but let's add an assert just in case.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants