Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Python 3.12 - ValueError: generator already executing #9138

Open
sivel opened this issue Oct 10, 2023 · 23 comments · Fixed by #9454
Open

Python 3.12 - ValueError: generator already executing #9138

sivel opened this issue Oct 10, 2023 · 23 comments · Fixed by #9454
Labels
Astroid Related to astroid Crash 💥 A bug that makes pylint crash Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning python 3.12 Upstream Bug 🪲 Bug in a dependency of pylint that is not astroid
Milestone

Comments

@sivel
Copy link

sivel commented Oct 10, 2023

Bug description

It seems that with a combination of a commit to astroid, and a change in Python 3.12, I am getting a ValueError: generator already executing.

Unfortunately, despite about 12 hours of trying, I don't have a simple reproducer, and this will involve running pylint against https://github.com/ansible/ansible

I've logged here instead of directly against astroid, as I was not able to find an easy way to cause the error with astroid alone.

Configuration

N/A

Command used

git clone git@github.com:ansible/ansible.git
cd ansible
python3.12 -m pylint lib/ansible/playbook/base.py > /dev/null

Pylint output

Traceback (most recent call last):
  File ".../astroid/astroid/decorators.py", line 104, in inner
    yield from generator
ValueError: generator already executing

If I insert additional debugging into astroid, to use traceback.print_stack() when that ValueError happens, I see something like:

Traceback...

  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File ".../lib/python3.12/site-packages/pylint/__main__.py", line 10, in <module>
    pylint.run_pylint()
  File ".../lib/python3.12/site-packages/pylint/__init__.py", line 34, in run_pylint
    PylintRun(argv or sys.argv[1:])
  File ".../lib/python3.12/site-packages/pylint/lint/run.py", line 211, in __init__
    linter.check(args)
  File ".../lib/python3.12/site-packages/pylint/lint/pylinter.py", line 704, in check
    self._lint_files(ast_per_fileitem, check_astroid_module)
  File ".../lib/python3.12/site-packages/pylint/lint/pylinter.py", line 752, in _lint_files
    self._lint_file(fileitem, module, check_astroid_module)
  File ".../lib/python3.12/site-packages/pylint/lint/pylinter.py", line 788, in _lint_file
    check_astroid_module(module)
  File ".../lib/python3.12/site-packages/pylint/lint/pylinter.py", line 1017, in check_astroid_module
    retval = self._check_astroid_module(
  File ".../lib/python3.12/site-packages/pylint/lint/pylinter.py", line 1069, in _check_astroid_module
    walker.walk(node)
  File ".../lib/python3.12/site-packages/pylint/utils/ast_walker.py", line 94, in walk
    self.walk(child)
  File ".../lib/python3.12/site-packages/pylint/utils/ast_walker.py", line 94, in walk
    self.walk(child)
  File ".../lib/python3.12/site-packages/pylint/utils/ast_walker.py", line 94, in walk
    self.walk(child)
  File ".../lib/python3.12/site-packages/pylint/utils/ast_walker.py", line 91, in walk
    callback(astroid)
  File ".../lib/python3.12/site-packages/pylint/checkers/typecheck.py", line 1440, in visit_call
    called, implicit_args, callable_name = _determine_callable(called)
  File ".../lib/python3.12/site-packages/pylint/checkers/typecheck.py", line 636, in _determine_callable
    new = callable_obj.local_attr("__new__")[-1]
  File ".../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 2327, in local_attr
    class_node = next(self.local_attr_ancestors(name, context), None)
  File ".../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 2279, in local_attr_ancestors
    ancestors: Iterable[ClassDef] = self.mro(context)[1:]
  File ".../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 2924, in mro
    return self._compute_mro(context=context)
  File ".../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 2893, in _compute_mro
    inferred_bases = list(self._inferred_bases(context=context))
  File ".../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 2873, in _inferred_bases
    baseobj = next(
  File ".../astroid/astroid/decorators.py", line 56, in wrapped
    yield res
  File ".../astroid/astroid/bases.py", line 181, in _infer_stmts
    yield inf
  File ".../astroid/astroid/nodes/node_ng.py", line 175, in infer
    yield result
  File ".../astroid/astroid/decorators.py", line 56, in wrapped
    yield res
  File ".../astroid/astroid/bases.py", line 181, in _infer_stmts
    yield inf
  File ".../astroid/astroid/nodes/node_ng.py", line 175, in infer
    yield result
  File ".../astroid/astroid/decorators.py", line 56, in wrapped
    yield res
  File ".../astroid/astroid/bases.py", line 181, in _infer_stmts
    yield inf
  File ".../astroid/astroid/nodes/node_ng.py", line 175, in infer
    yield result
  File ".../astroid/astroid/decorators.py", line 56, in wrapped
    yield res
  File ".../astroid/astroid/nodes/node_ng.py", line 175, in infer
    yield result
  File ".../astroid/astroid/decorators.py", line 107, in inner
    traceback.print_stack(file=sys.stderr)

Expected behavior

No traceback

Pylint version

astroid 3.0.0
pylint 3.0.1
Python 3.12.0

OS / Environment

N/A

Additional dependencies

N/A
@sivel sivel added the Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling label Oct 10, 2023
@sivel
Copy link
Author

sivel commented Oct 10, 2023

Based on my very non-expert research, it seems that due to using cls.is_subclass_of('enum.Enum') has caused .ancestors() to be called and .infer(), and that is somehow interfering with the ability to compute the mro in _compute_mro which also calls .infer(). As to exactly how, I'm at a little bit of a loss.

As to how exactly the cpython commit that I bisected earlier plays into this, I have no clue.

@jacobtylerwalls
Copy link
Member

Thanks for the report. I've seen this from time to time. Do you have any sense of the severity of this issue?

@sivel
Copy link
Author

sivel commented Oct 11, 2023

This is preventing us from upgrading to pylint 3, as well as switching our default python version to py3.12. This not only affects the CI for ansible/ansible, but content authors using our ansible-test tooling.

It will also cause issues with the ability for Fedora (or other distros switching to python3.12 as their default) to package ansible, due to Fedora 39 needing to include pylint 3 as a result of their inclusion of Python 3.12. As such, ansible-test installed from their packages will be partially unusable.

@Pierre-Sassoulas Pierre-Sassoulas added High priority Issue with more than 10 reactions python 3.12 Crash 💥 A bug that makes pylint crash and removed Needs triage 📥 Just created, needs acknowledgment, triage, and proper labelling labels Oct 11, 2023
@jacobtylerwalls
Copy link
Member

Thanks for the extra details. I was wondering if the error was thrown only in object finalization after pylint's main program finishes. Obviously we need to clean it up either way, but this sounds more serious than the error I'd seen from time to time before.

@sivel
Copy link
Author

sivel commented Oct 11, 2023

The exception output is interspersed in the normal output from pylint. So it doesn't seem necessarily to happen at the very end. I'm unsure exactly the internal timing of things, it could be in object finalization, but it doesn't seem to be after the main program finishes.

[SNIP]
lib/ansible/playbook/base.py:435:12: W0201: Attribute '_ds' defined outside __init__ (attribute-defined-outside-init)
lib/ansible/playbook/base.py:716:0: C0115: Missing class docstring (missing-class-docstring)
Traceback (most recent call last):
  File "/Users/sivel/projects/ansibledev/astroid/astroid/decorators.py", line 104, in inner
    yield from generator
ValueError: generator already executing
lib/ansible/playbook/base.py:739:58: E1101: Module 'ansible.constants' has no 'ANY_ERRORS_FATAL' member (no-member)
lib/ansible/playbook/base.py:741:48: E1101: Module 'ansible.constants' has no 'TASK_TIMEOUT' member (no-member)
[SNIP]

@nitzmahone
Copy link

nitzmahone commented Oct 11, 2023

I mean the "easy" solution is just to override the default sys.unraisablehook behavior (since that's where it's coming from), but I've been trying to figure out why some of the decorated generators appear to be getting abandoned in the first place, and what's different in Python 3.12 that's causing them to move when they get finalized- I have a couple of ideas, but the whole thing is a bit of a Heisenbug, since the same repro doesn't manifest under a debugger, and most of the instrumentation code I'm able to add to astroid also prevents the repro from working.

Python 3.12 appears to be much more aggressive about finalizing the (apparently) abandoned wrapped generators- I was able to proxy the ones that are involved in our repro to see when they're finalized (sadly weakref also prevents our repro from working), and running that output side-by-side with 3.11, it's clear that 3.12 finalizes them much earlier than 3.11 does. I've found dozens of ways to "fix" the problem (ie "make our repro stop working"), but without understanding the root cause, I'm not at all convinced we're not just moving the problem around, and that the root cause is really another subtle bug unique to Python 3.12.

I was wondering if the error was thrown only in object finalization after pylint's main program finishes.

They aren't- it's happening inline, and much earlier with 3.12 than 3.11. I've also tried having the proxied generators capture the callstack on init, so I can examine the ones that appear to be leaked later to figure out who created them- python -X dev is helpful in deferring their finalization to exit, but unfortunately the added memory overhead also "fixes" our reliable repro. The abandoned ones do still appear though, they just don't blow up in finalization (which adds to my suspicion that this is really a subtle Python 3.12 regression of some kind, so I don't want to just propose a simple fix that masks the real problem).

@nitzmahone
Copy link

(we're also going to try a throwaway plugin that disables the exception printing in sys.unraisablehook, just to unblock our CI from using pylint 3.0.1 under 3.12, but I still think there's something wiggling under this rock that needs killing 😉 )

@nickdrozd
Copy link
Contributor

I don't know if it's of any use, but here's a whittled-down version of that base.py file that reproduces the error:

from ansible.plugins.loader import action_loader

class FieldAttributeBase:
    def _load_module_defaults(self):
        resolved_action = self._resolve_action()
        if resolved_action:
            validated_defaults_dict[resolved_action] = defaults

        if True:
            resolved_action = self._resolve_action()
            if resolved_action:
                validated_defaults_dict[resolved_action] = defaults

    def _resolve_action(self):
        prefer = action_loader.find_plugin_with_context()

        if prefer.resolved:
            return prefer.resolved_fqcn

@nitzmahone
Copy link

nitzmahone commented Oct 11, 2023

Yeah, at least improves the signal/noise ratio while beating on it- thanks!

@nickdrozd
Copy link
Contributor

Here is an even smaller reproducer. It doesn't import anything else from the library, but it still has to be linted from inside the library (replacing base.py) for the ValueError to surface.

I think an inference around importlib is ultimately the source of the problem.

# pylint: disable = line-too-long, missing-function-docstring, missing-module-docstring, missing-class-docstring, too-few-public-methods

# pylint: disable = no-member, using-constant-test, protected-access, attribute-defined-outside-init, undefined-variable

from importlib import import_module


class PluginLoadContext:
    @property
    def resolved_fqcn(self):
        if True:
            return None

        return self._resolved_fqcn


class PluginLoader:
    def find_plugin_with_context(self):
        _ = import_module('asdf').__file__

        if self._load_module_source():
            self._module_cache[None] = None

        self._load_config_defs()

        self._display_plugin_load()

        if self._plugin_instance_cache:
            self._plugin_instance_cache[None] = self._plugin_instance_cache[None]

        self._load_config_defs()

        if self.undefined:
            __import__()

        PluginLoadContext()._resolved_fqcn = undefined

        resolved_action = self._resolve_action()

        if resolved_action:
            {}[resolved_action] = None

        return PluginLoadContext()

    def _resolve_action(self):
        prefer = PluginLoader().find_plugin_with_context()

        if prefer.resolved:
            return prefer.resolved_fqcn

        return None

After all the disables, Pylint gives this 10/10, even with the ValueError.

@nitzmahone
Copy link

nitzmahone commented Oct 12, 2023

We've updated our CI to use a (hopefully temporary) plugin that installs a custom sys.unraisablehook on 3.12+ to mask this specific issue (while delegating anything else to the previous unraisablehook).

After spending the majority of the past several days digging pretty deeply into this (including hacking a bunch of instrumentation into CPython's genobject.c), I still think it's ultimately a 3.12 regression around generator finalization/close when a yield from is involved. Even with the minimized repro that @nickdrozd posted yesterday (thanks again!), it still requires evaluation under the context of the Ansible codebase to manifest the issue, so the resultant wad of subgenerators is quite deep and difficult to navigate (the "needle in a stack of needles" problem, since there are so many of them involved in the run, most of which behave normally and are very short-lived). There's also an awful lot of caching involved in astroid- disabling nearly any of it "fixes" the repro, so it might also be possible that a generator instance is sneaking into one of those caches, or somehow being aliased in a way that causes the problem, but it seems like that would also manifest under < 3.12.

Once I hacked in enough instrumentation to be able to roughly correlate the C-side generator wakeups to their associated Python frames, it appears that the issue is around a re-entrant C-side wakeup that's kicked off when a particular abandoned Attribute._infer generator (decorated with path_wrapper and raise_if_nothing_inferred) gets finalized. With this repro, the raise_if_nothing_inferred layer sits at its second-stage yield from when finalized, which has special handling in gen_close- it directly marks that generator as executing, then calls the gen_close_iter helper to close down the subgenerator. That kicks off a very deep chain of closes and finalizations, one of which ultimately tries to do a re-entrant gen_close on the same generator instance from the top of the chain. Since the finalization/close frame eval stuff is deeply recursive, the generator already executing ValueError is indeed correct. The thing I've been struggling to figure out is if this is a bug in:

  • the GC/finalizer - any change that even slightly extends that generator's life "fixes" the problem
  • Python's generator close/wind-down code - the possibility of a re-entrant close doesn't appear to have been accounted for in the multiple places that mark the generator FRAME_EXECUTING
  • new 3.12 generator optimizations (eg FOR_ITER_GEN; possibly a subtle refcount issue that's making the problematic generator eligible for finalization before its enclosing generator layers are)
  • astroid - with the many layers of caching and decorator magic happening, it's not out of the realm of possibility that a generator reference is being shared in a way that causes finalization issues that were only exposed by behavior changes in 3.12

Anyway, I need to get back to other work, but I might come back to it at some point- I just can't squash the feeling that there's a serious issue lurking here...

@jacobtylerwalls
Copy link
Member

Thanks for the deep dive.

There's also an awful lot of caching involved in astroid- disabling nearly any of it "fixes" the repro

I'm willing to try this, even if there's a python 3.12 issue that could be reported upstream. Specifically I think we can revert pylint-dev/astroid@0d4f73d. I'll see what the pylint primer tool says about that.

@jacobtylerwalls jacobtylerwalls self-assigned this Oct 14, 2023
@jacobtylerwalls jacobtylerwalls removed their assignment Oct 15, 2023
@jacobtylerwalls jacobtylerwalls added Astroid Related to astroid Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning and removed High priority Issue with more than 10 reactions labels Dec 26, 2023
@matejsp
Copy link

matejsp commented Jan 4, 2024

We also got hit by this issue. Really annoying. We introduced another plugin for fixing this issue based on:
https://github.com/ansible/ansible/pull/81953/files#diff-6f76ad123895780c325b531bbaf0f55837bd3c6f10838b9c164ca67247179982

Is there any chance to somehow fix this? Maybe even hacking with the solution provided by Ansible. Pylint is just unstable and useless out of the box for us with latest python.

@jacobtylerwalls jacobtylerwalls added the Upstream Bug 🪲 Bug in a dependency of pylint that is not astroid label Jan 4, 2024
@cdce8p
Copy link
Member

cdce8p commented Feb 20, 2024

Spend some time bisecting cpython today to figure out when the error first appeared. This is what I've got so far

I haven't had time yet to look at these PRs in more detail. Wanted to post this here in case someone might be interested.
For testing I used the example from @nickdrozd above: #9138 (comment) and modified the current astroid master slightly. In rebuilder.py, I added PY312_PLUS = False after the imports.

This was the full error between a2 and b1

Exception on node <Call l.21 at 0xffffadfd9910> in file '/usr/src/test.py'
Traceback (most recent call last):
  File "/usr/astroid/astroid/decorators.py", line 90, in inner
    yield next(generator)
          ^^^^^^^^^^^^^^^
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/pylint/utils/ast_walker.py", line 91, in walk
    callback(astroid)
  File "/usr/src/pylint/checkers/base/basic_checker.py", line 708, in visit_call
    if utils.is_terminating_func(node):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/pylint/checkers/utils.py", line 2165, in is_terminating_func
    return True
           ^^^^
  File "/usr/astroid/astroid/nodes/node_ng.py", line 170, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
  File "/usr/astroid/astroid/decorators.py", line 95, in inner
    raise InferenceError(
astroid.exceptions.InferenceError: StopIteration raised without any error information.

@cdce8p
Copy link
Member

cdce8p commented Feb 21, 2024

Just for fun I checked the 3.13 alpha releases today. They are fine!
The issue seems to be fixed after this PR was merged. It's already included in 3.13.0a1.

That all leads me to believe that this might actually be a cpython issue in 3.12, not something else. It would just be awesome to have a self contained reproducer for it that doesn't depend on pylint / astroid. This would be really helpful when opening an issue.

Maybe it does make sense to add the pylint plugin @matejsp suggested to overwrite sys.unraisablehook.

@nitzmahone
Copy link

Thanks for bisecting the CPython builds to track down possible sources- I hadn't had time to try that... Just at a glance, I'd be a little surprised if the dict materialization change actually fixed whatever the problem was, since it doesn't appear to be related to the generator bits at all. It's more likely that pylint/astroid are positively affected by that 3.13 change in other ways, so the timing/circumstances improve just enough that the known reproducers go away (just as they do when other changes around caching or whatever are made). I also wish there was a way to come up with a simpler standalone repro, but I've thus far failed to induce the problem outside of pylint/astroid on Ansible's codebase. 😞

@cdce8p
Copy link
Member

cdce8p commented Feb 22, 2024

Just at a glance, I'd be a little surprised if the dict materialization change actually fixed whatever the problem was, since it doesn't appear to be related to the generator bits at all. It's more likely that pylint/astroid are positively affected by that 3.13 change in other ways, so the timing/circumstances improve just enough that the known reproducers go away (just as they do when other changes around caching or whatever are made).

Seems you're right. I just tested pylint with 3.13 alpha version on a larger code base, Home Assistant. Just without any other dependencies as some aren't compatible yet.

On 3.12.2 I would see anywhere from 1 to 3 (with dependencies) ValueErrors. Those actually increase to 3 (without) for 3.13.0a1. The materialization change I mentioned above is included there, so it doesn't do much overall.

For 3.13.0a2 till 3.13.0a4 I don't see any errors though. Of course it would be great if it would be possible to test with all dependencies to see if they are gone for good, but we'll have to wait a few months for that.

@nitzmahone I would be interested if you can reproduce that for Ansible. If that's the case there might still be a commit in between a1 and a2 that does fix it. It would then just be a question to bisect it again.

@cdce8p
Copy link
Member

cdce8p commented Feb 22, 2024

Did some more bisecting today. At least for Home Assistant the change that does appear to fix all ValueErrors seems to be

Also tested cherry-picking it onto 3.12 which did also work.

@iritkatriel
Copy link

Did some more bisecting today. At least for Home Assistant the change that does appear to fix all ValueErrors seems to be

Also tested cherry-picking it onto 3.12 which did also work.

I've now backported it to 3.12.

@cdce8p
Copy link
Member

cdce8p commented Feb 22, 2024

Thanks a lot @iritkatriel! I've ran some more tests and the issue seems to be truly gone now.

@nitzmahone
Copy link

@nitzmahone I would be interested if you can reproduce that for Ansible.

On the plus side, no issues running on 3.13.0a4 with Ansible's codebase (and our sys.unraiseablehook masker disabled), but owing to the underlying issue's fragile Heisenbug nature, I'm also no longer seeing the issue under 3.12.1, either... 😐

Cautiously optimistic that the CPython upstream fix takes care of this one everywhere- once 3.12.3+ is widely available, we'll kill off our masking plugin and hope that's the end of it.

Thanks everyone!

@cdce8p
Copy link
Member

cdce8p commented Feb 22, 2024

Thanks @nitzmahone 👍🏻

I went ahead and opened #9454 to include the sys.unraisablehook overwrite in the pylint code base. It's scoped just for the affected Python versions.

@jacobtylerwalls
Copy link
Member

jacobtylerwalls commented Aug 17, 2024

This is an issue as of Python 3.12.5 again after python/cpython#120467. See logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Astroid Related to astroid Crash 💥 A bug that makes pylint crash Needs investigation 🔬 A bug or crash where it's not immediately obvious what is happenning python 3.12 Upstream Bug 🪲 Bug in a dependency of pylint that is not astroid
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants