Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Regression of 3.13.1 with iterator creation being duplicated #127682

Open
kayhayen opened this issue Dec 6, 2024 · 7 comments
Open

Regression of 3.13.1 with iterator creation being duplicated #127682

kayhayen opened this issue Dec 6, 2024 · 7 comments
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error

Comments

@kayhayen
Copy link

kayhayen commented Dec 6, 2024

Bug report

Bug description:

For my Python compiler Nuitka, I use CPython as the oracle of what the correct behaviour is. I am running some tests that I used to clarify the behavior from decades ago, in this case I wanted to know when exactly the iterator creation is used. I striped this test a bunch, so the regression is still visible. I first noticed the issue on GitHub Actions, where 3.13.0 got replaced with 3.13.1 for Windows and Linux, but it applies to all OSes. See below for a diff, that the same iterator is created multiple times.

""" Generator expression tests

"""

from __future__ import print_function

import inspect

print("Generator expression that demonstrates the timing:")


def iteratorCreationTiming():
    def getIterable(x):
        print("Getting iterable", x)
        return Iterable(x)

    class Iterable:
        def __init__(self, x):
            self.x = x  # pylint: disable=invalid-name
            self.values = list(range(x))
            self.count = 0

        def __iter__(self):
            print("Giving iterator now", self.x)

            return self

        def __next__(self):
            print("Next of", self.x, "is", self.count)

            if len(self.values) > self.count:
                self.count += 1

                return self.values[self.count - 1]
            else:
                print("Raising StopIteration for", self.x)

                raise StopIteration

        # Python2/3 compatibility.
        next = __next__

        def __del__(self):
            print("Deleting", self.x)

    gen = ((y, z) for y in getIterable(3) for z in getIterable(2))

    print("next value is", next(gen))
    res = tuple(gen)
    print("remaining generator is", res)

    try:
        next(gen)
    except StopIteration:
        print("Usage past end gave StopIteration exception as expected.")

        try:
            print("Generator state then is", inspect.getgeneratorstate(gen))
        except AttributeError:
            pass

        print("Its frame is now", gen.gi_frame)

    print("Early aborting generator:")

    gen2 = ((y, z) for y in getIterable(3) for z in getIterable(2))
    del gen2

iteratorCreationTiming()

The unified diff between 3.13.0 output (and basically all Python versions before) and 3.13.1 output.


--- out-3.13.0.txt      2024-12-06 12:37:19.447115100 +0100
+++ out-3.13.1.txt      2024-12-06 12:37:23.452239500 +0100
@@ -1,9 +1,11 @@
 Generator expression that demonstrates the timing:
 Getting iterable 3
 Giving iterator now 3
+Giving iterator now 3
 Next of 3 is 0
 Getting iterable 2
 Giving iterator now 2
+Giving iterator now 2
 Next of 2 is 0
 next value is (0, 0)
 Next of 2 is 1
@@ -13,6 +15,7 @@
 Next of 3 is 1
 Getting iterable 2
 Giving iterator now 2
+Giving iterator now 2
 Next of 2 is 0
 Next of 2 is 1
 Next of 2 is 2
@@ -21,6 +24,7 @@
 Next of 3 is 2
 Getting iterable 2
 Giving iterator now 2
+Giving iterator now 2
 Next of 2 is 0

The duplicated prints out the iterator creation are new. This is not optimal and new. I don't know if the iterator being through a slot cause cause this or what it is. I checked if generator.c changed but I think it didn't at all.

My self compiled Python 3.13.1 for Linux and the official Windows download agree in behaviour.

CPython versions tested on:

3.13

Operating systems tested on:

Linux, Windows

@kayhayen kayhayen added the type-bug An unexpected behavior, bug, or error label Dec 6, 2024
@brianschubert
Copy link
Contributor

Bisected to bcc7227, backport of #125178

cc @efimov-mikhail @JelleZijlstra @markshannon

@JelleZijlstra
Copy link
Member

This is an expected result from the change @brianschubert linked, which fixes a crash. Our belief was that __iter__ is supposed to return self on iterators, so it's safe to call it multiple times.

@kayhayen
Copy link
Author

kayhayen commented Dec 6, 2024

It does return self though in my concrete code here. My goal was to see what generator expressions do when to be 100% compatible.

        def __iter__(self):
            print("Giving iterator now", self.x)

            return self

But nobody ever promised to not have a side effect in there, and calling it twice is strange, which one of the two gets used for what there? Right now I cannot distinguish the two.

As for real-life code relevance, I am more than willing to admit this is garbage code. I just did a bunch of prints in iterator functions to see when they get called, and now they're getting called more often. Would it have to be a performance regression to make two calls?

@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Dec 6, 2024
@efimov-mikhail
Copy link
Contributor

efimov-mikhail commented Dec 6, 2024

Actually, there is a PR for main branch related to this issue:
#126408
It it will be accepted, then multiple calls of __iter__ be removed.

But on 3.13 current behavior is not a bug.

@markshannon
Copy link
Member

markshannon commented Dec 6, 2024

The problem is specific to generator expressions which have acted in a subtly different way from generators that (almost) no one noticed.

For a generator, any iteration occurs once the generator is executed. For generator expressions creating an iterator from the iterable happened when the generator expression is created. However there was no check that the iteration variable held an iterator when the generator expression executed, which could lead to a crash in exceptional circumstances.

This can be seen by disassembling this function:

def f(seq):
    return (x for x in seq)

Up to 3.13

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
       L2:     FOR_ITER                 6 (to L3)       # This is unsafe if .0 contains a non-iterator
               ...

Now:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

What I would like:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

I think we should make generator expressions behave exactly like generators. It seems surprising that they would not.

@hroncok
Copy link
Contributor

hroncok commented Dec 6, 2024

FWIW we have discovered a segfault in libdnf because of this. The problem pre-existed in libdnf but was never triggered because nobody ever did iter(iter(...)) there.

https://bugzilla.redhat.com/2330562 rpm-software-management/libdnf#1682

@hroncok
Copy link
Contributor

hroncok commented Dec 11, 2024

And another one: https://bugzilla.redhat.com/2331665 rpm-software-management/libcomps#116

I realize both of those use cases did it wrong, but this Python's new behavior tends to uncover problems that would otherwise never bite anybody.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

No branches or pull requests

7 participants