Regression of 3.13.1 with iterator creation being duplicated #127682

kayhayen · 2024-12-06T11:45:20Z

Bug report

Bug description:

For my Python compiler Nuitka, I use CPython as the oracle of what the correct behaviour is. I am running some tests from decades ago that I used to clarify that behaviour; in this case I wanted to know exactly when the iterator creation happens. I stripped this test down a bunch, while keeping the regression visible. I first noticed the issue on GitHub Actions, where 3.13.0 got replaced with 3.13.1 for Windows and Linux, but it applies to all OSes. See below for a diff, which shows that the same iterator is created multiple times.

""" Generator expression tests

"""

from __future__ import print_function

import inspect

print("Generator expression that demonstrates the timing:")


def iteratorCreationTiming():
    def getIterable(x):
        print("Getting iterable", x)
        return Iterable(x)

    class Iterable:
        def __init__(self, x):
            self.x = x  # pylint: disable=invalid-name
            self.values = list(range(x))
            self.count = 0

        def __iter__(self):
            print("Giving iterator now", self.x)

            return self

        def __next__(self):
            print("Next of", self.x, "is", self.count)

            if len(self.values) > self.count:
                self.count += 1

                return self.values[self.count - 1]
            else:
                print("Raising StopIteration for", self.x)

                raise StopIteration

        # Python2/3 compatibility.
        next = __next__

        def __del__(self):
            print("Deleting", self.x)

    gen = ((y, z) for y in getIterable(3) for z in getIterable(2))

    print("next value is", next(gen))
    res = tuple(gen)
    print("remaining generator is", res)

    try:
        next(gen)
    except StopIteration:
        print("Usage past end gave StopIteration exception as expected.")

        try:
            print("Generator state then is", inspect.getgeneratorstate(gen))
        except AttributeError:
            pass

        print("Its frame is now", gen.gi_frame)

    print("Early aborting generator:")

    gen2 = ((y, z) for y in getIterable(3) for z in getIterable(2))
    del gen2

iteratorCreationTiming()

The unified diff between 3.13.0 output (and basically all Python versions before) and 3.13.1 output.


--- out-3.13.0.txt      2024-12-06 12:37:19.447115100 +0100
+++ out-3.13.1.txt      2024-12-06 12:37:23.452239500 +0100
@@ -1,9 +1,11 @@
 Generator expression that demonstrates the timing:
 Getting iterable 3
 Giving iterator now 3
+Giving iterator now 3
 Next of 3 is 0
 Getting iterable 2
 Giving iterator now 2
+Giving iterator now 2
 Next of 2 is 0
 next value is (0, 0)
 Next of 2 is 1
@@ -13,6 +15,7 @@
 Next of 3 is 1
 Getting iterable 2
 Giving iterator now 2
+Giving iterator now 2
 Next of 2 is 0
 Next of 2 is 1
 Next of 2 is 2
@@ -21,6 +24,7 @@
 Next of 3 is 2
 Getting iterable 2
 Giving iterator now 2
+Giving iterator now 2
 Next of 2 is 0

The duplicated prints of the iterator creation are new, and this is not optimal. I don't know if the iterator going through a slot causes this or what it is. I checked whether generator.c changed, but I think it didn't at all.

My self-compiled Python 3.13.1 for Linux and the official Windows download agree in behaviour.
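A much smaller check of the same effect might look like the sketch below. `CountingIterator` is a hypothetical helper that is not part of the original test; it only counts how often `__iter__` is called while a generator expression consumes it. Going by the diff above, the count would be expected to differ between 3.13.0 and 3.13.1, so no fixed number is claimed here.

```python
class CountingIterator:
    """Hypothetical helper: an iterator that counts __iter__ calls on itself."""

    def __init__(self, n):
        self._values = iter(range(n))
        self.iter_calls = 0

    def __iter__(self):
        # Side effect only; the object still returns itself as required
        # of an iterator.
        self.iter_calls += 1
        return self

    def __next__(self):
        # Delegate to a range iterator; StopIteration propagates naturally.
        return next(self._values)


source = CountingIterator(3)
items = list(x for x in source)

print("items:", items)
# The number printed here depends on the interpreter version.
print("__iter__ calls:", source.iter_calls)
```

Running it under two interpreters side by side shows whether the extra call is present, without the rest of the test output.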

CPython versions tested on:

3.13

Operating systems tested on:

Linux, Windows


brianschubert · 2024-12-06T14:30:10Z

Bisected to bcc7227, backport of #125178

cc @efimov-mikhail @JelleZijlstra @markshannon

JelleZijlstra · 2024-12-06T14:45:13Z

This is an expected result from the change @brianschubert linked, which fixes a crash. Our belief was that __iter__ is supposed to return self on iterators, so it's safe to call it multiple times.
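The assumption mentioned here can be illustrated with built-in types. This is only a sketch of the iterator protocol as documented, using a plain list; it says nothing about third-party iterators that do not follow it.

```python
numbers = [10, 20, 30]

# An iterable that is not itself an iterator hands out a new iterator
# for each iter() call.
first = iter(numbers)
second = iter(numbers)
assert first is not second

# An iterator returns itself from __iter__, so repeated iter() calls
# are harmless for well-behaved iterators.
assert iter(first) is first
assert iter(iter(first)) is first

# Progress is preserved across the extra iter() call.
next(first)
remaining = list(iter(first))
print(remaining)
```

An iterator whose `__iter__` returns something other than `self`, or has side effects, falls outside this assumption.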

kayhayen · 2024-12-06T15:19:53Z

It does return self, though, in my concrete code here. My goal was to see what generator expressions do, and when, in order to be 100% compatible.

        def __iter__(self):
            print("Giving iterator now", self.x)

            return self

But nobody ever promised that there would be no side effect in there, and calling it twice is strange: which one of the two results gets used for what? Right now I cannot distinguish the two.

As for real-life code relevance, I am more than willing to admit this is garbage code. I just did a bunch of prints in iterator functions to see when they get called, and now they're getting called more often. Would it have to be a performance regression to make two calls?

efimov-mikhail · 2024-12-06T15:50:13Z

Actually, there is a PR for main branch related to this issue:
#126408
If it is accepted, the multiple calls of __iter__ will be removed.

But on 3.13 the current behavior is not a bug.

markshannon · 2024-12-06T16:20:01Z

The problem is specific to generator expressions, which have acted in a subtly different way from generators that (almost) no one noticed.

For a generator, any iteration occurs once the generator is executed. For generator expressions, creating an iterator from the iterable happens when the generator expression is created. However, there was no check that the iteration variable held an iterator when the generator expression executed, which could lead to a crash in exceptional circumstances.
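The timing difference described in this comment can be observed from pure Python. The sketch below uses a hypothetical `Recorder` iterable and a hypothetical `as_generator` function, neither of which appears in the thread. The count for the generator function follows from the language semantics; the count for the generator expression before its first `next()` depends on the interpreter version and is only printed, not asserted.

```python
class Recorder:
    """Hypothetical iterable that logs each __iter__ call."""

    def __init__(self, n, log):
        self.n = n
        self.log = log

    def __iter__(self):
        self.log.append("__iter__")
        return iter(range(self.n))


def as_generator(seq):
    # Generator function: the for loop, and with it iter(seq), only runs
    # once the generator body starts executing.
    for x in seq:
        yield x


gen_log = []
gen = as_generator(Recorder(2, gen_log))
gen_calls_before_start = len(gen_log)  # nothing has been iterated yet
gen_items = list(gen)

expr_log = []
expr = (x for x in Recorder(2, expr_log))
expr_calls_before_start = len(expr_log)  # version dependent
expr_items = list(expr)

print("generator function, __iter__ calls before first next():", gen_calls_before_start)
print("generator expression, __iter__ calls before first next():", expr_calls_before_start)
```

Both forms yield the same items; only the moment at which `__iter__` runs can differ.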

This can be seen by disassembling this function:

def f(seq):
    return (x for x in seq)

Up to 3.13

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
       L2:     FOR_ITER                 6 (to L3)       # This is unsafe if .0 contains a non-iterator
               ...

Now:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              GET_ITER
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...

What I would like:

  2           LOAD_CONST               0 (<code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>)
              MAKE_FUNCTION
              LOAD_FAST                0 (seq)
              CALL                     0
              RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f1ad8c25be0, file "<python-input-9>", line 2>:
   2           RETURN_GENERATOR
               POP_TOP
       L1:     RESUME                   0
               LOAD_FAST                0 (.0)
               GET_ITER
       L2:     FOR_ITER                 6 (to L3)
               ...
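To check which of the three layouts a given interpreter produces, the `GET_ITER` instructions can be counted with the `dis` module. This is a sketch: `count_get_iter` is a hypothetical helper, and the mapping in the comments is read off the listings above rather than verified against every release.

```python
import dis
import types


def f(seq):
    return (x for x in seq)


def count_get_iter(code):
    """Count GET_ITER instructions in one code object (nested code is not scanned)."""
    return sum(1 for ins in dis.get_instructions(code) if ins.opname == "GET_ITER")


# The generator expression's code object is stored among f's constants.
genexpr_code = next(
    const for const in f.__code__.co_consts if isinstance(const, types.CodeType)
)

outer_count = count_get_iter(f.__code__)
inner_count = count_get_iter(genexpr_code)

# Going by the listings above:
#   outer 1, inner 0 -> the "Up to 3.13" layout
#   outer 1, inner 1 -> the "Now" layout
#   outer 0, inner 1 -> the "What I would like" layout
print("GET_ITER in f:", outer_count)
print("GET_ITER in <genexpr>:", inner_count)
```

`dis.dis(f)` prints the full listing for both code objects if the counts alone are not enough.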

I think we should make generator expressions behave exactly like generators. It seems surprising that they would not.

hroncok · 2024-12-06T18:51:46Z

FWIW we have discovered a segfault in libdnf because of this. The problem pre-existed in libdnf but was never triggered because nobody ever did iter(iter(...)) there.

https://bugzilla.redhat.com/2330562 rpm-software-management/libdnf#1682

hroncok · 2024-12-11T13:06:46Z

And another one: https://bugzilla.redhat.com/2331665 rpm-software-management/libcomps#116

I realize both of those use cases did it wrong, but Python's new behavior tends to uncover problems that would otherwise never bite anybody.

kayhayen added the type-bug label Dec 6, 2024

picnixz added the interpreter-core label Dec 6, 2024

nrnhines mentioned this issue Dec 7, 2024

Python 3.13.1 broke [s for s in sl] where sl is a SectionList. neuronsimulator/nrn#3276

Merged

hroncok mentioned this issue Dec 9, 2024

Segfault in Python when an object returns this and multiple Python objects wrap one C++ object swig/swig#3086

Open

sklam mentioned this issue Dec 13, 2024

Py3.13.1 changed bytecode for generator causing list-comp tests to fail numba/numba#9846

Closed
