Generator finalization is slower in 3.11 vs 3.10 #100762

Open
Onlooker2186 opened this issue Jan 5, 2023 · 30 comments

@Onlooker2186

Onlooker2186 commented Jan 5, 2023

I found that all() in Python 3.11.1 is about 30% slower than in Python 3.10.9.

any() also seems to be around 4% slower on my device.

Environment

  • CPython versions tested on: Python 3.10.9 and Python 3.11.1
  • Operating system and architecture: Windows 10 22H2 (Build 19045) 64 bit

Code to test all():

import timeit
from functools import partial
import platform


def find_by_keys(
    keys: list[str],
    table: list[dict[str, str | int]],
    match_data: dict[str, str | int],
) -> dict[str, str | int] | None:
    for item in table:
        if all(item[k] == match_data[k] for k in keys):
            return item
    return None


def main():
    keys: list[str] = ["id", "key_1", "key_2"]
    table: list[dict[str, str | int]] = [
        {
            "id": i,
            "key_1": "val_1",
            "key_2": "val_2",
        }
        for i in range(1, 5001)
    ]
    match_data: dict[str, str | int] = {
        "id": 3000,
        "key_1": "val_1",
        "key_2": "val_2",
    }

    timeit_output = timeit.repeat(
        partial(find_by_keys, keys, table, match_data), repeat=10000, number=1
    )

    average_time = sum(timeit_output) / len(timeit_output)
    best_time = min(timeit_output)
    tps_output = 1 / average_time

    print(f"Python version = {platform.python_version()}")
    print(f"Average time = {average_time}")
    print(f"Best time = {best_time}")
    print(f"Average transactions per second = {tps_output}")


if __name__ == "__main__":
    main()

Results for all()

Console output using Python 3.10.9:

Python version = 3.10.9
Average time = 0.0008256170657486655
Best time = 0.0007106999401003122
Average transactions per second = 1211.2152733824669

Console output using Python 3.11.1:

Python version = 3.11.1
Average time = 0.0011819988898839802
Best time = 0.001033599954098463
Average transactions per second = 846.0244832363215

Code to test any():

import timeit
from functools import partial
import platform


def find_by_keys(
    keys: list[str],
    table: list[dict[str, str | int]],
    match_data: dict[str, str | int],
) -> dict[str, str | int] | None:
    for item in table:
        if any(item[k] == match_data[k] for k in keys):
            return item
    return None


def main():
    keys: list[str] = ["id", "key_1", "key_2"]
    table: list[dict[str, str | int]] = [
        {
            "id": i,
            "key_1": "foo_1",
            "key_2": "foo_2",
        }
        for i in range(1, 5001)
    ]
    match_data: dict[str, str | int] = {
        "id": 3000,
        "key_1": "val_1",
        "key_2": "val_2",
    }

    timeit_output = timeit.repeat(
        partial(find_by_keys, keys, table, match_data), repeat=10000, number=1
    )

    average_time = sum(timeit_output) / len(timeit_output)
    best_time = min(timeit_output)
    tps_output = 1 / average_time

    print(f"Python version = {platform.python_version()}")
    print(f"Average time = {average_time}")
    print(f"Best time = {best_time}")
    print(f"Average transactions per second = {tps_output}")


if __name__ == "__main__":
    main()

Results for any()

Console output using Python 3.10.9:

Python version = 3.10.9
Average time = 0.0009300541202770546
Best time = 0.00090230000205338
Average transactions per second = 1075.206246817238

Console output using Python 3.11.1:

Python version = 3.11.1
Average time = 0.0009645268900785595
Best time = 0.000930099980905652
Average transactions per second = 1036.7777303943815


@AlexWaygood added the performance (Performance or resource usage) label Jan 5, 2023
@eendebakpt
Contributor

eendebakpt commented Jan 5, 2023

I can confirm this issue (comparing 3.10.8 to 3.12.0a3+). A shorter minimal example is:

import timeit
import platform

setup = """
def go():
    item = [1, 2, 3]
    for ii in range(3000):
        all(item[k] == 2 for k in range(3))
"""

cmd = "go()"
timeit_output = timeit.repeat(
    cmd, setup=setup, repeat=500, number=10,
)

average_time = sum(timeit_output) / len(timeit_output)
best_time = min(timeit_output)
tps_output = 1 / average_time

print(f"Python version = {platform.python_version()}")
print(f"Average time = {average_time}")
print(f"Best time = {best_time}")
print(f"Average transactions per second = {tps_output}")

The argument inside all(...) is a generator expression. Replacing it with a list comprehension, all([item[k] == 2 for k in range(3)]), also shows a performance regression, but a much smaller one.
Some more tests show:

        all(k for k in item) # has no regression
        all(k == 2 for k in item) # has a regression

With git bisect I tried to identify the commit that introduced the regression, but I could not pinpoint it (there seem to be multiple regressions, and not all commits build on my system).
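
For reference, the two forms can be compared directly; a minimal timeit sketch (not from the original comment; absolute numbers vary by machine and version):

import timeit

item = [1, 2, 3]

# Generator-expression argument (the slow case reported here).
t_genexp = timeit.timeit("all(item[k] == 2 for k in range(3))", globals=globals())
# List-comprehension argument (much smaller regression).
t_listcomp = timeit.timeit("all([item[k] == 2 for k in range(3)])", globals=globals())

print(f"genexp:   {t_genexp:.3f} s")
print(f"listcomp: {t_listcomp:.3f} s")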

@pochmann
Contributor

pochmann commented Jan 7, 2023

What makes you blame all() instead of the generator and all it involves?

Do you see a difference with all((False,))? That's an iterable likewise starting with False that eliminates almost everything else.

@eendebakpt What times did you get?

@eendebakpt
Contributor

@pochmann I executed the following code:

import pyperf
runner = pyperf.Runner()

for stmt in ['g=(k == 2 for k in item)', 'g=all(k for k in item)', 'g=all(k == 2 for k in item)']:
    time = runner.timeit(name=stmt, stmt=stmt, setup="item=[1,2,3]")

Results 3.10.9:

.....................
g=(k == 2 for k in item): Mean +- std dev: 321 ns +- 10 ns
.....................
g=(k == 2 for k in item); h=list(g): Mean +- std dev: 504 ns +- 7 ns
.....................
g=all(k for k in item): Mean +- std dev: 388 ns +- 9 ns
.....................
all((False,)): Mean +- std dev: 74.1 ns +- 2.3 ns
.....................
g=all(k == 2 for k in item): Mean +- std dev: 434 ns +- 8 ns


Results 3.12.0a3+:

.....................
g=(k == 2 for k in item): Mean +- std dev: 351 ns +- 20 ns
.....................
g=(k == 2 for k in item); h=list(g): Mean +- std dev: 442 ns +- 5 ns
.....................
g=all(k for k in item): Mean +- std dev: 325 ns +- 3 ns
.....................
all((False,)): Mean +- std dev: 45.5 ns +- 1.5 ns
.....................
g=all(k == 2 for k in item): Mean +- std dev: 567 ns +- 7 ns

I am not sure what to conclude from the numbers, but the last statement tested, g=all(k == 2 for k in item), is the regression reported in this issue.

@thatbirdguythatuknownot
Contributor

There doesn't seem to be any difference in the implementation code of all() and any() between 3.10 and main. Maybe the generators are the issue?

@mdboom
Contributor

mdboom commented Jan 9, 2023

Yes, indeed signs point to generators (or maybe generator expressions specifically) more than all or any.

@mdboom
Contributor

mdboom commented Jan 9, 2023

EDIT: I wasn't measuring what I thought I was here. See below for corrected results.

I tried to see whether the new overhead is in the setup/starting of the generator or in the actual iteration (i.e., is the overhead constant per generator used, or does it scale with the number of items iterated over?), and it seems it may be both.

I modified this script so that the number of keys in the dictionary is configurable (using only 3 here seems like a hard case if the regression is in generator creation), and then ran it for powers of 2 from 2 to 2048. It definitely gets better as the number of keys goes up, but it never gets on par with 3.10.

[Plot: average runtime vs. number of keys, Python 3.10 vs 3.11]

Not sure what any of this means, but maybe some of the other Faster CPython folks may have insight.

Code example with a variable number of keys

import timeit
from functools import partial
import sys


def find_by_keys(
    keys: list[str],
    table: list[dict[str, str | int]],
    match_data: dict[str, str | int],
) -> dict[str, str | int] | None:
    for item in table:
        if all(item[k] == match_data[k] for k in keys):
            return item
    return None


def main():
    nkeys = int(sys.argv[-1])
    keys: list[str] = ["id"] + [f"key_{i}" for i in range(nkeys)]
    table: list[dict[str, str | int]] = []
    for i in range(1, 5001):
        d = {f"key_{j}": f"val_{j}" for j in range(nkeys)}
        d["id"] = i
        table.append(d)
    match_data = table[2999].copy()

    timeit_output = timeit.repeat(
        partial(find_by_keys, keys, table, match_data), repeat=10000, number=1
    )

    average_time = sum(timeit_output) / len(timeit_output)
    print(average_time)


if __name__ == "__main__":
    main()

@mdboom
Contributor

mdboom commented Jan 9, 2023

Some additional information -- here are the differences in the bytecode.

3.10 bytecode
Disassembly of <code object <genexpr> at 0x7f019adad630, file "/home/mdboom/Work/builds/tmp-generator-regression/test.py", line 13>:
              0 GEN_START                0

 13           2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                11 (to 28)

 14           6 STORE_FAST               1 (k)
              8 LOAD_DEREF               0 (item)
             10 LOAD_FAST                1 (k)
             12 BINARY_SUBSCR
             14 LOAD_DEREF               1 (match_data)
             16 LOAD_FAST                1 (k)
             18 BINARY_SUBSCR
             20 COMPARE_OP               2 (==)

 13          22 YIELD_VALUE
             24 POP_TOP
             26 JUMP_ABSOLUTE            2 (to 4)
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE
3.11 bytecode
Disassembly of <code object <genexpr> at 0x7fc1302f9530, file "/home/mdboom/Work/builds/tmp-generator-regression/test.py", line 19>:
              0 COPY_FREE_VARS           2

 19           2 RETURN_GENERATOR
              4 POP_TOP
              6 RESUME_QUICK             0
              8 LOAD_FAST                0 (.0)
             10 FOR_ITER                22 (to 56)
             12 STORE_FAST               1 (k)
             14 LOAD_DEREF               2 (item)
             16 LOAD_FAST                1 (k)
             18 BINARY_SUBSCR_DICT
             28 LOAD_DEREF               3 (match_data)
             30 LOAD_FAST                1 (k)
             32 BINARY_SUBSCR_DICT
             42 COMPARE_OP               2 (==)
             48 YIELD_VALUE
             50 RESUME_QUICK             1
             52 POP_TOP
             54 JUMP_BACKWARD_QUICK     23 (to 10)
        >>   56 LOAD_CONST               0 (None)
             58 RETURN_VALUE

@brandtbucher
Member

I think @mdboom's example is still dominated by generator creation, since the first 2999 loops will only advance the generator once, and the 3000th will advance it n times, then return.

Perhaps put "id" last in the list of keys, so that all will walk the full list of keys every time?

@mdboom
Contributor

mdboom commented Jan 9, 2023

Thanks, @brandtbucher. Indeed I was not measuring what I thought I was measuring. By putting id at the end and getting more iterations, I get a graph that makes a lot more sense.

As the number of iterations in the generator goes up, it trends toward "at par" between 3.10 and 3.11. So I think it's fair to say that whatever regression exists is due to generator startup, not the time spent iterating over it.

[Plot: average runtime vs. number of keys with "id" last, trending toward parity between 3.10 and 3.11]

From the bytecode, it definitely is doing "more work" to start the generator, but I don't know if I could say what room there is for improvement.

Code to test with variable number of keys
import timeit
from functools import partial
import sys


def find_by_keys(
    keys: list[str],
    table: list[dict[str, str | int]],
    match_data: dict[str, str | int],
) -> dict[str, str | int] | None:

    for item in table:
        if all(item[k] == match_data[k] for k in keys):
            break
    return None


def main():
    nkeys = int(sys.argv[-1])
    keys: list[str] = [f"key_{i}" for i in range(nkeys)] + ["id"]
    table: list[dict[str, str | int]] = []
    for i in range(1, 5001):
        d = {
            f"key_{j}": f"val_{j}" for j in range(nkeys)
        }
        d["id"] = str(i)
        table.append(d)
    match_data = table[2999].copy()

    timeit_output = timeit.repeat(
        partial(find_by_keys, keys, table, match_data), repeat=10000, number=1
    )

    average_time = sum(timeit_output) / len(timeit_output)
    print(average_time)


if __name__ == "__main__":
    main()

@brandtbucher changed the title from "Performance of all() is 30% slower on Python 3.11.1 compared to Python 3.10.9 (and to a lesser extent, any())" to "Generator creation is slower in 3.11 vs 3.10" Jan 9, 2023
@brandtbucher added the 3.11 (only security fixes) and 3.12 (bugs and security fixes) labels Jan 9, 2023
@brandtbucher
Member

brandtbucher commented Jan 11, 2023

This was bugging me, so I dug into it a bit more by running some nano-benchmarks to measure the performance of different parts of the generator life-cycle. Here are the times to...

Create a new generator:

3.9:  1.57 us +- 0.15 us
3.10: 1.46 us +- 0.15 us
3.11:  357 ns +-   14 ns
3.12:  368 ns +-   18 ns

Yield from a generator:

3.9:  65.3 ns +- 4.1 ns
3.10: 59.0 ns +- 3.4 ns
3.11: 36.2 ns +- 5.5 ns
3.12: 32.7 ns +- 7.0 ns

Return from a generator:

3.9:  79.0 ns +- 8.8 ns
3.10: 61.8 ns +- 4.4 ns
3.11: 35.3 ns +- 5.0 ns
3.12: 33.7 ns +- 5.6 ns

Destroy a completed generator:

3.9:  45.8 ns +- 6.8 ns
3.10: 39.8 ns +- 6.2 ns
3.11: 34.1 ns +- 4.8 ns
3.12: 34.8 ns +- 4.5 ns

Destroy a suspended generator:

3.9:  182 ns +-  7 ns
3.10: 149 ns +- 10 ns
3.11: 181 ns +- 11 ns
3.12: 264 ns +- 11 ns

Code here:

import pyperf

LOOPS = 1 << 18

def g():
    yield

def bench_gen_create(loops: int) -> float:
    it = range(loops)
    start = pyperf.perf_counter()
    gens = [g() for _ in it]
    return pyperf.perf_counter() - start

def bench_gen_yield(loops: int) -> float:
    it = range(loops)
    gens = [g() for _ in it]
    start = pyperf.perf_counter()
    for gen in gens:
        for _ in gen:
            break
    return pyperf.perf_counter() - start

def bench_gen_return(loops: int) -> float:
    it = range(loops)
    gens = [g() for _ in it]
    for gen in gens:
        for _ in gen:
            break
    start = pyperf.perf_counter()
    for gen in gens:
        for _ in gen:
            break
    return pyperf.perf_counter() - start

def bench_gen_destroy_completed(loops: int) -> float:
    it = range(loops)
    gens = [g() for _ in it]
    for gen in gens:
        for _ in gen:
            break
    for gen in gens:
        for _ in gen:
            break
    start = pyperf.perf_counter()
    del gens
    return pyperf.perf_counter() - start

def bench_gen_destroy_suspended(loops: int) -> float:
    it = range(loops)
    gens = [g() for _ in it]
    for gen in gens:
        for _ in gen:
            break
    start = pyperf.perf_counter()
    del gens
    return pyperf.perf_counter() - start

if __name__ == "__main__":
    runner = pyperf.Runner(loops=LOOPS)
    runner.bench_time_func("gen_create", bench_gen_create)
    runner.bench_time_func("gen_yield", bench_gen_yield)
    runner.bench_time_func("gen_return", bench_gen_return)
    runner.bench_time_func("gen_destroy_completed", bench_gen_destroy_completed)
    runner.bench_time_func("gen_destroy_suspended", bench_gen_destroy_suspended)

The conclusion seems pretty clear: everything about generators has gotten significantly faster, except for finalizing suspended generators (or, in other words, gen.close()). This seems pretty consistent with the results in earlier comments.

While it's certainly a relief that generators in general haven't gotten slower, I still think the finalization issue deserves a closer look (especially since the situation seems much worse in 3.12).

Off the top of my head, I would guess it's due to our changes to exception handling (and/or frames) - the cost of raising exceptions has gotten higher in recent versions. Generators finalize themselves by throwing GeneratorExit into their suspended frame, which could be why we're seeing this slowdown.
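
A minimal illustration of that finalization path (a sketch, not from the original comment):

def suspended():
    try:
        yield
    except GeneratorExit:
        # close() reaches this point by throwing GeneratorExit
        # into the suspended frame.
        print("GeneratorExit thrown into the suspended frame")
        raise

gen = suspended()
next(gen)    # suspend at the yield
gen.close()  # finalization: throws GeneratorExit, prints the message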

@brandtbucher changed the title from "Generator creation is slower in 3.11 vs 3.10" to "Generator finalization is slower in 3.11 vs 3.10" Jan 11, 2023
@markshannon
Member

@brandtbucher Thanks for the analysis.
However, it can't be finalization that's the problem. You can't finalize an object more than once, and the combined creation and finalization time is lower in 3.11 than in 3.10:

Times in ns   Creation   Finalization   Total
3.10          1450       150            1600
3.11           360       180             540

So there must be something else going on here.

@markshannon
Member

In the above example, the code is creating generator expressions, not plain generators.
This adds the extra overhead of creating a function, only to call it once to create the generator.
Creating and deallocating a function hasn't changed much since 3.10, but we should probably check that as well.
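
The extra function-creation step is visible in the compiled bytecode; for instance (illustrative only, output differs per version):

import dis

# The outer code builds a function from the <genexpr> code object and
# immediately calls it to produce the generator.
dis.dis(compile("(x for x in ())", "<demo>", "eval"))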

@brandtbucher
Member

So there must be something else going on here.

Looking back on the "create" benchmark, I think the results are heavily skewed by the time taken to build a list of hundreds of thousands of new generators (benchmarking the creation, but not the destruction, of generators is actually pretty tricky), as well as by the time taken to look up the global name g.

Here are some new numbers that avoid these issues:

Create and destroy:

3.9:  168 ns +- 25 ns
3.10: 165 ns +- 25 ns
3.11: 217 ns +- 20 ns
3.12: 181 ns +- 25 ns

Create and destroy (expression):

3.9:  245 ns +- 29 ns
3.10: 257 ns +- 44 ns
3.11: 323 ns +- 26 ns
3.12: 299 ns +- 50 ns

Create, exhaust, and destroy:

3.9:  143 ns +- 17 ns
3.10: 140 ns +- 22 ns
3.11: 131 ns +- 33 ns
3.12: 119 ns +- 32 ns

Create, exhaust, and destroy (expression):

3.9:  188 ns +- 42 ns
3.10: 192 ns +- 44 ns
3.11: 209 ns +- 48 ns
3.12: 189 ns +- 36 ns

Code:

import pyperf

def g():
    """Equivalent to: (_ for _ in ())"""
    for _ in (): yield _

def bench_gen_create_destroy(loops: int) -> float:
    it = range(loops)
    g_fast = g
    start = pyperf.perf_counter()
    for _ in it:
        g_fast()
    return pyperf.perf_counter() - start

def bench_gen_expr_create_destroy(loops: int) -> float:
    it = range(loops)
    start = pyperf.perf_counter()
    for _ in it:
        (_ for _ in ())
    return pyperf.perf_counter() - start

def bench_gen_create_exhaust_destroy(loops: int) -> float:
    it = range(loops)
    g_fast = g
    start = pyperf.perf_counter()
    for _ in it:
        for _ in g_fast():
            pass
    return pyperf.perf_counter() - start

def bench_gen_expr_create_exhaust_destroy(loops: int) -> float:
    it = range(loops)
    start = pyperf.perf_counter()
    for _ in it:
        for _ in (_ for _ in ()):
            pass
    return pyperf.perf_counter() - start

You can clearly see that the time taken to create and destroy an unexhausted generator got much worse in 3.11, in a way that isn't reflected in the time taken to create, exhaust, and destroy the same generator. Also, the results are consistent across both "normal" generators and generator expressions... so I think that generator expressions themselves are a red herring here, and the issue is indeed finalization.

(The 3.12 improvements are likely irrelevant, and due to our specialization of the benchmarks' for loops over range iterators.)

@eendebakpt
Contributor

Profiling with valgrind shows that the difference between (_ for _ in ()) and

for _ in (_ for _ in ()):
    pass

is in gen_close. A screenshot of the relevant part for the (_ for _ in ()) case:

[Screenshot: valgrind profile showing the time spent in gen_close]

If we could somehow avoid this expensive part in the common case, that would be great. While analyzing the code I noticed that in the check

    if (PyErr_ExceptionMatches(PyExc_StopIteration)
        || PyErr_ExceptionMatches(PyExc_GeneratorExit)) {
	...

at the end of gen_close, we almost never have PyErr_ExceptionMatches(PyExc_StopIteration) == 1 (in the entire Python test suite this happens only a few times, in test_generators and test_asyncgen). Changing the order gives a small performance improvement.
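
At the Python level, the two branches of that check correspond to the two ways a generator can exit during close(); a minimal sketch (not from the original comment):

def common():
    yield  # does not catch GeneratorExit, so close() sees GeneratorExit propagate

def rare():
    try:
        yield
    except GeneratorExit:
        return  # a plain return surfaces as StopIteration internally

for make in (common, rare):
    gen = make()
    next(gen)    # suspend each generator at its yield
    gen.close()  # both count as a clean close; no exception escapes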

@Onlooker2186
Author

Onlooker2186 commented Jan 18, 2023

If this issue does get resolved via some patches to the main branch, is there any chance they could be applied to the publicly released Python 3.11 branch as well? I'm hoping not to wait at least another 12 months to upgrade from Python 3.10.

@gvanrossum
Member

If this issue does get resolved via some patches to the main branch, is there any chance they could be applied to the publicly released Python 3.11 branch as well? I'm hoping not to wait at least another 12 months to upgrade from Python 3.10.

There's definitely a chance. It just depends on how complicated the fix is. We don't have a solution yet, so I can't comment further.

@brandtbucher
Member

The proposed fix in #101013 changes the bytecode, but only because it's necessary to handle other changes we made to how generators work in 3.12.

The basic idea (don't call close() if there aren't any active exception table entries) can probably be backported to 3.11 with minimal fuss.

@brandtbucher
Member

...that is, if we consider this a bad enough performance regression to call it a "bug".

I'm on the fence, myself, since this is just a consequence of all exceptions becoming more expensive to raise in Python code, not anything specific to generators. But, it may impact async codebases in a real way... I've not done enough async programming to know how common it is for apps to abandon "unfinished" coroutines, for instance.

@gvanrossum
Member

gvanrossum commented Jan 19, 2023

Abandoning an unfinished coroutine is likely a bug -- most coroutines are wrapped in tasks and we usually complain about abandoned tasks, so users are used to seeing abandonment as an anti-pattern.

Throwing exceptions into coroutines must definitely work (e.g. cancellation) but I don't think it's a major speed concern.
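
For context, a minimal sketch of cancellation throwing an exception into a coroutine (illustrative only, not from the original comment):

import asyncio

async def work():
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        # Cancellation arrives as an exception thrown into the coroutine.
        raise

async def main():
    task = asyncio.create_task(work())
    await asyncio.sleep(0)  # let the task start and suspend
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass

asyncio.run(main())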

I'm curious if the OP (@Onlooker2186) has found this while debugging a real perf regression (in production code)? Esp. since the scope of the regression is much narrower than the original comment implied -- it's not that all() as a whole is 30% slower, it's slower in the specific case where it returns False after a small number of iterations. (Or for any(), returning True.)

@Onlooker2186
Author

Onlooker2186 commented Jan 19, 2023

@gvanrossum Yeah, it was production code for a websocket client implementation that keeps track of a large number of order books. The code in the opening post is used to identify key-value pairs after an "update" or "delete" event is received from the websocket server, and it is the main cause of the regression in the implementation's overall performance.

@gvanrossum
Member

Got it. That must have been frustrating.

If you want a backport, could you help?

@Onlooker2186
Author

Got it. That must have been frustrating.

If you want a backport, could you help?

For a backport I probably won't be of much use, since I don't know any C, and building Python etc. assumes that knowledge. Purely just a user of Python so far :)

markshannon added a commit that referenced this issue Jan 24, 2023
…y. (GH-101013)

* Store exception stack depth in YIELD_VALUE's oparg and use it to avoid expensive gen.throw() in gen.close() where possible.
@markshannon
Member

I believe this is fixed.
@Onlooker2186 can you confirm?

@Onlooker2186
Author

Onlooker2186 commented May 8, 2023

I believe this is fixed. @Onlooker2186 can you confirm?

Thanks for the update, @markshannon. Should I be using 3.11.3 to test this?

Testing the all() function, the performance regression still seems to be there on 3.11.3. I've also put the timings for 3.12.0a7 below.

Python version = 3.10.9
Average time = 0.0008305954898474738
Best time = 0.0007323999889194965
Average transactions per second = 1203.9554900347878

Python version = 3.11.3
Average time = 0.0011983903799438849
Best time = 0.0010432000271975994
Average transactions per second = 834.4526264027799

Python version = 3.12.0a7
Average time = 0.0012928771821549162
Best time = 0.0011587999761104584
Average transactions per second = 773.4686742117607

@brandtbucher
Member

brandtbucher commented May 10, 2023

This isn't fixed in 3.11... the PR (#101316) is failing for some reason.

It should be fixed in 3.12, though, since that's where the change landed.

@brandtbucher
Member

(I think it landed in a5, so the comparison to test would be between 3.12.0a4 and 3.12.0a5.)

But I can see from your comment that things appear to have gotten worse in 3.12. Are these all release builds with the same build options (PGO, etc.)?

@brandtbucher
Member

I can confirm that there still appears to be a regression:

Python version = 3.10.8+
Average time = 0.0010208233151759486
Best time = 0.0009102780022658408
Average transactions per second = 979.6014502545335

Python version = 3.11.1+
Average time = 0.0013533004376047758
Best time = 0.0012006000033579767
Average transactions per second = 738.934217571017

Python version = 3.12.0a7+
Average time = 0.0015747777904500254
Best time = 0.0013656059745699167
Average transactions per second = 635.0102256104522

@markshannon
Member

What are those times for?

@brandtbucher
Member

I just used the script from the very first comment on the issue.

@markshannon
Member

There are a few things going on here.

First of all, the workaround is to inline the generator code.
This is much faster on all versions:

def find_by_keys2(
    keys: list[str],
    table: list[dict[str, str | int]],
    match_data: dict[str, str | int],
) -> dict[str, str | int] | None:
    for item in table:
        for k in keys:
            if item[k] != match_data[k]:
                break
        else:
            return item
    return None

The overhead of destroying the generator is much reduced in 3.12, but the cost of switching between C and Python code increased in 3.11 compared to 3.10, and seems to be even higher in 3.12.

This will be much improved in the future, as we optimize larger regions and convert all, any, etc. into bytecode.
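
As a quick check of the workaround, the two versions can be timed side by side; a sketch assuming find_by_keys, keys, table, and match_data as defined in the opening post, with find_by_keys2 as above:

import timeit
from functools import partial

# min() of repeated single runs gives the least noisy comparison.
t_gen = min(timeit.repeat(partial(find_by_keys, keys, table, match_data), repeat=100, number=1))
t_inline = min(timeit.repeat(partial(find_by_keys2, keys, table, match_data), repeat=100, number=1))
print(f"generator version: {t_gen:.6f} s, inlined version: {t_inline:.6f} s")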

iritkatriel added a commit to iritkatriel/cpython that referenced this issue Oct 19, 2023
@iritkatriel added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Nov 27, 2023
aisk pushed a commit to aisk/cpython that referenced this issue Feb 11, 2024
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Feb 22, 2024
(cherry picked from commit 0db2517)

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
iritkatriel added a commit that referenced this issue Feb 22, 2024
gh-100762: Fix optimization in gen_close  (GH-111069)
(cherry picked from commit 0db2517)

Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
iritkatriel added a commit to iritkatriel/cpython that referenced this issue Jun 13, 2024
Glyphack pushed a commit to Glyphack/cpython that referenced this issue Sep 2, 2024