
Handling "tuple/any/all comprehensions" in the compiler. #545

Open
markshannon opened this issue Jan 23, 2023 · 6 comments

Comments

@markshannon
Member

markshannon commented Jan 23, 2023

Consider the following function containing a "tuple comprehension":

def f():
    return tuple(expr for var in seq)

which produces the following code:

>>> dis.dis(f)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + tuple)
             14 LOAD_CONST               1 (<code object <genexpr> at 0x7f0074ed0d50, file "<stdin>", line 2>)
             16 MAKE_FUNCTION            0
             18 LOAD_GLOBAL              2 (seq)
             30 GET_ITER
             32 CALL                     0
             42 CALL                     1
             52 POP_TOP
             54 LOAD_CONST               0 (None)
             56 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f0074ed0d50, file "<stdin>", line 2>:
  2           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0
              6 LOAD_FAST                0 (.0)
        >>    8 FOR_ITER                11 (to 34)
             12 STORE_FAST               1 (var)
             14 LOAD_GLOBAL              0 (expr)
             26 YIELD_VALUE              2
             28 RESUME                   1
             30 POP_TOP
             32 JUMP_BACKWARD           13 (to 8)
        >>   34 END_FOR
             36 LOAD_CONST               0 (None)
             38 RETURN_VALUE
        >>   40 CALL_INTRINSIC_1         3
             42 RERAISE                  1
ExceptionTable:
  4 to 38 -> 40 [0] lasti

What we would like is:

>>> dis.dis(f)
  RESUME                   0
  BUILD_LIST               0
  LOAD_GLOBAL              2 (seq)
  FOR_ITER
  STORE_FAST               1 (var)
  LOAD_GLOBAL              0 (expr)
  LIST_APPEND              2
  JUMP_BACKWARD  
  END_FOR
  LIST_TO_TUPLE
  RETURN_VALUE
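At the source level, the desired bytecode corresponds to building a list inline and converting it once at the end. A minimal sketch, assuming a concrete seq parameter and using var * var to stand in for expr (neither is in the original):

```python
def f(seq):
    # Equivalent of the desired bytecode: BUILD_LIST / LIST_APPEND
    # in the loop, then a single LIST_TO_TUPLE before returning.
    result = []
    for var in seq:
        result.append(var * var)  # `var * var` stands in for `expr`
    return tuple(result)
```

The point is that no generator object, frame, or call round-trip is needed; the loop runs directly in f's frame.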

For list comprehensions, we can use escape analysis and inlining to remove the overhead.
@carljm is working on doing just that.

However, the above "tuple comprehension" is resistant to escape analysis, as the name tuple could hypothetically be bound to anything.
We could use something like the approach described in @vstinner's FAT Python and convert:

def f():
    return tuple(expr for var in seq)

into:

def f():
    $tmp = tuple
    if $tmp is __the_builtin_tuple_type:
        return LIST_TO_TUPLE([expr for var in seq])
    return tuple(expr for var in seq)

where LIST_TO_TUPLE is a VM instruction, not a call.
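As runnable Python, the guarded transformation might look like the following sketch, with builtins.tuple standing in for __the_builtin_tuple_type and x + 1 for expr (both illustrative assumptions):

```python
import builtins

def f(seq):
    # Guard: has the global name `tuple` been rebound or shadowed?
    if tuple is builtins.tuple:
        # Fast path: build a list inline, then convert. The compiler
        # would emit a LIST_TO_TUPLE instruction here, not a call.
        return tuple([x + 1 for x in seq])
    # Generic fallback: whatever `tuple` now names gets the generator.
    return tuple(x + 1 for x in seq)
```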

With escape analysis and inlining, we should get the ideal code, with the small prefix:

    LOAD_GLOBAL 'tuple'
    LOAD_CONST tuple
    IS_OP
    POP_JUMP_IF_FALSE

We would expect the tier 2 optimizer to eliminate the prefix.

The above optimization also applies to any and all, as well as tuple.
The potential for performance improvement is even greater in those cases, since any and all typically do not exhaust the generator.
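A small sketch of why any benefits even more: it stops pulling from the generator at the first truthy value, so an inlined version would skip the generator machinery for everything after it. (count_pulls is a hypothetical helper, not from the issue.)

```python
def count_pulls(values):
    # Returns (any() result, number of items actually consumed).
    # Short-circuiting means most of `values` may never be touched.
    pulls = 0

    def gen():
        nonlocal pulls
        for v in values:
            pulls += 1
            yield v

    return any(gen()), pulls
```

For example, count_pulls([0, 0, 5, 0, 0]) consumes only three items before returning True.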

@itamaro

itamaro commented May 29, 2023

Is this done now, with the PEP 709 implementation as it landed?

@JelleZijlstra
Contributor

No, PEP 709 only handles list/set/dict comprehensions, not generator expressions.

@iritkatriel iritkatriel self-assigned this Aug 2, 2023
@iritkatriel
Collaborator

Unless we can implement LIST_TO_TUPLE in a way that doesn't double the necessary space, I think memory might be an issue for large outputs.

@markshannon
Member Author

PySequence_Tuple also needs to perform a number of copies. It is mainly a matter of luck whether an additional allocation is needed at the end.

The main difference is likely to be the number of resizes building the list.
PySequence_Tuple doubles, but PyList_Append() only scales by 9/8.
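The resize behaviour can be simulated. This sketch counts reallocations and element copies for a given growth factor; it is a simplification of CPython's actual list over-allocation (new_allocated = size + (size >> 3) + 6, roughly a 9/8 factor):

```python
def count_resizes(n, num, den):
    # Simulate appending n elements to a buffer that grows by num/den
    # whenever it is full; count reallocations and element copies.
    cap = resizes = copies = 0
    for size in range(n):
        if size == cap:                       # buffer full: reallocate
            cap = max(cap * num // den, cap + 1)
            copies += size                    # existing elements move
            resizes += 1
    return resizes, copies
```

For n = 1000, a doubling factor reallocates 11 times, while a 9/8 factor reallocates dozens of times with a correspondingly higher copy count.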

@markshannon
Member Author

1/n allocations and copying one pointer is going to be much cheaper than a round trip between PySequence_Tuple, the tower of gen_iternext, gen_send, ... and the interpreter.

@markshannon
Member Author

With a growth factor of 9/8 we need 6 reallocations per doubling, so that's ~10 additional copies per element for a large tuple.
I suspect that will still be faster than the round-trip through PySequence_Tuple, but we should measure it.
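The arithmetic roughly checks out: doubling capacity at growth factor 9/8 takes log(2)/log(9/8) ≈ 5.9 steps, and the amortized number of element copies at growth factor g is about 1/(g − 1):

```python
import math

# Reallocations needed to double capacity at growth factor 9/8:
per_doubling = math.log(2) / math.log(9 / 8)   # ~5.9, i.e. ~6 reallocations
# Amortized element copies at growth factor g are roughly 1/(g - 1):
copies_per_element = 1 / (9 / 8 - 1)           # 8, near the ~10 cited above
```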

We can look to change the growth factor for lists in another issue.
