
Handling "tuple/any/all comprehensions" in the compiler. #545

Open
markshannon opened this issue Jan 23, 2023 · 6 comments

Comments

@markshannon
Member

markshannon commented Jan 23, 2023

Consider the following function containing a "tuple comprehension":

def f():
    return tuple(expr for var in seq)

which produces the following code:

>>> dis.dis(f)
  1           0 RESUME                   0

  2           2 LOAD_GLOBAL              1 (NULL + tuple)
             14 LOAD_CONST               1 (<code object <genexpr> at 0x7f0074ed0d50, file "<stdin>", line 2>)
             16 MAKE_FUNCTION            0
             18 LOAD_GLOBAL              2 (seq)
             30 GET_ITER
             32 CALL                     0
             42 CALL                     1
             52 POP_TOP
             54 LOAD_CONST               0 (None)
             56 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7f0074ed0d50, file "<stdin>", line 2>:
  2           0 RETURN_GENERATOR
              2 POP_TOP
              4 RESUME                   0
              6 LOAD_FAST                0 (.0)
        >>    8 FOR_ITER                11 (to 34)
             12 STORE_FAST               1 (var)
             14 LOAD_GLOBAL              0 (expr)
             26 YIELD_VALUE              2
             28 RESUME                   1
             30 POP_TOP
             32 JUMP_BACKWARD           13 (to 8)
        >>   34 END_FOR
             36 LOAD_CONST               0 (None)
             38 RETURN_VALUE
        >>   40 CALL_INTRINSIC_1         3
             42 RERAISE                  1
ExceptionTable:
  4 to 38 -> 40 [0] lasti

What we would like is:

>>> dis.dis(f)
  RESUME                   0
  BUILD_LIST               0
  LOAD_GLOBAL              2 (seq)
  FOR_ITER
  STORE_FAST               1 (var)
  LOAD_GLOBAL              0 (expr)
  LIST_APPEND              2
  JUMP_BACKWARD  
  END_FOR
  LIST_TO_TUPLE
  RETURN_VALUE
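At the source level, the desired bytecode corresponds to building a list inline and converting it once at the end. A minimal sketch, assuming a concrete seq parameter and using var * var to stand in for expr (neither is in the original):

```python
def f(seq):
    # Equivalent of the desired bytecode: BUILD_LIST / LIST_APPEND
    # in the loop, then a single LIST_TO_TUPLE before returning.
    result = []
    for var in seq:
        result.append(var * var)  # `var * var` stands in for `expr`
    return tuple(result)
```

The point is that no generator object, frame, or call round-trip is needed; the loop runs directly in f's frame.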

For list comprehensions, we can use escape analysis and inlining to remove the overhead.
@carljm is working on doing just that.

However, the above "tuple comprehension" is resistant to escape analysis, as the name tuple could hypothetically be bound to anything.
We could use something like the approach described in @vstinner's FAT Python and convert:

def f():
    return tuple(expr for var in seq)

into:

def f():
    $tmp = tuple
    if $tmp is __the_builtin_tuple_type:
        return LIST_TO_TUPLE([expr for var in seq])
    return tuple(expr for var in seq)

where LIST_TO_TUPLE is a VM instruction, not a call.
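As runnable Python, the guarded transformation might look like the following sketch, with builtins.tuple standing in for __the_builtin_tuple_type and x + 1 for expr (both illustrative assumptions):

```python
import builtins

def f(seq):
    # Guard: has the global name `tuple` been rebound or shadowed?
    if tuple is builtins.tuple:
        # Fast path: build a list inline, then convert. The compiler
        # would emit a LIST_TO_TUPLE instruction here, not a call.
        return tuple([x + 1 for x in seq])
    # Generic fallback: whatever `tuple` now names gets the generator.
    return tuple(x + 1 for x in seq)
```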

With escape analysis and inlining, we should get the ideal code, with the small prefix:

    LOAD_GLOBAL 'tuple'
    LOAD_CONST tuple
    IS_OP
    POP_JUMP_IF_FALSE

We would expect the tier 2 optimizer to eliminate the prefix.

The above optimization also applies to any and all, as well as tuple.
The potential for performance improvement is even greater in those cases, since any and all typically do not exhaust the generator.
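A small sketch of why any benefits even more: it stops pulling from the generator at the first truthy value, so an inlined version would skip the generator machinery for everything after it. (count_pulls is a hypothetical helper, not from the issue.)

```python
def count_pulls(values):
    # Returns (any() result, number of items actually consumed).
    # Short-circuiting means most of `values` may never be touched.
    pulls = 0

    def gen():
        nonlocal pulls
        for v in values:
            pulls += 1
            yield v

    return any(gen()), pulls
```

For example, count_pulls([0, 0, 5, 0, 0]) consumes only three items before returning True.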

@itamaro

itamaro commented May 29, 2023

Is this done now, with the PEP 709 implementation as it landed?

@JelleZijlstra
Contributor

No, PEP 709 only handles list/set/dict comprehensions, not generator expressions.

@iritkatriel iritkatriel self-assigned this Aug 2, 2023
@iritkatriel
Collaborator

Unless we can implement LIST_TO_TUPLE in a way that doesn't double the necessary space, I think memory might be an issue for large outputs.

@markshannon
Member Author

PySequence_Tuple also needs to perform a number of copies. It is mainly a matter of luck whether an additional allocation is needed at the end.

The main difference is likely to be the number of resizes building the list.
PySequence_Tuple doubles, but PyList_Append() only scales by 9/8.
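The resize behaviour can be simulated. This sketch counts reallocations and element copies for a given growth factor; it is a simplification of CPython's actual list over-allocation (new_allocated = size + (size >> 3) + 6, roughly a 9/8 factor):

```python
def count_resizes(n, num, den):
    # Simulate appending n elements to a buffer that grows by num/den
    # whenever it is full; count reallocations and element copies.
    cap = resizes = copies = 0
    for size in range(n):
        if size == cap:                       # buffer full: reallocate
            cap = max(cap * num // den, cap + 1)
            copies += size                    # existing elements move
            resizes += 1
    return resizes, copies
```

For n = 1000, a doubling factor reallocates 11 times, while a 9/8 factor reallocates dozens of times with a correspondingly higher copy count.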

@markshannon
Member Author

1/n allocations and copying one pointer is going to be much cheaper than a round trip between PySequence_Tuple, the tower of gen_iternext, gen_send, ... and the interpreter.

@markshannon
Member Author

With a growth factor of 9/8 we need 6 reallocations per doubling, so that's ~10 additional copies per element for a large tuple.
I suspect that will still be faster than the round-trip through PySequence_Tuple, but we should measure it.
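The arithmetic roughly checks out: doubling capacity at growth factor 9/8 takes log(2)/log(9/8) ≈ 5.9 steps, and the amortized number of element copies at growth factor g is about 1/(g − 1):

```python
import math

# Reallocations needed to double capacity at growth factor 9/8:
per_doubling = math.log(2) / math.log(9 / 8)   # ~5.9, i.e. ~6 reallocations
# Amortized element copies at growth factor g are roughly 1/(g - 1):
copies_per_element = 1 / (9 / 8 - 1)           # 8, near the ~10 cited above
```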

We can look to change the growth factor for lists in another issue.
