Better uop coverage in the JIT optimizer #131798

Open
brandtbucher opened this issue Mar 27, 2025 · 16 comments
Assignees
Labels
interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage topic-JIT type-feature A feature request or enhancement

Comments

@brandtbucher
Member

brandtbucher commented Mar 27, 2025

Out of 263 total uops, 155 of these are ignored by the tier two optimizer. These represent over half of all uops by dynamic execution count.

This issue will serve as a checklist for auditing these missing uops, and adding them where they make sense. At first glance, there's quite a bit of potential here, especially around the ability to narrow known output types (like _CONTAINS_OP_SET), and the ability to narrow and remove guards on input types (like _BINARY_OP_SUBSCR_LIST_INT). As I'm going through, I'll cross out anything that doesn't seem like it makes sense to add.
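
To make the two kinds of wins concrete, here is a minimal toy sketch in Python. It is not CPython's optimizer: the trace format, the `"produce"`/`"guard"` operation names, and `remove_redundant_guards` are all hypothetical, chosen only to show how a known output type or an earlier guard can make a later guard redundant.

```python
def remove_redundant_guards(trace):
    """Drop guards whose result is already known (toy model, not CPython's code)."""
    known = {}       # value name -> type name proven so far
    optimized = []
    for op, name, type_name in trace:
        if op == "guard":
            if known.get(name) == type_name:
                continue              # already proven: the guard is redundant
            known[name] = type_name   # once the guard passes, the type is known
        elif op == "produce":
            known[name] = type_name   # the producing uop narrows its output type
        optimized.append((op, name, type_name))
    return optimized


trace = [
    ("produce", "found", "bool"),  # e.g. the result of a set membership test
    ("guard", "found", "bool"),    # redundant: the producer fixed the type
    ("guard", "index", "int"),     # needed: nothing is known about index yet
    ("guard", "index", "int"),     # redundant: the previous guard proved it
]
optimized = remove_redundant_guards(trace)
```

In this model, only the `produce` entry and the first guard on `index` survive; the other two guards are dropped.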

First, here are the 53 missing uops that each represent at least 0.1% of all uops executed:

  • _SET_IP (12.1%)
  • _CHECK_VALIDITY (10.1%)
  • _CHECK_VALIDITY_AND_SET_IP (6.5%)
  • _CHECK_PERIODIC (3.1%)
  • _MAKE_WARM (2.8%)
  • _START_EXECUTOR (1.7%)
  • _GUARD_NOS_INT (1.5%)
  • _BINARY_OP_SUBSCR_LIST_INT (1.0%)
  • _CHECK_FUNCTION (1.0%)
  • _CHECK_MANAGED_OBJECT_HAS_VALUES (0.7%)
  • _ITER_CHECK_LIST (0.7%)
  • _CONTAINS_OP_SET (0.6%)
  • _FOR_ITER_TIER_TWO (0.6%)
  • _GUARD_NOT_EXHAUSTED_LIST (0.6%)
  • _ITER_NEXT_LIST_TIER_TWO (0.6%)
  • _SAVE_RETURN_OFFSET (0.6%)
  • _CALL_LEN (0.5%)
  • _CALL_LIST_APPEND (0.5%)
  • _POP_TOP (0.5%)
  • _RESUME_CHECK (0.5%)
  • _BINARY_OP_SUBSCR_STR_INT (0.4%)
  • _GUARD_DORV_VALUES_INST_ATTR_FROM_DICT (0.4%)
  • _GUARD_KEYS_VERSION (0.4%)
  • _BINARY_OP_SUBSCR_DICT (0.3%)
  • _CALL_BUILTIN_FAST (0.3%)
  • _CHECK_STACK_SPACE_OPERAND (0.3%)
  • _GET_ITER (0.3%)
  • _STORE_SUBSCR (0.3%)
  • _GUARD_NOT_EXHAUSTED_RANGE (0.2%)
  • _BINARY_SLICE (0.2%)
  • _BUILD_LIST (0.2%)
  • _CALL_BUILTIN_O (0.2%)
  • _CALL_NON_PY_GENERAL (0.2%)
  • _CHECK_IS_NOT_PY_CALLABLE (0.2%)
  • _GUARD_NOS_FLOAT (0.2%)
  • _ITER_CHECK_RANGE (0.2%)
  • _ITER_CHECK_TUPLE (0.2%)
  • _LOAD_DEREF (0.2%)
  • _STORE_SUBSCR_LIST_INT (0.2%)
  • _BINARY_OP_EXTEND (0.1%)
  • _CALL_ISINSTANCE (0.1%)
  • _CALL_METHOD_DESCRIPTOR_FAST (0.1%)
  • _CALL_METHOD_DESCRIPTOR_FAST_WITH_KEYWORDS (0.1%)
  • _CALL_METHOD_DESCRIPTOR_NOARGS (0.1%)
  • _CALL_TYPE_1 (0.1%)
  • _CHECK_ATTR_CLASS (0.1%)
  • _CONTAINS_OP_DICT (0.1%)
  • _GUARD_BINARY_OP_EXTEND (0.1%)
  • _GUARD_NOT_EXHAUSTED_TUPLE (0.1%)
  • _ITER_NEXT_TUPLE (0.1%)
  • _LIST_APPEND (0.1%)
  • _STORE_ATTR_SLOT (0.1%)
  • _STORE_SUBSCR_DICT (0.1%)

And here are the 102 missing uops that each represent less than 0.1% of all uops executed. These are less important, but may still net us some wins on individual benchmarks:

  • _BINARY_OP_SUBSCR_CHECK_FUNC
  • _BINARY_OP_SUBSCR_TUPLE_INT
  • _BUILD_MAP
  • _BUILD_SET
  • _BUILD_SLICE
  • _BUILD_STRING
  • _CALL_BUILTIN_CLASS
  • _CALL_BUILTIN_FAST_WITH_KEYWORDS
  • _CALL_INTRINSIC_1
  • _CALL_INTRINSIC_2
  • _CALL_KW_NON_PY
  • _CALL_METHOD_DESCRIPTOR_O
  • _CALL_STR_1
  • _CALL_TUPLE_1
  • _CHECK_ATTR_METHOD_LAZY_DICT
  • _CHECK_EG_MATCH
  • _CHECK_EXC_MATCH
  • _CHECK_FUNCTION_VERSION_INLINE
  • _CHECK_FUNCTION_VERSION_KW
  • _CHECK_IS_NOT_PY_CALLABLE_KW
  • _CHECK_METHOD_VERSION
  • _CHECK_METHOD_VERSION_KW
  • _CHECK_PERIODIC_IF_NOT_YIELD_FROM
  • _CONVERT_VALUE
  • _COPY_FREE_VARS
  • _DELETE_ATTR
  • _DELETE_DEREF
  • _DELETE_FAST
  • _DELETE_GLOBAL
  • _DELETE_NAME
  • _DELETE_SUBSCR
  • _DEOPT
  • _DICT_MERGE
  • _DICT_UPDATE
  • _END_FOR
  • _END_SEND
  • _ERROR_POP_N
  • _EXIT_INIT_CHECK
  • _EXPAND_METHOD
  • _EXPAND_METHOD_KW
  • _FATAL_ERROR
  • _FORMAT_SIMPLE
  • _FORMAT_WITH_SPEC
  • _GET_AITER
  • _GET_ANEXT
  • _GET_AWAITABLE
  • _GET_LEN
  • _GET_YIELD_FROM_ITER
  • _GUARD_DORV_NO_DICT
  • _GUARD_GLOBALS_VERSION
  • _GUARD_TOS_FLOAT
  • _GUARD_TOS_INT
  • _GUARD_TYPE_VERSION_AND_LOCK
  • _IMPORT_FROM
  • _IMPORT_NAME
  • _IS_NONE
  • _LIST_EXTEND
  • _LOAD_ATTR_NONDESCRIPTOR_NO_DICT
  • _LOAD_ATTR_NONDESCRIPTOR_WITH_VALUES
  • _LOAD_BUILD_CLASS
  • _LOAD_COMMON_CONSTANT
  • _LOAD_FAST_LOAD_FAST
  • _LOAD_FROM_DICT_OR_DEREF
  • _LOAD_GLOBAL
  • _LOAD_GLOBAL_BUILTINS
  • _LOAD_GLOBAL_MODULE
  • _LOAD_LOCALS
  • _LOAD_NAME
  • _LOAD_SUPER_ATTR_ATTR
  • _LOAD_SUPER_ATTR_METHOD
  • _MAKE_CALLARGS_A_TUPLE
  • _MAKE_CELL
  • _MAKE_FUNCTION
  • _MAP_ADD
  • _MATCH_CLASS
  • _MATCH_KEYS
  • _MATCH_MAPPING
  • _MATCH_SEQUENCE
  • _MAYBE_EXPAND_METHOD_KW
  • _NOP
  • _POP_EXCEPT
  • _POP_TWO_LOAD_CONST_INLINE_BORROW
  • _PUSH_EXC_INFO
  • _PUSH_NULL_CONDITIONAL
  • _SETUP_ANNOTATIONS
  • _SET_ADD
  • _SET_FUNCTION_ATTRIBUTE
  • _SET_UPDATE
  • _STORE_ATTR
  • _STORE_ATTR_INSTANCE_VALUE
  • _STORE_ATTR_WITH_HINT
  • _STORE_DEREF
  • _STORE_FAST_LOAD_FAST
  • _STORE_FAST_STORE_FAST
  • _STORE_GLOBAL
  • _STORE_NAME
  • _STORE_SLICE
  • _TIER2_RESUME_CHECK
  • _UNARY_INVERT
  • _UNARY_NEGATIVE
  • _UNPACK_SEQUENCE_LIST
  • _WITH_EXCEPT_START

@brandtbucher
Member Author

brandtbucher commented Apr 1, 2025

@diegorusso is going to add _CALL_LEN.

@brandtbucher
Member Author

brandtbucher commented Apr 3, 2025

@fluhus is going to add _BINARY_SLICE.

@brandtbucher
Member Author

@Klaus117 is going to improve _TO_BOOL_INT.

@brandtbucher
Member Author

@Zheaoli is going to add _CONTAINS_OP_DICT.

Zheaoli added a commit to Zheaoli/cpython that referenced this issue Apr 8, 2025
…CONTAINS_OP_DICT

Signed-off-by: Manjusaka <me@manjusaka.me>
brandtbucher pushed a commit that referenced this issue Apr 8, 2025
@Zheaoli
Contributor

Zheaoli commented Apr 8, 2025

I think I can work on _BINARY_OP_SUBSCR_LIST_INT and _BINARY_OP_SUBSCR_DICT

@brandtbucher
Member Author

I think I can work on _BINARY_OP_SUBSCR_LIST_INT and _BINARY_OP_SUBSCR_DICT

Sorry, I already have a branch to do the guards for these (and a couple of others) that I was going to put up in a minute! I'll tag you for review though.

@brandtbucher
Member Author

@tomasr8 is going to add _CALL_STR_1, _CALL_TUPLE_1, and _CALL_TYPE_1.

@Zheaoli, want to take _BUILD_LIST, _BUILD_MAP, _BUILD_SET, _BUILD_SLICE, and _BUILD_STRING? For all but _BUILD_SLICE and _BUILD_STRING, I think we can only set the type of the output (sym_new_type). For _BUILD_SLICE and _BUILD_STRING, we may be able to have a constant output if the items are constant (sym_is_const/sym_get_const/sym_new_const). Maybe one PR for the first three, and separate PRs for _BUILD_SLICE and _BUILD_STRING?
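
As a rough illustration of the difference between "type-only" and "constant" outputs, here is a toy Python sketch. It does not use the real optimizer API: `const`, `unknown`, `build_list_output`, and `build_string_output` are hypothetical stand-ins that only loosely mirror the roles of sym_new_type, sym_is_const, sym_get_const, and sym_new_const named above.

```python
def const(value):
    """A symbol whose exact value is known (hypothetical representation)."""
    return ("const", value)


def unknown():
    """A symbol about which nothing is known."""
    return ("unknown", None)


def build_list_output(items):
    # A new list is mutable, so this model records only its type.
    return ("type", "list")


def build_string_output(items):
    # If every piece is a known constant, the result can be folded
    # into a constant too; otherwise only the type is known.
    if all(kind == "const" for kind, _ in items):
        return const("".join(value for _, value in items))
    return ("type", "str")


folded = build_string_output([const("a"), const("b")])
typed = build_string_output([const("a"), unknown()])
```

Here `folded` carries the full string, while `typed` falls back to knowing only that the result is a `str`.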

@Zheaoli
Contributor

Zheaoli commented Apr 10, 2025

want to take _BUILD_LIST, _BUILD_MAP, _BUILD_SET, _BUILD_SLICE, and _BUILD_STRING? For all but _BUILD_SLICE and _BUILD_STRING, I think we can only set the type of the output (sym_new_type). For _BUILD_SLICE and _BUILD_STRING, we may be able to have a constant output if the items are constant

100% want to (lol)

I'm on the way

@Funnyyanne

I'd like to join. I'm a Python beginner: I've only ever written test scripts in Python at work, but I have seen its charm. If I may, I'd start by reading the parts that are closest to what I already know.

@splasky

splasky commented Apr 10, 2025

How can I join this?

@runzh-crypto

Not sure I can do it well, but I would like to try my best :)

@brandtbucher
Member Author

Sorry, I think at this point we already have enough people working on this issue (it's also a pretty tough one if you've never contributed to CPython before). Thanks for the interest in helping out, though!

@Zheaoli
Contributor

Zheaoli commented Apr 11, 2025

@Zheaoli, want to take _BUILD_LIST, _BUILD_MAP, _BUILD_SET, _BUILD_SLICE, and _BUILD_STRING? For all but _BUILD_SLICE and _BUILD_STRING, I think we can only set the type of the output (sym_new_type). For _BUILD_SLICE and _BUILD_STRING, we may be able to have a constant output if the items are constant (sym_is_const/sym_get_const/sym_new_const). Maybe one PR for the first three, and separate PRs for _BUILD_SLICE and _BUILD_STRING?

Sorry for the delay of a few days; I have a question here. I noticed that we already have similar uop code at https://github.com/python/cpython/blob/main/Python/optimizer_bytecodes.c#L918-L920. It seems to have been created in https://github.com/python/cpython/pull/128940/files

I'm not sure whether we need to create a sym_new_list like sym_new_tuple, or whether we just need sym_new_type(ctx, &PyList_Type);

@brandtbucher
Member Author

brandtbucher commented Apr 11, 2025

I'm not sure whether we need to create a sym_new_list like sym_new_tuple, or whether we just need sym_new_type(ctx, &PyList_Type);

For these, we just want to set the type. The symbolic tuple representation is a special case, since tuples are immutable. Tracking the values in a list is much harder, since the list can change in many different ways. For example:

t = (a, b)
l = [a, b]
foo(t, l)
x = t[0]  # The JIT can prove that x is a.
y = l[0]  # l may have been mutated by foo, so we can't prove anything.

The symbolic tuple representation allows us to track symbolic values inside of a tuple, even if they aren't constant. For instance, if we later guard that x is an int, then the JIT has also proven that a is an int as well (and vice-versa!). We may choose to (very carefully!) track the contents of mutable containers or object attributes in the future, but in general this is much harder to do correctly, and pays off in fewer cases.
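
The sharing described here can be modeled with a small Python sketch. This is only a toy under stated assumptions, not how the JIT represents symbols: the classes `Sym`, `TupleSym`, and `ListSym` and their `getitem` methods are hypothetical names invented for illustration.

```python
class Sym:
    """A symbolic value; type_name stays None until something proves it."""

    def __init__(self, name):
        self.name = name
        self.type_name = None


class TupleSym(Sym):
    """Immutable container: remembers the symbols it was built from."""

    def __init__(self, name, items):
        super().__init__(name)
        self.type_name = "tuple"
        self.items = tuple(items)

    def getitem(self, index):
        return self.items[index]  # the very same symbol that went in


class ListSym(Sym):
    """Mutable container: only the type is tracked, never the contents."""

    def __init__(self, name):
        super().__init__(name)
        self.type_name = "list"

    def getitem(self, index):
        return Sym(f"{self.name}[{index}]")  # a fresh, unknown symbol


a, b = Sym("a"), Sym("b")
t = TupleSym("t", [a, b])
l = ListSym("l")

x = t.getitem(0)     # x and a are one and the same symbol
y = l.getitem(0)     # y is unrelated to a: the list may have changed

x.type_name = "int"  # models a later guard proving that x is an int
```

Because `x` is the same object as `a`, the guard on `x` also narrows `a`, whereas `y` and `b` remain unknown.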

@Zheaoli
Contributor

Zheaoli commented Apr 11, 2025

For these, we just want to set the type. The symbolic tuple representation is a special case, since tuples are immutable. Tracking the values in a list is much harder, since the list can change in many different ways. For example:

I see. For now, we just need to replace sym_new_not_null with sym_new_type, which may give some performance improvement.

---- The following is just my personal thinking ----

The symbolic tuple representation allows us to track symbolic values inside of a tuple, even if they aren't constant. For instance, if we later guard that x is an int, then the JIT has also proven that a is an int as well (and vice-versa!). We may choose to (very carefully!) track the contents of mutable containers or object attributes in the future, but in general this is much harder to do correctly, and pays off in fewer cases.

I get your point. So Mark set up sym_new_tuple here because tuples are immutable. In other words, the data built by _BUILD_TUPLE is predictable, so we can set up an extra optimization path for tuples. But for mutable types like set, map, and list, that optimization path is dangerous and rarely useful.

I'm not sure my understanding is right; please correct me if I'm wrong.

@brandtbucher
Member Author

Yep, your understanding is correct!

Zheaoli added a commit to Zheaoli/cpython that referenced this issue Apr 12, 2025
… _BUILD_LIST, _BUILD_SET, _BUILD_MAP

Signed-off-by: Manjusaka <me@manjusaka.me>
Zheaoli added a commit to Zheaoli/cpython that referenced this issue Apr 15, 2025
Signed-off-by: Manjusaka <me@manjusaka.me>
6 participants