make_simplified_union: add caching and reduce allocations #12659
base: master
make_simplified_union is used in a lot of places and therefore accounts for a significant share of typechecking time. Based on sample metrics gathered from a large real-world codebase, we can see that:

1. the majority of inputs are already as simple as they're going to get, which means we can avoid allocating extra lists and return the input unchanged
2. most of the cost of `make_simplified_union` comes from `is_proper_subtype`
3. `is_proper_subtype` has some caching going on under the hood, but it only applies to `Instance`, and the cache hit rate is low in this particular case because, as per 1) above, items are in fact rarely subtypes of each other

To address 1, refactor `make_simplified_union` with an optimistic fast path that avoids unnecessary allocations.

To address 2 & 3, introduce a cache to record the result of union simplification.

These changes are observed to yield significant improvements in a real-world codebase: a roughly 10-20% overall speedup, with make_simplified_union/is_proper_subtype no longer showing up as hotspots in the py-spy profile.

For #12526
return all_items

_simplified_union_cache: List[Dict[Tuple[ProperType, ...], ProperType]] = [
Should this live in `TypeState` so it can be reset along with the other caches?
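The review question can be illustrated with a sketch that keeps the union cache next to the other caches and clears them together. `TypeCacheState` and `reset_all_caches` are hypothetical names, not mypy's actual `TypeState` API; the two-element list only mirrors the `List[Dict[...]]` annotation on `_simplified_union_cache`, and what its index selects is not visible in the excerpt.

```python
from typing import ClassVar, Dict, List, Tuple

Key = Tuple[str, ...]


class TypeCacheState:
    """Hypothetical central holder for global type caches (cf. TypeState)."""

    # Mirrors the List[Dict[...]] shape of _simplified_union_cache: one dict
    # per value of some flag (what the index selects is an assumption here).
    simplified_union_cache: ClassVar[List[Dict[Key, Key]]] = [{}, {}]
    subtype_cache: ClassVar[Dict[Tuple[str, str], bool]] = {}

    @classmethod
    def reset_all_caches(cls) -> None:
        # Clear in place rather than rebinding, so modules that hold a
        # reference to one of these dicts also observe the reset.
        for cache in cls.simplified_union_cache:
            cache.clear()
        cls.subtype_cache.clear()
```

Clearing in place matters if another module has bound one of the dicts to a local name: rebinding the class attribute would leave that alias holding stale entries.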
Diff from mypy_primer, showing the effect of this PR on open source code: urllib3 (https://github.com/urllib3/urllib3)
+ src/urllib3/poolmanager.py:474: error: Unused "type: ignore" comment
Thanks for the PR! Caching make_simplified_union results gives a very nice performance improvement indeed. I looked into this in some detail recently, and there are a few unfortunate things that probably block this until they are addressed:
The above limitations may also affect the subtype caching we already do, but it will be more benign, since we only cache a
That does seem rather important to fix. Do you already have a list of affected classes?
I was very curious about that and I looked at the code. There is only a tiny fraction of callers that actually pass line/column to
It's not clear to me that these locations actually need the line/column to be added to the output of I think it might be worth getting rid of the line/column parameters to
Hmm, that's interesting. I would have assumed that those values were constants, or at worst deterministically derived from the arguments of each type's