Skip to content

make_simplified_union: add caching and reduce allocations #12659

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

huguesb
Copy link
Contributor

@huguesb huguesb commented Apr 23, 2022

make_simplified_union is used in a lot of places and therefore
accounts for a significant share to typechecking time. Based
on sample metrics gathered from a large real-world codebase
we can see that:

  1. the majority of inputs are already as simple as they're
    going to get, which means we can avoid allocation extra
    lists and return the input unchanged
  2. most of the cost of make_simplified_union comes from
    is_proper_subtype
  3. is_proper_subtype has some caching going on under the hood
    but it only applies to Instance, and cache hit rate is low
    in this particular case because, as per 1) above, items are
    in fact rarely subtypes of each other

To address 1, refactor make_simplified_union with an optimistic
fast path that avoid unnecessary allocations.

To address 2 & 3, introduce a cache to record the result of union
simplification.

These changes are observed to yield significant improvements in
a real-world codebase: a roughly 10-20% overall speedup, with
make_simplified_union/is_proper_subtype no longer showing up as
hotspots in the py-spy profile.

For #12526

@github-actions

This comment has been minimized.

make_simplified_union is used in a lot of places and therefore
accounts for a significant share to typechecking time. Based
on sample metrics gathered from a large real-world codebase
we can see that:
 1. the majority of inputs are already as simple as they're
    going to get, which means we can avoid allocation extra
    lists and return the input unchanged
 2. most of the cost of `make_simplified_union` comes from
    `is_proper_subtype`
 3. `is_proper_subtype` has some caching going on under the hood
    but it only applies to `Instance`, and cache hit rate is low
    in this particular case because, as per 1) above, items are
    in fact rarely subtypes of each other

To address 1, refactor `make_simplified_union` with an optimistic
fast path that avoid unnecessary allocations.

To address 2 & 3, introduce a cache to record the result of union
simplification.

These changes are observed to yield significant improvements in
a real-world codebase: a roughly 10-20% overall speedup, with
make_simplified_union/is_proper_subtype no longer showing up as
hotspots in the py-spy profile.

For python#12526
@huguesb huguesb force-pushed the pr-simplify-union branch from a64490e to e5a41c6 Compare April 23, 2022 06:31
return all_items


_simplified_union_cache: List[Dict[Tuple[ProperType, ...], ProperType]] = [
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this live in TypeState soit can be reset along with other caches?

@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

urllib3 (https://github.com/urllib3/urllib3)
+ src/urllib3/poolmanager.py:474: error: Unused "type: ignore" comment

@JukkaL
Copy link
Collaborator

JukkaL commented Apr 25, 2022

Thanks for the PR! Caching make_simplified_union results gives a very nice performance improvement indeed.

I looked into this in some detail recently, and there are few unfortunate things that probably block this until they are addressed:

  1. Some Type subclasses seem to have incorrect/missing/incomplete __eq__ and/or __hash__ methods. This should be pretty easy to fix and are just bugs. It would be good to also have some unit tests for these.
  2. The line and column attributes of Type are not using in type equality/hashing, but they might plausibly affect type checking results in edge cases (e.g. locations of errors).
  3. The can_be_true and can_be_false attributes are ignored in equality/hashing, but they might plausibly affect type checking.

The above limitations may also affect the subtype caching we already do, but it will be more benign, since we only cache a bool result. In contrast, here we cache the result type object, so deviations can have unpredictable results. For example, the cache may give us a type with some arbitrary line number, or a type with some arbitrary can_be_true value. I think that we need to address the above issues first before merging this. I'll add more detail in #12526. I don't think that will be very difficult to fix the issues, but I might not be trivial either.

@huguesb
Copy link
Contributor Author

huguesb commented Apr 30, 2022

1. Some `Type` subclasses seem to have incorrect/missing/incomplete `__eq__` and/or `__hash__` methods. This should be pretty easy to fix and are just bugs. It would be good to also have some unit tests for these.

That does seem rather important to fix. Do you already have a list of affected classes?

2. The `line` and `column` attributes of `Type` are not using in type equality/hashing, but they might plausibly affect type checking results in edge cases (e.g. locations of errors).

I was very curious about that and I looked at the code. There is only a tiny fraction of callers the actually pass line/column to make_simplified_union. Specifically:

  • check_overload_call in checkexpr.py
  • ExpandTypeVisitor.visit_union_type
  • false_only, true_only and true_or_false in typeops.py

It's not clear to me that these locations actually need the line/column to be added to the output of make_simplified_union.
As an experiment, I updated all call sites to remove these parameters and the tests still passed. I went ahead and made a PR to trigger a full CI run and mypy_primer came out clean: #12698

I think it might be worth getting rid of the line/column parameters to make_simplified_union entirely...

3. The `can_be_true` and `can_be_false` attributes are ignored in equality/hashing, but they might plausibly affect type checking.

Hmm, that's interesting, I would have assumed that those values were constants, or a worst deterministically derived from the arguments of each type's __init__ method. This too might take some refactoring...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants