-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
Generalize subtype caches #5474
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generalize subtype caches #5474
Conversation
Currently, the typestate module keeps track of two separate subtype caches: one for regular subtypes, and another for proper subtypes. This pattern unfortunately does not generalize well: for example, in the upcoming overloads PR, it ends up being useful to perform subtype checks *with promotions disabled*. The problem is that while performing a subtype check with promotions disabled, we cannot use or store information to the existing is_subtype without introducing subtle bugs and inconsistencies: if is_subtype previously decided that int is a subtype of float due to promotions, it wouldn't make sense to return that cached result with promotions disabled. One potential solution is to just add two more caches to TypeState. However, this solution does not generalize well, especially if we were to add additional variations of subtype checks in the future. Instead, this pull request modifies typestate so it can keep track of an arbitrary number of subtype caches, each of which is characterized by a 'SubtypeKind'. The 'SubtypeKind' is an int, which we obtain by taking the hash of a tuple containing a unique string per each subtype method, along with whatever additional parameters are active during that subtype check. So now, instead of hard-coding calls to 'record_subtype_cache_entry' or 'record_proper_subtype_cache_entry' and friends, we can just call 'record_subtype_cache_entry' and pass in the appropriate SubtypeKind to store the type info in the correct cache. Since manually juggling parameters and SubtypeKinds is a bit of a pain, this pull request also refactors the two subtype visitors so they call a private helper method that takes care of propagating the parameters instead of directly calling the `is_subtype` or `is_proper_subtype` top-level methods. NOTE: Currently, the mypy codebase does not attempt to perform subtype checks with promotions disabled. That means that this PR should not change the actual behavior of mypy, which is why there are no new tests. (My subsequent PRs will perform subtype checks w/ promotions disabled.)
This pull request adds more robust support for detecting partially overlapping types. Specifically, it detects overlaps with... 1. TypedDicts 2. Tuples 3. Unions 4. TypeVars 5. Generic types containing variations of the above. It also swaps out the code for detecting overlaps with operators and removes some associated (and now unused) code. This pull request builds on top of python#5474 and python#5475 -- once those two PRs are merged, I'll rebase this diff if necessary. This pull request also supercedes python#5475 -- that PR contains basically the same code as these three PRs, just smushed together.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is beautiful! I just have three comments. Also did you double-check this works with internal codebases?
mypy/subtypes.py
Outdated
return True | ||
# NOTE: left.type.mro may be None in quick mode if there | ||
# was an error somewhere. | ||
if left.type.mro is not None: | ||
if not self.ignore_promotions and left.type.mro is not None: | ||
for base in left.type.mro: | ||
# TODO: Also pass recursively ignore_declared_variance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this TODO can be removed since _is_subtype
does this automatically.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
mypy/subtypes.py
Outdated
type_parameter_checker, | ||
ignore_pos_arg_names, | ||
ignore_declared_variance, | ||
ignore_promotions)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the hash
call really needed here? As I understand it will be used as a key (tuples can also be keys) so the hash will be called under the hood anyway. Or is it a performance trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was an attempt at optimization, but I forgot to test it before submitting this PR. After testing, it ended up making little-to-no difference, so I ended up just removing the hash.
mypy/subtypes.py
Outdated
elif isinstance(right, Instance): | ||
return is_subtype(left.fallback, right, | ||
ignore_pos_arg_names=self.ignore_pos_arg_names) | ||
return is_subtype(left.fallback, right) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are you using is_subtype
instead of self._is_subtype
? If this is some special case, then it deserves a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah whoops -- that was an oversight. Fixed.
I've tried running #5476 against our internal codebases, but not this one specifically. I'll start the run + post an update once it finishes. |
@ilevkivskyi -- I've confirmed this PR does not introduce any new regressions to our codebases. (I ran the tests before fixing the merge conflict, but I'm pretty sure that change won't affect anything.) |
This pull request adds more robust support for detecting partially overlapping types. Specifically, it detects overlaps with... 1. TypedDicts 2. Tuples 3. Unions 4. TypeVars 5. Generic types containing variations of the above. It also swaps out the code for detecting overlaps with operators and removes some associated (and now unused) code. This pull request builds on top of python#5474 and python#5475 -- once those two PRs are merged, I'll rebase this diff if necessary. This pull request also supercedes python#5475 -- that PR contains basically the same code as these three PRs, just smushed together.
This pull request adds more robust support for detecting partially overlapping types. Specifically, it detects overlaps with... 1. TypedDicts 2. Tuples 3. Unions 4. TypeVars 5. Generic types containing variations of the above. It also swaps out the code for detecting overlaps with operators and removes some associated (and now unused) code. This pull request builds on top of python#5474 and python#5475 -- once those two PRs are merged, I'll rebase this diff if necessary. This pull request also supercedes python#5475 -- that PR contains basically the same code as these three PRs, just smushed together.
This pull request adds more robust support for detecting partially overlapping types. Specifically, it detects overlaps with... 1. TypedDicts 2. Tuples 3. Unions 4. TypeVars 5. Generic types containing variations of the above. This new overlapping type check is used for detecting unsafe overlaps with overloads and operator methods as well as for performing reachability/unreachability analysis in `isinstance` and `if x is None` checks and the like. This PR also removes some (now unused) code that used to be used for detecting overlaps with operators. This pull request builds on top of #5474 and #5475 and supersedes #5475.
Fixes #5514 The root cause is that #5474 added `type_parameter_checker` (a function) as one of the keys of subtype kinds. This function is a nested one, so in fact a _new_ function was created on every call to `is_subtype_ignoring_tvars()` causing many thousands parasitic keys created in subtype caches. In addition there are two smaller things listed below. The performance win is around 5% This PR: * Moves the mentioned function to global scope, this causes only 6 subtype kinds present during self-check. * Prevents similar regressions in future by switching only boolean flags as subtype kind keys. * Removes two one liners that stay in a very hot code path (cache lookup is hit 140K times during self-check). * Avoids creation of empty sets on cache look-up, create them only when writing to cache.
Fixes #5514 The root cause is that #5474 added `type_parameter_checker` (a function) as one of the keys of subtype kinds. This function is a nested one, so in fact a _new_ function was created on every call to `is_subtype_ignoring_tvars()` causing many thousands parasitic keys created in subtype caches. In addition there are two smaller things listed below. The performance win is around 5% This PR: * Moves the mentioned function to global scope, this causes only 6 subtype kinds present during self-check. * Prevents similar regressions in future by switching only boolean flags as subtype kind keys. * Removes two one liners that stay in a very hot code path (cache lookup is hit 140K times during self-check). * Avoids creation of empty sets on cache look-up, create them only when writing to cache.
Currently, the typestate module keeps track of two separate subtype caches: one for regular subtypes, and another for proper subtypes.
This pattern unfortunately does not generalize well: for example, in the upcoming overloads PR, it ends up being useful to perform subtype checks with promotions disabled.
The problem is that while performing a subtype check with promotions disabled, we cannot use or store information to the existing
is_subtype
function without introducing subtle bugs and inconsistencies: ifis_subtype
previously decided that int is a subtype of float due to promotions, it wouldn't make sense to return that cached result with promotions disabled.One potential solution is to just add two more caches to TypeState. However, this solution does not generalize well, especially if we were to add additional variations of subtype checks in the future.
Instead, this pull request modifies TypeState so it can keep track of an arbitrary number of subtype caches, each of which is characterized by a 'SubtypeKind'.
The 'SubtypeKind' is an int, which we obtain by taking the hash of a tuple containing a unique string per each subtype method, along with whatever additional parameters are active during that subtype check.
So now, instead of hard-coding calls to
record_subtype_cache_entry
orrecord_proper_subtype_cache_entry
and friends, we can just callrecord_subtype_cache_entry
and pass in the appropriate SubtypeKind to store the type info in the correct cache.Since manually juggling parameters and SubtypeKinds is a bit of a pain, this pull request also refactors the two subtype visitors so they call a private helper method that takes care of propagating the parameters instead of directly calling the
is_subtype
oris_proper_subtype
top-level methods.NOTE: Currently, the mypy codebase does not attempt to perform subtype checks with promotions disabled. That means that this PR should not change the actual behavior of mypy, which is why there are no new tests.
(My subsequent PRs will perform subtype checks w/ promotions disabled.)