Generalize subtype caches #5474

Michael0x2a · 2018-08-14T05:40:59Z

Currently, the typestate module keeps track of two separate subtype caches: one for regular subtypes, and another for proper subtypes.

This pattern unfortunately does not generalize well: for example, in the upcoming overloads PR, it ends up being useful to perform subtype checks with promotions disabled.

The problem is that while performing a subtype check with promotions disabled, we cannot use or store information to the existing is_subtype function without introducing subtle bugs and inconsistencies: if is_subtype previously decided that int is a subtype of float due to promotions, it wouldn't make sense to return that cached result with promotions disabled.

One potential solution is to just add two more caches to TypeState. However, this solution does not generalize well, especially if we were to add additional variations of subtype checks in the future.

Instead, this pull request modifies TypeState so it can keep track of an arbitrary number of subtype caches, each of which is characterized by a 'SubtypeKind'.

The 'SubtypeKind' is an int, which we obtain by taking the hash of a tuple containing a unique string per each subtype method, along with whatever additional parameters are active during that subtype check.

So now, instead of hard-coding calls to record_subtype_cache_entry or record_proper_subtype_cache_entry and friends, we can just call record_subtype_cache_entry and pass in the appropriate SubtypeKind to store the type info in the correct cache.

Since manually juggling parameters and SubtypeKinds is a bit of a pain, this pull request also refactors the two subtype visitors so they call a private helper method that takes care of propagating the parameters instead of directly calling the is_subtype or is_proper_subtype top-level methods.

NOTE: Currently, the mypy codebase does not attempt to perform subtype checks with promotions disabled. That means that this PR should not change the actual behavior of mypy, which is why there are no new tests.

(My subsequent PRs will perform subtype checks w/ promotions disabled.)

Currently, the typestate module keeps track of two separate subtype caches: one for regular subtypes, and another for proper subtypes. This pattern unfortunately does not generalize well: for example, in the upcoming overloads PR, it ends up being useful to perform subtype checks *with promotions disabled*. The problem is that while performing a subtype check with promotions disabled, we cannot use or store information to the existing is_subtype without introducing subtle bugs and inconsistencies: if is_subtype previously decided that int is a subtype of float due to promotions, it wouldn't make sense to return that cached result with promotions disabled. One potential solution is to just add two more caches to TypeState. However, this solution does not generalize well, especially if we were to add additional variations of subtype checks in the future. Instead, this pull request modifies typestate so it can keep track of an arbitrary number of subtype caches, each of which is characterized by a 'SubtypeKind'. The 'SubtypeKind' is an int, which we obtain by taking the hash of a tuple containing a unique string per each subtype method, along with whatever additional parameters are active during that subtype check. So now, instead of hard-coding calls to 'record_subtype_cache_entry' or 'record_proper_subtype_cache_entry' and friends, we can just call 'record_subtype_cache_entry' and pass in the appropriate SubtypeKind to store the type info in the correct cache. Since manually juggling parameters and SubtypeKinds is a bit of a pain, this pull request also refactors the two subtype visitors so they call a private helper method that takes care of propagating the parameters instead of directly calling the `is_subtype` or `is_proper_subtype` top-level methods. NOTE: Currently, the mypy codebase does not attempt to perform subtype checks with promotions disabled. That means that this PR should not change the actual behavior of mypy, which is why there are no new tests. (My subsequent PRs will perform subtype checks w/ promotions disabled.)

This pull request adds more robust support for detecting partially overlapping types. Specifically, it detects overlaps with... 1. TypedDicts 2. Tuples 3. Unions 4. TypeVars 5. Generic types containing variations of the above. It also swaps out the code for detecting overlaps with operators and removes some associated (and now unused) code. This pull request builds on top of python#5474 and python#5475 -- once those two PRs are merged, I'll rebase this diff if necessary. This pull request also supercedes python#5475 -- that PR contains basically the same code as these three PRs, just smushed together.

ilevkivskyi

This is beautiful! I just have three comments. Also did you double-check this works with internal codebases?

ilevkivskyi · 2018-08-14T15:16:46Z

mypy/subtypes.py

                return True
            # NOTE: left.type.mro may be None in quick mode if there
            # was an error somewhere.
-            if left.type.mro is not None:
+            if not self.ignore_promotions and left.type.mro is not None:
                for base in left.type.mro:
                    # TODO: Also pass recursively ignore_declared_variance


I think this TODO can be removed since _is_subtype does this automatically.

ilevkivskyi · 2018-08-14T15:22:31Z

mypy/subtypes.py

+                     type_parameter_checker,
+                     ignore_pos_arg_names,
+                     ignore_declared_variance,
+                     ignore_promotions))


Is the hash call really needed here? As I understand it will be used as a key (tuples can also be keys) so the hash will be called under the hood anyway. Or is it a performance trick?

It was an attempt at optimization, but I forgot to test it before submitting this PR. After testing, it ended up making little-to-no difference, so I ended up just removing the hash.

ilevkivskyi · 2018-08-14T15:26:24Z

mypy/subtypes.py

        elif isinstance(right, Instance):
-            return is_subtype(left.fallback, right,
-                              ignore_pos_arg_names=self.ignore_pos_arg_names)
+            return is_subtype(left.fallback, right)


Why are you using is_subtype instead of self._is_subtype? If this is some special case, then it deserves a comment.

Ah whoops -- that was an oversight. Fixed.

Michael0x2a · 2018-08-14T19:10:12Z

I've tried running #5476 against our internal codebases, but not this one specifically. I'll start the run + post an update once it finishes.

Michael0x2a · 2018-08-14T20:32:57Z

@ilevkivskyi -- I've confirmed this PR does not introduce any new regressions to our codebases. (I ran the tests before fixing the merge conflict, but I'm pretty sure that change won't affect anything.)

This pull request adds more robust support for detecting partially overlapping types. Specifically, it detects overlaps with... 1. TypedDicts 2. Tuples 3. Unions 4. TypeVars 5. Generic types containing variations of the above. It also swaps out the code for detecting overlaps with operators and removes some associated (and now unused) code. This pull request builds on top of python#5474 and python#5475 -- once those two PRs are merged, I'll rebase this diff if necessary. This pull request also supercedes python#5475 -- that PR contains basically the same code as these three PRs, just smushed together.

This pull request adds more robust support for detecting partially overlapping types. Specifically, it detects overlaps with... 1. TypedDicts 2. Tuples 3. Unions 4. TypeVars 5. Generic types containing variations of the above. This new overlapping type check is used for detecting unsafe overlaps with overloads and operator methods as well as for performing reachability/unreachability analysis in `isinstance` and `if x is None` checks and the like. This PR also removes some (now unused) code that used to be used for detecting overlaps with operators. This pull request builds on top of #5474 and #5475 and supersedes #5475.

Fixes #5514 The root cause is that #5474 added `type_parameter_checker` (a function) as one of the keys of subtype kinds. This function is a nested one, so in fact a _new_ function was created on every call to `is_subtype_ignoring_tvars()` causing many thousands parasitic keys created in subtype caches. In addition there are two smaller things listed below. The performance win is around 5% This PR: * Moves the mentioned function to global scope, this causes only 6 subtype kinds present during self-check. * Prevents similar regressions in future by switching only boolean flags as subtype kind keys. * Removes two one liners that stay in a very hot code path (cache lookup is hit 140K times during self-check). * Avoids creation of empty sets on cache look-up, create them only when writing to cache.

Michael0x2a requested a review from ilevkivskyi August 14, 2018 05:40

Michael0x2a mentioned this pull request Aug 14, 2018

Add partial overload checks #5476

Merged

ilevkivskyi approved these changes Aug 14, 2018

View reviewed changes

Respond to code review

753b5a0

Merge branch 'master' into generalize-subtype-cache

a30c76d

ilevkivskyi merged commit a2ffde9 into python:master Aug 14, 2018

msullivan mentioned this pull request Aug 20, 2018

Perf regression caused by #5474 #5514

Closed

Michael0x2a deleted the generalize-subtype-cache branch August 27, 2018 05:46

ilevkivskyi mentioned this pull request Sep 11, 2018

Fix small regression in subtype cache performance #5604

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generalize subtype caches #5474

Generalize subtype caches #5474

Michael0x2a commented Aug 14, 2018

ilevkivskyi left a comment

ilevkivskyi Aug 14, 2018

Michael0x2a Aug 14, 2018

ilevkivskyi Aug 14, 2018

Michael0x2a Aug 14, 2018

ilevkivskyi Aug 14, 2018

Michael0x2a Aug 14, 2018

Michael0x2a commented Aug 14, 2018

Michael0x2a commented Aug 14, 2018

Generalize subtype caches #5474

Generalize subtype caches #5474

Conversation

Michael0x2a commented Aug 14, 2018

ilevkivskyi left a comment

Choose a reason for hiding this comment

ilevkivskyi Aug 14, 2018

Choose a reason for hiding this comment

Michael0x2a Aug 14, 2018

Choose a reason for hiding this comment

ilevkivskyi Aug 14, 2018

Choose a reason for hiding this comment

Michael0x2a Aug 14, 2018

Choose a reason for hiding this comment

ilevkivskyi Aug 14, 2018

Choose a reason for hiding this comment

Michael0x2a Aug 14, 2018

Choose a reason for hiding this comment

Michael0x2a commented Aug 14, 2018

Michael0x2a commented Aug 14, 2018