[WIP] Fix partially overlapping types in Python 2 builtins #2311

Michael0x2a · 2018-07-06T07:51:51Z

This commit fixes the overloads for the rsplit and split methods in the Python 2 builtins so that they are no longer unsafely overlapping.

This PR is a WIP and should not yet be merged.

This commit fixes the overloads for the `rsplit` and `split` methods in the Python 2 builtins so that they are no longer unsafely overlapping. This PR is a WIP and should not yet be merged.

ilevkivskyi · 2018-07-06T22:23:44Z

stdlib/2/__builtin__.pyi

    @overload
-    def split(self, sep: unicode, maxsplit: int = ...) -> List[unicode]: ...
+    def split(self, sep: AnyStr, maxsplit: int = ...) -> List[AnyStr]: ...


The necessity of this change indicates that maybe we should re-think how we detect unsafe partial overlaps. Imagine this code that type checks cleanly:

N = TypeVar('N', int, float) # to not worry about Python 2/3 str/bytes class Bad: x: List[int] = [] def boom(self) -> None: self.x[0] >> 1 def wrap(self, x: N) -> List[N]: if isinstance(x, int): return self.x return [1] surprise: float = 0 # we aim at the overlap b = Bad() b.wrap(surprise).append(0.1) b.boom() # TypeError: unsupported operand type(s) for >>: 'float' and 'int'

I think this is a bug in type checking the body of wrap (this is because of promotions so maybe it is even intentional). But the real problem is that for overloads the body is not really checked. So the above unsafe overlap is present in both old and new signatures. Moreover, I don't think there is any other kinds of unsagety in either signature (the problem is invariance of lists after all).

The problem with this, is that mypy essentially forces a user to use one unsafe signature instead of another equally unsafe signature. I could imagine some people may be not happy about this. I am not sure what to do here, one possible solution is to assume that all containers are covariant in the return types when checking for unsafe overloads, but this looks too ad-hoc. Another option is to only report the unsafe partial overlap of args only when the returns are totally disjoint.

Both options are not perfect, but I think this may be the case where practicality beats purity (since this is an actual strlib signature). What do you think?

Yeah, TBH this commit was just a quick hack so I could try using testpr -- I'm also dissatisfied with this change.

I agree that we need to make some sort of compromise here, but I also don't really know what exactly that compromise might look like.

Another kludge we could try is perhaps just disabling promotions altogether when checking for overlap in overloads -- I believe that's what the old implementation actually does. This would help smooth over the common cases, but wouldn't really help solve the bigger issue though.

The main downside is that actually implementing this is going to be somewhat painful: we'd need to propagate down whether we care about promotions down through is_subtype, modify the subtype cache to record this third type of subtyping info alongside the existing caches for is_subtype and is_subtype_proper...

@ilevkivskyi -- yeah, now that I've had a chance to play around with this, I think the root issue ultimately has to do with promotions -- the TypeVar and overload thing is sort of a red herring?

Specifically, the hole in your code sample emerges specifically because:

We allow ints to act as a subtype of float when we define the surprise variable (because it's more convenient that way).

We do not consider ints to be an instance of float in the isinstance check (because that matches the actual runtime behavior of Python).

This inconsistency is what ultimately causes mypy to not detect the bug above.

In contrast, consider the following version of your code which removes promotion weirdness altogether:

class FakeFloat: pass class FakeInt(FakeFloat): def int_only(self) -> str: return "foo" N = TypeVar('N', FakeFloat, FakeInt) class Bad: x: List[FakeInt] = [] def boom(self) -> None: self.x[0].int_only() def wrap(self, x: N) -> List[N]: if isinstance(x, FakeInt): return self.x return [FakeFloat()] surprise: FakeFloat = FakeInt() # we aim at the overlap b = Bad() b.wrap(surprise).append(FakeFloat()) b.boom()

Mypy (both on master and my PR) will report an Incompatible return value type (got "List[FakeInt]", expected "List[FakeFloat]") for the return self.x line. (Admittedly, it's not the greatest error message in the world, but it's better then nothing...)

And if you tweak the code to try and break it again, mypy seems to continue to behave correctly in all cases.

Basically, I think mypy does typecheck code that uses TypeVars more precisely then overload implementations -- it's just that isinstance checks and promotions interact weirdly.

So taking all this into account, I think there are three options:

We eliminate promotions altogether and type split using either overloads or TypeVars.

This would remove the inconsistency (and also help us simplify mypy's codebase). The main downside, of course, is that this would break literally everybody's code/would cause mypy to no longer be consistent with PEP 484.

We modify isinstance so it stops ignoring promotions and continue to type split using TypeVars.

This is basically what my PR currently does -- the isinstance checks call my new overlap check, which just leaves promotions enabled (mostly because it was easier to implement that way).

The main pro is that this would eliminate the inconsistency up above; the main downside is that this would cause mypy to no longer handle isinstance the same way as the Python runtime. This would also probably break some code, but probably not nearly as much as option 1.

We modify the overlap checks so they ignore promotions when checking overloads, isinstance checks, and similar things. We also type split using overloads.

The rationale here is that overloads are usually paired with isinstance checks, and so it makes sense to for them to handle promotions in the same way. This would be the least disruptive change -- the main downside is that the hole you identified in your example would still exist.

Option 3 seems like the most pragmatic choice -- I'm thinking we go for that?

I am still a bit overloaded with other things, so can't give a detailed answer now, but I think you are right, option 3 is probably the best one. I don't care much about the unsafety, promotions are clearly unsafe and it is kind of a deliberate choice.

Michael0x2a · 2018-08-14T07:03:17Z

Closing -- I was able to figure out how to refine overloads + operators in a way that (apparently) requires no typeshed changes.

See python/mypy#5476 for additional context.

[WIP] Fix partially overlapping types in Python 2 builtins

2bb8e6b

This commit fixes the overloads for the `rsplit` and `split` methods in the Python 2 builtins so that they are no longer unsafely overlapping. This PR is a WIP and should not yet be merged.

ilevkivskyi reviewed Jul 6, 2018

View reviewed changes

Michael0x2a closed this Aug 14, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Fix partially overlapping types in Python 2 builtins #2311

[WIP] Fix partially overlapping types in Python 2 builtins #2311

Michael0x2a commented Jul 6, 2018

ilevkivskyi Jul 6, 2018

Michael0x2a Jul 6, 2018

Michael0x2a Jul 19, 2018

ilevkivskyi Jul 19, 2018

Michael0x2a commented Aug 14, 2018 •

edited

Loading

[WIP] Fix partially overlapping types in Python 2 builtins #2311

[WIP] Fix partially overlapping types in Python 2 builtins #2311

Conversation

Michael0x2a commented Jul 6, 2018

ilevkivskyi Jul 6, 2018

Choose a reason for hiding this comment

Michael0x2a Jul 6, 2018

Choose a reason for hiding this comment

Michael0x2a Jul 19, 2018

Choose a reason for hiding this comment

ilevkivskyi Jul 19, 2018

Choose a reason for hiding this comment

Michael0x2a commented Aug 14, 2018 • edited Loading

Michael0x2a commented Aug 14, 2018 •

edited

Loading