Skip to content

[WIP] Fix partially overlapping types in Python 2 builtins #2311

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions stdlib/2/__builtin__.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -351,17 +351,17 @@ class str(basestring, Sequence[str]):
@overload
def rpartition(self, sep: unicode) -> Tuple[unicode, unicode, unicode]: ...
@overload
def rsplit(self, sep: Optional[str] = ..., maxsplit: int = ...) -> List[str]: ...
def rsplit(self, sep: None = ..., maxsplit: int = ...) -> List[str]: ...
@overload
def rsplit(self, sep: unicode, maxsplit: int = ...) -> List[unicode]: ...
def rsplit(self, sep: AnyStr, maxsplit: int = ...) -> List[AnyStr]: ...
@overload
def rstrip(self, chars: str = ...) -> str: ...
@overload
def rstrip(self, chars: unicode) -> unicode: ...
@overload
def split(self, sep: Optional[str] = ..., maxsplit: int = ...) -> List[str]: ...
def split(self, sep: None = ..., maxsplit: int = ...) -> List[str]: ...
@overload
def split(self, sep: unicode, maxsplit: int = ...) -> List[unicode]: ...
def split(self, sep: AnyStr, maxsplit: int = ...) -> List[AnyStr]: ...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The necessity of this change indicates that maybe we should re-think how we detect unsafe partial overlaps. Imagine this code that type checks cleanly:

N = TypeVar('N', int, float)   # to not worry about Python 2/3 str/bytes

class Bad:
    x: List[int] = []

    def boom(self) -> None:
        self.x[0] >> 1

    def wrap(self, x: N) -> List[N]:
        if isinstance(x, int):
            return self.x
        return [1]

surprise: float = 0  # we aim at the overlap
b = Bad()
b.wrap(surprise).append(0.1)
b.boom()  # TypeError: unsupported operand type(s) for >>: 'float' and 'int'

I think this is a bug in type checking the body of wrap (this is because of promotions so maybe it is even intentional). But the real problem is that for overloads the body is not really checked. So the above unsafe overlap is present in both old and new signatures. Moreover, I don't think there is any other kinds of unsagety in either signature (the problem is invariance of lists after all).

The problem with this, is that mypy essentially forces a user to use one unsafe signature instead of another equally unsafe signature. I could imagine some people may be not happy about this. I am not sure what to do here, one possible solution is to assume that all containers are covariant in the return types when checking for unsafe overloads, but this looks too ad-hoc. Another option is to only report the unsafe partial overlap of args only when the returns are totally disjoint.

Both options are not perfect, but I think this may be the case where practicality beats purity (since this is an actual strlib signature). What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, TBH this commit was just a quick hack so I could try using testpr -- I'm also dissatisfied with this change.

I agree that we need to make some sort of compromise here, but I also don't really know what exactly that compromise might look like.

Another kludge we could try is perhaps just disabling promotions altogether when checking for overlap in overloads -- I believe that's what the old implementation actually does. This would help smooth over the common cases, but wouldn't really help solve the bigger issue though.

The main downside is that actually implementing this is going to be somewhat painful: we'd need to propagate down whether we care about promotions down through is_subtype, modify the subtype cache to record this third type of subtyping info alongside the existing caches for is_subtype and is_subtype_proper...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilevkivskyi -- yeah, now that I've had a chance to play around with this, I think the root issue ultimately has to do with promotions -- the TypeVar and overload thing is sort of a red herring?

Specifically, the hole in your code sample emerges specifically because:

  1. We allow ints to act as a subtype of float when we define the surprise variable (because it's more convenient that way).
  2. We do not consider ints to be an instance of float in the isinstance check (because that matches the actual runtime behavior of Python).

This inconsistency is what ultimately causes mypy to not detect the bug above.

In contrast, consider the following version of your code which removes promotion weirdness altogether:

class FakeFloat: pass
class FakeInt(FakeFloat):
    def int_only(self) -> str: return "foo"

N = TypeVar('N', FakeFloat, FakeInt)

class Bad:
    x: List[FakeInt] = []

    def boom(self) -> None:
        self.x[0].int_only()

    def wrap(self, x: N) -> List[N]:
        if isinstance(x, FakeInt):
            return self.x
        return [FakeFloat()]

surprise: FakeFloat = FakeInt()  # we aim at the overlap
b = Bad()
b.wrap(surprise).append(FakeFloat())
b.boom() 

Mypy (both on master and my PR) will report an Incompatible return value type (got "List[FakeInt]", expected "List[FakeFloat]") for the return self.x line. (Admittedly, it's not the greatest error message in the world, but it's better then nothing...)

And if you tweak the code to try and break it again, mypy seems to continue to behave correctly in all cases.

Basically, I think mypy does typecheck code that uses TypeVars more precisely then overload implementations -- it's just that isinstance checks and promotions interact weirdly.

So taking all this into account, I think there are three options:

  1. We eliminate promotions altogether and type split using either overloads or TypeVars.

    This would remove the inconsistency (and also help us simplify mypy's codebase). The main downside, of course, is that this would break literally everybody's code/would cause mypy to no longer be consistent with PEP 484.

  2. We modify isinstance so it stops ignoring promotions and continue to type split using TypeVars.

    This is basically what my PR currently does -- the isinstance checks call my new overlap check, which just leaves promotions enabled (mostly because it was easier to implement that way).

    The main pro is that this would eliminate the inconsistency up above; the main downside is that this would cause mypy to no longer handle isinstance the same way as the Python runtime. This would also probably break some code, but probably not nearly as much as option 1.

  3. We modify the overlap checks so they ignore promotions when checking overloads, isinstance checks, and similar things. We also type split using overloads.

    The rationale here is that overloads are usually paired with isinstance checks, and so it makes sense to for them to handle promotions in the same way. This would be the least disruptive change -- the main downside is that the hole you identified in your example would still exist.

Option 3 seems like the most pragmatic choice -- I'm thinking we go for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still a bit overloaded with other things, so can't give a detailed answer now, but I think you are right, option 3 is probably the best one. I don't care much about the unsafety, promotions are clearly unsafe and it is kind of a deliberate choice.

def splitlines(self, keepends: bool = ...) -> List[str]: ...
def startswith(self, prefix: Union[unicode, Tuple[unicode, ...]]) -> bool: ...
@overload
Expand Down
8 changes: 4 additions & 4 deletions stdlib/2/builtins.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -351,17 +351,17 @@ class str(basestring, Sequence[str]):
@overload
def rpartition(self, sep: unicode) -> Tuple[unicode, unicode, unicode]: ...
@overload
def rsplit(self, sep: Optional[str] = ..., maxsplit: int = ...) -> List[str]: ...
def rsplit(self, sep: None = ..., maxsplit: int = ...) -> List[str]: ...
@overload
def rsplit(self, sep: unicode, maxsplit: int = ...) -> List[unicode]: ...
def rsplit(self, sep: AnyStr, maxsplit: int = ...) -> List[AnyStr]: ...
@overload
def rstrip(self, chars: str = ...) -> str: ...
@overload
def rstrip(self, chars: unicode) -> unicode: ...
@overload
def split(self, sep: Optional[str] = ..., maxsplit: int = ...) -> List[str]: ...
def split(self, sep: None = ..., maxsplit: int = ...) -> List[str]: ...
@overload
def split(self, sep: unicode, maxsplit: int = ...) -> List[unicode]: ...
def split(self, sep: AnyStr, maxsplit: int = ...) -> List[AnyStr]: ...
def splitlines(self, keepends: bool = ...) -> List[str]: ...
def startswith(self, prefix: Union[unicode, Tuple[unicode, ...]]) -> bool: ...
@overload
Expand Down