Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Re-work overload overlap logic (#17392)
Fixes #5510 OK, so I noticed during last couple years, that every other time I change something about type variables, a few unsafe overload overlap errors either appears or disappears. At some point I almost stopped looking at them. The problem is that unsafe overload overlap detection for generic callables is currently ad-hoc. However, as I started working on it, I discovered a bunch of foundational problems (and few smaller issues), so I decided to re-work the unsafe overload overlap detection. Here is a detailed summary: * Currently return type compatibility is decided using regular subtype check. Although it is technically correct, in most cases there is nothing wrong if first overload returns `list[Subtype]` and second returns `list[Supertype]`. All the unsafe overload story is about runtime values, not static types, so we should use `is_subset()` instead of `is_subtype()`, which is IIUC easy to implement: we simply need to consider all invariant types covariant. * Current implementation only checks for overlap between parameters, i.e. it checks if there are some calls that are valid for both overloads. But we also need to check that those common calls will not be always caught by the first overload. I assume it was not checked because, naively, we already check elsewhere that first overload doesn't completely shadow the second one. But this is not the same: first overload may be not more general overall, but when narrowed to common calls, it may be more general. Example of such false-positive (this is an oversimplified version of what is often used in situations with many optional positional arguments): ```python @overload def foo(x: object) -> object: ... @overload def foo(x: int = ...) -> int: ... ``` * Currently overlap for generic callables is decided using some weird two-way unification procedure, where we actually keep going on (with non-unified variables, and/or `<never>`) if the right to left unification fails. TBH I never understood this. What we need is to find some set of type variable values that makes two overloads unsafely overlapping. Constraint inference may be used as a (good) source of such guesses, but is not decisive in any way. So instead I simply try all combinations of upper bounds and values. The main benefit of such approach is that it is guaranteed false-positive free. If such algorithm finds an overlap it is definitely an overlap. There are however false negatives, but we can incrementally tighten them in the future. * I am making `Any` overlap nothing when considering overloads. Currently it overlaps everything (i.e. it is not different from `object`), but this violates the rule that replacing a precise type with `Any` should not generate an error. IOW I essentially treat `Any` as "too dynamic or not imported". * I extend `None` special-casing to be more uniform. Now essentially it only overlaps with explicitly optional types. This is important for descriptor-like signatures. * Finally, I did a cleanup in `is_overlapping_types()`, most notably flags were not passed down to various (recursive) helpers, and `ParamSpec`/`Parameters` were treated a bit arbitrary. Pros/cons of the outcome: * Pro: simple (even if not 100% accurate) mental model * Pro: all major classes of false positives eliminated * Pro: couple minor false negatives fixed * Con: two new false negatives added, more details below So here a two new false negatives and motivation on why I think they are OK. First example is ```python T = TypeVar("T") @overload def foo(x: str) -> int: ... @overload def foo(x: T) -> T: ... def foo(x): if isinstance(x, str): return 0 return x ``` This is obviously unsafe (consider `T = float`), but not flagged after this PR. I think this is ~fine for two reasons: * There is no good alternative for a user, the error is not very actionable. Using types like `(str | T) -> int | T` is a bad idea because unions with type variables are not only imprecise, but also highly problematic for inference. * The false negative is mostly affecting unbounded type variables, if a "suspicious" bound is used (like `bound=float` in this example), the error will be still reported. Second example is signatures like ```python @overload def foo(x: str, y: str) -> str: ... @overload def foo(*args: str) -> int: ... @overload def bar(*, x: str, y: str) -> str: ... @overload def bar(**kwds: str) -> int: ... ``` These are also unsafe because one can fool mypy with `x: tuple[str, ...] = ("x", "y"); foo(*x)` and `x: dict[str, str] = {"x": "x", "y": "y"}; bar(**x)`. I think this is OK because while such unsafe calls are quite rare, this kind of catch-all fallback as last overload is relatively common. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Shantanu <12621235+hauntsaninja@users.noreply.github.com>
- Loading branch information