Generic inference regression from #5699 #5738

msullivan · 2018-10-04T18:52:33Z

Since #5699, mypy now rejects the following program (minimized from a few failures at dropbox):

from typing import TypeVar, Callable, Tuple

T = TypeVar('T')

def f(x: Callable[..., T]) -> T:
    return x()

x: Tuple[str, ...] = f(tuple)

with the error message
Incompatible types in assignment (expression has type "Tuple[_T_co, ...]", variable has type "Tuple[str, ...]")

@ilevkivskyi thoughts?

To be honest I'm not totally sure how we used to get this right, since my understanding was that we don't do the sort of instantiation of all type variables and unification that I would expect to be necessary?

The text was updated successfully, but these errors were encountered:

ilevkivskyi · 2018-10-04T20:18:09Z

OK, so here is the diagnosis: I remember that I checked that our repos are clean with that PR, but later @JukkaL proposed to only use the outer context invariant instances, which I didn't check. The point is that tuple is covariant (and Tuple[str, ...] translates to an instance type), so we previously (successfully) used it for inference, but not anymore.

It looks however, there is another bug this uncovered, because currently it should have inferred <nothing>, or failed with a reasonable error, like "Cannot infer type variable", instead of leaking type variable.

I believe this may be a manifestation of #1317 which is another high priority issue I always wanted to fix.

ilevkivskyi · 2018-10-04T20:28:21Z

Yes, I am pretty sure this is a duplicate of #1317, here is a similar repro:

[case testCallableArgInference]
from typing import TypeVar, Callable

T = TypeVar('T')

def f(x: Callable[..., T]) -> T:
    return x()

def tpl(x: T) -> T:  # Really any generic thing (class or just function)
    return x

reveal_type(f(tpl))  # Revealed type is 'T`-1' (???)
[out]

I can try looking into this more today and tomorrow, but I can't guarantee it is easy to fix.

ilevkivskyi · 2018-10-05T11:02:33Z

Continuing thinking about this and #1317 it seems there are actually two aspects here:

Fix leaking type variables (which I find quite annoying)
Support inferring generic types (which is more relevant for original use case in Generic decorators don't work with generic functions #1317), because currently inferred types can't have free type variables.

Fixing the first one could be easy, while the second point may be a bit tricky. Currently mypy gets almost there:

from typing import TypeVar, Callable, List

T = TypeVar('T')
S = TypeVar('S')
U = TypeVar('U')

def dec(f: Callable[[S], T]) -> Callable[[S], List[T]]: ...

def f(x: U) -> U: ...

g = dec(f)

reveal_type(g)  # Revealed type is 'def (U`-1) -> builtins.list[U`-1]'

we just need to make it def [U] ....

@JukkaL what do you think?

JukkaL · 2018-10-05T13:09:58Z

I agree that leaking type variables probably isn't hard to fix, once we figure out what the correct behavior would be.

I can't say how hard it would be to infer generic types. This would need some research.

gvanrossum · 2018-10-05T17:23:27Z

I'm not sure this needs to be ready for the release even if it technically is a regression -- IIUC the code that used to pass was questionable at best, right?

ilevkivskyi · 2018-10-05T17:30:44Z

IIUC the code that used to pass was questionable at best, right?

It is not incorrect although. I will try a bit more to see if there is a simple fix, I will have nothing by the end of the day, I would propose to just not worry about this for now.

euresti · 2018-10-17T17:41:23Z

This also affects a common usage of attrs:

import attr
from typing import Sequence

x: Sequence[int] = attr.Factory(list)  # Incompatible types in assignment (expression has type "List[_T]", variable has type "Sequence[int]")

Unsuprisingly, attr.Factory is defined as follows:

def Factory(factory: Callable[[], _T]) -> _T: ...

However

x: List[int] = attr.Factory(list)

is ok.

Mitigates #5738 by reverting heuristics for type variable return vs instance type outer context changed in #5699. Both versions are kind of unprincipled, but the old one _seems_ to work better in real codebases. This doesn't touch the core fixes in #5699

ilevkivskyi · 2019-07-14T23:06:10Z

I spent couple days thinking and investigating this and related issues (in total I think we have around couple dozen issues that are duplicates or related to either this or #1317). Here are results of the analysis.

Preliminary note on terminology. Just to avoid confusion about what is meant by "generic type" I want to recall that there are two different things called this way: type constructors and polymorphic types. It is better to illustrate this on an example of callback protocols:

T = TypeVar('T')
α = TypeVar('α')

class Contructor(Protocol[T]):
    def __call__(self, x: T) -> T: ...

class Poly(Protocol):
    def __call__(self, x: α) -> α: ...

Type constructors expect type arguments while polymorphic types don't (I am not sure if these are the "official" terms but let's use them). The type parameters of the latter are expected to be inferred if it appears as a callee, for the former they are just Any if missing. Note that not only callable types can be polymorphic, there are also existential types like registry: Dict[Type[α], List[α]] but mypy/PEP 484 don't support them. (I am going to use Greek letters whenever I want to emphasize that this type variable should be inferred.) As a side note Poly <: Constructor[t] and mypy understand this well.

Whole expression type inference. Here I want to briefly describe how an "ideal" (see below for why it isn't actually ideal) type inference scheme/algorithm should look like. Imagine we have a nested function call f(g(...), h(...), ...) where functions may be polymorphic. Collecting all type constraints by matching each formal parameter type against corresponding actual arguments in the whole call "tree" leads to type constraints of three forms:

α_i <: t_j or t_i <: α_j (upper and lower bounds)
α_i <: β_j ("linear" constraints)
α_i <: List[β_j] ("non-linear" constraints)

I tried to search the literature about whether there are any "standard" algorithms for solving these, but didn't find anything satisfying, so here is how I see this.

First we notice that the linear constraints plus lower and upper bound constraints form a directed graph where type variables α_i are internal nodes and t_i are leaves. So we can "propagate" lower bounds and "back-propagate" upper bounds to all reachable inner nodes.

Then we meet all upper bounds, and join all lower bounds for each node. If there is both lower bound and upper bound for a variable we choose the lower bound (if lower is not a subtype of upper it is an error), otherwise the upper bound, otherwise <nothing>. (FWIW I think I have a proof that if there are only linear constraints this will produce a valid set of solutions, hint is two type variables are connected they either both have a lower bound, or both don't.)

Next, we substitute the inferred values into non-linear constraints, so that some of them can become linear, solve the new linear subset by the constraint propagation and so on until we see no progress. For example if we start from α <: β; β <: int; α <: str; γ <: List[β], then we find that α = object; β = int; γ = List[int].

The main downside of this inference scheme (and btw other schemes I have found in literature) is that any type error made by a user will result in a cryptic message Cannot infer type variable for function "g" (which is especially annoying because we don't support explicit type application for functions).

Current state of type inference in mypy. Here is a brief summary of the current logic of the "Hanoi Tower" type inference in mypy:

Erase meta-variables (type variables of the caller that are currently being inferred) from the type context.
Use this erased type context against the return type to infer some type variables of the function.
Apply those type variable solutions inferred above that don't violate upper bound or values.
Infer types of arguments in this partial context (basically just accept them).
Infer now the rest of type variables using the "inner" context -- argument types.
Steps 4-5 are repeated twice: first for a subset of arguments, then for the rest, so that we infer types of lambda expressions using a richer context.
Infer types of arguments in the full context and type check them.

Note that this scheme is more limited that full expression inference mostly because of the type erasure that happens in step 1 for an argument that is a function call itself, while performing the step 4 of the caller. Basically we "cut" all linear constraints and are only left with the upper- and lower-bound kind of constraint (first in the list above).

Issues in current logic and implementation. Unfortunately both inference scheme itself and its implementation has certain issues. There are four big ones:

Staring the inference from the outer context results in too many false positives. Imagine even a basic situation like this:
```
T = TypeVar('T')

def f(xs: List[T]) -> T:
    ...
xs: List[int]
x: float = f(xs)
```
Following the general scheme would infer T = float and cause Invalid type of argument, expected "List[float]" got "List[int]". We avoid this by having a bunch of ad-hoc hacks (or heuristics) to skip the step 2 (and consequently 3) in situations like this.
The logic for sorting arguments between first and second pass of steps 4-5 looks broken. At least what I would expect it to do, what it is documented doing, and what it actually does all pairwise disagree.
This scheme can cause "leaking" of type variables if an argument to match against is a polymorphic callable (see e.g. the original example in this issue, there is also a bunch of duplicates).
Generic decorators don't work on generic functions, see for example Generic decorators don't work with generic functions #1317 (one of the oldest high priority issues) and Generic inference regression from #5699 #5738 (comment) just above.

Possible solution plan I think fixing these issues can be mostly done independently. However, since they are all related it is better to do this in an organized manner. Here are some ideas that I would like to propose.

Issue 1: This would be solved by the whole expression inference scheme, but as I mentioned, the latter results in bad error messages, and IMO this is a very important argument against it. But I think with just couple improvements we can make the existing inference logic almost as powerful as whole expression inference without deteriorating error messages (and maybe even make them better).

First, we need to collect and solve constraints from outer context and from the first pass arguments together (steps 2 and 5). Second, don't apply them (i.e. skip step 3 altogether), instead record the gathered information about bounds in the erased types when erasing meta-variables (so that for a nested function call we can use inferred constraints and the recorded ones together). Infer the types of second pass arguments using this augmented erased context. And finally infer the type argument using both external and full internal context.

Importantly, if there is a type inference error in this last step, then we fall back to solution without extra constraints from erased parts of outer context, and then without the argument types. So that we will get Invalid argument type error whenever possible instead of Can't infer type.

This is best illustrated on an example with function f defined above. In a call x: float = f(['a', 'b', 'c']) there is no first pass arguments (list display translates to a generic constructor call), so we only infer one constraint T <: float, now we don't apply it, but supply an erased type context List[Erased(lower=None, upper=float)] to the list call. List call infers for its type argument T' :> str from first pass arguments (and there is no second one). Trying to solve the full set T' <: float; T' :> float; T' <: str it fails, so falls back to T' = str. This returns List[str] to the initial f call, where it fails again, so we now infer T = float for f and get Invalid argument type "List[str]", expected "List[float]" which I think is a nice error for this call.

Issue 2: In some sense this should be "just fixed", i.e. we should clean up the implementation and docs here. Importantly, if we go with the above solution, I think we should bias the selection to only put rock solid arguments in the first pass, i.e. those that work well in empty context, like RefExprs, calls to non-generic functions, and similar things. Everything else should go in the second pass.

Issue 3: The fix for this issue is straightforward -- type variables of polymorphic callables should participate in the inference as intermediate nodes for linear constraint propagation. (Caveat: we should be careful to not confuse type variables bound by an enclosed function with polymorphic variables, they currently can have the same name and id.) This is better illustrated on an example:

X = TypeVar('X')
T = TypeVar('T')

def foo(x: Callable[[int], X]) -> List[X]:
    ...
def id(x: T) -> T:
    ...
y = foo(id)

Here inferred type of y should be List[int] (not List[T] as currently). This happens as following, there is no external context, but there is a first pass argument so we just infer constraints for Callable[[T], T] <: Callable[[int], X], this yields T <: X; int <: T. Propagating the constraint through T we get int <: X and infer X = int.

Issue 4: This issue can be fixed by allowing "instantiation" with less variables instead of inferring <nothing> for "unbounded" type variables with linear constraints. Let's use the example from the comment above:

T = TypeVar('T')
S = TypeVar('S')
U = TypeVar('U')

def dec(f: Callable[[S], T]) -> Callable[[S], List[T]]:
    ...
def f(x: U) -> U:
    ...
g = dec(f)

Here the inferred type of g should be Callable[[α], List[α]]. This happens in a manner similar to last example, we infer constraints U <: T; U :> S and propagating through U we get S <: T. Now is the interesting step: instead of inferring S = T = <nothing> (because there are no bounds), we introduce a new variable S = T = α and get the desired result. There is one important caveat, this procedure can easily generate existential types, for example if dec would have a signature like:

def dec(f: Callable[[S], T]) -> Dict[S, T]:
    ...

we would end up inferring Dict[α, α]. So we need a function apply_polymorphic_if_possible() that will check if application will not generate existential types in return, and fall back to applying <nothing> (and thus likely Need type annotation for variable) otherwise.

I am not sure what is the best order to fix these, 1 and 2 are better fixed together and in total can take a week to fix, while 3 and 4 are smaller. Any thoughts or comments?

cc @Michael0x2a

Michael0x2a · 2019-07-15T03:20:54Z

Hmm, I wonder if it might be possible to improve the error handling of the "whole expression inference" algorithm.

Basically, it seems to me that there are only four ways the whole expression inference algorithm can fail to produce a valid mapping of nodes to types:

We end up inferring that some node α has inconsistent bounds. That is, the upper and lower bounds U and L do not satisfy L <: U.
After propagating inferred types, we end up with an inconsistent arrow in the graph. For example, suppose we have as input Set[α] <: List[β] and infer α = int and β = str, causing us to end up with the bad relationship Set[int] <: List[str]
We have some nodes that do not have a concrete lower bound/have a lower bound of <nothing>. For example, we have as input α <: β but no way of determining what α or β are supposed to be.
The input graph contains cycles/is not a DAG.

I think we might be able to handle cases 1 through 3 by making the following changes:

For case 1, if some node α has inconsistent upper and lower bounds, assign it some sort of InconsistentBounds(upper=U, lower=L) object instead of giving up and returning an error. Then, when we try applying the mapping to deduce the argument/return types, we check for the presence of these special objects and construct an appropriate error message.

So for example, if we have as input int <: α, α <: str, α <: β, we'd return the mapping α = InconsistentBounds(lower=int, upper=str), β = str.

(We could have also assigned this InconsistentBounds object to β during propagation, but in order to prevent these objects from "spreading" and causing lots of error messages, we could always make sure to use either the upper or lower bounds when propagating.)
For case 2, label each subtyping relationship we pass in and make sure to preserve the labels whenever the algorithm internally transforms the graph. This means when we hit some inconsistency, we can use the labels to figure out exactly where that inconsistency is happening/give us the context we need to make a better error message.

For example, suppose we have the following:
```
T = TypeVar('T')
S = TypeVar('S')

def f(xs: List[T]) -> T: ...
def g(y: S) -> Set[S]: ...

a: str = f(g(3))
```
Then we'd pass in these labeled subtyping relationships as input:
```
T      <:-- return type of f -- str
Set[S] <:----- arg 1 of f ----- List[T]
int    <:----- arg 1 of g ----- S
```
After one propagation pass, we end up with the mapping T = str, S = int.

Then when we try applying this mapping to the graph, we realize the second relationship results in an inconsistency: Set[str] <: List[int] is not true and so return it alongside of our mapping.

And since the returned relationship is labeled, the caller can use it to reconstruct a more meaningful error message.
For case 3, we use the same fixes you proposed when discussing "Issue 4" (including adding a final check to replace singleton nodes/existential types with <nothing>).

I think the only case where it's difficult to construct a good error message for is case 4 (the graph is not a DAG). But tbh I think such cases are pretty rare, maybe even impossible, so just failing with a bland "Could not infer type" error might be good enough in this case.

Caveats: I haven't really thought about this problem too deeply, so there could be some flaws with the idea that I'm overlooking. And even if this modified algorithm does work, I think it could still be reasonable to go with your proposal -- iterative changes to a known algorithm are generally less risker compared to a complete rewrite, etc etc...

ilevkivskyi · 2019-07-15T08:44:13Z

@Michael0x2a Thanks for ideas! Here are few comments:

After propagating inferred types, we end up with an inconsistent arrow in the graph. For example, suppose we have as input Set[α] <: List[β]

Such constraint can't get into the list initially, I didn't clarify this, sorry. I think we should "dig through" the type components until we get a bare variable on either side of constraint. IOW I don't have problems with how constraints are found currently (well except for inferring constraints against unions, but this is a totally different story).

The input graph contains cycles/is not a DAG.

This is not actually an error. It is totally fine to have cycles i.e. have a DG that is not a DAG. Moreover, this is a quite typical situation for invariant containers to produce tiny loops like α <: β; β <: α. We still can (back-)propagate constraints through such graph (just should be careful and use an appropriate graph traversal).

To summarize I think the problems/situations 2-4 in your list are non-issues, and problem 1 is really the root of all evil. I am however not sure I understand how you propose to use your solution to give the better error message. Could you please explain it on example with x: float = f(['a', 'b', 'c'])? Also another clarification here, the whole expression inference happens for each top-level expression once (with errors disabled), then we apply all inferred arguments, then perform type checking.

Michael0x2a · 2019-07-15T15:21:31Z

Such constraint can't get into the list initially, I didn't clarify this, sorry. I think we should "dig through" the type components until we get a bare variable on either side of constraint. IOW I don't have problems with how constraints are found currently (well except for inferring constraints against unions, but this is a totally different story).

Oh, so given the constraint Set[α] <: List[β], we're actually ok with digging down and passing in α <: β instead?

What happens if we have a constraint that looks like Foo[α, β] <: Bar[γ]?

Could you please explain it on example with x: float = f(['a', 'b', 'c'])?

Sure! Assuming that the type signature of f is f(x: List[T]) -> T, we initially pass in the constraintsList[str] <: List[T] and T <: float to the constraint-solving algorithm.

At some point, these constraints get refined down to str <: T and T <: float. I initially assumed this would happen inside of the algorithm, but based on your comments, it seems it's actually the caller's responsibility to simplify this down? But it's not too important; either way is fine.

After performing propagation, we end up with T having inconsistent lower and upper bounds of str and float. So, our algorithm returns the mapping {T: InconsistentBounds(lower=str, upper=float)}.

Once the caller receives this mapping, it realizes there must be a type error of some kind since the mapping contains an InconsistentBounds. So to produce nice error messages, it deconstructs this mapping into {T: str} and {T: float} (one mapping uses the lower bound, the other the upper) and produces two types as we deduce the arg types and return type. If after the application the two types we produce are different, we report an error for that particular arg or return.

In this case, when we try applying these two mappings we get List[str] and List[float] for the argument. These types are different, so we produce an error like Argument 1 to "f" has incompatible type "List[str]"; expected "List[float]".

This error is actually different from the error mypy produces today, which is Incompatible types in assignment (expression has type "str", variable has type "float").

I think both errors are fine tbh, but I don't think it'll be too hard to find a way to continue biasing towards "pushing up" errors to the return type. For example, we could maybe try running this "solve constraints then apply" step twice -- first without including the return type context as a constraint, then if that doesn't return any errors trying again with it.

(Or alternatively, we could apply these two mappings without doing any error handling to produce the callee and caller signatures f(List[str]) -> str and f(List[float]) -> float respectively? Then pass these two signatures to the type checker and let it do its usual type-checking.)

ilevkivskyi · 2019-07-15T22:54:45Z

Oh, so given the constraint Set[α] <: List[β], we're actually ok with digging down and passing in α <: β instead?

No, this will infer no constraints because this can never be satisfied. But for the cases like List[α] <: Iterable[β] we do infer α <: β using map_instance_to_supertype() (or a similar thing for protocols). You can read about this in constraints.py, as I said there is nothing there I have problems with except inference against unions.

Sure! [...]

Thanks for explanation. But it looks like this will not work well. I should have been more explicit here again. The point is that a list literal is internally translated to a generic function call (see my explanation for how errors would generate in the compromise inference scheme). So that ['a', 'b', 'c'] is roughly:

def _list(*args: T) -> List[T]: ...
_list('a', 'b', 'c')

This is actually why x: List[float] = [1, 2, 3] works in mypy.

Taking this into account and renaming type variable of _list to T' to avoid confusion (and as I did in the long comment) full expression inference will get us the following set of constraints: T <: float; T :> T'; T <: T'; T' :> str. Following your idea this should generate InconsistentBounds(lower=str, upper=float) for both T and T'. So now when you will try to apply all four options this will generate two errors that confusingly contradict each other.

JukkaL · 2019-07-16T13:05:24Z

@ilevkivskyi I like your incremental approach of improving type inference. The general scheme looks good, though I have a few questions. It would be good to have a few more worked examples, such as some already supported tricky cases (such as cases involving lambda and/or overloads), and open bugs that this would fix.

Beyond the difficulty of generating good error messages, the "whole expression inference" approach also seems much more work -- especially more work needed for any incremental improvement, as it seems basically all-or-nothing to me.

Second, don't apply them (i.e. skip step 3 altogether), instead record the gathered information about bounds in the erased types when erasing meta-variables (so that for a nested function call we can use inferred constraints and the recorded ones together).

Does this mean using the new Erased(lower, upper) types? How would these types work -- would we need to support all type operations with them? An early prototype that eventually evolved into mypy had a concept of "range type" during type inference, and I hit various difficulties with them and eventually abandoned them. I can't recall the specifics, though -- your idea may be sufficiently different that it doesn't have the same issues.

I think we should bias the selection to only put rock solid arguments in the first pass, i.e. those that work well in empty context, like RefExprs, calls to non-generic functions, and similar things. Everything else should go in the second pass.

How would this deal with cases where there is both a generic function call (say, [1, 2, 3]) and a lambda as an argument, and the type of lambda depends on the return value of the generic function call? The map() builtin is a good example where these happen with some frequency.

ilevkivskyi · 2019-07-17T09:20:49Z

We briefly discussed Jukka's questions privately, here is a summary.

Does this mean using the new Erased(lower, upper) types?

We can re-use the existing ErasedType for this and add two extra attributes to it. All the existing interactions of this type will hold, we would only update its behaviour for constraint inference.

How would this deal with cases where there is both a generic function call (say, [1, 2, 3]) and a lambda as an argument

It looks like we might need to introduce priorities that will determine order of processing arguments to make this kind of things robust.

See python/mypy#5738 and python-attrs/attrs#519

ilevkivskyi · 2019-08-12T14:46:44Z

An example of questionable behaviour that stems from current logic of splitting arguments in two passes (appeared in a discussion on typeshed):

def g(x: T, y: Callable[[], T]) -> None: ...
def not_a_lambda() -> str: ...

g(42, not_a_lambda)  # this currently fails, but should infer T = object

We sometimes get this from mypy: Traceback (most recent call last): [...] File "mypy/main.py", line 89, in main File "mypy/build.py", line 166, in build File "mypy/build.py", line 235, in _build File "mypy/build.py", line 2611, in dispatch File "mypy/build.py", line 2911, in process_graph File "mypy/build.py", line 2989, in process_fresh_modules File "mypy/build.py", line 1919, in fix_cross_refs File "mypy/fixup.py", line 25, in fixup_module File "mypy/fixup.py", line 89, in visit_symbol_table File "mypy/fixup.py", line 45, in visit_type_info File "mypy/fixup.py", line 91, in visit_symbol_table File "mypy/nodes.py", line 874, in accept File "mypy/fixup.py", line 136, in visit_var File "mypy/types.py", line 234, in accept cr_running = gi_running File "mypy/fixup.py", line 169, in visit_type_alias_type File "mypy/fixup.py", line 276, in lookup_qualified_alias File "mypy/fixup.py", line 290, in lookup_qualified File "mypy/fixup.py", line 299, in lookup_qualified_stnode File "mypy/lookup.py", line 47, in lookup_fully_qualified AssertionError: Cannot find component 'pending_download_type' for 'qutebrowser.browser.webkit.mhtml._Downloader.pending_download_type' This seems to help... See: python/mypy#5738 python/mypy#7281

bwo · 2020-05-18T19:50:18Z

I'm not sure but I suspect this is another instance of this issue, which just bit me today:

from typing import Optional, TypeVar, Callable, Tuple

A = TypeVar('A')
B = TypeVar('B')
L = TypeVar('L')
R = TypeVar('R')
Ret = TypeVar('Ret')


def decorate(f):
    # type: (Callable[[L, R], Ret]) -> Callable[[L, R], Ret]
    ''' in real life, the type of decorator is a little more involved, so I can't just have it be `(T) -> T` '''

    assert False


def function(l, foo):
    # type: (Optional[A], Optional[B]) -> Optional[Tuple[A, B]]
    assert False


x: Optional[Tuple[str, int]] = function('s', 1)
y: Optional[Tuple[str, int]] = decorate(function)('s', 1)
reveal_type(function)
reveal_type(decorate)
reveal_type(decorate(function))
reveal_type(x)
reveal_type(y)

typechecking this file gives:

$ mypy foo.py
foo.py:21: error: Incompatible types in assignment (expression has type "Optional[Tuple[A, B]]", variable has type "Optional[Tuple[str, int]]")
foo.py:21: error: Argument 1 has incompatible type "str"; expected "Optional[A]"
foo.py:21: error: Argument 2 has incompatible type "int"; expected "Optional[B]"
foo.py:22: note: Revealed type is 'def [A, B] (l: Union[A`-1, None], foo: Union[B`-2, None]) -> Union[Tuple[A`-1, B`-2], None]'
foo.py:23: note: Revealed type is 'def [L, R, Ret] (f: def (L`-1, R`-2) -> Ret`-3) -> def (L`-1, R`-2) -> Ret`-3'
foo.py:24: note: Revealed type is 'def (Union[A`-1, None], Union[B`-2, None]) -> Union[Tuple[A`-1, B`-2], None]'
foo.py:25: note: Revealed type is 'Union[Tuple[builtins.str, builtins.int], None]'
foo.py:26: note: Revealed type is 'Union[Tuple[builtins.str, builtins.int], None]'
Found 3 errors in 1 file (checked 1 source file)

From reveal_type(decorate(function)) it seems that the genericity has been wiped out. But A and B aren't concrete types. I've been able to type decorators that don't change the function signature in the past by giving them the type (T) -> T (though I haven't tried to decorate themselves-generic functions with such a decorator), but for my actual decorator that won't work.

RunOrVeith · 2020-08-14T11:57:58Z

I assume this is also caused by this issue:

from typing import List, Optional
from dataclasses import field, dataclass

@dataclass
class Foo:
    x: Optional[List[str]] = field(default_factory=list)

which gives Incompatible types in assignment (expression has type "List[_T]", variable has type "Optional[List[str]]").
Mypy thus fails to match _T to str. It is able to do so if x where not Optional.

phinate · 2021-01-26T23:55:42Z

I assume this is also caused by this issue:
from typing import List, Optional
from dataclasses import field, dataclass

@dataclass
class Foo:
    x: Optional[List[str]] = field(default_factory=list)
which gives Incompatible types in assignment (expression has type "List[_T]", variable has type "Optional[List[str]]").
Mypy thus fails to match _T to str. It is able to do so if x where not Optional.

for reference, I'm encountering this exact scenario too with Tuple

cebtenzzre · 2022-08-07T22:11:31Z

The way mypy does type inference also breaks calling a generic function on the right side of set.__and__, because it takes an argument of type AbstractSet[object].

from typing import TypeVar

T = TypeVar('T')

def identity(x: set[T]) -> set[T]:
    return x

x = {0, 1}

# error: Argument 1 to "identity" has incompatible type "Set[int]"; expected "Set[object]"  [arg-type]
print({0} & identity(x))

Fixes #1317 Fixes #5738 Fixes #12919 (also fixes a `FIX` comment that is more than 10 years old according to git blame) Note: although this PR fixes most typical use-cases for type inference against generic functions, it is intentionally incomplete, and it is made in a way to limit implications to small scope. This PR has essentially three components (better infer, better solve, better apply - all three are needed for this MVP to work): * A "tiny" change to `constraints.py`: if the actual function is generic, we unify it with template before inferring constraints. This prevents leaking generic type variables of actual in the solutions (which makes no sense), but also introduces new kind of constraints `T <: F[S]`, where type variables we solve for appear in target type. These are much harder to solve, but also it is a great opportunity to play with them to prepare for single bin inference (if we will switch to it in some form later). Note unifying is not the best solution, but a good first approximation (see below on what is the best solution). * New more sophisticated constraint solver in `solve.py`. The full algorithm is outlined in the docstring for `solve_non_linear()`. It looks like it should be able to solve arbitrary constraints that don't (indirectly) contain "F-bounded" things like `T <: list[T]`. Very short the idea is to compute transitive closure, then organize constraints by topologically sorted SCCs. * Polymorphic type argument application in `checkexpr.py`. In cases where solver identifies there are free variables (e.g. we have just one constraint `S <: list[T]`, so `T` is free, and solution for `S` is `list[T]`) it will apply the solutions while creating new generic functions. For example, if we have a function `def [S, T] (fn: Callable[[S], T]) -> Callable[[S], T]` applied to a function `def [U] (x: U) -> U`, this will result in `def [T] (T) -> T` as the return. I want to put here some thoughts on the last ingredient, since it may be mysterious, but now it seems to me it is actually a very well defined procedure. The key point here is thinking about generic functions as about infinite intersections or infinite overloads. Now reducing these infinite overloads/intersections to finite ones it is easy to understand what is actually going on. For example, imagine we live in a world with just two types `int` and `str`. Now we have two functions: ```python T = TypeVar("T") S = TypeVar("S") U = TypeVar("U") def dec(fn: Callable[[T], S]) -> Callable[[T], S]: ... def id(x: U) -> U: ... ``` the first one can be seen as overload over ``` ((int) -> int) -> ((int) -> int) # 1 ((int) -> str) -> ((int) -> str) # 2 ((str) -> int) -> ((str) -> int) # 3 ((str) -> str) -> ((str) -> str) # 4 ``` and second as an overload over ``` (int) -> int (str) -> str ``` Now what happens when I apply `dec(id)`? We need to choose an overload that matches the argument (this is what we call type inference), but here is a trick, in this case two overloads of `dec` match the argument type. So (and btw I think we are missing this for real overloads) we construct a new overload that returns intersection of matching overloads `# 1` and `# 4`. So if we generalize this intuition to the general case, the inference is selection of an (infinite) parametrized subset among the bigger parameterized set of intersecting types. The only question is whether resulting infinite intersection is representable in our type system. For example `forall T. dict[T, T]` can make sense but is not representable, while `forall T. (T) -> T` is a well defined type. And finally, there is a very easy way to find whether a type is representable or not, we are already doing this during semantic analyzis. I use the same logic (that I used to view as ad-hoc because of lack of good syntax for callables) to bind type variables in the inferred type. OK, so here is the list of missing features, and some comments on them: 1. Instead of unifying the actual with template we should include actual's variables in variable set we solve for, as explained in #5738 (comment). Note however, this will work only together with the next item 2. We need to (iteratively) infer secondary constraints after linear propagation, e.g. `Sequence[T] <: S <: Sequence[U] => T <: U` 3. Support `ParamSpec` (and probably `TypeVarTuple`). Current support for applying callables with `ParamSpec` to generics is hacky, and kind of dead-end. Although `(Callable[P, T]) -> Callable[P, List[T]]` works when applied to `id`, even a slight variation like `(Callable[P, List[T]]) -> Callable[P, T]` fails. I think it needs to be re-worked in the framework I propose (the tests I added are just to be sure I don't break existing code) 4. Support actual types that are generic in type variables with upper bounds or values (likely we just need to be careful when propagating constraints and choosing free variable within an SCC). 5. Add backtracking for upper/lower bound choice. In general, in the current "Hanoi Tower" inference scheme it is very hard to backtrack, but in in this specific choice in the new solver, it should be totally possible to switch from lower to upper bound on a previous step, if we found no solution (or `<nothing>`/`object`). 6. After we polish it, we can use the new solver in more situations, e.g. for return type context, and for unification during callable subtyping. 7. Long term we may want to allow instances to bind type variables, at least for things like `LRUCache[[x: T], T]`. Btw note that I apply force expansion to type aliases and callback protocols. Since I can't transform e.g. `A = Callable[[T], T]` into a generic callable without getting proper type. 8. We need to figure out a solution for scenarios where non-linear targets with free variables and constant targets mix without secondary constraints, like `T <: List[int], T <: List[S]`. I am planning to address at least majority of the above items, but I think we should move slowly, since in my experience type inference is really fragile topic with hard to predict long reaching consequences. Please play with this PR if you want to and have time, and please suggest tests to add.

ilevkivskyi added bug mypy got something wrong priority-0-high false-positive mypy gave an error on correct code topic-type-variables labels Oct 5, 2018

ilevkivskyi mentioned this issue Oct 5, 2018

Release 0.640 planning #5741

Closed

ilevkivskyi mentioned this issue Oct 23, 2018

Dataclasses: default_factory=list doesn't work for Collection[Foo] #5823

Closed

This was referenced Nov 30, 2018

Release 0.650 planning #5960

Closed

Mitigate a regression in the external context #6006

Merged

berquist mentioned this issue Mar 15, 2019

Typing with converters not working python-attrs/attrs#519

Open

ilevkivskyi mentioned this issue May 6, 2019

Infer type argument of an instance from arguments to a generic method called on it #6746

Open

ilevkivskyi mentioned this issue May 26, 2019

Type variables set to <nothing> do not unify with more specific variables when used in a more specific context #6895

Open

ilevkivskyi mentioned this issue Jun 4, 2019

Implicit TypeVar expansion of a Generic via another generic method #6933

Open

gabbard pushed a commit to isi-vista/immutablecollections that referenced this issue Jul 25, 2019

Workaround methods for mypy type inference problems with converters

62b3577

See python/mypy#5738 and python-attrs/attrs#519

ilevkivskyi mentioned this issue Sep 29, 2019

List gets silently converted into Iterable in itertools.chain.from_iterable #7519

Closed

ilevkivskyi mentioned this issue Dec 5, 2019

Stop saying mypy is beta software? #6740

Closed

Michael0x2a mentioned this issue Jan 2, 2020

Crash in lambda expression as generic argument #8230

Closed

ilevkivskyi mentioned this issue Feb 17, 2020

How can I use tuple as a converter to attr.ib without upsetting mypy? #8389

Closed

This was referenced Feb 25, 2020

Changed typing to work with the --disallow-any-generics flag on vistautils isi-vista/immutablecollections#73

Closed

Added new mypy tags and fixed all new mypy errors isi-vista/vistautils#117

Open

sjustas mentioned this issue Mar 9, 2020

Generic type is not inferred across descriptor class methods #8516

Closed

hauntsaninja mentioned this issue Apr 7, 2020

PEP 612 tracking issue #8645

Closed

hauntsaninja mentioned this issue Dec 30, 2020

map(os.path.basename, ...): Argument 1 to "map" has incompatible type overloaded function; expected "Callable[[str], AnyStr]" #9864

Closed

hauntsaninja mentioned this issue Jan 18, 2021

groupby with generic key function fails #9920

Closed

hauntsaninja mentioned this issue Sep 26, 2021

Meta issue tracker #11201

Closed

JelleZijlstra mentioned this issue May 3, 2022

Argument 1 to "__anext__" of "AsyncIterator" has incompatible type "AsyncIterator[_T_co]"; expected "AsyncIterator[_T_co]" #12723

Closed

ilevkivskyi mentioned this issue May 23, 2023

Foundations for non-linear solver and polymorphic application #15287

Merged

ilevkivskyi closed this as completed in #15287 Jun 18, 2023

cebtenzzre mentioned this issue Jun 18, 2023

set.__and__ false-positive involving generic inference #15462

Open

Uh oh!

Generic inference regression from #5699 #5738

Generic inference regression from #5699 #5738

Comments

msullivan commented Oct 4, 2018

ilevkivskyi commented Oct 4, 2018

Uh oh!

ilevkivskyi commented Oct 4, 2018

Uh oh!

ilevkivskyi commented Oct 5, 2018

Uh oh!

JukkaL commented Oct 5, 2018

Uh oh!

gvanrossum commented Oct 5, 2018

Uh oh!

ilevkivskyi commented Oct 5, 2018

Uh oh!

euresti commented Oct 17, 2018

Uh oh!

ilevkivskyi commented Jul 14, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Michael0x2a commented Jul 15, 2019

Uh oh!

ilevkivskyi commented Jul 15, 2019

Uh oh!

Michael0x2a commented Jul 15, 2019

Uh oh!

ilevkivskyi commented Jul 15, 2019

Uh oh!

JukkaL commented Jul 16, 2019

Uh oh!

ilevkivskyi commented Jul 17, 2019

Uh oh!

ilevkivskyi commented Aug 12, 2019

Uh oh!

bwo commented May 18, 2020

Uh oh!

RunOrVeith commented Aug 14, 2020

Uh oh!

phinate commented Jan 26, 2021

Uh oh!

cebtenzzre commented Aug 7, 2022

Uh oh!

ilevkivskyi commented Jul 14, 2019 •

edited

Loading