Fix handling of type variables to prevent confusion across different functions/classes #1261

gnprice · 2016-03-02T01:52:39Z

Currently type variables are identified by two namespaces "function" and "class" and a small numeric index, with the numbering starting from 1 in each new function and class. Which function or which class a given type variable refers to is implicit in where the type variable comes from in a given section of code in the type-checker. This leads to a number of bugs (TODO: link some of them here.)

@JukkaL suggests instead identifying each type variable by a fresh ID number distinct from all others, which is common practice in other type-inference engines.

JukkaL · 2016-03-02T18:31:46Z

Renamed Refactoring -> Fix in the title as this is will change behavior.

JukkaL · 2016-03-17T18:27:00Z

Should we move this to an earlier milestone? (I'll add more details soon.)

gvanrossum · 2016-03-17T19:07:14Z

Yeah, I think this one is important and should be given priority. Maybe we should just move it into 0.3.2 and move anything that's a cleanup or refactoring out of there and into 0.3.3?

JukkaL · 2016-03-17T20:43:23Z

Sounds good to me.

rwbarton · 2016-04-06T21:31:42Z

Using distinct ids for unrelated type variables would certainly lead to less confusion (both for the human and mypy). Though there are still subtleties: if f is a generic function and I have a call like f(f(...)), I need to be able to distinguish the type variables of the outer application f from those of the inner application of f. (Imagine f has a type like (A) -> List[A].)

One thing to watch out for is nondeterminism of generated ids. Even if ids are just allocated as sequential integers, it's easy in practice for the results to be affected by minor details, such as the order of type checking modules, This might be exposed to the user in two ways:

If the ids ever appear in error messages (not sure whether they do currently), the error messages will be nondeterministic. This is mostly only a problem for the test suite and can be handled by some normalization of error messages before comparison to the expected output.
I assume the implementation of incremental type checking involves some kind of "interface files" containing type information for modules that have already been processed. If those modules contain generic functions then the interface files will have to refer to those variables, and if they do so by the allocated id then the exact contents of the interface file become nondeterministic. This might become an issue in some situations. (I haven't looked at the implementation of incremental type checking yet, so I don't know whether it makes any assumptions about determinism of type checking.)

gvanrossum · 2016-04-14T18:21:38Z

Reid: if you get to this, great. If not, we can push it to a later milestone.

rwbarton · 2016-05-01T11:07:00Z

I took a stab at this but encountered a number of minor bugs that are masked by the way that mypy currently reuses type variable ids, so I'm going to move this to the next milestone. For my records, here are the issues I encountered (maybe some deserve their own issues in the tracker):

Subtype checking for generic functions with type variables with value restriction doesn't work correctly, since apply_generic_arguments won't allow a type variable with value restriction to be instantiated with itself (it's not one of the legal values!)
When checking method compatibility in various contexts (method overrides in subclasses, __add__ vs. __radd__ or __iadd__, function overloads) we sometimes check whether two (argument or result) types are the same, which for type variables means whether they have the same id. For example (from stdlib/3/builtins.pyi)
```
@overload
def reversed(object: Reversible[_T]) -> Iterator[_T]: ...
@overload
def reversed(object: Sequence[_T]) -> Iterator[_T]: ...
```
These are currently considered to be safely overlapping because they have the "same return type" Iterator[_T] where _T is the first (only) type variable of each overload. However, this logic is wrong. If the argument type of the first overload is changed to, say, Sequence[Tuple[_T, int]], then it becomes overlapping with the second with different return types, since the second can be specialized by replacing _T by Tuple[_T, int].

In these situations it generally seems to be correct to apply unify_generic_callables before comparing argument and result types, as happens already in is_callable_subtype.
In lib-python/3.2/fnmatch.py function filter, something funny happens at the line
```
  match = _compile_pattern(pat, isinstance(pat, bytes))
```
Here _compile_pattern is defined as
```
@functools.lru_cache(maxsize=250)
def _compile_pattern(pat: AnyStr,
                 is_bytes: bool = False) -> Callable[[AnyStr],
                                                     Match[AnyStr]]:
[...]
```
Because of the issue I mentioned in the comments of Generic decorators don't work with generic functions #1317, _compile_pattern ends up with a non-generic type that involves the type variable AnyStr, and therefore so does the local variable match. But filter is also a generic function with an AnyStr type variable, so (because the type variable ids are both -1) somewhere in the course of type checking filter the type variable of match gets expanded to the value of the type variable of filter, which isn't necessarily correct but unsurprisingly is in this case in practice. I expect this to be fixed by fixing Generic decorators don't work with generic functions #1317.

rwbarton · 2016-05-01T11:12:31Z

One of the unit tests (testOverrideGenericMethodInNonGenericClass, excerpted below) raised an interesting point:

class A:
    def f(self, x: T, y: S) -> None: pass
class C(A):
    def f(self, x: T, y: T) -> None: pass # E: Argument 2 of "f" incompatible with supertype "A"

When my branch didn't report an error on the last line, I thought at first it was a bug, but on reflection I decided that it was right and the test is wrong. After all, any time a.f(x, y) is a valid call for a an instance of A, it's also a valid call for a an instance of C because we can just pick T = object.

A correct test would be

class A:
    def f(self, x: List[T], y: List[S]) -> None: pass
class C(A):
    def f(self, x: List[T], y: List[T]) -> None: pass # Fail

Generating good error messages in this kind of situation is tricky, because each individual argument is compatible with the argument type of the supertype's method; they just aren't compatible taken together.

This is needed for subtyping of generic function with type variables with values to work correctly, which will be needed for python#1261.

This is needed for subtyping of generic function with type variables with values to work correctly, which will be needed for #1261.

rwbarton · 2016-05-10T18:56:59Z

On the three preliminary issues from two comments above: The first is fixed by #1469. The third is more complicated than I originally thought, and I filed #1515 for it. I plan to just add type: ignore comments as necessary to the stdlib sample file in which it comes up unless #1515 happens to be fixed already by then, since it currently only works by accident anyways. The second issue should be fairly straightforward and that's what I intend to look at next.

gvanrossum · 2016-05-10T19:02:43Z

Do we have some kind of process in place to find out when those `# type: ignore` comments in stdlib-samples are no longer needed?

JukkaL · 2016-05-10T19:17:59Z

I've occasionally went through stdlib-samples but not consistently. Having a warning for an unneeded # type: ignore comment would be a boon here. I think that sometimes I've undone some refactorings that previous mypy versions needed but that are no longer required, but again it hasn't been consistent.

The-Compiler · 2016-05-10T19:37:47Z

There's #1345 for that

This is a prerequisite for fixing python#1261 because overrides with generic function type variables currently work by accident: the type variables have the same ids in the original method and the overriding method, so is_subtype on the argument type returns True. This also allows an overriding method to generalize a specific type in the original method to a type variable, as demonstrated in one of the tests (testOverrideGenericMethodInNonGenericClassGeneralize). is_subtype already determines correctly whether the override is valid, so all the hard work here is in providing a more specific error message when possible.

This is a prerequisite for fixing #1261 because overrides with generic function type variables currently work by accident: the type variables have the same ids in the original method and the overriding method, so is_subtype on the argument type returns True. This also allows an overriding method to generalize a specific type in the original method to a type variable, as demonstrated in one of the tests (testOverrideGenericMethodInNonGenericClassGeneralize). is_subtype already determines correctly whether the override is valid, so all the hard work here is in providing a more specific error message when possible.

This will be needed for python#1261, which will make type variable ids unpredictable.

This is a prerequisite for fixing #1261 because overrides with generic function type variables currently work by accident: the type variables have the same ids in the original method and the overriding method, so is_subtype on the argument type returns True. This also allows an overriding method to generalize a specific type in the original method to a type variable, as demonstrated in one of the tests (testOverrideGenericMethodInNonGenericClassGeneralize). is_subtype already determines correctly whether the override is valid, so all the hard work here is in providing a more specific error message when possible.

This will be needed for python#1261, which will make type variable ids unpredictable.

This commit series gives type variable ids their own type and replaces the class/function type variable distinction with a plain/metavariable distinction. The main goal is to fix #603, but it's also progress towards #1261 and other bugs involving type variable inference. Metavariables (or unification variables) are variables introduced during type inference to represent the types that will be substituted for generic class or function type parameters. They only exist during type inference and should never escape into the inferred type of identifiers. Fixes #603.

rwbarton · 2016-07-01T23:24:59Z

I partially did this in #1580: that commit generates fresh IDs for unification variables during type inference, which live in their own type variable ID "namespace". Type variables of generic classes and functions are still allocated using the same old scheme (positive for classes, negative for functions). This fixed the known issues that are a direct result of type variable ID confusion.

I still think allocating unique IDs for all type variables is probably a good idea that may be helpful for some type inference engine improvements in the future. However, there's actually a fair amount of work involved in doing this correctly and so far there hasn't been a concrete reason to do it, so I've put this off.

The major points that would need to be addressed when implementing this are:

The tests. Type variable IDs are sometimes displayed in error messages, and certainly printed by the typexport tests. If we assign a unique ID to every type variable of every function, the IDs will necessarily become sensitive to things like the exact contents of stubs, the order of type checking, etc. That means that the test output checker needs to be robust to changes in the IDs. On the other hand, the IDs also contain information to the extent of whether or not two IDs that appear in output are equal, so the test output checker should not entirely ignore IDs, but instead make sure that two IDs are equal in the actual output exactly when the corresponding IDs are equal in the expected output. (Furthermore, a large number of test cases will need to have their outputs adjusted since the old IDs were very much not unique.)
Interaction with incremental checking. Type variable IDs are written to incremental data files, for example in the types of generic functions and their arguments. The incremental data file format does not have a clear notion of scoping for these variables (for example a type variable may appear in the type of a function before it appears in list of type variables of that function). But it's reasonable to assume that the scope of the variables is the entire file.

When reading in the incremental data files, we need to make sure that type variables from different files end up with different IDs internally, even if they have the same physical ID numbers in the two files (we cannot ensure that that will not happen), while mapping a given file's IDs to new IDs consistently. This is awkward to do from the deserialize method, which has no access to any state. Instead we'd have to deserialize into a temporary "raw"/"unbound" type variable ID that gets converted to the final, fresh ID during fixup.

gvanrossum · 2016-07-01T23:38:14Z

(Adding the Incremental label to flag this as of potential interest -- however I don't think there's an incremental action item unless we decide to proceed with the full fix here.)

gvanrossum · 2016-10-17T20:45:32Z

Removing the Incremental label, I don't think it's relevant.

JelleZijlstra · 2022-03-19T01:57:39Z

Greg's original post just says that this causes "a number of bugs", but there are no specifics and I don't recall any from the five years since this issue was last touched. Closing as obsolete.

gnprice added this to the 0.4.0 milestone Mar 2, 2016

JukkaL changed the title ~~Refactor handling of type variables to prevent confusion across different functions/classes~~ Fix handling of type variables to prevent confusion across different functions/classes Mar 2, 2016

JukkaL added the bug mypy got something wrong label Mar 17, 2016

gvanrossum modified the milestones: 0.3.2, 0.4.0 Mar 17, 2016

rwbarton mentioned this issue Apr 8, 2016

Type inference problem in generic function #603

Closed

gvanrossum assigned rwbarton Apr 14, 2016

rwbarton mentioned this issue Apr 28, 2016

Generate correct constraints for Unions #1408

Merged

rwbarton modified the milestones: 0.4.x, 0.4 May 1, 2016

rwbarton added a commit to rwbarton/mypy that referenced this issue May 3, 2016

Allow substituting typevar-with-values for typevar-with-values

57a1885

This is needed for subtyping of generic function with type variables with values to work correctly, which will be needed for python#1261.

rwbarton mentioned this issue May 3, 2016

Allow substituting typevar-with-values for typevar-with-values #1469

Merged

gvanrossum pushed a commit that referenced this issue May 3, 2016

Allow substituting typevar-with-values for typevar-with-values (#1469)

a62caad

This is needed for subtyping of generic function with type variables with values to work correctly, which will be needed for #1261.

This was referenced May 9, 2016

Specialize type variables of generic functions that are arguments #1471

Closed

functools.lru_cache doesn't work with generic functions #1515

Closed

rwbarton mentioned this issue May 11, 2016

Allow any more general type for override of a method #1524

Merged

rwbarton added a commit to rwbarton/mypy that referenced this issue May 25, 2016

Refactoring: construct TypeVarType from TypeVarDef

e94031f

This will be needed for python#1261, which will make type variable ids unpredictable.

rwbarton mentioned this issue May 25, 2016

Improve type checking of generic function bodies #1580

Merged

rwbarton added a commit to rwbarton/mypy that referenced this issue Jun 14, 2016

Refactoring: construct TypeVarType from TypeVarDef

d231cf7

This will be needed for python#1261, which will make type variable ids unpredictable.

rwbarton removed their assignment Jul 1, 2016

rwbarton modified the milestones: 0.5, 0.4.x Jul 1, 2016

gvanrossum added the topic-incremental label Jul 1, 2016

gvanrossum removed the topic-incremental label Oct 17, 2016

gvanrossum added the priority-1-normal label Jan 6, 2017

gvanrossum removed this from the 0.5 milestone Mar 29, 2017

JelleZijlstra closed this as completed Mar 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix handling of type variables to prevent confusion across different functions/classes #1261

Fix handling of type variables to prevent confusion across different functions/classes #1261

gnprice commented Mar 2, 2016

JukkaL commented Mar 2, 2016

JukkaL commented Mar 17, 2016

gvanrossum commented Mar 17, 2016

JukkaL commented Mar 17, 2016

rwbarton commented Apr 6, 2016

gvanrossum commented Apr 14, 2016

rwbarton commented May 1, 2016

rwbarton commented May 1, 2016

rwbarton commented May 10, 2016

gvanrossum commented May 10, 2016 via email

JukkaL commented May 10, 2016

The-Compiler commented May 10, 2016

rwbarton commented Jul 1, 2016

gvanrossum commented Jul 1, 2016

gvanrossum commented Oct 17, 2016

JelleZijlstra commented Mar 19, 2022

Fix handling of type variables to prevent confusion across different functions/classes #1261

Fix handling of type variables to prevent confusion across different functions/classes #1261

Comments

gnprice commented Mar 2, 2016

JukkaL commented Mar 2, 2016

JukkaL commented Mar 17, 2016

gvanrossum commented Mar 17, 2016

JukkaL commented Mar 17, 2016

rwbarton commented Apr 6, 2016

gvanrossum commented Apr 14, 2016

rwbarton commented May 1, 2016

rwbarton commented May 1, 2016

rwbarton commented May 10, 2016

gvanrossum commented May 10, 2016 via email

JukkaL commented May 10, 2016

The-Compiler commented May 10, 2016

rwbarton commented Jul 1, 2016

gvanrossum commented Jul 1, 2016

gvanrossum commented Oct 17, 2016

JelleZijlstra commented Mar 19, 2022