Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix handling of type variables to prevent confusion across different functions/classes #1261

Closed
gnprice opened this issue Mar 2, 2016 · 16 comments
Labels
bug mypy got something wrong priority-1-normal

Comments

@gnprice
Copy link
Collaborator

gnprice commented Mar 2, 2016

Currently type variables are identified by two namespaces "function" and "class" and a small numeric index, with the numbering starting from 1 in each new function and class. Which function or which class a given type variable refers to is implicit in where the type variable comes from in a given section of code in the type-checker. This leads to a number of bugs (TODO: link some of them here.)

@JukkaL suggests instead identifying each type variable by a fresh ID number distinct from all others, which is common practice in other type-inference engines.

@gnprice gnprice added this to the 0.4.0 milestone Mar 2, 2016
@JukkaL JukkaL changed the title Refactor handling of type variables to prevent confusion across different functions/classes Fix handling of type variables to prevent confusion across different functions/classes Mar 2, 2016
@JukkaL
Copy link
Collaborator

JukkaL commented Mar 2, 2016

Renamed Refactoring -> Fix in the title as this is will change behavior.

@JukkaL JukkaL added the bug mypy got something wrong label Mar 17, 2016
@JukkaL
Copy link
Collaborator

JukkaL commented Mar 17, 2016

Should we move this to an earlier milestone? (I'll add more details soon.)

@gvanrossum
Copy link
Member

Yeah, I think this one is important and should be given priority. Maybe we should just move it into 0.3.2 and move anything that's a cleanup or refactoring out of there and into 0.3.3?

@JukkaL
Copy link
Collaborator

JukkaL commented Mar 17, 2016

Sounds good to me.

@gvanrossum gvanrossum modified the milestones: 0.3.2, 0.4.0 Mar 17, 2016
@rwbarton
Copy link
Contributor

rwbarton commented Apr 6, 2016

Using distinct ids for unrelated type variables would certainly lead to less confusion (both for the human and mypy). Though there are still subtleties: if f is a generic function and I have a call like f(f(...)), I need to be able to distinguish the type variables of the outer application f from those of the inner application of f. (Imagine f has a type like (A) -> List[A].)

One thing to watch out for is nondeterminism of generated ids. Even if ids are just allocated as sequential integers, it's easy in practice for the results to be affected by minor details, such as the order of type checking modules, This might be exposed to the user in two ways:

  • If the ids ever appear in error messages (not sure whether they do currently), the error messages will be nondeterministic. This is mostly only a problem for the test suite and can be handled by some normalization of error messages before comparison to the expected output.
  • I assume the implementation of incremental type checking involves some kind of "interface files" containing type information for modules that have already been processed. If those modules contain generic functions then the interface files will have to refer to those variables, and if they do so by the allocated id then the exact contents of the interface file become nondeterministic. This might become an issue in some situations. (I haven't looked at the implementation of incremental type checking yet, so I don't know whether it makes any assumptions about determinism of type checking.)

@gvanrossum
Copy link
Member

Reid: if you get to this, great. If not, we can push it to a later milestone.

@rwbarton
Copy link
Contributor

rwbarton commented May 1, 2016

I took a stab at this but encountered a number of minor bugs that are masked by the way that mypy currently reuses type variable ids, so I'm going to move this to the next milestone. For my records, here are the issues I encountered (maybe some deserve their own issues in the tracker):

  • Subtype checking for generic functions with type variables with value restriction doesn't work correctly, since apply_generic_arguments won't allow a type variable with value restriction to be instantiated with itself (it's not one of the legal values!)

  • When checking method compatibility in various contexts (method overrides in subclasses, __add__ vs. __radd__ or __iadd__, function overloads) we sometimes check whether two (argument or result) types are the same, which for type variables means whether they have the same id. For example (from stdlib/3/builtins.pyi)

    @overload
    def reversed(object: Reversible[_T]) -> Iterator[_T]: ...
    @overload
    def reversed(object: Sequence[_T]) -> Iterator[_T]: ...
    

    These are currently considered to be safely overlapping because they have the "same return type" Iterator[_T] where _T is the first (only) type variable of each overload. However, this logic is wrong. If the argument type of the first overload is changed to, say, Sequence[Tuple[_T, int]], then it becomes overlapping with the second with different return types, since the second can be specialized by replacing _T by Tuple[_T, int].

    In these situations it generally seems to be correct to apply unify_generic_callables before comparing argument and result types, as happens already in is_callable_subtype.

  • In lib-python/3.2/fnmatch.py function filter, something funny happens at the line

      match = _compile_pattern(pat, isinstance(pat, bytes))
    

    Here _compile_pattern is defined as

    @functools.lru_cache(maxsize=250)
    def _compile_pattern(pat: AnyStr,
                     is_bytes: bool = False) -> Callable[[AnyStr],
                                                         Match[AnyStr]]:
    [...]
    

    Because of the issue I mentioned in the comments of Generic decorators don't work with generic functions #1317, _compile_pattern ends up with a non-generic type that involves the type variable AnyStr, and therefore so does the local variable match. But filter is also a generic function with an AnyStr type variable, so (because the type variable ids are both -1) somewhere in the course of type checking filter the type variable of match gets expanded to the value of the type variable of filter, which isn't necessarily correct but unsurprisingly is in this case in practice. I expect this to be fixed by fixing Generic decorators don't work with generic functions #1317.

@rwbarton rwbarton modified the milestones: 0.4.x, 0.4 May 1, 2016
@rwbarton
Copy link
Contributor

rwbarton commented May 1, 2016

One of the unit tests (testOverrideGenericMethodInNonGenericClass, excerpted below) raised an interesting point:

class A:
    def f(self, x: T, y: S) -> None: pass
class C(A):
    def f(self, x: T, y: T) -> None: pass # E: Argument 2 of "f" incompatible with supertype "A"

When my branch didn't report an error on the last line, I thought at first it was a bug, but on reflection I decided that it was right and the test is wrong. After all, any time a.f(x, y) is a valid call for a an instance of A, it's also a valid call for a an instance of C because we can just pick T = object.

A correct test would be

class A:
    def f(self, x: List[T], y: List[S]) -> None: pass
class C(A):
    def f(self, x: List[T], y: List[T]) -> None: pass # Fail

Generating good error messages in this kind of situation is tricky, because each individual argument is compatible with the argument type of the supertype's method; they just aren't compatible taken together.

rwbarton added a commit to rwbarton/mypy that referenced this issue May 3, 2016
This is needed for subtyping of generic function with type variables
with values to work correctly, which will be needed for python#1261.
gvanrossum pushed a commit that referenced this issue May 3, 2016
This is needed for subtyping of generic function with type variables
with values to work correctly, which will be needed for #1261.
@rwbarton
Copy link
Contributor

On the three preliminary issues from two comments above: The first is fixed by #1469. The third is more complicated than I originally thought, and I filed #1515 for it. I plan to just add type: ignore comments as necessary to the stdlib sample file in which it comes up unless #1515 happens to be fixed already by then, since it currently only works by accident anyways. The second issue should be fairly straightforward and that's what I intend to look at next.

@gvanrossum
Copy link
Member

gvanrossum commented May 10, 2016 via email

@JukkaL
Copy link
Collaborator

JukkaL commented May 10, 2016

I've occasionally went through stdlib-samples but not consistently. Having a warning for an unneeded # type: ignore comment would be a boon here. I think that sometimes I've undone some refactorings that previous mypy versions needed but that are no longer required, but again it hasn't been consistent.

@The-Compiler
Copy link
Contributor

There's #1345 for that

rwbarton added a commit to rwbarton/mypy that referenced this issue May 11, 2016
This is a prerequisite for fixing python#1261 because overrides with generic
function type variables currently work by accident: the type variables
have the same ids in the original method and the overriding method, so
is_subtype on the argument type returns True.

This also allows an overriding method to generalize a specific type in
the original method to a type variable, as demonstrated in one of the
tests (testOverrideGenericMethodInNonGenericClassGeneralize).

is_subtype already determines correctly whether the override is valid,
so all the hard work here is in providing a more specific error
message when possible.
JukkaL pushed a commit that referenced this issue May 13, 2016
This is a prerequisite for fixing #1261 because overrides with generic
function type variables currently work by accident: the type variables
have the same ids in the original method and the overriding method, so
is_subtype on the argument type returns True.

This also allows an overriding method to generalize a specific type in
the original method to a type variable, as demonstrated in one of the
tests (testOverrideGenericMethodInNonGenericClassGeneralize).

is_subtype already determines correctly whether the override is valid,
so all the hard work here is in providing a more specific error
message when possible.
rwbarton added a commit to rwbarton/mypy that referenced this issue May 25, 2016
This will be needed for python#1261, which will make type variable ids
unpredictable.
gvanrossum pushed a commit that referenced this issue May 28, 2016
This is a prerequisite for fixing #1261 because overrides with generic
function type variables currently work by accident: the type variables
have the same ids in the original method and the overriding method, so
is_subtype on the argument type returns True.

This also allows an overriding method to generalize a specific type in
the original method to a type variable, as demonstrated in one of the
tests (testOverrideGenericMethodInNonGenericClassGeneralize).

is_subtype already determines correctly whether the override is valid,
so all the hard work here is in providing a more specific error
message when possible.
rwbarton added a commit to rwbarton/mypy that referenced this issue Jun 14, 2016
This will be needed for python#1261, which will make type variable ids
unpredictable.
rwbarton added a commit that referenced this issue Jun 15, 2016
This commit series gives type variable ids their own type and replaces the class/function type variable distinction with a plain/metavariable distinction. The main goal is to fix #603, but it's also progress towards #1261 and other bugs involving type variable inference.

Metavariables (or unification variables) are variables introduced during type inference to represent the types that will be substituted for generic class or function type parameters. They only exist during type inference and should never escape into the inferred type of identifiers.

Fixes #603.
@rwbarton rwbarton removed their assignment Jul 1, 2016
@rwbarton
Copy link
Contributor

rwbarton commented Jul 1, 2016

I partially did this in #1580: that commit generates fresh IDs for unification variables during type inference, which live in their own type variable ID "namespace". Type variables of generic classes and functions are still allocated using the same old scheme (positive for classes, negative for functions). This fixed the known issues that are a direct result of type variable ID confusion.

I still think allocating unique IDs for all type variables is probably a good idea that may be helpful for some type inference engine improvements in the future. However, there's actually a fair amount of work involved in doing this correctly and so far there hasn't been a concrete reason to do it, so I've put this off.

The major points that would need to be addressed when implementing this are:

  • The tests. Type variable IDs are sometimes displayed in error messages, and certainly printed by the typexport tests. If we assign a unique ID to every type variable of every function, the IDs will necessarily become sensitive to things like the exact contents of stubs, the order of type checking, etc. That means that the test output checker needs to be robust to changes in the IDs. On the other hand, the IDs also contain information to the extent of whether or not two IDs that appear in output are equal, so the test output checker should not entirely ignore IDs, but instead make sure that two IDs are equal in the actual output exactly when the corresponding IDs are equal in the expected output. (Furthermore, a large number of test cases will need to have their outputs adjusted since the old IDs were very much not unique.)

  • Interaction with incremental checking. Type variable IDs are written to incremental data files, for example in the types of generic functions and their arguments. The incremental data file format does not have a clear notion of scoping for these variables (for example a type variable may appear in the type of a function before it appears in list of type variables of that function). But it's reasonable to assume that the scope of the variables is the entire file.

    When reading in the incremental data files, we need to make sure that type variables from different files end up with different IDs internally, even if they have the same physical ID numbers in the two files (we cannot ensure that that will not happen), while mapping a given file's IDs to new IDs consistently. This is awkward to do from the deserialize method, which has no access to any state. Instead we'd have to deserialize into a temporary "raw"/"unbound" type variable ID that gets converted to the final, fresh ID during fixup.

@rwbarton rwbarton modified the milestones: 0.5, 0.4.x Jul 1, 2016
@gvanrossum
Copy link
Member

(Adding the Incremental label to flag this as of potential interest -- however I don't think there's an incremental action item unless we decide to proceed with the full fix here.)

@gvanrossum
Copy link
Member

Removing the Incremental label, I don't think it's relevant.

@gvanrossum gvanrossum removed this from the 0.5 milestone Mar 29, 2017
@JelleZijlstra
Copy link
Member

Greg's original post just says that this causes "a number of bugs", but there are no specifics and I don't recall any from the five years since this issue was last touched. Closing as obsolete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug mypy got something wrong priority-1-normal
Projects
None yet
Development

No branches or pull requests

6 participants