Skip to content

[WIP] Fix crash when joining tuple and NamedTuple #3129

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 21 commits into from

Conversation

pkch
Copy link
Contributor

@pkch pkch commented Apr 4, 2017

Fixes #3117

First commit intentionally will fail because I'm modifying builtins stub to make the test fail (to match what happens with production code).

Only after that, I'll push commit that actually fixes the failing test.

mypy/subtypes.py Outdated
if not t.args or not right.args:
return True
if len(t.args) != len(right.args):
return False
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong place to make these checks. If the problem is with TupleType, then it should be fixed there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but isn't it wrong for visit_instances to return True when the two types have a different number of type parameters? Tuple[int] vs Tuple[int, str] is one example (which caused the crash), but in the future (or even now, though I can't think of any) there may be more cases where two types have different number of type parameters. Shouldn't we, generally speaking, refuse to accept X[A] in place of X[B, C] or vice versa, when A, B, C are not Any?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point (after mapping instance to supertype) they should always have same numbers of arguments, because t and right are now Instances of the same TypeInfo. You could put assert len(t.args) == len(right.args) == len(right.type.defn.type_vars) and see what happens. If the assert is triggered, then it means there is a bug somewhere else.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert fails 298 times on mypy mypy.

I checked just one example, it's basically something like x = {1: 'a'}. In this example, in order to verify that the dict constructor has the right argument, is_subtype is asked whether Tuple[builtins.int, builtins.str] is a subtype of Tuple[builtins.int*, builtins.str*] (what does the * on the RHS mean btw?).

That in turn requires verifying that the fallback of LHS (builtins.tuple[builtins.object]) is a subtype of the fallback of RHS (builtins.tuple). visit_instance is used to verify that.

map_instance_to_supertype returns right away here because both sides are builtins.tuple. Then the code we're discussing is reached, with t.args having length 1 ([builtins.object]) and right.args having length 0.

I thought that's not a problem, since 0 arguments is treated as Any; and I thought Any is also ok. But I thought in every other situation, it's not a bug, but merely the type args don't match (so I can return False).

But I may be wrong; are you saying that map_instance_to_supertype is supposed to do that check, so if I see mismatched type args after it returns, it's a bug?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the bug happens even before. At this stage all the type arguments should have correct counts. For example, if you look at third pass in typeanal.py, then you will find code that puts [AnyType()]*whatever_is_needed. But maybe mypy code itself does not always create Instances cleanly. For example, builtins.tuple is not something valid, it should be builtins.tuple[Any].

(what does the * on the RHS mean btw?)

* indicates this type is a result of type variable substitution. Typically this happens as a result of type inference.

but merely the type args don't match

We don't have variadic generic classes, so logically this should never happen.

I think, ideally, you could try to find where those strange Instances are created (I suppose there are only few such places) and fix them to return Instances with correct number of type variables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look, but AFAIU builtins.tuple[Any] isn't the same as as builtins.tuple; the latter has an arbitrary number of arguments, while the former must have exactly 1. Are you saying that mypy should never create Instance that corresponds to Tuple (without type arguments)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instances that cause problems represent fallback for various tuples. I tried to force all tuples to be created with len(fallback) == len(tuple.items()) by changing stuff like

# checker.py
return TupleType(type_parameters, self.named_type('builtins.tuple'))` 

into

return TupleType(
    type_parameters,
   self.named_generic_type('builtins.tuple', [AnyType()]*len(type_parameters))

But (a) I'm not sure it's even correct; and (b) star args need to have variadic fallback.

So I could not get rid of assertion failure.

I pushed the code I tried to a separate branch, if you like to take a look.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, buitlins.tuple should always have one type argument, it represents a fallback Instance for comparisons with other Instances. For example, is Tuple[int, float] a subtype of Sequence[float], etc. I have found this code:

fallback_item = join.join_type_list(items)
return TupleType(items, self.chk.named_generic_type('builtins.tuple', [fallback_item]))

This is what should ideally happen always.
Note that in typeanal.py types are not fully analysed in second pass, so that it is better to fix the tuple fallback in the third pass.

@pkch
Copy link
Contributor Author

pkch commented Apr 5, 2017

I made a table that helped me understand representation of tuple-related types:

BUILTIN_TUPLE = <Symbol Node corresponding to 'builtins.tuple'>
BUILTIN_INT = <Symbol Node corresponding to 'builtins.int'>

Type            Representation

# Fixed-length Tuple represented as TupleType
Tuple[int]      TupleType([BUILTINS_INT], fallback=Instance(BUILTIN_TUPLE, [AnyType()]))
Tuple[Any]      TupleType([AnyType()], fallback=Instance(BUILTIN_TUPLE, [AnyType()]))
Tuple[()]       TupleType([], fallback=Instance(BUILTIN_TUPLE, [AnyType()]))

# Variadic Tuple represented as Instance(BUILTINS_TUPLE) with 1 argument
Tuple[int, ...] Instance(BUILTINS_TUPLE, [int])

# Tuple with all type checking disabled can be written in 3 different ways in python,
# and is represented as Instance(BUILTINS_TUPLE) with 0 or 1 argument
Tuple[Any, ...] Instance(BUILTINS_TUPLE, [AnyType()])
Tuple           Instance(BUILTINS_TUPLE, [])
tuple           Instance(BUILTINS_TUPLE, [])

# Invalid types
tuple[int, ...] TupleType([BUILTINS_INT, AnyType()], fallback=Instance(BUILTIN_TUPLE, [AnyType()]))
tuple[int]      TupleType([BUILTINS_INT], fallback=Instance(BUILTIN_TUPLE, [AnyType()]))

The last two types are invalid (both mypy and runtime report an error when they are used), so the code that creates them can be deleted (tests pass with it deleted):

# in TypeAnalyser.visit_unbound_type
if len(t.args) > 0 and info.fullname() == 'builtins.tuple':
    return TupleType(self.anal_array(t.args),
                     Instance(info, [AnyType()], t.line),
                     t.line)

Also note that the types Tuple and Tuple[Any, ...], which seem to have the same meaning, are represented by different objects (the difference is in the fallback used). I'm not sure if it's intentional, but I can't think of any problem with this apart from inconsistency. If we choose to standardize, I'd prefer to do so on Instance(BUILTIN_TUPLE, [AnyType()]), since it's more explicit.

@ilevkivskyi
Copy link
Member

@pkch

The last two types are invalid (both mypy and runtime report an error when they are used), so the code that creates them can be deleted (tests pass with it deleted):

I will be glad if this code is removed.

Concerning tuple vs Tuple vs Tuple[Any, ...], PEP 484 clearly says they all should be equivalent. I think the internal representation for all three should be Instance(BUILTIN_TUPLE, [AnyType()]).

The representation of Tuple[int] should be TupleType([BUILTINS_INT], fallback=Instance(BUILTIN_TUPLE, [BUILTINS_INT])) (note that fallback is different from what you write). For tuples with length more than 1, the fallback should contain the join of item types, as I mentioned before.

Again this is all what should happen ideally, I will be happy if you could implement this in code.

Copy link
Member

@ilevkivskyi ilevkivskyi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should continue working on this. I believe this is important. Here are some comments.

@@ -91,7 +91,7 @@ from typing import Tuple
t1 = None # type: Tuple[A, A]
t2 = None # type: tuple

t1 = t2 # E: Incompatible types in assignment (expression has type Tuple[Any, ...], variable has type "Tuple[A, A]")
t1 = t2 # E: Incompatible types in assignment (expression has type tuple, variable has type "Tuple[A, A]")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe put tuple in quotes like this "tuple"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the test would fail then -- or did you mean to change the code the generates error messages?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or did you mean to change the code the generates error messages?

Sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a change, but for consistency, also changed a few other similar cases (which involve types other than tuple). lmk if that's ok.

@@ -833,6 +834,8 @@ def __init__(self, items: List[Type], fallback: Instance, line: int = -1,
column: int = -1, implicit: bool = False) -> None:
self.items = items
self.fallback = fallback
# TODO: assert not (isinstance(fallback, Instance) and fallback.type and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to see the TODO items resolved before this is merged.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't looked into this yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And again, not sure why assert fails. I'll commit fixes to the other problems first.

mypy/typeanal.py Outdated
return instance
if info.fullname() == 'builtins.tuple':
assert not t.args
t.args = [AnyType()]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change causing big diff really needed? Or you just wanted to save an indentation level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I just wanted to save the indent and a slight duplication of code. Should I revert?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I revert?

This would simplify reviewing (this PR is already non-trivial).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

# Map left type to corresponding right instances.
t = map_instance_to_supertype(left, right.type)

# TODO: assert len(t.args) == len(right.args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this is still a TODO? I thought the main idea is to fix this specific point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it seemed to pass in the real world, but fail in tests. Let me figure out what stubs it needs... Or maybe I'll just sneak in some no-fixture tests into the test suite while nobody's watching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or maybe I'll just sneak in some no-fixture tests into the test suite while nobody's watching.

:-) I like the idea of running test with real full stubs, but let us postpone this for some time.
It is better to update fixtures for now, to make them match real stubs more.

Actually, the fact that some stubs can crash here is a bit dangerous, maybe we are still missing something?

Copy link
Contributor Author

@pkch pkch May 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem with Awaitable is fixed, the problem with NewType I'll wait for you to fix it in your PR as we discussed. There may be other problems that I haven't investigated yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm still not sure why the assert fails, I'll look into it later.

@ilevkivskyi ilevkivskyi self-assigned this May 31, 2017
@ilevkivskyi
Copy link
Member

@pkch What is the status of this PR? Could you please fix the merge conflicts?

@pkch
Copy link
Contributor Author

pkch commented Jun 5, 2017

@ilevkivskyi I think you suggested to pause this until you do a separate PR you discussed with Jukka that involved prohibiting named_type in favour of named_generic_type.

@ilevkivskyi
Copy link
Member

@pkch

I think you suggested to pause this until you do a separate PR you discussed with Jukka that involved prohibiting named_type in favour of named_generic_type.

That PR was merged few weeks ago, although named_type is still there, it just now automatically calls named_generic_type with a correct number of Anys IIRC.

I will give a brief review now.

@@ -7,6 +7,8 @@
FunctionLike, TypeVarDef
)

import mypy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this import here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

mypy/messages.py Outdated
@@ -253,14 +253,17 @@ def format_simple(self, typ: Type, verbosity: int = 0) -> str:
base_str = itype.type.fullname()
else:
base_str = itype.type.name()
if itype.args == []:
if itype.args == [] or len(itype.args) == 1 and type(itype.args[0]) == AnyType:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really need message formatting changes in this same PR?
Note that there is another open PR #3430 that is going to update Instances formatting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My other changes caused a large number of [Any] to be added to error messages. I don't think error messages are improved by that, and even if they are, it would require changing a lot of updates to test cases' expected output.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, yeah let me remove this code change, and update the error messages instead.

mypy/typeanal.py Outdated
t.line)
if info.fullname() == 'builtins.tuple':
assert not t.args
return Instance(info, self.anal_array([AnyType()]), t.line, t.column)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change looks a bit strange. Could you please add a comment explaining this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comment; more details are in the last two paragraphs of the earlier comment in this PR.

mypy/typeanal.py Outdated
# if it's not builtins.tuple, then its bases should have tuple[Any]
# TODO: put assert here if it's not too slow
if isinstance(t.fallback, Instance) and t.fallback.type.fullname() == 'builtins.tuple':
fallback_item = UnionType.make_simplified_union(t.items)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is not a good idea to use unions at this stage, since not all types are analyzed (we might need a separate pass for this in a separate PR).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for now I just removed this code.

mypy/types.py Outdated
@@ -1490,7 +1496,7 @@ def visit_tuple_type(self, t: TupleType) -> str:
s = self.list_str(t.items)
if t.fallback and t.fallback.type:
fallback_name = t.fallback.type.fullname()
if fallback_name != 'builtins.tuple':
if fallback_name not in ('builtins.tuple', 'builtins.object'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, do you really need this formatting change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intended this change to preserve existing message format that otherwise would have changed; but I can't produce any specific examples where it's necessary, so I can get rid of this.

mypy/types.py Outdated
@@ -1741,7 +1747,9 @@ def set_typ_args(tp: Type, new_args: List[Type], line: int = -1, column: int = -
if isinstance(tp, Instance):
return Instance(tp.type, new_args, line, column)
if isinstance(tp, TupleType):
return tp.copy_modified(items=new_args)
fallback_args = [UnionType.make_simplified_union(new_args)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using unions is not safe here, since this function can be called during the second phase of semantic analysis. Probably just remove this change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you didn't tell me, I would've never noticed that. Is there any way to protect our codebase from these types of mistakes (where a construct is used too early in the analysis process)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is an item in the roadmap about documenting the details of different passes.
(Also there is an item about re-working semantic analysis passes.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I removed this as well. How would we remember to add this logic and the one here (both of which I removed) to a new PR in the future?

@@ -22,7 +22,8 @@ class list(Iterable[T], Generic[T]):
def append(self, x: T) -> None: pass
def extend(self, x: Iterable[T]) -> None: pass

class tuple(Generic[T]): pass
_T_co = TypeVar('_T_co', covariant=True)
class tuple(Sequence[_T_co], Generic[_T_co]): pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you interested in fixing tuple in other fixtures? Maybe you can do this even in this PR and then turn on some asserts? (It is not necessary to inherit from Sequence everywhere, just make them generic in one variable)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, I'll fix it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced all tuples in fixtures to use covariant type variable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except in tuple-simple.pyi, where somehow it says "True is not defined."

@ilevkivskyi
Copy link
Member

@pkch
(Also it looks like you still have an unnecessary typeshed commit change in this PR.)

@pkch pkch force-pushed the namedtuple-join branch 4 times, most recently from 3bc53ca to c9dba66 Compare June 24, 2017 18:57
@pkch pkch force-pushed the namedtuple-join branch from ed22724 to a230dd9 Compare June 24, 2017 19:13
@pkch pkch force-pushed the namedtuple-join branch from d83faaf to a993ba6 Compare June 24, 2017 20:04
@@ -1332,8 +1332,8 @@ reveal_type(g)
with f('') as s:
reveal_type(s)
[out]
_program.py:13: error: Revealed type is 'def (x: builtins.int) -> contextlib.GeneratorContextManager[builtins.str*]'
_program.py:14: error: Revealed type is 'def (*x: builtins.str) -> contextlib.GeneratorContextManager[builtins.int*]'
_program.py:13: error: Revealed type is 'def (x: builtins.int) -> contextlib.ContextManager[builtins.str*]'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ilevkivskyi this change in error message does not seem right; do you know if this is a problem I introduced with my PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nm, it disappeared... I swear, it happened, but I have no explanation as to how. Ghosts?

@pkch
Copy link
Contributor Author

pkch commented Jun 25, 2017

I'm looking into a strange failure in commit f292ea4, where Travis python 3.5 build failed (while all other Travis and Appveyor builds succeeded). The error is seemingly unrelated to this PR:

FAILURE  #7 run eval-test-G
Expected:
  fooo                                          (diff)
Actual:
  (empty)
Traceback (most recent call last):
  File "/home/travis/build/python/mypy/mypy/test/data.py", line 273, in run
    self.perform(self)
  File "/home/travis/build/python/mypy/mypy/test/testpythoneval.py", line 94, in test_python_evaluation
    testcase.file, testcase.line))
  File "/home/travis/build/python/mypy/mypy/test/helpers.py", line 86, in assert_string_arrays_equal
    raise AssertionFailure(msg)
AssertionFailure: Invalid output (/home/travis/build/python/mypy/test-data/unit/pythoneval.test, line 430)
test_testpythoneval_PythonEvaluationSuite.testGenericPatterns failed

The test which failed is:

[case testGenericPatterns]
from typing import Pattern
import re
p = None  # type: Pattern[str]
p = re.compile('foo*')
b = None  # type: Pattern[bytes]
b = re.compile(b'foo*')
print(p.match('fooo').group(0))
[out]
fooo

I pushed an empty commit to see if this is reproducible.

@pkch
Copy link
Contributor Author

pkch commented Jun 25, 2017

Ok this is not good. I introduced a Heisenbug with this PR. Commit f292ea4 fails the build, and the empty commit 52b0c0f following it passes. I'll look into it.

@pkch pkch changed the title Fix crash when joining tuple and NamedTuple [WIP] Fix crash when joining tuple and NamedTuple Jun 25, 2017
@gvanrossum
Copy link
Member

The "Heisenbug" might be a Travis-CI issue. So far I've only seen it on Python 3.4 runs, but maybe it's also occurring for others? #3543. I suspect there's a subprocess that fails and some parent isn't checking exit statuses.

@ilevkivskyi
Copy link
Member

@pkch

I replaced all tuples in fixtures to use covariant type variable.

Sorry, I was not very clear. The invariance in fixtures is not a big problem (if a problem at all). The problem is that dozens of fixtures use

class tuple: pass

and/or same same for list, dict, and set, i.e. they are not generic at all.

Except in tuple-simple.pyi, where somehow it says "True is not defined."

This is because the corresponding stub does not define class bool: pass

@ilevkivskyi
Copy link
Member

@pkch Are you still working on this? I would rather see this merged sooner than later (especially if you fix the fixtures in this PR).

@pkch
Copy link
Contributor Author

pkch commented Jul 18, 2017 via email

@pkch
Copy link
Contributor Author

pkch commented Jul 25, 2017

Remains to do: normalize tuple -> Tuple[Any] in ThirdPass, and add asserts about number of type args

@pkch
Copy link
Contributor Author

pkch commented Jul 26, 2017

This PR contained three components:

  1. fix the crash with namedtuple
  2. make tuples generic in stubs
  3. normalize tuple -> Tuple[Any]
  1. is already taken care of in master since this PR was opened.
  2. is taken care of in Make tuple generic in most stubs #3767 that I just opened
  3. ended up being too difficult to do without causing more problems. I can still try to do it, but the chance I'll succeed is growing smaller, and I don't want to delay 2).

For now, I'm going to close this PR but if I or anyone else can work on it again, we can of course reopen.

@pkch pkch closed this Jul 26, 2017
ilevkivskyi pushed a commit that referenced this pull request Jul 26, 2017
Make tuples generic in stubs to reduce incompatibility between stubs and production.
This is a carve-out from #3129.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NamedTuple joined with Tuple crashes mypy
3 participants