-
-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect and support module aliasing via assignment. #3435
Conversation
In semantic analysis, if we find a simple assignment statement where the rvalue is a module, make the lvalue a direct alias to that same module. Fixes python#1778.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea. Maybe you can implement some items from your TODO in process_module_assignmanets
?
Sure, if this looks like the right approach, I'll flesh it out to support additional cases. |
@ilevkivskyi I took care of some of the TODOs. Propagating module aliases in non-global assignments is running into some strange issues that look like they might require deeper refactoring (see #3440). Maybe this can be merged without solving that problem yet? It's at least strictly an improvement over the status quo, and will cover the most common cases. |
There may be some question about error handling here. In general I aimed for just silently not propagating module references in any case we don't understand, but this code will fail with an internal mypy exception if there are iterables of mismatched lengths between rvalue/lvalues. Such code would fail at runtime (and should fail later in type checking) anyway. Let me know if I should add some more defensive handling of this case. |
mypy should never fail with an internal error. Also you don't need to care about all possible nested scenarios (I think unpacking modules is very rare). Just consider few simple scenarios. The code simplicity and robustness is much more important here. |
Never mind, I mis-remembered the behavior of I agree that nested unpacking of modules is probably quite rare (though one level of tuple unpacking assignment with modules is something I have seen in compatibility-shim code). However, I don't think the code here would be significantly simplified by removing support for that. Let me know if you think otherwise. |
After #3450 is merged, the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is almost ready. Just few suggestions.
mypy/semanal.py
Outdated
litemlists = [v.items for v in cast(List[Union[TupleExpr, ListExpr]], lvals)] | ||
ritems = cast(Union[TupleExpr, ListExpr], rval).items | ||
for rv, *lvs in zip(ritems, *litemlists): | ||
self._process_module_assignment(lvs, rv, ctx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks a bit unclear and probably unnecessary. We need to support only x, y = mod1, mod2
(with equal length of l.h.s and r.h.s.) and x = y = mod
but not nested cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you be more specific what you find unclear about this code so I can improve the clarity? When I rewrite this code to not support nesting, it doesn't make it any simpler or clearer. The recursive approach is still the simplest even for a single layer of iterable (otherwise you have to do something else weird to support both iterable and single rval), and once you have recursion, not supporting nesting is more complex than supporting it.
I agree that nested unpacked module assignment is likely very rare, but it seems strange to me to reduce compatibility with true Python semantics in order to achieve (AFAICS) a negligible or nonexistent gain in clarity/simplicity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to agree that with all the comprehensions, casts, repetitions of (TupleExpr, ListExpr) and cleverness with zip and *x, I have no confidence that I understand what this code does. And then I'm not even touching on recursion. So the code appears unmaintainable, and I think it needs to be improved (maybe some comments would help).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok! Pushed an update that breaks down the code a bit and adds comments to clarify the intent. Let me know if that helps.
[file n.py] | ||
b = 'bar' | ||
|
||
[builtins fixtures/module.pyi] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to see few more tests. For example, a test that shows this failing:
import m
x, y = m # must be an error here (Module object is not iterable or similar)
and in general more failures, like failure on attempted access/assignment to a non-existing module attribute.
Also probably some chained assignments:
import m
x = m
y = x
reveal_type(y.a) # 'builtins.str'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Very good point, will add these tests.
* master: (23 commits) Make return type of open() more precise (python#3477) Add test cases that delete a file during incremental checking (python#3461) Parse each format-string component separately (python#3390) Don't warn about returning Any if it is a proper subtype of the return type (python#3473) Add __setattr__ support (python#3451) Remove bundled lib-typing (python#3337) Move version of extensions to post-release (python#3348) Fix None slice bounds with strict-optional (python#3445) Allow NewType subclassing NewType. (python#3465) Add console scripts (python#3074) Fix 'variance' label. Change label for variance section to just 'variance' (python#3429) Better error message for invalid package names passed to mypy (python#3447) Fix last character cut in html-report if file does not end with newline (python#3466) Print pytest output as it happens (python#3463) Add mypy roadmap (python#3460) Add flag to avoid interpreting arguments with a default of None as Optional (python#3248) Add type checking plugin support for functions (python#3299) Mismatch of inferred type and return type note (python#3428) Sync typeshed (python#3449) ...
* master: Support accessing modules imported in class bodies within methods. (python#3450) Make Type[X] compatible with metaclass of X (python#3346) Sync typeshed (python#3479) Handle flags in pytest passthrough (python#3467)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope this will be the last round of review, just few minor comments.
mypy/semanal.py
Outdated
def process_module_assignment(self, s: AssignmentStmt) -> None: | ||
"""Check if s assigns a module an alias name; if so, update symbol table.""" | ||
# TODO support module alias assignment not in global scope | ||
if not self.is_module_scope(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that #3450 is merged, you can remove this, or make it less restrictive.
# y = n = b | ||
# and then we recursively call this method for each of those assignments. | ||
# If the rval and all lvals are not all of the same length, zip will just ignore | ||
# extra elements, so no error will be raised here; mypy will later complain |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is somehow not convincing. Could you please add a test that
import m
x, y = m, m, m
is flagged as an error? (Plus also some length mismatches in nested scenarios.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yesterday I added a test for the x, y, z = m, n
case; will also push a test for x, y = m, m, m
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... and also for length mismatches in nested scenarios.
mypy/semanal.py
Outdated
# seq_rval = (a, b) | ||
# seq_lvals = [(x, y), (m, n)] | ||
# We now zip this into: | ||
# elementwise_assignments = [(a, x, m), (b, y, n)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe put a
and b
on the right here and in the actual zip
? (It will be more logical and will match the order of arguments in _process_module_assignment
.)
@carljm Also please resolve the merge conflict. |
Enabling tracking module assignment in class scopes causes one new typing error in stdlib-samples, where the test is checking to see that the class attribute module is not one particular module that misses Also pushed more extensive tests for unpacking mismatches, and the lval/rval reordering you suggested. I think I've addressed all comments! |
@ilevkivskyi Putting the rval on the right end of the elementwise assignment tuples relies on handling of star-args that is only available in Py35+. The workarounds seem to me to be harder to understand, so I've gone back to the previous ordering with rval in the left, hope that's OK. Let me know if you have an alternative suggestion. |
@ilevkivskyi Thanks for the reviews here! Anything else you'd like to see changed? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for a delay. Here two small comments.
mypy/semanal.py
Outdated
@@ -2356,6 +2357,64 @@ def is_classvar(self, typ: Type) -> bool: | |||
def fail_invalid_classvar(self, context: Context) -> None: | |||
self.fail('ClassVar can only be used for assignments in class body', context) | |||
|
|||
def process_module_assignment(self, s: AssignmentStmt) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need a function that just unconditionally calls another function? Maybe one is enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, good point. This was to present a cleaner API (pass in only the AssignmentStmt
), but given there's only one call site and will never be more, that's not very useful. Consolidated.
@@ -23,7 +23,7 @@ def safe_rmdir(dirname: str) -> None: | |||
|
|||
class GenericTest(unittest.TestCase): | |||
# The path module to be tested | |||
pathmodule = genericpath # type: Any | |||
pathmodule = genericpath |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should keep the Any
annotation here and not add # type: ignore
below. This makes me think about another important aspect, if there is an explicit type annotation in a module assignment, then it should not create a module alias, but just a normal variable with that type. This will be consistent with how type aliases behave (It would be great to document this behaviour, but not necessarily in this PR):
import m
import types
class A: ...
class B(A): ...
Alias = A # creates a type alias
Var: Type[A] = A # creates a normal variable
mod = m # creates a module alias
mod.a # OK
mod_var: types.ModuleType = m # creates a normal variable
mod_var.a # Fails, ModuleType has no attribute a
And don't forget to add a test for module assignment with an explicit type (both types.ModuleType
and Any
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so taking several things separately:
-
Why is it better to keep
# Any
annotation for the module instead of# type: ignore
in the one specific location mypy doesn't understand? It seems to me the latter provides a better test case for mypy, as it will validate all other attribute accesses on the module. If this were in my own code base, I would definitely prefer the# type: ignore
on one special-case line over an# Any
annotation that turns off all checking for that variable. -
Nonetheless, an explicit annotation of
Any
type should override the module-ref propagation; that was a bug in this PR, very good catch! Pushing fix and test. -
I don't understand why an explicit annotation of
types.ModuleType
should prevent propagation of the module-reference information. It doesn't seem consistent with how class references work. To extend the above example slightly:
from typing import Type
class A:
foo = 'bar'
var: Type[A] = A
reveal_type(var.foo)
This code type-checks fine, and reveals a type of builtins.str
. To me, this is perfectly parallel to:
import m
import types
mod: types.ModuleType = m
reveal_type(m.a)
I would definitely expect the latter code to succeed and reveal type of builtins.str
, not fail with Module has no attribute "a"
. Why does giving an explicit type of types.ModuleType
imply that I want mypy to play dumb about which module it is and what attributes it has?
I've pushed the behavior that makes sense to me for now; will be happy to update once I understand why.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, after further thought I think I understand your point: "type alias" is to "var of type Type" as "module alias" is to "var of type types.ModuleType." I'm still not sure that this provides the most intuitive or useful behavior, given that type aliases and module references are used in different ways. I just tested and I can still pass a real module reference to a function that takes a parameter annotated as types.ModuleType
-- is there anything I can't do with a real module reference that I can do with an ordinary variable of types.ModuleType
? Put differently, given your preferred behavior, what real-world use case is there to annotate a variable as types.ModuleType
?
Anyway, changed my mind and pushing all changes as you suggest. Still would like to understand the reasoning, though.
* master: Improve test output when a test doesn't specify the correct fixture (python#3488) Speed up tests by simplifying the stub for typing (python#3486) Clean up test fixtures (python#3485) Add additional quick mode test cases (python#3480) Add --strict-optional on by default to roadmap (python#3478) Minor speed-up in warn-return-any (python#3481)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are replies to your questions an one more style comment.
@@ -23,7 +23,7 @@ def safe_rmdir(dirname: str) -> None: | |||
|
|||
class GenericTest(unittest.TestCase): | |||
# The path module to be tested | |||
pathmodule = genericpath # type: Any | |||
pathmodule = genericpath # type: Any |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any
is the most precise currently existing type that reflects the semantics of this variable. If we had dependent types/literal types, then this should be something like Union[Literal[genericpath], Literal[posixpath]]
.
mypy/semanal.py
Outdated
lvals: List[Expression], | ||
rval: Expression, | ||
ctx: AssignmentStmt, | ||
) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically this style is not used in mypy. You can just wrap argument list like this:
def process_module_assignment(self, lvals: List[Expression], rval: Expression,
ctx: AssignmentStmt) -> None:
mypy/semanal.py
Outdated
# respect explicitly annotated type | ||
if ( | ||
isinstance(lval.node, Var) and | ||
lval.node.type is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably better to also directly check for if s.type is not None: ...
in visit_assignment_stmt
, and skip this whole function otherwise.
In general, here is why this is needed. The question is what should we do with things like this:
import mod1, mod2
if random() > 0.5:
x = mod1
else:
x = mod2
I would say this should either infer ModuleType
or be an error, since we can't statically determine what will be x
at runtime. I would prefer the latter since explicit is better than implicit, so that a user is forced to write:
x: ModuleType
if random() > 0.5:
x = mod1
else:
x = mod2
However, this could break some existing code, so inferring ModuleType
seems to be safer for now. Currently, your PR seems to make x
an alias to mod2
simply because it appears textually later. This could be fixed by checking for lvalue.is_def
.
* master: fix crash when serializing a type-ignored property setter (python#3493)
@ilevkivskyi I've implemented the behavior we discussed. The trickiest part is supporting
as a fallback to regular var of type This works and isn't too complex, but it does mean a new Let me know what you think. |
mypy/semanal.py
Outdated
for rv, *lvs in elementwise_assignments: | ||
self.process_module_assignment(lvs, rv, ctx) | ||
elif isinstance(rval, NameExpr): | ||
# import pdb; pdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this comment probably. Sorry, I have to run now, will take a look this evening.
If the behavior you propose in #3494 is OK, then we can simplify this: remove |
I presume that if we want:
to infer plain Var of ModuleType for x, then we also want:
to also turn I just pushed code to support this join to ModuleType var. I'm not 100% confident of the correctness of the new code that creates the Var from scratch, and it seems to be partial duplication of logic that really belongs in Again, this is implementation complexity we would avoid if we can go with your proposal from #3494, in which case I think this module reassignment would also just be an error. |
It looks like this becomes too complicated. As I mentioned in #3494 we probably should go with an error if there is not explicit annotation. |
Ok, that makes sense to me; will definitely simplify things (and bring it closer to the current implementation for type aliases). I will push an update. |
…thout explicit annotation.
Pushed the simplification. Hopefully we are getting close :-) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, this looks good! Just two style comments, I will merge when you address them and test pass.
mypy/semanal.py
Outdated
if lnode.kind == MODULE_REF and lnode.node is not rnode.node: | ||
self.fail( | ||
"Cannot assign multiple modules to name '{}' " | ||
"without explicit annotation to ModuleType".format(lval.name), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not a native speaker, is it better to say without explicit 'types.ModuleType' annotation
?
mypy/semanal.py
Outdated
# respect explicitly annotated type | ||
if ( | ||
isinstance(lval.node, Var) and | ||
lval.node.type is not None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a minor style point: I think it is better to put it on one line here:
if isinstance(lval.node, Var) and lval.node.type is not None:
...
The maximum line width is 99 in mypy.
Done! |
In semantic analysis, if we find a simple assignment statement where the rvalue
is a module, make the lvalue a direct alias to that same module.
Fixes #1778.
This PR is very much a trial balloon to see whether I'm even on roughly the right path.
Thanks in advance for review and advice!