Most mypy tests use minimal fake stubs instead of real builtins, and if you forget to use the right stubs they default to very minimal stubs that often cause weird test failures (often crashes) that are difficult to figure out, especially for new contributors.
The main benefits of using the fake stubs are performance and debuggability. The latter might not be obvious, but the main idea is that it's often significantly easier to debug type checker issues if a test case doesn't do a lot of work and you only have a small number of things in the symbol table.
I'm not convinced that switching to full stubs everywhere is a good idea, for the above reasons. Even with better caching, the tests run via runtests.py unit-test would likely run several times slower than they currently do if we used full stubs everywhere, and I'd rather see the tests run faster than slower.
Here is an idea for how to improve the situation without too many compromises:
Combine the existing fixtures into a single fake stub that is good enough for all or almost all test cases that use fake stubs. This would still be much smaller than the full stub, so performance and debuggability would be reasonable.
We could also add some additional features to the stubs to make them easier to use.
Use incremental mode to avoid reprocessing the stubs on every test case.
We'll probably want some test cases that don't use incremental mode to test non-incremental type checking as well, but it's probably enough to only have a small subset of tests like this.
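In mypy the caching would come from incremental mode itself. The underlying idea is to do the expensive stub processing once per distinct fixture rather than once per test case, and a small content-addressed cache can illustrate it. This is a hypothetical sketch, not how incremental mode works internally: `FixtureCache` is not part of mypy, and `process` stands in for whatever parsing and semantic analysis the test harness performs on a fixture.

```python
import hashlib
from typing import Callable, Dict, Generic, TypeVar

R = TypeVar("R")


class FixtureCache(Generic[R]):
    """Hypothetical helper: run an expensive step once per distinct stub text."""

    def __init__(self, process: Callable[[str], R]) -> None:
        self._process = process
        self._results: Dict[str, R] = {}
        self.misses = 0  # number of times the expensive step actually ran

    def get(self, stub_text: str) -> R:
        # Key on a content hash so an edited fixture is reprocessed automatically.
        key = hashlib.sha256(stub_text.encode("utf-8")).hexdigest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = self._process(stub_text)
        return self._results[key]
```

Hashing the contents rather than the path means that two test cases sharing a fixture share the processed result, while editing a fixture invalidates only that fixture's entry.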
Make the fake stubs more discoverable. For example, print a note with the path of the fake stub file on test failure to make it easy to see where stubs come from.
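A failure note like this could be built by reading the `[builtins ...]` directive out of the test case. The sketch below is hypothetical: `fixture_path` and `failure_note` are illustrative names, and `DEFAULT_FIXTURE` is an assumption about which stub file a test uses when it has no directive.

```python
import re
from typing import List

# Matches directives such as "[builtins fixtures/classmethod.py]".
BUILTINS_DIRECTIVE = re.compile(r"^\[builtins\s+(\S+)\]$")

# Assumption: the stub file used when a test case has no [builtins ...] line.
DEFAULT_FIXTURE = "lib-stub/builtins.pyi"


def fixture_path(test_case_lines: List[str]) -> str:
    """Return the builtins fixture that a test case uses."""
    for line in test_case_lines:
        match = BUILTINS_DIRECTIVE.match(line.strip())
        if match:
            return match.group(1)
    return DEFAULT_FIXTURE


def failure_note(test_case_lines: List[str]) -> str:
    """Text to append to a failing test's output (hypothetical format)."""
    return "note: builtins stubs for this test come from " + fixture_path(test_case_lines)
```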
Implement a heuristic that detects when a test fails because a built-in name is undefined in the fake stubs, and prints out a useful message. This doesn't need to be very fancy. Maybe find all class and function names defined in the full builtins stubs using regexps, and see if one of those names is reported as undefined.
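A minimal sketch of that regexp heuristic, assuming the full builtins stub text is available as a string and that undefined-name errors look like `Name "tuple" is not defined` (older mypy versions used single quotes, so both are accepted). The function names and the hint wording are hypothetical.

```python
import re
from typing import Iterable, List, Set

# Top-level "class X" / "def x" in a stub. Indented methods are skipped
# because the pattern is anchored at the start of a line.
TOP_LEVEL_DEF = re.compile(r"^(?:class|(?:async\s+)?def)\s+([A-Za-z_]\w*)", re.MULTILINE)

# Assumed shape of mypy's undefined-name error message.
UNDEFINED_NAME = re.compile(r"""Name ["']([A-Za-z_]\w*)["'] is not defined""")


def builtin_names(full_stub_text: str) -> Set[str]:
    """Names of top-level classes and functions in the full builtins stubs."""
    return set(TOP_LEVEL_DEF.findall(full_stub_text))


def missing_builtin_hints(errors: Iterable[str], full_stub_text: str) -> List[str]:
    """Hints for undefined names that exist in the real builtins."""
    known = builtin_names(full_stub_text)
    hints = []
    for error in errors:
        match = UNDEFINED_NAME.search(error)
        if match and match.group(1) in known:
            hints.append(
                f'"{match.group(1)}" is defined in the full builtins stubs but not '
                "in the fake stubs used by this test; try a different fixture"
            )
    return hints
```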
If we made test cases that use full stubs less expensive, it would be easier to justify writing more of them, even if most test cases still used fake stubs. Contributors who don't like the fake stubs could then opt out of using them, or we could even direct new contributors to write full stub test cases by default. Core team members could periodically migrate some full stub test cases to fake stubs if the test suite becomes too slow.
One example of how the current system is tricky to use: suppose you add a test case like this one to, say, check-classes.test:
[case testWrapperClassMethod]
F = TypeVar('F', Callable[Any, Any])
[builtins fixtures/classmethod.py]
then runtests.py testcheck will crash with an exception, apparently because builtins.tuple can't be found: it isn't defined in that fixture. That's a real example from my first few weeks working on the mypy codebase (I came across it while cleaning out old branches in my local clone). I initially had a more complex test case for some unrelated work I was doing, and was baffled by the crash I got. I spent a while debugging and reducing the test case to this, eventually discovering these fake stubs and understanding what was going on.
One thing that would help a lot: in all the fake stubs (or the single combined fake stub), include everything that mypy internally assumes exists, like builtins.tuple here, so that at a minimum it won't crash. The stubs don't need all the methods the real classes have, just what the type checker assumes it can count on finding.
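One way to enforce this would be a check that every fixture defines the names the type checker relies on. The sketch below is hypothetical: `ASSUMED_REQUIRED` is an illustrative guess, not mypy's actual list, which would have to be collected from the places in mypy that look up builtins names. It uses `ast` rather than regexps because fixtures are valid Python syntax.

```python
import ast
from typing import AbstractSet, List

# Assumption: an illustrative guess at names the type checker expects in
# builtins. The real list would have to be derived from mypy's source.
ASSUMED_REQUIRED = frozenset({"object", "type", "tuple", "function", "int", "str", "bool"})


def missing_required_names(fixture_source: str,
                           required: AbstractSet[str] = ASSUMED_REQUIRED) -> List[str]:
    """Return required names that the fixture does not define at top level."""
    tree = ast.parse(fixture_source)
    defined = {
        node.name
        for node in tree.body
        if isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return sorted(set(required) - defined)
```

Running this over every fixture file as part of the test suite would flag fixtures like the one in the example above before anyone runs into the crash.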