Better testing of stubs #754
IIUC this is similar to comparing the stub to what stubgen (mypy's stub generator) can easily discover, right? So maybe that would be an implementation strategy? In general I worry that this would still be incredibly imprecise (since stubgen doesn't discover types). I also worry that the amount of test code per module, just to specify exceptions/improvements, could easily be larger than the size of the stub for the module. Which would give it a poor scaling behavior. |
We might be able to reuse parts of stubgen. This would have some nice benefits over just using stubgen:
This would certainly be imprecise, but I believe that it would still be useful, similar to how the existing typeshed tests are useful and prevent an interesting set of errors. We'd have to experiment to see whether the number of exceptions required would make this impractical. |
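One possible shape for this, sketched under the assumption that we only compare top-level names rather than types: run mypy's stubgen on the installed module, then diff the names it discovers against the names in the typeshed stub, with a per-module exception set standing in for the "expected discrepancies" mechanism mentioned above. The helper names and exception handling here are hypothetical.

```python
# Rough sketch: run stubgen on an installed module and compare the top-level
# names it discovers against the hand-written typeshed stub.
import ast
import subprocess
import tempfile
from pathlib import Path
from typing import Set


def top_level_names(stub_path: Path) -> Set[str]:
    """Collect top-level function, class and annotated-variable names from a .pyi file."""
    tree = ast.parse(stub_path.read_text())
    names: Set[str] = set()
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            names.add(node.name)
        elif isinstance(node, ast.AnnAssign) and isinstance(node.target, ast.Name):
            names.add(node.target.id)
    return names


def missing_from_stub(module: str, typeshed_stub: Path, exceptions: Set[str]) -> Set[str]:
    """Names stubgen finds in the module that the typeshed stub doesn't define."""
    with tempfile.TemporaryDirectory() as out:
        # stubgen ships with mypy; -m picks the module, -o the output directory.
        subprocess.run(["stubgen", "-m", module, "-o", out], check=True)
        generated = top_level_names(Path(out) / f"{module}.pyi")
    return generated - top_level_names(typeshed_stub) - exceptions
```

Run over every stdlib module with small per-module exception sets, this would give a crude but cheap completeness check; it says nothing about types, which matches the "imprecise but still useful" expectation above.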
pytype has an option to test a stub against the corresponding implementation.
It won't do everything on your list, but it will find:
|
Cool! I wonder if we could use that in the CI scripts? |
Another idea would be to check for things present only in a Python 2 or 3 stub, but not both, and give a warning if the same thing is present at runtime in both versions. |
I'm working on using typeshed stubs in PyCharm and I must admit I have very little confidence in many stub files. I'm especially worried about their incompleteness, since it will result in false warnings about missing methods. As a part of the idea to test typeshed stubs better, I propose adding Python files that exercise the API described by the stubs into the repository, alongside the stubs themselves. The DefinitelyTyped repo for TypeScript uses this approach for testing their stubs. |
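For illustration, a hand-crafted "stub usage" file in this style might look like the sketch below. The choice of module and the convention of pinning expected result types with annotated assignments are assumptions, not an agreed format.

```python
# Sketch of a hand-crafted stub-usage file in the DefinitelyTyped style. It is
# meant to be analyzed by a type checker rather than executed: a clean run means
# the stubs support this usage, and the annotated assignments pin the expected
# result types so regressions in the stubs show up as type errors.
import os.path
from typing import Tuple

joined: str = os.path.join("a", "b")            # should accept str arguments and return str
exists: bool = os.path.exists(joined)           # should return bool
parts: Tuple[str, str] = os.path.split(joined)  # should return a (head, tail) pair
```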
It seems odd to hand-craft a new set of test files just to exercise the stubs. |
@matthiaskramm In addition, having common test data that requires an analyzer to actually resolve references to symbols and check types would make our type checkers more compatible with each other and more compliant with PEP 484. |
I'm with Matthias. I agree that we need to have some way to verify that
stubs match the implementation, but I don't think that hand-crafted test
files are an improvement. Many APIs are very rich. Writing a test file that
exercises the full API would be way more work than writing the stubs.
I propose to invent some way of cross-checking stubs with the original
source code by employing mypy's stub generator. We could run the stub
generator and somehow compare its output to the actual stub and note
discrepancies. We'd need some mechanism for indicating expected
discrepancies too.
|
When I see a typeshed stub, how can I be sure that it's correct to any extent? And since a stub overrides the whole contents of a module, any mistake or omission in it affects every user of that module.

If someone changes things in typeshed in an incompatible way, we at PyCharm will notice any regressions at least. It would be better to check not only for regressions, but for incompatibilities between type checkers as well. This is one of the main reasons I'm proposing to make static tests for stubs a part of typeshed.

Most fixes to the stubs come from real examples of false errors. It's not enough to just fix a stub and forget about it. We have to run a type checker manually in order to make sure the problem is fixed. And even then there may be incompatibilities between the results of one type checker and the others. Since we already have a code example that demonstrates the problem, why don't we add it to automated tests so there will be no regression in the future?

Static tests for type checkers could co-exist with checks by introspection. We don't have to pick just one option. Meanwhile I'll be sending my PRs on top of the master branch without any tests. |
I think that hand-crafted test files could still be useful in some cases. Here are some examples where I think that they would be useful:
Mypy already has a small number of tests like this (https://github.com/python/mypy/blob/master/test-data/unit/pythoneval.test) and they've been occasionally useful. |
The idea of making contributions easier to review is a good point. I've already mentioned other points above; I think they are all valid. |
Here are some more ideas for automated checks that might be pretty easy to implement, at least for an interesting subset of packages:
|
Sent a PR that suggests both static and run-time checking (via pytest) #917. |
Hello everybody! I've been working on mypy's stubtest.py with the goal of adding tests to typeshed. My WIP branch is visible here: https://github.com/hauntsaninja/mypy/tree/stubtest2/scripts

Here's a list of errors when running against the latest master of typeshed: https://gist.github.com/hauntsaninja/1ecb078ccbd293112b67fb7236727f5d

In general, it seems to work well. The last 25 PRs visible at https://github.com/python/typeshed/pulls?utf8=✓&q=is%3Apr+author%3Ahauntsaninja were all identified by stubtest.

One thing to call out is that I think this sort of testing is really necessary for dealing with differences between Python versions. I recently spent some time trying to complete py38 support in typeshed, and it took a lot of manual effort to trawl through the docs for each module. This would also help prevent regressions for older Python versions.

If this seems useful, I'm curious what suggestions people have to get this to the point that we could use it in typeshed CI. The script supports outputting and using whitelists (it also notes any unused whitelist entries), which, as Jukka suggested, we could gradually burn down. Let me know if a complete list of its capabilities would be useful.

Please let me know what you think! I'd also really appreciate code review, particularly from folks familiar with mypy internals. It's pretty much a complete rewrite, so there's a lot of scope for mistakes. In the short term, I'm going to be working on the following:
I'm also curious if people have opinions about where this should live? I'm partial to the point of view discussed in #996 – that if it continued to live in mypy it could perhaps be hard to coordinate changes to unblock typeshed CI. I could also move it to its own repo? |
@hauntsaninja thanks, this is great! It would be really helpful if we could run this script in typeshed CI in the future.

Putting it either in the mypy or the typeshed repo could lead to issues with coordinating changes, but because the script relies heavily on mypy internals, I think it makes more sense to put it in the mypy repo. That way, when somebody makes a change to mypy that breaks the guts of the script, they can just fix stubtest in the same commit. On the other hand, if a typeshed change exposes a false positive in stubtest, it should be possible to just update a whitelist.

Perhaps it would also be useful to have stubtest support a magic comment in the stubs that marks a specific entry as a known false positive. |
This is indeed fantastic (as evidenced by your relentless onslaught of PRs). I think the long-term goal should be to do the tests here in typeshed, as this is where potentially problematic changes happen and fixes can be applied. But if stubtest is really tied to mypy internals at the moment, I agree that the mypy repo seems more appropriate for the time being. That said, another long-term goal for both the typeshed and mypy projects should be to untangle the build processes. Ideally, changes to one repo should not cause failures in the other one. I think this is a precondition for moving the stubtest tests here. |
Okay, I've gotten through some of my short-term work list, so I've opened a draft PR against mypy at python/mypy#8325. Assuming it finds a home there, I'd like to propose the following path to stubtest in typeshed CI and look at some scenarios. We add a test to typeshed CI that uses a pinned version of stubtest to check the stdlib stubs for each Python version from py35 onwards, with a Python-version-specific whitelist that's checked into typeshed.
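A very rough sketch of what that CI step could look like is below. The whitelist file layout, the version list, the flag name, and invoking stubtest as a module are illustrative assumptions rather than the final design; a real setup would also need each run to happen under an interpreter of the matching version.

```python
# Hypothetical CI driver: run a pinned stubtest once per targeted Python version,
# each run using its own whitelist file checked into typeshed.
# Paths, flag names and the module invocation are assumptions for illustration.
import subprocess
import sys

VERSIONS = ["3.5", "3.6", "3.7", "3.8"]  # py35 onwards, per the proposal


def main() -> int:
    exit_code = 0
    for version in VERSIONS:
        whitelist = f"tests/stubtest_whitelists/py{version.replace('.', '')}.txt"
        result = subprocess.run(
            [sys.executable, "-m", "mypy.stubtest", "--whitelist", whitelist]
        )
        exit_code = exit_code or result.returncode
    return exit_code


if __name__ == "__main__":
    sys.exit(main())
```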
Some other notes:
Edit: |
I feel stubtest does a good enough job here that we can close this issue. stubtest provides decent overall coverage and makes stub evolution easy, as I can attest from the Python 3.8 and 3.9 work and several fixed regressions against older Pythons. It's not perfect; for instance, stubtest can't do much about return types, but it should address most of the discussion in this issue. I think our approach to type-checking code to test stubs should therefore be more along the lines of #1339. In particular, it'd be nice to see something like Rust's crater or Black's primer. Doing so could also help reduce our dependence on contributors with the ability to test large internal codebases. |
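To illustrate the crater/primer idea, here is a sketch that type-checks a small corpus of pre-cloned projects with mypy twice, pointing --custom-typeshed-dir (an existing mypy option) first at a released typeshed and then at the checkout under review, and flags any change in output. The corpus layout and directory names are hypothetical.

```python
# Sketch of a primer-style regression check: type-check a corpus of projects
# against two typeshed versions and report any change in mypy's output.
# The project list and typeshed directory names are placeholders.
import subprocess

PROJECTS = ["corpus/project_a", "corpus/project_b"]  # pre-cloned checkouts


def mypy_output(project: str, typeshed_dir: str) -> str:
    result = subprocess.run(
        ["mypy", "--custom-typeshed-dir", typeshed_dir, project],
        capture_output=True,
        text=True,
    )
    return result.stdout


def main() -> None:
    for project in PROJECTS:
        before = mypy_output(project, "typeshed-released")
        after = mypy_output(project, "typeshed-pr")
        if before != after:
            print(f"mypy output changed for {project}:")
            print(after)  # a real tool would show a proper diff


if __name__ == "__main__":
    main()
```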
Reviewing stubs for correctness is hard, and the existing stubs have many bugs (that are gradually being fixed). We could perhaps improve both of these by having better tests for stubs. Here's a concrete idea for that (inspired by a proposal I remember having seen at Gitter):
- Check that simple default values match the declared argument types (at least for `int`, `str`, `bool` and other simple types).
- Check that arguments with a `None` default value have type `Optional[x]`.
- For third-party packages, `pip install` the package before running the tests. We can have a config file that specifies the versions of 3rd party modules to use to make the tests repeatable.
- We could blacklist the modules that currently don't pass the test and gradually burn down the blacklist.
This wouldn't be perfect and several things couldn't be checked automatically:

- `Optional[...]` (beyond the simple default-value case)
- and maybe a few other things

However, this could still be pretty valuable, and this shouldn't be too hard to implement. We could start with very rudimentary checks and gradually improve the testing tool as we encounter new bugs in the stubs that we could automatically guard against.
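A minimal version of the introspection idea might look like the sketch below: parse a stub with ast, check that each top-level function or class it declares exists in the imported module, and flag arguments whose default is None but whose annotation doesn't mention Optional. The helper names and the simplifications (top-level definitions only, a crude Optional heuristic, no overloads) are mine, not part of the proposal.

```python
# Minimal sketch of the introspection-based stub check described above.
# It only looks at top-level definitions and uses a crude Optional heuristic.
import ast
import importlib
from pathlib import Path
from typing import List


def check_stub(module_name: str, stub_path: Path) -> List[str]:
    errors: List[str] = []
    runtime = importlib.import_module(module_name)
    tree = ast.parse(stub_path.read_text())
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        # Everything defined in the stub should exist at runtime.
        if not hasattr(runtime, node.name):
            errors.append(f"{module_name}.{node.name} is in the stub but not at runtime")
            continue
        # Arguments with a None default should be annotated as Optional[...].
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args.args
            defaults = node.args.defaults
            for arg, default in zip(args[len(args) - len(defaults):], defaults):
                has_none_default = isinstance(default, ast.Constant) and default.value is None
                optional = arg.annotation is not None and "Optional" in ast.dump(arg.annotation)
                if has_none_default and not optional:
                    errors.append(
                        f"{module_name}.{node.name}: argument {arg.arg!r} "
                        "defaults to None but is not Optional"
                    )
    return errors
```

Run over a whitelisted subset of stdlib modules, a check like this would catch stub entries that don't exist at runtime and the most common Optional omissions, while leaving return types and precise argument types to human review.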