Switch to pytest from our homegrown test runner #1673
Yep -- I think we all agree that the current test runner is pretty frustrating. Last week at the PyCon sprints, where we had a bunch of new contributors getting involved, I found myself apologizing every time I explained how to filter tests (with the positional argument and with `-a`).

We were actually just discussing this also on #1668, and so far everyone agrees that the right answer will be to move to something like pytest. If you want to try putting together a PR, that would be very useful! Pytest is (mostly for good reasons) kind of a complicated beast with a lot of choices in how to set it up, so I recommend discussing early and often, either on this thread or on a work-in-progress PR.
I also hate hate hate the builtins test fixtures. +1 to killing them with fire. They only exist for perf reasons, and I'm not at all sure they're worth it.
A big problem here seems to be that (for reasons having to do with mypy's early history as a separate language, probably) the non-data-driven tests use a set of classes that is custom to mypy's "myunit" test runner (which is wrapped by runtests.py). I don't know enough about pytest to understand whether we'd have to redo all that infrastructure. I also don't know how easy it is to get pytest to understand our data-driven tests. It would be pretty frustrating if pytest treated each file of test data as a single test.
Yeah, one thing we'll definitely want is for pytest to see the individual test cases within each data file.
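For what it's worth, pytest's documented collection hooks can do exactly that. Below is a minimal, hypothetical sketch (pytest 7+ APIs; `DataFile`, `DataCase`, and the `[case ...]` parsing are placeholders, not mypy's actual code) of how each case in a `.test` data file could surface as its own pytest item:

```python
# conftest.py -- sketch only: expose each case in a .test file as a pytest item.
import pytest


class DataCase(pytest.Item):
    def __init__(self, *, source: str, **kwargs):
        super().__init__(**kwargs)
        self.source = source

    def runtest(self) -> None:
        # A real implementation would type-check `self.source` here and
        # compare the output against the expected messages in the file.
        pass

    def reportinfo(self):
        # Makes pytest report the data file and case name on failure.
        return self.path, 0, self.name


class DataFile(pytest.File):
    def collect(self):
        # Placeholder parser: treat "[case NAME]" headers as case boundaries.
        for name, source in parse_cases(self.path.read_text()):
            yield DataCase.from_parent(self, name=name, source=source)


def pytest_collect_file(file_path, parent):
    if file_path.suffix == ".test":
        return DataFile.from_parent(parent, path=file_path)


def parse_cases(text: str):
    # Hypothetical helper: split a file on "[case ...]" headers.
    name, lines = None, []
    for line in text.splitlines():
        if line.startswith("[case ") and line.endswith("]"):
            if name is not None:
                yield name, "\n".join(lines)
            name, lines = line[len("[case "):-1], []
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, "\n".join(lines)
```

With a hook like this, `py.test -k testSomeCase` filters down to a single case inside a data file rather than the whole file.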
I'll try to see if I can tackle this over the weekend!
Cool, switching to a standard test runner would have many potential benefits. (Note that this has been discussed before, see #380. That issue also explains why things are as they are.) The last time the test runner was rewritten we had a bunch of issues and regressions, so let's try to avoid them this time. Here are some things to consider:
Test fixtures are mostly a separate issue. I'll file a new issue.
FWIW the fast way to run a single test case is currently to use the myunit incantation given in README.md, ignoring runtests.py. It would be much nicer if pytest could handle all of these.
Yeah, "change one thing at a time" is definitely good advice for this kind of migration. I'd recommend taking that farther, even, and try to find a way to migrate one section of things at a time. For example, one good strategy could be to take just one of the larger "tasks" in
You'd want to preserve the ability to run just one or a few related unit tests at a time.

A PR that does that for one driver without making a ton of changes in total would be a really solid step forward. There'll probably be discussion on the details (made possible by having the code up in a PR), and then once it's merged it should be pretty smooth sailing to do the same for most of the other drivers. Once they're all converted, we can look at retiring myunit and `runtests.py`.
…over to it (#1944)

This is a step toward #1673 (switching entirely to pytest from myunit and runtests.py), using some of the ideas developed in @kirbyfan64's PR #1723.

Both `py.test` with no arguments and `py.test mypy/test/testcheck.py` work just as you'd hope, while `./runtests.py` continues to run all the tests.

The output is very similar to the myunit output. It doesn't spew voluminous stack traces or other verbose data, and it continues using `assert_string_arrays_equal` to produce nicely-formatted comparisons when e.g. type-checker error messages differ. On error it even includes the filename and line number for the test case itself, which I've long wanted, and I think pytest's separator lines and coloration make the output slightly easier to read when there are multiple failures.

The `-i` option from myunit is straightforwardly ported over as `--update-data`, giving it a longer name because it feels like the kind of heavyweight and uncommon operation that deserves such a name. It'd be equally straightforward to port over `-u`, but in the presence of source control I think `--update-data` does the job on its own.

One small annoyance is that if there's a failure early in a test run, pytest doesn't print the detailed report on that failure until the whole run is over. This has often annoyed me in using pytest on other projects; useful workarounds include passing `-x` to make it stop at the first failure, `-k` to filter the set of tests to be run, or (especially with our tests where errors often go through `assert_string_arrays_equal`) `-s` to let stdout and stderr pass through immediately. For interactive use I think it'd nearly always be preferable to do what myunit does by immediately printing the detailed information, so I may come back to this later to try to get pytest to do that.

We don't yet take advantage of `xdist` to parallelize within a `py.test` run (though `xdist` works for me out of the box in initial cursory testing.) For now we just stick with the `runtests.py` parallelization, so we set up a separate `py.test` command for each test module.
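As an aside on how a flag like `--update-data` plugs in: pytest lets a `conftest.py` register project-specific command-line options. A minimal hedged sketch, not mypy's actual conftest; the fixture name and help text here are made up:

```python
# conftest.py -- sketch of registering a custom option and exposing it to tests.
import pytest


def pytest_addoption(parser) -> None:
    parser.addoption(
        "--update-data",
        action="store_true",
        default=False,
        help="instead of failing on mismatched output, rewrite the expected "
             "output in the test data files",
    )


@pytest.fixture
def update_data(request) -> bool:
    # Data-driven test items can consult this to decide whether to rewrite
    # expectations in place.
    return request.config.getoption("--update-data")
```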
With #1944 merged, the migration has begun! Some next steps, not necessarily in any particular order:
The big prizes here are the simplification from cutting things out at the end, and the speedups from getting effective parallelism.
Oh, also:
Uhhh...I think I kind of screwed this one up... So sorry! Glad @gnprice was able to actually, y'know, finish it!
@kirbyfan64 Nothing to apologize for! Thanks for your work in #1723 -- I drew on that for #1944 and it's now landed. Want to take on some of the further steps of the migration? ;-) In particular I think you mentioned in a comment on #1723 that you'd already done some testing of xdist, and gotten it to work after a bugfix upstream. Setting things up so a plain `py.test` run parallelizes with xdist would be a great next step.
The biggest issue with test speed now is not the runner anymore but excessive subprocessing and disk thrashing. If we switch to py.test but still have to subprocess for every myunit invocation and for every mypy test or flake8 run, it's not going to win us much.

Are there plans to make myunit use the API so that it doesn't have to fire a fresh mypy subprocess for every check?
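For reference, mypy does ship an in-process entry point, `mypy.api.run`; whether myunit can be reorganized around it is the open question above. A hedged sketch of calling it instead of spawning a subprocess, with an illustrative path:

```python
# Sketch: run a mypy check in-process via mypy's public API rather than a
# fresh subprocess. The path being checked is just an example.
from mypy import api


def check_in_process(path: str) -> int:
    stdout, stderr, exit_status = api.run([path])
    if stdout:
        print(stdout, end="")
    if stderr:
        print(stderr, end="")
    return exit_status


if __name__ == "__main__":
    raise SystemExit(check_in_process("mypy_extensions.py"))
```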
This change is a step towards removing `runtests.py` (see #1673).

The exclusion list in the flake8 configuration in `setup.cfg` has been updated to enable running the linter from the root of the project by simply invoking `flake8`. This lets it use its own file discovery and its own multiprocessing queue, without excessive subprocessing for linting every file. This gives a minor speed-up in local test runs.

Before: total time in lint: 130.914682
After: total time in lint: 20.379915

There's an additional speedup on Travis because linting is now only performed on Python 3.6. More importantly, this means flake8 is now running over all files unless explicitly excluded in `setup.cfg`. This will help avoid unintentional omissions in the future (see comments on #2637).

Note: running `flake8` as a single lazy subprocess in `runtests.py` doesn't sacrifice any parallelism because the linter has its own process pool.

Minimal whitespace changes were required to `mypy_extensions.py`, but in return flake8 will now check it exactly like it checks the rest of the `mypy/*` codebase. Those changes are also made in #2637, but that hasn't landed yet.

Finally, flake8-bugbear and flake8-pyi were added to the test requirements to make the linter configuration consistent with typeshed. I hope the additional checks will speed up future pull requests by automating bigger parts of the code review. The pyi plugin enables forward-reference support when linting .pyi files. That means it's now possible to run `flake8` inside the typeshed submodule or on arbitrary .pyi files during development (which your editor could do for you), for example on fixtures.

See discussion on #2629 on checks that are disabled and why.
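Tangentially (and not what this PR does; the PR invokes the flake8 CLI and lets it manage its own process pool): flake8 can also be driven in-process through its documented legacy Python API. A hedged sketch, in case an in-process hook is ever preferred over a subprocess; the paths passed in are illustrative:

```python
# Sketch only: drive flake8 in-process via its legacy API instead of a subprocess.
from flake8.api import legacy as flake8


def lint(paths):
    # get_style_guide() initializes flake8 much like the CLI does, so project
    # configuration (e.g. the exclude list in setup.cfg) should still apply.
    style_guide = flake8.get_style_guide()
    report = style_guide.check_files(paths)
    return report.total_errors


if __name__ == "__main__":
    raise SystemExit(1 if lint(["mypy", "mypy_extensions.py"]) else 0)
```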
The way I see it, moving the rest of the tests (i.e. the non-data-driven tests) to pytest requires using `@pytest.fixture`:

```python
@pytest.fixture
def fx_contra() -> TypeFixture:
    return TypeFixture(CONTRAVARIANT)
```

And a test, with test cases such as:

```python
def test_is_proper_subtype_contravariance(fx_contra) -> None:
    assert_true(is_proper_subtype(fx_contra.gsab, fx_contra.gb))
```

Type checking the consistency of pytest's fixtures will require a nontrivial mypy plugin, at least. Is it OK to leave it unchecked, and open a separate issue for that?
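Not something the thread proposes, but for illustration: the variance-specific fixtures could also collapse into one parametrized fixture, at the cost of running every dependent test once per variance. A hedged sketch; the import locations are assumptions about where `TypeFixture` and the variance constants live, so treat them as placeholders:

```python
import pytest

# Assumed import locations -- adjust to wherever TypeFixture and the
# variance constants actually live in the tree.
from mypy.nodes import COVARIANT, CONTRAVARIANT, INVARIANT
from mypy.test.typefixture import TypeFixture


@pytest.fixture(params=[INVARIANT, COVARIANT, CONTRAVARIANT],
                ids=["inv", "co", "contra"])
def fx(request) -> TypeFixture:
    # Each test that takes `fx` runs once per variance.
    return TypeFixture(request.param)
```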
Pytest can also run tests defined using the stdlib `unittest` framework.
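A quick hedged illustration of that point (a throwaway test, not from the mypy tree): pytest collects plain `unittest.TestCase` classes as-is, so suites written that way keep working during a migration.

```python
import unittest


class TestUpper(unittest.TestCase):
    # pytest discovers and runs this class with no pytest-specific changes;
    # `py.test test_upper.py` and `python -m unittest` both work.
    def test_upper(self) -> None:
        self.assertEqual("mypy".upper(), "MYPY")
```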
#3880 completed the migration of the data-driven tests to pytest.
Great! Thanks for working through the various test modules. What should be the criteria for closing this issue? The retirement of runtests.py? Why would we still need that script if we can run all tests through pytest?
runtests.py also type checks various things, verifies that modules can be imported, and runs lint. These would need to be migrated to pytest as well, in addition to the remaining non-data-driven myunit tests. A reasonable next milestone would be the removal of myunit.
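The "modules can be imported" task in particular maps onto pytest quite naturally. A hedged sketch (file name and scope are made up) of expressing it as one parametrized test per submodule:

```python
# test_imports.py -- sketch: replace runtests.py's import-verification task
# with one pytest-collected check per mypy submodule.
import importlib
import pkgutil

import pytest

import mypy

MODULES = sorted(
    name for _, name, _ in pkgutil.walk_packages(mypy.__path__, prefix="mypy.")
)


@pytest.mark.parametrize("name", MODULES)
def test_module_imports(name: str) -> None:
    importlib.import_module(name)
```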
After #4369, I think we should replace some of the simpler remaining myunit suites with pytest equivalents. This would entail e.g. replacing the myunit-specific assertion helpers with plain pytest-style asserts.
Myunit is finally gone. Thanks @elazarg!
We are not quite there yet; the current status is:
@elazarg As I understand it, you are working on this, right?
This is now finally fixed with #5274 🎉 |
Ok, so when I saw the PR a while back that "parallelized" the test runner, I was super hyped. But the hype wore off when I realized that the unit tests were still run sequentially.

In particular, it continuously bugs me how, when running the full unit test suite, there's absolutely no sort of progress indicator of any kind. To make things worse, `runtests.py` is largely undocumented, leaving me to wonder why stuff like `./runtests.py unit-test -a '*something*'` works but not `./runtests.py -a '*something*'`.

In addition, to an extent, `-a` is slightly...useless. I mean, because the naming convention for tests is so inconsistent, it's rarely ever useful.

Also, test fixtures are a cruel mess. It took me 30 minutes of debugging to figure out why the tests were giving out obscure `KeyError`s that I couldn't reproduce in my own code. Turns out, the fixture I was using didn't define `list`. But defining `list` caused other tests to brutally fail, so then I just created a new fixture. IMO this whole thing should either be reworked or (better yet) completely gutted and thrown into a fire pit. A very hot fire pit. With sulfur.

So I see two options:

1. switching to an established test runner such as pytest, or
2. documenting and improving `runtests.py`, and making the actual unit tests run in parallel, preferably also with a progress bar of some sort.

Thoughts? I could try and help out here a bit if you guys come to a consensus of some sort.