-
Notifications
You must be signed in to change notification settings - Fork 156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use pytest and friends in tox #451
Conversation
Thanks very much - I'll take a closer look at this over the weekend! Regarding the python-pkcs11 module, I had no trouble installing it here on Windows with either Python 3.6.8 or Python 3.9.10. This is on a Windows 11 system. I'm using "pip" to do the install on the system, though, and not trying to load the dependencies as part of actually running the test. Is that where you were seeing the error you mentioned? I have been able to reproduce the hang you mentioned on Windows on 3.9.10 but not on 3.6.8, which is consistent with what you reported here. One thing I noticed is that it may be related to the call to "rmtree" called when cleaning up the temp directory created for a test. However, I haven't had the time to investigate this further. Since I can still run the tests on Python 3.6, I've been living with that for now. Regarding running tests in parallel, I have that working here with "tox -p" (and prior to that existing I used "detox"), but I'll definitely check out what you've proposed here and see how it compares. The GitHub action support looks nice! I've got web hooks to trigger tests in AppVeyor (Windows) right now, but my previous setup for testing macOS and Linux on TravisCI stopped working a while back when they switched to a credit-based system. Even though they claimed open source projects could still get free credits, they never properly fixed my account when I agreed to switch over to their new setup. So, I've just been testing those locally. This looks like it might be a nice alternative. |
The error looks like this: https://github.com/hexchain/asyncssh/runs/5135867987?check_suite_focus=true
"tox -p" is toxenv-level parallelization, which is handy if you have multiple Python versions installed and would like to test on them all at once. "pytest-xdist" works on test-case-level, it runs the whole testsuite across a set of workers. Given that some tests take significantly longer time than the others, xdist is helpful to prevent slow tests blocking overall progress. (With pytest, the longest N tests can be easily determined using The proposed GH action already runs in a similar fashion like
Glad you like it :-) |
Where do you get the VM resources from for the GH action? Is that something GitHub provides? Regarding "tox", I do have all currently supported versions of Python (3.6-3.10) installed here, so running those in parallel definitely helps. I can come pretty close to maxing out my 8-core iMac with that. I can see the xdist form of parallelism helping here as well, though, when some tests block for a period of time on something like a network timeout. Do you know of any way to combine the two? Looking at the distutils code, it appears the error you're seeing may only be an issue on win32 when NOT using the MSVC compiler: if sys.platform == "win32":
from distutils._msvccompiler import MSVCCompiler
if not isinstance(self.compiler, MSVCCompiler):
template = "python%d%d"
if self.debug:
template = template + '_d'
pythonlib = (template %
(sys.hexversion >> 24, (sys.hexversion >> 16) & 0xff))
# don't extend ext.libraries, it may be shared with other
# extensions, it is a reference to the original list
return ext.libraries + [pythonlib] It would be pretty easy to work around the error by just changing the last line to: return ext.libraries + (pythonlib,) You may run into other errors after that, but you might want to give it a try and see how far you get. That said, iIt looks like a similar issue would also crop up on Android, and that this module in general seems to be assuming that ext.libraries is a list, not a tuple, so that's probably not really the right fix. The real question is how that got set to be a tuple in the first place. There are places where _distutils sets ext.libraries based on arguments passed in from an external module, so a caller might be passing the wrong type here but it's a pretty deep stack trace to work through. |
Yes, every job gets a 2-core VM.
It should be possible. Tox has the Regarding the python-pkcs11 issue, the root cause seems to be its setup.py according to this comment. To make it run I would probably have to figure out why Python on GH actions does not use the MSVC compiler (which is actually installed). EDIT: It seems setuptools correctly finds and uses MSVC, it is just because python-pkcs11 uses tuple for the |
I got a chance to take a closer look at this today. For the most part, I have it working in my setup here, but I'm running into a problem where it is collecting tests from some local files in a "saved" and a "tmp" directory, at the same level as the "tests" directory. Is there a way to limit it to only pick up things in "tests"? With "unittest", that seemed to be the default. I'm also seeing files show up under "saved" and "tmp" in the coverage output, but I'm hoping if I remove them from the test collection process, that will also prevent them from showing up in the coverage output. Also, in my setup here, I was running the "coverage" clean, combine, and report steps manually. The new setup is much nicer, but I'd like to have it run the "html" version of the coverage report, either instead of or in addition to the text-based report. Can this be combined into the existing "testing:report" section, or does it need to be a new section? On another note, I saw the proposed changes in your PR on python-pkcs11. Looks promising -- thanks for digging into that! |
I think I have figured out the first issue I mentioned. Adding the following to tox.ini seems to work and the tests now run cleanly for me here.
The total runtime is a bit better than unittest as well. I still need to experiment with how many instances of tox vs. xdist, but leaving those at the defaults had everything finish here in about 6m17s, vs. 7m51s for my previous setup. |
It is possible to also generate the HTML report. I've pushed an update with your |
Thanks - that seems to work nicely! It appears that the using pytest to invoke the tests seems to have changed the coverage I get, though. It was previous at 100% for all modules and now a couple of modules are reporting less than that. It seems related to pause/resume_reading functionality. I'll need to look more closely at that. |
Thanks for your work on this! Another question: In this new setup, what's the equivalent of "python -m unittest <test_targets>"? I tried "tox -e py310 -- <test_targets>", but that didn't seem to work. I was using a name like "tests.test_forward._TestTCPForwarding.test_forward_local_port_pause" as the test name. I also tried this without the "tests" prefix. Also, when running individual tests like this, I'm thinking it might be best to bypass the coverage-related functions. Any way to make this conditional? |
I think I figured out the single-test thing. It's rather non-Pythonic. The syntax for test name appears to be "::::". For instance:
This seems to work, as does running just a single class or a single module if you don't specify all three pieces. I'm guessing maybe this has something to do with tox supporting multiple languages, but I would have thought that the interpretation of the test name would have been handled by pytest, not tox. |
Actually, after reading more closely, I see the choice of the "::" syntax is in fact a pytest thing, not a tox thing. |
Arguments after As for test selection, a simpler (but not as accurate) way is:
|
Thanks. Regarding coverage, I found that I can manually disable coverage when running a subset of the tests by adding "--no-cov" to the command line (after the "--"). It'd be nice if there was some way to automatically enable coverage only when running the full test suite, though. Also, one other change I'd like to revert here is the exclusion of the "tests" directory in .coveragerc. I actually find it valuable to see the coverage on the test modules. When that's not at 100%, that indicates there's something wrong with that test code. |
If you have pytest installed globally for a specific Python installation, you can directly use e.g. |
Yes - thanks. Now that I know the proper test name syntax to use for pytest, that seems to work nicely. I'm still working on figuring out the coverage issue, but I've managed to track it down to being related to xdist. When I run my pause test case with either pytest or tox, it works as expected. However, running the exact same test where the only change is to add in "-n auto" causes that test to no longer trigger a pause on the channel. It's interesting that this happens even when running only a single test, where there really shouldn't be any opportunity for parallelism in scheduling the tests. I'm not sure what I'm going to do about this yet. I guess I'll need to see if there are other ways to trigger a pause which are not affected by xdist, but so far I haven't had much luck. One other thing I noticed that that some of the tests are being skipped when I run with "-n auto" but there are no skipped tests when running without "-n auto". I think this might be independent of the coverage issue I'm seeing, but that's something else I will need to investigate before switching over. |
I'm on Linux and unfortunately cannot seem to reproduce: |
Here's what I get from
So, it looks like all the skipped tests are either a result of
try:
# pylint: disable=unused-import
if sys.platform == 'win32': # pragma: no cover
from .gss_win32 import GSSBase, GSSClient, GSSServer, GSSError
else:
from .gss_unix import GSSBase, GSSClient, GSSServer, GSSError
gss_available = True
except ImportError: # pragma: no cover
gss_available = False The
try:
import pkcs11
from pkcs11 import Attribute, KeyType, Mechanism, ObjectClass
from pkcs11 import PrivateKey, Token
from pkcs11.util.rsa import encode_rsa_public_key
from pkcs11.util.ec import encode_ec_public_key
pkcs11_available = True
except (ImportError, ModuleNotFoundError): # pragma: no cover
pkcs11_available = False I also tried this on Linux and got the exact same 30 tests skipped. I'm now seeing these skipped tests both with and without the "-n auto" present in tox.ini, though, which is different from what I saw this morning where I saw the skipped test with "-n auto" but not without it. This is very strange. |
I think the problem is that the toxenv Actually, I think we can just get rid of the Here is an example [tox]
minversion = 2.0
envlist = clean,{py36,py37,py38,py39,py310},report
skip_missing_interpreters = True
[testenv]
deps =
aiofiles
bcrypt>=3.1.3
fido2>=0.9.2
gssapi>=1.2.0 ; sys.platform != 'win32'
libnacl>=1.4.2
pyOpenSSL>=17.0.0
pytest
pytest-cov
pytest-xdist
python-pkcs11>=0.7.0 ; sys.platform != 'win32'
pywin32>=227 ; sys.platform == 'win32'
setuptools>=18.5
uvloop>=0.9.1 ; sys.platform != 'win32'
usedevelop = True
commands =
{envpython} -m pytest --cov --cov-append -n auto {posargs}
depends =
{py36,py37,py38,py39,py310}: clean
report: {py36,py37,py38,py39,py310}
[testenv:clean]
deps = coverage
skip_install = true
commands = coverage erase
[testenv:report]
deps = coverage
skip_install = true
commands =
coverage report --show-missing
coverage html
[pytest]
testpaths = tests |
This change seem to solve the skipped tests issue on macOS and Linux, but there was a failure on Windows that I'm trying to see if I can reproduce (see the Appveyor report here). It looks like it may be hanging. So far, running "pytest" manually on Windows works with Python 3.8, but not 3.9 (it hangs like we saw before, even with unittest). When I run "tox" locally on Windows, I actually see build failures and a much earlier hang even with Python 3.8, but I haven't had the time to dig into the details yet. Also, nothing has changed here with the coverage issue I mentioned. I'll still need to see if I can solve that. |
The hanging test was not run on Appveyor before, because Regarding coverage, I'm currently seeing the following missing lines on Linux (Python 3.10, OpenSSL 1.1):
Part of the problem is because some In the meantime, I've made some changes, which I hope improves the coverage situation. It now also automatically kills any test that takes more than 5 minutes, in order to figure out the hanging test case. |
The hanging test on Windows seems to be |
Thanks for looking into this! I have had problems with the GSS-related tests on Windows before, and I think I ended up not installing the gssapi module on Windows as a workaround. It looks like your modified tox.ini already has a conditional checking if Regarding coverage, here's what I see missing on macOS right now:
These all have to do with exercising the pause/resume reading & writing flow-control calls. When I last checked Linux, I wasn't seeing anything missing in connection.py or sftp.py. The connection one you saw seems related to GSS testing, and seems to have to do with whether you are running an older version of Python which doesn't support nanosecond timing. The issue with test_public_key.py has to do with whether OpenSSL 3 is installed or not. As for util.py, that's another case related to having older versions of Python installed to cover both branches, as you mentioned. |
GSS on Windows depends on I'll try to get a macOS to look into the coverage issue. |
Ah, right! I remembered win32 had its own gss implementation (sspi.py), but I had forgotten how that translated in terms of external dependencies. On my system here, it looks like sspi.py is installed but it's not finding win32security for some reason, and that's causing the test to be skipped, preventing me from seeing the hang. It's kind of odd, though, as the pywin32 package is installed. Even when I uninstall and reinstall pywin32, I'm still seeing win32security reported as being missing. This was on a python 3.8 distribution. I also have python 3.9 installed on this same system, and it can import sspi and win32security without a problem, allowing GSS to work when I test it manually but showing the hang in test_auth. So, at least that's consistent. My guess is there's something wrong in my GSS stub code when running against the Windows version that I just hadn't noticed due to the "broken" win32security issue. |
I think I've fixed the issue with testing GSS auth on Windows. Could you rebase this PR against the latest version of "develop" and see if the Appveyor test now passes? |
Ok - the GSS auth issue seems to be worked out, but in my local Windows testing I'm running into a new issue in test_x11, which seems to only happen when running with xdist. I'm seeing failures in various X11 tests when "-n auto" is used, but these issues don't show up with xdist disabled (-n 0). I think it may have to do with the tests attempting to open X11 listeners on a common set of ports, so the tests are conflicting with one another when you try to parallelize them: one test's client may be connecting to another test's server. This module may be one where we need to disable xdist-style parallel testing. Interestingly, the same failures aren't showing up on macOS, and I also haven't seen them in Linux yet. I'm not quite sure why.
From an earlier run:
|
I found a better fix for the GSSAPI issue, and confirmed that both real-world connections and the auth unit tests work properly on all of Linux, macOS, and Windows. This is now available in commit 2f733d5. I was also able to get back to 100% code coverage by disabling "pytest-xdist" (and removing "-n auto"). While that adds about 1.5 minutes to the testing time, it seems MUCH more reliable, at least with the unit tests I have right now, and I can still run with "tox -p" to nearly max out my CPU when running tests against multiple Python versions. I'm still seeing issues running the tests on Python 3.8 and later on Windows, related to problems with the rmtree() function in the standard library, but that's been a problem for a while now so I'll work on that separately. One other issue I'm seeing is that pathnames for Windows in the coverage file are different than macOS/Linux, so I end up getting errors like "No source for code: '/Users/ronf/src/asyncssh/Z:\src\asyncssh\tests\test_auth.py'." It looks like this can be fixed with a [path] section in .coveragerc, but I'm surprised to be seeing this given the "coverage erase" that should be running at the beginning of every tox invocation. Somehow, though, the Windows coverage data is not being cleared by "coverage erase", but it is being picked up at the end by "coverage report". I'm not sure why. As a workaround, I've done the following for now: @@ -9,5 +9,6 @@ exclude_lines =
partial_branches =
pragma: no branch
for .*
-omit =
- .tox/*
+include =
+ asyncssh/*
+ tests/* I've also made some small tweaks to my local tox.ini, which currently looks like:
The main differences are adding minimum version numbers to all the packages and switching the 'platform' definition back to using the name 'macos' instead of 'darwin', but otherwise I think it matches one of the versions you posted. Since removing "xdist" addresses the coverage issues I saw along with some other intermittent errors, i'm tempted to go with this version for now. We can continue to evolve it, but that'll at least be a step in the right direction. What do you think? Are there other changes missing here that you'd recommend? I really appreciate the work you've done here -- pytest seems extremely nice, and the integration with coverage is much cleaner than what I was doing before. Thanks so much! |
That sounds good, I've been experimenting with different options and also couldn't seem to make xdist consistent. Maybe there are implicit states shared between tests that cause problems when executed in parallel. The The two extra commits fix some test errors I've been seeing locally. |
More info on the .coveragerc issue: The problem turned out to be that one of the files in pycache was written on Windows. This was the same directory used on macOS (network mounted), and macOS was picking up pathnames from pycache files written by Windows, apparently. Erasing coverage didn't affect pycache files, so the wrong path just kept coming back for those modules even after a "coverage erase". I see that you may some changes to set the coverage file based on env, but I'm not sure that'll address the pycache issue I saw. Regarding xdist, there is some sharing of state at the class level between tests, as the "server" side of the testing is generally only started once per class, and not once per test instance. If xdist is trying to run multiple tests in the same class in parallel, that could explain some of the problems we're seeing. Regarding "darwin" as a platform, would it make sense to also list "win32" as a platform rather than "windows", so it's more of a one-to-one mapping? If we did that, would we even need the "platform" definition in [testenv]? I'll take a look at the other two patches you submitted shortly. |
It is possible to make xdist do load-balancing according to scope (class) or file (with Regarding platform names, yes, it is possible to use Regarding coverage, how did you invoke |
It looks like coverage tries to support the combined Windows and Linux use case (or other use cases where paths different between platforms that coverage files are collected from) using the "sources" config option under [path], but I'm not quite sure of the right syntax in this case where you get a weird mix of absolute and relative paths. I see things like '/Users/ronf/src/asyncssh/Z:\src\asyncssh\tests\test_connection_auth.py' showing up in the "No source for code" error. I saw this when invoking 'tox' with no arguments where "erase" was running. However, it turned out there was another subdirectory ('tmp') at the same level as 'asyncssh' and 'tests' that I had compiled some of the code from on Windows, and its coverage file was being picked up by 'report'. Once I removed that file, the error went away and 'erase' worked as intended. |
Regarding xdist, there's some set-up code which runs at file scope, like disabling ssh-agent tests completely in test_agent.py when an agent isn't started but everything else should be class or instance level. There are some other challenges, though, like there are a limited number of ports available for the X11 testing to use, so too much parallelism there could potentially cause testing to fail. |
Using the coverage data produced on GH actions (https://github.com/hexchain/asyncssh/actions/runs/1868345615) I did some test. The following configuration works:
It correctly combines coverage data from all three OSes. Without |
Have you tried enabling relative paths and using that for the macOS & Linux cases, but using an absolute path for the Windows case, possibly with a wildcard to only specify the tail of the path? (like "D:*\asyncssh")? I'd want to add matches for "tests" in there as well, though, and one of the things I haven't figured out is how to group multiple source lists together, such that the alternative versions of the "asyncssh" matches would map to the first "asyncssh" entry but subsequent matches for "tests" would map to the first "tests" entry. |
I haven't tried that, sounds like it would need two
The option names under |
Thanks very much! This seems like a good point to pull in the changes you've made so far, so I've gone ahead and done so. I'm happy to continue working with you on any other improvements you'd like to make. I'll also look into the '%d' change we discussed, and I'm also looking into fixing the problems getting the tests to run properly on Python 3.8 and later on Windows. Also, thanks for the info on [paths] -- I'll experiment a bit more with that. |
Commit 6c7d646 is now available with the changes to return a more specific error when either a UID or home directory token is used on a system where those values are unavailable. |
This PR makes tests run by tox use PyTest. PyTest is compatible with unittest-style tests, offers better output, and has a great ecosystem with many plugins.
Some comments:
clean
andreport
envs is documented at https://pytest-cov.readthedocs.io/en/latest/tox.html.platform.system()
.skip_missing_interpreters
option belongs to[tox]
, otherwise it does not work.deps
is updated with all optional deps.pytest-cov
is for coverage collection and reporting.pytest-xdist
is for running tests in parallel. On my machine, the total time taken to run a single round of test is reduced from ~6 minutes to ~2 minutes.python-pkcs11
currently does not install on Windows, due to Error received when installing: 'TypeError: can only concatenate tuple (not "list") to tuple' pyauth/python-pkcs11#133. It is thus excluded for now.If desired I can also make a PR to run tests on GH actions. Here is an example, it should take ~20 minutes: https://github.com/hexchain/asyncssh/actions/runs/1814008759
Note that on Windows and Python 3.8+, a lot of tests hang indefinitely so these combinations are excluded.