Use globally installed packages for GPU tests #1011
@@ -15,9 +15,9 @@ commands =
deps =
-rrequirements/test.txt
setenv =
CUDA_VISIBLE_DEVICES=0
Without setting CUDA_VISIBLE_DEVICES=0, we get an error:
tensorflow.python.framework.errors_impl.InvalidArgumentError: Device ordinals must be set for all virtual devices or none. But the device_ordinal is specified for 1 while previous devices didn't have any set.
We didn't have to set this with TF 2.9. Is it specific to TF 2.10? Is there a better way to handle this?
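For reference, the same restriction can be applied from Python rather than from tox's setenv. This is only a minimal sketch: `restrict_visible_gpus` is a hypothetical helper, not something in this repository, and it assumes it runs before TensorFlow (or any other CUDA library) is imported, since the variable is read when CUDA initializes. It does not settle whether this is the right fix for the TF 2.10 error above.

```python
import os


def restrict_visible_gpus(device_ids="0", env=None):
    """Set CUDA_VISIBLE_DEVICES so CUDA libraries only see the given devices.

    Hypothetical helper for illustration. Call it before importing
    TensorFlow; changing the variable after CUDA has initialized has
    no effect on the running process.
    """
    # Default to the real process environment; a plain dict can be
    # passed instead, which is convenient for testing.
    env = os.environ if env is None else env
    env["CUDA_VISIBLE_DEVICES"] = device_ids
    return env["CUDA_VISIBLE_DEVICES"]
```

Calling `restrict_visible_gpus("0")` at the top of a test session mirrors the `CUDA_VISIBLE_DEVICES=0` entry in the diff.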
If we would like to run these tests with only 1 GPU, we can also change the runner used in the workflow config. It is currently set to 2GPU:
models/.github/workflows/gpu-ci.yml
Line 15 in e2276f4
runs-on: 2GPU
_ = model.evaluate(valid)
Removed because this is unnecessary for these tests, and evaluate() is already tested in one of the other unit tests.
For GPU tests, we should use globally installed packages, as we don't install cudf, cupy, etc. in the test environment. With sitepackages=true set in tox, tox will use the packages in the CI container when they are not available in the environment. We had sitepackages=true before, but it got dropped at some point. This PR restores sitepackages=true in the tox test environment py38-gpu.
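One way to confirm that the tox environment can actually see the container's packages is a small import check run inside that environment. This is a sketch only: `missing_packages` is a hypothetical helper, and the package names are simply the ones mentioned above (cudf, cupy).

```python
import importlib.util


def missing_packages(names):
    """Return the names that cannot be resolved for import in this interpreter."""
    # find_spec returns None for a top-level module that is not on sys.path,
    # without actually importing the modules that are found.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Packages named in the PR description; adjust the list as needed.
    missing = missing_packages(["cudf", "cupy"])
    print("missing:", missing or "none")
```

If the list comes back non-empty inside py38-gpu, the environment is likely not picking up the global site-packages.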