
[TESTS] Refactor tests to run on either the GPU or CPU #6331

Merged 1 commit on Sep 2, 2020

Conversation

tkonolige (Contributor):

Much of the time spent in testing is duplicated work between CPU and GPU test nodes. The main reason is that there is no way to control which TVM devices are enabled at runtime, so tests that use LLVM will run on both GPU and CPU nodes.

This patch adds an environment variable, `TVM_TEST_DEVICES`, which controls which TVM devices should be used by tests. Devices not in `TVM_TEST_DEVICES` can still be used, so tests must be careful to check that the desired device is enabled with `tvm.testing.device_enabled` or by enumerating all devices with `tvm.testing.enabled_devices`. All tests have been retrofitted with these checks.

This patch also provides the decorator `@tvm.testing.gpu` to mark a test as possibly using the GPU. Tests that require the GPU can use `@tvm.testing.requires_gpu`. Tests without these flags will not be run on GPU nodes.
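The mechanism can be sketched without TVM itself. The following is a minimal, hypothetical reimplementation of the `device_enabled` / `enabled_devices` idea that honors a `TVM_TEST_DEVICES`-style variable; the device list, the `;` separator, and the fallback behavior are illustrative assumptions, not TVM's actual implementation:

```python
import os

# Illustrative stand-in for whatever devices TVM was actually built with.
ALL_DEVICES = ("llvm", "cuda", "opencl")

def enabled_devices():
    """Devices tests may use: the TVM_TEST_DEVICES list if set, else all built devices."""
    env = os.environ.get("TVM_TEST_DEVICES", "")
    if env.strip():
        return [d for d in env.split(";") if d]
    return list(ALL_DEVICES)

def device_enabled(device):
    """A test should exercise `device` only if this returns True."""
    return device in enabled_devices()

# A retrofitted test guards each device it touches:
def test_add():
    for dev in ["llvm", "cuda"]:
        if not device_enabled(dev):
            continue  # device excluded on this test node; skip its work
        # ... build and run the schedule for `dev` ...
```

With this scheme a CPU node can export `TVM_TEST_DEVICES=llvm` and the CUDA branch of every guarded test becomes a no-op, which is the duplication the patch removes.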

@tkonolige force-pushed the test_markers branch 3 times, most recently from e64a748 to 324066a (August 24, 2020 19:09)
Resolved review threads: python/tvm/testing.py (4), python/tvm/_ffi/runtime_ctypes.py, tests/python/topi/python/test_topi_conv2d_nhwc_winograd.py, tests/scripts/setup-pytest-env.sh, Jenkinsfile
tqchen (Member) commented Aug 24, 2020:

It might also be helpful to send a quick RFC given that the change of test infra will affect quite a lot of people

tkonolige (Contributor, Author):

I agree that an RFC could be useful, but maybe I could just add information to the docs instead?

zhiics (Member) left a comment:

A few nitpicks

Resolved review threads: tests/python/contrib/test_random.py, tests/python/relay/dyn/test_dynamic_op_level3.py
kparzysz-quic (Contributor) left a comment:

Please keep in mind that eventually the mapping target_name -> device will no longer be sufficient: there can be several devices with different characteristics that all fall under the same general target (e.g. gpu), and any individual test should be able to specify required device properties.
The use of strings to pass targets around may be phased out at some point, so please document somewhere the updates this scheme will need when that happens.
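One hypothetical shape for that transition, sketched purely for illustration (none of these names exist in TVM): a device descriptor that carries a general kind plus a property set, so a test can require "a gpu with fp16 support" instead of matching the string "cuda":

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DeviceSpec:
    """Illustrative descriptor: several concrete devices may share one general kind."""
    kind: str                                # general target family, e.g. "gpu" or "cpu"
    target: str                              # concrete target string, e.g. "cuda"
    properties: frozenset = field(default_factory=frozenset)

    def satisfies(self, kind, required=()):
        """True if this device is of `kind` and offers every required property."""
        return self.kind == kind and set(required) <= self.properties

# Two different devices that both fall under the general "gpu" kind:
dev_a = DeviceSpec("gpu", "cuda", frozenset({"fp16"}))
dev_b = DeviceSpec("gpu", "cuda", frozenset())

# A test asks for capabilities, not for a target string:
assert dev_a.satisfies("gpu", ["fp16"])
assert not dev_b.satisfies("gpu", ["fp16"])
```

Under a scheme like this, the string-keyed enable/disable logic in this patch would become a filter over descriptors rather than a membership test on target names.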

Resolved review threads: python/tvm/testing.py (2)
tkonolige (Contributor, Author):

@kparzysz-quic I agree that using strings for targets here is pretty awkward. Is there a better way to check which device a target requires? I see that #6315 makes targets more structured, but I don't see a way to take a target string and get the associated device.

kparzysz-quic (Contributor):

Currently there is no good way to switch away from str here, but the question is how much work it would take to allow, for example, two different 'cuda' devices to be available. I haven't analyzed the code well enough to tell, but if you could outline somewhere what changes would be necessary, it would help the future transition.

tkonolige (Contributor, Author):

@tqchen I think I've addressed all issues. Do I need to do something to have Jenkins use the new Jenkinsfile?

tqchen (Member) commented Aug 26, 2020:

@tkonolige Because CI uses the old Jenkinsfile (before this PR gets merged). Please send another PR first that adds a gpuonly_test redirecting to the normal integration test; then we can tweak the scripts once that PR gets merged.

tkonolige (Contributor, Author):

@tqchen I'm getting a couple of errors with CUDA initialization failing. I'm not really sure of the cause, but it seems like it might have to do with forking.

@tkonolige force-pushed the test_markers branch 10 times, most recently from a5abf4f to f18f71b (September 1, 2020 17:47)
@tkonolige force-pushed the test_markers branch 2 times, most recently from ab0b4bc to bd404c6 (September 2, 2020 00:01)
tkonolige (Contributor, Author):

@tqchen All tests are passing now... can we merge?

tqchen merged commit b235e59 into apache:master on Sep 2, 2020
tqchen (Member) commented Sep 2, 2020:

Thanks @tkonolige @zhiics @kparzysz-quic !

kevinthesun pushed a commit to kevinthesun/tvm that referenced this pull request Sep 17, 2020
kevinthesun pushed a commit to kevinthesun/tvm that referenced this pull request Sep 18, 2020
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request Sep 18, 2020
4 participants