[ONNX] Enable GPU in ONNX importer tests #7438
Conversation
Definitely a needed improvement, we'll see what it uncovers.
Oops, indeed there are some problems with the GPU tests.
[WIP] Fixes for GPU ONNX tests
Disable onnx gpu test for argmin / argmax so we can get this fix merged. Matt or I will fix later, but we don't have time right now.
```diff
@@ -31,6 +31,9 @@ find . -type f -path "*.pyc" | xargs rm -f
 # Rebuild cython
 make cython3
 
+# Only run GPU enabled tests on GPU
+export PYTEST_ADDOPTS="-m gpu $PYTEST_ADDOPTS"
```
Are we sure we want this? I think this is skipping a lot of tests that were previously running.
It seems there are many tests that don't use @tvm.testing.uses_gpu.
tvm/tests/python/frontend/pytorch/test_object_detection.py, lines 93 to 94 in 96b0981:

```python
def test_detection_models():
```
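To illustrate the concern: pytest's `-m` option filters by marker, so any test without the relevant mark is silently deselected. Below is a minimal sketch using a plain `pytest.mark.gpu` marker as a stand-in (`tvm.testing.uses_gpu` layers more on top of this, so treat it as an illustration, not the exact mechanics):

```python
import pytest

@pytest.mark.gpu  # stand-in for the mark applied by @tvm.testing.uses_gpu
def test_marked_for_gpu():
    assert True

def test_unmarked():
    # With PYTEST_ADDOPTS="-m gpu", this test is deselected entirely,
    # even though it used to run on CPU before the change above.
    assert True
```

Running `pytest -m gpu` against this file collects only `test_marked_for_gpu`; the unmarked test is skipped, which is exactly the silent-skip behavior being pointed out.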
Probably we should use the targets option in verify_with_ort_with_inputs for now:

```python
targets=None,
```
See for example tvm/tests/python/frontend/onnx/test_forward.py, lines 3421 to 3423 in 3beec22:

```python
verify_with_ort_with_inputs(
    model, [indata], targets=["llvm"], dtype="int64", use_vm=True, opset=9
)
```
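For context, here is a simplified, hypothetical sketch of how a targets parameter can gate which backends a verification helper exercises. All names below are stand-ins; the real verify_with_ort_with_inputs in test_forward.py does considerably more:

```python
def compile_and_run(model, inputs, target):
    # Stub standing in for build-and-execute logic; a real helper would
    # compile `model` for `target` and compare outputs against ONNX Runtime.
    print(f"running {model!r} on {target}")

def verify_with_targets(model, inputs, targets=None):
    # None means "every enabled target"; an explicit list such as
    # ["llvm"] pins the test to CPU only.
    if targets is None:
        targets = ["llvm", "cuda"]  # assumed stand-in for the enabled-target list
    for target in targets:
        compile_and_run(model, inputs, target)

# Pinning a known-bad-on-GPU case to CPU, mirroring the call above:
verify_with_targets("argmin_model", [[1, 2, 3]], targets=["llvm"])
```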
Right now, all the tests are running on GPU, which causes the argmin / argmax onnx test to fail. The ONNX test file uses @tvm.testing.uses_gpu to indicate whether a test should run on GPU, but right now those decorators don't actually do anything, which is definitely not good. I think we should move towards enabling @tvm.testing.uses_gpu in the long run.
What if I set PYTEST_ADDOPTS before the onnx tests and reset it after? Since the file uses @tvm.testing.uses_gpu a lot, I think we should either remove the decorators or make them actually do something.
Overall, I think that hardcoding targets is bad because it's hard to spot; that's what caused none of the ONNX tests to run on GPU in the first place.
OK, I'm surprised to hear that uses_gpu doesn't do anything. Let's make it:

```sh
PYTEST_ADDOPTS="-m gpu $PYTEST_ADDOPTS" run_pytest cython python-frontend-onnx tests/python/frontend/onnx
```

This should only modify the environment variable while running the onnx tests.
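A minimal sketch of why the one-line form keeps the change scoped, illustrated with Python's subprocess module (the marker and test path follow this PR; everything else is a stand-in for what run_pytest does):

```python
import os
import subprocess

# Copy the parent environment and prepend the marker filter; this mirrors
# the shell idiom `VAR=value command`, where only the child process sees
# the modified variable.
env = dict(os.environ)
env["PYTEST_ADDOPTS"] = ("-m gpu " + env.get("PYTEST_ADDOPTS", "")).strip()

# The parent's PYTEST_ADDOPTS is untouched once this call returns.
subprocess.run(["pytest", "tests/python/frontend/onnx"], env=env, check=True)
```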
Also, can you go through the TODO(mbrookhart) comments in the onnx frontend test and add uses_gpu? There are about 10-15 of them.
```python
# TODO(mbrookhart): enable once VM supports heterogenous execution
```
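For reference, adding the decorator to one of those tests might look like the sketch below; the test body is a stand-in, and pairing the decorator with tvm.testing.enabled_targets is the usual pattern for iterating only over targets enabled in the current environment:

```python
import tvm.testing

@tvm.testing.uses_gpu  # lets `pytest -m gpu` select this test
def test_argmax():
    # Hypothetical body: enabled_targets() yields e.g. ("llvm", cpu_ctx)
    # always, and ("cuda", gpu_ctx) only when a GPU is actually present.
    for target, ctx in tvm.testing.enabled_targets():
        pass  # build the ONNX model and compare against onnxruntime here
```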
Doh! I thought I had removed all of those comments, but I must have only done it for the dynamic test files and missed the onnx file.
I can open a separate PR to remove these.
I already did it!
@electriclilies Can you take a look at the CI issue?
@masahi I forgot to disable the dynamic version of the batch matmul test on cuda; it should be fixed now.
@electriclilies Please kick CI again, the flaky issue should be fixed now.
@masahi It's green! Can you merge? :)
Thanks @electriclilies @mbrookhart, this is merged!
* Remove hardcoded target and ctx
* Fix c-codegen for floating point mod
* Disable onnx gpu test for argmin / argmax so we can get this fix merged. Matt or I will fix later, but we don't have time right now.
* Lint
* Fix black
* Add flag to task_python_frontend.sh to only run GPU enabled tests on GPU
* Black again
* Enable GPU for test_nonzero
* Respond to comments
* Don't test batch matmul on CUDA
* Turn cuda off for dynamic batch matmul test
* Fix task script
* Flaky test
* Another flaky test

Co-authored-by: mbrookhart <mbrookhart@octoml.ai>
In the onnx test, we were setting target and ctx to llvm and cpu every time we ran the TVM version of an op, regardless of the target and ctx actually passed in. As a result, the accuracy of TVM's GPU outputs was never checked against ONNX.
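A hypothetical sketch of that bug pattern, with stand-in names (the actual helper lives in test_forward.py):

```python
import tvm

def run(model, inputs, target, ctx):
    # Stub standing in for build-and-execute; a real version would compile
    # `model` for `target` and run it on `ctx`.
    print(f"running on {target}")

def get_tvm_output(model, inputs, target, ctx):
    # Before this PR (sketch): the caller's choice was silently
    # overwritten, pinning every comparison against ONNX to CPU.
    target, ctx = "llvm", tvm.cpu(0)  # bug: hardcoded
    return run(model, inputs, target, ctx)

def get_tvm_output_fixed(model, inputs, target, ctx):
    # After this PR (sketch): honor whatever target/ctx the caller passed,
    # so GPU results are actually checked against ONNX Runtime.
    return run(model, inputs, target, ctx)
```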
The purpose of this PR is to see which of the tests in tests/python/frontend/onnx/test_forward.py break after enabling GPU.