Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ONNX] Enable GPU in ONNX importer tests #7438

Merged
merged 17 commits into from
Mar 30, 2021

Conversation

electriclilies
Copy link
Contributor

In the onnx test, we were setting target and ctx to llvm and cpu every time we ran the TVM version of the op, regardless of the target and ctx actually passed in. The result is that the accuracy of TVM GPU outputs were never checked against ONNX.

The purpose of this PR is to see which of the tests in tests/python/frontend/onnx/test_forward.py break after enabling GPU.

@mbrookhart
Copy link
Contributor

Definitely a needed improvement, we'll see what it uncovers.

@masahi
Copy link
Member

masahi commented Feb 11, 2021

oops indeed there are some problems with gpu test

@@ -31,6 +31,9 @@ find . -type f -path "*.pyc" | xargs rm -f
# Rebuild cython
make cython3

# Only run GPU enabled tests on GPU
export PYTEST_ADDOPTS="-m gpu $PYTEST_ADDOPTS"
Copy link
Member

@masahi masahi Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure we want this? I think this is skipping a lot of tests that were previously running.

It seems there are many tests that don't use @tvm.testing.uses_gpu.

def test_detection_models():

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably we should use targets option in verify_with_ort_with_inputs for now

See for example

verify_with_ort_with_inputs(
model, [indata], targets=["llvm"], dtype="int64", use_vm=True, opset=9
)
(we should test this function on GPU, btw)

Copy link
Contributor Author

@electriclilies electriclilies Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, all the tests are running on GPU, which causes the argmin / argmax onnx test to fail. The ONNX test file uses @tvm.testing.uses_gpu to indicate whether we should run tests on GPU or not, but right now those decorators don't actually do anything, which is definitely not good. I think we should try to move towards enabling the @tvm.testing.uses_gpu in the long run.

What if I set PYTEST_ADDOPTS before the onnx tests, and reset it after? Since the file uses the @tvm.testing.uses_gpu a lot, I think we should either remove the decorators or let them actually do something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, I think that hardcoding targets is bad because it's not easy to see -- that's what caused none of the ONNX tests to be run on GPU in the first place.

Copy link
Member

@masahi masahi Mar 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, surprised to hear that use_gpu doesn't do anything. Let's make it

PYTEST_ADDOPTS="-m gpu $PYTEST_ADDOPTS" run_pytest cython python-frontend-onnx tests/python/frontend/onnx

This should only modify env vars when running the onnx test.

Also, can you go through TODO(mbrookhart) in the onnx frontend test and add use_gpus? There are about 10-15 of them.

# TODO(mbrookhart): enable once VM supports heterogenous execution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh! I thought I had removed all of those comments, but I must have only done it for the dynamic test files and missed the onnx file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can open a separate PR to remove these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already did it!

@masahi
Copy link
Member

masahi commented Mar 22, 2021

@electriclilies Can you take a look at the CI issue?

@electriclilies
Copy link
Contributor Author

electriclilies commented Mar 22, 2021

@masahi I forgot to disable the dynamic version of the batch matmul test on cuda, it should be fixed now.

@masahi
Copy link
Member

masahi commented Mar 27, 2021

@electriclilies Please try kick again, the flaky issue should be fixed now

@electriclilies
Copy link
Contributor Author

@masahi It's green! Can you merge? :)

@masahi masahi merged commit 1c2555a into apache:main Mar 30, 2021
@masahi
Copy link
Member

masahi commented Mar 30, 2021

Thanks @electriclilies @mbrookhart this is merged!

@electriclilies electriclilies deleted the onnx_gpu_enabled branch March 30, 2021 18:58
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
* remove hardcoded target and ctx

* fix c-codgen for floating point mod

* MDisable onnx gpu test for argmin / argmax so we can get this fix merged. Matt or myself will fix later but we don't have time right now.

* lint

* fix black

* Add flag to task_python_frontend.sh to only run GPU enabled tests on GPU

* black again

* Enable GPU for test_nonzero

* Respond to comments

* Don't test batch matmul on CUDA

* Turn cuda off for dynamic batch matmul test

* Fix task script

* Flaky test

* another flaky test

Co-authored-by: mbrookhart <mbrookhart@octoml.ai>
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
* remove hardcoded target and ctx

* fix c-codgen for floating point mod

* MDisable onnx gpu test for argmin / argmax so we can get this fix merged. Matt or myself will fix later but we don't have time right now.

* lint

* fix black

* Add flag to task_python_frontend.sh to only run GPU enabled tests on GPU

* black again

* Enable GPU for test_nonzero

* Respond to comments

* Don't test batch matmul on CUDA

* Turn cuda off for dynamic batch matmul test

* Fix task script

* Flaky test

* another flaky test

Co-authored-by: mbrookhart <mbrookhart@octoml.ai>
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
* remove hardcoded target and ctx

* fix c-codgen for floating point mod

* MDisable onnx gpu test for argmin / argmax so we can get this fix merged. Matt or myself will fix later but we don't have time right now.

* lint

* fix black

* Add flag to task_python_frontend.sh to only run GPU enabled tests on GPU

* black again

* Enable GPU for test_nonzero

* Respond to comments

* Don't test batch matmul on CUDA

* Turn cuda off for dynamic batch matmul test

* Fix task script

* Flaky test

* another flaky test

Co-authored-by: mbrookhart <mbrookhart@octoml.ai>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
* remove hardcoded target and ctx

* fix c-codgen for floating point mod

* MDisable onnx gpu test for argmin / argmax so we can get this fix merged. Matt or myself will fix later but we don't have time right now.

* lint

* fix black

* Add flag to task_python_frontend.sh to only run GPU enabled tests on GPU

* black again

* Enable GPU for test_nonzero

* Respond to comments

* Don't test batch matmul on CUDA

* Turn cuda off for dynamic batch matmul test

* Fix task script

* Flaky test

* another flaky test

Co-authored-by: mbrookhart <mbrookhart@octoml.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants