[Runtime] Extend Graph Runtime To Support Cuda Graph Launch #7616

Merged
merged 19 commits into from
Mar 17, 2021

Contributor

@zhuochenKIDD zhuochenKIDD commented Mar 9, 2021

We are currently using the graph runtime to run some CTR models on NVIDIA GPUs. For our in-house model (around 100 nodes in the TVM JSON graph), cuGraphLaunch reduces latency by 5% to 10% compared with the original for-loop CUDA kernel launch.

So I wonder whether the extension might benefit other workloads; I haven't tested other types of models.
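
As a rough illustration of why replaying a captured graph can cut launch overhead, here is a toy pure-Python model. It is only a sketch: `ToyStream`, `run_per_kernel`, and `capture` are hypothetical names, no GPU is involved, and the class merely counts host-side launch calls. In real CUDA the corresponding steps would be `cudaStreamBeginCapture`, `cudaStreamEndCapture`, `cudaGraphInstantiate`, and `cudaGraphLaunch`.

```python
from typing import Callable, List

Kernel = Callable[[], None]


class ToyStream:
    """Hypothetical stand-in for a CUDA stream; it only counts host-side launches."""

    def __init__(self) -> None:
        self.launch_calls = 0
        self._capturing = False
        self._captured: List[Kernel] = []

    def launch(self, kernel: Kernel) -> None:
        if self._capturing:
            # While capturing, work is recorded instead of executed.
            self._captured.append(kernel)
            return
        self.launch_calls += 1
        kernel()

    def begin_capture(self) -> None:
        self._capturing = True
        self._captured = []

    def end_capture(self) -> List[Kernel]:
        self._capturing = False
        return list(self._captured)

    def launch_graph(self, graph: List[Kernel]) -> None:
        # A single host-side call replays every recorded kernel.
        self.launch_calls += 1
        for kernel in graph:
            kernel()


def run_per_kernel(stream: ToyStream, kernels: List[Kernel]) -> None:
    """The original approach: one launch call per graph node."""
    for kernel in kernels:
        stream.launch(kernel)


def capture(stream: ToyStream, kernels: List[Kernel]) -> List[Kernel]:
    """Record the same for-loop once so it can be replayed later."""
    stream.begin_capture()
    run_per_kernel(stream, kernels)
    return stream.end_capture()
```

With 100 toy kernels, `run_per_kernel` issues 100 launch calls per inference, while a graph captured once is replayed with a single `launch_graph` call. Whether that translates into a real latency gain depends on the workload.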

@comaniac
Contributor

comaniac commented Mar 9, 2021

@zhuochenKIDD is this ready for review? Please modify the description if so; otherwise please mark this PR as a draft first. Thanks.

@zhuochenKIDD
Contributor Author

@comaniac I've added a test case. Would you please help review? Thanks.

CMakeLists.txt Outdated
Comment on lines 326 to 327
if(USE_CUDA)
  if(USE_GRAPH_RUNTIME_CUGRAPH)
Contributor

This makes USE_GRAPH_RUNTIME_CUGRAPH silent when CUDA is OFF and may confuse users. We should have

if(USE_GRAPH_RUNTIME_CUGRAPH)
  if(NOT USE_CUDA)
    # error out saying please config with USE_CUDA=ON.

Contributor Author

I moved this to CUDA.cmake so it can check that the CUDA version is at least 10, which makes it behave like the cuDNN/cuBLAS features. Is that OK?
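
The two checks discussed in this thread (the option must not be silently ignored when CUDA is off, and the toolkit must be at least version 10) can be sketched in Python as follows. This is an illustration only: the function name `check_cuda_graph_build` and its parameters are hypothetical, and the actual checks in the PR live in CMake.

```python
def check_cuda_graph_build(use_cuda_graph: bool, use_cuda: bool, cuda_major: int) -> bool:
    """Return True when CUDA Graph support should be built.

    Hypothetical mirror of the CMake logic: fail loudly instead of
    silently ignoring USE_GRAPH_RUNTIME_CUGRAPH.
    """
    if not use_cuda_graph:
        # Feature not requested, nothing to validate.
        return False
    if not use_cuda:
        raise ValueError("USE_GRAPH_RUNTIME_CUGRAPH requires USE_CUDA=ON")
    if cuda_major < 10:
        raise ValueError(f"CUDA Graph requires at least CUDA 10, got={cuda_major}")
    return True
```

Raising in both failure cases means a misconfigured build stops at configure time rather than producing a runtime without the requested feature.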

python/tvm/contrib/cu_graph/cugraph_runtime.py Outdated
except ValueError:
    raise ValueError(
        "Please set '(USE_GRAPH_RUNTIME_CUGRAPH ON)' in "
        "config.cmake and rebuild TVM to enable cu_graph test mode"
Contributor

Why test mode?

Contributor Author

It is because we are still evaluating the CUDA graph API against plain kernel launch, and that evaluation is ongoing; TVM makes it more convenient than TF Runtime to do so on new workloads. Also, the captured CUDA graph currently contains only kernel-kind nodes. There might be more benefit from memcpy-kind nodes or from a manually created CUDA graph, so I am not sure the current stream-capture approach is optimal; it probably needs more testing.

Contributor

I see. We usually call it "experimental". I'll suggest the following:

To enable CuGraph (experimental), please set '(USE_GRAPH_RUNTIME_CUGRAPH ON)'
in config.cmake and rebuild TVM

Contributor

@comaniac comaniac left a comment

Just some minor changes, but overall it looks good. Two additional points:

  1. I found two terms cuGraph and CUDA graph are used in this PR. It would be better to just use cuGraph.
  2. It would be great if you could send a follow-up PR for a tutorial to explain how to use the two interfaces.

cmake/modules/CUDA.cmake Outdated
if(CUDAToolkit_VERSION_MAJOR LESS "10")
  message(FATAL_ERROR "CUDA Graph requires at least CUDA 10, got=" ${CUDAToolkit_VERSION})
endif()
message(STATUS "Build with Graph runtime cuGraph support...")
Contributor

It would be better to have one terminology in this PR. Either cuGraph or CUDA graph.

Contributor Author

Yes, I removed all cuGraph and cu_graph names and use CUDA Graph instead.

        return False
    return True
except RuntimeError:
    warnings.warn("Cannot find cuda path")
Contributor

This warning carries no useful information; consider removing it.

Contributor Author

removed

@zhuochenKIDD
Contributor Author

> Just some minor changes but overall is good. Two additional points:
>
>   1. I found two terms cuGraph and CUDA graph are used in this PR. It would be better to just use cuGraph.
>   2. It would be great if you could send a follow-up PR for a tutorial to explain how to use the two interfaces.

  1. I removed cuGraph and changed the code to use CUDA graph, because that is NVIDIA's official terminology and I found that cuGraph is a separate library for graph algorithms.
  2. I will add more docs when ready. By tutorial, do you mean I should add a .py file in tutorials/frontend or an .rst file in docs/dev?

Contributor

@comaniac comaniac left a comment

LGTM. I'm going to merge this first and the doc could be the next PR.
For the doc location, it's reasonable to put it under TVM runtime along with the debugger (.rst), but to me this feature is not limited to developers, so it would be more impactful to put it under tutorials (.py). @tqchen @hogepodge could you please advise?

@comaniac comaniac merged commit 60ff0c7 into apache:main Mar 17, 2021
@comaniac
Contributor

Thanks @zhuochenKIDD

@tqchen
Member

tqchen commented Mar 17, 2021

a tutorial/howto guide would be nice

@hogepodge
Contributor

Agree with Tianqi. A how-to guide would be best. You can write it as a Sphinx-Gallery document, under the tvm/tutorials directory. I'm not entirely certain which subdirectory it should go under (you should avoid the get_started directory). Maybe a new directory if it doesn't fit into classifications for the others.

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request May 6, 2021
)

* add graph runtime cuGraph poc

* lint format

* add unittest

* fix review comments

* Update CMakeLists.txt

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* build cuda graph runtime in gpu test

* Revert "build cuda graph runtime in gpu test"

This reverts commit f286711.

* rename cuGraph to CUDA Graph

* rename cuda_graph

* rename cuda_graph

* lint format

* Update src/runtime/graph/graph_runtime_factory.cc

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* Update python/tvm/testing.py

Co-authored-by: Cody Yu <comaniac0422@gmail.com>

* fix lint error

* remove unnecessary warn

* add test, fix lint

* fix lint W0223

Co-authored-by: Cody Yu <comaniac0422@gmail.com>
trevor-m pushed a commit to neo-ai/tvm that referenced this pull request May 11, 2021
4 participants