-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Runtime] Extend Graph Runtime To Support Cuda Graph Launch #7616
Conversation
@zhuochenKIDD is this ready for review? Please modify the description if so; otherwise please mark this PR as a draft first. Thanks. |
@comaniac I've added test case, would you please help review, thanks. |
CMakeLists.txt
Outdated
if(USE_CUDA) | ||
if(USE_GRAPH_RUNTIME_CUGRAPH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes USE_GRAPH_RUNTIME_CUGRAPH silent when CUDA is OFF and may confuse users. We should have
if(USE_GRAPH_RUNTIME_CUGRAPH)
if(NOT USE_CUDA)
// error out saying please config with USE_CUDA=ON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved this to CUDA.cmake to better check CUDA version > 10, so it might like cudnn/cublas feature, is that ok?
except ValueError: | ||
raise ValueError( | ||
"Please set '(USE_GRAPH_RUNTIME_CUGRAPH ON)' in " | ||
"config.cmake and rebuild TVM to enable cu_graph test mode" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why test mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because we are currently evaluating CUDA graph API vs kernel launch, and it's keep on going, using TVM is more convenient to do so on new workloads than TF Runtime. And also currently only Kernel-kind cuda node is in captured CUDA graph, in might be more benefits when Memcpy-kind node or using manually created cuda graph, so currently I am not sure current stream-capture way is the optimal way, perhaps need more test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. We usually call it "experimental". I'll suggest the following:
To enable CuGraph (experimental), please set '(USE_GRAPH_RUNTIME_CUGRAPH ON)'
in config.cmake and rebuild TVM
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
This reverts commit f286711.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some miner changes but overall is good. Two additional points:
- I found two terms
cuGraph
andCUDA graph
are used in this PR. It would be better to just usecuGraph
. - It would be great if you could send a follow-up PR for a tutorial to explain how to use the two interfaces.
cmake/modules/CUDA.cmake
Outdated
if(CUDAToolkit_VERSION_MAJOR LESS "10") | ||
message(FATAL_ERROR "CUDA Graph requires at least CUDA 10, got=" ${CUDAToolkit_VERSION}) | ||
endif() | ||
message(STATUS "Build with Graph runtime cuGraph support...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be better to have one terminology in this PR. Either cuGraph
or CUDA graph
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I removed all cuGraph or cu_graph name, use CUDA Graph instread
python/tvm/contrib/nvcc.py
Outdated
return False | ||
return True | ||
except RuntimeError: | ||
warnings.warn("Cannot find cuda path") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning has no information and can consider to remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
Co-authored-by: Cody Yu <comaniac0422@gmail.com>
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I'm going to merge this first and the doc could be the next PR.
For the doc location, it's reasonable to put it under TVM runtime along with debugger (.rst), but to me this is a feature not limited to developers, so it would be more impactful if we put it under tutorial (.py). @tqchen @hogepodge could you please advice?
Thanks @zhuochenKIDD |
a tutorial/howto guide would be nice |
Agree with Tianqi. A how-to guide would be best. You can write it as a Sphinx-Gallery document, under the tvm/tutorials directory. I'm not entirely certain which subdirectory it should go under (you should avoid the get_started directory). Maybe a new directory if it doesn't fit into classifications for the others. |
) * add graph runtime cuGraph poc * lint format * add unittest * fix review comments * Update CMakeLists.txt Co-authored-by: Cody Yu <comaniac0422@gmail.com> * build cuda graph runtime in gpu test * Revert "build cuda graph runtime in gpu test" This reverts commit f286711. * rename cuGraph to CUDA Graph * rename cuda_graph * rename cuda_graph * lint format * Update src/runtime/graph/graph_runtime_factory.cc Co-authored-by: Cody Yu <comaniac0422@gmail.com> * Update python/tvm/testing.py Co-authored-by: Cody Yu <comaniac0422@gmail.com> * fix lint error * remove unnecessary warn * add test, fix lint * fix lint W0223 Co-authored-by: Cody Yu <comaniac0422@gmail.com>
) * add graph runtime cuGraph poc * lint format * add unittest * fix review comments * Update CMakeLists.txt Co-authored-by: Cody Yu <comaniac0422@gmail.com> * build cuda graph runtime in gpu test * Revert "build cuda graph runtime in gpu test" This reverts commit f286711. * rename cuGraph to CUDA Graph * rename cuda_graph * rename cuda_graph * lint format * Update src/runtime/graph/graph_runtime_factory.cc Co-authored-by: Cody Yu <comaniac0422@gmail.com> * Update python/tvm/testing.py Co-authored-by: Cody Yu <comaniac0422@gmail.com> * fix lint error * remove unnecessary warn * add test, fix lint * fix lint W0223 Co-authored-by: Cody Yu <comaniac0422@gmail.com>
We are currently using graph runtime to run some CTR models on NV-GPU, for our in-house model (around 100 nodes in tvm json graph ) cuGraphLaunch can reduce 5% to 10% percent latency vs the original for-loop cuda kernel launch.
So I wonder if the extension might benefits other workloads, I haven't test other types of models.