[RUNTIME][CLML] OpenCLML tuning and profiling enhanced #13843
Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment. Generated by tvm-bot
The tuning cache binary is now serialized through DMLC::Stream to support multiple CLML subgraphs within a TVM module. The individual tuning cache blobs are saved to the same output file. A new API on OpenCLWorkspace enables or disables profiling on the command queue, rather than doing this only when the Timer is invoked. This is required to perform CLML operator tuning. CLML layer profiling now uses the OpenCL Timer interface. This PR also avoids offloading the pad operator at the very first layer (specifically, before at least one convolution layer), because the CLML pad operator has a layout-related limitation. Please refer to the CLML SDK documentation for more details.
Several comments
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
LGTM. Thanks
* [RUNTIME][CLML] OpenCLML tuning and profiling enhanced
* Update src/runtime/opencl/opencl_common.h
* review comments

Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
The tuning cache binary is now serialized through DMLC::Stream to support multiple CLML subgraphs within a TVM module. The individual tuning cache blobs are saved to the same output file.
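As a rough illustration of storing several tuning cache blobs in one file, here is a minimal sketch that writes each blob as a length-prefixed record and reads them back in order. It uses standard C++ streams as a stand-in for DMLC::Stream, and the `TuningBlob` struct, the helper names, and the choice of a subgraph name as the key are all assumptions made for this example, not the PR's actual code or on-disk format.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record: one tuning cache blob per CLML subgraph.
struct TuningBlob {
  std::string subgraph_name;  // assumed key identifying the subgraph
  std::string cache;          // opaque tuning cache bytes
};

// Write a byte string as a 64-bit length followed by the raw bytes.
void WriteString(std::ostream& os, const std::string& s) {
  const std::uint64_t n = s.size();
  os.write(reinterpret_cast<const char*>(&n), sizeof(n));
  os.write(s.data(), static_cast<std::streamsize>(n));
}

// Read one length-prefixed byte string; returns false when the stream ends.
bool ReadString(std::istream& is, std::string* out) {
  std::uint64_t n = 0;
  if (!is.read(reinterpret_cast<char*>(&n), sizeof(n))) return false;
  out->resize(n);
  return static_cast<bool>(is.read(&(*out)[0], static_cast<std::streamsize>(n)));
}

// Append one subgraph's blob to the shared output stream.
void AppendBlob(std::ostream& os, const TuningBlob& b) {
  WriteString(os, b.subgraph_name);
  WriteString(os, b.cache);
}

// Load every blob that was appended to the stream, in write order.
std::vector<TuningBlob> LoadBlobs(std::istream& is) {
  std::vector<TuningBlob> blobs;
  TuningBlob b;
  while (ReadString(is, &b.subgraph_name) && ReadString(is, &b.cache)) {
    blobs.push_back(b);
  }
  return blobs;
}
```

Because each record carries its own length, a reader can locate the blob for a given subgraph without knowing how many subgraphs the module contains.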
A new API on OpenCLWorkspace enables or disables profiling on the command queue, rather than doing this only when the Timer is invoked. This is required to perform CLML operator tuning.
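In OpenCL, the profiling flag (`CL_QUEUE_PROFILING_ENABLE`) is a property fixed when a command queue is created, so toggling it generally means recreating the queue. The sketch below models that pattern with mock types so it runs without an OpenCL SDK. `FakeQueue`, `FakeWorkspace`, and `SetQueueProfiling` are hypothetical names for illustration only and are not the API added by this PR.

```cpp
#include <memory>

// Stand-in for a cl_command_queue; the profiling bit is fixed at creation,
// mirroring how CL_QUEUE_PROFILING_ENABLE behaves in real OpenCL.
struct FakeQueue {
  explicit FakeQueue(bool profiling) : profiling_enabled(profiling) {}
  const bool profiling_enabled;
};

// Hypothetical workspace that owns a single command queue.
class FakeWorkspace {
 public:
  FakeWorkspace() : queue_(new FakeQueue(false)) {}

  // Recreate the queue only when the requested state differs from the current one.
  void SetQueueProfiling(bool enable) {
    if (queue_->profiling_enabled == enable) return;
    // Real code would first drain and release the old queue
    // (clFinish, clReleaseCommandQueue) and then call clCreateCommandQueue.
    queue_.reset(new FakeQueue(enable));
    ++recreations_;
  }

  bool IsProfiling() const { return queue_->profiling_enabled; }
  int recreations() const { return recreations_; }

 private:
  std::unique_ptr<FakeQueue> queue_;
  int recreations_ = 0;
};
```

Making the toggle explicit lets a tuning pass keep profiling enabled across many kernel launches instead of switching it per timer invocation.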
CLML layer profiling now uses the OpenCL Timer interface.
This PR also avoids offloading the pad operator at the very first layer (specifically, before at least one convolution layer), because the CLML pad operator has a layout-related limitation. Please refer to the CLML SDK documentation for more details.
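The pad rule can be pictured as a simple predicate over the operators that precede the pad. This is a simplified sketch over a linearized operator list; the function name and the use of operator-name strings such as `nn.pad` and `nn.conv2d` are assumptions for illustration, and the actual check in TVM works on the graph rather than a flat list.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns true if the pad at `pad_index` may be offloaded to CLML, i.e. at
// least one convolution appears earlier in the (assumed linear) operator list.
bool CanOffloadPad(const std::vector<std::string>& ops, std::size_t pad_index) {
  if (pad_index >= ops.size() || ops[pad_index] != "nn.pad") return false;
  for (std::size_t i = 0; i < pad_index; ++i) {
    if (ops[i] == "nn.conv2d") return true;
  }
  return false;
}
```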
Co-Authored-By: Krishna Raju Vegiraju <quic_kvegiraju@quicinc.com>