
[RUNTIME][CLML] OpenCLML tuning and profiling enhanced #13843

Merged · 3 commits merged into apache:main from clml_tuning on Jan 30, 2023

Conversation

srkreddy1238
Contributor

srkreddy1238 commented on Jan 25, 2023

The tuning cache binary is serialized through dmlc::Stream to support multiple CLML subgraphs within a TVM module. The individual tuning cache blobs are saved to the same output file.
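
For readers unfamiliar with dmlc::Stream, here is a minimal sketch of the idea, not the code in this PR; `SaveTuningBlobs` and the on-disk layout are hypothetical:

```cpp
// Illustrative only: write several named tuning blobs into one file via dmlc::Stream.
#include <dmlc/io.h>

#include <cstdint>
#include <map>
#include <memory>
#include <string>

void SaveTuningBlobs(const std::string& path,
                     const std::map<std::string, std::string>& blobs) {
  std::unique_ptr<dmlc::Stream> strm(dmlc::Stream::Create(path.c_str(), "w"));
  uint64_t count = blobs.size();
  strm->Write(count);        // number of CLML subgraphs in this module
  for (const auto& kv : blobs) {
    strm->Write(kv.first);   // subgraph symbol name
    strm->Write(kv.second);  // raw CLML tuning cache blob for that subgraph
  }
}
```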

A new API on OpenCLWorkspace enables or disables profiling on the command queue, rather than doing so only when the Timer is invoked. This is required to perform CLML operator tuning.
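
A rough sketch of the idea using standard OpenCL calls; the actual OpenCLWorkspace method name and signature in this PR may differ:

```cpp
// Illustrative only: (re)create a command queue with profiling turned on or off.
#include <CL/cl.h>

cl_command_queue RecreateQueue(cl_context ctx, cl_device_id dev,
                               cl_command_queue old_queue, bool enable_profiling) {
  if (old_queue != nullptr) {
    clFinish(old_queue);               // drain pending work before replacing the queue
    clReleaseCommandQueue(old_queue);
  }
  cl_command_queue_properties props = enable_profiling ? CL_QUEUE_PROFILING_ENABLE : 0;
  cl_int err = CL_SUCCESS;
  cl_command_queue queue = clCreateCommandQueue(ctx, dev, props, &err);
  return (err == CL_SUCCESS) ? queue : nullptr;
}
```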

CLML layer profiling now uses the OpenCL Timer interface.
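
As a sketch of how a layer can be timed through TVM's Timer interface (tvm/runtime/profiling.h); `ProfileLayerNs` and the call site are hypothetical, the exact integration in the CLML runtime may differ:

```cpp
// Illustrative only: time one layer with TVM's backend-specific Timer.
#include <tvm/runtime/profiling.h>

#include <cstdint>
#include <functional>

int64_t ProfileLayerNs(tvm::Device dev, const std::function<void()>& run_layer) {
  tvm::runtime::Timer timer = tvm::runtime::Timer::Start(dev);  // OpenCL event timing on OpenCL devices
  run_layer();                                                  // enqueue the CLML layer
  timer->Stop();
  return timer->SyncAndGetElapsedNanos();                       // sync and read elapsed nanoseconds
}
```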

This PR also avoids offloading the pad operator as the very first layer (specifically, before at least one convolution layer), due to a layout-related limitation of the CLML pad operator. Please refer to the CLML SDK documentation for more details.

Co-authored-by: Krishna Raju Vegiraju <quic_kvegiraju@quicinc.com>

@tvm-bot
Collaborator

tvm-bot commented Jan 25, 2023

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot

srkreddy1238 force-pushed the clml_tuning branch 3 times, most recently from 199755d to 4f672d5 on January 26, 2023
Contributor

echuraev left a comment


Several comments

srkreddy1238 and others added 2 commits January 27, 2023 12:24
Co-authored-by: Egor Churaev <egor.churaev@gmail.com>
Contributor

echuraev left a comment


LGTM. Thanks

echuraev merged commit 3c81d9b into apache:main on Jan 30, 2023
fzi-peccia pushed a commit to fzi-peccia/tvm that referenced this pull request Mar 27, 2023
* [RUNTIME][CLML] OpenCLML tuning and profiling enhanced

* Update src/runtime/opencl/opencl_common.h

* Review comments

Co-authored-by: Egor Churaev <egor.churaev@gmail.com>