[Texture support][Part 0] Device API and runtime support #7711
Thanks @csullivan, some quick comments; I will read more carefully in the coming week.
@tqchen, @ZihengJiang, would you kindly consider reviewing once more? The main change is to remove texture-specific device APIs and rely on […]. I also introduced an OpenCL buffer descriptor that tracks the allocation layout. With the layout and the DLTensor CopyDataFromTo overload, I've verified that a sub-texture allocation can be correctly copied out of a 2D texture pool of larger extent. This solves an issue I raised in Part 4. I appreciate any additional feedback you have.
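For readers following along, here is a minimal sketch of what a layout-tracking buffer descriptor could look like. It is illustrative only and is not the code in this PR: the `sketch` namespace, the `BUFFER_1D` tag, the `void*` handle, and the `CopyPath` helper are assumptions. Only the two `IMAGE_2D_*` tag names appear in the diff.

```cpp
#include <string>

namespace sketch {

// Layout tags. IMAGE_2D_ACTIVATION and IMAGE_2D_WEIGHT mirror the names seen
// in the diff; BUFFER_1D is an assumed tag for ordinary linear buffers.
enum class MemoryLayout { BUFFER_1D, IMAGE_2D_ACTIVATION, IMAGE_2D_WEIGHT };

// Hypothetical descriptor: pairs an opaque device handle with its layout so
// that copy routines can choose the right path.
struct BufferDescriptor {
  void* buffer = nullptr;  // stand-in for the underlying device handle
  MemoryLayout layout = MemoryLayout::BUFFER_1D;
};

// Report which copy path a descriptor would select (hypothetical helper).
inline std::string CopyPath(const BufferDescriptor& desc) {
  switch (desc.layout) {
    case MemoryLayout::IMAGE_2D_ACTIVATION:
    case MemoryLayout::IMAGE_2D_WEIGHT:
      return "image";
    case MemoryLayout::BUFFER_1D:
      return "buffer";
  }
  return "unknown";
}

}  // namespace sketch
```

Keeping the layout next to the handle means a copy routine can dispatch on the descriptor alone, without a separate texture-specific entry point.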
LGTM. @tqchen, could you check whether the PR looks good to you?
case cl::BufferDescriptor::MemoryLayout::IMAGE_2D_ACTIVATION:
case cl::BufferDescriptor::MemoryLayout::IMAGE_2D_WEIGHT:
  auto image_info = GetImageInfo(from_desc, from);
  // TODO(csullivan): Support calculating row_pitch correctly in the case of reuse.
It would be great to add a few test cases in Python that demonstrate copying into an image whose size is bigger than the normal one. Perhaps the easiest way is to construct an NDArray and then write a PackedFunc that takes a smaller view from it.
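To make the scenario concrete, the sketch below models the host-side arithmetic of copying a smaller view out of a larger 2D allocation. It is a plain C++ simulation under stated assumptions, not OpenCL or TVM code: `Image2D` and `CopySubImageToHost` are hypothetical names, texels are assumed to hold four floats, and the view is assumed to start at the top-left corner of the pool.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a 2D image allocation (not a TVM or OpenCL type).
struct Image2D {
  std::size_t width;        // texels per row in the backing allocation
  std::size_t height;       // rows in the backing allocation
  std::vector<float> data;  // 4 floats (RGBA) per texel, stored row-major

  Image2D(std::size_t w, std::size_t h) : width(w), height(h), data(w * h * 4, 0.0f) {}

  // Number of floats between the starts of two consecutive rows.
  std::size_t row_pitch() const { return width * 4; }
};

// Copy the top-left (sub_width x sub_height) region of `pool` into a densely
// packed host buffer. Source rows are strided by the pool's row pitch, while
// destination rows are strided by the smaller view width.
std::vector<float> CopySubImageToHost(const Image2D& pool, std::size_t sub_width,
                                      std::size_t sub_height) {
  std::vector<float> host(sub_width * sub_height * 4);
  for (std::size_t row = 0; row < sub_height; ++row) {
    const float* src = pool.data.data() + row * pool.row_pitch();
    float* dst = host.data() + row * sub_width * 4;
    std::memcpy(dst, src, sub_width * 4 * sizeof(float));
  }
  return host;
}
```

The key point is that the source stride comes from the backing allocation rather than from the requested view, which is why the row pitch needs care when a pooled image is reused for a smaller tensor.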
@tqchen Thanks for the great feedback, could you take a look again?
Thanks @csullivan. I will let @ZihengJiang manage the PR.
Merged now. Thanks @csullivan for the hard work.
* Add TVMBackendAllocTexture and support in OpenCL device API.
* Add runtime optimized caching allocator. This should be replaced with AOT memory planning when the relay/tir/compile engine refactor lands.
* Few bug fixes for runtime texture allocator.
* Add OpenCL device API support for image2d<float16> textures.
* Update OpenCL DeviceAPI to support Image2D data space allocations and copying to/from host/image2d directly. Allocation employs a lowering convention to 2d images for activations and weights.
* Fix to follow OpenCL spec. for indexing.
* Rename texture_pool.h -> texture.h
* Move Nd to 2d lowering convention code into runtime texture utilities that can be shared by codegen and the runtime.
* Update texture lowering utilities
* Add TODO comment about pitch support
* Remove FreeTexture
* Fix ICHECK comment
* Partial cherry pick from @ZihengJiang git@github.com:ZihengJiang/tvm.git:52822c5bd [RUNTIME] OpenCL texture memory.
* Remove runtime and device texture APIs.
* Add OpenCL packed functions for texture workspace (de)allocations.
* Add OpenCLBuffer structure to track memory layout through OpenCL Device API.
* Rebase: TVMContext -> Device
* Implement DLTensor* overload of CopyDataFromTo in OpenCL DeviceAPI.
* Implement OpenCL CopyDataFromTo(DLTensor*...) overload and tensor shapes to calculate image extent when copying data directly to or from texture cache.
* Update format (cpp-lint)
* Update format (clang)
* Buffer descriptor name change and formatting.
* Add texture pool documentation.
* Update runtime to use new global.texture scope.
* Move texture_pool.cc into opencl impl.
* Add test coverage for copying in and out of storage allocs of texture scope.
* Documented APIs and structures, renamed buffer descriptor layout tags.

Co-authored-by: ZihengJiang <ziheng@apache.org>
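As a rough illustration of what an Nd-to-2d lowering convention can look like, the sketch below folds an N-dimensional shape into a width, a height, and a channel count. The rule is an assumption made for this example: dimensions before a caller-chosen separator axis fold into the height, the remaining dimensions except the innermost fold into the width, and the innermost dimension becomes the channel count. The names `Texture2DShape` and `FlattenTo2D` are hypothetical here, and the commit message does not spell out the convention the PR actually uses for activations and weights.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical result type for the flattened image extent.
struct Texture2DShape {
  std::int64_t width;
  std::int64_t height;
  std::int64_t channel;
};

// Assumed flattening rule (illustrative only): dims [0, separator) multiply
// into height, dims [separator, rank - 1) multiply into width, and the
// innermost dim is taken as the per-texel channel count.
Texture2DShape FlattenTo2D(const std::vector<std::int64_t>& shape, std::size_t separator) {
  if (shape.size() < 2 || separator >= shape.size()) {
    throw std::invalid_argument("shape needs rank >= 2 and separator < rank");
  }
  Texture2DShape tex{1, 1, shape.back()};
  for (std::size_t i = 0; i + 1 < shape.size(); ++i) {
    if (i < separator) {
      tex.height *= shape[i];
    } else {
      tex.width *= shape[i];
    }
  }
  return tex;
}
```

Under this assumed rule, a shape of {1, 8, 16, 16, 4} with separator 2 gives height 1 * 8 = 8, width 16 * 16 = 256, and 4 channels. Keeping such a helper in a shared utility lets codegen and the runtime agree on the image extent for a given tensor shape.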
This PR introduces 2d texture memory support to the OpenCL Device API runtime.
Device runtime
See RFC here: https://discuss.tvm.apache.org/t/rfc-texture-memory-support/9467