Tensor Comprehensions (TC) is a fully-functional C++ library to automatically synthesize high-performance machine learning kernels using Halide, ISL and NVRTC or LLVM. TC additionally provides basic integration with Caffe2 and PyTorch. We provide more details in our paper on arXiv.
This library is designed to be highly portable, machine-learning-framework agnostic and only requires a simple tensor library with memory allocation, offloading and synchronization capabilities.
For now, we have integrated TC with Caffe2 and PyTorch.
The following illustrates a short but powerful feature of the library: the capacity to JIT-compile high-performance machine learning kernels on demand, for specific sizes.
import tensor_comprehensions as tc
import torch
# TC definition: c2 appears only on the right-hand side, so it is the reduction index.
lang = """
def tensordot(float(N, C1, C2, H, W) I0, float(N, C2, C3, H, W) I1) -> (O) {
    O(n, c1, c3, h, w) +=! I0(n, c1, c2, h, w) * I1(n, c2, c3, h, w)
}
"""
N, C1, C2, C3, H, W = 32, 512, 8, 2, 28, 28
# Build a callable from the definition string.
tensordot = tc.define(lang, name="tensordot")
I0, I1 = torch.randn(N, C1, C2, H, W).cuda(), torch.randn(N, C2, C3, H, W).cuda()
# Autotune for these specific input sizes, then run with the best options found.
best_options = tensordot.autotune(I0, I1, cache=True)
out = tensordot(I0, I1, options=best_options)
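To make the semantics of the comprehension concrete, here is a rough reference sketch in plain Python of what `tensordot` computes. It is not how TC generates or runs kernels. The helper names `tensordot_reference` and `make_tensor` are hypothetical and not part of the TC API, and the sketch assumes tiny sizes and nested lists instead of GPU tensors.

```python
from itertools import product


def make_tensor(shape, fn):
    """Hypothetical helper: nested list of the given shape, fn(index_tuple) -> value."""
    def build(prefix, dims):
        if not dims:
            return fn(prefix)
        return [build(prefix + (i,), dims[1:]) for i in range(dims[0])]
    return build((), tuple(shape))


def tensordot_reference(I0, I1):
    """O(n, c1, c3, h, w) +=! I0(n, c1, c2, h, w) * I1(n, c2, c3, h, w)"""
    # Loop ranges come from the input shapes, mirroring TC's range inference.
    N, C1, C2 = len(I0), len(I0[0]), len(I0[0][0])
    H, W = len(I0[0][0][0]), len(I0[0][0][0][0])
    C3 = len(I1[0][0])

    # The "!" in "+=!" means O is first set to zero, the neutral element of "+".
    O = make_tensor((N, C1, C3, H, W), lambda idx: 0.0)

    # c2 appears only on the right-hand side, so it is summed over.
    for n, c1, c3, h, w in product(range(N), range(C1), range(C3),
                                   range(H), range(W)):
        for c2 in range(C2):
            O[n][c1][c3][h][w] += I0[n][c1][c2][h][w] * I1[n][c2][c3][h][w]
    return O


# Tiny example sizes (assumed for illustration, far smaller than the ones above).
N, C1, C2, C3, H, W = 1, 2, 3, 2, 1, 1
I0 = make_tensor((N, C1, C2, H, W), lambda idx: float(idx[1] + idx[2]))
I1 = make_tensor((N, C2, C3, H, W), lambda idx: float(idx[1] * (idx[2] + 1)))
out = tensordot_reference(I0, I1)
print(out[0][1][1][0][0])  # 16.0
```

A reference like this is only useful for checking results on small inputs; the point of TC is that the same one-line definition is compiled into a tuned kernel for the actual sizes.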
After a few generations of autotuning on a 2-GPU P100 system, we see results resembling:
We have not yet characterized the precise fraction of peak performance we obtain, but it is not uncommon to obtain 80%+ of peak shared memory bandwidth after autotuning. Solid register-level optimizations are still in the works, but TC in its current form already addresses the productivity gap between the needs of research and the needs of production. This is why we are excited to share it with the entire community and bring this collaborative effort into the open.
General: You can find detailed information about Tensor Comprehensions here.
C++ API: We also provide documentation for our C++ API, which can be found here.
We provide a conda package to make it easy to install and use the TC binary. Please refer to our documentation here for instructions.
You can find documentation here, which contains instructions for building TC via Docker, via conda packages, or in a non-conda environment.
- Email: tensorcomp@fb.com
- GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
- Slack: For discussion around framework integration, build support, collaboration, etc., join our Slack channel at https://tensorcomprehensions.herokuapp.com/.
See the CODE_OF_CONDUCT.md file for more details.
Tensor Comprehensions is distributed under a permissive Apache v2.0 license; see the LICENSE file for more details.
See the CONTRIBUTING.md file for more details.