TVM bridge support to JIT NDArray Function by TVM #9880
Conversation
based on an early version of work by @ZihengJiang
TVM's side of PR apache/tvm#930
The tests now pass, @piiswrong @szha can you review?
Support wrapping a TVM compiled function as an NDArray function.
there should be at least one test for the integration, and it should be part of CI.
# pylint: disable= no-member, undefined-variable

@property
def _tvm_handle(self):
    return self.handle.value
what's this for?
This is a handle exposed for TVM's PackedFunc convention interface, to allow arbitrary positional-argument calls without adding a new C API. Specifically, the wrapped function is a TVM PackedFunc that recognizes NDArray as an extension object and passes the addresses of NDArray handles correctly to the arguments.
It is later received here: https://github.com/apache/incubator-mxnet/pull/9880/files#diff-3aa2a3c799e125e086769bc1d5f6490aR74
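The extension-object convention above can be illustrated with a minimal Python sketch. Everything here is hypothetical stand-in code (the class name `FakeNDArray` and the address `0x1234` are invented for illustration); only the `_tvm_handle` property mirrors the snippet in this PR. The point is that exposing the handle's integer value lets TVM's PackedFunc machinery pass the raw address through without any new C API.

```python
import ctypes

class FakeNDArray:
    """Hypothetical stand-in for mx.nd.NDArray, for illustration only."""

    def __init__(self):
        # The real handle is an opaque NDArrayHandle (a C void*);
        # 0x1234 is a made-up address for demonstration.
        self.handle = ctypes.c_void_p(0x1234)

    @property
    def _tvm_handle(self):
        # Expose the raw address so a PackedFunc-style caller can
        # forward it to C code without an extra API call.
        return self.handle.value

arr = FakeNDArray()
print(hex(arr._tvm_handle))  # → 0x1234
```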
I have detailed my reasoning for not yet adding a test case to this PR. The TVM bridge depends on a header-only component of TVM and does not have to link against the TVM runtime, so merging this won't introduce any additional burden to MXNet's runtime. This feature can only be used when TVM and MXNet are both available on the system. If we are open to bringing TVM (with its LLVM dependency) into CI, we can propose another PR to change the Jenkinsfile (to add LLVM as part of the build) and bring the test case into MXNet CI.
I'm heavily against splitting development and testing into two stages or PRs. Every PR must be covered by proper tests, and thus the modifications to the Jenkinsfile have to be part of this PR in order to get it approved by me.
While I'm aware that adding the TVM bridge is less impactful than, for example, the mkldnn PR, that case has clearly shown that tolerating the merging of PRs which are not entirely tested is harmful to the project; such merges should be absolute exceptions.
Just to be clear, it is the way the TVM bridge works that creates this special situation. This PR requires joint changes in both repos, and the feature won't be available until the changes in TVM and MXNet are both made. Unlike mkldnn, the user does not have to switch on USE_TVM as a hard dependency, but can directly use this feature when both TVM and MXNet are available. When the user does not have TVM, this won't affect them at all (unlike cases such as mkldnn, which requires the user to install mkldnn by default). I can add a minimal MXNet-side test case that verifies the bridge function exists. A more thorough test, however, would require bringing TVM's build into MXNet's CI, which is a decision that I think needs more discussion.
That being said, I totally agree that having proper testing is important. That is why there are already test cases for these changes that have been verified locally (an optional test on TVM's side of the changes that only runs when both are available). So the correctness and quality of the code change is covered by test_mxnet_bridge.py. The only question is whether we want to bring that test case directly into this PR now; that involves bringing TVM's build into the current Jenkins pipeline, which I think deserves some discussion before we do so.
I don't see any issues in building a dependency; we're doing this for a lot of cases. The test execution would be part of the integration test stage, while any compilation happens during the build stage. If we want to advertise MXNet as being compatible with TVM, then it should be properly tested. What kind of discussion would you expect?
I just mean the cost of building TVM's LLVM dependency. I don't want to introduce an additional burden to the CI while this is purely optional. Anyway, I get your point and will see if we can do a test with TVM's minimum-dependency build.
Don't worry about that. We are currently looking into ccache integration, which should reduce the impact by a lot, especially if only GCC but not nvcc is being used.
Testcase and CI added
Please see my comments. Also, please add more tests to cover all added APIs.
tests/python/gpu/test_tvm_bridge.py
Outdated
    import tvm.contrib.mxnet
    import topi
except ImportError:
    return
Print message that test is not run because of missing tv
TVM*
echo USE_RPC=1 >> config.mk
echo USE_GRAPH_RUNTIME=1 >> config.mk
echo CUDA_PATH=/usr/local/cuda >> config.mk
make -j10
Please make use of all CPU cores
# Build and install TVM
cd /tmp
git clone https://github.com/dmlc/tvm/ --recursive
Are you aware that the result of this script is being cached indefinitely? In that case, it would be better to specify a stable version instead of master, as otherwise environments may differ on different slaves.
I am aware of that; changed to use a fixed tag.
@marcoabreu addressed the comments. The current test case already covers the API use cases of CPU and GPU with the async engine wrapping.
include/mxnet/tensor_blob.h
Outdated
namespace mxnet {

// redefine DLPack enumeration to be backward compatible.
const int kCPU = kDLCPU;
should this be constexpr? what keeps it from generating an integer in the data segment for each file compiled?
good catch, will change to constexpr
include/mxnet/tensor_blob.h
Outdated
const int kCPU = kDLCPU;
const int kGPU = kDLGPU;
// extension type code under TVM function.
const int kTVMNDArrayTypeCode = 19;
would it make sense to make it an enumerator?
mostly because the 19 seems arbitrary? and maybe extensible to other numbers in the future? in that case, an enum could help to manage accidental overlap.
although my assumptions here may not be correct.
This enumerator is allocated on the TVM side and reserved for the MXNet and NNVM projects, so it is not arbitrarily chosen: https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L97
It is picked to be the last reserved enumerator for NNVM
will add a comment about this in the updated code
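To illustrate the reviewer's suggestion, here is a sketch in Python of how reserved extension type codes could be grouped in an enumeration so that an accidental overlap fails loudly. Only the value 19 comes from this discussion (it is the code reserved for MXNet/NNVM in TVM's c_runtime_api.h); the enum container and its name are hypothetical, not TVM's actual API.

```python
import enum

@enum.unique  # raises ValueError at class creation if two codes collide
class ExtTypeCode(enum.IntEnum):
    # 19 is the extension type code reserved for MXNet/NNVM in TVM's
    # c_runtime_api.h; the enum wrapper itself is illustrative only.
    MXNET_NDARRAY = 19

print(int(ExtTypeCode.MXNET_NDARRAY))  # → 19
```

The `@enum.unique` decorator is what addresses the "accidental overlap" concern: adding a second member with the same value is a construction-time error rather than a silent alias.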
// by default assume we mutate the array.
if (const_loc_ptr < const_loc.size() &&
    i == const_loc[const_loc_ptr]) {
  const_vars->push_back(nd.var());
is this called a lot in performance-sensitive areas? should we do a reserve()?
(for all vectors here)
we don't know the size of the vector beforehand
ik
ok
// sorted position of constant arguments
std::vector<int> const_loc;
for (int i = 0; i < num_const; ++i) {
  const_loc.push_back(wrap_args[i + 3].operator int());
reserve?
This is not on the critical path (function construction rather than execution).
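The classification logic the two C++ snippets above perform can be sketched in Python. This is a hypothetical helper mirroring the loop, not the actual MXNet code: every argument's engine variable is assumed mutated by default, unless its index appears in the sorted `const_loc` list.

```python
def split_vars(arg_vars, const_loc):
    """Split engine variables into const and mutate sets.

    arg_vars:  list of engine variable ids, one per NDArray argument.
    const_loc: sorted positions of arguments that are read-only.
    By default an argument is assumed to be mutated.
    """
    const_vars, mutate_vars = [], []
    const_loc_ptr = 0
    for i, var in enumerate(arg_vars):
        if const_loc_ptr < len(const_loc) and i == const_loc[const_loc_ptr]:
            const_vars.append(var)
            const_loc_ptr += 1
        else:
            mutate_vars.append(var)
    return const_vars, mutate_vars

# e.g. four arguments where positions 0 and 1 are constants:
print(split_vars(["a", "b", "c", "d"], [0, 1]))  # → (['a', 'b'], ['c', 'd'])
```

Knowing which variables are read-only matters because MXNet's dependency engine can schedule reads of const vars concurrently, while mutated vars serialize.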
LGTM
tests/python/gpu/test_tvm_bridge.py
Outdated
    import tvm.contrib.mxnet
    import topi
except ImportError:
    print("TVM bridge test skipped because TVM is missing...")
Use logging.warn instead
Besides that one minor requested change about logging, approved from my side
// C callback that can be used by TVM to extract
// the WrapAsyncCall function.
extern "C" MXNET_DLL int MXTVMBridge(TVMFunctionHandle pregister) {
Is this a C API? Should it be put in the c_api folder?
This is queried by TVM, so it is not a public-facing C API. I feel it is better to put it here, but we can move it to the c_api folder.
f = tvm.build(s, [x, y, zz, scale])

# get a mxnet version
mxf = tvm.contrib.mxnet.to_mxnet_func(f, const_loc=[0, 1])
This design feels weird. MXNet and TVM are mutually referencing each other: the MXNet backend references TVM with TVMBridge, and TVM references MXNet with contrib.mxnet.
The benefit of this design is that MXNet does not have to link libtvm.so, which comes with an LLVM dependency, and it works out of the box when MXNet and TVM are both available.
Logically, we can view this as both TVM and MXNet depending on a header-only fraction of the TVM runtime, with the TVM contrib library calling MXNet to retrieve the bridge function.
Sorry to chime in here; I was just wondering: does TVM look for this function specifically, or does MXNet call a TVM function beforehand, passing it a pointer to this callback (which would seem less circular)?
TVM looks for this function specifically. By "this function" I mean MXTVMBridge.
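The lookup TVM performs is essentially a dynamic-symbol query against MXNet's shared library. A minimal ctypes illustration of that pattern follows; since the real MXTVMBridge symbol requires an MXNet build, libm's `cos` stands in as the looked-up symbol, so treat this strictly as a sketch of the mechanism, not of TVM's actual code.

```python
import ctypes
import ctypes.util

# Load a shared library and resolve an exported C symbol by name.
# TVM does the analogous lookup for "MXTVMBridge" in libmxnet.so;
# libm's `cos` stands in here so the sketch runs without MXNet.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
cos = libm.cos
cos.restype = ctypes.c_double
cos.argtypes = [ctypes.c_double]
print(cos(0.0))  # → 1.0
```

Because the symbol is resolved by name at runtime, neither library needs a link-time dependency on the other, which is what makes the "mutual reference" less circular than it first appears.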
@piiswrong Can you check and merge, or provide a list of action items that you think should change?
Would it be possible to document the number 19 a bit more and add documentation that allows cross referencing? Maybe add a link to the TVM implementation that actually defines this number.
I added a comment at the declaration point.
Not in the Python part, and there's no link to where this number actually comes from. I personally would not know where to look in TVM.
OK, will do an update in the Python part as well.
Thanks for the reviews! If there are no requests to change things today, I am going to merge this tomorrow.
 *
 * We do require TVM and MXNet to be built with same C++ ABI of std::function
 */
#define TVM_RUNTIME_HEADER_ONLY 1
should you undefine it at the end?
Since it is in a .cc file, this is not necessary.
I understand that this is a quick hack to plug TVM support into mxnet NDArray, and I think we should merge it for the time being.
This design is certainly unconventional, and it took me a while to come up with, in the sense that it tries to achieve something that would otherwise be impossible: introducing TVM support to MXNet NDArray without a direct link dependency, so a user can try it out anytime using the default binary distro. The complication and benefit brought by MXNet's async execution make the callback-wrapping style a bit tricky; if the function calls had been simple CUDA calls, things would be simpler. But thankfully the logic is clear enough. With this we can do things like writing a customized update rule in TVM while still JIT compiling it to get the best speedups.
build with commit: tvm/runtiime/packed_func.h, No such file or directory
@tornadomeet do
for the first time, do
@tqchen yes, that fixed it.
Really like this feature, but I am having some trouble getting it to work. Even with MXNet and TVM both built from source, I'm getting the error
when I try to call to_mxnet_func. Is there some other dependency needed to make it work?
@jwfromm We do need to build MXNet from source for now, before the next release. This should work out of the box if you build MXNet and TVM from source. The error is likely due to the fact that your
Specifically, please confirm that tvm_bridge.cc is built and linked into the shared library (https://github.com/apache/incubator-mxnet/blob/master/src/nnvm/tvm_bridge.cc#L174) and use
You're absolutely right, I was accidentally overwriting the built-from-source library with a pip install. Works great now!
* TVM bridge support. Support wrapping a TVM compiled function as an NDArray function.
* Testcases and CI to include TVM as dependency
* Address review comments
* Add more comments, change to constexpr
* Change to log warn
* Update comment on the type code
@tqchen sometimes the error is
terminate called after throwing an instance of 'dmlc::Error'
  what(): TVMCall CFunc Error:
Traceback (most recent call last):
  File "tvm/_ffi/_cython/./function.pxi", line 38, in tvm._ffi._cy3.core.tvm_callback
TypeError: _list() missing 2 required positional arguments: 'name' and 'func'
It seems we cannot call
@tqchen Hi! I wrote a minimal reproducible example. Steps to reproduce:
In the function
Thanks.
Support wrapping a TVM compiled function as an NDArray function. This enables using TVM as an RTC module for MXNet's async functions.
Example
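A sketch of the intended usage, adapted from the test added in this PR. This is hedged, not the PR's verbatim example: the TVM API names (`tvm.var`, `tvm.placeholder`, `tvm.create_schedule`) match the TVM version contemporaneous with this PR and differ in later releases, and the snippet skips gracefully when TVM or MXNet is unavailable.

```python
# Hedged usage sketch; skips when TVM/MXNet are not installed.
try:
    import tvm
    import tvm.contrib.mxnet
    import mxnet as mx
    HAVE_TVM = True
except ImportError:
    HAVE_TVM = False

def demo():
    if not HAVE_TVM:
        print("skipped: TVM/MXNet not available")
        return
    # Define and JIT-compile a trivial elementwise kernel with TVM.
    n = tvm.var("n")
    x = tvm.placeholder((n,), name="x")
    y = tvm.compute(x.shape, lambda i: x[i] + 1, name="y")
    s = tvm.create_schedule(y.op)
    f = tvm.build(s, [x, y])
    # Wrap the compiled function as an MXNet NDArray function;
    # argument 0 (the input) is read-only, hence const_loc=[0].
    mxf = tvm.contrib.mxnet.to_mxnet_func(f, const_loc=[0])
    a = mx.nd.ones((10,))
    b = mx.nd.zeros((10,))
    mxf(a, b)  # executes through MXNet's async engine
    print(b.asnumpy())

demo()
```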
Technical Details
The bridge is quite natural since MXNet already uses the DLTensor representation, which is also used by TVM. The hard part is that we need to use MXNet's engine to run the compiled function, instead of running it directly.
Since TVM relies on LLVM, it is a bit too early to directly introduce this dependency, so this PR takes a different approach. The TVM bridge depends on a header-only component of TVM and does not have to link against the TVM runtime.
When a user has TVM installed in their environment, TVM queries the MXTVMBridge function to get the wrapper logic and uses it to run MXNet's functions asynchronously. When a user does not have TVM installed, the additional logic won't add any link dependencies.
Because of this optional linking logic, I did not include a test case in MXNet's CI, but I have verified that the code works locally for both the GPU and CPU cases here.
Restriction
MXNet and TVM need to be built with the same C++ ABI (because we pass around PackedFunc). This is somewhat of a restriction, but it makes code sharing easier via the PackedFunc system. It can usually be satisfied by using the same C++ compiler: for example, g++ 4.8 and g++ 5.0 are not compatible, while the latest version of clang is usually compatible with the latest g++. Running with an incompatible ABI will cause undefined behavior and possibly segfaults. This restriction could be removed by forcing a pure C ABI, but that requires additional work and may affect the conciseness of the code.