
Reformat of TensorRT to use subgraph API #14040

Merged
1 commit merged into apache:master from trt_reformat on May 1, 2019

Conversation

@Caenorst (Contributor) commented Jan 31, 2019

Description

This PR modifies TensorRT usage to rely on the Subgraph API; the backend is named 'TensorRT'.

Checklist

Essentials

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Code is well-documented:
  • Need an update in documentation for general use cases
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • TensorRT now relies solely on the Subgraph API, plus a contrib function for loading weights
  • Add FP16 support; FP16 computation is enabled by default
  • Graph partitioning has changed: if the same input of a subgraph is used multiple times within the subgraph, it now stays as a single node in the subgraph

Comments

  • Now that it's using the Subgraph API, you can use TensorRT with both the Module API and the Symbol API
  • The way weights are loaded into the TensorRT node is a little bit dirty; we will do a follow-up PR (including one on NNVM) so that node attributes natively own subgraph parameters, including a setter on the C API

*/
MXNET_DLL int MXExecutorGetOptimizedSymbol(ExecutorHandle handle,
SymbolHandle *out);

Contributor

Why do you need to remove an API?

Contributor Author

I actually created this API just for the previous TensorRT implementation; I'm not sure it is used anywhere else. It could still be useful if you are calling a subgraph backend via an environment variable. Do you want to keep it?

@KellenSunderland (Contributor) commented Feb 3, 2019

It affects semantic versioning if we remove it (it can break compilation for downstream projects), so if there's not a strong reason to remove it we should leave it in. Can it still perform a useful function (for example, showing the graph after optimization)?

Contributor Author

Sure, should I make it more general then? (It was only working with TrtGraphExecutor, which doesn't exist anymore.)

}
}

inline std::string StringDType(int dtype) {
Contributor

Is this still needed? I don't see any references to it.

Contributor Author

It's a debug function; nice catch, will remove.

if (trt_builder->platformHasFastFp16()) {
trt_builder->setFp16Mode(true);
} else {
LOG(INFO) << "WARNING: TensorRT can't use fp16 on this plateform";
Contributor

plateform -> platform

Contributor

Also, we're logging at the INFO level but have WARNING in the message. I'd remove the warning from the message and set the log level to warning. (This is a common issue in our codebase.)

return false;
}
}
if (isTRTCompatible(new_node))
Contributor

Can be simplified to

return isTRTCompatible(new_node);

if (o.index == e->index && o.node.get() == e->node.get()) {
e->index = i;
e->node = subgraph_node;
// TODO(cfujitsang): For futur support this would fail
Contributor

futur -> future

}

nnvm::NodePtr CreateSubgraphNode(const nnvm::Symbol &sym,
const int subgraph_id = 0) const override {
Contributor

If possible, can we avoid default values in overriding functions?

@KellenSunderland (Contributor)

ONNX-TensorRT might fail to build, as it adds a new default target that requires a header not on a standard search path. There are a few ways to work around the problem: we could just build the library and not that tool in CI, or we could include the header folder location in the search path like so: https://github.com/apache/incubator-mxnet/pull/13906/files#diff-56133c25b5a238b76f54c0928f05a8e6

It should allow the TensorRT build to pass CI if we make that change in this PR.

@Caenorst (Contributor Author) commented Feb 4, 2019

It seems that my modification in CutGraphInputs is breaking some of the default attribute-inference functions. I think my modification makes sense (duplicating inputs in the subgraph looks like a bug to me). I can modify src/operator/subgraph/common.h accordingly if you agree with my change to CutGraphInputs.

@KellenSunderland (Contributor)

CI should be in reasonably good shape now. Looks like there are some linting issues in a few headers:

=====269/271 cpp-header files passed check=====
src/operator/subgraph/tensorrt/nnvm_to_onnx-inl.h: 1 Errors of 1 Categories map={'build': 1}
src/operator/subgraph/tensorrt/tensorrt-inl.h: 4 Errors of 2 Categories map={'whitespace': 1, 'build': 3}

@reminisce (Contributor)

Great to see this is happening. I have two high-level comments:

  1. If you use the subgraph API, there should be no need to add specific Python functions (e.g. init_tensorrt_params) to use TensorRT as a backend inference engine. I think everything can be handled in the backend to support the Module and simple_bind APIs in the frontend.
  2. Have you considered adopting the TensorRT C++ APIs to convert the MXNet subgraph IR to TensorRT IR? This would allow you to get rid of the dependencies on protobuf, onnx, and onnx-trt. I have found building MXNet with those third-party libs to be a painful process, especially on edge devices. In addition, relying only on the TensorRT C++ APIs would allow us to extend operator coverage promptly once a new release of TensorRT is out. I did this while integrating TensorRT with TVM, which shares the same graph IR as MXNet:
     https://github.com/reminisce/tvm/blob/subgraph_integration/src/contrib/subgraph/tensorrt_executor.cc

@Caenorst (Contributor Author) commented Feb 5, 2019

@reminisce

  1. If you know how to do it, I'd like your help. I asked the question on the dev list and have discussed it with Kellen Sunderland and Zheng Da, and we haven't found an easy solution. The eventual objective would be to modify the functions that do parameter initialization. Currently, using init_tensorrt_params doesn't prevent using the Module or Symbol API.

  2. So far, on our side, there is no plan to integrate TensorRT in MXNet without ONNX. It's easier for us to rely on onnx-tensorrt for reusability.

@reminisce (Contributor)

@Caenorst
Just want to clarify: I'm not blocking this PR. We can think through these comments and make incremental changes afterwards.

The point is to add zero overhead to the user experience of using TensorRT in MXNet. I don't think you need to pass the param data explicitly beforehand; it can all be collected at the first forward call for each subgraph. You can take a look at how this is done in TVM. At a high level, in the forward function of a TensorRT subgraph op you have a list of inputs that contains both input data and params; you just need to differentiate param names from input-data names. This can be achieved by analyzing whether a data node's NodeEntry is an operator's input data or a param.
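
For illustration only, here is a minimal Python sketch of the name-based separation being described (the real proposal would do this in the backend, inside the subgraph op's forward pass; the tiny network and parameter names below are hypothetical):

import mxnet as mx

# Hypothetical toy network standing in for a TensorRT subgraph.
data = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data, num_hidden=10, name='fc')

# Pretend these parameters were loaded from a checkpoint.
arg_params = {'fc_weight': mx.nd.zeros((10, 4)), 'fc_bias': mx.nd.zeros((10,))}

# Separate parameter inputs from data inputs by name: anything whose name
# matches a known parameter is a weight, everything else is input data.
all_inputs = net.list_inputs()      # e.g. ['data', 'fc_weight', 'fc_bias']
param_names = [n for n in all_inputs if n in arg_params]
data_names = [n for n in all_inputs if n not in arg_params]
print(param_names, data_names)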

Regarding the dependence on protobuf, onnx, and onnx-tensorrt, here are my two cents: I personally don't think introducing so many third-party dependencies is a good idea, because it results in an extra layer of complexity in terms of implementation, maintenance, build, and deployment, and the conversion process is not manageable by this community. If there is a direct solution for creating a bridge from nnvm to TensorRT, why not use it?

@Caenorst (Contributor Author) commented Feb 7, 2019

@reminisce
I completely get your points. We thought about loading the weights from the inputs; the problem is that you can't mix contexts in a single graph and, as far as I know, you can't delete the weights from the graph, so it would force us to hold the weights twice in GPU memory, which may become a problem at some point. Ideally, if we could at least deallocate the memory of the weights once they are copied inside the TensorRT engine, that would be nice; I haven't tried to see if it is doable though.

@reminisce (Contributor) commented Feb 7, 2019

you can't mix context in a single graph

What do you mean by "mix context"? We only have one context, which is GPU in this case.

Regarding releasing the weight params after they are copied to TensorRT engines, I think it's doable by removing the corresponding NDArray from data_entry_ in GraphExecutor.

@Caenorst (Contributor Author) commented Feb 7, 2019

@reminisce
Yes, what I mean is that if the weights for the TensorRT node were on CPU while the rest of the graph is on GPU, it would be less of a problem. Also, an argument for onnx-tensorrt is that more ops are supported, with plugins implemented (slice, some activations, resize with nearest interpolation, ...).

@reminisce (Contributor)
weights for TensorRT node are on CPU while the rest of the graph is on GPU.

This is not true. When binding completes, you have all the weights on GPU.

@reminisce (Contributor)

Also an argument for onnx-tensorrt is that there is more Ops supported with plugins implemented (slice, some activation, resize with nearest interpolation...)

Are you saying that there are operators supported by TensorRT but creating them is not available through the C++ APIs? Have those operators been added to unconditionalTRTop? The TRT C++ API also supports a plugin interface.

@Caenorst (Contributor Author) left a comment

Are you saying that there are operators supported by TensorRT but creating them is not available through the C++ APIs? Have those operators been added to unconditionalTRTop? The TRT C++ API also supports a plugin interface.

They are added as IPlugin (so not natively supported by TensorRT), but of course you could try to write your own plugins. For the moment I don't think any of those ops are supported by my implementation, but I don't think it would be hard to do; it's just a long-term argument.

@ankkhedia (Contributor)

@Caenorst Could you fix the merge conflicts and retrigger CI?
@reminisce Could you please help @Caenorst with his questions?

@Caenorst (Contributor Author)

weights for TensorRT node are on CPU while the rest of the graph is on GPU.

This is not true. When binding completes, you have all the weights on GPU.

Yes, that's what I meant: the weights have to be copied from CPU to GPU, so they have to be on CPU originally. Actually, they first have to be copied into the ONNX model.

@karan6181 (Contributor)

@KellenSunderland and @anirudh2290, could you please review this PR again?

Thanks!

@KellenSunderland (Contributor)

@karan6181 Taking a look this week.

@KellenSunderland (Contributor) commented Mar 28, 2019

Hey @Caenorst, sorry, can you rebase this once more? There are no reported conflicts from GitHub, but for some reason CI isn't picking up some changes, which means the R tests will fail (they're trying to use an old domain).

PR looks good to me though; will merge as soon as it's rebased.

@Caenorst force-pushed the trt_reformat branch 2 times, most recently from ccae63f to 5418349, on March 28, 2019 at 20:59
@abhinavs95 (Contributor)

@Caenorst Thank you for making the changes. @KellenSunderland is this good to go?

@abhinavs95 (Contributor)

@mxnet-label-bot add [pr-awaiting-testing]

@marcoabreu added the pr-awaiting-testing label (PR is reviewed and waiting on CI build and test) on Mar 28, 2019
@KellenSunderland (Contributor)

LGTM, did some testing on a home machine.

@KellenSunderland (Contributor)

Small update: still debugging a crash showing up during teardown. It seems like a problem relating to the order in which statically scoped objects are destroyed.

@KellenSunderland (Contributor)

Are you able to reproduce this locally @Caenorst? I've run through quite a few times now and haven't hit the issue.

@KellenSunderland (Contributor)

Scratch that, able to reproduce now. Digging a bit further in ...

@KellenSunderland (Contributor)

Ok, I see what's happening. It looks like we've done some refactoring to help prevent shutdown engine crashes but still have some fixes to apply before it's working 100%. I believe this PR should solve the issue we're seeing in the test: #14591. Would you be able to test with that commit applied?

@KellenSunderland (Contributor)

Ok, the issue that was tripping up our tests has been fixed. Would you be able to do another rebase (sorry)? Then we should be able to merge.

@roywei (Member) commented Apr 29, 2019

@Caenorst thanks for the contribution, gentle ping to rebase so we can merge this PR.

@Caenorst (Contributor Author) commented May 1, 2019

Rebase done. Sorry for the delay.

@KellenSunderland (Contributor)

No worries about the delay. My team has been testing the TRT integration out and is seeing better-than-expected speedups so far. Let's merge this change. I'm adding a todo for myself to update all documentation relating to the API change in our TRT tutorials. We've also found a few minor bugs we hope to PR soon.

@KellenSunderland merged commit 1c874cf into apache:master on May 1, 2019
access2rohit pushed a commit to access2rohit/incubator-mxnet that referenced this pull request May 14, 2019
@ThomasDelteil (Contributor)

@KellenSunderland @Caenorst, the tutorial for TensorRT is out of date and I have had a few users asking why it's not working anymore: https://mxnet.incubator.apache.org/versions/master/tutorials/tensorrt/inference_with_trt.html
What is the current recommended way to use TensorRT?
Thanks!

@Caenorst (Contributor Author) commented Jun 7, 2019

@ThomasDelteil the PR for updating it is on its way: https://mxnet.incubator.apache.org/versions/master/tutorials/tensorrt/inference_with_trt.html

Meanwhile here is the new way to use TensorRT:

sym, arg_params, aux_params = mx.model.load_checkpoint('my_net', epochs)
trt_sym = sym.get_backend_symbol('TensorRT')
remaining_arg_params, remaining_aux_params = mx.contrib.tensorrt.init_tensorrt_params(trt_sym, arg_params, aux_params)

And then you can use trt_sym as your usual symbol; there is no more tensorrt_bind or MXNET_USE_TENSORRT environment variable. Also, FP16 computation is now on by default; if you want to disable it, use mx.contrib.tensorrt.set_use_fp16(False).
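
For example, a minimal end-to-end sketch of that flow (the checkpoint name 'my_net', epoch 0, the input name data, and the 1x3x224x224 shape are placeholders for illustration):

import mxnet as mx
import numpy as np

# Load a checkpoint and hand the symbol to the TensorRT subgraph backend.
sym, arg_params, aux_params = mx.model.load_checkpoint('my_net', 0)
trt_sym = sym.get_backend_symbol('TensorRT')
remaining_arg_params, remaining_aux_params = mx.contrib.tensorrt.init_tensorrt_params(trt_sym, arg_params, aux_params)

# Bind on GPU for inference and copy in the remaining (non-TensorRT) parameters.
executor = trt_sym.simple_bind(ctx=mx.gpu(0), data=(1, 3, 224, 224), grad_req='null')
executor.copy_params_from(remaining_arg_params, remaining_aux_params)

# Run inference exactly as with an ordinary symbol executor.
x = np.random.rand(1, 3, 224, 224).astype('float32')
out = executor.forward(is_train=False, data=x)[0]
print(out.shape)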

@ThomasDelteil (Contributor)

Thanks @Caenorst for the quick update!

@KellenSunderland (Contributor) commented Jun 7, 2019 via email

haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019