Relay/TRT Integration (whole graph only) #54

trevor-m · 2019-11-05T00:00:13Z

This PR includes support for a version of Relay/TRT integration which only works when the entire model can be converted to TRT. It is enabled with the EnableTrt pass. If any op in the model cannot be converted to TRT, EnableTrt will return the original module unmodified.

How to use

Build TVM with cmake flag USE_TENSORRT=ON or USE_TENSORRT=/path/to/TensorRT. USE_CUDA should be enabled as well.
Convert the model into TensorRT. This step will determine if every node in the graph can be converted to TensorrRT and if so will mark the graph to use TensorRT and apply some specific optimization passes.

import tvm.relay.tensorrt
mod = relay.tensorrt.EnableTrt(mod, params)

Check if TRT was enabled. If not, it means some op in the graph is not supported by the TensorRT conversion. EnableTrt will output which particular ops are not supported and why.

assert mod['main'].attrs and mod['main'].attrs.Compiler == 'tensorrt'

Finish compilation.

with relay.build_config(opt_level=2, disabled_pass={"SimplifyInference"}):
  graph, lib, params = relay.build(mod, "cuda", params=params)

(Optional) Serialize/deserialize the compiled model. The model will be serialized to three files: compiled.json, compiled.params, and compiled.tensorrt.

# Serialize
with open('compiled.json', 'w') as f_graph_json:
  f_graph_json.write(graph)
with open('compiled.params', 'wb') as f_params:
  f_params.write(relay.save_param_dict(params))
lib.save('compiled.tensorrt')

# Deserialize
with open('compiled.json', 'r') as f_graph_json:
  graph = f_graph_json.read()
with open('compiled.params', 'rb') as f_params:
  params = tvm.relay.load_param_dict(f_params.read())
lib = tvm.module.load("compiled.tensorrt")

Run inference. The first invocation will trigger creation of the TensorRT engine. This could take up to a few minutes.

# Create graph runtime
mod = graph_runtime.create(graph, lib, ctx=tvm.gpu(0))
mod.set_input(**params)

i_data = np.random.uniform(0, 1, input_shape).astype(dtype)
# Build TensorRT engine
mod.run(data=i_data)

# Run inference
mod.run(data=i_data)
res = mod.get_output(0)

The tests tests/python/relay/test_tensorrt.py provide some deeper examples of how to use this feature.

The NNVM/TRT integration is still present.

trevor-m · 2019-12-20T19:46:28Z

@zhiics Could you please review?

trevor-m · 2019-12-26T23:07:20Z

@yongwww @reminisce @ziyu-guo Could you please review?

cmake/modules/contrib/TensorRT.cmake

zhiics

Started reviewing. Will spend more time on it.

CC @comaniac who might be interested as well

include/tvm/attrs.h

python/tvm/relay/transform.py

src/relay/backend/contrib/tensorrt/trt_builder.cc

trevor-m · 2020-01-09T22:54:00Z

@zhiics Could you take a quick look at my changes to use the external codegen/runtime?

zhiics

Need more time to review as this is a very big PR. Please ping a few more ppl to take a look.

cmake/config.cmake

src/relay/pass/enable_tensorrt.cc

zhiics

Some more comments

src/runtime/contrib/tensorrt/tensorrt_module.cc

src/runtime/contrib/tensorrt/tensorrt_module.h

src/runtime/contrib/tensorrt/utils.h

tests/python/relay/test_tensorrt.py

Fix merge Fix merge and clean up logs Add BiasAdd, Concat, padding ceil mode, and clean up code Fix formatting and remove unused headers uncomment models Fix bug with variable input, clean up Don't split batch norm Move TRT execution to TrtExecutor Clean up Clean up Add paritioning Implement graph_runtime execution for Relay/TRT Fix bug in extern op Fix compilation Add EnableTrt pass to perform same modification as previous wholegraphannotator Renable NNVM TRT Remove SimplifyBatchnorm, add rules for converting ops Fix format, remove unused tests Enable multiple outputs Fix multiple outputs Fix activation lookup Fix no newline at eof Add license header. Add consistency test to models Add method to check TRT used. Improve comments Fix lint Add util to check TRT version Add if guards around TRT5.1 APIs Add env var for workspace size, fix logger fix build Add TRT versioning to EnableTrt pass Fix build error in DLR Fix compile for DLR Update dmlc-core, fix copyright header, undo change to includes Remove unused headers Fix IsTrtCompatible visitor and move op list to constructor Add dropout to compatible ops for CheckTrtCompatible only. Add not compatible test Add squeeze, transpose, reshape, pad, and reduce ops. Add transpose on weights workaround Fix formatting. Add unit tests Support transpose on weights for conv2d and dense. Support asymmetric padding. Temp fix for 1D inputs. Add units tests for all ops. Support StridedSlice, AdaptivePooling approximation, Pytorch addmm fixer pass Support (2,3,0,1) tranpose on weights Allow stride to be incomplete. Support ConstantNode -> kWeight Fix pass serialized graph by value in runtime. Allow inclusive count for strided pool Comments, disable failign test Fix CI lint Removed unused variables from TrtBuilder. Add more comments Fix build for TRT4 Add GetTrtVersion(), Move convert map to function, remove uneeded include, make batch_size_, logger_ TrtBuilder members, check output existence Use shared_ptr for converters. Add check for num outputs and inputs Support image.resize Make GetOpConverters return a shared_ptr Clarify count inclusive padding weirdness Use external codegen/runtime Move to src/runtime/contrib/tensorrt. Add Save and Load methods for tensorrt module. Rename some classes Require format to be tensorrt so that loader knows how to load FoldConstants Destroy engine and context after use. Store TRT weights from op converters. Formatting Always apply ConvertLayout to NCHW Clean up Add ASF header Change ObjectRef -> NodeRef Fix lint Fix pylint Fix bug with scalar weights Making TRT cmake more informative Make tensorrt tests dependent on whether trt codegen is enabled Add serialization test.

trevor-m · 2020-01-14T23:38:43Z

@comaniac @anijain2305 Could you please review? Thanks!

comaniac

Just took a quick pass. Will take another look later this week.
Except for the left comments, one issue I saw is the redundent functions such as GetShape and trt_version_ge defined in both Relay pass side and runtime side. This also brings anoher concern I have. Since TensorRT is a specific backend, it seems not a good practice to put its pass along with other Relay passes. It would be much better if all TRT related staffs can be put together.

In addition, please make up missing docstrings of C++ functions/classes with the formal format if time allows.

include/tvm/relay/transform.h

python/tvm/relay/transform.py

src/runtime/contrib/tensorrt/tensorrt_module.cc

apivovarov · 2020-01-23T22:41:46Z

src/relay/backend/contrib/tensorrt/codegen_tensorrt.cc

+      }
+    } else {
+      LOG(FATAL) << "The input ref is expected to be a Relay function or module"
+                 << "\n";


In all other places the log message ends with full stop

apivovarov · 2020-01-23T22:44:07Z