
Relay/TRT Integration (whole graph only) #54

Merged: 22 commits merged into neo-ai:dev from the trevmorr-trt branch on Jan 24, 2020

Conversation

@trevor-m commented Nov 5, 2019:

This PR adds support for a version of the Relay/TRT integration that only works when the entire model can be converted to TRT. It is enabled with the EnableTrt pass. If any op in the model cannot be converted to TRT, EnableTrt will return the original module unmodified.

How to use

  1. Build TVM with the cmake flag USE_TENSORRT=ON or USE_TENSORRT=/path/to/TensorRT. USE_CUDA should be enabled as well. For example:
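(Illustrative config.cmake settings; the TensorRT path is an example and should point at your install.)

set(USE_CUDA ON)
set(USE_TENSORRT /path/to/TensorRT)  # or set(USE_TENSORRT ON) to use the default search paths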

  2. Convert the model for TensorRT. This step determines whether every node in the graph can be converted to TensorRT and, if so, marks the graph to use TensorRT and applies some TensorRT-specific optimization passes.

from tvm import relay
import tvm.relay.tensorrt

mod = relay.tensorrt.EnableTrt(mod, params)
  3. Check whether TRT was enabled. If not, some op in the graph is not supported by the TensorRT conversion. EnableTrt will report which particular ops are not supported and why.
assert mod['main'].attrs and mod['main'].attrs.Compiler == 'tensorrt'
  4. Finish compilation.
with relay.build_config(opt_level=2, disabled_pass={"SimplifyInference"}):
  graph, lib, params = relay.build(mod, "cuda", params=params)
  5. (Optional) Serialize/deserialize the compiled model. The model will be serialized to three files: compiled.json, compiled.params, and compiled.tensorrt.
# Serialize
with open('compiled.json', 'w') as f_graph_json:
  f_graph_json.write(graph)
with open('compiled.params', 'wb') as f_params:
  f_params.write(relay.save_param_dict(params))
lib.save('compiled.tensorrt')

# Deserialize
with open('compiled.json', 'r') as f_graph_json:
  graph = f_graph_json.read()
with open('compiled.params', 'rb') as f_params:
  params = tvm.relay.load_param_dict(f_params.read())
lib = tvm.module.load("compiled.tensorrt")
  6. Run inference. The first invocation will trigger creation of the TensorRT engine, which can take up to a few minutes.
# Create graph runtime
from tvm.contrib import graph_runtime
import numpy as np

mod = graph_runtime.create(graph, lib, ctx=tvm.gpu(0))
mod.set_input(**params)

i_data = np.random.uniform(0, 1, input_shape).astype(dtype)
# Build TensorRT engine
mod.run(data=i_data)

# Run inference
mod.run(data=i_data)
res = mod.get_output(0)

The tests in tests/python/relay/test_tensorrt.py provide deeper examples of how to use this feature.

The NNVM/TRT integration is still present.

@trevor-m trevor-m force-pushed the trevmorr-trt branch 2 times, most recently from 80d223c to 6103470 Compare November 25, 2019 23:07
@trevor-m trevor-m force-pushed the trevmorr-trt branch 3 times, most recently from 92341d5 to edd68b2 Compare December 9, 2019 23:57
@trevor-m trevor-m changed the title [WIP] Relay/TRT Integration (whole graph only) Relay/TRT Integration (whole graph only) Dec 20, 2019
@trevor-m (Author):
@zhiics Could you please review?

@trevor-m (Author):
@yongwww @reminisce @ziyu-guo Could you please review?

@zhiics left a comment:

Started reviewing. Will spend more time on it.

CC @comaniac who might be interested as well

Resolved review threads (outdated): include/tvm/attrs.h, python/tvm/relay/transform.py, src/relay/backend/contrib/tensorrt/trt_builder.cc (×6)
@trevor-m trevor-m force-pushed the trevmorr-trt branch 2 times, most recently from 677457f to afc2731 Compare January 9, 2020 18:19
@trevor-m (Author) commented Jan 9, 2020:

@zhiics Could you take a quick look at my changes to use the external codegen/runtime?

@zhiics left a comment:

Need more time to review as this is a very big PR. Please ping a few more people to take a look.

Resolved review threads (outdated): cmake/config.cmake, src/relay/pass/enable_tensorrt.cc (×4)
@zhiics left a comment:

Some more comments

Resolved review threads (outdated): src/runtime/contrib/tensorrt/tensorrt_module.h, src/runtime/contrib/tensorrt/utils.h (×2), tests/python/relay/test_tensorrt.py
Trevor Morris added 3 commits January 14, 2020 19:07
Fix merge

Fix merge and clean up logs

Add BiasAdd, Concat, padding ceil mode, and clean up code

Fix formatting and remove unused headers

uncomment models

Fix bug with variable input, clean up

Don't split batch norm

Move TRT execution to TrtExecutor

Clean up

Clean up

Add paritioning

Implement graph_runtime execution for Relay/TRT

Fix bug in extern op

Fix compilation

Add EnableTrt pass to perform same modification as previous wholegraphannotator

Renable NNVM TRT

Remove SimplifyBatchnorm, add rules for converting ops

Fix format, remove unused tests

Enable multiple outputs

Fix multiple outputs

Fix activation lookup

Fix no newline at eof

Add license header. Add consistency test to models

Add method to check TRT used. Improve comments

Fix lint

Add util to check TRT version

Add if guards around TRT5.1 APIs

Add env var for workspace size, fix logger

fix build

Add TRT versioning to EnableTrt pass

Fix build error in DLR

Fix compile for DLR

Update dmlc-core, fix copyright header, undo change to includes

Remove unused headers

Fix IsTrtCompatible visitor and move op list to constructor

Add dropout to compatible ops for CheckTrtCompatible only. Add not compatible test

Add squeeze, transpose, reshape, pad, and reduce ops. Add transpose on weights workaround

Fix formatting. Add unit tests

Support transpose on weights for conv2d and dense. Support asymmetric padding. Temp fix for 1D inputs. Add units tests for all ops.

Support StridedSlice, AdaptivePooling approximation, Pytorch addmm fixer pass

Support (2,3,0,1) tranpose on weights

Allow stride to be incomplete. Support ConstantNode -> kWeight

Fix pass serialized graph by value in runtime. Allow inclusive count for strided pool

Comments, disable failign test

Fix CI lint

Removed unused variables from TrtBuilder. Add more comments

Fix build for TRT4

Add GetTrtVersion(), Move convert map to function, remove uneeded include,  make batch_size_, logger_ TrtBuilder members, check output existence

Use shared_ptr for converters. Add check for num outputs and inputs

Support image.resize

Make GetOpConverters return a shared_ptr

Clarify count inclusive padding weirdness

Use external codegen/runtime

Move to src/runtime/contrib/tensorrt. Add Save and Load methods for tensorrt module. Rename some classes

Require format to be tensorrt so that loader knows how to load

FoldConstants

Destroy engine and context after use. Store TRT weights from op converters. Formatting

Always apply ConvertLayout to NCHW

Clean up

Add ASF header

Change ObjectRef -> NodeRef

Fix lint

Fix pylint

Fix bug with scalar weights

Making TRT cmake more informative

Make tensorrt tests dependent on whether trt codegen is enabled

Add serialization test.
@trevor-m (Author):
@comaniac @anijain2305 Could you please review? Thanks!

@comaniac left a comment:

Just took a quick pass. Will take another look later this week.
Except for the comments I left, one issue I saw is the redundant functions such as GetShape and trt_version_ge defined on both the Relay pass side and the runtime side. This also brings up another concern I have: since TensorRT is a specific backend, it seems not a good practice to put its pass along with the other Relay passes. It would be much better if all TRT-related stuff could be put together.

In addition, please add the missing docstrings for C++ functions/classes in the formal format if time allows.

Resolved review threads (outdated): include/tvm/relay/transform.h, python/tvm/relay/transform.py (×4)
}
} else {
LOG(FATAL) << "The input ref is expected to be a Relay function or module"
<< "\n";
Member comment:

In all other places the log message ends with a full stop.

bool BiasAddOpChecker(const CallNode* call, const std::string& op_name,
const std::tuple<int, int, int>& trt_version) {
auto shape0 = GetShape(call->type_args[0]);
if (shape0.size() < 2 || shape0.size() > 4) {
Member comment:

auto shape0_size = GetShape(call->type_args[0]).size();

LOG(INFO) << op_name << " not supported: pad mode must be constant.";
return false;
} else if (attrs->pad_value != 0.0) {
LOG(INFO) << op_name << " not supported: pad value must be zero.";
Member comment:

In both of the messages above I'd output the current pad_mode and pad_value:

LOG(INFO) << op_name << " not supported: pad mode is " << attrs->pad_mode << " but must be constant.";

LOG(INFO) << op_name << " not supported: pad value is " << attrs->pad_value << " but must be 0.0.";

Author reply:

Thanks, I went ahead and included current values for the rest of the ops' messages as well.

// This workaround isn't required anymore since ConstantFolding will take
// care of the transpose for us. However, in the case where the weights
// aren't marked as params it can still be useful.
if ((op_name == "nn.conv2d" || op_name == "nn.dense") && i == 1) {
Member comment:

I'd put the i == 1 expression to the left of the && because it can prevent unnecessary string comparisons.
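For instance, the same condition from the excerpt above, reordered so the cheap integer test short-circuits before the string comparisons:

if (i == 1 && (op_name == "nn.conv2d" || op_name == "nn.dense")) {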

auto it = trt_compatible_ops.find(op_name);
if (it == trt_compatible_ops.end() ||
!it->second(call, op_name, trt_version_)) {
LOG(INFO) << op_name << " not supported.";
Member comment:

Should we say something about trt_version_ here?

Author reply:

Added a message to TrtVersionChecker, and I'm now calling TrtVersionChecker from the other checkers when applicable.

}

private:
bool compatible_;
Member comment:

Looks like compatible_ is not initialized after the object is constructed. It will be initialized only after the user of the object calls Check.

Author reply:

Initializing it in the constructor now.
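A minimal sketch of that fix (the class name here is illustrative, not the one from the PR; only the member-initializer change is the point):

class TrtCompatChecker {
 public:
  TrtCompatChecker() : compatible_(false) {}  // defined value even before Check() is called

  bool Check(/* expr to check */) {
    compatible_ = true;  // set during the actual compatibility walk
    return compatible_;
  }

 private:
  bool compatible_;
};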

Array<Var> func_params;
for (auto param : func->params) {
func_params.push_back(param);
}
Member comment:

Can you use constructor(vector) or constructor(begin, end) to create and fill func_params here?
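For example, assuming tvm::Array exposes the usual iterator-range constructor (an assumption the reviewer's suggestion implies):

Array<Var> func_params(func->params.begin(), func->params.end());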

op->checked_type_);

original_inputs_.push_back({var, GetRef<Expr>(op)});
return std::move(var);
Member comment:

The call to std::move precludes the NRVO compiler optimization here.
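That is, return the local directly and let the compiler elide the copy/move:

return var;  // instead of: return std::move(var);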

return
if not relay.tensorrt.IsTrtRuntimeAvailable():
print("skip because tensorrt runtime is not available")
return
Member comment:

The two checks above repeat 5 times in this module. Should we have a utility method for this?
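A possible shape for such a helper (a sketch; the function name is illustrative, and the CUDA check is a guess since the first of the two checks is cut off in the excerpt, with tvm and relay assumed imported by the test module):

def skip_test():
    """Return True if the TensorRT tests cannot run in this environment."""
    if not tvm.gpu(0).exist:  # assumed first check; not visible in the excerpt
        print("skip because cuda is not available")
        return True
    if not relay.tensorrt.IsTrtRuntimeAvailable():
        print("skip because tensorrt runtime is not available")
        return True
    return False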

lib = tvm.module.load("compiled.tensorrt")
# Run
mod = graph_runtime.create(graph, lib, ctx=tvm.gpu(0))
mod.set_input(**params)
Member comment:

params=bytearray(f_params.read())
...
mod.load_params(params)
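Spelled out, the suggestion might look like this (a sketch; load_params on the graph runtime module takes the serialized parameter blob directly):

with open('compiled.params', 'rb') as f_params:
    params_bytes = bytearray(f_params.read())
mod = graph_runtime.create(graph, lib, ctx=tvm.gpu(0))
mod.load_params(params_bytes)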

# Run reference
mod = relay.Module()
mod['main'] = f
with relay.build_config(opt_level=1):
Member comment:

Should we use opt_level=3 here?

for (int w = 0; w < s; w++) {
const int input_index = (x) + (y * k) + (z * s * c * k) + (w * c * k);
const int output_index =
(y * k * r * s) + (x * r * s) + (z * s) + (w);
Member comment:

Looks like this is the only line that differs between TransposeRSCKtoCKRS and TransposeRSCKtoKCRS. Maybe define two functions to compute output_index for CKRS and KCRS, and use a generic implementation plus the corresponding function to get one behavior or the other?

Author reply:

Thanks, I created TransposeWeights4D which takes input and output strides as arguments.
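A sketch of what such a generic routine could look like (the signature and element type are assumptions; the PR's actual function operates on the TRT weight buffers):

// Copy a 4-D weight tensor, remapping each element from input strides to output strides.
void TransposeWeights4D(const int dims[4], const int in_strides[4],
                        const int out_strides[4], const float* in, float* out) {
  for (int i = 0; i < dims[0]; ++i) {
    for (int j = 0; j < dims[1]; ++j) {
      for (int k = 0; k < dims[2]; ++k) {
        for (int l = 0; l < dims[3]; ++l) {
          const int in_index = i * in_strides[0] + j * in_strides[1] +
                               k * in_strides[2] + l * in_strides[3];
          const int out_index = i * out_strides[0] + j * out_strides[1] +
                                k * out_strides[2] + l * out_strides[3];
          out[out_index] = in[in_index];
        }
      }
    }
  }
}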

}
} else {
LOG(FATAL) << "TRT requires a constant input here.";
}
@apivovarov commented Jan 24, 2020:

Let's remove the "TRT requires a constant input here." repetition here, e.g.

        CallNode* transpose = call->args[i].as<CallNode>();
        VarNode* weights;
        if (transpose != nullptr
            && transpose->op.as<OpNode>()->name == "transpose"
            && (weights = transpose->args[0].as<VarNode>()) != nullptr) {
          GetInputAsTransposedWeights(transpose, weights);
        } else {
          LOG(FATAL) << "TRT requires a constant input here.";
        }

Author reply:

Thanks, I was wondering how to clean that up.

for (size_t i = 0; i < call->args.size(); ++i) {
// Handle special case where input must be constant array on CPU.
if (!converter->variable_input_count &&
converter->input_types[i] == kWeight) {
@apivovarov commented Jan 24, 2020:

We can make this "global" if-statement take just 5 lines and shift all the code in the loop to the left:

    if (converter->variable_input_count ||
        converter->input_types[i] != kWeight) {
      VisitExpr(call->args[i]);
      continue;
    }


class LegalizeLayoutTranform(ExprMutator):
"""
Legalize Relay layout transforms to transpose ops to simplify TensorRT conversion.
Member comment:

Do we still need to use Legalize and relay.transpose? Can we use layout transform to convert the source Relay graph to what TensorRT expects?

@trevor-m replied Jan 24, 2020:

I think it's better to leverage Relay's pass machinery to convert the layout_transform op into the more standard transpose ops. This way we only need to write one TrtOpConverter, for transpose. If we didn't perform this legalization, I would need to write an additional TrtOpConverter for layout_transform which would be nearly identical to the one for transpose. (A sketch of the idea follows at the end of this reply.)

This feature of Relay is very useful. For example, TRT recently announced that they won't support INT8 for matmul/fully connected layers and they want everyone to just use 1x1 convolutions instead (https://docs.nvidia.com/deeplearning/sdk/tensorrt-best-practices/index.html#optimize-layer). So in the future, I plan to add a similar pass to convert all matmul/dense layers into convolutions to take advantage of this. At that point I won't need a converter for dense anymore, since everything would go through conv.
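For reference, a minimal sketch of the legalization described above (a hypothetical stand-in for the PR's LegalizeLayoutTranform; only two layout pairs are shown, and the exact attribute handling is assumed):

from tvm import relay
from tvm.relay.expr_functor import ExprMutator

class LegalizeLayoutTranform(ExprMutator):
    """Rewrite layout_transform ops into equivalent relay.transpose ops."""
    def visit_call(self, call):
        new_call = super().visit_call(call)
        if call.op == relay.op.get("layout_transform"):
            src, dst = call.attrs.src_layout, call.attrs.dst_layout
            if src == "NCHW" and dst == "NHWC":
                return relay.transpose(new_call.args[0], axes=[0, 2, 3, 1])
            if src == "NHWC" and dst == "NCHW":
                return relay.transpose(new_call.args[0], axes=[0, 3, 1, 2])
        return new_call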

@trevor-m trevor-m merged commit ea78f1d into neo-ai:dev Jan 24, 2020
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Feb 28, 2020
* Add tensorrt backend. (Squashed commit message; the individual commit messages repeat the commit list above.)

* Refactor EnableTRT checkers

* Fix const weight detection

* remove tensorrt_module.h, add test for multiple outputs. Use normal GetShape. Remove GetType. Add flag for additional model testing

Undo add comments to prevent conflicts

* Separate TRT from relay. Add docstrings and more comments. Move all passes to python. Remove double lookup for Execute

Formatting

Fix lint

Fix pylint

Rename codegen_tensorrt. Check registry get. Add comments

Make trt codegen off by default.

* disable for ci

* TRT codegen can be turned on independently

* Fix tests

* Fix build without runtime

* Enable AvgPool approximation

* Remove change to cmake config

* Move passes to PreprocessForTrt. Use op.name. Rename LegalizeLayoutTransform.

* Add newlin to EOF. Remove else. Reserve space for vectors

* Remove AdaptivePool2D commentted out code. Add comment for transposed weight workaround

* Rename IsCompatibleFn

* Use ++i instead of i++

* Improve incompatible messages, use string::empty, small improvements

* Use constructor to fill func_params

* Remove std::move

* Use opt level 3, add helper to check whether to run test, improve load_params

* Replace TransposeRSCKtoCKRS/KCRS with TransposeWeights4D

* Clean up VisitExpr(CallNode) for args
trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Mar 16, 2020 (same squashed commit message as above)

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Mar 25, 2020 (same squashed commit message as above)

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 10, 2020 (same squashed commit message as above)

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020 (same squashed commit message as above)

trevor-m pushed a commit to trevor-m/tvm that referenced this pull request Apr 16, 2020 (same squashed commit message as above)