diff --git a/docs/en/faq.md b/docs/en/faq.md
index f1183e6413..a8d0f8a517 100644
--- a/docs/en/faq.md
+++ b/docs/en/faq.md
@@ -1 +1,30 @@
 ## Frequently Asked Questions
+
+### TensorRT
+
+- "WARNING: Half2 support requested on hardware without native FP16 support, performance will be negatively affected."
+
+  FP16 mode requires a device with full-rate FP16 support.
+
+- "error: parameter check failed at: engine.cpp::setBindingDimensions::1046, condition: profileMinDims.d[i] <= dimensions.d[i]"
+
+  When building an `ICudaEngine` from an `INetworkDefinition` that has dynamically resizable inputs, users need to specify at least one optimization profile, which can be set in the deploy config:
+
+  ```python
+  backend_config = dict(
+      common_config=dict(max_workspace_size=1 << 30),
+      model_inputs=[
+          dict(
+              input_shapes=dict(
+                  input=dict(
+                      min_shape=[1, 3, 320, 320],
+                      opt_shape=[1, 3, 800, 1344],
+                      max_shape=[1, 3, 1344, 1344])))
+      ])
+  ```
+
+  The shape of the input tensor must lie between `min_shape` and `max_shape`.
+
+- "error: [TensorRT] INTERNAL ERROR: Assertion failed: cublasStatus == CUBLAS_STATUS_SUCCESS"
+
+  TRT 7.2.1 switches to cuBLASLt (previously it was cuBLAS). cuBLASLt is the default choice for SM version >= 7.0. You may need CUDA-10.2 Patch 1 (released Aug 26, 2020) to resolve some cuBLASLt issues. Another option is to use the new TacticSource API and disable cuBLASLt tactics if you don't want to upgrade.
diff --git a/mmdeploy/backend/tensorrt/wrapper.py b/mmdeploy/backend/tensorrt/wrapper.py
index 0efd419c3c..9a0cd2e10f 100644
--- a/mmdeploy/backend/tensorrt/wrapper.py
+++ b/mmdeploy/backend/tensorrt/wrapper.py
@@ -88,7 +88,18 @@ def forward(self, inputs: Dict[str,
         assert self._output_names is not None
         bindings = [None] * (
             len(self._input_names) + len(self._output_names))
+        profile_id = 0
         for input_name, input_tensor in inputs.items():
+            # check if input shape is valid
+            profile = self.engine.get_profile_shape(profile_id, input_name)
+            assert input_tensor.dim() == len(
+                profile[0]), 'Input dim is different from engine profile.'
+            for s_min, s_input, s_max in zip(profile[0], input_tensor.shape,
+                                             profile[2]):
+                assert s_min <= s_input <= s_max, \
+                    'Input shape should be between ' \
+                    + f'{profile[0]} and {profile[2]}' \
+                    + f' but got {tuple(input_tensor.shape)}.'
             idx = self.engine.get_binding_index(input_name)
 
             # All input tensors must be gpu variables
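
For the optimization-profile FAQ entry above: the `min_shape`/`opt_shape`/`max_shape` triple in the deploy config ultimately becomes a TensorRT optimization profile. Below is a minimal sketch of that mapping using TensorRT's standard Python builder API; it is illustrative only, not mmdeploy's actual builder code, and `network` is assumed to be an already-constructed `INetworkDefinition` with one dynamic input named `input`.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()
config.max_workspace_size = 1 << 30  # mirrors common_config above

# One optimization profile per dynamically shaped input; the three
# shapes correspond to min_shape / opt_shape / max_shape.
profile = builder.create_optimization_profile()
profile.set_shape('input',
                  (1, 3, 320, 320),    # min_shape
                  (1, 3, 800, 1344),   # opt_shape
                  (1, 3, 1344, 1344))  # max_shape
config.add_optimization_profile(profile)

# engine = builder.build_engine(network, config)
```

At inference time, any shape passed to `IExecutionContext.set_binding_dimensions` must fall inside these bounds; a shape below `min_shape` is what produces the quoted `profileMinDims.d[i] <= dimensions.d[i]` failure.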
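
For the cuBLASLt FAQ entry: a sketch of the TacticSource workaround, assuming TensorRT >= 7.2 where `trt.TacticSource` and `IBuilderConfig.set_tactic_sources` are available. It builds a bitmask that keeps cuBLAS enabled while omitting cuBLASLt.

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

# Bitmask of enabled tactic sources: include CUBLAS, omit CUBLAS_LT,
# so the builder never selects cuBLASLt kernels.
config.set_tactic_sources(1 << int(trt.TacticSource.CUBLAS))
```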
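
The wrapper.py change can be exercised without a built engine. The sketch below hand-writes the `(min, opt, max)` tuple that `engine.get_profile_shape(0, 'input')` would return and replays the new assertions, so the "Input shape should be between ..." message can be seen firing; the `profile` value is a hypothetical stand-in, not a real engine query.

```python
import torch

# Stand-in for engine.get_profile_shape(0, 'input'): (min, opt, max).
profile = ((1, 3, 320, 320), (1, 3, 800, 1344), (1, 3, 1344, 1344))
input_tensor = torch.empty(1, 3, 256, 256)  # 256 < min of 320 -> rejected

assert input_tensor.dim() == len(profile[0]), \
    'Input dim is different from engine profile.'
try:
    for s_min, s_input, s_max in zip(profile[0], input_tensor.shape,
                                     profile[2]):
        assert s_min <= s_input <= s_max, \
            f'Input shape should be between {profile[0]} and {profile[2]}' \
            f' but got {tuple(input_tensor.shape)}.'
except AssertionError as e:
    print(e)  # Input shape should be between (1, 3, 320, 320) and ...
```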