
TensorRT doesn't accelerate #35

Closed
Faldict opened this issue Jul 21, 2018 · 19 comments
Labels
repro requested (Request more information about reproduction of issue) · triaged (Issue has been triaged by maintainers)

Comments

@Faldict commented Jul 21, 2018

Compared with the original model, the TensorRT engine takes more than twice as long per batch. Why doesn't it accelerate inference? The figure below shows the per-batch running time of the MXNet model and the TensorRT engine.

[Figure: per-batch running time, MXNet model vs. TensorRT engine]

Sometimes it also produces errors like these:

Cuda error in file src/implicit_gemm.cu at line 1214: invalid resource handle
[TensorRT] ERROR: customWinogradConvActLayer.cpp (308) - Cuda Error in execute: 33
[TensorRT] ERROR: customWinogradConvActLayer.cpp (308) - Cuda Error in execute: 33

It's very weird, and I don't know what happened.

@Faldict changed the title from "TensorRT doesn" to "TensorRT doesn't accelerate" on Jul 21, 2018
@yinghai commented Jul 24, 2018

Could you share the code showing how you run the test and measure the runtime? And which model did you use?

@Faldict (Author) commented Jul 25, 2018

@yinghai The experiments run on a PC with Ubuntu 16.04 and a GTX 1060 GPU, testing a LeNet model (trained by myself) and Inception-7 (downloaded from mxnet-model-gallery).

At first, I used your tensorrt_engine.

from tensorrt_engine import Engine

...

engine = Engine(trt_engine)
engine.run(data)

Unfortunately, it sometimes raised the CUDA errors described above. Then I followed the official TensorRT documentation:

# (device buffers d_input/d_output, the execution context and bindings
#  are created beforehand; omitted here)
stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, data, stream)
context.enqueue(batch_size, bindings, stream.handle, None)
cuda.memcpy_dtoh_async(output, d_output, stream)
stream.synchronize()

The time cost is measured manually with time.time().
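
Roughly, the measurement can be structured like the sketch below. It is a minimal version, not the exact benchmark code: the buffer names, warm-up and iteration count are illustrative, and it assumes a deserialized engine's execution context with a single input and a single output binding, using the same legacy context.enqueue call as above.

import time

import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda

def time_inference(context, h_input, h_output, batch_size, iterations=100):
    """Return the average seconds per batch for a legacy-API execution context."""
    # Device buffers sized to the host arrays (pinned host memory via
    # cuda.pagelocked_empty would make the async copies truly asynchronous).
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    bindings = [int(d_input), int(d_output)]
    stream = cuda.Stream()

    # One warm-up pass so lazy initialization is not counted.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.enqueue(batch_size, bindings, stream.handle, None)
    stream.synchronize()

    start = time.time()
    for _ in range(iterations):
        cuda.memcpy_htod_async(d_input, h_input, stream)
        context.enqueue(batch_size, bindings, stream.handle, None)
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()  # wait for all queued work before stopping the clock
    return (time.time() - start) / iterations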

@Faldict (Author) commented Jul 26, 2018

This problem is quite similar to #32.

@benbarsdell (Contributor)

What max_batch_size are you specifying? TensorRT performance will be best when batch_size = max_batch_size.
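
For illustration, max_batch_size is fixed when the engine is built; a minimal sketch using the newer (TensorRT 5+ style) Python ONNX parser API is shown below. It is not the exact code path used in this thread, and the model path is a placeholder.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, max_batch_size=32):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()            # implicit-batch network
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    # The engine is optimized for max_batch_size; smaller batches still run,
    # but per-sample throughput is usually best at batch_size == max_batch_size.
    builder.max_batch_size = max_batch_size
    builder.max_workspace_size = 1 << 30          # 1 GiB of builder scratch space
    return builder.build_cuda_engine(network)

# engine = build_engine('model/Inception-7.onnx', max_batch_size=32)  # placeholder path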

@Faldict (Author) commented Aug 8, 2018

@benbarsdell Thanks for your reply. I tried with batch_size = max_batch_size = 32, but it was still slower than the original MXNet model. What else can I do?

@cliffwoolley

We should probably separate the error message from the performance problem. I suggest we get the error condition sorted out first.

Cuda error in file src/implicit_gemm.cu at line 1214: invalid resource handle
[TensorRT] ERROR: customWinogradConvActLayer.cpp (308) - Cuda Error in execute: 33

So that's from cuDNN. Exactly which version of cuDNN is this? And while we're at it, which CUDA and TensorRT versions?

@Faldict (Author) commented Aug 8, 2018

@cliffwoolley I used cuDNN 7, CUDA 9.0, and TensorRT 4.0. All of them were downloaded from NVIDIA's official website and installed following the instructions. In addition, I built both MXNet and onnx-tensorrt from source.

@cliffwoolley

With apologies, can you say exactly which cuDNN version? There have been around ten different released versions numbered like 7.x.y.

@Faldict (Author) commented Aug 9, 2018

@cliffwoolley Sorry, I just assumed it didn't matter... I rechecked my cuDNN version by typing

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

and it prints:

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

So it seems that the cuDNN version is 7.0.4, right?

@cliffwoolley

If you're able to try one of the cuDNN 7.1 or 7.2 builds -- and if that doesn't already fix the problem for you -- then we should be able to use the API logging feature that was added in cuDNN 7.1 to chase down where the problem is happening.

Thanks,
Cliff
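
For reference, the cuDNN API logging mentioned above (available since 7.1) is enabled through environment variables that must be set before cuDNN initializes. A minimal sketch of turning it on from Python, with an arbitrary log file name:

import os

# Must be set before cuDNN is loaded, i.e. before importing/initializing
# TensorRT or MXNet in the same process.
os.environ['CUDNN_LOGINFO_DBG'] = '1'              # enable API logging
os.environ['CUDNN_LOGDEST_DBG'] = 'cudnn_api.log'  # or 'stdout' / 'stderr'

import tensorrt as trt  # imported only after the variables are set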

@Faldict (Author) commented Aug 9, 2018

@cliffwoolley It works fine after I upgraded to cuDNN 7.2.1... at least so far. But the memory cost of running the tensorrt_engine is too large, which is another reason for me to use TensorRT directly. Here is a simplified version of my code that uses the tensorrt_engine. Could you please take a look and point out where I can optimize it?

import numpy as np
import tensorrt as trt
from tensorrt_engine import Engine

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)
trt_engine = trt.utils.load_engine(G_LOGGER, 'model/Inception-7.trt')
engine = Engine(trt_engine)
batch_size = 16

data = np.random.normal(0, 1, size=(batch_size, 3, 299, 299)).astype('float32')
output = engine.run([data])
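
For what it's worth, the memory footprint can be quantified by sampling free and total device memory around the engine load. A minimal sketch with pycuda (the helper name and call sites are illustrative):

import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda

def report_gpu_memory(tag):
    free, total = cuda.mem_get_info()
    print('%s: %.0f MiB used of %.0f MiB' % (tag, (total - free) / 2**20, total / 2**20))

report_gpu_memory('before engine load')
# ... load the engine and run a batch here ...
report_gpu_memory('after engine load')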

Many thanks!

@cliffwoolley

@benbarsdell Any further advice you can offer here?

@cliffwoolley

@Faldict -- It's a bit of an aside, but apache/mxnet#11325 was merged to the MXNet master branch today. It uses onnx-tensorrt on your behalf under the hood. I wonder if you have a better experience using that higher-level interface?
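
For anyone trying that route, a heavily hedged sketch of the contrib interface is below. The specific names (the MXNET_USE_TENSORRT environment variable, mx.contrib.tensorrt.tensorrt_bind and its arguments) are assumptions about the API added by apache/mxnet#11325 and may not match the merged version exactly; shapes and checkpoint names are placeholders.

import os

import mxnet as mx
import numpy as np

# Assumption: the integration is toggled by this environment variable.
os.environ['MXNET_USE_TENSORRT'] = '1'

batch_shape = (16, 3, 299, 299)
sym, arg_params, aux_params = mx.model.load_checkpoint('Inception-7', 0)
all_params = {k: v.as_in_context(mx.gpu(0))
              for k, v in {**arg_params, **aux_params}.items()}

# Assumption: tensorrt_bind returns an ordinary MXNet Executor whose graph has
# been partitioned to run supported subgraphs through TensorRT.
executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0),
                                             all_params=all_params,
                                             data=batch_shape,
                                             grad_req='null',
                                             force_rebind=True)

data = mx.nd.array(np.random.normal(0, 1, size=batch_shape), ctx=mx.gpu(0))
output = executor.forward(is_train=False, data=data)[0]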

@Faldict (Author) commented Aug 11, 2018

@cliffwoolley I have been following that PR since last month. Now that it is merged, I'll try it.

@liuchang8am

Any updates? Same issue here: TensorRT does not accelerate ONNX models (converted from PyTorch).

@kobewangSky

Try saving the engine to a .trt file and loading it again.
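
For reference, a minimal sketch of saving a built engine to a plan file and loading it back, written against the newer (TensorRT 5+ style) Python API rather than the legacy trt.utils helpers used earlier in this thread; paths are placeholders.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def save_engine(engine, path):
    # ICudaEngine.serialize() returns a host buffer that can be written directly.
    with open(path, 'wb') as f:
        f.write(engine.serialize())

def load_engine(path):
    with open(path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

# save_engine(engine, 'model/Inception-7.trt')
# engine = load_engine('model/Inception-7.trt')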

@kevinch-nv added the "repro requested" and "triaged" labels on Oct 25, 2020
@kevinch-nv (Collaborator)

Does anyone have a repro for this issue with the latest version of TensorRT (7.2)?

@kevinch-nv (Collaborator)

Closing due to inactivity - if you are still having issues with the latest version of onnx-tensorrt feel free to open a new issue.

@ttanzhiqiang
