
TensorRT doesn't accelerate #35

Closed
Faldict opened this issue Jul 21, 2018 · 19 comments
Labels
repro requested (Request more information about reproduction of issue) · triaged (Issue has been triaged by maintainers)

Comments

@Faldict commented Jul 21, 2018

Compared with the original model, the TensorRT engine takes more than twice as long per batch. Why doesn't it accelerate inference? The figure below shows the per-batch running time of the MXNet model and the TensorRT engine.

[Figure: per-batch running time, MXNet model vs. TensorRT engine]

Sometimes it also produces errors like these:

Cuda error in file src/implicit_gemm.cu at line 1214: invalid resource handle
[TensorRT] ERROR: customWinogradConvActLayer.cpp (308) - Cuda Error in execute: 33
[TensorRT] ERROR: customWinogradConvActLayer.cpp (308) - Cuda Error in execute: 33

It's very weird, and I don't know what happened.

@Faldict changed the title from "TensorRT doesn" to "TensorRT doesn't accelerate" on Jul 21, 2018
@yinghai commented Jul 24, 2018

Could you share the code showing how you run the test and measure the runtime? And which model did you use?

@Faldict (Author) commented Jul 25, 2018

@yinghai The experiments run on a PC with Ubuntu 16.04 and a GTX 1060 GPU, testing a LeNet model (trained by myself) and Inception-7 (downloaded from mxnet-model-gallery).

At first, I used your tensorrt_engine.

from tensorrt_engine import Engine

...

engine = Engine(trt_engine)
engine.run(data)

Unfortunately, it sometimes raised the CUDA errors described above. Then I followed the official TensorRT documentation:

# (device buffers d_input/d_output, the execution context and bindings
#  are created beforehand; omitted here)
stream = cuda.Stream()
cuda.memcpy_htod_async(d_input, data, stream)
context.enqueue(batch_size, bindings, stream.handle, None)
cuda.memcpy_dtoh_async(output, d_output, stream)
stream.synchronize()

The time cost is measured manually with time.time().
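
Roughly, the measurement can be structured like the sketch below. It is a minimal version, not the exact benchmark code: the buffer names, warm-up and iteration count are illustrative, and it assumes a deserialized engine's execution context with a single input and a single output binding, using the same legacy context.enqueue call as above.

import time

import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda

def time_inference(context, h_input, h_output, batch_size, iterations=100):
    """Return the average seconds per batch for a legacy-API execution context."""
    # Device buffers sized to the host arrays (pinned host memory via
    # cuda.pagelocked_empty would make the async copies truly asynchronous).
    d_input = cuda.mem_alloc(h_input.nbytes)
    d_output = cuda.mem_alloc(h_output.nbytes)
    bindings = [int(d_input), int(d_output)]
    stream = cuda.Stream()

    # One warm-up pass so lazy initialization is not counted.
    cuda.memcpy_htod_async(d_input, h_input, stream)
    context.enqueue(batch_size, bindings, stream.handle, None)
    stream.synchronize()

    start = time.time()
    for _ in range(iterations):
        cuda.memcpy_htod_async(d_input, h_input, stream)
        context.enqueue(batch_size, bindings, stream.handle, None)
        cuda.memcpy_dtoh_async(h_output, d_output, stream)
    stream.synchronize()  # wait for all queued work before stopping the clock
    return (time.time() - start) / iterations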

@Faldict (Author) commented Jul 26, 2018

This problem is quite similar to #32.

@benbarsdell (Contributor)

What max_batch_size are you specifying? TensorRT performance will be best when batch_size = max_batch_size.
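
For illustration, max_batch_size is fixed when the engine is built; a minimal sketch using the newer (TensorRT 5+ style) Python ONNX parser API is shown below. It is not the exact code path used in this thread, and the model path is a placeholder.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def build_engine(onnx_path, max_batch_size=32):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()            # implicit-batch network
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))
    # The engine is optimized for max_batch_size; smaller batches still run,
    # but per-sample throughput is usually best at batch_size == max_batch_size.
    builder.max_batch_size = max_batch_size
    builder.max_workspace_size = 1 << 30          # 1 GiB of builder scratch space
    return builder.build_cuda_engine(network)

# engine = build_engine('model/Inception-7.onnx', max_batch_size=32)  # placeholder path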

@Faldict (Author) commented Aug 8, 2018

@benbarsdell Thanks for your reply. I tried with batch_size = max_batch_size = 32, but it was still slower than the original MXNet model. What else can I do?

@cliffwoolley

We should probably separate the error message from the performance problem. I suggest we get the error condition sorted out first.

Cuda error in file src/implicit_gemm.cu at line 1214: invalid resource handle
[TensorRT] ERROR: customWinogradConvActLayer.cpp (308) - Cuda Error in execute: 33

So that's from cuDNN. Exactly which version of cuDNN is this? And while we're at it, which CUDA and TensorRT versions?

@Faldict (Author) commented Aug 8, 2018

@cliffwoolley I used cuDNN 7, CUDA 9.0, and TensorRT 4.0. All of them were downloaded from NVIDIA's official website and installed following the instructions. In addition, I built both MXNet and onnx-tensorrt from source.

@cliffwoolley

With apologies, can you say exactly which cuDNN version? There have been around ten different released versions numbered like 7.x.y.

@Faldict (Author) commented Aug 9, 2018

@cliffwoolley Sorry, I just assumed it didn't matter... I rechecked my cuDNN version by typing

cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2

and it prints:

#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 4
--
#define CUDNN_VERSION    (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)

#include "driver_types.h"

So it seems that the cuDNN version is 7.0.4, right?

@cliffwoolley

If you're able to try one of the cuDNN 7.1 or 7.2 builds -- and if that doesn't already fix the problem for you -- then we should be able to use the API logging feature that was added in cuDNN 7.1 to chase down where the problem is happening.

Thanks,
Cliff
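
For reference, the cuDNN API logging mentioned above (available since 7.1) is enabled through environment variables that must be set before cuDNN initializes. A minimal sketch of turning it on from Python, with an arbitrary log file name:

import os

# Must be set before cuDNN is loaded, i.e. before importing/initializing
# TensorRT or MXNet in the same process.
os.environ['CUDNN_LOGINFO_DBG'] = '1'              # enable API logging
os.environ['CUDNN_LOGDEST_DBG'] = 'cudnn_api.log'  # or 'stdout' / 'stderr'

import tensorrt as trt  # imported only after the variables are set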

@Faldict (Author) commented Aug 9, 2018

@cliffwoolley It works fine after I upgraded to cuDNN 7.2.1... at least so far. But the memory cost of running the tensorrt_engine is too large, which is another reason for me to use TensorRT directly. Here is a simplified version of my code that uses the tensorrt_engine. Could you please take a look and point out where I can optimize it?

import numpy as np
import tensorrt as trt
from tensorrt_engine import Engine

G_LOGGER = trt.infer.ConsoleLogger(trt.infer.LogSeverity.INFO)
trt_engine = trt.utils.load_engine(G_LOGGER, 'model/Inception-7.trt')
engine = Engine(trt_engine)
batch_size = 16

data = np.random.normal(0, 1, size=(batch_size, 3, 299, 299)).astype('float32')
output = engine.run([data])
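
For what it's worth, the memory footprint can be quantified by sampling free and total device memory around the engine load. A minimal sketch with pycuda (the helper name and call sites are illustrative):

import pycuda.autoinit  # creates a CUDA context
import pycuda.driver as cuda

def report_gpu_memory(tag):
    free, total = cuda.mem_get_info()
    print('%s: %.0f MiB used of %.0f MiB' % (tag, (total - free) / 2**20, total / 2**20))

report_gpu_memory('before engine load')
# ... load the engine and run a batch here ...
report_gpu_memory('after engine load')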

Many thanks!

@cliffwoolley

@benbarsdell Any further advice you can offer here?

@cliffwoolley

@Faldict -- It's a bit of an aside, but apache/mxnet#11325 was merged to the MXNet master branch today. It uses onnx-tensorrt on your behalf under the hood. I wonder if you have a better experience using that higher-level interface?
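
For anyone trying that route, a heavily hedged sketch of the contrib interface is below. The specific names (the MXNET_USE_TENSORRT environment variable, mx.contrib.tensorrt.tensorrt_bind and its arguments) are assumptions about the API added by apache/mxnet#11325 and may not match the merged version exactly; shapes and checkpoint names are placeholders.

import os

import mxnet as mx
import numpy as np

# Assumption: the integration is toggled by this environment variable.
os.environ['MXNET_USE_TENSORRT'] = '1'

batch_shape = (16, 3, 299, 299)
sym, arg_params, aux_params = mx.model.load_checkpoint('Inception-7', 0)
all_params = {k: v.as_in_context(mx.gpu(0))
              for k, v in {**arg_params, **aux_params}.items()}

# Assumption: tensorrt_bind returns an ordinary MXNet Executor whose graph has
# been partitioned to run supported subgraphs through TensorRT.
executor = mx.contrib.tensorrt.tensorrt_bind(sym, ctx=mx.gpu(0),
                                             all_params=all_params,
                                             data=batch_shape,
                                             grad_req='null',
                                             force_rebind=True)

data = mx.nd.array(np.random.normal(0, 1, size=batch_shape), ctx=mx.gpu(0))
output = executor.forward(is_train=False, data=data)[0]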

@Faldict (Author) commented Aug 11, 2018

@cliffwoolley I have been following that PR since last month. Now that it is merged, I'll try it.

@liuchang8am

Any updates? Same issue here: TensorRT does not accelerate ONNX models (converted from PyTorch).

@kobewangSky

Try saving the engine to a .trt file and loading it again.
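
For reference, a minimal sketch of saving a built engine to a plan file and loading it back, written against the newer (TensorRT 5+ style) Python API rather than the legacy trt.utils helpers used earlier in this thread; paths are placeholders.

import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.INFO)

def save_engine(engine, path):
    # ICudaEngine.serialize() returns a host buffer that can be written directly.
    with open(path, 'wb') as f:
        f.write(engine.serialize())

def load_engine(path):
    with open(path, 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

# save_engine(engine, 'model/Inception-7.trt')
# engine = load_engine('model/Inception-7.trt')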

@kevinch-nv added the "repro requested" and "triaged" labels on Oct 25, 2020
@kevinch-nv (Collaborator)

Does anyone have a repro for this issue with the latest version of TensorRT (7.2)?

@kevinch-nv (Collaborator)

Closing due to inactivity - if you are still having issues with the latest version of onnx-tensorrt feel free to open a new issue.

@ttanzhiqiang
