
Error in converting onnx to tensorrt #414

Closed
wwdok opened this issue Dec 3, 2020 · 13 comments
Assignees

@wwdok
Contributor

wwdok commented Dec 3, 2020

Hi there, I followed this tutorial to export tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth to ONNX format (this is the exported model). Then I tried using trtexec to convert it to a TensorRT engine, but it reports an error:

(base) weidawang@weidawang-TUF-Gaming-FX506LU-FX506LU:~/app/TensorRT-7.2.1.6/bin$ ./trtexec --onnx=tmp.onnx --saveEngine=tmp.engine
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=tmp.onnx --saveEngine=tmp.engine
[12/03/2020-15:26:48] [I] === Model Options ===
[12/03/2020-15:26:48] [I] Format: ONNX
[12/03/2020-15:26:48] [I] Model: tmp.onnx
[12/03/2020-15:26:48] [I] Output:
[12/03/2020-15:26:48] [I] === Build Options ===
[12/03/2020-15:26:48] [I] Max batch: explicit
[12/03/2020-15:26:48] [I] Workspace: 16 MiB
[12/03/2020-15:26:48] [I] minTiming: 1
[12/03/2020-15:26:48] [I] avgTiming: 8
[12/03/2020-15:26:48] [I] Precision: FP32
[12/03/2020-15:26:48] [I] Calibration: 
[12/03/2020-15:26:48] [I] Refit: Disabled
[12/03/2020-15:26:48] [I] Safe mode: Disabled
[12/03/2020-15:26:48] [I] Save engine: new-tmp.engine
[12/03/2020-15:26:48] [I] Load engine: 
[12/03/2020-15:26:48] [I] Builder Cache: Enabled
[12/03/2020-15:26:48] [I] NVTX verbosity: 0
[12/03/2020-15:26:48] [I] Tactic sources: Using default tactic sources
[12/03/2020-15:26:48] [I] Input(s)s format: fp32:CHW
[12/03/2020-15:26:48] [I] Output(s)s format: fp32:CHW
[12/03/2020-15:26:48] [I] Input build shapes: model
[12/03/2020-15:26:48] [I] Input calibration shapes: model
[12/03/2020-15:26:48] [I] === System Options ===
[12/03/2020-15:26:48] [I] Device: 0
[12/03/2020-15:26:48] [I] DLACore: 
[12/03/2020-15:26:48] [I] Plugins:
[12/03/2020-15:26:48] [I] === Inference Options ===
[12/03/2020-15:26:48] [I] Batch: Explicit
[12/03/2020-15:26:48] [I] Input inference shapes: model
[12/03/2020-15:26:48] [I] Iterations: 10
[12/03/2020-15:26:48] [I] Duration: 3s (+ 200ms warm up)
[12/03/2020-15:26:48] [I] Sleep time: 0ms
[12/03/2020-15:26:48] [I] Streams: 1
[12/03/2020-15:26:48] [I] ExposeDMA: Disabled
[12/03/2020-15:26:48] [I] Data transfers: Enabled
[12/03/2020-15:26:48] [I] Spin-wait: Disabled
[12/03/2020-15:26:48] [I] Multithreading: Disabled
[12/03/2020-15:26:48] [I] CUDA Graph: Disabled
[12/03/2020-15:26:48] [I] Separate profiling: Disabled
[12/03/2020-15:26:48] [I] Skip inference: Disabled
[12/03/2020-15:26:48] [I] Inputs:
[12/03/2020-15:26:48] [I] === Reporting Options ===
[12/03/2020-15:26:48] [I] Verbose: Disabled
[12/03/2020-15:26:48] [I] Averages: 10 inferences
[12/03/2020-15:26:48] [I] Percentile: 99
[12/03/2020-15:26:48] [I] Dump refittable layers:Disabled
[12/03/2020-15:26:48] [I] Dump output: Disabled
[12/03/2020-15:26:48] [I] Profile: Disabled
[12/03/2020-15:26:48] [I] Export timing to JSON file: 
[12/03/2020-15:26:48] [I] Export output to JSON file: 
[12/03/2020-15:26:48] [I] Export profile to JSON file: 
[12/03/2020-15:26:48] [I] 
[12/03/2020-15:26:48] [I] === Device Information ===
[12/03/2020-15:26:48] [I] Selected Device: GeForce GTX 1660 Ti
[12/03/2020-15:26:48] [I] Compute Capability: 7.5
[12/03/2020-15:26:48] [I] SMs: 24
[12/03/2020-15:26:48] [I] Compute Clock Rate: 1.59 GHz
[12/03/2020-15:26:48] [I] Device Global Memory: 5944 MiB
[12/03/2020-15:26:48] [I] Shared Memory per SM: 64 KiB
[12/03/2020-15:26:48] [I] Memory Bus Width: 192 bits (ECC disabled)
[12/03/2020-15:26:48] [I] Memory Clock Rate: 6.001 GHz
[12/03/2020-15:26:48] [I] 
----------------------------------------------------------------
Input filename:   new-tmp.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.7
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[12/03/2020-15:26:49] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
ERROR: builtin_op_importers.cpp:1601 In function importIf:
[8] Assertion failed: cond.is_weights() && cond.weights().count() == 1 && "If condition must be a initializer!"
[12/03/2020-15:26:49] [E] Failed to parse onnx file
[12/03/2020-15:26:49] [E] Parsing model failed
[12/03/2020-15:26:49] [E] Engine creation failed
[12/03/2020-15:26:49] [E] Engine set up failed
&&&& FAILED TensorRT.trtexec # ./trtexec --onnx=tmp.onnx --saveEngine=tmp.engine
@SuX97
Collaborator

SuX97 commented Dec 3, 2020

Did the ONNX conversion throw any error? Make sure the --verify flag is on.

@wwdok
Contributor Author

wwdok commented Dec 3, 2020

This is the command I used:
python tools/pytorch2onnx.py ./configs/recognition/tsn/tsn_r50_video_inference_1x1x3_100e_kinetics400_rgb.py ./checkpoints/tsn_r50_1x1x3_100e_kinetics400_rgb_20200614-e508be42.pth --shape 1 1 3 224 224 --verify
Its output is:

Successfully exported ONNX model: tmp.onnx
2020-12-02 17:35:38.198641608 [W:onnxruntime:, graph.cc:1031 Graph] Initializer 555 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.

...

2020-12-02 17:35:38.216127122 [W:onnxruntime:, graph.cc:1031 Graph] Initializer 715 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2020-12-02 17:35:38.216142937 [W:onnxruntime:, graph.cc:1031 Graph] Initializer cls_head.fc_cls.bias appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2020-12-02 17:35:38.216159417 [W:onnxruntime:, graph.cc:1031 Graph] Initializer cls_head.fc_cls.weight appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
The numerical values are same between Pytorch and ONNX
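The warnings above come from initializers (stored weights such as cls_head.fc_cls.bias) that are also listed as graph inputs, which blocks optimizations like constant folding; onnxruntime's remove_initializer_from_input.py drops them from the input list. A schematic, stdlib-only sketch of that effect on plain name lists (illustrative helper, not the actual onnx API):

```python
def prune_initializer_inputs(graph_inputs, initializers):
    """Keep only the true runtime inputs: any name that also appears as
    an initializer is dropped from the input list so backends can treat
    it as a constant and apply graph optimizations such as const folding."""
    initializer_names = set(initializers)
    return [name for name in graph_inputs if name not in initializer_names]
```

With the real onnx package, the same pruning is done on model.graph.input against model.graph.initializer, then the model is saved back to disk.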

@SuX97
Collaborator

SuX97 commented Dec 3, 2020

It seems that it works. Please raise this issue in the TensorRT repo.

@SuX97
Collaborator

SuX97 commented Dec 3, 2020

Some ops may not be supported by TensorRT. For example, squeeze in PyTorch 1.7 generates an If branch in ONNX, which TensorRT does not support. Some suggest downgrading PyTorch before exporting to ONNX. Please file this issue with TensorRT.
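The reason squeeze produces an If node: in eager PyTorch, Tensor.squeeze(dim) only drops the axis when its extent is 1, which is a runtime condition. A stdlib-only sketch of that shape rule (illustrative, not the exporter's actual code):

```python
def squeeze_dim(shape, dim):
    """Shape rule for Tensor.squeeze(dim): the axis is removed only when
    its extent is 1 -- a data-dependent condition, which the PyTorch 1.7
    ONNX exporter preserves as an If branch that TensorRT cannot parse."""
    shape = tuple(shape)
    if shape[dim] == 1:
        return shape[:dim] + shape[dim + 1:]
    return shape
```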

@wwdok
Contributor Author

wwdok commented Dec 3, 2020

@SuX97 I have raised this issue in the TensorRT repo. Thank you for your tips; I found the same issue and an answer here, so I will try downgrading torch to 1.6.

@wwdok wwdok closed this as completed Dec 3, 2020
@wwdok
Contributor Author

wwdok commented Dec 3, 2020

Good news ~ I used Colab (whose torch version is 1.5) to convert; the exported ONNX model no longer has an If node, which is replaced by a Squeeze node:
image
And the conversion to TensorRT also looks good:

(base) weidawang@weidawang-TUF-Gaming-FX506LU-FX506LU:~/app/TensorRT-7.2.1.6/bin$ ./trtexec --onnx=tmp.onnx --saveEngine=tmp.engine
&&&& RUNNING TensorRT.trtexec # ./trtexec --onnx=tmp.onnx --saveEngine=tmp.engine
[12/03/2020-18:29:15] [I] === Model Options ===
[12/03/2020-18:29:15] [I] Format: ONNX
[12/03/2020-18:29:15] [I] Model: tmp.onnx
[12/03/2020-18:29:15] [I] Output:
[12/03/2020-18:29:15] [I] === Build Options ===
[12/03/2020-18:29:15] [I] Max batch: explicit
[12/03/2020-18:29:15] [I] Workspace: 16 MiB
[12/03/2020-18:29:15] [I] minTiming: 1
[12/03/2020-18:29:15] [I] avgTiming: 8
[12/03/2020-18:29:15] [I] Precision: FP32
[12/03/2020-18:29:15] [I] Calibration: 
[12/03/2020-18:29:15] [I] Refit: Disabled
[12/03/2020-18:29:15] [I] Safe mode: Disabled
[12/03/2020-18:29:15] [I] Save engine: tmp.engine
[12/03/2020-18:29:15] [I] Load engine: 
[12/03/2020-18:29:15] [I] Builder Cache: Enabled
[12/03/2020-18:29:15] [I] NVTX verbosity: 0
[12/03/2020-18:29:15] [I] Tactic sources: Using default tactic sources
[12/03/2020-18:29:15] [I] Input(s)s format: fp32:CHW
[12/03/2020-18:29:15] [I] Output(s)s format: fp32:CHW
[12/03/2020-18:29:15] [I] Input build shapes: model
[12/03/2020-18:29:15] [I] Input calibration shapes: model
[12/03/2020-18:29:15] [I] === System Options ===
[12/03/2020-18:29:15] [I] Device: 0
[12/03/2020-18:29:15] [I] DLACore: 
[12/03/2020-18:29:15] [I] Plugins:
[12/03/2020-18:29:15] [I] === Inference Options ===
[12/03/2020-18:29:15] [I] Batch: Explicit
[12/03/2020-18:29:15] [I] Input inference shapes: model
[12/03/2020-18:29:15] [I] Iterations: 10
[12/03/2020-18:29:15] [I] Duration: 3s (+ 200ms warm up)
[12/03/2020-18:29:15] [I] Sleep time: 0ms
[12/03/2020-18:29:15] [I] Streams: 1
[12/03/2020-18:29:15] [I] ExposeDMA: Disabled
[12/03/2020-18:29:15] [I] Data transfers: Enabled
[12/03/2020-18:29:15] [I] Spin-wait: Disabled
[12/03/2020-18:29:15] [I] Multithreading: Disabled
[12/03/2020-18:29:15] [I] CUDA Graph: Disabled
[12/03/2020-18:29:15] [I] Separate profiling: Disabled
[12/03/2020-18:29:15] [I] Skip inference: Disabled
[12/03/2020-18:29:15] [I] Inputs:
[12/03/2020-18:29:15] [I] === Reporting Options ===
[12/03/2020-18:29:15] [I] Verbose: Disabled
[12/03/2020-18:29:15] [I] Averages: 10 inferences
[12/03/2020-18:29:15] [I] Percentile: 99
[12/03/2020-18:29:15] [I] Dump refittable layers:Disabled
[12/03/2020-18:29:15] [I] Dump output: Disabled
[12/03/2020-18:29:15] [I] Profile: Disabled
[12/03/2020-18:29:15] [I] Export timing to JSON file: 
[12/03/2020-18:29:15] [I] Export output to JSON file: 
[12/03/2020-18:29:15] [I] Export profile to JSON file: 
[12/03/2020-18:29:15] [I] 
[12/03/2020-18:29:15] [I] === Device Information ===
[12/03/2020-18:29:15] [I] Selected Device: GeForce GTX 1660 Ti
[12/03/2020-18:29:15] [I] Compute Capability: 7.5
[12/03/2020-18:29:15] [I] SMs: 24
[12/03/2020-18:29:15] [I] Compute Clock Rate: 1.59 GHz
[12/03/2020-18:29:15] [I] Device Global Memory: 5944 MiB
[12/03/2020-18:29:15] [I] Shared Memory per SM: 64 KiB
[12/03/2020-18:29:15] [I] Memory Bus Width: 192 bits (ECC disabled)
[12/03/2020-18:29:15] [I] Memory Clock Rate: 6.001 GHz
[12/03/2020-18:29:15] [I] 
----------------------------------------------------------------
Input filename:   tmp.onnx
ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.5
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[12/03/2020-18:29:16] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/03/2020-18:29:16] [W] [TRT] TensorRT was linked against cuDNN 8.0.4 but loaded cuDNN 8.0.1
[12/03/2020-18:29:21] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[12/03/2020-18:29:41] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/03/2020-18:29:41] [W] [TRT] TensorRT was linked against cuDNN 8.0.4 but loaded cuDNN 8.0.1
[12/03/2020-18:29:42] [I] Engine built in 27.26 sec.
[12/03/2020-18:29:42] [W] [TRT] TensorRT was linked against cuDNN 8.0.4 but loaded cuDNN 8.0.1
[12/03/2020-18:29:42] [I] Starting inference
[12/03/2020-18:29:45] [I] Warmup completed 0 queries over 200 ms
[12/03/2020-18:29:45] [I] Timing trace has 0 queries over 3.00759 s
[12/03/2020-18:29:45] [I] Trace averages of 10 runs:
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.5012 ms - Host latency: 3.5865 ms (end to end 6.63405 ms, enqueue 0.974474 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50219 ms - Host latency: 3.5862 ms (end to end 6.62244 ms, enqueue 0.966696 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47442 ms - Host latency: 3.56103 ms (end to end 6.53537 ms, enqueue 0.983316 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.46584 ms - Host latency: 3.55501 ms (end to end 6.50543 ms, enqueue 0.995529 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.46685 ms - Host latency: 3.55428 ms (end to end 6.50602 ms, enqueue 0.986707 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51667 ms - Host latency: 3.6061 ms (end to end 6.62857 ms, enqueue 0.988583 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50295 ms - Host latency: 3.58989 ms (end to end 6.58759 ms, enqueue 0.984445 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49899 ms - Host latency: 3.58642 ms (end to end 6.62967 ms, enqueue 0.998599 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49687 ms - Host latency: 3.58237 ms (end to end 6.55042 ms, enqueue 0.987997 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.4682 ms - Host latency: 3.5542 ms (end to end 6.50807 ms, enqueue 0.992023 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50959 ms - Host latency: 3.59756 ms (end to end 6.61785 ms, enqueue 0.982251 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51691 ms - Host latency: 3.60585 ms (end to end 6.60836 ms, enqueue 1.00314 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50948 ms - Host latency: 3.59705 ms (end to end 6.5989 ms, enqueue 0.985931 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50347 ms - Host latency: 3.59052 ms (end to end 6.59128 ms, enqueue 0.987506 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.66447 ms - Host latency: 3.75212 ms (end to end 6.93524 ms, enqueue 0.984314 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47981 ms - Host latency: 3.56681 ms (end to end 6.60183 ms, enqueue 0.997113 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49183 ms - Host latency: 3.57916 ms (end to end 6.57836 ms, enqueue 0.99032 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51829 ms - Host latency: 3.60653 ms (end to end 6.59449 ms, enqueue 0.99483 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51309 ms - Host latency: 3.60005 ms (end to end 6.61431 ms, enqueue 0.990363 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55487 ms - Host latency: 3.64157 ms (end to end 6.68866 ms, enqueue 0.989655 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.62501 ms - Host latency: 3.70947 ms (end to end 6.79692 ms, enqueue 0.996863 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.69387 ms - Host latency: 3.76348 ms (end to end 7.18004 ms, enqueue 0.277502 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51757 ms - Host latency: 3.58398 ms (end to end 6.6722 ms, enqueue 0.28584 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55166 ms - Host latency: 3.6226 ms (end to end 6.72692 ms, enqueue 0.493219 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51912 ms - Host latency: 3.60181 ms (end to end 6.57311 ms, enqueue 0.970178 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51698 ms - Host latency: 3.59933 ms (end to end 6.59961 ms, enqueue 0.978198 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50181 ms - Host latency: 3.58467 ms (end to end 6.58507 ms, enqueue 0.982117 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50253 ms - Host latency: 3.58584 ms (end to end 6.55149 ms, enqueue 0.979663 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50485 ms - Host latency: 3.58262 ms (end to end 6.66311 ms, enqueue 0.689026 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47218 ms - Host latency: 3.57175 ms (end to end 6.6651 ms, enqueue 0.381677 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51073 ms - Host latency: 3.69282 ms (end to end 6.61033 ms, enqueue 0.879211 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.6234 ms - Host latency: 3.82188 ms (end to end 6.83784 ms, enqueue 1.07085 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.52297 ms - Host latency: 3.72323 ms (end to end 6.64814 ms, enqueue 1.06002 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50391 ms - Host latency: 3.70596 ms (end to end 6.5915 ms, enqueue 1.05543 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48986 ms - Host latency: 3.68956 ms (end to end 6.5479 ms, enqueue 1.05437 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47135 ms - Host latency: 3.67329 ms (end to end 6.56201 ms, enqueue 1.05729 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51748 ms - Host latency: 3.72827 ms (end to end 6.61471 ms, enqueue 1.06365 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51836 ms - Host latency: 3.71415 ms (end to end 6.60272 ms, enqueue 1.05378 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.502 ms - Host latency: 3.69934 ms (end to end 6.56716 ms, enqueue 1.0399 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50001 ms - Host latency: 3.70118 ms (end to end 6.58937 ms, enqueue 1.07372 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49895 ms - Host latency: 3.6988 ms (end to end 6.58568 ms, enqueue 1.08145 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48777 ms - Host latency: 3.68893 ms (end to end 6.56816 ms, enqueue 1.09414 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51868 ms - Host latency: 3.71694 ms (end to end 6.62025 ms, enqueue 1.06139 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51616 ms - Host latency: 3.69432 ms (end to end 6.64792 ms, enqueue 1.0449 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49843 ms - Host latency: 3.58062 ms (end to end 6.56771 ms, enqueue 0.982654 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51761 ms - Host latency: 3.6 ms (end to end 6.62808 ms, enqueue 0.985229 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55392 ms - Host latency: 3.63937 ms (end to end 6.69657 ms, enqueue 0.987744 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55586 ms - Host latency: 3.63547 ms (end to end 6.83339 ms, enqueue 0.786267 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.68865 ms - Host latency: 3.76644 ms (end to end 7.05211 ms, enqueue 0.94093 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.76388 ms - Host latency: 3.83889 ms (end to end 7.087 ms, enqueue 0.500183 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51058 ms - Host latency: 3.58583 ms (end to end 6.84854 ms, enqueue 0.343774 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50596 ms - Host latency: 3.58763 ms (end to end 6.60355 ms, enqueue 0.979382 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51829 ms - Host latency: 3.60071 ms (end to end 6.6201 ms, enqueue 0.974622 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51846 ms - Host latency: 3.59961 ms (end to end 6.66152 ms, enqueue 0.974512 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51082 ms - Host latency: 3.59385 ms (end to end 6.61138 ms, enqueue 0.974561 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50305 ms - Host latency: 3.58545 ms (end to end 6.60781 ms, enqueue 0.979053 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49915 ms - Host latency: 3.58142 ms (end to end 6.63296 ms, enqueue 0.976514 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48174 ms - Host latency: 3.56543 ms (end to end 6.56367 ms, enqueue 0.98418 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.54905 ms - Host latency: 3.63508 ms (end to end 6.65264 ms, enqueue 0.989185 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.52505 ms - Host latency: 3.6116 ms (end to end 6.62173 ms, enqueue 0.986035 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51772 ms - Host latency: 3.60378 ms (end to end 6.60291 ms, enqueue 0.998096 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51038 ms - Host latency: 3.59685 ms (end to end 6.58843 ms, enqueue 1.00012 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50364 ms - Host latency: 3.59412 ms (end to end 6.60437 ms, enqueue 0.990723 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50039 ms - Host latency: 3.58928 ms (end to end 6.57339 ms, enqueue 0.986987 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48169 ms - Host latency: 3.5677 ms (end to end 6.54409 ms, enqueue 0.993506 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.64368 ms - Host latency: 3.73164 ms (end to end 6.87336 ms, enqueue 0.975415 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.523 ms - Host latency: 3.60999 ms (end to end 6.65276 ms, enqueue 0.996997 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49858 ms - Host latency: 3.58293 ms (end to end 6.62622 ms, enqueue 0.969092 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50261 ms - Host latency: 3.58931 ms (end to end 6.58706 ms, enqueue 0.969849 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49373 ms - Host latency: 3.5791 ms (end to end 6.60793 ms, enqueue 1.00156 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51704 ms - Host latency: 3.59897 ms (end to end 6.63171 ms, enqueue 0.987109 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51519 ms - Host latency: 3.59707 ms (end to end 6.57544 ms, enqueue 0.980395 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50615 ms - Host latency: 3.58992 ms (end to end 6.58933 ms, enqueue 0.981812 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55564 ms - Host latency: 3.63804 ms (end to end 6.70557 ms, enqueue 0.980273 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55491 ms - Host latency: 3.63665 ms (end to end 6.69539 ms, enqueue 0.983545 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.54644 ms - Host latency: 3.63 ms (end to end 6.70127 ms, enqueue 0.987012 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51833 ms - Host latency: 3.59587 ms (end to end 6.71345 ms, enqueue 0.825928 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.57864 ms - Host latency: 3.66128 ms (end to end 6.60034 ms, enqueue 1.00061 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.69294 ms - Host latency: 3.77864 ms (end to end 7.24656 ms, enqueue 0.385229 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51819 ms - Host latency: 3.5991 ms (end to end 6.63071 ms, enqueue 0.982251 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.5189 ms - Host latency: 3.60029 ms (end to end 6.61841 ms, enqueue 0.988647 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.4988 ms - Host latency: 3.58149 ms (end to end 6.57305 ms, enqueue 0.9896 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.61914 ms - Host latency: 3.70115 ms (end to end 6.87793 ms, enqueue 0.978247 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51555 ms - Host latency: 3.59778 ms (end to end 6.69834 ms, enqueue 0.976733 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55164 ms - Host latency: 3.63052 ms (end to end 6.76877 ms, enqueue 0.91792 ms)
[12/03/2020-18:29:45] [I] Host Latency
[12/03/2020-18:29:45] [I] min: 3.52893 ms (end to end 5.36047 ms)
[12/03/2020-18:29:45] [I] max: 4.49451 ms (end to end 8.10828 ms)
[12/03/2020-18:29:45] [I] mean: 3.62987 ms (end to end 6.65785 ms)
[12/03/2020-18:29:45] [I] median: 3.6001 ms (end to end 6.62024 ms)
[12/03/2020-18:29:45] [I] percentile: 4.0498 ms at 99% (end to end 7.5918 ms at 99%)
[12/03/2020-18:29:45] [I] throughput: 0 qps
[12/03/2020-18:29:45] [I] walltime: 3.00759 s
[12/03/2020-18:29:45] [I] Enqueue Time
[12/03/2020-18:29:45] [I] min: 0.153809 ms
[12/03/2020-18:29:45] [I] max: 1.91772 ms
[12/03/2020-18:29:45] [I] median: 0.980225 ms
[12/03/2020-18:29:45] [I] GPU Compute
[12/03/2020-18:29:45] [I] min: 3.45679 ms
[12/03/2020-18:29:45] [I] max: 4.41138 ms
[12/03/2020-18:29:45] [I] mean: 3.52736 ms
[12/03/2020-18:29:45] [I] median: 3.5127 ms
[12/03/2020-18:29:45] [I] percentile: 3.96289 ms at 99%
[12/03/2020-18:29:45] [I] total compute time: 3.00178 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=tmp.onnx --saveEngine=tmp.engine
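Before handing a re-exported model to trtexec, you can scan its node op types for constructs the TensorRT ONNX parser rejects (such as If). A minimal sketch over a plain list of op_type strings — with the onnx package you would collect them from model.graph.node instead:

```python
def find_unsupported_ops(op_types, unsupported=frozenset({"If"})):
    """Return, sorted, the op types present in the graph that are known
    to be rejected by the TensorRT ONNX parser (here only If, the op
    that caused the importIf assertion above)."""
    return sorted(set(op_types) & unsupported)
```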

@dreamerlin
Collaborator

Congratulations!

@wwdok
Contributor Author

wwdok commented Dec 10, 2020

Hi there, now I want to try converting other models besides TSN, such as ircsn, to ONNX format, but I came across an error.
I ran this command in Colab: !python tools/pytorch2onnx.py ./configs/recognition/csn/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb.py ./checkpoints/ircsn_ig65m_pretrained_bnfrozen_r152_32x2x1_58e_kinetics400_rgb_20200812-9037a758.pth --shape 32 2 1 256 256 --verify --show
Its output log is:

Traceback (most recent call last):
  File "tools/pytorch2onnx.py", line 163, in <module>
    verify=args.verify)
  File "tools/pytorch2onnx.py", line 74, in pytorch2onnx
    opset_version=opset_version)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py", line 168, in export
    custom_opsets, enable_onnx_checker, use_external_data_format)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 69, in export
    use_external_data_format=use_external_data_format)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 488, in _export
    fixed_batch_size=fixed_batch_size)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 334, in _model_to_graph
    graph, torch_out = _trace_and_get_graph_from_model(model, args, training)
  File "/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py", line 291, in _trace_and_get_graph_from_model
    torch.jit._get_trace_graph(model, args, _force_outplace=False, _return_inputs_states=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 278, in _get_trace_graph
    outs = ONNXTracedModule(f, _force_outplace, return_inputs, _return_inputs_states)(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 361, in forward
    self._force_outplace,
  File "/usr/local/lib/python3.6/dist-packages/torch/jit/__init__.py", line 348, in wrapper
    outs.append(self.inner(*trace_inputs))
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 548, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 534, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/content/mmaction2/mmaction/models/recognizers/recognizer3d.py", line 59, in forward_dummy
    x = self.extract_feat(imgs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/content/mmaction2/mmaction/models/recognizers/base.py", line 72, in extract_feat
    x = self.backbone(imgs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 548, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 534, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/content/mmaction2/mmaction/models/backbones/resnet3d.py", line 795, in forward
    x = self.conv1(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 548, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 534, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/cnn/bricks/conv_module.py", line 192, in forward
    x = self.conv(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 548, in __call__
    result = self._slow_forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 534, in _slow_forward
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/cnn/bricks/wrappers.py", line 79, in forward
    return super().forward(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 485, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: Expected 5-dimensional input for 5-dimensional weight [64, 3, 3, 7, 7], but got 4-dimensional input of size [64, 1, 256, 256] instead

I also tried i3d and tpn. What are the common causes of these three errors? Thanks!
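For reference, the RuntimeError above is a pure dimensionality mismatch: conv1 in the 3D backbone has a 5-D weight (out_ch, in_ch, kT, kH, kW), so the dummy input built from --shape must be a 5-D (N, C, T, H, W) tensor, while the exporter received a 4-D one. A stdlib-only sketch of that check (hypothetical helper, not mmaction2 API):

```python
def check_conv3d_input(weight_shape, input_shape):
    """Mirror the check torch's Conv3d performs: a 5-D weight
    (out_ch, in_ch, kT, kH, kW) requires a 5-D (N, C, T, H, W) input."""
    if len(input_shape) != len(weight_shape):
        raise ValueError(
            f"Expected {len(weight_shape)}-dimensional input for "
            f"{len(weight_shape)}-dimensional weight {list(weight_shape)}, "
            f"but got {len(input_shape)}-dimensional input of size "
            f"{list(input_shape)} instead")
```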

@innerlee
Contributor

innerlee commented Dec 10, 2020

Please open a new issue for new issues

@wwdok
Contributor Author

wwdok commented Dec 10, 2020

All right, I will open a new issue ~

@kartik1395

ONNX IR version:  0.0.6
Opset version:    11
Producer name:    pytorch
Producer version: 1.5
Domain:           
Model version:    0
Doc string:       
----------------------------------------------------------------
[12/03/2020-18:29:16] [W] [TRT] onnx2trt_utils.cpp:220: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.
[12/03/2020-18:29:16] [W] [TRT] TensorRT was linked against cuDNN 8.0.4 but loaded cuDNN 8.0.1
[12/03/2020-18:29:21] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[12/03/2020-18:29:41] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/03/2020-18:29:41] [W] [TRT] TensorRT was linked against cuDNN 8.0.4 but loaded cuDNN 8.0.1
[12/03/2020-18:29:42] [I] Engine built in 27.26 sec.
[12/03/2020-18:29:42] [W] [TRT] TensorRT was linked against cuDNN 8.0.4 but loaded cuDNN 8.0.1
[12/03/2020-18:29:42] [I] Starting inference
[12/03/2020-18:29:45] [I] Warmup completed 0 queries over 200 ms
[12/03/2020-18:29:45] [I] Timing trace has 0 queries over 3.00759 s
[12/03/2020-18:29:45] [I] Trace averages of 10 runs:
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.5012 ms - Host latency: 3.5865 ms (end to end 6.63405 ms, enqueue 0.974474 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50219 ms - Host latency: 3.5862 ms (end to end 6.62244 ms, enqueue 0.966696 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47442 ms - Host latency: 3.56103 ms (end to end 6.53537 ms, enqueue 0.983316 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.46584 ms - Host latency: 3.55501 ms (end to end 6.50543 ms, enqueue 0.995529 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.46685 ms - Host latency: 3.55428 ms (end to end 6.50602 ms, enqueue 0.986707 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51667 ms - Host latency: 3.6061 ms (end to end 6.62857 ms, enqueue 0.988583 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50295 ms - Host latency: 3.58989 ms (end to end 6.58759 ms, enqueue 0.984445 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49899 ms - Host latency: 3.58642 ms (end to end 6.62967 ms, enqueue 0.998599 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49687 ms - Host latency: 3.58237 ms (end to end 6.55042 ms, enqueue 0.987997 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.4682 ms - Host latency: 3.5542 ms (end to end 6.50807 ms, enqueue 0.992023 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50959 ms - Host latency: 3.59756 ms (end to end 6.61785 ms, enqueue 0.982251 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51691 ms - Host latency: 3.60585 ms (end to end 6.60836 ms, enqueue 1.00314 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50948 ms - Host latency: 3.59705 ms (end to end 6.5989 ms, enqueue 0.985931 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50347 ms - Host latency: 3.59052 ms (end to end 6.59128 ms, enqueue 0.987506 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.66447 ms - Host latency: 3.75212 ms (end to end 6.93524 ms, enqueue 0.984314 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47981 ms - Host latency: 3.56681 ms (end to end 6.60183 ms, enqueue 0.997113 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49183 ms - Host latency: 3.57916 ms (end to end 6.57836 ms, enqueue 0.99032 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51829 ms - Host latency: 3.60653 ms (end to end 6.59449 ms, enqueue 0.99483 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51309 ms - Host latency: 3.60005 ms (end to end 6.61431 ms, enqueue 0.990363 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55487 ms - Host latency: 3.64157 ms (end to end 6.68866 ms, enqueue 0.989655 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.62501 ms - Host latency: 3.70947 ms (end to end 6.79692 ms, enqueue 0.996863 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.69387 ms - Host latency: 3.76348 ms (end to end 7.18004 ms, enqueue 0.277502 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51757 ms - Host latency: 3.58398 ms (end to end 6.6722 ms, enqueue 0.28584 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55166 ms - Host latency: 3.6226 ms (end to end 6.72692 ms, enqueue 0.493219 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51912 ms - Host latency: 3.60181 ms (end to end 6.57311 ms, enqueue 0.970178 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51698 ms - Host latency: 3.59933 ms (end to end 6.59961 ms, enqueue 0.978198 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50181 ms - Host latency: 3.58467 ms (end to end 6.58507 ms, enqueue 0.982117 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50253 ms - Host latency: 3.58584 ms (end to end 6.55149 ms, enqueue 0.979663 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50485 ms - Host latency: 3.58262 ms (end to end 6.66311 ms, enqueue 0.689026 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47218 ms - Host latency: 3.57175 ms (end to end 6.6651 ms, enqueue 0.381677 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51073 ms - Host latency: 3.69282 ms (end to end 6.61033 ms, enqueue 0.879211 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.6234 ms - Host latency: 3.82188 ms (end to end 6.83784 ms, enqueue 1.07085 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.52297 ms - Host latency: 3.72323 ms (end to end 6.64814 ms, enqueue 1.06002 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50391 ms - Host latency: 3.70596 ms (end to end 6.5915 ms, enqueue 1.05543 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48986 ms - Host latency: 3.68956 ms (end to end 6.5479 ms, enqueue 1.05437 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.47135 ms - Host latency: 3.67329 ms (end to end 6.56201 ms, enqueue 1.05729 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51748 ms - Host latency: 3.72827 ms (end to end 6.61471 ms, enqueue 1.06365 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51836 ms - Host latency: 3.71415 ms (end to end 6.60272 ms, enqueue 1.05378 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.502 ms - Host latency: 3.69934 ms (end to end 6.56716 ms, enqueue 1.0399 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50001 ms - Host latency: 3.70118 ms (end to end 6.58937 ms, enqueue 1.07372 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49895 ms - Host latency: 3.6988 ms (end to end 6.58568 ms, enqueue 1.08145 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48777 ms - Host latency: 3.68893 ms (end to end 6.56816 ms, enqueue 1.09414 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51868 ms - Host latency: 3.71694 ms (end to end 6.62025 ms, enqueue 1.06139 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51616 ms - Host latency: 3.69432 ms (end to end 6.64792 ms, enqueue 1.0449 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49843 ms - Host latency: 3.58062 ms (end to end 6.56771 ms, enqueue 0.982654 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51761 ms - Host latency: 3.6 ms (end to end 6.62808 ms, enqueue 0.985229 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55392 ms - Host latency: 3.63937 ms (end to end 6.69657 ms, enqueue 0.987744 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55586 ms - Host latency: 3.63547 ms (end to end 6.83339 ms, enqueue 0.786267 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.68865 ms - Host latency: 3.76644 ms (end to end 7.05211 ms, enqueue 0.94093 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.76388 ms - Host latency: 3.83889 ms (end to end 7.087 ms, enqueue 0.500183 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51058 ms - Host latency: 3.58583 ms (end to end 6.84854 ms, enqueue 0.343774 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50596 ms - Host latency: 3.58763 ms (end to end 6.60355 ms, enqueue 0.979382 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51829 ms - Host latency: 3.60071 ms (end to end 6.6201 ms, enqueue 0.974622 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51846 ms - Host latency: 3.59961 ms (end to end 6.66152 ms, enqueue 0.974512 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51082 ms - Host latency: 3.59385 ms (end to end 6.61138 ms, enqueue 0.974561 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50305 ms - Host latency: 3.58545 ms (end to end 6.60781 ms, enqueue 0.979053 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49915 ms - Host latency: 3.58142 ms (end to end 6.63296 ms, enqueue 0.976514 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48174 ms - Host latency: 3.56543 ms (end to end 6.56367 ms, enqueue 0.98418 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.54905 ms - Host latency: 3.63508 ms (end to end 6.65264 ms, enqueue 0.989185 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.52505 ms - Host latency: 3.6116 ms (end to end 6.62173 ms, enqueue 0.986035 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51772 ms - Host latency: 3.60378 ms (end to end 6.60291 ms, enqueue 0.998096 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51038 ms - Host latency: 3.59685 ms (end to end 6.58843 ms, enqueue 1.00012 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50364 ms - Host latency: 3.59412 ms (end to end 6.60437 ms, enqueue 0.990723 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50039 ms - Host latency: 3.58928 ms (end to end 6.57339 ms, enqueue 0.986987 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.48169 ms - Host latency: 3.5677 ms (end to end 6.54409 ms, enqueue 0.993506 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.64368 ms - Host latency: 3.73164 ms (end to end 6.87336 ms, enqueue 0.975415 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.523 ms - Host latency: 3.60999 ms (end to end 6.65276 ms, enqueue 0.996997 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49858 ms - Host latency: 3.58293 ms (end to end 6.62622 ms, enqueue 0.969092 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50261 ms - Host latency: 3.58931 ms (end to end 6.58706 ms, enqueue 0.969849 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.49373 ms - Host latency: 3.5791 ms (end to end 6.60793 ms, enqueue 1.00156 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51704 ms - Host latency: 3.59897 ms (end to end 6.63171 ms, enqueue 0.987109 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51519 ms - Host latency: 3.59707 ms (end to end 6.57544 ms, enqueue 0.980395 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.50615 ms - Host latency: 3.58992 ms (end to end 6.58933 ms, enqueue 0.981812 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55564 ms - Host latency: 3.63804 ms (end to end 6.70557 ms, enqueue 0.980273 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55491 ms - Host latency: 3.63665 ms (end to end 6.69539 ms, enqueue 0.983545 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.54644 ms - Host latency: 3.63 ms (end to end 6.70127 ms, enqueue 0.987012 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51833 ms - Host latency: 3.59587 ms (end to end 6.71345 ms, enqueue 0.825928 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.57864 ms - Host latency: 3.66128 ms (end to end 6.60034 ms, enqueue 1.00061 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.69294 ms - Host latency: 3.77864 ms (end to end 7.24656 ms, enqueue 0.385229 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51819 ms - Host latency: 3.5991 ms (end to end 6.63071 ms, enqueue 0.982251 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.5189 ms - Host latency: 3.60029 ms (end to end 6.61841 ms, enqueue 0.988647 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.4988 ms - Host latency: 3.58149 ms (end to end 6.57305 ms, enqueue 0.9896 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.61914 ms - Host latency: 3.70115 ms (end to end 6.87793 ms, enqueue 0.978247 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.51555 ms - Host latency: 3.59778 ms (end to end 6.69834 ms, enqueue 0.976733 ms)
[12/03/2020-18:29:45] [I] Average on 10 runs - GPU latency: 3.55164 ms - Host latency: 3.63052 ms (end to end 6.76877 ms, enqueue 0.91792 ms)
[12/03/2020-18:29:45] [I] Host Latency
[12/03/2020-18:29:45] [I] min: 3.52893 ms (end to end 5.36047 ms)
[12/03/2020-18:29:45] [I] max: 4.49451 ms (end to end 8.10828 ms)
[12/03/2020-18:29:45] [I] mean: 3.62987 ms (end to end 6.65785 ms)
[12/03/2020-18:29:45] [I] median: 3.6001 ms (end to end 6.62024 ms)
[12/03/2020-18:29:45] [I] percentile: 4.0498 ms at 99% (end to end 7.5918 ms at 99%)
[12/03/2020-18:29:45] [I] throughput: 0 qps
[12/03/2020-18:29:45] [I] walltime: 3.00759 s
[12/03/2020-18:29:45] [I] Enqueue Time
[12/03/2020-18:29:45] [I] min: 0.153809 ms
[12/03/2020-18:29:45] [I] max: 1.91772 ms
[12/03/2020-18:29:45] [I] median: 0.980225 ms
[12/03/2020-18:29:45] [I] GPU Compute
[12/03/2020-18:29:45] [I] min: 3.45679 ms
[12/03/2020-18:29:45] [I] max: 4.41138 ms
[12/03/2020-18:29:45] [I] mean: 3.52736 ms
[12/03/2020-18:29:45] [I] median: 3.5127 ms
[12/03/2020-18:29:45] [I] percentile: 3.96289 ms at 99%
[12/03/2020-18:29:45] [I] total compute time: 3.00178 s
&&&& PASSED TensorRT.trtexec # ./trtexec --onnx=tmp.onnx --saveEngine=tmp.engine

Hi, how are you running inference on a video file using the TensorRT model?

@wwdok
Copy link
Contributor Author

wwdok commented Feb 18, 2021

@kartik1395 At that time, I referred to TensorRT's yolov3_onnx Python sample.
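(Editor's note, not from the thread: on the video-preprocessing side, TSN-style models sparsely sample one frame from each of N equal segments of the clip before stacking the frames into the network input. A simplified sketch of that sampling; the center-of-segment strategy here is an assumption for illustration — mmaction2's actual sampler also handles random offsets during training:)

```python
def sample_frame_indices(num_frames, num_segments=3):
    # Split the clip into `num_segments` equal segments and take the
    # center frame of each (simplified TSN-style sparse sampling).
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]

print(sample_frame_indices(300))  # [50, 150, 250]
```

The selected frames would then be resized, normalized, and stacked into the `[N, C, T, H, W]` tensor that the engine's input binding expects.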

@scuizhibin

Will downgrading the torch version in this way affect the results?
