TRT EP memory leak fix #7415

chilo-ms · 2021-04-22T17:04:45Z

Description: Fix memory leak.

It’s been detected by Address Sanitizer here:
Direct leak of 48 byte(s) in 1 object(s) allocated from:
0 0x7f9124743b in operator new(unsigned long) (/usr/lib/aarch64-linux-gnu/libasan.so.4+0xd243b)
1 0x7f63c5244b in createInferRuntime_INTERNAL (/usr/lib/aarch64-linux-gnu/libnvinfer.so.7+0x35a44b)
2 0x7f70d806a7 ()
3 0x7f70d86a4f ()
4 0x7f70ddd403 ()
5 0x7f70ddcfd3 ()
6 0x7f8c04bb7f in (anonymous namespace)::InitializeSession(OrtSessionOptions const*, std::unique_ptr<onnxruntime::InferenceSession, std::default_delete<onnxruntime::InferenceSession> >&) (/home/onnxruntime/repos/onnxruntime/build/Linux/Debug/libonnxruntime.so.1.7.0+0xb1b7f)
7 0x7f8c04bf1f in OrtApis::CreateSession(OrtEnv const*, char const*, OrtSessionOptions const*, OrtSession**) (/home/onnxruntime/repos/onnxruntime/build/Linux/Debug/libonnxruntime.so.1.7.0+0xb1f1f)
8 0x5586028ccb in main /home/onnxruntime/repos/valgrind/onnxrt_trt_memsample/main.cpp:65
9 0x7f8bca86df in __libc_start_main (/lib/aarch64-linux-gnu/libc.so.6+0x206df)
10 0x55860284a3 (/home/onnxruntime/repos/valgrind/onnxrt_trt_memsample/build/onnx_memtest+0x54a3)

as well as by valgrind here for the same leak:
==132002== 72 (48 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 3,517 of 5,620
==132002== at 0x483CFE3: operator new(unsigned long) (vg_replace_malloc.c:417)
==132002== by 0x10C5E5BAB: createInferRuntime_INTERNAL (in /usr/lib/libnvinfer.so.7.1.3)
==132002== by 0xFF027C51: nvinfer1::(anonymous namespace)::createInferRuntime(nvinfer1::ILogger&) (NvInferRuntime.h:1906)
==132002== by 0xFF02A191: onnxruntime::TensorrtExecutionProvider::TensorrtExecutionProvider(onnxruntime::TensorrtExecutionProviderInfo const&) (tensorrt_execution_provider.cc:226)

jywu-msft · 2021-04-22T17:34:19Z

trt_state->runtime is a pointer to tensorrt_ptr::unique_pointer<nvinfer1::IRuntime>, so we need to use get() to get the raw pointer of unique_ptr first and then access the object's member function.

Not quite sure whether there is a more concise way to write it.

That's weird. operator->() on a unique_ptr returns the same pointer as get(),
i.e. calling a member through unique_ptr-> should be the same as calling it through unique_ptr.get()->

Anyway, my comment was also about not needing to store the raw pointer in a local variable.
You can still do trt_state->runtime.get()->deserializeCudaEngine()
and remove line 1298 (auto runtime = trt_state->runtime->get()).

jywu-msft · 2021-04-22T17:35:55Z

Oh, now I see why: runtime is a pointer to a unique_ptr. It is kind of strange to have a pointer to a unique_ptr.
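
The difference discussed above can be sketched with stand-in types. FakeRuntime and FakeState are hypothetical names used only for illustration; they are not the TensorRT or onnxruntime definitions. On a unique_ptr, operator-> reaches the managed object directly, but on a pointer to a unique_ptr the first -> only reaches the unique_ptr, so an extra dereference or get() is needed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for nvinfer1::IRuntime; not the TensorRT API.
struct FakeRuntime {
  int Deserialize() const { return 42; }
};

// Hypothetical stand-in for TensorrtFuncState: it holds a non-owning
// pointer to a unique_ptr that is owned elsewhere.
struct FakeState {
  std::unique_ptr<FakeRuntime>* runtime = nullptr;
};

// Direct unique_ptr: operator-> forwards to the managed object.
inline int CallThroughUniquePtr(const std::unique_ptr<FakeRuntime>& rt) {
  assert(rt != nullptr);
  return rt->Deserialize();  // equivalent to rt.get()->Deserialize()
}

// Pointer to unique_ptr: the first -> only reaches the unique_ptr itself.
inline int CallThroughPointerToUniquePtr(const FakeState& state) {
  assert(state.runtime != nullptr && *state.runtime != nullptr);
  // state.runtime->Deserialize() would not compile, because unique_ptr
  // has no member named Deserialize.
  return (*state.runtime)->Deserialize();  // or state.runtime->get()->Deserialize()
}
```

With a FakeState whose runtime member points at a live std::unique_ptr<FakeRuntime>, both helpers call the same object.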

jywu-msft · 2021-04-22T17:39:42Z

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.h

@@ -99,7 +99,7 @@ struct TensorrtFuncState {
 std::string trt_node_name_with_precision;
 bool engine_cache_enable;
 std::string engine_cache_path;
- nvinfer1::IRuntime* runtime = nullptr;
+ tensorrt_ptr::unique_pointer<nvinfer1::IRuntime>* runtime = nullptr;


pointer to unique_ptr is kind of strange.
would it be better to just keep it as a raw pointer as before?

Explained in other comment.

jywu-msft · 2021-04-22T17:41:06Z

onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc

@@ -1243,7 +1243,7 @@ common::Status TensorrtExecutionProvider::Compile(const std::vector<Node*>& fuse
 &engines_[context->node_name], &contexts_[context->node_name], &builders_[context->node_name],
 &networks_[context->node_name], input_info_[context->node_name], output_info_[context->node_name],
 input_shape_ranges_[context->node_name], &tensorrt_mu_, &fp16_enable_, &int8_enable_, &max_workspace_size_,
- trt_node_name_with_precision, engine_cache_enable_, cache_path_, runtime_, nullptr,
+ trt_node_name_with_precision, engine_cache_enable_, cache_path_, &runtime_, nullptr,


What if we just pass the raw pointer here, instead of passing the address of the unique_ptr?

Yes, we can pass the raw pointer as before. But then we need to call runtime_->destroy() in the destructor in order to release the IRuntime object.

Previously, I considered passing the unique_ptr itself, but that gives the following compile error:
/home/onnxruntime/repos/onnxruntime/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:1247:89: error: use of deleted function ‘std::unique_ptr<_Tp, _Dp>::unique_ptr(const std::unique_ptr<_Tp, _Dp>&) [with _Tp = nvinfer1::IRuntime; _Dp = onnxruntime::tensorrt_ptr::TensorrtInferDeleter]’

The reason is that the copy constructor and copy assignment operator of unique_ptr are deleted functions, so we get an error on line 1246. I think that's the reason that in struct TensorrtFuncState we pass pointers to unique_ptr for members such as parser, engine, context, and so on.
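
As a rough illustration of the options in this thread, the sketch below uses hypothetical stand-in types (FakeRuntime and the State* structs are not the onnxruntime definitions). It shows that a unique_ptr cannot be copied, and contrasts three ways of handing it to another struct: by address, by raw pointer, and by move. Only the last transfers ownership.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

struct FakeRuntime { int id = 7; };  // hypothetical stand-in type
using RuntimePtr = std::unique_ptr<FakeRuntime>;

// Copying is deleted, which is what the compile error above reports.
static_assert(!std::is_copy_constructible<RuntimePtr>::value, "unique_ptr cannot be copied");
static_assert(std::is_move_constructible<RuntimePtr>::value, "unique_ptr can be moved");

struct StateWithAddress { RuntimePtr* runtime = nullptr; };   // non-owning, indirect
struct StateWithRaw { FakeRuntime* runtime = nullptr; };      // non-owning, direct
struct StateWithOwner { RuntimePtr runtime; };                // owning

// Option 1: store the address of the owner's unique_ptr (as in this PR).
inline StateWithAddress MakeByAddress(RuntimePtr& owner) {
  StateWithAddress s;
  s.runtime = &owner;
  return s;
}

// Option 2: store the underlying raw pointer; the owner must outlive the state.
inline StateWithRaw MakeByRaw(const RuntimePtr& owner) {
  assert(owner != nullptr);
  StateWithRaw s;
  s.runtime = owner.get();
  return s;
}

// Option 3: move ownership into the state; the caller's unique_ptr becomes null.
inline StateWithOwner MakeByMove(RuntimePtr owner) {
  StateWithOwner s;
  s.runtime = std::move(owner);
  return s;
}
```

Options 1 and 2 both leave the object owned by the original unique_ptr, so in either case it is released when that owner is destroyed.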

runtime is already wrapped in a unique_ptr in TrtExecutionProvider, right?
It will get freed when TrtExecutionProvider gets destroyed.
What you did, passing the address of the unique_ptr to TensorrtFuncState, is basically like passing in the underlying raw pointer, but in a more indirect way, right?

Passing a raw pointer is a solution, but it loses code consistency, since the other members in struct TensorrtFuncState use the defined unique_pointer, which handles destroy() automatically.
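
For readers unfamiliar with the pattern, here is a minimal sketch of how a unique_ptr with a custom deleter can call destroy() automatically. All names below (FakeTrtObject, DestroyDeleter, DestroyingPtr) are hypothetical; the actual tensorrt_ptr::unique_pointer and TensorrtInferDeleter definitions in onnxruntime may differ.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an object that must be released through
// destroy() rather than delete.
struct FakeTrtObject {
  explicit FakeTrtObject(int* destroyed_count) : destroyed_count_(destroyed_count) {}

  void destroy() {
    ++*destroyed_count_;  // record the release so callers can observe it
    delete this;
  }

 private:
  ~FakeTrtObject() = default;  // private: forces release via destroy()
  int* destroyed_count_;
};

// Deleter that releases the object by calling its destroy() member.
struct DestroyDeleter {
  template <typename T>
  void operator()(T* obj) const {
    if (obj != nullptr) {
      obj->destroy();
    }
  }
};

// Alias so every wrapped object is released the same way.
template <typename T>
using DestroyingPtr = std::unique_ptr<T, DestroyDeleter>;
```

When a DestroyingPtr<FakeTrtObject> goes out of scope, destroy() runs exactly once, so no explicit call is needed in a destructor.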

discussed offline. need to clean up some of this code.

chilo-ms requested review from stevenlix and jywu-msft April 22, 2021 17:04

chilo-ms requested a review from a team as a code owner April 22, 2021 17:04

jywu-msft reviewed Apr 22, 2021

chilo-ms added 3 commits April 22, 2021 23:26

fix memory leak (2e8955b)
small refactor (cd8d640)
code refactor (48c86d5)

chilo-ms force-pushed the trt_ep_mem_leak_fix branch from 7b5333d to 48c86d5 Compare April 23, 2021 06:28

jywu-msft approved these changes Apr 23, 2021

jywu-msft merged commit f1c3f3f into master Apr 23, 2021

jywu-msft deleted the trt_ep_mem_leak_fix branch April 23, 2021 19:04
