ONNXRuntime TensorRT cache gets regenerated every time a model is uploaded even with correct settings #4587
Open · fran6co opened this issue on Jul 5, 2022 · 6 comments · Fixed by triton-inference-server/onnxruntime_backend#126
Labels: investigating (The development team is investigating this issue)

Comments
rmccorm4 added the investigating label on Jul 5, 2022.
This also happens when using models from a cloud service like S3.
There are 3 solutions:
This would be very helpful to speed up development and reduce our system's start time.
Filed DLIS-3954 to look into this.
yf711 added a commit to microsoft/onnxruntime that referenced this issue on Sep 21, 2022:
#13015 **Update engine hash id generator with model name/model content/metadata**

**Description**:
* Updated the engine id generator so that it uses the model name, model inputs & outputs, and environment metadata (instead of the model path) to generate the hash.
* A new bridged API was introduced to enable the id generator in the TRT EP utility.

**Motivation and Context**
- Why is this change required? What problem does it solve? It fixes this [issue](triton-inference-server/server#4587), which is caused by the id generator using the model path.

How to use:
* Call [TRTGenerateMetaDefId(const GraphViewer& graph_viewer, HashValue& model_hash)](https://github.com/microsoft/onnxruntime/blob/0fcce74a565478b4c83fac5a3230e9786bb53ab3/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L715) to generate the hash id for the TRT engine cache.

How to test:
* On Windows, run:
  * .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTMetadefIdGeneratorUsingModelHashing
  * .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTSubgraphIdGeneratorUsingModelHashing

**Appendix**
* [Existing engine id generator that uses the model path](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/execution_provider.cc#L112-L182)
linnealovespie pushed a commit to microsoft/onnxruntime that referenced this issue on Sep 30, 2022 (the same #13015 commit message as above).
Any news on this topic? I still face the same issue.
Description
Using the onnxruntime backend with TensorRT and the engine cache enabled, loading a model via `load_model` causes the TensorRT cache to be regenerated every time.

Triton Information
Triton container nvcr.io/nvidia/tritonserver:22.06-py3
To Reproduce
Config file (a representative sketch is shown below):
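The reporter's exact config was not preserved in this thread; the following is a minimal, illustrative `config.pbtxt` sketch, assuming a hypothetical model named `densenet_onnx` and the ONNX Runtime backend's TensorRT execution accelerator with its engine-cache options:

```
name: "densenet_onnx"
platform: "onnxruntime_onnx"
optimization {
  execution_accelerators {
    gpu_execution_accelerator: [
      {
        name: "tensorrt"
        # Assumed ORT-TRT options; cache files land under trt_engine_cache_path
        parameters { key: "precision_mode" value: "FP16" }
        parameters { key: "trt_engine_cache_enable" value: "true" }
        parameters { key: "trt_engine_cache_path" value: "/tmp/trt_cache" }
      }
    ]
  }
}
```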
Use any ONNX file and call `load_model`, as sketched below:
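A sketch of the repeated load, assuming Triton runs with `--model-control-mode=explicit` and its HTTP endpoint is reachable on localhost:8000 (model name and URL are illustrative):

```python
# Illustrative reproduction: each explicit load regenerates the TensorRT
# engine cache instead of reusing the one produced by the previous load.
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
for _ in range(2):
    client.load_model("densenet_onnx")  # hypothetical model name
```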
Expected behavior
It should generate the engine cache only once.
The problem comes from https://github.com/triton-inference-server/core/blob/bb9756f2012b3b15bf8d7a9e1e2afd62a7e603b5/src/model_repository_manager.cc#L108, where Triton creates a temporary folder with a random name for the model, and the TensorRT engine cache uses that path as part of its cache key.
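To illustrate the root cause (not the actual ONNX Runtime code): if the hash that names the engine cache is derived from the model path, and every load stages the model under a freshly created random temporary directory, the hash changes on every load; hashing the model content instead stays stable. A minimal Python sketch of that difference:

```python
# Illustrative sketch of why path-based cache keys break across reloads.
import hashlib
import tempfile
from pathlib import Path

model_bytes = b"...serialized ONNX model..."  # placeholder content

def cache_key_from_path(model_path: Path) -> str:
    # Old behaviour (simplified): key depends on where the file lives.
    return hashlib.sha256(str(model_path).encode()).hexdigest()[:16]

def cache_key_from_content(model_path: Path) -> str:
    # Fixed behaviour (simplified): key depends only on what the file contains.
    return hashlib.sha256(model_path.read_bytes()).hexdigest()[:16]

keys_by_path, keys_by_content = set(), set()
for _ in range(2):  # simulate two loads, each staged into a new random temp dir
    staging_dir = Path(tempfile.mkdtemp())
    model_path = staging_dir / "model.onnx"
    model_path.write_bytes(model_bytes)
    keys_by_path.add(cache_key_from_path(model_path))
    keys_by_content.add(cache_key_from_content(model_path))

print(len(keys_by_path))     # 2 -> a new engine cache is generated each time
print(len(keys_by_content))  # 1 -> the existing engine cache can be reused
```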