ONNXRuntime TensorRT cache gets regenerated every time a model is uploaded even with correct settings #4587

Open
fran6co opened this issue Jul 5, 2022 · 6 comments · Fixed by triton-inference-server/onnxruntime_backend#126
Labels
investigating The development team is investigating this issue

Comments


fran6co commented Jul 5, 2022

Description

Using the onnxruntime backend with the TensorRT execution accelerator and engine caching enabled, loading a model via load_model causes the TensorRT engine cache to be regenerated every time.

Triton Information
Triton container nvcr.io/nvidia/tritonserver:22.06-py3

To Reproduce

Config file:

{
  "name": "model-name",
  "platform": "onnxruntime_onnx",
  "optimization": {
    "execution_accelerators": {
      "gpu_execution_accelerator": [
        {
          "name": "tensorrt",
          "parameters": {
            "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
            "trt_engine_cache_enable": "true",
            "precision_mode": "FP16"
          }
        }
      ]
    }
  }
}

Use any ONNX file and call:

triton_client.load_model(model_name, config=model_config_json, files={"file:1/model.onnx": onnx_model_binary})
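For reference, a minimal end-to-end repro sketch. This assumes the HTTP client, a Triton server started with --model-control-mode=explicit and reachable on localhost:8000, and a local model.onnx; the model name and paths are placeholders:

import json
import tritonclient.http as httpclient

model_name = "model-name"
config = {
    "name": model_name,
    "platform": "onnxruntime_onnx",
    "optimization": {
        "execution_accelerators": {
            "gpu_execution_accelerator": [{
                "name": "tensorrt",
                "parameters": {
                    "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
                    "trt_engine_cache_enable": "true",
                    "precision_mode": "FP16",
                },
            }]
        }
    },
}

# Raw bytes of the ONNX model, matching the call in the issue.
with open("model.onnx", "rb") as f:
    onnx_model_binary = f.read()

triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# Loading the same model twice: each call triggers a fresh TensorRT engine
# build instead of reusing the cache written to trt_engine_cache_path by
# the first call.
for _ in range(2):
    triton_client.load_model(
        model_name,
        config=json.dumps(config),
        files={"file:1/model.onnx": onnx_model_binary},
    )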

Expected behavior

Triton should generate the TensorRT engine cache only once and reuse it on subsequent loads.

The problem comes from https://github.com/triton-inference-server/core/blob/bb9756f2012b3b15bf8d7a9e1e2afd62a7e603b5/src/model_repository_manager.cc#L108, where Triton creates a temporary folder with a random name for the uploaded model, and the TensorRT engine cache uses the model path as part of the cache key.
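A rough Python illustration of the failure mode (not ONNX Runtime's actual C++ code; the paths are made up): when the engine id is derived from the model path and every upload is staged under a fresh randomly named temporary directory, the id, and therefore the cache entry, changes on every load.

import hashlib
import uuid

def path_based_engine_id(model_path: str) -> str:
    # Simplified stand-in for the old behaviour: the id depends on the
    # on-disk location of the model file.
    return hashlib.sha256(model_path.encode()).hexdigest()[:16]

# Two uploads of the *same* model land in two different temp directories,
# so the cache key differs and the engine is rebuilt each time.
first = path_based_engine_id(f"/tmp/folder{uuid.uuid4().hex}/1/model.onnx")
second = path_based_engine_id(f"/tmp/folder{uuid.uuid4().hex}/1/model.onnx")
assert first != second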


rmccorm4 commented Jul 5, 2022

Hi @fran6co ,

Thanks for reporting the issue and doing some initial investigation.

@GuanLuo what do you think, related to your recent override changes?


fran6co commented Jul 6, 2022

This also happens when using models from a cloud storage service like S3.


fran6co commented Jul 6, 2022

There are 3 solutions:

@robertbagge

This would be very helpful for speeding up development and reducing our system's startup time.

@rmccorm4

Filed DLIS-3954 to look into this.

Tabrizian reopened this Jul 29, 2022
yf711 added a commit to microsoft/onnxruntime that referenced this issue Sep 21, 2022
**Update engine hash id generator with model name/model content/metadata** (#13015)

**Description**: 

* Updated the engine id generator to use the model name, model inputs & outputs, and environment metadata (instead of the model path) to generate the hash
* New bridged APIs were introduced to enable the id generator in the TRTEP utility

**Motivation and Context**
- Why is this change required? What problem does it solve? To fix this [issue](triton-inference-server/server#4587), which is caused by the id generator using the model path.

How to use:
* Call [TRTGenerateMetaDefId(const GraphViewer& graph_viewer, HashValue& model_hash)](https://github.com/microsoft/onnxruntime/blob/0fcce74a565478b4c83fac5a3230e9786bb53ab3/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L715) to generate the hash id for the TRT engine cache

How to test:
* On Windows, run:
  * .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTMetadefIdGeneratorUsingModelHashing
  * .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTSubgraphIdGeneratorUsingModelHashing

**Appendix**
* [Existing engine id generator that uses model path](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/execution_provider.cc#L112-L182)
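A rough Python sketch of the idea behind this change (the actual implementation lives in ONNX Runtime's C++ TensorRT EP; the input/output names and version string below are placeholders): deriving the id from the model name, graph inputs/outputs, and environment metadata makes it stable across re-uploads, regardless of which temporary directory the model was staged in.

import hashlib

def metadata_based_engine_id(model_name, input_names, output_names, ort_version):
    # Combine path-independent properties of the model into one hash.
    h = hashlib.sha256()
    for piece in (model_name, *input_names, *output_names, ort_version):
        h.update(piece.encode())
    return h.hexdigest()[:16]

# The same model uploaded twice yields the same id, even though Triton
# staged it under two different randomly named temporary directories.
a = metadata_based_engine_id("model-name", ["input_0"], ["output_0"], "1.13.0")
b = metadata_based_engine_id("model-name", ["input_0"], ["output_0"], "1.13.0")
assert a == b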
linnealovespie pushed a commit to microsoft/onnxruntime that referenced this issue Sep 30, 2022

bmaier96 commented Aug 9, 2023

Any news on this topic? I still face the same issue.
