ONNXRuntime TensorRT cache gets regenerated every time a model is uploaded even with correct settings #4587

Open
fran6co opened this issue Jul 5, 2022 · 6 comments · Fixed by triton-inference-server/onnxruntime_backend#126
Labels
investigating The development team is investigating this issue

Comments


fran6co commented Jul 5, 2022

Description

Using the onnxruntime backend with the TensorRT execution accelerator and engine caching enabled, loading a model via load_model causes the TensorRT engine cache to be regenerated every time.

Triton Information
Triton container nvcr.io/nvidia/tritonserver:22.06-py3

To Reproduce

Config file:

{
  "name": "model-name",
  "platform": "onnxruntime_onnx",
  "optimization": {
    "execution_accelerators": {
      "gpu_execution_accelerator": [
        {
          "name": "tensorrt",
          "parameters": {
            "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
            "trt_engine_cache_enable": "true",
            "precision_mode": "FP16"
          }
        }
      ]
    }
  }
}

Use any ONNX file and call:

triton_client.load_model(model_name, config=model_config_json, files={"file:1/model.onnx": onnx_model_binary})
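For reference, a minimal end-to-end repro sketch. This assumes the HTTP client, a Triton server started with --model-control-mode=explicit and reachable on localhost:8000, and a local model.onnx; the model name and paths are placeholders:

import json
import tritonclient.http as httpclient

model_name = "model-name"
config = {
    "name": model_name,
    "platform": "onnxruntime_onnx",
    "optimization": {
        "execution_accelerators": {
            "gpu_execution_accelerator": [{
                "name": "tensorrt",
                "parameters": {
                    "trt_engine_cache_path": "/root/.cache/triton-tensorrt",
                    "trt_engine_cache_enable": "true",
                    "precision_mode": "FP16",
                },
            }]
        }
    },
}

# Raw bytes of the ONNX model, matching the call in the issue.
with open("model.onnx", "rb") as f:
    onnx_model_binary = f.read()

triton_client = httpclient.InferenceServerClient(url="localhost:8000")

# Loading the same model twice: each call triggers a fresh TensorRT engine
# build instead of reusing the cache written to trt_engine_cache_path by
# the first call.
for _ in range(2):
    triton_client.load_model(
        model_name,
        config=json.dumps(config),
        files={"file:1/model.onnx": onnx_model_binary},
    )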

Expected behavior

Triton should generate the TensorRT engine cache only once and reuse it on subsequent loads.

The problem comes from https://github.com/triton-inference-server/core/blob/bb9756f2012b3b15bf8d7a9e1e2afd62a7e603b5/src/model_repository_manager.cc#L108, where Triton creates a temporary folder with a random name for the uploaded model, and the TensorRT engine cache uses the model path as part of the cache key.
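A rough Python illustration of the failure mode (not ONNX Runtime's actual C++ code; the paths are made up): when the engine id is derived from the model path and every upload is staged under a fresh randomly named temporary directory, the id, and therefore the cache entry, changes on every load.

import hashlib
import uuid

def path_based_engine_id(model_path: str) -> str:
    # Simplified stand-in for the old behaviour: the id depends on the
    # on-disk location of the model file.
    return hashlib.sha256(model_path.encode()).hexdigest()[:16]

# Two uploads of the *same* model land in two different temp directories,
# so the cache key differs and the engine is rebuilt each time.
first = path_based_engine_id(f"/tmp/folder{uuid.uuid4().hex}/1/model.onnx")
second = path_based_engine_id(f"/tmp/folder{uuid.uuid4().hex}/1/model.onnx")
assert first != second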


rmccorm4 commented Jul 5, 2022

Hi @fran6co ,

Thanks for reporting the issue and doing some initial investigation.

@GuanLuo what do you think, related to your recent override changes?


fran6co commented Jul 6, 2022

This also happens when using models from a cloud storage service like S3.


fran6co commented Jul 6, 2022

There are 3 solutions:

@robertbagge

This would be very helpful for speeding up development and reducing our system's startup time.

@rmccorm4

Filed DLIS-3954 to look into this.

Tabrizian reopened this Jul 29, 2022
yf711 added a commit to microsoft/onnxruntime that referenced this issue Sep 21, 2022
**Update engine hash id generator with model name/model content/metadata** (#13015)

**Description**: 

* Updated the engine id generator to use the model name, model inputs & outputs, and environment metadata (instead of the model path) to generate the hash
* New bridged APIs were introduced to enable the id generator in the TRTEP utility

**Motivation and Context**
- Why is this change required? What problem does it solve? To fix this [issue](triton-inference-server/server#4587), which is caused by the id generator using the model path.

How to use:
* Call [TRTGenerateMetaDefId(const GraphViewer& graph_viewer, HashValue& model_hash)](https://github.com/microsoft/onnxruntime/blob/0fcce74a565478b4c83fac5a3230e9786bb53ab3/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L715) to generate the hash id for the TRT engine cache

How to test:
* On Windows, run:
  * .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTMetadefIdGeneratorUsingModelHashing
  * .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTSubgraphIdGeneratorUsingModelHashing

**Appendix**
* [Existing engine id generator that uses model path](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/execution_provider.cc#L112-L182)
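A rough Python sketch of the idea behind this change (the actual implementation lives in ONNX Runtime's C++ TensorRT EP; the input/output names and version string below are placeholders): deriving the id from the model name, graph inputs/outputs, and environment metadata makes it stable across re-uploads, regardless of which temporary directory the model was staged in.

import hashlib

def metadata_based_engine_id(model_name, input_names, output_names, ort_version):
    # Combine path-independent properties of the model into one hash.
    h = hashlib.sha256()
    for piece in (model_name, *input_names, *output_names, ort_version):
        h.update(piece.encode())
    return h.hexdigest()[:16]

# The same model uploaded twice yields the same id, even though Triton
# staged it under two different randomly named temporary directories.
a = metadata_based_engine_id("model-name", ["input_0"], ["output_0"], "1.13.0")
b = metadata_based_engine_id("model-name", ["input_0"], ["output_0"], "1.13.0")
assert a == b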
linnealovespie pushed a commit to microsoft/onnxruntime that referenced this issue Sep 30, 2022

bmaier96 commented Aug 9, 2023

Any news on this topic? I still face the same issue.
