
Update engine hash id generator with model name/model content/metadata #13015

Merged 3 commits into main on Sep 21, 2022

Conversation

@yf711 (Contributor) commented Sep 19, 2022

Update engine hash id generator with model name/model content/metadata

Description:

  • Updated the engine id generator to use the model name, model inputs & outputs, and environment metadata (instead of the model path) to generate the hash
  • A new bridged API was introduced to enable the id generator in the TRT EP utility

Motivation and Context

  • Why is this change required? What problem does it solve? To fix this issue, which was caused by the id generator using the model path

How to use:

How to test:

  • On Windows, run:
    • .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTMetadefIdGeneratorUsingModelHashing
    • .\onnxruntime_test_all.exe --gtest_filter=TensorrtExecutionProviderTest.TRTSubgraphIdGeneratorUsingModelHashing

Appendix

@jywu-msft jywu-msft requested a review from chilo-ms September 20, 2022 01:43
@chilo-ms (Contributor) commented Sep 20, 2022

As we discussed in yesterday's meeting, please also add the following changes:

  • Replace GenerateMetaDefId() with TRTGenerateMetaDefId() in tensorrt_execution_provider.cc
  • Add one more test case where two identical models are placed in different paths but the hash values are the same.

It would also be good to check models with multiple partitions/engines, for example FasterRCNN and MaskRCNN.

@@ -665,6 +665,10 @@ struct ProviderHost {
virtual bool Graph__GetInitializedTensor(const Graph* p, const std::string& tensor_name, const ONNX_NAMESPACE::TensorProto*& value) = 0;

virtual const Node* Graph__ParentNode(const Graph* p) const = 0;
virtual const Graph* Graph__ParentGraph(const Graph* p) const = 0;
virtual const std::string& Graph__Name(const Graph* p) noexcept = 0;
@chilo-ms (Contributor) Sep 20, 2022


I suggest making it:
virtual const std::string& Graph__Name(const Graph* p) const noexcept = 0;
to be aligned with ORT Graph API.

Same for:
virtual const std::vector<const NodeArg*>& Graph__GetInputsIncludingInitializers(const Graph* p) const noexcept = 0;

@yf711 (Contributor, Author)


Thanks for pointing that out. Those changes have been included in the second commit.

@jywu-msft (Member) commented Sep 21, 2022

Can you address the cpplint warnings (if they make sense)?

@jywu-msft jywu-msft requested a review from stevenlix September 21, 2022 02:47

// get the hash for the model when loaded from file
HashValue model_hash;
int id = TRTGenerateMetaDefId(viewer, model_hash);
@chilo-ms (Contributor) Sep 21, 2022


That's good for testing the correctness and functionality of TRTGenerateMetaDefId().

I think it's better to also test a TRT EP inference run and check the hash file name of the engine cache in the filesystem, either manually or in the unit test, so that we can make sure the code path where TRT EP calls TRTGenerateMetaDefId() is okay.

@yf711 (Contributor, Author)


This has been manually verified by running FasterRCNN locally. Thanks for sharing the instructions!

@yf711 yf711 merged commit 240aead into main Sep 21, 2022
@yf711 yf711 deleted the yifanl/updateEngineHashIdGenerator branch September 21, 2022 18:10
@pranavsharma (Contributor)

Do we have plans to test this in the Triton context? Here's the issue.

@chilo-ms (Contributor) commented Sep 21, 2022

Do we have plans to test this in the Triton context? Here's the issue.

What's the best way to verify the fix in the Triton context? I'm not familiar with Triton testing; could they simply pick up the main branch and test?
We have tested on our side that the model path has no impact on the hash id.

linnealovespie pushed a commit that referenced this pull request Sep 30, 2022
#13015)

**Update engine hash id generator with model name/model
content/metadata**

**Description**: 

* Updated the engine id generator to use the model name, model inputs &
outputs, and env metadata (instead of the model path) to generate the hash
* A new bridged API was introduced to enable the id generator in the
TRT EP utility

**Motivation and Context**
- Why is this change required? What problem does it solve? To fix this
[issue](triton-inference-server/server#4587),
which was caused by the id generator using the model path

How to use:
* Call [TRTGenerateMetaDefId(const GraphViewer& graph_viewer, HashValue&
model_hash)](https://github.com/microsoft/onnxruntime/blob/0fcce74a565478b4c83fac5a3230e9786bb53ab3/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc#L715)
to generate hash id for TRT engine cache

How to test:
* On Windows, run:
* .\onnxruntime_test_all.exe
--gtest_filter=TensorrtExecutionProviderTest.TRTMetadefIdGeneratorUsingModelHashing
* .\onnxruntime_test_all.exe
--gtest_filter=TensorrtExecutionProviderTest.TRTSubgraphIdGeneratorUsingModelHashing

**Appendix**
* [Existing engine id generator that uses model
path](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/framework/execution_provider.cc#L112-L182)
@yf711 yf711 restored the yifanl/updateEngineHashIdGenerator branch December 12, 2022 09:17
stevenlix added a commit that referenced this pull request Dec 15, 2022
There are some issues in
#13015:
1. The model name should be used rather than the graph name in the model ID
generator.
2. Hash collisions were observed in the ID cache, which means different models
may have the same key and thus load the same hash id from the cache.
3. For the class and function that generate the model id, "MetaDef" in the
name is not appropriate.
4. MurmurHash3 should be reused rather than copied over into the TRT EP.
This PR fixes those issues.
henrywu2019 pushed a commit to henrywu2019/onnxruntime that referenced this pull request Dec 26, 2022
yf711 pushed a commit that referenced this pull request Jan 6, 2023
@yf711 yf711 deleted the yifanl/updateEngineHashIdGenerator branch February 8, 2023 17:11