Sonic dynamic model loading #25
…r method in TritonClient. Update BuildFile.xml and fix formatting in header files.
…tructor for TritonClient, and update BuildFile.xml to include Catch2 for testing.
…tests; remove old cfg
…lection; remove unused parameters and improve documentation.
- Introduced `loadModel` and `unloadModel` methods for managing model lifecycle.
- Added mutex for thread safety during model operations.
- Updated `TritonService` header and implementation to support dynamic model management.
- Enhanced logging for model loading and unloading processes.
- Updated test configurations to include dynamic model loading tests.
…requirements
- Modified input handling to utilize actual model input for "x" instead of dummy data.
- Adjusted shape and data allocation for input to meet base class expectations.
- Updated parameter set description method to use TritonClient for configuration.
else
  edm::LogWarning("TritonFailure") << "TritonService(): " << baseMsg << " for " << serverName << " ("
                                   << server.url << ")" << extraMsg;
  edm::LogWarning("TritonFailure")
please prevent your text editor from making spurious formatting changes. these will be reversed by `scram b code-format` and just pollute the git history.
}

bool TritonService::loadModel(const std::string& modelName, const std::string& path) {
  std::lock_guard<std::mutex> lock(modelLoadMutex_);
this is okay for a first implementation, but we will need to understand the performance impact. atomic updates are preferable if the algorithm can be formulated appropriately.
bool TritonService::loadModel(const std::string& modelName, const std::string& path) {
  std::lock_guard<std::mutex> lock(modelLoadMutex_);

  bool isModelLoaded = loadedModels_.count(modelName);
could we avoid the separate `loadedModels_` set and just check if the model is present in `modelRefCount_` and has a value > 0?
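A minimal sketch of what this suggestion could look like, assuming the reference counts live in a map keyed by model name. Only the member name `modelRefCount_` comes from the PR; the `ModelRegistry` class and its `isLoaded`, `acquire` and `release` methods are hypothetical and are not part of `TritonService`. The sketch is not thread-safe on its own, so the caller would still hold the existing mutex.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the bookkeeping inside TritonService.
// Only the member name modelRefCount_ is taken from the PR.
class ModelRegistry {
public:
  // A model counts as loaded when it has an entry with a positive count,
  // so no separate "loaded" set is needed.
  bool isLoaded(const std::string& modelName) const {
    auto it = modelRefCount_.find(modelName);
    return it != modelRefCount_.end() && it->second > 0;
  }

  // Returns true when this call created the first reference,
  // i.e. when the caller would need to load the model on the server.
  bool acquire(const std::string& modelName) { return ++modelRefCount_[modelName] == 1; }

  // Returns true when this call dropped the last reference,
  // i.e. when the caller would need to unload the model from the server.
  bool release(const std::string& modelName) {
    auto it = modelRefCount_.find(modelName);
    if (it == modelRefCount_.end() || it->second == 0)
      return false;
    if (--it->second == 0) {
      modelRefCount_.erase(it);
      return true;
    }
    return false;
  }

private:
  std::unordered_map<std::string, unsigned> modelRefCount_;
};
```

With this layout, "is the model loaded" and "how many users does it have" are answered from one container, which removes the possibility of the set and the map disagreeing.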
    return true;
  }

  // Find which server can host this model
The logic here seems inconsistent:
- if this implementation is only intended to work with the fallback server, there should not be a need to check the model's list of servers
- if this implementation is intended to work with all servers, then `modelRefCount_` would need to be per-server
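For the second option, one possible shape of a per-server count is sketched below, assuming servers and models are both identified by name. The `PerServerRefCount` class and its methods are hypothetical illustrations, not code from the PR.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical sketch: reference counts keyed by (server name, model name),
// so the same model can be tracked independently on each server.
class PerServerRefCount {
public:
  // Adds one reference and returns the new count for this server/model pair.
  unsigned increment(const std::string& server, const std::string& model) {
    return ++counts_[Key(server, model)];
  }

  // Removes one reference and returns the remaining count (0 if none were held).
  unsigned decrement(const std::string& server, const std::string& model) {
    auto it = counts_.find(Key(server, model));
    if (it == counts_.end())
      return 0;
    if (--it->second == 0) {
      counts_.erase(it);
      return 0;
    }
    return it->second;
  }

  // Returns the current count for this server/model pair (0 if absent).
  unsigned count(const std::string& server, const std::string& model) const {
    auto it = counts_.find(Key(server, model));
    return it == counts_.end() ? 0 : it->second;
  }

private:
  using Key = std::pair<std::string, std::string>;
  std::map<Key, unsigned> counts_;
};
```

If the implementation only ever targets the fallback server, this extra key is unnecessary and a single per-model count is enough.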
      sit->second.sslOptions
  );

  if (!err.IsOk()) {
the previous call should be wrapped in TRITON_THROW_IF_ERROR, because if a model cannot be loaded for inference, execution of the program has to stop. (see examples elsewhere in this file)
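The general pattern being asked for (stop execution instead of returning a status) can be illustrated with a self-contained sketch. The `Status` struct and the `throwIfError` helper below are stand-ins invented for illustration; they are not the Triton client error type or the `TRITON_THROW_IF_ERROR` macro, whose exact arguments should be taken from the existing uses in the file.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a client error object exposing IsOk().
struct Status {
  bool ok;
  std::string message;
  bool IsOk() const { return ok; }
};

// Hypothetical helper: throws when the status reports a failure,
// so callers cannot silently continue after a failed load.
inline void throwIfError(const Status& status, const std::string& context) {
  if (!status.IsOk())
    throw std::runtime_error(context + ": " + status.message);
}
```

A call site would then read `throwIfError(someCall(), "unable to load model")` in place of an `if (!err.IsOk())` block that logs and returns `false`.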
  // Actually load the model on the server
  err = client->LoadModel(modelName);
  if (!err.IsOk()) {
TRITON_THROW_IF_ERROR
  bool isNotSafeToUnload = (sit == servers_.end());
  if (isNotSafeToUnload) {
    edm::LogWarning("TritonService") << "unloadModel: Fallback server not found";
    loadedModels_.erase(modelName);
why is it erased if it could not be unloaded? (similar question for subsequent blocks below)
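One way to keep the bookkeeping consistent with the server state is to erase the entry only after the unload has actually succeeded. The sketch below assumes a set of loaded model names as in the diff; the `unloadAndForget` function and the `unloadFn` callback (standing in for the server call) are hypothetical.

```cpp
#include <functional>
#include <set>
#include <string>

// Hypothetical helper: unloadFn stands in for the actual server unload call
// and returns true on success.
bool unloadAndForget(std::set<std::string>& loadedModels,
                     const std::string& modelName,
                     const std::function<bool(const std::string&)>& unloadFn) {
  if (loadedModels.count(modelName) == 0)
    return false;  // nothing recorded for this model
  if (!unloadFn(modelName))
    return false;  // unload failed: keep the entry, the server may still hold the model
  loadedModels.erase(modelName);  // erase only once the server confirmed the unload
  return true;
}
```

If the entry were erased on the failure paths as well, a later `loadModel` call would treat the model as absent even though the server may still have it loaded.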
      sit->second.sslOptions
  );  // Creates Triton gRPC client

  if (!err.IsOk()) {
TRITON_THROW_IF_ERROR
I have not looked at the test code yet because I think the logic in the TritonService needs to be addressed first. Another general point: part of the idea for dynamic loading with the fallback server would be to get rid of the |
PR description:
This PR introduces dynamic model loading/unloading capabilities and server health monitoring to the SONIC Triton integration in CMSSW. The main features include:
1. Dynamic Model Loading and Unloading:
- `loadModel()` and `unloadModel()` methods to `TritonService` for managing model lifecycle at runtime
- `DynamicModelLoadingProducer` test module to validate dynamic model management
4. Code Improvements:
- `customize.py` for better configurability
- `TritonClient` with new constructor for testing and enhanced server connection methods
Expected Output Changes:
Dependencies:
Files Modified:
- `HeterogeneousCore/SonicCore/src/SonicClientBase.cc`
- `HeterogeneousCore/SonicCore/plugins/BuildFile.xml`
- `HeterogeneousCore/SonicTriton/interface/TritonService.h`
- `HeterogeneousCore/SonicTriton/interface/TritonClient.h`
- `HeterogeneousCore/SonicTriton/src/TritonService.cc`
- `HeterogeneousCore/SonicTriton/src/TritonClient.cc`
- `HeterogeneousCore/SonicTriton/src/RetryActionDiffServer.cc`
- `HeterogeneousCore/SonicTriton/test/BuildFile.xml`
- `HeterogeneousCore/SonicTriton/test/tritonTest_cfg.py`
New Files:
- `HeterogeneousCore/SonicTriton/test/DynamicModelLoadingProducer.cc`
- `HeterogeneousCore/SonicTriton/test/test_RetryActionDiffServer.cc`
Removed Files:
- `HeterogeneousCore/SonicTriton/test/RetryActionDiffServer.cc` (replaced with unit test)
PR validation:
Unit Tests:
The following tests have been added and run:
TestHeterogeneousCoreSonicTritonRetryActionDiff - ✅ PASSED
TestHeterogeneousCoreSonicTritonRetryActionSame - ✅ PASSED
TestHeterogeneousCoreSonicTritonProducerCPU - ✅ PASSED
TestHeterogeneousCoreSonicTritonProducerGPU - ✅ PASSED
TestHeterogeneousCoreSonicCoreFilter/Producer/Analyzer - ✅ PASSED
TestHeterogeneousCoreSonicTritonDynamicModelLoading (SingleThread/Concurrent) - ⚠️
TestHeterogeneousCoreSonicTritonRetryActionDiffServer - ⚠️
Integration Tests:
- `scram b -j 8`
Known Issues:
Documentation:
Backport Information:
This PR is NOT a backport. It is intended for CMSSW_15_1_X release cycle.
If backporting becomes necessary, it would target future release cycles after initial integration and validation in CMSSW_15_1_X.