-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for sharing an ORT session #141
Add support for sharing an ORT session #141
Conversation
Thanks @quic-suppugun for this! Sharing model weights across instances is a huge benefit for us as we use multiple instances to improve throughput, but with larger models memory becomes the bottle neck. We were going to implement a similar fix when we came across this PR. Please merge:
|
Thank you for this contribution! We'll need you to submit a signed CLA (see here) to review this. Please let me know once you have sent it in. CC: @pranavsharma |
Getting the CLA sorted will keep you posted. We've now merged this PR successfully into 23.06, with minor changes. I plan to review and add those changes, assuming the github setup will let me modify this PR. What's the most expedient process? |
Thank you, @mleies. The most expedient way would be if you also added a test to the server QA folder here, or perhaps added to the L0_onnx_optimization test if it's a fit. If not, we'd need to get someone internally to add the testing. We'd also run CI to ensure nothing breaks. Ideally, @pranavsharma would review the PR once it's ready with no conflicts. If Pranav does not have time, I can also review it. That review process starts with getting the CLA in though, since we cannot merge your changes without a CLA. |
Also, @mleies, not sure if you and @quic-suppugun are on the same team. If not, we'd need to close out this PR and need a fresh PR with a CLA provided by whoever wrote the code. |
src/onnxruntime_utils.h
Outdated
@@ -27,6 +27,7 @@ | |||
#pragma once | |||
|
|||
#include <onnxruntime_c_api.h> | |||
#include <regex> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Headers should be included where they're needed. In this case, the cc file.
src/onnxruntime_utils.cc
Outdated
GetInstanceGroupName( | ||
const std::string& model_name, const std::string& instance_name) | ||
{ | ||
std::regex groupNameRegex('(' + model_name + '_' + "[0-9]" + ')'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In both this and onnxruntime.cc variable names don't follow the Google convention.
src/onnxruntime.cc
Outdated
@@ -967,7 +1050,7 @@ class ModelInstanceState : public BackendModelInstance { | |||
|
|||
// Onnx Runtime variables that are used across runs on this | |||
// instance. | |||
OrtSession* session_; | |||
std::shared_ptr<OrtSession> session_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the reason to use shared_ptr everywhere? shared_ptrs are heavy and we should try to avoid them as much as possible. Since the sessions for the groups are stored in ModelState and used elsewhere, we should be able to use unique_ptr.
src/onnxruntime_utils.cc
Outdated
return ""; | ||
} | ||
|
||
if (std::regex_search(instance_name, groupName, groupNameRegex)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain and document why a regex search is needed?
src/onnxruntime.cc
Outdated
{ | ||
RETURN_ERROR_IF_TRUE( | ||
group_name.empty(), TRITONSERVER_ERROR_INVALID_ARG, | ||
std::string("Invalid group name")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please print the name of the group in the error msg.
src/onnxruntime.cc
Outdated
sessionEntry = groupInstanceSessionMap_.find(group_name); | ||
RETURN_ERROR_IF_TRUE( | ||
(sessionEntry == groupInstanceSessionMap_.end()), | ||
TRITONSERVER_ERROR_NOT_FOUND, std::string("No such group")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please print the name of the group in the error msg.
src/onnxruntime.cc
Outdated
{ | ||
RETURN_ERROR_IF_TRUE( | ||
group_name.empty(), TRITONSERVER_ERROR_INVALID_ARG, | ||
std::string("Invalid group name")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please print the name of the group in the error msg.
src/onnxruntime.cc
Outdated
// Check is we are sharing the session. If so get the session pointer and | ||
// return | ||
if (share_session_) { | ||
if (GetSessionForGroup(instance_group_name, session) == nullptr) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a potential for a race condition where multiple instances are coming up and each end up creating a Session? If ModelState is always initialized before the instances, there's no issue and we won't need to use a lock. (I don't quite remember the sequence).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a PR for documentation as well?
src/onnxruntime.cc
Outdated
|
||
// Indicate if an onnxrt session should be shared or not. This is a model | ||
// global and applies to all instances. So, storing it in the model state | ||
bool share_session_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be nice to make the config more descriptive like share_session_between_instances.
Any update? this PR is great, can't wait to see it merge |
Hello @pranavsharma, @mleies, and @dyastremsky, I am reaching out in hopes of facilitating a swift merge for the current PR. I've constructed an ONNX image based on the 23.12 build (see mleies's docs) , which you can find below: Code repository: https://github.com/Jackiexiao/onnxruntime_backend/tree/r23.12 Additionally, I've updated the main branch at main...Jackiexiao:onnxruntime_backend:main with the following revisions:
You can review this modification in context here: https://github.com/Jackiexiao/onnxruntime_backend/blob/128f7aa4a3eb4b4ad94f171824e85c48ec6303a3/src/onnxruntime.cc#L2981C1-L2983C35 I would greatly appreciate any assistance or additional feedback to expedite the merging process. Thank you for your support and guidance. |
Thanks for your interest @Jackiexiao. @pranavsharma, I have a commit addressing all the review comments. Should I discard the current commit to rebase the new one on same branch or should I place a new pull request for it? |
@tanmayv25 @mc-nv @kthui Could you please take a look at this Pull Request? Thank you. |
e000a48
to
6b896f0
Compare
@pranavsharma @dyastremsky, I addressed review comments in PR 248. |
For every instance in a model instance group a new ORT session is
created. This code adds support to share a session per instance
group.
This support can be enabled by defining 'share_session' to true
in triton model config "parameters". Example:
parameters [
.....
{
key: "share_session"
value: {string_value: "true"}
}
]
This is a global parameter and cannot be defined per instance
group. The user should determine if the parameter makes sense for
their setup.
GetInstanceGroupName function is added to find the instance group
name through regex search over instance name.