
Add support for sharing an ORT session #141

Conversation

quic-suppugun

For every instance in a model instance group, a new ORT session is
created. This change adds support for sharing a single session per
instance group.
The feature can be enabled by setting 'share_session' to true in the
"parameters" section of the Triton model config. Example:

parameters [
  .....
  {
    key: "share_session"
    value: {string_value: "true"}
  }
]

This is a global parameter and cannot be defined per instance
group; the user should determine whether it makes sense for
their setup.
A GetInstanceGroupName function is added to find the instance-group
name through a regex search over the instance name.
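The group-name extraction described above can be sketched as follows. This is a minimal, hypothetical reconstruction, assuming Triton instance names embed the model name followed by a group index (e.g. "densenet_0_gpu0"); the exact naming scheme and the real function signature are assumptions, not taken from the merged code.

```cpp
#include <regex>
#include <string>

// Sketch: recover the instance-group name ("<model>_<index>") from a Triton
// instance name by regex search. Returns "" when no group name is found.
std::string
GetInstanceGroupName(
    const std::string& model_name, const std::string& instance_name)
{
  std::smatch group_name;
  const std::regex group_name_regex("(" + model_name + "_[0-9]+)");
  if (std::regex_search(instance_name, group_name, group_name_regex)) {
    return group_name[1].str();
  }
  return "";
}
```

Note the sketch uses `[0-9]+` to allow multi-digit group indices, a small deviation from the single-digit `[0-9]` in the original diff.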

@mleies

mleies commented Mar 22, 2023

Thanks @quic-suppugun for this!

Sharing model weights across instances is a huge benefit for us: we use multiple instances to improve throughput, but with larger models memory becomes the bottleneck. We were about to implement a similar fix when we came across this PR.

Please merge:

  • We are currently using this fix, merged into 22.07, without issue
  • It is low risk, as the fix only takes effect when a model config explicitly enables session sharing

@dyastremsky
Contributor

Thank you for this contribution! We'll need you to submit a signed CLA (see here) to review this. Please let me know once you have sent it in.

CC: @pranavsharma

@mleies

mleies commented Aug 10, 2023

@dyastremsky,

We're getting the CLA sorted; I'll keep you posted.

We've now merged this PR successfully into 23.06, with minor changes. I plan to review and add those changes, assuming the GitHub setup will let me modify this PR. What's the most expedient process?

@dyastremsky
Contributor

Thank you, @mleies. The most expedient way would be if you also added a test to the server QA folder here, or perhaps added to the L0_onnx_optimization test if it's a fit. If not, we'd need to get someone internally to add the testing. We'd also run CI to ensure nothing breaks.

Ideally, @pranavsharma would review the PR once it's ready with no conflicts. If Pranav does not have time, I can also review it.

That review process starts with getting the CLA in though, since we cannot merge your changes without a CLA.

@dyastremsky
Contributor

Also, @mleies, I'm not sure whether you and @quic-suppugun are on the same team. If not, we'd need to close out this PR and get a fresh PR with a CLA provided by whoever wrote the code.

@@ -27,6 +27,7 @@
#pragma once

#include <onnxruntime_c_api.h>
#include <regex>
Contributor

@pranavsharma pranavsharma Aug 19, 2023


Headers should be included where they're needed. In this case, the cc file.

GetInstanceGroupName(
const std::string& model_name, const std::string& instance_name)
{
std::regex groupNameRegex('(' + model_name + '_' + "[0-9]" + ')');
Contributor


In both this and onnxruntime.cc variable names don't follow the Google convention.

@@ -967,7 +1050,7 @@ class ModelInstanceState : public BackendModelInstance {

// Onnx Runtime variables that are used across runs on this
// instance.
OrtSession* session_;
std::shared_ptr<OrtSession> session_;
Contributor

@pranavsharma pranavsharma Aug 19, 2023


What's the reason to use shared_ptr everywhere? shared_ptrs are heavy and we should try to avoid them as much as possible. Since the sessions for the groups are stored in ModelState and used elsewhere, we should be able to use unique_ptr.
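The reviewer's suggestion could look roughly like the sketch below: ModelState is the sole owner of each group's session via `unique_ptr`, and instances borrow a raw pointer, which is safe if ModelState outlives every ModelInstanceState. The class and method names here are illustrative stand-ins, not the real backend API, and `OrtSession` is mocked as an empty struct.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for the opaque ORT type; the real one comes from
// onnxruntime_c_api.h and would be created via the ORT C API.
struct OrtSession {};

// Sketch: ModelState owns one session per instance group with unique_ptr;
// instances receive a non-owning raw pointer instead of sharing ownership.
class ModelState {
 public:
  OrtSession* GetOrCreateSessionForGroup(const std::string& group_name)
  {
    auto& slot = group_instance_session_map_[group_name];
    if (slot == nullptr) {
      // Real code would call the ORT CreateSession API here.
      slot = std::make_unique<OrtSession>();
    }
    return slot.get();
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<OrtSession>>
      group_instance_session_map_;
};
```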

return "";
}

if (std::regex_search(instance_name, groupName, groupNameRegex)) {
Contributor


Can you explain and document why a regex search is needed?

{
RETURN_ERROR_IF_TRUE(
group_name.empty(), TRITONSERVER_ERROR_INVALID_ARG,
std::string("Invalid group name"));
Contributor


Please print the name of the group in the error msg.

sessionEntry = groupInstanceSessionMap_.find(group_name);
RETURN_ERROR_IF_TRUE(
(sessionEntry == groupInstanceSessionMap_.end()),
TRITONSERVER_ERROR_NOT_FOUND, std::string("No such group"));
Contributor


Please print the name of the group in the error msg.

{
RETURN_ERROR_IF_TRUE(
group_name.empty(), TRITONSERVER_ERROR_INVALID_ARG,
std::string("Invalid group name"));
Contributor


Please print the name of the group in the error msg.

// Check if we are sharing the session. If so, get the session pointer and
// return
if (share_session_) {
if (GetSessionForGroup(instance_group_name, session) == nullptr) {
Contributor


Is there a potential for a race condition where multiple instances are coming up and each end up creating a Session? If ModelState is always initialized before the instances, there's no issue and we won't need to use a lock. (I don't quite remember the sequence).
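If instances can load concurrently, one way to rule out the race the reviewer describes is to guard the group-to-session map with a mutex, so only the first instance of a group creates the session. This is a minimal sketch under that assumption; the registry class, its method names, and the mocked `OrtSession` are all hypothetical, not the backend's actual types.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

struct OrtSession {};  // stand-in for the opaque ORT session type

// Sketch: serialize get-or-create on the session map so that two instances
// of the same group racing through initialization cannot both create a
// session. unordered_map does not invalidate element addresses on rehash,
// so the returned pointer stays valid while the registry lives.
class SessionRegistry {
 public:
  OrtSession* GetOrCreate(const std::string& group_name)
  {
    std::lock_guard<std::mutex> guard(mu_);
    auto it = sessions_.find(group_name);
    if (it == sessions_.end()) {
      // Real code would build the session via the ORT C API here.
      it = sessions_.emplace(group_name, OrtSession{}).first;
    }
    return &it->second;
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, OrtSession> sessions_;
};
```

If, as the reviewer suggests, ModelState is always fully initialized before any instance starts, the lock becomes unnecessary; it only matters when instance loading is parallel.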

Contributor

@pranavsharma pranavsharma left a comment


Is there a PR for documentation as well?


// Indicate if an onnxrt session should be shared or not. This is a model
// global and applies to all instances. So, storing it in the model state
bool share_session_;
Contributor


Would be nice to make the config more descriptive like share_session_between_instances.
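With the rename the reviewer proposes, the model-config stanza from the PR description would read as below. The key name is the reviewer's suggestion, not a parameter that exists in the released backend:

```
parameters [
  {
    key: "share_session_between_instances"
    value: {string_value: "true"}
  }
]
```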

@Jackiexiao

Any update? This PR is great; can't wait to see it merged.

@Jackiexiao

Jackiexiao commented Feb 5, 2024

Hello @pranavsharma, @mleies, and @dyastremsky,

I am reaching out in hopes of facilitating a swift merge of the current PR. I've built an ONNX backend image based on the 23.12 build (see mleies's docs), which you can find below:

Code repository: https://github.com/Jackiexiao/onnxruntime_backend/tree/r23.12
GPU Docker image: jackiexiao/tritonserver:23.12-onnx-py-share-session
CPU Docker image: jackiexiao/tritonserver:23.12-onnx-py-cpu-share-session (available for pull)

Additionally, I've updated the main branch at main...Jackiexiao:onnxruntime_backend:main with the following revisions:

  • Incorporated feedback from @pranavsharma, addressing a portion of the review comments.
  • Resolved merge conflicts due to the PR's basis on an earlier version of the codebase.
  • As I am not deeply familiar with C++, there is one specific change I would like to confirm. To ensure the share_session_between_instances feature works, I've set ParallelModelInstanceLoading to false. Here is the change:
  RETURN_IF_ERROR(TRITONBACKEND_BackendAttributeSetParallelModelInstanceLoading(
      backend_attributes, false));

You can review this modification in context here: https://github.com/Jackiexiao/onnxruntime_backend/blob/128f7aa4a3eb4b4ad94f171824e85c48ec6303a3/src/onnxruntime.cc#L2981C1-L2983C35

I would greatly appreciate any assistance or additional feedback to expedite the merging process.

Thank you for your support and guidance.

@quic-suppugun
Author

Thanks for your interest @Jackiexiao.

@pranavsharma, I have a commit addressing all the review comments. Should I discard the current commit and rebase the new one onto the same branch, or should I open a new pull request for it?

@Jackiexiao

@tanmayv25 @mc-nv @kthui Could you please take a look at this Pull Request? Thank you.

@quic-suppugun
Author

@pranavsharma @dyastremsky, I addressed review comments in PR 248.
