
Add support for sharing an ORT session #141

Conversation

quic-suppugun

For every instance in a model instance group, a new ORT session is
created. This change adds support for sharing a single session per
instance group.
The feature can be enabled by setting 'share_session' to true in the
"parameters" section of the Triton model config. Example:

parameters [
  .....
  {
    key: "share_session"
    value: {string_value: "true"}
  }
]

This is a global parameter and cannot be defined per instance
group; the user should determine whether it makes sense for
their setup.
A GetInstanceGroupName function is added to find the instance-group
name through a regex search over the instance name.
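The group-name extraction described above can be sketched as follows. This is a minimal, hypothetical reconstruction, assuming Triton instance names embed the model name followed by a group index (e.g. "densenet_0_gpu0"); the exact naming scheme and the real function signature are assumptions, not taken from the merged code.

```cpp
#include <regex>
#include <string>

// Sketch: recover the instance-group name ("<model>_<index>") from a Triton
// instance name by regex search. Returns "" when no group name is found.
std::string
GetInstanceGroupName(
    const std::string& model_name, const std::string& instance_name)
{
  std::smatch group_name;
  const std::regex group_name_regex("(" + model_name + "_[0-9]+)");
  if (std::regex_search(instance_name, group_name, group_name_regex)) {
    return group_name[1].str();
  }
  return "";
}
```

Note the sketch uses `[0-9]+` to allow multi-digit group indices, a small deviation from the single-digit `[0-9]` in the original diff.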

@mleies

mleies commented Mar 22, 2023

Thanks @quic-suppugun for this!

Sharing model weights across instances is a huge benefit for us: we use multiple instances to improve throughput, but with larger models memory becomes the bottleneck. We were about to implement a similar fix when we came across this PR.

Please merge:

  • We are currently using this fix, merged into 22.07, without issue
  • It is low risk, as the fix only takes effect when a model config explicitly enables session sharing

@dyastremsky
Contributor

Thank you for this contribution! We'll need you to submit a signed CLA (see here) to review this. Please let me know once you have sent it in.

CC: @pranavsharma

@mleies

mleies commented Aug 10, 2023

@dyastremsky,

We're getting the CLA sorted; I'll keep you posted.

We've now merged this PR successfully into 23.06, with minor changes. I plan to review and add those changes, assuming the GitHub setup will let me modify this PR. What's the most expedient process?

@dyastremsky
Contributor

Thank you, @mleies. The most expedient way would be if you also added a test to the server QA folder here, or perhaps added to the L0_onnx_optimization test if it's a fit. If not, we'd need to get someone internally to add the testing. We'd also run CI to ensure nothing breaks.

Ideally, @pranavsharma would review the PR once it's ready with no conflicts. If Pranav does not have time, I can also review it.

That review process starts with getting the CLA in though, since we cannot merge your changes without a CLA.

@dyastremsky
Contributor

Also, @mleies, I'm not sure whether you and @quic-suppugun are on the same team. If not, we'd need to close out this PR and get a fresh PR with a CLA provided by whoever wrote the code.

@@ -27,6 +27,7 @@
#pragma once

#include <onnxruntime_c_api.h>
#include <regex>
Contributor

@pranavsharma pranavsharma Aug 19, 2023


Headers should be included where they're needed. In this case, the cc file.

GetInstanceGroupName(
const std::string& model_name, const std::string& instance_name)
{
std::regex groupNameRegex('(' + model_name + '_' + "[0-9]" + ')');
Contributor


In both this and onnxruntime.cc variable names don't follow the Google convention.

@@ -967,7 +1050,7 @@ class ModelInstanceState : public BackendModelInstance {

// Onnx Runtime variables that are used across runs on this
// instance.
OrtSession* session_;
std::shared_ptr<OrtSession> session_;
Contributor

@pranavsharma pranavsharma Aug 19, 2023


What's the reason to use shared_ptr everywhere? shared_ptrs are heavy and we should try to avoid them as much as possible. Since the sessions for the groups are stored in ModelState and used elsewhere, we should be able to use unique_ptr.
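The reviewer's suggestion could look roughly like the sketch below: ModelState is the sole owner of each group's session via `unique_ptr`, and instances borrow a raw pointer, which is safe if ModelState outlives every ModelInstanceState. The class and method names here are illustrative stand-ins, not the real backend API, and `OrtSession` is mocked as an empty struct.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Stand-in for the opaque ORT type; the real one comes from
// onnxruntime_c_api.h and would be created via the ORT C API.
struct OrtSession {};

// Sketch: ModelState owns one session per instance group with unique_ptr;
// instances receive a non-owning raw pointer instead of sharing ownership.
class ModelState {
 public:
  OrtSession* GetOrCreateSessionForGroup(const std::string& group_name)
  {
    auto& slot = group_instance_session_map_[group_name];
    if (slot == nullptr) {
      // Real code would call the ORT CreateSession API here.
      slot = std::make_unique<OrtSession>();
    }
    return slot.get();
  }

 private:
  std::unordered_map<std::string, std::unique_ptr<OrtSession>>
      group_instance_session_map_;
};
```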

return "";
}

if (std::regex_search(instance_name, groupName, groupNameRegex)) {
Contributor


Can you explain and document why a regex search is needed?

{
RETURN_ERROR_IF_TRUE(
group_name.empty(), TRITONSERVER_ERROR_INVALID_ARG,
std::string("Invalid group name"));
Contributor


Please print the name of the group in the error msg.

sessionEntry = groupInstanceSessionMap_.find(group_name);
RETURN_ERROR_IF_TRUE(
(sessionEntry == groupInstanceSessionMap_.end()),
TRITONSERVER_ERROR_NOT_FOUND, std::string("No such group"));
Contributor


Please print the name of the group in the error msg.

{
RETURN_ERROR_IF_TRUE(
group_name.empty(), TRITONSERVER_ERROR_INVALID_ARG,
std::string("Invalid group name"));
Contributor


Please print the name of the group in the error msg.

// Check if we are sharing the session. If so, get the session pointer and
// return
if (share_session_) {
if (GetSessionForGroup(instance_group_name, session) == nullptr) {
Contributor


Is there a potential for a race condition where multiple instances are coming up and each end up creating a Session? If ModelState is always initialized before the instances, there's no issue and we won't need to use a lock. (I don't quite remember the sequence).
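If instances can load concurrently, one way to rule out the race the reviewer describes is to guard the group-to-session map with a mutex, so only the first instance of a group creates the session. This is a minimal sketch under that assumption; the registry class, its method names, and the mocked `OrtSession` are all hypothetical, not the backend's actual types.

```cpp
#include <mutex>
#include <string>
#include <unordered_map>

struct OrtSession {};  // stand-in for the opaque ORT session type

// Sketch: serialize get-or-create on the session map so that two instances
// of the same group racing through initialization cannot both create a
// session. unordered_map does not invalidate element addresses on rehash,
// so the returned pointer stays valid while the registry lives.
class SessionRegistry {
 public:
  OrtSession* GetOrCreate(const std::string& group_name)
  {
    std::lock_guard<std::mutex> guard(mu_);
    auto it = sessions_.find(group_name);
    if (it == sessions_.end()) {
      // Real code would build the session via the ORT C API here.
      it = sessions_.emplace(group_name, OrtSession{}).first;
    }
    return &it->second;
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, OrtSession> sessions_;
};
```

If, as the reviewer suggests, ModelState is always fully initialized before any instance starts, the lock becomes unnecessary; it only matters when instance loading is parallel.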

Contributor

@pranavsharma pranavsharma left a comment


Is there a PR for documentation as well?


// Indicate if an onnxrt session should be shared or not. This is a model
// global and applies to all instances. So, storing it in the model state
bool share_session_;
Contributor


Would be nice to make the config more descriptive like share_session_between_instances.
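With the rename the reviewer proposes, the model-config stanza from the PR description would read as below. The key name is the reviewer's suggestion, not a parameter that exists in the released backend:

```
parameters [
  {
    key: "share_session_between_instances"
    value: {string_value: "true"}
  }
]
```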

@Jackiexiao

Any update? This PR is great; can't wait to see it merged.

@Jackiexiao

Jackiexiao commented Feb 5, 2024

Hello @pranavsharma, @mleies, and @dyastremsky,

I am reaching out in hopes of facilitating a swift merge of the current PR. I've built an ONNX backend image based on the 23.12 build (see mleies's docs), which you can find below:

Code repository: https://github.com/Jackiexiao/onnxruntime_backend/tree/r23.12
GPU Docker image: jackiexiao/tritonserver:23.12-onnx-py-share-session
CPU Docker image: jackiexiao/tritonserver:23.12-onnx-py-cpu-share-session (available for pull)

Additionally, I've updated the main branch at main...Jackiexiao:onnxruntime_backend:main with the following revisions:

  • Incorporated feedback from @pranavsharma, addressing a portion of the review comments.
  • Resolved merge conflicts due to the PR's basis on an earlier version of the codebase.
  • As I am not deeply familiar with C++, there is one specific change I would like to confirm. To ensure the share_session_between_instances feature works, I've set ParallelModelInstanceLoading to false. Here is the change:
  RETURN_IF_ERROR(TRITONBACKEND_BackendAttributeSetParallelModelInstanceLoading(
      backend_attributes, false));

You can review this modification in context here: https://github.com/Jackiexiao/onnxruntime_backend/blob/128f7aa4a3eb4b4ad94f171824e85c48ec6303a3/src/onnxruntime.cc#L2981C1-L2983C35

I would greatly appreciate any assistance or additional feedback to expedite the merging process.

Thank you for your support and guidance.

@quic-suppugun
Author

Thanks for your interest @Jackiexiao.

@pranavsharma, I have a commit addressing all the review comments. Should I discard the current commit and rebase the new one onto the same branch, or should I open a new pull request for it?

@Jackiexiao

@tanmayv25 @mc-nv @kthui Could you please take a look at this Pull Request? Thank you.

@quic-suppugun
Author

@pranavsharma @dyastremsky, I addressed review comments in PR 248.
