[Runtime] Allow for parameter sharing in GraphRuntime #3384
Conversation
Summary: In multi-threaded applications where we run multiple inferences on the same model in parallel (consider e.g. a TTS system handling multiple requests), it can be useful to share the parameters of a model amongst these instances. This improves the cache utilization behaviour of the system, as multiple cores can use the same set of weights instead of evicting identical copies of the weights from a shared cache. As the underlying `NDArray` instances in `data_entry_` implement a ref-count-based sharing system, this is a simple modification of the `GraphRuntime::LoadParams` logic to instead copy parameters from an existing GraphRuntime instance. This is a little ugly in that we need both the pre-existing GraphRuntime instance and the 'serialized' params (since we need to know the set of names we should copy), but without imposing additional assumptions (i.e. storing the set of param names in GraphRuntime, and enforcing that shared param names are identical to the parameters set in the preceding `LoadParams` call), this seems unavoidable.

Test Plan: Unit test added.
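For context, the sharing logic under review looks roughly like the following. This is a simplified sketch reconstructed from the review excerpts below; the method name `ShareParams`, the `dmlc::Stream` parameter, and the skip-on-missing-input handling are assumptions, not the exact merged code.

// Sketch: make this runtime's parameter entries point at the NDArrays
// already owned by 'other'. The serialized params stream is read only to
// recover the set of parameter names to copy.
void GraphRuntime::ShareParams(const GraphRuntime& other, dmlc::Stream* strm) {
  std::vector<std::string> names;
  CHECK(strm->Read(&names)) << "Invalid parameters file format";
  for (size_t i = 0; i < names.size(); ++i) {
    int in_idx = GetInputIndex(names[i]);
    if (in_idx < 0) continue;  // name is not an input of this graph
    uint32_t eid = this->entry_id(input_nodes_[in_idx], 0);
    CHECK_LT(eid, data_entry_.size());
    // Assigning the NDArray handle only bumps a reference count; both
    // runtimes now read the same underlying weight buffer.
    data_entry_[eid] = other.GetInput(GetInputIndex(names[i]));
  }
}

Because only handles are copied, the weight tensors exist once in memory regardless of how many runtime instances share them.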
LGTM.

@ajtulloch Can you take a look at the CI failure?
std::vector<std::string> names;
CHECK(strm->Read(&names)) << "Invalid parameters file format";
uint64_t sz;
strm->Read(&sz);
This read may also need a CHECK, just like lines 189 and 193 above; see the sketch below.
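A sketch of the suggested guard, mirroring the error message already used for the names read (the message text is illustrative):

uint64_t sz;
CHECK(strm->Read(&sz)) << "Invalid parameters file format";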
CHECK_LT(eid, data_entry_.size());
CHECK_EQ(data_entry_[eid].use_count(), 1);
data_entry_[eid] = other.GetInput(GetInputIndex(names[i]));
CHECK_GT(data_entry_[eid].use_count(), 1);
For consistency, the checks on lines 205, 206, and 208 may need a message, just like the other checks have; see the sketch below.
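A sketch of those checks with messages attached (the message wording is illustrative, not from the patch):

CHECK_LT(eid, data_entry_.size()) << "invalid entry id " << eid;
CHECK_EQ(data_entry_[eid].use_count(), 1) << "parameter " << names[i] << " is already shared";
CHECK_GT(data_entry_[eid].use_count(), 1) << "failed to share parameter " << names[i];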
data_entry_[eid].use_count() should increase by exactly 1 after taking one more reference, so a check that only verifies the count is > 1 seems logically incomplete. How about some logic like the following?
int prev_count = other.GetInput(in_idx).use_count();
...
CHECK_EQ(data_entry_[eid].use_count(), prev_count + 1);
uint32_t eid = this->entry_id(input_nodes_[in_idx], 0);
CHECK_LT(eid, data_entry_.size());
CHECK_EQ(data_entry_[eid].use_count(), 1);
data_entry_[eid] = other.GetInput(GetInputIndex(names[i]));
To reuse the variable, this should be data_entry_[eid] = other.GetInput(in_idx).
One more question: is other.GetInputIndex(names[i]) always equal to this->GetInputIndex(names[i])? I would guess the logic should be other.GetInput(other.GetInputIndex(names[i])); if the two are always the same, we can reuse in_idx, as sketched below.
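A sketch of the reuse the reviewer has in mind, assuming in_idx was computed from names[i] just above and that both runtimes index their inputs identically:

int in_idx = GetInputIndex(names[i]);
uint32_t eid = this->entry_id(input_nodes_[in_idx], 0);
// Reusing in_idx avoids the second name lookup; this is only correct if
// 'other' maps names[i] to the same input index as 'this'.
data_entry_[eid] = other.GetInput(in_idx);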
@ajtulloch Can you address @huajsj's comment?
Thanks @ajtulloch @huajsj, this PR is now merged.