Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Runtime] Allow for parameter sharing in GraphRuntime #3384

Merged
merged 1 commit into from
Jun 25, 2019

Conversation

ajtulloch
Copy link
Contributor

Summary:

In multi-threaded applications where we have multiple inferences on the
same model in parallel (consider e.g. a TTS system handling multiple
requests), it can be useful to share the parameters of a model amongst
these multiple instances. This improves the cache utilization behaviour
of the system, as multiple cores can use the same set of weights instead
of evicting the identical copies of weights in a shared cache.

As the underlying NDArray instances in data_entry_ implement a
ref-counted based sharing system, this is a simple modification of the
GraphRuntime::LoadParams logic to instead copy parameters from an
existing GraphRuntime instance. This is a little ugly in that we need
both the pre-existing GraphRuntime instance, as well as the 'serialized'
params (since we need to know the set of names we should copy), but
without imposing additional assumptions (i.e. storing the set of param
names in GraphRuntime, and enforcing that shared param names are
identical to the parameters set in the preceding LoadParams call),
this seems unavoidable.

Test Plan:

Unit test added.

Summary:

In multi-threaded applications where we have multiple inferences on the
same model in parallel (consider e.g. a TTS system handling multiple
requests), it can be useful to share the parameters of a model amongst
these multiple instances. This improves the cache utilization behaviour
of the system, as multiple cores can use the same set of weights instead
of evicting the identical copies of weights in a shared cache.

As the underlying `NDArray` instances in `data_entry_` implement a
ref-counted based sharing system, this is a simple modification of the
`GraphRuntime::LoadParams` logic to instead copy parameters from an
existing GraphRuntime instance. This is a little ugly in that we need
both the pre-existing GraphRuntime instance, as well as the 'serialized'
params (since we need to know the set of names we should copy), but
without imposing additional assumptions (i.e. storing the set of param
names in GraphRuntime, and enforcing that shared param names are
identical to the parameters set in the preceding `LoadParams` call),
this seems unavoidable.

Test Plan:

Unit test added.
@kevinthesun
Copy link
Contributor

LGTM @ajtulloch Can you take a look at ci failure?

std::vector<std::string> names;
CHECK(strm->Read(&names)) << "Invalid parameters file format";
uint64_t sz;
strm->Read(&sz);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here may need a 'CHECK' just like line 189 and 193 did.

CHECK_LT(eid, data_entry_.size());
CHECK_EQ(data_entry_[eid].use_count(), 1);
data_entry_[eid] = other.GetInput(GetInputIndex(names[i]));
CHECK_GT(data_entry_[eid].use_count(), 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency, 205,206 208 may need a "message" just like other check did.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data_entry_[eid].use_count() should get increase 1 after one reference, this check just check if the count > 1 seems like not logically complete. how about some logic like following?
int prev_idx = other.GetInput(in_idx).use_count;
...
CHECK_EQ(data_entry_[eid].use_count(), prev_idx + 1);

uint32_t eid = this->entry_id(input_nodes_[in_idx], 0);
CHECK_LT(eid, data_entry_.size());
CHECK_EQ(data_entry_[eid].use_count(), 1);
data_entry_[eid] = other.GetInput(GetInputIndex(names[i]));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for reuse variable purpose , here should be data_entry_[eid] = other.GetInput(in_idx).
one more question is 'is other.GetInputIndex(names[i]) always equal this->GetInputIndex(names[i]) ?', here i guess the logic should be other.GetInput(other.GetInputIndex(names[i])), if they are same we can reuse in_idx.

@tqchen
Copy link
Member

tqchen commented Jun 20, 2019

@ajtulloch can you address @huajsj 's comment?

@tqchen tqchen added the status: need update need update based on feedbacks label Jun 24, 2019
@tqchen tqchen merged commit 32be34a into apache:master Jun 25, 2019
@tqchen tqchen added status: accepted and removed status: need review status: need update need update based on feedbacks labels Jun 25, 2019
@tqchen
Copy link
Member

tqchen commented Jun 25, 2019

Thanks @ajtulloch @huajsj , this PR is now merged

wweic pushed a commit to wweic/tvm that referenced this pull request Jun 26, 2019
Summary:

In multi-threaded applications where we have multiple inferences on the
same model in parallel (consider e.g. a TTS system handling multiple
requests), it can be useful to share the parameters of a model amongst
these multiple instances. This improves the cache utilization behaviour
of the system, as multiple cores can use the same set of weights instead
of evicting the identical copies of weights in a shared cache.

As the underlying `NDArray` instances in `data_entry_` implement a
ref-counted based sharing system, this is a simple modification of the
`GraphRuntime::LoadParams` logic to instead copy parameters from an
existing GraphRuntime instance. This is a little ugly in that we need
both the pre-existing GraphRuntime instance, as well as the 'serialized'
params (since we need to know the set of names we should copy), but
without imposing additional assumptions (i.e. storing the set of param
names in GraphRuntime, and enforcing that shared param names are
identical to the parameters set in the preceding `LoadParams` call),
this seems unavoidable.

Test Plan:

Unit test added.
wweic pushed a commit to neo-ai/tvm that referenced this pull request Jun 27, 2019
Summary:

In multi-threaded applications where we have multiple inferences on the
same model in parallel (consider e.g. a TTS system handling multiple
requests), it can be useful to share the parameters of a model amongst
these multiple instances. This improves the cache utilization behaviour
of the system, as multiple cores can use the same set of weights instead
of evicting the identical copies of weights in a shared cache.

As the underlying `NDArray` instances in `data_entry_` implement a
ref-counted based sharing system, this is a simple modification of the
`GraphRuntime::LoadParams` logic to instead copy parameters from an
existing GraphRuntime instance. This is a little ugly in that we need
both the pre-existing GraphRuntime instance, as well as the 'serialized'
params (since we need to know the set of names we should copy), but
without imposing additional assumptions (i.e. storing the set of param
names in GraphRuntime, and enforcing that shared param names are
identical to the parameters set in the preceding `LoadParams` call),
this seems unavoidable.

Test Plan:

Unit test added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants