Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify default naming on different instance group count #231

Merged
merged 5 commits into from
Jul 14, 2023

Conversation

kthui
Copy link
Contributor

@kthui kthui commented Jul 4, 2023

server PR: triton-inference-server/server#6018

There are three small but important changes:

Default model instance name:

  • Assuming there is an instance group named group, and its count can vary.
  • Previously, instances from the group is named:
    • If count > 1, ["group_0", "group_1", "group_2", ...].
    • If count == 1, ["group"].
  • Now, the instances are named:
    • Regardless of count, ["group", "group_1", "group_2", "group_3", ...].
    • In other words, all group_0 will be named group.

A model instance can only be reused if it will not be renamed upon reuse, based on the above new naming rule. For example:

  • Previously, instances named ["1", "2", "3"] will be reused if they are renamed to ["1", "A", "B"], providing the instances are no different to each other beside than their names.
  • Now, only the instance named "1" can be reused, because other instances are renamed.

For the model instance signature hash function, only the instance name is used for hashing, because it is reasonable to believe instances within the same model should not share the same name, so it reduces the byte size passed to the hash function. If the user would to specify the same name across all instances (becomes linear search), the performance impact is expected to be minimal, as most use-cases do not create enough instances to impact performance.

@kthui kthui force-pushed the jacky-instance-name branch from 2fb7b7c to c7ba28e Compare July 4, 2023 02:01
@kthui kthui marked this pull request as ready for review July 5, 2023 18:01
@kthui kthui requested review from nnshah1, rmccorm4 and GuanLuo July 5, 2023 18:35
@@ -220,8 +220,7 @@ TritonModelInstance::SetInstances(
}
for (int32_t c = 0; c < group.count(); ++c) {
std::string instance_name{
group.count() > 1 ? group.name() + "_" + std::to_string(c)
: group.name()};
c > 0 ? group.name() + "_" + std::to_string(c) : group.name()};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice detailed explanation. Two comments:

In other words, all group_0 will be named group.

  1. I was thinking the opposite, we should change "group" to "group_0" to have a consistent naming scheme regardless of count.

  2. What do you think about including device type / kind in the name? ex: "bert_cpu0_instance0", "bert_cpu0_instance1", "bert_gpu0_instance0", etc.

I think this would improve the intuition in the logging and debugging from user perspective. For example it may be more clear that a GPU instance is receiving work rather than a CPU instance in logs, compared to current naming convention.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the opposite, we should change "group" to "group_0" to have a consistent naming scheme regardless of count.

I am good with any scheme, as long as the naming is consistent. Update commit: Use different instance name unification method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about including device type / kind in the name? ex: "bert_cpu0_instance0", "bert_cpu0_instance1", "bert_gpu0_instance0", etc.

This is a good idea. I think users generally do not supply an instance name if it is optional, but when debugging is needed, they would want to see descriptive names.

But I think this change is out of scope for this PR, because

  • The code where the default name is assigned to instances is on a different file, independent from this change; and
  • We can have more time design on the default instance name for different scenarios.

What do you think about the scope cutoff? On this PR, we only address how to assign different names to instances on the same instance group (i.e. "group_default_name_0", "group_default_name_1", ...). We can decide on the group_default_name on a different PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at how the default group name is set and the signature check, so if user change the config from:

instance_group [
  { kind: KIND_CPU },
  { kind: KIND_GPU },
]

to

instance_group [
  { kind: KIND_GPU },
  { kind: KIND_CPU },
]

No instance will be reused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, because of the default name assigned to the instances will change if the location of the instances on the configuration changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the reason why I replaced the test_gpu_cpu_instance_update test with the test_instance_name_update and test_instance_same_name tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the change for default name to be consistent with group_0 regardless of count.

Regarding this:

But I think this change is out of scope for this PR, because

The code where the default name is assigned to instances is on a different file, independent from this change; and
We can have more time design on the default instance name for different scenarios.
What do you think about the scope cutoff? On this PR, we only address how to assign different names to instances on the same instance group (i.e. "group_default_name_0", "group_default_name_1", ...). We can decide on the group_default_name on a different PR.

I'm fine with cutting off scope to separate PR. Please make a ticket and link it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is counter-intuitive, what is the reason of doing so?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because of the logic which assigns default name to instances that do not specify a name.

I think there is a way to avoid the behavior, but it will be complicated: During an instance update, leave the default instance name empty, so the lookup can be done purely on the instance config. Instances with the name specified will be matched first, and then for those that do not specify a name. Those without a name will use the name on the instance that is matched, if any. Otherwise, a default name is assigned.

Maybe, we can simply print on the log notifying the user when an instance reused has a different name than the one requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than minimizing the scope of the code change. I agree the behavior is counter-intuitive. Maybe a redesign on how the default name works is needed.

@kthui kthui force-pushed the jacky-instance-name branch from 8ca2bea to af88121 Compare July 7, 2023 17:42
@kthui kthui requested a review from rmccorm4 July 7, 2023 17:53
std::string instance_name{
group.count() > 1 ? group.name() + "_" + std::to_string(c)
: group.name()};
std::string instance_name{group.name() + "_" + std::to_string(c)};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have check that the groups must not share the same name? Otherwise we can have instances with the exact same name?

Copy link
Contributor Author

@kthui kthui Jul 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have check that the groups must not share the same name?

No, we actually allow one or more instances to share the same name. There are two scenarios:

  1. One or more instances are the same and share the same name.
    • Exactly which instance is reused or removed during an instance update is undefined, because there is no way to tell the difference between those instances.
    • There is a new test that manipulate instances under this case. The test will pass as long as the number of instances added or removed is correct.
  2. One or more instances are different but share the same name.
    • The instances are disambiguated by their instance group configuration.
    • There is a new test checking the correct instance is removed and created, if the config of an instance is changed but reused the same name.

Otherwise we can have instances with the exact same name?

Yes. They are distinguished by their instance configuration, if they are sharing the same name.

Copy link
Contributor

@rmccorm4 rmccorm4 Jul 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ability for different instances to have the same name will be confusing in the long term, but is functional now since the logic is based on config and not names. I would like to see this disambiguated in the future.

However, with my other comment to add more descriptive instance names like this:

What do you think about including device type / kind in the name? ex: "bert_cpu0_instance0", "bert_cpu0_instance1", "bert_gpu0_instance0", etc.

Then I think the names will be unique by default, unless users specify an identical instance group name in two or more models.

I would say we could disallow identical group names, but after adding device kind to the name, I believe deuplicate names would only ever happen by an explicit decision by the user to set them to the same name - so maybe that part is a non-issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of allowing instances with the same name?

Copy link
Contributor Author

@kthui kthui Jul 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is always allowed on Triton. I am not sure on the reason why it was allowed in the first place.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than what is currently implemented. I do not think there is a good reason to allow instances with the same name.

@kthui kthui requested a review from GuanLuo July 8, 2023 01:58
@kthui kthui force-pushed the jacky-instance-name branch from af88121 to 90a64dd Compare July 13, 2023 00:06
Copy link
Contributor

@nnshah1 nnshah1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providing consistent naming independent of count seems like a good step. Additional rules for matching instances can be handled separately. Defer to @GuanLuo and @rmccorm4 for final approvals.

Copy link
Contributor

@rmccorm4 rmccorm4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this step for now if we need to go back and rethink some of the open questions from discussion.

Copy link
Contributor

@GuanLuo GuanLuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR title needs to be changed?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants