-
Notifications
You must be signed in to change notification settings - Fork 104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify default naming on different instance group count #231
Conversation
2fb7b7c
to
c7ba28e
Compare
src/backend_model_instance.cc
Outdated
@@ -220,8 +220,7 @@ TritonModelInstance::SetInstances( | |||
} | |||
for (int32_t c = 0; c < group.count(); ++c) { | |||
std::string instance_name{ | |||
group.count() > 1 ? group.name() + "_" + std::to_string(c) | |||
: group.name()}; | |||
c > 0 ? group.name() + "_" + std::to_string(c) : group.name()}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice detailed explanation. Two comments:
In other words, all group_0 will be named group.
-
I was thinking the opposite, we should change "group" to "group_0" to have a consistent naming scheme regardless of count.
-
What do you think about including device type / kind in the name? ex: "bert_cpu0_instance0", "bert_cpu0_instance1", "bert_gpu0_instance0", etc.
I think this would improve the intuition in the logging and debugging from user perspective. For example it may be more clear that a GPU instance is receiving work rather than a CPU instance in logs, compared to current naming convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the opposite, we should change "group" to "group_0" to have a consistent naming scheme regardless of count.
I am good with any scheme, as long as the naming is consistent. Update commit: Use different instance name unification method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about including device type / kind in the name? ex: "bert_cpu0_instance0", "bert_cpu0_instance1", "bert_gpu0_instance0", etc.
This is a good idea. I think users generally do not supply an instance name if it is optional, but when debugging is needed, they would want to see descriptive names.
But I think this change is out of scope for this PR, because
- The code where the default name is assigned to instances is on a different file, independent from this change; and
- We can have more time design on the default instance name for different scenarios.
What do you think about the scope cutoff? On this PR, we only address how to assign different names to instances on the same instance group (i.e. "group_default_name_0", "group_default_name_1", ...
). We can decide on the group_default_name
on a different PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at how the default group name is set and the signature check, so if user change the config from:
instance_group [
{ kind: KIND_CPU },
{ kind: KIND_GPU },
]
to
instance_group [
{ kind: KIND_GPU },
{ kind: KIND_CPU },
]
No instance will be reused?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, because of the default name assigned to the instances will change if the location of the instances on the configuration changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the reason why I replaced the test_gpu_cpu_instance_update test with the test_instance_name_update and test_instance_same_name tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for making the change for default name to be consistent with group_0
regardless of count.
Regarding this:
But I think this change is out of scope for this PR, because
The code where the default name is assigned to instances is on a different file, independent from this change; and
We can have more time design on the default instance name for different scenarios.
What do you think about the scope cutoff? On this PR, we only address how to assign different names to instances on the same instance group (i.e. "group_default_name_0", "group_default_name_1", ...). We can decide on the group_default_name on a different PR.
I'm fine with cutting off scope to separate PR. Please make a ticket and link it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior is counter-intuitive, what is the reason of doing so?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because of the logic which assigns default name to instances that do not specify a name.
I think there is a way to avoid the behavior, but it will be complicated: During an instance update, leave the default instance name empty, so the lookup can be done purely on the instance config. Instances with the name specified will be matched first, and then for those that do not specify a name. Those without a name will use the name on the instance that is matched, if any. Otherwise, a default name is assigned.
Maybe, we can simply print on the log notifying the user when an instance reused has a different name than the one requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than minimizing the scope of the code change. I agree the behavior is counter-intuitive. Maybe a redesign on how the default name works is needed.
8ca2bea
to
af88121
Compare
std::string instance_name{ | ||
group.count() > 1 ? group.name() + "_" + std::to_string(c) | ||
: group.name()}; | ||
std::string instance_name{group.name() + "_" + std::to_string(c)}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have check that the groups must not share the same name? Otherwise we can have instances with the exact same name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have check that the groups must not share the same name?
No, we actually allow one or more instances to share the same name. There are two scenarios:
- One or more instances are the same and share the same name.
- Exactly which instance is reused or removed during an instance update is undefined, because there is no way to tell the difference between those instances.
- There is a new test that manipulate instances under this case. The test will pass as long as the number of instances added or removed is correct.
- One or more instances are different but share the same name.
- The instances are disambiguated by their instance group configuration.
- There is a new test checking the correct instance is removed and created, if the config of an instance is changed but reused the same name.
Otherwise we can have instances with the exact same name?
Yes. They are distinguished by their instance configuration, if they are sharing the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the ability for different instances to have the same name will be confusing in the long term, but is functional now since the logic is based on config and not names. I would like to see this disambiguated in the future.
However, with my other comment to add more descriptive instance names like this:
What do you think about including device type / kind in the name? ex: "bert_cpu0_instance0", "bert_cpu0_instance1", "bert_gpu0_instance0", etc.
Then I think the names will be unique by default, unless users specify an identical instance group name in two or more models.
I would say we could disallow identical group names, but after adding device kind to the name, I believe deuplicate names would only ever happen by an explicit decision by the user to set them to the same name - so maybe that part is a non-issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason of allowing instances with the same name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is always allowed on Triton. I am not sure on the reason why it was allowed in the first place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than what is currently implemented. I do not think there is a good reason to allow instances with the same name.
af88121
to
90a64dd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with this step for now if we need to go back and rethink some of the open questions from discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR title needs to be changed?
server PR: triton-inference-server/server#6018
There are three small but important changes:Default model instance name:
group
, and itscount
can vary.count > 1
,["group_0", "group_1", "group_2", ...]
.count == 1
,["group"]
.count
,["group", "group_1", "group_2", "group_3", ...]
.group_0
will be namedgroup
.A model instance can only be reused if it will not be renamed upon reuse, based on the above new naming rule. For example:Previously, instances named["1", "2", "3"]
will be reused if they are renamed to["1", "A", "B"]
, providing the instances are no different to each other beside than their names.Now, only the instance named"1"
can be reused, because other instances are renamed.For the model instance signature hash function, only the instance name is used for hashing, because it is reasonable to believe instances within the same model should not share the same name, so it reduces the byte size passed to the hash function. If the user would to specify the same name across all instances (becomes linear search), the performance impact is expected to be minimal, as most use-cases do not create enough instances to impact performance.