feat: Support weight-stripped engine and REFIT_IDENTICAL flag #3167
@zewenli98 do you have a design for this feature?
@narendasan OK, the initial overall design was as follows. In TRTInterpreter.run():

if compilation_settings.strip_engine_weights is True:
    if engine_cache not hit:
        1. build a weight-stripped engine
        2. save the weight-stripped engine if engine_cache is set
        3. return the weight-stripped engine (not yet refit)
    else:
        load and return the weight-stripped engine (not yet refit)
else:
    if engine_cache not hit:
        1. build a weight-included engine
        2. save the weight-included engine if engine_cache is set
        3. return the weight-included engine (no refit needed)
    else:
        load and return the weight-included engine (not yet refit)

Then, in TRTModule, refit if necessary before inference.
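A minimal Python sketch of the control flow above, under heavy assumptions: `EngineSketch`, `interpreter_run_sketch`, and `TRTModuleSketch` are hypothetical names, weights are plain dicts, the cache is an in-memory dict keyed on a graph identifier plus the strip flag (an assumption, not part of the comment), and "refit" just copies weights into a copy of the engine. It is not the Torch-TensorRT implementation; it only shows which branches hand back an engine that still needs refitting.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional, Tuple

Weights = Dict[str, float]


@dataclass(frozen=True)
class EngineSketch:
    # Hypothetical stand-in for a serialized TensorRT engine.
    graph_key: str
    weights_stripped: bool
    weights: Optional[Weights] = None  # None: built without weights, must be refit


def interpreter_run_sketch(
    graph_key: str,
    weights: Weights,
    strip_engine_weights: bool,
    engine_cache: Optional[dict] = None,
) -> Tuple[EngineSketch, bool]:
    """Return (engine, needs_refit), mirroring the proposed TRTInterpreter.run() branches."""
    key = (graph_key, strip_engine_weights)  # assumed cache key
    if engine_cache is not None and key in engine_cache:
        # Cache hit: the cached engine may hold another model's weights, so refit later.
        return engine_cache[key], True
    if strip_engine_weights:
        engine = EngineSketch(graph_key, weights_stripped=True)
    else:
        engine = EngineSketch(graph_key, weights_stripped=False, weights=dict(weights))
    if engine_cache is not None:
        engine_cache[key] = engine
    # A fresh weight-stripped engine still needs refit; a fresh weight-included one does not.
    return engine, strip_engine_weights


class TRTModuleSketch:
    """Hypothetical runtime wrapper that refits, if necessary, before inference."""

    def __init__(self, engine: EngineSketch, needs_refit: bool, weights: Weights):
        # Refit a copy so the cached engine object stays untouched.
        self.engine = replace(engine, weights=dict(weights)) if needs_refit else engine
```

Keeping the refit in the runtime wrapper, rather than in the interpreter, matches the comment's split: the interpreter only builds or loads, and TRTModule decides whether weights must be installed before inference.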
@narendasan The design was updated. From the users' perspective, they are able to set
Besides, users can specify For the 3 workflows mentioned above,
Please see the tests for more details.
I think that we need to separate the runtime and the compiler, so I'm willing to spend the time serializing and deserializing. I think we should frame this PR around switching TRTInterpreter to build weight-stripped engines by default. There will be 3 kinds of engines now.
The first 2 need separate cache entries, so we need to be able to hash on the weights in the case that the model is being built with We should look to prefer case 1 in the long term, as it allows us to reuse the most work; case 2 would be the next preference. Case 2 should produce faster engines than case 1, so there remains a need to support
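One way to give the two engine kinds separate cache entries while hashing on weights only where required, sketched in Python. The flag name in the comment above is cut off; this sketch assumes (from the PR title) that the weight-sensitive case is the REFIT_IDENTICAL build. `engine_cache_key_sketch` is a hypothetical helper, weights are plain float lists, and a real implementation would hash the raw tensor bytes instead.

```python
import hashlib
import struct
from typing import Mapping, Sequence


def _update_len_prefixed(h, data: bytes) -> None:
    # Length prefix keeps adjacent fields from running together.
    h.update(struct.pack("<Q", len(data)))
    h.update(data)


def engine_cache_key_sketch(
    graph_signature: str,
    weights: Mapping[str, Sequence[float]],
    refit_identical: bool,
) -> str:
    """Hypothetical cache key: fold in weight values only for refit_identical builds."""
    h = hashlib.sha256()
    _update_len_prefixed(h, graph_signature.encode("utf-8"))
    # Separate cache entries for the two engine kinds.
    h.update(b"\x01" if refit_identical else b"\x00")
    if refit_identical:
        # Sort names so the key does not depend on dict insertion order.
        for name in sorted(weights):
            _update_len_prefixed(h, name.encode("utf-8"))
            values = weights[name]
            h.update(struct.pack("<Q", len(values)))
            for v in values:
                h.update(struct.pack("<d", float(v)))
    return h.hexdigest()
```

With this scheme a weight-agnostic refittable engine is reused across any weights for the same graph, which is why that case reuses the most work; the REFIT_IDENTICAL entry only hits when the weights match too.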
The case for type 3 engines is now only valid if building a non-refittable engine is faster than building a refit_identical engine and then refitting the weights. If it is not faster by a significant enough margin, I propose we remove that workflow and just have So assuming that we can remove type 3 engines,
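The decision rule above can be written down as a small Python helper. `best_time_s` and `keep_non_refittable_workflow` are hypothetical names, and `min_speedup=1.2` is an arbitrary placeholder for "a significant enough margin", not a measured or agreed threshold; in practice the callables passed to `best_time_s` would be the two build paths being compared.

```python
import time
from typing import Callable


def best_time_s(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest wall-clock time of fn() over `repeats` runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def keep_non_refittable_workflow(
    t_build_non_refittable: float,
    t_build_refit_identical: float,
    t_refit: float,
    min_speedup: float = 1.2,  # placeholder margin, not a measured value
) -> bool:
    """True if the non-refittable build beats build-then-refit by at least min_speedup x."""
    alternative = t_build_refit_identical + t_refit
    return alternative >= min_speedup * t_build_non_refittable
```

Taking the best of several runs reduces noise from one-off effects such as first-build warm-up, which matters when the margin being tested is small.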
Some of the open questions are:
Are you referring to
My current design is: If users specify
I also thought about it earlier. The TRT doc says "if the refit weights are not identical to the build-time weights, behavior is undefined... This enables use of a single set of weights with different inference backends, or with TensorRT plans for multiple GPU architectures."
I will investigate it.
@narendasan I tested building ResNet18 and VGG16 via the two paths: (1)
@narendasan I just confirmed with the TRT team; the conclusion is
I think we can rename On top of this, In summary, the 3 workflows mentioned above would be:
I think we should remove non-refittable engines then; we can add them back as a non-default workflow later if there's some reason to.
I still don't know what the use case for this is.
We should think about a solution for this, since the behavior is undefined.
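One possible mitigation for the undefined-behavior case, sketched in Python: record a fingerprint of the build-time weights next to a REFIT_IDENTICAL engine and refuse to refit when the supplied weights differ, so the caller can rebuild instead. `weights_fingerprint`, `StrippedEngineRecord`, `check_refit`, and `RefitWeightMismatch` are hypothetical names, and weights are modeled as float lists rather than tensors.

```python
import hashlib
import struct
from dataclasses import dataclass
from typing import Mapping, Sequence


def weights_fingerprint(weights: Mapping[str, Sequence[float]]) -> str:
    """Order-independent SHA-256 over weight names and float64 values."""
    h = hashlib.sha256()
    for name in sorted(weights):
        name_bytes = name.encode("utf-8")
        h.update(struct.pack("<Q", len(name_bytes)))
        h.update(name_bytes)
        values = weights[name]
        h.update(struct.pack("<Q", len(values)))
        for v in values:
            h.update(struct.pack("<d", float(v)))
    return h.hexdigest()


class RefitWeightMismatch(ValueError):
    """Raised instead of performing a refit whose result would be undefined."""


@dataclass(frozen=True)
class StrippedEngineRecord:
    # Hypothetical metadata saved alongside a weight-stripped engine.
    refit_identical: bool
    build_weights_fingerprint: str


def check_refit(record: StrippedEngineRecord, weights: Mapping[str, Sequence[float]]) -> None:
    """Reject refits that would violate the REFIT_IDENTICAL contract."""
    if record.refit_identical and weights_fingerprint(weights) != record.build_weights_fingerprint:
        raise RefitWeightMismatch(
            "weights differ from build-time weights; rebuild instead of refitting"
        )
```

A caller could catch `RefitWeightMismatch` and fall back to building a fresh engine. Whether hashing every weight on each refit is cheap enough for large models is not measured here.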
Branch updated from 802f56b to a5d3c18.
LGTM
@zewenli98 can you just pull the CI changes into their own branch? Those seem ready now.
Sorry, I did not mean to approve.
LGTM
Description
Fixes #3146
Type of change
Checklist: