Add optimize and quantize command CLI #700
Conversation
From the CI logs, it looks like now the …

The documentation is not available anymore as the PR was closed or merged.
```python
if self.args.arm64:
    dqconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
```
What does `dq` stand for? I'd probably name it `qconfig`.
It stands for dynamic quantization, but I will update it to what you suggest.
Great job, thanks for working on this! For adding tests, you could add two methods (for quantization and optimization) to: https://github.com/huggingface/optimum/blob/main/tests/cli/test_cli.py I think it is fine to do it in two steps: first exporting, then quantizing / optimizing in the same test. I'd recommend using tiny models: one encoder-only, one decoder-only, and one encoder-decoder, for example. The way I'd do it is to follow optimum/tests/onnxruntime/test_modeling.py (lines 629 to 636 at edbfcf3).
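For illustration, a minimal sketch of what such a test could look like. The export command syntax and the tiny model ID are assumptions, not taken from this PR; the optimize command matches the one used later in this thread:

```python
import subprocess
import tempfile
import unittest


class TestCLIOptimizeQuantize(unittest.TestCase):
    def test_optimize_after_export(self):
        with tempfile.TemporaryDirectory() as tempdir:
            # Step 1: export a tiny encoder-only model to ONNX
            # (model ID is a placeholder).
            subprocess.run(
                f"optimum-cli export onnx --model hf-internal-testing/tiny-random-bert {tempdir}/encoder",
                shell=True,
                check=True,
            )
            # Step 2: optimize the exported model in the same test.
            subprocess.run(
                f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/encoder -O1",
                shell=True,
                check=True,
            )
```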
For the `datasets` requirement: it is used in type hints, but I think you can add it to the requirements nonetheless, since some methods rely on it (line 15 at edbfcf3).
I think I addressed all the points. Let me know @fxmarty if I missed something :)
It looks great, thanks for the addition! @michaelbenayoun maybe you want to have a look, otherwise I'll merge.
```python
def parse_args_onnxruntime_optimize(parser):
    required_group = parser.add_argument_group("Required arguments")
    required_group.add_argument(
        "--onnx_model",
```
Why not make that just a regular argument:

```diff
- "--onnx_model",
+ "onnx_model",
```

That way we can do:

```
optimum-cli onnxruntime optimize path_to_my_model -O2 my_output
```

Which seems less heavy IMO.
@fxmarty and maybe we should do the same for exporters, wdyt?
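For context, a toy argparse sketch (not the actual optimum code) of the difference between a named flag and a positional argument:

```python
import argparse

parser = argparse.ArgumentParser(prog="optimum-cli onnxruntime optimize")

# Named flag:  optimum-cli onnxruntime optimize --onnx_model path_to_my_model
# parser.add_argument("--onnx_model")

# Positional:  optimum-cli onnxruntime optimize path_to_my_model
parser.add_argument("onnx_model")

args = parser.parse_args(["path_to_my_model"])
print(args.onnx_model)  # -> path_to_my_model
```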
Yes, we can do that for the `onnx_model` argument.
I think having several unnamed arguments makes things less readable and more error-prone. I'm not in favor personally, but it's a matter of taste.
Fair enough, let's keep it like that then!
"-o", | ||
"--output", |
Same comment
The output is now optional, and it is not possible to make it a regular argument like for `export`, because the command takes the `onnx_model` folder as the default output. Unless this is not the behavior you would like?
Let's keep it like that then!
```python
def parse_args_onnxruntime_quantize(parser):
    required_group = parser.add_argument_group("Required arguments")
    required_group.add_argument(
        "--onnx_model",
```
Same comment
```python
    optional_group = parser.add_argument_group("Optional arguments")
    optional_group.add_argument(
        "-o",
```
Same comment
```python
        type=Path,
        help="Path to the directory where to store generated ONNX model. (defaults to --onnx_model value).",
    )
```
Being able to provide predefined optimization configs is great. Could we also add the possibility to provide a path where an `ORTConfig` is stored?
I can add this, yes.
I think it would allow more custom usage!
@michaelbenayoun Do you mean `ORTConfig` or `OptimizationConfig`?
`ORTConfig`, since those are the ones we push to the Hub.
Done!
help="Compute the quantization parameters on a per-channel basis.", | ||
) | ||
|
||
level_group = parser.add_mutually_exclusive_group(required=True) |
Same comment as for the optimization level: maybe we could add the possibility to specify a path to an `ORTConfig`.
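One possible shape for that, shown on the quantize side (a sketch; the flag name and help text are assumptions):

```python
import argparse
from pathlib import Path

parser = argparse.ArgumentParser()
level_group = parser.add_mutually_exclusive_group(required=True)
level_group.add_argument("--avx2", action="store_true")
# ... other predefined instruction sets / levels ...
level_group.add_argument(
    "--config",
    type=Path,
    help="Path to an ORTConfig file from which to load the quantization config.",
)
```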
@fxmarty some tests failed, but I don't know if they are related to my changes.
@jplu No, don't worry, these failures come from the last transformers release. @michaelbenayoun feel free to re-review.
Looks great! I just left a few comments about custom `ORTConfig`, and after those, we should be able to merge!
```python
            optimization_config = AutoOptimizationConfig.O3()
        elif self.args.O4:
            optimization_config = AutoOptimizationConfig.O4()
        else:
```
```diff
- else:
+ elif self.args.config:
```
Same answer as below.
```python
            optimization_config = AutoOptimizationConfig.O4()
        else:
            optimization_config = ORTConfig.get_config_dict(self.args.config).optimization
```
```diff
- else:
+ else:
+     raise ValueError("An optimization configuration must be provided, either by using the predefined optimization configurations (O1, O2, O3, O4) or by specifying the path to a custom ORTConfig")
```
It is a mutually exclusive group with `required=True`, so at least one option is mandatory; if none is given, argparse raises an error itself. So this part is not useful.
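A toy illustration of that argparse behavior:

```python
import argparse

parser = argparse.ArgumentParser()
level_group = parser.add_mutually_exclusive_group(required=True)
level_group.add_argument("-O1", action="store_true")
level_group.add_argument("--config", type=str)

# With required=True, argparse itself exits with an error when none of the
# group's options is passed, so a manual "else: raise ValueError(...)" branch
# would never be reached.
parser.parse_args([])  # error: one of the arguments -O1 --config is required
```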
```python
        else:
            qconfig = ORTConfig.get_config_dict(self.args.config).quantization
```
Same comments as for the optimization
```python
        optimize_commands = [
            f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/encoder -O1",
            f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/decoder -O1",
            f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/encoder-decoder -O1",
```
Maybe add a test with a custom config here.
Do you already have an ORTConfig file for testing usage somewhere?
@fxmarty @michaelbenayoun I have an issue with testing the `ORTConfig` parameter. Apparently the attributes `ORTConfig.optimization` and `ORTConfig.quantization` are dictionaries. How can I get them back to the usual dataclass objects? The `quantize` and `optimize` methods expect objects, not dictionaries.
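For what it's worth, one way this could work, assuming the two attributes are plain dicts of the dataclass fields (an assumption based on the description above):

```python
from optimum.onnxruntime.configuration import OptimizationConfig, QuantizationConfig

def configs_from_dicts(optimization: dict, quantization: dict):
    # Rebuild the dataclass instances by unpacking the stored field
    # dictionaries back into the constructors (assumes the dict keys match
    # the dataclass field names exactly).
    return OptimizationConfig(**optimization), QuantizationConfig(**quantization)
```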
I could not find one, so forget about it!
```python
        optimize_commands = [
            f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/encoder --avx2",
            f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/decoder --avx2",
            f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/encoder-decoder --avx2",
```
Same comments
```
    -O2             Basic and extended general optimizations, transformers-specific fusions (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
    -O3             Same as O2 with Gelu approximation (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
    -O4             Same as O3 with mixed precision (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
```
Add `--config` details here.
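e.g. a line along these lines (the wording is only a suggestion):

```
    --config        Path to an ORTConfig file from which the optimization config will be loaded.
```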
```
    --avx2          Quantization with AVX-2 instructions.
    --avx512        Quantization with AVX-512 instructions.
    --avx512_vnni   Quantization with AVX-512 and VNNI instructions.
    --tensorrt      Quantization for NVIDIA TensorRT optimizer.
```
Add `--config` details here.
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
Thanks a lot for the contribution!
What does this PR do?
This PR adds the `quantize` and `optimize` commands. Example usage for `optimize` and `quantize` is shown below. I have tested both commands with 3 models, and they work as expected. Any idea how to properly add some tests for these 2 commands? I don't really know how to do it properly, since they both imply that an export of the models has already been done.
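A sketch of the two invocations, based on the test commands shown earlier in this thread (paths are placeholders):

```
optimum-cli onnxruntime optimize --onnx_model path/to/exported_model -O1
optimum-cli onnxruntime quantize --onnx_model path/to/exported_model --avx2
```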
Fixes issue #566.
ping @fxmarty @michaelbenayoun
Before submitting