Add optimize and quantize command CLI #700

Merged: 16 commits from the add-optimize-quantize-cli branch into huggingface:main on Feb 2, 2023

Conversation

jplu (Contributor) commented on Jan 17, 2023

What does this PR do?

This PR adds the quantize and optimize commands. The usage for optimize is:

optimum-cli onnxruntime optimize --help
usage: optimum-cli <command> [<args>] onnxruntime optimize [-h] --onnx_model ONNX_MODEL [-o OUTPUT] (-O1 | -O2 | -O3 | -O4)

options:
  -h, --help            show this help message and exit
  -O1                   Basic general optimizations (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
  -O2                   Basic and extended general optimizations, transformers-specific fusions (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
  -O3                   Same as O2 with Gelu approximation (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
  -O4                   Same as O3 with mixed precision (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).

Required arguments:
  --onnx_model ONNX_MODEL
                        Path indicating where the ONNX models to optimize are located.

Optional arguments:
  -o OUTPUT, --output OUTPUT
                        Path indicating the directory where to store generated ONNX model. (defaults to --onnx_model value).

For quantize:

optimum-cli onnxruntime quantize --help
usage: optimum-cli <command> [<args>] onnxruntime quantize [-h] --onnx_model ONNX_MODEL [-o OUTPUT] (--arm64 | --avx2 | --avx512 | --avx512_vnni | --tensorrt)

options:
  -h, --help            show this help message and exit
  --arm64               Quantization for the ARM64 architecture.
  --avx2                Quantization with AVX-2 instructions.
  --avx512              Quantization with AVX-512 instructions.
  --avx512_vnni         Quantization with AVX-512 and VNNI instructions.
  --tensorrt            Quantization for NVIDIA TensorRT optimizer.

Required arguments:
  --onnx_model ONNX_MODEL
                        Path indicating where the ONNX models to quantize are located.

Optional arguments:
  -o OUTPUT, --output OUTPUT
                        Path indicating the directory where to store generated ONNX model. (defaults to --onnx_model value).
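
To make the expected workflow concrete, an end-to-end run might look like the following (illustrative only: the export step, model name, and paths are assumptions; the optimize/quantize flags come from the help output above):

# Export a model to ONNX first; both new commands assume this has already been done
optimum-cli export onnx --model bert-base-cased bert_onnx/

# Optimize the exported model (output defaults to the --onnx_model directory)
optimum-cli onnxruntime optimize --onnx_model bert_onnx/ -O2

# Quantize for AVX-512 machines, writing the result to a separate directory
optimum-cli onnxruntime quantize --onnx_model bert_onnx/ --avx512 -o bert_quantized/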

I have tested both commands with three models:

  • bert-base-cased
  • gpt2
  • valhalla/m2m100_tiny_random

Both work as expected. Any idea how to properly add tests for these two commands? I am not sure how to do it, since they both assume that the models have already been exported.

Fixes issue #566

ping @fxmarty @michaelbenayoun

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

jplu (Contributor Author) commented on Jan 17, 2023

From the CI logs, it looks like the datasets package is now needed to test the CLI but is not installed.

HuggingFaceDocBuilderDev commented on Jan 17, 2023

The documentation is not available anymore as the PR was closed or merged.

if self.args.arm64:
    dqconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)

Contributor: What does dq stand for? I'd probably name it qconfig.

Contributor Author: It stands for dynamic quantization, but I will update it to what you suggest.
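
For context, a dynamic quantization config like the one above is typically consumed by ORTQuantizer. A minimal sketch of that flow (the model and output paths are assumptions, not part of this PR):

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic quantization configuration for ARM64, as in the snippet above
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)

# Load the quantizer from a directory containing an already exported ONNX model (hypothetical path)
quantizer = ORTQuantizer.from_pretrained("bert_onnx/", file_name="model.onnx")

# Apply dynamic quantization and write the quantized model to the output directory
quantizer.quantize(save_dir="bert_quantized/", quantization_config=qconfig)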

fxmarty (Contributor) commented on Jan 18, 2023

Great job, thanks for working on this!

For adding tests, you could add two methods (for quantization and optimization) to: https://github.com/huggingface/optimum/blob/main/tests/cli/test_cli.py

I think it is fine to do it in two steps, first exporting and then quantizing/optimizing in the same test. I'd recommend using tiny models: one encoder-only, one decoder-only, and one encoder-decoder, for example. The way I'd do it is to use the tempfile lib and run the tests within a with tempfile.TemporaryDirectory() as tmpdirname: context, something like:

def test_save_model(self):
    with tempfile.TemporaryDirectory() as tmpdirname:
        model = ORTModel.from_pretrained(self.LOCAL_MODEL_PATH)
        model.save_pretrained(tmpdirname)
        # folder contains all config files and ONNX exported model
        folder_contents = os.listdir(tmpdirname)
        self.assertTrue(ONNX_WEIGHTS_NAME in folder_contents)
        self.assertTrue(CONFIG_NAME in folder_contents)
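
Building on that suggestion, a test for the two new commands might look roughly like the sketch below (an illustration only: the tiny model name, the export command, and the assertion are assumptions, not this PR's actual tests):

import os
import subprocess
import tempfile
import unittest


class TestCLICommands(unittest.TestCase):
    def test_optimize_and_quantize_commands(self):
        with tempfile.TemporaryDirectory() as tempdir:
            # Export a tiny model first, since optimize/quantize expect an existing ONNX export
            subprocess.run(
                f"optimum-cli export onnx --model hf-internal-testing/tiny-random-bert {tempdir}",
                shell=True,
                check=True,
            )

            # Run the two new commands on the exported model
            subprocess.run(
                f"optimum-cli onnxruntime optimize --onnx_model {tempdir} -O1",
                shell=True,
                check=True,
            )
            subprocess.run(
                f"optimum-cli onnxruntime quantize --onnx_model {tempdir} --avx2",
                shell=True,
                check=True,
            )

            # By default the outputs land next to the original model
            folder_contents = os.listdir(tempdir)
            self.assertTrue(any(name.endswith(".onnx") for name in folder_contents))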

As for the datasets requirement, it is used in type hints, but I think you can add it to the requirements nonetheless, since some methods in onnxruntime/configuration.py expect it to be installed:

REQUIRED_PKGS = [
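
For illustration only, the change would amount to one entry in setup.py (the surrounding package names here are placeholders, not the actual list):

REQUIRED_PKGS = [
    "coloredlogs",   # placeholder for the existing dependencies
    "transformers",  # placeholder for the existing dependencies
    "datasets",      # added so the methods in onnxruntime/configuration.py that rely on it can run
]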

jplu (Contributor Author) commented on Jan 18, 2023

I think I addressed all the points. Let me know @fxmarty if I missed something :)

fxmarty (Contributor) left a comment:

It looks great! Thanks for the addition! Maybe @michaelbenayoun if you want to have a look, otherwise I'll merge.

michaelbenayoun (Member) left a comment:

Left a few comment suggestions about the command line, @fxmarty @jplu, wdyt?
About the small feature request, wdyt as well? This can also be done in a later PR.

def parse_args_onnxruntime_optimize(parser):
    required_group = parser.add_argument_group("Required arguments")
    required_group.add_argument(
        "--onnx_model",

Member: Why not make that just a regular argument:

Suggested change
    "--onnx_model",
    "onnx_model",

That way we can do:

optimum-cli onnxruntime optimize path_to_my_model -O2 my_output

Which seems less heavy IMO.

@fxmarty and maybe we should do the same for exporters, wdyt?

Contributor Author: Yes, we can do that for the onnx_model argument.

fxmarty (Contributor), Jan 19, 2023: I think having several unnamed arguments makes things less readable and error-prone. I'm not in favor personally, but it's a matter of taste.

Member: Fair enough, let's keep it like that then!

Comment on lines +19 to +20:
    "-o",
    "--output",

Member: Same comment.

Contributor Author: The output is now optional, and it is not possible to make it a regular (positional) argument like for export, because the command uses the onnx_model folder as the default output. Unless this is not the behavior you would like?

Member: Let's keep it like that then!

def parse_args_onnxruntime_quantize(parser):
    required_group = parser.add_argument_group("Required arguments")
    required_group.add_argument(
        "--onnx_model",

Member: Same comment.


optional_group = parser.add_argument_group("Optional arguments")
optional_group.add_argument(
    "-o",

Member: Same comment.

    type=Path,
    help="Path to the directory where to store generated ONNX model. (defaults to --onnx_model value).",
)

Member: Being able to provide predefined optimization configs is great. Could we also add the possibility to provide a path where an ORTConfig is stored?

Contributor Author: I can add this, yes.

michaelbenayoun (Member), Jan 19, 2023: I think it would allow more custom usage!

Contributor Author: @michaelbenayoun Do you mean ORTConfig or OptimizationConfig?

Member: ORTConfig, since those are the ones we push to the Hub.

Contributor Author: Done!

help="Compute the quantization parameters on a per-channel basis.",
)

level_group = parser.add_mutually_exclusive_group(required=True)

Member: Same comment as for the optimization level: maybe we could add the possibility to specify a path to an ORTConfig.

jplu (Contributor Author) commented on Jan 26, 2023

@fxmarty some tests failed but I don't know if they are related to my changes.

fxmarty (Contributor) commented on Jan 27, 2023

@jplu No, don't worry, these failures come from the latest transformers release.

@michaelbenayoun feel free to re-review

michaelbenayoun (Member) left a comment:

Looks great!
I just left a few comments about the custom ORTConfig, and after those we should be able to merge!

Resolved review threads (outdated):
  docs/source/onnxruntime/usage_guides/optimization.mdx
  docs/source/onnxruntime/usage_guides/quantization.mdx
  optimum/commands/onnxruntime/optimize.py

    optimization_config = AutoOptimizationConfig.O3()
elif self.args.O4:
    optimization_config = AutoOptimizationConfig.O4()
else:

Member: Suggested change

    else:
    elif self.args.config:

Contributor Author: Same answer as below.

    optimization_config = AutoOptimizationConfig.O4()
else:
    optimization_config = ORTConfig.get_config_dict(self.args.config).optimization

Member: Suggested change

    else:
        raise ValueError("An optimization configuration must be provided, either by using the predefined optimization configurations (O1, O2, O3, O4) or by specifying the path to a custom ORTConfig")

Contributor Author: It is a mutually exclusive group with required=True, so at least one option is mandatory; if none is given, argparse raises an error on its own (see the sketch below), so this part is not needed.
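
For illustration, here is a minimal, self-contained sketch of that argparse behavior (the option names mirror the ones in this PR, but the snippet itself is not the PR's code):

import argparse

parser = argparse.ArgumentParser(prog="optimum-cli onnxruntime optimize")
parser.add_argument("--onnx_model", required=True)

# required=True on the group means argparse itself errors out if none of these options is passed
level_group = parser.add_mutually_exclusive_group(required=True)
level_group.add_argument("-O1", action="store_true")
level_group.add_argument("-O2", action="store_true")
level_group.add_argument("-c", "--config", type=str)

# Parses fine: one member of the exclusive group is present
args = parser.parse_args(["--onnx_model", "model_dir", "-O1"])

# Would exit with an error, because no member of the group is present:
# parser.parse_args(["--onnx_model", "model_dir"])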

Comment on lines 73 to 75:
    else:
        qconfig = ORTConfig.get_config_dict(self.args.config).quantization

Member: Same comments as for the optimization.

optimize_commands = [
    f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/encoder -O1",
    f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/decoder -O1",
    f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/encoder-decoder -O1",

Member: Maybe add a test with a custom config here.

Contributor Author: Do you already have an ORTConfig file for testing usage somewhere?

Contributor Author: @fxmarty @michaelbenayoun I have an issue with testing the ORTConfig parameter. Apparently ORTConfig.optimization and ORTConfig.quantization are dictionaries. How can I turn them back into the usual dataclass objects? The quantize and optimize methods expect objects, not dictionaries.
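
One possible approach is sketched below (an assumption-laden illustration, not this PR's code: it assumes the ORTConfig was saved to a local directory and that the serialized dictionaries contain exactly the fields of the OptimizationConfig and QuantizationConfig dataclasses):

from optimum.onnxruntime.configuration import ORTConfig, OptimizationConfig, QuantizationConfig

# Hypothetical path to a directory containing a saved ORTConfig (ort_config.json)
ort_config = ORTConfig.from_pretrained("path/to/ort_config_dir")

# The nested configs come back as plain dicts, so rebuild the dataclass objects
# that the optimize/quantize methods expect, assuming the keys match the dataclass fields
optimization_config = OptimizationConfig(**ort_config.optimization)
quantization_config = QuantizationConfig(**ort_config.quantization)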

Member: I could not find one, so forget about it!

optimize_commands = [
    f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/encoder --avx2",
    f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/decoder --avx2",
    f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/encoder-decoder --avx2",

Member: Same comments.

-O2 Basic and extended general optimizations, transformers-specific fusions (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
-O3 Same as O2 with Gelu approximation (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
-O4 Same as O3 with mixed precision (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).

Member: Add --config details here.

--avx2 Quantization with AVX-2 instructions.
--avx512 Quantization with AVX-512 instructions.
--avx512_vnni Quantization with AVX-512 and VNNI instructions.
--tensorrt Quantization for NVIDIA TensorRT optimizer.

Member: Add --config details here.

jplu and others added 5 commits on January 31, 2023, 09:43

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

michaelbenayoun (Member): Thanks a lot for the contribution!

michaelbenayoun merged commit 17b76db into huggingface:main on Feb 2, 2023
jplu deleted the add-optimize-quantize-cli branch on February 3, 2023