Add optimize and quantize command CLI #700

Merged: 16 commits from the add-optimize-quantize-cli branch into huggingface:main on Feb 2, 2023

Conversation

jplu (Contributor) commented on Jan 17, 2023

What does this PR do?

This PR adds the quantize and optimize commands. The usage for optimize is:

optimum-cli onnxruntime optimize --help
usage: optimum-cli <command> [<args>] onnxruntime optimize [-h] --onnx_model ONNX_MODEL [-o OUTPUT] (-O1 | -O2 | -O3 | -O4)

options:
  -h, --help            show this help message and exit
  -O1                   Basic general optimizations (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
  -O2                   Basic and extended general optimizations, transformers-specific fusions (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
  -O3                   Same as O2 with Gelu approximation (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
  -O4                   Same as O3 with mixed precision (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).

Required arguments:
  --onnx_model ONNX_MODEL
                        Path indicating where the ONNX models to optimize are located.

Optional arguments:
  -o OUTPUT, --output OUTPUT
                        Path indicating the directory where to store generated ONNX model. (defaults to --onnx_model value).

For quantize:

optimum-cli onnxruntime quantize --help
usage: optimum-cli <command> [<args>] onnxruntime quantize [-h] --onnx_model ONNX_MODEL [-o OUTPUT] (--arm64 | --avx2 | --avx512 | --avx512_vnni | --tensorrt)

options:
  -h, --help            show this help message and exit
  --arm64               Quantization for the ARM64 architecture.
  --avx2                Quantization with AVX-2 instructions.
  --avx512              Quantization with AVX-512 instructions.
  --avx512_vnni         Quantization with AVX-512 and VNNI instructions.
  --tensorrt            Quantization for NVIDIA TensorRT optimizer.

Required arguments:
  --onnx_model ONNX_MODEL
                        Path indicating where the ONNX models to quantize are located.

Optional arguments:
  -o OUTPUT, --output OUTPUT
                        Path indicating the directory where to store generated ONNX model. (defaults to --onnx_model value).
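
To make the expected workflow concrete, an end-to-end run might look like the following (illustrative only: the export step, model name, and paths are assumptions; the optimize/quantize flags come from the help output above):

# Export a model to ONNX first; both new commands assume this has already been done
optimum-cli export onnx --model bert-base-cased bert_onnx/

# Optimize the exported model (output defaults to the --onnx_model directory)
optimum-cli onnxruntime optimize --onnx_model bert_onnx/ -O2

# Quantize for AVX-512 machines, writing the result to a separate directory
optimum-cli onnxruntime quantize --onnx_model bert_onnx/ --avx512 -o bert_quantized/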

I have tested both commands with three models:

  • bert-base-cased
  • gpt2
  • valhalla/m2m100_tiny_random

Both work as expected. Any idea how to properly add tests for these two commands? I am not sure how to do it, since they both assume that the models have already been exported.

Fixes issue #566

ping @fxmarty @michaelbenayoun

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

jplu (Contributor Author) commented on Jan 17, 2023

From the CI logs, it looks like the datasets package is now needed to test the CLI but is not installed.

HuggingFaceDocBuilderDev commented on Jan 17, 2023

The documentation is not available anymore as the PR was closed or merged.

if self.args.arm64:
    dqconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)

Contributor: What does dq stand for? I'd probably name it qconfig.

Contributor Author: It stands for dynamic quantization, but I will update it to what you suggest.
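
For context, a dynamic quantization config like the one above is typically consumed by ORTQuantizer. A minimal sketch of that flow (the model and output paths are assumptions, not part of this PR):

from optimum.onnxruntime import ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# Dynamic quantization configuration for ARM64, as in the snippet above
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)

# Load the quantizer from a directory containing an already exported ONNX model (hypothetical path)
quantizer = ORTQuantizer.from_pretrained("bert_onnx/", file_name="model.onnx")

# Apply dynamic quantization and write the quantized model to the output directory
quantizer.quantize(save_dir="bert_quantized/", quantization_config=qconfig)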

fxmarty (Contributor) commented on Jan 18, 2023

Great job, thanks for working on this!

For adding tests, you could add two methods (for quantization and optimization) to: https://github.com/huggingface/optimum/blob/main/tests/cli/test_cli.py

I think it is fine to do it in two steps, first exporting and then quantizing/optimizing in the same test. I'd recommend using tiny models: one encoder-only, one decoder-only, and one encoder-decoder, for example. The way I'd do it is to use the tempfile lib and run the tests within a with tempfile.TemporaryDirectory() as tmpdirname: context, something like:

def test_save_model(self):
    with tempfile.TemporaryDirectory() as tmpdirname:
        model = ORTModel.from_pretrained(self.LOCAL_MODEL_PATH)
        model.save_pretrained(tmpdirname)
        # folder contains all config files and ONNX exported model
        folder_contents = os.listdir(tmpdirname)
        self.assertTrue(ONNX_WEIGHTS_NAME in folder_contents)
        self.assertTrue(CONFIG_NAME in folder_contents)
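
Building on that suggestion, a test for the two new commands might look roughly like the sketch below (an illustration only: the tiny model name, the export command, and the assertion are assumptions, not this PR's actual tests):

import os
import subprocess
import tempfile
import unittest


class TestCLICommands(unittest.TestCase):
    def test_optimize_and_quantize_commands(self):
        with tempfile.TemporaryDirectory() as tempdir:
            # Export a tiny model first, since optimize/quantize expect an existing ONNX export
            subprocess.run(
                f"optimum-cli export onnx --model hf-internal-testing/tiny-random-bert {tempdir}",
                shell=True,
                check=True,
            )

            # Run the two new commands on the exported model
            subprocess.run(
                f"optimum-cli onnxruntime optimize --onnx_model {tempdir} -O1",
                shell=True,
                check=True,
            )
            subprocess.run(
                f"optimum-cli onnxruntime quantize --onnx_model {tempdir} --avx2",
                shell=True,
                check=True,
            )

            # By default the outputs land next to the original model
            folder_contents = os.listdir(tempdir)
            self.assertTrue(any(name.endswith(".onnx") for name in folder_contents))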

As for the datasets requirement, it is used in type hints, but I think you can add it to the requirements nonetheless, since some methods in onnxruntime/configuration.py expect it to be installed:

REQUIRED_PKGS = [
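
For illustration only, the change would amount to one entry in setup.py (the surrounding package names here are placeholders, not the actual list):

REQUIRED_PKGS = [
    "coloredlogs",   # placeholder for the existing dependencies
    "transformers",  # placeholder for the existing dependencies
    "datasets",      # added so the methods in onnxruntime/configuration.py that rely on it can run
]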

jplu (Contributor Author) commented on Jan 18, 2023

I think I addressed all the points. Let me know @fxmarty if I missed something :)

fxmarty (Contributor) left a comment:

It looks great! Thanks for the addition! Maybe @michaelbenayoun if you want to have a look, otherwise I'll merge.

michaelbenayoun (Member) left a comment:

Left a few comment suggestions about the command line, @fxmarty @jplu, wdyt?
About the small feature request, wdyt as well? This can also be done in a later PR.

def parse_args_onnxruntime_optimize(parser):
    required_group = parser.add_argument_group("Required arguments")
    required_group.add_argument(
        "--onnx_model",

Member: Why not make that just a regular argument:

Suggested change
    "--onnx_model",
    "onnx_model",

That way we can do:

optimum-cli onnxruntime optimize path_to_my_model -O2 my_output

Which seems less heavy IMO.

@fxmarty and maybe we should do the same for exporters, wdyt?

Contributor Author: Yes, we can do that for the onnx_model argument.

fxmarty (Contributor), Jan 19, 2023: I think having several unnamed arguments makes things less readable and error-prone. I'm not in favor personally, but it's a matter of taste.

Member: Fair enough, let's keep it like that then!

Comment on lines +19 to +20:
    "-o",
    "--output",

Member: Same comment.

Contributor Author: The output is now optional, and it is not possible to make it a regular (positional) argument like for export, because the command uses the onnx_model folder as the default output. Unless this is not the behavior you would like?

Member: Let's keep it like that then!

def parse_args_onnxruntime_quantize(parser):
    required_group = parser.add_argument_group("Required arguments")
    required_group.add_argument(
        "--onnx_model",

Member: Same comment.


optional_group = parser.add_argument_group("Optional arguments")
optional_group.add_argument(
    "-o",

Member: Same comment.

    type=Path,
    help="Path to the directory where to store generated ONNX model. (defaults to --onnx_model value).",
)

Member: Being able to provide predefined optimization configs is great. Could we also add the possibility to provide a path where an ORTConfig is stored?

Contributor Author: I can add this, yes.

michaelbenayoun (Member), Jan 19, 2023: I think it would allow more custom usage!

Contributor Author: @michaelbenayoun Do you mean ORTConfig or OptimizationConfig?

Member: ORTConfig, since those are the ones we push to the Hub.

Contributor Author: Done!

help="Compute the quantization parameters on a per-channel basis.",
)

level_group = parser.add_mutually_exclusive_group(required=True)

Member: Same comment as for the optimization level: maybe we could add the possibility to specify a path to an ORTConfig.

jplu (Contributor Author) commented on Jan 26, 2023

@fxmarty some tests failed but I don't know if they are related to my changes.

fxmarty (Contributor) commented on Jan 27, 2023

@jplu No, don't worry, these failures come from the latest transformers release.

@michaelbenayoun feel free to re-review

michaelbenayoun (Member) left a comment:

Looks great!
I just left a few comments about the custom ORTConfig, and after those we should be able to merge!

Resolved review threads (outdated):
  docs/source/onnxruntime/usage_guides/optimization.mdx
  docs/source/onnxruntime/usage_guides/quantization.mdx
  optimum/commands/onnxruntime/optimize.py

    optimization_config = AutoOptimizationConfig.O3()
elif self.args.O4:
    optimization_config = AutoOptimizationConfig.O4()
else:

Member: Suggested change

    else:
    elif self.args.config:

Contributor Author: Same answer as below.

    optimization_config = AutoOptimizationConfig.O4()
else:
    optimization_config = ORTConfig.get_config_dict(self.args.config).optimization

Member: Suggested change

    else:
        raise ValueError("An optimization configuration must be provided, either by using the predefined optimization configurations (O1, O2, O3, O4) or by specifying the path to a custom ORTConfig")

Contributor Author: It is a mutually exclusive group with required=True, so at least one option is mandatory; if none is given, argparse raises an error on its own (see the sketch below), so this part is not needed.
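
For illustration, here is a minimal, self-contained sketch of that argparse behavior (the option names mirror the ones in this PR, but the snippet itself is not the PR's code):

import argparse

parser = argparse.ArgumentParser(prog="optimum-cli onnxruntime optimize")
parser.add_argument("--onnx_model", required=True)

# required=True on the group means argparse itself errors out if none of these options is passed
level_group = parser.add_mutually_exclusive_group(required=True)
level_group.add_argument("-O1", action="store_true")
level_group.add_argument("-O2", action="store_true")
level_group.add_argument("-c", "--config", type=str)

# Parses fine: one member of the exclusive group is present
args = parser.parse_args(["--onnx_model", "model_dir", "-O1"])

# Would exit with an error, because no member of the group is present:
# parser.parse_args(["--onnx_model", "model_dir"])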

Comment on lines 73 to 75:
    else:
        qconfig = ORTConfig.get_config_dict(self.args.config).quantization

Member: Same comments as for the optimization.

optimize_commands = [
    f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/encoder -O1",
    f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/decoder -O1",
    f"optimum-cli onnxruntime optimize --onnx_model {tempdir}/encoder-decoder -O1",

Member: Maybe add a test with a custom config here.

Contributor Author: Do you already have an ORTConfig file for testing usage somewhere?

Contributor Author: @fxmarty @michaelbenayoun I have an issue with testing the ORTConfig parameter. Apparently ORTConfig.optimization and ORTConfig.quantization are dictionaries. How can I turn them back into the usual dataclass objects? The quantize and optimize methods expect objects, not dictionaries.
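
One possible approach is sketched below (an assumption-laden illustration, not this PR's code: it assumes the ORTConfig was saved to a local directory and that the serialized dictionaries contain exactly the fields of the OptimizationConfig and QuantizationConfig dataclasses):

from optimum.onnxruntime.configuration import ORTConfig, OptimizationConfig, QuantizationConfig

# Hypothetical path to a directory containing a saved ORTConfig (ort_config.json)
ort_config = ORTConfig.from_pretrained("path/to/ort_config_dir")

# The nested configs come back as plain dicts, so rebuild the dataclass objects
# that the optimize/quantize methods expect, assuming the keys match the dataclass fields
optimization_config = OptimizationConfig(**ort_config.optimization)
quantization_config = QuantizationConfig(**ort_config.quantization)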

Member: I could not find one, so forget about it!

optimize_commands = [
    f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/encoder --avx2",
    f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/decoder --avx2",
    f"optimum-cli onnxruntime quantize --onnx_model {tempdir}/encoder-decoder --avx2",

Member: Same comments.

-O2 Basic and extended general optimizations, transformers-specific fusions (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
-O3 Same as O2 with Gelu approximation (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).
-O4 Same as O3 with mixed precision (see: https://huggingface.co/docs/optimum/onnxruntime/usage_guides/optimization for more details).

Member: Add --config details here.

--avx2 Quantization with AVX-2 instructions.
--avx512 Quantization with AVX-512 instructions.
--avx512_vnni Quantization with AVX-512 and VNNI instructions.
--tensorrt Quantization for NVIDIA TensorRT optimizer.

Member: Add --config details here.

jplu and others added 5 commits on January 31, 2023, 09:43

Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>

michaelbenayoun (Member): Thanks a lot for the contribution!

michaelbenayoun merged commit 17b76db into huggingface:main on Feb 2, 2023
jplu deleted the add-optimize-quantize-cli branch on February 3, 2023