v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures

@fxmarty released this 23 Dec 15:30

Optimum CLI

The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:

optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/

Stable Diffusion ONNX export

Optimum now supports the ONNX export of stable diffusion models from the diffusers library:

optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
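
One way to consume the exported directory is diffusers' OnnxStableDiffusionPipeline. A minimal sketch, assuming the exported layout matches what that pipeline expects; the execution provider choice is also illustrative:

from diffusers import OnnxStableDiffusionPipeline

# "sd_v15_onnx/" is the output directory of the export command above;
# any ONNX Runtime execution provider can be selected here
pipeline = OnnxStableDiffusionPipeline.from_pretrained("sd_v15_onnx", provider="CPUExecutionProvider")
image = pipeline("A cat sitting on a windowsill").images[0]
image.save("cat.png")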

BetterTransformer support for more architectures

The BetterTransformer integration covers new models in this release: CLIP, RemBERT, mBART, ViLT and FSMT.

The complete list of supported models is available in the documentation.
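
As a reminder of the API, a supported model is converted with BetterTransformer.transform. A minimal sketch, with the checkpoint name chosen purely for illustration:

from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

# Load one of the newly supported architectures (mBART here)
model = AutoModel.from_pretrained("facebook/mbart-large-50")
# Swap the supported layers for their BetterTransformer equivalents
model = BetterTransformer.transform(model, keep_original_model=False)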

ONNX export for more architectures

The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.
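
For instance, the new Swin support can be exercised end-to-end through the ONNX Runtime classes. A minimal sketch, with the checkpoint name chosen for illustration:

from optimum.onnxruntime import ORTModelForImageClassification

# Export the PyTorch checkpoint to ONNX on the fly and load it with ONNX Runtime
model = ORTModelForImageClassification.from_pretrained(
    "microsoft/swin-tiny-patch4-window7-224", from_transformers=True
)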

Extended ONNX export for encoder-decoder and decoder models

Encoder-decoder and decoder-only models that normally rely on the generate() method in transformers can now be exported as several ONNX files using the --for-ort argument:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx

yielding:

.
└── t5_small_onnx
    ├── config.json
    ├── decoder_model.onnx
    ├── decoder_with_past_model.onnx
    ├── encoder_model.onnx
    ├── special_tokens_map.json
    ├── spiece.model
    ├── tokenizer_config.json
    └── tokenizer.json

When --for-ort is passed, the exported models are expected to be loadable directly into an ORTModel.
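
For instance, reusing the export above, the resulting directory can be loaded with the seq2seq ONNX Runtime class. A minimal sketch:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load the encoder, decoder and decoder-with-past ONNX files exported above
tokenizer = AutoTokenizer.from_pretrained("t5_small_onnx")
model = ORTModelForSeq2SeqLM.from_pretrained("t5_small_onnx")

inputs = tokenizer("translate English to French: Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))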

  • Add ort export in exporters for encoder-decoder models by @mht-sharma in #497
  • Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in #554

Support for ONNX models with external data at export, optimization, quantization

The ONNX export from PyTorch normally creates external data files when the exported model is larger than 2 GB. This release introduces better support for exporting and using large models, writing all external data into a single .onnx_data file when necessary.
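
Once exported, the proto and its external data can be handled as usual with the onnx package, provided the .onnx_data file stays next to the .onnx file. A minimal sketch, with illustrative paths:

import onnx

# onnx.load() resolves the external data automatically when the companion
# .onnx_data file sits in the same directory as the .onnx file
model_proto = onnx.load("large_model_onnx/model.onnx")
print(f"Number of initializers: {len(model_proto.graph.initializer)}")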

  • Handling ONNX models with external data by @NouamaneTazi in #586
  • Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in #332

ONNX Runtime API improvement

Various improvements to allow for a better user experience in the ONNX Runtime integration:

  • ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load ONNX model files regardless of their names, making it possible to load optimized and quantized models without having to specify a file name argument (see the sketch after this list).

  • ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.

  • ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.

  • ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.

  • ONNX Runtime integration API improvement by @michaelbenayoun in #515
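
A minimal sketch of the file-name-free loading; the local directory name is an illustrative assumption, e.g. an ORTQuantizer output containing a model_quantized.onnx:

from optimum.onnxruntime import ORTModelForSequenceClassification

# The directory can contain an optimized or quantized ONNX file with an
# arbitrary name; no file_name argument needs to be passed anymore
model = ORTModelForSequenceClassification.from_pretrained("quantized_model_dir")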

Custom shapes support at ONNX export

The shapes of the example inputs provided for the ONNX export can now be overridden, which is useful when the validity of the exported ONNX model is sensitive to the shapes used during the export.

Read more: optimum-cli export onnx --help

  • Support custom shapes for dummy inputs by @fxmarty in #522
  • Support for custom input shapes in exporters onnx by @fxmarty in #575

Enable use_cache=True for ORTModelForCausalLM

Reusing past key values with ORTModelForCausalLM (e.g. gpt2) is now possible by passing use_cache=True, which avoids recomputing them at each decoding iteration:

from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)

inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")

gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)

  • Enable past_key_values for ORTModelForCausalLM by @echarlaix in #326

IO binding support for ORTModelForCustomTasks

ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.
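
A minimal sketch; the checkpoint and provider setup are assumptions, and IO Binding is used when the model runs on CUDAExecutionProvider:

from optimum.onnxruntime import ORTModelForCustomTasks

# ONNX model with custom inputs/outputs (checkpoint name chosen for illustration);
# selecting CUDAExecutionProvider enables the IO Binding path
model = ORTModelForCustomTasks.from_pretrained(
    "optimum/sbert-all-MiniLM-L6-with-pooler", provider="CUDAExecutionProvider"
)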

Experimental support to merge ONNX decoder with/without past key values

Along with --for-ort, passing --task causal-lm-with-past, --task seq2seq-lm-with-past or --task speech2seq-lm-with-past during the ONNX export produces two models: one that does not reuse previously computed keys/values, and one that does.

Experimental support is introduced to merge the two models into one. Example:

optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/

import onnx
from optimum.onnx import merge_decoders

decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")

merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")

Major bugs fixed

Other changes, bugfixes and improvements

Full Changelog: v1.5.2...v1.6.0

Significant community contributions

The following contributors have made significant changes to the library over the last release: