v1.6.0: Optimum CLI, Stable Diffusion ONNX export, BetterTransformer & ONNX support for more architectures
Optimum CLI
The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:
optimum-cli --help
optimum-cli export onnx --help
optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/
Stable Diffusion ONNX export
Optimum now supports the ONNX export of Stable Diffusion models from the diffusers library:
optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
- Add Stable Diffusion ONNX export by @echarlaix in #570
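The exported pipeline can then be run with ONNX Runtime, for instance through diffusers. A minimal sketch, assuming the exported directory matches the layout expected by diffusers' OnnxStableDiffusionPipeline (the prompt is illustrative):
from diffusers import OnnxStableDiffusionPipeline

pipeline = OnnxStableDiffusionPipeline.from_pretrained("sd_v15_onnx", provider="CPUExecutionProvider")
image = pipeline("A cat sitting on a windowsill").images[0]
image.save("cat.png")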
BetterTransformer support for more architectures
The BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT and FSMT (see the usage sketch after the list below).
The complete list of supported models is available in the documentation.
- [BT] Add Bettertransformer support for FSMT by @Sumanth077 in #494
- [BT] add BetterTransformer support for ViLT architecture by @ka00ri in #508
- Add MBart support for BetterTransformer by @ravenouse in #516
- Add CLIP BetterTransformer by @fxmarty in #534
- Add BetterTransformer support for RemBERT by @hchings in #545
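Converting a model of one of the newly supported architectures takes a single call. A minimal sketch using the BetterTransformer.transform API (the CLIP checkpoint is illustrative):
from transformers import AutoModel
from optimum.bettertransformer import BetterTransformer

# load a vanilla transformers model, then swap in the BetterTransformer kernels
model = AutoModel.from_pretrained("openai/clip-vit-base-patch32")
model = BetterTransformer.transform(model)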
ONNX export for more architectures
The ONNX export now supports Swin, MobileNet-v1 and MobileNet-v2 (see the example after the list below).
- Add Swin support in exporters.onnx by @fxmarty in #528
- [ONNX] add mobilenet support by @younesbelkada in #633
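For example, exporting a Swin checkpoint works like any other supported architecture. A sketch (the model name and task are illustrative):
optimum-cli export onnx --model microsoft/swin-tiny-patch4-window7-224 --task image-classification swin_onnx/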
Extended ONNX export for encoder-decoder and decoder models
Encoder-decoder and decoder-only models that normally make use of the generate() method in transformers can now be exported as several files using the --for-ort argument:
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx
yielding:
.
└── t5_small_onnx
├── config.json
├── decoder_model.onnx
├── decoder_with_past_model.onnx
├── encoder_model.onnx
├── special_tokens_map.json
├── spiece.model
├── tokenizer_config.json
└── tokenizer.json
When passing --for-ort, the exported models are expected to be loadable directly into ORTModel, as in the sketch below.
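A minimal sketch of loading the export above, assuming ORTModelForSeq2SeqLM picks up the files produced by --for-ort (the input sentence is illustrative):
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5_small_onnx")
model = ORTModelForSeq2SeqLM.from_pretrained("t5_small_onnx")

tokens = tokenizer("translate English to French: Hello world!", return_tensors="pt")
print(tokenizer.batch_decode(model.generate(**tokens), skip_special_tokens=True))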
- Add ort export in exporters for encoder-decoder models by @mht-sharma in #497
- Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in #554
Support for ONNX models with external data at export, optimization, quantization
The ONNX export from PyTorch normally creates external data when the exported model is larger than 2 GB. This release introduces better support for exporting and using large models, writing all external data into a single .onnx_data file if necessary.
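When loading such a model with the onnx Python package, the external data is resolved automatically as long as it sits next to the .onnx file. A minimal sketch (the path is illustrative):
import onnx

# model.onnx_data located next to model.onnx is picked up automatically
model = onnx.load("large_model/model.onnx")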
- Handling ONNX models with external data by @NouamaneTazi in #586
- Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in #332
ONNX Runtime API improvement
Various improvements to allow for a better user experience in the ONNX Runtime integration:
- ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model file regardless of its name, allowing optimized and quantized models to be loaded without specifying a file name argument (see the sketch after this list).
- ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.
- ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.
- ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.
- ONNX Runtime integration API improvement by @michaelbenayoun in #515
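A minimal sketch of the file-name-free loading (the path is illustrative):
from optimum.onnxruntime import ORTModelForSequenceClassification

# the directory may contain e.g. model_quantized.onnx; no file_name argument is needed
model = ORTModelForSequenceClassification.from_pretrained("path/to/quantized_model_dir")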
Custom shapes support at ONNX export
The shape of the example inputs used during the export to ONNX can now be overridden, in case the validity of the ONNX model is sensitive to the shapes used at export time.
Read more: optimum-cli export onnx --help
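For example, fixing the batch size and sequence length used for the export could look like the following; the flag names here are assumptions to be checked against the --help output above:
optimum-cli export onnx --model bert-base-uncased --task sequence-classification --batch_size 4 --sequence_length 128 bert_onnx/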
- Support custom shapes for dummy inputs by @fxmarty in #522
- Support for custom input shapes in exporters onnx by @fxmarty in #575
Enable use_cache=True for ORTModelForCausalLM
Models running through ORTModelForCausalLM (e.g. gpt2) can now reuse past key values by passing use_cache=True, avoiding recomputing them at each iteration of the decoding:
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM
import torch
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)
inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")
gen_tokens = model.generate(**inputs)
tokenizer.batch_decode(gen_tokens)
- Enable past_key_values for ORTModelForCausalLM by @echarlaix in #326
IO binding support for ORTModelForCustomTasks
ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.
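A minimal sketch, assuming a CUDA-enabled onnxruntime-gpu install (the checkpoint is only illustrative):
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCustomTasks

model = ORTModelForCustomTasks.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler", provider="CUDAExecutionProvider")
tokenizer = AutoTokenizer.from_pretrained("optimum/sbert-all-MiniLM-L6-with-pooler")

inputs = tokenizer("I love burritos!", return_tensors="pt").to("cuda")
outputs = model(**inputs)  # inputs and outputs are bound on device, avoiding host/device copies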
- Add IO binding support for custom ORTModel by @JingyaHuang in #447
Experimental support to merge ONNX decoder with/without past key values
Along with --for-ort, passing --task causal-lm-with-past, --task seq2seq-lm-with-past or --task speech2seq-lm-with-past during the ONNX export yields two models: one not reusing the previously computed keys/values, and one reusing them.
Experimental support is introduced to merge the two models into one. Example:
optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/
import onnx
from optimum.onnx import merge_decoders
decoder = onnx.load("t5_onnx/decoder_model.onnx")
decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")
merged_model = merge_decoders(decoder, decoder_with_past)
onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")
- Merge ONNX decoder models by @JingyaHuang in #587
Major bugs fixed
- Fix BetterTransformer with padding="max_length" by @fxmarty in #543
- Fix non-nesting bug in BetterTransformer integration by @younesbelkada in #637
Other changes, bugfixes and improvements
- Fix doc-builder permission error by @mishig25 in #482
- Fix doc build pr permissions by @mishig25 in #484
- Re-order the task manager doc by @michaelbenayoun in #483
- Fix whisper device for gpu test by @fxmarty in #486
- Fix tensorflow CI by @fxmarty in #489
- Fix PR doc generation by @regisss in #495
- Fix broken links in the doc by @fxmarty in #499
- Update iobinding ORT encoder whisper by @mht-sharma in #498
- fix NormalizedConfig init error message by @PaulQbFeng in #500
- Change import structure for ORTModel by @fxmarty in #456
- [BT] Fix failing CI tests by @younesbelkada in #501
- Remove redundant condition statement in ORTDecoder(Seq2seq) by @JingyaHuang in #504
- [BT] put decorator on the correct place by @younesbelkada in #509
- [BT] clearer error message for norm_first by @younesbelkada in #510
- Deprecate PyTorch 1.12 for BetterTransformer by @fxmarty in #513
- Fix ORTModelForSeq2SeqLM test by @fxmarty in #455
- Clearer error messages when initializing the requested ONNX Runtime execution provider fails by @fxmarty in #514
- [BT] Fix doc bugs by @younesbelkada in #517
- Replace sklearn by scikit-learn by @lesteve in #502
- ORTModel uses optimum.exporters.onnx by @michaelbenayoun in #490
- Cleanup deprecated ONNX Runtime training docker files by @JingyaHuang in #523
- Added support for Tapas Model by @JuheonChu in #520
- Add benchmark results to gpu doc by @JingyaHuang in #525
- ORTModelForConditionalGeneration uses optimum.exporters.onnx by @mht-sharma in #529
- Better error message when wrong task is given to exporters by @fxmarty in #531
- Add OrtModelForSpeechSeq2Seq to doc by @fxmarty in #533
- Fold sections by default in the documentation's side-bar by @regisss in #535
- Import GenerationMixin from transformers.generation if transformers >= 4.25.0 by @regisss in #536
- Add check_if_transformers_greater to manage different versions of transformers by @regisss in #537
- Enable to push some sections to the end of the TOC in the doc by @regisss in #532
- Fix import in ONNX export CLI by @fxmarty in #553
- Update readme by @echarlaix in #550
- Refactor of 2 functions used in ORTModel by @michaelbenayoun in #551
- Update readme by @echarlaix in #556
- Fix ORTTrainer wrapper duplication / PyTorch evaluate / update with transformers 4.25.1 by @JingyaHuang in #561
- Fix flaky BetterTransformer test by @fxmarty in #564
- Enable FP16Optimizer for fp16 deepspeed training by @AdamLouly in #547
- Update documentation quick tour section by @echarlaix in #574
- Move custom IOBinding to IOBindingHelper by @JingyaHuang in #571
- Add test for exporters.onnx CLI by @fxmarty in #573
- Documentation on quantization by @michaelbenayoun in #565
- More robust tests for ORTModel using decoders and use_cache=True by @fxmarty in #576
- Fix errors in onnxruntime modeling tests by @fxmarty in #585
- [BT] fix flaky test by @younesbelkada in #591
- Fix exporters onnx shapes by @fxmarty in #581
- Fix exporters.onnx tests by @fxmarty in #584
- Update on the ONNX Runtime documentation by @michaelbenayoun in #567
- Add the ORTModelForSemanticSegmentation class by @TheoMrc in #539
- Refactor BetterTransformer to be able to raise more informative error messages by @fxmarty in #594
- Constrain temporarily NumPy version to save CIs by @JingyaHuang in #614
- Add encoder_last_hidden_state as an output for encoder-decoder models by @fxmarty in #601
- Update dev version by @fxmarty in #617
- Fix documentation example by @echarlaix in #603
- Documentation improvements by @fxmarty in #598
- More informative message at ONNX export by @fxmarty in #609
- Use optimum exporter for current weight sharing test by @JingyaHuang in #616
- OnnxConfig now handles the export to encoder / decoder / decoder_with_past itself by @michaelbenayoun in #590
- Set explicitly the device index by @JingyaHuang in #613
- Fix ORT GPU test by @JingyaHuang in #624
- Add GPT-J normalized config by @fxmarty in #623
- Remove diffusers dependency in onnxruntime code by @fxmarty in #619
- Use exporters in ORTTrainer by @mht-sharma in #546
- Improve use_io_binding default value for different execution providers by @JingyaHuang in #604
- fixed FuseBiasInLinear by specifying device by @IlyasMoutawwakil in #630
- Fixed GPU documentation for HF pipelines by @smiraldr in #602
- Add argument in the CLI to specify device to do the ONNX export on by @fxmarty in #634
- Allow kwargs in all generate_dummy_inputs() methods by @fxmarty in #638
Full Changelog: v1.5.2...v1.6.0
Significant community contributions
The following contributors have made significant changes to the library over the last release:
- @TheoMrc
  - Add ORTModelForSemanticSegmentation #539
- @ravenouse
  - Add MBart support for BetterTransformer #516
- @ka00ri
  - Add BetterTransformer support for ViLT architecture #508
- @Sumanth077
  - Add Bettertransformer support for FSMT #494