v1.3.0: Torch FX transformations, ORTModelForSeq2SeqLM and ORTModelForImageClassification
Torch FX
The `optimum.fx.optimization` module (#232) provides a set of `torch.fx` graph transformations, along with classes and functions to write your own transformations and compose them.
- The `Transformation` and `ReversibleTransformation` classes represent non-reversible and reversible transformations respectively, and it is possible to write your own transformations by inheriting from those classes (see the sketch after this list)
- The `compose` utility function enables transformation composition
- Two reversible transformations were added:
  - `MergeLinears`: merges linear layers that have the same input
  - `ChangeTrueDivToMulByInverse`: changes a division by a static value into a multiplication by its inverse
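To make this concrete, here is a minimal sketch that writes a custom transformation and composes it with the two built-in ones. The BERT checkpoint and the `ChangeAddToTorchAdd` transformation are illustrative assumptions, not part of the release:

```python
import operator

import torch
from transformers import BertModel
from transformers.utils.fx import symbolic_trace

from optimum.fx.optimization import (
    ChangeTrueDivToMulByInverse,
    MergeLinears,
    Transformation,
    compose,
)


# Hypothetical custom transformation: rewrites Python "+" nodes to torch.add
class ChangeAddToTorchAdd(Transformation):
    def transform(self, graph_module):
        for node in graph_module.graph.nodes:
            if node.op == "call_function" and node.target == operator.add:
                node.target = torch.add
        return graph_module


# Trace the model into a torch.fx.GraphModule
model = BertModel.from_pretrained("bert-base-uncased")
traced = symbolic_trace(model, input_names=["input_ids", "attention_mask", "token_type_ids"])

# Compose the built-in reversible transformations with the custom one, then apply
transformation = compose(MergeLinears(), ChangeTrueDivToMulByInverse(), ChangeAddToTorchAdd())
transformed = transformation(traced)
```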
ORTModelForSeq2SeqLM
`ORTModelForSeq2SeqLM` (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.
- When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (actually consisting of the decoder with the language modeling head), and the decoder with pre-computed key/values as additional inputs.
- This specific export comes from the fact that during the first pass, the decoder has no pre-computed key/values hidden states, while during the rest of the generation, past key/values are used to speed up sequential decoding.
Below is an example that downloads a T5 model from the Hugging Face Hub, exports it to the ONNX format, and saves it:
```python
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# Load the model from the Hub and export it to the ONNX format
model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)

# Save the exported model in the given directory
output_dir = "t5_onnx/"  # any local directory
model.save_pretrained(output_dir)
```
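For reference, here is a minimal generation sketch reusing the `model` loaded above (the prompt is an illustrative assumption); the decoder-with-past subgraph is used automatically after the first decoding step:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-small")
inputs = tokenizer("translate English to French: Hello, how are you?", return_tensors="pt")

# The first step runs the decoder without pre-computed key/values; subsequent
# steps reuse the cached past key/values through the decoder-with-past subgraph
gen_tokens = model.generate(**inputs)
print(tokenizer.batch_decode(gen_tokens, skip_special_tokens=True))
```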
ORTModelForImageClassification
`ORTModelForImageClassification` (#226) allows ONNX Runtime inference for models with an image classification head.
Below is an example that downloads a ViT model from the Hugging Face Hub, exports it to the ONNX format, and saves it:
```python
from optimum.onnxruntime import ORTModelForImageClassification

# Load the model from the Hub and export it to the ONNX format
model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)

# Save the exported model in the given directory
output_dir = "vit_onnx/"  # any local directory
model.save_pretrained(output_dir)
```
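And a minimal inference sketch reusing the `model` loaded above; the sample image URL (a COCO image commonly used in the Transformers docs) is an illustrative assumption:

```python
import requests
from PIL import Image
from transformers import AutoFeatureExtractor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # sample image (assumption)
image = Image.open(requests.get(url, stream=True).raw)

# Preprocess the image and run it through the ONNX Runtime model
feature_extractor = AutoFeatureExtractor.from_pretrained("google/vit-base-patch16-224")
inputs = feature_extractor(images=image, return_tensors="pt")
outputs = model(**inputs)

predicted_label = model.config.id2label[outputs.logits.argmax(-1).item()]
print(predicted_label)
```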
ORTOptimizer
Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (`fp16`) to `OptimizationConfig` (#273).
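A hedged sketch of enabling it through the optimizer's export workflow; the BERT checkpoint, feature, and file paths are illustrative assumptions:

```python
from pathlib import Path

from optimum.onnxruntime import ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Enable basic graph optimizations plus fp32 -> fp16 weight conversion
optimization_config = OptimizationConfig(optimization_level=1, fp16=True)

optimizer = ORTOptimizer.from_pretrained("bert-base-uncased", feature="sequence-classification")
optimizer.export(
    onnx_model_path=Path("model.onnx"),
    onnx_optimized_model_output_path=Path("model-optimized.onnx"),
    optimization_config=optimization_config,
)
```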
Pipelines
Additional pipeline tasks are now supported; here is the list of supported tasks along with the default model for each:
- Image Classification (ViT)
- Text-to-Text Generation (T5 small)
- Summarization (T5 base)
- Translation (T5 base)
Below is an example that downloads a T5 small model from the Hub and loads it with the Transformers `pipeline` for translation:
```python
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)

text = "What a beautiful day !"
pred = onnx_translation(text)
# [{'translation_text': "C'est une belle journée !"}]
```
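The new image classification task works the same way; here is a hedged sketch assuming the `optimum.pipelines.pipeline` helper with its `accelerator` argument (the sample image URL is an illustrative assumption):

```python
from optimum.pipelines import pipeline

# ONNX Runtime-backed pipeline; without an explicit model, the task default (ViT) is loaded
onnx_classifier = pipeline("image-classification", accelerator="ort")
pred = onnx_classifier("http://images.cocodataset.org/val2017/000000039769.jpg")
```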
Breaking change
The `ORTModelForXXX` execution provider now defaults to `CPUExecutionProvider` (#203). Previously, if no execution provider was specified, it was set to `CUDAExecutionProvider` when a GPU was detected, and to `CPUExecutionProvider` otherwise.