
Add ORTModelForVision2Seq for VisionEncoderDecoder models inference #742

Merged

Conversation

Contributor

@mht-sharma mht-sharma commented Feb 3, 2023

What does this PR do?

This PR enables inference of VisionEncoderDecoder models with ONNX Runtime. It adds ORTModelForVision2Seq, which can be used like AutoModelForVision2Seq by changing just a few lines.

Usage

>>> from PIL import Image
>>> from transformers import GPT2TokenizerFast, ViTImageProcessor
->>> from transformers import AutoModelForVision2Seq
+>>> from optimum.onnxruntime import ORTModelForVision2Seq
>>> import requests

>>> model_name = "nlpconnect/vit-gpt2-image-captioning"
->>> model = AutoModelForVision2Seq.from_pretrained(model_name)
+>>> model = ORTModelForVision2Seq.from_pretrained(model_name, from_transformers=True)
>>> tokenizer = GPT2TokenizerFast.from_pretrained(model_name)
>>> image_processor = ViTImageProcessor.from_pretrained(model_name)

>>> url = "http://images.cocodataset.org/val2017/000000039769.jpg"
>>> image = Image.open(requests.get(url, stream=True).raw)
>>> pixel_values = image_processor(image, return_tensors="pt").pixel_values

>>> generated_ids = model.generate(pixel_values)
>>> generated_text = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
>>> print(generated_text)
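As a possible follow-up (a minimal sketch; the save path is illustrative): ORTModel classes expose save_pretrained, so the exported ONNX model can be written to disk once and reloaded later without re-exporting from transformers.

>>> model.save_pretrained("vit-gpt2-image-captioning-onnx")  # illustrative path
>>> model = ORTModelForVision2Seq.from_pretrained("vit-gpt2-image-captioning-onnx")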

Limitations

  • Donut model is not supported.
  • TrOCR model is not supported with use_cache=True.
  • IOBinding is not supported.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?


HuggingFaceDocBuilderDev commented Feb 3, 2023

The documentation is not available anymore as the PR was closed or merged.

@mht-sharma mht-sharma marked this pull request as ready for review February 3, 2023 18:57
@mht-sharma mht-sharma changed the title add ORTModelForVision2Seq Add ORTModelForVision2Seq for VisionEncoderDecoder models inference Feb 3, 2023
Contributor
@fxmarty fxmarty left a comment

LGTM as long as the tests pass, thank you for the addition! Could you also make sure that the code snippet in the documentation is valid?

optimum/onnxruntime/modeling_seq2seq.py (4 resolved review threads)
Indices of decoder input sequence tokens in the vocabulary of shape `(batch_size, decoder_sequence_length)`.
encoder_outputs (`torch.FloatTensor`):
The encoder `last_hidden_state` of shape `(batch_size, encoder_sequence_length, hidden_size)`.
past_key_values (`tuple(tuple(torch.FloatTensor), *optional*)`
Contributor

Suggested change
-past_key_values (`tuple(tuple(torch.FloatTensor), *optional*)`
+past_key_values (`tuple(tuple(torch.FloatTensor))`

We should not put *optional* in Optimum following this discussion: https://huggingface.slack.com/archives/C02P0559X9S/p1669628754896019

Could you rather specify what the default is? Also, for the type: `Tuple[Tuple[torch.FloatTensor]]`

(edit: this is copy-pasted, so could you edit it in the rest as well?)

Contributor Author

Added the default of `None`, as in the function signature.
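For illustration, a minimal sketch of how the entry could then read (assuming the default is `None`; the wording is illustrative, not the exact text merged in the PR):

past_key_values (`Tuple[Tuple[torch.FloatTensor]]`, defaults to `None`):
    Precomputed key and value hidden states of the attention blocks, used to
    speed up sequential decoding.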

Comment on lines +2763 to +2766
def exclude_trocr_with_cache(params):
    # Skip the unsupported (TrOCR, use_cache=True) combination by returning None.
    if params[0] == "trocr" and params[1] == True:
        return None
    return params
Contributor

Just for my knowledge, why is this not supported? What will happen if a user tries to use ORTModelForVision2Seq with TrOCR with use_cache=True?

Contributor Author

The model currently does not output any past_key_values when use_cache=True, so I need to figure out why. It is probably something wrong in the modeling code.

| What will happen if a user tries to use ORTModelForVision2Seq with TrOCR with use_cache=True?

For this I have added an error message during export.
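For illustration, a minimal sketch of how such a filter drops the unsupported combination from a test parameter grid (the grid and the filtering loop here are illustrative, not the PR's actual test harness):

grid = [("vit", False), ("vit", True), ("trocr", False), ("trocr", True)]
kept = [p for p in map(exclude_trocr_with_cache, grid) if p is not None]
# kept == [("vit", False), ("vit", True), ("trocr", False)]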

optimum/utils/normalized_config.py (resolved review thread)
Member
@michaelbenayoun michaelbenayoun left a comment

LGTM!

What's the status for the past key / values?

optimum/onnxruntime/base.py (outdated, resolved review thread)
@mht-sharma mht-sharma force-pushed the add_ort_support_vision_encoder_decoder branch from 61dd641 to 994a61e on February 6, 2023 13:54
@mht-sharma (Contributor Author)

| LGTM!
|
| What's the status for the past key / values?

Past key values are unsupported only for TrOCR; I would look into this after Donut.

@mht-sharma mht-sharma merged commit 69764f1 into huggingface:main Feb 7, 2023