Add image height and width to ONNX dynamic axes #18915

Merged 1 commit into main on Sep 7, 2022

Conversation

@lewtun (Member) commented on Sep 7, 2022

What does this PR do?

This PR enables dynamic axes for the image height/width of ONNX vision models. This allows users to change the height and width of their inputs at runtime, using values different from those used to trace the model during the export (usually 224 x 224 pixels).
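For reference, the local onnx directory loaded in the example below can be produced with the ONNX export CLI; a sketch of the invocation (the --feature value is an assumption chosen to match ORTModelForImageClassification):

python -m transformers.onnx --model=microsoft/resnet-50 --feature=image-classification onnx/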

Here's an example with ResNet and optimum:

import requests
from PIL import Image
from optimum.onnxruntime import ORTModelForImageClassification
from transformers import AutoFeatureExtractor

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
# Raw image size 480 x 640 pixels
image = Image.open(requests.get(url, stream=True).raw)
# Resize image to 40 x 40 pixels
preprocessor = AutoFeatureExtractor.from_pretrained("microsoft/resnet-50", do_resize=True, size=40)
# Load the model from a local directory containing the ONNX export
model = ORTModelForImageClassification.from_pretrained("onnx")
inputs = preprocessor(images=image, return_tensors="pt")
outputs = model(**inputs)
logits = outputs.logits
logits.shape  # torch.Size([1, 1000]) -- the 1000 ImageNet classes

I've also checked that the slow tests pass:

RUN_SLOW=1 pytest tests/onnx/test_onnx_v2.py -k "beit or clip or convnext or data2vec-vision or deit or detr or layoutlmv3 or levit or mobilevit or resnet or vit" -s

@@ -332,7 +332,7 @@ def inputs(self) -> Mapping[str, Mapping[int, str]]:
         return OrderedDict(
             [
                 ("input_ids", {0: "batch", 1: "sequence"}),
-                ("pixel_values", {0: "batch"}),
+                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
@lewtun (Member Author):

I noticed that the CLIP export was also missing num_channels as a dynamic axis, so I included it here as well.
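For readers unfamiliar with the mechanics: these per-input mappings are ultimately passed as the dynamic_axes argument of torch.onnx.export, which keeps the named axes symbolic in the exported graph. A minimal sketch with a toy model (for illustration only, not the transformers export code):

import torch

# Toy stand-in for a vision backbone (assumption for illustration)
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, kernel_size=3),
    torch.nn.AdaptiveAvgPool2d(1),  # exports as GlobalAveragePool, so spatial dims may vary
    torch.nn.Flatten(),
)
dummy = torch.randn(1, 3, 224, 224)  # tracing shapes only; not baked into the graph

torch.onnx.export(
    model,
    (dummy,),
    "model.onnx",
    input_names=["pixel_values"],
    output_names=["logits"],
    # Axes named here stay dynamic, exactly like the mapping in this diff
    dynamic_axes={
        "pixel_values": {0: "batch", 1: "num_channels", 2: "height", 3: "width"},
        "logits": {0: "batch"},
    },
)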

@@ -203,7 +203,7 @@ def inputs(self) -> Mapping[str, Mapping[int, str]]:
                 ("input_ids", {0: "batch", 1: "sequence"}),
                 ("attention_mask", {0: "batch", 1: "sequence"}),
                 ("bbox", {0: "batch", 1: "sequence"}),
-                ("pixel_values", {0: "batch", 1: "sequence"}),
+                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
@lewtun (Member Author):

Following #17976, I've renamed sequence to num_channels.
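As a quick sanity check that the axes really are symbolic, one can inspect the exported graph with onnxruntime and run it at a different resolution; a sketch, assuming the model.onnx produced by the snippet above:

import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
# Dynamic axes are reported by name rather than as fixed integers
print(session.get_inputs()[0].shape)  # ['batch', 'num_channels', 'height', 'width']

# Run at 40 x 40 even though the model was traced at 224 x 224
pixel_values = np.random.rand(2, 3, 40, 40).astype(np.float32)
(logits,) = session.run(None, {"pixel_values": pixel_values})
print(logits.shape)  # (2, 8) for the toy model above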

@HuggingFaceDocBuilderDev commented on Sep 7, 2022:

The documentation is not available anymore as the PR was closed or merged.

@regisss (Contributor) left a comment:

LGTM, thanks @lewtun!!

@sgugger (Collaborator) left a comment:

LGTM, thanks for working on this!

@lewtun merged commit 6519150 into main on Sep 7, 2022
@lewtun deleted the lewtun/add-onnx-vision-features branch on Sep 7, 2022
oneraghavan pushed a commit to oneraghavan/transformers that referenced this pull request on Sep 26, 2022