Add image height and width to ONNX dynamic axes #18915
@@ -332,7 +332,7 @@ def inputs(self) -> Mapping[str, Mapping[int, str]]:
         return OrderedDict(
             [
                 ("input_ids", {0: "batch", 1: "sequence"}),
-                ("pixel_values", {0: "batch"}),
+                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
I noticed that the CLIP export was also missing `num_channels` as a dynamic axis, so I included it here as well.
@@ -203,7 +203,7 @@ def inputs(self) -> Mapping[str, Mapping[int, str]]:
                 ("input_ids", {0: "batch", 1: "sequence"}),
                 ("attention_mask", {0: "batch", 1: "sequence"}),
                 ("bbox", {0: "batch", 1: "sequence"}),
-                ("pixel_values", {0: "batch", 1: "sequence"}),
+                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
Following #17976, I've renamed `sequence` to `num_channels`.
LGTM thanks @lewtun!!
LGTM, thanks for working on this!
What does this PR do?
This PR enables dynamic axes for the image height and width of ONNX vision models. This lets users run inference on inputs whose height and width differ from those used to trace the model during export (usually 224 x 224 pixels).
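To make the effect of the change concrete, here is a minimal, library-free sketch of how a dynamic-axes mapping turns a traced shape into a symbolic one. The helper names `symbolic_shape` and `accepts`, and the traced shape, are hypothetical and only for illustration. This is not the exporter's actual code, just a model of the shape check it relaxes.

```python
from collections import OrderedDict
from typing import Mapping, Sequence, Tuple, Union

# Axis mapping in the same form as the `inputs` property changed in this PR.
DYNAMIC_AXES: Mapping[str, Mapping[int, str]] = OrderedDict(
    [
        ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
    ]
)

Dim = Union[int, str]


def symbolic_shape(traced_shape: Sequence[int], axes: Mapping[int, str]) -> Tuple[Dim, ...]:
    """Replace every traced dimension that has a dynamic-axis name with that name."""
    return tuple(axes.get(i, dim) for i, dim in enumerate(traced_shape))


def accepts(exported_shape: Sequence[Dim], runtime_shape: Sequence[int]) -> bool:
    """Return True if runtime_shape agrees with every fixed (integer) dimension."""
    if len(exported_shape) != len(runtime_shape):
        return False
    return all(isinstance(e, str) or e == r for e, r in zip(exported_shape, runtime_shape))


# Hypothetical dummy input used for tracing (assumption: 224 x 224 RGB images).
traced = (2, 3, 224, 224)

# Old behaviour: only the batch axis is dynamic, so height and width stay fixed.
before = symbolic_shape(traced, {0: "batch"})
# New behaviour: every axis of pixel_values is named, so none is pinned to the traced size.
after = symbolic_shape(traced, DYNAMIC_AXES["pixel_values"])

print(before, accepts(before, (1, 3, 384, 384)))
print(after, accepts(after, (1, 3, 384, 384)))
```

Note that a dynamic axis only relaxes the shape check on the exported graph's input. Whether a particular architecture produces sensible outputs at other resolutions is a separate question.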
Here's an example with ResNet and `optimum`:
I've also checked the slow tests pass: