Adds OWLViT to models exportable with ONNX #18588
Conversation
@unography That's strange that it does not work for object detection. It should actually work; DETR and YOLOS are exportable to ONNX, for instance (see here). What is the error you get when trying to export the model for object detection?
@regisss I think it just needs to be defined in the config for AutoModel, for object detection here. This is the stack trace:

```
cls = <class 'transformers.models.auto.modeling_auto.AutoModelForObjectDetection'>
config = OwlViTConfig {
  "_commit_hash": "7cc55348dae46396474cd94bf00a542167a10f8d",
  "_name_or_path": "google/owlvit-base-pa...nsformers_version": "4.22.0.dev0",
  "typical_p": 1.0,
  "use_bfloat16": false
  },
  "vision_config_dict": null
}
kwargs = {}, trust_remote_code = False

    @classmethod
    def from_config(cls, config, **kwargs):
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    "Loading this model requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            if kwargs.get("revision", None) is None:
                logger.warning(
                    "Explicitly passing a `revision` is encouraged when loading a model with custom code to ensure "
                    "no malicious code has been contributed in a newer revision."
                )
            class_ref = config.auto_map[cls.__name__]
            module_file, class_name = class_ref.split(".")
            model_class = get_class_from_dynamic_module(config.name_or_path, module_file + ".py", class_name, **kwargs)
            return model_class._from_config(config, **kwargs)
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class._from_config(config, **kwargs)
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.owlvit.configuration_owlvit.OwlViTConfig'> for this kind of AutoModel: AutoModelForObjectDetection.
E       Model type should be one of DetrConfig, YolosConfig.

src/transformers/models/auto/auto_factory.py:412: ValueError
```
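For context, a minimal snippet that reproduces the error above (a sketch; the checkpoint name is an assumption, since the config dump truncates `_name_or_path`):

```python
from transformers import AutoConfig, AutoModelForObjectDetection

# Checkpoint name assumed; the dump above truncates it to "google/owlvit-base-pa..."
config = AutoConfig.from_pretrained("google/owlvit-base-patch32")

# Raises ValueError: OwlViTConfig is not registered in the
# AutoModelForObjectDetection mapping (only DetrConfig and YolosConfig are)
model = AutoModelForObjectDetection.from_config(config)
```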
Hi @unography and @regisss! OWL-ViT is not part of the object detection pipeline because it requires both an image and search queries as input. We are planning to add a zero-shot-object-detection pipeline for OWL-ViT (see this issue).
Thanks for the information @alaradirik :) @unography Let's keep only the default pipeline as you did then. I had to change one
LGTM @unography!!
Thanks for this PR 😃
Pinging @sgugger for final approval
@regisss ya, sorry, I missed the
Hey @lewtun, would you like to have a look at this and merge if it looks good to you?
@lewtun Can you take a quick look at this PR and merge it when you approve? 🙂
Thanks a lot for adding ONNX support for this brand new model @unography!
I've left a comment about the dynamic axes of `pixel_values`, but otherwise this is looking great 🔥
```python
return OrderedDict(
    [
        ("input_ids", {0: "batch", 1: "sequence"}),
        ("pixel_values", {0: "batch"}),
    ]
)
```
Looking at the forward pass docs, `pixel_values` has shape `[batch_size, num_channels, height, width]`. Does OWLViT typically work with dynamic shapes (e.g. different sized images)? If yes, I think it would make sense to replace this with:
```diff
- ("pixel_values", {0: "batch"}),
+ ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
```
(Similar to what we did with YOLOS.)
Do you agree, @regisss?
Yes it works with dynamic shapes, cc @alaradirik
Sure, makes sense, added the change.
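For readers following the thread: after this change, the `inputs` property of the ONNX config would read roughly as below. This is a sketch reconstructed from the snippets quoted in this review, showing only the two inputs visible above; the actual `OwlViTOnnxConfig` may define additional inputs and outputs.

```python
from collections import OrderedDict
from typing import Mapping

from transformers.onnx import OnnxConfig

class OwlViTOnnxConfig(OnnxConfig):
    @property
    def inputs(self) -> Mapping[str, Mapping[int, str]]:
        # Dynamic axes: batch and sequence length for the text input, and all
        # four image dimensions for pixel_values (per the suggestion above)
        return OrderedDict(
            [
                ("input_ids", {0: "batch", 1: "sequence"}),
                ("pixel_values", {0: "batch", 1: "num_channels", 2: "height", 3: "width"}),
            ]
        )
```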
```diff
@@ -687,7 +687,10 @@ def forward(
         last_hidden_state = self.final_layer_norm(last_hidden_state)

         # take features from the end of tokens embedding (end of token is the highest number in each sequence)
-        pooled_output = last_hidden_state[torch.arange(last_hidden_state.shape[0]), input_ids.argmax(dim=-1)]
+        # casting to torch.int for onnx compatibility: argmax doesn't support int64 inputs with opset 14
+        pooled_output = last_hidden_state[
+            torch.arange(last_hidden_state.shape[0]), input_ids.to(torch.int).argmax(dim=-1)
+        ]
```
Double checking with @alaradirik whether this change to the modeling code is OK (I think it is)
@lewtun yes, that's ok
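For anyone wondering what the cast buys: as the diff comment notes, `argmax` doesn't support int64 inputs with opset 14, so `input_ids` is cast to `torch.int` first. A standalone sketch with illustrative shapes:

```python
import torch

# Illustrative shapes: batch of 2 text queries, 7 tokens, hidden size 512
last_hidden_state = torch.randn(2, 7, 512)
input_ids = torch.randint(0, 49408, (2, 7))  # int64 by default

# The end-of-text token has the highest id in each sequence, so argmax finds
# its position; casting to torch.int keeps the exported ONNX graph valid
eot_positions = input_ids.to(torch.int).argmax(dim=-1)
pooled_output = last_hidden_state[torch.arange(last_hidden_state.shape[0]), eot_positions]
print(pooled_output.shape)  # torch.Size([2, 512])
```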
Thanks for fixing the dynamic shapes @unography - this LGTM 🚀 !
Gently pinging @sgugger for final approval
LGTM, thanks to everyone involved!
Output for tests on my local machine:
Note: Haven't tested this on GPU yet, don't have a GPU machine with me currently.
Also, this is for the `default` task of OWLViT. The `object-detection` task isn't supported by AutoModel yet, which is why the ONNX export currently fails if I add that task. Should I make the change for AutoModel as well?
cc: @chainyo
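For anyone who wants to try the `default`-task export themselves, here is a hedged sketch using the `transformers.onnx` package (v4.22-era API). The checkpoint name, output path, and the import location of `OwlViTOnnxConfig` are assumptions, not confirmed by this thread:

```python
from pathlib import Path

from transformers import AutoProcessor, OwlViTModel
from transformers.models.owlvit.configuration_owlvit import OwlViTOnnxConfig  # assumed location
from transformers.onnx import export

model_id = "google/owlvit-base-patch32"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = OwlViTModel.from_pretrained(model_id)

# Build the ONNX config added in this PR and export with its default opset
onnx_config = OwlViTOnnxConfig(model.config)
onnx_inputs, onnx_outputs = export(
    processor, model, onnx_config, onnx_config.default_onnx_opset, Path("owlvit.onnx")
)
print("exported inputs:", onnx_inputs)
print("exported outputs:", onnx_outputs)
```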