ProcessorMixin doesn't properly instantiate image processors #29414
@NielsRogge Two things:
OK, I've updated the PR description to link to the original comment. I see, so using the Auto classes is allowed as long as we raise an error? It would be great if we could make this verification generic in the `ProcessorMixin` class, so that we don't need to define the `ValueError`s manually (cf. my comment here).
Copying the comment I made on the PR here, since my reply is basically the same but more relevant here:
If you've saved out a processor with a specific image processor, then it should be possible to load it with that image processor and then verify that it's one of the "allowed" processors. Part of the reason for having this protection at the moment is to make the limits of the processors clear: the `LlavaProcessor` isn't compatible with all image processors or all tokenizers, so suggesting otherwise in the docstring is misleading and unhelpful to users. Adding verification makes sure it fails early and warns users, instead of things quietly behaving weirdly and us ending up with issues being raised.
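To make the "load, then verify" idea concrete, here is a minimal sketch of a generic check living in a shared base class, so individual processors would not need hand-written `ValueError`s. Everything here is hypothetical: `ProcessorSketch`, `LlavaLikeProcessor` and `_allowed_class_names` are illustrative names, and the image processor/tokenizer classes are empty stand-ins rather than the real `transformers` classes. It only borrows the `attributes` / `<name>_class` declaration style processors already use.

```python
from typing import Any, Set, Tuple, Union


# Empty stand-ins for real image processor / tokenizer classes (assumption).
class CLIPImageProcessor: ...
class ViTImageProcessor: ...
class SiglipImageProcessor: ...
class LlamaTokenizerFast: ...


class ProcessorSketch:
    """Hypothetical base class: verifies every component once, at construction."""

    attributes: Tuple[str, ...] = ()

    def __init__(self, **components: Any) -> None:
        for name in self.attributes:
            if name not in components:
                raise ValueError(f"Missing required component: {name!r}")
            component = components[name]
            allowed = self._allowed_class_names(name)
            actual = type(component).__name__
            # Fail early instead of silently accepting an unsupported class.
            if actual not in allowed:
                raise ValueError(
                    f"{type(self).__name__} does not support {name}={actual}; "
                    f"expected one of {sorted(allowed)}"
                )
            setattr(self, name, component)

    @classmethod
    def _allowed_class_names(cls, name: str) -> Set[str]:
        # A "<name>_class" attribute may be a single class name or a tuple of names.
        declared: Union[str, Tuple[str, ...]] = getattr(cls, f"{name}_class")
        if isinstance(declared, str):
            declared = (declared,)
        return {n for n in declared if n is not None}


class LlavaLikeProcessor(ProcessorSketch):
    # Hypothetical subclass declaring which component classes it supports.
    attributes = ("image_processor", "tokenizer")
    image_processor_class = ("CLIPImageProcessor", "ViTImageProcessor")
    tokenizer_class = ("LlamaTokenizerFast",)


if __name__ == "__main__":
    ok = LlavaLikeProcessor(image_processor=ViTImageProcessor(), tokenizer=LlamaTokenizerFast())
    print(type(ok.image_processor).__name__)
    try:
        LlavaLikeProcessor(image_processor=SiglipImageProcessor(), tokenizer=LlamaTokenizerFast())
    except ValueError as err:
        print(err)
```

Because the check sits in the base class, a new multimodal processor only has to declare its allowed classes; the error message is generated uniformly.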
System Info
Transformers dev version v4.38.2
Who can help?
@ArthurZucker @amyeroberts
Reproduction
Let's say you define your processor as follows:
(this is mainly for demo purposes, since for PR #29012 I'd like to have `LlavaProcessor` work with two different image processor classes)
Then, even though you create a processor as follows:
Reloading it:
This is still going to be of type `CLIPImageProcessor`, even though we want to load `ViTImageProcessor`. This is because of the way we decide which class to load here. Namely, if one defines a tuple for the `image_processor_class` attribute of the processor, the first class is always used.
Expected behavior
The processor should reload the `ViTImageProcessor` instead of `CLIPImageProcessor`. The current workaround is to do this:
This correctly instantiates the image processor. However, in PR #29012, @amyeroberts suggested that the use of the Auto class is discouraged and it might be more appropriate to define the specific classes.
I remember that in the past Sylvain had no problem with the use of the Auto class, but I'm up for discussion. It's definitely a bit inconsistent that we define explicit classes for the tokenizers but not for the image processor.
Looking at it, I think the Auto class serves exactly the purpose we're trying to achieve: loading the proper image processor class based on the `preprocessor_config.json`. Hence I'm wondering whether we should be leveraging the Auto classes by default for the `image_processor_class` and `tokenizer_class` attributes of multimodal processors.
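The difference between "always take the first class in the tuple" and "resolve the class from the saved config, then verify it" can be sketched without `transformers` at all. This is a simplified, hypothetical model, not the library's actual loading code: `REGISTRY`, `save_image_processor`, `load_first_in_tuple` and `load_from_config` are illustrative names, the processor classes are empty stand-ins, and the `image_processor_type` key in `preprocessor_config.json` is assumed here as the field that records the concrete class name.

```python
import json
import os
import tempfile
from typing import Tuple


# Empty stand-ins for real image processor classes (assumption).
class CLIPImageProcessor: ...
class ViTImageProcessor: ...


# Hypothetical name -> class lookup, standing in for resolving classes by name.
REGISTRY = {cls.__name__: cls for cls in (CLIPImageProcessor, ViTImageProcessor)}
CONFIG_NAME = "preprocessor_config.json"


def save_image_processor(processor: object, directory: str) -> None:
    # Record which concrete class was saved (assumed "image_processor_type" key).
    with open(os.path.join(directory, CONFIG_NAME), "w") as f:
        json.dump({"image_processor_type": type(processor).__name__}, f)


def load_first_in_tuple(declared: Tuple[str, ...], directory: str) -> object:
    # Behavior described in the issue: the saved config is ignored entirely,
    # so `directory` is unused and the first declared class always wins.
    return REGISTRY[declared[0]]()


def load_from_config(declared: Tuple[str, ...], directory: str) -> object:
    # Auto-style resolution: read the saved class name, then verify it is allowed.
    with open(os.path.join(directory, CONFIG_NAME)) as f:
        name = json.load(f)["image_processor_type"]
    if name not in declared:
        raise ValueError(f"{name} is not one of the supported classes {declared}")
    return REGISTRY[name]()


if __name__ == "__main__":
    declared = ("CLIPImageProcessor", "ViTImageProcessor")
    with tempfile.TemporaryDirectory() as tmp:
        save_image_processor(ViTImageProcessor(), tmp)
        print(type(load_first_in_tuple(declared, tmp)).__name__)  # CLIPImageProcessor
        print(type(load_from_config(declared, tmp)).__name__)     # ViTImageProcessor
```

Resolving from the config and then checking against the declared tuple covers both concerns raised in this thread: the saved class is reloaded faithfully, and unsupported classes still fail early.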