
Add support for args to ProcessorMixin for backward compatibility #33479

Merged: 8 commits into huggingface:main, Sep 20, 2024

Conversation

@yonigozlan (Member) commented Sep 13, 2024:

What does this PR do?

Split from #32841 by @leloykun.
Adds support for positional args in processor calls for backward compatibility, and improves generalization in ProcessorTesterMixin.
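For illustration, a self-contained toy sketch of the mechanism (all names below are hypothetical placeholders, not the real API):

```python
# Toy sketch of the call style this PR keeps working: legacy positional
# extras are remapped onto declared keyword names instead of breaking when
# a processor's signature becomes keyword-only after kwargs uniformization.

class ToyProcessor:
    # Names that legacy positional call args map onto, in order.
    optional_call_args = ["extra_input"]

    def __call__(self, images=None, text=None, *args, **kwargs):
        # Remap legacy positional extras onto their keyword names.
        kwargs.update(dict(zip(self.optional_call_args, args)))
        return {"images": images, "text": text, **kwargs}


proc = ToyProcessor()
assert proc("img", "txt", "legacy") == proc("img", "txt", extra_input="legacy")
```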

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

         videos: VideoInput = None,
-        **kwargs,
+        **kwargs: Unpack[LlavaOnevisionProcessorKwargs],
@yonigozlan (Member Author):

@zucchini-nlp It seems that the Unpack was in the init instead of the call function of the processor. Do you think it's worth a separate PR or can we include it here?

@zucchini-nlp (Member):

oops, yes, it can go here, thanks! Regarding adding audio=None before videos even though audio won't be used for this model: that seems counterintuitive to me.

@molbap welcome back! We had questions about changing the order of the main input args in your absence, and this is also distantly related to that. Should we be adding unused args in this order?

@molbap (Contributor) commented Sep 16, 2024:

hey! glad to be back, GitHub notifications are hefty 😁
Keeping audio=None is a bit strange, but it's the price for always having the same constant-size signature. I'd be pro keeping the same order as well: in terms of integrating transformers into other tools, it makes things (a bit) easier and gives processors more room for future impact. But if you can think of something smart along the lines of informing a user which modalities/types the model accepts, I'm open to it!
edit: pressed enter too soon
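For reference, the uniform signature under discussion looks roughly like this; it is a sketch assembled from this thread, so the exact type hints and defaults in the merged code may differ:

```python
from __future__ import annotations  # keeps the sketch's annotations unevaluated

from typing_extensions import Unpack  # typing.Unpack on Python >= 3.11

from transformers import ProcessorMixin


class LlavaOnevisionProcessor(ProcessorMixin):
    def __call__(
        self,
        images: ImageInput = None,
        text: TextInput = None,
        audio=None,  # unused by this model, but kept so every processor shares one signature
        videos: VideoInput = None,
        **kwargs: Unpack[LlavaOnevisionProcessorKwargs],  # Unpack belongs on __call__, not __init__
    ) -> BatchFeature:
        ...
```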

@@ -82,7 +82,7 @@ class LlavaOnevisionProcessor(ProcessorMixin):
         "image_token",
         "video_token",
     ]
-    image_processor_class = "AutoImageProcessor"
+    image_processor_class = "LlavaOnevisionImageProcessor"
@yonigozlan (Member Author):

@zucchini-nlp This is because right now, using AutoImageProcessor.from_pretrained after saving the models in the kwargs tests will actually load the last preprocessor saved, which is a LlavaOnevisionVideoProcessor here. This looks like a larger problem so I'll open a separate issue.
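A hypothetical repro of the problem described above (the checkpoint name and the exact resolution behavior are assumed here, not verified):

```python
import tempfile

from transformers import AutoImageProcessor, LlavaOnevisionProcessor

# Checkpoint name assumed for illustration.
processor = LlavaOnevisionProcessor.from_pretrained("llava-hf/llava-onevision-qwen2-0.5b-ov-hf")

with tempfile.TemporaryDirectory() as tmpdir:
    # Saving writes configs for every sub-processor into the same directory;
    # the video processor is saved after the image processor.
    processor.save_pretrained(tmpdir)
    # With image_processor_class = "AutoImageProcessor", this resolves to the
    # last preprocessor saved, i.e. LlavaOnevisionVideoProcessor rather than
    # the image processor, which is why the class is pinned in the diff above.
    reloaded = AutoImageProcessor.from_pretrained(tmpdir)
    print(type(reloaded).__name__)
```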

@zucchini-nlp (Member):

Yes, sorry for not noting that when merging the model. Are there any errors caused apart from cases where the image and video processors have different config values?

@amyeroberts (Collaborator):

I think we can remove these changes now that this has been resolved for the onevision processor.

Comment on lines +155 to +166
processor_components["image_processor"] = self.get_component(
    "image_processor", do_rescale=True, rescale_factor=-1
)
@yonigozlan (Member Author) commented Sep 13, 2024:

I changed the image kwargs tests from size and crop_size to do_rescale and rescale_factor, since not all image processors accept size or crop_size, which made it hard to generalize to most models.

edit: hadn't finished the last sentence 😅
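A sketch of the resulting test logic; helper names are taken from this thread and from ProcessorTesterMixin as discussed, and may not match the merged code exactly:

```python
# Inside ProcessorTesterMixin (tests/test_processing_common.py), roughly:
def test_image_processor_kwargs_overridden_at_init(self):
    # Build the processor with an intentionally impossible rescale factor so
    # we can detect whether rescaling was applied at all.
    processor_components = self.prepare_components()
    processor_components["image_processor"] = self.get_component(
        "image_processor", do_rescale=True, rescale_factor=-1
    )
    processor = self.processor_class(**processor_components)

    image_input = self.prepare_image_inputs()  # random pixel values in 0..255
    inputs = processor(text="lower newer", images=image_input, return_tensors="pt")

    # Multiplying 0..255 pixels by -1 makes every value non-positive, so a
    # non-positive mean proves rescale_factor was honored.
    # (images_data_arg_name is renamed to images_input_name later in this PR.)
    self.assertLessEqual(inputs[self.images_data_arg_name].mean(), 0)
```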

@molbap (Contributor):

That's a nice workaround if do_rescale is indeed present for all models with image processors. So what's tested now is

self.assertLessEqual(inputs[self.images_data_arg_name].mean(), 0)

With a rescale_factor of -1 as you set it, will that work in all cases? Or rather, is it enough to say that the image has indeed been rescaled?

@yonigozlan (Member Author):

Since we know the images come from the prepare_inputs function and are randomly set in 0...255, this seems to be enough to me. I agree it's a hacky workaround, though, but I really struggle to find something that would work with most image processors and be easy to test...

@molbap (Contributor):

Yes, fair enough - my point was that prepare_image_inputs is necessary and thus inseparable from this test. I think do_rescale is used in 62 models and not used in 3 (out of 65 models with image-processing capabilities), namely

  • pix2struct
  • imagegpt
  • layoutlmv2

Two of these see less usage now, and we can handle their processing on the side. In conclusion, OK for me, even though it's strange to see it; maybe just add a small docstring to explain why this test exists.

@yonigozlan (Member Author):

Thanks for pointing out which models do not support do_rescale. I will make sure to use other tests for those.
And you're right, I'll add a small docstring to explain what is happening.

@yonigozlan (Member Author):
Looks like the test errors were unrelated to this PR, so it should be good to review now :). Also glad to see you back @molbap!

@molbap (Contributor) left a comment:

Tricky tricky stuff. Nice, looking forward to having this merged; it will reduce complexity a fair bit.


Comment on lines 58 to 60
text_data_arg_name = "input_ids"
images_data_arg_name = "pixel_values"
videos_data_arg_name = "pixel_values_videos"
@molbap (Contributor):

I'm not sure it adds clarity to have these as class attributes rather than keeping the names inline (I had to scroll up to figure out what they were). Nit though, it's fine.

@yonigozlan (Member Author):

It comes in handy when a model uses different names, such as "decoder_input_ids" for text_data_arg_name. Then we can just change those attributes in the processor tester class instead of rewriting all the tests.
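For example, a sketch of such an override (the test class and processor class below are hypothetical):

```python
import unittest

# ProcessorTesterMixin lives in tests/test_processing_common.py in the
# transformers repo.
from tests.test_processing_common import ProcessorTesterMixin


class MySeq2SeqProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    processor_class = MySeq2SeqProcessor  # hypothetical processor class
    # Point the shared tests at this model's output keys instead of
    # overriding every test method (the attribute is renamed to
    # text_input_name later in this PR):
    text_data_arg_name = "decoder_input_ids"
```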

@amyeroberts (Collaborator):

I agree that looking through the tests it's a bit confusing (I also had to scroll up to find them), but I like that it makes things more easily configurable.

Perhaps we could think of a different name? "data_arg_name" doesn't make it obvious what it is. Maybe e.g. model_input_text or text_input_name?

@amyeroberts (Collaborator) left a comment:

Very nice - thanks for adding and for improving the tests!

src/transformers/processing_utils.py (resolved)
@@ -958,6 +964,64 @@ def validate_init_kwargs(processor_config, valid_kwargs):
         unused_kwargs = {k: processor_config[k] for k in unused_keys}
         return unused_kwargs

+    def prepare_and_validate_optional_call_args(self, *args):
+        """
@amyeroberts (Collaborator):

Very nice docstring :)

@yonigozlan (Member Author):

That's from @leloykun :)
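A minimal sketch of the helper's behavior, inferred from the tests shown later in this thread; the real method in processing_utils.py carries the docstring praised here and may differ in details (e.g., it presumably also warns about the deprecated positional style):

```python
def prepare_and_validate_optional_call_args(self, *args):
    # A processor declares, in order, which legacy positional args it accepts:
    #     optional_call_args = ["optional_1", "optional_2"]
    if len(args) > len(self.optional_call_args):
        raise ValueError(
            f"Expected at most {len(self.optional_call_args)} optional positional "
            f"arguments, got {len(args)}."
        )
    # Pair each positional value with its declared keyword name, so legacy
    # positional args can be merged into the processor's kwargs.
    return dict(zip(self.optional_call_args, args))
```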

src/transformers/processing_utils.py (resolved)


@yonigozlan force-pushed the add-optional-args-processors branch from 5b7329c to 6eb5bbb on September 19, 2024 at 18:00.
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan force-pushed the add-optional-args-processors branch from 1739434 to 5a30a6e on September 20, 2024 at 03:00.
@yonigozlan (Member Author):

Hey @molbap! I made a few new modifications, mostly to fit Pixtral, but it still works for the other processors. I also removed unnecessary tests for newly merged uniformized processors.
My main concern is the rescale_factor/do_rescale part. Although it is a bit hacky, I do think it's the most generalizable test we can do on image kwargs.
Let me know if you think we can merge this :)

@molbap (Contributor) left a comment:

Hey @yonigozlan! Added a comment or two about a couple of last things, but LGTM!


Comment on lines 347 to 405
class MyProcessor(ProcessorMixin):
    attributes = ["image_processor", "tokenizer"]
    image_processor_class = "CLIPImageProcessor"
    tokenizer_class = ("CLIPTokenizer", "CLIPTokenizerFast")
    optional_call_args = ["optional_1", "optional_2"]

    def __init__(self, image_processor=None, tokenizer=None, processor_attr_1=1, processor_attr_2=True):
        super().__init__(image_processor, tokenizer)

        self.processor_attr_1 = processor_attr_1
        self.processor_attr_2 = processor_attr_2


@require_tokenizers
@require_vision
class ProcessorTest(unittest.TestCase):
    processor_class = MyProcessor

    def prepare_processor_dict(self):
        return {"processor_attr_1": 1, "processor_attr_2": False}

    def get_processor(self):
        image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14")
        tokenizer = CLIPTokenizerFast.from_pretrained("openai/clip-vit-large-patch14")
        processor = MyProcessor(image_processor, tokenizer, **self.prepare_processor_dict())

        return processor

    def test_processor_to_json_string(self):
        processor = self.get_processor()
        obj = json.loads(processor.to_json_string())
        for key, value in self.prepare_processor_dict().items():
            self.assertEqual(obj[key], value)
            self.assertEqual(getattr(processor, key, None), value)

    def test_processor_from_and_save_pretrained(self):
        processor_first = self.get_processor()

        with tempfile.TemporaryDirectory() as tmpdirname:
            saved_file = processor_first.save_pretrained(tmpdirname)[0]
            check_json_file_has_correct_format(saved_file)
            processor_second = self.processor_class.from_pretrained(tmpdirname)

        self.assertEqual(processor_second.to_dict(), processor_first.to_dict())

    def test_prepare_and_validate_optional_call_args(self):
        processor = self.get_processor()
        # test all optional call args are given
        optional_call_args = processor.prepare_and_validate_optional_call_args("optional_1", "optional_2")
        self.assertEqual(optional_call_args, {"optional_1": "optional_1", "optional_2": "optional_2"})
        # test only one optional call arg is given
        optional_call_args = processor.prepare_and_validate_optional_call_args("optional_1")
        self.assertEqual(optional_call_args, {"optional_1": "optional_1"})
        # test no optional call arg is given
        optional_call_args = processor.prepare_and_validate_optional_call_args()
        self.assertEqual(optional_call_args, {})
        # test too many optional call args are given
        with self.assertRaises(ValueError):
            processor.prepare_and_validate_optional_call_args("optional_1", "optional_2", "optional_3")
@molbap (Contributor):

I realize this was removed in a previous PR, because it only applied to one model, I suppose? Can we take the tests here and move them to the tester class above instead, wdyt?

@yonigozlan (Member Author):

You're right, it probably existed to test the mixin before any processor using it was merged, but I guess it's not useful anymore.
I moved test_prepare_and_validate_optional_call_args to the tester class above (the other tests were already there).

Comment on lines +58 to +60
text_input_name = "input_ids"
images_input_name = "pixel_values"
videos_input_name = "pixel_values_videos"
@molbap (Contributor):

much better!

@molbap (Contributor):

thanks for cleaning these up!

@yonigozlan merged commit c0c6815 into huggingface:main on Sep 20, 2024. 21 checks passed.
amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Oct 2, 2024
Add support for args to ProcessorMixin for backward compatibility (huggingface#33479)

* add check and prepare args for BC to ProcessorMixin, improve ProcessorTesterMixin

* change size and crop_size in processor kwargs tests to do_rescale and rescale_factor

* remove unnecessary llava processor kwargs test overwrite

* nit

* change data_arg_name to input_name

* Remove unnecessary test override

* Remove unnecessary tests Paligemma

* Move test_prepare_and_validate_optional_call_args to TesterMixin, add docstring
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024