Adds uniform processing kwargs to paligemma. #32377
base: main
Conversation
Hey @MnCSSJ4x! Thanks for working on this! Left a few comments on setting default values.
Currently the tests are failing, so we have to fix them and make sure the paligemma processing tests contain the ProcessorTesterMixin. That is the way to ensure we're actually testing the new kwargs format.
Also, I want to ask for some paligemma-specific processor tests, as it seems that Paligemma has several model-specific kwargs.
cc @molbap if you have time to take a look
Oh, can you remove the "Fixes #" from the body? If not, merging this PR will close the linked issue :)
Thanks for working on this! Left a couple of comments.
@zucchini-nlp I changed the code to incorporate the new parameters. Let me know if I am going in the right direction. I haven't started writing the tests, as I was occupied these past few days; will add them soon. Let me know if there is any reference that can help me understand the tests or to take ideas from.
Great, thanks for iterating on this! For the tests you can take a look at this one: https://github.com/huggingface/transformers/blob/main/tests/models/align/test_processor_align.py. If paligemma doesn't have any processing test, you have to create a new file. Otherwise, adding ProcessorTesterMixin will enable the new tests for Paligemma. The general info about tests is here :)
Additionally, can you add paligemma-specific tests, which will test model-specific kwargs like suffix etc. When the CI turns green, feel free to tag me and @molbap for review.
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Hey @yonigozlan, I wrote some tests; however, I got failures in tokenization and others. I checked the CircleCI log and noticed a gated repo issue. Is there a way to fix it or mock it?
Thanks for working on the tests! Left a few comments on the processor refactor also.
For the tests, as I indicated below, the base tests for ProcessorTesterMixin
were recently changed. You can do this to have the newest changes on your branch:
git fetch upstream
git rebase upstream/main
You'll see that you may not have to override some of the tests at all :).
text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
images: ImageInput = None,
These two inputs should be reversed and support for backward compatibility should be added. This should be similar to what is needed for Fuyu:
transformers/src/transformers/models/fuyu/processing_fuyu.py
Lines 522 to 532 in aa3bc0b
if (
    text is not None
    and not isinstance(text[0], str)
    or images is not None
    and (isinstance(images, str) or (isinstance(images, (list, tuple)) and isinstance(images[0], str)))
):
    warnings.warn(
        "It looks like you are passing the inputs in the wrong order. You should pass the images input first and the text input second."
        "Images and text inputs will be swapped."
    )
    images, text = text, images
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
audio=None is still needed here for API consistency, even if this model doesn't support the audio modality.
Suggested change:
audio=None,
video=None,
    tokenizer_init_kwargs=self.tokenizer.init_kwargs,
    **kwargs,
)
suffix = output_kwargs["text_kwargs"]["suffix"]
If suffix is not specified as a kwarg, this will cause a KeyError. Better to use:
suffix = output_kwargs["text_kwargs"].pop("suffix", None)
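The difference is easy to demonstrate with plain dicts (no transformers needed): direct indexing raises when the key is absent, while .pop with a default returns None and also removes the key so it is not forwarded downstream.

```python
# Minimal illustration of .pop("suffix", None) vs direct indexing
# when the kwarg was never supplied.
output_kwargs = {"text_kwargs": {"padding": "longest"}}  # no "suffix" key

try:
    suffix = output_kwargs["text_kwargs"]["suffix"]  # raises KeyError
except KeyError:
    suffix = "raised"

safe_suffix = output_kwargs["text_kwargs"].pop("suffix", None)

# .pop also removes the key, so it won't leak into the tokenizer call later.
assert "suffix" not in output_kwargs["text_kwargs"]
```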
image_kwargs: PaliGemmaImagesKwargs
_defaults = {
    "text_kwargs": {
        "tokenize_newline_separately": True,
Looks like tokenize_newline_separately is not used anywhere, and it is not a default text_kwargs entry, so it might be best to remove it entirely?
Yes, it's not used anymore and is not needed - iiuc do_thumbnail, do_align_long_axis and do_rescale aren't either (FYI, they are not used here).
+1 for removing it
**output_kwargs["text_kwargs"],
return_token_type_ids=return_token_type_ids,
Suggested change:
return_token_type_ids=return_token_type_ids,
**output_kwargs["text_kwargs"],
def setUp(self):
    self.tmpdirname = tempfile.mkdtemp()
    processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
This will indeed cause a gated repo issue. You could rebuild a processor without using this repo, something like:
image_processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
tokenizer = GemmaTokenizer(SAMPLE_VOCAB, keep_accents=True)
processor = PaliGemmaProcessor(image_processor=image_processor, tokenizer=tokenizer)
Where SAMPLE_VOCAB = get_tests_dir("fixtures/test_sentencepiece.model"), as is done for test_tokenization_gemma.py.
Not sure if that's the nicest way to fix this though, any idea @zucchini-nlp @molbap ?
The CI token can be updated so that it can read this repo: in test_modeling.py
there is
self.processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
so that should not be an issue already - any idea @ydshieh here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what is the issue here? This repo seems to be public, no?
Wait, are we talking about google/paligemma-3b-pt-224 or google/siglip-so400m-patch14-384? Both are accessible even when I am using a Firefox private window.
    self.skipTest(f"image_processor attribute not present in {self.processor_class}")
image_processor = self.get_component(
    "image_processor",
    crop_size={"shortest_edge": 234, "longest_edge": 234},
This may not be needed anymore as the base tests were changed recently, same for other tests. Please fetch and rebase on upstream main :)
Nice work! Let's see with the upstream rebase, I think we can reduce loc count by a fair chunk 🤗
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
We can also advertise an audio=None kwarg here!
def prepare_image_inputs(self):
    """This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
    or a list of PyTorch tensors if one specifies torchify=True.
    """
    image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]
    image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]
    return image_inputs
I'm noticing more of this function across the repo - it's identical in 17 places. I think we can move it to processing_utils.py at some point and save some loc; same remark for the helper functions above!
Maybe this one works - I usually use it for image processor tests: from tests.test_image_processing_common import prepare_image_inputs
Yep! Let's move it. I also have my own personal agenda to remove the "numpify" and "torchify" arguments, which are confusing, clashing, and inconsistent, so this would be a good opportunity for that.
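A shared helper could look something like the sketch below. This is only an illustration (the signature and defaults are assumptions, not the actual processing_utils.py or test_image_processing_common API), and it stays numpy-only so it runs without PIL or torch, sidestepping the numpify/torchify arguments mentioned above.

```python
import numpy as np


def prepare_image_inputs(batch_size=1, num_channels=3, height=30, width=400):
    # Hypothetical shared version of the duplicated test helper: returns a
    # batch of random uint8 images as numpy arrays in channels-first layout.
    # Seeded for reproducibility in tests; PIL/torch conversion is omitted.
    rng = np.random.default_rng(0)
    return [
        rng.integers(0, 255, size=(num_channels, height, width), dtype=np.uint8)
        for _ in range(batch_size)
    ]


images = prepare_image_inputs(batch_size=2)
```

Each test file would then import this one helper instead of redefining it, which is how the loc savings mentioned above would materialize.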
What does this PR do?
Adds uniform processing kwargs for the PaliGemma model.
Partially Fixes Issue 31911
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Can Review - @zucchini-nlp