Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds uniform processing kwargs to paligemma. #32377

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

MnCSSJ4x
Copy link

@MnCSSJ4x MnCSSJ4x commented Aug 1, 2024

What does this PR do?

Adds Uniform Processing kwargs for paligemma model.

Partially Fixes Issue 31911

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Can Review - @zucchini-nlp

@MnCSSJ4x MnCSSJ4x changed the title Adds uniform processing to paligemma. Adds uniform processing kwargs to paligemma. Aug 1, 2024
@MnCSSJ4x MnCSSJ4x mentioned this pull request Aug 1, 2024
40 tasks
Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @MnCSSJ4x ! Thanks for working on this! Left a few comments setting defaults values

Currently the tests are failing, so we have to fix them and make sure the paligemma_processing_tests contain the ProcessorTesterMixin. That is the way to ensure we're actually testing the new kwargs format.

Also, I want to ask some paligemma-specific tests for processors as it seems that Paligemma has several model-specific kwargs

cc @molbap if you have time to take a look

src/transformers/models/paligemma/processing_paligemma.py Outdated Show resolved Hide resolved
src/transformers/models/paligemma/processing_paligemma.py Outdated Show resolved Hide resolved
src/transformers/models/paligemma/processing_paligemma.py Outdated Show resolved Hide resolved
@zucchini-nlp
Copy link
Member

Oh, can you remove the "Fixes #" from body. If not merging this PR will close the linked issue :)

@molbap molbap self-requested a review August 2, 2024 12:15
Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this! left a couple comments

src/transformers/models/paligemma/processing_paligemma.py Outdated Show resolved Hide resolved
src/transformers/models/paligemma/processing_paligemma.py Outdated Show resolved Hide resolved
@MnCSSJ4x
Copy link
Author

MnCSSJ4x commented Aug 5, 2024

@zucchini-nlp I changed the code to incorporate the new parameters. Let me know if I am going in the right direction. I haven't started writing the tests as I was occupied these few days. Will add that soon. Let me know if there is any reference which can help me understand the test or to take ideas from.

@molbap molbap self-requested a review August 5, 2024 11:35
Copy link
Member

@zucchini-nlp zucchini-nlp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for re-iterating on this! For the tests you can take a look at this one: https://github.com/huggingface/transformers/blob/main/tests/models/align/test_processor_align.py. If paligemma doesn't have any processing test, you have to create a new file. Otherwise adding ProcessorTesterMixin will enable new tests for Paligemma. The general info about tests is here :)

Additionally, can you add paligemma specific tests, which will test model-specific kwargs like suffix etc. When the CI turns greenm feel free to tag me and @molbap for review

MnCSSJ4x and others added 5 commits August 14, 2024 20:45
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
@MnCSSJ4x
Copy link
Author

Hey @yonigozlan I wrote some tests however got failure in tokenization and others where I checked the circle CI log and noticed gated repo issue. Is there a way to fix it or mock it up?

Copy link
Member

@yonigozlan yonigozlan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on the tests! Left a few comments on the processor refactor also.
For the tests, as I indicated below, the base tests for ProcessorTesterMixin were recently changed. You can do this to have the newest changes on your branch:
git fetch upstream
git rebase upstream/main
You'll see that you may not have to override some of the tests at all :).

Comment on lines 153 to 154
text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
images: ImageInput = None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two inputs should be reversed and support for backward compatibility should be added. This should be similar to what is needed for Fuyu:

if (
text is not None
and not isinstance(text[0], str)
or images is not None
and (isinstance(images, str) or (isinstance(images, (list, tuple)) and isinstance(images[0], str)))
):
warnings.warn(
"It looks like you are passing the inputs in the wrong order. You should pass the images input first and the text input second."
"Images and text inputs will be swapped."
)
images, text = text, images

do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

audio=None Is still needed here for API consistency, even if this model doesn't support the audio modality.

Suggested change
video=None,
audio = None,
video=None,

tokenizer_init_kwargs=self.tokenizer.init_kwargs,
**kwargs,
)
suffix = output_kwargs["text_kwargs"]["suffix"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If suffix is not specified as a kwargs, this will cause an error. Better to use:
suffix = output_kwargs["text_kwargs"].pop("suffix", None)

image_kwargs: PaliGemmaImagesKwargs
_defaults = {
"text_kwargs": {
"tokenize_newline_separately": True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like tokenize_newline_separately is not use anywhere, and it is not a default text_kwargs, so it might be best to remove it entirely?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not used anymore and is not needed - iiuc do_thumbnail, do_align_long_axis and do_rescale neither (FYI, they are not used here)
+1 for removing it

Comment on lines +279 to 280
**output_kwargs["text_kwargs"],
return_token_type_ids=return_token_type_ids,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**output_kwargs["text_kwargs"],
return_token_type_ids=return_token_type_ids,
return_token_type_ids=return_token_type_ids,
**output_kwargs["text_kwargs"],


def setUp(self):
self.tmpdirname = tempfile.mkdtemp()
processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will indeed cause a gated repo issue. you could rebuild a processor without using this repo, something like

Suggested change
processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
image_processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
tokenizer = GemmaTokenizer(SAMPLE_VOCAB, keep_accents=True)
processor = PaliGemmaProcessor(image_processor=image_processor, tokenizer=tokenizer)

Where
SAMPLE_VOCAB = get_tests_dir("fixtures/test_sentencepiece.model")
as it is done for test_tokenization_gemma.py.

Not sure if that's the nicest way to fix this though, any idea @zucchini-nlp @molbap ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI token can be updated so that it can read this repo: in test_modeling.py there is

        self.processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")

so that should not be an issue already - any idea @ydshieh here?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, what is the issue here? This repo seem to be public no?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
I suppose it is, with a license to accept?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wait, are we talking about

google/paligemma-3b-pt-224

or

google/siglip-so400m-patch14-384

but both are accessible even if I am using a firefox private window

self.skipTest(f"image_processor attribute not present in {self.processor_class}")
image_processor = self.get_component(
"image_processor",
crop_size={"shortest_edge": 234, "longest_edge": 234},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may not be needed anymore as the base tests were changed recently, same for other tests. Please fetch and rebase on upstream main :)

Copy link
Contributor

@molbap molbap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Let's see with the upstream rebase, I think we can reduce loc count by a fair chunk 🤗

do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can also advertise None audio kwarg here!

src/transformers/models/paligemma/processing_paligemma.py Outdated Show resolved Hide resolved

def setUp(self):
self.tmpdirname = tempfile.mkdtemp()
processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI token can be updated so that it can read this repo: in test_modeling.py there is

        self.processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")

so that should not be an issue already - any idea @ydshieh here?

Comment on lines +54 to +63
def prepare_image_inputs(self):
"""This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
or a list of PyTorch tensors if one specifies torchify=True.
"""

image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]

image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]

return image_inputs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm noticing more of this function across the repo and it's identical in 17 places, I think we can move it to processing_utils.py at some point and save some loc, same remark for above helper functions!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this one works, I usually use it for image processor tests? from tests.test_image_processing_common import prepare_image_inputs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Let's move. I also have my own personal agenda to remove the "numpify" and "torchify" arguments which are confusing, clash and inconsistent so would be a good opportunity for that

image_kwargs: PaliGemmaImagesKwargs
_defaults = {
"text_kwargs": {
"tokenize_newline_separately": True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's not used anymore and is not needed - iiuc do_thumbnail, do_align_long_axis and do_rescale neither (FYI, they are not used here)
+1 for removing it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants