Adds uniform processing kwargs to paligemma. #32377
base: main
Conversation
Hey @MnCSSJ4x! Thanks for working on this! Left a few comments on setting default values.
Currently the tests are failing, so we have to fix them and make sure the paligemma processing tests contain the ProcessorTesterMixin. That is the way to ensure we're actually testing the new kwargs format.
Also, I want to ask for some paligemma-specific processor tests, as it seems that Paligemma has several model-specific kwargs.
cc @molbap if you have time to take a look
Oh, can you remove the "Fixes #" from the body? If not, merging this PR will close the linked issue :)
Thanks for working on this! Left a couple of comments.
@zucchini-nlp I changed the code to incorporate the new parameters. Let me know if I am going in the right direction. I haven't started writing the tests, as I was occupied these past few days; will add them soon. Let me know if there is any reference that can help me understand the tests or to take ideas from.
Great, thanks for iterating on this! For the tests you can take a look at this one: https://github.com/huggingface/transformers/blob/main/tests/models/align/test_processor_align.py. If paligemma doesn't have any processing test, you have to create a new file. Otherwise, adding ProcessorTesterMixin will enable the new tests for Paligemma. The general info about tests is here :)
Additionally, can you add paligemma-specific tests, which will test model-specific kwargs like suffix etc. When the CI turns green, feel free to tag me and @molbap for review.
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Hey @yonigozlan, I wrote some tests; however, I got failures in tokenization and others. I checked the CircleCI log and noticed a gated repo issue. Is there a way to fix it or mock it?
Thanks for working on the tests! Left a few comments on the processor refactor also.
For the tests, as I indicated below, the base tests for ProcessorTesterMixin
were recently changed. You can do this to have the newest changes on your branch:
git fetch upstream
git rebase upstream/main
You'll see that you may not have to override some of the tests at all :).
text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
images: ImageInput = None,
These two inputs should be reversed and support for backward compatibility should be added. This should be similar to what is needed for Fuyu:
transformers/src/transformers/models/fuyu/processing_fuyu.py
Lines 522 to 532 in aa3bc0b
if (
    text is not None
    and not isinstance(text[0], str)
    or images is not None
    and (isinstance(images, str) or (isinstance(images, (list, tuple)) and isinstance(images[0], str)))
):
    warnings.warn(
        "It looks like you are passing the inputs in the wrong order. You should pass the images input first and the text input second."
        "Images and text inputs will be swapped."
    )
    images, text = text, images
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
audio=None is still needed here for API consistency, even if this model doesn't support the audio modality.
Suggested change:
audio=None,
video=None,
    tokenizer_init_kwargs=self.tokenizer.init_kwargs,
    **kwargs,
)
suffix = output_kwargs["text_kwargs"]["suffix"]
If suffix is not specified as a kwarg, this will cause a KeyError. Better to use:
suffix = output_kwargs["text_kwargs"].pop("suffix", None)
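The difference is easy to demonstrate with plain dicts (no transformers needed): direct indexing raises when the key is absent, while .pop with a default returns None and also removes the key so it is not forwarded downstream.

```python
# Minimal illustration of .pop("suffix", None) vs direct indexing
# when the kwarg was never supplied.
output_kwargs = {"text_kwargs": {"padding": "longest"}}  # no "suffix" key

try:
    suffix = output_kwargs["text_kwargs"]["suffix"]  # raises KeyError
except KeyError:
    suffix = "raised"

safe_suffix = output_kwargs["text_kwargs"].pop("suffix", None)

# .pop also removes the key, so it won't leak into the tokenizer call later.
assert "suffix" not in output_kwargs["text_kwargs"]
```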
image_kwargs: PaliGemmaImagesKwargs
_defaults = {
    "text_kwargs": {
        "tokenize_newline_separately": True,
Looks like tokenize_newline_separately is not used anywhere, and it is not a default text_kwargs entry, so it might be best to remove it entirely?
Yes, it's not used anymore and is not needed - iiuc do_thumbnail, do_align_long_axis and do_rescale aren't either (FYI, they are not used here).
+1 for removing it
**output_kwargs["text_kwargs"],
return_token_type_ids=return_token_type_ids,
Suggested change:
return_token_type_ids=return_token_type_ids,
**output_kwargs["text_kwargs"],
def setUp(self):
    self.tmpdirname = tempfile.mkdtemp()
    processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
This will indeed cause a gated repo issue. You could rebuild a processor without using this repo, something like:
image_processor = SiglipImageProcessor.from_pretrained("google/siglip-so400m-patch14-384")
tokenizer = GemmaTokenizer(SAMPLE_VOCAB, keep_accents=True)
processor = PaliGemmaProcessor(image_processor=image_processor, tokenizer=tokenizer)
Where SAMPLE_VOCAB = get_tests_dir("fixtures/test_sentencepiece.model"), as is done for test_tokenization_gemma.py.
Not sure if that's the nicest way to fix this though, any idea @zucchini-nlp @molbap ?
The CI token can be updated so that it can read this repo: in test_modeling.py
there is
self.processor = PaliGemmaProcessor.from_pretrained("google/paligemma-3b-pt-224")
so that should not be an issue already - any idea @ydshieh here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, what is the issue here? This repo seems to be public, no?
Wait, are we talking about google/paligemma-3b-pt-224 or google/siglip-so400m-patch14-384? Both are accessible even when I am using a Firefox private window.
    self.skipTest(f"image_processor attribute not present in {self.processor_class}")
image_processor = self.get_component(
    "image_processor",
    crop_size={"shortest_edge": 234, "longest_edge": 234},
This may not be needed anymore as the base tests were changed recently, same for other tests. Please fetch and rebase on upstream main :)
Nice work! Let's see with the upstream rebase, I think we can reduce loc count by a fair chunk 🤗
do_align_long_axis: bool = None,
do_rescale: bool = None,
suffix: Optional[Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]]] = None,
video=None,
We can also advertise an audio=None kwarg here!
def prepare_image_inputs(self):
    """This function prepares a list of PIL images, or a list of numpy arrays if one specifies numpify=True,
    or a list of PyTorch tensors if one specifies torchify=True.
    """
    image_inputs = [np.random.randint(255, size=(3, 30, 400), dtype=np.uint8)]
    image_inputs = [Image.fromarray(np.moveaxis(x, 0, -1)) for x in image_inputs]
    return image_inputs
I'm noticing more of this function across the repo - it's identical in 17 places. I think we can move it to processing_utils.py at some point and save some loc; same remark for the helper functions above!
Maybe this one works - I usually use it for image processor tests: from tests.test_image_processing_common import prepare_image_inputs
Yep! Let's move it. I also have my own personal agenda to remove the "numpify" and "torchify" arguments, which are confusing, clashing, and inconsistent, so this would be a good opportunity for that.
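A shared helper could look something like the sketch below. This is only an illustration (the signature and defaults are assumptions, not the actual processing_utils.py or test_image_processing_common API), and it stays numpy-only so it runs without PIL or torch, sidestepping the numpify/torchify arguments mentioned above.

```python
import numpy as np


def prepare_image_inputs(batch_size=1, num_channels=3, height=30, width=400):
    # Hypothetical shared version of the duplicated test helper: returns a
    # batch of random uint8 images as numpy arrays in channels-first layout.
    # Seeded for reproducibility in tests; PIL/torch conversion is omitted.
    rng = np.random.default_rng(0)
    return [
        rng.integers(0, 255, size=(num_channels, height, width), dtype=np.uint8)
        for _ in range(batch_size)
    ]


images = prepare_image_inputs(batch_size=2)
```

Each test file would then import this one helper instead of redefining it, which is how the loc savings mentioned above would materialize.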
What does this PR do?
Adds uniform processing kwargs for the PaliGemma model.
Partially Fixes Issue 31911
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
Can Review - @zucchini-nlp