-
Notifications
You must be signed in to change notification settings - Fork 27.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adding positional encoder changes and tests #32600
Conversation
@amyeroberts I have included the interpolation of positional embeddings in all the following models, and their respective tests:
Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding - looks great!
Just a handful of small nits. Before merge, we'll need to run the slow tests for the models affected. Could you trigger this by running git commit --allow-empty -m "[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip"
model(**inputs, interpolate_pos_encoding=False) | ||
# forward pass | ||
with torch.no_grad(): | ||
outputs = model(**inputs, interpolate_pos_encoding=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice :)
@unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method") | ||
def test_inputs_embeds_matches_input_ids_with_generate(self): | ||
pass | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this as this logic is independent of this PR
@unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method") | |
def test_inputs_embeds_matches_input_ids_with_generate(self): | |
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I remove this, I will get the following error from the CI pipeline:
FAILED tests/models/git/test_modeling_git.py::GitModelTest::test_inputs_embeds_matches_input_ids_with_generate - ValueError: You passed `inputs_embeds` to `.generate()`, but the model class GitForCausalLM doesn't have its forwarding implemented. See the GPT2 implementation for an example (https://github.com/huggingface/transformers/pull/21405), and feel free to open a PR with it!
as shown here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you rebase on main? I believe this has been resolved upstream
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should still be removed as this tests is unrelated to this PR
return_dict (`bool`, *optional*): | ||
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -512,6 +526,8 @@ def _init_weights(self, module): | |||
output_hidden_states (`bool`, *optional*): | |||
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |||
more detail. | |||
interpolate_pos_encoding (`bool`, *optional*): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate_pos_encoding (`bool`, *optional*): | |
interpolate_pos_encoding (`bool`, *optional*, defaults to `False`): |
@@ -549,6 +565,8 @@ def _init_weights(self, module): | |||
output_hidden_states (`bool`, *optional*): | |||
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for | |||
more detail. | |||
interpolate_pos_encoding (`bool`, *optional*): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
interpolate_pos_encoding (`bool`, *optional*): | |
interpolate_pos_encoding (`bool`, *optional*, defaults to `False`): |
OK, I've:
Please don't merge yet, just need some time to check and potentially fix the tests. |
GIT model test still requires to be fixed. Getting this error:
due to Still on it. |
@manuelsh Have you included the most recent updates from |
@amyeroberts done, all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Beautiful - thanks for adding this capability to our models and for iterating on a solution!
@manuelsh Just the failing slow tests to address! |
@amyeroberts , I think it is not just substituting I believe I can make them work with
but now all my tests are crashing for different reasons (different tensors outputs for example) and this will take longer. Why not getting back to the previous working commit (d44e070), merge it, and then open another PR like #33226 but for the clip family models? I would be happy to contribute to it. |
@amyeroberts I was able to fix all tests with the new function, so no need to do an additional PR. Please review. |
@manuelsh OK, great. Just the merge conflict to resolve an a final slow run to confirm everything passes now. |
@amyeroberts I have resolved the conflicts, run the tests in slow, corrected one test in clipseg model not related to this PR that was not working |
@amyeroberts there were two tensors to correct in clipseg. Done now. If you can kindly approve the run for clipseg slow tests. |
@manuelsh Done and all passing - let's merge! Thanks for this addition and patience on iterating on this :) |
Fantastic! Glad to see it. Thank you! |
* adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding right pretrained model * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_image_segmentation * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding ruff formatting * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding new interpolate_pos_encoding function * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing interpolate_POS funciton * adapting output tensor in teests * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * modifying output tensor * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding the correct tensor * [run_slow] clipseg * fixing spaces * [run_slow] clipseg * [run_slow] clipseg --------- Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
batch_size, _, height, width = pixel_values.shape | ||
if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size): | ||
raise ValueError( | ||
f"Input image size ({height}*{width}) doesn't match model" f" ({self.image_size}*{self.image_size})." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this tested? I'm getting user reports of error Input image size (352*352) doesn't match model (224*224).
regardless of what size of image is put in - I'm assuming somewhere in ClipSeg code it resizes to 352 - which makes sense considering docs seem to indicate that 352 is an expected value https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegForImageSegmentation.forward.example while indeed also showing that image_size is 224 https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegVisionConfig.image_size
ie: using ClipSeg exactly as documented will necessarily throw this error.
The original clipseg model built by huggingface staff defines image_size 224 in config https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/config.json#L127
and the preprocessor size as 352 https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/preprocessor_config.json#L20
So either this check is wrong, or official configs have been wrong for years, or there's meant to be some handling of the sizes between the two that's gone missing
EDIT: Posted issue #34415
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to add interpolate_pos_encoding=True
in the signature when calling the model as discussed in the issue #34415.
* adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding right pretrained model * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_image_segmentation * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding ruff formatting * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding new interpolate_pos_encoding function * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing interpolate_POS funciton * adapting output tensor in teests * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * modifying output tensor * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding the correct tensor * [run_slow] clipseg * fixing spaces * [run_slow] clipseg * [run_slow] clipseg --------- Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
* adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding positional encoder changes and tests * adding ruff suggestions * changes added by python utils/check_copies.py --fix_and_overwrite * removing pos_encoding added by script * adding interpolation to clipseg * formatting * adding further testing to altclip and better documentation to kosmos2 * skipping test_inputs_embeds_matches_input_ids_with_generate in git model * fixing clipseg comment suggestions * fixing bridgetower test * fixing altclip tensor output POS test * adding ruff formatting * fixing several tests * formatting with ruff * adding right pretrained model * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_image_segmentation * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding ruff formatting * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding new interpolate_pos_encoding function * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * fixing interpolate_POS funciton * adapting output tensor in teests * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * modifying output tensor * [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip * adding the correct tensor * [run_slow] clipseg * fixing spaces * [run_slow] clipseg * [run_slow] clipseg --------- Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
@amyeroberts as there were some conflicts with merging with main on #31900 (possibly due to the
make
scripts), I have reimplemented all the changes of #30783 in a new branch, which is rebased to main.