Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adding positional encoder changes and tests #32600

Merged
merged 54 commits into from
Sep 25, 2024

Conversation

manuelsh
Copy link
Contributor

@amyeroberts as there were some conflicts with merging with main on #31900 (possibly due to the make scripts), I have reimplemented all the changes of #30783 in a new branch, which is rebased to main.

@manuelsh manuelsh marked this pull request as ready for review August 11, 2024 23:53
@manuelsh
Copy link
Contributor Author

manuelsh commented Aug 12, 2024

@amyeroberts I have included the interpolation of positional embeddings in all the following models, and their respective tests:

  1. altclip
  2. bridgetower
  3. chineseclip
  4. clip
  5. clipseg
  6. kosmos_2
  7. x_clip
  8. git

Thanks!

This was referenced Aug 13, 2024
Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding - looks great!

Just a handful of small nits. Before merge, we'll need to run the slow tests for the models affected. Could you trigger this by running git commit --allow-empty -m "[run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip"

model(**inputs, interpolate_pos_encoding=False)
# forward pass
with torch.no_grad():
outputs = model(**inputs, interpolate_pos_encoding=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice :)

Comment on lines 474 to 477
@unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")
def test_inputs_embeds_matches_input_ids_with_generate(self):
pass

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this as this logic is independent of this PR

Suggested change
@unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")
def test_inputs_embeds_matches_input_ids_with_generate(self):
pass

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remove this, I will get the following error from the CI pipeline:

FAILED tests/models/git/test_modeling_git.py::GitModelTest::test_inputs_embeds_matches_input_ids_with_generate - ValueError: You passed `inputs_embeds` to `.generate()`, but the model class GitForCausalLM doesn't have its forwarding implemented. See the GPT2 implementation for an example (https://github.com/huggingface/transformers/pull/21405), and feel free to open a PR with it!

as shown here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you rebase on main? I believe this has been resolved upstream

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be removed as this tests is unrelated to this PR

return_dict (`bool`, *optional*):
Whether or not to return a [`~utils.ModelOutput`] instead of a plain tuple.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@@ -512,6 +526,8 @@ def _init_weights(self, module):
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
interpolate_pos_encoding (`bool`, *optional*):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
interpolate_pos_encoding (`bool`, *optional*):
interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):

@@ -549,6 +565,8 @@ def _init_weights(self, module):
output_hidden_states (`bool`, *optional*):
Whether or not to return the hidden states of all layers. See `hidden_states` under returned tensors for
more detail.
interpolate_pos_encoding (`bool`, *optional*):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
interpolate_pos_encoding (`bool`, *optional*):
interpolate_pos_encoding (`bool`, *optional*, defaults to `False`):

@manuelsh
Copy link
Contributor Author

manuelsh commented Aug 15, 2024

OK, I've:

  • added your three suggestions (thanks!)
  • I haven't removed the @unittest.skip(reason="GitForCausalLM does not support inputs_embeds in generate method")... lines, as per my comment, please let me know
  • I have run the slow tests with the git command sent. However, I don't think it is running the right slow tests as I just detected some errors on them which I am fixing now (for example in the "bridgetower" one)

Please don't merge yet, just need some time to check and potentially fix the tests.

@manuelsh
Copy link
Contributor Author

GIT model test still requires to be fixed. Getting this error:

tests.models.git.test_modeling_git.GitModelIntegrationTest.test_inference_interpolate_pos_encoding failed with error: <class 'IndexError'> index out of range in self
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/local/lib/python3.10/unittest/case.py", line 591, in run
    self._callTestMethod(testMethod)
  File "/usr/local/lib/python3.10/unittest/case.py", line 549, in _callTestMethod
    method()
  File "/usr/src/app/transformers/tests/models/git/test_modeling_git.py", line 588, in test_inference_interpolate_pos_encoding
    outputs = model(**inputs, interpolate_pos_encoding=True)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/src/app/transformers/src/transformers/models/git/modeling_git.py", line 1302, in forward
    embedding_output = self.embeddings(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/src/app/transformers/src/transformers/models/git/modeling_git.py", line 115, in forward
    embeddings = self.word_embeddings(input_ids)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 164, in forward
    return F.embedding(
  File "/usr/local/lib/python3.10/site-packages/torch/nn/functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

due to input_ids having values out of range (tensor([[49406, 768, 568, 530, 518, 2867, 49407]], dtype=torch.int32)). In concrete 49406 and 49407 are not accepted. Not sure why the processor is adding them.

Still on it.

@amyeroberts
Copy link
Collaborator

@manuelsh Have you included the most recent updates from main?

@manuelsh
Copy link
Contributor Author

@amyeroberts done, all interpolate_pos_encoding functions updated.

Copy link
Collaborator

@amyeroberts amyeroberts left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beautiful - thanks for adding this capability to our models and for iterating on a solution!

@amyeroberts
Copy link
Collaborator

@manuelsh Just the failing slow tests to address!

@manuelsh
Copy link
Contributor Author

manuelsh commented Sep 16, 2024

@amyeroberts , I think it is not just substituting interpolate_pos_encoding function, but one needs to adapt it, as the position_embeddings tensor from #33226 is different from the position_embedding object in the code of this PR (note the s).

I believe I can make them work with

self.position_embeddings = self.position_embedding.weight.unsqueeze(0)

but now all my tests are crashing for different reasons (different tensors outputs for example) and this will take longer.

Why not getting back to the previous working commit (d44e070), merge it, and then open another PR like #33226 but for the clip family models?

I would be happy to contribute to it.

@manuelsh
Copy link
Contributor Author

@amyeroberts I was able to fix all tests with the new function, so no need to do an additional PR. Please review.

@amyeroberts
Copy link
Collaborator

@manuelsh OK, great. Just the merge conflict to resolve an a final slow run to confirm everything passes now.

@manuelsh
Copy link
Contributor Author

@amyeroberts I have resolved the conflicts, run the tests in slow, corrected one test in clipseg model not related to this PR that was not working test_inference_image_segmentation, and requested another slow run. Hopefully that's it!

@manuelsh
Copy link
Contributor Author

@amyeroberts there were two tensors to correct in clipseg. Done now. If you can kindly approve the run for clipseg slow tests.

@amyeroberts
Copy link
Collaborator

@manuelsh Done and all passing - let's merge! Thanks for this addition and patience on iterating on this :)

@amyeroberts amyeroberts merged commit a55adee into huggingface:main Sep 25, 2024
18 checks passed
@manuelsh
Copy link
Contributor Author

Fantastic! Glad to see it. Thank you!

amyeroberts pushed a commit to amyeroberts/transformers that referenced this pull request Oct 2, 2024
* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding right pretrained model

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_image_segmentation

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding ruff formatting

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding new interpolate_pos_encoding function

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing interpolate_POS funciton

* adapting output tensor in teests

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* modifying output tensor

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding the correct tensor

* [run_slow]  clipseg

* fixing spaces

* [run_slow]  clipseg

* [run_slow]  clipseg

---------

Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
batch_size, _, height, width = pixel_values.shape
if not interpolate_pos_encoding and (height != self.image_size or width != self.image_size):
raise ValueError(
f"Input image size ({height}*{width}) doesn't match model" f" ({self.image_size}*{self.image_size})."
Copy link

@mcmonkey4eva mcmonkey4eva Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this tested? I'm getting user reports of error Input image size (352*352) doesn't match model (224*224). regardless of what size of image is put in - I'm assuming somewhere in ClipSeg code it resizes to 352 - which makes sense considering docs seem to indicate that 352 is an expected value https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegForImageSegmentation.forward.example while indeed also showing that image_size is 224 https://huggingface.co/docs/transformers/model_doc/clipseg#transformers.CLIPSegVisionConfig.image_size

ie: using ClipSeg exactly as documented will necessarily throw this error.

The original clipseg model built by huggingface staff defines image_size 224 in config https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/config.json#L127
and the preprocessor size as 352 https://huggingface.co/CIDAS/clipseg-rd64-refined/blob/main/preprocessor_config.json#L20

So either this check is wrong, or official configs have been wrong for years, or there's meant to be some handling of the sizes between the two that's gone missing

EDIT: Posted issue #34415

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add interpolate_pos_encoding=True in the signature when calling the model as discussed in the issue #34415.

@neggles neggles mentioned this pull request Oct 27, 2024
4 tasks
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding right pretrained model

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_image_segmentation

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding ruff formatting

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding new interpolate_pos_encoding function

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing interpolate_POS funciton

* adapting output tensor in teests

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* modifying output tensor

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding the correct tensor

* [run_slow]  clipseg

* fixing spaces

* [run_slow]  clipseg

* [run_slow]  clipseg

---------

Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024
* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding positional encoder changes and tests

* adding ruff suggestions

* changes added by python utils/check_copies.py --fix_and_overwrite

* removing pos_encoding added by script

* adding interpolation to clipseg

* formatting

* adding further testing to altclip and better documentation to kosmos2

* skipping test_inputs_embeds_matches_input_ids_with_generate in git model

* fixing clipseg comment suggestions

* fixing bridgetower test

* fixing altclip tensor output POS test

* adding ruff formatting

* fixing several tests

* formatting with ruff

* adding right pretrained model

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_image_segmentation

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing test_inference_interpolate_pos_encoding for the git model as there is no vision_model_output

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding ruff formatting

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding new interpolate_pos_encoding function

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* fixing interpolate_POS funciton

* adapting output tensor in teests

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* modifying output tensor

* [run_slow] altclip, bridgetower, chinese_clip, clip, clipseg, git, kosmos2, x_clip

* adding the correct tensor

* [run_slow]  clipseg

* fixing spaces

* [run_slow]  clipseg

* [run_slow]  clipseg

---------

Co-authored-by: Manuel Sanchez Hernandez <manuel.sanchez.hernandez@schibsted.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants