-
Notifications
You must be signed in to change notification settings - Fork 27.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix _merge_input_ids_with_image_features
for llava model
#28333
Fix _merge_input_ids_with_image_features
for llava model
#28333
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the support for training 😉
Let's add a test as well
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Adressed the comments and moved the dummy test case into proper tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thanks, let's also make sure that loss.backward() works (we usually have an automatic test for this here
transformers/tests/test_modeling_common.py
Line 646 in 90224dd
def test_training(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! I left a single comment about the test
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @VictorSanh ! Thanks a lot for the fix!
Good to go ! Feel free to merge @VictorSanh 🤗 |
I am not cool enough to have merge access. The time where i am merging stuff whenever i wanted on hf transformers is well passed haha |
so either you @ArthurZucker or @younesbelkada need to merge lol 😅 |
Ooops 🤣 |
Considering it @VictorSanh! |
…ce#28333) * fix `_merge_input_ids_with_image_features` for llava model * Update src/transformers/models/llava/modeling_llava.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * adress comments * style and tests * ooops * test the backward too * Apply suggestions from code review Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update tests/models/vipllava/test_modeling_vipllava.py * style and quality --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
…ce#28333) * fix `_merge_input_ids_with_image_features` for llava model * Update src/transformers/models/llava/modeling_llava.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * adress comments * style and tests * ooops * test the backward too * Apply suggestions from code review Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update tests/models/vipllava/test_modeling_vipllava.py * style and quality --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
…ce#28333) * fix `_merge_input_ids_with_image_features` for llava model * Update src/transformers/models/llava/modeling_llava.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * adress comments * style and tests * ooops * test the backward too * Apply suggestions from code review Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update tests/models/vipllava/test_modeling_vipllava.py * style and quality --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
…ce#28333) * fix `_merge_input_ids_with_image_features` for llava model * Update src/transformers/models/llava/modeling_llava.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * adress comments * style and tests * ooops * test the backward too * Apply suggestions from code review Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update tests/models/vipllava/test_modeling_vipllava.py * style and quality --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Hi, final_labels[batch_indices, text_to_overwrite] = labels[batch_indices, non_image_indices] The same code works fine if I just change the model to another VLLM like InstructBlip. Thank you and kind regards, |
@alexandrosXe do you have a reproduction case we can start debugging from? |
|
Can you also make sure you are using the latest version of transformers? |
@ArthurZucker I am using the transformers 4.37.2 version. |
Yes, it seems
from the doc it should be length sequence length |
@ArthurZucker Thank you, it was my fault using wrong labels. Now everything works fine! |
Bug detected by @Sakshi-Bhargava
The method
LlavaForConditionalGeneration._merge_input_ids_with_image_features
takes care of merging the input_embeds with the hidden states obtained from the vision encoder. The merge output is fed to the language model part of the model.However,
labels
was omitted from the merge, and when trying to compute a loss, the shapes of the logits and the labels are not compatible.This fix ensures that
labels
is also properly merged.Dummy reproduction case (still respect the model hidden sizes):
will yield the following error without the fix
cc @gullalc @younesbelkada @amyeroberts