Fix vipllava for generation #29874
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks! Do you know if our slow CI caught this? Otherwise let's add a test!
Yes, the
I am a bit surprised by the other model working as expected, because other llava-based models do use [-1], which is the head_dim, no?
Yeah, that's because other models index only the first head dim in this line, while VipLlava indexes the first head. I found out why we need this head-indexing hack and tried generating with padded inputs; I did not see any difference between indexing the first head and indexing the first head dimension.
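For illustration, here is a minimal sketch of the two indexing options, assuming the legacy cache layout of (batch, num_heads, seq_len, head_dim) for past_key_values[0][0]; the sizes and variable names below are made up for the example and are not the library code:

```python
import torch

# Assumed legacy cache layout: past_key_values[layer][0] has shape
# (batch, num_heads, seq_len, head_dim).
batch, num_heads, seq_len, head_dim = 2, 4, 6, 8
first_layer_key_cache = torch.randn(batch, num_heads, seq_len, head_dim)

# Pretend the first two positions of sample 0 were left padding, so their
# cached keys are zero across every head and head dimension.
first_layer_key_cache[0, :, :2, :] = 0.0

# Indexing the first head dimension keeps shape (batch, num_heads, seq_len).
per_dim = first_layer_key_cache[:, :, :, 0]
# Indexing the first head keeps shape (batch, seq_len, head_dim).
per_head = first_layer_key_cache[:, 0, :, :]

# Either view exposes the zeroed (non-attended) positions, as long as the
# reduction that follows matches the chosen layout.
print(torch.where(per_dim.abs().sum(-2) == 0))   # reduces over heads
print(torch.where(per_head.abs().sum(-1) == 0))  # reduces over head_dim
```

Both prints recover the same (batch_index, token_index) pairs for the padded positions, which matches the observation above that the two variants behave the same with padded inputs, provided the downstream reduction matches the chosen axis order.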
Made the Llava model code consistent and ran all the tests (+slow). Some of the failing ones have nothing to do with the current changes; I confirmed they were already failing long before this PR.
Thanks
@@ -403,7 +403,7 @@ def test_small_model_integration_test_llama(self):
        inputs = processor(prompt, raw_image, return_tensors="pt").to(torch_device, torch.float16)

        output = model.generate(**inputs, max_new_tokens=900, do_sample=False)
-       EXPECTED_DECODED_TEXT = "USER: \nWhat are the things I should be cautious about when I visit this place?\nASSISTANT: When visiting this place, which is a pier or dock extending over a body of water, there are a few things to be cautious about. First, be aware of the weather conditions, as sudden changes in weather can make the pier unsafe to walk on. Second, be mindful of the water depth and any potential hazards, such as submerged rocks or debris, that could cause accidents or injuries. Additionally, be cautious of the presence of wildlife, such as birds or fish, and avoid disturbing their natural habitats. Lastly, be aware of any local regulations or guidelines for the use of the pier, as some areas may be restricted or prohibited for certain activities."  # fmt: skip
+       EXPECTED_DECODED_TEXT = "USER: \nWhat are the things I should be cautious about when I visit this place?\nASSISTANT: When visiting this place, which appears to be a dock or pier extending over a body of water, there are a few things to be cautious about. First, be aware of the surroundings and potential hazards, such as slippery surfaces, uneven ground, or any obstacles in the water. Second, be mindful of the weather conditions, as sudden changes in weather can make the dock or pier unsafe to use. Third, be cautious of the water depth and any underwater hazards, such as rocks or debris, that could pose a risk to your safety. Lastly, be respectful of the environment and other visitors, and follow any rules or guidelines posted at the dock or pier."  # fmt: skip
Which hardware did you run this on? It should not be failing! But maybe it's T4 vs A100.
Related question: if llava was not touched in this PR and our daily slow CI is not complaining, why was this test changed? 🤔
Hmm, I was running the test on an A100. If the daily CI is passing, then it probably does not need to change. Anyway, it's interesting to me that some tests depend on the hardware, which means a contributor might change a failing test without knowing whether the change is correct.
Should I just revert the changes then?
nice catch
@@ -441,10 +441,10 @@ def forward(
        if past_key_values is not None and pixel_values is not None and input_ids.shape[1] == 1:
            # Retrieve the first layer to inspect the logits and mask out the hidden states
            # that are set to 0
-           first_layer_past_key_value = past_key_values[0][0][:, 0, :, :]
+           first_layer_past_key_value = past_key_values[0][0][:, :, :, 0]
This seems like an opportunity for # Copied from (not to be fixed in this PR, but in the future).
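For readers unfamiliar with the convention, here is a hypothetical sketch of where a # Copied from marker would sit; the class, body, and target path below are placeholders rather than the actual VipLlava code, and in real usage the marked body has to match the referenced source exactly so the repository's consistency checks can keep the copies in sync:

```python
import torch
from torch import nn


class ToyProjector(nn.Module):
    """Stand-in module used only to show where the marker goes."""

    def __init__(self, hidden_size: int = 32):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)

    # Copied from transformers.models.llava.modeling_llava.LlavaMultiModalProjector.forward
    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        return self.linear(hidden_states)
```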
Noted for the next PR :)
LGTM 👍 Let's revert the changes in tests/models/llava/test_modeling_llava.py, then we can merge :)
Reverted the changes and rebased on main; it can be merged now.
* fix vipllava generation
* consistent llava code
* revert llava tests changes
What does this PR do?
When working on this PR, it was found that VipLlava fails when generating with a kv cache because of incorrectly indexing past_kv_length. This PR fixes it by indexing the past length as -2. Other Llava models work correctly.
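As a rough standalone illustration of reading the past length from the second-to-last axis (a sketch assuming the legacy (batch, num_heads, past_len, head_dim) cache layout; the names and sizes are made up, not the library code):

```python
import torch

# Assumed legacy cache layout: past_key_values[layer][0] has shape
# (batch, num_heads, past_len, head_dim).
batch, num_heads, past_len, head_dim = 1, 8, 17, 64
past_key = torch.zeros(batch, num_heads, past_len, head_dim)

# The past sequence length lives on the second-to-last axis, so it is read
# with shape[-2]; shape[-1] would return head_dim (64) instead of 17.
past_kv_length = past_key.shape[-2]
assert past_kv_length == past_len
print(past_kv_length)  # 17
```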