simple align qwen2vl kv_seq_len calculation with qwen2 #33161
Conversation
@ArthurZucker @zucchini-nlp we made a simple kv_seq_len calculation aligned with the newest qwen2 code.
IMO we should not be using cache.get_length and should rely on cache_position. Not all models follow that yet; we're still working on supporting the cache class in all models. CMIIW @ArthurZucker
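(For context, a minimal sketch of what relying on cache_position means here, assuming the usual convention that cache_position holds the absolute indices of the tokens in the current forward pass; this is a toy illustration, not the actual modeling code.)

```python
import torch

# Toy illustration, not transformers internals: with 5 tokens already in the KV cache
# and a 3-token chunk in the current forward pass, cache_position already encodes
# the total key/value length, so no call into the Cache object is needed.
past_len, q_len = 5, 3
cache_position = torch.arange(past_len, past_len + q_len)  # tensor([5, 6, 7])
kv_seq_len = int(cache_position[-1]) + 1                   # 8 == past_len + q_len
assert kv_seq_len == past_len + q_len
```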
Doesn't it conflict with the current qwen2 code change? We just want flash attention to work correctly.
Do you have a reproducer for the flash-attention issue?
Overall, the kv seq len is exactly the cache position (that is what we use for other models).
Yes, when using pure text input with flash attention, the position_ids are wrong before the fix in this PR.
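(A rough reproducer sketch for that text-only scenario, assuming a CUDA GPU with flash-attn installed; the prompt is illustrative and the model id is taken from the test further down, not from this comment.)

```python
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-7B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], return_tensors="pt").to(model.device)  # pure text, no images

# Before this PR, the flash-attention path could compute kv_seq_len/position_ids
# incorrectly for text-only inputs, producing garbled continuations.
output = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```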
Hmm, does making it
tbh, we just want the implementation to be the "same" as the qwen2 model, so we just copied the code from
Hi @zucchini-nlp @ArthurZucker, if you have any further questions regarding this change, please let us know.
@ShuaiBai623 Nice to have a test pls, if there wasn't any.
Is it required to add a test like the one below?

```python
@slow
@require_bitsandbytes
def test_small_model_integration_test_batch_wo_image_flashatt2(self):
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct",
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
        device_map="auto",
    )
    text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
    messages2 = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
    text2 = self.processor.apply_chat_template(messages2, tokenize=False, add_generation_prompt=True)
    inputs = self.processor(text=[text, text2], images=[self.image], return_tensors="pt").to(torch_device)

    # it should not matter whether two images are the same size or not
    output = model.generate(**inputs, max_new_tokens=30)

    EXPECTED_DECODED_TEXT = [
        "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?assistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and outgoing personalities, as well as their",
        "system\nYou are a helpful assistant.user\nWho are you?assistant\nI am Qwen, a large language model created by Alibaba Cloud. I am designed to assist with various tasks and answer a wide range of questions to",
    ]
    self.assertEqual(
        self.processor.batch_decode(output, skip_special_tokens=True),
        EXPECTED_DECODED_TEXT,
    )
```
Yes, as long as it reproduces the error we're fixing, the tests are okay. Can you add an empty commit with the message `[run-slow] qwen2_vl` please, so we can run the added test as part of CI?
Not sure now if we ran the slow tests when adding the model 🤔
Seems like the slow tests result in an OOM, and I believe it should be fixed. Maybe we can use a smaller resolution for the tests, so that image tokens don't take up much space? Or is there any other reason for the OOM (our runners have 14GB)?
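(A hedged sketch of the smaller-resolution idea: downscale the image before it reaches the processor so fewer vision tokens are produced. The image and sizes here are made up; in the actual test, self.image would be the one being resized.)

```python
from PIL import Image

# Synthetic stand-in for the test image; in the real test this would be self.image.
image = Image.new("RGB", (1024, 1024), color=(127, 127, 127))

# Qwen2-VL turns image patches into vision tokens, so a 4x smaller side length
# means roughly 16x fewer image tokens and a much smaller KV cache during the test.
small_image = image.resize((image.width // 4, image.height // 4))
# inputs = self.processor(text=[text], images=[small_image], return_tensors="pt")
```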
Thanks a lot! I see some qwen2-vl tests are failing, which I guess is due to the torch version or hardware causing small numerical differences and thus different generations. I can fix those to match our runner's results (in the scope of this PR). Pinging @ArthurZucker for final review.
thanks ❤️
TL;DR: this is a quick fix; we will update this on our side later on so it doesn't rely on `get_usable_length`.
Okay, cool, let's merge then. Final nits for the tests: I added what should be the generated text to pass the CI, can you please check and verify that? For example, the batched test seems weird, as the two examples aren't identical.
In my local env (A100 + torch 2.4), the test passed. There was a slight difference at the 30th token during the batch test, which I consider acceptable, so I removed the check that "they are identical".
…3161)

* qwen2vl_align_kv_seqlen_to_qwen2
* flash att test
* [run-slow] qwen2_vl
* [run-slow] qwen2_vl fix OOM
* [run-slow] qwen2_vl
* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py
  Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py
  Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
* code quality

---------

Co-authored-by: baishuai.bs <1051314669@qq.com>
Co-authored-by: ShuaiBai623 <baishuai623@icloud.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
What does this PR do?
It's a simple alignment of the qwen2vl kv_seq_len calculation with the new qwen2 code.
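A rough sketch of the qwen2-style pattern this alignment refers to, assuming the standard `Cache.get_usable_length` API; the tensor shapes and `layer_idx` are illustrative, and this is not the literal PR diff:

```python
import torch
from transformers import DynamicCache

layer_idx = 0
cache = DynamicCache()

# Pretend a previous forward pass already cached 5 tokens for this layer.
past_k = past_v = torch.zeros(1, 2, 5, 8)  # (batch, num_kv_heads, seq_len, head_dim)
cache.update(past_k, past_v, layer_idx)

# A new 3-token chunk arrives: start from its length and add the usable cached length,
# which is how the qwen2 attention code derives kv_seq_len.
key_states = torch.zeros(1, 2, 3, 8)
kv_seq_len = key_states.shape[-2]
kv_seq_len += cache.get_usable_length(kv_seq_len, layer_idx)
print(kv_seq_len)  # 8 == 3 new tokens + 5 cached tokens
```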
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.