
simple align qwen2vl kv_seq_len calculation with qwen2 #33161

Merged

Conversation

@simonJJJ (Contributor)

What does this PR do?

It's a simple alignment of the qwen2vl kv_seq_len calculation with the new qwen2 code.
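
For context, a minimal sketch of the qwen2-style pattern this PR adopts (compute_kv_seq_len is a hypothetical helper written only to illustrate the calculation; in the actual modeling code it lives inline in the attention forward):

def compute_kv_seq_len(key_states, past_key_value=None, layer_idx=0):
    # Start from the key length of the current step...
    kv_seq_len = key_states.shape[-2]
    # ...and, when a cache is present, add the usable length it already holds for this layer.
    if past_key_value is not None:
        kv_seq_len += past_key_value.get_usable_length(kv_seq_len, layer_idx)
    return kv_seq_len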

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@simonJJJ (Contributor, Author)

@ArthurZucker @zucchini-nlp we made a simple change to align the kv_seq_len calculation with the newest qwen2 code.

@zucchini-nlp (Member) left a comment

IMO we should not be using cache.get_length and should rely on cache_position instead. Not all models follow that yet; we're still working on supporting the cache class in all models. Correct me if I'm wrong @ArthurZucker
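
For illustration, the cache_position-based approach being referred to would look roughly like this (a sketch with made-up past_len / q_len values, assuming cache_position holds the positions of the tokens currently being processed, as in other transformers models):

import torch

past_len, q_len = 10, 4  # illustrative: 10 cached tokens, 4 new tokens this step
cache_position = torch.arange(past_len, past_len + q_len)  # tensor([10, 11, 12, 13])
kv_seq_len = int(cache_position[-1]) + 1  # 14 == past_len + q_len, no cache query needed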

@simonJJJ (Contributor, Author)

This doesn't conflict with the current qwen2 code change, does it? We just want flash attention to work correctly.

@ArthurZucker (Collaborator) left a comment

Do you have a reproducer for the flash attention issue?
Overall, the kv seqlen is exactly the cache position (that is what we use for other models).

@simonJJJ (Contributor, Author)

> Do you have a reproducer for the flash attention issue? Overall, the kv seqlen is exactly the cache position (that is what we use for other models).

Yes, when using pure text input with flash attention, the position_ids are computed incorrectly before this PR's fix.
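
A minimal text-only reproduction would look something like this (a sketch rather than the exact script that surfaced the bug; the prompt and generation settings are illustrative):

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Pure text input, no images: the path where position_ids were computed incorrectly.
messages = [{"role": "user", "content": "Who are you?"}]
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[text], return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=30)
print(processor.batch_decode(output, skip_special_tokens=True)[0])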

@zucchini-nlp (Member)

Hmm, does making it max(cache_position[-1], position_ids[:, -1].max().item() + 1) if position_ids is not None else cache_position[-1] work? 🤔 since kv_seq_len should be same as cache_position

@simonJJJ (Contributor, Author)

> Hmm, does making it max(cache_position[-1], position_ids[:, -1].max().item() + 1) if position_ids is not None else cache_position[-1] work? 🤔 since kv_seq_len should be same as cache_position

tbh, we just want the implementation to be the same as the qwen2 model, so we copied the code from modeling_qwen2.py.

@ShuaiBai623 (Contributor)

Hi @zucchini-nlp @ArthurZucker, if you have any further questions regarding this change, please let us know.

@zucchini-nlp (Member)

@ShuaiBai623 It would be nice to have a test please, if there wasn't one already.

@ShuaiBai623 (Contributor)

Is it required to add a test for flashatt2 in the Qwen2VLIntegrationTest, like this:

@slow
@require_bitsandbytes
def test_small_model_integration_test_batch_wo_image_flashatt2(self):
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        "Qwen/Qwen2-VL-7B-Instruct",
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
        device_map="auto",
    )
    text = self.processor.apply_chat_template(self.messages, tokenize=False, add_generation_prompt=True)
    messages2 = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ]
    text2 = self.processor.apply_chat_template(messages2, tokenize=False, add_generation_prompt=True)
    inputs = self.processor(text=[text, text2], images=[self.image], return_tensors="pt").to(torch_device)

    # it should not matter whether two images are the same size or not
    output = model.generate(**inputs, max_new_tokens=30)

    EXPECTED_DECODED_TEXT = [
        "system\nYou are a helpful assistant.\nuser\nWhat kind of dog is this?assistant\nThe dog in the picture appears to be a Labrador Retriever. Labradors are known for their friendly and outgoing personalities, as well as their",
        "system\nYou are a helpful assistant.user\nWho are you?assistant\nI am Qwen, a large language model created by Alibaba Cloud. I am designed to assist with various tasks and answer a wide range of questions to",
    ]

    self.assertEqual(
        self.processor.batch_decode(output, skip_special_tokens=True),
        EXPECTED_DECODED_TEXT,
    )
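
(For reference, slow integration tests like this one are typically run locally with RUN_SLOW=1, e.g. RUN_SLOW=1 python -m pytest tests/models/qwen2_vl/test_modeling_qwen2_vl.py -k flashatt2; the exact invocation may vary with your setup.)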

@zucchini-nlp (Member) left a comment

Yes, as long as it reproduces the error we're fixing, the tests are okay. Can you add an empty commit with message [run-slow] qwen2_vl pls so we can run the added test as part of CI?

Not sure now if we ran tests when adding the model 🤔
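
(For reference, such an empty commit can be created with git commit --allow-empty -m "[run-slow] qwen2_vl".)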

@zucchini-nlp (Member)

Seems like the slow tests result in OOM, and I believe that should be fixed. Maybe we can use a smaller resolution for the tests, so that image tokens don't take up as much space? Or is there another reason for the OOM (our runners are 14GB)?
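
One common way to reduce the image-token count, sketched under the assumption that the Qwen2-VL processor honors the min_pixels / max_pixels arguments described in its documentation (the values below are illustrative, not tuned for the 14GB runners):

from transformers import AutoProcessor

# Cap the pixel budget per image so fewer visual tokens (and less memory) are produced.
processor = AutoProcessor.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct",
    min_pixels=16 * 28 * 28,
    max_pixels=256 * 28 * 28,
)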


@zucchini-nlp (Member)

Thanks a lot! I see some qwen2-vl tests are failing, which I guess is due to torch version or hardware differences that cause small numerical differences and thus different generations. I can fix those to match our runner's results (in the scope of this PR).

ping @ArthurZucker for final review

@ShuaiBai623 (Contributor)

thanks ❤️

@ArthurZucker (Collaborator) left a comment

TL;DR: this is a quick fix; we will update this on our side later on so it does not rely on get_usable_length.

@zucchini-nlp (Member) left a comment

Okay, cool, let's merge then. Final nits for the tests: I added what should be the generated text to pass the CI, can you please check and verify that? For example, the batch test seems weird as the two examples aren't identical.

tests/models/qwen2_vl/test_modeling_qwen2_vl.py (review thread, outdated and resolved)
ShuaiBai623 and others added 3 commits September 5, 2024 23:22
@ShuaiBai623 (Contributor)

In my local env (A100 + torch 2.4), the test passed. There was a slight difference at the 30th token in the batch test, which I consider acceptable, so I removed the check that the two outputs are identical.

@zucchini-nlp zucchini-nlp merged commit 21fac7a into huggingface:main Sep 5, 2024
21 checks passed
zucchini-nlp added a commit to zucchini-nlp/transformers that referenced this pull request Sep 6, 2024
simple align qwen2vl kv_seq_len calculation with qwen2 (#33161)

* qwen2vl_align_kv_seqlen_to_qwen2

* flash att test

* [run-slow] qwen2_vl

* [run-slow] qwen2_vl fix OOM

* [run-slow] qwen2_vl

* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* Update tests/models/qwen2_vl/test_modeling_qwen2_vl.py

Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>

* code quality

---------

Co-authored-by: baishuai.bs <1051314669@qq.com>
Co-authored-by: ShuaiBai623 <baishuai623@icloud.com>
Co-authored-by: ShuaiBai623 <43326198+ShuaiBai623@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
itazap pushed a commit to NielsRogge/transformers that referenced this pull request Sep 20, 2024
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
BernardZach pushed a commit to innovationcore/transformers that referenced this pull request Dec 6, 2024