Issue with transformers 4.36 #1252
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
cc: @gante as well. I don't have the bandwidth to look into this today
LMK if there is no bandwidth for this issue; in that case, we should pin the transformers version for now, otherwise CI will remain broken.
uhmmm... the catch is that the function is missing one input, the layer index:

```python
if past_key_value is not None:
    if isinstance(past_key_value, tuple):
        seq_len += past_key_value[0].shape[-2]
    else:
        seq_len += past_key_value.get_seq_length(layer_idx)
```
For context, the layer index is now held in the decoder layer and in the attention layers themselves, e.g. here.
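To make that concrete, here is a small illustrative sketch, not code from this PR: it prints the `layer_idx` attribute that the attention modules carry since transformers 4.36. The tiny checkpoint name is borrowed from the failing test referenced later in this thread.

```python
from transformers import AutoModelForCausalLM

# Illustrative sketch (assumes transformers >= 4.36): inspect the per-layer
# index that the new cache API relies on.
model = AutoModelForCausalLM.from_pretrained("HuggingFaceM4/tiny-random-LlamaForCausalLM")
for layer in model.model.layers:
    print(layer.self_attn.layer_idx)  # 0, 1, 2, ... (position of the layer in the stack)
```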
Thanks @gante! I made the following change:

```python
if past_key_value is not None:
    if isinstance(past_key_value, tuple):
        seq_len += past_key_value[0].shape[-2]
    else:
        # since transformers 4.36, this is a DynamicCache instance
        seq_len += past_key_value.get_seq_length(model.layer_idx)
```

This makes the first batch of tests pass 🎉 Unfortunately, the size mismatch in the second test remains.
@BenjaminBossan the second one is trickier :D would you be able to share a snippet so I can reproduce on my end?
When checking out peft, this test fails: `pytest tests/ -k test_disable_adapter_45_test_HuggingFaceM4_tiny_random_LlamaForCausalLM_prompt_encoder`

Possibly this PEFT code is the issue, but I'm not sure: lines 1162 to 1167 in e73967e.

edit: Maybe not, I couldn't spot a difference here between transformers 4.35 and 4.36.

edit2: When I comment out this line (line 1122 in e73967e), the error disappears (but a later assertion that PEFT changes the result fails, as expected). However, when I jump into the output of …
@BenjaminBossan I think the attention mask built in the snippet below is the problem. The exception pops up because the input has a length of 17 (13 from the cache, 4 from the new input_ids) and the attention mask has a length of 14 (10 from …). If we replace

```python
if model_kwargs.get("attention_mask", None) is not None:
    prefix_attention_mask = torch.ones(
        model_kwargs["input_ids"].shape[0], peft_config.num_virtual_tokens
    ).to(model_kwargs["input_ids"].device)
    model_kwargs["attention_mask"] = torch.cat(
        (prefix_attention_mask, model_kwargs["attention_mask"]), dim=1
    )
```

by

```python
if model_kwargs.get("attention_mask", None) is not None:
    if model_kwargs["past_key_values"] is None:
        prefix_attention_mask = torch.ones(
            model_kwargs["input_ids"].shape[0], peft_config.num_virtual_tokens
        ).to(model_kwargs["input_ids"].device)
    else:
        prefix_attention_mask = torch.ones(
            model_kwargs["input_ids"].shape[0], model_kwargs["past_key_values"][0][0].shape[-2]
        ).to(model_kwargs["input_ids"].device)
    model_kwargs["attention_mask"] = torch.cat(
        (prefix_attention_mask, model_kwargs["attention_mask"]), dim=1
    )
```

then the test passes because the attention mask now has the expected shape. However, I'm not fully qualified to tell whether it is the right fix for PEFT :)
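To make the shape arithmetic concrete, here is a tiny self-contained sketch. It assumes the 14-length mask splits into 10 virtual-token positions plus the 4 positions of the new input; the other numbers are the ones quoted in the comment, and all variable names are illustrative rather than PEFT's.

```python
import torch

cache_len = 13           # tokens already in the cache (13 + 4 new input_ids = 17 keys)
new_len = 4              # new input_ids
num_virtual_tokens = 10  # assumed value, see lead-in above

attention_mask = torch.ones(1, new_len)

# old behaviour: prefix sized from num_virtual_tokens -> mask of length 14
old_mask = torch.cat((torch.ones(1, num_virtual_tokens), attention_mask), dim=1)
# proposed fix: prefix sized from the cached length -> mask of length 17, matching the keys
new_mask = torch.cat((torch.ones(1, cache_len), attention_mask), dim=1)

print(old_mask.shape[-1], new_mask.shape[-1])  # 14 17
```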
Thanks for digging into this. Your fix indeed makes the test pass -- alas, only for transformers 4.36 and not for 4.35, even though the shapes are the same for both. I'll try to dig deeper tomorrow.
That part of the code is also not very familiar to me :D
```
AttributeError                            Traceback (most recent call last)
File /opt/conda/lib/python3.10/site-packages/transformers/__init__.py:26
File /opt/conda/lib/python3.10/site-packages/transformers/dependency_versions_check.py:16
File /opt/conda/lib/python3.10/site-packages/transformers/utils/__init__.py:61
File /opt/conda/lib/python3.10/site-packages/transformers/utils/hub.py:94

AttributeError: module 'huggingface_hub.constants' has no attribute 'HF_HUB_CACHE'
```

```
import os
import torch

!pip install trl transformers accelerate git+https://github.com/huggingface/peft.git -Uqqq
```
@Jaykumaran That sounds like your huggingface_hub installation is out of date.
@younesbelkada @tomaarsen @gante Please check if the fix/workaround is good. Ideally, I'd like not to hard-code the supported architectures; I'm not sure if transformers provides a way to check that instead.
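One way to avoid hard-coding architectures could be a duck-typed check on the cache object itself. This is only a sketch of the idea, not an API transformers documents for this purpose, and the helper name is made up:

```python
def is_new_style_cache(past_key_values) -> bool:
    # DynamicCache (transformers >= 4.36) exposes get_seq_length; the legacy
    # tuple-of-tuples format does not, so this distinguishes the two code paths.
    return past_key_values is not None and hasattr(past_key_values, "get_seq_length")
```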
A few nits, otherwise LGTM 👍
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
All model architectures should now support the cache.
Awesome investigation!
Thanks @gante, @younesbelkada & @BenjaminBossan for looking into this!
See huggingface#1252 for more context. The initial idea was for transformers 4.37 to add the new caching to all architectures, but this was postponed to 4.38. The code needs to be adapted so that prompt tuning does not break when transformers 4.37 is released.
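One hedged way to keep both code paths alive until then is a version guard; this is only a sketch, the constant name is made up, and the 4.38 threshold simply mirrors the postponement mentioned above:

```python
import transformers
from packaging import version

# Gate the new cache handling on the installed transformers version.
TRANSFORMERS_USES_CACHE_EVERYWHERE = version.parse(transformers.__version__) >= version.parse("4.38.0")
```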
After spending 2 hours in the trenches of …
@pacman100 I feel you 😢 I'm sorry for breaking the old default behavior, which was used here -- it was the only solution I could find to ensure all new generation methods worked correctly. The …
It seems that `transformers==4.36` breaks some of our tests. Locally, they pass with 4.35 but fail with 4.36.

One offending line is this:

peft/src/peft/tuners/adaption_prompt/utils.py, line 76 in b08e6fa

It seems that `past_key_values` used to be a tuple of tensors; now it is a `DynamicCache`, whose `__getitem__` returns a tuple of tensors. How can we rewrite the PEFT code in a backwards-compatible fashion (i.e. it should also work with older transformers versions)?

Another error is this:
If there is no easy fix, LMK and we'll have to pin the transformers version for now.
ping @tomaarsen @younesbelkada
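To illustrate the backwards-compatibility question from the issue description, here is a small self-contained sketch (assuming transformers >= 4.36 is installed; the tensor shapes are arbitrary) showing that per-layer indexing yields `(key, value)` tensors for both the legacy tuple format and `DynamicCache`:

```python
import torch
from transformers import DynamicCache

key = torch.zeros(1, 2, 5, 8)    # (batch, heads, seq_len, head_dim)
value = torch.zeros(1, 2, 5, 8)

legacy_cache = ((key, value),)   # pre-4.36: tuple of per-layer (key, value) pairs
new_cache = DynamicCache()
new_cache.update(key, value, layer_idx=0)

for cache in (legacy_cache, new_cache):
    k, v = cache[0]              # per-layer indexing works for both formats
    print(type(cache).__name__, k.shape[-2])  # cached sequence length: 5
```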