Generate: detect special architectures when loaded from PEFT #24198
Conversation
Thanks for fixing!
Just a question about robustness with the listed architectures.
src/transformers/generation/utils.py
Outdated
@@ -4512,7 +4517,8 @@ def _crop_past_key_values(model, past_key_values, maximum_length):
                )
            )
        past_key_values = tuple(new_past)
-    elif "bloom" in model.__class__.__name__.lower():  # bloom is special
+    # bloom is special
+    elif "bloom" in model.__class__.__name__.lower() or "bloom" in model.config.architectures[0].lower():
Are we guaranteed it's always the 0th indexed architecture?
I have never seen more than 1 item in architectures
-- in fact, the only PT reference writing into .config.architectures
is this one
I am assuming it works in the vast majority of cases -- even if the user decides to append more info into this config member, it should still work.
Not a super satisfactory answer, I know 😅
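For what it's worth, a more defensive variant could scan every entry in `architectures` instead of only the first. A minimal sketch of that idea, with a stand-in for the config object (not the real transformers `PretrainedConfig`) and a hypothetical helper name:

```python
from types import SimpleNamespace  # stand-in for a real PretrainedConfig

def is_special_arch(model_class_name, config, keyword):
    """Hypothetical helper: True if `keyword` appears in the model class name
    or in ANY listed architecture (case-insensitive), not just index 0."""
    if keyword in model_class_name.lower():
        return True
    # `architectures` may be missing or None on some configs; default to empty.
    architectures = getattr(config, "architectures", None) or []
    return any(keyword in arch.lower() for arch in architectures)

# A PEFT-wrapped model: the class name no longer mentions "bloom",
# but the base architecture is still recorded in config.architectures.
config = SimpleNamespace(architectures=["PeftModel", "BloomForCausalLM"])
print(is_special_arch("PeftModelForCausalLM", config, "bloom"))  # True
```

This would also degrade gracefully if a user appends extra entries to `architectures`, since the match no longer depends on ordering.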
src/transformers/generation/utils.py
Outdated
@@ -4521,7 +4527,8 @@ def _crop_past_key_values(model, past_key_values, maximum_length):
                )
            )
        past_key_values = tuple(new_past)
-    elif "gptbigcode" in model.__class__.__name__.lower():  # gptbigcode is too
+    # gptbigcode is too
+    elif "gptbigcode" in model.__class__.__name__.lower() or "gptbigcode" in model.config.architectures[0].lower():
Same here re 0 indexing
The documentation is not available anymore as the PR was closed or merged.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
What does this PR do?
Fixes #23686
As identified in this comment, a PEFT-loaded BLOOM can't be used as an assistant with assisted generation.
BLOOM (and GPTBigCode) need special handling due to their different cache API, and the architecture detection code was incompatible with PEFT models. This PR adds the logic to detect these special architectures when loaded with PEFT.
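The failure mode can be illustrated: a PEFT wrapper changes the model's `__class__.__name__`, so a name-only check misses the underlying architecture, while `config.architectures` still records it. A minimal sketch with stand-in classes (not the real transformers/peft objects):

```python
from types import SimpleNamespace

class BloomForCausalLM:
    """Stand-in for the base model; records its architecture in config."""
    def __init__(self):
        self.config = SimpleNamespace(architectures=["BloomForCausalLM"])

class PeftModelForCausalLM:
    """Stand-in for a PEFT wrapper that forwards the base model's config."""
    def __init__(self, base_model):
        self.config = base_model.config

def is_bloom(model):
    # Combined check, mirroring the logic this PR adds: fall back to
    # config.architectures when the class name gives no hint.
    return (
        "bloom" in model.__class__.__name__.lower()
        or "bloom" in model.config.architectures[0].lower()
    )

base = BloomForCausalLM()
wrapped = PeftModelForCausalLM(base)
print(is_bloom(base))     # True: matched by class name
print(is_bloom(wrapped))  # True: class name misses, architectures catches it
```

With only the class-name check, `wrapped` would be misclassified and the BLOOM-specific cache cropping would be skipped, which is exactly the assisted-generation bug this PR fixes.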