generation utils update (minor) #1468
base: main
Conversation
- Fix the type hint: dtype cannot be a str
- Fix the device hint
- Remove the pad_token_id arg; the decoder_attention_mask is already a binary mask of 0s and 1s
- Added an early return
- Extracted is_mqa_model and lazy_mode to avoid repeated dictionary lookups (see the sketch after this list)
- Used more descriptive variable names and simplified the nested loops for better readability
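As a rough illustration of the early-return and hoisted-lookup pattern (the function name, config keys, and loop body below are hypothetical, not the PR's actual code):

```python
# Hypothetical sketch: process_layers and the config keys are illustrative
# names, not the code changed in this PR.
def process_layers(config: dict, layers: list) -> list:
    # Early return: skip all work when there is nothing to process.
    if not layers:
        return []

    # Hoist the dictionary lookups out of the loop so each key is read
    # once instead of once per layer.
    is_mqa_model = config.get("is_mqa_model", False)
    lazy_mode = config.get("lazy_mode", False)

    processed = []
    for layer in layers:
        attention_kind = "mqa" if is_mqa_model else "mha"
        processed.append((layer, attention_kind, lazy_mode))
    return processed
```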
The text-generation CI has been executed and will be compared with the main branch once the run is complete.
@yafshar, just a couple of comments below.
Please post the CI results before and after the change.
@yafshar, makes sense.
What does this PR do?
- Update the import path: transformers.streamers -> transformers.generation.streamers
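For illustration, a caller would update its import like this (BaseStreamer is assumed as the imported class; the streamer classes live under transformers.generation in recent versions):

```python
# Old path (per the PR description; no longer valid):
# from transformers.streamers import BaseStreamer

# New path:
from transformers.generation.streamers import BaseStreamer
```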
- return x.index_fill(1, torch.tensor(0), 1) creates the index torch.tensor(0) on the default (CPU) device; this is fixed by creating the index on the correct device: index = torch.tensor(0, device=device)
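A minimal sketch of the fix (the function name is assumed; the device is taken from the input tensor x):

```python
import torch

def mask_first_position(x: torch.Tensor) -> torch.Tensor:
    # Build the index tensor on the same device as x; a bare
    # torch.tensor(0) lands on the CPU and is the wrong index tensor
    # when x lives on an accelerator.
    index = torch.tensor(0, device=x.device)
    # Fill column 0 (dim=1) with 1.
    return x.index_fill(1, index, 1)
```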
Before submitting