Fix QA example #30580
Conversation
Thanks for the quick fix on this!
Just a request to do a quick test run on the old default models, and then with one of the newer ones on just one of these scripts, to make sure it all runs smoothly.
Will do now before merging!
Tried running it on Falcon myself, but it turns out there are multiple other issues here - in particular, the code also assumes a

I think we can maybe still merge the PR (since there are likely MLM models either now or in the future that will be missing CLS), but @ananegru you'll probably need to use a masked language model as a base rather than a causal language model! Maybe one of the DeBERTa models?
Just confirmed that the example still works with this branch and

My bad for going through all this before realizing the root of the problems was a CLM being used in MLM examples, though!
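For reference, a quick way to see which special tokens a tokenizer actually defines (GPT-2 and DeBERTa-v3 are used here purely as illustrative causal-LM and MLM-style checkpoints; the thread above tested Falcon):

```python
from transformers import AutoTokenizer

# Causal LM tokenizers typically define no CLS token
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.cls_token)  # None
print(tok.bos_token)  # '<|endoftext|>'

# MLM-style tokenizers usually do have one
tok = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")
print(tok.cls_token)  # '[CLS]'
```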
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
I didn't notice either! Happy for this to be merged in if it still works with existing models. Thanks again for adapting the scripts so quickly.
The QA example assumes the model has a CLS token, but many newer models don't. This PR falls back to the BOS token if one isn't present, and finally falls back to the start of the sequence if that doesn't work either.

Fixes #30570
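A minimal sketch of the fallback order described above, assuming a plain Python list of token IDs; the helper name and structure are illustrative, not the exact code in the example script:

```python
def find_cls_index(input_ids, tokenizer):
    """Pick the index used as the 'no answer' position (sketch only)."""
    # Prefer the CLS token when the tokenizer defines one and it is present.
    if tokenizer.cls_token_id is not None and tokenizer.cls_token_id in input_ids:
        return input_ids.index(tokenizer.cls_token_id)
    # Otherwise fall back to the BOS token, if defined and present.
    if tokenizer.bos_token_id is not None and tokenizer.bos_token_id in input_ids:
        return input_ids.index(tokenizer.bos_token_id)
    # Finally, fall back to the start of the sequence.
    return 0
```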