-
Notifications
You must be signed in to change notification settings - Fork 27.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[t5 tokenizer] add info logs #9897
Conversation
I don't think this is something that should be done in I wasn't aware havin this file was mandatory for some models to use the fast tokenizer. Are you sure you have sentencepiece installed? It might be due to this that the conversion slow to fast does not work automatically Anyhow, once we have found the right way to generate that |
I don't have a problem to add it anywhere else, who do we tag on this?
If you look at the trace it is hunting for that file and can't find it.
Agreed! |
ok, so as @sgugger suggested on slack, the fast tokenizer saving will be handled on the core-level some time in the future, so I removed that part from this PR, leaving just the logger part. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for updating!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @stas00
This PR (was modified from the original):
tokenizer.save_pretrained()
original PR note
This PR
tokenizer.json
file ontokenizer.save_pretrained()
tokenizer.save_pretrained()
Context:
tokenizer.json
.Here is an example:
I'm not sure why I needed to save both:
note
tokenization_t5.py
doesn't have it! both t5 tokenizer files:As I flagged on slack
https://huggingface.co/sshleifer/t5-tinier-random
fails to be used since it's missing this fasttokenizer.json
file from the s3 set of files,it could be a symptom for another problem in our code.
@LysandreJik, @sgugger