Tokenizer issue with Vicuna V1.1, EOS, BOS tokens seem to be blank #408
Comments
I have the following from within the code (debugging):
But on my system, once I ask a question, the ASSISTANT goes on forever, carrying out the conversation on its own. So I believe there is something odd with those tokenizers.
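A minimal sketch (not FastChat's actual generation loop) of why a blank EOS token causes this runaway behavior: the stop check compares each new token against the EOS id, so if that id is missing or blank the loop never breaks and the model keeps "talking" for both sides. The `generate` helper and the toy model below are illustrative only.

```python
# Sketch: generation stops only when the emitted token equals the EOS id.
# With a blank/missing EOS, the stop condition never fires.

def generate(step_fn, eos_token_id, max_new_tokens=50):
    """Collect tokens from step_fn until EOS is seen or the budget runs out."""
    out = []
    for _ in range(max_new_tokens):
        tok = step_fn(out)
        if eos_token_id is not None and tok == eos_token_id:
            break
        out.append(tok)
    return out

# Toy "model": emits two content tokens, then the pretend EOS id (2).
fake_model = lambda ctx: [10, 11, 2][min(len(ctx), 2)]

print(len(generate(fake_model, eos_token_id=2)))     # stops after 2 tokens
print(len(generate(fake_model, eos_token_id=None)))  # runs to the 50-token cap
```

With a proper EOS id the assistant's turn ends at `</s>`; with a blank one, only the hard token budget stops it.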
Could you try the following steps?
Hugging Face made some changes to the LLaMA tokenizer recently.
ok I upgraded to
And I still get the same problem, i.e. the assistant does the whole conversation between assistant and user on its own :(
What weight version did you use, V0 or V1.1? Could you share your chat history so we can see what happened?
I have the same issue with Vicuna V1.1.
I have the same issue with fastchat 0.2.1. I have tried updating huggingface transformers and restarting the workers, but it still does not work.
Vicuna v1.0 works fine for me with fastchat 0.2.1, but my model was converted on 0.1.9.
New models and v0.1.10 work for me.
I guess the blank EOS/BOS is not only related to FastChat or the Vicuna weights; it is also related to how you convert the base LLaMA model. In terms of compatibility, see FastChat/fastchat/serve/inference.py, line 30 in 898d4fc.
Redownloading the models and redoing the conversion will fix this.
Thanks. After applying the delta with the latest fastchat, I still got blank EOS/BOS in special_tokens_map.json. The problem was solved after copying special_tokens_map.json and tokenizer_config.json. (Package versions: accelerate 0.18.0)
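The fix described above amounts to restoring the special-token entries in the converted model directory. A small sketch of that workaround (the `patch_special_tokens` helper and the sentence-piece defaults `"<s>"`/`"</s>"`/`"<unk>"` are my assumptions about what the correctly converted tokenizer should contain; paths are examples):

```python
# Sketch of the workaround: if special_tokens_map.json shipped with blank
# bos/eos/unk entries, rewrite them to the usual LLaMA sentence-piece values.
import json

def patch_special_tokens(path):
    """Replace empty or missing bos/eos/unk strings with LLaMA defaults.

    Returns True if the file was modified, False if it was already fine.
    """
    with open(path) as f:
        tokens = json.load(f)
    defaults = {"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>"}
    changed = False
    for key, value in defaults.items():
        if not tokens.get(key):  # covers both missing keys and ""
            tokens[key] = value
            changed = True
    if changed:
        with open(path, "w") as f:
            json.dump(tokens, f, indent=2)
    return changed

# Example usage (path is illustrative):
# patch_special_tokens("vicuna-7b/special_tokens_map.json")
```

Re-converting with the up-to-date Hugging Face converter is the cleaner fix; this is only a band-aid for an already-converted checkpoint.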
Thanks everyone, converting the LLaMA weights using the new converter from Hugging Face and then applying the Vicuna v1.1 delta worked out of the box.
Hello,
When I try to get the BOS and EOS tokens from the tokenizer, I get '' (an empty string) for both. I tried it with both AutoTokenizer and LlamaTokenizer. The documentation on Hugging Face says that the EOS token is "</s>". I further suspect that is not the case here, given the contents of my special_tokens_map.json file. Could anyone tell me if they're experiencing the same, and whether it might be an error?
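For reference, a correctly converted LLaMA tokenizer typically ships a special_tokens_map.json along these lines (the exact shape can vary by transformers version; this is an assumed example, not the poster's actual file):

```json
{
  "bos_token": "<s>",
  "eos_token": "</s>",
  "unk_token": "<unk>"
}
```

If these entries are empty strings instead, the tokenizer reports blank BOS/EOS, which matches the symptom reported above.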