It's not possible to enter <|END_OF_TURN_TOKEN|> when using cmd-r+ #6551
Comments
Interestingly, this is the token id that is inconsistent across the tokenizer.json files in Cohere's repos for the fp16 model and their 4bit bnb one. See huggingface/transformers#30027. Basically, in the "original" / fp16 model repo, token id 255001 (<|END_OF_TURN_TOKEN|>) has special set to False; in the 4bit bnb one, it's True.
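For anyone who wants to verify this locally, here is a minimal sketch that compares the flag across the two files (the file paths are placeholders for wherever the two tokenizer.json files were downloaded):

```python
import json

# Placeholder paths for local copies of the two tokenizer.json files being compared.
PATHS = {
    "fp16 repo": "c4ai-command-r-plus/tokenizer.json",
    "4bit bnb repo": "c4ai-command-r-plus-4bit/tokenizer.json",
}

for label, path in PATHS.items():
    with open(path, encoding="utf-8") as f:
        tok = json.load(f)
    # tokenizer.json stores added tokens as a list of dicts with "id", "content", "special", ...
    entry = next(t for t in tok["added_tokens"] if t["id"] == 255001)
    print(f'{label}: {entry["content"]!r} special={entry["special"]}')
```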
I'm pretty certain something is broken in the tokenizer.json. Running with the fp16 tokenizer.json fixed my random Russian, but replaced it with random Persian; using the 4bit bnb one caused the same random Russian. I'm using MLX on macOS since I don't have the VRAM to confirm this on CUDA, but I'd be curious to see what's happening there.
Looks like this is already a known issue, and is captured by this issue here:
Actually, looking in more detail at that linked issue, it only says that some models break up a token like <|im_end|> into multiple pieces instead of having a single token for it. This issue here is roughly the opposite: the model does have a single token, but main isn't able to recognize it in the interactive input stream.
Yeah, I can't figure it out. But I've burned enough steam on this and I'm no longer at my workstation; if you figure it out, lemme know. :)
As a workaround, I found that using llama.cpp's ./server instead of ./main (based on a tip from @Jeximo) allows me to interact with the model, and those tokens work okay there (I pressed Start just after <|CHATBOT_TOKEN|>). The model stopped by itself after replying to my question, with no text-based repetition of any special tokens. With the correct tokens in place, the model's IQ jumps way up compared to when they were just being interpreted as text strings.
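For example, a request along these lines talks to a locally running ./server through its /completion endpoint, with the special tokens written directly into the prompt string (the port, model file, and question here are assumptions to adapt to your setup):

```python
import json
import urllib.request

# Assumes ./server is already running locally on the default port (8080),
# e.g. started with something like: ./server -m c4ai-command-r-plus.Q4_K_M.gguf -c 4096
# (the model filename is illustrative).
prompt = (
    "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Why is the sky blue?<|END_OF_TURN_TOKEN|>"
    "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)

payload = json.dumps({"prompt": prompt, "n_predict": 256}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```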
Good. It may not be necessary to hardcode the Cmd R+ template, as it appears to work as expected. If someone decides to hardcode it into --chat-template, see here: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template#how-to-add-a-new-template Otherwise, I'm glad it's working for you.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hi, I'm trying to use interactive mode to work with the new cmd-r+. I'm currently on the llama branch that supports it.
But I'm unable to use this model properly, as I cannot enter the tokens its template requires. I'm supposed to pass in various tokens:
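(For reference, here is a rough sketch of the Command R / Command R+ chat layout, pieced together from the special tokens discussed in this issue; treat it as an approximation rather than the canonical template.)

```python
# Approximate Command R / Command R+ single-turn layout; the system preamble
# text is up to you, and llama.cpp may add <BOS_TOKEN> on its own (see the note below).
def build_prompt(system: str, user: str) -> str:
    return (
        "<BOS_TOKEN>"
        f"<|START_OF_TURN_TOKEN|><|SYSTEM_TOKEN|>{system}<|END_OF_TURN_TOKEN|>"
        f"<|START_OF_TURN_TOKEN|><|USER_TOKEN|>{user}<|END_OF_TURN_TOKEN|>"
        "<|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
    )
```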
But when I enter tokens like <|END_OF_TURN_TOKEN|> into the interactive prompt, these get converted into a group of text tokens rather than a single special token. I believe this is happening because, if I enter this prompt:
then the model continues:
I halted inference there. You can see it's partway through writing an <|END_OF_TURN_TOKEN|>, which means it isn't being produced as a single token. And that means it's very likely copying the <|END_OF_TURN_TOKEN|> from my prompt, which it shouldn't be able to read as multiple tokens if it had been passed to the model as a single token.
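As a sanity check on the tokenizer side, something like the following shows whether the string maps to a single id in the reference Hugging Face tokenizer (this assumes transformers is installed and uses the CohereForAI/c4ai-command-r-plus repo; it inspects the reference tokenizer rather than llama.cpp's own tokenization):

```python
from transformers import AutoTokenizer

# Reference tokenizer from the Hugging Face repo; llama.cpp tokenizes from the
# GGUF metadata instead, so this only approximates what ./main sees.
tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")

ids = tok.encode("<|END_OF_TURN_TOKEN|>", add_special_tokens=False)
# A single id here (255001, per the discussion above) means the reference
# tokenizer treats the string as one special token rather than splitting it.
print(ids)
print(tok.convert_ids_to_tokens(ids))
```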
(By the way, I skipped the <BOS_TOKEN> in my prompt, as I think llama.cpp adds that automatically for me, although I'm not certain about that. It's not relevant to this issue.)