It's not possible to enter <|END_OF_TURN_TOKEN|> when using cmd-r+ #6551

Closed
araleza opened this issue Apr 8, 2024 · 9 comments
Comments


araleza commented Apr 8, 2024

Hi, I'm trying to use interactive mode to work with the new cmd-r+. I'm currently on the llama branch that supports it.

But I'm unable to use this model properly as I cannot enter the tokens required by the template for this model. I'm supposed to pass in various tokens:

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{response}

But when I enter tokens like <|END_OF_TURN_TOKEN|> into the interactive prompt, they get converted into a group of plain-text tokens rather than a single special token. I believe this is the case because, if I enter this prompt:

<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hi, can you tell me the capital of Bulgaria please<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

then the model continues:

The capital of Bulgaria is Sofia. Can I help you with anything else?<|END_

I halted inference there. You can see it's partway through writing an <|END_OF_TURN_TOKEN|> as plain text, which means it isn't being emitted as a single token. That strongly suggests it's copying the <|END_OF_TURN_TOKEN|> from my prompt, which it shouldn't be able to see as multiple tokens if the marker had been passed to the model as one special token.

(By the way, I left the <BOS_TOKEN> out of my prompt, as I think llama.cpp adds that automatically for me, although I'm not certain about that. It's not relevant to this issue.)
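For anyone who wants to double-check that the marker really is a single token in the vocabulary, here's a minimal sketch using the Hugging Face tokenizer. The CohereForAI/c4ai-command-r-plus repo name and the transformers dependency are assumptions; any copy of the Command R+ tokenizer should behave the same:

```python
# Minimal check of the Command R+ special tokens (assumes a recent `transformers`
# and access to the CohereForAI/c4ai-command-r-plus tokenizer on Hugging Face).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")

# If the template markers are real special tokens, each should map to one id.
eot_id = tok.convert_tokens_to_ids("<|END_OF_TURN_TOKEN|>")
print("id for <|END_OF_TURN_TOKEN|>:", eot_id)

# Tokenizing the marker as part of ordinary text should also yield a single id
# when special-token parsing is enabled (the behaviour main seems to be missing).
ids = tok("<|END_OF_TURN_TOKEN|>", add_special_tokens=False).input_ids
print("token count when embedded in text:", len(ids))  # expect 1
```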


fblissjr commented Apr 8, 2024

Interestingly, this is the token whose entry is inconsistent across the tokenizer.json files in Cohere's repos for the fp16 model and their 4-bit bnb one.

See: huggingface/transformers#30027

Basically, in the "original" / fp16 model repo, token id 255001 (<|END_OF_TURN_TOKEN|>) has special set to False. In the 4-bit bnb one, it's True.
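If anyone else wants to confirm this locally, the flag lives in the added_tokens section of each tokenizer.json. A quick sketch (the two file paths are placeholders for wherever you downloaded the repos):

```python
# Compare the `special` flag for <|END_OF_TURN_TOKEN|> across two tokenizer.json files.
import json

for path in ["fp16/tokenizer.json", "4bit-bnb/tokenizer.json"]:
    with open(path) as f:
        data = json.load(f)
    for entry in data.get("added_tokens", []):
        if entry["content"] == "<|END_OF_TURN_TOKEN|>":
            print(path, "-> id:", entry["id"], "special:", entry["special"])
```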


araleza commented Apr 8, 2024

I used the '-ptc 1' debugging option to confirm that the input prompt's <|END_OF_TURN_TOKEN|> becomes a cluster of 12 tokens:

Without <|END_OF_TURN_TOKEN|> in prompt (4 tokens, presumably BOS,1,2,3):

[screenshot of the token dump]

With <|END_OF_TURN_TOKEN|> in prompt (16 tokens now, rather than BOS, EOT, 1, 2, 3):

[screenshot of the token dump]
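For reference, the same split-vs-single-token behaviour can be reproduced programmatically. A rough sketch, assuming llama-cpp-python and its tokenize(..., special=...) flag, with a placeholder GGUF path:

```python
# Rough reproduction with llama-cpp-python (assumed installed; path is a placeholder).
from llama_cpp import Llama

# vocab_only avoids loading the full weights; we only need the tokenizer.
llm = Llama(model_path="command-r-plus.Q4_K_M.gguf", vocab_only=True)

prompt = b"<|END_OF_TURN_TOKEN|>Hi"
print(len(llm.tokenize(prompt, special=False)))  # marker split into many text tokens
print(len(llm.tokenize(prompt, special=True)))   # marker parsed as one special token
```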


fblissjr commented Apr 8, 2024

I'm pretty certain something is broken in the tokenizer.json. Running the fp16 tokenizer.json fixed my random Russian output, but replaced it with random Persian. Using the 4-bit bnb one produced the same random Russian.

I'm using MLX on macOS since I don't have the VRAM to confirm this on CUDA, but I'd be curious to see what's happening there.


araleza commented Apr 8, 2024

Looks like this is already a known problem, captured by this existing issue:

#6391

@araleza araleza closed this as completed Apr 8, 2024
@araleza araleza reopened this Apr 8, 2024

araleza commented Apr 8, 2024

Actually, looking in more detail at that linked issue, it only covers models that break tokens like <|im_end|> into pieces instead of having a single token for them. This issue is more or less the opposite: the model does have a single token, but main isn't able to recognize it in the interactive input stream.


fblissjr commented Apr 8, 2024

Yeah, I can't figure it out. But I've burned enough steam on this and I'm no longer at my workstation - if you figure it out, lemme know. :)


araleza commented Apr 10, 2024

As a workaround, I found that using llama.cpp's ./server instead of ./main (based on a tip from @Jeximo) allows me to interact with the model, and those tokens work okay there:

[screenshot of the server web UI conversation]

(I pressed Start just after <|CHATBOT_TOKEN|>.) The model stopped by itself after replying to my question, with no text-based repetition of any special tokens.

With the correct tokens in place, the model's IQ jumps way up compared to when they were just being interpreted as text strings.
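For anyone who'd rather script this than use the web UI, here's a minimal sketch against the server's /completion endpoint. It assumes ./server is already running on the default port 8080; adjust the host/port and prompt as needed:

```python
# Send the Command R+ style prompt to a running llama.cpp server instance.
import json
import urllib.request

prompt = (
    "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>"
    "Hi, can you tell me the capital of Bulgaria please"
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({"prompt": prompt, "n_predict": 128}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```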


Jeximo commented Apr 10, 2024

> The model stopped by itself after replying to my question, with no text-based repetition of any special tokens.

Good. The server readme shows that --chat-template defaults to the template within a model's metadata.

It may not be necessary to hardcode the Cmd R+ template, as it appears to work as expected. If someone decides to hardcode it into --chat-template, then see here: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template#how-to-add-a-new-template
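As a side note, if anyone wants to see exactly what that template expands to, here's a hedged sketch using the Hugging Face tokenizer's apply_chat_template. The CohereForAI/c4ai-command-r-plus repo name is an assumption:

```python
# Render the chat template shipped in the model's tokenizer config (assumed repo).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
messages = [{"role": "user", "content": "Hi, can you tell me the capital of Bulgaria please"}]

rendered = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(rendered)  # should contain the <|START_OF_TURN_TOKEN|>... markers shown above
```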

Otherwise, I'm glad it's working for server.


This issue was closed because it has been inactive for 14 days since being marked as stale.
