It's not possible to enter <|END_OF_TURN_TOKEN|> when using cmd-r+ #6551

Closed
araleza opened this issue Apr 8, 2024 · 9 comments
Comments


araleza commented Apr 8, 2024

Hi, I'm trying to use interactive mode to work with the new cmd-r+. I'm currently on the llama branch that supports it.

But I'm unable to use this model properly as I cannot enter the tokens required by the template for this model. I'm supposed to pass in various tokens:

<BOS_TOKEN><|START_OF_TURN_TOKEN|><|USER_TOKEN|>{prompt}<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>{response}

But when I enter tokens like <|END_OF_TURN_TOKEN|> into the interactive prompt, they get converted into a group of plain-text tokens rather than a single special token. I believe this is the case because, if I enter this prompt:

<|START_OF_TURN_TOKEN|><|USER_TOKEN|>Hi, can you tell me the capital of Bulgaria please<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>

then the model continues:

The capital of Bulgaria is Sofia. Can I help you with anything else?<|END_

I halted inference there. You can see it's partway through writing an <|END_OF_TURN_TOKEN|> as plain text, which means it isn't being emitted as a single token. That strongly suggests it's copying the <|END_OF_TURN_TOKEN|> from my prompt, which it shouldn't be able to see as multiple tokens if the marker had been passed to the model as one special token.

(By the way, I left the <BOS_TOKEN> out of my prompt, as I think llama.cpp adds that automatically for me, although I'm not certain about that. It's not relevant to this issue.)
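For anyone who wants to double-check that the marker really is a single token in the vocabulary, here's a minimal sketch using the Hugging Face tokenizer. The CohereForAI/c4ai-command-r-plus repo name and the transformers dependency are assumptions; any copy of the Command R+ tokenizer should behave the same:

```python
# Minimal check of the Command R+ special tokens (assumes a recent `transformers`
# and access to the CohereForAI/c4ai-command-r-plus tokenizer on Hugging Face).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")

# If the template markers are real special tokens, each should map to one id.
eot_id = tok.convert_tokens_to_ids("<|END_OF_TURN_TOKEN|>")
print("id for <|END_OF_TURN_TOKEN|>:", eot_id)

# Tokenizing the marker as part of ordinary text should also yield a single id
# when special-token parsing is enabled (the behaviour main seems to be missing).
ids = tok("<|END_OF_TURN_TOKEN|>", add_special_tokens=False).input_ids
print("token count when embedded in text:", len(ids))  # expect 1
```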


fblissjr commented Apr 8, 2024

Interestingly, this is the token whose entry is inconsistent across the tokenizer.json files in Cohere's repos for the fp16 model and their 4-bit bnb one.

See: huggingface/transformers#30027

Basically, in the "original" / fp16 model repo, token id 255001 (<|END_OF_TURN_TOKEN|>) has special set to False. In the 4-bit bnb one, it's True.
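If anyone else wants to confirm this locally, the flag lives in the added_tokens section of each tokenizer.json. A quick sketch (the two file paths are placeholders for wherever you downloaded the repos):

```python
# Compare the `special` flag for <|END_OF_TURN_TOKEN|> across two tokenizer.json files.
import json

for path in ["fp16/tokenizer.json", "4bit-bnb/tokenizer.json"]:
    with open(path) as f:
        data = json.load(f)
    for entry in data.get("added_tokens", []):
        if entry["content"] == "<|END_OF_TURN_TOKEN|>":
            print(path, "-> id:", entry["id"], "special:", entry["special"])
```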


araleza commented Apr 8, 2024

I used the '-ptc 1' debugging option to confirm that the input prompt's <|END_OF_TURN_TOKEN|> becomes a cluster of 12 tokens:

Without <|END_OF_TURN_TOKEN|> in prompt (4 tokens, presumably BOS,1,2,3):

[screenshot of the token dump]

With <|END_OF_TURN_TOKEN|> in prompt (16 tokens now, rather than BOS, EOT, 1, 2, 3):

[screenshot of the token dump]
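For reference, the same split-vs-single-token behaviour can be reproduced programmatically. A rough sketch, assuming llama-cpp-python and its tokenize(..., special=...) flag, with a placeholder GGUF path:

```python
# Rough reproduction with llama-cpp-python (assumed installed; path is a placeholder).
from llama_cpp import Llama

# vocab_only avoids loading the full weights; we only need the tokenizer.
llm = Llama(model_path="command-r-plus.Q4_K_M.gguf", vocab_only=True)

prompt = b"<|END_OF_TURN_TOKEN|>Hi"
print(len(llm.tokenize(prompt, special=False)))  # marker split into many text tokens
print(len(llm.tokenize(prompt, special=True)))   # marker parsed as one special token
```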


fblissjr commented Apr 8, 2024

I'm pretty certain something is broken in the tokenizer.json. Running the fp16 tokenizer.json fixed my random Russian output, but replaced it with random Persian. Using the 4-bit bnb one produced the same random Russian.

I'm using MLX on macOS since I don't have the VRAM to confirm this on CUDA, but I'd be curious to see what's happening there.


araleza commented Apr 8, 2024

Looks like this is already a known problem, captured by this existing issue:

#6391

@araleza araleza closed this as completed Apr 8, 2024
@araleza araleza reopened this Apr 8, 2024

araleza commented Apr 8, 2024

Actually, looking in more detail at that linked issue, it only covers models that break tokens like <|im_end|> into pieces instead of having a single token for them. This issue is more or less the opposite: the model does have a single token, but main isn't able to recognize it in the interactive input stream.


fblissjr commented Apr 8, 2024

Yeah, I can't figure it out. But I've burned enough steam on this and I'm no longer at my workstation - if you figure it out, lemme know. :)


araleza commented Apr 10, 2024

As a workaround, I found that using llama.cpp's ./server instead of ./main (based on a tip from @Jeximo) allows me to interact with the model, and those tokens work okay there:

[screenshot of the server web UI conversation]

(I pressed Start just after <|CHATBOT_TOKEN|>.) The model stopped by itself after replying to my question, with no text-based repetition of any special tokens.

With the correct tokens in place, the model's IQ jumps way up compared to when they were just being interpreted as text strings.
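For anyone who'd rather script this than use the web UI, here's a minimal sketch against the server's /completion endpoint. It assumes ./server is already running on the default port 8080; adjust the host/port and prompt as needed:

```python
# Send the Command R+ style prompt to a running llama.cpp server instance.
import json
import urllib.request

prompt = (
    "<|START_OF_TURN_TOKEN|><|USER_TOKEN|>"
    "Hi, can you tell me the capital of Bulgaria please"
    "<|END_OF_TURN_TOKEN|><|START_OF_TURN_TOKEN|><|CHATBOT_TOKEN|>"
)

req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps({"prompt": prompt, "n_predict": 128}).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
```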


Jeximo commented Apr 10, 2024

> The model stopped by itself after replying to my question, with no text-based repetition of any special tokens.

Good. The server readme shows that --chat-template defaults to the template within a model's metadata.

It may not be necessary to hardcode the Cmd R+ template, as it appears to work as expected. If someone decides to hardcode it into --chat-template, then see here: https://github.com/ggerganov/llama.cpp/wiki/Templates-supported-by-llama_chat_apply_template#how-to-add-a-new-template
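As a side note, if anyone wants to see exactly what that template expands to, here's a hedged sketch using the Hugging Face tokenizer's apply_chat_template. The CohereForAI/c4ai-command-r-plus repo name is an assumption:

```python
# Render the chat template shipped in the model's tokenizer config (assumed repo).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("CohereForAI/c4ai-command-r-plus")
messages = [{"role": "user", "content": "Hi, can you tell me the capital of Bulgaria please"}]

rendered = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(rendered)  # should contain the <|START_OF_TURN_TOKEN|>... markers shown above
```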

Otherwise, I'm glad it's working for server.


This issue was closed because it has been inactive for 14 days since being marked as stale.
