Vicuna 1.1 / special_tokens_map.json support #1812

ghost · 2023-06-12T05:53:50Z

As far as I can tell, Vicuna 1.1 uses </s> as the separator between dialogue responses. When I tried it in a Python script using Transformers, it was tokenized as the EOS token (2), but llama.cpp tokenizes it as an ordinary string. I got something like this instead:

main: prompt: ' </s>'
main: number of tokens in prompt = 4
     1 -> ''
  1533 -> ' </'
 29879 -> 's'
 29958 -> '>'
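
To illustrate the behavior I expected, here is a minimal sketch of special-token-aware splitting: the text is first cut at known special-token strings, those pieces get their reserved ids, and only the remaining text would go through the ordinary tokenizer. This is not how llama.cpp or Transformers implement it; `split_on_special_tokens` and the `special_ids` mapping are hypothetical names for illustration, and only the id 2 for </s> comes from what I observed above.

```python
import re

def split_on_special_tokens(text, special_ids):
    """Split text into (piece, token_id) pairs.

    token_id is the reserved id for a special token, or None for ordinary
    text that still has to go through the regular tokenizer.
    """
    if not special_ids:
        return [(text, None)] if text else []
    # Longest strings first so overlapping special tokens match greedily.
    alternatives = sorted(special_ids, key=len, reverse=True)
    pattern = "(" + "|".join(re.escape(tok) for tok in alternatives) + ")"
    pieces = []
    for chunk in re.split(pattern, text):
        if chunk == "":
            continue  # re.split emits empty strings at the boundaries
        pieces.append((chunk, special_ids.get(chunk)))
    return pieces

# Assumption: id 2 for "</s>", as seen with Transformers above.
pieces = split_on_special_tokens("ASSISTANT: Hello!</s>USER: Hi", {"</s>": 2})
```

With this approach, </s> would come out as a single piece carrying id 2 instead of being broken into ' </', 's', '>'.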

I found these docs when looking for a reference for what the prompt should look like for Vicuna 1.1, to check whether </s> should appear in the prompt:
https://github.com/lm-sys/FastChat/blob/7ae721fa3c881e1e24cf181305d127a316acd463/docs/vicuna_weights_version.md#example-prompt-weight-v11

A chat between a user and an assistant.

USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>
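
For reference, a prompt in this shape can be assembled with a small helper. This is only a sketch based on the example above; `build_vicuna_v11_prompt` is a hypothetical name and is not part of FastChat or llama.cpp, and the handling of an unanswered final turn is my assumption.

```python
def build_vicuna_v11_prompt(system, turns, eos="</s>"):
    """Build a Vicuna 1.1 style prompt from (user, assistant) pairs.

    An assistant message of None leaves the last turn open for generation
    (assumption: the example in the docs only shows completed turns).
    """
    lines = [system, ""]  # blank line after the system sentence
    for user_msg, assistant_msg in turns:
        lines.append(f"USER: {user_msg}")
        if assistant_msg is None:
            lines.append("ASSISTANT:")
        else:
            # Each completed assistant response ends with the EOS string.
            lines.append(f"ASSISTANT: {assistant_msg}{eos}")
    return "\n".join(lines)

prompt = build_vicuna_v11_prompt(
    "A chat between a user and an assistant.",
    [("Hello!", "Hello!"), ("How are you?", "I am good.")],
)
```

The resulting string only matches the training format if the tokenizer then maps each </s> to the EOS token rather than to ordinary text pieces.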

The end of those docs mentions a special_tokens_map.json file containing something like this, but convert.py doesn't seem to use it:

  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
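
As a rough sketch of what reading that file could look like, the snippet below pulls the literal string for each special token out of the JSON. It assumes the fragment above sits inside a top-level object and that an entry is either a plain string or an object with a "content" key; `special_token_strings` is a hypothetical helper, not something convert.py provides.

```python
import json

def special_token_strings(raw_json):
    """Map special-token names (e.g. "eos_token") to their literal strings."""
    data = json.loads(raw_json)
    result = {}
    for name, value in data.items():
        if isinstance(value, dict) and "content" in value:
            result[name] = value["content"]  # object form, as in the fragment above
        elif isinstance(value, str):
            result[name] = value  # assumption: plain-string form is also possible
        # Other shapes (lists, etc.) are skipped in this sketch.
    return result

# Inline stand-in for the file contents, wrapping the fragment above.
example = """
{
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
"""
tokens = special_token_strings(example)
```

A converter could then use these strings to mark the matching vocabulary entries as special, so that they are never split during tokenization.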


JWNoctis · 2023-06-15T02:15:34Z

Related to #1501 .

github-actions · 2024-04-10T01:07:23Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

Igoorx mentioned this issue Jun 19, 2023

Improve support for special tokens #1931

Closed

mll59 mentioned this issue Aug 29, 2023

[BUG] llama.cpp special token handling SillyTavern/SillyTavern#1049

Closed

earzamastsev mentioned this issue Oct 23, 2023

Error with special tokens tokenization abetlen/llama-cpp-python#838

Closed

github-actions bot added the stale label Mar 25, 2024

github-actions bot closed this as completed Apr 10, 2024
