Vicuna 1.1 / special_tokens_map.json support #1812

Closed
ghost opened this issue Jun 12, 2023 · 2 comments

ghost commented Jun 12, 2023

As far as I can tell, Vicuna 1.1 uses </s> as the separator between dialogue responses. When I tokenized it in a Python script using Transformers, it came out as the EOS token (id 2), but llama.cpp tokenizes it as an ordinary string. I got something like this instead:

main: prompt: ' </s>'
main: number of tokens in prompt = 4
     1 -> ''
  1533 -> ' </'
 29879 -> 's'
 29958 -> '>'
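
To illustrate the difference, here is a toy sketch of a pre-tokenization step that recognizes special-token strings before the remaining text goes through ordinary tokenization. It is not the actual code of llama.cpp or Transformers. The `SPECIAL_TOKENS` table and the `split_on_special_tokens` function are hypothetical names, and only the id 2 for </s> comes from the observation above.

```python
import re

# Hypothetical special-token table; id 2 for "</s>" is the EOS id reported above.
SPECIAL_TOKENS = {"</s>": 2}

def split_on_special_tokens(text, special_tokens=SPECIAL_TOKENS):
    """Split text into ("special", id) and ("text", str) pieces.

    Pieces tagged "text" would still go through the ordinary tokenizer;
    pieces tagged "special" map directly to a single token id.
    """
    if not special_tokens:
        return [("text", text)] if text else []
    # Try longer special tokens first so overlapping strings match greedily.
    ordered = sorted(special_tokens, key=len, reverse=True)
    pattern = "|".join(re.escape(tok) for tok in ordered)
    pieces = []
    pos = 0
    for match in re.finditer(pattern, text):
        if match.start() > pos:
            pieces.append(("text", text[pos:match.start()]))
        pieces.append(("special", special_tokens[match.group()]))
        pos = match.end()
    if pos < len(text):
        pieces.append(("text", text[pos:]))
    return pieces

print(split_on_special_tokens("ASSISTANT: Hello!</s>"))
```

With a step like this, </s> would become one token instead of being split into ' </', 's', and '>' as in the output above.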

I found these docs while looking for a reference on what the prompt should look like for Vicuna 1.1, to check whether </s> should appear in the prompt:
https://github.com/lm-sys/FastChat/blob/7ae721fa3c881e1e24cf181305d127a316acd463/docs/vicuna_weights_version.md#example-prompt-weight-v11

A chat between a user and an assistant.

USER: Hello!
ASSISTANT: Hello!</s>
USER: How are you?
ASSISTANT: I am good.</s>
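
A minimal sketch of assembling a prompt in the layout shown above. The function name `build_vicuna_v11_prompt` and its parameters are hypothetical, and the exact whitespace (blank line after the system text, one turn per line) is an assumption copied from the example as displayed, not from FastChat's code.

```python
def build_vicuna_v11_prompt(system, turns, eos="</s>"):
    """Build a prompt string in the layout of the example above.

    turns is a list of (user_message, assistant_reply) pairs. Pass None as
    the reply of the last pair to leave the assistant turn open for generation.
    """
    lines = [system, ""]  # system text followed by a blank line (assumed layout)
    for user_message, assistant_reply in turns:
        lines.append(f"USER: {user_message}")
        if assistant_reply is None:
            # Open turn: the model is expected to continue from here.
            lines.append("ASSISTANT:")
        else:
            # Completed turn: the reply is terminated by the EOS string.
            lines.append(f"ASSISTANT: {assistant_reply}{eos}")
    return "\n".join(lines)

prompt = build_vicuna_v11_prompt(
    "A chat between a user and an assistant.",
    [("Hello!", "Hello!"), ("How are you?", None)],
)
print(prompt)
```

Note that the </s> in the resulting string only helps if the tokenizer maps it to the EOS token rather than to ordinary text.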

At the end, those docs mention a special_tokens_map.json file containing something like the following, but convert.py doesn't seem to use it:

  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
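
As a rough sketch of what reading that file could look like (this is not what convert.py does): the helper below, with the hypothetical name `load_special_token_strings`, extracts the token string for each entry. It assumes that an entry is either a plain string or an object with a "content" field like the one above; any other shape is skipped.

```python
import json

def load_special_token_strings(json_text):
    """Map special-token names (e.g. "eos_token") to their string content."""
    data = json.loads(json_text)
    tokens = {}
    for name, value in data.items():
        if isinstance(value, str):
            # Entry given directly as a string (assumed possible form).
            tokens[name] = value
        elif isinstance(value, dict) and "content" in value:
            # Entry given as an object, as in the fragment above.
            tokens[name] = value["content"]
        # Other shapes (lists, nulls, ...) are ignored in this sketch.
    return tokens

example = """
{
  "eos_token": {
    "content": "</s>",
    "lstrip": false,
    "normalized": true,
    "rstrip": false,
    "single_word": false
  }
}
"""
print(load_special_token_strings(example))
```

A converter could then look up each string in the vocabulary to find the matching token id.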

@JWNoctis

Related to #1501.

This issue was closed because it has been inactive for 14 days since being marked as stale.
