Special tokens are not rendered correctly (as empty) -- llama3 specific? #6770
Comments
I believe I'm also running into this issue using Meta-Llama-3-70B-Instruct.IQ3_XS.gguf: I'm seeing tokens being output from the model, but decoding them all returns empty strings (I let it run for a few hundred tokens). I'm not seeing this behaviour on a Meta-Llama-3-8B-Instruct.Q6_K.gguf model. Offloading to ROCm, only loading ~25 layers for the 70B.
KoboldCpp has somewhat of a fix: https://github.com/LostRuins/koboldcpp/releases/tag/v1.63
Commit: LostRuins@3170284. As far as I can tell, it will still not render the tokens, but at least stopping should work.
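Presumably that kind of fix stops on the token ID rather than on the rendered string. A minimal sketch of the same idea against the llama.cpp C API of this period (the exact `llama_tokenize` signature has changed across revisions, so treat the parameter names here as assumptions):

```cpp
// Sketch: resolve the ID of "<|im_end|>" once, then compare sampled token
// IDs against it instead of comparing decoded strings (which are empty
// for control tokens while this bug exists).
#include "llama.h"

#include <vector>

// Returns the single control-token ID for "<|im_end|>", or -1 on failure.
static llama_token im_end_id(const llama_model * model) {
    std::vector<llama_token> ids(8);
    const int n = llama_tokenize(model, "<|im_end|>", 10,
                                 ids.data(), (int) ids.size(),
                                 /*add_special =*/ false,
                                 /*parse_special=*/ true); // map "<|im_end|>" to its control token
    return n == 1 ? ids[0] : -1;
}

// Then, in the generation loop:
//     if (new_token == im_end_id(model)) break; // stop by ID, not by string
```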
There's also something wrong with the existing tokenizer (note: only tried via the llama API using LLamaSharp). I think the tokenizer integration in GGUFs could use some attention overall.
Please wait for:
Hey @phymbert -- did you check the description of the issue? I don't think anything in the issues you linked is really relevant to, or solves, this problem -- the problem being that special tokens are not rendered.
We can start rendering special tokens here (lines 17017 to 17019 in 0e4802b):
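The referenced branch of `llama_token_to_piece` is, roughly (paraphrased, not the exact lines), the no-op case for control tokens:

```cpp
// Inside llama_token_to_piece(): normal tokens get their text copied into
// the caller's buffer, but control tokens hit a no-op branch, so the
// caller receives zero bytes for them.
} else if (llama_is_control_token(model->vocab, token)) {
    ; // rendered as an empty string
}
```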
But my personal opinion is that parsing the text of special/control tokens is poor practice. AFAICT it seems to have worked so far because we have incorrectly exported tokens such as … In #6745 we will introduce …
Thanks for the PR @ggerganov, awesome. As for whether it's bad practice, that depends very much on the use case. For the use case most people deal with, which is generating an assistant response based on conversation history, I agree it's not needed -- just pass the input with the message header and stop on the end-of-message (EOM / EOT) token. There are other use cases though, where you fine-tune the model to generate multiple turns. The simplest example would be multiple messages from multiple role-play characters at once, where the message header contains the character name and possibly other metadata. Or generating multiple function-call instructions. In those cases special tokens allow you to properly parse the response, rather than rely on ad-hoc formatting.
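To make that concrete, here is a rough sketch of parsing multi-turn output by token ID; the ChatML-style start/end markers are illustrative assumptions, not anyone's actual implementation, and the `llama_token_to_piece` signature matches the API at the time of this issue:

```cpp
// Sketch: split a generated token stream into turns using the IDs of the
// ChatML-style markers, rendering only the tokens in between. The marker
// IDs are assumed to have been resolved up front via
// llama_tokenize(..., /*parse_special=*/ true).
#include "llama.h"

#include <string>
#include <vector>

struct Turn {
    std::string text; // header line + body, as produced by the model
};

static std::vector<Turn> split_turns(const llama_model * model,
                                     const std::vector<llama_token> & tokens,
                                     llama_token im_start, llama_token im_end) {
    std::vector<Turn> turns;
    Turn cur;
    bool in_turn = false;
    char buf[256];
    for (const llama_token t : tokens) {
        if (t == im_start) { cur = {}; in_turn = true; continue; }
        if (t == im_end)   { if (in_turn) turns.push_back(cur); in_turn = false; continue; }
        if (!in_turn) continue;
        const int n = llama_token_to_piece(model, t, buf, (int) sizeof(buf));
        if (n > 0) cur.text.append(buf, n);
    }
    return turns;
}
```

The point is that the turn boundaries are matched structurally, by ID, so the parse does not depend on how (or whether) the control tokens render as text.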
Hello!
Using this GGUF: https://huggingface.co/LoneStriker/opus-v1.2-llama-3-8b-GGUF
When the output contains any of the special tokens, like `<|im_start|>` or `<|im_end|>`, they are rendered as empty strings. This breaks custom stopping-string functionality (e.g. adding "<|im_end|>" to the stop strings does not work, as it relies on string comparison). The tokens are tokenized correctly, just not rendered:
I first tested this with an old commit:
And replicated with a fresh `main`:
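Not the original logs, but a minimal standalone repro sketch of the mismatch, assuming the llama.cpp C API of this period and a hypothetical local model path:

```cpp
// Repro sketch: "<|im_end|>" tokenizes to a single control token, but
// converting that token back to text yields an empty piece.
#include "llama.h"

#include <cstdio>
#include <vector>

int main() {
    llama_backend_init();
    llama_model_params mparams = llama_model_default_params();
    // hypothetical local path to the GGUF linked above
    llama_model * model = llama_load_model_from_file("opus-v1.2-llama-3-8b.gguf", mparams);
    if (model == nullptr) return 1;

    std::vector<llama_token> ids(4);
    const int n = llama_tokenize(model, "<|im_end|>", 10, ids.data(), (int) ids.size(),
                                 /*add_special =*/ false,
                                 /*parse_special=*/ true);
    printf("n_tokens = %d\n", n); // expected: 1 (a single control token)

    char buf[64];
    const int len = llama_token_to_piece(model, ids[0], buf, (int) sizeof(buf));
    printf("piece length = %d\n", len); // observed: 0 -- renders as empty

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```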