Support for chat templates #3810
Just wanted to point to the pull request that I created for vLLM to support chat templates: vllm-project/vllm#1493
This needs to be added to convert.py. I believe @KerfuffleV2 works on the converter.
I'm just a random person that contributed some stuff (including to the converter). Looking at this, I don't think we can really use those templates. It appears to just directly embed JavaScript: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json
Embedding something like that instruction format in GGUF files isn't a bad idea, but it would have to be something that doesn't include raw code like that.
@KerfuffleV2 So the templates are not plain JavaScript; they are Jinja templates, a subset of functionality intended to provide a safe-ish template language. There are limits on what they can do as far as code execution, though I've previously read about certain vulnerabilities related to injection attacks and to returning data stored in global variables. I do appreciate you bringing this up, though: it can be discussed, and we can determine whether this is in fact a safe way to go. HF is trying to get this standardized, and if this isn't a safe approach, it should be stopped now rather than later.
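For illustration, here's a minimal sketch of rendering such a template in a sandboxed Jinja2 environment (the approach HF transformers takes); the template string below is a simplified stand-in, not the exact Zephyr template:

```python
# Render a chat-style Jinja template in a sandboxed environment, which
# restricts attribute access and disallows arbitrary code execution.
from jinja2.sandbox import ImmutableSandboxedEnvironment

# Simplified illustration of the kind of template embedded in
# tokenizer_config.json (not the exact Zephyr template).
template_str = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}</s>\n"
    "{% endfor %}"
    "<|assistant|>\n"
)

env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
prompt = env.from_string(template_str).render(messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
])
print(prompt)
```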
Fair enough, I stand corrected on the "raw code" comment.
I wasn't really talking about safety, though. The problem is that it basically could only be used for the server example. Note: I don't speak for the project, so this is just my personal opinion about it.
Yes, it is intended for use with a "chat" endpoint; completion endpoints don't use it (at least currently). But that doesn't mean it has to be only the server example that implements that functionality. You could have C++ methods that take in the expected transport format for the chat endpoints pretty easily. If there is no will to add the template to the GGUF format itself, I have at least added a --chat-template flag to vLLM that accepts a separate file containing the desired template. Something similar could be done here.
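For reference, a hypothetical sketch of such a flag (names and structure are illustrative, not llama.cpp's actual CLI):

```python
# Hypothetical --chat-template flag that overrides the bundled template
# with one read from a file, mirroring what the vLLM change does.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--chat-template", default=None,
                    help="Path to a Jinja file overriding the model's chat template")
args = parser.parse_args()

chat_template = None
if args.chat_template is not None:
    with open(args.chat_template, "r", encoding="utf-8") as f:
        chat_template = f.read()
# Otherwise, fall back to whatever template ships with the model.
```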
I'm probably the wrong person to look to for gauging general interest in this; I don't even use the server and pretty much only interact with llama.cpp through the command-line stuff. (Actually, I don't even use interactive mode the majority of the time.) As far as I know, the GGUF format supports adding arbitrary keys/values, so it probably wouldn't be hard to add support. From the other pull request you linked, you seem to know Python. Why not submit a pull request to just pull in that template and add it to a key in the GGUF file? See what kind of comments/etc. you get. If you need to ask questions about how the converter works, feel free to ask.
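A rough sketch of what that conversion step could look like with the gguf Python package from this repo; the key name "tokenizer.chat_template" is an assumption here, and whatever key the project settles on would go in its place:

```python
# Stash the HF chat template under an arbitrary GGUF string key during
# conversion. The key name is assumed, not an established convention.
import json
from gguf import GGUFWriter

with open("tokenizer_config.json", "r", encoding="utf-8") as f:
    tokenizer_config = json.load(f)

chat_template = tokenizer_config.get("chat_template")

writer = GGUFWriter("model.gguf", arch="llama")
if chat_template is not None:
    writer.add_string("tokenizer.chat_template", chat_template)
# ... the rest of the metadata and tensors, as convert.py already writes ...
writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.close()
```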
I could take a stab at it when I have some free time (which is likely a month or so out). If others have time before I get to it, feel free; I'll mention here if I start working on it. On another note, there seems to be a fairly complete implementation of Jinja templates for C++: https://github.com/jinja2cpp/Jinja2Cpp
I think it would be good to add the Jinja templates as metadata to GGUF as soon as possible. It would be a minimal change, and it would allow frontends that support Jinja to use them. If we don't do this, using GGUF models will be harder than HF models, because users will be forced to enter the chat template manually. We can add support for it in our examples later.
It would also help llama.cpp as a standalone tremendously: the prompt template would no longer need to be set in a complicated manner, since the metadata could be used to set it automatically.
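A sketch of how a frontend could do that, assuming the template ends up stored under a "tokenizer.chat_template" string key:

```python
# Pull the chat template out of GGUF metadata and render it, so the user
# never has to enter it manually. The key name is assumed.
from gguf import GGUFReader
from jinja2.sandbox import ImmutableSandboxedEnvironment

reader = GGUFReader("model.gguf")
field = reader.fields.get("tokenizer.chat_template")
if field is not None:
    # For string fields, the last part holds the raw UTF-8 bytes.
    template_str = bytes(field.parts[-1]).decode("utf-8")
    env = ImmutableSandboxedEnvironment()
    print(env.from_string(template_str).render(
        messages=[{"role": "user", "content": "Hi!"}],
        add_generation_prompt=True,
    ))
```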
This is being discussed in #4216, so I am going to close this.
Prerequisites
Please answer the following questions for yourself before submitting an issue.
Feature Description
Add support for HF chat templates: https://huggingface.co/docs/transformers/chat_templating
Motivation
Provide a safe and consistent way to interact with the model through an OpenAI-style API, using the Jinja2 template integrated in the tokenizer. This ensures that consistency is maintained from training to inference, for optimal model performance.
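For context, this is the transformers API described at the link above; apply_chat_template renders the Jinja template embedded in the tokenizer:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")
messages = [
    {"role": "system", "content": "You are a friendly chatbot."},
    {"role": "user", "content": "Hello!"},
]
# Render the model's own template into a prompt string, with the
# generation prompt appended for the assistant's turn.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```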