Support for chat templates #3810

Tostino · 2023-10-27T04:25:15Z

Prerequisites

Please answer the following questions for yourself before submitting an issue.

I am running the latest code. Development is very rapid so there are no tagged versions as of now.
I carefully followed the README.md.
I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
I reviewed the Discussions, and have a new bug or useful enhancement to share.

Feature Description

Add support for HF chat templates: https://huggingface.co/docs/transformers/chat_templating

Motivation

Provide a safe and consistent way to interact with the model through an OpenAI style API using the integrated jinja2 template in the tokenizer.
Ensures that consistency is maintained from training to inference for optimal model performance.

Tostino · 2023-10-28T03:41:08Z

Just wanted to point to the pull request that I created for vLLM to support chat templates: vllm-project/vllm#1493

Dampfinchen · 2023-10-31T16:38:04Z

This needs to be added to the converter.py. I believe @KerfuffleV2 works on the converter.

KerfuffleV2 · 2023-10-31T23:48:54Z

I'm just a random person that contributed some stuff (including to the converter). Looking at this, I don't think we can really use those templates. It appears to just directly embed JavaScript: https://huggingface.co/HuggingFaceH4/zephyr-7b-beta/blob/main/tokenizer_config.json

Embedding something like that instruction format in GGUF files isn't a bad idea, but it would have to be something that doesn't include raw code like that.

Tostino · 2023-11-01T00:56:37Z

@KerfuffleV2 So the templates are not plain JavaScript. They are jinja templates. It is a subset of functionality intended to provide a safe-ish template language.

So there are limits on what they are able to do as far as code execution, though there have been certain vulnerabilities related to injection attacks and returning data stored in global variables I've read about previously.

I do appreciate you bringing this up though, it can be discussed and we can determine if this is in fact a safe way to go.

HF is trying to get this standardized, and if this isn't a safe approach, it should be stopped now rather than later.

KerfuffleV2 · 2023-11-01T07:29:56Z

They are jinja templates.

Fair enough, I stand corrected with the "raw code" comment.

HF is trying to get this standardized, and if this isn't a safe approach, it should be stopped now rather than later.

I wasn't really talking about safety. The problem is it basically could only be used for the server example, right? Since it pretty much depends on a JavaScript interpreter. It's not just a general template that can be used for instructing the model. That being the case, I personally wouldn't really include it in the file format. If something like the server example wants to read/use those templates then that would be different.

Note: I don't speak for the project, so this is just my personal opinion about it.

Tostino · 2023-11-01T09:24:17Z

Yes, it is intended for use with a "chat" endpoint. Completion endpoints don't use it (at least currently)

But that doesn't mean it has to be only the server example that implements that functionality. You could have cpp methods that take in the expected transport format for the chat endpoints pretty easily.

If there is no will to add the template to the gguf format itself, I have at least added support for a --chat-template flag that can accept a separate file containing the desired template to vLLM. Similar could be done here.

KerfuffleV2 · 2023-11-01T09:38:17Z

But that doesn't mean it has to be only the server example that implements that functionality.

As far as I know, the server example is the only thing that currently deals with web stuff. That's what I was talking about. For example, you couldn't use this type of chat template with the main example in interactive mode.

I'm probably the wrong person to look at to gauge general interest in this, I don't even use the server and pretty much only interact with llama.cpp through the commandline stuff. (Actually, I don't even use interactive mode the majority of the time.)

As far as I know, the GGUF format supports adding arbitrary keys/values so it probably wouldn't be hard to add support.

From the other pull request you linked, you seem to know Python. Why not submit a pull request to just pull in that template and add it to a key in the GGUF file? See what kind of comments/etc you get. If you need to ask questions about how convert.py or the GGUF Python stuff works, feel free to ask me. I'll help if I can, though I'm not an expert on this stuff.

Tostino · 2023-11-01T12:25:53Z

I could take a stab at it when I have some free time (which is likely a month or so out).

If others have time before I get to it, feel free. I'll mention here if I start working on it.

On another note, there seems to be a pretty full implementation of jinja templates for cpp: https://github.com/jinja2cpp/Jinja2Cpp

slaren · 2023-11-01T16:28:12Z

I think it would be good to add the jinja templates as metadata to GGUF as soon as possible, it would be a minimal change and it would allow frontends that support jinja to use them. If we don't do this, using GGUF models will be harder than HF models because users will be forced to enter the chat template manually. We can add support for it in our examples later.

Dampfinchen · 2023-11-06T12:47:16Z

I think it would be good to add the jinja templates as metadata to GGUF as soon as possible, it would be a minimal change and it would allow frontends that support jinja to use them. If we don't do this, using GGUF models will be harder than HF models because users will be forced to enter the chat template manually. We can add support for it in our examples later.

It would also help llama.cpp as a standalone tremendously, as it's not needed anymore to set the prompt template in a complicated manner, but it just can use that metadata to set the prompt template automatically.

Tostino · 2023-11-29T16:52:19Z

This is being discussed in #4216 so I am going to close this.

martindevans mentioned this issue Nov 7, 2023

Unexpected behavior in ChatSession.ChatAsync methods SciSharp/LLamaSharp#261

Open

Dampfinchen mentioned this issue Nov 8, 2023

GGUF file format specification ggerganov/ggml#302

Merged

Tostino mentioned this issue Nov 28, 2023

server : improvements and maintenance #4216

Open

10 tasks

Tostino closed this as completed Nov 29, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for chat templates #3810

Support for chat templates #3810

Tostino commented Oct 27, 2023

Tostino commented Oct 28, 2023

Dampfinchen commented Oct 31, 2023 •

edited

Loading

KerfuffleV2 commented Oct 31, 2023

Tostino commented Nov 1, 2023

KerfuffleV2 commented Nov 1, 2023

Tostino commented Nov 1, 2023

KerfuffleV2 commented Nov 1, 2023

Tostino commented Nov 1, 2023

slaren commented Nov 1, 2023

Dampfinchen commented Nov 6, 2023

Tostino commented Nov 29, 2023

Support for chat templates #3810

Support for chat templates #3810

Comments

Tostino commented Oct 27, 2023

Prerequisites

Feature Description

Motivation

Tostino commented Oct 28, 2023

Dampfinchen commented Oct 31, 2023 • edited Loading

KerfuffleV2 commented Oct 31, 2023

Tostino commented Nov 1, 2023

KerfuffleV2 commented Nov 1, 2023

Tostino commented Nov 1, 2023

KerfuffleV2 commented Nov 1, 2023

Tostino commented Nov 1, 2023

slaren commented Nov 1, 2023

Dampfinchen commented Nov 6, 2023

Tostino commented Nov 29, 2023

Dampfinchen commented Oct 31, 2023 •

edited

Loading