
CHAT_TEMPLATE_CACHE keys are not what I expect #976

Open
afg1 opened this issue Aug 5, 2024 · 1 comment


afg1 commented Aug 5, 2024

The bug
When trying to load a Llama.cpp model with a chat template, I get an error saying the given template cannot be loaded. When I investigated a bit, I saw that the keys in the CHAT_TEMPLATE_CACHE dictionary are the template strings themselves, which does not seem right to me.

To Reproduce
I think this should work, and should not give an error:

model = LlamaCpp(model=model_path, echo=False, n_gpu_layers=-1, n_ctx=0, chat_template="ChatML")

However, I get this warning: UserWarning: Chat template ChatML was unable to be loaded directly into guidance. The model then loads the default ChatML template anyway.

When I look in the template cache, I see key:value pairs that look like this:

"{%formessageinmessages%}{{'<|im_start|>'+message['role']+'\n'+message['content']+'<|im_end|>'+'\n'}}{%endfor%}": <class 'guidance.chat.ChatMLTemplate'>

When I tried changing the key used in the cache to "ChatML", the call above worked without error.
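Concretely, the workaround amounts to something like this (illustrative only, and not what I'm proposing as the actual fix; it assumes CHAT_TEMPLATE_CACHE and ChatMLTemplate are both importable from guidance.chat):

```python
from guidance.chat import CHAT_TEMPLATE_CACHE, ChatMLTemplate

# With a name-based key in place, chat_template="ChatML" resolves without the warning.
CHAT_TEMPLATE_CACHE["ChatML"] = ChatMLTemplate
```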

So, unless I'm missing something, I think the keys to the chat template cache should be the names of the templates, not the templates themselves. It's a tiny change, but I'm happy to make it and submit a PR if it's useful.

System info (please complete the following information):
macOS Sonoma 14.5
Guidance version 0.1.15

@Harsha-Nori
Collaborator

Hey @afg1,

Good suggestion! We originally started down this path with the primary keys here, but quickly found that even "standard" chat templates like ChatML actually come in many variants (typically differing in small details like the amount of whitespace). Furthermore, even within a model family, there are often frequent updates to these templates that invalidate the cache we build on the guidance side (e.g. Mistral, Phi, Llama, etc. have all made chat template updates several times between model releases).

I wonder if the best solution here is a bit of a hybrid, where we provide some duplicate aliases in the ChatTemplateCache (e.g. "ChatML" also points to one of the hand-implemented templates), but don't use them as the default for any particular model. That way people can still intuitively pass the parameter the way you described, but we still appropriately show a warning if our cache has gone stale thanks to a model update on Huggingface/Llama.cpp. Curious about your thoughts, and we would definitely welcome a PR!
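Roughly what I have in mind, as a sketch (names are illustrative; I've only shown ChatMLTemplate since that's the class from your example):

```python
from guidance.chat import CHAT_TEMPLATE_CACHE, ChatMLTemplate

# Friendly-name aliases that sit alongside the existing raw-template-string keys.
# Models would still resolve via the template string, so a stale cache still
# surfaces the warning; the aliases only help explicit chat_template="ChatML" calls.
TEMPLATE_ALIASES = {
    "ChatML": ChatMLTemplate,
    # other hand-implemented templates would get aliases here the same way
}

for name, template_cls in TEMPLATE_ALIASES.items():
    CHAT_TEMPLATE_CACHE.setdefault(name, template_cls)
```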

-Harsha
