
CHAT_TEMPLATE_CACHE keys are not what I expect #976

Open
afg1 opened this issue Aug 5, 2024 · 1 comment


afg1 commented Aug 5, 2024

The bug
When trying to load a Llama.cpp model with a chat template, I get an error saying the given template cannot be loaded. When I investigated a bit, I saw that the keys in the CHAT_TEMPLATE_CACHE dictionary are the template strings themselves, which does not seem right to me.

To Reproduce
I think this should work, and should not give an error:

model = LlamaCpp(model=model_path, echo=False, n_gpu_layers=-1, n_ctx=0, chat_template="ChatML")

However, I get this warning: UserWarning: Chat template ChatML was unable to be loaded directly into guidance. The model then loads the default ChatML template anyway.

When I look in the template cache, I see key:value pairs that look like this:

"{%formessageinmessages%}{{'<|im_start|>'+message['role']+'\n'+message['content']+'<|im_end|>'+'\n'}}{%endfor%}": <class 'guidance.chat.ChatMLTemplate'>

When I tried changing the key used in the cache to "ChatML", the call above worked without error.
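Concretely, the workaround amounts to something like this (illustrative only, and not what I'm proposing as the actual fix; it assumes CHAT_TEMPLATE_CACHE and ChatMLTemplate are both importable from guidance.chat):

```python
from guidance.chat import CHAT_TEMPLATE_CACHE, ChatMLTemplate

# With a name-based key in place, chat_template="ChatML" resolves without the warning.
CHAT_TEMPLATE_CACHE["ChatML"] = ChatMLTemplate
```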

So, unless I'm missing something, I think the keys to the chat template cache should be the names of the templates, not the templates themselves. It's a tiny change, but I'm happy to make it and submit a PR if it's useful.

System info (please complete the following information):
macOS Sonoma 14.5
Guidance version 0.1.15

@Harsha-Nori
Collaborator

Hey @afg1,

Good suggestion! We originally started down this path with the primary keys here, but quickly found that even "standard" chat templates like ChatML actually come in many variants (typically differing in small details like the amount of whitespace). Furthermore, even within a model family, there are often frequent updates to these templates that invalidate the cache we build on the guidance side (e.g. Mistral, Phi, Llama, etc. have all made chat template updates several times between model releases).

I wonder if the best solution here is a bit of a hybrid, where we provide some duplicate aliases in the ChatTemplateCache (e.g. "ChatML" also points to one of the hand-implemented templates), but don't use them as the default for any particular model. That way people can still intuitively pass the parameter the way you described, but we still appropriately show a warning if our cache has gone stale thanks to a model update on Huggingface/Llama.cpp. Curious about your thoughts, and we would definitely welcome a PR!
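Roughly what I have in mind, as a sketch (names are illustrative; I've only shown ChatMLTemplate since that's the class from your example):

```python
from guidance.chat import CHAT_TEMPLATE_CACHE, ChatMLTemplate

# Friendly-name aliases that sit alongside the existing raw-template-string keys.
# Models would still resolve via the template string, so a stale cache still
# surfaces the warning; the aliases only help explicit chat_template="ChatML" calls.
TEMPLATE_ALIASES = {
    "ChatML": ChatMLTemplate,
    # other hand-implemented templates would get aliases here the same way
}

for name, template_cls in TEMPLATE_ALIASES.items():
    CHAT_TEMPLATE_CACHE.setdefault(name, template_cls)
```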

-Harsha
