epic: llama.cpp params are settable via API call or model.yaml
#1151
model.yaml can handle llama.cpp params correctly
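For context, a model.yaml that carries llama.cpp-style parameters could look roughly like the sketch below; the field names (`ctx_len`, `ngl`, the sampling params) are illustrative assumptions rather than a finalized schema.

```python
import yaml  # PyYAML

# Hypothetical model.yaml exposing llama.cpp-style load and sampling params.
MODEL_YAML = """
name: llama3.1:8b-gguf
engine: llama-cpp
# load-time params
ctx_len: 8192
ngl: 33
# sampling params (overridable per request)
temperature: 0.8
top_k: 40
top_p: 0.95
"""

config = yaml.safe_load(MODEL_YAML)
print(config["ctx_len"], config["temperature"])
```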
Generally, I will break down this epic into tasks:
Out-of-scope:
- Function calling

Function calling is essentially an advanced form of prompt engineering. It involves crafting a specialized prompt that instructs the model to identify the appropriate functions and their parameters based on the input question. However, there is no universal approach to implementing this feature, as each model has undergone a unique training process, necessitating model-specific prompting strategies.
Given these challenges, it's crucial to approach the implementation of a generalized function-calling feature with caution. The goal of supporting every model and every user-defined function is likely unattainable due to the inherent variability and complexity involved. Instead, it may be more practical to focus on optimizing the feature for specific, well-defined use cases or a limited set of models. I also checked ChatGPT, Mistral, Groq, etc.: they also support function calling, but the difference is that they implement the feature only for their own models.
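To illustrate why this ends up model-specific, here is a minimal sketch of the two pieces involved: a tool-description prompt and a parser for the model's reply. The prompt wording and the expected JSON shape are assumptions for illustration; a real implementation has to follow whatever format a given model was trained on.

```python
import json

# Hypothetical, model-specific instruction template (illustrative only).
FUNCTION_CALL_PROMPT = """You have access to the following functions:
{tools}

If a function is needed, reply with ONLY a JSON object:
{{"name": "<function_name>", "arguments": {{...}}}}
Otherwise, answer normally."""

def build_prompt(tools: list[dict], question: str) -> str:
    """Render the tool-calling system prompt for one specific model family."""
    tool_block = "\n".join(json.dumps(t) for t in tools)
    return FUNCTION_CALL_PROMPT.format(tools=tool_block) + "\n\nUser: " + question

def parse_tool_call(model_output: str) -> dict | None:
    """Return {"name", "arguments"} if the model emitted a tool call, else None."""
    try:
        call = json.loads(model_output.strip())
        return call if "name" in call else None
    except json.JSONDecodeError:
        return None
```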
Lessons learned from Jan
- Sync parameters between Jan and engines. That would be great if we can apply something like …
- Template parsing should be done from cortex.cpp? We currently have to parse the model template in order to convert the Jinja template into ai_prompt, user_prompt, and system_prompt, so that engines can load it accordingly.
- Load model request should be simplified.
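For reference, one rough way to derive those three markers is to render the Jinja chat template around sentinel messages and split the result on them. A minimal sketch, assuming the `jinja2` package and a template simple enough to render without custom helpers (a heuristic, not how cortex.cpp actually does it):

```python
from jinja2 import Template

def extract_prompt_markers(chat_template: str) -> dict:
    """Render a Jinja chat template around sentinel contents and split on them
    to approximate the system/user/assistant prompt markers.
    Rough heuristic only; templates calling custom helpers need extra care."""
    rendered = Template(chat_template).render(
        messages=[
            {"role": "system", "content": "<SYS>"},
            {"role": "user", "content": "<USR>"},
            {"role": "assistant", "content": "<AST>"},
        ],
        add_generation_prompt=False,
        bos_token="",
        eos_token="",
    )
    system_prompt, rest = rendered.split("<SYS>", 1)
    user_prompt, rest = rest.split("<USR>", 1)
    ai_prompt = rest.split("<AST>", 1)[0]
    return {"system_prompt": system_prompt,
            "user_prompt": user_prompt,
            "ai_prompt": ai_prompt}
```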
Research input:
So by using a separate model.yaml we just create another wrapper around the config of the model that is already there inside either the GGUF or the Hugging Face config file. In practice, it has proven to be extremely inconvenient to use. The config of the model should be bound to the user as an entity; the model is already self-contained.
I agree - given that GGUF already has built-in configs, we should make model.yaml optional. However:
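Whatever the caveats turn out to be, the optional-model.yaml idea boils down to a layered parameter lookup: GGUF metadata provides defaults, an optional model.yaml overrides them, and per-request values override both. A minimal sketch of that precedence, using hypothetical parameter names and a hypothetical `resolve_params` helper (not the actual cortex.cpp implementation):

```python
def resolve_params(gguf_defaults: dict,
                   model_yaml: dict | None,
                   request_overrides: dict | None) -> dict:
    """Merge llama.cpp-style params with increasing precedence:
    GGUF metadata < model.yaml (optional) < per-request overrides."""
    params: dict = dict(gguf_defaults)
    for layer in (model_yaml or {}, request_overrides or {}):
        params.update({k: v for k, v in layer.items() if v is not None})
    return params

# Example: model.yaml is absent, the request only changes temperature.
print(resolve_params(
    {"ctx_len": 8192, "ngl": 33, "temperature": 0.8},
    None,
    {"temperature": 0.2},
))  # -> {'ctx_len': 8192, 'ngl': 33, 'temperature': 0.2}
```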
@nguyenhoangthuan99 My focus for now is to catch up to llama.cpp and ensure a stable product - we can explore upstream improvements later on.
@nguyenhoangthuan99 @louis-jan I agree. Let's scope this to supporting per-model function calling:
We can do this for llama3.1 first, and use it as a test case to develop a framework that can be generalized to other models in the future. Given the high number of llama3.1 finetunes, this may mean prioritizing the …
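Such a framework could start as little more than a per-model registry of prompt builders and output parsers, with llama3.1 as the first entry. A sketch with hypothetical names and placeholder hooks:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class FunctionCallAdapter:
    """Per-model hooks: how to describe tools and how to read calls back."""
    build_prompt: Callable[[list[dict], str], str]
    parse_call: Callable[[str], dict | None]

# Registry keyed by model family; llama3.1 is the first (and only) test case.
# Both hooks are placeholders - real ones must match each model's training format.
ADAPTERS: dict[str, FunctionCallAdapter] = {
    "llama3.1": FunctionCallAdapter(
        build_prompt=lambda tools, q: f"<tools>{tools}</tools>\n{q}",  # placeholder
        parse_call=lambda out: None,                                   # placeholder
    ),
}

def get_adapter(model_id: str) -> FunctionCallAdapter | None:
    """Pick the adapter whose key appears in the model id (covers finetune names)."""
    return next((a for k, a in ADAPTERS.items() if k in model_id.lower()), None)
```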
If the model binary fails, we won't create any model.yaml file, because we cannot use it.
Currently, we only download models based on the repo name/branch on Hugging Face; the version in the model.yaml is parsed from the GGUF file. This part may be related to @namchuai.
This PR can resolve:
Since function calling has been separated into a different issue (janhq/models#16), I'll move function calling out of this epic.
@nguyenhoangthuan99 Quick check: there's a Jan issue asking for Beam search. Do we support it?
If it's not in the llama.cpp main branch, we don't need to support it. I just want to keep up with stable for now.
This is a multi-sprint epic (including function calling), pushing to sprint 22.
Closing, merging into #295
Goal
- llama.cpp params are settable when loading a model (/v1/models/<model_id>/start)
- llama.cpp params are settable at inference time (/chat/completions)
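For illustration, the two calls might look roughly like the sketch below; the payload fields (`ctx_len`, `ngl`, sampling params), the port, and the model id are assumptions, not the documented schema.

```python
import requests

BASE = "http://localhost:39281"   # assumed local cortex server address
MODEL = "llama3.1:8b-gguf"        # hypothetical model id

# 1) Set load-time llama.cpp params when starting the model.
requests.post(f"{BASE}/v1/models/{MODEL}/start", json={
    "ctx_len": 8192,   # context window
    "ngl": 33,         # GPU layers to offload
})

# 2) Override sampling params per request at inference time.
resp = requests.post(f"{BASE}/v1/chat/completions", json={
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "top_k": 40,
})
print(resp.json())
```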
Tasklist
I am using this epic to aggregate all llama.cpp params issues, including llama3.1 function calling + tool use
- response_format and structured JSON responses jan#3785 (see the sketch after this list)
- model.yaml as optional? (i.e. depend on GGUF params)
- model.yaml #1151 (comment)
- model.yaml should be well documented with appropriate naming conventions
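As a rough illustration of the `response_format` item, an OpenAI-style request constraining the output to JSON could look like this; whether the engine accepts exactly these fields is an assumption here, and the model id is hypothetical.

```python
import json
import requests

resp = requests.post("http://localhost:39281/v1/chat/completions", json={
    "model": "llama3.1:8b-gguf",  # hypothetical model id
    "messages": [
        {"role": "system", "content": 'Reply only with a JSON object like {"city": ..., "temp_c": ...}.'},
        {"role": "user", "content": "Weather in Hanoi?"},
    ],
    # OpenAI-style flag asking the server to constrain decoding to valid JSON.
    "response_format": {"type": "json_object"},
})
print(json.loads(resp.json()["choices"][0]["message"]["content"]))
```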
Out-of-scope:
- chat_completion #1163

Related