epic: Implement new Model Folder and model.yaml #1154
Legacy model folder structure: menloresearch/jan#3541 (comment) |
Model detection should not depend on the model folder?
Model detection depends on the model folder, which would introduce performance issues since:
This means we likely depend on the manifest file as the source of truth, since it links to all available models across different folder structures.
Structures that have not worked well in the past, or that I have seen around (both are sketched below):
1. Shallow structure
All of the YAML files are placed in the root of the directory.
Pros: Fast lookup - just filter the YAML files in the root folder to list models.
Cons: Easy to duplicate; cannot handle different model families. The same name can exist for different branches/authors/engines, e.g. llama3 of cortexhub | gguf | Q4 | Q8 | onnx | thebloke.
2. One-level-deep structure
All files are placed in a per-model folder.
Pros: Easy to manage model by model.
Cons: Slow list iteration; the app has to loop through every single folder and check whether the model file exists, which means many FS operations.
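For illustration, a minimal sketch of the two layouts above (model names are hypothetical, not part of the proposal):

# 1. Shallow structure
models/
  llama3-8b-q4.yaml
  llama3-8b-q8.yaml
  mistral-7b.yaml

# 2. One-level-deep structure
models/
  llama3-8b/
    model.yaml
    model.gguf
  mistral-7b/
    model.yaml
    model.gguf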
Getting our filesystem hierarchy less wrong with these 3 following principles:
Principle 1: The Single-Question Principle
Principle 2: The Domain Principle
Principle 3: The Depth Principle
The structures would be similar to this:
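As a rough sketch only - the domain and folder names below are my assumptions, not the agreed layout - a hierarchy following those principles could group models by source domain, then author/repo:

models/
  huggingface.co/
    bartowski/
      Mixtral-8x22B-v0.1/
        model.yaml
  cortexhub/
    llama3/
      gguf/
        model.yaml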
OR
ALTERNATIVE PATH: I'm still thinking about another path that could address a couple of problems that arise from the structure above where:
Inspired by the PPA repositories list mechanism, this approach simply puts all of the model entries into a single list file (a sources.list). But there is also a con: users cannot search or view a model's entry without opening the sources.list in an external editor.
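A minimal sketch of what that list file could contain, assuming one source URI per line (the entries below are hypothetical and reuse the URI schemes described later: models://, https://, file://):

# models.list / sources.list (illustrative)
models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
https://example.com/mirrors/llama3-8b-q4.gguf
file:///home/user/Downloads/imported-model/model.gguf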
Design model.yaml structure
To me, a clean and functionality-focused model.yaml should follow these principles:
1. Build for Functionality, Not Decoration
Its primary role is to allow users to configure advanced model and inference settings, giving them the ability to control and fine-tune how the app interacts with the model. For instance, in cases where legacy models lack certain parameter metadata, maintainers or users can easily edit and update the configuration.
2. Model Configuration, Not App Caching or Storage
It is not intended for managing app caching, storage, or persistence layers. All fields must be relevant to controlling the model's interaction and performance.
3. Unified Structure for Public Sharing and Best Practices
This structure encourages publishing and sharing model configuration settings for various use cases, creating a community-driven trend where the best configurations for different tasks and models are easily accessible.
The model.yaml would be similar to this:

# BEGIN GENERAL GGUF METADATA
model: gemma-2-9b-it-Q8_0 # Model ID which is used for request construct - should be unique between models (author / quantization)
name: Llama 3.1 # metadata.general.name
version: 1 # metadata.version
sources: # can be universal protocol (models://) OR absolute local file path (file://) OR https remote URL (https://)
- models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
- models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
# END GENERAL GGUF METADATA
# BEGIN INFERENCE PARAMETERS
# BEGIN REQUIRED
stop: # tokenizer.ggml.eos_token_id
- <|end_of_text|>
- <|eot_id|>
- <|eom_id|>
# END REQUIRED
# BEGIN OPTIONAL
stream: true # Default true?
top_p: 0.9 # Ranges: 0 to 1
temperature: 0.6 # Ranges: 0 to 1
frequency_penalty: 0 # Ranges: 0 to 1
presence_penalty: 0 # Ranges: 0 to 1
max_tokens: 8192 # Should default to the context length
# END OPTIONAL
# END INFERENCE PARAMETERS
# BEGIN MODEL LOAD PARAMETERS
# BEGIN REQUIRED
prompt_template: |+ # tokenizer.chat_template
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
# END REQUIRED
# BEGIN OPTIONAL
ctx_len: 0 # llama.context_length | 0 or undefined = loaded from model
ngl: 33 # Undefined = loaded from model
# END OPTIONAL
# END MODEL LOAD PARAMETERS

As described before, other fields like ... We should use the term ...
The model sources/files question is a messy one: the program does not know whether the model has been downloaded, what the correct downloaded path is, or what the remote path is (to redownload). So I would really like to use this universal source protocol, where:
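A sketch of how a single universal source entry could resolve - the remote/local mapping below is my assumption of the idea, not a finalized spec:

# Illustrative resolution of one models:// source (paths are hypothetical)
source: models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
remote: https://huggingface.co/bartowski/Mixtral-8x22B-v0.1/resolve/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
local: <models-dir>/bartowski/Mixtral-8x22B-v0.1/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf # only exists once downloaded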
cc @dan-homebrew @0xSage |
Model Folder
I like the approach recommended by
I would like to brainstorm a few simplification ideas:
Suggestion 1: "pull name" as folder name
I wonder if this is more user-recognizable vs. multiple nested folders.
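If I read this right, using the pull name directly as the folder name would look roughly like this (folder name hypothetical; note the concern about the colon raised further down):

models/
  llama3:7b-tensorrt-llm/   # pull name used verbatim as a single folder
    model.yaml
    model.gguf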
Suggestion 2: models.list
A lot of how effective this will be depends on
Suggestion 3:
|
I like suggestion 1. Most flexible, i.e. model binaries can be anywhere. |
|
We did try this approach before with Cortex, but the colon |
Built-in Model Library
I see. In that case, can we consider just having a 2-deep file structure?
Huggingface Repos
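One possible reading of a 2-deep layout covering both cases - the specific folder names below are only an assumption for illustration:

models/
  llama3/                  # Built-in Model Library: model / branch (hypothetical)
    7b-tensorrt-llm/
      model.yaml
  bartowski/               # Huggingface: author / repo (hypothetical)
    llama3-gguf/
      model.yaml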
|
Love it! |
I'll summarize the implementation for the model folder and model.yaml and break it into tasks.
model.list content:
Model.yaml changed
Tasks:
I'll create subtasks corresponding to the above tasks. cc @dan-homebrew @0xSage @vansangpfiev @namchuai @louis-jan |
Questions / edge cases:
|
@dan-homebrew @nguyenhoangthuan99 I just read back through the comments. This is the one thing we should NOT do: hack paths together, e.g. bartowski_Mixtral-8x22B-v0.1
We introduced models.list precisely so we do NOT have to worry about nested levels, e.g. when importing models from other applications.
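For instance, an imported model could stay wherever the other application put it, with models.list simply pointing at the file (the path below is hypothetical):

file:///home/user/jan/models/llama3-8b/model.gguf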
|
I'm coming up with a solution like Docker's: when you start a container, the container ID is a UUID, just like
@louis-jan Yeah, I think you are right. I think a central problem is that our ability to pull from different sources leads to different model folder formats:
We should bear in mind that Cortex's Built-in Model Library may be mirrored across several hosts in the future (e.g. not just huggingface). This leads to a format more similar to @louis-jan's original proposal. Or is there a more generalizable way to deal with this?
EDIT: After giving it more thought, I think I can more clearly articulate that we are solving for two problems:
For Huggingface:
For Cortex Model Repo:
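To make the contrast concrete, a purely hypothetical side-by-side of the two folder formats (not a decision, just my reading of the problem):

models/
  bartowski/Mixtral-8x22B-v0.1/   # Huggingface: author / repo
    model.yaml
  llama3/7b-tensorrt-llm/         # Cortex Model Repo: model / branch
    model.yaml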
|
@louis-jan @nguyenhoangthuan99 Additionally, for
|
@nguyenhoangthuan99 I am shifting this to @vansangpfiev and tracking Tasklist items, just to keep a big-picture view of progress. |
Goal
- … (janhq/llama3:7b-tensorrt-llm)
- … (bartowski/llama3-gguf)
- How does model.yaml work?
Tasklist
Decisions
Bugs
- cortex pull invalid_url creates a model folder #1270
Edge Cases
- How does cortex model update <model> work?