epic: Implement new Model Folder and model.yaml #1154

Closed · 6 tasks done
dan-menlo opened this issue Sep 8, 2024 · 24 comments · Fixed by #1327
Labels: category: model management · P0: critical · type: epic

dan-menlo (Contributor) commented Sep 8, 2024

Goal

  • We should have a model folder that is able to handle different models
    • Built-in models (e.g. janhq/llama3:7b-tensorrt-llm)
    • Huggingface GGUF repos with multiple quants (e.g. bartowski/llama3-gguf)
    • Huggingface specific GGUF (may have multiple from same directory)
    • In future: Nvidia NGC or TensorRT Cloud
  • Do we use sub-folders?
  • How does model.yaml work?
  • Model detection should not depend on model folder

Tasklist

Decisions

Bugs

Edge Cases

@dan-menlo dan-menlo added this to Menlo Sep 8, 2024
@dan-menlo dan-menlo converted this from a draft issue Sep 8, 2024
@dan-menlo dan-menlo added the type: epic A major feature or initiative label Sep 8, 2024
@dan-menlo dan-menlo assigned namchuai and unassigned vansangpfiev Sep 8, 2024
@dan-menlo dan-menlo changed the title epic: Model Folder finalize structure epic: Finalize how Model Folder and model.yaml works Sep 8, 2024
@dan-menlo dan-menlo assigned louis-menlo and unassigned namchuai Sep 8, 2024
@dan-menlo dan-menlo moved this to Scheduled in Menlo Sep 8, 2024
@freelerobot freelerobot added category: model management Model pull, yaml, model state P0: critical Mission critical labels Sep 9, 2024
freelerobot (Contributor) commented:

Legacy model folder structure: menloresearch/jan#3541 (comment)

louis-menlo (Contributor) commented Sep 10, 2024

Model detection should not depend on model folder?

If model detection depends on the model folder, it introduces performance issues, since:

  • Every time the app loads, it needs to scan through the model folder hierarchy.
  • Filesystem watching (to notify the app of changes) is costly.

This means we should likely depend on the manifest file as the source of truth, since it links to all available models regardless of folder structure.

  • This introduces a watchdog that periodically scans the folder, so a sync delay may occur.
  • Everything just works with references or symlinks.

Structures that have not worked well in the past, or that I have seen elsewhere:

1. Shallow structure

All of the YAML files are placed in the root of the directory.

Pros: Fast lookup - just filter out YAML files from the root folder to list models.

Cons: Easy to duplicate, and it cannot handle different model families: the same name can collide across branches/authors/engines. It also gets slower over time, since n models means 2n items in the root, and removing a model takes 2 rm operations.

E.g. llama3 could equally come from cortexhub or thebloke, as gguf or onnx, in Q4 or Q8.


/models
    /[model1]
       /[model1].bin
       /[model1].gguf
    /[model1].yaml | json

2. One-level deep structure

All files are placed in a model folder.

Pros: Easy to manage model by model, 1 rm operation can remove the entire model folder.

Cons: Slow list iteration; the app has to loop through every single folder and check whether a model file exists, which means many FS operations.


/models
    /[model1]
       /[model].bin
       /[model].gguf
       /[model].yaml | json

Getting our filesystem hierarchy less wrong with the following 3 principles:

Principle 1: The Single-Question Principle
At each level of the hierarchy, strive to make all folder names answer the same question.

Principle 2: The Domain Principle
Organize files in different domains differently.

Principle 3: The Depth Principle
Prefer deep hierarchies over shallow ones.

The structures would be similar to this:

/models
  ├── manifest.yaml
  ├── /metadatas
  │   ├── llama3.1-7B_Q4_KM.yaml (How to generate a file name that is unique across models?)
  │   └── mistral-7B_Q4_KM.yaml
  └── /sources
      ├── /huggingface
      │   ├── /cortexso
      │   │   ├── /llama3-1
      │   │   │   ├── /gguf
      │   │   │   │   ├── /main
      │   │   │   │   │   └── llama3.1_Q4_KM.gguf
      │   │   │   │   └── /7b
      │   │   │   │       ├── llama3.1_Q4_KM.gguf
      │   │   │   │       └── llama3.1_Q8_KM.gguf
      │   │   │   ├── /onnx
      │   │   │   │   └── /7b
      │   │   │   │       ├── llama3.1.onnx
      │   │   │   │       ├── tokenizer.json
      │   │   │   │       └── gen_config.json
      │   │   │   └── /tensorrt-llm
      │   │   │       └── /7b
      │   │   │           ├── rank0.engine
      │   │   │           ├── tokenizer.model
      │   │   │           └── config.json
      │   │   └── /phi-3
      │   │       └── /onnx
      │   └── /bartowski
      │       └── /Mixtral-8x22B-v0.1
      │           └── /gguf
      │               └── /main
      │                   ├── Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
      │                   └── Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
      └── /nvidia-ngc
          └── /llama3-1

OR

/models
  ├── manifest.yaml
  ├── /modelfiles
  │   ├── llama3.1-7B_Q4_KM.yaml
  │   └── mistral-7B_Q4_KM.yaml
  └── /sources
      ├── /huggingface
      │   ├── /cortexso
      │   │   ├── /llama3-1
      │   │   │   ├── /gguf
      │   │   │   │   ├── /main
      │   │   │   │   │   └── llama3.1_Q4_KM.gguf
      │   │   │   │   └── /7b
      │   │   │   │       ├── llama3.1_Q4_KM.gguf
      │   │   │   │       └── llama3.1_Q8_KM.gguf
      │   │   │   ├── /onnx
      │   │   │   │   └── /7b
      │   │   │   │       ├── llama3.1.onnx
      │   │   │   │       ├── tokenizer.json
      │   │   │   │       └── gen_config.json
      │   │   │   └── /tensorrt-llm
      │   │   │       └── /7b
      │   │   │           ├── rank0.engine
      │   │   │           ├── tokenizer.model
      │   │   │           └── config.json
      │   │   └── /phi-3
      │   │       └── /onnx
      │   └── /bartowski
      │       └── /Mixtral-8x22B-v0.1
      │           └── /gguf
      │               └── /main
      │                   ├── Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
      │                   └── Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
      └── /nvidia-ngc
          └── /llama3-1
  1. At each individual level of the sources hierarchy, all options are different responses to the same question: What hub is the source? What repo is in the hub? What model types are supported in the repo? What branch is the model pulled from? What model and what quantization are pulled?
  2. model.yaml files are flattened in the metadatas folder for quick search, so users can easily find the one they want to edit, which also boosts the performance of model listing. The filename is a normalized form of model_id.
  3. From the sources folder hierarchy, we can determine the author and format | engine, so we can drop model.yaml's redundant fields. The engine should not be in model.yaml, since it is application-level rather than model-related and cannot be reused across applications.
  4. Files are organized differently in different domains (metadatas / sources).
  5. Everything is a symlink: from [model].yaml we can retrieve the source hierarchy.
  6. The manifest is for caching (optional), which can improve UX and boost performance. It lets us avoid putting computed fields (such as decorations, sorting order - drag and drop later, or sorting results) in model.yaml (e.g. size, quantization). These fields are not essential when constructing or modifying model.yaml, and they increase the risk of errors. Since they can be retrieved from the source files, we only need to cache them when populating the model (see the sketch after this list).
  7. Unified model URL - determining whether a model is downloaded was messy when local and remote URLs were separate (file:// vs https://). With this model folder hierarchy we can use one universal URL for both. E.g. models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf.
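
To make point 6 concrete, here is a minimal Python sketch of the manifest-as-cache idea, assuming the metadatas/ layout and models:// source URLs shown above. The manifest field names (size_bytes, quantization), the URL-to-path mapping, and the quantization regex are illustrative assumptions, not a defined format.

# A minimal sketch of the optional manifest-as-cache idea from point 6.
# Paths and field names (size_bytes, quantization) are hypothetical;
# the real manifest.yaml format is not specified in this discussion.
from pathlib import Path
import re
import yaml  # PyYAML

MODELS_DIR = Path("models")

def build_manifest_cache(models_dir: Path = MODELS_DIR) -> dict:
    """Populate computed fields (file size, quantization) from source files
    so they never have to live inside model.yaml itself."""
    cache = {}
    for meta_file in (models_dir / "metadatas").glob("*.yaml"):
        meta = yaml.safe_load(meta_file.read_text())
        model_id = meta.get("model", meta_file.stem)
        entry = {"size_bytes": 0, "quantization": None}
        for src in meta.get("sources", []):
            # sources use the universal models:// scheme; map them onto
            # the on-disk /models/sources hierarchy shown above (assumed).
            local = models_dir / "sources" / src.removeprefix("models://")
            if local.exists():
                entry["size_bytes"] += local.stat().st_size
            quant = re.search(r"(Q\d+_[A-Z_]+|IQ\d+_[A-Z]+)", src)
            if quant:
                entry["quantization"] = quant.group(1)
        cache[model_id] = entry
    return cache

if __name__ == "__main__":
    manifest = build_manifest_cache()
    (MODELS_DIR / "manifest.yaml").write_text(yaml.safe_dump(manifest))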

ALTERNATIVE PATH:
Inspired by /etc/apt/sources.list

I'm still thinking about another path that could address a couple of problems arising from the structure above:

  • The model.yaml file name can be duplicated.
  • A unique model id should be autogenerated somehow. If it is generated from the folder path, that should be the path where model.yaml is located, not the source file, so model.yaml could not be flattened. (There might be another option that can generate a human-readable model name.)
  • A less complex structure.

Inspired by the PPA repository list mechanism, this approach simply puts all of the model file paths in a sources.list, so the app can list all of the nested model.yaml files without worrying about performance. You can also Ctrl + Click in any editor to open a model.yaml file. (Previously I found it hard to look up a model.yaml in a nested model folder.)

But there is also a con: users cannot search for or view a model's files without opening sources.list in an external editor.

/models
  ├── sources.list (aka models list: models.list)
  └── /sources
      ├── /huggingface
      │   ├── /cortexso
      │   │   ├── /llama3-1
      │   │   │   ├── /gguf
      │   │   │   │   ├── /main
      │   │   │   │   │   ├── llama3.1_Q4_KM.yaml
      │   │   │   │   │   └── llama3.1_Q4_KM.gguf
      │   │   │   │   └── /7b
      │   │   │   │       ├── llama3.1_Q4_KM.yaml
      │   │   │   │       ├── llama3.1_Q4_KM.gguf
      │   │   │   │       ├── llama3.1_Q8_KM.yaml
      │   │   │   │       └── llama3.1_Q8_KM.gguf
      │   │   │   ├── /onnx
      │   │   │   │   └── /7b
      │   │   │   │       ├── llama3.1.yaml
      │   │   │   │       ├── llama3.1.onnx
      │   │   │   │       ├── tokenizer.json
      │   │   │   │       └── gen_config.json
      │   │   │   └── /tensorrt-llm
      │   │   │       └── /7b
      │   │   │           ├── llama3.1.yaml
      │   │   │           ├── rank0.engine
      │   │   │           ├── tokenizer.model
      │   │   │           └── config.json
      │   │   └── /phi-3
      │   │       └── /onnx
      │   └── /bartowski
      │       └── /Mixtral-8x22B-v0.1
      │           └── /gguf
      │               └── /main
      │                   ├── Mixtral-8x22B-v0.1-IQ3_M.yaml
      │                   ├── Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
      │                   └── Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
      └── /nvidia-ngc
          └── /llama3-1

Design model.yaml structure

To me, a clean and functioning focused model.yaml file should follow these principles:

1. Build for Functionality, Not Decoration
The model.yaml file is built to define core functionalities of the app rather than superficial decorations.

Its primary role is to allow users to configure advanced model and inference settings, giving them the ability to control and fine-tune how the app interacts with the model. For instance, in cases where legacy models lack certain parameters metadata, maintainers or users can easily edit and update the configuration.

2. Model Configuration, Not App Caching or Storage
The file serves as a configuration file for controlling requests and managing model behaviors.

It is not intended for managing app caching, storage, or persistence layers. All fields must be relevant to controlling the model’s interaction and performance.

3. Unified Structure for Public Sharing and Best Practices
The model.yaml follows a unified structure that aims to create a standard practice among authors and developers.

This structure encourages the publishing and sharing of model configuration settings for various use cases, creating a community-driven trend where the best configurations for different tasks and models are easily accessible.

The model.yaml would be similar to this:

# BEGIN GENERAL GGUF METADATA
model: gemma-2-9b-it-Q8_0 # Model ID which is used for request construct - should be unique between models (author / quantization)
name: Llama 3.1      # metadata.general.name
version: 1           # metadata.version
sources:             # can be universal protocol (models://) OR absolute local file path (file://) OR https remote URL (https://)
  - models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
  - models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
# END GENERAL GGUF METADATA

# BEGIN INFERENCE PARAMETERS
# BEGIN REQUIRED
stop:                # tokenizer.ggml.eos_token_id
  - <|end_of_text|>
  - <|eot_id|>
  - <|eom_id|>
# END REQUIRED
# BEGIN OPTIONAL
stream: true         # Default true?
top_p: 0.9           # Ranges: 0 to 1
temperature: 0.6     # Ranges: 0 to 1
frequency_penalty: 0 # Ranges: 0 to 1
presence_penalty: 0  # Ranges: 0 to 1
max_tokens: 8192     # Should be default to context length
# END OPTIONAL
# END INFERENCE PARAMETERS

# BEGIN MODEL LOAD PARAMETERS
# BEGIN REQUIRED
prompt_template: |+  # tokenizer.chat_template
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>

  {system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

  {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
# END REQUIRED
# BEGIN OPTIONAL
ctx_len: 0          # llama.context_length | 0 or undefined = loaded from model
ngl: 33             # Undefined = loaded from model
# END OPTIONAL
# END MODEL LOAD PARAMETERS

As described before, other fields like author or engine could be determined by the model folder, or cortex.cpp can detect the model file type.

We should use the term model since it's consolidated, whereas id is quite server/system specific and not directly related to the LLM model.

The model value could be autogenerated when we run local models. It's a DTO property rather than a stored property, since it is only used to determine which model is running (really, from which folder path it is running). Setting model explicitly in model.yaml overrides that auto-generation mechanism, which is what we need for remote models, e.g. openai/gpt-3.5-turbo.

Model sources/files are a messy issue: the program does not know whether a model is downloaded, what the correct local path is, or what the remote path is (to re-download). So I would really like to use this universal source protocol:

models://[hub]/[author]/[repo]/[branch]/[file] represents a remote file that can be downloaded into the models folder. The logic is to check whether the file exists at a local path constructed from the universal path, and otherwise download it from a remote path constructed from the same universal path (sketched below).
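
A minimal Python sketch of that check-local-then-remote logic, assuming the /models/sources layout above. The URL-to-path mapping, the huggingface.co resolve URL pattern, and the function names are assumptions for illustration, not an existing cortex.cpp API.

# A sketch of resolving the proposed models:// protocol.
from pathlib import Path
from urllib.parse import urlparse

MODELS_DIR = Path("models")

def resolve(universal_url: str) -> tuple[Path, str]:
    """Split models://[hub]/[author]/[repo]/[branch]/[file] into a
    constructed local path and a constructed remote URL (both assumed)."""
    parsed = urlparse(universal_url)
    assert parsed.scheme == "models", "expected a models:// URL"
    hub = parsed.netloc                                   # e.g. huggingface
    author, repo, branch, filename = parsed.path.lstrip("/").split("/", 3)
    local_path = MODELS_DIR / "sources" / hub / author / repo / branch / filename
    if hub == "huggingface":
        # Assumed Hugging Face download URL pattern.
        remote_url = f"https://huggingface.co/{author}/{repo}/resolve/{branch}/{filename}"
    else:
        raise NotImplementedError(f"unknown hub: {hub}")
    return local_path, remote_url

def local_or_download_url(universal_url: str) -> str:
    """Return a file:// URL if the file already exists locally,
    otherwise the remote URL to download it from."""
    local_path, remote_url = resolve(universal_url)
    return local_path.resolve().as_uri() if local_path.exists() else remote_url

if __name__ == "__main__":
    print(local_or_download_url(
        "models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/"
        "Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf"))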

cc @dan-homebrew @0xSage

dan-menlo (Contributor, Author) commented Sep 11, 2024

Model Folder

I like the approach inspired by /etc/apt/sources.list, and agree with the following principles:

  • Manifest file > file system watching
  • Single question, domain, depth principles

I would like to brainstorm a few simplification ideas:

Suggestion 1: "pull name" as folder name

I wonder if this is more recognizable to users than multiple nested folders.

/models
    models.list (index)
    /llama3.1
         llama3.1.gguf
    /llama3.1:tensorrt-llm
         ...
    /huggingface.co/bartowski/llama3.1-gguf-7b
         llama3.1-7b-gguf

Suggestion 2: models.list

A lot of how effective this will be depends on the models.list format.

  • Need to articulate how that will work
  • Does it point to folders?

Suggestion 3: model.yaml is optional

We should move to a paradigm where model.yaml files are optional:

  • GGUF has its own param packaging nowadays
  • We can use model.yaml as a shorthand method for customization

Suggestion 4: model.yaml is co-located with source files

  • It is still highly beneficial for the model.yaml to be in the same folder as the source files, for packaging and proximity purposes.
  • However, we should also be agnostic to whether it's called model.yaml or <model_id>.yaml.
  • However, we need to protect against the edge case where there are multiple .yaml files in the model folder (see the sketch below).
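
A minimal Python sketch of that edge-case guard, assuming one model per folder as in Suggestion 4. The function and exception names are hypothetical, not an existing cortex API.

# Pick the config file in a model folder whether it is model.yaml or
# <model_id>.yaml, and refuse to guess when several candidates exist.
from pathlib import Path

class AmbiguousModelConfig(Exception):
    """Raised when more than one .yaml/.yml file could be the model config."""

def find_model_config(model_dir: Path) -> Path | None:
    yaml_files = sorted(
        p for p in model_dir.iterdir()
        if p.suffix in (".yaml", ".yml")
    )
    if not yaml_files:
        return None                      # model.yaml is optional (Suggestion 3)
    if len(yaml_files) == 1:
        return yaml_files[0]             # agnostic to its exact name
    # Several candidates: prefer the canonical name, otherwise bail out.
    for p in yaml_files:
        if p.stem == "model":
            return p
    raise AmbiguousModelConfig(
        f"{model_dir} contains multiple yaml files: {[p.name for p in yaml_files]}"
    )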

freelerobot (Contributor) commented:

I like suggestion 1. Most flexible, i.e. model binaries can be anywhere.

dan-menlo (Contributor, Author) commented:

model.yaml Structure

I agree with @louis-jan's suggestions above.

However, I'm a bit mixed on what sources should refer to:

  • Individual files? (very tedious to keep the model.yaml up to date)
  • Repo? (i.e. collection of files/tags)

Given that our main integration is with Huggingface, and our own Built-in repos use Git, I think Repos would be a better abstraction.

louis-menlo (Contributor) commented:

> I like suggestion 1. Most flexible, i.e. model binaries can be anywhere.

We did try this approach before with Cortex, but the colon (:) is not allowed in folder names (e.g. on Windows). It turned out that the model folder could not really be the pull name as designed.

dan-menlo (Contributor, Author) commented Sep 11, 2024

> I like suggestion 1. Most flexible, i.e. model binaries can be anywhere.
>
> We did try this approach before with Cortex, but the colon : is not allowed. It turned out that the model folder is not really the pull name as designed.

Built-in Model Library

I see. In that case, can we consider just having a 2-deep folder structure?

  • 1st level: "pull name"
  • 2nd level: tag
/llama3.1
    /7b 

Huggingface Repos

  • For Huggingface repos, the folder can just be a string (can it accommodate / in the folder name?)

louis-menlo (Contributor) commented:

> I like suggestion 1. Most flexible, i.e. model binaries can be anywhere.
>
> We did try this approach before with Cortex, but the colon : is not allowed. It turned out that the model folder is not really the pull name as designed.
>
> Built-in Model Library
>
> I see. In that case, can we consider just have a 2-deep file structure?
>
>   • 1st level: "pull name"
>   • 2nd level: tag
> /llama3.1
>     /7b
>
> Huggingface Repos
>
>   • For Huggingface repos, folder can just be a string (can it accomodate / in folder name?)

Love it!

nguyenhoangthuan99 (Contributor) commented Sep 12, 2024

I'll summarize the implementation for the model folder and model.yaml and break it into tasks.

/models
   ├── model.list 
   └── /llama3-1
   |   ├── /main
   |   |   ├── model.yaml
   |   |   └── llama3.1_Q4_KM.gguf
   |   └── /7b-gguf
   |   |   ├── llama3.1_Q4_KM.yaml
   |   |   ├── llama3.1_Q4_KM.gguf
   |   |   ├── llama3.1_Q8_KM.yaml
   |   |   └── llama3.1_Q8_KM.gguf
   |   ├── /onnx
   |   │   ├── llama3.1.yaml
   |   │   ├── llama3.1.onnx
   |   │   ├── tokenizer.json
   |   │   └── gen_config.json
   |   └── /tensorrt-llm
   |   |   ├── llama3.1.yaml
   |   |   ├── rank0.engine
   |   │   ├── tokenizer.model
   |   └───└── config.json
   └── /bartowski_Mixtral-8x22B-v0.1 # for huggingface repos, "/" will be replaced by "_" or another special character
   |   └── /main
   |       ├── Mixtral-8x22B-v0.1-IQ3_M.yaml
   |       ├── Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
   |       └── Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
   └── /nvidia-ngc
        └── /llama3-1-windows-RTX3090
           └──model.engine    

model.list content:

model-id author_repo-id branch-name path-to-model.yaml model-alias
  • How model-id is constructed: author_repo-id_branch-name_gguf-file-name (the gguf-file-name is needed because a single branch can contain multiple gguf files/models with different quants)
  • model-alias is a shorter name for model-id and is also unique; the user can set an alias with the command cortex-cpp model alias model_id model_alias, after which model_alias works exactly like model_id (see the sketch below)
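
A minimal Python sketch of the model-id construction and alias lookup described above, assuming model.list rows are whitespace-separated in the column order given; the helper names and in-memory registry are hypothetical.

# model-id and model-alias, per the proposal above.
from pathlib import Path

def make_model_id(author_repo_id: str, branch: str, gguf_file: str) -> str:
    """author_repo-id_branch-name_gguf-file-name, following the bullet above."""
    return f"{author_repo_id}_{branch}_{Path(gguf_file).stem}"

def load_model_list(path: Path) -> dict[str, dict]:
    """Parse model.list rows:
    model-id author_repo-id branch-name path-to-model.yaml model-alias"""
    entries: dict[str, dict] = {}
    for line in path.read_text().splitlines():
        if not line.strip():
            continue
        model_id, author_repo, branch, yaml_path, alias = line.split()
        entry = {"author_repo": author_repo, "branch": branch,
                 "yaml_path": yaml_path, "alias": alias}
        entries[model_id] = entry
        entries[alias] = entry          # alias resolves exactly like model_id
    return entries

if __name__ == "__main__":
    print(make_model_id("bartowski_Mixtral-8x22B-v0.1", "main",
                        "Mixtral-8x22B-v0.1-IQ3_M.gguf"))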

Updated model.yaml:

# BEGIN GENERAL GGUF METADATA
model: gemma-2-9b-it-Q8_0 # Model ID which is used for request construct - should be unique between models (author / quantization)
name: Llama 3.1      # metadata.general.name
version: 1           # metadata.version
sources:             # can be universal protocol (models://) OR absolute local file path (file://) OR https remote URL (https://)
  - models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
  - models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
# END GENERAL GGUF METADATA

# BEGIN INFERENCE PARAMETERS
# BEGIN REQUIRED
stop:                # tokenizer.ggml.eos_token_id
  - <|end_of_text|>
  - <|eot_id|>
  - <|eom_id|>
# END REQUIRED
# BEGIN OPTIONAL
stream: true         # Default true?
top_p: 0.9           # Ranges: 0 to 1
temperature: 0.6     # Ranges: 0 to 1
frequency_penalty: 0 # Ranges: 0 to 1
presence_penalty: 0  # Ranges: 0 to 1
max_tokens: 8192     # Should be default to context length
# END OPTIONAL
# END INFERENCE PARAMETERS

# BEGIN MODEL LOAD PARAMETERS
# BEGIN REQUIRED
prompt_template: |+  # tokenizer.chat_template
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>

  {system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

  {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
# END REQUIRED
# BEGIN OPTIONAL
ctx_len: 0          # llama.context_length | 0 or undefined = loaded from model
ngl: 33             # Undefined = loaded from model
# END OPTIONAL
# END MODEL LOAD PARAMETERS

Tasks:

  • Pull model (check model.list -> create folder -> download model -> create .yaml)
  • Model get (check model.list -> go to target folder -> read yaml -> return result)
  • Model list (check model.list -> go to target folder -> read yaml -> return result)
  • Start model/Run ()
  • model update
  • model delete
  • model alias (new command to set alias for a model id)
  • Update the gguf parser and yaml parser for the new model.yaml template, and add more inference params to support llama.cpp

I'll create subtasks corresponding to the above tasks. cc @dan-homebrew @0xSage @vansangpfiev @namchuai @louis-jan

freelerobot (Contributor) commented:

Questions / edge cases:

  1. How do we handle branch aliases? i.e. I've seen in some repos 7b also being 7b-gguf, or 7b-q4. Maybe not a concern at this scope? Maybe we just assume that branches are unique for now.
  2. I think model-alias will confuse users. Internally, do we intend to use model-alias interchangeably with model-id? In which case, does it make more sense for model-id to be a UUID, which users should never change, so that we are guaranteed uniqueness & persistence?

louis-menlo (Contributor) commented Sep 12, 2024

@dan-homebrew @nguyenhoangthuan99 I just read back over the comments. This is the one thing we should NOT do: hack paths together.

E.g. bartowski_Mixtral-8x22B-v0.1
Anyone can break the app by creating two repositories as below:

  1. bartowski_/Mixtral-8x22B-v0.1
  2. bartowski/_Mixtral-8x22B-v0.1

We introduced models.list precisely so we do NOT have to worry about nesting levels, e.g. when importing from other applications.

> For Huggingface repos, folder can just be a string (can it accomodate / in folder name?)

nguyenhoangthuan99 (Contributor) commented:

> Questions / edge cases:
>
>   1. How do we handle branch aliases? i.e. I've seen in some repos 7b also being 7b-gguf, or 7b-q4. Maybe not a concern at this scope? Maybe we just assume that branches are unique for now.
>   2. I think model-alias will confuse users. Internally, do we intend to use model-alias interchangeably with model-id? In with case, does it make more sense for model-id to be a uuid, which users should never change, that way we are guaranteed uniqueness & persistence.
  1. This feature should be added; I think we can do it after the model folder and model.yaml are stable.
  2. Currently, model_id is used to run a model: cortex-cpp run <model_id>. But when we want to support running models from many sources, with many different cases, the model_id has to be not only human-readable but also unique. It turns out that in some cases the model_id is too long, e.g. bartowski_Mixtral-8x22B-v0.1_Mixtral-8x22B-v0.1-IQ3_M, so we decided to add an alias command that lets the user make it shorter. But as Louis commented above, bartowski_Mixtral-8x22B-v0.1_Mixtral-8x22B-v0.1-IQ3_M still may not be unique.

I'm thinking of a solution like Docker's: when a container starts, the container ID is a UUID (just like the model_id Nicole recommended) and the name is randomly generated. The user can set a name for the container, but the name must be unique. Can we implement this? A sketch of the idea is below.
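
A hypothetical Python sketch of that Docker-style scheme: an immutable UUID model_id plus a random, unique, user-changeable name. The word lists, registry, and method names are purely illustrative.

# UUID model_id + random unique name, Docker-style.
import random
import uuid

ADJECTIVES = ["brave", "calm", "eager", "fuzzy", "quiet"]
NOUNS = ["llama", "mistral", "falcon", "gemma", "phi"]

class ModelRegistry:
    def __init__(self) -> None:
        self._by_id: dict[str, str] = {}      # model_id -> name
        self._names: set[str] = set()

    def register(self) -> tuple[str, str]:
        """Assign a permanent UUID model_id and a random unique name."""
        model_id = uuid.uuid4().hex
        while True:
            name = f"{random.choice(ADJECTIVES)}_{random.choice(NOUNS)}_{random.randint(0, 99)}"
            if name not in self._names:
                break
        self._by_id[model_id] = name
        self._names.add(name)
        return model_id, name

    def rename(self, model_id: str, new_name: str) -> None:
        """Users may change the name, but it must stay unique; the UUID never changes."""
        if new_name in self._names:
            raise ValueError(f"name already in use: {new_name}")
        self._names.discard(self._by_id[model_id])
        self._by_id[model_id] = new_name
        self._names.add(new_name)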

dan-menlo (Contributor, Author) commented Sep 12, 2024

@louis-jan Yeah, I think you are right.

I think a central problem is that our ability to pull from different sources leads to different model folder formats:

  • Cortex "Model Repo" format (tag based)
  • Huggingface GGUF models (multiple quantizations in a single repo)

We should bear in mind that Cortex's Built-in Model Library may be mirrored across several hosts in the future (e.g. not just huggingface).

This leads to a format more similar to @louis-jan's original proposal.

Or is there a more generalizable way to deal with this?

EDIT:

After giving it more thought, I think I can more clearly articulate that we are solving for two problems:

  • Huggingface repos, which have different conventions (e.g. GGUF, TensorRT-LLM, even base models)
  • Cortex Built-in Model Library format (loosely inspired by Ollama and Docker)

For Huggingface:

  • We should use a folder structure that matches their URL format
  • We should try to store files as similarly to theirs as possible (I take back my earlier idea to store quantizations in different folders)
  • Given that there might be multiple model quants in the same folder, the model.yaml should match the quant filename
  • In this case, there will be two entries in model.list

For Cortex Model Repo

  • We should have a Docker-style tag format, represented by folders
  • We will curate models in our own format
  • In the future, there may be other repos that adopt this standard, which can then be registered via root URL
/models
    model.list
    
    # Huggingface GGUF Model Folder format
    # This assumes we have some sort of quantization selection wizard when downloading?
    /huggingface.co
        /bartowski
            /Mixtral-8x22b-v0.1-gguf (includes all quants)
                Mixtral-8x22B-v0.1-IQ3_M.yaml
                Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf...
                Mixtral-8x22B-v0.1-Q8_M.yaml
                Mixtral-8x22B-v0.1-Q8_M-00001-of-00005.gguf...

    # Built-in Library Model Folder format
    /cortex.so (this is our Built-in Model Library, based on Git, that will be mirrored across a few sites)
        /llama3.1 (model)
            /q4-tensorrt-llm (tag)
                ...engine_files
                model.yaml
            /q8-gguf **(tag)**
                model.yaml

    # Future Model Source
    # Has its own model folder format

dan-menlo (Contributor, Author) commented:

@louis-jan @nguyenhoangthuan99 Additionally, for model.yaml, how do we intend to generate the model ID?

  • Is there a way we use the tag name as the model ID?
  • e.g. for Chat Completions, it is routed to model: llama3.1:7b
# BEGIN GENERAL GGUF METADATA
model: gemma-2-9b-it-Q8_0 # Model ID which is used for request construct - should be unique between models (author / quantization)
name: Llama 3.1      # metadata.general.name
version: 1           # metadata.version
sources:             # can be universal protocol (models://) OR absolute local file path (file://) OR https remote URL (https://)
  - models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00001-of-00005.gguf
  - models://huggingface/bartowski/Mixtral-8x22B-v0.1/main/Mixtral-8x22B-v0.1-IQ3_M-00002-of-00005.gguf
# END GENERAL GGUF METADATA

# BEGIN INFERENCE PARAMETERS
# BEGIN REQUIRED
stop:                # tokenizer.ggml.eos_token_id
  - <|end_of_text|>
  - <|eot_id|>
  - <|eom_id|>
# END REQUIRED
# BEGIN OPTIONAL
stream: true         # Default true?
top_p: 0.9           # Ranges: 0 to 1
temperature: 0.6     # Ranges: 0 to 1
frequency_penalty: 0 # Ranges: 0 to 1
presence_penalty: 0  # Ranges: 0 to 1
max_tokens: 8192     # Should be default to context length
# END OPTIONAL
# END INFERENCE PARAMETERS

# BEGIN MODEL LOAD PARAMETERS
# BEGIN REQUIRED
prompt_template: |+  # tokenizer.chat_template
  <|begin_of_text|><|start_header_id|>system<|end_header_id|>

  {system_message}<|eot_id|><|start_header_id|>user<|end_header_id|>

  {prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
# END REQUIRED
# BEGIN OPTIONAL
ctx_len: 0          # llama.context_length | 0 or undefined = loaded from model
ngl: 33             # Undefined = loaded from model
# END OPTIONAL
# END MODEL LOAD PARAMETERS

dan-menlo (Contributor, Author) commented:

@nguyenhoangthuan99 I am shifting this to @vansangpfiev and tracking the Tasklist items, just to keep a big-picture sitrep of progress.

@dan-menlo dan-menlo reopened this Sep 27, 2024
@github-project-automation github-project-automation bot moved this from QA to In Progress in Menlo Sep 27, 2024
@gabrielle-ong gabrielle-ong added this to the v1.0.0 milestone Oct 3, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Oct 3, 2024
@github-project-automation github-project-automation bot moved this from Completed to Review + QA in Menlo Oct 3, 2024
@gabrielle-ong gabrielle-ong moved this from Review + QA to Completed in Menlo Oct 3, 2024