[RFC] Implement a mechanism to detect the type of model being read #147
This would require some sort of central registry (something simple, just in the GGML source code) that maps uints to model architecture types. It's possible that the GGJT version could be used to convey this information (RWKV is already taking this approach with the "reserved" GGJT version of 100), or a new GGJT version could be introduced that conveys the model architecture ID separately.
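A minimal sketch of what such a registry might look like, assuming a plain enum in the GGML source; every name and numeric ID below is hypothetical, apart from RWKV's already-claimed "reserved" value of 100:

```cpp
#include <cstdint>

// Hypothetical central registry: a plain enum in the GGML source mapping
// uints to architectures. All names and values are illustrative, except
// RWKV's "reserved" GGJT version of 100 mentioned above.
enum ggml_model_arch : uint32_t {
    GGML_ARCH_UNKNOWN  = 0,
    GGML_ARCH_LLAMA    = 1,
    GGML_ARCH_GPT2     = 2,
    GGML_ARCH_GPTJ     = 3,
    GGML_ARCH_GPT_NEOX = 4,
    GGML_ARCH_MPT      = 5,
    GGML_ARCH_RWKV     = 100,
};

const char * ggml_model_arch_name(uint32_t arch) {
    switch (arch) {
        case GGML_ARCH_LLAMA:    return "llama";
        case GGML_ARCH_GPT2:     return "gpt-2";
        case GGML_ARCH_GPTJ:     return "gpt-j";
        case GGML_ARCH_GPT_NEOX: return "gpt-neox";
        case GGML_ARCH_MPT:      return "mpt";
        case GGML_ARCH_RWKV:     return "rwkv";
        default:                 return "unknown";
    }
}
```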
Why not go even further? Make the common infrastructure of llama.cpp into something like "ggml-llm", and have the code for the specific LLM architectures (llama, gpt-2, gpt-j, mpt and others) become add-on modules at compile time.
FWIW that sounds pretty similar to a Rust project I've been contributing to 😅 https://github.com/rustformers/llm
Haha yep - I originally proposed the idea in rustformers/llm. I thought it might make sense if there were some kind of metadata within ggml for quick retrieval of that info (?).
I'm one of the maintainers of rustformers/llm. The best heuristic I can think of - matching up the tensor names - requires you to be able to locate the tensors, which requires you to skip past the hyperparameters, which requires you to know which hyperparameters to skip past. Additionally, there are now variants of the same architecture with different configurations; RedPajama uses the GPT-NeoX architecture with `use_parallel_residual` disabled.

I believe this is an issue that @LostRuins of the koboldcpp project has encountered, too: https://www.reddit.com/r/LocalLLaMA/comments/13bpqro/koboldcpp_added_new_redpajama_neox_support_would/

For the next version of the file format, I suggest replacing the hyperparameters with encoded key/value pairs (the format is up to you, but JSON's always easy), and then including the architecture and any other parameters in there, similar to Hugging Face's config.json. This would allow readers to identify the architecture and/or intelligently handle slight discrepancies in format.
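To make the proposal concrete, here is a rough sketch of such a key/value header, assuming a simple length-prefixed binary encoding rather than JSON; the keys and helper names are illustrative, not a spec:

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <utility>
#include <vector>

// Write one length-prefixed UTF-8 string.
static void write_str(std::ofstream & out, const std::string & s) {
    const uint32_t n = (uint32_t) s.size();
    out.write((const char *) &n, sizeof(n));
    out.write(s.data(), (std::streamsize) n);
}

// Write a key/value header: a count followed by (key, value) string pairs.
// The stream should be opened with std::ios::binary.
void write_kv_header(std::ofstream & out,
                     const std::vector<std::pair<std::string, std::string>> & kv) {
    const uint32_t n_kv = (uint32_t) kv.size();
    out.write((const char *) &n_kv, sizeof(n_kv));
    for (const auto & [key, val] : kv) {
        write_str(out, key);
        write_str(out, val);
    }
}

// Usage (keys are examples only):
//   std::ofstream out("model.bin", std::ios::binary);
//   write_kv_header(out, {{"architecture", "gpt-neox"},
//                         {"use_parallel_residual", "false"}});
```

A reader would then scan the pairs, pick out the architecture, and dispatch to the right loader, tolerating unknown keys for forward compatibility.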
As far as I know, there's nothing like a "ggml file format" in the sense of TensorFlow or PyTorch. It's an arbitrary binary file, and it's up to you how you implement it. For example, you can do the following:
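A sketch of the idea - the magic value and architecture IDs below are arbitrary, application-defined choices:

```cpp
#include <cstdint>
#include <cstdio>

// Tag the file with your own magic and architecture ID before the usual
// ggml payload; your loader reads them back and dispatches accordingly.
// Both constants are arbitrary, application-defined values.
int main() {
    const uint32_t MY_MAGIC = 0x6d6c6d31; // "mlm1", chosen by the application
    const uint32_t MY_ARCH  = 4;          // e.g. 4 == gpt-neox in your own table

    FILE * f = fopen("model.bin", "wb");
    if (!f) return 1;
    fwrite(&MY_MAGIC, sizeof(MY_MAGIC), 1, f);
    fwrite(&MY_ARCH,  sizeof(MY_ARCH),  1, f);
    // ... then write the hyperparameters, vocabulary and tensors as usual ...
    fclose(f);
    return 0;
}
```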
So it does not require a change in ggml code itself, and it can be implemented in user code. Am I missing something?
There is a semi-formal GGML file format - it's what's produced by the conversion scripts in this repository and in llama.cpp, and it's what the example programs consume. There are now four variants of this format, and there are hundreds of GGML-format models floating around on Hugging Face. It is impossible to know which architecture any of these models are for from the files alone, as their structure does not record it.

That is to say - I'm entirely fine with encoding the architecture into formats I control, but the GGML format has become somewhat of a standard, and its current iterations are not flexible enough to describe the complexity of the model ecosystem. That should be rectified sooner rather than later.

For reference, we've been discussing what a stable model format would look like here: rustformers/llm#143
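For what it's worth, the container variant (though not the architecture) can already be sniffed from the leading magic bytes; a sketch, assuming the magic values used by the existing converters and a little-endian host:

```cpp
#include <cstdint>
#include <cstdio>

// Identify the GGML container variant from the leading magic. Note this
// reveals the container version only, not the model architecture - which
// is exactly the gap discussed in this thread. Magic values as used by
// the existing converters; assumes a little-endian host.
const char * sniff_container(const char * path) {
    FILE * f = fopen(path, "rb");
    if (!f) return "unreadable";
    uint32_t magic = 0;
    const size_t n = fread(&magic, sizeof(magic), 1, f);
    fclose(f);
    if (n != 1) return "too short";
    switch (magic) {
        case 0x67676d6c: return "ggml (unversioned)";
        case 0x67676d66: return "ggmf (versioned)";
        case 0x67676a74: return "ggjt (versioned, mmap-friendly)";
        default:         return "not a known ggml container";
    }
}
```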
With all the variants of ML models out now - gpt2/gptneox/llama/gptj - I wonder if there's a way to infer the model's type just from reading the file?...
Right now, if someone gives me a random model file with an obscured name, I'd first need to checksum it, then look up the hash on HF for the model card, then look through the docs/paper for the model type, and sometimes I still get confused between gptj/gptneox/llama hahah