Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Please consider adding support for Ring-1T and Ling-1T models.
In this discussion, bartowski mentioned that it is not yet supported in llama.cpp: https://huggingface.co/inclusionAI/Ring-1T-preview/discussions/5 - hence there are no GGUF files for this model yet.
Motivation
It is a series of 1T models that appeared recently:
https://huggingface.co/inclusionAI/Ring-1T-preview
https://huggingface.co/inclusionAI/Ling-1T
https://huggingface.co/inclusionAI/Ring-1T
It may potentially be even better than Kimi K2, and Ring-1T has thinking capability that Kimi K2 lacks. This model is claimed to be one of the best open-weight models, and it would be awesome to run it locally.
Support in llama.cpp would help greatly to keep memory requirements reasonable and performance good, for example by allowing the model to run as an IQ4 GGUF quant (the same quant type I use to run Kimi K2, also a 1T model, as my daily driver). That would be great for 768 GB or 1 TB systems where FP8 would not fit, and lower GGUF quants like IQ2 or IQ3 could potentially work on 512 GB systems, making it more accessible (as accessible as running a 1T model can be). A sketch of the usual workflow this would enable is shown below.
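For reference, this is the standard llama.cpp conversion and quantization flow that support for these architectures would unlock; the model path, output names, and the IQ4_XS quant choice here are illustrative only, not something that works today for Ring-1T/Ling-1T:

```sh
# Convert the Hugging Face checkpoint to a full-precision GGUF
# (requires the architecture to be recognized by the converter).
python convert_hf_to_gguf.py /path/to/Ring-1T \
    --outfile ring-1t-f16.gguf --outtype f16

# Quantize down to IQ4_XS so the model fits on 768 GB / 1 TB systems;
# IQ2/IQ3 types could be substituted for smaller (e.g. 512 GB) setups.
./llama-quantize ring-1t-f16.gguf ring-1t-IQ4_XS.gguf IQ4_XS
```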
Possible Implementation
No response