Replies: 3 comments
-
Yes, the consequence of #1305 would be that you have to re-quantize the Q4 and Q5 files from the F16 files. That is, if you choose to update to master if and when it gets merged - you always have the option of staying on an old revision. I guess there is some mismatch in how people see this project. @ggerganov and others like to advance and try new things - I think that is a good thing in general, but in the case of #1305 it causes a slight hassle and no advantages for people not using Apple processors. Maybe we could make the quantization process more integrated with the main llama.cpp process, so that quantization is done automatically if necessary, the result then being cached onto disk and mmaped on future runs - something like the sketch below.
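As a rough illustration of that idea, a wrapper script could do the caching today. The paths are placeholders and the exact `quantize` arguments depend on your llama.cpp revision:

```bash
#!/usr/bin/env bash
# Illustrative wrapper: quantize from the F16 file on first use (or whenever
# the F16 file is newer than the cache), then reuse the cached quantized file.
F16=models/7B/ggml-model-f16.bin   # placeholder paths
Q4=models/7B/ggml-model-q4_0.bin

if [ ! -f "$Q4" ] || [ "$F16" -nt "$Q4" ]; then
    ./quantize "$F16" "$Q4" q4_0
fi

# Run inference against the cached quantized file, forwarding any arguments.
exec ./main -m "$Q4" "$@"
```

Doing the same thing inside llama.cpp itself would just move this check-and-convert step behind the model-loading code.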
-
I think they should at least have some kind of name change to make it clear that they are in the new format. The models are already marked with a magic number, but something human-readable would be better.
-
Here's a quick Bash script to print the magic and version number of a model:
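A minimal sketch, assuming the header layout the ggml loaders use (a little-endian uint32 magic at offset 0, and for the versioned formats a little-endian uint32 version at offset 4):

```bash
#!/usr/bin/env bash
# Print the magic and version of each model file given as an argument.
# Note: files with the old unversioned 'ggml' magic have no version field,
# so the second value printed for them is just the next header field.
for f in "$@"; do
    magic=$(od -An -tx4 -N4 "$f" | tr -d ' ')
    version=$(od -An -td4 -j4 -N4 "$f" | tr -d ' ')
    printf '%s: magic=0x%s version=%s\n' "$f" "$magic" "$version"
done
```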
Find all models that aren't in the current format, for example:
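```bash
# Hypothetical example: flag any *.bin model whose magic is not the current
# 'ggjt' value (0x67676a74, as read on a little-endian machine). The
# directory is a placeholder; a full check would also compare the version.
find ~/models -name '*.bin' -print0 | while IFS= read -r -d '' f; do
    magic=$(od -An -tx4 -N4 "$f" | tr -d ' ')
    [ "$magic" != "67676a74" ] && echo "old format: $f"
done
```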
-
If I'm interpreting the notice correctly, it sounds like you are planning to take all existing `Q4`, `Q5`, etc. models and make them incompatible with new llama.cpp versions, while simultaneously replacing them with new formats that have the same names but are only compatible with the new version of llama.cpp. Is that correct?
I feel like I must be misunderstanding something, because that sounds like a terrible idea. There is already a lot of confusion out there caused by the nearly dozen different ggml formats that currently exist, but at least they are all currently supported in llama.cpp without issue. Suddenly dropping support for all of them while silently replacing them with versions that will look identical to most users will cause enormous confusion and frustration. It will also cause chaos for developers who are using llama.cpp (or a wrapper around it) in their own projects. Suddenly breaking all of your old models is a great way to completely lose the trust of other developers, and also a great way to invite forks, which in turn will cause even more conflicts and confusion.
If I really did misunderstand the notice, then I apologize for this post, but I would like to get things clarified.