Replies: 3 comments
-
Yes, the consequence of #1305 would be that you have to re-quantize the Q4 and Q5 files from the F16 files. That is, if you choose to update to master if and when it gets merged - you always have the option of staying on an old revision. I guess there is some mismatch in how people see this project. @ggerganov and others like to advance and try new things - I think that is a good thing in general, but in the case of #1305 it causes a slight hassle and no advantages for people not using Apple processors. Maybe we could make the quantization process more integrated with the main llama.cpp process, so that quantization is done automatically if necessary, the result then being cached onto disk and mmaped on future runs - something like the sketch below.
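As a rough illustration of that idea, a wrapper script could do the caching today. The paths are placeholders and the exact `quantize` arguments depend on your llama.cpp revision:

```bash
#!/usr/bin/env bash
# Illustrative wrapper: quantize from the F16 file on first use (or whenever
# the F16 file is newer than the cache), then reuse the cached quantized file.
F16=models/7B/ggml-model-f16.bin   # placeholder paths
Q4=models/7B/ggml-model-q4_0.bin

if [ ! -f "$Q4" ] || [ "$F16" -nt "$Q4" ]; then
    ./quantize "$F16" "$Q4" q4_0
fi

# Run inference against the cached quantized file, forwarding any arguments.
exec ./main -m "$Q4" "$@"
```

Doing the same thing inside llama.cpp itself would just move this check-and-convert step behind the model-loading code.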
-
I think they should at least have some kind of name change to make it clear that they are in the new format. The models are already marked with a magic number, but something human-readable would be better.
-
Here's a quick Bash script to print the magic and version number of a model:
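A minimal sketch, assuming the header layout the ggml loaders use (a little-endian uint32 magic at offset 0, and for the versioned formats a little-endian uint32 version at offset 4):

```bash
#!/usr/bin/env bash
# Print the magic and version of each model file given as an argument.
# Note: files with the old unversioned 'ggml' magic have no version field,
# so the second value printed for them is just the next header field.
for f in "$@"; do
    magic=$(od -An -tx4 -N4 "$f" | tr -d ' ')
    version=$(od -An -td4 -j4 -N4 "$f" | tr -d ' ')
    printf '%s: magic=0x%s version=%s\n' "$f" "$magic" "$version"
done
```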
Find all models that aren't in the current format, for example:
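```bash
# Hypothetical example: flag any *.bin model whose magic is not the current
# 'ggjt' value (0x67676a74, as read on a little-endian machine). The
# directory is a placeholder; a full check would also compare the version.
find ~/models -name '*.bin' -print0 | while IFS= read -r -d '' f; do
    magic=$(od -An -tx4 -N4 "$f" | tr -d ' ')
    [ "$magic" != "67676a74" ] && echo "old format: $f"
done
```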
-
If I'm interpreting the notice correctly, it sounds like you are planning to take all existing `Q4`, `Q5`, etc. models and make them incompatible with new llama.cpp versions, while simultaneously replacing them with new formats that have the same names but are only compatible with the new version of llama.cpp. Is that correct?
I feel like I must be misunderstanding something, because that sounds like a terrible idea. There is already a lot of confusion out there caused by the nearly dozen different ggml formats that currently exist, but at least they are all currently supported in llama.cpp without issue. Suddenly dropping support for all of them while silently replacing them with versions that will look identical to most users will cause enormous confusion and frustration. It will also cause chaos for developers who are using llama.cpp (or a wrapper around it) in their own projects. Suddenly breaking all of your old models is a great way to completely lose the trust of other developers, and also a great way to invite forks, which in turn will cause even more conflicts and confusion.
If I really did misunderstand the notice, then I apologize for this post, but I would like to get things clarified.