
Converting LLaMA 4bit GPTQ Model from HF does not work #746

Closed
xonfour opened this issue Apr 3, 2023 · 11 comments
Labels
bug (Something isn't working), high priority (Very important issue)

Comments

xonfour commented Apr 3, 2023

Hi! I tried to use the 13B Model from https://huggingface.co/maderix/llama-65b-4bit/

I converted the model using

python convert-gptq-to-ggml.py models/llama13b-4bit.pt models/tokenizer.model models/llama13b-4bit.bin

If I understand correctly, I still need to migrate the model, which I tried using

python migrate-ggml-2023-03-30-pr613.py models/llama13b-4bit.bin models/llama13b-4bit-new.bin

But after a few seconds this breaks with the following error:

Processing part 1 of 1

Processing tensor b'tok_embeddings.weight' with shape: [32000, 5120] and type: F16
Traceback (most recent call last):
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 311, in <module>
    main()
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 306, in main
    copy_tensors(fin, fout, part_id, n_parts)
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 169, in copy_tensors
    assert n_dims in (1, 2)
AssertionError

Is this a bug, or am I doing something wrong?

xonfour changed the title from "[User] Insert summary of your issue or enhancement.." to "Converting LLaMA 4bit GPTQ Model from HF does not work" on Apr 3, 2023
howard0su (Collaborator) commented:

As of today's master, you don't need to run the migrate script: convert-gptq-to-ggml.py generates the latest version of the model. Check the first 4 bytes of the generated file: the latest version should be 0x67676d66, while the old version that needs migration is 0x67676d6c.
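
For example, a quick check could look like this (a minimal sketch, assuming the magic is written as a little-endian 32-bit integer at the start of the file; the path is just an example):

import struct

with open("models/llama13b-4bit.bin", "rb") as f:  # example path
    (magic,) = struct.unpack("<I", f.read(4))
print(f"magic: {magic:#010x}")  # 0x67676d66 = latest, 0x67676d6c = needs migration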

xonfour (Author) commented Apr 4, 2023

Ah, I see! Well, it is 0x67676d66 already, but main expects a different version:

llama13b-4bit.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])

I did a git pull a few hours ago and converted the model afterwards.

xonfour (Author) commented Apr 4, 2023

I guess convert-gptq-to-ggml.py needs an update? I just changed the version bytes and now it works!

JohnnyOpcode commented:

How many different GGML BIN file headers are there floating around now?
3?
4?

Asking for a friend..

howard0su (Collaborator) commented Apr 4, 2023

I think there are 3: the original one (A), the new one (B), and the recently introduced one (C).

To get from A -> B, run convert-unversioned-ggml-to-ggml.py
To get from B -> C, run migrate-ggml-2023-03-30-pr613.py
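
Putting the magics from this thread together (a sketch; the mapping is my summary of the comments above, with 0x67676a74 being the newest "ggjt" magic that main reported wanting earlier in the thread):

import struct

# magic -> (generation, script that migrates it one step forward)
MAGICS = {
    0x67676D6C: ("A (unversioned ggml)", "convert-unversioned-ggml-to-ggml.py"),
    0x67676D66: ("B (ggmf)", "migrate-ggml-2023-03-30-pr613.py"),
    0x67676A74: ("C (ggjt)", None),  # current format, nothing to run
}

def identify(path):
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    gen, next_script = MAGICS.get(magic, ("unknown", None))
    print(f"{path}: {gen}" + (f" -> run {next_script}" if next_script else ""))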

LoriTosoChef commented:

@xonfour how did you change the version bytes?

prusnak (Collaborator) commented Apr 4, 2023

> @xonfour how did you change the version bytes?

Just change 0x67676d66 to 0x67676a74 on line 39 of convert-gptq-to-ggml.py and rerun the script.

I will prepare a pull request with the fix soon (after I test it).
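
If you'd rather not edit the file by hand, a throwaway patcher along these lines would do it (a sketch; it assumes the hex constant appears exactly once in the script, as the line-39 reference above suggests):

from pathlib import Path

# swap the magic constant in convert-gptq-to-ggml.py, then rerun the script
script = Path("convert-gptq-to-ggml.py")
src = script.read_text()
assert src.count("0x67676d66") == 1  # the constant referenced on line 39
script.write_text(src.replace("0x67676d66", "0x67676a74"))  # ggmf -> ggjt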

prusnak (Collaborator) commented Apr 5, 2023

Fix in #770

gjmulder added the "bug" and "high priority" labels on Apr 6, 2023
xportz commented Apr 11, 2023

After converting GPTQ to GGML, do you still get the benefits of GPTQ, with its better accuracy compared to RTN quantization?

prusnak (Collaborator) commented Apr 14, 2023

Please try the new convert.py script that is now in master.

wyklq commented May 22, 2023

@xonfour Judging by the commit log of convert.py ("notes on latest GPTQ-for-LLaMA format"), this issue has been solved in the latest convert.py:
python convert.py llama-7b-4bit.pt --vocab-dir models --outtype=f16 --outfile models/7B/ggml-model.bin
