
Converting LLaMA 4bit GPTQ Model from HF does not work #746

Closed
xonfour opened this issue Apr 3, 2023 · 11 comments
Labels
bug (Something isn't working), high priority (Very important issue)

Comments

xonfour commented Apr 3, 2023

Hi! I tried to use the 13B Model from https://huggingface.co/maderix/llama-65b-4bit/

I converted the model using

python convert-gptq-to-ggml.py models/llama13b-4bit.pt models/tokenizer.model models/llama13b-4bit.bin

If I understand correctly, I still need to migrate the model, which I tried using

python migrate-ggml-2023-03-30-pr613.py models/llama13b-4bit.bin models/llama13b-4bit-new.bin

But after a few seconds this breaks with the following error:

Processing part 1 of 1

Processing tensor b'tok_embeddings.weight' with shape: [32000, 5120] and type: F16
Traceback (most recent call last):
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 311, in <module>
    main()
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 306, in main
    copy_tensors(fin, fout, part_id, n_parts)
  File "/home/dust/llama.cpp/migrate-ggml-2023-03-30-pr613.py", line 169, in copy_tensors
    assert n_dims in (1, 2)
AssertionError

Is this a bug, or am I doing something wrong?

xonfour changed the title from "[User] Insert summary of your issue or enhancement.." to "Converting LLaMA 4bit GPTQ Model from HF does not work" on Apr 3, 2023
howard0su (Collaborator) commented:

As of today's master, you don't need to run the migrate script: convert-gptq-to-ggml.py generates the latest version of the model. Check the first 4 bytes of the generated file: the latest version should be 0x67676d66, while the old version that needs migration is 0x67676d6c.
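
For example, a quick check could look like this (a minimal sketch, assuming the magic is written as a little-endian 32-bit integer at the start of the file; the path is just an example):

import struct

with open("models/llama13b-4bit.bin", "rb") as f:  # example path
    (magic,) = struct.unpack("<I", f.read(4))
print(f"magic: {magic:#010x}")  # 0x67676d66 = latest, 0x67676d6c = needs migration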

xonfour (Author) commented Apr 4, 2023

Ah, I see! Well, it is 0x67676d66 already, but main expects a different version:

llama13b-4bit.bin: invalid model file (bad magic [got 0x67676d66 want 0x67676a74])

I did a git pull a few hours ago and converted the model afterwards.

xonfour (Author) commented Apr 4, 2023

I guess convert-gptq-to-ggml.py needs an update? I just changed the version bytes and now it works!

JohnnyOpcode commented:

How many different GGML BIN file headers are there floating around now?
3?
4?

Asking for a friend..

howard0su (Collaborator) commented Apr 4, 2023

I think there are 3: the original one (A), the new one (B), and the recently introduced one (C).

To get from A -> B, run convert-unversioned-ggml-to-ggml.py
To get from B -> C, run migrate-ggml-2023-03-30-pr613.py
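
Putting the magics from this thread together (a sketch; the mapping is my summary of the comments above, with 0x67676a74 being the newest "ggjt" magic that main reported wanting earlier in the thread):

import struct

# magic -> (generation, script that migrates it one step forward)
MAGICS = {
    0x67676D6C: ("A (unversioned ggml)", "convert-unversioned-ggml-to-ggml.py"),
    0x67676D66: ("B (ggmf)", "migrate-ggml-2023-03-30-pr613.py"),
    0x67676A74: ("C (ggjt)", None),  # current format, nothing to run
}

def identify(path):
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
    gen, next_script = MAGICS.get(magic, ("unknown", None))
    print(f"{path}: {gen}" + (f" -> run {next_script}" if next_script else ""))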

LoriTosoChef commented:

@xonfour how did you change the version bytes?

prusnak (Collaborator) commented Apr 4, 2023

> @xonfour how did you change the version bytes?

Just change 0x67676d66 to 0x67676a74 on line 39 of convert-gptq-to-ggml.py and rerun the script.

I will prepare a pull request with the fix soon (after I test it).
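
If you'd rather not edit the file by hand, a throwaway patcher along these lines would do it (a sketch; it assumes the hex constant appears exactly once in the script, as the line-39 reference above suggests):

from pathlib import Path

# swap the magic constant in convert-gptq-to-ggml.py, then rerun the script
script = Path("convert-gptq-to-ggml.py")
src = script.read_text()
assert src.count("0x67676d66") == 1  # the constant referenced on line 39
script.write_text(src.replace("0x67676d66", "0x67676a74"))  # ggmf -> ggjt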

prusnak (Collaborator) commented Apr 5, 2023

Fix in #770

gjmulder added the "bug" and "high priority" labels on Apr 6, 2023
xportz commented Apr 11, 2023

After converting GPTQ to GGML, do you still get the benefits of GPTQ, with its better accuracy compared to RTN quantization?

prusnak (Collaborator) commented Apr 14, 2023

Please try the new convert.py script that is now in master.

wyklq commented May 22, 2023

@xonfour Judging by the commit log of convert.py ("notes on latest GPTQ-for-LLaMA format"), this issue has been solved in the latest convert.py:
python convert.py llama-7b-4bit.pt --vocab-dir models --outtype=f16 --outfile models/7B/ggml-model.bin
