
llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024 llama_init_from_file: failed to load model #62

Closed
vmajor opened this issue Jul 24, 2023 · 9 comments

vmajor commented Jul 24, 2023

This error occurs with a quantized 70B model that works with the current master branch of llama.cpp:

llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected  8192 x  8192, got  8192 x  1024
llama_init_from_file: failed to load model

I am guessing that you would just need to update the PyPI package. I will try to build from source in the meantime.
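
For reference, a minimal sketch of how the model was being loaded, assuming ctransformers' AutoModelForCausalLM API (the local file name is hypothetical):

from ctransformers import AutoModelForCausalLM

# Hypothetical path to a quantized LLaMA 2 70B GGML file; loading it with the
# current PyPI release is what triggers the shape error above.
llm = AutoModelForCausalLM.from_pretrained(
    "./models/llama-2-70b.ggmlv3.q4_0.bin",
    model_type="llama",
)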

vmajor commented Jul 24, 2023

...actually, fixing this is well beyond my skill level, but it is indeed related to the newly added 70B model handling. The quantization changed:
ggerganov/llama.cpp#2276

klosax commented Jul 24, 2023

@vmajor

Did you use the -gqa 8 parameter, which is needed for the 70B model?

vmajor commented Jul 24, 2023

With ctransformers? No... OK, I did not think it would be passed through. I will try it now.

klosax commented Jul 24, 2023

No, sorry, my mistake. You will need the newest master of llama.cpp, and it needs the -gqa 8 parameter for 70B models.

vmajor commented Jul 24, 2023

...and no, I cannot set up the LLM instance: invalid argument. I have llama.cpp working, but that does not help, as I need Python bindings. I am waiting for llama-cpp-python to update; otherwise I can build the required .so by pulling the working llama.cpp and building it from source.

I cannot do the same with ctransformers, as it is written in C++ and I do not speak that language.
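
For reference, once llama-cpp-python picked up the 70B changes, the grouped-query-attention factor could be passed when constructing the model. A minimal sketch, assuming the binding exposed it as the n_gqa keyword at the time (the model path is hypothetical):

from llama_cpp import Llama

# n_gqa mirrors llama.cpp's temporary -gqa 8 flag for 70B models.
# The path below is hypothetical; point it at your quantized 70B file.
llm = Llama(model_path="./models/llama-2-70b.ggmlv3.q4_0.bin", n_gqa=8)
print(llm("Hello", max_tokens=32))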

marella commented Jul 29, 2023

Added support for LLaMA 2 70B models in the latest version 0.2.15

Since the gqa parameter appears to be a temporary solution, I haven't added it as a config parameter. In order to use 70B models, the model path or repo name must contain the word 70B, for example llama-2-70b.bin, llama-2-70b/ggml-model.bin, TheBloke/Llama-2-70B-GGML, etc.

@TheBloke models should work out of the box without any additional configuration:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-70B-GGML")
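
A local GGML file works the same way, as long as 70b appears somewhere in its path; a short sketch (the file name is only an example):

from ctransformers import AutoModelForCausalLM

# The path must contain "70b" so the loader applies the 70B grouped-query-attention setting.
llm = AutoModelForCausalLM.from_pretrained(
    "./models/llama-2-70b.ggmlv3.q4_K_M.bin",
    model_type="llama",
)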

TheBloke commented Jul 29, 2023

Good to see ctransformers support. Looking for '70b' wouldn't have worked with https://huggingface.co/TheBloke/StableBeluga2-GGML, so I have renamed it to StableBeluga2-70B-GGML.

marella closed this as completed Aug 5, 2023
viktor-ferenczi commented Aug 27, 2023

The ggml loader relies on this horrible hack:

std::regex pattern_70b(R"((\b|_)70b(\b|_))", std::regex_constants::icase);

There seems to be no way to set n_gqa=8 while using ctransformers from Python code.

Any solution for GGML other than waiting for GGUF support?

Workaround: add _70b_ to the GGML file's name. Ugly, but it works.
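
A sketch of that workaround, assuming symlinking the file under a name containing _70b_ is acceptable (paths are hypothetical):

import os
from ctransformers import AutoModelForCausalLM

src = "./models/stablebeluga2.ggmlv3.q4_K_M.bin"       # original file name, hypothetical
dst = "./models/stablebeluga2_70b_.ggmlv3.q4_K_M.bin"  # matches the (\b|_)70b(\b|_) pattern

# Symlink instead of copying so the multi-GB file is not duplicated on disk.
if not os.path.exists(dst):
    os.symlink(os.path.abspath(src), dst)

llm = AutoModelForCausalLM.from_pretrained(dst, model_type="llama")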

marella commented Aug 27, 2023

GGUF support was added in 0.2.24.

n_gqa was a temporary parameter, so I didn't add it; it is no longer supported in llama.cpp.
For GGML models, add 70b to the file name as mentioned above.
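
For completeness, loading a GGUF model with ctransformers 0.2.24 or later looks roughly like this (the repo and file names are examples; model_file selects a specific quantization):

from ctransformers import AutoModelForCausalLM

# Example repo and file names; pick whichever GGUF quantization you need.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-70B-GGUF",
    model_file="llama-2-70b.Q4_K_M.gguf",
    model_type="llama",
)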
