
llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024 llama_init_from_file: failed to load model #62

Closed
vmajor opened this issue Jul 24, 2023 · 9 comments

vmajor commented Jul 24, 2023

This error occurs with a quantized 70B model that works with the current master branch of llama.cpp:

llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected  8192 x  8192, got  8192 x  1024
llama_init_from_file: failed to load model

I am guessing that you would just need to update the PyPI package. I will try to build from source in the meantime.
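
For reference, a minimal sketch of how the model was being loaded, assuming ctransformers' AutoModelForCausalLM API (the local file name is hypothetical):

from ctransformers import AutoModelForCausalLM

# Hypothetical path to a quantized LLaMA 2 70B GGML file; loading it with the
# current PyPI release is what triggers the shape error above.
llm = AutoModelForCausalLM.from_pretrained(
    "./models/llama-2-70b.ggmlv3.q4_0.bin",
    model_type="llama",
)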

vmajor commented Jul 24, 2023

...actually, fixing this is well beyond my skill level, but it is indeed related to the newly added 70B model handling. The quantization changed:
ggerganov/llama.cpp#2276

klosax commented Jul 24, 2023

@vmajor

Did you use the -gqa 8 parameter, which is needed for the 70B model?

vmajor commented Jul 24, 2023

With ctransformers? No... OK, I did not think it would be passed through. I will try it now.

klosax commented Jul 24, 2023

No, sorry, my mistake. You will need the newest master of llama.cpp, and it needs the -gqa 8 parameter for 70B models.

vmajor commented Jul 24, 2023

...and no, I cannot set up the LLM instance: invalid argument. I have llama.cpp working, but that does not help, as I need Python bindings. I am waiting for llama-cpp-python to update; otherwise I can build the required .so by pulling the working llama.cpp and building it from source.

I cannot do the same with ctransformers, as it is written in C++ and I do not speak that language.
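
For reference, once llama-cpp-python picked up the 70B changes, the grouped-query-attention factor could be passed when constructing the model. A minimal sketch, assuming the binding exposed it as the n_gqa keyword at the time (the model path is hypothetical):

from llama_cpp import Llama

# n_gqa mirrors llama.cpp's temporary -gqa 8 flag for 70B models.
# The path below is hypothetical; point it at your quantized 70B file.
llm = Llama(model_path="./models/llama-2-70b.ggmlv3.q4_0.bin", n_gqa=8)
print(llm("Hello", max_tokens=32))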

marella commented Jul 29, 2023

Added support for LLaMA 2 70B models in the latest version 0.2.15

Since the gqa parameter appears to be a temporary solution, I haven't added it as a config parameter. In order to use 70B models, the model path or repo name must contain the word 70B, for example llama-2-70b.bin, llama-2-70b/ggml-model.bin, TheBloke/Llama-2-70B-GGML, etc.

@TheBloke models should work out of the box without any additional configuration:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-70B-GGML")
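
A local GGML file works the same way, as long as 70b appears somewhere in its path; a short sketch (the file name is only an example):

from ctransformers import AutoModelForCausalLM

# The path must contain "70b" so the loader applies the 70B grouped-query-attention setting.
llm = AutoModelForCausalLM.from_pretrained(
    "./models/llama-2-70b.ggmlv3.q4_K_M.bin",
    model_type="llama",
)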

TheBloke commented Jul 29, 2023

Good to see ctransformers support. Looking for '70b' wouldn't have worked with https://huggingface.co/TheBloke/StableBeluga2-GGML, so I have renamed it to StableBeluga2-70B-GGML.

marella closed this as completed Aug 5, 2023
viktor-ferenczi commented Aug 27, 2023

The ggml loader relies on this horrible hack:

std::regex pattern_70b(R"((\b|_)70b(\b|_))", std::regex_constants::icase);

There seems to be no way to set n_gqa=8 while using ctransformers from Python code.

Any solution for GGML other than waiting for GGUF support?

Workaround: add _70b_ to the GGML file's name. Ugly, but it works.
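
A sketch of that workaround, assuming symlinking the file under a name containing _70b_ is acceptable (paths are hypothetical):

import os
from ctransformers import AutoModelForCausalLM

src = "./models/stablebeluga2.ggmlv3.q4_K_M.bin"       # original file name, hypothetical
dst = "./models/stablebeluga2_70b_.ggmlv3.q4_K_M.bin"  # matches the (\b|_)70b(\b|_) pattern

# Symlink instead of copying so the multi-GB file is not duplicated on disk.
if not os.path.exists(dst):
    os.symlink(os.path.abspath(src), dst)

llm = AutoModelForCausalLM.from_pretrained(dst, model_type="llama")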

marella commented Aug 27, 2023

GGUF support was added in 0.2.24.

n_gqa was a temporary parameter, so I didn't add it; it is no longer supported in llama.cpp.
For GGML models, add 70b to the file name as mentioned above.
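
For completeness, loading a GGUF model with ctransformers 0.2.24 or later looks roughly like this (the repo and file names are examples; model_file selects a specific quantization):

from ctransformers import AutoModelForCausalLM

# Example repo and file names; pick whichever GGUF quantization you need.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-70B-GGUF",
    model_file="llama-2-70b.Q4_K_M.gguf",
    model_type="llama",
)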
