
13B model issue: tensor 'tok_embeddings.weight' has wrong size in model file #24

Closed
Tarang opened this issue Mar 11, 2023 · 4 comments
Labels
build Compilation issues

Comments


Tarang commented Mar 11, 2023

I tried the following with the latest master (6b2cb63):

python convert-pth-to-ggml.py models/13B/ 1
./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
ls models/13B/
checklist.chk         consolidated.00.pth   consolidated.01.pth   ggml-model-f16.bin    ggml-model-f16.bin.1  ggml-model-q4_0.bin   ggml-model-q4_0.bin.1 params.json
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
main: seed = 1678568386
llama_model_load: loading model from './models/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 5120
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 13824
llama_model_load: ggml ctx size = 8559.49 MB
llama_model_load: memory_size =   800.00 MB, n_mem = 20480
llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
main: failed to load model from './models/13B/ggml-model-q4_0.bin'

What does the error "tensor 'tok_embeddings.weight' has wrong size in model file" mean?


djkz commented Mar 11, 2023

It means you are running an old build. Recompile your main and quantize binaries, then re-quantize the weights; it should work after that.
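For anyone hitting the same thing, a minimal sketch of those steps, using the 13B paths from the report above. The git pull / make steps are the standard llama.cpp workflow at the time, not something stated in this thread, so adjust them to however you built originally:

# Pull the latest code and rebuild the binaries:
git pull
make clean && make

# Re-convert and re-quantize the weights with the new binaries:
python convert-pth-to-ggml.py models/13B/ 1
./quantize ./models/13B/ggml-model-f16.bin   ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2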


Tarang commented Mar 11, 2023

That was it!


Ionaut commented Mar 21, 2023

How is this done exactly?

Komal-99 commented:

Hi,
I am not able to quantize my model. After running convert.py from llama.cpp, the model was converted to GGUF, but running

./quantize C:\PrivateGPT\privategpt\privateGPT-main\llama.cpp-master\models\ggml-model-f16.gguf C:\PrivateGPT\privategpt\privateGPT-main\llama.cpp-master\models\ggml-model-q4_0.gguf q4_0

gives this error: ./quantize is not a cmdlet or script function.
Also, @Tarang, can you please tell me how you were able to create a .bin file? convert.py is creating a .gguf file by default.
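That PowerShell error usually just means there is no quantize executable at that path: the tools have to be built first, and on Windows the binary is quantize.exe. A minimal sketch, assuming a CMake build with the default Visual Studio generator (the build\bin\Release\ output path is that generator's convention; adjust it if your build puts binaries elsewhere):

# Build the tools first (requires CMake and a C++ toolchain):
cmake -B build
cmake --build build --config Release

# Then invoke the built binary with an explicit path:
.\build\bin\Release\quantize.exe `
    C:\PrivateGPT\privategpt\privateGPT-main\llama.cpp-master\models\ggml-model-f16.gguf `
    C:\PrivateGPT\privategpt\privateGPT-main\llama.cpp-master\models\ggml-model-q4_0.gguf `
    q4_0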
