Used this model: https://huggingface.co/meta-llama/Llama-2-70b
Used these commands:
$ convert-pth-to-ggml.py models/LLaMa2-70B-meta 1
$ ./quantize ./models/LLaMa2-70B-meta/ggml-model-f16.bin ./models/LLaMa2-70B-meta/ggml-model-q4_0.bin 2
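For reference, a minimal sketch of the load step that then fails, reconstructed from the traceback further down (it uses the llama-cpp-python bindings; path and parameters mirror the original run):

```python
# Sketch of the failing load call, reconstructed from the traceback below.
# Uses the llama-cpp-python bindings; path/parameters mirror the original run.
from llama_cpp import Llama

llm = Llama(
    model_path="/Users/xyz/Desktop/llama.cpp/models/LLaMa2-70B-meta/ggml-model-q4_0.bin",
    n_ctx=512,
    seed=43,
    n_threads=8,
    n_gpu_layers=1,
)
```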
The 7B and 13B models work without any problems; this happens only with the 70B model.
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama.cpp: loading model from /Users/xyz/Desktop/llama.cpp/models/LLaMa2-70B-meta/ggml-model-q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 512
llama_model_load_internal: n_embd     = 8192
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 64
llama_model_load_internal: n_layer    = 80
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: freq_base  = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 22016
llama_model_load_internal: model size = 65B
llama_model_load_internal: ggml ctx size = 0.19 MB
error loading model: llama.cpp: tensor 'layers.0.attention.wk.weight' has wrong shape; expected 8192 x 8192, got 8192 x 1024
llama_load_model_from_file: failed to load model
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 llm = Llama(model_path="/Users/xyz/Desktop/llama.cpp/models/LLaMa2-70B-meta/ggml-model-q4_0.bin", n_ctx=512, seed=43, n_threads=8, n_gpu_layers=1)

File /opt/homebrew/Caskroom/miniforge/base/envs/tensorflow_m1/lib/python3.11/site-packages/llama_cpp/llama.py:305, in Llama.__init__(self, model_path, n_ctx, n_parts, n_gpu_layers, seed, f16_kv, logits_all, vocab_only, use_mmap, use_mlock, embedding, n_threads, n_batch, last_n_tokens_size, lora_base, lora_path, low_vram, tensor_split, rope_freq_base, rope_freq_scale, verbose)
    300     raise ValueError(f"Model path does not exist: {model_path}")
    302 self.model = llama_cpp.llama_load_model_from_file(
    303     self.model_path.encode("utf-8"), self.params
    304 )
--> 305 assert self.model is not None
    307 self.ctx = llama_cpp.llama_new_context_with_model(self.model, self.params)
    309 assert self.ctx is not None

AssertionError:
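The reported shapes are consistent with grouped-query attention: the loader expects the K projection to be n_embd x n_embd (8192 x 8192), but the 70B checkpoint stores far fewer key/value heads. A quick back-of-the-envelope check, assuming the commonly cited n_kv_head = 8 for LLaMA 2 70B (that value is not printed in the log above):

```python
# Back-of-the-envelope check of the reported shapes, using values from the log above.
# n_kv_head = 8 is an assumption (the GQA setting reported for LLaMA 2 70B),
# not something printed in the log.
n_embd = 8192       # from llama_model_load_internal: n_embd
n_head = 64         # from llama_model_load_internal: n_head
n_kv_head = 8       # assumed key/value head count for the 70B model

head_dim = n_embd // n_head                            # 128, matches n_rot in the log
wk_expected_no_gqa = (n_embd, n_head * head_dim)       # (8192, 8192) -- what the loader expects
wk_actual_gqa      = (n_embd, n_kv_head * head_dim)    # (8192, 1024) -- what the checkpoint contains

print(wk_expected_no_gqa, wk_actual_gqa)
```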
The LLaMA 2 70B model uses grouped-query attention (GQA) and is not supported yet. Work is being done in PR #2276.
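For anyone unfamiliar with GQA: the 70B model shares each key/value head across a group of query heads (64 query heads, 8 KV heads), which is why the wk/wv projections are 8x narrower than a loader written for plain multi-head attention expects. A minimal NumPy sketch of the idea (illustrative only, not llama.cpp's implementation):

```python
# Illustrative sketch of grouped-query attention head sharing -- not llama.cpp code.
# Shapes follow the 70B hyperparameters from the log: 64 query heads, an assumed
# 8 KV heads, head_dim 128; the 8-way sharing factor is what 70B support must handle.
import numpy as np

n_head, n_kv_head, head_dim, n_tokens = 64, 8, 128, 4
group = n_head // n_kv_head  # 8 query heads share one key/value head

q = np.random.randn(n_head, n_tokens, head_dim)      # per-query-head queries
k = np.random.randn(n_kv_head, n_tokens, head_dim)   # only 8 key heads are stored

# Expand K so every query head sees its group's shared key head.
k_expanded = np.repeat(k, group, axis=0)              # (64, n_tokens, head_dim)

scores = q @ k_expanded.transpose(0, 2, 1) / np.sqrt(head_dim)
print(scores.shape)  # (64, 4, 4)
```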