Does llama.cpp support gpt-2-q4? #1386

realcarlos · 2023-05-10T02:42:35Z

I noticed that ggml can run GPT-2, and I wonder whether llama.cpp supports it too. I downloaded a gpt-2-q4 model from Hugging Face, but it failed to run.

realcarlos · 2023-05-10T03:26:14Z

I tried to convert Cerebras-GPT-111M for llama.cpp. It runs with ggml, but quantizing it with llama.cpp fails with the error below:

❯ ./quantize ./models/Cerebras-GPT-111M/ggml-model-f16.bin ./models/output/cerebras-q4-0.bin q4_0
llama.cpp: loading model from ./models/Cerebras-GPT-111M/ggml-model-f16.bin
llama_model_quantize: failed to quantize: missing tok_embeddings.weight
main: failed to quantize model from './models/Cerebras-GPT-111M/ggml-model-f16.bin'

realcarlos · 2023-05-10T03:29:09Z

❯ python3 convert.py ./models/Cerebras-GPT-111M/
Loading model file models/Cerebras-GPT-111M/ggml-model-f16.bin
Traceback (most recent call last):
  File "/Users/bytedance/llama.cpp/convert.py", line 1149, in <module>
    main()
  File "/Users/bytedance/llama.cpp/convert.py", line 1137, in main
    vocab = load_vocab(vocab_dir)
  File "/Users/bytedance/llama.cpp/convert.py", line 1078, in load_vocab
    raise FileNotFoundError(f"Could not find tokenizer.model in {path} or its parent; if it's in another directory, pass the directory as --vocab-dir")
FileNotFoundError: Could not find tokenizer.model in models/Cerebras-GPT-111M or its parent; if it's in another directory, pass the directory as --vocab-dir

realcarlos · 2023-05-10T03:30:26Z

This is a screenshot of the contents of "models/Cerebras-GPT-111M".

LostRuins · 2023-05-12T12:33:03Z

Cerebras-GPT is a GPT-2-based model, not a LLaMA model, so it won't work with this repo.

You can run it with the gpt-2 example (main.cpp) in the ggml repo, or via KoboldCpp, which automatically detects the model format.

(Disclaimer: I am a koboldcpp dev)
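One way to check which family a Hugging Face checkpoint belongs to before running convert.py is to read the "model_type" field of its config.json. The sketch below is a hypothetical helper: read_model_type, looks_like_llama and has_sentencepiece_tokenizer are illustrative names, not part of llama.cpp or ggml, and it assumes the model directory contains a Hugging Face-style config.json. A GPT-2-based checkpoint would be expected to report "gpt2" here and, having no tokenizer.model, would hit the same FileNotFoundError shown in the traceback above.

```python
import json
from pathlib import Path

# Assumption: the checkpoint directory holds a Hugging Face-style config.json.
# This set is illustrative only, not llama.cpp's real list of supported models.
LLAMA_MODEL_TYPES = {"llama"}


def read_model_type(model_dir: str) -> str:
    """Return the "model_type" field from <model_dir>/config.json, or "unknown"."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    return config.get("model_type", "unknown")


def looks_like_llama(model_dir: str) -> bool:
    """True only if config.json reports a LLaMA-family model_type."""
    return read_model_type(model_dir) in LLAMA_MODEL_TYPES


def has_sentencepiece_tokenizer(model_dir: str) -> bool:
    """Check this directory (not its parent) for a SentencePiece tokenizer.model.

    GPT-2-style checkpoints usually ship vocab.json/merges.txt instead,
    which is why convert.py could not find a vocabulary for them.
    """
    return (Path(model_dir) / "tokenizer.model").is_file()
```

For example, looks_like_llama("./models/Cerebras-GPT-111M") would return False if that directory's config.json reports "gpt2", which is a hint to use the ggml gpt-2 example or KoboldCpp instead.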

github-actions · 2024-04-09T01:09:28Z

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot added the stale label Mar 25, 2024

github-actions bot closed this as completed Apr 9, 2024

Bearsaerker mentioned this issue Mar 12, 2025

Eval bug: Gemma 3 extremly slow prompt processing when using quantized kv cache. #12352

Open
