Does llama.cpp support gpt-2-q4? #1386

Closed
realcarlos opened this issue May 10, 2023 · 5 comments
@realcarlos

I noticed that ggml can run GPT-2, and I wonder whether llama.cpp supports it too. I downloaded a gpt-2-q4 model from Hugging Face, but it failed to run.

@realcarlos
Author

I tried to convert Cerebras-GPT-111M for llama.cpp. The model runs with ggml, but quantizing it with llama.cpp fails with the error below:

❯ ./quantize ./models/Cerebras-GPT-111M/ggml-model-f16.bin ./models/output/cerebras-q4-0.bin q4_0
llama.cpp: loading model from ./models/Cerebras-GPT-111M/ggml-model-f16.bin
llama_model_quantize: failed to quantize: missing tok_embeddings.weight
main: failed to quantize model from './models/Cerebras-GPT-111M/ggml-model-f16.bin'

@realcarlos
Author

❯ python3 convert.py ./models/Cerebras-GPT-111M/
Loading model file models/Cerebras-GPT-111M/ggml-model-f16.bin
Traceback (most recent call last):
  File "/Users/bytedance/llama.cpp/convert.py", line 1149, in <module>
    main()
  File "/Users/bytedance/llama.cpp/convert.py", line 1137, in main
    vocab = load_vocab(vocab_dir)
  File "/Users/bytedance/llama.cpp/convert.py", line 1078, in load_vocab
    raise FileNotFoundError(f"Could not find tokenizer.model in {path} or its parent; if it's in another directory, pass the directory as --vocab-dir")
FileNotFoundError: Could not find tokenizer.model in models/Cerebras-GPT-111M or its parent; if it's in another directory, pass the directory as --vocab-dir
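
The error message above describes the lookup that failed: tokenizer.model was searched for in the model directory and in its parent. A minimal sketch of that check follows, assuming only what the message states. The helper name `find_tokenizer_model` is hypothetical and is not taken from convert.py. GPT-2-style checkpoints on Hugging Face typically ship vocab.json and merges.txt rather than a SentencePiece tokenizer.model, which would explain why the lookup comes back empty here.

```python
from pathlib import Path
from typing import Optional


def find_tokenizer_model(model_dir: Path) -> Optional[Path]:
    """Return tokenizer.model from model_dir or its parent, or None if absent.

    Hypothetical helper: it mirrors the search described by the error
    message, not the actual implementation in convert.py.
    """
    for candidate_dir in (model_dir, model_dir.parent):
        candidate = candidate_dir / "tokenizer.model"
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_tokenizer_model(Path("models/Cerebras-GPT-111M"))
    print(found if found is not None else "tokenizer.model not found")
```

If this returns None for a checkpoint, passing --vocab-dir will not help unless a tokenizer.model actually exists somewhere for that model.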

@realcarlos
Author

[Screenshot, 2023-05-10 11:29:51] This is a screenshot of the "models/Cerebras-GPT-111M" directory.

@LostRuins
Collaborator

Cerebras-GPT is a GPT-2-based model, not a LLaMA model, so it won't work with this repo.

You can run it with the gpt2 example (main.cpp) in the ggml repo, or via KoboldCpp, which automatically detects the format.

(Disclaimer: I am a koboldcpp dev)
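
One way to catch this architecture mismatch before converting is to read the `model_type` field from the checkpoint's Hugging Face config.json. The sketch below assumes the model directory contains a standard config.json. The function names are illustrative and are not part of llama.cpp or ggml.

```python
import json
from pathlib import Path


def read_model_type(model_dir: Path) -> str:
    """Return the model_type declared in config.json, or "unknown" if unset."""
    config_path = model_dir / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("model_type", "unknown")


def is_llama_checkpoint(model_dir: Path) -> bool:
    """True only when config.json declares a LLaMA architecture."""
    return read_model_type(model_dir) == "llama"


if __name__ == "__main__":
    model_dir = Path("models/Cerebras-GPT-111M")
    if (model_dir / "config.json").is_file():
        print(read_model_type(model_dir), is_llama_checkpoint(model_dir))
```

A GPT-2-family checkpoint is expected to report something other than "llama" here, which matches the quantize failure above: the tool looks for the LLaMA tensor tok_embeddings.weight and does not find it.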

github-actions bot added the stale label on Mar 25, 2024

@github-actions
Contributor

github-actions bot commented on Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.
