# Creating a GGUF model

1. Run `download_snapshot.py` for your desired model (see the sketch after this list).
2. Clone `llama.cpp`. Ideally use the latest version, but if there is no `convert_hf_to_gguf.py` file, you can run `git checkout 19d8762`.
3. `pip install -r llama.cpp/requirements.txt`
4. Check that your model has a `tokenizer.model` file. If not, you'll need to get it from the base model. E.g. for Phi-3-mini-4k-instruct-graph there was no such file, so I downloaded it from the original Phi-3-mini repo and put it in the downloaded model's directory (also covered in the sketch below). **Please note:** if the tokenizer/vocab was modified between the base model and your desired fine-tuned model, this approach will likely cause issues.
5. Run:

```sh
python llama.cpp/convert_hf_to_gguf.py create-gguf/Phi-3-mini-4k-instruct-graph \
  --outfile create-gguf/Phi-3-mini-4k-instruct-graph.Q8_0.gguf \
  --outtype q8_0
```
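
For reference, steps 1 and 4 amount to roughly the following `huggingface_hub` calls. This is a sketch only: the repo IDs and local paths are illustrative assumptions, and `download_snapshot.py` in this repo may differ.

```python
# Sketch: fetch a model snapshot and, if needed, the base model's
# tokenizer.model via huggingface_hub. Repo IDs and paths below are
# examples, not fixed by this repo.
from huggingface_hub import snapshot_download, hf_hub_download

# Step 1: download the fine-tuned model into create-gguf/
snapshot_download(
    repo_id="EmergentMethods/Phi-3-mini-4k-instruct-graph",  # example repo
    local_dir="create-gguf/Phi-3-mini-4k-instruct-graph",
)

# Step 4: if tokenizer.model is missing, pull it from the base model's repo
# (assumes the base repo actually ships a tokenizer.model file)
hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct",  # example base repo
    filename="tokenizer.model",
    local_dir="create-gguf/Phi-3-mini-4k-instruct-graph",
)
```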

## Adding a model to Hugging Face

To add the converted model to a Hugging Face model repo, follow these steps:
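
For example, a minimal upload sketch using `huggingface_hub` (the repo ID is a placeholder for your own repo, and this assumes you've already authenticated with `huggingface-cli login`):

```python
# Sketch: upload the converted GGUF file to an existing Hugging Face
# model repo. The repo ID below is a placeholder, not a real target.
from huggingface_hub import HfApi

api = HfApi()  # picks up the token saved by `huggingface-cli login`
api.upload_file(
    path_or_fileobj="create-gguf/Phi-3-mini-4k-instruct-graph.Q8_0.gguf",
    path_in_repo="Phi-3-mini-4k-instruct-graph.Q8_0.gguf",
    repo_id="your-username/Phi-3-mini-4k-instruct-graph",  # placeholder
)
```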

> [!TIP]
> To clear the Hugging Face cache:
>
> ```sh
> pip install -U "huggingface_hub[cli]"
> huggingface-cli delete-cache
> ```