- Run `download_snapshot.py` for your desired model (a sketch of what this script might do is shown after this list)
- Clone llama.cpp. Ideally use the latest version, but if there is no `convert_hf_to_gguf.py` file, you can run `git checkout 19d8762`. Then run `pip install -r llama.cpp/requirements.txt`
- Check that your model has a `tokenizer.model` file. If not, you'll need to get it from the base model. e.g. for `Phi-3-mini-4k-instruct-graph`, there was no such file, so I downloaded it from the original Phi-3-mini repo. Put this file in your downloaded model's dir (see the second sketch after this list). PLEASE NOTE: if the tokenizer/vocab was modified between the base model and your desired finetuned model, this approach will likely cause issues.
- Run:
```
python llama.cpp/convert_hf_to_gguf.py create-gguf/Phi-3-mini-4k-instruct-graph \
  --outfile create-gguf/Phi-3-mini-4k-instruct-graph.Q8_0.gguf \
  --outtype q8_0
```
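As referenced in the first step, here is a minimal sketch of what `download_snapshot.py` might look like, assuming it is just a thin wrapper around `huggingface_hub.snapshot_download`; the repo id below is a placeholder for whichever model you want to convert:

```python
from huggingface_hub import snapshot_download

# Download a full model snapshot into the dir that convert_hf_to_gguf.py
# is pointed at later. The repo id is a placeholder, not the actual repo.
snapshot_download(
    repo_id="your-org/Phi-3-mini-4k-instruct-graph",
    local_dir="create-gguf/Phi-3-mini-4k-instruct-graph",
)
```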
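Similarly, for the `tokenizer.model` step, a missing file can be pulled straight from the base model's repo with `hf_hub_download`; this sketch assumes `microsoft/Phi-3-mini-4k-instruct` is the original Phi-3-mini repo:

```python
from huggingface_hub import hf_hub_download

# Fetch tokenizer.model from the base model's repo and place it in the
# fine-tuned model's local dir. The base repo id is an assumption.
hf_hub_download(
    repo_id="microsoft/Phi-3-mini-4k-instruct",
    filename="tokenizer.model",
    local_dir="create-gguf/Phi-3-mini-4k-instruct-graph",
)
```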
To add the GGUF file to a Hugging Face model repo:
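One way to do this, sketched with `huggingface_hub`'s Python API; the repo id is a placeholder for your own repo:

```python
from huggingface_hub import HfApi

api = HfApi()
# Create the target repo if it doesn't exist yet (placeholder repo id).
api.create_repo("your-username/Phi-3-mini-4k-instruct-graph-gguf", exist_ok=True)
# Upload the converted GGUF file into the repo.
api.upload_file(
    path_or_fileobj="create-gguf/Phi-3-mini-4k-instruct-graph.Q8_0.gguf",
    path_in_repo="Phi-3-mini-4k-instruct-graph.Q8_0.gguf",
    repo_id="your-username/Phi-3-mini-4k-instruct-graph-gguf",
)
```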
To clear the Hugging Face cache:
```
pip install -U "huggingface_hub[cli]"
huggingface-cli delete-cache
```