RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist #748

Open
okoliechykwuka opened this issue Jul 9, 2024 · 29 comments
Labels
currently fixing Am fixing now!

Comments

@okoliechykwuka

The error below occurred while trying to convert the model to GGUF format.

I noticed that the quantize folder resides in llama.cpp/examples/quantize.

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
@danielhanchen
Contributor

Weird, I just tried it in the last hour and it works.

@scherbakovdmitri

scherbakovdmitri commented Jul 10, 2024

It looks like we need to run make in the llama.cpp folder manually first; not sure why it stopped working in Unsloth:
ggerganov/llama.cpp#8107
If you run the make command in the llama.cpp folder, it will work.
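
A minimal sketch of that manual build, run from the directory you save the model from (where Unsloth cloned llama.cpp); it just rebuilds the binaries Unsloth expects:

cd llama.cpp
make clean
make all -j
cd ..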

@danielhanchen
Contributor

Weird that it stopped working? Hmm, I shall try this in Colab and report back!

@danielhanchen danielhanchen added the currently fixing Am fixing now! label Jul 12, 2024
@Zhangy-ly

I have the same problem. Is there a solution now?

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

@danielhanchen
Contributor

It should function - are you using Colab?

@Zhangy-ly

It should function - are you using Colab?

Well, mine is as follows:
NVIDIA V100
Driver Version: 535.146.02
CUDA Version: 12.1

I temporarily solved this problem by rolling back llama.cpp:

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

@okoliechykwuka
Author

@danielhanchen Yes, I am using Colab, but I am still having the same error.

@danielhanchen
Contributor

Wait, weird, I just ran it with no errors in Colab. It's best to use our updated notebooks on our GitHub and start fresh.

@Deluxer

Deluxer commented Jul 28, 2024

@Zhangy-ly That is an effective workaround.

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

@theodufort

@Zhangy-ly That is an effective workaround.

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

To anyone getting errors while running those bash commands in a notebook: prefix each command with !.
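
For example, as a single Colab/Jupyter cell (a sketch of the workaround above, not an official recipe); the commands are chained because a bare !cd only affects its own subshell:

!cd llama.cpp && git checkout b3345 && git submodule update --init --recursive && make clean && make all -j && git log -1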

@danielhanchen
Contributor

Wait so the issue persists? Are people using Colab / Runpod?

@Zhangy-ly

Wait so the issue persists? Are people using Colab / Runpod?

Hi Daniel,

Thank you for your response.

To clarify, the issue persists on my Ubuntu setup, although it seems to run without problems on Colab. Is there any other information you need to help diagnose the issue? Please let me know.

Ubuntu, NVIDIA V100, Driver Version: 535.146.02, CUDA Version: 12.1

packages in environment:

Name Version

_libgcc_mutex 0.1
_openmp_mutex 5.1
accelerate 0.32.1
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.43.2
blas 1.0
brotli-python 1.0.9
bzip2 1.0.8
ca-certificates 2024.3.11
certifi 2024.7.4
charset-normalizer 2.0.4
cuda-cudart 12.1.105
cuda-cupti 12.1.105
cuda-libraries 12.1.0
cuda-nvrtc 12.1.105
cuda-nvtx 12.1.105
cuda-opencl 12.5.39
cuda-runtime 12.1.0
cuda-version 12.5
datasets 2.20.0
dill 0.3.8
docstring-parser 0.16
ffmpeg 4.3
filelock 3.13.1
freetype 2.12.1
frozenlist 1.4.1
fsspec 2024.2.0
gguf 0.9.1
gmp 6.2.1
gmpy2 2.1.2
gnutls 3.6.15
huggingface-hub 0.23.4
idna 3.7
intel-openmp 2023.1.0
jinja2 3.1.3
jpeg 9e
lame 3.100
lcms2 2.12
ld_impl_linux-64 2.38
lerc 3.0
libcublas 12.1.0.26
libcufft 11.0.2.4
libcufile 1.10.1.7
libcurand 10.3.6.82
libcusolver 11.4.4.55
libcusparse 12.0.2.55
libdeflate 1.17
libffi 3.4.4
libgcc-ng 11.2.0
libgomp 11.2.0
libiconv 1.16
libidn2 2.3.4
libjpeg-turbo 2.0.0
libnpp 12.0.2.50
libnvjitlink 12.1.105
libnvjpeg 12.1.1.14
libpng 1.6.39
libstdcxx-ng 11.2.0
libtasn1 4.19.0
libtiff 4.5.1
libunistring 0.9.10
libuuid 1.41.5
libwebp-base 1.3.2
llvm-openmp 14.0.6
lz4-c 1.9.4
markdown-it-py 3.0.0
markupsafe 2.1.5
mdurl 0.1.2
mkl 2023.1.0
mkl-service 2.4.0
mkl_fft 1.3.8
mkl_random 1.2.4
mpc 1.1.0
mpfr 4.0.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
ncurses 6.4
nettle 3.7.3
networkx 3.2.1
numpy 1.26.4
numpy-base 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.1.105
nvidia-nvtx-cu12 12.1.105
openh264 2.1.1
openjpeg 2.4.0
openssl 3.0.14
packaging 24.1
pandas 2.2.2
peft 0.11.1
pillow 10.3.0
pip 24.0
protobuf 3.20.3
psutil 6.0.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
pygments 2.18.0
pysocks 1.7.1
python 3.10.13
python-dateutil 2.9.0.post0
pytorch-cuda 12.1
pytorch-mutex 1.0
pytz 2024.1
pyyaml 6.0.1
readline 8.2
regex 2024.5.15
requests 2.32.2
rich 13.7.1
safetensors 0.4.3
sentencepiece 0.2.0
setuptools 69.5.1
shtab 1.7.1
six 1.16.0
sqlite 3.45.3
sympy 1.12
tbb 2021.8.0
tk 8.6.14
tokenizers 0.19.1
torch 2.2.0+cu121
torchaudio 2.2.0
torchvision 0.17.0
tqdm 4.66.4
transformers 4.43.1
triton 2.2.0
trl 0.8.6
typing-extensions 4.9.0
tyro 0.8.5
tzdata 2024.1
unsloth 2024.7
urllib3 2.2.2
wheel 0.43.0
xformers 0.0.24
xxhash 3.4.1
xz 5.4.6
yaml 0.2.5
yarl 1.9.4
zlib 1.2.13
zstd 1.5.5

@jeehunseo

@Zhangy-ly That is an effective workaround.

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

This solved my situation. There is no llama-quantize or quantize file in the newest git source (08/07/2024), so unslothai should pin a specific version of llama.cpp to fix this issue. Thank you! ;)

@thyarles

It looks like we need to run make in the llama.cpp folder manually first; not sure why it stopped working in Unsloth (ggerganov/llama.cpp#8107). If you run the make command in the llama.cpp folder, it will work.

Same problem here. This tip solved the issue.

$ cd llama.cpp
make

@yuxiaojian

Manually running make works; it generates llama.cpp/llama-quantize.

$ cd llama.cpp
make

It looks like we need to run make in the llama.cpp folder manually first; not sure why it stopped working in Unsloth (ggerganov/llama.cpp#8107). If you run the make command in the llama.cpp folder, it will work.

Same problem here. This tip solved the issue.

$ cd llama.cpp
make

@danielhanchen
Contributor

Hmm, I might have to take another look at why it's not working. Maybe my calling mechanisms aren't functioning correctly.

@whisper-bye

whisper-bye commented Aug 29, 2024

On Windows I needed to remove the extension from llama-quantize.exe (so the file is named llama-quantize),

and then:

[... truncated ChatML Jinja chat template printed by the GGUF conversion log ...]
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:model\unsloth.BF16.gguf: n_tensors = 339, total_size = 15.2G
Writing: 100%|██████████| 15.2G/15.2G [01:00<00:00, 250Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to model\unsloth.BF16.gguf
Unsloth: Conversion completed! Output location: ./model/unsloth.BF16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
'.' is not recognized as an internal or external command,
operable program or batch file.
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
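
Since the log above shows the BF16 GGUF was already written, another hedged option is to run just the quantize step by hand on Windows, assuming a llama-quantize.exe was built inside the llama.cpp folder (the Q4_K_M output filename here is an assumption):

llama.cpp\llama-quantize.exe model\unsloth.BF16.gguf model\unsloth.Q4_K_M.gguf Q4_K_M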

@throttlekitty

A bit of a noob here, but I have a workaround. I had built llama.cpp with VS2022 using CMake. I had a llama.cpp\bin\Releases folder with the resulting DLL and EXE files, which Unsloth couldn't find. Simply copying that whole folder to llama.cpp\llama-quantize worked. I was initially confused as to what exactly Unsloth was looking for.
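
A hedged sketch of that kind of copy from Git Bash or a similar shell, using the bin\Releases output folder mentioned above (with a default CMake setup the binaries often land in build/bin/Release instead); the goal is simply to get llama-quantize and its DLLs where Unsloth looks for them, directly under llama.cpp/:

cd llama.cpp
cp bin/Releases/*.exe bin/Releases/*.dll .
cd ..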

@danielhanchen
Contributor

Sorry about the llama.cpp issues :(
I might actually write a section with exact details on how to set up llama.cpp properly.

@Antonytm

Same for me

File ~/anaconda3/envs/finetuning/lib/python3.10/site-packages/unsloth/save.py:975, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
    973     quantize_location = "llama.cpp/llama-quantize"
    974 else:
--> 975     raise RuntimeError(
    976         "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"\
    977         "But we expect this file to exist! Maybe the llama.cpp developers changed the name?"
    978     )
...
    981 # See https://github.com/unslothai/unsloth/pull/730
    982 # Filenames changed again!

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

@danielhanchen
Contributor

@Antonytm Would https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf be helpful? Sorry for the delay!

@rorubyy

rorubyy commented Oct 9, 2024

@danielhanchen yes! It works. 👍

@jainpradeep

jainpradeep commented Dec 6, 2024

I tried building the same with CMake but the EXEs and DLLs are not getting generated. I manually copied the DLLs and EXEs from the release builds but I get the same issue. I then converted the model to GGUF manually:

python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16

The model file gets generated, but on creating this model with Ollama from the GGUF file I get the following error:

C:\Users\Desktop\New folder>ollama create unsloth_m -f "C:\Users\Desktop\New folder\op.gguf"

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"

Please help.

@lastrei

lastrei commented Dec 10, 2024

git log -1

Saved me work, thanks!

@lastrei

lastrei commented Dec 10, 2024

I tried building the same with CMake but the EXEs and DLLs are not getting generated. I manually copied the DLLs and EXEs from the release builds but I get the same issue. I then converted the model to GGUF manually:

python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16

The model file gets generated, but on creating this model with Ollama from the GGUF file I get the following error:

C:\Users\Desktop\New folder>ollama create unsloth_m -f "C:\Users\Desktop\New folder\op.gguf"

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"

Please help.

It seems there is something wrong with your Modelfile; usually at the top is FROM model_name.gguf.
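
A minimal sketch of such a Modelfile for the op.gguf produced above (the Modelfile name and relative path are assumptions); ollama create expects a Modelfile that points at the GGUF, not the GGUF itself:

echo FROM ./op.gguf > Modelfile
ollama create unsloth_m -f Modelfile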

@danielhanchen
Contributor

@jainpradeep Windows, right? Also, apologies for the delay - the Modelfile should look like https://github.com/ollama/ollama/blob/main/docs/modelfile.md and building llama.cpp on Windows can be tough - see https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md

I was planning to add more stable support for Windows in the future

@jainpradeep

The solution provided by @Zhangy-ly (checking out the llama.cpp b3345 tag) doesn't seem to work anymore. I used CMake as advised in the updated build documentation:

git checkout b3345
git submodule update --init --recursive
cmake -B build
cmake --build build --config Release
git log -1

CMake generates an out-of-source build by default, meaning the build artifacts (compiled binaries, etc.) are placed in a separate build folder (e.g., build/Release) instead of the source folder (llama.cpp). I copied all the binaries in the build folder to the root folder and re-ran the Unsloth Llama_3_2_1B+3B_Conversational_+_2x_faster_finetuning Colab, but I still get the same error:

RuntimeError(
    "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"
    "But we expect this file to exist! Maybe the llama.cpp developers changed the name?"
)

As an alternate workaround I tried converting the model to GGUF manually using
python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16
but the generated output file doesn't work with Ollama:

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"

The merged model files as suggested by @danielhanchen are in order, the config and safetensors files are present in the folder, and there are no errors while generating the merged model.

Can someone please suggest how I can use the model in Ollama without converting it to GGUF? I have been trying to get this to work for a month. There were many issues related to corporate proxies, SSL, timeouts, dependency versions, and building llama.cpp (I tried make, cmake, ninja, VS2022, everything), but I am stuck on the final step of getting the model to work with Ollama so I can use it in Open WebUI.

Please suggest what I am doing wrong.

@hideaki

hideaki commented Dec 16, 2024

I got this issue on Ubuntu, and the following steps worked for me.

  1. Build manually with CMake (it seems that make does not work anymore), following the llama.cpp build instructions.
  2. The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them to directly under llama.cpp/ (see the sketch after this list).
  3. Re-run the failed Unsloth call (in my case, push_to_hub_gguf).
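
A minimal sketch of those steps, assuming the default CMake layout where the binaries land in llama.cpp/build/bin:

cd llama.cpp
cmake -B build
cmake --build build --config Release
cp build/bin/llama-* .
cd ..
# then re-run the failed save_pretrained_gguf / push_to_hub_gguf call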

@hwpoison

I got this issue on Ubuntu, and the following steps worked for me.

  1. Build manually with CMake (it seems that make does not work anymore), following the llama.cpp build instructions.
  2. The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them to directly under llama.cpp/.
  3. Re-run the failed Unsloth call (in my case, push_to_hub_gguf).

It works for me.

In the Colab env I used:

!(cd llama.cpp; cmake -B build; cmake --build build --config Release)

Then I copied the executables to the /content/llama.cpp directory with cp and re-ran the cell.
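
A hedged sketch of that copy step in Colab, assuming the binaries end up in llama.cpp/build/bin as in the build command above:

!cp /content/llama.cpp/build/bin/llama-* /content/llama.cpp/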
