RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist #748

Open
okoliechykwuka opened this issue Jul 9, 2024 · 29 comments
Labels
currently fixing Am fixing now!

Comments

@okoliechykwuka

The error below occurred while trying to convert the model to GGUF format.

I noticed that the quantize folder resides in llama.cpp/examples/quantize.

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

# Save to q4_k_m GGUF
if True: model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
@danielhanchen
Contributor

Weird, I just tried it in the last hour and it works.

@scherbakovdmitri

scherbakovdmitri commented Jul 10, 2024

It looks like we need to run make in the llama.cpp folder manually first; not sure why it stopped working in Unsloth:
ggerganov/llama.cpp#8107
If you run the make command in the llama.cpp folder, it will work.
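
A minimal sketch of that manual build, run from the directory you save the model from (where Unsloth cloned llama.cpp); it just rebuilds the binaries Unsloth expects:

cd llama.cpp
make clean
make all -j
cd ..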

@danielhanchen
Contributor

Weird that it stopped working? Hmm, I shall try this in Colab and report back!

@danielhanchen danielhanchen added the currently fixing Am fixing now! label Jul 12, 2024
@Zhangy-ly

I have the same problem. Is there a solution now?

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

@danielhanchen
Contributor

It should function - are you using Colab?

@Zhangy-ly

It should function - are you using Colab?

Well, mine is as follows:
NVIDIA V100
Driver Version: 535.146.02
CUDA Version: 12.1

I temporarily solved this problem by rolling back llama.cpp:

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

@okoliechykwuka
Author

@danielhanchen Yes, I am using Colab, but I am still having the same error.

@danielhanchen
Contributor

Wait, weird, I just ran it with no errors in Colab. It's best to use our updated notebooks on our GitHub and start fresh.

@Deluxer

Deluxer commented Jul 28, 2024

@Zhangy-ly That is an effective workaround.

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

@theodufort

@Zhangy-ly That is an effective workaround.

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

To anyone getting errors while running those bash commands in a notebook: prefix each command with !.
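
For example, as a single Colab/Jupyter cell (a sketch of the workaround above, not an official recipe); the commands are chained because a bare !cd only affects its own subshell:

!cd llama.cpp && git checkout b3345 && git submodule update --init --recursive && make clean && make all -j && git log -1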

@danielhanchen
Contributor

Wait so the issue persists? Are people using Colab / Runpod?

@Zhangy-ly

Wait so the issue persists? Are people using Colab / Runpod?

Hi Daniel,

Thank you for your response.

To clarify, the issue persists on my Ubuntu setup, although it seems to run without problems on Colab. Is there any other information you need to help diagnose the issue? Please let me know.

Ubuntu, NVIDIA V100, Driver Version: 535.146.02, CUDA Version: 12.1

packages in environment:

Name Version

_libgcc_mutex 0.1
_openmp_mutex 5.1
accelerate 0.32.1
aiohttp 3.9.5
aiosignal 1.3.1
async-timeout 4.0.3
attrs 23.2.0
bitsandbytes 0.43.2
blas 1.0
brotli-python 1.0.9
bzip2 1.0.8
ca-certificates 2024.3.11
certifi 2024.7.4
charset-normalizer 2.0.4
cuda-cudart 12.1.105
cuda-cupti 12.1.105
cuda-libraries 12.1.0
cuda-nvrtc 12.1.105
cuda-nvtx 12.1.105
cuda-opencl 12.5.39
cuda-runtime 12.1.0
cuda-version 12.5
datasets 2.20.0
dill 0.3.8
docstring-parser 0.16
ffmpeg 4.3
filelock 3.13.1
freetype 2.12.1
frozenlist 1.4.1
fsspec 2024.2.0
gguf 0.9.1
gmp 6.2.1
gmpy2 2.1.2
gnutls 3.6.15
huggingface-hub 0.23.4
idna 3.7
intel-openmp 2023.1.0
jinja2 3.1.3
jpeg 9e
lame 3.100
lcms2 2.12
ld_impl_linux-64 2.38
lerc 3.0
libcublas 12.1.0.26
libcufft 11.0.2.4
libcufile 1.10.1.7
libcurand 10.3.6.82
libcusolver 11.4.4.55
libcusparse 12.0.2.55
libdeflate 1.17
libffi 3.4.4
libgcc-ng 11.2.0
libgomp 11.2.0
libiconv 1.16
libidn2 2.3.4
libjpeg-turbo 2.0.0
libnpp 12.0.2.50
libnvjitlink 12.1.105
libnvjpeg 12.1.1.14
libpng 1.6.39
libstdcxx-ng 11.2.0
libtasn1 4.19.0
libtiff 4.5.1
libunistring 0.9.10
libuuid 1.41.5
libwebp-base 1.3.2
llvm-openmp 14.0.6
lz4-c 1.9.4
markdown-it-py 3.0.0
markupsafe 2.1.5
mdurl 0.1.2
mkl 2023.1.0
mkl-service 2.4.0
mkl_fft 1.3.8
mkl_random 1.2.4
mpc 1.1.0
mpfr 4.0.2
mpmath 1.3.0
multidict 6.0.5
multiprocess 0.70.16
ncurses 6.4
nettle 3.7.3
networkx 3.2.1
numpy 1.26.4
numpy-base 1.26.4
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-nccl-cu12 2.19.3
nvidia-nvjitlink-cu12 12.1.105
nvidia-nvtx-cu12 12.1.105
openh264 2.1.1
openjpeg 2.4.0
openssl 3.0.14
packaging 24.1
pandas 2.2.2
peft 0.11.1
pillow 10.3.0
pip 24.0
protobuf 3.20.3
psutil 6.0.0
pyarrow 16.1.0
pyarrow-hotfix 0.6
pygments 2.18.0
pysocks 1.7.1
python 3.10.13
python-dateutil 2.9.0.post0
pytorch-cuda 12.1
pytorch-mutex 1.0
pytz 2024.1
pyyaml 6.0.1
readline 8.2
regex 2024.5.15
requests 2.32.2
rich 13.7.1
safetensors 0.4.3
sentencepiece 0.2.0
setuptools 69.5.1
shtab 1.7.1
six 1.16.0
sqlite 3.45.3
sympy 1.12
tbb 2021.8.0
tk 8.6.14
tokenizers 0.19.1
torch 2.2.0+cu121
torchaudio 2.2.0
torchvision 0.17.0
tqdm 4.66.4
transformers 4.43.1
triton 2.2.0
trl 0.8.6
typing-extensions 4.9.0
tyro 0.8.5
tzdata 2024.1
unsloth 2024.7
urllib3 2.2.2
wheel 0.43.0
xformers 0.0.24
xxhash 3.4.1
xz 5.4.6
yaml 0.2.5
yarl 1.9.4
zlib 1.2.13
zstd 1.5.5

@jeehunseo

@Zhangy-ly That is an effective workaround.

cd llama.cpp
git checkout b3345
git submodule update --init --recursive
make clean
make all -j
git log -1

This solved my situation. There is no llama-quantize or quantize file in the newest git source (08/07/2024), so unslothai should pin a specific version of llama.cpp to fix this issue. Thank you! ;)

@thyarles

It looks like we need to run make in the llama.cpp folder manually first; not sure why it stopped working in Unsloth (ggerganov/llama.cpp#8107). If you run the make command in the llama.cpp folder, it will work.

Same problem here. This tip solved the issue.

$ cd llama.cpp
make

@yuxiaojian

Manually running make works; it generates llama.cpp/llama-quantize.

$ cd llama.cpp
make

It looks like we need to run make in the llama.cpp folder manually first; not sure why it stopped working in Unsloth (ggerganov/llama.cpp#8107). If you run the make command in the llama.cpp folder, it will work.

Same problem here. This tip solved the issue.

$ cd llama.cpp
make

@danielhanchen
Contributor

Hmm, I might have to take another look at why it's not working. Maybe my calling mechanisms aren't functioning correctly.

@whisper-bye

whisper-bye commented Aug 29, 2024

On Windows I needed to remove the extension from llama-quantize.exe (so the file is named llama-quantize),

and then:

[... truncated ChatML Jinja chat template printed by the GGUF conversion log ...]
INFO:hf-to-gguf:Set model quantization version
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:model\unsloth.BF16.gguf: n_tensors = 339, total_size = 15.2G
Writing: 100%|██████████| 15.2G/15.2G [01:00<00:00, 250Mbyte/s]
INFO:hf-to-gguf:Model successfully exported to model\unsloth.BF16.gguf
Unsloth: Conversion completed! Output location: ./model/unsloth.BF16.gguf
Unsloth: [2] Converting GGUF 16bit into q4_k_m. This will take 20 minutes...
'.' is not recognized as an internal or external command,
operable program or batch file.
RuntimeError: Unsloth: Quantization failed! You might have to compile llama.cpp yourself, then run this again.
You do not need to close this Python program. Run the following commands in a new terminal:
You must run this in the same folder as you're saving your model.
git clone --recursive https://github.com/ggerganov/llama.cpp
cd llama.cpp && make clean && make all -j
Once that's done, redo the quantization.
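
Since the log above shows the BF16 GGUF was already written, another hedged option is to run just the quantize step by hand on Windows, assuming a llama-quantize.exe was built inside the llama.cpp folder (the Q4_K_M output filename here is an assumption):

llama.cpp\llama-quantize.exe model\unsloth.BF16.gguf model\unsloth.Q4_K_M.gguf Q4_K_M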

@throttlekitty

A bit of a noob here, but I have a workaround. I had built llama.cpp with VS2022 using CMake. I had a llama.cpp\bin\Releases folder with the resulting DLL and EXE files, which Unsloth couldn't find. Simply copying that whole folder to llama.cpp\llama-quantize worked. I was initially confused as to what exactly Unsloth was looking for.
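
A hedged sketch of that kind of copy from Git Bash or a similar shell, using the bin\Releases output folder mentioned above (with a default CMake setup the binaries often land in build/bin/Release instead); the goal is simply to get llama-quantize and its DLLs where Unsloth looks for them, directly under llama.cpp/:

cd llama.cpp
cp bin/Releases/*.exe bin/Releases/*.dll .
cd ..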

@danielhanchen
Contributor

Sorry about the llama.cpp issues :(
I might actually write a section with exact details on how to set up llama.cpp properly.

@Antonytm

Same for me

File ~/anaconda3/envs/finetuning/lib/python3.10/site-packages/unsloth/save.py:975, in save_to_gguf(model_type, model_dtype, is_sentencepiece, model_directory, quantization_method, first_conversion, _run_installer)
    973     quantize_location = "llama.cpp/llama-quantize"
    974 else:
--> 975     raise RuntimeError(
    976         "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"\
    977         "But we expect this file to exist! Maybe the llama.cpp developers changed the name?"
    978     )
...
    981 # See https://github.com/unslothai/unsloth/pull/730
    982 # Filenames changed again!

RuntimeError: Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.
But we expect this file to exist! Maybe the llama.cpp developers changed the name?

@danielhanchen
Contributor

@Antonytm Would https://github.com/unslothai/unsloth/wiki#manually-saving-to-gguf be helpful? Sorry for the delay!

@rorubyy

rorubyy commented Oct 9, 2024

@danielhanchen yes! It works. 👍

@jainpradeep

jainpradeep commented Dec 6, 2024

I tried building the same with CMake but the EXEs and DLLs are not getting generated. I manually copied the DLLs and EXEs from the release builds but I get the same issue. I then converted the model to GGUF manually:

python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16

The model file gets generated, but on creating this model with Ollama from the GGUF file I get the following error:

C:\Users\Desktop\New folder>ollama create unsloth_m -f "C:\Users\Desktop\New folder\op.gguf"

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"

Please help.

@lastrei

lastrei commented Dec 10, 2024

git log -1

Saved me work, thanks!

@lastrei

lastrei commented Dec 10, 2024

I tried building the same with CMake but the EXEs and DLLs are not getting generated. I manually copied the DLLs and EXEs from the release builds but I get the same issue. I then converted the model to GGUF manually:

python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16

The model file gets generated, but on creating this model with Ollama from the GGUF file I get the following error:

C:\Users\Desktop\New folder>ollama create unsloth_m -f "C:\Users\Desktop\New folder\op.gguf"

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"

Please help.

It seems there is something wrong with your Modelfile; usually at the top is FROM model_name.gguf.
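
A minimal sketch of such a Modelfile for the op.gguf produced above (the Modelfile name and relative path are assumptions); ollama create expects a Modelfile that points at the GGUF, not the GGUF itself:

echo FROM ./op.gguf > Modelfile
ollama create unsloth_m -f Modelfile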

@danielhanchen
Contributor

@jainpradeep Windows, right? Also, apologies for the delay - the Modelfile should look like https://github.com/ollama/ollama/blob/main/docs/modelfile.md and building llama.cpp on Windows can be tough - see https://github.com/ggerganov/llama.cpp/blob/master/docs/build.md

I was planning to add more stable support for Windows in the future

@jainpradeep

The solution provided by @Zhangy-ly (checking out the llama.cpp b3345 tag) doesn't seem to work anymore. I used CMake as advised in the updated build documentation:

git checkout b3345
git submodule update --init --recursive
cmake -B build
cmake --build build --config Release
git log -1

CMake generates an out-of-source build by default, meaning the build artifacts (compiled binaries, etc.) are placed in a separate build folder (e.g., build/Release) instead of the source folder (llama.cpp). I copied all the binaries in the build folder to the root folder and re-ran the Unsloth Llama_3_2_1B+3B_Conversational_+_2x_faster_finetuning Colab, but I still get the same error:

RuntimeError(
    "Unsloth: The file 'llama.cpp/llama-quantize' or 'llama.cpp/quantize' does not exist.\n"
    "But we expect this file to exist! Maybe the llama.cpp developers changed the name?"
)

As an alternate workaround I tried converting the model to GGUF manually using
python llama.cpp/convert_lora_to_gguf.py "C:\Users\Desktop\New folder\model" --outfile "C:\Users\Desktop\New folder\op.gguf" --outtype f16
but the generated output file doesn't work with Ollama:

Error: (line 1): command must be one of "from", "license", "template", "system", "adapter", "parameter", or "message"

The merged model files as suggested by @danielhanchen are in order, the config and safetensors files are present in the folder, and there are no errors while generating the merged model.

Can someone please suggest how I can use the model in Ollama without converting it to GGUF? I have been trying to get this to work for a month. There were many issues related to corporate proxies, SSL, timeouts, dependency versions, and building llama.cpp (I tried make, cmake, ninja, VS2022, everything), but I am stuck on the final step of getting the model to work with Ollama so I can use it in Open WebUI.

Please suggest what I am doing wrong.

@hideaki

hideaki commented Dec 16, 2024

I got this issue on Ubuntu, and the following steps worked for me.

  1. Build manually with CMake (it seems that make does not work anymore), following the llama.cpp build instructions.
  2. The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them to directly under llama.cpp/ (see the sketch after this list).
  3. Re-run the failed Unsloth call (in my case, push_to_hub_gguf).
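
A minimal sketch of those steps, assuming the default CMake layout where the binaries land in llama.cpp/build/bin:

cd llama.cpp
cmake -B build
cmake --build build --config Release
cp build/bin/llama-* .
cd ..
# then re-run the failed save_pretrained_gguf / push_to_hub_gguf call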

@hwpoison

I got this issue on Ubuntu, and the following steps worked for me.

  1. Build manually with CMake (it seems that make does not work anymore), following the llama.cpp build instructions.
  2. The above creates executable files (llama-*) under llama.cpp/build/bin. Copy them to directly under llama.cpp/.
  3. Re-run the failed Unsloth call (in my case, push_to_hub_gguf).

It works for me.

In the Colab env I used:

!(cd llama.cpp; cmake -B build; cmake --build build --config Release)

Then I copied the executables to the /content/llama.cpp directory with cp and re-ran the cell.
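
A hedged sketch of that copy step in Colab, assuming the binaries end up in llama.cpp/build/bin as in the build command above:

!cp /content/llama.cpp/build/bin/llama-* /content/llama.cpp/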
