
[TEMP FIX] Ollama / llama.cpp: cannot find tokenizer merges in model file #1065

Open
thackmann opened this issue Sep 27, 2024 · 36 comments
Labels
fixed - pending confirmation (Fixed, waiting for confirmation from poster) · URGENT BUG (Urgent bug)

Comments

@thackmann

Thank you for developing this useful resource. The Ollama notebook reports

{"error":"llama runner process has terminated: error loading modelvocabulary: cannot find tokenizer merges in model file"}

This is the notebook with the error. It is a copy of the original notebook.

This seems similar to the issue reported in #1062.

@laoc81

laoc81 commented Sep 27, 2024

Thank you for the miraculous "unsloth"!! It was working very well last week.

Now I am having the same problem as @thackmann:

My notebook -> transformers 4.44.2 (the same as last week).

Error: llama runner process has terminated: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file

@xmaayy

xmaayy commented Sep 27, 2024

Same issue!

@ThaisBarrosAlvim

ThaisBarrosAlvim commented Sep 28, 2024

Same issue!

@kingabzpro

same issue.

@Mukunda-Gogoi

Facing similar issues. Is there a fix?? I'm blocked!

@Saber120

Saber120 commented Sep 28, 2024

Same issue with Llama 3.2 3B, any solution please?

@shimmyshimmer
Collaborator

Hey guys, working on a fix. The new transformers version kind of broke everything.

@adampetr

Same issue... anyone have an idea where the problem is located?

@kingabzpro

Same issue with Llama 3.2 3B, any solution please?

Yes. I tried a workaround using llama.cpp, but it didn't work. The issue arises when we fine-tune and save the model.

@williamzebrowskI

williamzebrowskI commented Sep 28, 2024

Same issue. Huge bummer: I literally spent hours fine-tuning and uploading to HF, only to get these errors the past couple of days thinking it was me.

@Franky-W

same issue here.

thank you @shimmyshimmer for working on the fix!

@mahiatlinux
Contributor

Hey guys. Yes, this is a current issue, but the team is working to fix it. If you saved the LoRA adapter, you might not have to rerun training.
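
For anyone mid-run, a minimal sketch of saving the adapter so a failed GGUF export does not cost you the training run (the "lora_model" directory name is illustrative, and model/tokenizer are the objects from your Unsloth training notebook):

# Save only the LoRA adapter + tokenizer; these can be reloaded
# and re-exported to GGUF once the fix lands.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")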

@williamzebrowskI

There is a workaround that was posted here and it worked for me.

#1062 (comment)

@kingabzpro

There is a workaround that was posted here and it worked for me.

#1062 (comment)

This will not work for Llama 3.2 models.

@gianmarcoalessio

same issue!!

@David33706

same issue

@shimmyshimmer added the currently fixing (Am fixing now!), URGENT BUG (Urgent bug), and help wanted (Help from the OSS community wanted!) labels Sep 29, 2024
@FotieMConstant

FotieMConstant commented Sep 29, 2024

Same issue here, any fix anyone?

Here is the error I get after trying to run a fine-tuned model via Ollama:

Error: llama runner process has terminated: error loading model vocabulary: cannot find tokenizer merges in model file

@avvRobertoAlma

I have the same issue with Llama 3.
llama.cpp error: 'error loading model vocabulary: cannot find tokenizer merges in model file'

@danielhanchen
Contributor

Apologies guys - was out for a few days and it's been hectic, so sorry for the delay!! Will get to the bottom of this and hopefully fix it today! Sorry, and thank you all for your patience!

@danielhanchen
Contributor

I can reproduce the error. In fact, all of llama.cpp (and thus Ollama etc.) fails with transformers>=4.45.1. I'll update everyone on a fix; it looks like Hugging Face's update most likely broke something in tokenizer exports.

@danielhanchen changed the title from "Ollama: cannot find tokenizer merges in model file" to "Ollama / llama.cpp: cannot find tokenizer merges in model file" Sep 30, 2024
@danielhanchen removed the help wanted (Help from the OSS community wanted!) label Sep 30, 2024
@drsanta-1337

drsanta-1337 commented Sep 30, 2024

@danielhanchen
check this comment out, see if it helps.

huggingface/tokenizers#1553 (comment)

@danielhanchen
Contributor

danielhanchen commented Sep 30, 2024

I just communicated with the Hugging Face team - they will upstream updates to llama.cpp later in the week. It seems like tokenizers>=0.20.0 is the culprit.

I re-uploaded all Llama-3.2 models and as a temporary fix, Unsloth will use transformers==4.44.2.

Please try again and see if it works! This unfortunately means you need to re-finetune the model if you did not save the 16-bit merged HF weights or the LoRA weights - extreme apologies! If you saved them, simply update Unsloth, then reload them and convert to GGUF.

Update Unsloth via:

pip uninstall unsloth -y
pip install --upgrade --no-cache-dir "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"

I will update everyone once the Hugging Face team resolves the issue! Sorry again!
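
For reference, a quick sanity check that the pinned versions took effect (a minimal sketch; run it in the same environment after reinstalling):

# The temporary fix pins transformers==4.44.2, which in turn pulls in
# tokenizers<0.20.0 - the versions that still write the old merges format.
import transformers, tokenizers
print(transformers.__version__)  # expect 4.44.2
print(tokenizers.__version__)    # expect <0.20.0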

Pinging everyone (and apologies for the issues and inconvenience again!!) @xmaayy @avvRobertoAlma @thackmann @kingabzpro @williamzebrowskI @FotieMConstant @laoc81 @gianmarcoalessio @ThaisBarrosAlvim @Franky-W @Saber120 @adampetr @David33706 @Mukunda-Gogoi

@danielhanchen added the fixed - pending confirmation (Fixed, waiting for confirmation from poster) label and removed the currently fixing (Am fixing now!) label Sep 30, 2024
@danielhanchen changed the title from "Ollama / llama.cpp: cannot find tokenizer merges in model file" to "[TEMP FIX] Ollama / llama.cpp: cannot find tokenizer merges in model file" Sep 30, 2024
@danielhanchen pinned this issue Sep 30, 2024
@LysandreJik

LysandreJik commented Sep 30, 2024

Thanks @danielhanchen, and sorry for the disruption; to give some context as to what is happening here, we updated the format of merges serialization in tokenizers to be much more flexible (this was done in this commit).

The change was done to be backwards-compatible: tokenizers and all libraries that depend on it will keep the ability to load merge files which were serialized in the old way.

However, it could not be forwards-compatible: if a file is serialized with the new format, older versions of tokenizers will not be able to load it.

This is why we're seeing this issue: new files are serialized using the new version, and these files are not loadable in llama.cpp, yet. We're updating all other codepaths (namely llama.cpp) to adapt to the new version. Once that is shipped, all your trained checkpoints will be directly loadable as usual. We're working with llama.cpp to ship this as fast as possible.

Thank you!

Issue tracker in llama.cpp: ggerganov/llama.cpp#9692
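
To make the change concrete, here is an abbreviated, illustrative sketch of the "merges" field inside tokenizer.json (not the full file; the token strings are made up):

# Old serialization: each merge is a single space-joined string.
old_merges = ["Ġ t", "h e", "i n"]

# New serialization: each merge is a [left, right] pair, which also allows
# merging tokens that themselves contain spaces.
new_merges = [["Ġ", "t"], ["h", "e"], ["i", "n"]]

# Old readers (including llama.cpp at the time) only understand the first form,
# which is why GGUF exports written with the new format fail to load.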

@danielhanchen
Contributor

Sorry for the poor wording! Yep, so if anyone has already saved the LoRA or 16-bit weights (before converting to GGUF or Ollama), you can reload them in Unsloth after updating Unsloth and save again, as a temporary solution as well.
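
A minimal sketch of that reload-and-re-export path (the paths, max_seq_length, and quantization method are illustrative; adapt them to your own run):

from unsloth import FastLanguageModel

# Reload the previously saved LoRA adapter (or 16-bit merged weights).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="lora_model",  # directory saved before the GGUF step
    max_seq_length=2048,
    load_in_4bit=True,
)

# Re-export to GGUF with the fixed tokenizer serialization.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q8_0")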

@Saber120

I just communicated with the Hugging Face team - they will upstream updates to llama.cpp later in the week. It seems like tokenizers>=0.20.0 is the culprit. [...]

Thank you for the update! I followed the steps you provided, and I’m happy to report that it worked perfectly on my end. I updated Unsloth, reloaded the saved weights, and successfully converted them to GGUF. Everything is running smoothly now with the transformers==4.44.2 fix.

I appreciate the quick re-upload and the detailed instructions. I’ll keep an eye out for the official update from Hugging Face, but for now, everything seems to be working great.

Thanks again for your efforts!

Best regards,

@thackmann
Author

Thank you @danielhanchen for the quick fix. The original notebook is now working.

@kingabzpro

The fix is not working on Kaggle.

@FotieMConstant

I just communicated with the Hugging Face team - they will upstream updates to llama.cpp later in the week. It seems like tokenizers>=0.20.0 is the culprit. [...]

I get this error when I run the Colab after applying the changes; it seems to still be an issue. [screenshot of the error]

@danielhanchen
Contributor

@kingabzpro I just updated PyPI, so pip install unsloth should have the temporary fixes - you might have to restart Kaggle.

@kingabzpro

@kingabzpro I just updated PyPI, so pip install unsloth should have the temporary fixes - you might have to restart Kaggle.

It is working on Kaggle now. Thank you.

@danielhanchen unpinned this issue Oct 19, 2024
@lastrei

lastrei commented Dec 12, 2024

I'm sorry, but it still errors in Version: 2024.12.4.
I installed unsloth with pip install "unsloth[cu121-torch240] @ git+https://github.com/unslothai/unsloth.git"
The transformers version is

Name: transformers
Version: 4.46.3

The numpy version is

Name: numpy
Version: 1.26.4

I have saved the adapter model and converted it to GGUF; when I run it in Ollama, it's still the same error:

Error: llama runner process has terminated: error loading model: error loading model vocabulary: cannot find tokenizer merges in model file

@danielhanchen
Contributor

@lastrei Apologies - do you know which model exactly?

@lastrei

lastrei commented Dec 13, 2024

@lastrei Apologies - do you know which model exactly?

Thanks @danielhanchen,
the model is unsloth/meta-llama-3.1-8b-instruct-bnb-4bit.
BTW:
I used llama.cpp to manually convert the adapter to GGUF, which can be used in Ollama with the base model. This transformers upgrade brought a lot of trouble, and there are also corresponding llama.cpp problems: #748
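
For reference, one way to do that manual conversion (a sketch only; the paths are illustrative and flags can differ between llama.cpp checkouts, so check convert_lora_to_gguf.py --help for your version):

python llama.cpp/convert_lora_to_gguf.py lora_model --base base_model_dir --outfile adapter.gguf

The resulting adapter.gguf can then be paired with the base model in an Ollama Modelfile via its FROM and ADAPTER directives.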

@ethanelasky

ethanelasky commented Dec 13, 2024

I am getting a similar issue with the model meta-llama/Llama-3.1-8B-Instruct and the same numpy and transformers versions as @lastrei.

Torch: 2.5.1, CUDA toolkit 12.1

@JohnWangCH

I encountered the same problem a few hours ago.
The problem has been solved with the following steps on my end:

  • Rebuild llama.cpp with the following commands:
cd llama.cpp
git checkout a6744e43e80f4be6398fc7733a01642c846dce1d
git submodule update --init --recursive
make clean
make all -j
  • Call model.save_pretrained_gguf again:
if True: model.save_pretrained_gguf("model", tokenizer)

Then, Ollama can run the model without problems. FYR.

My env:
model_name = "unsloth/Llama-3.2-3B-Instruct",

Name: numpy
Version: 1.26.4

Name: transformers
Version: 4.46.3

Name: unsloth
Version: 2024.12.4

@lastrei

lastrei commented Dec 14, 2024

I encountered the same problem a few hours ago. The problem has been solved with the following steps on my end: [...]

Thanks @JohnWangCH, I will try it.

It works!
