Issues loading tokenizer/Support loading tokenizer.model? #239
Usually we look for a different repository that uses the same tokenizer and has a `tokenizer.json`. According to this paragraph, the "fast tokenizer" (dumped/loaded from `tokenizer.json`) is the format we rely on.

We can send a PR to the hf repo with the tokenizer file, which we did for a couple of repos in the past, so I will keep this open :)
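As an aside, checking whether a given repo already ships the fast tokenizer file can be scripted. Below is a minimal sketch: the `tokenizer_status` helper is hypothetical, and in practice the file listing would come from something like `huggingface_hub.list_repo_files(repo_id)`; here it is passed in directly so the sketch stays self-contained.

```python
def tokenizer_status(repo_files):
    """Classify a Hugging Face repo by the tokenizer files it ships.

    repo_files: list of file names in the repo, e.g. as returned by
    huggingface_hub.list_repo_files(repo_id).
    """
    has_fast = "tokenizer.json" in repo_files
    has_slow = "tokenizer.model" in repo_files
    if has_fast:
        return "fast"       # loadable by Bumblebee directly
    if has_slow:
        return "slow-only"  # needs conversion to tokenizer.json first
    return "none"


print(tokenizer_status(["config.json", "tokenizer.model"]))  # slow-only
```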
Thanks a lot! I was looking around pretty hard for a 1B or 3B model to test with on my laptop, since I don't have the memory really needed to run a 7B+ model, but that makes sense.

For my own reference and usage, is generating a `tokenizer.json` as simple as this?

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("<model>")
tokenizer.save_pretrained("<dir_to_save_to>")
```

Do you have any sense of how much work handling the model file natively in Elixir would be?
@bianchidotdev this is precisely it! When you call `save_pretrained`, the fast tokenizer is dumped to `tokenizer.json`.

If you want to open a PR on the HF repos, here's an example; just make sure you have the latest transformers installed locally before doing the conversion. No pressure though, I can also do it later :)
@bianchidotdev I opened a PR while testing a new conversion tool and I noticed you opened one already, thanks!

FTR you don't have to wait for the PR to be merged, you can just reference the PR commit directly:

```elixir
{:ok, tokenizer} =
  Bumblebee.load_tokenizer(
    {:hf, "openlm-research/open_llama_3b_v2",
     revision: "52944fc4e35e6ca00e733b95df79498728016e1d"}
  )
```
Also, I improved the error messages in #256, so it will be clear why the tokenizer cannot be loaded. And we have a new section in the README with actions the user may take :)
I'm having issues loading certain models on Hugging Face that might largely be an issue with those repos rather than Bumblebee.

What I'm seeing: it looks like it's failing while searching for a `tokenizer.json`. Unfortunately, the Hugging Face repo ships only with a `tokenizer.model` and related config files, but not a `tokenizer.json`, and it appears quite a few models on Hugging Face follow suit.

I'm not sure what the effort would be to support loading the model directly, or if there are other ways around this.
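For context on why the two files are not interchangeable: `tokenizer.model` is a binary SentencePiece file, whereas `tokenizer.json` is a self-describing JSON serialization of the full fast-tokenizer pipeline, which is the format parsed here. A heavily truncated, illustrative sketch of its shape (field names follow the Hugging Face tokenizers serialization format; the values are placeholders, not from any real model):

```json
{
  "version": "1.0",
  "truncation": null,
  "padding": null,
  "added_tokens": [{ "id": 0, "content": "<unk>", "special": true }],
  "normalizer": { "type": "..." },
  "pre_tokenizer": { "type": "..." },
  "model": {
    "type": "BPE",
    "vocab": { "<unk>": 0 },
    "merges": []
  }
}
```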