Cannot load codeqwen Tokenizer #30324
Hi @zch-cc, thanks for reporting! I'm able to replicate. From a bisect, it seems to be coming from #30289. cc @Narsil @ArthurZucker. Happens even when upgrading tokenizers.
This problem also affects other models, for example https://huggingface.co/sdadas/mmlw-retrieval-roberta-large
Same issue for me with CodeQwen 1.5 7B.
same issue!
Quick and dirty solution: downgrade tokenizers in requirements.txt:
$ pip install -r requirements.txt
It worked for me.
Mmm, there seems to be something wrong with the serialization / de-serialization.
I think the authors fixed the format 🤗 It was probably not serialized with
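The comments above suspect a serialization / de-serialization mismatch in the model's tokenizer.json. As a rough sketch of the round-trip involved (the structure below is illustrative only, not the actual CodeQwen file; the field values are assumptions):

```python
import json

# Illustrative stub of a tokenizer.json layout: the fast-tokenizer
# deserializer reads the top-level "model" section to decide which
# tokenizer model (e.g. BPE) to build. Values here are placeholders.
tokenizer_json = {
    "version": "1.0",
    "model": {"type": "BPE", "vocab": {"<s>": 0, "</s>": 1}, "merges": []},
}

def model_type(config):
    """Return the serialized tokenizer model type, or None if missing."""
    return config.get("model", {}).get("type")

# Serialize and deserialize; a file whose "model" section does not
# match what the loader expects fails at this step.
restored = json.loads(json.dumps(tokenizer_json))
print(model_type(restored))  # BPE
```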
System Info
transformers version: 4.40.0
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
I cannot load the CodeQwen model with AutoTokenizer.
CodeQwen comes out this week and I would like to use it. https://huggingface.co/Qwen/CodeQwen1.5-7B
But I run into this error when loading the tokenizer.
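A minimal sketch of the failing call (the model id is from the link above; the error text itself was not preserved in this thread):

```python
from transformers import AutoTokenizer

# Under transformers 4.40.0 this raised an exception while
# deserializing the model's tokenizer.json.
tokenizer = AutoTokenizer.from_pretrained("Qwen/CodeQwen1.5-7B")
```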
According to ggerganov/llama.cpp#6707, the difference between CodeQwen and Qwen1.5 is that they use different tokenizers: CodeQwen's is based on sentencepiece.
Expected behavior
This issue may belong under new model support, but I think changing the tokenizer will work. Expected: the tokenizer loads successfully.