RuntimeError: Internal: could not parse ModelProto from /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct/tokenizer.model #34017

Closed
1 of 4 tasks
Itime-ren opened this issue Oct 8, 2024 · 3 comments
Labels
bug Core: Tokenization Internals of the library; Tokenization.

Comments

@Itime-ren
System Info

You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in #24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
  File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 479, in <module>
    main()
  File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 457, in main
    write_tokenizer(
  File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 367, in write_tokenizer
    tokenizer = tokenizer_class(input_tokenizer_path)
  File "/home/transformers/src/transformers/models/llama/tokenization_llama_fast.py", line 157, in __init__
    super().__init__(
  File "/home/transformers/src/transformers/tokenization_utils_fast.py", line 132, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/transformers/src/transformers/models/llama/tokenization_llama.py", line 171, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
  File "/home/transformers/src/transformers/models/llama/tokenization_llama.py", line 198, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/usr/local/lib/python3.10/dist-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct/tokenizer.model

Who can help?

@ArthurZucker @itazap

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

python3 /Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct \
    --model_size 1B \
    --output_dir /Data_disk/meta_llama/meta_llama3.2/out

Expected behavior

The conversion completes and writes safetensors files to the output directory.

@Itime-ren Itime-ren added the bug label Oct 8, 2024
@LysandreJik LysandreJik added the Core: Tokenization Internals of the library; Tokenization. label Oct 8, 2024
@LysandreJik
Member

Hey @Itime-ren, what's the content of /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct?

If you're trying to use Llama 3.2 1B Instruct, why not use this repo, which is already transformers-compatible?

github-actions bot commented Nov 7, 2024

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@pilotofbalance

I'm getting the same error while trying to parse tokenizer.model:

    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from ./llama3_weights/tokenizer.model
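
A possible cause, not confirmed anywhere in this thread: Meta's Llama 3-family checkpoints ship tokenizer.model as a tiktoken-style BPE rank file (text lines of a base64-encoded token followed by an integer rank), not as a SentencePiece protobuf, so SentencePiece would fail to parse it with exactly this error. The sketch below is a rough heuristic for telling which kind of file is on disk. The helper name `looks_like_tiktoken_bpe` and the `max_lines` parameter are hypothetical and not part of transformers or sentencepiece.

```python
import base64


def looks_like_tiktoken_bpe(path, max_lines=20):
    """Heuristic check (an assumption, not an official format test).

    Returns True if the first non-empty lines of the file all have the
    shape '<base64 token> <integer rank>', which is how tiktoken-style
    BPE rank files are laid out. A SentencePiece protobuf is binary and
    should fail this check on its first non-empty line.
    """
    checked = 0
    with open(path, "rb") as f:
        for raw in f:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            parts = line.split()
            if len(parts) != 2:
                return False
            try:
                # binascii.Error is a subclass of ValueError
                base64.b64decode(parts[0], validate=True)
                int(parts[1])
            except ValueError:
                return False
            checked += 1
            if checked >= max_lines:
                break
    return checked > 0
```

If the check returns True for your tokenizer.model, the file is probably not something SentencePiece can load. In that case it may be worth running the conversion script with `--help` to see whether your transformers version offers an option for selecting the Llama version, or using an already-converted checkpoint as suggested above.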


3 participants