You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama_fast.LlamaTokenizerFast'>. This is expected, and simply means that the legacy (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set legacy=False. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in #24565 - if you loaded a llama tokenizer from a GGUF file you can ignore this message.
Traceback (most recent call last):
File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 479, in
main()
File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 457, in main
write_tokenizer(
File "/Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 367, in write_tokenizer
tokenizer = tokenizer_class(input_tokenizer_path)
File "/home/transformers/src/transformers/models/llama/tokenization_llama_fast.py", line 157, in init
super().init(
File "/home/transformers/src/transformers/tokenization_utils_fast.py", line 132, in init
slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
File "/home/transformers/src/transformers/models/llama/tokenization_llama.py", line 171, in init
self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
File "/home/transformers/src/transformers/models/llama/tokenization_llama.py", line 198, in get_spm_processor
tokenizer.Load(self.vocab_file)
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/init.py", line 961, in Load
return self.LoadFromFile(model_file)
File "/usr/local/lib/python3.10/dist-packages/sentencepiece/init.py", line 316, in LoadFromFile
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct/tokenizer.model
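For context, the error itself suggests that tokenizer.model is not a SentencePiece file at all: Llama 3.x releases ship a tiktoken-style BPE ranks file under the same tokenizer.model name that Llama 1/2 used for SentencePiece, so SentencePiece cannot parse it as a ModelProto. Below is a minimal sketch to check which format a given file is; the path is taken from the traceback above, and the base64-token-plus-rank layout is an assumption based on other Llama 3.x releases.

import sentencepiece as spm

path = "/Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct/tokenizer.model"

sp = spm.SentencePieceProcessor()
try:
    sp.Load(path)  # the same call that fails in the traceback above
    print("SentencePiece model, vocab size:", sp.GetPieceSize())
except RuntimeError:
    # Assumption: Llama 3.x tokenizer.model files are plain text,
    # one "<base64 token> <rank>" pair per line, so the first record
    # is readable without SentencePiece.
    with open(path, "rb") as f:
        first_record = f.readline().split()
    print("Not a SentencePiece model; first record:", first_record)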
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed, please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
I'm getting the same error while trying to parse tokenizer.model:
return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: could not parse ModelProto from ./llama3_weights/tokenizer.model
System Info
(Same legacy-tokenizer warning and traceback as quoted at the top of this thread.)
Who can help?
@ArthurZucker @itazap
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
Reproduction
python3 /Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct \
    --model_size 1B \
    --output_dir /Data_disk/meta_llama/meta_llama3.2/out
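Note that on script versions with Llama 3.x support, the tokenizer path is selected by a Llama version argument; without it the script defaults to the SentencePiece route and fails exactly as above. The invocation below is a hedged guess: the --llama_version flag exists in recent transformers checkouts of this script, but its name and accepted values should be confirmed against the local copy with --help.

python3 /Data_disk/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /Data_disk/meta_llama/meta_llama3.2/Llama3.2-1B-Instruct \
    --model_size 1B \
    --llama_version 3.2 \
    --output_dir /Data_disk/meta_llama/meta_llama3.2/out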
Expected behavior
The conversion completes and writes safetensors weights to the output directory.