Description
System Info
Loading Voxtral's tokenizer with AutoTokenizer fails with the following unhelpful error when mistral-common is not installed:
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:1144: in from_pretrained
return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2070: in from_pretrained
return cls._from_pretrained(
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2108: in _from_pretrained
slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2316: in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py:171: in __init__
self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py:198: in get_spm_processor
tokenizer.Load(self.vocab_file)
../../.conda/envs/et_new/lib/python3.10/site-packages/sentencepiece/__init__.py:961: in Load
return self.LoadFromFile(model_file)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <sentencepiece.SentencePieceProcessor; proxy of <Swig Object of type 'sentencepiece::SentencePieceProcessor *' at 0x7f5e7e25f780> >, arg = None
def LoadFromFile(self, arg):
> return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
E TypeError: not a string
../../.conda/envs/et_new/lib/python3.10/site-packages/sentencepiece/__init__.py:316: TypeError
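Reading the traceback, the value handed to LoadFromFile is None (arg = None), which suggests the Llama slow tokenizer was built without a SentencePiece vocabulary file for this checkpoint. A minimal sketch of a pre-check that would surface this earlier, assuming a hypothetical helper named `check_vocab_file` (it is not part of transformers or sentencepiece):

```python
import os


def check_vocab_file(vocab_file, model_id):
    """Hypothetical helper: validate a vocab path before passing it to SentencePiece.

    Raises a descriptive ValueError instead of letting a None path reach
    the SentencePiece loader, where it surfaces as "TypeError: not a string".
    """
    if vocab_file is None:
        raise ValueError(
            f"No SentencePiece vocabulary file was found for '{model_id}'. "
            "This checkpoint may use a different tokenizer format."
        )
    # Accept str or os.PathLike and normalise to a plain string path.
    return os.fspath(vocab_file)
```

With a check like this, the failure would name the model and the missing file rather than an argument type.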
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
pip install transformers
from transformers import AutoTokenizer; AutoTokenizer.from_pretrained("mistralai/Voxtral-Mini-3B-2507")
Expected behavior
A clearer error message that suggests running pip install mistral-common.
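One way the requested behavior could look is a dependency check that runs before the tokenizer is constructed. The sketch below is only an illustration: `require_package` is a hypothetical helper, not an existing transformers function, and it assumes the package installed as mistral-common is imported as `mistral_common`.

```python
import importlib.util


def require_package(module_name, pip_name, purpose, find_spec=importlib.util.find_spec):
    """Raise an actionable ImportError if `module_name` cannot be found.

    Hypothetical helper for illustration only. `find_spec` is injectable
    so the missing-package path can be exercised without uninstalling anything.
    """
    if find_spec(module_name) is None:
        raise ImportError(
            f"{purpose} requires the `{pip_name}` package, which is not installed. "
            f"Install it with `pip install {pip_name}`."
        )


# Assumed usage before loading the Voxtral tokenizer:
# require_package("mistral_common", "mistral-common", "Loading the Voxtral tokenizer")
```

Failing at this point would replace the SentencePiece TypeError with a message that names the missing package and the command to install it.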