
Bad error message for AutoTokenizer loading Voxtral #41553

@jackzhxng

Description


System Info

I get the following unhelpful error when loading Voxtral's tokenizer with AutoTokenizer while mistral-common is not installed:

../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py:1144: in from_pretrained
    return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2070: in from_pretrained
    return cls._from_pretrained(
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2108: in _from_pretrained
    slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:2316: in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py:171: in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
../../.conda/envs/et_new/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py:198: in get_spm_processor
    tokenizer.Load(self.vocab_file)
../../.conda/envs/et_new/lib/python3.10/site-packages/sentencepiece/__init__.py:961: in Load
    return self.LoadFromFile(model_file)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <sentencepiece.SentencePieceProcessor; proxy of <Swig Object of type 'sentencepiece::SentencePieceProcessor *' at 0x7f5e7e25f780> >, arg = None

    def LoadFromFile(self, arg):
>       return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
E       TypeError: not a string

../../.conda/envs/et_new/lib/python3.10/site-packages/sentencepiece/__init__.py:316: TypeError
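The frames above show where the message comes from. LlamaTokenizer.get_spm_processor calls tokenizer.Load(self.vocab_file), and sentencepiece receives arg = None. In other words, no SentencePiece vocab file appears to have been resolved for this checkpoint, and sentencepiece reports that only as "not a string". Below is a minimal sketch of an early guard that would surface a readable error instead. It assumes a hypothetical helper, require_vocab_file, which is not an existing transformers function.

```python
from typing import Optional


def require_vocab_file(vocab_file: Optional[str], repo_id: str) -> str:
    """Hypothetical guard (not part of transformers).

    Fails early with a descriptive error when no SentencePiece vocab file
    was resolved. Otherwise sentencepiece receives None and raises the
    opaque ``TypeError: not a string`` seen in the traceback.
    """
    if vocab_file is None:
        raise OSError(
            f"No SentencePiece vocabulary file was found for {repo_id!r}. "
            "This checkpoint may ship its tokenizer in a different format."
        )
    return vocab_file
```

A guard like this only makes the failure legible. It does not say which extra dependency is missing, so it would still need to be paired with a hint about mistral-common.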

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

  1. pip install transformers
  2. from transformers import AutoTokenizer; AutoTokenizer.from_pretrained("mistralai/Voxtral-Mini-3B-2507")

Expected behavior

A clearer error message that suggests running pip install mistral-common.
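Here is one way such a hint could be produced. This is a sketch that assumes a hypothetical helper, require_package, which is not part of transformers. It uses importlib.util.find_spec to check whether the mistral_common module can be imported, and raises an ImportError that names the pip package to install.

```python
import importlib.util


def require_package(module_name: str, pip_name: str, repo_id: str) -> None:
    """Hypothetical check (not part of transformers).

    Raises an ImportError with an install hint when ``module_name`` cannot
    be found. Only pass top-level module names: for dotted names,
    find_spec imports the parent package first.
    """
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"Loading the tokenizer for {repo_id!r} requires {pip_name}, "
            f"which is not installed. Install it with `pip install {pip_name}`."
        )


# Example: for Voxtral, the import name is mistral_common and the pip name is mistral-common.
# require_package("mistral_common", "mistral-common", "mistralai/Voxtral-Mini-3B-2507")
```

Presumably a check like this would run before AutoTokenizer falls back to the slow Llama tokenizer. This sketch leaves open exactly where it would belong in tokenization_auto.py.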

Metadata


Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
