
Conversation

@Elon7069

Summary: Fix the confusing TypeError from SentencePiece when loading the Voxtral tokenizer without mistral-common by raising a clear ImportError instead. Fixes #41553.
Rationale: Without mistral-common, loading via AutoTokenizer fails with a low-level TypeError inside sentencepiece. Users should instead see a direct, actionable message.
What’s changed
AutoTokenizer guard:
In AutoTokenizer.from_pretrained, after resolving the config and before tokenizer class selection, raise a clear ImportError if config.model_type == "voxtral" and mistral-common is missing (a sketch of the guard is shown below).
Message:
"The Voxtral tokenizer requires the 'mistral-common' package. Please install it using pip install mistral-common."
Tests:
Added tests/models/voxtral/test_tokenization_voxtral.py
Mocks is_mistral_common_available to return False and get_tokenizer_config to avoid network access, then asserts that an ImportError mentioning "mistral-common" is raised (an illustrative test sketch is shown below).
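An illustrative sketch of the added test, assuming both helpers are patched where transformers.models.auto.tokenization_auto looks them up; the patch targets, the simplified get_tokenizer_config return value, and the checkpoint name are assumptions:

```python
# tests/models/voxtral/test_tokenization_voxtral.py (illustrative sketch)
from unittest.mock import patch

import pytest

from transformers import AutoTokenizer

# Assumed module path where the guard resolves these names.
AUTO_TOKENIZATION = "transformers.models.auto.tokenization_auto"


def test_voxtral_tokenizer_requires_mistral_common():
    # Pretend mistral-common is not installed and stub the config lookup so
    # no network request is made; the stubbed dict is a simplification.
    with patch(f"{AUTO_TOKENIZATION}.is_mistral_common_available", return_value=False), \
         patch(f"{AUTO_TOKENIZATION}.get_tokenizer_config", return_value={"model_type": "voxtral"}):
        with pytest.raises(ImportError, match="mistral-common"):
            AutoTokenizer.from_pretrained("mistralai/Voxtral-Mini-3B-2507")
```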
Why this approach
Keeps mapping logic intact; avoids unexpected fallbacks.
Ensures the user sees a clear, actionable message as early as possible in the loading path.
Testing
Targeted test: pytest tests/models/voxtral/test_tokenization_voxtral.py -q → passes.
No network calls are made, thanks to the monkeypatching in the test.
Backward compatibility
No behavior change when mistral-common is installed.
Only affects Voxtral when the dependency is missing.

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, voxtral

@Elon7069
Author

Maintainer, please check my PR and let me know what I should improve.

@Rocketknight1
Member

I think this is a duplicate of #41592!

@Elon7069
Author

> I think this is a duplicate of #41592!

I think the two are different; let me check.
