Conversation

@Khansa435

@Khansa435 Khansa435 commented Oct 14, 2025

What does this PR do?

Adds a clearer ImportError message when users try to load a Voxtral tokenizer without having mistral-common installed.
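
For illustration, here is a minimal sketch of the kind of guard this change describes. The helper name `is_mistral_common_available` and where the check lives are assumptions for the example, not the literal diff:

```python
# Hypothetical sketch, not the literal diff from this PR.
# Assumes a helper such as transformers.utils.is_mistral_common_available exists.
from transformers.utils import is_mistral_common_available


def require_mistral_common(pretrained_model_name_or_path: str) -> None:
    """Raise a clear ImportError instead of the misleading 'TypeError: not a string'."""
    if not is_mistral_common_available():
        raise ImportError(
            f"Loading the tokenizer for '{pretrained_model_name_or_path}' requires the "
            "`mistral-common` package. Install it with `pip install mistral-common` and retry."
        )
```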

Issue

Fixes #41553
Fixes the misleading `TypeError: not a string` raised when loading Voxtral models.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@ArthurZucker @Rocketknight1
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@Khansa435
Author

Khansa435 commented Oct 14, 2025

Hi @ArthurZucker @Rocketknight1, my PR passes 21 out of 22 checks, with 1 pending.
Please review it and share your suggestions :)

@Khansa435
Author

Khansa435 commented Oct 15, 2025

@ArthurZucker @Rocketknight1 It passed all checks after my changes, but when I merged the branches to stay up to date it hit a test error; all checks are now passing again.

Please review!

@Khansa435
Author

@vasqu Can you please review this PR? It passes 21 out of 22 checks and none have failed; one is pending.

@vasqu
Contributor

vasqu commented Oct 15, 2025

Hey @Khansa435, I responded directly in issue #41553 (comment). Thanks for the PR, but I'll leave the decision to the respective maintainers, who know more about this than I do :D

@vasqu
Contributor

vasqu commented Oct 15, 2025

Also, don't worry about the CI; we had an issue that should be resolved now.

@AvinashDwivedi

@vasqu could you please help me find an issue to contribute to? I'm new to open source and just want to learn.

@vasqu
Contributor

vasqu commented Oct 15, 2025

Hey @AvinashDwivedi, sorry about this one. It got a bit messy; usually we should open only one PR and coordinate with people who have already provided something. No worries, we will guide you as best we can if the PR makes sense! We have a contributing guide here, for example: https://huggingface.co/docs/transformers/contributing. Issues labeled "good first issue" are generally a good place to start; just make sure nobody else is already working on the issue and that your solution isn't the same.

@Khansa435
Author

Hi, can you please review this and let me know if it satisfies the requirements? :)
@ArthurZucker @ethanknights @itazap @jackzhxng

@Khansa435
Author

Khansa435 commented Oct 26, 2025

Hi, just checking in to see if there's any update on this PR. Hacktoberfest is wrapping up soon, and I'd love to have this issue resolved before the event ends.
Please let me know if there’s anything else I should update or address.
Thanks for your time and review!
@ArthurZucker @ethanknights @itazap @jackzhxng @Rocketknight1

@github-actions
Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto

@Khansa435
Author

Reverted to the original error message and adjusted indentation as suggested. Thanks for clarifying!
@vasqu Please take a look

@vasqu
Contributor

vasqu commented Oct 30, 2025

Thanks for bearing with me and iterating!

I started looking into this myself because the fix didn't work. It turns out the problem goes a bit deeper, so please wait a bit.
The core issue is that the loader falls back to a Llama tokenizer even when the files on the Hub are not suitable for it, which might indicate faulty behavior in how the mistral tokenizer backend is saved.
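
(For context, a minimal, hypothetical illustration of where the misleading `TypeError: not a string` plausibly originates when such a fallback happens; this is not the transformers code path, only the sentencepiece behavior a Llama-style tokenizer would hit when the vocab file it expects is missing.)

```python
# Hypothetical repro of the misleading error, not transformers source code.
# A Llama-style fallback ends up handing sentencepiece a missing (None) vocab file.
import sentencepiece as spm

vocab_file = None  # the repo does not ship the file the fallback expects
sp = spm.SentencePieceProcessor()
try:
    sp.Load(vocab_file)
except TypeError as err:
    print(err)  # "not a string" -- unhelpful compared to a clear ImportError
```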

@Khansa435
Author

Thanks a lot for the update and for taking the time to look into this deeper!
I’ll hold off on any further changes for now.

I really enjoyed working on this and learned a lot through your feedback. This was my first deeper dive into the AutoTokenizer internals.
Since I initially picked this up for Hacktoberfest and spent quite some time on it, is there anything else I could help with, or another good first issue you'd recommend that I could finish today before Hacktoberfest ends?
I’d love to continue contributing to Transformers. 😊
