[Bugfix] Relax lang pin for voxtral #21833

sanchit-gandhi · 2025-07-29T13:27:38Z

Purpose

Voxtral transcription works completely independently of whether the language token is specified. Currently, if the language token is not specified, it is forced to "en" silently, leading to unexpected results.

This PR:

Allows the language token to be None for Voxtral
Defaults to en for Whisper (existing behaviour), but with a warning

github-actions · 2025-07-29T13:27:46Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

gemini-code-assist

Code Review

This pull request successfully relaxes the language requirement for Voxtral models, allowing None as a valid language token, and adds a warning for Whisper models when the language defaults to 'en'. The changes align well with the stated objectives. I've identified two high-severity issues related to code duplication and a confusing error message in voxtral.py that should be addressed to improve maintainability and user experience.

vllm/model_executor/models/voxtral.py

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

NickLucche

Hey, thanks for addressing this TODO!

As I commented in the review, I would like to keep the frontend (FE) as architecture-agnostic as possible, so I propose we lower your changes into the SupportsTranscription interface.

One easy thing we can do is to expand validate_language functionality to something like

def validate_language(cls, language: Optional[str]) -> Optional[str]:

or 

def get_preferred_language(cls, language: Optional[str]) -> Optional[str]:

where each model is free to override it to provide its preferred default language.

I would also have the default implementation of this function be the one Voxtral is currently using, namely:

if lang is not provided return None
else check if the language is supported and return it

I think this would fit most use cases better. We must then have Whisper override the default logic to return en instead of None.
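The proposal above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the class names, the `supported_languages` attribute, the language tables, and the wording of the messages are all assumptions made for the example. Only the `validate_language` signature and the intended behaviour (default returns None, Whisper falls back to en with a warning) come from the discussion.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger(__name__)


class SupportsTranscriptionSketch:
    """Hypothetical stand-in for the SupportsTranscription interface."""

    # Assumed attribute: language code -> language name, set per model.
    supported_languages: Mapping[str, str] = {}

    @classmethod
    def validate_language(cls, language: Optional[str]) -> Optional[str]:
        # Default behaviour: no language given means no language token.
        if language is None:
            return None
        # Otherwise the language must be one the model supports.
        if language not in cls.supported_languages:
            raise ValueError(
                f"Unsupported language: {language!r}. Supported: "
                f"{sorted(cls.supported_languages)}")
        return language


class VoxtralSketch(SupportsTranscriptionSketch):
    # Illustrative subset only, not the real language list.
    supported_languages = {"en": "english", "es": "spanish", "fr": "french"}


class WhisperSketch(SupportsTranscriptionSketch):
    # Illustrative subset only, not the real language list.
    supported_languages = {"en": "english", "es": "spanish", "de": "german"}

    @classmethod
    def validate_language(cls, language: Optional[str]) -> Optional[str]:
        # Whisper overrides the default: fall back to "en", but say so.
        if language is None:
            logger.warning(
                "No language specified; defaulting to 'en'. Specify the "
                "language explicitly for non-English audio.")
            language = "en"
        return super().validate_language(language)
```

With this shape the frontend only calls `validate_language` on whichever model class is loaded and never needs to know which architecture it is talking to.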

NickLucche · 2025-07-29T15:52:59Z

vllm/model_executor/models/whisper.py

 from transformers import (BatchFeature, WhisperConfig, WhisperFeatureExtractor,
                          WhisperProcessor)
 from transformers.models.whisper.modeling_whisper import sinusoids
+from transformers.models.whisper.tokenization_whisper import LANGUAGES


nit: I am not 100% sure this was the same list; I think I got mine by comparing https://platform.openai.com/docs/guides/speech-to-text and the whisper repo.
We might even move this to a centralized utils module in vllm (future work).

This is one-to-one the same as the one in the Whisper repo. I think we should be using this as our ground truth for Whisper, rather than the OAI API guide, as there may be languages supported in the API that are not supported by Whisper.

vllm/entrypoints/openai/speech_to_text.py

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

NickLucche

Looking much better, thanks! I left a few minor comments; other than those and the pre-commit fixes, this lgtm.
@DarkLight1337 can you also take a quick look if you have the time?

vllm/entrypoints/openai/speech_to_text.py

vllm/model_executor/models/interfaces.py

NickLucche · 2025-07-30T10:16:03Z

vllm/model_executor/models/whisper.py

+            # TODO language should be optional and can be guessed.
+            # For now we default to en. See
+            # https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/generation_whisper.py#L1520
            logger.warning(


I am thinking perhaps this should be a logger.warning_once case.

This is a pretty big assumption that's being made in creating the token ids, e.g. it completely nullifies the model's ability to do multilingual transcription.

Until the TODO is resolved, the best practice when using Whisper is to always specify the language. Otherwise, you end up with undefined behaviour (e.g. audio in Spanish, task set to "transcribe", lang token set to "en" → mismatch with how the model was trained!). This would happen silently if you passed a mix of transcription requests, some with and some without the language field.

Ideally, we would enforce this best practice by throwing an exception if the language is not specified. However, since that's not backwards compatible, a persistent warning is the best we can do. So I'd be in favour of keeping it as logger.warning!
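The trade-off in this thread can be illustrated with a small stand-alone sketch. It uses only the standard `logging` module. `warn_once` here is a hypothetical stand-in that deduplicates by message text and is not vLLM's `logger.warning_once` implementation; the logger name, handler, and message are likewise made up for the example.

```python
import logging
from functools import lru_cache


class ListHandler(logging.Handler):
    """Collects emitted messages so they can be counted."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


logger = logging.getLogger("stt_warning_sketch")
logger.propagate = False  # keep the example's output self-contained
handler = ListHandler()
logger.addHandler(handler)

MSG = "No language specified; defaulting to 'en'."


def warn_always(message: str) -> None:
    # Persistent warning: every request without a language is reported.
    logger.warning(message)


@lru_cache(maxsize=None)
def warn_once(message: str) -> None:
    # Deduplicated warning: the same message is only logged the first time.
    logger.warning(message)


def count_emitted(warn_fn, n_requests: int) -> int:
    """Simulate n requests without a language and count logged warnings."""
    start = len(handler.messages)
    for _ in range(n_requests):
        warn_fn(MSG)
    return len(handler.messages) - start


always_count = count_emitted(warn_always, 3)  # one warning per request
once_count = count_emitted(warn_once, 3)      # only the first is logged
```

In a mixed stream of requests, the deduplicated variant flags only the first request that omits the language, so later silent fallbacks to en leave no trace in the logs. That is the argument made above for keeping the plain warning.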


DarkLight1337

LGTM


mergify bot added the frontend label Jul 29, 2025

gemini-code-assist bot reviewed Jul 29, 2025

vllm/model_executor/models/voxtral.py (outdated, resolved)

vllm/model_executor/models/voxtral.py (outdated, resolved)

sanchit-gandhi force-pushed the stt-unpin-lang branch from 7db5c68 to 063171f July 29, 2025 13:39

mergify bot added the v1 label Jul 29, 2025

sanchit-gandhi and others added 3 commits July 29, 2025 14:43

relax lang pin for voxtral

110fbb5

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

gemini advice

9344b34

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

Update vllm/model_executor/models/voxtral.py

1df3084

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

sanchit-gandhi force-pushed the stt-unpin-lang branch from 063171f to 1df3084 July 29, 2025 13:44

sanchit-gandhi added 3 commits July 29, 2025 15:15

lint

63e6ead

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

up

9ac2525

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

up

afb32ce

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

NickLucche requested changes Jul 29, 2025

sanchit-gandhi added 2 commits July 30, 2025 09:56

requested changes

702f759

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

lint

2a12816

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

sanchit-gandhi marked this pull request as ready for review July 30, 2025 09:23

sanchit-gandhi requested a review from aarnphm as a code owner July 30, 2025 09:23

NickLucche requested changes Jul 30, 2025

sanchit-gandhi added 3 commits July 30, 2025 14:05

var name

27a13c4

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

docstring

cd4e0a1

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

lint

24a9c0e

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

sanchit-gandhi requested a review from patrickvonplaten as a code owner July 30, 2025 13:19

DarkLight1337 reviewed Jul 30, 2025

vllm/model_executor/models/interfaces.py (outdated, resolved)

DarkLight1337 reviewed Jul 30, 2025

vllm/model_executor/models/voxtral.py (outdated, resolved)

DarkLight1337 reviewed Jul 30, 2025

vllm/model_executor/models/whisper.py (outdated, resolved)

DarkLight1337 reviewed Jul 30, 2025

vllm/model_executor/models/interfaces.py (outdated, resolved)

up

4a8ae88

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

DarkLight1337 reviewed Jul 30, 2025

vllm/model_executor/models/whisper.py (outdated, resolved)

value error

2447fbe

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

DarkLight1337 approved these changes Jul 30, 2025

DarkLight1337 enabled auto-merge (squash) July 30, 2025 16:23

github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jul 30, 2025

lint

253027d

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>

auto-merge was automatically disabled July 30, 2025 16:55
Head branch was pushed to by a user without write access

vllm-bot merged commit ec02e53 into vllm-project:main Jul 31, 2025
69 of 71 checks passed
