Contributor

@sanchit-gandhi sanchit-gandhi commented Jul 29, 2025

Purpose

Voxtral transcription works whether or not a language token is specified. Currently, if no language token is specified, it is silently forced to "en", which can lead to unexpected results.

This PR:

  1. Allows the language token to be None for Voxtral
  2. Defaults to en for Whisper (existing behaviour), but with a warning

@mergify mergify bot added the frontend label Jul 29, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request successfully relaxes the language requirement for Voxtral models, allowing None as a valid language token, and adds a warning for Whisper models when the language defaults to 'en'. The changes align well with the stated objectives. I've identified two high-severity issues related to code duplication and a confusing error message in voxtral.py that should be addressed to improve maintainability and user experience.

sanchit-gandhi and others added 3 commits July 29, 2025 14:43
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Collaborator

@NickLucche NickLucche left a comment

Hey, thanks for addressing this TODO!

As I commented in the review, I would like to keep the frontend (FE) as architecture-agnostic as possible, so I propose we move your changes down into the SupportsTranscription interface.

One easy thing we can do is to expand validate_language functionality to something like

def validate_language(cls, language: Optional[str]) -> Optional[str]:

or 

def get_preferred_language(cls, language: Optional[str]) -> Optional[str]:

where each model is free to override it to provide its preferred default language.

I would also have the default implementation of this function be the one Voxtral is currently using, namely:

  • if lang is not provided return None
  • else check if the language is supported and return it

I think this would fit most use cases better. Whisper would then have to override the default logic to return en instead of None.
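
A minimal sketch of the proposal above, not the code that was merged. The class names `TranscriptionModelSketch` and `WhisperSketch`, the `supported_languages` mapping, and the message texts are hypothetical stand-ins chosen for illustration; only the `validate_language` signature comes from the comment itself.

```python
import logging
from typing import Optional

logger = logging.getLogger("transcription_sketch")


class TranscriptionModelSketch:
    """Hypothetical stand-in for a model implementing SupportsTranscription."""

    # Hypothetical language table; a real model would supply its own.
    supported_languages = {"en": "English", "es": "Spanish", "fr": "French"}

    @classmethod
    def validate_language(cls, language: Optional[str]) -> Optional[str]:
        # Default behaviour: no language given means no language token.
        if language is None:
            return None
        # Otherwise the language must be one the model supports.
        if language not in cls.supported_languages:
            raise ValueError(
                f"Unsupported language: {language!r}. "
                f"Supported: {sorted(cls.supported_languages)}")
        return language


class WhisperSketch(TranscriptionModelSketch):
    """Hypothetical Whisper-like model that needs a concrete language."""

    @classmethod
    def validate_language(cls, language: Optional[str]) -> Optional[str]:
        if language is None:
            # Keep the existing default, but make it visible to the user.
            logger.warning(
                "No language specified; defaulting to 'en'. "
                "Specify the language explicitly for best results.")
            language = "en"
        return super().validate_language(language)
```

With this shape, the frontend can call `validate_language` on whatever model class is loaded without knowing which architecture it is.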

from transformers import (BatchFeature, WhisperConfig, WhisperFeatureExtractor,
                          WhisperProcessor)
from transformers.models.whisper.modeling_whisper import sinusoids
from transformers.models.whisper.tokenization_whisper import LANGUAGES
Collaborator

nit: I am not 100% sure this is the same list; I think I got mine by comparing https://platform.openai.com/docs/guides/speech-to-text with the Whisper repo.
We might even move this into a centralized utils module in vLLM (future work).

Contributor Author

This matches the list in the Whisper repo one-to-one. I think we should use it as our ground truth for Whisper, rather than the OAI API guide, as the API may support languages that Whisper does not.

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
@sanchit-gandhi sanchit-gandhi marked this pull request as ready for review July 30, 2025 09:23
@sanchit-gandhi sanchit-gandhi requested a review from aarnphm as a code owner July 30, 2025 09:23
Collaborator

@NickLucche NickLucche left a comment

Looking much better, thanks! I left a few minor comments; other than those and the pre-commit fixes, this LGTM.
@DarkLight1337 can you also take a quick look if you have the time?

# TODO language should be optional and can be guessed.
# For now we default to en. See
# https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/generation_whisper.py#L1520
logger.warning(
Collaborator

I am thinking perhaps this should be a logger.warning_once case.

Contributor Author

@sanchit-gandhi sanchit-gandhi Jul 30, 2025

This is a pretty big assumption that's being made in creating the token ids, e.g. it completely nullifies the model's ability to do multilingual transcription.

Until the TODO is resolved, the best practice when using Whisper is to always specify the language. Otherwise, you end up with undefined behaviour (e.g. audio in Spanish, task set to "transcribe", lang token set to "en" → a mismatch with how the model was trained!). This would happen silently if you passed a mix of transcription requests, some with and some without the language field.

Ideally, we would enforce this best practice by throwing an exception if the language is not specified. However, since that's not backwards compatible, a persistent warning is the best we can do. So I'd be in favour of keeping it as logger.warning!
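
To make the trade-off in this thread concrete, here is a small sketch of the two behaviours using only the standard library. The `warning_once` helper below is a stand-in that assumes once-only semantics keyed on the message text; it is not vLLM's `logger.warning_once`, whose implementation may differ. `resolve_language` and the message text are likewise hypothetical.

```python
import functools
import logging

logger = logging.getLogger("whisper_language_sketch")


@functools.lru_cache(maxsize=None)
def warning_once(message: str) -> None:
    # Stand-in: the cache ensures each distinct message is logged only once.
    logger.warning(message)


def resolve_language(language, *, once: bool) -> str:
    """Return the language to use, defaulting to 'en' with a warning."""
    if language is None:
        message = ("No language specified; defaulting to 'en'. "
                   "Specify the language explicitly for best results.")
        if once:
            warning_once(message)    # only the first such request is flagged
        else:
            logger.warning(message)  # every such request is flagged
        return "en"
    return language
```

With `once=True`, only the first request without a language produces a log line, so later requests in a mixed batch fall back to "en" without any trace. With `once=False`, each affected request is logged, which is the persistent behaviour argued for above.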

Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Member

@DarkLight1337 DarkLight1337 left a comment

LGTM

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) July 30, 2025 16:23
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jul 30, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
auto-merge was automatically disabled July 30, 2025 16:55

Head branch was pushed to by a user without write access

@vllm-bot vllm-bot merged commit ec02e53 into vllm-project:main Jul 31, 2025
69 of 71 checks passed
liuyumoye pushed a commit to liuyumoye/vllm that referenced this pull request Jul 31, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
vadiklyutiy pushed a commit to CentML/vllm that referenced this pull request Aug 5, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
x22x22 pushed a commit to x22x22/vllm that referenced this pull request Aug 5, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: x22x22 <wadeking@qq.com>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
noamgat pushed a commit to noamgat/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Noam Gat <noamgat@gmail.com>
paulpak58 pushed a commit to paulpak58/vllm that referenced this pull request Aug 13, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Paul Pak <paulpak58@gmail.com>
diegocastanibm pushed a commit to diegocastanibm/vllm that referenced this pull request Aug 15, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Diego-Castan <diego.castan@ibm.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
Signed-off-by: Sanchit Gandhi <sgandhi3141@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

Labels

frontend, ready (ONLY add when PR is ready to merge/full CI is needed), v1

4 participants