[Bugfix] Catch and log invalid token ids in detokenizer #2 #26445

njhill · 2025-10-08T21:45:02Z

This is an update to the "workaround" added in #24351.

That PR insulates against negative token ids that are occasionally produced, though we still don't know the root cause (see #21951).

With the update to tokenizers 0.22.1, this error manifests as a TypeError rather than an OverflowError, so the patch needs to be updated to account for this.

Mitigates #26438, #26071, #25821.
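For readers unfamiliar with the workaround, the idea can be sketched as follows. This is a minimal illustration and not the code in `vllm/v1/engine/detokenizer.py`: the names `safe_decode_step` and `fake_step` are hypothetical, and `fake_step` only imitates a decoder that rejects negative ids with the message reported in #26438.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def safe_decode_step(step: Callable[[int], Optional[str]],
                     token_id: int) -> Optional[str]:
    """Decode one token, tolerating invalid (e.g. negative) token ids.

    `step` is a hypothetical stand-in for the incremental decode call.
    """
    try:
        return step(token_id)
    except (OverflowError, TypeError) as e:
        # Per the PR description: the failure surfaced as OverflowError
        # before tokenizers 0.22.1 and as TypeError afterwards, so both
        # are caught here.
        logger.warning("Encountered invalid token id %r, skipping: %s",
                       token_id, e)
        return None


def fake_step(token_id: int) -> str:
    # Hypothetical decoder used only for this illustration.
    if token_id < 0:
        raise TypeError("argument 'id': StreamInput must be either an "
                        "integer or a list of integers")
    return f"<tok{token_id}>"
```

Calling `safe_decode_step(fake_step, -1)` logs a warning and returns `None` instead of propagating the exception, while valid ids decode as usual. Catching both exception types keeps the guard working across tokenizers versions.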

gemini-code-assist

Code Review

I've reviewed your pull request. The change to catch TypeError is correct based on the updated behavior of the tokenizers library. I've found one high-severity issue related to this change that could cause problems in the exception handling logic. Please see my detailed comment below.

vllm/v1/engine/detokenizer.py

Signed-off-by: Nick Hill <nhill@redhat.com>

yewentao256

LGTM, thanks for the work!

njhill added the bug label (Something isn't working) Oct 8, 2025

njhill requested review from WoosukKwon, alexm-redhat, comaniac, robertgshaw2-redhat and ywang96 as code owners October 8, 2025 21:45

njhill requested a review from yewentao256 October 8, 2025 21:45

mergify bot added the v1 label Oct 8, 2025

njhill mentioned this pull request Oct 8, 2025

[Bug]: TypeError: argument 'id': StreamInput must be either an integer or a list of integers #26438

Closed

1 task

gemini-code-assist bot reviewed Oct 8, 2025

vllm/v1/engine/detokenizer.py (review thread resolved)

change format specifier

d327da0

Signed-off-by: Nick Hill <nhill@redhat.com>

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) Oct 8, 2025

njhill mentioned this pull request Oct 8, 2025

[Bug]: Incremental detokenization error when running llama-3.3-70b-fp8 model #21951

Open

yewentao256 approved these changes Oct 8, 2025

yewentao256 enabled auto-merge (squash) October 8, 2025 23:59

vllm-bot merged commit bb6d8c2 into vllm-project:main Oct 9, 2025
46 of 48 checks passed

njhill deleted the negative-tok-id branch October 9, 2025 04:26

xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025

[Bugfix] Catch and log invalid token ids in detokenizer #2 (vllm-project#26445)

3a08e80

Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025

[Bugfix] Catch and log invalid token ids in detokenizer #2 (vllm-project#26445)

df36514

Signed-off-by: Nick Hill <nhill@redhat.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>

lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025

[Bugfix] Catch and log invalid token ids in detokenizer #2 (vllm-project#26445)

1e126aa

Signed-off-by: Nick Hill <nhill@redhat.com>

3 participants
