Conversation

@faaany (Contributor) commented Sep 3, 2025

Purpose

This PR updates the _protected_step method to accommodate the DecodeStream changes in tokenizers 0.22.0, fixing the following unit test failure:

pytest -rA tests/v1/engine/test_fast_incdec_prefix_err.py::test_fast_inc_detok_invalid_utf8_err_case

Error Log:

self = <vllm.v1.engine.detokenizer.FastIncrementalDetokenizer object at 0x7f0bae57d9f0>, next_token_id = 237167

    def _protected_step(self, next_token_id: int) -> Optional[str]:
        try:
>           token = self.stream.step(self.tokenizer, next_token_id)
E           Exception: Invalid prefix encountered while decoding stream. Token ID: 237167, Expected prefix: ' ', Actual string: '���»'

vllm/v1/engine/detokenizer.py:235: Exception

Since the latest transformers v4.56 automatically installs tokenizers 0.22.0, the logic introduced in PR #19449 needs to be updated to work with both tokenizers 0.21.4 and 0.22.0.
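For reference, a minimal sketch of the merged approach (illustrative only: INVALID_PREFIX_ERR_MSG, the logger, and this stripped-down class are assumptions, not the real vLLM code, which lives in vllm/v1/engine/detokenizer.py):

    # tokenizers 0.21.x raises the bare message "Invalid prefix encountered
    # while decoding stream.", while 0.22.0 appends details (token ID,
    # expected prefix, actual string), so matching the full message exactly
    # no longer works; checking the stable start of the message covers both.
    import logging
    from typing import Optional

    from tokenizers import Tokenizer
    from tokenizers.decoders import DecodeStream

    logger = logging.getLogger(__name__)
    INVALID_PREFIX_ERR_MSG = "Invalid prefix encountered"

    class SketchDetokenizer:

        def __init__(self, tokenizer: Tokenizer, request_id: str,
                     skip_special_tokens: bool = False) -> None:
            self.tokenizer = tokenizer
            self.request_id = request_id
            self.skip_special_tokens = skip_special_tokens
            # Keyword argument: required by tokenizers 0.22.0, accepted
            # by 0.21.4 as well.
            self.stream = DecodeStream(
                skip_special_tokens=skip_special_tokens)

        def _protected_step(self, next_token_id: int) -> Optional[str]:
            try:
                token = self.stream.step(self.tokenizer, next_token_id)
            except Exception as e:
                if str(e).startswith(INVALID_PREFIX_ERR_MSG):
                    # Recoverable UTF-8 edge case: warn and reset the
                    # stream instead of failing the whole request.
                    logger.warning(
                        "Encountered invalid prefix detokenization error"
                        " for request %s, resetting decode stream.",
                        self.request_id)
                    self.stream = DecodeStream(
                        skip_special_tokens=self.skip_special_tokens)
                    token = None
                else:
                    raise e
            return token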

Test Result

With the fix in this PR, test_fast_inc_detok_invalid_utf8_err_case passes:

========================================================================= test session starts ==========================================================================
platform linux -- Python 3.10.12, pytest-8.4.1, pluggy-1.6.0
rootdir: /workspace/vllm
configfile: pyproject.toml
plugins: hypothesis-6.136.6, typeguard-4.4.4, anyio-4.9.0
collected 1 item                                                                                                                                                       

tests/v1/engine/test_fast_incdec_prefix_err.py .                                                                                                                 [100%]

================================================================================ PASSES ================================================================================
______________________________________________________________ test_fast_inc_detok_invalid_utf8_err_case _______________________________________________________________
------------------------------------------------------------------------- Captured stdout call -------------------------------------------------------------------------
WARNING 09-03 16:38:16 [detokenizer.py:243] Encountered invalid prefix detokenization error for request test, resetting decode stream.
======================================================================= short test summary info ========================================================================
PASSED tests/v1/engine/test_fast_incdec_prefix_err.py::test_fast_inc_detok_invalid_utf8_err_case
========================================================================== 1 passed in 3.63s ===========================================================================


Signed-off-by: Fanli Lin <fanli.lin@intel.com>
@gemini-code-assist (bot) left a comment:

Code Review

This pull request addresses a compatibility issue with tokenizers==0.22.0 which changed an error message format, causing failures in incremental detokenization. The changes correctly adapt the error handling to work with both old and new versions of the library by using a substring check instead of an exact match on the error message, and by using a keyword argument for DecodeStream which is now required. My review includes a suggestion to make the error message check even more robust by using startswith instead of in, to reduce the chance of incorrectly handling unrelated exceptions.
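To illustrate the suggestion (the message strings below are taken from the error log above and are for demonstration only):

    # startswith() anchors the check at the beginning of the message; a bare
    # "in" check would also match unrelated exceptions that merely quote the
    # phrase somewhere in their text.
    ERR_PREFIX = "Invalid prefix encountered"

    real_err = ("Invalid prefix encountered while decoding stream. "
                "Token ID: 237167, Expected prefix: ' ', Actual string: '...'")
    unrelated_err = "wrapped failure: 'Invalid prefix encountered' was logged"

    assert real_err.startswith(ERR_PREFIX)
    assert not unrelated_err.startswith(ERR_PREFIX)  # correctly not handled
    assert ERR_PREFIX in unrelated_err               # an "in" check would misfire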

faaany and others added 2 commits September 3, 2025 15:34
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
@faaany (Author) commented Sep 3, 2025:

cc @jikunshang @yma11 @rogerxfeng8

" for request %s, resetting decode stream.", self.request_id)
self.stream = DecodeStream(self.skip_special_tokens)
self.stream = DecodeStream(
skip_special_tokens=self.skip_special_tokens)
A contributor commented on the diff:
Is this change necessary? Does DecodeStream introduce more args?

@faaany (Author) replied Sep 3, 2025:
Yes, as can be seen from https://github.com/huggingface/tokenizers/pull/1856/files#diff-780be0b9b76e7260ed2be2249d17c4a879b7e2e98e9f30f26dc9c65501f775d1R672. Otherwise we would get an error that ids should not be of Boolean type.
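For illustration, a minimal sketch of why the keyword form is needed (the exact name of the new leading parameter is in the linked diff; only the boolean keyword is assumed here):

    # In tokenizers 0.21.x the boolean could be passed positionally; in
    # 0.22.0 the constructor gained a new leading parameter (per
    # huggingface/tokenizers#1856), so a positional bool is misread as
    # token ids ("ids should not be Boolean type"). The keyword form
    # works on both versions.
    from tokenizers.decoders import DecodeStream

    stream = DecodeStream(skip_special_tokens=True)  # safe on 0.21.4 and 0.22.0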

@jikunshang (Collaborator) commented:

cc @njhill, please review, thanks!

@njhill (Member) left a comment:

Thanks @faaany!

@njhill added the "ready" label (ONLY add when PR is ready to merge/full CI is needed) Sep 3, 2025
@robertgshaw2-redhat robertgshaw2-redhat changed the title fix incremental detokenization edge case error for tokenizers == 0.22.0 [Bugfix] Fix Incremental Detokenization with tokenizers == 0.22.0 Sep 3, 2025
@jikunshang jikunshang enabled auto-merge (squash) September 4, 2025 01:36
@jikunshang (Collaborator) commented:

Seems CI failed due to an unrelated error; can we force-merge this to unblock other PRs?
cc @njhill @simon-mo

@vllm-bot vllm-bot merged commit 2c301ee into vllm-project:main Sep 4, 2025
36 of 38 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…llm-project#24159)

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…llm-project#24159)

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…llm-project#24159)

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…llm-project#24159)

Signed-off-by: Fanli Lin <fanli.lin@intel.com>
Signed-off-by: Fanli Lin <fanli0116@gmail.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels: ready (ONLY add when PR is ready to merge/full CI is needed), v1

6 participants