
Conversation

@pansicheng (Contributor) commented on Mar 22, 2025:

FIX #14677 (link existing issues this PR will resolve)

@github-actions (bot) commented:

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@DarkLight1337 (Member) commented:

This breaks the processing correctness tests in test_common.py, PTAL.

@mergify bot added the multi-modality label (Related to multi-modality, #4194) on Mar 23, 2025.
@pansicheng (Contributor, Author) commented:

> This breaks the processing correctness tests in test_common.py, PTAL.

Thank you for pointing this out.

After investigating, I found the issue primarily stems from differences in input preprocessing for the phi3v model. Referencing: Phi-3.5-vision-instruct's processing_phi3_v.py#L407, here's the breakdown:

No images: Phi3VProcessor tokenizes the prompt directly:

if not len(images):
    model_inputs = self.tokenizer(texts, return_tensors=return_tensors, padding=padding, truncation=truncation, max_length=max_length)

Images present: The prompt is split using the regex r"<\|image_\d+\|>" before tokenizing each chunk:

pattern = r"<\|image_\d+\|>"  
prompt_chunks = [self.tokenizer(chunk).input_ids for chunk in re.split(pattern, texts)]  
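For illustration, here is a minimal, self-contained sketch of what that split step does on a sample prompt (the prompt below is my own example, not from the model repo):

import re

# The same pattern as above, applied to a sample two-image prompt.
pattern = r"<\|image_\d+\|>"
prompt = "<|image_1|> Describe the image. <|image_2|> Compare it with this one."

chunks = re.split(pattern, prompt)
print(chunks)
# ['', ' Describe the image. ', ' Compare it with this one.']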

I've now aligned the implementation in phi3v.py and adjusted the test cases to follow the logic defined in the model's official processing_phi3_v.py.

Please let me know if there's anything further I can clarify!

@pansicheng changed the title from "fix tests/models/embedding/vision_language/test_phi3v.py" to "fix test_phi3v" on Mar 23, 2025.
@DarkLight1337 (Member) commented:

Could you explain what the problem is with the existing processor? We should rely on _get_prompt_updates as much as possible to detect and replace the image placeholders.

@DarkLight1337 (Member) commented on lines 157 to 171 (Mar 23, 2025):


We call tokenizer directly to tokenize the prompt in online inference, so we cannot rely on special cases like this.

@pansicheng (Contributor, Author) commented on Mar 25, 2025:

> Could you explain what the problem is with the existing processor? We should rely on _get_prompt_updates as much as possible to detect and replace the image placeholders.

Sorry for the delayed reply. Thank you for your patience. Here is what I have observed:

HF Processor:

  • Prompt Splitting: HF splits the prompt on the regex r"<\|image_\d+\|>", separating the text-only chunks from the <|image_N|> placeholders. For example, the prompt "<|image_1|> Text" is split into ["", " Text"].
  • Tokenization Flow: Each text segment (e.g., "" and " Text") is tokenized separately, and the resulting token IDs are interleaved with image_ids_pad (a rough sketch follows this list).
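A rough sketch of that interleaving, with a made-up helper name and a stand-in pad id (not the actual HF processor code), could look like this:

import re

def interleave_with_image_pads(tokenizer, prompt, num_image_tokens, pad_id=-1):
    """Sketch of the HF flow: split on the placeholder regex, tokenize each
    text chunk separately, and interleave image pad tokens between chunks."""
    pattern = r"<\|image_\d+\|>"
    chunks = [tokenizer(chunk).input_ids for chunk in re.split(pattern, prompt)]
    # One pad block per image; pad_id stands in for the real image pad token.
    image_ids_pad = [[pad_id] * n for n in num_image_tokens]
    input_ids = []
    for text_ids, pad_ids in zip(chunks, image_ids_pad + [[]]):
        input_ids.extend(text_ids + pad_ids)
    return input_ids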

VLLM Processor:

  • Monolithic Tokenization: VLLM uses _apply_hf_processor_text_only to tokenize the entire prompt as one string (without splitting on the <|image_N|> placeholders). After tokenization, the placeholder tokens are replaced with image_ids_pad.

I am attempting to align VLLM’s workflow with HF’s prompt splitting approach in both phi3v.py and test_common.py.

The current modification mistakenly splits prompts using _apply_hf_processor_text_only regardless of whether they are truly text-only. This discrepancy from HF's behavior necessitates introducing a parameter to differentiate between real text-only prompts and prompts meant to separate images.

Regarding the test scenario using processor.apply(token_prompt, mm_data=mm_data, hf_processor_mm_kwargs={}), it might not be fully compatible with phi3v due to HF’s prompt splitting. I would sincerely appreciate any guidance or suggestions for this context.

@DarkLight1337 (Member) commented on Mar 26, 2025:

How does this lead to different results? In BaseMultiModalProcessor._apply_prompt_updates, we apply prompt replacements by converting the tokens back to text, applying those replacements based on text, and finally tokenizing the result back into tokens.
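Roughly, the flow looks like the following simplified sketch (made-up names, not the actual vLLM implementation):

def apply_prompt_updates_sketch(tokenizer, token_ids, replacements):
    """Simplified sketch: decode the tokens to text, apply the replacements
    on the text, then re-encode the result back into tokens."""
    text = tokenizer.decode(token_ids)
    for placeholder, replacement_text in replacements.items():
        text = text.replace(placeholder, replacement_text)
    # Re-encoding is the step where a context-dependent tokenizer (such as
    # SentencePiece) can emit tokens that differ from the original sequence.
    return tokenizer.encode(text)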

Perhaps it would be best if you show a full example.

@mergify (bot) commented on Mar 26, 2025:

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @pansicheng.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify bot added the needs-rebase label on Mar 26, 2025.
@pansicheng (Contributor, Author) commented:

> How does this lead to different results? In BaseMultiModalProcessor._apply_prompt_updates, we apply prompt replacements by converting the tokens back to text, applying those replacements based on text, and finally tokenizing the result back into tokens.
>
> Perhaps it would be best if you show a full example.

Here is an example:

prompt="<|image_1|> Select the portion of the image that isolates the object of the given label: The label of the object is stop sign"
<|image_1|>: [529, 29989, 3027, 29918, 29896, 29989, 29958]
vllm:   [1, 529, 29989, 3027, 29918, 29896, 29989, 29958, 7605, 278, 11910, 310, 278, 1967, 393, 11695, 1078, 278, 1203, 310, 278, 2183, 3858, 29901, 450, 3858, 310, 278, 1203, 338, 5040, 1804]
         "  <|image_1|>                                   Select the portion ..."
            |<- image_ids_pad                         ->|
hf:     [[[1],       [1, 29871, 7605, 278, 11910, 310, 278, 1967, 393, 11695, 1078, 278, 1203, 310, 278, 2183, 3858, 29901, 450, 3858, 310, 278, 1203, 338, 5040, 1804]]]
          ""     ^   " Select the portion ..."
                 |
            image_ids_pad

get_replacement_phi3v will add a "1" (the BOS token) after [529, 29989, 3027, 29918, 29896, 29989, 29958];
the remaining difference is the "29871" introduced by tokenizing " Select the portion ..." on its own.
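If it helps, the comparison above can be reproduced along these lines (the checkpoint name is an assumption on my part, and the exact ids depend on the tokenizer version):

import re
from transformers import AutoTokenizer

# Assumed checkpoint; trust_remote_code in case the repo requires custom code.
tokenizer = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3.5-vision-instruct", trust_remote_code=True)

prompt = ("<|image_1|> Select the portion of the image that isolates the "
          "object of the given label: The label of the object is stop sign")

# vLLM-style: tokenize the whole prompt at once.
whole = tokenizer(prompt).input_ids

# HF-style: split on the placeholder first, then tokenize each chunk.
chunks = [tokenizer(c).input_ids
          for c in re.split(r"<\|image_\d+\|>", prompt)]

print(whole)   # ... 29958, 7605, ... (no 29871 before "Select")
print(chunks)  # [[1], [1, 29871, 7605, ...]] (second chunk starts with 29871)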

@DarkLight1337 (Member) commented:

Thanks for the example. Isn't that why the original code had bos_token_id in get_replacement_phi3v? So I guess the discrepancy is coming from elsewhere.

@pansicheng (Contributor, Author) commented:

> Thanks for the example. Isn't that why the original code had bos_token_id in get_replacement_phi3v? So I guess the discrepancy is coming from elsewhere.

Yes, I also believe that's the reason for adding bos_token_id in get_replacement_phi3v, but this doesn't handle all situations, such as the two test cases below:

"<|image_1|> Select the portion of the image ..."
" Select the portion of the image ..."

processor.tokenizer("<|image_1|> Select the portion of the image that isolates the object of the given label: The label of the object is stop sign")
[1, 529, 29989, 3027, 29918, 29896, 29989, 29958,        7605, 278, 11910, 310, 278, 1967, 393, 11695, 1078, 278, 1203, 310, 278, 2183, 3858, 29901, 450, 3858, 310, 278, 1203, 338, 5040, 1804]

processor.tokenizer.encode(
  " Select the portion of the image that isolates the object of the given label: The label of the object is stop sign"
)
[1,                                               29871, 7605, 278, 11910, 310, 278, 1967, 393, 11695, 1078, 278, 1203, 310, 278, 2183, 3858, 29901, 450, 3858, 310, 278, 1203, 338, 5040, 1804]

The fundamental reason is that the phi3v tokenizer cannot guarantee that the same substring, when it appears in different strings, is tokenized into the same token id sequence.
Therefore, my current attempt is to modify vLLM's processing logic to align with the processing logic in the model files.

@DarkLight1337 (Member) commented on Mar 26, 2025:

I suggest we try to edit the tokens in the end so that the overall result is the same. Otherwise, there will be too many cases (text input, token input, text input with cache, token input with cache; where the token input is from online serving and created by directly applying tokenizer to the text) which makes the code a mess if we try to handle them separately.

@pansicheng (Contributor, Author) commented:

> I suggest we try to edit the tokens in the end so that the overall result is the same. Otherwise, there will be too many cases (text input, token input, text input with cache, token input with cache; where the token input is from online serving and created by directly applying tokenizer to the text) which makes the code a mess if we try to handle them separately.

I've limited the modifications to _apply_prompt_updates; please take a look.
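Roughly, the idea is to normalize the token sequence after the replacements have been applied. The sketch below is illustrative only (not the exact code in this PR; whether the stray space token is dropped or inserted depends on which side is taken as the reference, and the id 29871 comes from the example above):

def normalize_after_image_tokens(token_ids, image_token_id, space_token_id=29871):
    """Illustrative sketch: drop a lone SentencePiece space token that
    immediately follows an image placeholder token, so that text-derived
    and token-derived inputs end up identical."""
    out = []
    for i, tok in enumerate(token_ids):
        if tok == space_token_id and i > 0 and token_ids[i - 1] == image_token_id:
            # Skip the stray space introduced by re-tokenizing the text chunk.
            continue
        out.append(tok)
    return out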

@DarkLight1337 (Member) commented:

The multi-modal tests pass, which is great! Can you update the entrypoints tests with respect to the updated token count?

@pansicheng (Contributor, Author) commented:

> The multi-modal tests pass, which is great! Can you update the entrypoints tests with respect to the updated token count?

It seems that the multi-modal tests and the entrypoints tests are complete; could you please assist with the readthedocs build?

@DarkLight1337 (Member) left a review comment:


Thanks for bearing with me; the PR looks good now. I'll just force-merge the PR since the relevant tests have passed.

@vllm-bot merged commit 7fd8c0f into vllm-project:main on Mar 30, 2025
13 of 14 checks passed
@pansicheng deleted the fix/14677 branch on March 30, 2025 at 09:20.
Alex4210987 pushed a commit to LeiWang1999/vllm-bitblas that referenced this pull request Apr 5, 2025
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
lk-chen pushed a commit to lk-chen/vllm that referenced this pull request Apr 29, 2025
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
RichardoMrMu pushed a commit to RichardoMrMu/vllm that referenced this pull request May 12, 2025
Signed-off-by: pansicheng <sicheng.pan.chn@gmail.com>
Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

Labels

multi-modality (Related to multi-modality, #4194)


Development

Successfully merging this pull request may close these issues.

[Bug]: Unit test tests/models/embedding/vision_language/test_phi3v.py failing

3 participants