fix test_phi3v #15321
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a small subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
This breaks the processing correctness tests in test_common.py.
Thank you for pointing this out. After investigating, I found the issue primarily stems from differences in input preprocessing for the phi3v model. Referencing Phi-3.5-vision-instruct's processing_phi3_v.py#L407, here is the breakdown:

No images: Phi3VProcessor tokenizes the prompt directly:

```python
if not len(images):
    model_inputs = self.tokenizer(
        texts,
        return_tensors=return_tensors,
        padding=padding,
        truncation=truncation,
        max_length=max_length,
    )
```

Images present: the prompt is split with the `r"<\|image_\d+\|>"` regex before tokenizing each chunk:

```python
pattern = r"<\|image_\d+\|>"
prompt_chunks = [self.tokenizer(chunk).input_ids for chunk in re.split(pattern, texts)]
```

I've now aligned the implementation in phi3v.py and adjusted the test cases to follow the logic defined in the model's official processing_phi3_v.py. Please let me know if there's anything further I can clarify!
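For illustration, here is a minimal, runnable sketch of the chunk-splitting step described above; the prompt string is a made-up example rather than one taken from the actual tests:

```python
import re

# Hypothetical phi3v-style prompt containing one image placeholder.
prompt = "<|user|>\n<|image_1|>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"

# The same placeholder pattern used by the HF processor.
pattern = r"<\|image_\d+\|>"

# Split the prompt into text chunks around each image placeholder; the HF
# processor tokenizes each chunk separately and interleaves the image tokens.
prompt_chunks = re.split(pattern, prompt)
print(prompt_chunks)
# ['<|user|>\n', '\nWhat is shown in this image?<|end|>\n<|assistant|>\n']
```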
Could you explain what the problem is with the existing processor? We should rely on the HF processor as much as possible.
We call the tokenizer directly to tokenize the prompt in online inference, so we cannot rely on special cases like this.
Sorry for the delayed reply, and thank you for your patience. Here is what I have observed:

HF Processor:

vLLM Processor:

I am attempting to align vLLM's workflow with HF's prompt-splitting approach in both phi3v.py and test_common.py. The current modification mistakenly splits prompts using _apply_hf_processor_text_only regardless of whether they are truly text-only. This divergence from HF's behavior makes it necessary to introduce a parameter that distinguishes genuinely text-only prompts from prompts being split around images. Regarding the test scenario using processor.apply(token_prompt, mm_data=mm_data, hf_processor_mm_kwargs={}), it might not be fully compatible with phi3v because of HF's prompt splitting. I would sincerely appreciate any guidance or suggestions for this context.
How does this lead to different results? Perhaps it would be best if you show a full example.
This pull request has merge conflicts that must be resolved before it can be merged.
Here is an example: get_replacement_phi3v will add a "1" after [529, 29989, 3027, 29918, 29896, 29989, 29958],
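As a small illustration of the numbers quoted above (a sketch only; the bos id of 1 comes from the "1" in the example, and this is not the actual get_replacement_phi3v code):

```python
# Token ids quoted above for the "<|image_1|>" placeholder.
placeholder_ids = [529, 29989, 3027, 29918, 29896, 29989, 29958]

bos_token_id = 1  # the "1" referred to in the example above

# The behavior described above effectively yields the placeholder tokens
# followed by an extra bos token.
result = placeholder_ids + [bos_token_id]
print(result)  # [529, 29989, 3027, 29918, 29896, 29989, 29958, 1]
```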
Thanks for the example. Isn't that why the original code had bos_token_id?
Yes, I also believe that's the reason for adding bos_token_id in get_replacement_phi3v, but this doesn't handle all situations, such as the two test cases below. The fundamental reason is that the phi3v tokenizer cannot guarantee that the same substring, when it appears in different strings, is tokenized into the same token-id sequence.
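A minimal sketch of the kind of discrepancy being described, assuming the transformers library and the public microsoft/Phi-3.5-vision-instruct checkpoint (the concrete ids depend on the tokenizer, so the script prints them rather than asserting them):

```python
from transformers import AutoTokenizer

# Assumed checkpoint; any sentencepiece/Llama-style tokenizer shows the effect.
tok = AutoTokenizer.from_pretrained(
    "microsoft/Phi-3.5-vision-instruct", trust_remote_code=True
)

text = "<|image_1|>\nWhat is in the image?"

# Path 1: tokenize the whole prompt in one call (what online serving does).
whole = tok(text, add_special_tokens=False).input_ids

# Path 2: tokenize the chunks around the placeholder and concatenate them
# (roughly what the HF chunk-splitting path produces).
parts = ["<|image_1|>", "\nWhat is in the image?"]
pieced = [tid for p in parts for tid in tok(p, add_special_tokens=False).input_ids]

# Sentencepiece-style tokenizers are context sensitive around whitespace and
# merges, so the two id sequences are not guaranteed to match.
print(whole == pieced)
print(whole)
print(pieced)
```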
I suggest we try to edit the tokens at the end so that the overall result is the same. Otherwise, there will be too many cases (text input, token input, text input with cache, token input with cache; the token input comes from online serving and is created by applying the tokenizer directly to the text), which makes the code a mess if we try to handle them separately.
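A rough sketch of the kind of final token-editing pass being suggested (the function name and the bos id are hypothetical; this is not the actual _apply_prompt_updates implementation): after the image placeholders have been expanded, normalize the boundary token after each expanded span once, so that text input, token input, and their cached variants all converge on the same sequence.

```python
from typing import List, Tuple

BOS_TOKEN_ID = 1  # assumed bos id for the phi3v tokenizer

def normalize_after_image_spans(
    token_ids: List[int],
    image_spans: List[Tuple[int, int]],  # (start, end) of each expanded image span
) -> List[int]:
    """Ensure every expanded image span is followed by exactly one bos token,
    regardless of how the surrounding text was originally tokenized."""
    out = list(token_ids)
    # Walk the spans from the back so earlier indices stay valid after inserts.
    for _, end in sorted(image_spans, reverse=True):
        if end >= len(out) or out[end] != BOS_TOKEN_ID:
            out.insert(end, BOS_TOKEN_ID)
    return out

# Example: one expanded span covering indices [2, 5); a bos is inserted at
# index 5 only if it is not already present there.
print(normalize_after_image_spans([10, 11, 7, 7, 7, 20, 21], [(2, 5)]))
# [10, 11, 7, 7, 7, 1, 20, 21]
```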
I've limited the modifications to _apply_prompt_updates; please take a look.
The multi-modal tests pass, which is great! Can you update the entrypoints tests with respect to the updated token count?
It seems that the multi-modal tests and the entrypoints tests are complete; could you please assist with the readthedocs build?
Thanks for bearing with me; the PR looks good now. I'll just force-merge the PR since the relevant tests have passed.
FIX #14677