[Bugfix]: Make chat content text allow type content #9358
Conversation
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
I have verified that the fix works as expected.
Nice! I think we need to add a test here. @DarkLight1337 can you suggest a good place to test the model?
We can add this to |
If you want to test this model specifically, we can also add a test to |
Hey @simon-mo @DarkLight1337, thanks for the quick feedback! I did add a test (though I don't think it is currently testing the right thing, and I will fix it). One thing I wanted to confirm: is there any way to ascertain whether this change might break the chat template behavior for any other model vLLM currently supports? Is there any way to figure out if there is any model where keeping the format with My concern is whether this can be treated as the default behavior, or whether the Llama-1B model's handling of it is the edge case. Who on the maintainer team would be the best person to answer this?
We currently don't test the correctness of chat templates since the chat template is often outside of our control, i.e. it depends on the HF repo for that model. You can try setting up a regression test by passing a short prompt to each model and recording the output, then checking that the output remains the same after your change.
A full test would be infeasible to run in CI, so instead you can write a script that's designed to be run/modified locally.
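For reference, a minimal sketch of such a local script, assuming the check is done on the rendered chat-template output (as a stand-in for the generation output); the model list, message, and baseline file name below are placeholders:

```python
# Minimal local regression sketch (not meant for CI): render a short chat prompt
# with each model's chat template and compare against a recorded baseline.
# Model names, the message, and the baseline path are placeholders; requires `transformers`.
import json
from pathlib import Path

from transformers import AutoTokenizer

MODELS = ["HuggingFaceH4/zephyr-7b-beta"]  # extend locally with the models you care about
MESSAGES = [{"role": "user", "content": "What is the capital of France?"}]
BASELINE = Path("chat_template_baseline.json")


def render_all() -> dict:
    rendered = {}
    for model in MODELS:
        tokenizer = AutoTokenizer.from_pretrained(model)
        rendered[model] = tokenizer.apply_chat_template(
            MESSAGES, tokenize=False, add_generation_prompt=True
        )
    return rendered


if __name__ == "__main__":
    rendered = render_all()
    if BASELINE.exists():
        baseline = json.loads(BASELINE.read_text())
        for model, prompt in rendered.items():
            assert baseline.get(model) == prompt, f"Rendered prompt changed for {model}"
        print("All rendered prompts match the recorded baseline.")
    else:
        BASELINE.write_text(json.dumps(rendered, indent=2))
        print(f"Recorded baseline for {len(rendered)} model(s).")
```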
@DarkLight1337 Thanks for the tip! I did a rudimentary test and it seems most models handle their own chat templates pretty consistently with how they expect content. One issue I've been seeing with the CI tests is that the test_complex_message_content and test_custom_role tests seem to be in direct contradiction with the desired behavior of this bug fix. I am looking through the chat template of the model in question, and I think vLLM makes some implicit assumptions about how we want to handle messages that do not come in the format expected by the chat template. Is this behavior we can change, now that we no longer expect a default chat template but expect each model to have its own? I would expect the correct behavior to be throwing an error when chat messages do not conform to the given chat template, as opposed to trying to morph them internally. Please let me know what steps I should take next.
The use of |
From my understanding, the issue now is that some chat templates expect |
Correct. And I think normally the chat template of an application only allows for one sort of schema. But apart from rendering both types of requests with the model chat template, and figuring out which request schema it supports, I am not sure how to resolve this without ambiguity. Any thoughts on how to proceed with this?
I think more precisely I do not want messages sent in the format |
As for the tests you're proposing to remove, you're arguing that we should pass the message contents in a format based on what the chat template supports (for Zephyr, it should be a string instead of |
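For concreteness, the two request shapes under discussion look roughly like this (illustrative payloads following the OpenAI chat-completions schema, not code from this PR):

```python
# A template like Zephyr's renders `content` directly, so it expects a plain string:
string_style = {"role": "user", "content": "What is the capital of France?"}

# Other templates (e.g. the Llama-Guard-1B template discussed here) iterate over
# OpenAI-style content parts, so they expect a list of typed parts instead:
parts_style = {
    "role": "user",
    "content": [{"type": "text", "text": "What is the capital of France?"}],
}
```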
Hmm. That makes sense.
I suggest adding an abstraction between the OpenAI API and the chat template. Similar to how we can set This way, there is no need for server operators to provide a full chat template just to get Llama-Guard-1B to work. |
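As a rough illustration of that kind of abstraction (the names below are hypothetical, not existing vLLM APIs): the server could normalize incoming content into whichever shape the chat template expects, selected by a per-deployment option.

```python
# Hypothetical sketch; `ContentFormat` and `normalize_content` are illustrative
# names, not existing vLLM APIs.
from typing import Literal, Union

ContentFormat = Literal["string", "openai"]


def normalize_content(
    content: Union[str, list],
    target_format: ContentFormat,
) -> Union[str, list]:
    """Convert incoming message content into the shape the chat template expects."""
    if target_format == "string":
        if isinstance(content, str):
            return content
        # Flatten OpenAI-style text parts into a single string.
        return "\n".join(
            part["text"] for part in content if part.get("type") == "text"
        )
    # target_format == "openai": always hand the template a list of parts.
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return content
```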
@DarkLight1337 is there anything pending from my side that you would like me to do to wrap this up?
People with write access to the repo can override the DCO status, so don't worry about it. In future PRs, it is recommended to set up auto-signoff.
Looks good now, thanks for your patience!
FILL IN THE PR DESCRIPTION HERE
FIX #9294 (link existing issues this PR will resolve)
BEFORE SUBMITTING, PLEASE READ THE CHECKLIST BELOW AND FILL IN THE DESCRIPTION ABOVE
PR Checklist (Click to Expand)
Thank you for your contribution to vLLM! Before submitting the pull request, please ensure the PR meets the following criteria. This helps vLLM maintain the code quality and improve the efficiency of the review process.
PR Title and Classification
Only specific types of PRs will be reviewed. The PR title is prefixed appropriately to indicate the type of change. Please use one of the following:
- `[Bugfix]` for bug fixes.
- `[CI/Build]` for build or continuous integration improvements.
- `[Doc]` for documentation fixes and improvements.
- `[Model]` for adding a new model or improving an existing model. Model name should appear in the title.
- `[Frontend]` for changes on the vLLM frontend (e.g., OpenAI API server, `LLM` class, etc.).
- `[Kernel]` for changes affecting CUDA kernels or other compute kernels.
- `[Core]` for changes in the core vLLM logic (e.g., `LLMEngine`, `AsyncLLMEngine`, `Scheduler`, etc.).
- `[Hardware][Vendor]` for hardware-specific changes. Vendor name should appear in the prefix (e.g., `[Hardware][AMD]`).
- `[Misc]` for PRs that do not fit the above categories. Please use this sparingly.

Note: If the PR spans more than one category, please include all relevant prefixes.
Code Quality
The PR needs to meet the following code quality standards:
- Use `format.sh` to format your code.
- Please add documentation to `docs/source/` if the PR modifies the user-facing behaviors of vLLM. It helps vLLM users understand and utilize the new features or changes.

Adding or changing kernels
Each custom kernel needs a schema and one or more implementations to be registered with PyTorch.
- Custom operations that return `Tensors` require meta-functions. Meta-functions should be implemented and registered in Python so that dynamic dims can be handled automatically. See the above documents for a description of meta-functions.
- Use `torch.library.opcheck()` to test the function registration and meta-function for any registered ops. See `tests/kernels` for examples; a minimal sketch follows this list.
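The sketch below shows the general `torch.library.opcheck()` workflow on a toy custom op; the namespace and op name (`myops::add_one`) are hypothetical, and real vLLM kernels are registered through the project's own helpers.

```python
# Minimal sketch of checking a registered custom op with torch.library.opcheck().
# The namespace/op name ("myops::add_one") is hypothetical; requires PyTorch >= 2.4.
import torch
from torch.library import opcheck


@torch.library.custom_op("myops::add_one", mutates_args=())
def add_one(x: torch.Tensor) -> torch.Tensor:
    return x + 1


@add_one.register_fake
def _(x: torch.Tensor) -> torch.Tensor:
    # Meta/fake implementation so shape propagation works without real data.
    return torch.empty_like(x)


# Validates the schema, the fake-tensor (meta) kernel, and dispatch behavior.
opcheck(torch.ops.myops.add_one.default, (torch.randn(4),))
```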
Notes for Large Changes
Please keep the changes as concise as possible. For major architectural changes (>500 LOC excluding kernel/data/config/test), we would expect a GitHub issue (RFC) discussing the technical design and justification. Otherwise, we will tag it with `rfc-required`.
and might not go through the PR.What to Expect for the Reviews
The goal of the vLLM team is to be a transparent reviewing machine. We would like to make the review process transparent and efficient and make sure no contributor feels confused or frustrated. However, the vLLM team is small, so we need to prioritize some PRs over others. Here is what you can expect from the review process:
- The reviewer will put an `action-required` label on the PR if there are changes required. The contributor should address the comments and ping the reviewer to re-review the PR.

Thank You
Finally, thank you for taking the time to read these guidelines and for your interest in contributing to vLLM. Your contributions make vLLM a great tool for everyone!