[Frontend] Add LLM.reward specific to reward models #21720
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default; only a small and essential subset of CI tests runs to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Code Review
This pull request aims to improve the API for reward models by adding a dedicated LLM.reward method and making the pooling_task in LLM.encode explicit. The changes are in the right direction, but the implementation of the new reward method appears incorrect: it uses pooling_task="encode" and lacks the necessary validation. I've left a critical comment with suggestions to align it with other task-specific methods like embed and classify. I've also suggested improving an error message to give users better guidance. Addressing these points will significantly improve the correctness and consistency of the new API.
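To make the suggestion concrete, here is a rough sketch of what a task-specific reward method could look like, modeled on embed and classify. This is not the PR's actual code: the `self.supported_tasks` guard and the "reward" pooling task name are illustrative assumptions.

```python
# Hypothetical sketch of a task-specific reward() method on vllm.LLM,
# mirroring the structure of embed() and classify().
# `self.supported_tasks` and the "reward" pooling task are assumptions.
def reward(
    self,
    prompts,
    /,
    *,
    use_tqdm: bool = True,
    lora_request=None,
):
    """Return pooled reward outputs for each prompt."""
    # Validate up front, like embed()/classify(), instead of silently
    # falling back to the generic "encode" pooling task.
    if "reward" not in self.supported_tasks:
        raise ValueError(
            "Reward API is only enabled for reward models; "
            "please check that the loaded model supports the reward task.")
    # Delegate to encode() with an explicit pooling task.
    return self.encode(
        prompts,
        use_tqdm=use_tqdm,
        lora_request=lora_request,
        pooling_task="reward",
    )
```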
Quick review
Let's wait for #18321 to be merged first, then update this PR.
The failing buildkite/ci/pr/language-models-test-extended-pooling test can be fixed by #21747.
@noooop @DarkLight1337 For the online server, which API is the best entrypoint for the reward model?
/pooling. I personally think that /pooling needs to be refactored to better support reward models. PTAL at #20538.
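For reference, a minimal sketch of calling the server's /pooling endpoint from Python, assuming a reward model is being served. The model name is only an example, and the exact response schema may vary across vLLM versions.

```python
# Query the OpenAI-compatible server's /pooling endpoint.
# Assumes `vllm serve <a-reward-model>` is running on localhost:8000;
# the model name below is only an example.
import requests

resp = requests.post(
    "http://localhost:8000/pooling",
    json={
        "model": "internlm/internlm2-1_8b-reward",
        "input": "vLLM is great!",
    },
)
resp.raise_for_status()
print(resp.json())  # pooled (reward) data for the prompt
```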
Right now I'm using the embedding API, and I have no scenario for the pooling API yet.
According to vLLM's taxonomy:
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose
After #21470, LLM.encode's pooling_task defaults to "encode", meaning that reward models actually use LLM.encode. However, in most libraries, encode corresponds to an embedding model, which can lead to confusion and potential errors.
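To illustrate the difference, here is a small usage sketch. The model name and the task flag are examples, not taken from this PR, and the exact output field names may differ by version.

```python
from vllm import LLM

# Example reward model; any model whose pooler supports the reward task
# should work here.
llm = LLM(model="internlm/internlm2-1_8b-reward", task="reward")

# Before this PR: reward models go through the generic entrypoint,
# with pooling_task defaulting to "encode".
outputs = llm.encode("vLLM is great!")

# With this PR: an explicit, reward-specific entrypoint.
outputs = llm.reward("vLLM is great!")

# Each output is a pooling result; .outputs.data holds the reward tensor
# (field names may vary across vLLM versions).
print(outputs[0].outputs.data)
```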
Test Plan
Test Result
(Optional) Documentation Update