[Frontend] Add LLM.reward specific to reward models #21720
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger a full CI run by default; only a small and essential subset of CI tests runs to quickly catch errors. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Code Review
This pull request aims to improve the API for reward models by adding a dedicated LLM.reward method and making the pooling_task in LLM.encode explicit. The changes are in the right direction, but the implementation of the new reward method appears incorrect: it uses pooling_task="encode" and lacks the necessary validation. I've left a critical comment with suggestions to align it with other task-specific methods like embed and classify. I've also suggested improving an error message to give users better guidance. Addressing these points will significantly improve the correctness and consistency of the new API.
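To make the suggestion concrete, here is a rough sketch of what a task-specific reward method could look like, modeled on embed and classify. This is not the PR's actual code: the `self.supported_tasks` guard and the "reward" pooling task name are illustrative assumptions.

```python
# Hypothetical sketch of a task-specific reward() method on vllm.LLM,
# mirroring the structure of embed() and classify().
# `self.supported_tasks` and the "reward" pooling task are assumptions.
def reward(
    self,
    prompts,
    /,
    *,
    use_tqdm: bool = True,
    lora_request=None,
):
    """Return pooled reward outputs for each prompt."""
    # Validate up front, like embed()/classify(), instead of silently
    # falling back to the generic "encode" pooling task.
    if "reward" not in self.supported_tasks:
        raise ValueError(
            "Reward API is only enabled for reward models; "
            "please check that the loaded model supports the reward task.")
    # Delegate to encode() with an explicit pooling task.
    return self.encode(
        prompts,
        use_tqdm=use_tqdm,
        lora_request=lora_request,
        pooling_task="reward",
    )
```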
Quick review
Let's wait for #18321 to be merged first, then update this PR.
The failing buildkite/ci/pr/language-models-test-extended-pooling test can be fixed by #21747.
@noooop @DarkLight1337 For the online server, which API is the best entrypoint for the reward model?
/pooling. I personally think that /pooling needs to be refactored to better support reward models. PTAL at #20538.
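For reference, a minimal sketch of calling the server's /pooling endpoint from Python, assuming a reward model is being served. The model name is only an example, and the exact response schema may vary across vLLM versions.

```python
# Query the OpenAI-compatible server's /pooling endpoint.
# Assumes `vllm serve <a-reward-model>` is running on localhost:8000;
# the model name below is only an example.
import requests

resp = requests.post(
    "http://localhost:8000/pooling",
    json={
        "model": "internlm/internlm2-1_8b-reward",
        "input": "vLLM is great!",
    },
)
resp.raise_for_status()
print(resp.json())  # pooled (reward) data for the prompt
```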
Right now I'm using the embedding API, and I have no scenario for the pooling API yet.
According to vLLM's taxonomy:
Essential Elements of an Effective PR Description Checklist
- (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.

Purpose
After #21470, LLM.encode's pooling_task defaults to "encode", meaning that reward models actually use LLM.encode. However, in most libraries, encode corresponds to an embedding model, which can lead to confusion and potential errors.
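To illustrate the difference, here is a small usage sketch. The model name and the task flag are examples, not taken from this PR, and the exact output field names may differ by version.

```python
from vllm import LLM

# Example reward model; any model whose pooler supports the reward task
# should work here.
llm = LLM(model="internlm/internlm2-1_8b-reward", task="reward")

# Before this PR: reward models go through the generic entrypoint,
# with pooling_task defaulting to "encode".
outputs = llm.encode("vLLM is great!")

# With this PR: an explicit, reward-specific entrypoint.
outputs = llm.reward("vLLM is great!")

# Each output is a pooling result; .outputs.data holds the reward tensor
# (field names may vary across vLLM versions).
print(outputs[0].outputs.data)
```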
Test Plan
Test Result
(Optional) Documentation Update