Update vLLM to 0.10.0 and add max_model_len #172
Conversation
Summary of Changes
Hello @hiyuchang, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've made significant changes to how token limits are managed within the system, primarily by introducing a max_model_len parameter. This new parameter consolidates the concept of maximum sequence length, replacing the previously separate max_prompt_tokens and max_response_tokens in many configurations. This change simplifies token management and aligns with the capabilities of the upgraded vllm library, which I've also updated to version 0.10.0. The goal is to provide a more direct and efficient way to control the total length of sequences processed by the models.
Highlights
- Introduction of `max_model_len`: I've introduced a new `max_model_len` parameter across the codebase to define the maximum total token length for a model's sequence, which includes both prompt and response tokens. This provides a more unified and explicit way to manage token limits.
- Deprecation of `max_prompt_tokens`: The `max_prompt_tokens` parameter has been deprecated and largely removed from configurations and code. Where necessary, existing `max_prompt_tokens` and `max_response_tokens` values will be used to derive `max_model_len` for backward compatibility (see the sketch after this list), but direct usage of `max_model_len` is now preferred.
- `vLLM` Version Upgrade: I've updated the `vllm` dependency in `pyproject.toml` to version `0.10.0`, ensuring compatibility with the latest features and performance improvements from the `vllm` library.
- Configuration File Updates: I've updated numerous example configuration files (`.yaml` files) to reflect the new `max_model_len` parameter, replacing the deprecated `max_prompt_tokens` where applicable and adjusting values to align with the new token length management.
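Since the backward-compatibility rule above is only described in prose, here is a minimal Python sketch of how `max_model_len` could be derived from the legacy pair of limits. The `ModelConfig` class, its field names, and the resolution order are illustrative assumptions, not the actual code in `trinity/common/config.py`.

```python
# Minimal sketch (assumed names, not the real trinity/common/config.py) of
# resolving max_model_len from the deprecated prompt/response token limits.
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelConfig:  # hypothetical stand-in for the repository's config class
    max_model_len: Optional[int] = None
    max_prompt_tokens: Optional[int] = None   # deprecated
    max_response_tokens: Optional[int] = None

    def resolve_max_model_len(self) -> Optional[int]:
        # Prefer the new, explicit setting when it is provided.
        if self.max_model_len is not None:
            return self.max_model_len
        # Backward compatibility: derive it from the legacy pair if both exist.
        if self.max_prompt_tokens is not None and self.max_response_tokens is not None:
            return self.max_prompt_tokens + self.max_response_tokens
        # Otherwise leave it unset and let vLLM infer it from the model config.
        return None
```

Under this assumption, an older config with `max_prompt_tokens: 4096` and `max_response_tokens: 2048` would resolve to a `max_model_len` of 6144.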
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces the max_model_len parameter to replace the deprecated max_prompt_tokens, and upgrades the vllm dependency. The changes are mostly in configuration files and the related Python code that parses them.
My review has found a few issues:
- There are typos and inconsistencies in the new configuration key (`model_model_len`, `max_token_len`) in some example YAML files.
- In a couple of example files, the `max_response_tokens` parameter was unintentionally removed, which is still required for generation.
- A debug `print` statement was left in the `vllm_model.py` file.
I've provided specific suggestions to fix these issues. Overall, the changes are in the right direction but need these corrections to be complete.
/unittest-all
Pull Request Overview
This PR upgrades vLLM from 0.9.1-0.9.2 to 0.10.0 and introduces a new max_model_len parameter to replace the deprecated max_prompt_tokens field. The change simplifies token length management by using a single parameter instead of separate prompt and response token limits.
- Updates vLLM version constraint to include 0.10.0
- Replaces `max_prompt_tokens` with `max_model_len` across configuration files and codebase
- Updates token length calculations to use the new parameter
Reviewed Changes
Copilot reviewed 42 out of 42 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pyproject.toml | Updates vLLM version constraint to allow 0.10.0 |
| trinity/common/config.py | Adds max_model_len field and deprecates max_prompt_tokens |
| trinity/common/models/vllm_model.py | Updates model initialization to use max_model_len |
| trinity/common/models/api/vllm_patch.py | Updates version check to support vLLM 0.10.0 |
| trinity/manager/ | Updates UI components and config generation to use new parameter |
| examples/ | Updates all example configurations to use max_model_len |
| docs/ | Updates documentation to reflect new parameter |
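To illustrate the `trinity/common/models/vllm_model.py` row above, a vLLM engine typically receives this limit at construction time. The snippet below is a hedged sketch only; the model name and value are assumptions and are not taken from the repository.

```python
# Illustrative sketch: passing max_model_len when constructing a vLLM engine.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # assumed example model, not from the repo
    max_model_len=4096,                # total budget shared by prompt and response tokens
)
```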
…y large prompts and preventing vllm from throwing exception

To prevent vllm from throwing exceptions like:

```
ERROR 08-17 23:32:15 scheduler.py:86] ValueError: The decoder prompt (length 42861) is longer than the maximum model length of 32768. Make sure that `max_model_len` is no smaller than the number of text tokens.
```

`truncate_prompt_tokens=config.max_model_len-1` is used to ensure at least one output token. A similar setting was used before agentscope-ai#172, and got removed without an explanation that I could find.
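To make the comment above concrete, the following is a minimal sketch of how such a truncation cap can be applied through vLLM's `SamplingParams`. The numeric values are assumptions, and this is not the code that was removed after agentscope-ai#172.

```python
# Sketch with assumed values: capping prompt length so vLLM truncates
# oversized prompts instead of raising the ValueError quoted above.
from vllm import SamplingParams

max_model_len = 32768  # assumed context window, matching the error message above

sampling_params = SamplingParams(
    max_tokens=512,                            # example response token budget
    truncate_prompt_tokens=max_model_len - 1,  # leave room for at least one output token
)
```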
Description
- Update `vllm` to `0.10.0`
- Add `max_model_len` parameter

Checklist
Please check the following items before code is ready to be reviewed.