[Bugfix][Core] Prevent token lengths exceeding max_model_len in V0 #19348
Conversation
Signed-off-by: 22quinn <33176974+22quinn@users.noreply.github.com>
Hello @22quinn, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini or gemini-code-assist here, providing a summary of this pull request. This PR addresses a bug where sequences could potentially exceed the configured max_model_len by one token. The core change involves adjusting the length check in the stop condition logic to ensure sequences are capped precisely at max_model_len. A new test case has been added to specifically verify this behavior, ensuring that the total number of tokens (prompt + generated) does not exceed the maximum allowed length.
Highlights
- Bug Fix: Fixes issue #19337 by preventing sequences from exceeding `max_model_len`.
- Length Capping Logic: The condition for capping sequences based on length has been updated from strictly greater than (`>`) to greater than or equal to (`>=`) `max_model_len` (see the small illustration after this list).
- New Test Case: A dedicated test (`test_max_model_len`) is introduced to verify that the total sequence length (prompt + output) correctly adheres to the specified `max_model_len`.
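To make the off-by-one concrete, here is a tiny standalone illustration. The numbers and the simplified generation loop are assumptions for demonstration only, not vLLM code:

```python
# Illustrative only: a toy generation loop showing why `>` lets the
# sequence grow one token past the limit while `>=` caps it exactly.
max_model_len = 8   # assumed limit
prompt_len = 5      # assumed prompt length

def generate(should_stop):
    length = prompt_len
    while True:
        length += 1              # append one generated token
        if should_stop(length):  # stop check runs after the token is appended
            return length        # final sequence length when capped

old = generate(lambda n: n > max_model_len)   # returns 9: one past the limit
new = generate(lambda n: n >= max_model_len)  # returns 8: capped at the limit
print(old, new)  # 9 8
```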
Changelog
- tests/entrypoints/llm/test_generate.py
  - Added a new autouse pytest fixture `v1` to ensure the test runs with both engine versions.
  - Introduced `test_max_model_len` to specifically test the `max_model_len` capping behavior, asserting that the final sequence length equals `max_model_len` when `max_tokens` is set higher.
- vllm/engine/output_processor/stop_checker.py
  - Modified the length check condition in `maybe_stop_sequence` from `seq.get_len() > self._get_max_model_len(lora_req)` to `seq.get_len() >= self._get_max_model_len(lora_req)` to correctly cap sequences at `max_model_len` (a simplified sketch of this logic follows the changelog).
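For context, here is a minimal sketch of the capping logic described in the changelog entry above. The class, method, and helper names follow that entry, but everything else (the sequence stub, the status string, the constructor) is simplified and assumed; the real stop checker also handles EOS tokens and stop strings.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Seq:
    """Simplified stand-in for a vLLM sequence (assumed shape)."""
    token_ids: List[int] = field(default_factory=list)
    status: str = "RUNNING"

    def get_len(self) -> int:
        return len(self.token_ids)

class StopChecker:
    def __init__(self, max_model_len: int):
        self._max_model_len = max_model_len

    def _get_max_model_len(self, lora_req: Optional[object] = None) -> int:
        # In vLLM this can depend on the LoRA request; fixed value here.
        return self._max_model_len

    def maybe_stop_sequence(self, seq: Seq, lora_req: Optional[object] = None) -> None:
        # The fix: `>=` instead of `>`, so the sequence is finished as soon as
        # prompt + generated tokens reach max_model_len, never one past it.
        if seq.get_len() >= self._get_max_model_len(lora_req):
            seq.status = "FINISHED_LENGTH_CAPPED"
```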
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request effectively addresses the bug where token lengths could exceed max_model_len by one token in V0. The fix in vllm/engine/output_processor/stop_checker.py, which changes the comparison operator from > to >=, is correct and directly resolves the issue.
The accompanying test, test_max_model_len in tests/entrypoints/llm/test_generate.py, is well-constructed. It clearly sets up a scenario to trigger the max_model_len capping and correctly asserts that the total token length (prompt + output) adheres to this limit. The use of a small max_model_len and max_tokens greater than what max_model_len would allow for output tokens ensures the capping mechanism is properly tested.
The addition of the v1 autouse fixture to run tests with both engines is also a good practice for ensuring broader compatibility and correctness.
Overall, the changes are clear, concise, and the PR includes appropriate testing. Great job!
Summary of Findings
- Correction of max_model_len Enforcement: The PR successfully corrects an off-by-one error in `vllm/engine/output_processor/stop_checker.py` that allowed sequences to exceed `max_model_len` by one token. The change from `seq.get_len() > self._get_max_model_len(lora_req)` to `seq.get_len() >= self._get_max_model_len(lora_req)` ensures strict adherence to the specified maximum model length.
- Effective Unit Testing: A new unit test, `test_max_model_len`, has been added. This test effectively verifies the fix by ensuring that the total length of prompt and generated tokens does not exceed the configured `max_model_len`, even when `SamplingParams.max_tokens` would otherwise allow for a longer sequence. (A hedged sketch of such a test follows this list.)
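The sketch below shows what such a test could look like; the model name, prompt, fixture body, and parameter values are illustrative assumptions and not the PR's actual code.

```python
import pytest
from vllm import LLM, SamplingParams

MODEL = "facebook/opt-125m"  # assumed small model for a fast test

@pytest.fixture(autouse=True, params=[0, 1])
def v1(request, monkeypatch):
    # Assumed shape of the autouse `v1` fixture mentioned above:
    # run the test under both the V0 and V1 engines via an env var.
    monkeypatch.setenv("VLLM_USE_V1", str(request.param))

def test_max_model_len():
    max_model_len = 20
    llm = LLM(model=MODEL, max_model_len=max_model_len)
    # max_tokens is deliberately larger than the room left after the prompt,
    # so the engine must cap the sequence at max_model_len.
    params = SamplingParams(max_tokens=2 * max_model_len, ignore_eos=True)
    for output in llm.generate(["Hello, my name is"], params):
        total = len(output.prompt_token_ids) + len(output.outputs[0].token_ids)
        # Prompt plus generated tokens must never exceed the limit
        # (the PR's test asserts exact equality per the changelog).
        assert total <= max_model_len
```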
Merge Readiness
The pull request appears to be in good shape and addresses the identified bug effectively with appropriate tests. I am unable to approve pull requests, but based on this review, the changes are sound and ready for further review and merging by authorized maintainers.
👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge. 🚀
Looks good. Thanks for the fix and the test.
Hi @22quinn, the test function
Essential Elements of an Effective PR Description Checklist
- (Optional) Documentation update, such as updating supported_models.md and examples for a new model.

Purpose
Fix #19337
Test Plan
Test Result
Unit test passed
(Optional) Documentation Update