-
Notifications
You must be signed in to change notification settings - Fork 28
Prefix repetition scenario #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Prefix repetition scenario #102
Conversation
Summary of ChangesHello @JimmyWhitaker, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request enhances the benchmarking capabilities of Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new PrefixRepetitionScenario for benchmarking KV cache performance, which is a valuable addition. The implementation is solid, with a clear separation between the scenario definition and the sampling logic. The prefix caching mechanism in the TextSampler is well-designed, including the necessary cache reset functionality. The changes are also well-documented and thoroughly tested. My feedback includes a few minor suggestions to improve code style by moving local imports to the top of their respective files, which will enhance readability and maintain consistency with Python best practices.
| from genai_bench.scenarios.text import PrefixRepetitionScenario | ||
|
|
||
| if isinstance(scenario, PrefixRepetitionScenario): | ||
| return self._sample_prefix_repetition_request(scenario) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The import of PrefixRepetitionScenario is done locally within the _sample_chat_request method. According to PEP 8, imports should usually be at the top of the file. This improves readability and helps avoid potential issues with circular dependencies (though there doesn't seem to be one here). Please move from genai_bench.scenarios.text import PrefixRepetitionScenario to the top of the file and simplify this block.
| from genai_bench.scenarios.text import PrefixRepetitionScenario | |
| if isinstance(scenario, PrefixRepetitionScenario): | |
| return self._sample_prefix_repetition_request(scenario) | |
| if isinstance(scenario, PrefixRepetitionScenario): | |
| return self._sample_prefix_repetition_request(scenario) |
| self._shared_prefix_cache[cache_key] = prefix | ||
|
|
||
| # Calculate hash for verification | ||
| import hashlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| # Log cache reuse (only for first few to avoid spam) | ||
| if self._suffix_counter < 5: | ||
| import hashlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| # Log suffix info for first few requests | ||
| if self._suffix_counter <= 5: | ||
| import hashlib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| """ | ||
| # Parse P(prefix_len,suffix_len)/output_len | ||
| # params_str will be "(2000,500)/200" | ||
| import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| def test_prefix_repetition_scenario_invalid_format(): | ||
| """Test PrefixRepetitionScenario parsing with invalid format.""" | ||
| import pytest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Description
This PR adds support for prefix repetition scenarios, a new traffic scenario type designed for benchmarking KV cache performance. The prefix repetition scenario generates requests where all concurrent requests share the exact same prefix text (enabling KV cache reuse), while each request has a unique suffix to ensure different completions. This is particularly useful for evaluating how well LLM serving systems can leverage prefix caching to improve performance.
PR Type / Label
/kind feature
Related Issue
No related issues, but there are alternative implementations in #36 and part of #95 .
Changes
PrefixRepetitionScenarioclass with formatP(prefix_len,suffix_len)/output_lenP(2000,500)/200creates requests with 2000-token shared prefix, 500-token unique suffix, and 200-token outputTextSamplerto support prefix repetition scenarios_sample_prefix_repetition_request()methodreset_prefix_cache()method for cache management between scenario runsdocs/user-guide/scenario-definition.md- Scenario format and usageCorrectness Tests
All tests pass successfully:
Scenario Tests:
Test Command:
make testChecklist
git pull origin main)make checkto ensure code is properly formatted and passes all lint checksmake testormake test_changedto verify test coverage (~90% required)Additional Information
Example Usage