Add support for fixed schedule warmup #366
Conversation
Pull Request Overview
This PR adds support for fixed schedule warmup by modifying request scheduling, adjusting input parsing, and updating command-line validations. Key changes include:
- Introducing a new dataset_offset parameter in worker and thread configuration classes.
- Modifying the ModelParser to initialize fixed schedule inputs via a new constructor.
- Refactoring CustomRequestScheduleManager to distinguish between warmup and benchmark schedules.
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/request_rate_worker.h | Added dataset_offset parameter to worker construction. |
| src/request_rate_manager.h & .cc | Updated thread configuration and worker creation APIs. |
| src/perf_analyzer.cc | Adjusted ModelParser instantiation for fixed schedule. |
| src/model_parser.{h,cc} | Introduced a constructor that initializes fixed schedule inputs and removed duplicate code in InitOpenAI. |
| src/custom_request_schedule_manager.{h,cc} | Refactored schedule generation to separate warmup and benchmark schedules. |
| src/command_line_parser.cc | Updated help messages and validation checks for fixed schedule mode. |
| Test and documentation files | Updated to support new warmup functionality. |
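To illustrate the `dataset_offset` idea from the summary above, here is a minimal sketch (all names are hypothetical, not the PR's actual code) of how an offset carried in a thread configuration could tell a worker where to start reading payloads, so benchmark workers skip the entries already consumed by warmup:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one dataset payload entry.
struct Payload {
  int id;
};

// Hypothetical thread configuration carrying the new dataset_offset.
struct ThreadConfig {
  size_t thread_id{0};
  size_t dataset_offset{0};  // index of the first payload this worker sends
};

// Returns the payload ids a worker would send, starting at its offset.
std::vector<int> SentIds(const ThreadConfig& cfg,
                         const std::vector<Payload>& dataset) {
  std::vector<int> sent;
  for (size_t i = cfg.dataset_offset; i < dataset.size(); ++i) {
    sent.push_back(dataset[i].id);
  }
  return sent;
}
```

For example, with `--warmup-request-count=2` a benchmark worker would be constructed with `dataset_offset = 2` and only send payloads 2 onward.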
nicolasnoble left a comment
Yeah looks good to me. The change is large due to propagation of some of the parameters, but that's mostly mechanical work. I've left only a few nits here and there, otherwise it's fine with me.
Previously, someone would run PA in fixed schedule mode with an `input_data.json` like this:

```json
{
  "data": [
    {
      "payload": [
        {
          "model": "facebook/opt-125m",
          "messages": [{ "role": "user", "content": "my_prompt_1" }],
          "max_completion_tokens": 1
        }
      ],
      "timestamp": [1000]
    },
    {
      "payload": [
        {
          "model": "facebook/opt-125m",
          "messages": [{ "role": "user", "content": "my_prompt_2" }],
          "max_completion_tokens": 1
        }
      ],
      "timestamp": [2000]
    }
  ]
}
```

But they wouldn't be able to use the warmup feature (`--warmup-request-count`). Now, with this PR, users can use the warmup feature together with the fixed schedule feature.
Basically, if `--warmup-request-count=N`, the first `N` payloads in `input_data.json` are sent as "warmup" requests (i.e. excluded from the final performance metric/statistic calculations and the profile export JSON), and the rest are part of the standard benchmark.
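The split described above can be sketched as follows (a hypothetical helper under assumed types, not the PR's actual `CustomRequestScheduleManager` code): the first `N` timestamps become the warmup schedule, and the remainder becomes the benchmark schedule whose results feed the final metrics.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Assumed representation: one send timestamp (e.g. in ms) per request.
using Schedule = std::vector<uint64_t>;

// Splits a fixed schedule into {warmup, benchmark} portions. The first
// warmup_request_count entries are warmup requests; the rest are the
// standard benchmark whose results are included in the exported stats.
std::pair<Schedule, Schedule> SplitSchedule(const Schedule& timestamps,
                                            size_t warmup_request_count) {
  const size_t n = std::min(warmup_request_count, timestamps.size());
  Schedule warmup(timestamps.begin(), timestamps.begin() + n);
  Schedule benchmark(timestamps.begin() + n, timestamps.end());
  return {warmup, benchmark};
}
```

With the two-payload `input_data.json` above and `--warmup-request-count=1`, the request at timestamp 1000 would be warmup and the request at timestamp 2000 would be the benchmark.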