Contributor

@matthewkotila matthewkotila commented Apr 23, 2025

Previously, a user would run Perf Analyzer (PA) in fixed schedule mode like this:

perf_analyzer \
  --fixed-schedule \
  --input-data=input_data.json \
  -m facebook/opt-125m \
  --service-kind=openai \
  --endpoint=v1/chat/completions \
  --async

with an input_data.json like this:

{
  "data": [
    {
      "payload": [{
          "model": "facebook/opt-125m",
          "messages": [{"role": "user","content": "my_prompt_1"}],
          "max_completion_tokens": 1
        }],
      "timestamp": [1000]
    },
    {
      "payload": [{
          "model": "facebook/opt-125m",
          "messages": [{"role": "user","content": "my_prompt_2"}],
          "max_completion_tokens": 1
        }],
      "timestamp": [2000]
    }
  ]
}
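A file in this shape can also be generated programmatically. The following is a minimal sketch that assumes only the field layout shown above; the helper name `build_fixed_schedule` is hypothetical and not part of Perf Analyzer, and the prompts and timestamps are the placeholder values from the example.

```python
import json


def build_fixed_schedule(model, prompts_with_timestamps):
    """Build a dict in the input_data.json layout shown above.

    Hypothetical helper for illustration; not part of Perf Analyzer.
    """
    entries = []
    for prompt, timestamp in prompts_with_timestamps:
        entries.append({
            "payload": [{
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
                "max_completion_tokens": 1,
            }],
            # One scheduled send time per entry, as in the example file.
            "timestamp": [timestamp],
        })
    return {"data": entries}


schedule = build_fixed_schedule(
    "facebook/opt-125m",
    [("my_prompt_1", 1000), ("my_prompt_2", 2000)],
)

# Serialize to text; write this string to input_data.json to use it
# with --input-data.
text = json.dumps(schedule, indent=2)
print(text)
```

Keeping the entries in ascending timestamp order mirrors the example and makes it easy to see which payloads come first in the file.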

However, they could not combine fixed schedule mode with the warmup feature (--warmup-request-count).

Now, with this PR, users can use the warmup feature with the fixed schedule feature:

perf_analyzer \
  --fixed-schedule \
  --warmup-request-count=1 \
  --input-data=input_data.json \
  -m facebook/opt-125m \
  --service-kind=openai \
  --endpoint=v1/chat/completions \
  --async

If --warmup-request-count=N, the first N payloads in input_data.json are sent as "warmup" requests, meaning they are excluded from the final performance metric/statistic calculations and from the profile export JSON. The remaining payloads make up the standard benchmark.
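The split described here can be pictured as slicing the `data` list at index N. This is a sketch of the described behavior only, not Perf Analyzer's actual code; the function name `split_warmup_and_benchmark` is hypothetical, and rejecting a negative count is a choice made for the sketch.

```python
def split_warmup_and_benchmark(input_data, warmup_request_count):
    """Return (warmup_entries, benchmark_entries) from an input_data dict.

    Illustrative only: the first N entries are treated as warmup and the
    rest as the benchmark, as the PR description states.
    """
    if warmup_request_count < 0:
        # Sketch-level guard, not documented Perf Analyzer behavior.
        raise ValueError("warmup_request_count must be non-negative")
    entries = input_data["data"]
    return entries[:warmup_request_count], entries[warmup_request_count:]


input_data = {
    "data": [
        {"payload": [{"messages": [{"role": "user", "content": "my_prompt_1"}]}],
         "timestamp": [1000]},
        {"payload": [{"messages": [{"role": "user", "content": "my_prompt_2"}]}],
         "timestamp": [2000]},
    ]
}

warmup, benchmark = split_warmup_and_benchmark(input_data, 1)
print(len(warmup), len(benchmark))
```

With the two-entry file from the example and --warmup-request-count=1, `my_prompt_1` would be the warmup request and `my_prompt_2` the only benchmarked request.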

@matthewkotila matthewkotila requested a review from Copilot April 24, 2025 16:45

Copilot AI left a comment


Pull Request Overview

This PR adds support for fixed schedule warmup by modifying request scheduling, adjusting input parsing, and updating command-line validations. Key changes include:

  • Introducing a new dataset_offset parameter in worker and thread configuration classes.
  • Modifying the ModelParser to initialize fixed schedule inputs via a new constructor.
  • Refactoring CustomRequestScheduleManager to distinguish between warmup and benchmark schedules.
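One plausible way to picture the role of a dataset offset is shown below. This is purely a conceptual sketch and not the PR's C++ implementation: the function `worker_dataset_indices`, its parameters, and the round-robin assignment of entries to workers are all assumptions made for illustration. The idea is that the benchmark phase starts reading the dataset after the entries already consumed by warmup.

```python
def worker_dataset_indices(dataset_size, dataset_offset, num_workers, worker_id):
    """Dataset indices one worker would send, in order.

    Hypothetical model: entries before dataset_offset were used for
    warmup, and the remaining entries are dealt out round-robin.
    """
    first = dataset_offset + worker_id
    return list(range(first, dataset_size, num_workers))


# Five entries, one consumed by warmup, two benchmark workers.
for worker_id in range(2):
    print(worker_id, worker_dataset_indices(5, 1, 2, worker_id))
```

In this model, index 0 is never sent during the benchmark phase because the offset skips past it.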

Reviewed Changes

Copilot reviewed 25 out of 25 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/request_rate_worker.h | Added dataset_offset parameter to worker construction. |
| src/request_rate_manager.h & .cc | Updated thread configuration and worker creation APIs. |
| src/perf_analyzer.cc | Adjusted ModelParser instantiation for fixed schedule. |
| src/model_parser.{h,cc} | Introduced a constructor that initializes fixed schedule inputs and removed duplicate code in InitOpenAI. |
| src/custom_request_schedule_manager.{h,cc} | Refactored schedule generation to separate warmup and benchmark schedules. |
| src/command_line_parser.cc | Updated help messages and validation checks for fixed schedule mode. |
| Test and documentation files | Updated to support new warmup functionality. |

nicolasnoble previously approved these changes Apr 24, 2025
Contributor

@nicolasnoble nicolasnoble left a comment


Yeah looks good to me. The change is large due to propagation of some of the parameters, but that's mostly mechanical work. I've left only a few nits here and there, otherwise it's fine with me.

@matthewkotila matthewkotila merged commit dbdaff8 into main Apr 25, 2025
6 of 7 checks passed
@matthewkotila matthewkotila deleted the matthewkotila-fixed-schedule-warmup branch April 25, 2025 16:42