Conversation

@b8zhong
Contributor

@b8zhong b8zhong commented Feb 24, 2025

Implement Goodput Calculation + Fix TPOT Calc in Benchmark

This PR touches benchmark_serving_guided.py, adding a goodput calculation and correcting the TPOT assignment.

Changes:

  • Added goodput calculation:

    • Introduced the --goodput argument (consistent with benchmark_serving.py), accepting SLOs as "KEY:VALUE" pairs (e.g., --goodput ttft:500 tpot:100), where keys are metric names (ttft, tpot, e2el) and values are in milliseconds.
    • Updated calculate_metrics to compute good_completed, counting requests that meet all specified SLOs.
  • Fixed TPOT calculation:

    • Corrected outputs[i].tpot to store the per-request TPOT value instead of a cumulative average. Not sure if the old behaviour was intended, since this is how it was done in benchmark_serving.py...
  • Minor:

    • Updated the example docstring to use benchmark_serving_guided.py instead of benchmark_serving.py.
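The goodput logic described above can be sketched roughly as follows. This is a minimal sketch, not the exact code in benchmark_serving_guided.py; the helper names and the shape of the per-request records are assumptions for illustration.

```python
def parse_goodput_args(slo_pairs):
    """Parse SLO strings like ["ttft:500", "tpot:100"] into {"ttft": 500.0, ...}.

    Keys are metric names (ttft, tpot, e2el); values are in milliseconds.
    """
    valid = {"ttft", "tpot", "e2el"}
    slos = {}
    for pair in slo_pairs:
        key, _, value = pair.partition(":")
        if key not in valid:
            raise ValueError(f"Unknown goodput metric: {key}")
        slos[key] = float(value)
    return slos


def count_good_completed(requests, slos):
    """Count requests that meet every specified SLO (all values in ms)."""
    good = 0
    for req in requests:
        if all(req[metric] <= limit for metric, limit in slos.items()):
            good += 1
    return good
```

Dividing good_completed by the benchmark duration then gives the "Request goodput (req/s)" line in the output.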

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs will not trigger a full CI run by default. Instead, they only run the fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

…tion correction

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
@b8zhong b8zhong force-pushed the remove-arg-references branch from 41b5609 to a14d223 Compare February 24, 2025 03:43
@b8zhong
Copy link
Contributor Author

b8zhong commented Feb 24, 2025

The relevant benchmarks on master vs. this branch (goodput doesn't change any existing calculations; the difference here is TPOT):

python benchmarks/benchmark_serving_guided.py \
  --port 8080 \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --dataset xgrammar_bench \
  --request-rate 32 \
  --backend openai-chat \
  --endpoint /v1/chat/completions \
  --output-len 512 \
  --num-prompts 128 \
  --guided-decoding-ratio 1.0 \
  --save-results \
  --goodput ttft:10000 tpot:100 e2el:20000
===== 20 Runs =====
Successful requests: 128.00
Benchmark duration (s): 13.81
Request throughput (req/s): 9.32
Request goodput (req/s): 9.22
Output token throughput (tok/s): 616.80
Total Token throughput (tok/s): 3475.52
Mean TTFT (ms): 6570.85
Median TTFT (ms): 6593.25
P99 TTFT (ms): 7673.53
Mean TPOT (ms): 51.86
Median TPOT (ms): 51.56
P99 TPOT (ms): 95.16
Mean ITL (ms): 43.92
Median ITL (ms): 42.16
P99 ITL (ms): 156.03

On master:

===== 20 Runs =====
Successful requests: 128.00
Benchmark duration (s): 13.71
Request throughput (req/s): 9.39
Output token throughput (tok/s): 621.27
Total Token throughput (tok/s): 3500.70
Mean TTFT (ms): 6550.89
Median TTFT (ms): 6537.03
P99 TTFT (ms): 7564.42
Mean TPOT (ms): 51.90
Median TPOT (ms): 51.17
P99 TPOT (ms): 100.34
Mean ITL (ms): 43.87
Median ITL (ms): 42.15
P99 ITL (ms): 129.51
==============================================

@njhill
Copy link
Member

njhill commented Feb 24, 2025

Thanks @b8zhong for the contribution!

  • Corrected outputs[i].tpot to store the per-request TPOT value instead of a cumulative average. Not sure if it was the intended behaviour as this is how it was done in benchmark_serving.py...

Actually this is something I had been thinking we should change in benchmark_serving.py. I think it makes more sense for the average TPOT to be based on output token times across all responses rather than average of per-request averages.
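The distinction can be illustrated with a toy example (hypothetical numbers, not data from this benchmark): an average of per-request averages weights every request equally, while a pooled token-level mean weights requests by how many output tokens they produced.

```python
# Per-request decode times (ms per output token); lengths differ per request.
per_request_token_times = [
    [50.0, 50.0],     # short response, 2 decode steps
    [100.0] * 10,     # long response, 10 decode steps
]

# Mean of per-request averages: each request counts once.
per_request_means = [sum(t) / len(t) for t in per_request_token_times]
mean_of_means = sum(per_request_means) / len(per_request_means)  # 75.0

# Pooled mean over all token times: long responses carry more weight.
all_times = [t for times in per_request_token_times for t in times]
pooled_mean = sum(all_times) / len(all_times)  # 1100 / 12 ≈ 91.7
```

The pooled figure is what a token-time-based mean TPOT would report, and it is noticeably higher here because the slow request contributes ten samples rather than one.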

Member

@ywang96 ywang96 left a comment


Thanks for the contribution! @b8zhong

Comment on lines -293 to +294
outputs[i].tpot = sum(tpots) / len(tpots) if len(tpots) else 0
outputs[i].tpot = tpot

Thanks for the fix! This is probably an oversight, but at least I don't think we use outputs[i].tpot for the actual results anyway, since they're all based on the tpots variable. See lines 345-347.

In fact I'm not sure where we actually use outputs[i].tpot so need to double check.
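For context, the per-request TPOT in these serving benchmarks is derived from the request's end-to-end latency and its TTFT. A sketch of that calculation, with illustrative variable names rather than the exact code:

```python
def per_request_tpot(latency_ms, ttft_ms, output_len):
    """Time per output token for one request, excluding the first token.

    TTFT covers the first token; the remaining (output_len - 1) decode
    steps share the rest of the end-to-end latency.
    """
    if output_len <= 1:
        return 0.0  # no decode steps beyond the first token
    return (latency_ms - ttft_ms) / (output_len - 1)

# e.g. 1000 ms total latency, 200 ms TTFT, 5 output tokens -> 200.0 ms/token
```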

Contributor Author

@b8zhong b8zhong Feb 25, 2025


I think it's used down here on L548:

result = {
    # ... (other metrics)
    "tpot_description": pd.Series([output.tpot for output in outputs]).describe().to_dict(),
    # ... (other metrics)
}

Then result is also used downstream.

Collaborator

@aarnphm aarnphm left a comment


Let's get this merged and I can follow up with the v1 benchmark scripts.

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Feb 25, 2025
@DarkLight1337 DarkLight1337 merged commit ec8a5e5 into vllm-project:main Feb 26, 2025
37 of 39 checks passed
@b8zhong b8zhong deleted the remove-arg-references branch February 26, 2025 12:36
Akshat-Tripathi pushed a commit to krai/vllm that referenced this pull request Mar 3, 2025
…tion refactor (vllm-project#13736)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
lulmer pushed a commit to lulmer/vllm that referenced this pull request Apr 7, 2025
…tion refactor (vllm-project#13736)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>
Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
shreyankg pushed a commit to shreyankg/vllm that referenced this pull request May 3, 2025
…tion refactor (vllm-project#13736)

Signed-off-by: Brayden Zhong <b8zhong@uwaterloo.ca>