monitor metrics of tokens per step using cudagraph batchsizes #11031
Conversation
Signed-off-by: youkaichao <youkaichao@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
test code:

```python
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM with per-iteration stat logging enabled.
llm = LLM(model="facebook/opt-125m", disable_log_stats=False)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# Inspect the tokens-per-iteration histogram from the Prometheus stat logger.
stat_logger = llm.llm_engine.stat_loggers['prometheus']
metric = stat_logger.metrics.histogram_iteration_tokens.labels(**stat_logger.labels)
values = metric._upper_bounds
counts = [x.get() for x in metric._buckets]

# Sort (upper_bound, count) pairs by count, most-populated bucket first.
value_and_counts = sorted(zip(values, counts), key=lambda x: x[1], reverse=True)
for x in value_and_counts:
    print(x)
```

bins:
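The pairs printed above follow Prometheus histogram semantics: each bucket's upper bound is inclusive, and an observation lands in the first bucket whose bound is at least the observed value. A minimal stdlib sketch of that bucketing, using made-up illustrative bounds shaped like cudagraph capture sizes (the engine's real bounds come from its config):

```python
import bisect

# Illustrative bucket upper bounds, not the engine's actual capture sizes.
BUCKET_BOUNDS = [1, 2, 4, 8, 16, 32, 64, 128, 256, float("inf")]

def bucket_counts(observations, bounds=BUCKET_BOUNDS):
    """Count observations per histogram bucket.

    Bucket i holds values v with bounds[i-1] < v <= bounds[i],
    matching Prometheus' inclusive upper bounds.
    """
    counts = [0] * len(bounds)
    for v in observations:
        # bisect_left finds the first bound >= v, i.e. the owning bucket.
        counts[bisect.bisect_left(bounds, v)] += 1
    return counts

print(bucket_counts([1, 4, 5, 300]))
```

Here 4 lands in the bucket with upper bound 4 (inclusive), 5 spills into the bucket bounded by 8, and 300 exceeds every finite bound and falls into the +Inf bucket.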
…roject#11031) Signed-off-by: youkaichao <youkaichao@gmail.com>
Basing bucket sizes on cudagraph capture sizes was introduced in PRs vllm-project#11031 and vllm-project#12243. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
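To illustrate why capture-size-based buckets give a fine-grained tokens-per-step histogram, here is a hypothetical sketch of the capture-size pattern vLLM has used (small powers of two, then multiples of 8 up to a cap); the actual list is derived from the engine/compilation config, so treat this as an assumption:

```python
def capture_size_candidates(max_size=256):
    # Assumed pattern: 1, 2, 4, then multiples of 8 up to max_size.
    # The real capture sizes come from vLLM's config, not this helper.
    return [1, 2, 4] + [8 * i for i in range(1, max_size // 8 + 1)]

print(capture_size_candidates(32))
```

Using these as histogram bounds means every decode step recorded against a captured batch size falls exactly on a bucket boundary, so the histogram directly shows which cudagraph sizes are being hit.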
will be used for torch.compile