monitor metrics of tokens per step using cudagraph batchsizes #11031
Conversation
Signed-off-by: youkaichao <youkaichao@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
test code:

```python
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Create an LLM with per-iteration stat logging enabled.
llm = LLM(model="facebook/opt-125m", disable_log_stats=False)

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")

# Inspect the tokens-per-iteration histogram from the Prometheus stat logger.
stat_logger = llm.llm_engine.stat_loggers['prometheus']
metric = stat_logger.metrics.histogram_iteration_tokens.labels(**stat_logger.labels)
values = metric._upper_bounds
counts = [x.get() for x in metric._buckets]

# Sort (upper_bound, count) pairs by count, most-populated bucket first.
value_and_counts = sorted(zip(values, counts), key=lambda x: x[1], reverse=True)
for x in value_and_counts:
    print(x)
```

bins:
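The pairs printed above follow Prometheus histogram semantics: each bucket's upper bound is inclusive, and an observation lands in the first bucket whose bound is at least the observed value. A minimal stdlib sketch of that bucketing, using made-up illustrative bounds shaped like cudagraph capture sizes (the engine's real bounds come from its config):

```python
import bisect

# Illustrative bucket upper bounds, not the engine's actual capture sizes.
BUCKET_BOUNDS = [1, 2, 4, 8, 16, 32, 64, 128, 256, float("inf")]

def bucket_counts(observations, bounds=BUCKET_BOUNDS):
    """Count observations per histogram bucket.

    Bucket i holds values v with bounds[i-1] < v <= bounds[i],
    matching Prometheus' inclusive upper bounds.
    """
    counts = [0] * len(bounds)
    for v in observations:
        # bisect_left finds the first bound >= v, i.e. the owning bucket.
        counts[bisect.bisect_left(bounds, v)] += 1
    return counts

print(bucket_counts([1, 4, 5, 300]))
```

Here 4 lands in the bucket with upper bound 4 (inclusive), 5 spills into the bucket bounded by 8, and 300 exceeds every finite bound and falls into the +Inf bucket.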
…roject#11031) Signed-off-by: youkaichao <youkaichao@gmail.com>
Basing bucket sizes on cudagraph capture sizes was introduced in PRs vllm-project#11031 and vllm-project#12243. Signed-off-by: Mark McLoughlin <markmc@redhat.com>
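To illustrate why capture-size-based buckets give a fine-grained tokens-per-step histogram, here is a hypothetical sketch of the capture-size pattern vLLM has used (small powers of two, then multiples of 8 up to a cap); the actual list is derived from the engine/compilation config, so treat this as an assumption:

```python
def capture_size_candidates(max_size=256):
    # Assumed pattern: 1, 2, 4, then multiples of 8 up to max_size.
    # The real capture sizes come from vLLM's config, not this helper.
    return [1, 2, 4] + [8 * i for i in range(1, max_size // 8 + 1)]

print(capture_size_candidates(32))
```

Using these as histogram bounds means every decode step recorded against a captured batch size falls exactly on a bucket boundary, so the histogram directly shows which cudagraph sizes are being hit.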
will be used for torch.compile