[Benchmark] Enable benchmark to run with encoding_format="bytes" #27467
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request adds support for `encoding_format="bytes"` in the benchmarking tool for pooling requests, correctly handling the case where metadata is sent in response headers instead of the body. I've added one comment suggesting more robust header parsing: using `.get()` avoids a potential `KeyError` and allows a more informative error message on failure. Overall, the change is good and addresses its intended purpose.
```python
metadata = json.loads(response.headers["metadata"])
usage = metadata.get("usage", {})
```
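For context, here is a self-contained sketch of the response-handling path these lines sit in. The function name and the aiohttp wiring are illustrative assumptions, not the benchmark's exact code; only the header/usage parsing mirrors the diff above.

```python
# Illustrative sketch of the response-handling path this diff touches.
# The function name and aiohttp wiring are assumptions for demonstration;
# only the header/usage parsing mirrors the diff above.
import json

import aiohttp


async def parse_pooling_response(
    session: aiohttp.ClientSession, url: str, payload: dict
) -> int:
    async with session.post(url, json=payload) as response:
        # With encoding_format="bytes", the body holds raw embedding
        # bytes, so usage metadata is carried in a response header.
        metadata = json.loads(response.headers["metadata"])
        usage = metadata.get("usage", {})
        return usage.get("prompt_tokens", 0)
```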
Using direct dictionary access `response.headers["metadata"]` is unsafe, as it will raise a `KeyError` if the header is missing. While the surrounding `try...except` block will catch this, the resulting error message (`'metadata'`) is not very informative for debugging. It's more robust to use `.get()` to safely access the header and raise a `ValueError` with a clear error message if it's not present. This will improve error reporting for failed benchmark requests.
```python
metadata_str = response.headers.get("metadata")
if not metadata_str:
    raise ValueError("Missing 'metadata' header for 'bytes' encoding.")
metadata = json.loads(metadata_str)
usage = metadata.get("usage", {})
```
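As a standalone illustration of the design choice (not part of the suggestion itself), compare the two failure modes when the header is absent:

```python
# Standalone illustration of why the explicit ValueError is friendlier
# than a bare KeyError when the "metadata" header is absent.
import json

headers: dict[str, str] = {}  # simulate a response missing the header

try:
    json.loads(headers["metadata"])
except KeyError as exc:
    print(f"KeyError: {exc}")  # prints only: KeyError: 'metadata'

metadata_str = headers.get("metadata")
if not metadata_str:
    # The message names both the header and the encoding mode,
    # which makes a failed benchmark request self-explanatory.
    print("ValueError: Missing 'metadata' header for 'bytes' encoding.")
```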
Purpose
Support benchmarking the `encoding_format="bytes"` pooling responses introduced in #27066.
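For context, a minimal sketch of what a bytes-encoded embeddings request might look like against an OpenAI-compatible endpoint. The URL, the float32 dtype, and the exact metadata header layout are assumptions drawn from this PR's discussion rather than confirmed vLLM behavior.

```python
# Hedged sketch: query an OpenAI-compatible /v1/embeddings endpoint with
# encoding_format="bytes" and decode the raw response body. The float32
# dtype and the "metadata" response header are assumptions from this PR.
import json

import numpy as np
import requests

resp = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={
        "model": "Qwen/Qwen3-Embedding-0.6B",
        "input": "hello world",
        "encoding_format": "bytes",
    },
)
# With "bytes", the body is raw embedding data rather than JSON,
# so usage statistics travel in a response header instead.
metadata = json.loads(resp.headers.get("metadata", "{}"))
embedding = np.frombuffer(resp.content, dtype=np.float32)
print(embedding.shape, metadata.get("usage", {}))
```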
Test Plan
```bash
vllm bench serve \
  --model Qwen/Qwen3-Embedding-0.6B \
  --backend openai-embeddings \
  --endpoint /v1/embeddings \
  --dataset-name sharegpt \
  --dataset-path benchmarks/ShareGPT_V3_unfiltered_cleaned_split.json \
  --extra_body '{"encoding_format": "bytes"}'
```
Test Result
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.