Merged
11 changes: 9 additions & 2 deletions vllm/benchmarks/lib/endpoint_request_func.py
@@ -498,10 +498,17 @@ async def _run_pooling_request(
             async with session.post(url=api_url, headers=headers, json=payload) as response:
                 if response.status == 200:
                     output.ttft = output.latency = time.perf_counter() - st
-                    data = await response.json()
+
+                    if payload.get("encoding_format", "float") == "bytes":
+                        metadata = json.loads(response.headers["metadata"])
+                        usage = metadata.get("usage", {})
Comment on lines +503 to +504
high

Using direct dictionary access response.headers["metadata"] is unsafe as it will raise a KeyError if the header is missing. While the surrounding try...except block will catch this, the resulting error message ('metadata') is not very informative for debugging. It's more robust to use .get() to safely access the header and raise a ValueError with a clear error message if it's not present. This will improve error reporting for failed benchmark requests.

metadata_str = response.headers.get("metadata")
if not metadata_str:
    raise ValueError("Missing 'metadata' header for 'bytes' encoding.")
metadata = json.loads(metadata_str)
usage = metadata.get("usage", {})
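
To make the difference in error reporting concrete, here is a minimal sketch comparing the two access patterns. It assumes a plain dict standing in for `response.headers`; the helper names `parse_usage_unsafe`, `parse_usage_checked`, and `error_text` are hypothetical and not part of the PR.

```python
import json


def parse_usage_unsafe(headers: dict) -> dict:
    # Direct indexing: a missing header raises KeyError('metadata').
    return json.loads(headers["metadata"]).get("usage", {})


def parse_usage_checked(headers: dict) -> dict:
    # .get() plus an explicit check, as in the suggestion above.
    metadata_str = headers.get("metadata")
    if not metadata_str:
        raise ValueError("Missing 'metadata' header for 'bytes' encoding.")
    return json.loads(metadata_str).get("usage", {})


def error_text(fn, headers: dict) -> str:
    # Mimics a generic `except Exception as e: str(e)` error handler.
    try:
        fn(headers)
    except Exception as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    missing = {}  # stand-in for a response without the header
    print(repr(error_text(parse_usage_unsafe, missing)))
    print(repr(error_text(parse_usage_checked, missing)))
```

With the header absent, the unchecked version surfaces only the key name as the error text, while the checked version reports which header was missing and for which encoding. Both return the same `usage` dict when the header is present.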

+                    else:
+                        data = await response.json()
+                        usage = data.get("usage", {})
+
                     output.success = True
                     output.generated_text = ""
-                    output.prompt_len = data.get("usage", {}).get("prompt_tokens", 0)
+                    output.prompt_len = usage.get("prompt_tokens", 0)
                 else:
                     output.success = False
                     output.error = response.reason or ""