Conversation

@bbrowning (Collaborator)

What does this PR do?

This gracefully handles the case where the vLLM server responded to a completion request with no choices, which can happen in certain vLLM error situations. Previously, we'd error out with a stack trace about a list index out of range. Now, we just log a warning to the user and move past any chunks with an empty choices list.

A specific example of the type of stack trace this fixes:

  File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 170, in _process_vllm_chat_completion_stream_response
    choice = chunk.choices[0]
             ~~~~~~~~~~~~~^^^
IndexError: list index out of range

Now, instead of erroring out with that stack trace, we log a warning that vLLM failed to generate any completions and alert the user to check the vLLM server logs for details.
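
The shape of the fix is roughly the following - a minimal sketch of the guard, assuming the existing streaming loop and the module-level `log` in `vllm.py` (see the diff for the exact code):

```
# Sketch only: skip stream chunks whose choices list is empty instead of
# indexing into it; `stream` and `log` come from the surrounding function/module.
async for chunk in stream:
    if not chunk.choices:
        log.warning("vLLM failed to generate any completions - check the vLLM server logs for an error.")
        continue
    choice = chunk.choices[0]
    # ... existing per-choice processing continues here ...
```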

This is related to #1277 and addresses the stack trace shown in that issue, although it does not in and of itself change the functional behavior of vLLM tool calling.

Test Plan

As part of this fix, I added new unit tests that trigger this same error and verify it no longer happens: `test_process_vllm_chat_completion_stream_response_no_choices` in the new `tests/unit/providers/inference/test_remote_vllm.py`. I also added a couple more tests to trigger and verify the last couple of remote vLLM provider bug fixes - specifically a test for #1236 (builtin tool calling) and one for #1325 (vLLM <= v0.6.3).

This required fixing the signature of `_process_vllm_chat_completion_stream_response` to accept the type of chunks it is actually passed - specifically, changing from our openai_compat `OpenAICompatCompletionResponse` to `openai.types.chat.chat_completion_chunk.ChatCompletionChunk`. The function was never actually being passed `OpenAICompatCompletionResponse` objects and was using attributes that don't exist on them, so the signature now matches the type of object it really receives.
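
As an illustration, a unit test along these lines can reproduce the empty-choices case. This is a sketch only - the real test is `test_process_vllm_chat_completion_stream_response_no_choices`, and the sketch assumes `pytest-asyncio` is available and that the processor is an async generator taking the raw chunk stream as its only argument:

```
import pytest
from openai.types.chat.chat_completion_chunk import ChatCompletionChunk

from llama_stack.providers.remote.inference.vllm.vllm import (
    _process_vllm_chat_completion_stream_response,
)


@pytest.mark.asyncio
async def test_stream_chunk_with_no_choices_does_not_raise():
    async def stream():
        # The vLLM error case: a chunk whose choices list is empty.
        yield ChatCompletionChunk(
            id="chunk-0",
            choices=[],
            created=0,
            model="test-model",
            object="chat.completion.chunk",
        )

    # Draining the generator should log a warning rather than raise IndexError.
    events = [event async for event in _process_vllm_chat_completion_stream_response(stream())]
    assert isinstance(events, list)
```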

Run these new unit tests like this:

```
pytest tests/unit/providers/inference/test_remote_vllm.py
```

Additionally, I ensured the existing test_text_inference.py tests passed via:

VLLM_URL="http://localhost:8000/v1" \
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" \
LLAMA_STACK_CONFIG=remote-vllm \
python -m pytest -v tests/integration/inference/test_text_inference.py \
--inference-model "meta-llama/Llama-3.2-3B-Instruct" \
--vision-inference-model ""

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Mar 5, 2025

@terrytangyuan (Collaborator) left a comment:

Thank you!

@terrytangyuan merged commit 9c4074e into llamastack:main on Mar 5, 2025 (3 checks passed).
@bbrowning deleted the remote-vllm-no-choices branch on March 5, 2025 at 20:16.

```
tool_call_buf = UnparseableToolCall()
async for chunk in stream:
    if not chunk.choices:
        log.warning("vLLM failed to generation any completions - check the vLLM server logs for an error.")
```

A collaborator commented:

s/generation/generate/?
