fix: Gracefully handle no choices in remote vLLM response #1424
What does this PR do?
This gracefully handles the case where the vLLM server responded to a completion request with no choices, which can happen in certain vLLM error situations. Previously, we'd error out with a stack trace about a list index out of range. Now, we just log a warning to the user and move past any chunks with an empty choices list.
A specific example of the type of stack trace this fixes is a traceback ending in a Python `IndexError: list index out of range`. Now, instead of erroring out with that stack trace, we log a warning that vLLM failed to generate any completions and alert the user to check the vLLM server logs for details.
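For illustration, here is a minimal sketch of that guard. The function name and the `ChatCompletionChunk` type come from the PR text; the body, logger, and yielded value are simplified placeholders rather than the actual patch:

```python
import logging
from typing import AsyncIterator

from openai.types.chat.chat_completion_chunk import ChatCompletionChunk

log = logging.getLogger(__name__)


async def _process_vllm_chat_completion_stream_response(
    stream: AsyncIterator[ChatCompletionChunk],
):
    async for chunk in stream:
        if not chunk.choices:
            # Some vLLM error conditions produce a chunk whose choices list is
            # empty; indexing into it is what raised
            # "IndexError: list index out of range" before this fix.
            log.warning(
                "vLLM failed to generate any completions - "
                "check the vLLM server logs for an error."
            )
            continue
        choice = chunk.choices[0]
        # ... convert `choice` into the provider's streaming response type here ...
        yield choice
```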
This is related to #1277 and addresses the stack trace shown in that issue, although it does not by itself change the functional behavior of vLLM tool calling.
Test Plan
As part of this fix, I added new unit tests to trigger this same error and verify it no longer happens. That is `test_process_vllm_chat_completion_stream_response_no_choices` in the new `tests/unit/providers/inference/test_remote_vllm.py`. I also added a couple more tests to trigger and verify the last couple of remote vLLM provider bug fixes - specifically a test for #1236 (builtin tool calling) and #1325 (vLLM <= v0.6.3).

This required fixing the signature of `_process_vllm_chat_completion_stream_response` to accept the actual type of chunks it was getting passed - specifically changing from our openai_compat `OpenAICompatCompletionResponse` to `openai.types.chat.chat_completion_chunk.ChatCompletionChunk`. It was not actually getting passed `OpenAICompatCompletionResponse` objects before, and it was using attributes that don't exist on those objects. So, the signature now matches the type of object it's actually passed.

Run these new unit tests like this:
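A typical invocation would be something like the following, assuming a standard pytest setup (the PR itself may use different options):

```
python -m pytest -sv tests/unit/providers/inference/test_remote_vllm.py
```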
Additionally, I verified that the existing `test_text_inference.py` tests still pass.