
remote-vllm not working with builtin::websearch tools #1277

@luis5tb

Description


System Info

llama-stack from the master branch

Information

  • The official example scripts
  • My own modified scripts

🐛 Describe the bug

The following error appears on the Llama Stack server when using the remote-vllm provider:

Traceback (most recent call last):
  File "/app/llama-stack-source/llama_stack/distribution/server/server.py", line 208, in sse_generator
    async for item in event_gen:
  File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agents.py", line 165, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 197, in create_and_execute_turn
    async for chunk in self.run(
  File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 389, in run
    async for res in self._run(
  File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 631, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/app/llama-stack-source/llama_stack/distribution/routers/routers.py", line 191, in <genexpr>
    return (chunk async for chunk in await provider.chat_completion(**params))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 327, in _stream_chat_completion
    async for chunk in res:
  File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 170, in _process_vllm_chat_completion_stream_response
    choice = chunk.choices[0]
             ~~~~~~~~~~~~~^^^
IndexError: list index out of range
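
For context, the failure comes from indexing chunk.choices[0] without first checking whether the list is empty. Below is a minimal sketch of the kind of guard that would avoid the IndexError; process_stream is a placeholder name, not the actual provider code, and the assumption is that vLLM's OpenAI-compatible stream can emit chunks with an empty choices list (for example a trailing usage-only chunk) when tools are involved:

async def process_stream(stream):
    # Placeholder sketch: skip streamed chunks that carry no choices before
    # indexing choices[0], instead of assuming every chunk has at least one.
    async for chunk in stream:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        yield choice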

Steps to reproduce:

Build a container image with the following command:
$ llama stack build --config build.yaml --image-type container --image-name vllm-tools

where build.yaml contains:

version: '2'
distribution_spec:
  description: Use (an external) vLLM server for running LLM inference
  providers:
    inference:
    - remote::vllm
    - inline::sentence-transformers
    vector_io:
    - inline::faiss
    - remote::chromadb
    safety:
    - inline::llama-guard
    agents:
    - inline::meta-reference
    eval:
    - inline::meta-reference
    datasetio:
    - remote::huggingface
    - inline::localfs
    scoring:
    - inline::basic
    - inline::llm-as-judge
    - inline::braintrust
    telemetry:
    - inline::meta-reference
    tool_runtime:
    - remote::tavily-search
    - inline::code-interpreter
    - inline::rag-runtime
    - remote::model-context-protocol
  container_image: registry.access.redhat.com/ubi9
image_type: container

Run it with the resulting run.yaml and the appropriate environment variables set:

$ podman run --security-opt label=disable -it --network host \
  -v ~/.llama/distributions/vllm-tools/vllm-tools-run.yaml:/app/config.yaml \
  -v ~/toolbox_utils/llama-stack:/app/llama-stack-source \
  --env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
  --env VLLM_API_TOKEN=$VLLM_API_TOKEN \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env VLLM_URL=$VLLM_URL \
  --env TAVILY_SEARCH_API_KEY=$TAVILY_SEARCH_API_KEY \
  --entrypoint='["python", "-m", "llama_stack.distribution.server.server", "--yaml-config", "/app/config.yaml"]' localhost/vllm-tools:dev

Then test it through the agent with the following Python code:

>>> from llama_stack_client.lib.agents.agent import Agent
>>> from llama_stack_client.lib.agents.event_logger import EventLogger
>>> from llama_stack_client.types.agent_create_params import AgentConfig
>>> from termcolor import cprint
>>> from llama_stack_client import LlamaStackClient
>>> 
>>> def create_client(llamastack_server_endpoint):
...     client = LlamaStackClient(
...         base_url=llamastack_server_endpoint)
...     return client
...
>>> client = create_client("http://localhost:5001")
>>> model_id = "granite-3-8b-instruct" 
>>> agent_config = AgentConfig(                                
...     model=model_id,                                        
...     instructions="You are a helpful assistant",
...     toolgroups=["builtin::websearch"],                     
...     input_shields=[],                                      
...     output_shields=[],                                     
...     enable_session_persistence=False,                      
... )                                                         
>>> agent = Agent(client, agent_config)                        
>>> user_prompts = [                                           
...     "Hello",                                               
...     "Which teams played in the NBA western conference finals of 2024",
... ]                                                          
>>> session_id = agent.create_session("test-session")
>>> for prompt in user_prompts:
...     cprint(f"User> {prompt}", "green")                     
...     response = agent.create_turn(                          
...         messages=[                                         
...             {                                              
...                 "role": "user",                            
...                 "content": prompt,                         
...             }                                              
...         ],                                                 
...         session_id=session_id,                             
...     )                                                      
...     for log in EventLogger().log(response):                
...         log.print()                                        
...                                                            
User> Hello                                                    
inference> Traceback (most recent call last):                  
  File "<stdin>", line 12, in <module>                         
  File "/home/ltomasbo/toolbox_utils/llama-stack/stack-feb25/lib64/python3.12/site-packages/llama_stack_client/lib/agents/event_logger.py", line 163, in log
    for chunk in event_generator:                              
                 ^^^^^^^^^^^^^^^                               
  File "/home/ltomasbo/toolbox_utils/llama-stack/stack-feb25/lib64/python3.12/site-packages/llama_stack_client/lib/agents/agent.py", line 165, in _create_turn_streaming
    tool_calls = self._get_tool_calls(chunk)                   
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^                   
  File "/home/ltomasbo/toolbox_utils/llama-stack/stack-feb25/lib64/python3.12/site-packages/llama_stack_client/lib/agents/agent.py", line 61, in _get_tool_calls
    if chunk.event.payload.event_type not in {"turn_complete", "turn_awaiting_input"}:                                         
       ^^^^^^^^^^^^^^^^^^^                                     
AttributeError: 'NoneType' object has no attribute 'payload'
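
As a client-side workaround sketch (not a fix for the server bug), the streaming response can be iterated directly and chunks without an event skipped, instead of handing everything to EventLogger. This assumes each streamed chunk exposes an optional event attribute, which is what the AttributeError above suggests:

>>> response = agent.create_turn(
...     messages=[{"role": "user", "content": prompt}],
...     session_id=session_id,
... )
>>> for chunk in response:
...     event = getattr(chunk, "event", None)
...     if event is None:
...         # Server-side errors (like the IndexError above) arrive without
...         # an event payload; print them instead of crashing.
...         print("chunk without event:", chunk)
...         continue
...     print(event.payload.event_type)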

Error logs

(Same traceback as in the bug description above.)

Expected behavior

The tool should be called and the agent/LLM should process its response.
