System Info
llama-stack from the master branch
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
The following error appears on the llama-stack server when using the remote-vllm provider:
Traceback (most recent call last):
File "/app/llama-stack-source/llama_stack/distribution/server/server.py", line 208, in sse_generator
async for item in event_gen:
File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agents.py", line 165, in _create_agent_turn_streaming
async for event in agent.create_and_execute_turn(request):
File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 197, in create_and_execute_turn
async for chunk in self.run(
File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 389, in run
async for res in self._run(
File "/app/llama-stack-source/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 631, in _run
async for chunk in await self.inference_api.chat_completion(
File "/app/llama-stack-source/llama_stack/distribution/routers/routers.py", line 191, in <genexpr>
return (chunk async for chunk in await provider.chat_completion(**params))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 327, in _stream_chat_completion
async for chunk in res:
File "/app/llama-stack-source/llama_stack/providers/remote/inference/vllm/vllm.py", line 170, in _process_vllm_chat_completion_stream_response
choice = chunk.choices[0]
~~~~~~~~~~~~~^^^
IndexError: list index out of range
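The failing line is the unconditional chunk.choices[0] in _process_vllm_chat_completion_stream_response. Below is a minimal sketch of the kind of guard that would avoid the crash, assuming the vLLM server can emit streaming chunks whose choices list is empty (for example a trailing usage-only chunk); the helper is illustrative only, not the actual llama_stack code:

from typing import Any, AsyncIterator

async def iter_choices(stream: AsyncIterator[Any]) -> AsyncIterator[Any]:
    """Yield the first choice of each streamed chunk, skipping choice-less chunks."""
    async for chunk in stream:
        # Assumption: some OpenAI-compatible servers emit chunks with an empty
        # `choices` list; indexing choices[0] on those raises the IndexError above.
        if not chunk.choices:
            continue
        yield chunk.choices[0]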
Steps to reproduce:
Build a container image with the following command:
$ llama stack build --config build.yaml --image-type container --image-name vllm-tools
where build.yaml contains:
version: '2'
distribution_spec:
  description: Use (an external) vLLM server for running LLM inference
  providers:
    inference:
    - remote::vllm
    - inline::sentence-transformers
    vector_io:
    - inline::faiss
    - remote::chromadb
    safety:
    - inline::llama-guard
    agents:
    - inline::meta-reference
    eval:
    - inline::meta-reference
    datasetio:
    - remote::huggingface
    - inline::localfs
    scoring:
    - inline::basic
    - inline::llm-as-judge
    - inline::braintrust
    telemetry:
    - inline::meta-reference
    tool_runtime:
    - remote::tavily-search
    - inline::code-interpreter
    - inline::rag-runtime
    - remote::model-context-protocol
  container_image: registry.access.redhat.com/ubi9
image_type: container
Run it with the resulting run.yaml and the required environment variables:
$ podman run --security-opt label=disable -it --network host \
-v ~/.llama/distributions/vllm-tools/vllm-tools-run.yaml:/app/config.yaml \
-v ~/toolbox_utils/llama-stack:/app/llama-stack-source \
--env LLAMA_STACK_PORT=$LLAMA_STACK_PORT \
--env VLLM_API_TOKEN=$VLLM_API_TOKEN \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env VLLM_URL=$VLLM_URL \
--env TAVILY_SEARCH_API_KEY=$TAVILY_SEARCH_API_KEY \
--entrypoint='["python", "-m", "llama_stack.distribution.server.server", "--yaml-config", "/app/config.yaml"]' localhost/vllm-tools:dev
Then test the agent with the following Python code:
>>> from llama_stack_client.lib.agents.agent import Agent
>>> from llama_stack_client.lib.agents.event_logger import EventLogger
>>> from llama_stack_client.types.agent_create_params import AgentConfig
>>> from termcolor import cprint
>>> from llama_stack_client import LlamaStackClient
>>>
>>> def create_client(llamastack_server_endpoint):
...     client = LlamaStackClient(base_url=llamastack_server_endpoint)
...     return client
...
>>> client = create_client("http://localhost:5001")
>>> model_id = "granite-3-8b-instruct"
>>> agent_config = AgentConfig(
...     model=model_id,
...     instructions="You are a helpful assistant",
...     toolgroups=["builtin::websearch"],
...     input_shields=[],
...     output_shields=[],
...     enable_session_persistence=False,
... )
>>> agent = Agent(client, agent_config)
>>> user_prompts = [
...     "Hello",
...     "Which teams played in the NBA western conference finals of 2024",
... ]
>>> session_id = agent.create_session("test-session")
>>> for prompt in user_prompts:
...     cprint(f"User> {prompt}", "green")
...     response = agent.create_turn(
...         messages=[
...             {
...                 "role": "user",
...                 "content": prompt,
...             }
...         ],
...         session_id=session_id,
...     )
...     for log in EventLogger().log(response):
...         log.print()
...
User> Hello
inference> Traceback (most recent call last):
File "<stdin>", line 12, in <module>
File "/home/ltomasbo/toolbox_utils/llama-stack/stack-feb25/lib64/python3.12/site-packages/llama_stack_client/lib/agents/event_logger.py", line 163, in log
for chunk in event_generator:
^^^^^^^^^^^^^^^
File "/home/ltomasbo/toolbox_utils/llama-stack/stack-feb25/lib64/python3.12/site-packages/llama_stack_client/lib/agents/agent.py", line 165, in _create_turn_streaming
tool_calls = self._get_tool_calls(chunk)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ltomasbo/toolbox_utils/llama-stack/stack-feb25/lib64/python3.12/site-packages/llama_stack_client/lib/agents/agent.py", line 61, in _get_tool_calls
if chunk.event.payload.event_type not in {"turn_complete", "turn_awaiting_input"}:
^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'payload'
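The client-side AttributeError looks like a downstream symptom: once the server stream fails, the response carries an error payload and chunk.event is None by the time _get_tool_calls dereferences it. A sketch of the kind of None-check that would surface the server error instead of crashing; the function name and return values are illustrative, not the actual llama_stack_client code:

def get_tool_calls(chunk):
    # Illustrative guard only; the real extraction logic lives in
    # llama_stack_client.lib.agents.agent and is not reproduced here.
    if chunk.event is None:
        # The stream delivered an error (see the server traceback), not a turn event.
        return []
    if chunk.event.payload.event_type not in {"turn_complete", "turn_awaiting_input"}:
        return []
    # ...the existing tool-call extraction would follow here...
    return []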
Error logs
Same server-side traceback as shown in the description above.
Expected behavior
The tool is called and the agent/LLM processes its response.
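To narrow down where the empty-choices chunk originates, the vLLM OpenAI-compatible endpoint can be queried directly. The sketch below is a diagnostic under assumptions, not part of the setup above: the base URL, model name, and the stand-in web_search tool definition are placeholders for this environment.

import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["VLLM_URL"],  # e.g. http://<host>:8000/v1
    api_key=os.environ.get("VLLM_API_TOKEN", "none"),
)
stream = client.chat.completions.create(
    model=os.environ.get("INFERENCE_MODEL", "granite-3-8b-instruct"),
    messages=[{"role": "user", "content": "Hello"}],
    tools=[{  # minimal stand-in for the builtin::websearch tool definition
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }],
    stream=True,
)
for chunk in stream:
    # Print any streamed chunk whose choices list is empty, which is what the
    # server-side IndexError above trips over.
    if not chunk.choices:
        print("Chunk with empty choices:", chunk)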