Description
System Info
python -m "torch.utils.collect_env"
<frozen runpy>:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.6.0+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: CentOS Stream 9 (x86_64)
GCC version: (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5)
Clang version: Could not collect
CMake version: version 3.26.5
Libc version: glibc-2.34
Python version: 3.12.0 | packaged by Anaconda, Inc. | (main, Oct 2 2023, 17:29:18) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.4.3-0_fbk15_zion_2630_gf27365f948db-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration:
GPU 0: NVIDIA H100
GPU 1: NVIDIA H100
GPU 2: NVIDIA H100
GPU 3: NVIDIA H100
GPU 4: NVIDIA H100
GPU 5: NVIDIA H100
GPU 6: NVIDIA H100
GPU 7: NVIDIA H100
Nvidia driver version: 535.154.05
cuDNN version: Probably one of the following:
/usr/lib64/libcudnn.so.8.9.2
/usr/lib64/libcudnn.so.9.7.1
/usr/lib64/libcudnn_adv.so.9.7.1
/usr/lib64/libcudnn_adv_infer.so.8.9.2
/usr/lib64/libcudnn_adv_train.so.8.9.2
/usr/lib64/libcudnn_cnn.so.9.7.1
/usr/lib64/libcudnn_cnn_infer.so.8.9.2
/usr/lib64/libcudnn_cnn_train.so.8.9.2
/usr/lib64/libcudnn_engines_precompiled.so.9.7.1
/usr/lib64/libcudnn_engines_runtime_compiled.so.9.7.1
/usr/lib64/libcudnn_graph.so.9.7.1
/usr/lib64/libcudnn_heuristic.so.9.7.1
/usr/lib64/libcudnn_ops.so.9.7.1
/usr/lib64/libcudnn_ops_infer.so.8.9.2
/usr/lib64/libcudnn_ops_train.so.8.9.2
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 384
On-line CPU(s) list: 0-383
Vendor ID: AuthenticAMD
Model name: AMD EPYC 9654 96-Core Processor
CPU family: 25
Model: 17
Thread(s) per core: 2
Core(s) per socket: 96
Socket(s): 2
Stepping: 1
Frequency boost: enabled
CPU(s) scaling MHz: 85%
CPU max MHz: 3707.8120
CPU min MHz: 1500.0000
BogoMIPS: 4792.65
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd amd_ppin cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization: AMD-V
L1d cache: 6 MiB (192 instances)
L1i cache: 6 MiB (192 instances)
L2 cache: 192 MiB (192 instances)
L3 cache: 768 MiB (24 instances)
NUMA node(s): 2
NUMA node0 CPU(s): 0-95,192-287
NUMA node1 CPU(s): 96-191,288-383
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Vulnerable: eIBRS with unprivileged eBPF
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-cu12==12.4.127
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.2.1.3
[pip3] nvidia-curand-cu12==10.3.5.147
[pip3] nvidia-cusolver-cu12==11.6.1.9
[pip3] nvidia-cusparse-cu12==12.3.1.170
[pip3] nvidia-cusparselt-cu12==0.6.2
[pip3] nvidia-nccl-cu12==2.21.5
[pip3] nvidia-nvjitlink-cu12==12.4.127
[pip3] nvidia-nvtx-cu12==12.4.127
[pip3] torch==2.6.0
[pip3] torchvision==0.21.0
[pip3] triton==3.2.0
[conda] numpy 1.26.4 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
[conda] nvidia-cusparselt-cu12 0.6.2 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.21.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
[conda] torch 2.6.0 pypi_0 pypi
[conda] torchvision 0.21.0 pypi_0 pypi
[conda] triton 3.2.0 pypi_0 pypi
Information
- The official example scripts
- My own modified scripts
🐛 Describe the bug
Start a llama-stack server with the meta-reference-gpu distribution serving the Llama 3.2 90B vision model, then run the ReAct agent example:
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.

import os
import uuid

import fire
from llama_stack_client import LlamaStackClient
from llama_stack_client.lib.agents.client_tool import client_tool
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.lib.agents.react.agent import ReActAgent


@client_tool
def torchtune(query: str = "torchtune"):
    """
    Answer information about torchtune.

    :param query: The query to use for querying the internet
    :returns: Information about torchtune
    """
    dummy_response = """
        torchtune is a PyTorch library for easily authoring, finetuning and experimenting with LLMs.
        torchtune provides:
        PyTorch implementations of popular LLMs from Llama, Gemma, Mistral, Phi, and Qwen model families
        Hackable training recipes for full finetuning, LoRA, QLoRA, DPO, PPO, QAT, knowledge distillation, and more
        Out-of-the-box memory efficiency, performance improvements, and scaling with the latest PyTorch APIs
        YAML configs for easily configuring training, evaluation, quantization or inference recipes
        Built-in support for many popular dataset formats and prompt templates
    """
    return dummy_response


def main(host: str, port: int):
    client = LlamaStackClient(
        base_url=f"http://{host}:{port}",
        provider_data={"tavily_search_api_key": os.getenv("TAVILY_SEARCH_API_KEY")},
    )

    model = "meta-llama/Llama-3.2-90B-Vision-Instruct"
    agent = ReActAgent(
        client=client,
        model=model,
        builtin_toolgroups=["builtin::websearch"],
        client_tools=[torchtune],
        # json_response_format=True,
    )
    session_id = agent.create_session(f"ttest-session-{uuid.uuid4().hex}")

    response = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": "Whats the best place in new york for a pizza slice at 2am ?",
            }
        ],
        session_id=session_id,
        stream=True,
    )
    for log in EventLogger().log(response):
        log.print()

    response2 = agent.create_turn(
        messages=[
            {
                "role": "user",
                "content": "What are the popular llms supported in torchtune?",
            }
        ],
        session_id=session_id,
        stream=True,
    )
    for log in EventLogger().log(response2):
        log.print()


if __name__ == "__main__":
    fire.Fire(main)
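Not part of the repro itself, but a quick sanity check that the server is reachable and the 3.2 90B vision model is registered before running the script; a minimal sketch, where client.models.list() and the identifier attribute are my assumptions about the llama-stack-client API rather than something taken from the example:

# Sanity check: is the server up and is the model registered?
# NOTE: client.models.list() and the `identifier` attribute are assumed
# llama-stack-client APIs; adjust if the SDK version differs.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
registered = [m.identifier for m in client.models.list()]
print(registered)
assert "meta-llama/Llama-3.2-90B-Vision-Instruct" in registered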
Error logs
Server-side log:
ValueError: Non supported ToolPromptFormat ToolPromptFormat.python_list
Traceback (most recent call last):
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 206, in sse_generator
async for item in event_gen:
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agents.py", line 164, in _create_agent_turn_streaming
async for event in agent.create_and_execute_turn(request):
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 190, in create_and_execute_turn
async for chunk in self._run_turn(request, turn_id):
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 279, in _run_turn
async for chunk in self.run(
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 354, in run
async for res in self._run(
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 512, in _run
async for chunk in await self.inference_api.chat_completion(
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
result = await method(self, *args, **kwargs)
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 210, in chat_completion
return (chunk async for chunk in await provider.chat_completion(**params))
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 102, in async_wrapper
result = await method(self, *args, **kwargs)
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/inference.py", line 277, in chat_completion
request.messages = chat_completion_request_to_messages(request, self.llama_model.core_model_id.value)
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/utils/inference/prompt_adapter.py", line 304, in chat_completion_request_to_messages
messages = augment_messages_for_tools_llama_3_1(request)
File "/home/kaiwu/.conda/envs/omni/lib/python3.10/site-packages/llama_stack/providers/utils/inference/prompt_adapter.py", line 385, in augment_messages_for_tools_llama_3_1
tool_gen = JsonCustomToolGenerator()
ValueError: Non supported ToolPromptFormat ToolPromptFormat.python_list
Client-side error:
~/work/llama-stack-apps (computer-use)]$ python test_tool.py localhost 8321
`agent_config` is deprecated. Use inlined parameters instead.
`client_tools` is deprecated. Use `tools` instead.
inference> 400: Invalid value: Non supported ToolPromptFormat ToolPromptFormat.python_list
inference> 400: Invalid value: Non supported ToolPromptFormat ToolPromptFormat.python_list
Expected behavior
The ReAct agent should run with the Llama 3.2 vision model. Instead, the error is raised from augment_messages_for_tools_llama_3_1, which suggests the request for this 3.2 vision model is being routed through the Llama 3.1 tool-prompt path that does not handle ToolPromptFormat.python_list; I believe something is wrong there.
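If it helps triage, the same failure can presumably be hit without the agent layer by calling the inference API directly with tool_prompt_format="python_list". A rough, untested sketch; the chat_completion keyword names and the tool-definition shape are my assumptions about the llama-stack client, not verified:

# Hypothetical minimal isolation of the failure, bypassing the ReAct agent.
# NOTE: model_id/tool_prompt_format keywords and the tool dict shape are
# assumptions about the llama-stack inference API, not copied from the repro.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-90B-Vision-Instruct",
    messages=[{"role": "user", "content": "What are the popular llms supported in torchtune?"}],
    tools=[
        {
            "tool_name": "torchtune",
            "description": "Answer information about torchtune",
            "parameters": {
                "query": {
                    "param_type": "string",
                    "description": "The query to use for querying the internet",
                    "required": True,
                }
            },
        }
    ],
    tool_prompt_format="python_list",  # the format the ReAct agent appears to request
)
print(response)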