Feature description
Running OpenManus against a locally served vLLM model fails on the agent's first step: the vLLM OpenAI-compatible endpoint rejects the chat-completions request with HTTP 400 because tool calling is not enabled on the server. Client-side log:
Enter your prompt (or 'exit'/'quit' to quit): 你好 ("hello")
2025-03-09 17:31:06.118 | WARNING | __main__:main:19 - Processing your request...
2025-03-09 17:31:06.120 | INFO | app.agent.base:run:137 - Executing step 1/30
2025-03-09 17:31:06.249 | ERROR | app.llm:ask_tool:260 - API error: Error code: 400 - {'object': 'error', 'message': '"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set', 'type': 'BadRequestError', 'param': None, 'code': 400}
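
OpenManus attaches tool definitions with "tool_choice": "auto" to every chat-completions request, and vLLM's OpenAI-compatible server rejects that unless tool calling was enabled at startup. A minimal reproduction sketch with curl, assuming the server from the startup command below is reachable at 192.168.1.5:6657 (the "terminate" tool here is a hypothetical placeholder, not OpenManus's actual tool schema):

# Reproduces the 400 against a server started without the tool-calling flags
curl http://192.168.1.5:6657/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen25-32b",
    "messages": [{"role": "user", "content": "hello"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "terminate",
        "description": "Hypothetical placeholder tool.",
        "parameters": {"type": "object", "properties": {}}
      }
    }],
    "tool_choice": "auto"
  }'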
Your Feature
vLLM info:
(venv) userroot@userroot-NF5568M4:/mnt/mydrive$ pip show vllm
Name: vllm
Version: 0.6.3.post1
Summary: A high-throughput and memory-efficient inference and serving engine for LLMs
Home-page: https://github.com/vllm-project/vllm
Author: vLLM Team
Author-email:
License: Apache 2.0
Location: /home/userroot/Project/AI/qwen25/venv/lib/python3.8/site-packages
Requires: aiohttp, compressed-tensors, einops, fastapi, filelock, gguf, importlib-metadata, lm-format-enforcer, mistral-common, msgspec, numpy, nvidia-ml-py, openai, outlines, partial-json-parser, pillow, prometheus-client, prometheus-fastapi-instrumentator, protobuf, psutil, py-cpuinfo, pydantic, pyyaml, pyzmq, ray, requests, sentencepiece, tiktoken, tokenizers, torch, torchvision, tqdm, transformers, typing-extensions, uvicorn, xformers
Required-by:
Model info:
model: qwq-32b-awq
Startup command:
(venv) userroot@userroot-NF5568M4:/mnt/mydrive$ export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
(venv) userroot@userroot-NF5568M4:/mnt/mydrive$ python -m vllm.entrypoints.openai.api_server --model /mnt/mydrive/Models/llms/qwq-32b-awq/ --served-model-name qwen25-32b --trust-remote-code --max-model-len 32768 -tp 2 --gpu-memory-utilization 1 --enforce-eager --port 6657
INFO: Uvicorn running on socket ('0.0.0.0', 6657) (Press CTRL+C to quit)
INFO 03-09 17:37:49 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO 03-09 17:37:59 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO: 192.168.1.5:64732 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO: 192.168.1.5:64736 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO: 192.168.1.5:64738 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
INFO 03-09 17:38:09 metrics.py:349] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
INFO: 192.168.1.5:64754 - "POST /v1/chat/completions HTTP/1.1" 400 Bad Request
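
The error message itself names the fix: vLLM's OpenAI-compatible server only honors "auto" tool choice when tool calling is switched on at startup. A sketch of the adjusted startup command, assuming the hermes tool-call parser (the one vLLM's docs suggest for Qwen-family chat templates; check which parsers your vLLM 0.6.3.post1 build actually ships, since parser support varies by version):

export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
# Same flags as before, plus the two named in the error message
python -m vllm.entrypoints.openai.api_server \
    --model /mnt/mydrive/Models/llms/qwq-32b-awq/ \
    --served-model-name qwen25-32b \
    --trust-remote-code \
    --max-model-len 32768 \
    -tp 2 \
    --gpu-memory-utilization 1 \
    --enforce-eager \
    --port 6657 \
    --enable-auto-tool-choice \
    --tool-call-parser hermes

After restarting with these flags, the curl reproduction above should return a normal completion (or a tool_calls message) instead of the 400, and OpenManus should get past step 1.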