[Usage]: Input prompt (2501 tokens) is too long and exceeds limit of 2048

### Your current environment

```text
env: 16*H800
model: DeepSeek-R1
version: 0.7.2
start script: python3 -m vllm.entrypoints.openai.api_server --host 0.0.0.0 --port 80 --max-model-len 128000 --trust-remote-code --pipeline-parallel-size 2 --tensor-parallel-size 8 --gpu-memory-utilization 0.8 --served-model-name deepseek --model /mnt/workspace/models/public-models/llm/DeepSeek-R1


WARNING 02-16 15:37:32 scheduler.py:947] Input prompt (2501 tokens) is too long and exceeds limit of 2048
INFO 02-16 15:37:32 metrics.py:453] Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 9.3 tokens/s, Running: 0 reqs, Swapped: 0 reqs, Pending: 0 reqs, GPU KV cache usage: 0.0%, CPU KV cache usage: 0.0%.
```


### How would you like to use vllm

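The server is started with `--max-model-len 128000`, yet a 2501-token prompt is rejected with a limit of 2048. Which parameter of the startup command controls this limit, and how should it be set to resolve the warning?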
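To make the question concrete, here is a minimal sketch of how the 2048 figure could arise. It assumes, without confirmation from the log above, that the scheduler caps prompts at the smaller of `--max-model-len` and the per-step token budget (`--max-num-batched-tokens`) when chunked prefill is not active, and that the budget was 2048 in this run. The helper names `prompt_limit` and `is_rejected` are hypothetical and are not part of vLLM.

```python
def prompt_limit(max_model_len: int,
                 max_num_batched_tokens: int,
                 chunked_prefill: bool) -> int:
    """Hypothetical model of the scheduler's prompt cap (an assumption,
    not vLLM's actual code)."""
    if chunked_prefill:
        # Assumed: a long prompt can be split across steps, so only the
        # model context length bounds it.
        return max_model_len
    # Assumed: the whole prompt must fit into one scheduling step.
    return min(max_model_len, max_num_batched_tokens)


def is_rejected(prompt_tokens: int, limit: int) -> bool:
    """A prompt is rejected when it exceeds the computed limit."""
    return prompt_tokens > limit


# Values from the report: --max-model-len 128000 and a 2501-token prompt.
# The 2048 token budget is an assumed value inferred from the warning.
limit_as_reported = prompt_limit(128000, 2048, chunked_prefill=False)
rejected_as_reported = is_rejected(2501, limit_as_reported)

# Same prompt with the token budget raised to the model length.
limit_raised = prompt_limit(128000, 128000, chunked_prefill=False)
rejected_after_raise = is_rejected(2501, limit_raised)
```

If this model matches the actual behaviour, passing `--max-num-batched-tokens` with a value at least as large as the longest expected prompt would be the parameter to try. A larger budget may increase memory use per step, so the value may need tuning against `--gpu-memory-utilization`.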

### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.

[Usage]:Input prompt (2501 tokens) is too long and exceeds limit of 2048 #13370

Your current environment

How would you like to use vllm

Before submitting a new issue...

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

[Usage]:Input prompt (2501 tokens) is too long and exceeds limit of 2048 #13370

Description

Your current environment

How would you like to use vllm

Before submitting a new issue...

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions