We hit a vLLM crash because the prompt length was larger than max_model_len (despite having max_prompt_length set in the Trinity config).
I tried to trace the effect of max_prompt_length in the code, but it does not appear to trigger any dataset filtering or prompt truncation:
https://github.com/search?q=repo%3Amodelscope%2FTrinity-RFT+max_prompt_length&type=code
I propose that max_prompt_length should lead to dataset filtering (or prompt truncation) during Trinity data prep, and that truncate_prompt_tokens=max_prompt_length should also be set on the vLLM side (it is a vllm.SamplingParams field) as a last resort, to prevent vLLM from throwing an exception (vllm-project/vllm#16732).
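A minimal sketch of what I mean, assuming a HF tokenizer is available at data-prep time (filter_long_prompts and the config value are hypothetical, not existing Trinity-RFT code; truncate_prompt_tokens is the vLLM SamplingParams field):

```python
from transformers import AutoTokenizer
import vllm

# Hypothetical data-prep helper: drop samples whose prompt exceeds
# max_prompt_length before they reach the rollout engine
# (truncating the prompt instead would be the alternative).
def filter_long_prompts(samples, tokenizer, max_prompt_length):
    return [
        s for s in samples
        if len(tokenizer(s["prompt"])["input_ids"]) <= max_prompt_length
    ]

max_prompt_length = 4096  # placeholder for the value from the trinity config
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

samples = [{"prompt": "short prompt"}, {"prompt": "very long prompt ... " * 10000}]
samples = filter_long_prompts(samples, tokenizer, max_prompt_length)

# Last-resort safety net on the vLLM side: keep only the last
# max_prompt_length prompt tokens instead of raising the length error.
sampling_params = vllm.SamplingParams(
    max_tokens=1024,
    truncate_prompt_tokens=max_prompt_length,
)
```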
Here is another related piece of code that seems to have a bug:
```python
self.default_sampling_params = vllm.SamplingParams(
    n=1,
    temperature=0.0,
    max_tokens=config.max_response_tokens,
    min_tokens=1,
    skip_special_tokens=True,
    include_stop_str_in_output=False,
    output_kind=RequestOutputKind.FINAL_ONLY,
    logprobs=0,
)
```
The problem is that vLLM appears to treat max_tokens as a limit on len(prompt_tokens) + len(response_tokens), not on response_tokens alone. So it should not be surprising that the actual len(response_tokens) is always much smaller and never reaches max_response_tokens.
So this means that max_response_tokens can be set as high as max_model_len.
Also, len(prompt_tokens) + len(response_tokens) is limited by max_model_len (since Qwen3 does not seem to support sliding window in vLLM yet).
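Whichever interpretation is correct, a per-request clamp keeps prompt + response inside max_model_len. A rough sketch (model name, config values, and the llm handle are placeholders, not Trinity code; it assumes the prompt itself already fits in the context):

```python
import vllm

MAX_MODEL_LEN = 32768
MAX_RESPONSE_TOKENS = 8192  # placeholder config value

llm = vllm.LLM(model="Qwen/Qwen3-8B", max_model_len=MAX_MODEL_LEN)
tokenizer = llm.get_tokenizer()

prompt = "Explain GRPO in one paragraph."
prompt_len = len(tokenizer(prompt)["input_ids"])

# Give the request at most the remaining context, so that
# len(prompt_tokens) + len(response_tokens) <= max_model_len.
response_budget = min(MAX_RESPONSE_TOKENS, MAX_MODEL_LEN - prompt_len)

params = vllm.SamplingParams(
    n=1,
    temperature=0.0,
    max_tokens=response_budget,
    min_tokens=1,
)
outputs = llm.generate([prompt], params)
print(len(outputs[0].outputs[0].token_ids))  # number of generated tokens
```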
Here is our original exception from vLLM:

```
ERROR 08-17 23:32:15 scheduler.py:86] ValueError: The decoder prompt (length 42861) is longer than the maximum model length of 32768. Make sure that max_model_len is no smaller than the number of text tokens.
```