
Effect of max_prompt_length and max_response_length - prompt truncation seems not to be implemented, which leads vllm to throw an exception: The decoder prompt (length 42861) is longer than the maximum model length of 32768 #197


@vadimkantorov

We hit vllm crashing with the prompt length being larger than max_model_len (despite having max_prompt_length set in the Trinity config).

I was trying to see in the code what effect max_prompt_length has, but it appears that max_prompt_length does not entail any dataset filtering / prompt truncation:

https://github.com/search?q=repo%3Amodelscope%2FTrinity-RFT+max_prompt_length&type=code

I propose that max_prompt_length should lead to dataset filtering (or prompt truncation) during Trinity data prep, and that truncate_prompt_tokens=max_prompt_length (a vllm.SamplingParams option) should also be set as a last resort, to prevent vllm from throwing an exception (vllm-project/vllm#16732).
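
For concreteness, here is a minimal sketch of the proposed two-layer guard (the model name and the keep_or_truncate helper are placeholders, not existing Trinity-RFT code; truncate_prompt_tokens is an actual vllm.SamplingParams field that left-truncates the prompt to its last k tokens):

    import vllm
    from transformers import AutoTokenizer

    max_prompt_length = 4096  # mirrors the Trinity config value
    tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")  # placeholder model

    def keep_or_truncate(prompt: str) -> str | None:
        """Hypothetical data-prep step: drop or truncate over-long prompts."""
        token_ids = tokenizer(prompt, add_special_tokens=False).input_ids
        if len(token_ids) <= max_prompt_length:
            return prompt
        # Alternatively return None here to filter the sample out entirely.
        return tokenizer.decode(token_ids[:max_prompt_length])

    # Last resort at inference time: keep only the last max_prompt_length
    # prompt tokens, so vllm no longer raises the scheduler ValueError.
    sampling_params = vllm.SamplingParams(
        max_tokens=512,
        truncate_prompt_tokens=max_prompt_length,
    )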

Here is another related piece of code that seems to have a bug:

    self.default_sampling_params = vllm.SamplingParams(
        n=1,
        temperature=0.0,
        max_tokens=config.max_response_tokens,
        min_tokens=1,
        skip_special_tokens=True,
        include_stop_str_in_output=False,
        output_kind=RequestOutputKind.FINAL_ONLY,
        logprobs=0,
    )

The problem is that vllm seems to treat max_tokens as a limit on the sum len(prompt_tokens) + len(response_tokens), not on response_tokens alone. So it should not be surprising that the actual len(response_tokens) is always much smaller and never reaches max_response_tokens.

This means that max_response_tokens can be set as high as max_model_len.

Also, len(prompt_tokens) + len(response_tokens) is limited by max_model_len (as Qwen3 does not seem to support sliding-window attention in vllm yet).
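
Under this reading, a per-request clamp like the following would keep the response budget consistent with max_model_len (a rough sketch; max_model_len and max_response_tokens mirror the config values, and the helper itself is hypothetical):

    import vllm

    max_model_len = 32768        # engine limit (assumed for this sketch)
    max_response_tokens = 8192   # config.max_response_tokens

    def sampling_params_for(prompt_token_ids: list[int]) -> vllm.SamplingParams:
        # The response can never use more than what the prompt leaves free
        # of the max_model_len budget.
        budget = max(1, min(max_response_tokens, max_model_len - len(prompt_token_ids)))
        return vllm.SamplingParams(
            n=1,
            temperature=0.0,
            max_tokens=budget,
            min_tokens=1,
        )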


Here is our original exception from vllm:

    ERROR 08-17 23:32:15 scheduler.py:86] ValueError: The decoder prompt (length 42861) is longer than the maximum model length of 32768. Make sure that max_model_len is no smaller than the number of text tokens.
