Conversation

@mayabar mayabar commented Jul 24, 2025

Currently, responses are generated by randomly selecting one sentence from a predefined list. As a result, even when max_tokens is set to a large value, long responses are never returned.

The suggestion to use a lorem ipsum generator was declined because the library we evaluated can produce at most 191 words; requesting a larger word count causes it to panic.

New behavior:

  • Echo mode: The input text is returned as-is if max_tokens or max_completion_tokens is set to a value higher than the number of input tokens; if it is lower, the input is trimmed to that limit. Useful for testing where the exact response must be known in advance.
  • Random mode:
    • if max_tokens or max_completion_tokens is specified, sentences are selected from the predefined collection until the token count reaches the limit
    • if the number of output tokens is not defined in the request, a random token count is generated for the response, drawn from a Gaussian distribution with a mean of 40 and a standard deviation of 20, capped by a maximum response length (currently 128 tokens)

Additional changes:

  • Use the tokenize function, which splits text on spaces and additional characters, in request processing too (not only in the tools-related part)
  • Validate max_tokens and max_completion_tokens when a request arrives and return status 400 if the value is invalid
  • Protect all random value generation with a mutex
  • Fix the tests for the changes above and add a test for random text creation

Signed-off-by: Maya Barnea <mayab@il.ibm.com>

@mayabar mayabar requested a review from shmuelk July 27, 2025 04:56
@shmuelk shmuelk left a comment

/lgtm

/approve

@mayabar mayabar merged commit 2b4a79a into llm-d:main Jul 27, 2025
2 checks passed
@mayabar mayabar deleted the long-responses branch July 29, 2025 11:15