How to predict a specific length of tokens? #1975

simmonssong · 2025-03-19T03:00:55Z

In llama.cpp, the --n-predict option sets the number of tokens to predict when generating text.

I can't find the binding for that in the docs.

DanieleMorotti · 2025-03-19T07:54:55Z

Hi, the binding for that parameter is max_tokens.

simmonssong · 2025-03-21T01:17:24Z

max_tokens cannot ensure an exact number of predicted tokens. Sometimes a model predicts fewer than max_tokens.
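To make the shortfall visible, here is a minimal sketch that requests a token budget and reports whether it was actually used. It assumes the OpenAI-style completion dict returned by the high-level call (`choices[0]["finish_reason"]`, `usage["completion_tokens"]`). `generate_checked` is a hypothetical helper name, and `llm` stands for anything that behaves like a `Llama` instance.

```python
def generate_checked(llm, prompt, n_tokens):
    """Request up to n_tokens and report whether the budget was used.

    `llm` is assumed to behave like a llama_cpp.Llama instance: calling it
    returns an OpenAI-style completion dict.
    """
    out = llm(prompt, max_tokens=n_tokens)
    choice = out["choices"][0]
    produced = out["usage"]["completion_tokens"]
    # "length" means the max_tokens cap was hit; "stop" means an EOS token
    # or a stop string ended generation early.
    hit_cap = choice["finish_reason"] == "length"
    return choice["text"], produced, hit_cap


# Hypothetical usage (placeholder model path, not executed here):
#   from llama_cpp import Llama
#   llm = Llama(model_path="model.gguf")
#   text, produced, hit_cap = generate_checked(llm, "Once upon a time", 64)
```

When `hit_cap` is false, the model ended the sequence on its own and `produced` is below the requested budget.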

DanieleMorotti · 2025-03-21T08:24:11Z

Yes, and the --n-predict option in llama.cpp won't guarantee that length either unless you ignore the EOS token, as explained here. So I'm not sure whether that is what you were looking for: sampling until the --n-predict value is reached and then truncating.

I was not able to find such an option in the high-level API of this repo; maybe you can have a look at this example, which uses the low-level API.
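One possible workaround, offered as an assumption rather than a confirmed feature: mask the EOS logit so it can never be sampled, then let max_tokens end the generation. The sketch below assumes that `create_completion` accepts a `logits_processor` argument whose callables take `(input_ids, scores)` and return the scores, and that `Llama.token_eos()` returns the EOS token id. Please check both against your installed version. `make_eos_blocker` is a hypothetical helper name.

```python
def make_eos_blocker(eos_token_id):
    """Return a logits processor that makes the EOS token unselectable."""

    def block_eos(input_ids, scores):
        # With the EOS logit at -inf, sampling can never pick it, so
        # generation should only end when the max_tokens cap is reached.
        scores[eos_token_id] = float("-inf")
        return scores

    return block_eos


# Hypothetical usage (placeholder model path, not executed here):
#   from llama_cpp import Llama, LogitsProcessorList
#   llm = Llama(model_path="model.gguf")
#   out = llm.create_completion(
#       "Once upon a time",
#       max_tokens=64,
#       logits_processor=LogitsProcessorList(
#           [make_eos_blocker(llm.token_eos())]
#       ),
#   )
```

Models that define additional end-of-turn tokens, or calls that pass stop strings, could still finish early, so it is worth checking the reported completion token count afterwards.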
