MAX_BATCH_TOKENS parameter #367
maioranisimone asked this question in Q&A:

Hello everyone, I would like to ask where I can find the maximum value to assign to the max_batch_tokens parameter. I read that text-embeddings-inference cannot determine this value automatically, so where is this value declared in the models hosted on Hugging Face?
Thanks in advance.

Replies: 1 comment
I would also like to know more about this parameter. I just tested all-minilm-l6-v2 on an L4 GPU, and setting this parameter to 2048 actually increased throughput significantly compared to the default of 16K. Why does throughput decrease both above and below this value for this model?