Hi @NikolaBorisov, thank you for reporting the issue. It most likely happens because one request sets a value in the samplingConfig while another request does not. Could you please confirm whether this is the case in your setup? If it is, a temporary workaround on the caller side is to ensure that either all requests or none of them specify the sampling config parameter.
Meanwhile, we'll work on a fix on our side. Thanks.
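For anyone who wants to try the caller-side workaround, here is a minimal sketch of the "all requests specify it" variant. It assumes requests are built as plain dictionaries before being sent. The helper name `with_explicit_sampling`, the parameter names, and the default values are all hypothetical placeholders, not the backend's actual defaults, so substitute whatever your deployment really uses.

```python
from typing import Any

# Hypothetical placeholder defaults: replace the names and values with the
# sampling parameters and defaults that your own deployment actually uses.
SAMPLING_DEFAULTS: dict[str, Any] = {
    "temperature": 1.0,
    "top_k": 1,
    "top_p": 0.0,
}


def with_explicit_sampling(request: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the request with every sampling parameter set explicitly."""
    fixed = dict(request)  # shallow copy so the caller's dict is not mutated
    for key, default in SAMPLING_DEFAULTS.items():
        # Fill in only the parameters the caller left unset.
        if fixed.get(key) is None:
            fixed[key] = default
    return fixed


# Two requests that originally disagree on which parameters they set.
requests = [
    {"text_input": "Hello", "temperature": 0.7},
    {"text_input": "World"},
]
uniform = [with_explicit_sampling(r) for r in requests]
```

After this step every request carries the same set of sampling keys, so no request reaches the server with a parameter that another request leaves unset. Values the caller set explicitly are kept as-is.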
System Info
8x H100 SXM 80GB, 2 TB RAM, x86, main branch of TensorRT-LLM
Who can help?
@byshiue
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Run Triton with TensorRT-LLM on an 8x H100 server with the Mixtral 8x22B model.
Expected behavior
No crashes
Actual behavior
At some point the server prints an error.
We are seeing a crash in samplingConfig.h:46.
After this crash the server continues to work, but the batch size is limited to 2. It prints the above error a number of times.
Additional notes
This makes the server unusable.