Inconsistent Batch Index Order in Decoupled Mode with trt-llm #2777
Comments
I guess it returns results in order of completion (first completed, first returned). Nothing in the documentation states that the return order is guaranteed to match the input order.
Does this mean we can only send one request at a time from the client? And we must also batch the requests on the trt-llm side?
You can send requests as usual. Just make sure you either sort the results by batch_index or assert that the result you pick is the one you want. Don't assume they arrive in input order.
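The sort-or-assert advice above can be sketched roughly as follows. This is only an illustration, not the backend's API: it assumes each streamed response has already been unpacked on the client into a plain dict, and the keys `batch_index` and `text` as well as the helper name `reorder_by_batch_index` are hypothetical. Adapt the field access to however your client parses the output tensors.

```python
from collections import defaultdict


def reorder_by_batch_index(responses, batch_size):
    """Group streamed chunks by batch_index and return texts in input order.

    `responses` is an iterable of dicts with the (assumed) keys
    "batch_index" and "text", in whatever order they arrived.
    """
    chunks = defaultdict(list)
    for resp in responses:
        idx = resp["batch_index"]
        # Fail loudly instead of silently mixing up results.
        if not 0 <= idx < batch_size:
            raise ValueError(f"unexpected batch_index {idx}")
        # Chunks for the same index are kept in arrival order.
        chunks[idx].append(resp["text"])

    missing = [i for i in range(batch_size) if i not in chunks]
    if missing:
        raise ValueError(f"no response for batch indices {missing}")

    return ["".join(chunks[i]) for i in range(batch_size)]


# Responses arrive in completion order, not input order.
arrived = [
    {"batch_index": 2, "text": "c"},
    {"batch_index": 0, "text": "a"},
    {"batch_index": 1, "text": "b"},
]
ordered = reorder_by_batch_index(arrived, batch_size=3)
print(ordered)  # ['a', 'b', 'c']
```

Keying on the index rather than on arrival position means interleaved chunks from different batch entries still end up attached to the right input.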
Thanks for the reply.
I guess TRT-LLM already supports VLM in the executor API, so you don't need to have two separate models and handle the batching yourself.
@MahmoudAshraf97
I did try the executor API with enc-dec speech models, and it accepts encoder input, encoder input ids, and decoder input ids, so I don't see why VLMs would not be supported. I don't know about Triton, to be honest.
I'll try to find a way around this issue. If I make any progress, I'll update here. |
System Info
Question:
While using trt-llm (tensorrt_llm 0.17.0.dev2024121700 + Triton) with decoupled mode enabled and a batch size greater than 1, I observed an issue where the batch_index in the returned data does not always match the expected order of inputs.
For example, if I input [A, B, C] in a batch (batch size = 3), I expect the model to return [a, b, c] in the same order. However, in some cases the output batch indices get shuffled: the batch_index is incorrect, leading to an unexpected order of results.
Is this behavior expected in decoupled mode, or is there a way to ensure the output follows the correct sequence order?
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
none
Expected behavior
right order
actual behavior
wrong order
additional notes
none