Inconsistent Batch Index Order in Decoupled Mode with trt-llm #2777

Open
Oldpan opened this issue Feb 12, 2025 · 8 comments
Labels
bug Something isn't working

Comments

@Oldpan

Oldpan commented Feb 12, 2025

System Info

Question:

While using trt-llm (tensorrt_llm 0.17.0.dev2024121700 + Triton) with decoupled mode enabled and a batch size greater than 1, I observed an issue where the batch_index in the returned data does not always match the expected order of inputs.

For example, if I send [A, B, C] as one batch (batch size = 3), I expect the model to return [a, b, c] in the same order. However, in some cases the returned batch indices are shuffled:

data: {"batch_index":2, "model_name":"pinyin_nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"b"}
data: {"batch_index":1, "model_name":"pinyin_nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"c"}
data: {"batch_index":0, "model_name":"pinyin_nougat", "model_version":"1", "sequence_end":false, "sequence_id":0, "sequence_start":false, "text_output":"a"}

Here, the batch_index is incorrect, leading to an unexpected order of results.

Is this behavior expected in decoupled mode, or is there a way to ensure the output follows the correct sequence order?

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

none

Expected behavior

The returned batch_index values match the order of the inputs in the batch.

Actual behavior

The returned batch_index values are shuffled relative to the inputs.

Additional notes

none

Oldpan added the bug label on Feb 12, 2025
@MahmoudAshraf97
Contributor

I guess results are returned in the order they complete (first completed, first returned); nothing states that the return order is guaranteed to match the input order.

@Oldpan
Author

Oldpan commented Feb 12, 2025

I guess results are returned in the order they complete (first completed, first returned); nothing states that the return order is guaranteed to match the input order.

Does this mean we can only send one request at a time from the client, and must also batch the requests on the trt-llm side?

@MahmoudAshraf97
Contributor

You can send requests as usual; just make sure you either sort the results or check that the result you pick out of all the results is the one you want. Don't assume they are ordered.
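
For illustration, a minimal sketch of that approach against the HTTP generate_stream endpoint shown in the question, keyed by batch_index instead of arrival order. The URL, payload fields, and batch size below are placeholders, not the exact request your pipeline sends:

```python
import json
import requests

# Placeholder endpoint/payload; use your model's actual input names and values.
URL = "http://localhost:8000/v2/models/pinyin_nougat/generate_stream"
payload = {"text_input": ["A", "B", "C"], "stream": True, "max_tokens": 64}
BATCH_SIZE = 3

# Accumulate streamed chunks per batch_index instead of trusting arrival order.
chunks_by_index = {}
with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        event = json.loads(line[len(b"data: "):])
        chunks_by_index.setdefault(event["batch_index"], []).append(event["text_output"])

# Reassemble the outputs in the original input order.
ordered_outputs = ["".join(chunks_by_index.get(i, [])) for i in range(BATCH_SIZE)]
print(ordered_outputs)
```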

@Oldpan
Author

Oldpan commented Feb 12, 2025

You can send requests as usual; just make sure you either sort the results or check that the result you pick out of all the results is the one you want. Don't assume they are ordered.

Thanks for the reply.
Do I have to send the requests one by one, in order? My scenario involves a VLM model where an encoder is used, and batching is most efficient there. After the encoder, a batch of features and the corresponding input_ids are passed into tensorrt_llm together, so the batch of requests is sent simultaneously and I need the batch results from trt_llm to match the inputs.
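
For illustration, a rough sketch of one way to keep the input-to-output mapping explicit: send each item as its own streaming request tagged with a request_id over the gRPC client, and key the responses by that id rather than by batch_index or arrival order. The tensor names, shapes, and model name below are made-up placeholders, not the backend's actual contract:

```python
import numpy as np
import tritonclient.grpc as grpcclient

collected = {}  # request_id -> list of streamed text chunks

def callback(result, error):
    if error is not None:
        print("stream error:", error)
        return
    # Match each streamed response to its request via the request_id set below.
    rid = result.get_response().id
    text = result.as_numpy("text_output")
    if text is not None:
        piece = text.flatten()[0]
        collected.setdefault(rid, []).append(
            piece.decode() if isinstance(piece, bytes) else str(piece)
        )

client = grpcclient.InferenceServerClient(url="localhost:8001")
client.start_stream(callback=callback)

# Placeholder encoder features and input_ids for a batch of 3.
features = np.random.rand(3, 16, 1024).astype(np.float32)
input_ids = np.ones((3, 8), dtype=np.int32)

for i in range(features.shape[0]):
    # Hypothetical input tensor names; use the ones from your model config.
    feat_in = grpcclient.InferInput("encoder_input_features", [1, 16, 1024], "FP32")
    feat_in.set_data_from_numpy(features[i:i + 1])
    ids_in = grpcclient.InferInput("input_ids", [1, 8], "INT32")
    ids_in.set_data_from_numpy(input_ids[i:i + 1])
    client.async_stream_infer(
        model_name="tensorrt_llm",
        inputs=[feat_in, ids_in],
        outputs=[grpcclient.InferRequestedOutput("text_output")],
        request_id=str(i),  # ties every streamed response back to input i
    )

# Real code should wait for each request's final response before closing the stream.
client.stop_stream()
ordered = ["".join(collected.get(str(i), [])) for i in range(features.shape[0])]
```

Sending the items as separate requests still lets the backend batch them in flight, while the request_id keeps the correspondence to the original inputs unambiguous.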

@MahmoudAshraf97
Contributor

MahmoudAshraf97 commented Feb 12, 2025

I guess TRT-LLM already supports VLMs in the executor API, so you don't need to have two separate models and handle the batching yourself.

@Oldpan
Author

Oldpan commented Feb 12, 2025

@MahmoudAshraf97
@MahmoudAshraf97
To my knowledge, VLM (multimodal) support has not yet been integrated into the executor. In the triton-tensorrt-llm-backend, the encoder model runs here: https://github.com/triton-inference-server/tensorrtllm_backend/blob/main/all_models/multimodal/multimodal_encoders/1/model.py. What you mentioned about being integrated into the executor might refer to the enc-dec models.
If I’m wrong, please correct me...

@MahmoudAshraf97
Contributor

I did try the executor API with enc-dec speech models, and it accepts encoder input, encoder input ids, and decoder input ids, so I don't see why VLMs wouldn't be supported. I don't know about Triton, tbh.

@Oldpan
Author

Oldpan commented Feb 13, 2025

I'll try to find a way around this issue. If I make any progress, I'll update here.
@byshiue Can you help me check this issue? Thanks.
