Enc-Dec C++ Runtime Paged KV - Inflight Batching output junks while inference with multiple input texts #1753
Comments
Just reproduced on our end. Investigating now.
Hi @thanhlt998, can you post your GPU specs? We can reproduce this intermittently on NVLink H100, but not on PCIe H100. Are you using an NVL machine?
Hi @symphonylyh, I am using one
@thanhlt998 fixed. It was due to missing CUDA stream synchronization between the encoder stream and the decoder stream. The fix will be released in next week's weekly main branch update.
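To illustrate the kind of ordering bug described above, here is a minimal host-side sketch in plain C++. It is an analogy only, not the actual TensorRT-LLM patch: two threads stand in for the encoder and decoder streams, and the `Event` class and `runEncoderThenDecoder` function are hypothetical names invented for this example. In real CUDA code the same ordering is usually expressed by recording an event on the producer stream with `cudaEventRecord` and making the consumer stream wait on it with `cudaStreamWaitEvent`.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a CUDA event (not a CUDA or TensorRT-LLM type).
class Event {
public:
    // Mark the producer's work as finished and wake any waiter.
    void record() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // Block until record() has been called.
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};

// Runs a toy "encoder" and "decoder" on separate threads (standing in for
// separate streams). The decoder consumes the encoder's output buffer.
std::vector<int> runEncoderThenDecoder(const std::vector<int>& input) {
    std::vector<int> encoderOutput(input.size(), 0);
    std::vector<int> decoderOutput(input.size(), 0);
    Event encoderDone;

    std::thread encoder([&] {
        for (std::size_t i = 0; i < input.size(); ++i) {
            encoderOutput[i] = input[i] * 2;  // toy "encoding" step
        }
        encoderDone.record();  // signal that the output buffer is complete
    });

    std::thread decoder([&] {
        // Without this wait, the decoder could read encoderOutput while it is
        // still being written, which would yield unpredictable results.
        encoderDone.wait();
        for (std::size_t i = 0; i < input.size(); ++i) {
            decoderOutput[i] = encoderOutput[i] + 1;  // toy "decoding" step
        }
    });

    encoder.join();
    decoder.join();
    return decoderOutput;
}
```

With the wait in place, `runEncoderThenDecoder({1, 2, 3})` returns `{3, 5, 7}` on every run. Removing the `encoderDone.wait()` call makes the result depend on thread timing, which mirrors how the reported junk output appeared on some machines and not on others.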
@symphonylyh, thanks for your support!
@symphonylyh, I found the latest PR merged yesterday. Was the fix included in that PR?
@thanhlt998 When I attempt to do this, the model runner seems to look directly in the engine directory for the config files rather than in engine_dir/encoder and engine_dir/decoder. What does the config.json file located directly in your engine_dir look like?
For this issue, if I want to quickly modify the code, which part should I change? I look forward to your reply.
Hi @thanhlt998, do you still have any further issues or questions? If not, we'll close this soon.
I tried running inference on my T5 model with the C++ runtime using paged KV cache at commit b777bd64750abf30ca7eda48e8b6ba3c5174aafd. The result is normal with a single input text, but with multiple input texts the outputs are garbage.
My T5 model config:
I followed the README in the enc-dec example folder:
convert checkpoint
build engine
Run C++ runtime with the built engine:
1st try
command
output
2nd try: only the order of the input texts changed
command
output
Could this be a bug in the released C++ runtime + inflight batching for enc-dec models?