[PyTorch model] Triton Inference Server does not respond to the second request from the client (only the first request is served) #6593
Comments
Hi @nhthanh0809, I am wondering if the model supports batching?
Hi @kthui
Yes, it is.
I checked the difference in the Triton server logs between the first and second requests (in the log above) and saw that the server emitted this part of the log for the first request:
But in the log for the second request, the server did not emit this part.
[Update] I downgraded the Triton server container to 22.09 and the problem went away.
Thanks for the update. I have filed a ticket to investigate whether this was introduced after 22.09.
Hi @nhthanh0809, apologies that it took a while for the ticket to get triaged. I am able to replicate that the server stops printing new logs at
(extracted from your second log). Are you still able to reproduce the issue with a later release of Triton, e.g. 24.02? The issue might have been resolved by the PyTorch team. If it is still reproducible with the latest release of Triton, can you provide a complete minimal reproduction?
Hello @kthui, I encountered the same issue in version 23.08-py3, but it was not present in version 24.02. I'm not sure which version fixed this problem. The following is the terminal log:
Hi @vonchenplus, I think the quickest way to find the version that fixed the problem is to check every release from 23.08 upward, using the pre-built containers from NGC, until you find one that works. Otherwise, you could also reach out to the PyTorch team for help.
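The bisection suggested above can be sketched as a small loop over NGC container tags. This is only an illustration: the tag list and repository path are assumptions, and you still need to start each server against your own model repository and send two consecutive requests manually.

```shell
# Sketch: walk release tags from 23.09 upward (tags are assumptions;
# check the NGC catalog for the exact list that exists).
for tag in 23.09 23.10 23.11 23.12 24.01 24.02; do
  docker pull nvcr.io/nvidia/tritonserver:${tag}-py3
  echo "Now start tritonserver:${tag}-py3 with your model repository"
  echo "and send two requests; the first release whose second request"
  echo "succeeds is the one that contains the fix."
done
```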
Hi @nhthanh0809, without a complete reproduction, we are not able to pinpoint whether the issue is in the PyTorch framework or somewhere else. Please feel free to re-open this issue if you need it followed up on.
I encountered the same issue in the two newest versions, 24.02-py3 and 24.03-py3. I think this issue is model-dependent.
I found that this issue only occurs when running on GPU. Inference on CPU is fine.
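Since the hang reportedly only occurs on GPU, forcing CPU execution in the model's `config.pbtxt` is one possible workaround (at a performance cost). A minimal sketch using Triton's `instance_group` setting:

```protobuf
# Sketch of a config.pbtxt fragment: run the model on CPU instead of GPU.
# The count value is an assumption; tune it for your workload.
instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]
```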
We encountered the same issue. After several attempts, we found using
Description
Hi all,
I hit this problem when running my own PyTorch model (converted to TorchScript) on Triton Inference Server.
I followed this guide to run my own PyTorch model on Triton: https://github.com/triton-inference-server/tutorials/tree/main/Quick_Deploy/PyTorch
Triton Information
I'm using Triton 23.10, CUDA 12.2, NVIDIA driver version 535.129.03.
I've tested on a GTX 1090 Ti and a GTX 3080 and got the same problem.
I'm using the Triton inference container: nvcr.io/nvidia/tritonserver:23.10-py3
Here is the configuration for my model:
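The original configuration was not captured in this page. For context, a typical `config.pbtxt` for a TorchScript model served by Triton looks roughly like the sketch below; the model name, shapes, and batch size here are assumptions, not the reporter's actual values. The `INPUT__0`/`OUTPUT__0` naming follows Triton's convention for the PyTorch (LibTorch) backend.

```protobuf
# Hypothetical config.pbtxt for a TorchScript image model.
name: "my_pytorch_model"        # assumption: not the reporter's real model name
platform: "pytorch_libtorch"
max_batch_size: 8
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
```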
My client code:
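The reporter's client code was not captured either. A minimal sketch of a Triton HTTP client that sends two consecutive requests (the pattern that triggers this issue) is shown below, assuming the hypothetical model name and tensor names from the config sketch above. `tritonclient` is imported lazily so the input helper is usable without a running server; call `main()` against a live Triton instance.

```python
import numpy as np

MODEL_NAME = "my_pytorch_model"   # assumption: not the reporter's real model name
INPUT_NAME = "INPUT__0"           # Triton's naming convention for the PyTorch backend
OUTPUT_NAME = "OUTPUT__0"


def make_dummy_input(batch=1, channels=3, height=224, width=224):
    """Build a random FP32 batch shaped like a typical image-model input."""
    return np.random.rand(batch, channels, height, width).astype(np.float32)


def main():
    # Imported here so the helper above works without tritonclient installed.
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url="localhost:8000")
    data = make_dummy_input()

    infer_input = httpclient.InferInput(INPUT_NAME, data.shape, "FP32")
    infer_input.set_data_from_numpy(data)

    # Send the same request twice: per this issue, the second call may
    # hang and time out on the affected Triton versions.
    for i in range(2):
        result = client.infer(MODEL_NAME, inputs=[infer_input])
        output = result.as_numpy(OUTPUT_NAME)
        print(f"request {i}: input {data.shape} -> output {output.shape}")
```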
With the first request from the client, the client code above prints the input and output shapes as below:
However, with any subsequent request, the server didn't respond at all, leading to a timeout error on the client side.
I checked the logs on the Triton server and got the log below:
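When diagnosing hangs like this, it helps to start the server with verbose logging so per-request backend activity is visible. A sketch of launching the same container with Triton's `--log-verbose` flag (the model-repository path is a placeholder):

```shell
# Sketch: run the 23.10 container with verbose logging enabled.
# /path/to/model_repository is a placeholder for your actual repository.
docker run --rm --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:23.10-py3 \
  tritonserver --model-repository=/models --log-verbose=1
```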
Expected behavior
For every request from the client, Triton must return the same output.
But in my case, the server only works for the first request, not for any subsequent request.