perf_client has limits on concurrency #14
Is there any particular reason that you are using the -a flag? That causes a single thread to attempt to drive the entire load, and throughput will eventually be limited by that thread. Try running perf_client without -a and with a larger -t value and see if that doesn't increase throughput. Also, are you aware of the -d flag, which allows perf_client to automatically search concurrency values and create a latency vs. infer/sec curve? See the documentation for more information on that.
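A rough sketch of the two suggested invocations, assuming a hypothetical model name (`my_model`) and gRPC as the protocol; only -a, -t, and -d come from this discussion, the other flags reflect typical perf_client usage and may differ for your setup:

```sh
# Synchronous mode (no -a): each of the -t threads drives its own requests,
# so no single thread has to pump the entire load.
perf_client -m my_model -i grpc -t 16

# Let perf_client sweep concurrency values itself and report a
# latency vs. infer/sec curve.
perf_client -m my_model -i grpc -d
```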
This is the only way I could make it work at all. I was getting timeouts without -a. I can try again next week.
I managed to get perf_client working with high concurrency without the -a flag. As to my original issue with concurrency: I can now exceed the results previously obtained with -a.
For the warmup-phase problem we will look into removing that from the measurement window so you don't need to use a large -p value. For large concurrency values with -a, it is possible that the single thread used with -a is unable to keep up with handling the requests and responses for 72 requests.
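A sketch of the workaround being discussed, i.e. using a larger -p measurement window so the warmup phase is a smaller fraction of what gets measured; the model name, protocol, and concrete values are illustrative assumptions, not taken from this issue:

```sh
# -p sets the measurement window in milliseconds; a large window means the
# slow warmup requests weigh less in the reported latency/throughput numbers.
perf_client -m my_model -i grpc -t 8 -p 30000
```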
I cannot max out my two GPUs using one instance of perf_client, but using multiple instances I can.
A little bit more detail:
Using perf_client with GRPC and -a, the throughput does not increase when I go beyond -t 9.
But using four instances with -t 9 simultaneously for the same model, I get much higher total throughput. This can be verified using the TRTIS metrics.
Is this expected behavior? I can share the precise command that I use when I am back at my desk, but I cannot share the model that I use.
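The multi-instance workaround described above could look roughly like the sketch below (the -t 9 value comes from the comment; the model name, protocol, and log file names are assumptions). Total throughput is then read from the TRTIS metrics endpoint rather than from any single client:

```sh
# Launch four independent perf_client processes against the same model,
# each driving 9 concurrent requests, then wait for all of them to finish.
for n in 1 2 3 4; do
  perf_client -m my_model -i grpc -t 9 > client_$n.log &
done
wait
```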