[Question] What is the recommended way to run Triton? #5981
-
I am running Triton behind a reverse proxy and a Flask app. For context, we process tens of millions of images per day at sub-second latency, spanning traditional vision tasks and diffusion-related tasks.
-
cc @GuanLuo @tanmayv25 on "how other people run triton in practice and also what is the recommended way".
-
@MatthieuTPHR Can you elaborate more on this? Triton has very wide adoption. Depending on the use case, customers run Triton as an inference microservice; for this, the Triton gRPC and HTTP endpoints can be utilized. Users can also write their own service/application and link it against the Triton shared library using the C API interface. If you would like to learn more about optimization, you can consult the Triton optimization guide in the server documentation.
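As a rough illustration of the microservice pattern described above, here is a minimal sketch using the Python HTTP client. The server address, the model name `my_model`, and the tensor names `INPUT0`/`OUTPUT0` are placeholder assumptions; substitute the names from your own model repository and `config.pbtxt`.

```python
# Minimal sketch: Triton as an inference microservice over its HTTP endpoint.
# ASSUMPTIONS: server at localhost:8000; a model "my_model" with one FP32
# input "INPUT0" of shape [1, 3, 224, 224] and one output "OUTPUT0".
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")
assert client.is_server_ready()  # readiness is exposed on the same endpoint

inp = httpclient.InferInput("INPUT0", [1, 3, 224, 224], "FP32")
inp.set_data_from_numpy(np.random.rand(1, 3, 224, 224).astype(np.float32))

result = client.infer("my_model", inputs=[inp])
print(result.as_numpy("OUTPUT0"))
```

The gRPC endpoint (`tritonclient.grpc`, default port 8001) follows the same request shape.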
For the Python gRPC client, each instance of InferenceServerClient creates its own channel; channels are not reused across instances.
Within a single InferenceServerClient there is one channel connection, and that same channel is used for all requests (all model inference requests as well as non-inference requests are pushed through it).
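A small sketch of the implication: since each InferenceServerClient owns exactly one channel, reusing a single client amortizes connection setup across requests, whereas constructing a client per request opens a fresh channel every time. The model and tensor names below are the same hypothetical placeholders as above.

```python
# Reuse one InferenceServerClient so every request shares its single gRPC
# channel, rather than building a new client (and channel) per request.
# ASSUMPTIONS: server at localhost:8001; model "my_model" with FP32 input
# "INPUT0" and output "OUTPUT0" -- substitute your model's actual names.
import numpy as np
import tritonclient.grpc as grpcclient

client = grpcclient.InferenceServerClient(url="localhost:8001")  # one channel

def infer(batch: np.ndarray) -> np.ndarray:
    inp = grpcclient.InferInput("INPUT0", list(batch.shape), "FP32")
    inp.set_data_from_numpy(batch)
    out = grpcclient.InferRequestedOutput("OUTPUT0")
    return client.infer("my_model", inputs=[inp], outputs=[out]).as_numpy("OUTPUT0")

# All three calls travel over the same underlying channel.
for _ in range(3):
    infer(np.random.rand(1, 3, 224, 224).astype(np.float32))
```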
We don't have a specific recommendation. However, our C++ client library is more…