Description
Is your feature request related to a problem? Please describe.
The App SDK currently supports inference within the application process itself. This is simple and efficient for some use cases, but when multiple applications/models are hosted in a "production" environment, a remote inference service, e.g. Triton, may be needed so that the resource-heavy inference workload can be centrally managed.
Describe the solution you'd like
Add remote inference support to the built-in inference operators in the App SDK, with runtime options, e.g. using the strategy pattern, to choose between in-proc and remote inference.
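A minimal sketch of what such a strategy pattern could look like is below. The class and function names (`InferenceStrategy`, `InProcInference`, `RemoteTritonInference`, `make_strategy`) and the config keys are hypothetical illustrations, not existing App SDK API; the remote path assumes a Triton server reachable over HTTP via the standard `tritonclient` package.

```python
from abc import ABC, abstractmethod

import numpy as np


class InferenceStrategy(ABC):
    """Hypothetical strategy interface an inference operator would call at runtime."""

    @abstractmethod
    def infer(self, image: np.ndarray) -> np.ndarray:
        ...


class InProcInference(InferenceStrategy):
    """Runs the model inside the application process (current App SDK behavior)."""

    def __init__(self, model_path: str):
        import torch

        self._torch = torch
        self._model = torch.jit.load(model_path).eval()

    def infer(self, image: np.ndarray) -> np.ndarray:
        with self._torch.no_grad():
            output = self._model(self._torch.from_numpy(image))
        return output.numpy()


class RemoteTritonInference(InferenceStrategy):
    """Delegates inference to a remote Triton server via its HTTP client."""

    def __init__(self, url: str, model_name: str,
                 input_name: str = "INPUT__0", output_name: str = "OUTPUT__0"):
        import tritonclient.http as triton_http

        self._triton_http = triton_http
        self._client = triton_http.InferenceServerClient(url=url)
        self._model_name = model_name
        self._input_name = input_name
        self._output_name = output_name

    def infer(self, image: np.ndarray) -> np.ndarray:
        # Wrap the numpy array as a Triton input and request the named output tensor.
        infer_input = self._triton_http.InferInput(
            self._input_name, list(image.shape), "FP32")
        infer_input.set_data_from_numpy(image.astype(np.float32))
        response = self._client.infer(
            model_name=self._model_name,
            inputs=[infer_input],
            outputs=[self._triton_http.InferRequestedOutput(self._output_name)],
        )
        return response.as_numpy(self._output_name)


def make_strategy(config: dict) -> InferenceStrategy:
    """Picks a strategy from a runtime option, e.g. an app config setting or env var."""
    if config.get("mode") == "remote":
        return RemoteTritonInference(config["url"], config["model_name"])
    return InProcInference(config["model_path"])
```

With this shape, the built-in inference operator would only depend on `InferenceStrategy.infer()`, and the in-proc vs. remote choice becomes a deployment-time option rather than a code change in the application.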
Describe alternatives you've considered
One of the main reasons to use a remote inference server, e.g. Triton, is to have dedicated model and runtime resource management (scheduling and queuing inference requests), so that the application does not need to directly request local GPU and/or system memory. Without the remote service, the whole application with in-proc inference then needs to be scheduled on servers running multiple applications, or instances thereof, to ensure resources are available when the app requests them. This is simpler when only system memory is requested, but GPU memory requests have to be properly managed (e.g. K8s fractional request on a visible GPU).
Additional context