
[FEA] Add remote inference support with the use of Triton Inference Server #212

Open
MMelQin opened this issue Dec 1, 2021 · 7 comments
Labels: enhancement (New feature or request)

Comments

@MMelQin
Collaborator

MMelQin commented Dec 1, 2021

Is your feature request related to a problem? Please describe.
App SDK currently supports inference within the application process itself. This is simple and efficient for some use cases; however, when multiple applications/models are hosted in a "production" environment, a remote inference service, e.g. Triton, may be needed so that the resource-heavy inference work can be centrally managed.
Describe the solution you'd like
Add remote inference support to the built-in inference operators in the App SDK, with runtime options (e.g. using the strategy pattern) to support the choice of in-proc or remote inference.
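
As a rough illustration only (not the actual App SDK API), the strategy-pattern option could look like the sketch below, where an inference operator delegates to either an in-process TorchScript backend or a Triton HTTP client. All class names, model names, and tensor names here are hypothetical.

```python
# Hypothetical sketch of a strategy-pattern inference operator; the class and
# method names below are illustrative, not the actual App SDK API.
from abc import ABC, abstractmethod

import numpy as np


class InferenceStrategy(ABC):
    """Strategy interface: run inference on a pre-processed image array."""

    @abstractmethod
    def infer(self, image: np.ndarray) -> np.ndarray:
        ...


class InProcInference(InferenceStrategy):
    """In-process inference with a TorchScript model (current behavior)."""

    def __init__(self, model_path: str):
        import torch  # local import so the remote strategy has no torch dependency

        self._torch = torch
        self._model = torch.jit.load(model_path).eval()

    def infer(self, image: np.ndarray) -> np.ndarray:
        with self._torch.no_grad():
            return self._model(self._torch.from_numpy(image)).numpy()


class TritonInference(InferenceStrategy):
    """Remote inference against a Triton Inference Server over HTTP."""

    def __init__(self, url: str, model_name: str, input_name: str, output_name: str):
        import tritonclient.http as httpclient  # pip install tritonclient[http]

        self._httpclient = httpclient
        self._client = httpclient.InferenceServerClient(url=url)
        self._model_name = model_name
        self._input_name = input_name
        self._output_name = output_name

    def infer(self, image: np.ndarray) -> np.ndarray:
        infer_input = self._httpclient.InferInput(self._input_name, list(image.shape), "FP32")
        infer_input.set_data_from_numpy(image.astype(np.float32))
        result = self._client.infer(self._model_name, [infer_input])
        return result.as_numpy(self._output_name)


class InferenceOperator:
    """Operator that delegates to whichever strategy it was configured with."""

    def __init__(self, strategy: InferenceStrategy):
        self._strategy = strategy

    def compute(self, image: np.ndarray) -> np.ndarray:
        return self._strategy.infer(image)
```

The app would pick the strategy at startup, e.g. `InferenceOperator(TritonInference("localhost:8000", "spleen_seg", "INPUT__0", "OUTPUT__0"))` versus `InferenceOperator(InProcInference("model.ts"))`, without the rest of the pipeline changing.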
Describe alternatives you've considered
One of the main reasons to use a remote inference server, e.g. Triton, is to have dedicated model and runtime resource management (scheduling and queuing of inference requests), so that the application does not need to directly request local GPU and/or system memory. Without such a remote service, the whole application with in-proc inference needs to be scheduled on servers running multiple applications or instances thereof, to ensure resources are available when the app requests them. This is simpler when only system memory is requested, but GPU memory requests have to be properly managed (e.g. K8s fractional requests on a visible GPU).
Additional context

@MMelQin MMelQin added the enhancement New feature or request label Dec 1, 2021
@vikashg
Collaborator

vikashg commented Dec 1, 2021

If we add Triton support, we should also add a method to extract the names of the input and output nodes for creating the config.pbtxt file needed by Triton. @slbryson worked on this last month and should have more notes on it.

Can we also create this config.pbtxt file automatically, given a PyTorch model?
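
As a sketch of that idea: since a TorchScript file does not embed tensor shapes or data types (see the discussion below), those would still have to be supplied by the model author, and only the naming convention used by Triton's PyTorch backend (INPUT__0 / OUTPUT__0) is fixed. The helper function, template fields, and example dims here are hypothetical.

```python
# Hypothetical helper that emits a Triton config.pbtxt for a TorchScript model.
import torch

CONFIG_TEMPLATE = """name: "{name}"
platform: "pytorch_libtorch"
max_batch_size: {max_batch}
input [
  {{
    name: "INPUT__0"
    data_type: {in_dtype}
    dims: {in_dims}
  }}
]
output [
  {{
    name: "OUTPUT__0"
    data_type: {out_dtype}
    dims: {out_dims}
  }}
]
"""


def write_config_pbtxt(model_path, name, in_dims, out_dims,
                       in_dtype="TYPE_FP32", out_dtype="TYPE_FP32", max_batch=0):
    # Load the scripted model only to sanity-check it and to list its graph inputs;
    # the TorchScript graph does not carry the shape information needed for "dims".
    model = torch.jit.load(model_path)
    graph_inputs = [i.debugName() for i in model.graph.inputs()]
    print("TorchScript graph inputs (first is usually 'self'):", graph_inputs)

    config = CONFIG_TEMPLATE.format(
        name=name, max_batch=max_batch,
        in_dtype=in_dtype, in_dims=list(in_dims),
        out_dtype=out_dtype, out_dims=list(out_dims),
    )
    with open("config.pbtxt", "w") as f:
        f.write(config)


# Example: a 3D segmentation model with a single-channel input volume (dims are illustrative).
# write_config_pbtxt("model.ts", "spleen_seg", in_dims=(1, 96, 96, 96), out_dims=(2, 96, 96, 96))
```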

@vikashg vikashg self-assigned this Dec 1, 2021
@MMelQin
Collaborator Author

MMelQin commented Dec 2, 2021

@vikashg Just to add a little more information after the monthly sync-up meeting with the Triton team: the Triton Inference Server can parse the metadata (tensor dims, data types, etc.) of all supported model types except PyTorch, due to its inherent lack of such support. Because of this, the Triton team actually filed an issue with PyTorch over a year ago to embed metadata in PyTorch models.

Within MONAI, a similar issue had also been discussed, and an issue/PR was created to add model metadata to the model zip as a non-standard way to convey this information; it is really up to the model exporter to decide whether to set the metadata or not. Of course, models of unknown provenance will not adhere to this anyway.

I will file a separate ticket for Triton, specifying the need for it to load the TorchScript model and parse out the tensor dims and types (the tensor names are really Triton-specific and can be chosen by the app dev); this will piggyback on Triton's request on PyTorch.

@ericspod
Member

ericspod commented Dec 3, 2021

Hi @MMelQin and @vikashg, the PR I opened on MONAI would be a good fix for the lack of metadata. I can see this mechanism used to store information that Triton would use, as well as a huge variety and volume of other things relating to the model and its use context. On top of a metadata JSON file, we could also include example notebooks or scripts in the TorchScript zip file. If you have any comments to add to the PR please do, and I can revisit it to get it integrated into core if you think it's a good mechanism.
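
For reference, a minimal sketch of that kind of mechanism, using PyTorch's existing `_extra_files` argument to bundle a JSON file inside the TorchScript zip; the `metadata.json` file name and the keys in the dict are illustrative, not the schema from the PR.

```python
# Sketch: bundle and retrieve a metadata JSON inside a TorchScript archive.
import json
import torch


def save_with_metadata(model: torch.nn.Module, path: str) -> None:
    # The metadata content and key names below are made up for the example.
    meta = {
        "inputs": {"image": {"dims": [1, 96, 96, 96], "dtype": "float32"}},
        "outputs": {"pred": {"dims": [2, 96, 96, 96], "dtype": "float32"}},
    }
    scripted = torch.jit.script(model)
    torch.jit.save(scripted, path, _extra_files={"metadata.json": json.dumps(meta)})


def load_metadata(path: str) -> dict:
    extra = {"metadata.json": ""}
    torch.jit.load(path, _extra_files=extra)  # extra is filled in-place on load
    return json.loads(extra["metadata.json"])
```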

@MMelQin MMelQin self-assigned this Dec 8, 2021
@ericspod
Member

ericspod commented Dec 9, 2021

I'll mention here that the core team is hashing out a format for stored models that would include more information than just metadata. We've started looking at MMAR and the experience with that, and comparing with how MLflow, Hugging Face, and others have tackled similar problems.

@ristoh

ristoh commented Dec 9, 2021

@ericspod can you add a link to the PR or conversation from the core working group's work you're referring to?

@dbericat
Member

@CPBridge have a look at this.

@ericspod
Member

We have an issue open for discussion.
