Template for deploying any HuggingFace pipeline for supported tasks with Torchserve #1818

Closed
tripathiarpan20 opened this issue Aug 27, 2022 · 6 comments
Labels: enhancement (New feature or request), triaged_wait (Waiting for the Reporter's response)


tripathiarpan20 commented Aug 27, 2022

Hi!
Lately I have been working on my repo, which contains a template to deploy any HuggingFace model supported by pipeline, where pipeline is a simple-to-use abstraction provided by HF. The repo also includes copy-paste commands in READMEs for AWS EC2 instances.

So far I have focused only on deploying models with the PyTorch backend, as I will be adding scripts to deploy TorchScripted & LLM.int8() pipeline models soon. Moreover, TF models present in an HF repo (example of an HF repo) can also be deployed by changing the framework attribute while initialising the pipeline, as sketched below.
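
For context, here is a minimal sketch of what initialising such a pipeline looks like; the model id and image file name are illustrative, not necessarily the repo's own example:

```python
from transformers import pipeline

# Illustrative model id; any pipeline-supported task/model pair works the same way.
classifier = pipeline(
    "image-classification",
    model="apple/mobilevit-xx-small",
    framework="pt",   # use framework="tf" to load TF weights from the same repo, if present
)

# pipeline accepts a local path, URL, or PIL image for vision tasks.
print(classifier("cat.jpg"))
```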

I have also tried to make the repo as beginner-friendly as possible by including comments, references and compact code. There are also plans to integrate the HuggingFace Optimum library, which works elegantly with pipeline, so by extension it would fit into my repo as well with a few short scripts.

My repo could be useful to the open-source community and I believe it would reach a greater audience if added to the News section and/or examples/Huggingface_Transformers.

Thanks.

@msaroufim (Member)

Hi @tripathiarpan20, thank you for sharing this. I'm wondering how the present state differs from the work here: https://github.com/pytorch/serve/tree/master/examples/Huggingface_Transformers

Ultimately, if we choose to merge in your work, I'd like us to have one recommended way of deploying HF models.

cc: @HamidShojanazeri

msaroufim added the enhancement (New feature or request) and triaged_wait (Waiting for the Reporter's response) labels on Aug 27, 2022

tripathiarpan20 commented Aug 27, 2022

Hi @msaroufim,
The main difference is that examples/Huggingface_Transformers does not use the HuggingFace pipeline abstraction at all and thus supports only a limited number of tasks, whereas pipeline can infer the required AutoTokenizer, AutoConfig and AutoModel class for the selected task, for any of the listed tasks, with cleaner code. Hence, the work here can be used as a template, with a few modifications to the handler (explained here), to support the remaining tasks that examples/Huggingface_Transformers does not cover.
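
For illustration, a rough sketch of what a pipeline-based TorchServe handler could look like; the class layout, hard-coded task and configuration details here are assumptions, not the repo's actual handler:

```python
import io

from PIL import Image
from transformers import pipeline
from ts.torch_handler.base_handler import BaseHandler


class HFPipelineHandler(BaseHandler):
    """Sketch of a handler that delegates model/tokenizer selection to pipeline."""

    def initialize(self, context):
        # model_dir holds whatever the .mar / mounted volume provides.
        model_dir = context.system_properties.get("model_dir")
        # pipeline resolves the right AutoConfig/AutoTokenizer/AutoModel
        # (or image processor) for the chosen task internally.
        self.pipe = pipeline(task="image-classification", model=model_dir)
        self.initialized = True

    def preprocess(self, data):
        # TorchServe passes a list of request dicts; decode image bytes into PIL images.
        images = []
        for row in data:
            payload = row.get("data") or row.get("body")
            images.append(Image.open(io.BytesIO(payload)).convert("RGB"))
        return images

    def inference(self, inputs):
        return [self.pipe(image) for image in inputs]

    def postprocess(self, outputs):
        return outputs
```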

Moreover, examples/Huggingface_Transformers does not have an example of serving inference requests with Docker, which is important for production purposes; the work aims to provide simple copy-paste commands to handle the Docker part for beginners (which many HuggingFace users are, given how easy to use HF is).
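
As a hedged illustration of that Docker-based flow (not the repo's exact commands), assuming the official pytorch/torchserve image is running with its default ports published and a model registered under an illustrative name:

```python
# Assumed setup, e.g.:
#   docker run --rm -p 8080:8080 -p 8081:8081 \
#       -v $(pwd)/model-store:/home/model-server/model-store pytorch/torchserve
import requests

# Send an image to the containerized inference endpoint (model name assumed).
with open("cat.jpg", "rb") as f:
    resp = requests.post("http://localhost:8080/predictions/mobilevit", data=f)
print(resp.status_code, resp.json())
```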

The work currently has code for the image-classification task with the MobileViT XX Small model, which was deployable even on the weakest AWS EC2 instance (t2.micro) with an average single-inference time of 126 ms (when max_batch_delay for the registered model was set to 10 ms).
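
For reference, max_batch_delay can be set when registering a model through TorchServe's management API (port 8081); the archive and model names below are assumed:

```python
import requests

# Register the archive from the model store with batching parameters.
params = {
    "url": "mobilevit.mar",     # archive name assumed
    "model_name": "mobilevit",
    "batch_size": 8,            # illustrative value
    "max_batch_delay": 10,      # milliseconds, matching the figure above
    "initial_workers": 1,
}
resp = requests.post("http://localhost:8081/models", params=params)
print(resp.status_code, resp.text)
```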

@msaroufim (Member)

Ok, that clarifies things, thank you. Please feel free to make your PR directly. I think for now we can focus on improving the support for our HF models, and once the PR is in we can work together to publicize the work. Please add @HamidShojanazeri and myself as reviewers for your PRs.


tripathiarpan20 commented Aug 27, 2022

Thanks, how exactly do you suggest I make the PR? For example, should I make a new folder in examples/Huggingface_Transformers called docker_integration, or something similar?

Another point I missed is that the work utilises Docker volume mounts (storage shared between host and container) to keep the .mar file down to roughly the size of the handler file, by passing a dummy/empty file to the --serialized-file argument of torch-model-archiver and having the handler load model checkpoints only from the mounted HF-models folder (Ctrl-F this README for "HF-models" for context).

This might be important in scenarios with LLMs like BLOOM, which can take up a lot of disk space and cause problems on low-storage machines if a copy of the model checkpoint is made during the model-archiving process.
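
A rough sketch of that dummy-serialized-file idea, written here as a Python call to the torch-model-archiver CLI (the file, handler and model names are illustrative; the repo's actual commands may differ):

```python
import pathlib
import subprocess

# Empty placeholder so the .mar stays roughly the size of the handler;
# the handler later loads real weights from the mounted HF-models volume.
pathlib.Path("dummy.pt").touch()
pathlib.Path("model-store").mkdir(exist_ok=True)

subprocess.run([
    "torch-model-archiver",
    "--model-name", "mobilevit",       # name assumed
    "--version", "1.0",
    "--serialized-file", "dummy.pt",   # placeholder instead of a checkpoint copy
    "--handler", "handler.py",         # the pipeline-based handler
    "--export-path", "model-store",
], check=True)
```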

@msaroufim (Member)

We can split the work into

  1. Add support for pipeline in existing handler code, refactor and simplify as needed
  2. Docker support for large models, in its own subfolder

@tripathiarpan20 (Author)

I have raised the PR; we can plan there how to integrate the pipeline into the code.
