
HF pipeline integration - Part 1 #1822

Closed
tripathiarpan20 wants to merge 17 commits

Conversation


@tripathiarpan20 tripathiarpan20 commented Aug 27, 2022

@msaroufim @HamidShojanazeri

As discussed in #1818, I have uploaded all the relevant files so far.

Tasks:

  • Add preliminary support for HF pipeline in existing handler code, refactor and simplify as needed
  • Docker support for large models with its own sub folder

@msaroufim msaroufim (Member) left a comment


Thank you for this PR @tripathiarpan20, I'm excited about merging this in and improving the coverage we have for our HuggingFace models, but I still have a few concerns:

  1. You need to decide whether you'd rather create a third-party library or have your changes natively integrated into TorchServe. Adding a new dependency is always a challenge because I need to think about long-term maintenance; since you're opening a PR, I assume you're looking for a native integration.
  2. If you decide on a native integration, we need to generalize the handler a bit more so it works without needing to comment out code per task. Every task mentioned in your dict needs to just work; otherwise we shouldn't mention it yet.
  3. We need to combine your work with https://github.com/pytorch/serve/blob/master/examples/Huggingface_Transformers/Transformer_handler_generalized.py#L76-L81, which supports 3 tasks. We need one obvious way for users to run HuggingFace models. Your approach seems to have higher coverage but is missing many of the details that @HamidShojanazeri worked on.

I think this can be an amazing PR. I love the idea of working with any task or downloading any model from the hub, but it just needs a bit more work to make that experience a reality.

@tripathiarpan20 (Author) commented Aug 30, 2022

@msaroufim,
I have pushed cleanups and a ground basis on which the rest of the progress in this PR can be structured: the whole code block in the TLDR section of the current Image_classification_docker/README must work!

The basis on which that works is:

  • Transformer_handler_generalized.py would take input args --task and --output-file, i.e., running the script with these args would output a .py file containing a handler specialised for the task, which would then be used by the create_hf_handler.sh script as seen in the TLDR section of the README above.
    Edit: The above idea was discarded; it is implemented with create_hf_handler.sh instead.
  • The above would be implemented by an almost-total restructuring of Transformer_handler_generalized.py to resemble the structure of the reference ViTXXSmall handler (remember this point though). We have to figure out how to do that without losing the current utilities of Transformer_handler_generalized.py like FasterTransformer, do_lower_case, save_mode, model_parallel etc.; this needs heavy reference to the pipeline docs.
  • As a starting point, we can implement Transformer_handler_generalized.py to support both 'image-classification' and 'sentiment-analysis' tasks with conditionals in preprocess and postprocess (as these two tasks have already been implemented and tested by me before, in the ViTXXSmall handler & DistilBERT sentiment analysis).
  • The create_hf_handler.sh script currently lists only the above two tasks as supported (Line 5 here) and throws an error otherwise.
  • Once the above two tasks are fully implemented, we can extend to the remaining tasks supportable by the pipeline one at a time, as follows:
    • Implement appropriate preprocess and postprocess conditionals, referring to the default handlers as much as possible.
    • Be wary of exceptions.

TODO now:

  • Verify the consistency of the rest of Image_classification_docker/README.md after the Transformer_handler_generalized.py code is restructured appropriately.
  • Implement the handler conditional statements for Image Classification & Sentiment Analysis tasks in Transformer_handler_generalized.py script with reference to this.

@tripathiarpan20 (Author) commented Aug 31, 2022

@msaroufim
The commits so far implement the approach in a new file, Pipeline_handler_generalized.py. It is fully tested with the code block in the TLDR section of Image_classification_docker/README using MobileViTXXSmall, and I am able to get inference outputs by following that section:
[screenshot: inference outputs]

We don't need to worry about the accuracy of model predictions, as they depend solely on the code within the pipeline; the job of Pipeline_handler_generalized.py is just to preprocess POST inference requests into a format (List[inputs]) that is valid input to a Huggingface pipeline, and to postprocess the outputs back into a serializable form to return as inference responses.
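For illustration, here is a minimal sketch of that control flow (not the PR's actual code): a text task is assumed so that preprocess is just byte decoding, and the task/model names are stand-in examples.

```python
# Minimal sketch of a TorchServe handler wrapping an HF pipeline.
# Task and model below are illustrative stand-ins; the PR parameterizes them.
from ts.torch_handler.base_handler import BaseHandler
from transformers import pipeline


class HFPipelineHandler(BaseHandler):
    def initialize(self, context):
        gpu_id = context.system_properties.get("gpu_id")
        device_id = int(gpu_id) if gpu_id is not None else -1
        self.pipe = pipeline(
            task="text-classification",
            model="distilbert-base-uncased-finetuned-sst-2-english",
            device=device_id,
        )
        self.initialized = True

    def preprocess(self, requests):
        # Map each POST body to a pipeline input: List[str] for text tasks
        inputs = []
        for req in requests:
            data = req.get("data") or req.get("body")
            if isinstance(data, (bytes, bytearray)):
                data = data.decode("utf-8")
            inputs.append(data)
        return inputs

    def inference(self, inputs):
        return self.pipe(inputs)

    def postprocess(self, preds):
        # The pipeline output is already a list of JSON-serializable dicts,
        # one entry per request
        return preds
```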

The code for sentiment-analysis is currently incomplete.

The following points are important as of now:

  • Before moving on to the other pipeline tasks, we have to try to integrate the functionalities from examples/Huggingface_transformers/setup_config.json. I have these thoughts about them:
    • num_labels might be redundant, as the pipeline figures it out by itself given the appropriate task; however, we can support passing arguments to task-specific pipelines, like return_all_scores, with **kwargs that get mapped while initialising the pipeline, for example: pipe = pipeline(task="fill-mask", model="bhadresh-savani/distilbert-base-uncased-emotion", return_all_scores=True, device = device_id) (docs). A sketch follows this list.
    • do_lower_case, save_mode, max_length, FasterTransformer & model_parallel might be integrable by adding a few lines to the handler's initialize method.
    • As for captum_explanation & embedding_name, I have never used Captum before, and it seems that implementing explanations in the script for each task would take some more work, so for the time being it is left as future work. However, it could be implemented in a modular way with a new class ExplanationsFactory for use in the explain_handle method of BaseHandler in Pipeline_handler_generalized.py (similar to HFPipelinePreprocessFactory / HFPipelinePostprocessFactory but with this control flow).
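A hedged sketch of the **kwargs idea above; the task_kwargs key is a hypothetical addition to setup_config.json, not an existing field:

```python
import json

from transformers import pipeline

with open("setup_config.json") as f:
    setup_config = json.load(f)

# Hypothetical config key holding task-specific pipeline arguments,
# e.g. {"task_kwargs": {"return_all_scores": true}} in the JSON file
task_kwargs = setup_config.get("task_kwargs", {})
pipe = pipeline(
    task="text-classification",
    model="bhadresh-savani/distilbert-base-uncased-emotion",
    device=-1,
    **task_kwargs,
)
```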

From there, we can move on to implementing the other tasks by following these steps for each task:

  • Implement appropriate methods in HFPipelinePreprocessFactory and HFPipelinePostprocessFactory of Pipeline_handler_generalized.py for the task; the default handler code can be used as a reference (a rough sketch of this dispatch follows the list).
  • Make appropriate changes in the inference method of Pipeline_handler_generalized.py for special cases requiring chunk batching or similar.
  • Modify pipeline_supported_tasks in Pipeline_handler_generalized.py & SUPPORTED_TASKS in create_hf_handler.sh.
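A rough illustration of that per-task dispatch; the class and method names here are illustrative rather than the PR's exact code:

```python
import io

from PIL import Image


class HFPipelinePreprocessDispatcher:
    """Maps a pipeline task to its preprocess function, so supporting a new
    task only means registering one more entry in the dispatch table."""

    def __init__(self, task):
        self._dispatch = {
            "text-classification": self._preprocess_text,
            "image-classification": self._preprocess_image,
            # new tasks register their preprocess function here
        }
        self.preprocess = self._dispatch[task]

    def _preprocess_text(self, requests):
        # Decode each request body into a str, as text pipelines expect
        out = []
        for req in requests:
            data = req.get("data") or req.get("body")
            out.append(data.decode("utf-8") if isinstance(data, (bytes, bytearray)) else data)
        return out

    def _preprocess_image(self, requests):
        # Turn each request body into a PIL Image, as image pipelines expect
        return [Image.open(io.BytesIO(req.get("data") or req.get("body"))) for req in requests]
```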

@tripathiarpan20 (Author)

@msaroufim @HamidShojanazeri
So far we have integrated Text & Image classification into the pipeline (any HF model on the hub that can be used in an HF pipeline for these tasks can now be deployed on Torchserve) and added the Image_classification_docker example, whose README describes the approach in this PR with copy-pasteable commands. It might be useful for hosting LLMs on Torchserve with Docker without creating copies of the model, to save disk storage (using Docker's shared-volume feature). Moreover, either PyTorch or Tensorflow models can be deployed via the framework attribute in pipeline initialization, given that the HF hub model has the relevant checkpoint file.
The following points are yet to be addressed:

  • Updating Huggingface_Transformers/README according to the revisions made to the code in this PR; Huggingface_Transformers/Image_classification_docker/README can be taken as a reference, along with commands specific to running the server locally & on the relevant Docker container (CPU/GPU/IPEX etc.).
  • Doing the necessary cleanups in Huggingface_Transformers.
  • Adding support for task-specific arguments through setup_config.json (as the implementation before this PR did) and adding support for the current functionalities, like:
    • save_mode for a Torchscript model might be implemented by doing something with the pipe.model attribute in the initialize method of the handler in Pipeline_handler_generalized.py.
    • model_parallel and FasterTransformer might be implemented similarly to the above, but it's not currently clear how.
    • On the bright side, other major inference optimizations can be integrated in the future with Optimum pipelines.
    • Support for Captum is deprecated as of this PR, but it might be integrated for each task as mentioned in my previous comment in this thread. Necessary changes/removals in the repo might be required, like in examples/captum.
  • Adding support for the remaining tasks in pipeline; they would require support for multiple inputs (like for question-answering) and maybe even multiple outputs, and some edge cases that require batch chunking would need to be handled too. The default handlers can be referred to for implementing preprocess directly for most tasks (as is done in this PR for image-classification & text-classification with VisionHandler and TextClassifier). Appropriate example commands have to be added for each task to the README, as in Image_classification_docker/README.

I currently have to focus on learning how to integrate Torchserve with the AWS ecosystem, so I am only able to contribute the work done so far; the above points are left for the maintainers of this repo to implement in the future. I hope that's fine.

Requesting a review to finalize the PR.

In case there are any questions in the future, I can be contacted by mail: tripathiarpan20@gmail.com, or via the contact information in my Github profile.

@tripathiarpan20 tripathiarpan20 requested review from msaroufim and removed request for HamidShojanazeri September 1, 2022 04:56
@tripathiarpan20 (Author) commented Sep 1, 2022

I don't know how the removed request for @HamidShojanazeri showed up; it was totally unintentional.

@msaroufim (Member) commented Sep 1, 2022

No worries, I re-added @HamidShojanazeri as an FYI since he's done a lot in this space, and added @agunapal as the secondary official reviewer.

@tripathiarpan20 tripathiarpan20 changed the title HF pipeline integration HF pipeline integration - Part 1 Sep 2, 2022
@tripathiarpan20 (Author)

> No worries, I re-added @HamidShojanazeri as an FYI since he's done a lot in this space and added @agunapal as the secondary official reviewer

Is there any feedback I could help with yet?

@msaroufim msaroufim (Member) left a comment

Please bear with me, since this will be an important PR in our next release notes; I have tons of feedback, but only because I really want this PR to be merged and used by tons of people. Thank you for your patience.

TASK=""
OUTFILE=""
#TODO: Add tasks supported by `Transformer_handler_generalized.py` to this list progressively
SUPPORTED_TASKS=("image-classification" "sentiment-analysis")
Member

Are these all the extra tasks we're supporting? I found your PR appealing because it would have expanded the scope drastically relative to the status quo.

@tripathiarpan20 tripathiarpan20 (Author) Sep 6, 2022

Currently only these two tasks are in this list because the other tasks are not implemented yet. As explained here, all the other tasks supported by the HF pipeline can be implemented by following a similar methodology to these two (which are implemented currently), and that is left to the maintainers of the repo.

Edit: Explained in more detail below.

;;
-f|--framework)
FRAMEWORK="$2"
if ! (echo "${SUPPORTED_FRAMEWORKS[@]}" | fgrep -wq "$FRAMEWORK") ; then
Member

Hmm, I haven't spent much time testing how Torchserve works out of the box with a Tensorflow model. Were you able to get this working?

Author

Well, I haven't really tested it yet, but if the documentation is to be trusted, then as long as the Huggingface repo has a Tensorflow checkpoint, pipeline would handle it for you out of the box when framework="tf" is passed during pipeline initialisation.
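An untested sketch of that claim, using a hub model that ships a TensorFlow checkpoint:

```python
# Per the transformers docs: if the hub repo has a TF checkpoint, passing
# framework="tf" should load the TensorFlow model out of the box. Untested.
from transformers import pipeline

pipe = pipeline(
    task="text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    framework="tf",
)
print(pipe(["TorchServe makes serving easy"]))
```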

@tripathiarpan20 tripathiarpan20 (Author) Sep 6, 2022

Moreover, it seems like a Torchserve handler only needs a breakdown into initialize, preprocess and postprocess steps; as long as that is possible, Torchserve should be able to serve any model? Say, a model ported to ONNX Runtime using the Huggingface Optimum library, or to TFLite/CoreML using the Huggingface exporters library?

Member

If we haven't tested a TF model with this workflow, it's best to remove the argument and re-enable it later.

docs/FAQs.md Outdated
@@ -54,7 +54,7 @@ Yes, you can deploy Torchserve in Kubernetes using Helm charts.
Refer [Kubernetes deployment ](../kubernetes/README.md) for more details.

### Can I deploy Torchserve with AWS ELB and AWS ASG?
Yes, you can deploy Torchserve on a multi-node ASG AWS EC2 cluster. There is a cloud formation template available [here](https://github.com/pytorch/serve/blob/master/cloudformation/ec2-asg.yaml) for this type of deployment. Refer [ Multi-node EC2 deployment behind Elastic LoadBalancer (ELB)](https://github.com/pytorch/serve/tree/master/cloudformation#multi-node-ec2-deployment-behind-elastic-loadbalancer-elb) more details.
Yes, you can deploy Torchserve on a multi-node ASG AWS EC2 cluster. There is a cloud formation template available [here](https://github.com/pytorch/serve/blob/master/examples/cloudformation/ec2-asg.yaml) for this type of deployment. Refer [ Multi-node EC2 deployment behind Elastic LoadBalancer (ELB)](https://github.com/pytorch/serve/blob/master/examples/cloudformation#multi-node-ec2-deployment-behind-elastic-loadbalancer-elb) more details.
Member

let's revert changes to this file

echo "Enter a valid output path that ends with `.py`"
exit 1
fi
mkdir -p "$(dirname "$2")" && cp Pipeline_handler_generalized.py $2
Member

What is the intent of this?

Author

This creates a copy of the handler that has the task and framework variables set in these two lines.

Line 39 creates a copy of the Pipeline_handler_generalized.py handler & line 41 alters the placeholder variables for task and framework in the copy, which is finally used as the handler.

You can see how this is put into action in the TLDR section of Image_classification_docker.
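An illustrative Python equivalent of those two script lines (the framework placeholder string is an assumption; the script itself does this with shell commands):

```python
# Copy the generic handler and substitute the task/framework placeholders,
# producing the specialised handler file that is actually registered.
from pathlib import Path

src = Path("Pipeline_handler_generalized.py").read_text()
out = (src.replace('task="<PLACEHOLDER>"', 'task="image-classification"')
          .replace('framework="<PLACEHOLDER>"', 'framework="pt"'))

out_path = Path("handlers/image_classification_handler.py")
out_path.parent.mkdir(parents=True, exist_ok=True)  # mirrors the script's mkdir -p
out_path.write_text(out)
```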

@@ -0,0 +1,45 @@
#!/bin/bash
Member

I'm not sure this file is needed, since we shouldn't be creating a handler per task. Ideally we have a single handler that works well already; otherwise we'd be guessing at inference time whether something actually works or not.

@tripathiarpan20 tripathiarpan20 (Author) Sep 6, 2022

That might be true given more experience with CLI/Docker/REST API etc., which I don't seem to have enough of yet.

This felt like the most straightforward thing to do, i.e., to create a copy of Pipeline_handler_generalized.py but with the task and framework variables set as per user needs; this altered copy is finally used as the handler for the model.

Feel free to alter this workflow later.

"text-classification"
]

task="<PLACEHOLDER>"
Member

do we need placeholders?

Author

As mentioned above, it might be done in another way; feel free to do so later.



#Reference: https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.pipeline.task
pipeline_supported_tasks= [
Member

Out of curiosity: if I didn't make you worry about unifying this file with Transformer_handler_generalized.py, could you expand the supported pipelines drastically? For example, I was super happy to see image classification supported, and it would be really lovely to see an audio one, since that's been a recurring ask if you go through our Github issues.

So if the answer is yes, then maybe we can worry about unification later.

@tripathiarpan20 tripathiarpan20 (Author) Sep 6, 2022

Well, yes actually: if we totally forget the supported features in Transformer_handler_generalized.py like FasterTransformer, do_lower_case, save_mode, model_parallel etc. (whose possible integrations with pipeline didn't seem straightforward to me at the time), I believe that literally all the tasks in the Huggingface pipeline can be supported and deployable to Torchserve.

As I mentioned earlier, the integration of the remaining tasks is left to the maintainers of the repo, and it can be done step by step, one task at a time:

  • See how Image-classification and Text-classification are supported in the code of Pipeline_handler_generalized.py.
  • Read the documentation of the particular task in the pipeline docs to figure out what kind of input it expects (example: for text-classification it expects a list of text; for image-classification it expects either a list of image paths or a list of PIL Image instances).
  • Implement an appropriate function in HFPreprocessDispatcher to deserialize inputs and output the format the pipeline expects, as discovered in the above step; preprocess code can be copied from the relevant default handlers (except Audio does not have a default handler, so I'm clueless, but there must be some way to preprocess it; a hedged sketch of one possibility follows this list).
  • Implement postprocessing if necessary in HFPostprocessDispatcher.
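For the audio case, one hedged possibility (untested here): HF's automatic-speech-recognition pipeline accepts raw audio file bytes and decodes them itself via ffmpeg, so preprocess may only need to pass the bytes through.

```python
# Hypothetical preprocess for an ASR task; there is no TorchServe default
# audio handler, but the HF ASR pipeline accepts raw file bytes directly.
def _preprocess_audio(self, requests):
    return [bytes(req.get("data") or req.get("body")) for req in requests]
```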

I would have implemented more functions but I'm needed elsewhere.

@msaroufim msaroufim (Member) Sep 10, 2022

It seems like this PR includes:

  1. A pipeline handler that works for sentiment analysis and image classification
  2. Helper scripts that create a mar file
  3. Some documentation to run the above with Docker

I like that we can do something new like image classification, and your pipeline pre/post process dispatchers. I'm still not sure I follow why we need to create create_hf_handler.sh, why an extra config.properties is included, and why the changes couldn't be made to the existing handler. I do agree with the goal that using HF pipelines will help us more easily support new models and take advantage of Optimum optimizations, but until it is actually implemented, the devil is in the details.

If you don't have time to answer these questions now, I understand; keep your PR open and we can revisit when someone on the team has more time to contribute code.

If you'd like to create a separate PR to link to your repo in HuggingFace_Transformers, we can merge that now as well and describe it as an example of integrating a HuggingFace pipeline into a TorchServe handler, with the caveat that it has a limited feature set.



def load_model(self, device_id, model_name, hf_models_folder = "/home/model-server/HF-models"):
print('Entered `load_model` function')
Member

Replace print statements with logger.info().
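A sketch of the suggested change:

```python
# Use the module-level logger instead of print
import logging

logger = logging.getLogger(__name__)

def load_model(self, device_id, model_name, hf_models_folder="/home/model-server/HF-models"):
    logger.info("Entered `load_model` function")
    ...
```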



'''
`preds` is something like this for MobileViT XX Small pipeline predictions for 1 inference request:
Member

put this in docs instead

Author

I don't see any place in the docs where a sample output from a specific Huggingface pipeline could be written.

Plus, I seem to have missed removing this earlier; should we remove it instead?

exit(0)

class HFPipelineHandler(BaseHandler):
class HFPipelinePreprocessFactory:
Member

Is this a factory? It feels more like a Dispatcher or Strategy to me

@tripathiarpan20 tripathiarpan20 (Author) Sep 6, 2022

True, let's change every occurrence of 'factory' to 'dispatcher' instead.

pip install git-lfs
```

## Registering a 🤗 model from the [hub](https://huggingface.co/models)
Member

We can't assume that everyone will be using Docker. If you only have time to polish one experience (Docker vs. no Docker), I'd suggest we go for no Docker.

;;
-f|--framework)
FRAMEWORK="$2"
if ! (echo "${SUPPORTED_FRAMEWORKS[@]}" | fgrep -wq "$FRAMEWORK") ; then
Member

If we haven't tested a TF model with this workflow, it's best to remove the argument and re-enable it later.


logger.info("Creating pipeline")
pipe = pipeline(task=task, framework=framework, model=model_folder, device = device_id)
logger.info("Successfully loaded DistilBERT model from HF hub")
Member

Should log the actually loaded model, e.g.:
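```python
# Log the model that was actually loaded rather than a hard-coded name
logger.info("Successfully loaded %s from HF hub", model_folder)
```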

@@ -0,0 +1,47 @@
#!/bin/bash
HF_REPO_URL=""
Member

Need to make sure the whole example is tested and actually runs in serve/test/pytest.




git clone $HF_REPO_URL $WORKDIR/HF-models/$MODEL_NAME/
cd $WORKDIR/HF-models/$MODEL_NAME/ && git lfs install && git lfs pull && cd ../..
touch dummy_file.pth
echo "transformers==4.21.2" > transformers_req.txt
Member

Assume users will supply their own requirements.txt; you can provide a default one, but I wouldn't just create it here.

@msaroufim msaroufim marked this pull request as draft September 26, 2022 03:04
@msaroufim (Member)

Haven't heard back from the OP, so it might be best to close this PR since it seems better suited as a tutorial.

@msaroufim msaroufim closed this Jul 21, 2023