From 2e414f78af938bfb9dacb4bd1469c34e3fef742a Mon Sep 17 00:00:00 2001 From: Ankith Gunapal Date: Tue, 10 Dec 2024 11:28:15 -0800 Subject: [PATCH 01/13] Added a recipe for showcasing torch.export flow for 4 models --- recipes_source/recipes_index.rst | 7 + .../torch_export_challenges_solutions.rst | 319 ++++++++++++++++++ 2 files changed, 326 insertions(+) create mode 100644 recipes_source/torch_export_challenges_solutions.rst diff --git a/recipes_source/recipes_index.rst b/recipes_source/recipes_index.rst index 7d6a067b7f..1cb2daefdd 100644 --- a/recipes_source/recipes_index.rst +++ b/recipes_source/recipes_index.rst @@ -157,6 +157,13 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu :link: ../recipes/torch_export_aoti_python.html :tags: Basics +.. customcarditem:: + :header: Demonstration of torch.export flow, common challenges and the solutions to address them + :card_description: Learn how to export models for popular usecases + :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png + :link: ../recipes/torch_export_challenges_solutions.html + :tags: Basics + .. Interpretability .. customcarditem:: diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst new file mode 100644 index 0000000000..6f86110ec5 --- /dev/null +++ b/recipes_source/torch_export_challenges_solutions.rst @@ -0,0 +1,319 @@ +Demonstration of torch.export flow, common challenges and the solutions to address them +======================================================================================= +**Authors:** `Ankith Gunapal`, `Jordi Ramon`, `Marcos Carranza` + +In a previous `tutorial `__ , we learnt how to use `torch.export `__. +This tutorial builds on the previous tutorial and explores the process of exporting popular models with code & addresses common challenges one might face with `torch.export`. + +You will learn how to export models for these usecases + +* Video classifier (MViT) +* Pose Estimation (Yolov11 Pose) +* Image Captioning (BLIP) +* Promptable Image Segmentation (SAM2) + +Each of the four models were chosen to demonstrate unique features of `torch.export`, some practical considerations +& issues faced in the implementation. + +Prerequisites +------------- + +* PyTorch 2.4 or later +* Basic understanding of ``torch.export`` and PyTorch Eager inference. + + +Key requirement for `torch.export`: No graph break +------------------------------------------------ + +`torch.compile `__ speeds up PyTorch code by JIT compiling PyTorch code into optimized kernels. It optimizes the given model +using TorchDynamo and creates an optimized graph , which is then lowered into the hardware using the backend specified in the API. +When TorchDynamo encounters unsupported Python features, it breaks the computation graph, lets the default Python interpreter +handle the unsupported code, then resumes capturing the graph. This break in the computation graph is called a `graph break `__. + +One of the key differences between `torch.export` and `torch.compile` is that `torch.export` doesn’t support graph breaks +i.e the entire model or part of the model that you are exporting needs to be a single graph. This is because handling graph breaks +involves interpreting the unsupported operation with default Python evaluation, which is incompatible with what torch.export is +designed for. + +You can identify graph breaks in your program by using the following + +.. 
code:: console + + TORCH_LOGS="graph_breaks" python .py + +You will need to modify your program to get rid of graph breaks. Once resolved, you are ready to export the model. +PyTorch runs `nightly benchmarks `__ for `torch.compile` on popular HuggingFace and TIMM models. +Most of these models have no graph break. + +The models in this recipe have no graph break, but fail with `torch.export` + +Video Classification +-------------------- + +MViT is a class of models based on `MultiScale Vision Transformers `__. This has been trained for video classification using the `Kinetics-400 Dataset `__. +This model with a relevant dataset can be used for action recognition in the context of gaming. + + +The code below exports MViT by tracing with `batch_size=2` and then checks if the ExportedProgram can run with `batch_size=4` + +.. code:: python + + import numpy as np + import torch + from torchvision.models.video import MViT_V1_B_Weights, mvit_v1_b + import traceback as tb + + model = mvit_v1_b(weights=MViT_V1_B_Weights.DEFAULT) + + # Create a batch of 2 videos, each with 16 frames of shape 224x224x3. + input_frames = torch.randn(2,16, 224, 224, 3) + # Transpose to get [1, 3, num_clips, height, width]. + input_frames = np.transpose(input_frames, (0, 4, 1, 2, 3)) + + # Export the model. + exported_program = torch.export.export( + model, + (input_frames,), + ) + + # Create a batch of 4 videos, each with 16 frames of shape 224x224x3. + input_frames = torch.randn(4,16, 224, 224, 3) + input_frames = np.transpose(input_frames, (0, 4, 1, 2, 3)) + try: + exported_program.module()(input_frames) + except Exception: + tb.print_exc() + + +Error: Static batch size +~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: console + + raise RuntimeError( + RuntimeError: Expected input at *args[0].shape[0] to be equal to 2, but got 4 + + +By default, the exporting flow will trace the program assuming that all input shapes are static, so if you run the program with +inputs shapes that are different than the ones you used while tracing, you will run into an error. + +Solution +~~~~~~~~ + +To address the error, we specify the first dimension of the input (`batch_size`) to be dynamic , specifying the expected range of `batch_size`. +In the corrected example shown below, we specify that the expected `batch_size` can range from 1 to 16. +One detail to notice that `min=2` is not a bug and is explained in `The 0/1 Specialization Problem `__. A detailed description of dynamic shapes +for torch.export can be found in the export tutorial. The code shown below demonstrates how to export mViT with dynamic batch sizes. + +.. code:: python + + import numpy as np + import torch + from torchvision.models.video import MViT_V1_B_Weights, mvit_v1_b + import traceback as tb + + + model = mvit_v1_b(weights=MViT_V1_B_Weights.DEFAULT) + + # Create a batch of 2 videos, each with 16 frames of shape 224x224x3. + input_frames = torch.randn(2,16, 224, 224, 3) + + # Transpose to get [1, 3, num_clips, height, width]. + input_frames = np.transpose(input_frames, (0, 4, 1, 2, 3)) + + # Export the model. + batch_dim = torch.export.Dim("batch", min=2, max=16) + exported_program = torch.export.export( + model, + (input_frames,), + # Specify the first dimension of the input x as dynamic + dynamic_shapes={"x": {0: batch_dim}}, + ) + + # Create a batch of 4 videos, each with 16 frames of shape 224x224x3. 
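    # A batch of 4 falls inside the dynamic range declared above (min=2, max=16),
    # so the exported program should now accept it without re-exporting.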
+ input_frames = torch.randn(4,16, 224, 224, 3) + input_frames = np.transpose(input_frames, (0, 4, 1, 2, 3)) + try: + exported_program.module()(input_frames) + except Exception: + tb.print_exc() + + + + + +Pose Estimation +--------------- + +Pose Estimation is a popular Computer Vision concept that can be used to identify the location of joints of a human in a 2D image. +`Ultralytics `__ has published a Pose Estimation model based on `YOLO11 `__. This has been trained on the `COCO Dataset `__. This model can be used +for analyzing human pose for determining action or intent. The code below tries to export the YOLO11 Pose model with `batch_size=1` + + +.. code:: python + + from ultralytics import YOLO + import torch + from torch.export import export + + pose_model = YOLO("yolo11n-pose.pt") # Load model + pose_model.model.eval() + + inputs = torch.rand((1,3,640,640)) + exported_program: torch.export.ExportedProgram= export(pose_model.model, args=(inputs,)) + + +Error: strict tracing with TorchDynamo +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code:: console + + torch._dynamo.exc.InternalTorchDynamoError: PendingUnbackedSymbolNotFound: Pending unbacked symbols {zuf0} not in returned outputs FakeTensor(..., size=(6400, 1)) ((1, 1), 0). + + +By default `torch.export` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph. +This analysis provides a stronger guarantee about safety but not all python code is supported. When we export the `yolo11n-pose` model using the +default strict mode, it errors. + +Solution +~~~~~~~~ + +To address the above error `torch.export` supports non_strict mode where the program is traced using the python interpreter, which works similar to +PyTorch eager execution, the only difference is that all Tensor objects will be replaced by ProxyTensors, which will record all their operations into +a graph. By using `strict=False`, we are able to export the program. + +.. code:: python + + from ultralytics import YOLO + import torch + from torch.export import export + + pose_model = YOLO("yolo11n-pose.pt") # Load model + pose_model.model.eval() + + inputs = torch.rand((1,3,640,640)) + exported_program: torch.export.ExportedProgram= export(pose_model.model, args=(inputs,), strict=False) + + + +Image Captioning +---------------- + +Image Captioning is the task of defining the contents of an image in words. In the context of gaming, Image Captioning can be used to enhance the +gameplay experience by dynamically generating text description of the various game objects in the scene, thereby providing the gamer with additional +details. `BLIP `__ is a popular model for Image Captioning `released by SalesForce Research `__. The code below tries to export BLIP with `batch_size=1` + + +.. 
code:: python

    import torch
    from models.blip import blip_decoder

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    image_size = 384
    image = torch.randn(1, 3, 384, 384).to(device)
    caption_input = ""

    model_url = 'https://storage.googleapis.com/sfr-vision-language-research/BLIP/models/model_base_capfilt_large.pth'
    model = blip_decoder(pretrained=model_url, image_size=image_size, vit='base')
    model.eval()
    model = model.to(device)

    exported_program: torch.export.ExportedProgram = torch.export.export(model, args=(image, caption_input,), strict=False)



Error: Unsupported python operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While exporting a model, it might fail because the model implementation might contain certain python operations which are not yet supported by `torch.export`.
Some of these failures may have a workaround. BLIP is an example where the original model errors and making a small change in the code resolves the issue.
`torch.export` lists the common cases of supported and unsupported operations in `ExportDB `__ and shows how you can modify your code to make it export compatible.

.. code:: console

    File "/BLIP/models/blip.py", line 112, in forward
      text.input_ids[:,0] = self.tokenizer.bos_token_id
    File "/anaconda3/envs/export/lib/python3.10/site-packages/torch/_subclasses/functional_tensor.py", line 545, in __torch_dispatch__
      outs_unwrapped = func._op_dk(
    RuntimeError: cannot mutate tensors with frozen storage



Solution
~~~~~~~~

Clone the `tensor `__ where export fails.

.. code:: python

    text.input_ids = text.input_ids.clone() # clone the tensor
    text.input_ids[:,0] = self.tokenizer.bos_token_id



Promptable Image Segmentation
-----------------------------

Image segmentation is a computer vision technique that divides a digital image into distinct groups of pixels, or segments, based on their characteristics.
Segment Anything Model(`SAM `__) introduced promptable image segmentation, which predicts object masks given prompts that indicate the desired object. `SAM 2 `__ is
the first unified model for segmenting objects across images and videos. The `SAM2ImagePredictor `__ class provides an easy interface to the model for prompting
the model. The model can take as input both point and box prompts, as well as masks from the previous iteration of prediction. Since SAM2 provides strong
zero-shot performance for object tracking, it can be used for tracking game objects in a scene. The code below tries to export SAM2ImagePredictor with batch_size=1


The tensor operations in the predict method of `SAM2ImagePredictor `__ are happening in the `_predict `__ method. So, we try to export this.

.. code:: python

    ep = torch.export.export(
        self._predict,
        args=(unnorm_coords, labels, unnorm_box, mask_input, multimask_output),
        kwargs={"return_logits": return_logits},
        strict=False,
    )


Error: Model is not of type `torch.nn.Module`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`torch.export` expects the module to be of type `torch.nn.Module`. However, the module we are trying to export is a class method. Hence it errors.

.. 
code:: console + + Traceback (most recent call last): + File "/sam2/image_predict.py", line 20, in + masks, scores, _ = predictor.predict( + File "/sam2/sam2/sam2_image_predictor.py", line 312, in predict + ep = torch.export.export( + File "python3.10/site-packages/torch/export/__init__.py", line 359, in export + raise ValueError( + ValueError: Expected `mod` to be an instance of `torch.nn.Module`, got . + + +Solution +~~~~~~~~ + +We write a helper class, which inherits from `torch.nn.Module` and call the `_predict method` in the `forward` method of the class. The complete code can be found `here `__. + +.. code:: python + + class ExportHelper(torch.nn.Module): + def __init__(self): + super().__init__() + + def forward(_, *args, **kwargs): + return self._predict(*args, **kwargs) + + model_to_export = ExportHelper() + ep = torch.export.export( + model_to_export, + args=(unnorm_coords, labels, unnorm_box, mask_input, multimask_output), + kwargs={"return_logits": return_logits}, + strict=False, + ) + +Conclusion +---------- + +In this tutorial, we have learned how to use `torch.export` to export models for popular use cases by addressing challenges through correct configuration & simple code modifications. From 1e4d8c6e2c3051b7e929813380891c83a23479c6 Mon Sep 17 00:00:00 2001 From: Ankith Gunapal Date: Tue, 10 Dec 2024 15:26:12 -0800 Subject: [PATCH 02/13] Apply suggestions from code review Co-authored-by: Svetlana Karslioglu --- .../torch_export_challenges_solutions.rst | 78 +++++++++---------- 1 file changed, 39 insertions(+), 39 deletions(-) diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst index 6f86110ec5..a808a10ab0 100644 --- a/recipes_source/torch_export_challenges_solutions.rst +++ b/recipes_source/torch_export_challenges_solutions.rst @@ -3,17 +3,17 @@ Demonstration of torch.export flow, common challenges and the solutions to addre **Authors:** `Ankith Gunapal`, `Jordi Ramon`, `Marcos Carranza` In a previous `tutorial `__ , we learnt how to use `torch.export `__. -This tutorial builds on the previous tutorial and explores the process of exporting popular models with code & addresses common challenges one might face with `torch.export`. +This tutorial expands on the previous one and explores the process of exporting popular models with code, as well as addresses common challenges that may arise with ``torch.export``. -You will learn how to export models for these usecases +In this tutorial, you will learn how to export models for these use cases: * Video classifier (MViT) * Pose Estimation (Yolov11 Pose) * Image Captioning (BLIP) * Promptable Image Segmentation (SAM2) -Each of the four models were chosen to demonstrate unique features of `torch.export`, some practical considerations -& issues faced in the implementation. +Each of the four models were chosen to demonstrate unique features of `torch.export`, as well as some practical considerations +and issues faced in the implementation. Prerequisites ------------- @@ -22,20 +22,20 @@ Prerequisites * Basic understanding of ``torch.export`` and PyTorch Eager inference. -Key requirement for `torch.export`: No graph break ------------------------------------------------- +Key requirement for ``torch.export``: No graph break +---------------------------------------------------- `torch.compile `__ speeds up PyTorch code by JIT compiling PyTorch code into optimized kernels. 
It optimizes the given model -using TorchDynamo and creates an optimized graph , which is then lowered into the hardware using the backend specified in the API. +using ``TorchDynamo`` and creates an optimized graph , which is then lowered into the hardware using the backend specified in the API. When TorchDynamo encounters unsupported Python features, it breaks the computation graph, lets the default Python interpreter -handle the unsupported code, then resumes capturing the graph. This break in the computation graph is called a `graph break `__. +handle the unsupported code, and then resumes capturing the graph. This break in the computation graph is called a `graph break `__. -One of the key differences between `torch.export` and `torch.compile` is that `torch.export` doesn’t support graph breaks -i.e the entire model or part of the model that you are exporting needs to be a single graph. This is because handling graph breaks -involves interpreting the unsupported operation with default Python evaluation, which is incompatible with what torch.export is +One of the key differences between ``torch.export`` and ``torch.compile`` is that ``torch.export`` doesn’t support graph breaks +which means that the entire model or part of the model that you are exporting needs to be a single graph. This is because handling graph breaks +involves interpreting the unsupported operation with default Python evaluation, which is incompatible with what ``torch.export`` is designed for. -You can identify graph breaks in your program by using the following +You can identify graph breaks in your program by using the following command: .. code:: console @@ -50,11 +50,11 @@ The models in this recipe have no graph break, but fail with `torch.export` Video Classification -------------------- -MViT is a class of models based on `MultiScale Vision Transformers `__. This has been trained for video classification using the `Kinetics-400 Dataset `__. +MViT is a class of models based on `MultiScale Vision Transformers `__. This model has been trained for video classification using the `Kinetics-400 Dataset `__. This model with a relevant dataset can be used for action recognition in the context of gaming. -The code below exports MViT by tracing with `batch_size=2` and then checks if the ExportedProgram can run with `batch_size=4` +The code below exports MViT by tracing with ``batch_size=2`` and then checks if the ExportedProgram can run with ``batch_size=4``. .. code:: python @@ -95,15 +95,15 @@ Error: Static batch size By default, the exporting flow will trace the program assuming that all input shapes are static, so if you run the program with -inputs shapes that are different than the ones you used while tracing, you will run into an error. +input shapes that are different than the ones you used while tracing, you will run into an error. Solution ~~~~~~~~ -To address the error, we specify the first dimension of the input (`batch_size`) to be dynamic , specifying the expected range of `batch_size`. -In the corrected example shown below, we specify that the expected `batch_size` can range from 1 to 16. -One detail to notice that `min=2` is not a bug and is explained in `The 0/1 Specialization Problem `__. A detailed description of dynamic shapes -for torch.export can be found in the export tutorial. The code shown below demonstrates how to export mViT with dynamic batch sizes. 
+To address the error, we specify the first dimension of the input (``batch_size``) to be dynamic , specifying the expected range of ``batch_size``. +In the corrected example shown below, we specify that the expected ``batch_size`` can range from 1 to 16. +One detail to notice that ``min=2`` is not a bug and is explained in `The 0/1 Specialization Problem `__. A detailed description of dynamic shapes +for ``torch.export`` can be found in the export tutorial. The code shown below demonstrates how to export mViT with dynamic batch sizes: .. code:: python @@ -145,7 +145,7 @@ for torch.export can be found in the export tutorial. The code shown below demon Pose Estimation --------------- -Pose Estimation is a popular Computer Vision concept that can be used to identify the location of joints of a human in a 2D image. +**Pose Estimation** is a Computer Vision concept that can be used to identify the location of joints of a human in a 2D image. `Ultralytics `__ has published a Pose Estimation model based on `YOLO11 `__. This has been trained on the `COCO Dataset `__. This model can be used for analyzing human pose for determining action or intent. The code below tries to export the YOLO11 Pose model with `batch_size=1` @@ -171,16 +171,16 @@ Error: strict tracing with TorchDynamo torch._dynamo.exc.InternalTorchDynamoError: PendingUnbackedSymbolNotFound: Pending unbacked symbols {zuf0} not in returned outputs FakeTensor(..., size=(6400, 1)) ((1, 1), 0). -By default `torch.export` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph. -This analysis provides a stronger guarantee about safety but not all python code is supported. When we export the `yolo11n-pose` model using the +By default ``torch.export`` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph. +This analysis provides a stronger guarantee about safety but not all Python code is supported. When we export the ``yolo11n-pose`` model using the default strict mode, it errors. Solution ~~~~~~~~ -To address the above error `torch.export` supports non_strict mode where the program is traced using the python interpreter, which works similar to -PyTorch eager execution, the only difference is that all Tensor objects will be replaced by ProxyTensors, which will record all their operations into -a graph. By using `strict=False`, we are able to export the program. +To address the above error ,``torch.export`` supports the``non_strict`` mode where the program is traced using the Python interpreter, which works similar to +PyTorch eager execution. The only difference is that all ``Tensor`` objects will be replaced by ``ProxyTensors``, which will record all their operations into +a graph. By using ``strict=False``, we are able to export the program. .. code:: python @@ -199,9 +199,9 @@ a graph. By using `strict=False`, we are able to export the program. Image Captioning ---------------- -Image Captioning is the task of defining the contents of an image in words. In the context of gaming, Image Captioning can be used to enhance the +**Image Captioning** is the task of defining the contents of an image in words. In the context of gaming, Image Captioning can be used to enhance the gameplay experience by dynamically generating text description of the various game objects in the scene, thereby providing the gamer with additional -details. 
`BLIP `__ is a popular model for Image Captioning `released by SalesForce Research `__. The code below tries to export BLIP with `batch_size=1` +details. `BLIP `__ is a popular model for Image Captioning `released by SalesForce Research `__. The code below tries to export BLIP with ``batch_size=1`` .. code:: python @@ -223,12 +223,12 @@ details. `BLIP `__ is a popular model for Imag -Error: Unsupported python operations +Error: Unsupported Python Operations ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -While exporting a model, it might fail because the model implementation might contain certain python operations which are not yet supported by `torch.export`. -Some of these failures may have a workaround. BLIP is an example where the original model errors and making a small change in the code resolves the issue. -`torch.export` lists the common cases of supported and unsupported operations in `ExportDB `__ and shows how you can modify your code to make it export compatible. +While exporting a model, it might fail because the model implementation might contain certain Python operations which are not yet supported by ``torch.export``. +Some of these failures may have a workaround. BLIP is an example where the original model errors, which can be resolved by making a small change in the code. +``torch.export`` lists the common cases of supported and unsupported operations in `ExportDB `__ and shows how you can modify your code to make it export compatible. .. code:: console @@ -255,14 +255,14 @@ Clone the `tensor `__) introduced promptable image segmentation, which predicts object masks given prompts that indicate the desired object. `SAM 2 `__ is +**Image segmentation** is a computer vision technique that divides a digital image into distinct groups of pixels, or segments, based on their characteristics. +`Segment Anything Model (SAM) `__) introduced promptable image segmentation, which predicts object masks given prompts that indicate the desired object. `SAM 2 `__ is the first unified model for segmenting objects across images and videos. The `SAM2ImagePredictor `__ class provides an easy interface to the model for prompting the model. The model can take as input both point and box prompts, as well as masks from the previous iteration of prediction. Since SAM2 provides strong zero-shot performance for object tracking, it can be used for tracking game objects in a scene. The code below tries to export SAM2ImagePredictor with batch_size=1 -The tensor operations in the predict method of `SAM2ImagePredictor `__ are happening in the `_predict `__ method. So, we try to export this. +The tensor operations in the predict method of `SAM2ImagePredictor `__ are happening in the `_predict `__ method. So, we try to export like this. .. code:: python @@ -274,10 +274,10 @@ The tensor operations in the predict method of `SAM2ImagePredictor `__. +We write a helper class, which inherits from ``torch.nn.Module`` and call the ``_predict method`` in the ``forward`` method of the class. The complete code can be found `here `__. .. code:: python @@ -316,4 +316,4 @@ We write a helper class, which inherits from `torch.nn.Module` and call the `_pr Conclusion ---------- -In this tutorial, we have learned how to use `torch.export` to export models for popular use cases by addressing challenges through correct configuration & simple code modifications. 
+In this tutorial, we have learned how to use ``torch.export`` to export models for popular use cases by addressing challenges through correct configuration and simple code modifications.

From 7039a64925a8dc78f5c36617c0af1267aef454a3 Mon Sep 17 00:00:00 2001
From: Ankith Gunapal
Date: Tue, 10 Dec 2024 17:28:49 -0800
Subject: [PATCH 03/13] Addressed review comments

---
 .../torch_export_challenges_solutions.rst     | 19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst
index a808a10ab0..995a951e36 100644
--- a/recipes_source/torch_export_challenges_solutions.rst
+++ b/recipes_source/torch_export_challenges_solutions.rst
@@ -1,16 +1,16 @@
 Demonstration of torch.export flow, common challenges and the solutions to address them
 =======================================================================================
-**Authors:** `Ankith Gunapal`, `Jordi Ramon`, `Marcos Carranza`
+**Authors:** `Ankith Gunapal `__, `Jordi Ramon `__, `Marcos Carranza `__

-In a previous `tutorial `__ , we learnt how to use `torch.export `__.
+In the `Introduction to torch.export Tutorial `__, we learned how to use `torch.export `__.
 This tutorial expands on the previous one and explores the process of exporting popular models with code, as well as addresses common challenges that may arise with ``torch.export``.

 In this tutorial, you will learn how to export models for these use cases:

-* Video classifier (MViT)
-* Pose Estimation (Yolov11 Pose)
-* Image Captioning (BLIP)
-* Promptable Image Segmentation (SAM2)
+* Video classifier (`MViT `__)
+* Pose Estimation (`Yolov11 Pose `__)
+* Image Captioning (`BLIP `__)
+* Promptable Image Segmentation (`SAM2 `__)

 Each of the four models was chosen to demonstrate unique features of `torch.export`, as well as some practical considerations
 and issues faced in the implementation.
@@ -178,7 +178,7 @@
 Solution
 ~~~~~~~~

-To address the above error ,``torch.export`` supports the``non_strict`` mode where the program is traced using the Python interpreter, which works similar to
+To address the above error, ``torch.export`` supports the ``non_strict`` mode where the program is traced using the Python interpreter, which works similarly to
 PyTorch eager execution. The only difference is that all ``Tensor`` objects will be replaced by ``ProxyTensors``, which will record all their operations into
 a graph. By using ``strict=False``, we are able to export the program.
@@ -259,7 +259,7 @@ Promptable Image Segmentation
 `Segment Anything Model (SAM) `__ introduced promptable image segmentation, which predicts object masks given prompts that indicate the desired object. `SAM 2 `__ is
 the first unified model for segmenting objects across images and videos. The `SAM2ImagePredictor `__ class provides an easy interface to the model for prompting
 the model. The model can take as input both point and box prompts, as well as masks from the previous iteration of prediction. Since SAM2 provides strong
-zero-shot performance for object tracking, it can be used for tracking game objects in a scene. The code below tries to export SAM2ImagePredictor with batch_size=1
+zero-shot performance for object tracking, it can be used for tracking game objects in a scene.

 The tensor operations in the predict method of `SAM2ImagePredictor `__ are happening in the `_predict `__ method. So, we try to export like this. 
@@ -317,3 +317,6 @@ Conclusion
 ----------

 In this tutorial, we have learned how to use ``torch.export`` to export models for popular use cases by addressing challenges through correct configuration and simple code modifications.
+Once you are able to export a model, you can lower the ``ExportedProgram`` into your hardware using `AOTInductor `__ in the case of servers and `ExecuTorch `__ in the case of edge devices.
+To learn more about ``AOTInductor``(AOTI), please refer to the `AOTI tutorial `
+To learn more about ``ExecuTorch``, please refer to the `ExecuTorch tutorial `__

From 14664a8c0c9f49ff374769774cf51311f4219697 Mon Sep 17 00:00:00 2001
From: Ankith Gunapal
Date: Tue, 10 Dec 2024 17:29:51 -0800
Subject: [PATCH 04/13] Apply suggestions from code review

Co-authored-by: Svetlana Karslioglu
---
 recipes_source/torch_export_challenges_solutions.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst
index 995a951e36..4c9d6a5c56 100644
--- a/recipes_source/torch_export_challenges_solutions.rst
+++ b/recipes_source/torch_export_challenges_solutions.rst
@@ -25,7 +25,7 @@ Prerequisites
 Key requirement for ``torch.export``: No graph break
 ----------------------------------------------------

-`torch.compile `__ speeds up PyTorch code by JIT compiling PyTorch code into optimized kernels. It optimizes the given model
+`torch.compile `__ speeds up PyTorch code by using JIT to compile PyTorch code into optimized kernels. It optimizes the given model
 using ``TorchDynamo`` and creates an optimized graph, which is then lowered into the hardware using the backend specified in the API.
 When TorchDynamo encounters unsupported Python features, it breaks the computation graph, lets the default Python interpreter
@@ -173,7 +173,7 @@ Error: strict tracing with TorchDynamo

 By default ``torch.export`` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph.
 This analysis provides a stronger guarantee about safety but not all Python code is supported. When we export the ``yolo11n-pose`` model using the
-default strict mode, it errors.
+default strict mode, it typically returns an error.

From f894ff6d167327743be3e476e5e29cb67e3650f2 Mon Sep 17 00:00:00 2001
From: Ankith Gunapal
Date: Tue, 10 Dec 2024 18:51:23 -0800
Subject: [PATCH 05/13] Addressed review comments

---
 recipes_source/torch_export_challenges_solutions.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst
index 4c9d6a5c56..ef9d2bdd08 100644
--- a/recipes_source/torch_export_challenges_solutions.rst
+++ b/recipes_source/torch_export_challenges_solutions.rst
@@ -318,5 +318,5 @@ Conclusion

 In this tutorial, we have learned how to use ``torch.export`` to export models for popular use cases by addressing challenges through correct configuration and simple code modifications.
 Once you are able to export a model, you can lower the ``ExportedProgram`` into your hardware using `AOTInductor `__ in the case of servers and `ExecuTorch `__ in the case of edge devices. 
-To learn more about ``AOTInductor``(AOTI), please refer to the `AOTI tutorial ` -To learn more about ``ExecuTorch``, please refer to the `ExecuTorch tutorial `__ +To learn more about ``AOTInductor`` (AOTI), please refer to the `AOTI tutorial `__ +To learn more about ``ExecuTorch`` , please refer to the `ExecuTorch tutorial `__ From 9f14f824ebe8ffdffe901db52879a46a6e819468 Mon Sep 17 00:00:00 2001 From: Ankith Gunapal Date: Fri, 13 Dec 2024 14:34:41 -0800 Subject: [PATCH 06/13] Added Page TOC and updated tags --- recipes_source/recipes_index.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/recipes_source/recipes_index.rst b/recipes_source/recipes_index.rst index 1cb2daefdd..b841d9ee75 100644 --- a/recipes_source/recipes_index.rst +++ b/recipes_source/recipes_index.rst @@ -162,7 +162,7 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu :card_description: Learn how to export models for popular usecases :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png :link: ../recipes/torch_export_challenges_solutions.html - :tags: Basics + :tags: Compiler,TorchCompile .. Interpretability @@ -479,3 +479,4 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu /recipes/distributed_optim_torchscript /recipes/mobile_interpreter /recipes/distributed_comm_debug_mode + /recipes/torch_export_challenges_solutions From 8d76548820213d8f3459d5216de2ced551b1648f Mon Sep 17 00:00:00 2001 From: Ankith Gunapal Date: Wed, 15 Jan 2025 17:01:30 -0800 Subject: [PATCH 07/13] Replaced Pose Estimation model with ASR model, addressed review comments --- .../torch_export_challenges_solutions.rst | 47 ++++++++++++------- 1 file changed, 31 insertions(+), 16 deletions(-) diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst index ef9d2bdd08..857df3ca96 100644 --- a/recipes_source/torch_export_challenges_solutions.rst +++ b/recipes_source/torch_export_challenges_solutions.rst @@ -8,7 +8,7 @@ This tutorial expands on the previous one and explores the process of exporting In this tutorial, you will learn how to export models for these use cases: * Video classifier (`MViT `__) -* Pose Estimation (`Yolov11 Pose `__) +* Automatic Speech Recognition (`OpenAI Whisper-Tiny `__) * Image Captioning (`BLIP `__) * Promptable Image Segmentation (`SAM2 `__) @@ -33,7 +33,7 @@ handle the unsupported code, and then resumes capturing the graph. This break in One of the key differences between ``torch.export`` and ``torch.compile`` is that ``torch.export`` doesn’t support graph breaks which means that the entire model or part of the model that you are exporting needs to be a single graph. This is because handling graph breaks involves interpreting the unsupported operation with default Python evaluation, which is incompatible with what ``torch.export`` is -designed for. +designed for. You can read details about the differences between the various PyTorch frameworks in this `link `__ You can identify graph breaks in your program by using the following command: @@ -152,15 +152,22 @@ for analyzing human pose for determining action or intent. The code below tries .. 
code:: python - from ultralytics import YOLO import torch - from torch.export import export + from transformers import WhisperProcessor, WhisperForConditionalGeneration + from datasets import load_dataset - pose_model = YOLO("yolo11n-pose.pt") # Load model - pose_model.model.eval() + # load model + model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny") + + # dummy inputs for exporting the model + input_features = torch.randn(1,80, 3000) + attention_mask = torch.ones(1, 3000) + decoder_input_ids = torch.tensor([[1, 1, 1 , 1]]) * model.config.decoder_start_token_id + + model.eval() + + exported_program: torch.export.ExportedProgram= torch.export.export(model, args=(input_features, attention_mask, decoder_input_ids,)) - inputs = torch.rand((1,3,640,640)) - exported_program: torch.export.ExportedProgram= export(pose_model.model, args=(inputs,)) Error: strict tracing with TorchDynamo @@ -168,7 +175,7 @@ Error: strict tracing with TorchDynamo .. code:: console - torch._dynamo.exc.InternalTorchDynamoError: PendingUnbackedSymbolNotFound: Pending unbacked symbols {zuf0} not in returned outputs FakeTensor(..., size=(6400, 1)) ((1, 1), 0). + torch._dynamo.exc.InternalTorchDynamoError: AttributeError: 'DynamicCache' object has no attribute 'key_cache' By default ``torch.export`` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph. @@ -184,15 +191,21 @@ a graph. By using ``strict=False``, we are able to export the program. .. code:: python - from ultralytics import YOLO import torch - from torch.export import export + from transformers import WhisperProcessor, WhisperForConditionalGeneration + from datasets import load_dataset - pose_model = YOLO("yolo11n-pose.pt") # Load model - pose_model.model.eval() + # load model + model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny") - inputs = torch.rand((1,3,640,640)) - exported_program: torch.export.ExportedProgram= export(pose_model.model, args=(inputs,), strict=False) + # dummy inputs for exporting the model + input_features = torch.randn(1,80, 3000) + attention_mask = torch.ones(1, 3000) + decoder_input_ids = torch.tensor([[1, 1, 1 , 1]]) * model.config.decoder_start_token_id + + model.eval() + + exported_program: torch.export.ExportedProgram= torch.export.export(model, args=(input_features, attention_mask, decoder_input_ids,), strict=False) @@ -223,7 +236,7 @@ details. `BLIP `__ is a popular model for Imag -Error: Unsupported Python Operations +Error: Cannot mutate tensors with frozen storage ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ While exporting a model, it might fail because the model implementation might contain certain Python operations which are not yet supported by ``torch.export``. @@ -250,6 +263,8 @@ Clone the `tensor Date: Wed, 15 Jan 2025 17:07:56 -0800 Subject: [PATCH 08/13] Update recipes_source/torch_export_challenges_solutions.rst Co-authored-by: Angela Yi --- recipes_source/torch_export_challenges_solutions.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst index 857df3ca96..94e94d6598 100644 --- a/recipes_source/torch_export_challenges_solutions.rst +++ b/recipes_source/torch_export_challenges_solutions.rst @@ -43,9 +43,9 @@ You can identify graph breaks in your program by using the following command: You will need to modify your program to get rid of graph breaks. 
Once resolved, you are ready to export the model.
 PyTorch runs `nightly benchmarks `__ for `torch.compile` on popular HuggingFace and TIMM models.
-Most of these models have no graph break.
+Most of these models have no graph breaks.

-The models in this recipe have no graph break, but fail with `torch.export`
+The models in this recipe have no graph breaks, but fail with `torch.export`

 Video Classification
 --------------------

From 10f51cf39b2d070c5a0a4fd8cdaeedde953cdaa2 Mon Sep 17 00:00:00 2001
From: Ankith Gunapal
Date: Fri, 17 Jan 2025 10:24:48 -0800
Subject: [PATCH 09/13] Added description for ASR

---
 recipes_source/torch_export_challenges_solutions.rst | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst
index 94e94d6598..ddb07ec9f5 100644
--- a/recipes_source/torch_export_challenges_solutions.rst
+++ b/recipes_source/torch_export_challenges_solutions.rst
@@ -142,12 +142,12 @@ for ``torch.export`` can be found in the export tutorial. The code shown below d


-Pose Estimation
----------------
+Automatic Speech Recognition
+----------------------------

-**Pose Estimation** is a Computer Vision concept that can be used to identify the location of joints of a human in a 2D image.
-`Ultralytics `__ has published a Pose Estimation model based on `YOLO11 `__. This has been trained on the `COCO Dataset `__. This model can be used
-for analyzing human pose for determining action or intent. The code below tries to export the YOLO11 Pose model with `batch_size=1`
+**Automatic Speech Recognition**(ASR) is the use of machine learning to transcribe spoken language into text.
+`Whisper `__ is a Transformer-based encoder-decoder model from OpenAI, which was trained on 680k hours of labelled data for ASR and speech translation.
+The code below tries to export the ``whisper-tiny`` model for ASR.

From 82eb2496c334e55e1d6931be93b587540366af18 Mon Sep 17 00:00:00 2001
From: Ankith Gunapal
Date: Fri, 17 Jan 2025 10:55:28 -0800
Subject: [PATCH 10/13] Added description for ASR

---
 recipes_source/torch_export_challenges_solutions.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst
index ddb07ec9f5..a59719b7fa 100644
--- a/recipes_source/torch_export_challenges_solutions.rst
+++ b/recipes_source/torch_export_challenges_solutions.rst
@@ -145,7 +145,7 @@ for ``torch.export`` can be found in the export tutorial. The code shown below d

-**Automatic Speech Recognition**(ASR) is the use of machine learning to transcribe spoken language into text.
+**Automatic Speech Recognition** (ASR) is the use of machine learning to transcribe spoken language into text.
 `Whisper `__ is a Transformer-based encoder-decoder model from OpenAI, which was trained on 680k hours of labelled data for ASR and speech translation.
 The code below tries to export the ``whisper-tiny`` model for ASR. 
From ba06ef001a57c8009d51be12b828c5771aff48f1 Mon Sep 17 00:00:00 2001 From: Ankith Gunapal Date: Fri, 17 Jan 2025 12:05:37 -0800 Subject: [PATCH 11/13] Update recipes_source/torch_export_challenges_solutions.rst Co-authored-by: Angela Yi --- recipes_source/torch_export_challenges_solutions.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst index a59719b7fa..72d8200c4e 100644 --- a/recipes_source/torch_export_challenges_solutions.rst +++ b/recipes_source/torch_export_challenges_solutions.rst @@ -179,8 +179,8 @@ Error: strict tracing with TorchDynamo By default ``torch.export`` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph. -This analysis provides a stronger guarantee about safety but not all Python code is supported. When we export the ``yolo11n-pose`` model using the -default strict mode, it typically returns an error. +This analysis provides a stronger guarantee about safety but not all Python code is supported. When we export the ``whisper-tiny`` model using the +default strict mode, it typically returns an error in dynamo due to an unsupported feature. Solution ~~~~~~~~ From 8655733edca387566dfe264a34528c387c91b600 Mon Sep 17 00:00:00 2001 From: Ankith Gunapal Date: Fri, 17 Jan 2025 12:11:30 -0800 Subject: [PATCH 12/13] linking github issue --- recipes_source/torch_export_challenges_solutions.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst index 72d8200c4e..cb2a2cb579 100644 --- a/recipes_source/torch_export_challenges_solutions.rst +++ b/recipes_source/torch_export_challenges_solutions.rst @@ -180,7 +180,7 @@ Error: strict tracing with TorchDynamo By default ``torch.export`` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph. This analysis provides a stronger guarantee about safety but not all Python code is supported. When we export the ``whisper-tiny`` model using the -default strict mode, it typically returns an error in dynamo due to an unsupported feature. +default strict mode, it typically returns an error in Dynamo due to an unsupported feature. To understand why this errors in Dynamo, you can refer to this `GitHub issue `__ Solution ~~~~~~~~ From 71c468fdc2b8869180b16c55c9832daa1ae8a97f Mon Sep 17 00:00:00 2001 From: Svetlana Karslioglu Date: Fri, 17 Jan 2025 13:56:50 -0800 Subject: [PATCH 13/13] Formatting cleanup --- .../torch_export_challenges_solutions.rst | 24 +++++++------------ 1 file changed, 9 insertions(+), 15 deletions(-) diff --git a/recipes_source/torch_export_challenges_solutions.rst b/recipes_source/torch_export_challenges_solutions.rst index cb2a2cb579..1f8b1ae45a 100644 --- a/recipes_source/torch_export_challenges_solutions.rst +++ b/recipes_source/torch_export_challenges_solutions.rst @@ -37,7 +37,7 @@ designed for. You can read details about the differences between the various PyT You can identify graph breaks in your program by using the following command: -.. code:: console +.. code:: sh TORCH_LOGS="graph_breaks" python .py @@ -45,7 +45,7 @@ You will need to modify your program to get rid of graph breaks. Once resolved, PyTorch runs `nightly benchmarks `__ for `torch.compile` on popular HuggingFace and TIMM models. 
Most of these models have no graph breaks.

-The models in this recipe have no graph breaks, but fail with `torch.export`
+The models in this recipe have no graph breaks, but fail with ``torch.export``.

 Video Classification
 --------------------
@@ -88,7 +88,7 @@ The code below exports MViT by tracing with ``batch_size=2`` and then checks if
 Error: Static batch size
 ~~~~~~~~~~~~~~~~~~~~~~~~

-.. code:: console
+.. code-block:: sh

    raise RuntimeError(
    RuntimeError: Expected input at *args[0].shape[0] to be equal to 2, but got 4
@@ -139,9 +139,6 @@ for ``torch.export`` can be found in the export tutorial. The code shown below d
    tb.print_exc()


-
-
-
 Automatic Speech Recognition
 ----------------------------

@@ -177,7 +177,7 @@ Error: strict tracing with TorchDynamo

 By default ``torch.export`` traces your code using `TorchDynamo `__, a byte-code analysis engine, which symbolically analyzes your code and builds a graph.
 This analysis provides a stronger guarantee about safety but not all Python code is supported. When we export the ``whisper-tiny`` model using the
-default strict mode, it typically returns an error in Dynamo due to an unsupported feature. To understand why this errors in Dynamo, you can refer to this `GitHub issue `__
+default strict mode, it typically returns an error in Dynamo due to an unsupported feature. To understand why this errors in Dynamo, you can refer to this `GitHub issue `__.

 Solution
 ~~~~~~~~
@@ -204,12 +204,12 @@ a graph. By using ``strict=False``, we are able to export the program.


 Image Captioning
 ----------------

 **Image Captioning** is the task of defining the contents of an image in words. In the context of gaming, Image Captioning can be used to enhance the
 gameplay experience by dynamically generating text description of the various game objects in the scene, thereby providing the gamer with additional
-details. `BLIP `__ is a popular model for Image Captioning `released by SalesForce Research `__. The code below tries to export BLIP with ``batch_size=1``
+details. `BLIP `__ is a popular model for Image Captioning `released by SalesForce Research `__. The code below tries to export BLIP with ``batch_size=1``.

 .. code:: python

@@ -263,9 +263,8 @@ Clone the `tensor `__ where export fails.

    text.input_ids = text.input_ids.clone() # clone the tensor
    text.input_ids[:,0] = self.tokenizer.bos_token_id


-
 Promptable Image Segmentation
 -----------------------------
@@ -318,5 +318,5 @@ Conclusion
 ----------

 In this tutorial, we have learned how to use ``torch.export`` to export models for popular use cases by addressing challenges through correct configuration and simple code modifications.
 Once you are able to export a model, you can lower the ``ExportedProgram`` into your hardware using `AOTInductor `__ in the case of servers and `ExecuTorch `__ in the case of edge devices.
-To learn more about ``AOTInductor`` (AOTI), please refer to the `AOTI tutorial `__
-To learn more about ``ExecuTorch`` , please refer to the `ExecuTorch tutorial `__
+To learn more about ``AOTInductor`` (AOTI), please refer to the `AOTI tutorial `__.
+To learn more about ``ExecuTorch``, please refer to the `ExecuTorch tutorial `__.
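
For a sense of what that lowering step looks like in practice, the sketch below lowers the MViT ``ExportedProgram`` from this recipe with AOTInductor and runs the packaged artifact. This is a minimal sketch under stated assumptions, not part of the recipe above: it assumes a recent PyTorch release in which ``torch._inductor.aoti_compile_and_package`` and ``torch._inductor.aoti_load_package`` are available, and their exact signatures may differ between versions.

.. code:: python

    import torch
    from torchvision.models.video import MViT_V1_B_Weights, mvit_v1_b

    model = mvit_v1_b(weights=MViT_V1_B_Weights.DEFAULT)
    model.eval()

    # Example input in [B, C, T, H, W] layout, matching the MViT example above.
    input_frames = torch.randn(2, 3, 16, 224, 224)

    # Export first; AOTInductor consumes the resulting ExportedProgram.
    exported_program = torch.export.export(model, (input_frames,))

    # Compile ahead of time and package the result as a self-contained
    # artifact on disk (the .pt2 package format).
    package_path = torch._inductor.aoti_compile_and_package(
        exported_program,
        package_path="mvit.pt2",
    )

    # The package can later be loaded and run without re-exporting the model.
    compiled = torch._inductor.aoti_load_package(package_path)
    output = compiled(input_frames)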