PyTorch Profiler — PyTorch Tutorials 2.2.1+cu121 documentation #690
Labels
Algorithms
Sorting, Learning or Classifying. All algorithms go here.
code-generation
code generation models and tools like copilot and aider
data-validation
Validating data structures and formats
Software2.0
Software development driven by AI and neural networks.
DESCRIPTION:
PyTorch Profiler
This recipe explains how to use the PyTorch profiler to measure the time and memory consumption of a model's operators.
Introduction
PyTorch includes a simple profiler API that is useful when the user needs to determine the most expensive operators in the model.
In this recipe, we will use a simple ResNet model to demonstrate how to use the profiler to analyze model performance.
Setup
To install torch and torchvision use the following command:
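The install command itself was not captured in this copy; the standard command from pytorch.org is:

```shell
pip install torch torchvision
```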
Steps
In this recipe we will use torch, torchvision.models and profiler modules:
Let’s create an instance of a Resnet model and prepare an input for it:
PyTorch profiler is enabled through the context manager and accepts a number of parameters; some of the most useful are:
- activities - a list of activities to profile (ProfilerActivity.CPU for PyTorch operators, TorchScript functions, and user-defined labels; ProfilerActivity.CUDA for on-device CUDA kernels);
- record_shapes - whether to record the shapes of operator inputs;
- profile_memory - whether to report the amount of memory consumed by the model's tensors.
Note: when using CUDA, the profiler also shows the runtime CUDA events occurring on the host.
Let’s see how we can use profiler to analyze the execution time:
Note that we can use the record_function context manager to label arbitrary code ranges with user-provided names (model_inference is used as a label in the example above).
Profiler allows one to check which operators were called during the execution of a code range wrapped with a profiler context manager. If multiple profiler ranges are active at the same time (e.g. in parallel PyTorch threads), each profiling context manager tracks only the operators of its corresponding range. The profiler also automatically profiles asynchronous tasks launched with torch.jit._fork and, in the case of a backward pass, the backward-pass operators launched with a backward() call.
Let’s print out the stats for the execution above:
The output will look like (omitting some columns):
Here we see that, as expected, most of the time is spent in convolution (specifically in mkldnn_convolution for PyTorch compiled with MKL-DNN support). Note the difference between self CPU time and CPU time: operators can call other operators; self CPU time excludes time spent in child operator calls, while total CPU time includes it. You can sort by self CPU time by passing sort_by="self_cpu_time_total" into the table call.
To get finer-grained results that include operator input shapes, pass group_by_input_shape=True (note: this requires running the profiler with record_shapes=True):
URL: PyTorch Profiler Recipe
Suggested labels