**Author**: `Chunyuan Wu <https://github.com/chunyuan-w>`_, `Bin Bao <https://github.com/desertfire>`__, `Jiong Gong <https://github.com/jgong5>`__
Introduction
------------
In ``torch.compile``, the default backend **TorchInductor** emits Python wrapper
code that manages memory allocation and kernel invocation. This design provides
flexibility and ease of debugging, but the interpreted nature of Python
introduces runtime overhead in performance-sensitive environments.

To address this limitation, TorchInductor includes a specialized mode that
generates **C++ wrapper code** in place of the Python wrapper, enabling faster
execution with minimal Python involvement.
Enabling the C++ wrapper mode
-----------------------------

To enable this C++ wrapper mode for TorchInductor, add the following config to your code:
.. code:: python

    import torch._inductor.config as config
    config.cpp_wrapper = True
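The same setting can also be scoped to a single compile through ``torch.compile``'s ``options`` dictionary rather than set globally. A minimal sketch, assuming the standard Inductor options pass-through (the string key ``"cpp_wrapper"`` mirrors the config attribute of the same name):

```python
import torch

def fn(x, y):
    return (x + y).sum()

# Forward the Inductor config entry for this compile only, instead of
# mutating torch._inductor.config globally. Compilation is lazy: nothing
# is generated until opt_fn is first called with real tensors.
opt_fn = torch.compile(fn, options={"cpp_wrapper": True})
```

Because compilation is deferred, this line is cheap by itself; the wrapper code is only generated on the first call.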
Example code
------------

We will use the following model code as an example:

.. code:: python

    import torch
    import torch._inductor.config as config

    config.cpp_wrapper = True

    def fn(x, y):
        return (x + y).sum()
Conclusion
----------

This tutorial introduced the **C++ wrapper** feature in TorchInductor, designed
to improve model performance with minimal code modification. We described the
motivation for this feature, detailed the experimental API used to enable it,
and compared the generated outputs of the default Python wrapper and the new
C++ wrapper on both CPU and GPU backends to illustrate their distinctions.