
Commit b4734c8

Authored by martinlsm (Martin Lindström)
Arm backend: Update docs to mention partial quantization (#16291)
VGF now supports partial quantization, i.e., having the model run in mixed numerical precision. Update the markdown documentation to include and explain this feature.

Signed-off-by: Martin Lindström <Martin.Lindstroem@arm.com>
Co-authored-by: Martin Lindström <Martin.Lindstroem@arm.com>

1 parent: 51d9c75 · commit: b4734c8

File tree

4 files changed: 36 additions, 4 deletions

- docs/source/backends/arm-ethos-u/arm-ethos-u-quantization.md
- docs/source/backends/arm-vgf/arm-vgf-overview.md
- docs/source/backends/arm-vgf/arm-vgf-quantization.md
- docs/source/backends/arm-vgf/tutorials/vgf-getting-started.md


docs/source/backends/arm-ethos-u/arm-ethos-u-quantization.md

Lines changed: 1 addition & 0 deletions

````diff
@@ -10,6 +10,7 @@ The Arm Ethos-U delegate supports the following quantization schemes:
 
 - 8-bit symmetric weights with 8-bit asymmetric activations (via the PT2E quantization flow).
 - Limited support for 16-bit quantization with 16-bit activations and 8-bit weights (a.k.a 16x8 quantization). This is under development.
+- Partial quantization is *not* supported on the Ethos-U backend. The entire model must be quantized.
 
 ### Quantization API
 
````

docs/source/backends/arm-vgf/arm-vgf-overview.md

Lines changed: 2 additions & 0 deletions

````diff
@@ -84,6 +84,8 @@ See [Partitioner API](arm-vgf-partitioner.md) for more information of the Partitioner API.
 The VGF quantizer supports [Post Training Quantization (PT2E)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_ptq.html)
 and [Quantization-Aware Training (QAT)](https://docs.pytorch.org/ao/main/tutorials_source/pt2e_quant_qat.html).
 
+Partial quantization is supported, allowing users to quantize only specific parts of the model while leaving others in floating-point.
+
 For more information on quantization, see [Quantization](arm-vgf-quantization.md).
 
 ## Runtime Integration
````

docs/source/backends/arm-vgf/arm-vgf-quantization.md

Lines changed: 19 additions & 0 deletions

````diff
@@ -13,6 +13,25 @@ The quantization schemes supported by the VGF Backend are:
 
 Weight-only quantization is not currently supported on the VGF backend.
 
+### Partial Quantization
+
+The VGF backend supports partial quantization, where only parts of the model
+are quantized while others remain in floating-point. This can be useful for
+models where certain layers are not well-suited for quantization or when a
+balance between performance and accuracy is desired.
+
+For every node (op) in the graph, the quantizer looks at the *quantization
+configuration* set for that specific node. If the configuration is set to
+`None`, the node is left in floating-point; if it is provided (not `None`), the
+node is quantized according to that configuration.
+
+With the [Quantization API](#quantization-api), users can specify the
+quantization configurations for specific layers or submodules of the model. The
+`set_global` method is first used to set a default quantization configuration
+(could be `None` as explained above) for all nodes in the model. Then,
+configurations for specific layers or submodules can override the global
+setting using the `set_module_name` or `set_module_type` methods.
+
 ### Quantization API
 
 ```python
````
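Taken together, the added section describes a configuration pattern rather than showing it. Below is a minimal sketch of that pattern, assuming the import paths for `VgfQuantizer` and `get_symmetric_quantization_config` (only the `VgfCompileSpec` import appears in this commit) and using a hypothetical module name for the `set_module_name` override:

```python
import torch

# NOTE: the import paths below are assumptions for illustration; this commit's
# hunks only show the VgfCompileSpec import. Adjust to your ExecuTorch version.
from executorch.backends.arm.quantizer import (
    VgfQuantizer,
    get_symmetric_quantization_config,
)
from executorch.backends.arm.vgf import VgfCompileSpec

compile_spec = VgfCompileSpec()
quantizer = VgfQuantizer(compile_spec)

# Global default: quantize every node with a symmetric 8-bit config.
# Passing None here instead would leave all nodes in floating-point by default.
quantizer.set_global(get_symmetric_quantization_config(is_per_channel=False))

# Override by module type: keep every torch.nn.Sigmoid in floating-point.
quantizer.set_module_type(torch.nn.Sigmoid, None)

# Override by module name (hypothetical submodule name, for illustration only).
quantizer.set_module_name("decoder.out_proj", None)
```

Per the prose above, the per-type and per-name settings override the global default, so the sigmoid layers and the named submodule would stay in floating-point while the rest of the model is quantized.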

docs/source/backends/arm-vgf/tutorials/vgf-getting-started.md

Lines changed: 14 additions & 4 deletions

````diff
@@ -78,13 +78,17 @@ The example below shows how to quantize a model consisting of a single addition,
 ```python
 import torch
 
-class Add(torch.nn.Module):
+class AddSigmoid(torch.nn.Module):
+    def __init__(self):
+        super().__init__()
+        self.sigmoid = torch.nn.Sigmoid()
+
     def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
-        return x + y
+        return self.sigmoid(x + y)
 
 example_inputs = (torch.ones(1,1,1,1),torch.ones(1,1,1,1))
 
-model = Add()
+model = AddSigmoid()
 model = model.eval()
 exported_program = torch.export.export(model, example_inputs)
 graph_module = exported_program.graph_module
@@ -98,13 +102,19 @@ from executorch.backends.arm.vgf import VgfCompileSpec
 from torchao.quantization.pt2e.quantize_pt2e import convert_pt2e, prepare_pt2e
 
 # Create a compilation spec describing the target for configuring the quantizer
-compile_spec = VgfCompileSpec("TOSA-1.0+INT")
+compile_spec = VgfCompileSpec()
 
 # Create and configure quantizer to use a symmetric quantization config globally on all nodes
 quantizer = VgfQuantizer(compile_spec)
 operator_config = get_symmetric_quantization_config(is_per_channel=False)
+
+# Set default quantization config for the layers in the model.
+# Can also be set to `None` to let layers run in FP by default.
 quantizer.set_global(operator_config)
 
+# OPTIONAL: skip quantizing all sigmoid ops (only one for this model); let it run in FP
+quantizer.set_module_type(torch.nn.Sigmoid, None)
+
 # Post training quantization
 quantized_graph_module = prepare_pt2e(graph_module, quantizer)
 quantized_graph_module(*example_inputs) # Calibrate the graph module with the example input
````
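The hunk ends at calibration. As a sketch of what typically follows in the PT2E flow (an assumption based on the standard prepare/calibrate/convert sequence, not part of this diff; `convert_pt2e` is already imported in the snippet above):

```python
# Convert the calibrated graph module into its quantized form.
quantized_graph_module = convert_pt2e(quantized_graph_module)

# Ops whose quantization config was None (the sigmoid above) remain in
# floating-point, so the converted module runs in mixed numerical precision.

# Assumed next step per the usual ExecuTorch flow: re-export the quantized
# module before lowering to the VGF backend.
exported_program = torch.export.export(quantized_graph_module, example_inputs)
```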
