Add 16A8W quantization configuration utility for ARM backend (#13175)
Summary:
Pull Request resolved: #13175
This diff implements a 16A8W (16-bit activations, 8-bit weights) quantization configuration utility for the ExecuTorch ARM backend, following feedback from D79746479.
## Key Changes
**1. New Quantization Configuration Function**
- Add `get_16a8w_quantization_config()` in `fbcode/executorch/backends/arm/quantizer/arm_quantizer.py` (a sketch of the config's shape follows this list)
- Uses 16-bit activations observed with HistogramObserver (higher precision than 8A8W)
- Keeps 8-bit weights observed with MinMaxObserver/PerChannelMinMaxObserver (memory-efficient)
- **Technically supported by TOSA through [EXT-INT16 extension/profile](https://www.mlplatform.org/tosa/tosa_spec.html#_conv2d)**
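
The body of `get_16a8w_quantization_config()` is not shown in this summary; the following is a minimal sketch of what such a config could look like, built on the pt2e `QuantizationSpec` API from `torch.ao.quantization`. The `_sketch` suffix, the `eps` values, and the tuple return type are illustrative assumptions; the actual backend function likely wraps these specs in the ARM quantizer's own config object.

```python
import torch
from torch.ao.quantization.observer import (
    HistogramObserver,
    MinMaxObserver,
    PerChannelMinMaxObserver,
)
from torch.ao.quantization.quantizer import QuantizationSpec


def get_16a8w_quantization_config_sketch(is_per_channel: bool = True):
    """Return (activation_spec, weight_spec) for 16-bit acts / 8-bit weights.

    Hypothetical sketch; the real function in arm_quantizer.py may differ.
    """
    # 16-bit activations: HistogramObserver gives a better range estimate
    # than plain min/max, which matters more as the integer grid gets finer.
    act_spec = QuantizationSpec(
        dtype=torch.int16,
        quant_min=torch.iinfo(torch.int16).min,  # -32768
        quant_max=torch.iinfo(torch.int16).max,  # 32767
        qscheme=torch.per_tensor_affine,
        is_dynamic=False,
        observer_or_fake_quant_ctr=HistogramObserver.with_args(eps=2**-12),
    )
    # 8-bit weights: symmetric, per-channel by default, mirroring the
    # MinMaxObserver / PerChannelMinMaxObserver choice described above.
    weight_observer = PerChannelMinMaxObserver if is_per_channel else MinMaxObserver
    weight_spec = QuantizationSpec(
        dtype=torch.int8,
        quant_min=-127,
        quant_max=127,
        qscheme=(
            torch.per_channel_symmetric
            if is_per_channel
            else torch.per_tensor_symmetric
        ),
        ch_axis=0 if is_per_channel else None,
        is_dynamic=False,
        observer_or_fake_quant_ctr=weight_observer.with_args(eps=2**-12),
    )
    return act_spec, weight_spec
```

In use, specs like these would be handed to the ARM quantizer (e.g. via its set-global-config entry point) before the pt2e prepare/convert flow, so that activations are fake-quantized to int16 and weights to int8 during calibration.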
## Benefits
- **Better Precision**: 16-bit activations provide higher precision than 8-bit, which is especially useful for preserving precision in recurrent neural networks, where quantization error otherwise compounds across time steps.
Differential Revision: D79763381